Model computation API usage guide

Real-time API invocation

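The system exposes a real-time API through the service gateway so that external callers can run model computations. The request carries the input data as JSON, and the response returns the model result.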

API description

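Endpoint URL: http://hostname:port/consumer/calc/{modelCode}
HTTP method: POST (Content-Type: application/json)
Request parameters:
- URL parameter:
  - modelCode: code of the model to invoke
- Request body: the model input, as JSON
Response:
- Format: JSON
- Fields:
  - status: status code; 2000 means the computation succeeded
  - message: description of the computation result
  - data: the model output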
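The request/response contract above can be sketched as a small Python client. This is a minimal sketch that assumes only what is described here: a POST to {base URL}/calc/{modelCode} with a JSON body, and a JSON envelope carrying status, message and data. The helper names build_calc_request and call_model are hypothetical and are not part of eppdev-mlib.

```python
import json
import urllib.parse
import urllib.request


def build_calc_request(base_url: str, model_code: str, features: dict) -> urllib.request.Request:
    """Build a POST request for {base_url}/calc/{modelCode} with a JSON body."""
    # Percent-encode the model code so it is safe to use as a single path segment.
    url = f"{base_url.rstrip('/')}/calc/{urllib.parse.quote(model_code, safe='')}"
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_model(base_url: str, model_code: str, features: dict, timeout: float = 10.0) -> dict:
    """Send the request and return the `data` field; raise if status is not 2000."""
    request = build_calc_request(base_url, model_code, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    if payload.get("status") != 2000:
        raise RuntimeError(f"model call failed: {payload.get('status')} {payload.get('message')}")
    return payload["data"]
```

Here is an example call, using the gateway address from the curl example: `call_model("http://localhost:11524/consumer", "test-01", {...})`. If the gateway is reachable, it should return an object shaped like the `data` field of the sample response.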

Invocation example

Using curl as an example:

curl -X POST -H 'Content-Type: application/json' \
   -d '{
           "sepal.width": 1.2,
           "sepal.length": 2.5,
           "petal.width": 3.0,
           "petal.length": 1.2
       }' \
   http://localhost:11524/consumer/calc/test-01

Sample response

{
    "status":2000,
    "message":"计算成功",
    "data":{
        "probability(Setosa)":0.0,
        "probability(Virginica)":0.125,
        "probability(Versicolor)":0.875,
        "variety":"Versicolor"
    }
}
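When reading a response like the one above, the main step is to check status before using data. The sketch below uses two hypothetical helpers, parse_calc_response and most_probable_class. It assumes 2000 is the only success code, as stated in the API description. It extracts data and picks the class with the highest probability, which for this sample agrees with the variety field.

```python
import json

# The sample response shown above, verbatim.
SAMPLE_RESPONSE = """
{
    "status": 2000,
    "message": "计算成功",
    "data": {
        "probability(Setosa)": 0.0,
        "probability(Virginica)": 0.125,
        "probability(Versicolor)": 0.875,
        "variety": "Versicolor"
    }
}
"""


def parse_calc_response(text: str) -> dict:
    """Return the `data` object of a response, raising if status is not 2000."""
    payload = json.loads(text)
    if payload.get("status") != 2000:
        raise ValueError(f"unexpected status {payload.get('status')}: {payload.get('message')}")
    return payload["data"]


def most_probable_class(data: dict) -> str:
    """Pick the class whose "probability(<class>)" value is largest."""
    prefix = "probability("
    probabilities = {
        key[len(prefix):-1]: value
        for key, value in data.items()
        if key.startswith(prefix) and key.endswith(")")
    }
    return max(probabilities, key=probabilities.get)


data = parse_calc_response(SAMPLE_RESPONSE)
print(most_probable_class(data), data["variety"])  # Versicolor Versicolor
```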

Batch invocation

Batch invocation currently supports only Oracle and MySQL databases. For Hive, the UDF approach is recommended instead.

Obtaining the batch job package

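The eppdev-mlib-batch job package is required. It consists of two files:

eppdev-mlib-batch.jar: the main executable
application.properties: the corresponding configuration file

Both files must be placed in the same directory.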

Prerequisites

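Two tables are needed:
- a wide table holding the data the model takes as input;
- a result table that the model output is written back to.

The input table must have a unique primary key. The output table must have a primary key and a create_time column, so that rows can be matched across the two tables. Example DDL (MySQL syntax):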

~~~sql
-- --------------------------------------
-- tableName: test_model_input
-- author: jinlong.hao
-- date: 2019-11-24
-- desc:
--    1. Model input table
--    2. Primary key: id
--    3. Used for model testing
-- ---------------------------------------
CREATE TABLE test_model_input(
    id CHAR(32) PRIMARY KEY comment 'primary key'
   ,sepal_width DOUBLE      comment 'sepal.width'
   ,sepal_height DOUBLE     comment 'sepal.height'
   ,petal_width DOUBLE      comment 'petal.width'
   ,petal_height DOUBLE     comment 'petal.height'
   ,create_time DATETIME    comment 'creation time, used for incremental computation'
) comment 'model test input table';
 
-- --------------------------------------
-- tableName: test_model_output
-- author: jinlong.hao
-- date: 2019-11-24
-- desc:
--    1. Model output (result) table
--    2. Primary key: id
-- ---------------------------------------
CREATE TABLE test_model_output(
    id VARCHAR(32) PRIMARY KEY      comment 'unique primary key'
   ,probability_setosa  DOUBLE      comment 'Setosa probability'
   ,probability_virginica DOUBLE    comment 'Virginica probability'
   ,probability_versicolor DOUBLE   comment 'Versicolor probability'
   ,variety VARCHAR(20)             comment 'model prediction'
   ,create_time datetime            comment 'model computation time'
) comment 'model test output table';
~~~
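The shared primary key is what ties each output row back to its input row. The sketch below is an illustration only. It uses Python's built-in sqlite3 as a stand-in for MySQL/Oracle, with trimmed-down versions of the two tables above (only a few columns kept) and made-up rows. It joins the tables on id, and the LEFT JOIN also shows which inputs have no result yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE test_model_input (
        id TEXT PRIMARY KEY,
        sepal_width REAL,
        create_time TEXT
    );
    CREATE TABLE test_model_output (
        id TEXT PRIMARY KEY,
        variety TEXT,
        create_time TEXT
    );
    -- Made-up illustration rows: two inputs, only the first one scored so far.
    INSERT INTO test_model_input VALUES
        ('a1', 1.2, '2019-11-20 12:00:00'),
        ('a2', 3.4, '2019-11-21 09:30:00');
    INSERT INTO test_model_output VALUES
        ('a1', 'Versicolor', '2019-11-21 13:00:00');
    """
)

# Pair every input row with its result (NULL/None when the row has not been scored).
rows = conn.execute(
    """
    SELECT i.id, o.variety, o.create_time
    FROM test_model_input AS i
    LEFT JOIN test_model_output AS o ON o.id = i.id
    ORDER BY i.id
    """
).fetchall()
for row in rows:
    print(row)
```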

Configuration

Edit application.properties. The main settings to change are:

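spring.datasource.*: database connection settings
eppdev.mlib.consumer.basic-url: base URL of the service gateway
eppdev.mlib.batch.model-code: model code
eppdev.mlib.batch.input-table.name: name of the input table
eppdev.mlib.batch.input-table.key: primary key of the input table
eppdev.mlib.batch.input-table.columns: columns to query from the input table (aliases can rename them to the model's input field names)
eppdev.mlib.batch.input-table.where: custom query condition; it may contain ${...} placeholders whose values are passed on the command line
eppdev.mlib.batch.fetch-size: number of rows read per batch, to avoid loading too much data at once
eppdev.mlib.batch.output-table.name: name of the output table
eppdev.mlib.batch.output-table.key: primary key column of the output table
eppdev.mlib.batch.output-table.column-maps: mapping from model output fields to output table columns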

spring.datasource.* is configured the same way as in any ordinary Spring Boot project. An example of the eppdev.mlib settings:

eppdev.mlib.consumer.basic-url = http://localhost:11524/consumer 
eppdev.mlib.batch.model-code = test-01
eppdev.mlib.batch.input-table.name = test_model_input
eppdev.mlib.batch.input-table.key = id
eppdev.mlib.batch.input-table.columns = id, sepal_width as `sepal.width`, sepal_height as `sepal.height`, petal_width as `petal.width`, petal_height as `petal.height`
eppdev.mlib.batch.input-table.where = create_time >= ${begin_time}  and create_time <= ${end_time}
eppdev.mlib.batch.fetch-size = 1000
eppdev.mlib.batch.output-table.name = test_model_output
eppdev.mlib.batch.output-table.key = id
eppdev.mlib.batch.output-table.column-maps = probability(Setosa) as probability_setosa, probability(Virginica) as probability_virginica, probability(Versicolor) as probability_versicolor, variety as variety
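The column-maps value is a comma-separated list of `<model output field> as <output column>` pairs. The sketch below shows one way to read it and turn the model's data object into a row for test_model_output. parse_column_maps and to_output_row are hypothetical helpers. The parsing rule (split on commas, then on the keyword `as`) is an assumption inferred from the example above, not the tool's documented grammar.

```python
import re

# The column-maps value from the example configuration above.
COLUMN_MAPS = (
    "probability(Setosa) as probability_setosa, "
    "probability(Virginica) as probability_virginica, "
    "probability(Versicolor) as probability_versicolor, "
    "variety as variety"
)


def parse_column_maps(spec: str) -> dict:
    """Parse 'field as column, ...' into {model output field: output table column}."""
    mapping = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        parts = re.split(r"\s+as\s+", item, maxsplit=1, flags=re.IGNORECASE)
        if len(parts) != 2:
            raise ValueError(f"cannot parse column mapping: {item!r}")
        field, column = (part.strip() for part in parts)
        mapping[field] = column
    return mapping


def to_output_row(key_column: str, key_value: str, data: dict, mapping: dict) -> dict:
    """Rename model output fields to output table columns, keyed by the input row's key."""
    row = {key_column: key_value}
    for field, column in mapping.items():
        row[column] = data.get(field)  # None if the model did not return this field
    return row


mapping = parse_column_maps(COLUMN_MAPS)
data = {
    "probability(Setosa)": 0.0,
    "probability(Virginica)": 0.125,
    "probability(Versicolor)": 0.875,
    "variety": "Versicolor",
}
print(to_output_row("id", "a1", data, mapping))
```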

Running the job

java -Dbegin_time="2019-11-20 11:24:32" -Dend_time="2019-11-21 12:23:24" -jar eppdev-mlib-batch.jar
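The ${begin_time} and ${end_time} placeholders in input-table.where are filled from the values given on the command line. This page does not document how eppdev-mlib-batch substitutes them internally. The hypothetical sketch below shows one safe approach, using sqlite3 as a stand-in for the real database:
- each placeholder becomes a bound query parameter, so values such as "2019-11-20 11:24:32" need no manual quoting;
- matching rows are then read in batches of fetch-size.

bind_where and read_in_batches are illustrative names.

```python
import re
import sqlite3

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def bind_where(where: str, params: dict) -> tuple:
    """Replace each ${name} with '?' and collect the values in order."""
    values = []

    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for ${{{name}}}")
        values.append(params[name])
        return "?"

    return PLACEHOLDER.sub(substitute, where), values


def read_in_batches(conn, table, columns, key, where, params, fetch_size):
    """Yield lists of at most fetch_size rows matching the bound WHERE clause."""
    clause, values = bind_where(where, params)
    # Table and column names come from trusted configuration, not from user input.
    sql = f"SELECT {columns} FROM {table} WHERE {clause} ORDER BY {key}"
    cursor = conn.execute(sql, values)
    while True:
        batch = cursor.fetchmany(fetch_size)
        if not batch:
            break
        yield batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_model_input (id TEXT PRIMARY KEY, sepal_width REAL, create_time TEXT)")
conn.executemany(
    "INSERT INTO test_model_input VALUES (?, ?, ?)",
    [("a1", 1.2, "2019-11-20 12:00:00"),
     ("a2", 3.4, "2019-11-21 09:30:00"),
     ("a3", 2.2, "2019-11-22 08:00:00")],  # made-up rows; a3 falls outside the window
)
batches = list(read_in_batches(
    conn, "test_model_input", "id, sepal_width", "id",
    "create_time >= ${begin_time}  and create_time <= ${end_time}",
    {"begin_time": "2019-11-20 11:24:32", "end_time": "2019-11-21 12:23:24"},
    fetch_size=1,
))
print(batches)  # [[('a1', 1.2)], [('a2', 3.4)]]
```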

Invoking via Hive UDF

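There are two options for invoking a model through a Hive UDF:

Use the native eppdev-mlib-sdk-hive-udf functions: every call must pass the gateway URL and the model code.

Use a custom UDF: it can be configured so that calls do not need to pass the gateway URL and model code.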

Using the native UDFs

Download the package

The native UDFs provided by eppdev-mlib currently support Hive 1.2, 2.3 and 3.1. Download the jar that matches your Hive version:

hive1.2: eppdev-mlib-sdk-hive-udf12.jar
hive2.3: eppdev-mlib-sdk-hive-udf23.jar
hive3.1: eppdev-mlib-sdk-hive-udf31.jar

Upload the jar to HDFS

Taking Hive 2.3 as an example:

hdfs dfs -put eppdev-mlib-sdk-hive-udf23.jar /user/udf/hive/

Create the custom functions in Hive

CREATE FUNCTION eppdev_to_json AS 'cn.eppdev.mlib.sdk.hive.udf.EppdevMlibToJsonUDF'
     USING JAR 'hdfs:///user/udf/hive/eppdev-mlib-sdk-hive-udf23.jar';
CREATE FUNCTION eppdev_mlib_calc AS 'cn.eppdev.mlib.sdk.hive.udf.EppdevMlibCalcUDF'
     USING JAR 'hdfs:///user/udf/hive/eppdev-mlib-sdk-hive-udf23.jar';

Invoking the model in Hive

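eppdev_mlib_calc can be called in two ways:

With 3 arguments (gateway URL, model code, request JSON), it returns the complete model output.
With 4 arguments (gateway URL, model code, request JSON, and the desired output field), it returns only that output field.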
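The difference between the two call forms can be mimicked in plain Python. This only models the behavior described above and is not the UDF's code. mlib_calc is a hypothetical stand-in, and the HTTP call is replaced by an injectable fetch function so the sketch runs offline. Returning None for a field the model did not produce is an assumption (NULL in Hive terms).

```python
import json
from typing import Callable, Optional

# A fetch function takes (gateway URL, model code, request JSON) and returns the model's data object.
Fetch = Callable[[str, str, str], dict]


def mlib_calc(fetch: Fetch, gateway_url: str, model_code: str,
              request_json: str, output_field: Optional[str] = None) -> Optional[str]:
    """3 arguments: the full result as JSON text; 4 arguments: a single output field."""
    data = fetch(gateway_url, model_code, request_json)
    if output_field is None:
        return json.dumps(data)
    value = data.get(output_field)
    return None if value is None else str(value)


def fake_fetch(gateway_url: str, model_code: str, request_json: str) -> dict:
    """Offline stand-in returning the method 1 sample output from this page."""
    return {
        "probability(Setosa)": 1.0,
        "probability(Virginica)": 0.0,
        "probability(Versicolor)": 0.0,
        "variety": "Setosa",
    }


# Illustrative feature values, not taken from real data.
request = json.dumps({"sepal.width": 3.5, "sepal.height": 5.1, "petal.width": 0.2, "petal.height": 1.4})
full = mlib_calc(fake_fetch, "http://localhost:11541/consumer", "test-01", request)
variety = mlib_calc(fake_fetch, "http://localhost:11541/consumer", "test-01", request, "variety")
print(variety)  # Setosa
```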

Method 1: three arguments, returning the full result JSON

SELECT 
    eppdev_mlib_calc(
        'http://localhost:11541/consumer',
        'test-01',
        eppdev_to_json(
            'sepal.width', sepal_width,
            'sepal.height', sepal_height,
            'petal.width', petal_width,
            'petal.height', petal_height
        )
    ) AS full_result
FROM iris_data;

The output is the complete result JSON:

{
    "probability(Setosa)": 1.0,
    "probability(Virginica)": 0.0,
    "probability(Versicolor)": 0.0,
    "variety": "Setosa"
}
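In the method 1 query, eppdev_to_json appears to take alternating key and value arguments and build one JSON object from them. The following is a hypothetical Python equivalent of that reading. to_json_pairs is an illustrative name, not part of the SDK, and the error on an odd argument count is an assumption.

```python
import json


def to_json_pairs(*args) -> str:
    """Build a JSON object from alternating key, value arguments."""
    if len(args) % 2 != 0:
        raise ValueError("expected an even number of arguments (key, value, key, value, ...)")
    keys = args[0::2]
    values = args[1::2]
    return json.dumps(dict(zip(keys, values)))


print(to_json_pairs("sepal.width", 3.5, "sepal.height", 5.1, "petal.width", 0.2, "petal.height", 1.4))
# {"sepal.width": 3.5, "sepal.height": 5.1, "petal.width": 0.2, "petal.height": 1.4}
```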

Method 2: four arguments, returning a specific output field directly

SELECT 
    eppdev_mlib_calc(
        'http://localhost:11541/consumer',
        'test-01',
        eppdev_to_json(
            'sepal.width', sepal_width,
            'sepal.height', sepal_height,
            'petal.width', petal_width,
            'petal.height', petal_height
        ),
        'variety'
    ) AS variety
FROM iris_data;

The output is the variety value, e.g. Setosa

