[FATE in Practice] Vertical Federated Learning with the FATE Framework: A Walkthrough (2)

The previous post covered my early experience learning the FATE framework. Here I will share what I have learned after using the framework for a while, along with some more convenient ways of working with it.

What did we have to do in the previous post?

1. Upload a data file from /fate/examples/data inside the Docker container into the experiment namespace. The JSON file we wrote for this is /fate/examples/dsl/v1/upload_data_guest.json:

{
    "file": "/fate/examples/data/tag_value_1000_140.csv",
    "head": 1,
    "partition": 16,
    "work_mode": 0,
    "table_name": "tag_value_1000_140",
    "namespace": "experiment"
}

To reuse this file for another dataset, all you need to change is the file name before ".csv" in "file", and set "table_name" to the same name. However, the upload command we used before was:

python ${your_install_path}/fate_flow/fate_flow_client.py -f upload -c ${upload_data_json_path}

In my container that works out to roughly:

python /fate/python/fate_flow/fate_flow_client.py -f upload -c /fate/examples/dsl/v1/upload_data_guest.json

But that command is clunky and far too long to remember, so here is a better approach:

First, since all the data files live in /fate/examples/data, why not put upload_data_guest.json there as well? Then the upload can be run directly from inside /fate/examples/data:

cp /fate/examples/dsl/v1/upload_data_guest.json /fate/examples/data/upload_data.json

This copies upload_data_guest.json into /fate/examples/data under the name upload_data.json.

Second, the long python invocation is clumsy and error-prone, so we can switch to the FATE Flow command-line client for the upload:

cd /fate/examples/data
flow data upload -c upload_data.json

The upload succeeds.
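As a quick illustration of the edit described above, to upload one of the other sample files shipped in /fate/examples/data you only change "file" and "table_name" in upload_data.json and rerun the same command (breast_hetero_guest is used here purely as an example):

{
    "file": "/fate/examples/data/breast_hetero_guest.csv",
    "head": 1,
    "partition": 16,
    "work_mode": 0,
    "table_name": "breast_hetero_guest",
    "namespace": "experiment"
}

flow data upload -c upload_data.json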

If you want to check how an uploaded table looks in the experiment namespace, use the following command (with the table_name you chose at upload time):

flow table info -n experiment -t UCI_Credit_Card_hetero_guest

If a table was uploaded by mistake, you can delete it:

flow table delete -n experiment -t UCI_Credit_Card_hetero_guest

and then upload it again.

Usually the data files need some preprocessing on Windows first, after which we copy them over to the VMware virtual machine.

Alternatively, enable shared folders in VMware and mount the shared folder inside the VM; the mounted folder then stays in sync with the host, which is quite convenient.
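A minimal sketch of that workflow, assuming open-vm-tools is installed in the VM and FATE runs in a Docker container (the share name, CSV name, and container name below are placeholders):

# On the VM: mount the VMware shared folders (".host:/" exposes all shares defined on the host)
sudo mkdir -p /mnt/hgfs
sudo vmhgfs-fuse .host:/ /mnt/hgfs -o allow_other

# Copy the preprocessed CSV from the share into the FATE container's data directory
docker cp /mnt/hgfs/win_share/my_data.csv fate_container:/fate/examples/data/my_data.csv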

2. Previously we also had to prepare dsl and conf files ourselves. After switching to pipeline, however, I found that hand-writing dsl and conf files is no longer necessary. You can look at the pipeline scripts under /fate/examples/pipeline to see how they run; here is an excerpt from one of them:
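The excerpt omits the imports at the top of the example file; in the FATE 1.x pipeline examples they look roughly like this (the exact module paths can differ between FATE versions, so check the file you copy from):

from pipeline.backend.pipeline import PipeLine
from pipeline.component import (DataIO, Evaluation, FeatureScale, FederatedSample,
                                HeteroFeatureBinning, HeteroLR, HeteroSecureBoost,
                                Intersection, OneHotEncoder, Reader)
from pipeline.interface import Data
from pipeline.runtime.entity import JobParameters
from pipeline.utils.tools import load_job_config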

def main(config="../../config.yaml", namespace=""):
    # obtain config
    if isinstance(config, str):
        config = load_job_config(config)
    parties = config.parties
    guest = parties.guest[0]
    host = parties.host[0]
    arbiter = parties.arbiter[0]
    backend = config.backend
    work_mode = config.work_mode

    guest_train_data = {"name": "breast_hetero_guest", "namespace": f"experiment{namespace}"}
    host_train_data = {"name": "breast_hetero_host", "namespace": f"experiment{namespace}"}
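    # (these table names and the "experiment" namespace must match data already uploaded to FATE)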

    pipeline = PipeLine().set_initiator(role='guest', party_id=guest).set_roles(guest=guest, host=host, arbiter=arbiter)

    reader_0 = Reader(name="reader_0")
    reader_0.get_party_instance(role='guest', party_id=guest).component_param(table=guest_train_data)
    reader_0.get_party_instance(role='host', party_id=host).component_param(table=host_train_data)

    dataio_0 = DataIO(name="dataio_0")
    dataio_0.get_party_instance(role='guest', party_id=guest).component_param(with_label=True, missing_fill=True,
                                                                              outlier_replace=True)
    dataio_0.get_party_instance(role='host', party_id=host).component_param(with_label=False, missing_fill=True,
                                                                            outlier_replace=True)

    intersection_0 = Intersection(name="intersection_0")
    federated_sample_0 = FederatedSample(name="federated_sample_0", mode="stratified", method="upsample",
                                         fractions=[[0, 1.5], [1, 2.0]])
    feature_scale_0 = FeatureScale(name="feature_scale_0")
    hetero_feature_binning_0 = HeteroFeatureBinning(name="hetero_feature_binning_0")
    one_hot_0 = OneHotEncoder(name="one_hot_0")
    hetero_lr_0 = HeteroLR(name="hetero_lr_0", penalty="L2", optimizer="rmsprop", tol=1e-5,
                           init_param={"init_method": "random_uniform"},
                           alpha=0.01, max_iter=10, early_stop="diff", batch_size=320, learning_rate=0.15)
    hetero_lr_1 = HeteroLR(name="hetero_lr_1", penalty="L2", optimizer="rmsprop", tol=1e-5,
                           init_param={"init_method": "random_uniform"},
                           alpha=0.01, max_iter=10, early_stop="diff", batch_size=320, learning_rate=0.15,
                           cv_param={"n_splits": 5,
                                     "shuffle": True,
                                     "random_seed": 103,
                                     "need_cv": True})

    hetero_secureboost_0 = HeteroSecureBoost(name="hetero_secureboost_0", num_trees=5,
                                             cv_param={"shuffle": False, "need_cv": True})
    hetero_secureboost_1 = HeteroSecureBoost(name="hetero_secureboost_1", num_trees=5)
    evaluation_0 = Evaluation(name="evaluation_0")
    evaluation_1 = Evaluation(name="evaluation_1")
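    # Next, wire the components into a DAG: each add_component call feeds one component's output into the next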

    pipeline.add_component(reader_0)
    pipeline.add_component(dataio_0, data=Data(data=reader_0.output.data))
    pipeline.add_component(intersection_0, data=Data(data=dataio_0.output.data))
    pipeline.add_component(federated_sample_0, data=Data(data=intersection_0.output.data))
    pipeline.add_component(feature_scale_0, data=Data(data=federated_sample_0.output.data))
    pipeline.add_component(hetero_feature_binning_0, data=Data(data=feature_scale_0.output.data))
    pipeline.add_component(one_hot_0, data=Data(data=hetero_feature_binning_0.output.data))
    pipeline.add_component(hetero_lr_0, data=Data(train_data=one_hot_0.output.data))
    pipeline.add_component(hetero_lr_1, data=Data(train_data=one_hot_0.output.data))
    pipeline.add_component(hetero_secureboost_0, data=Data(train_data=one_hot_0.output.data))
    pipeline.add_component(hetero_secureboost_1, data=Data(train_data=one_hot_0.output.data))
    pipeline.add_component(evaluation_0, data=Data(data=hetero_lr_0.output.data))
    pipeline.add_component(evaluation_1, data=Data(data=hetero_secureboost_1.output.data))
    pipeline.compile()

    job_parameters = JobParameters(backend=backend, work_mode=work_mode)
    pipeline.fit(job_parameters)

    print(pipeline.get_component("evaluation_0").get_summary())
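To run the excerpt as a standalone script, you can close it with an entry point along these lines (a minimal sketch; the actual example files parse a -config argument in much the same way):

import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser("PIPELINE DEMO")
    parser.add_argument("-config", type=str, help="path to config.yaml")
    args = parser.parse_args()
    if args.config is not None:
        main(args.config)
    else:
        main()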

In effect, pipeline generates the dsl and conf for us, and both come out far simpler than the hand-written v1 files. All we need to do is run the Python file:

python /fate/examples/pipeline/multi_model/pipeline-multi-model.py 

Then open localhost:8080/ in a browser to watch the job run on FATE Board.
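If you prefer the command line, the flow client can also query a job from the shell (the exact flags may vary slightly between fate-client versions; the job id is printed when the pipeline is submitted):

flow job query -j <job_id>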

 
