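PaddlePaddle Inference Source Code Analysis (Part 2)
This part walks through how a Predictor is created. All code below lives in the paddle/fluid/inference/api directory.
1. All public interfaces are declared in paddle_inference_api.h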
    namespace paddle_infer {

    using Config = paddle::AnalysisConfig;

    ///
    /// \brief A factory to help create predictors.
    ///
    /// Usage:
    ///
    /// \code{.cpp}
    /// Config config;
    /// ... // change the configs.
    /// auto predictor = CreatePredictor(config);
    /// \endcode
    ///
    PD_INFER_DECL std::shared_ptr<Predictor> CreatePredictor(
        const Config& config);  // NOLINT

    }  // namespace paddle_infer
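2. Config is implemented in analysis_config.cc. Enabling the GPU, XPU (Baidu Kunlun), or NPU (Huawei Ascend), setting MKL options, and similar configuration all go through Config. Taking GPU enabling as an example: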
    void AnalysisConfig::EnableUseGpu(uint64_t memory_pool_init_size_mb,
                                      int device_id) {
    #if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
      use_gpu_ = true;
      memory_pool_init_size_mb_ = memory_pool_init_size_mb;
      FLAGS_initial_gpu_memory_in_mb = memory_pool_init_size_mb_;
      gpu_device_id_ = device_id;
    #else
      LOG(ERROR) << "Please compile with gpu to EnableGpu()";
      use_gpu_ = false;
    #endif

      Update();
    }
Every configuration change calls the Update function.
2.1 Update propagates the changed settings to the pass_builder_ held by Config.
    mutable std::unique_ptr<PassStrategy> pass_builder_;
If the GPU is enabled, pass_builder_ is set to a GpuPassStrategy, which bundles the preset GPU configuration.
    // Transfer pass_builder and copy the existing compatible passes.
    if (!pass_builder_ || ((use_gpu() ^ pass_builder_->use_gpu())) ||
        ((use_xpu() ^ pass_builder_->use_xpu())) ||
        ((use_npu() ^ pass_builder_->use_npu()))) {
      if (use_gpu()) {
        pass_builder_.reset(new GpuPassStrategy);

        if (use_tensorrt_) {
          // Append after the Affine_channel_conv_fuse pass.
          pass_builder()->InsertPass(3, "tensorrt_subgraph_pass");
        }
      } else if (use_xpu()) {
        PADDLE_ENFORCE_EQ(
            use_gpu(), false,
            platform::errors::InvalidArgument(
                "Only one choice can be made between CPU and XPU."));
        pass_builder_.reset(new XpuPassStrategy);
      } else if (use_npu()) {
        PADDLE_ENFORCE_EQ(
            use_gpu(), false,
            platform::errors::InvalidArgument(
                "Only one choice can be made between GPU and NPU."));
        pass_builder_.reset(new NpuPassStrategy);
      } else {
        pass_builder_.reset(new CpuPassStrategy);
      }
    }
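The rebuild condition above boils down to "replace the strategy only when none exists yet or the device kind has changed". Here is a minimal, self-contained sketch of that idea. Strategy, CpuStrategy, GpuStrategy, MiniConfig and rebuild_count are hypothetical names made up for illustration; they are not Paddle classes, and only the CPU/GPU pair is modeled.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for PassStrategy and its subclasses.
struct Strategy {
  virtual ~Strategy() = default;
  virtual bool use_gpu() const { return false; }
  virtual std::string name() const = 0;
};
struct CpuStrategy : Strategy {
  std::string name() const override { return "cpu"; }
};
struct GpuStrategy : Strategy {
  bool use_gpu() const override { return true; }
  std::string name() const override { return "gpu"; }
};

// Hypothetical miniature of a config object that owns a strategy.
class MiniConfig {
 public:
  void EnableGpu() { use_gpu_ = true; Update(); }
  void DisableGpu() { use_gpu_ = false; Update(); }

  const Strategy& strategy() const {
    assert(strategy_ != nullptr);
    return *strategy_;
  }
  // Counts how often the strategy object was actually replaced.
  int rebuild_count() const { return rebuild_count_; }

 private:
  void Update() {
    // Rebuild only if there is no strategy yet or the device kind differs.
    if (!strategy_ || (use_gpu_ != strategy_->use_gpu())) {
      if (use_gpu_) {
        strategy_ = std::make_unique<GpuStrategy>();
      } else {
        strategy_ = std::make_unique<CpuStrategy>();
      }
      ++rebuild_count_;
    }
  }

  bool use_gpu_ = false;
  std::unique_ptr<Strategy> strategy_;
  int rebuild_count_ = 0;
};
```

Calling EnableGpu() twice in a row rebuilds the strategy only once, so passes inserted into an existing strategy would survive repeated calls with the same device kind.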
3. Once the Config is set up, CreatePredictor can be called. It is implemented in analysis_predictor.cc.
    namespace paddle_infer {

    std::shared_ptr<Predictor> CreatePredictor(const Config &config) {  // NOLINT
      std::shared_ptr<Predictor> predictor(new Predictor(config));
      return predictor;
    }

    Predictor::Predictor(const Config &config) {
      const_cast<Config *>(&config)->SwitchUseFeedFetchOps(false);
      // The second parameter indicates that the discard log is not printed
      predictor_ = paddle::CreatePaddlePredictor<
          Config, paddle::PaddleEngineKind::kAnalysis>(config);
    }

    }  // namespace paddle_infer
4. CreatePaddlePredictor is declared in paddle_api.h and has two specializations; the Analysis one is used here.
    enum class PaddleEngineKind {
      kNative = 0,         ///< Use the native Fluid facility.
      kAutoMixedTensorRT,  ///< Automatically mix Fluid with TensorRT.
      kAnalysis,           ///< More optimization.
    };

    template <typename ConfigT, PaddleEngineKind engine>
    PD_INFER_DECL std::unique_ptr<PaddlePredictor> CreatePaddlePredictor(
        const ConfigT& config);

    template <>
    PD_INFER_DECL std::unique_ptr<PaddlePredictor> CreatePaddlePredictor<
        NativeConfig, PaddleEngineKind::kNative>(const NativeConfig& config);

    template <>
    PD_INFER_DECL std::unique_ptr<PaddlePredictor> CreatePaddlePredictor<
        AnalysisConfig, PaddleEngineKind::kAnalysis>(const AnalysisConfig& config);
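The Native specialization is implemented in api_impl.cc, while the Analysis one is again in analysis_predictor.cc.
5. The implementation of CreatePaddlePredictor<AnalysisConfig, PaddleEngineKind::kAnalysis> is shown below. Some code is omitted to focus on the logic.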
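For readers less familiar with this pattern, the declaration above is a function template that is only declared in general and then explicitly specialized per (config type, engine kind) pair, so each specialization can live in its own .cc file. A minimal sketch of the same technique follows; EngineKind, NativeCfg, AnalysisCfg, PredictorBase and MakePredictor are hypothetical stand-ins, not Paddle symbols.

```cpp
#include <memory>
#include <string>

enum class EngineKind { kNative, kAnalysis };

// Hypothetical config types, one per engine.
struct NativeCfg {};
struct AnalysisCfg {};

struct PredictorBase {
  virtual ~PredictorBase() = default;
  virtual std::string Kind() const = 0;
};
struct NativePredictorSketch : PredictorBase {
  std::string Kind() const override { return "native"; }
};
struct AnalysisPredictorSketch : PredictorBase {
  std::string Kind() const override { return "analysis"; }
};

// Primary template: declared but never defined, so only the
// explicitly specialized combinations can be linked.
template <typename ConfigT, EngineKind kind>
std::unique_ptr<PredictorBase> MakePredictor(const ConfigT& config);

// Specialization for the native engine.
template <>
std::unique_ptr<PredictorBase> MakePredictor<NativeCfg, EngineKind::kNative>(
    const NativeCfg&) {
  return std::make_unique<NativePredictorSketch>();
}

// Specialization for the analysis engine.
template <>
std::unique_ptr<PredictorBase>
MakePredictor<AnalysisCfg, EngineKind::kAnalysis>(const AnalysisCfg&) {
  return std::make_unique<AnalysisPredictorSketch>();
}
```

In this sketch, a call such as MakePredictor<AnalysisCfg, EngineKind::kAnalysis>(cfg) selects the analysis implementation at compile time, and an unsupported combination fails at link time because the primary template has no definition.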
    template <>
    std::unique_ptr<PaddlePredictor> CreatePaddlePredictor<
        AnalysisConfig, PaddleEngineKind::kAnalysis>(const AnalysisConfig &config) {
      ...
      // Once creation completes, the Config is marked invalid, which guarantees
      // that each Config maps to exactly one Predictor.
      VLOG(3) << "create AnalysisConfig";
      PADDLE_ENFORCE_EQ(
          config.is_valid(), true,
          platform::errors::InvalidArgument(
              "Note: Each config can only be used for one predictor."));

      // Register the OPs; this runs only once.
      // Register custom operators compiled by the user.
      // This function can only be executed once per process.
      static std::once_flag custom_operators_registered;
      std::call_once(custom_operators_registered,
                     []() { inference::RegisterAllCustomOperator(); });

      // Set the GPU parameters.
      if (config.use_gpu()) {
        ...
        if (config.thread_local_stream_enabled() &&
            process_level_allocator_enabled) {
          PADDLE_THROW(platform::errors::Fatal(
              "When binding threads and streams, the use of "
              "process-level allocators will result in undefined result "
              "errors due to memory asynchronous operations."
              "The thread and stream binding configuration of all "
              "predictors should be the same in a single process."));
        }
      }

      std::unique_ptr<PaddlePredictor> predictor(new AnalysisPredictor(config));
      // Each config can only be used for one predictor.
      config.SetInValid();
      auto predictor_p = dynamic_cast<AnalysisPredictor *>(predictor.get());

      if (!predictor_p->Init(nullptr)) {
        return nullptr;
      }

      if (config.mkldnn_quantizer_enabled() && !predictor_p->MkldnnQuantize()) {
        return nullptr;
      }

      return predictor;
    }
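Two details in this function are easy to miss: SetInValid() is called on a const reference, which only works if the validity flag can be changed from a const member (presumably a mutable field), and the operator registration is guarded so it runs once per process. The sketch below illustrates both under those assumptions. OneShotConfig, SketchPredictor, RegistrationCount and MakeSketchPredictor are hypothetical names, and the once-only step uses a function-local static initializer as a dependency-free alternative to std::call_once.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical config whose validity flag can be flipped through a const
// reference, which is what calling SetInValid() on `const &config` requires.
class OneShotConfig {
 public:
  bool is_valid() const { return is_valid_; }
  void SetInValid() const { is_valid_ = false; }

 private:
  mutable bool is_valid_ = true;
};

struct SketchPredictor {};

// Counts how many times the "register operators" step actually ran.
inline int& RegistrationCount() {
  static int count = 0;
  return count;
}

std::unique_ptr<SketchPredictor> MakeSketchPredictor(
    const OneShotConfig& config) {
  if (!config.is_valid()) {
    throw std::invalid_argument(
        "Each config can only be used for one predictor.");
  }
  // Function-local static: its initializer runs exactly once per process,
  // playing the role that std::call_once plays in the excerpt.
  static const bool registered = [] {
    ++RegistrationCount();
    return true;
  }();
  (void)registered;

  auto predictor = std::make_unique<SketchPredictor>();
  config.SetInValid();  // The config is now spent.
  return predictor;
}
```

Passing the same config a second time throws, while a fresh config succeeds without re-running the registration step.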
6. Each AnalysisPredictor has its own id
    explicit AnalysisPredictor(const AnalysisConfig &config) : config_(config) {
      if (config_.shape_range_info_collected()) {
        config_.SwitchIrOptim(false);
        config_.EnableMemoryOptim(false);
      }
      predictor_id_ = inference::GetUniqueId();
    }
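The excerpt does not show how inference::GetUniqueId() is implemented. One common way to hand out per-object ids is a process-wide counter, sketched below. NextPredictorId and IdentifiedPredictor are hypothetical names, and the atomic counter is an assumption made for this illustration, not a claim about Paddle's implementation.

```cpp
#include <atomic>

// Hypothetical stand-in for a unique-id source: a process-wide counter.
inline int NextPredictorId() {
  static std::atomic<int> next_id{0};
  return next_id.fetch_add(1);  // Returns the old value, then increments.
}

class IdentifiedPredictor {
 public:
  IdentifiedPredictor() : id_(NextPredictorId()) {}
  int id() const { return id_; }

 private:
  int id_;
};
```

Two predictors constructed one after another on the same thread receive consecutive, distinct ids.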
7. AnalysisPredictor::Init holds the main initialization logic and is where the predictor acquires its resources.
    bool AnalysisPredictor::Init(
        const std::shared_ptr<framework::Scope> &parent_scope,
        const std::shared_ptr<framework::ProgramDesc> &program) {
      VLOG(3) << "Predictor::init()";
      ...
      // no matter with or without MKLDNN
      paddle::platform::SetNumThreads(config_.cpu_math_library_num_threads());

      if (!PrepareScope(parent_scope)) {
        return false;
      }
      if (!CreateExecutor()) {
        return false;
      }
      if (!PrepareProgram(program)) {
        return false;
      }

      // Prepare executor, create local variables.
      if (!PrepareExecutor()) {
        return true;
      }

      // Get the feed_target_names and fetch_target_names
      PrepareFeedFetch();

      return true;
    }
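8. Scope initialization comes first. A Scope is a variable container that holds the input and output variables. PrepareScope reads the information of all devices and creates the Scope object.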
    // parent_scope=nullptr
    bool AnalysisPredictor::PrepareScope(
        const std::shared_ptr<framework::Scope> &parent_scope) {
      if (parent_scope) {
        PADDLE_ENFORCE_NOT_NULL(
            parent_scope,
            platform::errors::PreconditionNotMet(
                "Both program and parent_scope should be set in Clone mode."));
        scope_ = parent_scope;
        status_is_cloned_ = true;
      } else {
        // Query the devices. For a GPU, for example, this calls the CUDA
        // interfaces; everything device-related lives in the platform
        // directory. All devices are read here and their information is saved.
        paddle::framework::InitDevices();
        // TODO(wilber): we need to release memory occupied by weights.
        scope_.reset(new paddle::framework::Scope());
        status_is_cloned_ = false;
      }
      sub_scope_ = &scope_->NewScope();
      return true;
    }
A Scope holds all of its variables in an unordered_map:
    mutable std::unordered_map<std::string, std::unique_ptr<Variable>, KeyHasher> vars_;
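Scopes form a linked structure: a Scope can spawn a sub_scope, and sub_scope->parent points to its parent node. When parameters are stored, all persistable ones go into the parent Scope and the non-persistable ones go into sub_scope. Each predictor holds its own Scope.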
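To make the parent/child relationship concrete, here is a small sketch of a scope whose lookup falls back to its ancestors. MiniScope, Var, CreateVar, FindLocalVar and the variable names are hypothetical; the real Scope class has a richer interface, and the ancestor fallback in FindVar is an assumption of this sketch.

```cpp
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Var {
  int value = 0;
};

// Hypothetical miniature of a scope tree node.
class MiniScope {
 public:
  explicit MiniScope(const MiniScope* parent = nullptr) : parent_(parent) {}

  // Create a child scope owned by this scope.
  MiniScope& NewScope() {
    kids_.push_back(std::make_unique<MiniScope>(this));
    return *kids_.back();
  }

  // Create the variable in this scope, or return it if it already exists.
  Var* CreateVar(const std::string& name) {
    auto& slot = vars_[name];
    if (!slot) slot = std::make_unique<Var>();
    return slot.get();
  }

  // Look only in this scope.
  Var* FindLocalVar(const std::string& name) const {
    auto it = vars_.find(name);
    return it == vars_.end() ? nullptr : it->second.get();
  }

  // Look in this scope first, then walk up the parent chain.
  Var* FindVar(const std::string& name) const {
    for (const MiniScope* s = this; s != nullptr; s = s->parent_) {
      if (Var* v = s->FindLocalVar(name)) return v;
    }
    return nullptr;
  }

  const MiniScope* parent() const { return parent_; }

 private:
  const MiniScope* parent_;
  std::unordered_map<std::string, std::unique_ptr<Var>> vars_;
  std::vector<std::unique_ptr<MiniScope>> kids_;
};
```

With this layout a weight stored in the root scope is visible from the sub scope, while a temporary created in the sub scope stays invisible to the root.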
9. Create the Executor. A Place such as CPUPlace or CUDAPlace is created according to the config, and a NaiveExecutor is then created for that place_. NaiveExecutor is used only for inference.
        place_ = paddle::platform::CPUPlace();
      }
      executor_.reset(new paddle::framework::NaiveExecutor(place_));
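10. PrepareProgram(program), where program=nullptr here. This is a relatively heavy step: it reads the model file, loads the parameters, runs the pass optimizations, and so on, so it is covered in detail.
10.1 LoadProgramDesc reads the contents of the model file. The proto definition is ProgramDesc in framework/framework.proto. A framework::ProgramDesc object is then initialized from the proto object.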
    message OpDesc {

      message Attr {
        required string name = 1;
        required AttrType type = 2;
        optional int32 i = 3;
        optional float f = 4;
        optional string s = 5;
        repeated int32 ints = 6;
        repeated float floats = 7;
        repeated string strings = 8;
        optional bool b = 10;
        repeated bool bools = 11;
        optional int32 block_idx = 12;
        optional int64 l = 13;
        repeated int32 blocks_idx = 14;
        repeated int64 longs = 15;
        repeated double float64s = 16;
      };

      message Var {
        required string parameter = 1;
        repeated string arguments = 2;
      };

      required string type = 3;
      repeated Var inputs = 1;
      repeated Var outputs = 2;
      repeated Attr attrs = 4;
      optional bool is_target = 5 [ default = false ];
    };

    message VarDesc {

      message Attr {
        required string name = 1;
        required AttrType type = 2;
        optional int32 i = 3;
        optional string s = 4;
        repeated int32 ints = 5;
      };

      required string name = 1;
      required VarType type = 2;
      optional bool persistable = 3 [ default = false ];
      // True if the variable is an input data and
      // have to check the feed data shape and dtype
      optional bool need_check_feed = 4 [ default = false ];
      optional bool is_parameter = 5 [ default = false ];
      optional bool stop_gradient = 6 [ default = false ];
      repeated Attr attrs = 7;
    }

    message BlockDesc {
      required int32 idx = 1;
      required int32 parent_idx = 2;
      repeated VarDesc vars = 3;
      repeated OpDesc ops = 4;
      optional int32 forward_block_idx = 5 [ default = -1 ];
    }

    // In some cases, Paddle may perform operator definition iterations,
    // and the operator uses OpVersionMap for compatibility testing.
    message OpVersion { required int32 version = 1; }
    message OpVersionMap {
      message OpVersionPair {
        required string op_name = 1;
        required OpVersion op_version = 2;
      }
      repeated OpVersionPair pair = 1;
    }

    // Please refer to
    // https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/program.md
    // for more details.
    // TODO(panyx0718): A model can have multiple programs. Need a
    // way to distinguish them. Maybe ID or name?
    message ProgramDesc {
      reserved 2, 3; // For backward compatibility.
      repeated BlockDesc blocks = 1;
      optional Version version = 4;
      optional OpVersionMap op_version_map = 5;
    }
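10.2 NaiveExecutor->CreateVariables stores the variable information read from the model file into the Scope's vars_. It is called twice: the first call stores the persistable variables in the parent Scope, and the second call stores the non-persistable variables in the child sub_scope.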
    // block_id=0; persistable is true on the first call and false on the
    // second; scope=sub_scope
    void NaiveExecutor::CreateVariables(const ProgramDesc &desc, int block_id,
                                        bool persistable, Scope *scope) {
      PADDLE_ENFORCE_NOT_NULL(scope,
                              platform::errors::InvalidArgument(
                                  "The Scope to hold variables is nullptr."));

      auto &global_block = desc.Block(block_id);

      const auto *anc = scope;
      PADDLE_ENFORCE_NE(
          anc->parent(), anc,
          platform::errors::InvalidArgument("Input scope should be child scope."));
      while (anc->parent()) {
        anc = anc->parent();
      }

      int num_vars = 0;
      for (auto &var : global_block.AllVars()) {
        if (var->Name() == framework::kEmptyVarName) {
          continue;
        }
        num_vars++;

        if (persistable == var->Persistable()) {
          if (persistable) {
            if (!anc->FindVar(var->Name())) {
              auto *ptr = const_cast<Scope *>(anc)->Var(var->Name());
              VLOG(3) << scope << " Create persistable variable " << var->Name()
                      << ", which pointer is " << ptr;
              InitializeVariable(ptr, var->GetType());
            }
          } else {
            auto *ptr = const_cast<Scope *>(scope)->Var(var->Name());
            VLOG(3) << scope << " Create variable " << var->Name()
                    << ", which pointer is " << ptr;
            InitializeVariable(ptr, var->GetType());
          }
        }
      }
      VLOG(4) << "naive executor create " << num_vars << " vars";
    }
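The two-call pattern is easier to see with the logging and error checks stripped away. The sketch below keeps only the routing rule: one pass handles either the persistable or the non-persistable variables, persistable ones land in the root ancestor, and the rest land in the scope that was passed in. VarInfo, TinyScope, Root, CreateVars and the variable names are hypothetical and much simpler than the real types.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one variable in a program block.
struct VarInfo {
  std::string name;
  bool persistable;
};

// Hypothetical scope that only remembers variable names.
struct TinyScope {
  TinyScope* parent = nullptr;
  std::set<std::string> names;
};

// Walk up the parent chain to the outermost ancestor.
inline TinyScope* Root(TinyScope* scope) {
  while (scope->parent != nullptr) scope = scope->parent;
  return scope;
}

// One pass creates either the persistable or the non-persistable variables,
// depending on the flag; callers invoke it once with each value.
inline void CreateVars(const std::vector<VarInfo>& vars, bool persistable,
                       TinyScope* scope) {
  TinyScope* root = Root(scope);
  for (const auto& var : vars) {
    if (var.persistable != persistable) continue;
    // Persistable variables go to the root, all others to the given scope.
    (persistable ? root : scope)->names.insert(var.name);
  }
}
```

Calling CreateVars(vars, true, &sub) followed by CreateVars(vars, false, &sub) leaves the weights in the root scope and the intermediate tensors in sub, which is the split described above.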
10.3 OptimizeInferenceProgram. The Analyzer is invoked to run all the passes selected by the config, producing an optimized ProgramDesc, and inference_program_ is reset to the optimized argument_.ir_analyzed_program().
    // NOTE All the members in AnalysisConfig should be copied to Argument.
    void AnalysisPredictor::OptimizeInferenceProgram() {
      // Copy the settings in config into argument.
      PrepareArgument();
      // Iterate over analysis_passes and let each Pass process argument.
      Analyzer().Run(&argument_);

      PADDLE_ENFORCE_EQ(
          argument_.scope_valid(), true,
          platform::errors::InvalidArgument("The argument scope should be valid."));
      VLOG(5) << "to prepare executor";
      ARGUMENT_CHECK_FIELD((&argument_), ir_analyzed_program);
      inference_program_.reset(
          new framework::ProgramDesc(argument_.ir_analyzed_program()),
          [](framework::ProgramDesc *prog) {
            // Note, please do NOT use any member variables, because member
            // variables may have been destructed in multiple threads.
    #if PADDLE_WITH_TENSORRT
            ...
    #endif
            delete prog;
          });
      // The config and argument take a lot of storage,
      // when the predictor settings are complete, we release these stores.
      argument_.PartiallyRelease();
      config_.PartiallyRelease();
      LOG(INFO) << "======= optimize end =======";
    }
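Conceptually, the Analyzer step is an ordered pipeline of passes, each of which rewrites the program in place. The toy sketch below shows that shape only. It models a program as a flat list of op type names, which is an assumption for illustration (the real passes work on a graph), and OpList, Pass, RemoveDropout, FuseFc and RunPasses are hypothetical names. Fusing mul followed by elementwise_add into fc is used as the example rewrite.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using OpList = std::vector<std::string>;
using Pass = std::function<void(OpList*)>;

// Toy pass: drop every "dropout" op.
inline void RemoveDropout(OpList* ops) {
  OpList kept;
  for (const auto& op : *ops) {
    if (op != "dropout") kept.push_back(op);
  }
  *ops = std::move(kept);
}

// Toy pass: fuse each adjacent "mul" + "elementwise_add" pair into "fc".
inline void FuseFc(OpList* ops) {
  OpList fused;
  for (std::size_t i = 0; i < ops->size(); ++i) {
    if (i + 1 < ops->size() && (*ops)[i] == "mul" &&
        (*ops)[i + 1] == "elementwise_add") {
      fused.push_back("fc");
      ++i;  // Skip the op that was folded into the fused one.
    } else {
      fused.push_back((*ops)[i]);
    }
  }
  *ops = std::move(fused);
}

// Run the passes in order and return the rewritten program.
inline OpList RunPasses(OpList ops, const std::vector<Pass>& passes) {
  for (const auto& pass : passes) pass(&ops);
  return ops;
}
```

The order of the passes matters in general, since a later pass only sees what the earlier ones left behind.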
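11. PrepareExecutor.
DisablePrepareDataOpt runs first: it walks through the ops in inference_program_ and, if it finds an unfriendly op, disables data preparation. NaiveExecutor->Prepare runs next: it stores sub_scope in the Executor, then calls CreateOps to create the ops one by one from the optimized ProgramDesc and keeps them in the Executor.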
    bool AnalysisPredictor::PrepareExecutor() {
      DisablePrepareDataOpt(inference_program_, 0, false);

      executor_->Prepare(sub_scope_, *inference_program_, 0,
                         config_.use_feed_fetch_ops_);

      PADDLE_ENFORCE_NOT_NULL(sub_scope_,
                              platform::errors::PreconditionNotMet(
                                  "The sub_scope should not be nullptr."));

      return true;
    }
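Note that Prepare receives config_.use_feed_fetch_ops_, which the Predictor constructor earlier switched off via SwitchUseFeedFetchOps(false). The body of CreateOps is not shown in this article, so the sketch below rests on an assumption: when the flag is false, feed and fetch ops are skipped and only the compute ops are kept. MiniExecutor and its members are hypothetical, and ops are modeled as plain type names.

```cpp
#include <string>
#include <vector>

// Hypothetical miniature of an executor that keeps a list of created ops.
struct MiniExecutor {
  std::vector<std::string> ops;  // Stand-in for created operator objects.

  void Prepare(const std::vector<std::string>& program_ops,
               bool with_feed_fetch_ops) {
    ops.clear();
    for (const auto& type : program_ops) {
      const bool is_io = (type == "feed" || type == "fetch");
      // Assumption of this sketch: I/O ops are dropped when the flag is off.
      if (!with_feed_fetch_ops && is_io) continue;
      ops.push_back(type);
    }
  }
};
```

Under that assumption, a program of feed, conv2d, relu, fetch yields two executable ops with the flag off and four with it on.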
12. PrepareFeedFetch creates the feed and fetch variables in sub_scope and binds them to the input and output ops in the model.
    void AnalysisPredictor::PrepareFeedFetch() {
      PADDLE_ENFORCE_NOT_NULL(sub_scope_,
                              platform::errors::InvalidArgument(
                                  "The sub_scope should not be nullptr."));
      CreateFeedFetchVar(sub_scope_);
      for (auto *op : inference_program_->Block(0).AllOps()) {
        if (op->Type() == "feed") {
          int idx = BOOST_GET_CONST(int, op->GetAttr("col"));
          if (feeds_.size() <= static_cast<size_t>(idx)) {
            feeds_.resize(idx + 1);
          }
          feeds_[idx] = op;
          feed_names_[op->Output("Out")[0]] = idx;
          idx2feeds_[idx] = op->Output("Out")[0];
        } else if (op->Type() == "fetch") {
          int idx = BOOST_GET_CONST(int, op->GetAttr("col"));
          if (fetches_.size() <= static_cast<size_t>(idx)) {
            fetches_.resize(idx + 1);
          }
          fetches_[idx] = op;
          idx2fetches_[idx] = op->Input("X")[0];
        }
      }
    }
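The loop above builds lookup tables keyed by each op's col attribute, so inputs and outputs can later be addressed by position or by name regardless of the order in which the ops appear in the block. A stripped-down sketch of that indexing follows. IoOp, FeedFetchIndex, BuildIndex and the variable names are hypothetical, and col is assumed to be non-negative.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical flattened view of an op: its type, its "col" attribute and
// the name of the variable it reads or writes.
struct IoOp {
  std::string type;
  int col;
  std::string var_name;
};

struct FeedFetchIndex {
  std::vector<const IoOp*> feeds;    // Indexed by col.
  std::vector<const IoOp*> fetches;  // Indexed by col.
  std::map<std::string, std::size_t> feed_names;
  std::map<std::size_t, std::string> idx2feeds;
  std::map<std::size_t, std::string> idx2fetches;
};

// Build the tables. The returned pointers refer into `ops`, so the caller
// must keep that vector alive and unmodified while the index is in use.
inline FeedFetchIndex BuildIndex(const std::vector<IoOp>& ops) {
  FeedFetchIndex index;
  for (const auto& op : ops) {
    if (op.type == "feed") {
      const auto idx = static_cast<std::size_t>(op.col);
      if (index.feeds.size() <= idx) index.feeds.resize(idx + 1, nullptr);
      index.feeds[idx] = &op;
      index.feed_names[op.var_name] = idx;
      index.idx2feeds[idx] = op.var_name;
    } else if (op.type == "fetch") {
      const auto idx = static_cast<std::size_t>(op.col);
      if (index.fetches.size() <= idx) index.fetches.resize(idx + 1, nullptr);
      index.fetches[idx] = &op;
      index.idx2fetches[idx] = op.var_name;
    }
  }
  return index;
}
```

Even if the feed op with col 1 appears before the one with col 0, the resulting tables are ordered by col, which is why the resize-then-assign pattern is used instead of push_back.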