http://ilearning.huawei.com/edx/next/portal/tseacademy/tsea-software-development/
@args_type_check(mode=int, precompile_only=bool, device_target=str, device_id=int, save_graphs=bool,
                 save_graphs_path=str, enable_dump=bool, auto_tune_mode=str,
                 save_dump_path=str, enable_reduce_precision=bool, variable_memory_max_size=str,
                 enable_profiling=bool, profiling_options=str, enable_auto_mixed_precision=bool,
                 enable_graph_kernel=bool, check_bprop=bool, max_device_memory=str, print_file_path=str,
                 enable_sparse=bool, max_call_depth=int, env_config_path=str, graph_kernel_flags=str,
                 save_compile_cache=bool, load_compile_cache=bool, grad_for_scalar=bool, pynative_synchronize=bool,
                 reserve_class_name_in_scope=bool)
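The decorator above declares one expected type per keyword argument. As a rough illustration of what such a check might do, here is a minimal sketch in plain Python. The names `args_type_check_sketch` and `set_context_sketch` are hypothetical stand-ins, not the MindSpore implementation, and the choice to reject `bool` where `int` is declared is an assumption of this sketch.

```python
import functools


def args_type_check_sketch(**expected_types):
    """Hypothetical stand-in: validate keyword argument types before the call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            for name, value in kwargs.items():
                expected = expected_types.get(name)
                if expected is None:
                    continue  # no declared type for this key: leave it to the callee
                # bool is a subclass of int in Python, so it is rejected explicitly
                # for int parameters (an assumption of this sketch).
                if expected is int and isinstance(value, bool):
                    raise TypeError(f"Argument '{name}' must be int, got bool")
                if not isinstance(value, expected):
                    raise TypeError(
                        f"Argument '{name}' must be {expected.__name__}, "
                        f"got {type(value).__name__}"
                    )
            return func(**kwargs)
        return wrapper
    return decorator


@args_type_check_sketch(mode=int, device_target=str, save_graphs=bool, max_call_depth=int)
def set_context_sketch(**kwargs):
    """Toy function that simply returns the validated options."""
    return dict(kwargs)
```

With this sketch, `set_context_sketch(mode=0, device_target="GPU")` passes, while `set_context_sketch(mode="0")` raises `TypeError` before the function body runs.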
auto_tune_mode (str): The auto-tune mode used when building operators, to get the best tiling performance.
Default: 'NO_TUNE'. The value must be one of ['NO_TUNE', 'RL', 'GA', 'RL,GA'].
RL: Reinforcement Learning tuning.
GA: Genetic Algorithm tuning.
RL,GA: Both RL and GA tuning are enabled; the tool automatically selects RL or GA according to the
types of operators in the network model. The order of RL and GA does not matter.
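The allowed values can be checked before they are passed on. The following is a minimal sketch in plain Python; `normalize_auto_tune_mode` is a hypothetical helper, and accepting `'GA,RL'` as an alias of `'RL,GA'` is an assumption based on the statement that the order of RL and GA does not matter.

```python
VALID_SINGLE_MODES = ("NO_TUNE", "RL", "GA")


def normalize_auto_tune_mode(value="NO_TUNE"):
    """Return a canonical auto_tune_mode string or raise ValueError."""
    parts = [part.strip() for part in value.split(",")]
    # The order of RL and GA is not significant, so both spellings map to 'RL,GA'.
    if sorted(parts) == ["GA", "RL"]:
        return "RL,GA"
    if len(parts) == 1 and parts[0] in VALID_SINGLE_MODES:
        return parts[0]
    raise ValueError(
        f"auto_tune_mode must be one of 'NO_TUNE', 'RL', 'GA', 'RL,GA', got {value!r}"
    )
```

The returned string could then be passed as the `auto_tune_mode` argument.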
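| Category | Parameter | Are the comment, interface parameter, and hardware support type consistent? | Description |
|---|---|---|---|
| public | mode=int | Consistent | Run in GRAPH_MODE (0) or PYNATIVE_MODE (1). Default: GRAPH_MODE (0). |
| | device_target=str | Consistent | Target device to run on. Supports "Ascend", "GPU", and "CPU". |
| | device_id=int (GE) | Consistent | ID of the target device. Must be in [0, device_num_per_host-1], and device_num_per_host should not exceed 4096. Default: 0. |
| mem | max_device_memory=str (步学) | Consistent (the comment lacks usage scenarios) | Sets the maximum memory available to the device. |
| | variable_memory_max_size=str (GE) (步学) | Consistent (the comment lacks usage scenarios) | Sets the maximum size of variable memory. Default: "0GB". |
| IR | save_graphs=bool | Consistent | Whether to save graphs. Default: False. |
| | save_graphs_path=str | Consistent | Path for saving graphs. Default: ".". |
| dump | enable_dump=bool (GE) | Consistent | Whether to enable dump. Default: False. |
| | save_dump_path=str (GE) | Consistent | When the program runs on Ascend, data can be dumped under this path. |
| profiler | enable_profiling=bool (GE) | Consistent | Whether to save execution-time data. |
| | profiling_options=str (GE) (步学) | Consistent | Profiling options. |
| print | print_file_path=str | Consistent (the comment lacks usage scenarios) https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.3/context.html?highlight=print_file_path#print算子落盘 | Path for saving print data. If this parameter is set, print data is saved to a file by default and printing to the screen is turned off. If the file already exists, a timestamp suffix is added to the file name. Default: "". |
| dfx | env_config_path=str | Consistent (the interface comment should state the specific variables to set) https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.3/custom_debugging_info.html?highlight=env_config_path#running-data-recorder | Path of the DFX configuration file. RDR is configured through the file: `"rdr": {"enable": true, "path": "/home/mindspore/rdr"}`. Memory reuse (Mem Reuse): `"sys": {"mem_reuse": true}`. |
| Graph kernel - 程彬 | enable_graph_kernel=bool, hardware (GPU) | Consistent https://www.mindspore.cn/docs/programming_guide/zh-CN/master/enable_graph_kernel_fusion.html | Whether to enable graph kernel fusion to optimize network execution performance. Setting it to true enables the acceleration. Default: False. This is an acceleration feature; see the graph kernel fusion documentation for details. Turn it on when performance is needed; it does not affect functionality. |
| | graph_kernel_flags=str, hardware (GPU) | Consistent | Optimization options of graph kernel fusion, for advanced use. For example, `context.set_context(graph_kernel_flags="--opt_level=2 --dump_as_text")`. Some general options, effective when enable_graph_kernel is set to true: opt_level: optimization level between 0 and 3. Default: 2. Graph kernel fusion can be enabled equivalently by setting opt_level greater than 0. dump_as_text: dump detailed information as text files. Default: false. More options can be found in the implementation code. These options can also be set through the environment variable `MS_GRAPH_KERNEL_FLAGS`, without modifying the network source code. For example, `export MS_GRAPH_KERNEL_FLAGS="--opt_level=2 --dump_as_text"`. |
| Precision - 张清华 | enable_reduce_precision=bool (VM) | Consistent | Whether to enable precision reduction. Default: True. |
| | enable_auto_mixed_precision=bool (GE); the public interface has no comment | Submit to CCB review for removal | Mixed precision. |
| Auto tune | auto_tune_mode=str (布宇) | Consistent (the comment lacks usage scenarios) https://www.mindspore.cn/docs/programming_guide/zh-CN/master/enable_auto_tune.html | Auto-tune mode used when building operators, to obtain the best tiling performance. NO_TUNE: tuning is disabled. RL: enable RL tuning for operators that support it. GA: enable GA tuning for operators that support it. RL,GA: enable both RL and GA tuning; the tool automatically selects RL or GA according to the types of operators in the network model. The order of RL and GA does not matter. |
| anf graph compile config | precompile_only=bool (vm) (梁志博) | Independent of hardware support type | Graph mode includes graph compilation and execution; when precompile_only is set to true, the graph is compiled but not run. Whether to only precompile the network. If set, the network is compiled but not executed. Default: False. |
| | check_bprop=bool (余坚锋) | Consistent | Whether to check back-propagation nodes during compilation. The check ensures that the shape and dtype of the back-propagation node outputs match the input parameters. Default: False. |
| | max_call_depth=int (余坚锋) | Consistent (the comment lacks usage scenarios) | Maximum depth of function calls. Must be a positive integer. Default: 1000. Usage scenario: set max_call_depth when nested calls or loops are too deep, or when there are too many subgraphs. |
| | enable_sparse=bool (张清华) | Consistent (the comment lacks usage scenarios) | Whether to enable the sparse feature. Default: False. |
| | grad_for_scalar=bool (余坚锋) | Consistent | Whether to get the gradient for scalars. If set, the gradient of scalar input parameters can be computed. Currently, only some scalar operators support this computation. Default: False. |
| | load_compile_cache=bool (余坚锋) | Consistent (the comment lacks usage scenarios) | Must be used together with save_compile_cache. After save_compile_cache is set to True, the first execution of the network generates a hardware-independent compilation cache, exported as a MindIR file. When the network is executed again with load_compile_cache=True, that compilation cache is loaded. |
| | save_compile_cache=bool (余坚锋) | Consistent (the comment lacks usage scenarios) | After save_compile_cache is set to True, the first execution of the network generates a hardware-independent compilation cache, exported as a MindIR file. |
| | reserve_class_name_in_scope=bool (张清华) | Add the missing parameter type check | Whether to save the network class name in the scope. Default: True. Each node has a scope. The scope of a subnode is the name of its parent node. If reserve_class_name_in_scope is set, the class name is saved after the keyword "net-" in the scope. For example: Default/net-Net1/net-Net2 (reserve_class_name_in_scope=True); Default/net/net (reserve_class_name_in_scope=False). |
| | pynative_synchronize (褚金锦) | Consistent (the comment lacks usage scenarios) | Controls whether operators on the device run asynchronously in PyNative mode; the pynative_synchronize setting was added for this purpose. Default: False. |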
max_device_memory
max_device_memory (str): Sets the maximum memory available for devices. Currently, it is only supported on GPU. The format is "xxGB". Default: "1024GB". The actual used memory size is the minimum of the available memory of the device and max_device_memory.
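The rule "actual used memory is the minimum of the available device memory and max_device_memory" can be sketched in a few lines of plain Python. `parse_gb` and `effective_memory_gb` are hypothetical helpers, and accepting decimal values such as "3.5GB" is an assumption of this sketch rather than a documented guarantee.

```python
import re


def parse_gb(text):
    """Parse a string in the "xxGB" format and return the size in GB as a float."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)GB", text)
    if match is None:
        raise ValueError(f'expected a value in the "xxGB" format, got {text!r}')
    return float(match.group(1))


def effective_memory_gb(available_gb, max_device_memory="1024GB"):
    """Memory actually usable: the smaller of the device memory and the configured cap."""
    return min(available_gb, parse_gb(max_device_memory))
```

With the default "1024GB", a device with 16 GB of memory is limited only by the device itself; with "3.5GB", the cap applies.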
graph_kernel_flags (str) –
Optimization options of graph kernel fusion. For experienced users only. For example, context.set_context(graph_kernel_flags="--opt_level=2 --dump_as_text"). Some general options:
opt_level: Set the optimization level. Default: 2. Graph kernel fusion can be enabled equivalently by setting opt_level greater than 0. Available values are:
0: disable graph kernel fusion;
1: enable the basic fusion of operators;
2: include all optimizations of level 1 and turn on more optimizations such as CSE and arithmetic simplification;
3: include all optimizations of level 2 and turn on more optimizations such as StitchingFusion and ParallelFusion. Optimizations at this level are aggressive and unstable in some scenarios. Be cautious when using this level.
dump_as_text: Dump detailed information as text files. Default: false.
More options can be found in the implementation code. These options can also be set through the environment variable MS_GRAPH_KERNEL_FLAGS, without modifying the network source code. For example, export MS_GRAPH_KERNEL_FLAGS="--opt_level=2 --dump_as_text".
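The flag string uses the double-dash form `--opt_level=2 --dump_as_text`, with a space between options. A minimal sketch for assembling it and exporting it through the environment variable is given below; `build_graph_kernel_flags` and `export_graph_kernel_flags` are hypothetical helpers, and only the two options named in the text are covered.

```python
import os


def build_graph_kernel_flags(opt_level=2, dump_as_text=False):
    """Assemble a graph_kernel_flags string from the two general options."""
    if opt_level not in (0, 1, 2, 3):
        raise ValueError(f"opt_level must be between 0 and 3, got {opt_level!r}")
    flags = [f"--opt_level={opt_level}"]
    if dump_as_text:
        flags.append("--dump_as_text")
    return " ".join(flags)


def export_graph_kernel_flags(flags, env=None):
    """Store the flags in MS_GRAPH_KERNEL_FLAGS (defaults to this process's environment)."""
    target = os.environ if env is None else env
    target["MS_GRAPH_KERNEL_FLAGS"] = flags
    return target
```

The string returned by `build_graph_kernel_flags(2, True)` is the same value used in the `context.set_context(graph_kernel_flags=...)` example; exporting it through the environment has to happen before the framework reads the variable.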
max_call_depth:
The max_call_depth parameter needs to be set when the nested call is too deep or the number of subgraphs is too large.
precompile_only
Graph mode includes graph compilation and running.
When precompile_only is set to true, only the graph is compiled.
grad_for_scalar:
Whether to get the gradient for scalars. If set, the gradient of scalar input parameters can be calculated.
When grad_for_scalar is set to True, the function's scalar inputs are differentiated. The default value is False. Currently,
the backend does not support scaling operations, so this interface supports only simple operations that can be deduced by the frontend.
save_compile_cache:
Whether to cache the graph compiled by frontend. Default: False.
Set save_compile_cache to True to save the compiled graph to a MindIR file.
load_compile_cache:
Whether to use the cache of the graph compiled by frontend.
When it is True, graph compilation skips the frontend compilation process, so you
must make sure the network has not changed since the last execution. Automatic
checking for changes is not supported yet. Default: False.
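Because changes to the network are not detected automatically, a caller may want its own guard before enabling load_compile_cache. The sketch below, in plain Python, fingerprints the network source text and enables loading only when the fingerprint matches the one recorded on the previous run. `network_fingerprint` and `compile_cache_options` are hypothetical helpers, and hashing the source text is only one possible heuristic: it does not catch changes in imported modules or data-dependent graph construction.

```python
import hashlib


def network_fingerprint(source_text):
    """Return a SHA-256 hex digest of the network's source text."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def compile_cache_options(source_text, saved_fingerprint=None):
    """Choose cache options for this run and return them with the new fingerprint.

    load_compile_cache is enabled only if the source is unchanged since the
    run that produced saved_fingerprint.
    """
    current = network_fingerprint(source_text)
    options = {
        "save_compile_cache": True,
        "load_compile_cache": saved_fingerprint == current,
    }
    return options, current
```

The returned dictionary holds the two keyword arguments discussed above; the fingerprint would be stored next to the cache file for the next run.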
reserve_class_name_in_scope :
Whether to save the network class name in the scope. Default: True.
Each node has a scope. A scope of a subnode is the name of its parent node. If reserve_class_name_in_scope
is set, the class name will be saved after keyword 'net-' in the scope. For example:
Default/net-Net1/net-Net2 (reserve_class_name_in_scope=True)
Default/net/net (reserve_class_name_in_scope=False)
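The two scope strings above follow a simple pattern, sketched here in plain Python. `build_scope` is a hypothetical helper, and it assumes each nested network is held in an attribute named `net`, as in the example.

```python
def build_scope(class_names, reserve_class_name_in_scope=True):
    """Build a scope string for nested networks, outermost class first."""
    parts = ["Default"]
    for name in class_names:
        # With the option on, the class name follows the keyword "net-".
        parts.append(f"net-{name}" if reserve_class_name_in_scope else "net")
    return "/".join(parts)
```

Calling `build_scope(["Net1", "Net2"])` reproduces the first example string, and passing `reserve_class_name_in_scope=False` reproduces the second.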
pynative_synchronize :
Whether to enable synchronous execution of the device in PyNative mode. Default: False.
The pynative_synchronize setting controls whether operators execute asynchronously on the device.
When pynative_synchronize=True, operators execute synchronously on the device,
so when an operator fails you can easily locate the failing code through the call stack.
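Why synchronous execution makes failures easier to locate can be shown with a plain-Python analogy. This is not MindSpore code: a thread pool stands in for the device queue, and `failing_op` and `run` are hypothetical names. In the asynchronous case the launch returns immediately and the error only appears at a later synchronization point.

```python
from concurrent.futures import ThreadPoolExecutor


def failing_op():
    """Stand-in for an operator that fails on the device."""
    raise RuntimeError("operator failed")


def run(synchronize):
    """Return the step at which the failure becomes visible to the caller."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        if synchronize:
            try:
                failing_op()  # executes in the caller: the error is raised at the launch site
            except RuntimeError:
                return "launch"
            return "none"
        future = pool.submit(failing_op)  # launch returns immediately, no error yet
        try:
            future.result()  # the error only surfaces when the result is awaited
        except RuntimeError:
            return "later sync point"
        return "none"
```

In the synchronous branch the traceback points at the launching line; in the asynchronous branch it points at the wait, which may be far from the code that queued the failing operator.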