context_api rectification

If the device target is not set, it is determined by the version of the MindSpore package that is installed.

graph_kernel_flags (str) –

Optimization options of graph kernel fusion; they take higher priority when they conflict with enable_graph_kernel. For experienced users only. For example, context.set_context(graph_kernel_flags="--opt_level=2 --dump_as_text"). Some general options:

opt_level: Set the optimization level. Default: 2. Graph kernel fusion can be enabled equivalently by setting opt_level greater than 0. Available values are:
0: disable graph kernel fusion;
1: enable the basic fusion of operators;
2: include all optimizations of level 1, and turn on more optimizations such as CSE, arithmetic simplification and so on;
3: include all optimizations of level 2, and turn on more optimizations such as StitchingFusion, ParallelFusion and so on. Optimizations at this level are radical and unstable in some scenarios. Be cautious when using this level.

dump_as_text: dump detailed info as text files. Default: false.

More options can be found in the implementation code. These options can also be set through the environment variable MS_GRAPH_KERNEL_FLAGS, without modifying the network source code. For example, export MS_GRAPH_KERNEL_FLAGS="--opt_level=2 --dump_as_text".
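As a minimal sketch of both ways to pass the flags (the GPU device target is an assumption; the flag string is taken from the example above):

import os
from mindspore import context

# Option 1: environment variable, no source change needed (set before MindSpore starts).
os.environ['MS_GRAPH_KERNEL_FLAGS'] = "--opt_level=2 --dump_as_text"

# Option 2: pass the flags directly through set_context.
context.set_context(device_target="GPU", graph_kernel_flags="--opt_level=2 --dump_as_text")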

Running Data Recorder (RDR):

enable: controls whether the RDR is enabled to collect key data during training for fault locating. When set to true, the RDR is turned on; when set to false, it is turned off.

path: sets the directory where the RDR saves the collected data in a fault scenario. The value must be an absolute path.
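A minimal sketch of wiring these RDR switches through env_config_path (the file name mindspore_config.json and the /home/mindspore paths are illustrative):

import json
from mindspore import context

# The RDR switches described above live in the JSON file referenced by env_config_path.
rdr_config = {
    "rdr": {
        "enable": True,                 # turn the RDR on
        "path": "/home/mindspore/rdr"   # must be an absolute path
    }
}
with open("/home/mindspore/mindspore_config.json", "w") as f:
    json.dump(rdr_config, f)

context.set_context(env_config_path="/home/mindspore/mindspore_config.json")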





enable_profiling (bool) and profiling_options (str): These parameters are deprecated. Please use Profiler instead.
if key == "enable_auto_mixed_precision":
pdb.set_trace()
logger.warning("enable_auto_mixed_precision mixing precision is amp, and this parameter"
"will be deleted later.")
continue
if key in ('enable_profiling', 'profiling_options'):
logger.warning(f" '{key}' is deprecated. Please use Profiler instead.")
continue
 
warnings.warn("Environment variables RANK_ID and OMPI_COMM_WORLD_RANK both exist, we will use RANK_ID to get rank id by default.")
 
 
 
You can save the print operator data to a file and disable on-screen printing. If the saved file already exists, a timestamp suffix is added to the file name. Saving data to a file solves the problem of data loss in on-screen printing when a large amount of data is generated.
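A minimal sketch of redirecting Print output to a file (the file name print.data and the tiny network are illustrative):

import numpy as np
import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Tensor, context

# Print data goes to the file and on-screen printing is disabled.
context.set_context(mode=context.GRAPH_MODE, print_file_path="print.data")

class PrintNet(nn.Cell):
    def __init__(self):
        super().__init__()
        self.print = ops.Print()

    def construct(self, x):
        self.print("input tensor:", x)  # written to print.data instead of the screen
        return x

PrintNet()(Tensor(np.ones((2, 2)).astype(np.float32)))

The saved file can later be loaded back for inspection, for example with mindspore.parse_print.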
 

Indicates whether to only precompile the network. If precompile_only is set to True, the network is only precompiled and not executed. Default value: False.

 

When precompile_only is set to True, only the graph is compiled and training will not be executed.
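A minimal sketch (the one-layer network is illustrative): with precompile_only=True, the call below only builds the graph and nothing is run:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, context

context.set_context(mode=context.GRAPH_MODE, precompile_only=True)

net = nn.Dense(4, 2)                            # illustrative network
x = Tensor(np.ones((1, 4)).astype(np.float32))
net(x)                                          # compiles the graph only; it is not executed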

 

When grad_for_scalar is set to True, gradients with respect to the function's scalar inputs can be computed. The default value is False. Because the back end does not currently support scalar operations, this interface supports only simple operations that can be deduced by the front end.
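A minimal sketch under that constraint (the function f is illustrative and simple enough for the front end to deduce):

import mindspore.ops as ops
from mindspore import context

context.set_context(mode=context.GRAPH_MODE, grad_for_scalar=True)

def f(x):
    return x * x + 2 * x      # a simple scalar expression

grad_f = ops.GradOperation()(f)
print(grad_f(3.0))            # d/dx (x^2 + 2x) at x = 3 gives 8.0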

 

save_compile_cache: after save_compile_cache is set to True, a hardware-independent compilation cache is generated and exported to a MindIR file.

load_compile_cache: this parameter must be used together with save_compile_cache. After save_compile_cache is set to True, a hardware-independent compilation cache is generated and exported to a MindIR file. When the network is executed again, if load_compile_cache=True, the compilation cache is loaded.
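A minimal sketch of the two runs (how the network is built and executed is elided; both switches are plain set_context parameters):

from mindspore import context

# First run: generate the hardware-independent compile cache, exported as a MindIR file.
context.set_context(save_compile_cache=True)
# ... build and execute the network once ...

# Later run of the unchanged network: load the cache instead of recompiling the front end.
context.set_context(load_compile_cache=True)
# ... execute the same network again ...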

 

 `sparse tensor <https://www.mindspore.cn/docs/programming_guide/zh-CN/master/tensor.html#稀疏张量>`_ 

 

`Enabling the operator tuning tool (AutoTune) <https://www.mindspore.cn/docs/programming_guide/zh-CN/master/enable_auto_tune.html>`_ 

`Enabling Graph Kernel Fusion <https://www.mindspore.cn/docs/programming_guide/zh-CN/master/enable_graph_kernel_fusion.html>`_ 

 

 

 

 

 

 
 
http://ilearning.huawei.com/edx/next/portal/tseacademy/tsea-software-development/

@args_type_check(mode=int, precompile_only=bool, device_target=str, device_id=int, save_graphs=bool,
                 save_graphs_path=str, enable_dump=bool, auto_tune_mode=str,
                 save_dump_path=str, enable_reduce_precision=bool, variable_memory_max_size=str,
                 enable_profiling=bool, profiling_options=str, enable_auto_mixed_precision=bool,
                 enable_graph_kernel=bool, check_bprop=bool, max_device_memory=str, print_file_path=str,
                 enable_sparse=bool, max_call_depth=int, env_config_path=str, graph_kernel_flags=str,
                 save_compile_cache=bool, load_compile_cache=bool, grad_for_scalar=bool, pynative_synchronize=bool,
                 reserve_class_name_in_scope=bool)
 
auto_tune_mode (str): The mode of auto tune when building operators, to get the best tiling performance.
            Default: NO_TUNE. The value must be in ['NO_TUNE', 'RL', 'GA', 'RL,GA'].
            RL: Reinforcement Learning tune.
            GA: Genetic Algorithm tune.
            RL,GA: when both RL and GA optimization are enabled, the tool automatically selects RL or GA based on
                   the different types of operators in the network model. The sequence of RL and GA is not differentiated
                   (automatic selection).
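For example, a minimal sketch enabling both tuners (the Ascend device target is an assumption, since tuning happens while operators are built):

from mindspore import context

# The tool picks RL or GA per operator type; the order in the string does not matter.
context.set_context(device_target="Ascend", auto_tune_mode="RL,GA")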
 
 

Configuration parameter classification

Whether the comment and interface parameters are consistent with the hardware support types

Function description

public

mode=int

Consistent

Runs in graph mode (0) or PyNative mode (1). Default: graph mode (0).

device_target=str

Consistent

Target device to run on; supports "Ascend", "GPU" and "CPU".

device_id=int                 (GE)

Consistent

Target device ID; must be in [0, device_num_per_host-1], and device_num_per_host should not exceed 4096. Default: 0.

mem

max_device_memory=str               步学

Consistent (the comment lacks a usage-scenario description)

Sets the maximum memory available to the device.

variable_memory_max_size=str  (GE)步学

Consistent (the comment lacks a usage-scenario description)

Sets the maximum size of variable memory. Default: "0GB".

IR

save_graphs=bool

Consistent

Whether to save graphs. Default: False.

save_graphs_path=str

Consistent

Path for saving graphs. Default: "./". To save graphs to a specified directory, set save_graphs_path to the target path; if the specified directory does not exist, it is created automatically.

dump

enable_dump=bool              (GE)

Consistent

Whether to enable dump. Default: False.

save_dump_path=str            (GE)

Consistent

When the program executes on Ascend, dump data can be saved under this path.

profiler

enable_profiling=bool         (GE)

Consistent

Whether to save execution-time data.

profiling_options=str         (GE)步学

Consistent

Options for profiling.

print

print_file_path=str

Consistent (the comment lacks a usage-scenario description) https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.3/context.html?highlight=print_file_path#print算子落盘

Path for saving print data. If this parameter is set, print data is saved to a file by default and printing to the screen is disabled. If the file already exists, a timestamp suffix is added to the file name. Default: "".

 

You can save the print operator data to a file and disable on-screen printing. If the saved file already exists, a timestamp suffix is added to the file name. Saving data to a file solves the problem of data loss in on-screen printing when a large amount of data is generated.

dfx

env_config_path=str

Consistent (the interface comment should state the specific variables to be set) https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.3/custom_debugging_info.html?highlight=env_config_path#running-data-recorder

DFX configuration path. The RDR is configured through the configuration file:

    "rdr": {
        "enable": true,
        "path": "/home/mindspore/rdr"
    }

Memory reuse (Mem Reuse):

    "sys": {
        "mem_reuse": true
    }

Graph kernel fusion - 程彬

enable_graph_kernel=bool       hardware (GPU)

Consistent https://www.mindspore.cn/docs/programming_guide/zh-CN/master/enable_graph_kernel_fusion.html

Whether to enable graph kernel fusion to optimize network execution performance; setting it to true enables the acceleration.

Default: False.

This is an acceleration feature; see the graph kernel fusion documentation for details. Turn it on when performance is needed; it does not affect functionality.

graph_kernel_flags=str         hardware (GPU)

Consistent

Optimization options of graph kernel fusion, for experienced users.
For example, `context.set_context(graph_kernel_flags="--opt_level=2 --dump_as_text")`.
Some general options (effective when enable_graph_kernel is set to true):
- opt_level: optimization level between 0 and 3. Default: 2. Graph kernel fusion can be enabled equivalently by setting opt_level greater than 0.
- dump_as_text: dump detailed info as text files. Default: false.

More options can be found in the implementation code.
These options can also be set through the environment variable `MS_GRAPH_KERNEL_FLAGS` without modifying the network source code. For example, `export MS_GRAPH_KERNEL_FLAGS="--opt_level=2 --dump_as_text"`.

Precision - 张清华

enable_reduce_precision=bool        (VM)

Consistent

Whether to enable the reduce-precision feature. Default: True

enable_auto_mixed_precision=bool (GE); the public interface has no comment

To be deleted in the CCB review

Mixed precision

Auto tune

auto_tune_mode=str  (布宇)

Consistent (the comment lacks a usage-scenario description) https://www.mindspore.cn/docs/programming_guide/zh-CN/master/enable_auto_tune.html

Mode of auto tuning when building operators, to obtain the best tiling performance.

NO_TUNE: tuning disabled.

RL: enable RL tuning, for operators that support RL tuning.

GA: enable GA tuning, for operators that support GA tuning.

RL,GA: enable both RL and GA tuning; the tool automatically selects RL or GA based on the different types of operators in the network model, regardless of the order of RL and GA.

anf graph compile config

precompile_only=bool (vm)梁志博

Independent of the hardware support type

Graph mode includes graph compilation and running; when precompile_only is set to True, the graph is only compiled, not run.

Whether to only precompile the network. If set, the network is only compiled and not executed. Default: False.

check_bprop=bool(余坚锋)

Consistent

Whether to check back-propagation nodes during the compilation phase. The check ensures that the shape and dtype of the back-propagation node outputs are the same as the input parameters. Default: False

max_call_depth=int--(余坚锋)

Consistent (the comment lacks a usage-scenario description)

Specifies the maximum depth of function calls. Must be a positive integer. Default: 1000. Usage scenario: set max_call_depth when nested calls are too deep, when looping, or when there are too many subgraphs.

enable_sparse=bool--张清华

Consistent (the comment lacks a usage-scenario description)

Whether to enable the sparse feature. Default: False

grad_for_scalar=bool-(余坚锋)

Consistent

Whether to get gradients for scalars. If set, gradients of scalar input parameters can be computed. Currently, only some scalar operators support this computation. Default: False.

load_compile_cache=bool-余坚锋

Consistent (the comment lacks a usage-scenario description)

Must be used together with save_compile_cache. After save_compile_cache is set to True, the first execution of the network generates a hardware-independent compilation cache and exports it as a MindIR file. When the network is executed again with load_compile_cache=True, the compilation cache is loaded.

save_compile_cache=bool-余坚锋

Consistent (the comment lacks a usage-scenario description)

After save_compile_cache is set to True, the first execution of the network generates a hardware-independent compilation cache and exports it as a MindIR file.

 

reserve_class_name_in_scope=bool-张清华

The parameter type check needs to be added

Whether to save the network class name in the scope. Default: True.
Each node has a scope. The scope of a subnode is the name of its parent node. If reserve_class_name_in_scope is set, the class name is saved after the keyword 'net-' in the scope. For example:
Default/net-Net1/net-Net2 (reserve_class_name_in_scope=True)
Default/net/net (reserve_class_name_in_scope=False)

 

pynative_synchronize  褚金锦 

Consistent (the comment lacks a usage-scenario description)

Controls whether operators execute asynchronously on the device in PyNative mode; the pynative_synchronize setting is added in PyNative mode for this purpose.
Default: False.

       
print_file_path 

You can save the print operator data to a file and disable on-screen printing. If the saved file already exists, a timestamp suffix is added to the file name. Saving data to a file solves the problem of data loss in on-screen printing when a large amount of data is generated.
 
 
save_graphs
 
 .....
save_graphs_path
 
To save the data to a specified directory, set save_graphs_path. If the specified directory does not exist, the system automatically creates the directory.
 
variable_memory_max_size
After this parameter is set, the maximum memory used by the framework is restricted to the configured value.
 
max_device_memory

max_device_memory (str): Sets the maximum memory available for devices.
Currently, it is only supported on GPU. The format is "xxGB". Default: "1024GB".
The actual used memory size is the minimum of the available memory of the device and max_device_memory.
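For example, a minimal sketch capping device memory at 4GB (the value is illustrative):

from mindspore import context

# The actually used memory is min(available device memory, max_device_memory).
context.set_context(device_target="GPU", max_device_memory="4GB")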

 
 

 

 

 

max_call_depth:

The max_call_depth parameter needs to be set when nested calls are too deep or the number of subgraphs is too large.
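For example, a minimal sketch (the value 5000 is illustrative):

from mindspore import context

# Default is 1000; raise it when nested calls are deep or the graph splits into many subgraphs.
context.set_context(max_call_depth=5000)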

 

precompile_only

     Graph mode includes graph compilation and running.

     When precompile_only is set to True, only the graph is compiled; it is not run.

 

grad_for_scalar:

Whether to get gradients for scalars. If set, gradients of scalar input parameters can be computed.

When grad_for_scalar is set to True, the function's scalar inputs can be differentiated. The default value is False. Currently, the back end does not support scalar operations, so this interface supports only simple operations that can be deduced by the front end.

 

save_compile_cache:

Whether to cache the graph compiled by the front end. Default: False.

     Set save_compile_cache to True to save the compiled graph to a MindIR file.

 

load_compile_cache:
 Whether to use the cache of the graph compiled by the front end.
            When it is True, graph compilation skips the front-end compilation process. This means that
            you should make sure the network has not been changed since the last execution; automatic
            checking for changes is not supported yet. Default: False.
 

reserve_class_name_in_scope:

Whether to save the network class name in the scope. Default: True.
             Each node has a scope. A scope of a subnode is the name of its parent node. If reserve_class_name_in_scope
             is set, the class name will be saved after keyword 'net-' in the scope. For example:
             Default/net-Net1/net-Net2 (reserve_class_name_in_scope=True)
             Default/net/net (reserve_class_name_in_scope=False)
 

pynative_synchronize:

Whether to enable synchronous execution of operators on the device in PyNative mode. Default: False (operators execute asynchronously on the device).

            The pynative_synchronize setting is added to control whether the operator executes asynchronously on the device. 

When pynative_synchronize=True, operators execute synchronously on the device.
When an operator fails to execute, you can easily see the error location through the call stack.
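A minimal sketch for debugging a failing operator:

from mindspore import context

# Synchronous execution surfaces an operator failure at its real call site in the
# Python stack trace, instead of at a later asynchronous flush point.
context.set_context(mode=context.PYNATIVE_MODE, pynative_synchronize=True)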
            

 

 

 

 

 

 
