Android/Linux Thermal Governor: Analysis and Use of IPA

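The core of the IPA (Intelligent Power Allocator) model is a PID controller. It takes the thermal zone temperature as its input and produces an allocatable power budget as its output, which is then used to adjust the frequency and voltage of the devices that receive the power.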

Following the general development model for Power Management, the work consists of building the model, implementing it, and verifying it.

1 The IPA Model

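Starting from the Sustainable Power, the PID controller adjusts the allocatable power according to the difference between the current temperature and the Control Temp. This in turn adjusts the state of the cooling devices, that is, the OPP (a Voltage and Frequency pair).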

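Sustainable Power is defined over the available OPPs as the power at the highest OPP at which the temperature stays essentially stable. At higher OPPs the temperature rises noticeably; at lower OPPs it stays flat or falls. Sustainable Power can therefore be found by monitoring the temperature reached at each OPP.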

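The other part of the model estimates the power of the next scenario from the current conditions. From measurement experience, power is usually taken to have two components: Dynamic Power and Static Leakage. Dynamic Power depends on Voltage and Frequency; Static Leakage depends on Voltage and Temperature. The measured data is then analyzed to find the set of equations that fits it best. In the HiKey measurements Static Leakage was small, so it was ignored, and the final power depends only on Voltage and Frequency. This makes it possible to compute the power of each OPP, which links OPPs to power consumption.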

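Another important step is choosing the PID controller's P, I, and D parameters. Empirical values exist here too; several parameter sets should be tested and their temperature-control behavior compared.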

2 IPA Test Environment

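1. Bring out test points as close to the CPU as possible.

2. Connect Ground, V+, and V- to the ARM Energy Probe.

3. Put the system into specific states through software:

   1. For sustainable power, run all 8 cores at 100% workload.

   2. Measuring Cluster Power and CPU Power is more involved and is described separately below.

4. Use an IPython script to read the Thermal Zone temperature and the power at the test points.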

The HiKey Cluster and CPU power states are as follows:

Level	Power State	PD_CPUx/CLKIN	PDCORTEXA53	PD_L2	Linux Kernel
CPU	CPU P-State	On	On	On	P-State
CPU	WFI	On, internal clock gating	On	On	C-State
CPU	CPU Off	Off	On	On	C-State
Cluster	Cluster P-State	On or Off	On	On	P-State
Cluster	Cluster L2 Retention	Off	Off	Retention	C-State
Cluster	Cluster Off	Off	Off	Off	C-State

Figure 1 HiKey Cluster and CPU states

3 Key IPA Parameters

sustainable-power

OPP (MHz)	Sustainable power (mW)
729	2155
960	3326
1200	5285

Figure 2 Sustainable power

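sustainable-power is a property of the thermal-zone because the measured temperature comes from the thermal-sensors, and each thermal-zone contains a number of trips and cooling-maps.

Observing the temperature: at 729 MHz it does not rise, at 960 MHz it rises slowly, and at 1200 MHz it rises quickly. sustainable-power is therefore set to the value measured at 960 MHz (3326).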

In the Thermal framework, a work queue periodically calls thermal_zone_device_check. The polling delay depends on the trip type: 100 ms in passive mode and 1000 ms otherwise.

control_temp

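Two temperature parameters are important in the IPA model. Below 65 °C, IPA is switched off and the PID controller is reset. Above 65 °C, IPA takes effect. 75 °C is IPA's control_temp: above 75 °C, IPA lowers the allocatable power in order to bring the temperature down.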

Figure 3 Thermal Zones DTS

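For cooling-maps, the two figures above and below should be read together. trip specifies the trip point at which cooling starts; contribution assigns weights among multiple actors; the cooling-device property takes <device min max>, where min and max must lie between cooling-min-level and cooling-max-level. cpufreq converts the selected value into the voltage and frequency of the corresponding OPP and applies them.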

dynamic-power-coefficient

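echo 0 > /sys/devices/system/cpu/cpu[1…7]/online takes CPU1 to CPU7 offline, leaving only CPU0.

echo mem > /sys/power/state: with a hack to the kernel code, the SoC is powered down step by step relative to CPU0's working state, first CPU0, then Cluster0, then the whole SoC. This produces the following data: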

OPP(MHz)	Voltage(V)	Cluster Power Off State (mW)	Cluster P-State (mW)	Cluster Power (mW)	CPU WFI (mW)	CPU P-State (mW)	CPU Dynamic Power(mW)
208	1.04	344	360	16	379	429	69
432	1.04	345	374	29	387	498	124
729	1.09	346	393	47	408	617	224
960	1.18	352	427	75	442	794	367
1200	1.33	367	479	112	508	1149	670

Figure 4 HiKey power measurement data
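The derived columns in Figure 4 are consistent with two subtractions: Cluster Power = Cluster P-State - Cluster Power Off State, and CPU Dynamic Power = CPU P-State - Cluster P-State (the CPU WFI column does not enter that difference). This is inferred from the numbers rather than stated in the text. A minimal C sketch that recomputes both columns from the measured ones; the struct and function names are hypothetical:

```c
#include <stddef.h>

/* One row of Figure 4; all power values in mW. */
struct hikey_row {
    int opp_mhz;
    double volt;
    int cluster_off_mw;    /* Cluster Power Off State */
    int cluster_pstate_mw; /* Cluster P-State */
    int cpu_wfi_mw;        /* CPU WFI */
    int cpu_pstate_mw;     /* CPU P-State */
};

static const struct hikey_row hikey_rows[] = {
    {  208, 1.04, 344, 360, 379,  429 },
    {  432, 1.04, 345, 374, 387,  498 },
    {  729, 1.09, 346, 393, 408,  617 },
    {  960, 1.18, 352, 427, 442,  794 },
    { 1200, 1.33, 367, 479, 508, 1149 },
};

static const size_t hikey_row_count = sizeof(hikey_rows) / sizeof(hikey_rows[0]);

/* Cluster Power = Cluster P-State - Cluster Power Off State */
static int cluster_power_mw(const struct hikey_row *r)
{
    return r->cluster_pstate_mw - r->cluster_off_mw;
}

/* CPU Dynamic Power = CPU P-State - Cluster P-State */
static int cpu_dynamic_power_mw(const struct hikey_row *r)
{
    return r->cpu_pstate_mw - r->cluster_pstate_mw;
}
```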

Power formula:

power = dyn_coeff * (freq * volt^2) + static_coeff * F(volt) * F(Temp)

Dynamic power = capacitance * (freq * volt^2)
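With static leakage ignored, the model reduces to the dynamic term. A minimal C sketch of that term, using the same units as the model tables below (freq in MHz, volt in V, power in mW); the function names are hypothetical:

```c
/* The F*V^2 term of the model: freq in MHz, volt in V. */
static double fv2(double freq_mhz, double volt)
{
    return freq_mhz * volt * volt;
}

/* Dynamic power = capacitance * (freq * volt^2).
 * Static leakage is ignored, as in the HiKey measurements. */
static double dynamic_power_mw(double capacitance, double freq_mhz, double volt)
{
    return capacitance * fv2(freq_mhz, volt);
}
```

For example, the cluster's through-zero gradient 0.054 at 1200 MHz and 1.33 V gives about 114.6 mW, which rounds to the 115 in the "Zero model" column.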

Cluster model

Freq (MHz)	Voltage (V)	F*V^2	Power (mW)	Model power (mW)	Zero model (mW)
208	1.04	224.9728	16	16	12
432	1.04	467.2512	29	29	25
729	1.09	866.1249	47	49	47
960	1.18	1336.704	75	73	72
1200	1.33	2122.68	112	113	115

	Gradient (capacitance)	Intercept (static power)
Linear regression	0.051	4.716716513
L.R. thru zero	0.054	0

Figure 5 Cluster coefficient calculation

Figure 6 Cluster linear chart

CPU model

Freq (MHz)	Voltage (V)	F*V^2	Power (mW)	Model power (mW)	Zero model (mW)
208	1.04	224.9728	69	44	67
432	1.04	467.2512	124	121	139
729	1.09	866.1249	224	247	258
960	1.18	1336.704	367	396	399
1200	1.33	2122.68	670	645	633

	Gradient (capacitance)	Intercept (static power)
Linear regression	0.317	-27.12625497
L.R. thru zero	0.298	0

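Figure 7 CPU power coefficient calculation

Figure 8 CPU linear chart

From the Cluster and CPU coefficients above: the cluster's power is shared by the 4 CPUs in a cluster, so a quarter of the cluster gradient is added to the per-CPU gradient, giving dynamic-power-coefficient = (0.298 + (0.054/4 CPUs)) * 1000 = 311.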
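A small C sketch of the arithmetic above. The first function follows the formula in the text. The second is an assumption about how the coefficient is consumed: it mirrors the cpu_cooling driver of 4.x-era kernels, which computes per-OPP dynamic power as coefficient * freq_mhz * voltage_mv^2 / 10^9 (in mW). Both function names are hypothetical.

```c
#include <stdint.h>

/* Per-CPU coefficient: CPU gradient plus an even share of the cluster
 * gradient, scaled by 1000 and truncated (311.5 -> 311). */
static int dyn_power_coeff(double cpu_gradient, double cluster_gradient,
                           int cpus_per_cluster)
{
    double per_cpu = cpu_gradient + cluster_gradient / cpus_per_cluster;
    return (int)(per_cpu * 1000.0);
}

/* Assumed cpu_cooling-style evaluation:
 * power (mW) = coeff * freq_mhz * voltage_mv^2 / 10^9 */
static uint32_t cpu_dyn_power_mw(uint32_t coeff, uint32_t freq_mhz,
                                 uint32_t voltage_mv)
{
    uint64_t power = (uint64_t)coeff * freq_mhz * voltage_mv * voltage_mv;
    return (uint32_t)(power / 1000000000ULL);
}
```

With the values above the sketch gives 311, and about 415 mW per CPU at 960 MHz and 1.18 V, close to the through-zero models (0.3115 * 1336.704 is about 416 mW).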

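LINEST: fits the best straight line to the known data using the least-squares method and returns an array that describes the line.

LINEST(known_y's,known_x's,const,stats)

known_y's is the set of y values already known in the relationship y = mx + b.

If the array known_y's is in a single column, each column of known_x's is treated as a separate variable.

If the array known_y's is in a single row, each row of known_x's is treated as a separate variable.

known_x's is an optional set of x values already known in the relationship y = mx + b.

The array known_x's can contain one or more sets of variables. If only one variable is used, known_y's and known_x's can be ranges of any shape as long as they have equal dimensions. If more than one variable is used, known_y's must be a vector (that is, a single row or a single column).

If known_x's is omitted, it is assumed to be the array {1,2,3,...}, the same size as known_y's.

const is a logical value specifying whether to force the constant b to equal 0.

If const is TRUE or omitted, b is calculated normally.

If const is FALSE, b is set to 0 and the m values are adjusted to fit y = mx.

stats is a logical value specifying whether to return additional regression statistics.

If stats is TRUE, LINEST returns the additional regression statistics, and the returned array is {mn,mn-1,...,m1,b;sen,sen-1,...,se1,seb;r2,sey;F,df;ssreg,ssresid}.

If stats is FALSE or omitted, LINEST returns only the m coefficients and the constant b.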
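The two fits in Figures 5 and 7 (LINEST with const TRUE, and with const FALSE for the "L.R. thru zero" row) can be reproduced with plain least squares. A minimal C sketch, with the measured values copied from those tables; the array and function names are hypothetical:

```c
#include <stddef.h>

/* Data from Figures 5 and 7: OPP (MHz), voltage (V), cluster power and
 * CPU dynamic power (mW). */
static const double opp_mhz[5]    = { 208, 432, 729, 960, 1200 };
static const double opp_volt[5]   = { 1.04, 1.04, 1.09, 1.18, 1.33 };
static const double cluster_mw[5] = { 16, 29, 47, 75, 112 };
static const double cpu_mw[5]     = { 69, 124, 224, 367, 670 };

/* x = F * V^2 for every OPP */
static void fill_fv2(double x[5])
{
    for (size_t i = 0; i < 5; i++)
        x[i] = opp_mhz[i] * opp_volt[i] * opp_volt[i];
}

/* Least squares for y = m*x + b, like LINEST(y, x, TRUE) */
static void fit_line(const double *x, const double *y, size_t n,
                     double *m, double *b)
{
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < n; i++) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    *m = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    *b = (sy - *m * sx) / n;
}

/* Least squares for y = m*x with b forced to 0, like LINEST(y, x, FALSE) */
static double fit_line_thru_zero(const double *x, const double *y, size_t n)
{
    double sxx = 0, sxy = 0;
    for (size_t i = 0; i < n; i++) {
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    return sxy / sxx;
}
```

On this data the sketch gives roughly 0.0509 / 4.717 and 0.0541 for the cluster, and 0.3168 / -27.126 and 0.2982 for the CPU, matching the rounded gradients and intercepts in the tables.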

4 IPA Implementation

static struct thermal_governor thermal_gov_power_allocator = {
    .name           = "power_allocator",
    .bind_to_tz     = power_allocator_bind,
    .unbind_from_tz = power_allocator_unbind,
    .throttle       = power_allocator_throttle,
};

This is the Power Allocator governor structure. It provides three core callbacks: power_allocator_bind, power_allocator_unbind, and power_allocator_throttle.

static int power_allocator_bind(struct thermal_zone_device *tz)

power_allocator_bind initializes the PID controller parameters and attaches power_allocator_params to tz->governor_data.

struct power_allocator_params {
    bool allocated_tzp;
    s64 err_integral;                 // accumulated error in the PID controller
    s32 prev_err;                     // error in the previous iteration of the PID controller
    int trip_switch_on;               // first passive trip point of the thermal zone; the governor switches on when this trip point is crossed
    int trip_max_desired_temperature; // last passive trip point of the thermal zone; the temperature we are controlling for
};

PID parameters

if (!tz->tzp->k_po || force)
    tz->tzp->k_po = int_to_frac(sustainable_power) / temperature_threshold;

if (!tz->tzp->k_pu || force)
    tz->tzp->k_pu = int_to_frac(2 * sustainable_power) / temperature_threshold;

if (!tz->tzp->k_i || force)
    tz->tzp->k_i = int_to_frac(10) / 1000;

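From the parameters in the DTS, temperature_threshold = control_temp - switch_on_temp = 75000 - 65000 = 10000 (in m°C). int_to_frac() converts a value to 10-bit fixed point (a left shift by FRAC_BITS = 10, that is, multiplication by 1024), so:

tz->tzp->k_po = int_to_frac(sustainable_power) / temperature_threshold = 3326*1024/10000 = 340.5824 (integer division in the kernel truncates this to 340)

tz->tzp->k_pu = int_to_frac(2 * sustainable_power) / temperature_threshold = 3326*2*1024/10000 = 681.1648 (truncated to 681)

tz->tzp->k_i = int_to_frac(10) / 1000 = 10*1024/1000 = 10.24 (truncated to 10)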

The other two parameters, tz->tzp->k_d and tz->tzp->integral_cutoff, default to 0.
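To check the numbers above, a standalone C sketch of the same fixed-point computation. FRAC_BITS and int_to_frac follow the macros described above; the pid_consts struct and the function name are hypothetical.

```c
#include <stdint.h>

#define FRAC_BITS 10
#define int_to_frac(x) ((x) << FRAC_BITS)

struct pid_consts {          /* hypothetical container for k_po, k_pu, k_i */
    int64_t k_po, k_pu, k_i; /* 10-bit fixed point */
};

static struct pid_consts estimate_pid_consts(int64_t sustainable_power,
                                             int64_t control_temp,
                                             int64_t switch_on_temp)
{
    struct pid_consts c = { 0, 0, 0 };
    int64_t threshold = control_temp - switch_on_temp; /* m°C */

    if (threshold <= 0)  /* invalid configuration, leave constants at 0 */
        return c;
    c.k_po = int_to_frac(sustainable_power) / threshold;
    c.k_pu = int_to_frac(2 * sustainable_power) / threshold;
    c.k_i = int_to_frac((int64_t)10) / 1000;
    return c;
}
```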

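PID controller

Figure 9 power_allocator_throttle flow

power_allocator_throttle implements IPA's regulation. It first checks whether the current temperature is below switch_on_temp. If it is, PID regulation is skipped and the maximum available power is granted. Otherwise, the PID controller is used to allocate power. If, after the PID controller has been regulating for a while, the temperature falls below switch_on_temp again, all of the PID controller's state is reset, so the controller also gets corrected.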
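The switch-on check described above, as a minimal C sketch. The pid_state struct stands in for the err_integral and prev_err fields of power_allocator_params; the function name is hypothetical, and granting maximum power is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

struct pid_state {        /* hypothetical stand-in for the PID fields */
    int64_t err_integral;
    int64_t prev_err;
};

/* Returns true when the PID controller should run. Below switch_on_temp
 * the PID state is reset and the caller grants maximum power instead. */
static bool ipa_should_run_pid(struct pid_state *st, int temp_mc,
                               int switch_on_temp_mc)
{
    if (temp_mc < switch_on_temp_mc) {
        st->err_integral = 0; /* reset the accumulated error */
        st->prev_err = 0;
        return false;
    }
    return true;
}
```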

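Figure 10 allocate_power flow

allocate_power is the core of IPA. It iterates over all thermal_instances to obtain the number of actors and their weights, then computes each actor's max_power and weighted_req_power, as well as max_allocatable_power and total_weighted_req_power across all actors.

pid_controller computes power_range from control_temp, max_allocatable_power, and the PID parameters; power_range is the power budget for the next allocation.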
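A simplified, standalone C sketch of the power_range computation, modeled on the description above and on pid_controller() in power_allocator.c. Assumptions: 10-bit fixed point with truncating division instead of the kernel's shifts, and a D term that is not scaled by the polling interval (k_d is 0 in this configuration anyway). The struct and function names are hypothetical.

```c
#include <stdint.h>

#define FRAC_BITS 10
#define FRAC_ONE ((int64_t)1 << FRAC_BITS)

/* 10-bit fixed-point helpers; division truncates toward zero. */
static int64_t to_frac(int64_t x) { return x * FRAC_ONE; }
static int64_t from_frac(int64_t x) { return x / FRAC_ONE; }
static int64_t mul_frac(int64_t a, int64_t b) { return (a * b) / FRAC_ONE; }

struct pid_tunables {             /* hypothetical; values as in tz->tzp */
    int64_t k_po, k_pu, k_i, k_d; /* fixed point */
    int64_t integral_cutoff;      /* m°C */
};

struct pid_state {                /* mirrors the PID fields of the params */
    int64_t err_integral;         /* fixed point */
    int64_t prev_err;             /* fixed point */
};

static uint32_t pid_power_range(const struct pid_tunables *k,
                                struct pid_state *st,
                                int64_t control_temp, int64_t temp,
                                uint32_t sustainable_power,
                                uint32_t max_allocatable_power)
{
    int64_t err = to_frac(control_temp - temp);
    /* k_po when above control_temp (err < 0), k_pu when below it */
    int64_t p = mul_frac(err < 0 ? k->k_po : k->k_pu, err);
    int64_t i = mul_frac(k->k_i, st->err_integral);

    if (err < to_frac(k->integral_cutoff)) {
        int64_t i_next = i + mul_frac(k->k_i, err);
        int64_t i_abs = i_next < 0 ? -i_next : i_next;
        /* anti-windup: accumulate only while |I| stays below max power */
        if (i_abs < to_frac(max_allocatable_power)) {
            i = i_next;
            st->err_integral += err;
        }
    }

    /* simplified D term: change in error since the previous call */
    int64_t d = mul_frac(k->k_d, err - st->prev_err);
    st->prev_err = err;

    int64_t range = (int64_t)sustainable_power + from_frac(p + i + d);
    if (range < 0)
        range = 0;
    if (range > (int64_t)max_allocatable_power)
        range = max_allocatable_power;
    return (uint32_t)range;
}
```

With k_po = 340, k_pu = 681, k_i = 10 and sustainable-power 3326, a temperature of 70 °C yields a budget of 6651 and a following reading of 77 °C yields 2643: the budget drops below sustainable-power once the zone is hotter than control_temp.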

divvy_up_power distributes the available power among the actors based on weighted_req_power, max_power, num_actors, total_weighted_req_power, and power_range, producing each actor's granted_power.
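A standalone C sketch of the split described above, modeled on divvy_up_power(): each actor first gets power_range * weighted_req_power / total_weighted_req_power, capped at its max_power, and any surplus is redistributed to actors that still have headroom, in proportion to that headroom. The function name is hypothetical, and the caller supplies extra_actor_power as scratch space.

```c
#include <stdint.h>

static void divvy_power(const uint32_t *weighted_req, const uint32_t *max_power,
                        int num_actors, uint32_t total_weighted_req,
                        uint32_t power_range, uint32_t *granted,
                        uint32_t *extra_actor_power)
{
    uint32_t extra_power = 0, capped_extra_power = 0;

    if (total_weighted_req == 0)
        total_weighted_req = 1; /* avoid division by zero */

    for (int i = 0; i < num_actors; i++) {
        uint64_t req_range = (uint64_t)weighted_req[i] * power_range;

        /* proportional share, rounded to nearest */
        granted[i] = (uint32_t)((req_range + total_weighted_req / 2) /
                                total_weighted_req);
        if (granted[i] > max_power[i]) {
            extra_power += granted[i] - max_power[i];
            granted[i] = max_power[i];
        }
        extra_actor_power[i] = max_power[i] - granted[i];
        capped_extra_power += extra_actor_power[i];
    }

    if (extra_power == 0 || capped_extra_power == 0)
        return;
    if (extra_power > capped_extra_power)
        extra_power = capped_extra_power;

    /* hand the surplus to actors with headroom */
    for (int i = 0; i < num_actors; i++)
        granted[i] += (uint32_t)(((uint64_t)extra_actor_power[i] * extra_power) /
                                 capped_extra_power);
}
```

For example, requests of 1000 and 3000 with max_power 2000 each and power_range 3000 produce grants of 1000 and 2000: the second actor's 250 mW overflow goes to the first.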

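power_actor_set_power configures each cooling device according to the power it was granted: cdev->ops->power2state converts the power value into a cooling device state, and thermal_cdev_update applies that state through cdev->ops->set_cur_state. This completes one regulation pass over the Thermal Zone.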
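A simplified C sketch of what a power2state callback does for a cpufreq-style cooling device: pick the fastest OPP whose power fits the granted budget and convert it to a cooling state, where state 0 is the fastest OPP. The real cpufreq cooling callback also handles static power and multiple CPUs; this sketch ignores both, and its names are hypothetical.

```c
#include <stdint.h>

struct opp_power {       /* hypothetical per-OPP power table entry */
    uint32_t freq_mhz;
    uint32_t power_mw;
};

/* tbl is sorted by ascending frequency (and power).
 * Returns the cooling state: 0 = fastest OPP, n - 1 = slowest. */
static unsigned long power_to_state(const struct opp_power *tbl, int n,
                                    uint32_t budget_mw)
{
    int idx = 0;         /* fall back to the slowest OPP */

    for (int i = 0; i < n; i++)
        if (tbl[i].power_mw <= budget_mw)
            idx = i;     /* fastest OPP that fits the budget so far */
    return (unsigned long)(n - 1 - idx);
}
```

Using the CPU Dynamic Power column of Figure 4 as the per-OPP power table, a 400 mW budget selects 960 MHz, that is, state 1.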

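A few key concepts: a thermal_instance is a cooling device bound to a particular trip of a particular thermal_zone. A power actor is a power-consuming entity that can switch between power states, so its power consumption can be regulated by changing its state. Each actor has a weight, 1024 by default; it can be raised for more important actors and lowered for less important ones. Power is allocated according to weighted_req_power, not req_power.

A limitation of IPA: the PID controller works well when it is invoked on a periodic tick, but it may perform less well when invocations are irregular, for example when they are triggered by interrupts.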
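How the weight enters the request, as a C sketch modeled on allocate_power(): the weight uses the same 10-bit fixed point as the PID constants, so the default 1024 leaves the request unchanged, 2048 doubles it, and 512 halves it. The function name is hypothetical.

```c
#include <stdint.h>

#define FRAC_BITS 10
#define DEFAULT_WEIGHT (1u << FRAC_BITS) /* 1024 */

/* weighted_req_power = weight * req_power, with weight in fixed point */
static uint32_t weighted_req_power(uint32_t weight, uint32_t req_power_mw)
{
    return (uint32_t)(((uint64_t)weight * req_power_mw) >> FRAC_BITS);
}
```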

posted on 2017-02-10 22:14 ArnoldLu


Arnold Lu@Nanjing