Android System Auto-Reboot Bug (Qualcomm Platform)
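A customer recently reported a bug: our system reboots by itself during normal use, and it is especially prone to hanging or dropping into download mode while dialing. With help from my lead and from Qualcomm support, we arrived at a fix.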
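On the device there is a directory, /sys/bus/msm_subsys/devices, which on our hardware contains three subdirectories: subsys0, subsys1 and subsys2. Each one represents a subsystem running alongside Android. According to Qualcomm's explanation, subsys0 is adsp (audio and multimedia services), subsys1 is modem (telephony services such as voice calls), and subsys2 is wcnss (wireless connectivity, i.e. Wi-Fi and Bluetooth). Other platforms have more subsystems, which I will not list here. This index-to-name mapping is what we saw on our device; the name file inside each directory tells you which subsystem it actually is.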
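subsys0, subsys1 and subsys2 each have a file named restart_level. Reading these files with cat showed SYSTEM in all of them, and that value is what makes the whole system hang or enter download mode when a subsystem runs into trouble. restart_level should be set to related instead. Then, when a subsystem hits an error it cannot recover from (for example, a failure during a phone call), only that subsystem, here subsys1, is restarted. Subsystems run in the background, so the user does not see the restart, and Android itself neither reboots nor enters download mode, which is a better user experience.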
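To check the current setting on a device, something like the following sketch can be used. It is not part of the original fix. The function name list_restart_levels is made up for illustration, and the optional directory argument exists only so the function can be tried against a test directory instead of the real sysfs path.

```shell
#!/bin/sh
# Hypothetical helper: print "<dir> <name> <restart_level>" for each subsystem.
# $1 is the directory to scan; it defaults to the sysfs path from the article.
list_restart_levels() {
    base="${1:-/sys/bus/msm_subsys/devices}"
    for dir in "$base"/subsys*; do
        # Skip entries without readable name/restart_level files
        # (this also covers the case where the glob matched nothing).
        [ -r "$dir/name" ] && [ -r "$dir/restart_level" ] || continue
        printf '%s %s %s\n' "${dir##*/}" "$(cat "$dir/name")" "$(cat "$dir/restart_level")"
    done
}
```

Calling list_restart_levels with no argument in a device shell should print one line per subsystem, which also confirms which subsysN directory belongs to the modem.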
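Someone will ask: if that is the case, why doesn't Qualcomm set these values to related by default? I had the same question. My lead's answer was that if every subsystem silently restarted whenever it hit a problem, many bugs would never be caught in testing. Defaulting to a full system restart makes those bugs surface.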
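Our device frequently dies or enters download mode while placing a call, so the problem is most likely in the subsys1 subsystem. Searching for code related to subsys1 leads to init.qcom.ssr.sh, located in the source tree at device/qcom/common/rootdir/etc/init.qcom.ssr.sh. It is a shell script; here is the code: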
#!/system/bin/sh
# Copyright (c) 2013, The Linux Foundation. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
# * Neither the name of The Linux Foundation nor
# the names of its contributors may be used to endorse or promote
# products derived from this software without specific prior written
# permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NON-INFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
# OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
# WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
# OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
ssr_str="$1"
IFS=,
ssr_array=($ssr_str)
declare -i subsys_mask=0
# check user input subsystem with system device
ssr_check_subsystem_name()
{
    declare -i i=0
    subsys=`cat /sys/bus/msm_subsys/devices/subsys$i/name`
    while [ "$subsys" != "" ]
    do
        if [ "$subsys" == "$ssr_name" ]; then
            return 1
        fi
        i=$i+1
        subsys=`cat /sys/bus/msm_subsys/devices/subsys$i/name`
    done
    return 0
}
# set subsystem mask to indicate which subsystem needs to be enabled
for num in "${!ssr_array[@]}"
do
    case "${ssr_array[$num]}" in
        "1")
            subsys_mask=0
        ;;
        "riva")
            subsys_mask=$subsys_mask+1
        ;;
        "3")
            subsys_mask=63
        ;;
        "adsp")
            ssr_name=adsp
            if ( ssr_check_subsystem_name ); then
                subsys_mask=$subsys_mask+2
            fi
        ;;
        "modem")
            ssr_name=modem
            if ( ssr_check_subsystem_name ); then
                subsys_mask=$subsys_mask+4
            fi
        ;;
        "wcnss")
            ssr_name=wcnss
            if ( ssr_check_subsystem_name ); then
                subsys_mask=$subsys_mask+8
            fi
        ;;
        "venus")
            ssr_name=venus
            if ( ssr_check_subsystem_name ); then
                subsys_mask=$subsys_mask+16
            fi
        ;;
        "external_modem")
            ssr_name=external_modem
            if ( ssr_check_subsystem_name ); then
                subsys_mask=$subsys_mask+32
            fi
        ;;
    esac
done
# enable selected subsystem restart
if [ $((subsys_mask & 1)) == 1 ]; then
    echo 1 > /sys/module/wcnss_ssr_8960/parameters/enable_riva_ssr
else
    echo 0 > /sys/module/wcnss_ssr_8960/parameters/enable_riva_ssr
fi
if [ $((subsys_mask & 2)) == 2 ]; then
    echo "related" > /sys/bus/msm_subsys/devices/subsys0/restart_level
else
    echo "system" > /sys/bus/msm_subsys/devices/subsys0/restart_level
fi
if [ $((subsys_mask & 4)) == 4 ]; then
    echo "related" > /sys/bus/msm_subsys/devices/subsys1/restart_level
else
    echo "system" > /sys/bus/msm_subsys/devices/subsys1/restart_level
fi
if [ $((subsys_mask & 8)) == 8 ]; then
    echo "related" > /sys/bus/msm_subsys/devices/subsys2/restart_level
else
    echo "system" > /sys/bus/msm_subsys/devices/subsys2/restart_level
fi
if [ $((subsys_mask & 16)) == 16 ]; then
    echo "related" > /sys/bus/msm_subsys/devices/subsys3/restart_level
else
    echo "system" > /sys/bus/msm_subsys/devices/subsys3/restart_level
fi
if [ $((subsys_mask & 32)) == 32 ]; then
    echo "related" > /sys/bus/msm_subsys/devices/subsys4/restart_level
else
    echo "system" > /sys/bus/msm_subsys/devices/subsys4/restart_level
fi
if [ $((subsys_mask & 63)) == 63 ]; then
    echo 3 > /sys/module/subsystem_restart/parameters/restart_level
else
    echo 1 > /sys/module/subsystem_restart/parameters/restart_level
fi
The script contains this section:
if [ $((subsys_mask & 4)) == 4 ]; then
    echo "related" > /sys/bus/msm_subsys/devices/subsys1/restart_level
else
    echo "system" > /sys/bus/msm_subsys/devices/subsys1/restart_level
fi
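It tests subsys_mask & 4 to decide what to write to subsys1's restart_level: related if that bit is set, system otherwise. subsys_mask is derived from the argument passed to the script, so the next step is to find the caller, which is init.qcom.rc, located at device/qcom/common/rootdir/etc/init.qcom.rc. A note on *.rc files: they are init scripts that declare system services and actions. An action of the form "on property:<name>=<value>" runs when that system property is set to a matching value. Property values come from system files such as build.prop, and they can also be changed at runtime, for example with setprop xxx yyy in a shell. Each time the property changes, the commands under the action are executed again with the new value.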
init.qcom.rc contains this section:
# SSR setting
on property:persist.sys.ssr.restart_level=*
    exec /system/bin/sh /init.qcom.ssr.sh ${persist.sys.ssr.restart_level}
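This is where the script is called. Whenever the persist.sys.ssr.restart_level property changes, init runs init.qcom.ssr.sh and passes the property's value as the first argument, which the script stores in ssr_str. Back in init.qcom.ssr.sh, ssr_str is split on commas into the array ssr_array, so the property can hold several comma-separated values, and each value contributes a different bit to subsys_mask. Our problem is in subsys1, so we need subsys_mask & 4 == 4, which means setting persist.sys.ssr.restart_level to modem. The script then writes related to /sys/bus/msm_subsys/devices/subsys1/restart_level, and the subsys1 subsystem restarts on its own in the background when it hits a problem it cannot handle.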
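The mapping from the property value to the mask can be tried out in isolation with a simplified sketch like the one below. It is an illustration, not the vendor script: the function name ssr_mask is hypothetical, the sysfs name check (ssr_check_subsystem_name) is left out, and it uses plain POSIX sh instead of arrays.

```shell
#!/bin/sh
# Hypothetical sketch: compute the subsystem mask for a comma-separated list,
# using the same per-name values as init.qcom.ssr.sh (name check omitted).
ssr_mask() {
    mask=0
    old_ifs=$IFS
    IFS=,                      # split the argument on commas
    for item in $1; do
        case "$item" in
            1)              mask=0 ;;
            3)              mask=63 ;;
            riva)           mask=$((mask + 1)) ;;
            adsp)           mask=$((mask + 2)) ;;
            modem)          mask=$((mask + 4)) ;;
            wcnss)          mask=$((mask + 8)) ;;
            venus)          mask=$((mask + 16)) ;;
            external_modem) mask=$((mask + 32)) ;;
        esac
    done
    IFS=$old_ifs               # restore the caller's field separator
    echo "$mask"
}
```

With this sketch, ssr_mask modem prints 4, and ssr_mask adsp,modem prints 6. In both cases the "& 4" test succeeds, so subsys1 would get related.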
The rest is simple: preset persist.sys.ssr.restart_level to modem in the build, so that subsys1/restart_level ends up as related. Add this line to buildspec.mk: ADDITIONAL_BUILD_PROPERTIES += persist.sys.ssr.restart_level=$(call add2prop,$(PWV_SSR_RESTART_LEVEL)). PWV_SSR_RESTART_LEVEL is a macro we defined ourselves, and its value is modem. Build, flash the new software, and verify: OK.
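The verification step can be sketched as a small check like the following. The helper name restart_level_is is made up for illustration. The comparison ignores case because cat showed an uppercase SYSTEM earlier while the script writes lowercase values. It also assumes head and tr are available in the shell where it runs.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the first line of the
# file $1 equals the expected value $2, compared case-insensitively.
restart_level_is() {
    actual=$(head -n 1 "$1" 2>/dev/null | tr '[:lower:]' '[:upper:]')
    expected=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
    # An unreadable or empty file counts as a mismatch.
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

On the device, restart_level_is /sys/bus/msm_subsys/devices/subsys1/restart_level related should succeed after the new build is flashed, and getprop persist.sys.ssr.restart_level should print modem.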
I found this problem worth writing up, so I am recording it here for future reference.