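Arm GPU (Mali-G610): Driver, OpenGL ES, and Performance Testing
Keywords: Mali, Valhall, G610, OpenGL, OpenGL ES, OpenCL, Vulkan, libmali.
Using the RK3588 as an example, this article gives a brief overview of:
- The Arm Mali-G610 kernel driver and the associated libmali library.
- The generic OpenGL ES and EGL APIs and their libraries.
- OpenGL ES test tools: glmark2-es2-wayland and glmark2-es2-drm.
- Mali GPU performance tuning tools.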
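1 GPU Hardware Specifications and Related APIs
1.1 Arm Mali GPU hardware specifications
Arm GPU references:
- Arm GPU architecture:
  - Valhall architecture overview: The Valhall shader core.
- Related technologies:
  - ASTC: Adaptive Scalable Texture Compression.
  - AFBC: Arm Frame Buffer Compression.
  - Understanding Render Passes.
  - Pixel Local Storage on ARM Mali GPUs.
  - Vulkan Multipass at GDC 2017.
  - Killing Pixels - A New Optimization for Shading on ARM Mali GPUs.
- GPU tuning tools: see section 7.
Arm publishes a specification overview for the Mali-G610.
[Figure: Mali-G610 architecture diagram]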
1.2 GPU APIs
The G610 supports the following API versions:
- OpenCL 2.2 Full Profile: Index of /OpenCL/specs/2.2/pdf (khronos.org).
- OpenGL ES 1.1, 2.0, and 3.2: The OpenGL ES® Shading Language, Version 3.20.8 (khronos.org); OpenGL ES 2.0.25 (November 2, 2010) (khronos.org); The OpenGL Graphics System: A Specification (khronos.org).
- Vulkan 1.2: Vulkan® 1.2.284 - A Specification (khronos.org).
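2 GPU Software and Hardware Stack
The example below runs glmark2-es2-wayland as an OpenGL ES test in an environment that uses Weston as the display framework. Its library dependencies and call path are:
- glmark2-es2-wayland calls the libGLES library for 2D/3D rendering. libGLES implements the OpenGL ES API.
- libGLES calls into libmali, which in turn drives the Mali GPU. libmali is the adaptation layer between OpenGL ES and the Mali kernel driver; Arm ships it only as a binary. The download list of Mali user-space libraries is at Mali Drivers | Mali GPU User-Space Binary Drivers – Arm Developer.
- glmark2-es2-wayland hands its display requests to Weston, and Weston uses libdrm to drive the display controller (VOP) through the DRM driver.
- The Mali kernel driver is the open-source driver provided by Arm. Mali GPUs have gone through five generations so far:
  - Mali Drivers | Open Source Mali Utgard GPU Kernel Drivers – Arm Developer
  - Mali Drivers | Open Source Mali Midgard GPU Kernel Drivers – Arm Developer
  - Mali Drivers | Open Source Bifrost Mali 3rd Gen GPU Architecture Kernel Drivers – Arm Developer
  - Mali Drivers | Open Source Valhall Mali 4th Gen GPU Architecture Kernel Drivers – Arm Developer
  - Mali Drivers | Open Source Mali 5th Gen GPU Architecture Kernel Drivers – Arm Developer
In the actual implementation, libGLES and libEGL are empty; the OpenGL ES and EGL implementations have moved into libmali.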
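One way to see which file actually backs a GL library name is to follow its symlinks. The sketch below assumes the "empty" libraries are installed as symlinks to a libmali file, which is a common packaging choice but is not confirmed by this article; the helper names are hypothetical and the library paths on a given board have to be supplied by the reader.

```python
import os


def resolve_library(path):
    """Follow symlinks and return the real file behind a library path."""
    return os.path.realpath(path)


def is_backed_by_libmali(path):
    """Return True if the path ultimately resolves to a file named libmali*.

    Assumption: the vendor package installs libEGL/libGLES as symlinks to
    libmali. A stub library that loads libmali at run time would not be
    detected by this check.
    """
    return os.path.basename(resolve_library(path)).startswith("libmali")
```

Calling `is_backed_by_libmali()` on the libEGL or libGLESv2 path found on the target gives a quick first indication; `ldd` on the test binary is the complementary check.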
"Mali - Rockchip open source Document": Rockchip's brief introduction to Mali.
3 Mali-G610 Driver Configuration
The G610 belongs to the fourth-generation Valhall architecture. In the Rockchip kernel, the Bifrost driver also supports Valhall:
Device Drivers ->Graphics support
->Mali-300/400/450 support
->Mali Midgard series support
->Mali Kernel Unit Test Framework
->Mali Bifrost series support
->Platform name
->Platform specific options
Enable Mali CSF based GPU support -- enables the Command Stream Frontend (CSF); otherwise the Job Manager is used.
Enable devfreq support for Mali -- uses devfreq to scale the Mali frequency and save power.
Enable Streamline tracing support -- provides trace data for the Arm Streamline Performance Analyzer.
Enable kbase tracing -- traces kbase through /sys/kernel/debug/mali0/mali_trace.
Enable map imported dma-bufs on demand
Enable legacy compatibility cache flush on dma-buf map
Enable Expert Settings
Attempt to allocate 2MB pages
Enable memory fully physically-backed
Enable support of GPU core stack power control
*** Platform options ***
Enable No Mali
Enable build of Mali kernel driver for GEM5
*** Debug options ***
Enable support for FW core dump
Enable debug build
Enable debug sync fence usage
Enable system event tracing support
*** Instrumentation options ***
Select Performance counters set (Primary) --->
Enable runtime selection of performance counters set via debugfs
Enable system level support needed for job dumping
*** Workarounds ***
Enable workaround for PWRSOFT-765
Disable workaround for BASE_HW_ISSUE_GPU2017_1336
Use alternative workaround for BASE_HW_ISSUE_GPU2017_1336
Enable Virtualization reference code
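To check which of these options a given kernel was built with, the generated .config text can be parsed. The sketch below is a generic .config parser; of the symbol names used in the usage note, only CONFIG_MALI_BIFROST_NO_MALI appears in this article, and the others (CONFIG_MALI_BIFROST, CONFIG_MALI_CSF_SUPPORT) are assumptions that should be verified against the kernel tree. Reading /proc/config.gz additionally assumes the kernel was built with CONFIG_IKCONFIG_PROC.

```python
import gzip


def parse_kernel_config(text):
    """Parse kernel .config text into {symbol: value}.

    Symbols written as '# CONFIG_X is not set' map to None.
    """
    suffix = " is not set"
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            config[line[2:-len(suffix)]] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            config[name] = value
    return config


def load_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (needs CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return parse_kernel_config(f.read())
```

For example, `load_running_config().get("CONFIG_MALI_CSF_SUPPORT")` returning `"y"` would indicate a CSF build, provided that symbol name matches the driver version in use.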
4 Mali-G610 Driver
This section briefly walks through the Mali driver in the kernel.
4.1 G610 DTS
The G610 DTS consists of two parts: the basic G610 node and the OPP table.
gpu: gpu@fb000000 {
    compatible = "arm,mali-bifrost";
    reg = <0x0 0xfb000000 0x0 0x200000>;
    interrupts = <GIC_SPI 94 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 93 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 92 IRQ_TYPE_LEVEL_HIGH>;
    interrupt-names = "GPU", "MMU", "JOB";
    clocks = <&scmi_clk SCMI_CLK_GPU>, <&cru CLK_GPU_COREGROUP>,
             <&cru CLK_GPU_STACKS>, <&cru CLK_GPU>;
    clock-names = "clk_mali", "clk_gpu_coregroup",
                  "clk_gpu_stacks", "clk_gpu";
    assigned-clocks = <&scmi_clk SCMI_CLK_GPU>;
    assigned-clock-rates = <200000000>;
    power-domains = <&power RK3588_PD_GPU>;
    operating-points-v2 = <&gpu_opp_table>;
    #cooling-cells = <2>;
    dynamic-power-coefficient = <2982>;
    upthreshold = <30>;
    downdifferential = <10>;
    status = "disabled";
};

gpu_opp_table: gpu-opp-table {
    compatible = "operating-points-v2";
    nvmem-cells = <&gpu_leakage>, <&specification_serial_number>;
    nvmem-cell-names = "leakage", "specification_serial_number";
    rockchip,supported-hw;
    rockchip,pvtm-voltage-sel = <
        0    815   0
        816  835   1
        836  860   2
        861  885   3
        886  910   4
        911  9999  5
    >;
    rockchip,pvtm-pvtpll;
    rockchip,pvtm-offset = <0x1c>;
    rockchip,pvtm-sample-time = <1100>;
    rockchip,pvtm-freq = <800000>;
    rockchip,pvtm-volt = <750000>;
    rockchip,pvtm-ref-temp = <25>;
    rockchip,pvtm-temp-prop = <(-135) (-135)>;
    rockchip,pvtm-thermal-zone = "gpu-thermal";
    clocks = <&cru CLK_GPU>;
    clock-names = "clk";
    rockchip,grf = <&gpu_grf>;
    volt-mem-read-margin = <
        855000  1
        765000  2
        675000  3
        495000  4
    >;
    low-volt-mem-read-margin = <4>;
    intermediate-threshold-freq = <400000>; /* KHz */
    rockchip,temp-hysteresis = <5000>;
    rockchip,low-temp = <10000>;
    rockchip,low-temp-min-volt = <750000>;
    rockchip,high-temp = <85000>;
    rockchip,high-temp-max-freq = <800000>;

    opp-300000000 {
        opp-supported-hw = <0xff 0xffff>;
        opp-hz = /bits/ 64 <300000000>;
        opp-microvolt = <675000 675000 850000>,
                        <675000 675000 850000>;
    };
    opp-400000000 {
        opp-supported-hw = <0xff 0xffff>;
        opp-hz = /bits/ 64 <400000000>;
        opp-microvolt = <675000 675000 850000>,
                        <675000 675000 850000>;
    };
    ...
};

&gpu {
    mali-supply = <&vdd_gpu_s0>;
    mem-supply = <&vdd_gpu_mem_s0>;
    status = "okay";
};
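The rockchip,pvtm-voltage-sel property in the OPP table reads most naturally as a list of (min, max, sel) rows that map a measured PVTM value to a voltage selector. The sketch below illustrates that reading only; the row layout and the inclusive bounds are assumptions, the function name is hypothetical, and the kernel's own lookup may treat out-of-range values differently.

```python
# Rows copied from rockchip,pvtm-voltage-sel in the DTS above.
# Assumed layout: (min_pvtm, max_pvtm, voltage_sel).
PVTM_VOLTAGE_SEL = [
    (0, 815, 0),
    (816, 835, 1),
    (836, 860, 2),
    (861, 885, 3),
    (886, 910, 4),
    (911, 9999, 5),
]


def select_voltage_sel(pvtm, table=PVTM_VOLTAGE_SEL):
    """Return the selector whose [min, max] range contains pvtm, else None."""
    for low, high, sel in table:
        if low <= pvtm <= high:
            return sel
    return None
```

Under this reading, a chip that measures a PVTM value of 850 would fall into row 2.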
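4.2 G610 driver
The G610 is a Valhall GPU. The driver integrated by Rockchip is based on VX504X08X-SW-99002-r40p0-01eac0.tar from the Valhall Mali 4th Gen GPU Architecture Kernel Drivers.
Rockchip combines support for the Midgard, Bifrost, and Valhall architectures in one driver: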
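kbase_driver_init
  kbase_platform_register
  kbase_platform_driver -- registers the platform driver, which matches compatible strings such as "arm,mali-valhall" (the RK3588 DTS above uses "arm,mali-bifrost").
    kbase_device_alloc
    kbase_device_init
      kbase_device_id_init -- sets the device name and index.
      kbase_disjoint_init
      -> iterates over dev_init and calls each entry's init() function.
The GPU probe function calls the following init() functions in order: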
static const struct kbase_device_init dev_init[] = {
#if IS_ENABLED(CONFIG_MALI_BIFROST_NO_MALI)
    { kbase_gpu_device_create, kbase_gpu_device_destroy, "Dummy model initialization failed" },
#else
    /* Look up the JOB/MMU/GPU interrupts and their trigger types. */
    { assign_irqs, NULL, "IRQ search failed" },
    /* Map the registers into kernel address space. */
    { registers_map, registers_unmap, "Register map failed" },
#endif
    /* Acquire the regulator and clock resources. */
    { power_control_init, power_control_term, "Power control initialization failed" },
    { kbase_device_io_history_init, kbase_device_io_history_term,
      "Register access history initialization failed" },
    { kbase_device_early_init, kbase_device_early_term, "Early device initialization failed" },
    { kbase_device_populate_max_freq, NULL, "Populating max frequency failed" },
    { kbase_pm_lowest_gpu_freq_init, NULL, "Lowest freq initialization failed" },
    { kbase_device_misc_init, kbase_device_misc_term,
      "Miscellaneous device initialization failed" },
    { kbase_device_pcm_dev_init, kbase_device_pcm_dev_term,
      "Priority control manager initialization failed" },
    { kbase_ctx_sched_init, kbase_ctx_sched_term, "Context scheduler initialization failed" },
    { kbase_mem_init, kbase_mem_term, "Memory subsystem initialization failed" },
    /* Load the module referenced by "protected-memory-allocator". */
    { kbase_csf_protected_memory_init, kbase_csf_protected_memory_term,
      "Protected memory allocator initialization failed" },
    { kbase_device_coherency_init, NULL, "Device coherency init failed" },
    { kbase_protected_mode_init, kbase_protected_mode_term,
      "Protected mode subsystem initialization failed" },
    { kbase_device_list_init, kbase_device_list_term, "Device list setup failed" },
    { kbase_device_timeline_init, kbase_device_timeline_term,
      "Timeline stream initialization failed" },
    { kbase_clk_rate_trace_manager_init, kbase_clk_rate_trace_manager_term,
      "Clock rate trace manager initialization failed" },
    { kbase_device_hwcnt_watchdog_if_init, kbase_device_hwcnt_watchdog_if_term,
      "GPU hwcnt backend watchdog interface creation failed" },
    { kbase_device_hwcnt_backend_csf_if_init, kbase_device_hwcnt_backend_csf_if_term,
      "GPU hwcnt backend CSF interface creation failed" },
    { kbase_device_hwcnt_backend_csf_init, kbase_device_hwcnt_backend_csf_term,
      "GPU hwcnt backend creation failed" },
    { kbase_device_hwcnt_context_init, kbase_device_hwcnt_context_term,
      "GPU hwcnt context initialization failed" },
    { kbase_csf_early_init, kbase_csf_early_term, "Early CSF initialization failed" },
    { kbase_backend_late_init, kbase_backend_late_term, "Late backend initialization failed" },
    { kbase_csf_late_init, NULL, "Late CSF initialization failed" },
    { NULL, kbase_device_firmware_hwcnt_term, NULL },
    { kbase_debug_csf_fault_init, kbase_debug_csf_fault_term,
      "CSF fault debug initialization failed" },
    { kbase_device_debugfs_init, kbase_device_debugfs_term, "DebugFS initialization failed" },
    /* Set up the Mali misc device and its device attribute nodes. */
    { kbase_sysfs_init, kbase_sysfs_term, "SysFS group creation failed" },
    /* Register the misc device. */
    { kbase_device_misc_register, kbase_device_misc_deregister,
      "Misc device registration failed" },
    { kbase_gpuprops_populate_user_buffer, kbase_gpuprops_free_user_buffer,
      "GPU property population failed" },
    { kbase_device_late_init, kbase_device_late_term, "Late device initialization failed" },
};
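Each dev_init entry pairs an init function with an optional term function and an error message, which suggests a table-driven start-up with rollback. The Python sketch below models that pattern in user space; it is an illustration, not the driver code. The names InitStep, device_init, and term_partial are chosen here for the example, and the rollback order (reverse order over the steps before the failing one) is an assumption about how such a table is normally unwound.

```python
from typing import Callable, NamedTuple, Optional


class InitStep(NamedTuple):
    init: Optional[Callable[[dict], int]]    # returns 0 on success
    term: Optional[Callable[[dict], None]]   # undo function, may be absent
    err_msg: Optional[str]


def term_partial(dev, steps, count):
    """Undo the first `count` steps in reverse order, skipping missing terms."""
    for step in reversed(steps[:count]):
        if step.term is not None:
            step.term(dev)


def device_init(dev, steps):
    """Run every init in table order; on the first failure roll back and stop."""
    for index, step in enumerate(steps):
        if step.init is None:
            continue  # term-only entry, like the { NULL, ..., NULL } row
        err = step.init(dev)
        if err:
            print(f"{step.err_msg} error = {err}")
            term_partial(dev, steps, index)
            return err
    return 0
```

The benefit of the table form is that start-up order and tear-down order are defined in one place, so adding a subsystem means adding one row rather than editing two mirrored functions.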
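4.2.1 Creating the Mali misc device, its attribute nodes, and its debug nodes
kbase_sysfs_init
  -> initializes the misc device with kbase_fops as its file operations.
  -> creates the attribute groups of the misc device: kbase_attrs, kbase_mempool_attr_group, and kbase_scheduling_attr_group.
kbase_fops is the file-operations table of the /dev/maliX device (for example /dev/mali0):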
static const struct file_operations kbase_fops = {
    .owner = THIS_MODULE,
    .open = kbase_open,
    .release = kbase_release,
    .read = kbase_read,
    .poll = kbase_poll,
    .unlocked_ioctl = kbase_ioctl,
    .compat_ioctl = kbase_ioctl,
    .mmap = kbase_mmap,
    .check_flags = kbase_check_flags,
    .get_unmapped_area = kbase_get_unmapped_area,
};
kbase_open() opens the /dev/maliX device:
kbase_open
kbase_find_device
kbase_mem_migrate_set_address_space_ops
kbase_device_firmware_init_once
kbase_csf_firmware_deferred_init
kbase_csf_firmware_load_init
kbase_mmu_init
kbase_mcu_shared_interface_region_tracker_init
request_firmware
load_firmware_entry
kbase_csf_firmware_trace_buffers_init
kbase_pm_wait_for_l2_powered
load_mmu_tables
boot_csf_firmware
parse_capabilities
kbase_csf_doorbell_mapping_init
kbase_csf_scheduler_init
kbase_csf_setup_dummy_user_reg_page
kbase_csf_timeout_init
global_init_on_boot
kbase_csf_firmware_cfg_init
kbase_device_csf_iterator_trace_init
kbase_csf_firmware_log_init
kbase_device_hwcnt_csf_deferred_init
kbase_csf_debugfs_init -- creates debug nodes such as active_groups, scheduling_timer_enabled, scheduling_timer_kick, and scheduler_state.
kbase_timeline_io_debugfs_init -- creates the tlstream debug node.
kbase_file_new
5 Mali-G610 Testing
5.1 OpenGL ES tests: glmark2-es2-wayland and glmark2-es2-drm
glmark2 is a benchmark for OpenGL 2.0 and OpenGL ES 2.0.
glmark2-es2-wayland is the OpenGL ES variant that runs on Wayland:
glmark2-es2-wayland
The results are as follows:
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '6'.
=======================================================
    glmark2 2021.02
=======================================================
    OpenGL Information
    GL_VENDOR: ARM
    GL_RENDERER: Mali-LODX
    GL_VERSION: OpenGL ES 3.2 v1.g6p0-01eac0.ba52c908d926792b8f5fe28f383a2b03
=======================================================
[build] use-vbo=false: FPS: 2901 FrameTime: 0.345 ms
[build] use-vbo=true: FPS: 2663 FrameTime: 0.376 ms
[texture] texture-filter=nearest: FPS: 3188 FrameTime: 0.314 ms
[texture] texture-filter=linear: FPS: 3355 FrameTime: 0.298 ms
[texture] texture-filter=mipmap: FPS: 2915 FrameTime: 0.343 ms
...
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2305 FrameTime: 0.434 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2362 FrameTime: 0.423 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2306 FrameTime: 0.434 ms
=======================================================
    glmark2 Score: 2113
=======================================================
With Weston stopped, glmark2-es2-drm gives the following results:
arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '6'.
=======================================================
    glmark2 2021.02
=======================================================
    OpenGL Information
    GL_VENDOR: ARM
    GL_RENDERER: Mali-LODX
    GL_VERSION: OpenGL ES 3.2 v1.g6p0-01eac0.ba52c908d926792b8f5fe28f383a2b03
=======================================================
[build] use-vbo=false: FPS: 46 FrameTime: 21.739 ms
[build] use-vbo=true: FPS: 46 FrameTime: 21.739 ms
[texture] texture-filter=nearest: FPS: 46 FrameTime: 21.739 ms
[texture] texture-filter=linear: FPS: 46 FrameTime: 21.739 ms
[texture] texture-filter=mipmap: FPS: 46 FrameTime: 21.739 ms
...
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 46 FrameTime: 21.739 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 46 FrameTime: 21.739 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 46 FrameTime: 21.739 ms
=======================================================
    glmark2 Score: 46
=======================================================
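To compare the Wayland and DRM runs scene by scene, the text output can be turned into structured data. The following is a minimal sketch that assumes the line format shown in the two logs above; parse_glmark2 is a hypothetical helper, not part of glmark2, and other glmark2 versions may print slightly different lines.

```python
import re

# "[scene] options: FPS: N FrameTime: X ms", as printed in the logs above.
SCENE_RE = re.compile(
    r"\[(?P<scene>[\w-]+)\]\s+(?P<options>\S+):\s+"
    r"FPS:\s+(?P<fps>\d+)\s+FrameTime:\s+(?P<ms>[\d.]+)\s+ms"
)
SCORE_RE = re.compile(r"glmark2 Score:\s+(\d+)")


def parse_glmark2(text):
    """Extract per-scene FPS and frame time plus the final score, if present."""
    scenes = [
        {
            "scene": m["scene"],
            "options": m["options"],
            "fps": int(m["fps"]),
            "frame_ms": float(m["ms"]),
        }
        for m in SCENE_RE.finditer(text)
    ]
    score = SCORE_RE.search(text)
    return {"scenes": scenes, "score": int(score.group(1)) if score else None}
```

Feeding both logs through the parser makes it easy to see that every DRM scene reports the same 46 FPS, whereas the Wayland scenes vary between roughly 2300 and 3400 FPS.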
5.2 vkmark
vkmark is a Vulkan-based benchmark tool.
5.3 vulkan-tools
Vulkan-Tools provides Vulkan utilities such as vulkaninfo.
6 G610 Debug Nodes
Mali device attribute nodes:
/sys/devices/platform/fb000000.gpu/
|-- core_mask
|-- csg_scheduling_period
|-- debug_command
|-- devfreq
|   `-- fb000000.gpu
|       |-- available_frequencies
|       |-- available_governors
|       |-- cur_freq
|       |-- device -> ../../../fb000000.gpu
|       |-- governor
|       |-- load
|       |-- max_freq
|       |-- min_freq
|       |-- name
|       |-- polling_interval
|       |-- power
|       |-- subsystem -> ../../../../../class/devfreq
|       |-- target_freq
|       |-- timer
|       |-- trans_stat
|       `-- uevent
|-- driver -> ../../../bus/platform/drivers/mali
|-- driver_override
|-- dvfs_period
|-- firmware_config
|   |-- Compute\ iterator\ suspend\ stage\ skip\ mask
|   |   |-- cur
|   |   |-- max
|   |   `-- min
|   |-- Fragment\ iterator\ suspend\ stage\ skip\ mask
|   |   |-- cur
|   |   |-- max
|   |   `-- min
|   |-- Log\ verbosity
|   |   |-- cur
|   |   |-- max
|   |   `-- min
|   `-- Tiler\ iterator\ suspend\ stage\ skip\ mask
|       |-- cur
|       |-- max
|       `-- min
|-- fw_timeout
|-- gpuinfo
|-- idle_hysteresis_time
|-- lp_mem_pool_max_size
|-- lp_mem_pool_size
|-- mcu_shader_pwroff_timeout
|-- mem_pool_max_size
|-- mem_pool_size
|-- mempool
|   |-- ctx_default_max_size
|   |-- lp_max_size
|   `-- max_size
|-- misc
|   `-- mali0
|       |-- dev
|       |-- device -> ../../../fb000000.gpu
|       |-- power
|       |-- subsystem -> ../../../../../class/misc
|       `-- uevent
|-- modalias
|-- of_node -> ../../../firmware/devicetree/base/gpu@fb000000
|-- pm_poweroff
|-- power
|-- power_policy
|-- progress_timeout
|-- reset_timeout
|-- scheduling
|-- subsystem -> ../../../bus/platform
|-- supplier:platform:fd8d8000.power-management:power-controller -> ../../virtual/devlink/platform:fd8d8000.power-management:power-controller--platform:fb000000.gpu
|-- supplier:platform:firmware:scmi -> ../../virtual/devlink/platform:firmware:scmi--platform:fb000000.gpu
|-- supplier:regulator:regulator.10 -> ../../virtual/devlink/regulator:regulator.10--platform:fb000000.gpu
|-- supplier:spi:spi2.0 -> ../../virtual/devlink/spi:spi2.0--platform:fb000000.gpu
|-- uevent
|-- utilisation
`-- utilisation_period
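While a benchmark is running, the devfreq nodes listed above can be polled to watch the GPU frequency and governor. The sketch below reads a few of them as raw strings; the directory is taken from the listing above, read_devfreq is a hypothetical helper, and the contents of each node (in particular load) are returned unparsed because their exact format is not described here.

```python
from pathlib import Path

# Path taken from the sysfs listing above; adjust for other boards.
DEVFREQ_DIR = "/sys/devices/platform/fb000000.gpu/devfreq/fb000000.gpu"
NODES = ("cur_freq", "min_freq", "max_freq", "governor",
         "available_frequencies", "load")


def read_devfreq(devfreq_dir=DEVFREQ_DIR, nodes=NODES):
    """Return {node: stripped text}; missing or unreadable nodes map to None."""
    base = Path(devfreq_dir)
    state = {}
    for name in nodes:
        try:
            state[name] = (base / name).read_text().strip()
        except OSError:
            state[name] = None
    return state
```

Calling `read_devfreq()` once per second during a glmark2 run is a simple way to check whether the governor actually raises the frequency under load.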
Mali-related ftrace events:
/sys/kernel/debug/tracing/events/mali/
|-- enable
|-- filter
|-- mali_CORE_CTX_DESTROY
|-- mali_CORE_CTX_HWINSTR_TERM
|-- mali_CORE_GPU_CLEAN_INV_CACHES
|-- mali_CORE_GPU_HARD_RESET
...
|-- mali_total_alloc_pages_change
|-- sysgraph
`-- sysgraph_gpu
Mali-related debugfs nodes:
/sys/kernel/debug/mali0/
|-- active_groups
|-- address_spaces
|   |-- as0
|   |-- as1
|   |-- as2
|   |-- as3
|   |-- as4
|   |-- as5
|   |-- as6
|   `-- as7
|-- csf_fault
|-- ctx
|   |-- 922_0
|   |   |-- cpu_queue
|   |   |-- force_same_va
|   |   |-- groups
|   |   |-- infinite_cache
|   |   |-- kcpu_queues
|   |   |-- lp_mem_pool_max_size
|   |   |-- lp_mem_pool_size
|   |   |-- mem_allocs
|   |   |-- mem_jit_count
|   |   |-- mem_jit_phys
|   |   |-- mem_jit_vm
|   |   |-- mem_pool_max_size
|   |   |-- mem_pool_size
|   |   |-- mem_view
|   |   |-- mem_zones
|   |   |-- tiler_heaps
|   |   `-- tiler_heaps_total
|   `-- defaults
|       |-- infinite_cache
|       |-- lp_mem_pool_max_size
|       `-- mem_pool_max_size
|-- dvfs_utilization
|-- fw_trace_enable_mask
|-- fw_trace_mode
|-- fw_traces
|-- gpu_memory
|-- instrumentation
|   `-- csf_tl_poll_interval_in_ms
|-- mali_trace
|-- protected_debug_mode
|-- quirks_gpu
|-- quirks_mmu
|-- quirks_sc
|-- quirks_tiler
|-- regs_history
|-- regs_history_enabled
|-- regs_history_size
|-- reset
|-- scheduler_state
|-- scheduling_timer_enabled
|-- scheduling_timer_kick
`-- tlstream
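7 Arm Mali Performance Analysis Tools
7.1 Streamline
DS-5 Streamline is a powerful graphical profiling and capture tool from Arm. It can profile CPU run-time performance and, more importantly, Mali-series GPUs; it is probably the most capable GPU performance analysis tool available for Mali.
Streamline Performance Analyzer (arm.com) is the official introduction to the tool. Two articles give practical Streamline examples: "DS-5 Streamline分析GPU执行性能" (analyzing GPU execution performance with DS-5 Streamline) and "Mali GPU OpenGL ES 应用性能优化--测试+定位+优化流程" (optimizing Mali GPU OpenGL ES applications: test, locate, optimize).
7.2 Arm Performance Studio
It includes the following tools: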
- Performance Advisor - Intuitive summary reports pinpoint problem areas and cut down profiling time.
- Streamline - For deeper analysis of GPU and 32 and 64 bit CPU counters, profile your game to find bottlenecks and optimize code.
- Frame Advisor - Capture and analyze rendering data from a significant frame.
- Graphics Analyzer - Analyze OpenGL ES and Vulkan API calls to determine exactly where rendering defects occur.
- Mali Offline Compiler - Investigate shader kernels to understand performance on Mali GPUs.
- RenderDoc for Arm GPUs - The industry-standard frame debugger with early support for Arm GPU extensions and Android features.
References:
"Arm GPU Training" is Arm's official training video series on Arm GPUs. Summary notes on the videos: "Arm Mali GPU 教程系列第 1 辑 内容整理" and "Arm Mali GPU Training——Mail 手机GPU 教程(一)".
Articles on the RK3588 and RK3399 GPUs: "Rockchip RK3588 - linux下Qt和opencv交叉编译环境搭建", "Rockchip RK3588 - OpenCL环境搭建", and "Rockchip RK3399 - Mali-T860 GPU驱动(mesa+Panfrost)".