
ARM GPU (Mali G610): Driver, OpenGL ES, and Performance Testing

Keywords: MALI, Valhall, G610, OpenGL, OpenGL-ES, OpenCL, Vulkan, libmali, etc.

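 Using the RK3588 as a concrete example, this post gives a brief overview of:

  1. The ARM Mali G610 hardware driver and the related libmali library.
  2. The generic OpenGL ES and EGL APIs and their libraries.
  3. OpenGL ES test tools: glmark2-es2-wayland/glmark2-es2-drm.
  4. Mali GPU performance tuning tools.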

1 GPU Hardware Specifications and Related APIs

1.1 ARM Mali GPU Hardware Specifications

ARM GPU reference material:

Arm publishes a specification overview for the Mali-G610. Its architecture diagram is as follows:

1.2 GPU-Related APIs

The G610 supports the following APIs:

2 GPU Software and Hardware Framework

Taking glmark2-es2-wayland as an example, the library dependencies are as follows:

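The following example runs glmark2-es2-wayland as an OpenGL ES test, in an environment that uses Weston as the display framework:

  1. glmark2-es2-wayland calls the libGLES library for 2D/3D rendering. libGLES is the implementation of the OpenGL ES API.
  2. libGLES calls into libmali, which in turn drives the Mali GPU. libmali is the adaptation layer between OpenGL ES and the Mali kernel driver, and Arm provides it only as prebuilt binaries. The Arm Developer page "Mali Drivers | Mali GPU User-Space Binary Drivers" lists the Mali user-space libraries available for download.
  3. glmark2-es2-wayland hands its display requests to Weston. Weston then uses libdrm to call the DRM driver of the VOP (the display controller), which presents the frames.
  4. The Mali kernel driver is Arm's open-source Mali driver. So far, Mali GPUs have gone through five architecture generations: Utgard, Midgard, Bifrost, Valhall, and Arm's 5th-generation architecture.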

In the actual implementation, libGLES and libEGL are essentially empty. The OpenGL ES (and EGL) implementation lives in libmali.
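One quick way to confirm this on a board is to resolve the GLES/EGL library paths and check whether they end up at a libmali file. The sketch below is a hypothetical helper, not part of any Rockchip or Arm tool. Which paths to pass (for example the libGLESv2.so.2 or libEGL.so.1 entries in the distribution's library directory) is an assumption, because it depends on how the image packages libmali.

```c
#define _POSIX_C_SOURCE 200809L /* for realpath() */
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the last path component contains "libmali", else 0. */
int basename_is_libmali(const char *path)
{
    assert(path != NULL);
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;
    return strstr(base, "libmali") != NULL;
}

/* Hypothetical helper: resolve every symlink in lib_path and report whether
 * the final file is a libmali library. Returns 1 or 0, or -1 if the path
 * cannot be resolved (for example, it does not exist). */
int resolves_to_libmali(const char *lib_path)
{
    char resolved[PATH_MAX];

    if (realpath(lib_path, resolved) == NULL)
        return -1;
    return basename_is_libmali(resolved);
}
```

Only the resolved file name is inspected. A directory that merely contains "libmali" in its name therefore does not count as a match.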

"Mali - Rockchip open source Document": Rockchip's brief introduction to Mali.

3 GPU Mali-G610 Driver Configuration

The G610 is a 4th-generation Valhall GPU. In the Rockchip kernel, Valhall support is provided by the Bifrost driver:

Device Drivers
  ->Graphics support
    ->Mali-300/400/450 support
    ->Mali Midgard series support
    ->Mali Kernel Unit Test Framework
    ->Mali Bifrost series support
      ->Platform name
      ->Platform specific options
      Enable Mali CSF based GPU support--enables the Command Stream Frontend; if disabled, the Job Manager is used.
      Enable devfreq support for Mali--uses devfreq to scale the Mali frequency and save power.
      Enable Streamline tracing support--emits trace information for the Arm Streamline Performance Analyzer.
      Enable kbase tracing--traces kbase via /sys/kernel/debug/mali0/mali_trace.
      Enable map imported dma-bufs on demand
      Enable legacy compatibility cache flush on dma-buf map
      Enable Expert Settings
        Attempt to allocate 2MB pages
        Enable memory fully physically-backed
        Enable support of GPU core stack power control
        *** Platform options ***
        Enable No Mali
        Enable build of Mali kernel driver for GEM5
        *** Debug options ***
        Enable support for FW core dump
        Enable debug build
        Enable debug sync fence usage
        Enable system event tracing support
        *** Instrumentation options ***
        Select Performance counters set (Primary) --->
        Enable runtime selection of performance counters set via debugfs
        Enable system level support needed for job dumping
        *** Workarounds ***
        Enable workaround for PWRSOFT-765
        Disable workaround for BASE_HW_ISSUE_GPU2017_1336
        Use alternative workaround for BASE_HW_ISSUE_GPU2017_1336
      Enable Virtualization reference code
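To check whether a given kernel build enables the options above, you can inspect its .config (or /proc/config.gz, if CONFIG_IKCONFIG_PROC is set). The following is a minimal sketch that looks up one option in .config text. CONFIG_MALI_BIFROST_NO_MALI appears in the driver source later in this post. The other symbol names used in the example (CONFIG_MALI_BIFROST, CONFIG_MALI_CSF_SUPPORT, CONFIG_MALI_BIFROST_DEVFREQ) are assumptions based on the bifrost Kconfig, so verify them against your tree.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `option` (e.g. "CONFIG_MALI_CSF_SUPPORT") is set to y or m
 * in the .config text, 0 otherwise. "# ... is not set" lines count as 0. */
int config_is_enabled(const char *config_text, const char *option)
{
    assert(config_text != NULL && option != NULL);
    size_t len = strlen(option);
    const char *line = config_text;

    while (line && *line) {
        const char *next = strchr(line, '\n');
        /* Match "OPTION=" at the very start of a line, so that
         * CONFIG_FOO does not match CONFIG_FOO_BAR. */
        if (strncmp(line, option, len) == 0 && line[len] == '=')
            return line[len + 1] == 'y' || line[len + 1] == 'm';
        line = next ? next + 1 : NULL;
    }
    return 0;
}
```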

4 GPU Mali-G610 Driver

The following sections briefly walk through the Mali driver in the kernel.

4.1 G610 DTS

The G610 DTS has two parts: the basic G610 node and its OPP table.

    gpu: gpu@fb000000 {
        compatible = "arm,mali-bifrost";
        reg = <0x0 0xfb000000 0x0 0x200000>;
        interrupts = <GIC_SPI 94 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 93 IRQ_TYPE_LEVEL_HIGH>,
                 <GIC_SPI 92 IRQ_TYPE_LEVEL_HIGH>;
        interrupt-names = "GPU", "MMU", "JOB";

        clocks = <&scmi_clk SCMI_CLK_GPU>, <&cru CLK_GPU_COREGROUP>,
             <&cru CLK_GPU_STACKS>, <&cru CLK_GPU>;
        clock-names = "clk_mali", "clk_gpu_coregroup",
                  "clk_gpu_stacks", "clk_gpu";
        assigned-clocks = <&scmi_clk SCMI_CLK_GPU>;
        assigned-clock-rates = <200000000>;
        power-domains = <&power RK3588_PD_GPU>;
        operating-points-v2 = <&gpu_opp_table>;
        #cooling-cells = <2>;
        dynamic-power-coefficient = <2982>;

        upthreshold = <30>;
        downdifferential = <10>;

        status = "disabled";
    };
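The upthreshold and downdifferential properties are tuning parameters for the devfreq simple_ondemand governor. The assumption here is that Rockchip's kbase devfreq code passes them to that governor. The function below is a user-space sketch of the decision logic in the kernel's drivers/devfreq/governor_simpleondemand.c. The parameter max_freq stands in for DEVFREQ_MAX_FREQ. The devfreq core's rounding of the result to an entry in gpu_opp_table is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* User-space sketch of the simple_ondemand devfreq decision.
 * busy_time/total_time: GPU busy time vs. elapsed time in the last window.
 * max_freq stands in for DEVFREQ_MAX_FREQ; OPP rounding is not modeled. */
uint64_t ondemand_target_freq(uint64_t busy_time, uint64_t total_time,
                              uint64_t cur_freq, uint64_t max_freq,
                              unsigned int upthreshold,
                              unsigned int downdifferential)
{
    assert(downdifferential < upthreshold && upthreshold <= 100);
    assert(busy_time <= total_time);

    if (total_time == 0)            /* avoid dividing by zero: go to max */
        return max_freq;

    if (busy_time >= (1u << 24) || total_time >= (1u << 24)) {
        busy_time >>= 7;            /* same overflow guard as the kernel */
        total_time >>= 7;
    }

    /* Load above upthreshold %: jump straight to the maximum frequency. */
    if (busy_time * 100 > total_time * upthreshold)
        return max_freq;

    if (cur_freq == 0)              /* unknown current frequency */
        return max_freq;

    /* Load in (upthreshold - downdifferential, upthreshold]: hold. */
    if (busy_time * 100 > total_time * (upthreshold - downdifferential))
        return cur_freq;

    /* Otherwise scale so the load lands at upthreshold - downdifferential/2. */
    uint64_t f = busy_time * cur_freq / total_time;
    return f * 100 / (upthreshold - downdifferential / 2);
}
```

With the DTS values (30, 10), here is how the frequency responds to load. Above 30% load it jumps to the maximum. Between 20% and 30% it holds. Below 20% it scales the frequency so that the load would sit at about 25%. For example, at 10% load and 800 MHz it targets 320 MHz.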

    gpu_opp_table: gpu-opp-table {
        compatible = "operating-points-v2";

        nvmem-cells = <&gpu_leakage>, <&specification_serial_number>;
        nvmem-cell-names = "leakage", "specification_serial_number";
        rockchip,supported-hw;

        rockchip,pvtm-voltage-sel = <
            0    815    0
            816    835    1
            836    860    2
            861    885    3
            886    910    4
            911    9999    5
        >;
        rockchip,pvtm-pvtpll;
        rockchip,pvtm-offset = <0x1c>;
        rockchip,pvtm-sample-time = <1100>;
        rockchip,pvtm-freq = <800000>;
        rockchip,pvtm-volt = <750000>;
        rockchip,pvtm-ref-temp = <25>;
        rockchip,pvtm-temp-prop = <(-135) (-135)>;
        rockchip,pvtm-thermal-zone = "gpu-thermal";

        clocks = <&cru CLK_GPU>;
        clock-names = "clk";
        rockchip,grf = <&gpu_grf>;
        volt-mem-read-margin = <
            855000    1
            765000    2
            675000    3
            495000    4
        >;
        low-volt-mem-read-margin = <4>;
        intermediate-threshold-freq = <400000>;    /* KHz */

        rockchip,temp-hysteresis = <5000>;
        rockchip,low-temp = <10000>;
        rockchip,low-temp-min-volt = <750000>;
        rockchip,high-temp = <85000>;
        rockchip,high-temp-max-freq = <800000>;

        opp-300000000 {
            opp-supported-hw = <0xff 0xffff>;
            opp-hz = /bits/ 64 <300000000>;
            opp-microvolt = <675000 675000 850000>,
                    <675000 675000 850000>;
        };
        opp-400000000 {
            opp-supported-hw = <0xff 0xffff>;
            opp-hz = /bits/ 64 <400000000>;
            opp-microvolt = <675000 675000 850000>,
                    <675000 675000 850000>;
        };
...
    };
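The rockchip,pvtm-voltage-sel property is a table of <min max sel> rows. A PVTM (process/voltage/temperature monitor) reading that falls inside a row's range yields that row's selector. The sketch below does only that range lookup, using the rows copied from gpu_opp_table. It is an assumption that Rockchip's OPP code uses the selector to pick a per-silicon-bin voltage set. The sketch also ignores the temperature compensation configured by rockchip,pvtm-ref-temp and rockchip,pvtm-temp-prop.

```c
#include <assert.h>
#include <stddef.h>

/* One row of rockchip,pvtm-voltage-sel: <min max sel>. */
struct pvtm_sel { int min; int max; int sel; };

/* Rows copied from the gpu_opp_table node above. */
static const struct pvtm_sel gpu_pvtm_sel[] = {
    {   0,  815, 0 },
    { 816,  835, 1 },
    { 836,  860, 2 },
    { 861,  885, 3 },
    { 886,  910, 4 },
    { 911, 9999, 5 },
};

/* Returns the selector for a PVTM reading, or -1 if no row matches. */
int pvtm_to_sel(int pvtm)
{
    size_t n = sizeof(gpu_pvtm_sel) / sizeof(gpu_pvtm_sel[0]);

    for (size_t i = 0; i < n; i++)
        if (pvtm >= gpu_pvtm_sel[i].min && pvtm <= gpu_pvtm_sel[i].max)
            return gpu_pvtm_sel[i].sel;
    return -1;
}
```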

&gpu {
    mali-supply = <&vdd_gpu_s0>;
    mem-supply = <&vdd_gpu_mem_s0>;
    status = "okay";
};

4.2 G610 Driver

The G610 belongs to the Valhall architecture. The driver Rockchip integrates is based on the Valhall Mali 4th Gen GPU Architecture Kernel Drivers release VX504X08X-SW-99002-r40p0-01eac0.tar.

Rockchip combines support for the Midgard, Bifrost and Valhall architectures in a single driver:

kbase_driver_init
  kbase_platform_register
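  kbase_platform_driver--registers the platform driver; it matches "arm,mali-bifrost" (the compatible used by the RK3588 DTS above) as well as "arm,mali-valhall".
    kbase_device_alloc
    kbase_device_init
      kbase_device_id_init--sets the device name and index.
      kbase_disjoint_init--initializes the disjoint-event state.
      ->iterates over dev_init[], calling each entry's init().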

 The GPU probe function calls the following init() functions in order:

static const struct kbase_device_init dev_init[] = {
#if IS_ENABLED(CONFIG_MALI_BIFROST_NO_MALI)
    { kbase_gpu_device_create, kbase_gpu_device_destroy, "Dummy model initialization failed" },
#else
    { assign_irqs, NULL, "IRQ search failed" }, /* get the JOB/MMU/GPU interrupts and their trigger types */
    { registers_map, registers_unmap, "Register map failed" }, /* map the GPU registers into kernel address space */
#endif
    { power_control_init, power_control_term, "Power control initialization failed" }, /* acquire regulator and clock resources */
    { kbase_device_io_history_init, kbase_device_io_history_term,
      "Register access history initialization failed" },
    { kbase_device_early_init, kbase_device_early_term, "Early device initialization failed" },
    { kbase_device_populate_max_freq, NULL, "Populating max frequency failed" },
    { kbase_pm_lowest_gpu_freq_init, NULL, "Lowest freq initialization failed" },
    { kbase_device_misc_init, kbase_device_misc_term,
      "Miscellaneous device initialization failed" },
    { kbase_device_pcm_dev_init, kbase_device_pcm_dev_term,
      "Priority control manager initialization failed" },
    { kbase_ctx_sched_init, kbase_ctx_sched_term, "Context scheduler initialization failed" },
    { kbase_mem_init, kbase_mem_term, "Memory subsystem initialization failed" },
    { kbase_csf_protected_memory_init, kbase_csf_protected_memory_term, /* load the module referenced by "protected-memory-allocator" */
      "Protected memory allocator initialization failed" },
    { kbase_device_coherency_init, NULL, "Device coherency init failed" },
    { kbase_protected_mode_init, kbase_protected_mode_term,
      "Protected mode subsystem initialization failed" },
    { kbase_device_list_init, kbase_device_list_term, "Device list setup failed" },
    { kbase_device_timeline_init, kbase_device_timeline_term,
      "Timeline stream initialization failed" },
    { kbase_clk_rate_trace_manager_init, kbase_clk_rate_trace_manager_term,
      "Clock rate trace manager initialization failed" },
    { kbase_device_hwcnt_watchdog_if_init, kbase_device_hwcnt_watchdog_if_term,
      "GPU hwcnt backend watchdog interface creation failed" },
    { kbase_device_hwcnt_backend_csf_if_init, kbase_device_hwcnt_backend_csf_if_term,
      "GPU hwcnt backend CSF interface creation failed" },
    { kbase_device_hwcnt_backend_csf_init, kbase_device_hwcnt_backend_csf_term,
      "GPU hwcnt backend creation failed" },
    { kbase_device_hwcnt_context_init, kbase_device_hwcnt_context_term,
      "GPU hwcnt context initialization failed" },
    { kbase_csf_early_init, kbase_csf_early_term, "Early CSF initialization failed" },
    { kbase_backend_late_init, kbase_backend_late_term, "Late backend initialization failed" },
    { kbase_csf_late_init, NULL, "Late CSF initialization failed" },
    { NULL, kbase_device_firmware_hwcnt_term, NULL },
    { kbase_debug_csf_fault_init, kbase_debug_csf_fault_term,
      "CSF fault debug initialization failed" },
    { kbase_device_debugfs_init, kbase_device_debugfs_term, "DebugFS initialization failed" },
    { kbase_sysfs_init, kbase_sysfs_term, "SysFS group creation failed" }, /* set up the Mali misc device and its attribute nodes */
    { kbase_device_misc_register, kbase_device_misc_deregister,
      "Misc device registration failed" }, /* register the misc device */
    { kbase_gpuprops_populate_user_buffer, kbase_gpuprops_free_user_buffer,
      "GPU property population failed" },
    { kbase_device_late_init, kbase_device_late_term, "Late device initialization failed" },
};
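The table above works together with a rollback rule. If an entry's init() fails, the terms of the entries that already succeeded run in reverse order, and entries with a NULL init or term are skipped. The code below is a simplified, self-contained model of that pattern. All function names in it are hypothetical stand-ins, and power-control init is made to fail on purpose so the rollback is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model of kbase's dev_init[] table; names are hypothetical. */
struct dev_init_entry {
    int (*init)(void);
    void (*term)(void);
    const char *err_mes;
};

static char trace[64]; /* records the call order, e.g. "I0I1I2T1T0" */

static void log_call(const char *s)
{
    strncat(trace, s, sizeof(trace) - strlen(trace) - 1);
}

static int init_irqs(void)   { log_call("I0"); return 0; }
static void term_irqs(void)  { log_call("T0"); }
static int init_regs(void)   { log_call("I1"); return 0; }
static void term_regs(void)  { log_call("T1"); }
static int init_power(void)  { log_call("I2"); return -1; } /* simulated failure */
static void term_power(void) { log_call("T2"); }

static const struct dev_init_entry dev_init[] = {
    { init_irqs,  term_irqs,  "IRQ search failed" },
    { init_regs,  term_regs,  "Register map failed" },
    { init_power, term_power, "Power control initialization failed" },
};

/* Undo entries [0, count) in reverse order, skipping NULL terms. */
static void device_term_partial(size_t count)
{
    while (count-- > 0)
        if (dev_init[count].term)
            dev_init[count].term();
}

/* Run every init in order; on failure, report it and roll back. */
int device_init(const char **failed_msg)
{
    size_t n = sizeof(dev_init) / sizeof(dev_init[0]);

    for (size_t i = 0; i < n; i++) {
        if (dev_init[i].init && dev_init[i].init() != 0) {
            *failed_msg = dev_init[i].err_mes;
            device_term_partial(i); /* the failed entry is not termed */
            return -1;
        }
    }
    *failed_msg = NULL;
    return 0;
}
```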

4.2.1 Creating the Mali Misc Device, Its Attribute Nodes, and Debug Nodes

kbase_sysfs_init
  ->initializes the misc device, with kbase_fops as its file operations.
  ->creates the misc device's attribute nodes: kbase_attrs/kbase_mempool_attr_group/kbase_scheduling_attr_group.

kbase_fops is the set of file operations for the /dev/mali<N> device (e.g. /dev/mali0):

static const struct file_operations kbase_fops = {
    .owner = THIS_MODULE,
    .open = kbase_open,
    .release = kbase_release,
    .read = kbase_read,
    .poll = kbase_poll,
    .unlocked_ioctl = kbase_ioctl,
    .compat_ioctl = kbase_ioctl,
    .mmap = kbase_mmap,
    .check_flags = kbase_check_flags,
    .get_unmapped_area = kbase_get_unmapped_area,
};

kbase_open() opens the /dev/mali<N> device:

kbase_open
  kbase_find_device
  kbase_mem_migrate_set_address_space_ops
  kbase_device_firmware_init_once
    kbase_csf_firmware_deferred_init
      kbase_csf_firmware_load_init
        kbase_mmu_init
        kbase_mcu_shared_interface_region_tracker_init
        request_firmware
        load_firmware_entry
        kbase_csf_firmware_trace_buffers_init
        kbase_pm_wait_for_l2_powered
        load_mmu_tables
        boot_csf_firmware
        parse_capabilities
        kbase_csf_doorbell_mapping_init
        kbase_csf_scheduler_init
        kbase_csf_setup_dummy_user_reg_page
        kbase_csf_timeout_init
        global_init_on_boot
        kbase_csf_firmware_cfg_init
        kbase_device_csf_iterator_trace_init
        kbase_csf_firmware_log_init
    kbase_device_hwcnt_csf_deferred_init
    kbase_csf_debugfs_init--creates debug nodes such as active_groups, scheduling_timer_enabled, scheduling_timer_kick and scheduler_state.
    kbase_timeline_io_debugfs_init--creates the tlstream debug node.
  kbase_file_new
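As the name kbase_device_firmware_init_once suggests, the expensive CSF firmware load runs on the first open only, and later opens skip it. The code below is a hypothetical model of that pattern, not the driver code. A mutex serializes concurrent first opens. A flag is set only after a successful load, so a failed load is retried on the next open, which is an assumption about the driver's behavior.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant kbase_device fields. */
struct fake_kbdev {
    pthread_mutex_t fw_load_lock;
    bool firmware_inited;
    int load_count;     /* how many times the "firmware" was loaded */
    int fail_next_load; /* test hook: make the next load fail */
};

/* Stand-in for kbase_csf_firmware_load_init() and friends. */
static int fake_firmware_load(struct fake_kbdev *kbdev)
{
    if (kbdev->fail_next_load) {
        kbdev->fail_next_load = 0;
        return -1;
    }
    kbdev->load_count++;
    return 0;
}

/* Called on every open(); only the first successful call loads firmware. */
int firmware_init_once(struct fake_kbdev *kbdev)
{
    int ret = 0;

    pthread_mutex_lock(&kbdev->fw_load_lock);
    if (!kbdev->firmware_inited) {
        ret = fake_firmware_load(kbdev);
        if (ret == 0)
            kbdev->firmware_inited = true; /* failures retry on next open */
    }
    pthread_mutex_unlock(&kbdev->fw_load_lock);
    return ret;
}
```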

5 GPU Mali-G610 Testing

5.1 OpenGL Testing: glmark2-es2-wayland/glmark2-es2-drm

glmark2 is an OpenGL (ES) benchmark tool.

glmark2-es2-wayland is the OpenGL ES test program that runs on top of the Wayland protocol:

glmark2-es2-wayland

The results are as follows:

arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '6'.
=======================================================
    glmark2 2021.02
=======================================================
    OpenGL Information
    GL_VENDOR:     ARM
    GL_RENDERER:   Mali-LODX
    GL_VERSION:    OpenGL ES 3.2 v1.g6p0-01eac0.ba52c908d926792b8f5fe28f383a2b03
=======================================================
[build] use-vbo=false: FPS: 2901 FrameTime: 0.345 ms
[build] use-vbo=true: FPS: 2663 FrameTime: 0.376 ms
[texture] texture-filter=nearest: FPS: 3188 FrameTime: 0.314 ms
[texture] texture-filter=linear: FPS: 3355 FrameTime: 0.298 ms
[texture] texture-filter=mipmap: FPS: 2915 FrameTime: 0.343 ms
...
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 2305 FrameTime: 0.434 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 2362 FrameTime: 0.423 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 2306 FrameTime: 0.434 ms
=======================================================
                                  glmark2 Score: 2113
=======================================================

With Weston stopped, glmark2-es2-drm gives the following results:

arm_release_ver of this libmali is 'g6p0-01eac0', rk_so_ver is '6'.
=======================================================
    glmark2 2021.02
=======================================================
    OpenGL Information
    GL_VENDOR:     ARM
    GL_RENDERER:   Mali-LODX
    GL_VERSION:    OpenGL ES 3.2 v1.g6p0-01eac0.ba52c908d926792b8f5fe28f383a2b03
=======================================================
[build] use-vbo=false: FPS: 46 FrameTime: 21.739 ms
[build] use-vbo=true: FPS: 46 FrameTime: 21.739 ms
[texture] texture-filter=nearest: FPS: 46 FrameTime: 21.739 ms
[texture] texture-filter=linear: FPS: 46 FrameTime: 21.739 ms
[texture] texture-filter=mipmap: FPS: 46 FrameTime: 21.739 ms
...
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 46 FrameTime: 21.739 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 46 FrameTime: 21.739 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 46 FrameTime: 21.739 ms
=======================================================
                                  glmark2 Score: 46
=======================================================
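Two pieces of arithmetic help when reading these logs. First, each FrameTime is 1000 / FPS in milliseconds: 1000 / 2901 ≈ 0.345 ms and 1000 / 46 ≈ 21.739 ms. Second, the glmark2 score is the mean of the per-scene FPS values, as glmark2 computes it, and is printed as an integer. In the DRM run every scene reports exactly 46 FPS, so the score is also 46. Identical numbers across very different scenes suggest a frame-rate cap (for example, display refresh) rather than GPU throughput, but this is an inference and was not measured here. A small sketch of both calculations:

```c
#include <assert.h>
#include <stddef.h>

/* FrameTime in ms, as printed next to each FPS value: 1000 / FPS. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Score as the arithmetic mean of the per-scene FPS values. */
double mean_fps(const double *fps, size_t n)
{
    double sum = 0.0;

    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        sum += fps[i];
    return sum / (double)n;
}
```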

5.2 vkmark

A Vulkan-based benchmark tool.

5.3 vulkan-tools

Vulkan-Tools provides Vulkan-related utilities.

6 G610 Debug Nodes

Mali device attribute nodes:

/sys/devices/platform/fb000000.gpu/
|-- core_mask
|-- csg_scheduling_period
|-- debug_command
|-- devfreq
|   `-- fb000000.gpu
|       |-- available_frequencies
|       |-- available_governors
|       |-- cur_freq
|       |-- device -> ../../../fb000000.gpu
|       |-- governor
|       |-- load
|       |-- max_freq
|       |-- min_freq
|       |-- name
|       |-- polling_interval
|       |-- power
|       |-- subsystem -> ../../../../../class/devfreq
|       |-- target_freq
|       |-- timer
|       |-- trans_stat
|       `-- uevent
|-- driver -> ../../../bus/platform/drivers/mali
|-- driver_override
|-- dvfs_period
|-- firmware_config
|   |-- Compute\ iterator\ suspend\ stage\ skip\ mask
|   |   |-- cur
|   |   |-- max
|   |   `-- min
|   |-- Fragment\ iterator\ suspend\ stage\ skip\ mask
|   |   |-- cur
|   |   |-- max
|   |   `-- min
|   |-- Log\ verbosity
|   |   |-- cur
|   |   |-- max
|   |   `-- min
|   `-- Tiler\ iterator\ suspend\ stage\ skip\ mask
|       |-- cur
|       |-- max
|       `-- min
|-- fw_timeout
|-- gpuinfo
|-- idle_hysteresis_time
|-- lp_mem_pool_max_size
|-- lp_mem_pool_size
|-- mcu_shader_pwroff_timeout
|-- mem_pool_max_size
|-- mem_pool_size
|-- mempool
|   |-- ctx_default_max_size
|   |-- lp_max_size
|   `-- max_size
|-- misc
|   `-- mali0
|       |-- dev
|       |-- device -> ../../../fb000000.gpu
|       |-- power
|       |-- subsystem -> ../../../../../class/misc
|       `-- uevent
|-- modalias
|-- of_node -> ../../../firmware/devicetree/base/gpu@fb000000
|-- pm_poweroff
|-- power
|-- power_policy
|-- progress_timeout
|-- reset_timeout
|-- scheduling
|-- subsystem -> ../../../bus/platform
|-- supplier:platform:fd8d8000.power-management:power-controller -> ../../virtual/devlink/platform:fd8d8000.power-management:power-controller--platform:fb000000.gpu
|-- supplier:platform:firmware:scmi -> ../../virtual/devlink/platform:firmware:scmi--platform:fb000000.gpu
|-- supplier:regulator:regulator.10 -> ../../virtual/devlink/regulator:regulator.10--platform:fb000000.gpu
|-- supplier:spi:spi2.0 -> ../../virtual/devlink/spi:spi2.0--platform:fb000000.gpu
|-- uevent
|-- utilisation
`-- utilisation_period
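The standard devfreq nodes in this tree (cur_freq, min_freq, max_freq, target_freq) each hold a single integer in Hz, so a monitoring tool can poll them with a simple read. The helper below is a hypothetical sketch. It makes no assumption about the format of the Mali-specific nodes such as utilisation or dvfs_period.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single unsigned integer from a sysfs attribute such as
 * /sys/devices/platform/fb000000.gpu/devfreq/fb000000.gpu/cur_freq (Hz).
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_sysfs_ulong(const char *path, unsigned long *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%lu", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, calling read_sysfs_ulong() on the cur_freq path above while glmark2 runs shows the frequency devfreq has chosen, in Hz.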

Mali-related ftrace events:

/sys/kernel/debug/tracing/events/mali/
|-- enable
|-- filter
|-- mali_CORE_CTX_DESTROY
|-- mali_CORE_CTX_HWINSTR_TERM
|-- mali_CORE_GPU_CLEAN_INV_CACHES
|-- mali_CORE_GPU_HARD_RESET
...
|-- mali_total_alloc_pages_change
|-- sysgraph
`-- sysgraph_gpu
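Following the usual ftrace convention, writing 1 to the enable file of an event directory turns on every event in it, and writing 0 turns them off. The recorded events can then be read from /sys/kernel/debug/tracing/trace. The following is a minimal sketch, run as root, where the path argument would be /sys/kernel/debug/tracing/events/mali/enable. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Write "1" or "0" to a tracefs enable file, e.g.
 * /sys/kernel/debug/tracing/events/mali/enable. Returns 0 or -1. */
int set_trace_enable(const char *enable_path, int on)
{
    assert(enable_path != NULL);
    FILE *f = fopen(enable_path, "w");

    if (f == NULL)
        return -1;
    int ok = fputs(on ? "1" : "0", f) >= 0;
    /* stdio buffers the write, so the actual write() happens in fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```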

Mali-related debugfs nodes:

/sys/kernel/debug/mali0/
|-- active_groups
|-- address_spaces
|   |-- as0
|   |-- as1
|   |-- as2
|   |-- as3
|   |-- as4
|   |-- as5
|   |-- as6
|   `-- as7
|-- csf_fault
|-- ctx
|   |-- 922_0
|   |   |-- cpu_queue
|   |   |-- force_same_va
|   |   |-- groups
|   |   |-- infinite_cache
|   |   |-- kcpu_queues
|   |   |-- lp_mem_pool_max_size
|   |   |-- lp_mem_pool_size
|   |   |-- mem_allocs
|   |   |-- mem_jit_count
|   |   |-- mem_jit_phys
|   |   |-- mem_jit_vm
|   |   |-- mem_pool_max_size
|   |   |-- mem_pool_size
|   |   |-- mem_view
|   |   |-- mem_zones
|   |   |-- tiler_heaps
|   |   `-- tiler_heaps_total
|   `-- defaults
|       |-- infinite_cache
|       |-- lp_mem_pool_max_size
|       `-- mem_pool_max_size
|-- dvfs_utilization
|-- fw_trace_enable_mask
|-- fw_trace_mode
|-- fw_traces
|-- gpu_memory
|-- instrumentation
|   `-- csf_tl_poll_interval_in_ms
|-- mali_trace
|-- protected_debug_mode
|-- quirks_gpu
|-- quirks_mmu
|-- quirks_sc
|-- quirks_tiler
|-- regs_history
|-- regs_history_enabled
|-- regs_history_size
|-- reset
|-- scheduler_state
|-- scheduling_timer_enabled
|-- scheduling_timer_kick
`-- tlstream

7 ARM Mali Performance Debugging Tools

7.1 Streamline

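DS-5 Streamline is a powerful graphical performance capture and analysis tool from Arm. It can profile CPU runtime behavior, but its main strength is analyzing Mali-series GPUs. It is arguably the most capable GPU performance analysis tool available for Mali.

Streamline Performance Analyzer (arm.com) is the official introduction to Streamline. "Analyzing GPU execution performance with DS-5 Streamline" and "Mali GPU OpenGL ES application performance optimization: test, locate, optimize" are practical examples of Streamline in use.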

7.2 Arm Performance Studio

It includes the following tools:

  • Performance Advisor - Intuitive summary reports pinpoint problem areas and cut down profiling time.
  • Streamline - For deeper analysis of GPU and 32 and 64 bit CPU counters, profile your game to find bottlenecks and optimize code.
  • Frame Advisor - Capture and analyze rendering data from a significant frame.
  • Graphics Analyzer - Analyze OpenGL ES and Vulkan API calls to determine exactly where rendering defects occur.
  • Mali Offline Compiler - Investigate shader kernels to understand performance on Mali GPUs.
  • RenderDoc for Arm GPUs - The industry-standard frame debugger with early support for Arm GPU extensions and Android features.

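References:

"Arm GPU Training" is Arm's official training video series on its GPUs. Notes summarizing it include "Arm Mali GPU tutorial series, volume 1: content summary" and "Arm Mali GPU Training: Mali mobile GPU tutorial (1)".

Articles on the RK3588 and RK3399 GPUs: "Rockchip RK3588 - Setting up a Qt and OpenCV cross-compilation environment on Linux", "Rockchip RK3588 - Setting up an OpenCL environment", "Rockchip RK3399 - Mali-T860 GPU driver (mesa+Panfrost)".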

Posted on 2024-06-08 23:59 by ArnoldLu
