一、Opencv-OCL编程基础
1. Opencv OCL基本编程API
a) Opencv4.2 OCL API
b) Opencv3.1 OCL API
2. 图像处理Kernel实现及CU单元配置
3. Demo实验
我目前编译使用的opencv版本是opencv4.2【如果版本不同请下载不同版本下的Demo程序】,使用如下官方提供的Opencv-OCL代码,如果只是单纯的测试运行此官方提供的代码不需要有特定的加速设备,直接使用多核心CPU-PC平台即可,因为OpenCL本身就支持了CPU加速,具体代码如下:
1 // This file is part of OpenCV project. 2 // It is subject to the license terms in the LICENSE file found in the top-level directory 3 // of this distribution and at http://opencv.org/license.html 4 5 #include "opencv2/core.hpp" 6 #include "opencv2/core/ocl.hpp" 7 #include "opencv2/highgui.hpp" 8 #include "opencv2/imgcodecs.hpp" 9 #include "opencv2/imgproc.hpp" 10 11 #include <iostream> 12 13 using namespace std; 14 using namespace cv; 15 16 static const char* opencl_kernel_src = 17 "__kernel void magnutude_filter_8u(\n" 18 " __global const uchar* src, int src_step, int src_offset,\n" 19 " __global uchar* dst, int dst_step, int dst_offset, int dst_rows, int dst_cols,\n" 20 " float scale)\n" 21 "{\n" 22 " int x = get_global_id(0);\n" 23 " int y = get_global_id(1);\n" 24 " if (x < dst_cols && y < dst_rows)\n" 25 " {\n" 26 " int dst_idx = y * dst_step + x + dst_offset;\n" 27 " if (x > 0 && x < dst_cols - 1 && y > 0 && y < dst_rows - 2)\n" 28 " {\n" 29 " int src_idx = y * src_step + x + src_offset;\n" 30 " int dx = (int)src[src_idx]*2 - src[src_idx - 1] - src[src_idx + 1];\n" 31 " int dy = (int)src[src_idx]*2 - src[src_idx - 1*src_step] - src[src_idx + 1*src_step];\n" 32 " dst[dst_idx] = convert_uchar_sat(sqrt((float)(dx*dx + dy*dy)) * scale);\n" 33 " }\n" 34 " else\n" 35 " {\n" 36 " dst[dst_idx] = 0;\n" 37 " }\n" 38 " }\n" 39 "}\n"; 40 41 int main(int argc, char** argv) 42 { 43 const char* keys = 44 "{ i input | | specify input image }" 45 "{ h help | | print help message }"; 46 47 cv::CommandLineParser args(argc, argv, keys); 48 if (args.has("help")) 49 { 50 cout << "Usage : " << argv[0] << " [options]" << endl; 51 cout << "Available options:" << endl; 52 args.printMessage(); 53 return EXIT_SUCCESS; 54 } 55 56 cv::ocl::Context ctx = cv::ocl::Context::getDefault(); 57 if (!ctx.ptr()) 58 { 59 cerr << "OpenCL is not available" << endl; 60 return 1; 61 } 62 cv::ocl::Device device = cv::ocl::Device::getDefault(); 63 if (!device.compilerAvailable()) 64 { 65 cerr << "OpenCL compiler is not available" << endl; 66 return 1; 67 } 68 69 70 UMat src; 71 { 72 string image_file = args.get<string>("i"); 73 if (!image_file.empty()) 74 { 75 Mat image = imread(samples::findFile(image_file)); 76 if (image.empty()) 77 { 78 cout << "error read image: " << image_file << endl; 79 return 1; 80 } 81 cvtColor(image, src, COLOR_BGR2GRAY); 82 } 83 else 84 { 85 Mat frame(cv::Size(640, 480), CV_8U, Scalar::all(128)); 86 Point p(frame.cols / 2, frame.rows / 2); 87 line(frame, Point(0, frame.rows / 2), Point(frame.cols, frame.rows / 2), 1); 88 circle(frame, p, 200, Scalar(32, 32, 32), 8, LINE_AA); 89 string str = "OpenCL"; 90 int baseLine = 0; 91 Size box = getTextSize(str, FONT_HERSHEY_COMPLEX, 2, 5, &baseLine); 92 putText(frame, str, Point((frame.cols - box.width) / 2, (frame.rows - box.height) / 2 + baseLine), 93 FONT_HERSHEY_COMPLEX, 2, Scalar(255, 255, 255), 5, LINE_AA); 94 frame.copyTo(src); 95 } 96 } 97 98 99 cv::String module_name; // empty to disable OpenCL cache 100 101 { 102 cout << "OpenCL program source: " << endl; 103 cout << "======================================================================================================" << endl; 104 cout << opencl_kernel_src << endl; 105 cout << "======================================================================================================" << endl; 106 //! [Define OpenCL program source] 107 cv::ocl::ProgramSource source(module_name, "simple", opencl_kernel_src, ""); 108 //! [Define OpenCL program source] 109 110 //! [Compile/build OpenCL for current OpenCL device] 111 cv::String errmsg; 112 cv::ocl::Program program(source, "", errmsg); 113 if (program.ptr() == NULL) 114 { 115 cerr << "Can't compile OpenCL program:" << endl << errmsg << endl; 116 return 1; 117 } 118 //! [Compile/build OpenCL for current OpenCL device] 119 120 if (!errmsg.empty()) 121 { 122 cout << "OpenCL program build log:" << endl << errmsg << endl; 123 } 124 125 //! [Get OpenCL kernel by name] 126 cv::ocl::Kernel k("magnutude_filter_8u", program); 127 if (k.empty()) 128 { 129 cerr << "Can't get OpenCL kernel" << endl; 130 return 1; 131 } 132 //! [Get OpenCL kernel by name] 133 134 UMat result(src.size(), CV_8UC1); 135 136 //! [Define kernel parameters and run] 137 size_t globalSize[2] = {(size_t)src.cols, (size_t)src.rows}; 138 size_t localSize[2] = {8, 8}; 139 bool executionResult = k 140 .args( 141 cv::ocl::KernelArg::ReadOnlyNoSize(src), // size is not used (similar to 'dst' size) 142 cv::ocl::KernelArg::WriteOnly(result), 143 (float)2.0 144 ) 145 .run(2, globalSize, localSize, true); 146 if (!executionResult) 147 { 148 cerr << "OpenCL kernel launch failed" << endl; 149 return 1; 150 } 151 //! [Define kernel parameters and run] 152 153 imshow("Source", src); 154 imshow("Result", result); 155 156 for (;;) 157 { 158 int key = waitKey(); 159 if (key == 27/*ESC*/ || key == 'q' || key == 'Q') 160 break; 161 } 162 } 163 return 0; 164 }
使用mingw编译上述程序并运行结果如下(根据运行结果,说明对图像进行了边缘提取的功能):
当然如果你想处理其他的图像,也可以使用在CMD窗口当中调用编译完成的.exe文件加上文件的绝对路径。
终端Terminal当中显示了Kernel的基本内容,如下所示:
从Kernel的结构可以分析出来,kernel中实现的是求解了图像自身的x方向以及y方向的梯度,并求出了每一点的梯度方向,实际上比较类似与Canny边缘算子的检测算法,kernel使用了二维的方式处理图片上每一个点,在kernel核当中,使用了if判断是否存在指针越界的情况,具体的基础实现相关内容请移步OpenCL基础入门。
二、Demo代码变形
实现图像3x3均值滤波
三、嵌入式平台移植与编译(TI AM57x 系列)
Opencv3.1.0版本master下,opencv-ocl代码官方Demo:
1 /* 2 // The example of interoperability between OpenCL and OpenCV. 3 // This will loop through frames of video either from input media file 4 // or camera device and do processing of these data in OpenCL and then 5 // in OpenCV. In OpenCL it does inversion of pixels in left half of frame and 6 // in OpenCV it does bluring in the right half of frame. 7 */ 8 #include <cstdio> 9 #include <cstdlib> 10 #include <iostream> 11 #include <fstream> 12 #include <string> 13 #include <sstream> 14 #include <iomanip> 15 #include <stdexcept> 16 17 #define CL_USE_DEPRECATED_OPENCL_2_0_APIS // eliminate build warning 18 19 #if __APPLE__ 20 #include <OpenCL/cl.h> 21 #else 22 #include <CL/cl.h> 23 #endif 24 25 #include <opencv2/core/ocl.hpp> 26 #include <opencv2/core/utility.hpp> 27 #include <opencv2/video.hpp> 28 #include <opencv2/highgui.hpp> 29 #include <opencv2/imgproc.hpp> 30 31 32 using namespace std; 33 using namespace cv; 34 35 namespace opencl { 36 37 class PlatformInfo 38 { 39 public: 40 PlatformInfo() 41 {} 42 43 ~PlatformInfo() 44 {} 45 46 cl_int QueryInfo(cl_platform_id id) 47 { 48 query_param(id, CL_PLATFORM_PROFILE, m_profile); 49 query_param(id, CL_PLATFORM_VERSION, m_version); 50 query_param(id, CL_PLATFORM_NAME, m_name); 51 query_param(id, CL_PLATFORM_VENDOR, m_vendor); 52 query_param(id, CL_PLATFORM_EXTENSIONS, m_extensions); 53 return CL_SUCCESS; 54 } 55 56 std::string Profile() { return m_profile; } 57 std::string Version() { return m_version; } 58 std::string Name() { return m_name; } 59 std::string Vendor() { return m_vendor; } 60 std::string Extensions() { return m_extensions; } 61 62 private: 63 cl_int query_param(cl_platform_id id, cl_platform_info param, std::string& paramStr) 64 { 65 cl_int res; 66 67 size_t psize; 68 cv::AutoBuffer<char> buf; 69 70 res = clGetPlatformInfo(id, param, 0, 0, &psize); 71 if (CL_SUCCESS != res) 72 throw std::runtime_error(std::string("clGetPlatformInfo failed")); 73 74 buf.resize(psize); 75 res = clGetPlatformInfo(id, param, psize, buf, 0); 76 if (CL_SUCCESS != res) 77 throw std::runtime_error(std::string("clGetPlatformInfo failed")); 78 79 // just in case, ensure trailing zero for ASCIIZ string 80 buf[psize] = 0; 81 82 paramStr = buf; 83 84 return CL_SUCCESS; 85 } 86 87 private: 88 std::string m_profile; 89 std::string m_version; 90 std::string m_name; 91 std::string m_vendor; 92 std::string m_extensions; 93 }; 94 95 96 class DeviceInfo 97 { 98 public: 99 DeviceInfo() 100 {} 101 102 ~DeviceInfo() 103 {} 104 105 cl_int QueryInfo(cl_device_id id) 106 { 107 query_param(id, CL_DEVICE_TYPE, m_type); 108 query_param(id, CL_DEVICE_VENDOR_ID, m_vendor_id); 109 query_param(id, CL_DEVICE_MAX_COMPUTE_UNITS, m_max_compute_units); 110 query_param(id, CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, m_max_work_item_dimensions); 111 query_param(id, CL_DEVICE_MAX_WORK_ITEM_SIZES, m_max_work_item_sizes); 112 query_param(id, CL_DEVICE_MAX_WORK_GROUP_SIZE, m_max_work_group_size); 113 query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR, m_preferred_vector_width_char); 114 query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT, m_preferred_vector_width_short); 115 query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT, m_preferred_vector_width_int); 116 query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG, m_preferred_vector_width_long); 117 query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT, m_preferred_vector_width_float); 118 query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE, m_preferred_vector_width_double); 119 #if defined(CL_VERSION_1_1) 120 query_param(id, CL_DEVICE_PREFERRED_VECTOR_WIDTH_HALF, m_preferred_vector_width_half); 121 query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_CHAR, m_native_vector_width_char); 122 query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_SHORT, m_native_vector_width_short); 123 query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_INT, m_native_vector_width_int); 124 query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_LONG, m_native_vector_width_long); 125 query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_FLOAT, m_native_vector_width_float); 126 query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_DOUBLE, m_native_vector_width_double); 127 query_param(id, CL_DEVICE_NATIVE_VECTOR_WIDTH_HALF, m_native_vector_width_half); 128 #endif 129 query_param(id, CL_DEVICE_MAX_CLOCK_FREQUENCY, m_max_clock_frequency); 130 query_param(id, CL_DEVICE_ADDRESS_BITS, m_address_bits); 131 query_param(id, CL_DEVICE_MAX_MEM_ALLOC_SIZE, m_max_mem_alloc_size); 132 query_param(id, CL_DEVICE_IMAGE_SUPPORT, m_image_support); 133 query_param(id, CL_DEVICE_MAX_READ_IMAGE_ARGS, m_max_read_image_args); 134 query_param(id, CL_DEVICE_MAX_WRITE_IMAGE_ARGS, m_max_write_image_args); 135 #if defined(CL_VERSION_2_0) 136 query_param(id, CL_DEVICE_MAX_READ_WRITE_IMAGE_ARGS, m_max_read_write_image_args); 137 #endif 138 query_param(id, CL_DEVICE_IMAGE2D_MAX_WIDTH, m_image2d_max_width); 139 query_param(id, CL_DEVICE_IMAGE2D_MAX_HEIGHT, m_image2d_max_height); 140 query_param(id, CL_DEVICE_IMAGE3D_MAX_WIDTH, m_image3d_max_width); 141 query_param(id, CL_DEVICE_IMAGE3D_MAX_HEIGHT, m_image3d_max_height); 142 query_param(id, CL_DEVICE_IMAGE3D_MAX_DEPTH, m_image3d_max_depth); 143 #if defined(CL_VERSION_1_2) 144 query_param(id, CL_DEVICE_IMAGE_MAX_BUFFER_SIZE, m_image_max_buffer_size); 145 query_param(id, CL_DEVICE_IMAGE_MAX_ARRAY_SIZE, m_image_max_array_size); 146 #endif 147 query_param(id, CL_DEVICE_MAX_SAMPLERS, m_max_samplers); 148 #if defined(CL_VERSION_1_2) 149 query_param(id, CL_DEVICE_IMAGE_PITCH_ALIGNMENT, m_image_pitch_alignment); 150 query_param(id, CL_DEVICE_IMAGE_BASE_ADDRESS_ALIGNMENT, m_image_base_address_alignment); 151 #endif 152 #if defined(CL_VERSION_2_0) 153 query_param(id, CL_DEVICE_MAX_PIPE_ARGS, m_max_pipe_args); 154 query_param(id, CL_DEVICE_PIPE_MAX_ACTIVE_RESERVATIONS, m_pipe_max_active_reservations); 155 query_param(id, CL_DEVICE_PIPE_MAX_PACKET_SIZE, m_pipe_max_packet_size); 156 #endif 157 query_param(id, CL_DEVICE_MAX_PARAMETER_SIZE, m_max_parameter_size); 158 query_param(id, CL_DEVICE_MEM_BASE_ADDR_ALIGN, m_mem_base_addr_align); 159 query_param(id, CL_DEVICE_SINGLE_FP_CONFIG, m_single_fp_config); 160 #if defined(CL_VERSION_1_2) 161 query_param(id, CL_DEVICE_DOUBLE_FP_CONFIG, m_double_fp_config); 162 #endif 163 query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE, m_global_mem_cache_type); 164 query_param(id, CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE, m_global_mem_cacheline_size); 165 query_param(id, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE, m_global_mem_cache_size); 166 query_param(id, CL_DEVICE_GLOBAL_MEM_SIZE, m_global_mem_size); 167 query_param(id, CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE, m_max_constant_buffer_size); 168 query_param(id, CL_DEVICE_MAX_CONSTANT_ARGS, m_max_constant_args); 169 #if defined(CL_VERSION_2_0) 170 query_param(id, CL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE, m_max_global_variable_size); 171 query_param(id, CL_DEVICE_GLOBAL_VARIABLE_PREFERRED_TOTAL_SIZE, m_global_variable_preferred_total_size); 172 #endif 173 query_param(id, CL_DEVICE_LOCAL_MEM_TYPE, m_local_mem_type); 174 query_param(id, CL_DEVICE_LOCAL_MEM_SIZE, m_local_mem_size); 175 query_param(id, CL_DEVICE_ERROR_CORRECTION_SUPPORT, m_error_correction_support); 176 #if defined(CL_VERSION_1_1) 177 query_param(id, CL_DEVICE_HOST_UNIFIED_MEMORY, m_host_unified_memory); 178 #endif 179 query_param(id, CL_DEVICE_PROFILING_TIMER_RESOLUTION, m_profiling_timer_resolution); 180 query_param(id, CL_DEVICE_ENDIAN_LITTLE, m_endian_little); 181 query_param(id, CL_DEVICE_AVAILABLE, m_available); 182 query_param(id, CL_DEVICE_COMPILER_AVAILABLE, m_compiler_available); 183 #if defined(CL_VERSION_1_2) 184 query_param(id, CL_DEVICE_LINKER_AVAILABLE, m_linker_available); 185 #endif 186 query_param(id, CL_DEVICE_EXECUTION_CAPABILITIES, m_execution_capabilities); 187 query_param(id, CL_DEVICE_QUEUE_PROPERTIES, m_queue_properties); 188 #if defined(CL_VERSION_2_0) 189 query_param(id, CL_DEVICE_QUEUE_ON_HOST_PROPERTIES, m_queue_on_host_properties); 190 query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PROPERTIES, m_queue_on_device_properties); 191 query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_PREFERRED_SIZE, m_queue_on_device_preferred_size); 192 query_param(id, CL_DEVICE_QUEUE_ON_DEVICE_MAX_SIZE, m_queue_on_device_max_size); 193 query_param(id, CL_DEVICE_MAX_ON_DEVICE_QUEUES, m_max_on_device_queues); 194 query_param(id, CL_DEVICE_MAX_ON_DEVICE_EVENTS, m_max_on_device_events); 195 #endif 196 #if defined(CL_VERSION_1_2) 197 query_param(id, CL_DEVICE_BUILT_IN_KERNELS, m_built_in_kernels); 198 #endif 199 query_param(id, CL_DEVICE_PLATFORM, m_platform); 200 query_param(id, CL_DEVICE_NAME, m_name); 201 query_param(id, CL_DEVICE_VENDOR, m_vendor); 202 query_param(id, CL_DRIVER_VERSION, m_driver_version); 203 query_param(id, CL_DEVICE_PROFILE, m_profile); 204 query_param(id, CL_DEVICE_VERSION, m_version); 205 #if defined(CL_VERSION_1_1) 206 query_param(id, CL_DEVICE_OPENCL_C_VERSION, m_opencl_c_version); 207 #endif 208 query_param(id, CL_DEVICE_EXTENSIONS, m_extensions); 209 #if defined(CL_VERSION_1_2) 210 query_param(id, CL_DEVICE_PRINTF_BUFFER_SIZE, m_printf_buffer_size); 211 query_param(id, CL_DEVICE_PREFERRED_INTEROP_USER_SYNC, m_preferred_interop_user_sync); 212 query_param(id, CL_DEVICE_PARENT_DEVICE, m_parent_device); 213 query_param(id, CL_DEVICE_PARTITION_MAX_SUB_DEVICES, m_partition_max_sub_devices); 214 query_param(id, CL_DEVICE_PARTITION_PROPERTIES, m_partition_properties); 215 query_param(id, CL_DEVICE_PARTITION_AFFINITY_DOMAIN, m_partition_affinity_domain); 216 query_param(id, CL_DEVICE_PARTITION_TYPE, m_partition_type); 217 query_param(id, CL_DEVICE_REFERENCE_COUNT, m_reference_count); 218 #endif 219 return CL_SUCCESS; 220 } 221 222 std::string Name() { return m_name; } 223 224 private: 225 template<typename T> 226 cl_int query_param(cl_device_id id, cl_device_info param, T& value) 227 { 228 cl_int res; 229 size_t size = 0; 230 231 res = clGetDeviceInfo(id, param, 0, 0, &size); 232 if (CL_SUCCESS != res && size != 0) 233 throw std::runtime_error(std::string("clGetDeviceInfo failed")); 234 235 if (0 == size) 236 return CL_SUCCESS; 237 238 if (sizeof(T) != size) 239 throw std::runtime_error(std::string("clGetDeviceInfo: param size mismatch")); 240 241 res = clGetDeviceInfo(id, param, size, &value, 0); 242 if (CL_SUCCESS != res) 243 throw std::runtime_error(std::string("clGetDeviceInfo failed")); 244 245 return CL_SUCCESS; 246 } 247 248 template<typename T> 249 cl_int query_param(cl_device_id id, cl_device_info param, std::vector<T>& value) 250 { 251 cl_int res; 252 size_t size; 253 254 res = clGetDeviceInfo(id, param, 0, 0, &size); 255 if (CL_SUCCESS != res) 256 throw std::runtime_error(std::string("clGetDeviceInfo failed")); 257 258 if (0 == size) 259 return CL_SUCCESS; 260 261 value.resize(size / sizeof(T)); 262 263 res = clGetDeviceInfo(id, param, size, &value[0], 0); 264 if (CL_SUCCESS != res) 265 throw std::runtime_error(std::string("clGetDeviceInfo failed")); 266 267 return CL_SUCCESS; 268 } 269 270 cl_int query_param(cl_device_id id, cl_device_info param, std::string& value) 271 { 272 cl_int res; 273 size_t size; 274 275 res = clGetDeviceInfo(id, param, 0, 0, &size); 276 if (CL_SUCCESS != res) 277 throw std::runtime_error(std::string("clGetDeviceInfo failed")); 278 279 value.resize(size + 1); 280 281 res = clGetDeviceInfo(id, param, size, &value[0], 0); 282 if (CL_SUCCESS != res) 283 throw std::runtime_error(std::string("clGetDeviceInfo failed")); 284 285 // just in case, ensure trailing zero for ASCIIZ string 286 value[size] = 0; 287 288 return CL_SUCCESS; 289 } 290 291 private: 292 cl_device_type m_type; 293 cl_uint m_vendor_id; 294 cl_uint m_max_compute_units; 295 cl_uint m_max_work_item_dimensions; 296 std::vector<size_t> m_max_work_item_sizes; 297 size_t m_max_work_group_size; 298 cl_uint m_preferred_vector_width_char; 299 cl_uint m_preferred_vector_width_short; 300 cl_uint m_preferred_vector_width_int; 301 cl_uint m_preferred_vector_width_long; 302 cl_uint m_preferred_vector_width_float; 303 cl_uint m_preferred_vector_width_double; 304 #if defined(CL_VERSION_1_1) 305 cl_uint m_preferred_vector_width_half; 306 cl_uint m_native_vector_width_char; 307 cl_uint m_native_vector_width_short; 308 cl_uint m_native_vector_width_int; 309 cl_uint m_native_vector_width_long; 310 cl_uint m_native_vector_width_float; 311 cl_uint m_native_vector_width_double; 312 cl_uint m_native_vector_width_half; 313 #endif 314 cl_uint m_max_clock_frequency; 315 cl_uint m_address_bits; 316 cl_ulong m_max_mem_alloc_size; 317 cl_bool m_image_support; 318 cl_uint m_max_read_image_args; 319 cl_uint m_max_write_image_args; 320 #if defined(CL_VERSION_2_0) 321 cl_uint m_max_read_write_image_args; 322 #endif 323 size_t m_image2d_max_width; 324 size_t m_image2d_max_height; 325 size_t m_image3d_max_width; 326 size_t m_image3d_max_height; 327 size_t m_image3d_max_depth; 328 #if defined(CL_VERSION_1_2) 329 size_t m_image_max_buffer_size; 330 size_t m_image_max_array_size; 331 #endif 332 cl_uint m_max_samplers; 333 #if defined(CL_VERSION_1_2) 334 cl_uint m_image_pitch_alignment; 335 cl_uint m_image_base_address_alignment; 336 #endif 337 #if defined(CL_VERSION_2_0) 338 cl_uint m_max_pipe_args; 339 cl_uint m_pipe_max_active_reservations; 340 cl_uint m_pipe_max_packet_size; 341 #endif 342 size_t m_max_parameter_size; 343 cl_uint m_mem_base_addr_align; 344 cl_device_fp_config m_single_fp_config; 345 #if defined(CL_VERSION_1_2) 346 cl_device_fp_config m_double_fp_config; 347 #endif 348 cl_device_mem_cache_type m_global_mem_cache_type; 349 cl_uint m_global_mem_cacheline_size; 350 cl_ulong m_global_mem_cache_size; 351 cl_ulong m_global_mem_size; 352 cl_ulong m_max_constant_buffer_size; 353 cl_uint m_max_constant_args; 354 #if defined(CL_VERSION_2_0) 355 size_t m_max_global_variable_size; 356 size_t m_global_variable_preferred_total_size; 357 #endif 358 cl_device_local_mem_type m_local_mem_type; 359 cl_ulong m_local_mem_size; 360 cl_bool m_error_correction_support; 361 #if defined(CL_VERSION_1_1) 362 cl_bool m_host_unified_memory; 363 #endif 364 size_t m_profiling_timer_resolution; 365 cl_bool m_endian_little; 366 cl_bool m_available; 367 cl_bool m_compiler_available; 368 #if defined(CL_VERSION_1_2) 369 cl_bool m_linker_available; 370 #endif 371 cl_device_exec_capabilities m_execution_capabilities; 372 cl_command_queue_properties m_queue_properties; 373 #if defined(CL_VERSION_2_0) 374 cl_command_queue_properties m_queue_on_host_properties; 375 cl_command_queue_properties m_queue_on_device_properties; 376 cl_uint m_queue_on_device_preferred_size; 377 cl_uint m_queue_on_device_max_size; 378 cl_uint m_max_on_device_queues; 379 cl_uint m_max_on_device_events; 380 #endif 381 #if defined(CL_VERSION_1_2) 382 std::string m_built_in_kernels; 383 #endif 384 cl_platform_id m_platform; 385 std::string m_name; 386 std::string m_vendor; 387 std::string m_driver_version; 388 std::string m_profile; 389 std::string m_version; 390 #if defined(CL_VERSION_1_1) 391 std::string m_opencl_c_version; 392 #endif 393 std::string m_extensions; 394 #if defined(CL_VERSION_1_2) 395 size_t m_printf_buffer_size; 396 cl_bool m_preferred_interop_user_sync; 397 cl_device_id m_parent_device; 398 cl_uint m_partition_max_sub_devices; 399 std::vector<cl_device_partition_property> m_partition_properties; 400 cl_device_affinity_domain m_partition_affinity_domain; 401 std::vector<cl_device_partition_property> m_partition_type; 402 cl_uint m_reference_count; 403 #endif 404 }; 405 406 } // namespace opencl 407 408 409 class App 410 { 411 public: 412 App(CommandLineParser& cmd); 413 ~App(); 414 415 int initOpenCL(); 416 int initVideoSource(); 417 418 int process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* cl_buffer); 419 int process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u); 420 int process_cl_image_with_opencv(cl_mem image, cv::UMat& u); 421 422 int run(); 423 424 bool isRunning() { return m_running; } 425 bool doProcess() { return m_process; } 426 bool useBuffer() { return m_use_buffer; } 427 428 void setRunning(bool running) { m_running = running; } 429 void setDoProcess(bool process) { m_process = process; } 430 void setUseBuffer(bool use_buffer) { m_use_buffer = use_buffer; } 431 432 protected: 433 bool nextFrame(cv::Mat& frame) { return m_cap.read(frame); } 434 void handleKey(char key); 435 void timerStart(); 436 void timerEnd(); 437 std::string timeStr() const; 438 std::string message() const; 439 440 private: 441 bool m_running; 442 bool m_process; 443 bool m_use_buffer; 444 445 int64 m_t0; 446 int64 m_t1; 447 float m_time; 448 float m_frequency; 449 450 string m_file_name; 451 int m_camera_id; 452 cv::VideoCapture m_cap; 453 cv::Mat m_frame; 454 cv::Mat m_frameGray; 455 456 opencl::PlatformInfo m_platformInfo; 457 opencl::DeviceInfo m_deviceInfo; 458 std::vector<cl_platform_id> m_platform_ids; 459 cl_context m_context; 460 cl_device_id m_device_id; 461 cl_command_queue m_queue; 462 cl_program m_program; 463 cl_kernel m_kernelBuf; 464 cl_kernel m_kernelImg; 465 cl_mem m_img_src; // used as src in case processing of cl image 466 cl_mem m_mem_obj; 467 cl_event m_event; 468 }; 469 470 471 App::App(CommandLineParser& cmd) 472 { 473 cout << "\nPress ESC to exit\n" << endl; 474 cout << "\n 'p' to toggle ON/OFF processing\n" << endl; 475 cout << "\n SPACE to switch between OpenCL buffer/image\n" << endl; 476 477 m_camera_id = cmd.get<int>("camera"); 478 m_file_name = cmd.get<string>("video"); 479 480 m_running = false; 481 m_process = false; 482 m_use_buffer = false; 483 484 m_t0 = 0; 485 m_t1 = 0; 486 m_time = 0.0; 487 m_frequency = (float)cv::getTickFrequency(); 488 489 m_context = 0; 490 m_device_id = 0; 491 m_queue = 0; 492 m_program = 0; 493 m_kernelBuf = 0; 494 m_kernelImg = 0; 495 m_img_src = 0; 496 m_mem_obj = 0; 497 m_event = 0; 498 } // ctor 499 500 501 App::~App() 502 { 503 if (m_queue) 504 { 505 clFinish(m_queue); 506 clReleaseCommandQueue(m_queue); 507 m_queue = 0; 508 } 509 510 if (m_program) 511 { 512 clReleaseProgram(m_program); 513 m_program = 0; 514 } 515 516 if (m_img_src) 517 { 518 clReleaseMemObject(m_img_src); 519 m_img_src = 0; 520 } 521 522 if (m_mem_obj) 523 { 524 clReleaseMemObject(m_mem_obj); 525 m_mem_obj = 0; 526 } 527 528 if (m_event) 529 { 530 clReleaseEvent(m_event); 531 } 532 533 if (m_kernelBuf) 534 { 535 clReleaseKernel(m_kernelBuf); 536 m_kernelBuf = 0; 537 } 538 539 if (m_kernelImg) 540 { 541 clReleaseKernel(m_kernelImg); 542 m_kernelImg = 0; 543 } 544 545 if (m_device_id) 546 { 547 clReleaseDevice(m_device_id); 548 m_device_id = 0; 549 } 550 551 if (m_context) 552 { 553 clReleaseContext(m_context); 554 m_context = 0; 555 } 556 } // dtor 557 558 559 int App::initOpenCL() 560 { 561 cl_int res = CL_SUCCESS; 562 cl_uint num_entries = 0; 563 564 res = clGetPlatformIDs(0, 0, &num_entries); 565 if (CL_SUCCESS != res) 566 return -1; 567 568 m_platform_ids.resize(num_entries); 569 570 res = clGetPlatformIDs(num_entries, &m_platform_ids[0], 0); 571 if (CL_SUCCESS != res) 572 return -1; 573 574 unsigned int i; 575 576 // create context from first platform with GPU device 577 for (i = 0; i < m_platform_ids.size(); i++) 578 { 579 cl_context_properties props[] = 580 { 581 CL_CONTEXT_PLATFORM, 582 (cl_context_properties)(m_platform_ids[i]), 583 0 584 }; 585 586 m_context = clCreateContextFromType(props, CL_DEVICE_TYPE_GPU, 0, 0, &res); 587 if (0 == m_context || CL_SUCCESS != res) 588 continue; 589 590 res = clGetContextInfo(m_context, CL_CONTEXT_DEVICES, sizeof(cl_device_id), &m_device_id, 0); 591 if (CL_SUCCESS != res) 592 return -1; 593 594 m_queue = clCreateCommandQueue(m_context, m_device_id, 0, &res); 595 if (0 == m_queue || CL_SUCCESS != res) 596 return -1; 597 598 const char* kernelSrc = 599 "__kernel " 600 "void bitwise_inv_buf_8uC1(" 601 " __global unsigned char* pSrcDst," 602 " int srcDstStep," 603 " int rows," 604 " int cols)" 605 "{" 606 " int x = get_global_id(0);" 607 " int y = get_global_id(1);" 608 " int idx = mad24(y, srcDstStep, x);" 609 " pSrcDst[idx] = ~pSrcDst[idx];" 610 "}" 611 "__kernel " 612 "void bitwise_inv_img_8uC1(" 613 " read_only image2d_t srcImg," 614 " write_only image2d_t dstImg)" 615 "{" 616 " int x = get_global_id(0);" 617 " int y = get_global_id(1);" 618 " int2 coord = (int2)(x, y);" 619 " uint4 val = read_imageui(srcImg, coord);" 620 " val.x = (~val.x) & 0x000000FF;" 621 " write_imageui(dstImg, coord, val);" 622 "}"; 623 size_t len = strlen(kernelSrc); 624 m_program = clCreateProgramWithSource(m_context, 1, &kernelSrc, &len, &res); 625 if (0 == m_program || CL_SUCCESS != res) 626 return -1; 627 628 res = clBuildProgram(m_program, 1, &m_device_id, 0, 0, 0); 629 if (CL_SUCCESS != res) 630 return -1; 631 632 m_kernelBuf = clCreateKernel(m_program, "bitwise_inv_buf_8uC1", &res); 633 if (0 == m_kernelBuf || CL_SUCCESS != res) 634 return -1; 635 636 m_kernelImg = clCreateKernel(m_program, "bitwise_inv_img_8uC1", &res); 637 if (0 == m_kernelImg || CL_SUCCESS != res) 638 return -1; 639 640 m_platformInfo.QueryInfo(m_platform_ids[i]); 641 m_deviceInfo.QueryInfo(m_device_id); 642 643 // attach OpenCL context to OpenCV 644 cv::ocl::attachContext(m_platformInfo.Name(), m_platform_ids[i], m_context, m_device_id); 645 646 break; 647 } 648 649 return m_context != 0 ? CL_SUCCESS : -1; 650 } // initOpenCL() 651 652 653 int App::initVideoSource() 654 { 655 try 656 { 657 if (!m_file_name.empty() && m_camera_id == -1) 658 { 659 m_cap.open(m_file_name.c_str()); 660 if (!m_cap.isOpened()) 661 throw std::runtime_error(std::string("can't open video file: " + m_file_name)); 662 } 663 else if (m_camera_id != -1) 664 { 665 m_cap.open(m_camera_id); 666 if (!m_cap.isOpened()) 667 { 668 std::stringstream msg; 669 msg << "can't open camera: " << m_camera_id; 670 throw std::runtime_error(msg.str()); 671 } 672 } 673 else 674 throw std::runtime_error(std::string("specify video source")); 675 } 676 677 catch (std::exception e) 678 { 679 cerr << "ERROR: " << e.what() << std::endl; 680 return -1; 681 } 682 683 return 0; 684 } // initVideoSource() 685 686 687 // this function is an example of "typical" OpenCL processing pipeline 688 // It creates OpenCL buffer or image, depending on use_buffer flag, 689 // from input media frame and process these data 690 // (inverts each pixel value in half of frame) with OpenCL kernel 691 int App::process_frame_with_open_cl(cv::Mat& frame, bool use_buffer, cl_mem* mem_obj) 692 { 693 cl_int res = CL_SUCCESS; 694 695 CV_Assert(mem_obj); 696 697 cl_kernel kernel = 0; 698 cl_mem mem = mem_obj[0]; 699 700 if (0 == mem || 0 == m_img_src) 701 { 702 // allocate/delete cl memory objects every frame for the simplicity. 703 // in real applicaton more efficient pipeline can be built. 704 705 if (use_buffer) 706 { 707 cl_mem_flags flags = CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR; 708 709 mem = clCreateBuffer(m_context, flags, frame.total(), frame.ptr(), &res); 710 if (0 == mem || CL_SUCCESS != res) 711 return -1; 712 713 res = clSetKernelArg(m_kernelBuf, 0, sizeof(cl_mem), &mem); 714 if (CL_SUCCESS != res) 715 return -1; 716 717 res = clSetKernelArg(m_kernelBuf, 1, sizeof(int), &frame.step[0]); 718 if (CL_SUCCESS != res) 719 return -1; 720 721 res = clSetKernelArg(m_kernelBuf, 2, sizeof(int), &frame.rows); 722 if (CL_SUCCESS != res) 723 return -1; 724 725 int cols2 = frame.cols / 2; 726 res = clSetKernelArg(m_kernelBuf, 3, sizeof(int), &cols2); 727 if (CL_SUCCESS != res) 728 return -1; 729 730 kernel = m_kernelBuf; 731 } 732 else 733 { 734 cl_mem_flags flags_src = CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR; 735 736 cl_image_format fmt; 737 fmt.image_channel_order = CL_R; 738 fmt.image_channel_data_type = CL_UNSIGNED_INT8; 739 740 cl_image_desc desc_src; 741 desc_src.image_type = CL_MEM_OBJECT_IMAGE2D; 742 desc_src.image_width = frame.cols; 743 desc_src.image_height = frame.rows; 744 desc_src.image_depth = 0; 745 desc_src.image_array_size = 0; 746 desc_src.image_row_pitch = frame.step[0]; 747 desc_src.image_slice_pitch = 0; 748 desc_src.num_mip_levels = 0; 749 desc_src.num_samples = 0; 750 desc_src.buffer = 0; 751 m_img_src = clCreateImage(m_context, flags_src, &fmt, &desc_src, frame.ptr(), &res); 752 if (0 == m_img_src || CL_SUCCESS != res) 753 return -1; 754 755 cl_mem_flags flags_dst = CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR; 756 757 cl_image_desc desc_dst; 758 desc_dst.image_type = CL_MEM_OBJECT_IMAGE2D; 759 desc_dst.image_width = frame.cols; 760 desc_dst.image_height = frame.rows; 761 desc_dst.image_depth = 0; 762 desc_dst.image_array_size = 0; 763 desc_dst.image_row_pitch = 0; 764 desc_dst.image_slice_pitch = 0; 765 desc_dst.num_mip_levels = 0; 766 desc_dst.num_samples = 0; 767 desc_dst.buffer = 0; 768 mem = clCreateImage(m_context, flags_dst, &fmt, &desc_dst, 0, &res); 769 if (0 == mem || CL_SUCCESS != res) 770 return -1; 771 772 size_t origin[] = { 0, 0, 0 }; 773 size_t region[] = { (size_t)frame.cols, (size_t)frame.rows, 1 }; 774 res = clEnqueueCopyImage(m_queue, m_img_src, mem, origin, origin, region, 0, 0, &m_event); 775 if (CL_SUCCESS != res) 776 return -1; 777 778 res = clWaitForEvents(1, &m_event); 779 if (CL_SUCCESS != res) 780 return -1; 781 782 res = clSetKernelArg(m_kernelImg, 0, sizeof(cl_mem), &m_img_src); 783 if (CL_SUCCESS != res) 784 return -1; 785 786 res = clSetKernelArg(m_kernelImg, 1, sizeof(cl_mem), &mem); 787 if (CL_SUCCESS != res) 788 return -1; 789 790 kernel = m_kernelImg; 791 } 792 } 793 794 m_event = clCreateUserEvent(m_context, &res); 795 if (0 == m_event || CL_SUCCESS != res) 796 return -1; 797 798 // process left half of frame in OpenCL 799 size_t size[] = { (size_t)frame.cols / 2, (size_t)frame.rows }; 800 res = clEnqueueNDRangeKernel(m_queue, kernel, 2, 0, size, 0, 0, 0, &m_event); 801 if (CL_SUCCESS != res) 802 return -1; 803 804 res = clWaitForEvents(1, &m_event); 805 if (CL_SUCCESS != res) 806 return - 1; 807 808 mem_obj[0] = mem; 809 810 return 0; 811 } 812 813 814 // this function is an example of interoperability between OpenCL buffer 815 // and OpenCV UMat objects. It converts (without copying data) OpenCL buffer 816 // to OpenCV UMat and then do blur on these data 817 int App::process_cl_buffer_with_opencv(cl_mem buffer, size_t step, int rows, int cols, int type, cv::UMat& u) 818 { 819 cv::ocl::convertFromBuffer(buffer, step, rows, cols, type, u); 820 821 // process right half of frame in OpenCV 822 cv::Point pt(u.cols / 2, 0); 823 cv::Size sz(u.cols / 2, u.rows); 824 cv::Rect roi(pt, sz); 825 cv::UMat uroi(u, roi); 826 cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3)); 827 828 if (buffer) 829 clReleaseMemObject(buffer); 830 m_mem_obj = 0; 831 832 return 0; 833 } 834 835 836 // this function is an example of interoperability between OpenCL image 837 // and OpenCV UMat objects. It converts OpenCL image 838 // to OpenCV UMat and then do blur on these data 839 int App::process_cl_image_with_opencv(cl_mem image, cv::UMat& u) 840 { 841 cv::ocl::convertFromImage(image, u); 842 843 // process right half of frame in OpenCV 844 cv::Point pt(u.cols / 2, 0); 845 cv::Size sz(u.cols / 2, u.rows); 846 cv::Rect roi(pt, sz); 847 cv::UMat uroi(u, roi); 848 cv::blur(uroi, uroi, cv::Size(7, 7), cv::Point(-3, -3)); 849 850 if (image) 851 clReleaseMemObject(image); 852 m_mem_obj = 0; 853 854 if (m_img_src) 855 clReleaseMemObject(m_img_src); 856 m_img_src = 0; 857 858 return 0; 859 } 860 861 862 int App::run() 863 { 864 if (0 != initOpenCL()) 865 return -1; 866 867 if (0 != initVideoSource()) 868 return -1; 869 870 Mat img_to_show; 871 872 // set running state until ESC pressed 873 setRunning(true); 874 // set process flag to show some data processing 875 // can be toggled on/off by 'p' button 876 setDoProcess(true); 877 // set use buffer flag, 878 // when it is set to true, will demo interop opencl buffer and cv::Umat, 879 // otherwise demo interop opencl image and cv::UMat 880 // can be switched on/of by SPACE button 881 setUseBuffer(true); 882 883 // Iterate over all frames 884 while (isRunning() && nextFrame(m_frame)) 885 { 886 cv::cvtColor(m_frame, m_frameGray, COLOR_BGR2GRAY); 887 888 UMat uframe; 889 890 // work 891 timerStart(); 892 893 if (doProcess()) 894 { 895 process_frame_with_open_cl(m_frameGray, useBuffer(), &m_mem_obj); 896 897 if (useBuffer()) 898 process_cl_buffer_with_opencv( 899 m_mem_obj, m_frameGray.step[0], m_frameGray.rows, m_frameGray.cols, m_frameGray.type(), uframe); 900 else 901 process_cl_image_with_opencv(m_mem_obj, uframe); 902 } 903 else 904 { 905 m_frameGray.copyTo(uframe); 906 } 907 908 timerEnd(); 909 910 uframe.copyTo(img_to_show); 911 912 putText(img_to_show, "Version : " + m_platformInfo.Version(), Point(5, 30), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2); 913 putText(img_to_show, "Name : " + m_platformInfo.Name(), Point(5, 60), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2); 914 putText(img_to_show, "Device : " + m_deviceInfo.Name(), Point(5, 90), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2); 915 cv::String memtype = useBuffer() ? "buffer" : "image"; 916 putText(img_to_show, "interop with OpenCL " + memtype, Point(5, 120), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2); 917 putText(img_to_show, "Time : " + timeStr() + " msec", Point(5, 150), FONT_HERSHEY_SIMPLEX, 1., Scalar(255, 100, 0), 2); 918 919 imshow("opencl_interop", img_to_show); 920 921 handleKey((char)waitKey(3)); 922 } 923 924 return 0; 925 } 926 927 928 void App::handleKey(char key) 929 { 930 switch (key) 931 { 932 case 27: 933 setRunning(false); 934 break; 935 936 case ' ': 937 setUseBuffer(!useBuffer()); 938 break; 939 940 case 'p': 941 case 'P': 942 setDoProcess( !doProcess() ); 943 break; 944 945 default: 946 break; 947 } 948 } 949 950 951 inline void App::timerStart() 952 { 953 m_t0 = getTickCount(); 954 } 955 956 957 inline void App::timerEnd() 958 { 959 m_t1 = getTickCount(); 960 int64 delta = m_t1 - m_t0; 961 m_time = (delta / m_frequency) * 1000; // units msec 962 } 963 964 965 inline string App::timeStr() const 966 { 967 stringstream ss; 968 ss << std::fixed << std::setprecision(1) << m_time; 969 return ss.str(); 970 } 971 972 973 int main(int argc, char** argv) 974 { 975 const char* keys = 976 "{ help h ? | | print help message }" 977 "{ camera c | -1 | use camera as input }" 978 "{ video v | | use video as input }"; 979 980 CommandLineParser cmd(argc, argv, keys); 981 if (cmd.has("help")) 982 { 983 cmd.printMessage(); 984 return EXIT_SUCCESS; 985 } 986 987 App app(cmd); 988 989 try 990 { 991 app.run(); 992 } 993 994 catch (const cv::Exception& e) 995 { 996 cout << "error: " << e.what() << endl; 997 return 1; 998 } 999 1000 catch (const std::exception& e) 1001 { 1002 cout << "error: " << e.what() << endl; 1003 return 1; 1004 } 1005 1006 catch (...) 1007 { 1008 cout << "unknown exception" << endl; 1009 return 1; 1010 } 1011 1012 return EXIT_SUCCESS; 1013 } // main()
官方给出的代码,有很多和OpenCL平台相关的处理部分,看起的比较冗杂,因此我们就不直接移植这个代码了(参考部分思想和API接口的基本调用),重新写一个新的基于Opencv3.1.0的OCL程序的Demo框架,基于TI的AM57x系列的板卡,其他的设备类似:
1. Opencv3.1版本OpenCL支持情况:
首先对于Opencv3.1.0版本,从正式发布Opencv3开始,其对OpenCL的支持已经发生了很大的变化,在之前需要使用cv::ocl相关函数API来实现kernel的编译调用等等,在其中还包括了很多的数据搬移,而Opencv3正式改变了这样的情况,重新封装了一个新的数据类型cv::UMat,这个数据类型能够无缝对接Opencv的普通接口,从而最少的改动代码而最大的完成OpenCL平台的加速功能[OpenCV3.x-OpenCL.pptx]:
如上图所示,只需要将原来的Mat格式换为UMat格式就可以实现Opencv函数在OpenCL设备上加速运行,而这其中具体实施的基本原理是什么呢?接下来看一下其底层实现的基本原理,具体参看Opencv中OpenCL部分实现的源代码:
上图中表明了,当你使用的数据类型是UMat {data.isUmat()},并且开启了OpenCL使能{useOpenCL()},那么Opencv的接口将会跳转到OpenCL支持的设备中进行加速运行,当然你需要注意的是,在第一次使用OpenCL加速程序时,OpenCL需要编译生成对应平台的Kernel代码,而编译是需要花费大量的时间的,因此初次运行需要比较长的时间。
2. 接下来开始AM57x系列的Opencv-OCL编程
主要代码如下所示(Line28非常重要,使能OpenCL平台):
1 #include <iostream> 2 #include "opencv2/opencv.hpp" 3 #include "opencv2/core/ocl.hpp" 4 #include "opencv2/imgcodecs.hpp" 5 #include "opencv2/videoio/videoio.hpp" 6 #include "opencv2/highgui/highgui.hpp" 7 #include "opencv2/imgproc/imgproc.hpp" 8 9 using namespace std; 10 using namespace cv; 11 using namespace cv::ocl; 12 13 #define DSP 1 14 15 int main() 16 { 17 double t = 0.0; 18 #if DSP 19 std::vector<cv::ocl::PlatformInfo> plats; 20 cv::ocl::getPlatfomsInfo(plats); 21 const cv::ocl::PlatformInfo *platform = &plats[0]; 22 cout << "Platform Name:" << platform->name().c_str() << endl; 23 24 cv::ocl::Device c_dev; 25 platform->getDevice(c_dev,0); 26 cout << "Device name:" << c_dev.name().c_str() << endl; 27 c_dev.set(0); 28 cv::ocl::setUseOpenCL(true); 29 cout << "Use the OpenCL Deivice?" << cv::ocl::useOpenCL() << endl; 30 31 cv::UMat Ori = cv::imread("/home/root/test.jpg").getUMat(cv::ACCESS_RW),Res,Canny; 32 33 t = (double)cv::getTickCount(); 34 cv::cvtColor(Ori,Res,cv::COLOR_RGB2GRAY); 35 cv::Canny(Res,Res,0,30); 36 t = ((double)cv::getTickCount() - t) / cv::getTickFrequency(); 37 std::cout << "TI AM57x Accelerate Time Cost:" << t << "s" << std::endl; 38 cv::imshow("Test",Ori); 39 cv::imshow("Gray",Res); 40 #else 41 Mat I = imread("/home/root/test.jpg"),gray; 42 43 t = (double)cv::getTickCount(); 44 cv::cvtColor(I,gray,cv::COLOR_RGB2GRAY); 45 cv::Canny(gray,gray,0,30); 46 t = ((double)cv::getTickCount() - t) / cv::getTickFrequency(); 47 std::cout << "CPU Time Cost:" << t << "s" << std::endl; 48 cv::imshow("Ori",I); 49 cv::imshow("Res",gray); 50 #endif 51 52 for (;;) 53 { 54 int key = waitKey(); 55 if (key == 27/*ESC*/ || key == 'q' || key == 'Q') 56 break; 57 } 58 return 0; 59 }
编译Makfile:
1 TARGET3 = ocl_demo 2 CXX = arm-linux-gnueabihf-g++ 3 CFLAGS += -Wl,-rpath-link,$(COMPILE_TOOL_PATH)/usr/lib \ 4 -Wl,-rpath-link,$(COMPILE_TOOL_PATH)/lib \ 5 -I$(COMPILE_TOOL_PATH)/usr/include \ 6 -L$(COMPILE_TOOL_PATH)/usr/lib \ 7 -L$(COMPILE_TOOL_PATH)/lib -Wall -W \ 8 -std=c++98 9 CFLAGS += -lopencv_core -lopencv_objdetect -lopencv_highgui -lopencv_videoio -lopencv_imgcodecs -lopencv_imgproc -lOpenCL -lpthread -lrt 10 11 all: 12 @$(CXX) $(TARGET3).cpp -o $(TARGET3) $(CFLAGS) 13 clean: 14 rm -rf $(TARGET3)
运行脚本opencv-ocl-runtime.sh:
1 export TI_OCL_LOAD_KERNELS_ONCHIP=Y 2 export TI_OCL_CACHE_KERNELS=Y 3 export OPENCV_OPENCL_DEVICE='TI AM57:ACCELERATOR:TI Multicore C66 DSP' 4 echo "OpenCL on, canny" 5 ./ocl_demo 6 export OPENCV_OPENCL_DEVICE='disabled' 7 echo "OpenCL off, canny" 8 ./ocl_demo
使用make指令编译后在AM5718平台上运行结果如下所示:
根据运行的结构可以看出,CPU运行时间是经过OpenCL-DSP平台加速后的十倍左右,因此能够明显体现出加速的效果。
注1:我们需要注意的是在第一次运行此代码时,一定要注意,通过脚本运行ocl_demo,在脚本当中,第一行表示如果需要加载Kernel,是直接通过本地芯片上加载Kernel的,第二行表示如果编译了Kernel,将会将Kernel保存在Cache当中,下次调用(直到关机之前)就不需要编译了,如果没有这两个选项,代码就会编译Kernel而消耗大量的时间,因此这是必须的。
注2:脚本中第三行是非常重要的,必须要在这里使能OpenCL设备,否则Opencv将永远不会调用DSP加速算法,而在CPU上运行,具体参考这里。
猜想与验证:当我在调用Opencv3.1中erode或者其他形态学相关的接口时,出现了经过DSP加速之后的效果居然比CPU端要差很多!!!这是为什么呢?猜想如下:
猜想:首先我们使用的是TI的DSP Accelerator,型号是C66系列, 而这个系列的DSP处理器实际上只有不超过10个核心的Processor,而DSP最主要的优势是其包括的乘法器资源以及加法器资源,因此对于包括乘法加法等运算的这些情况,其加速效果比较明显,对于Opencv中的形态学相关API主要用到的是基本的if逻辑分支判断的情况,因此,不能够充分使用DSP当中的加法器或者乘法器,因此加速效果不明显,甚至更慢,如下所示(erode algorithm):
验证:根据上述猜想,我们可以选择有多个核心的GPU来对erode算法加速,查看erode算法在GPU加速下的效率,加速效果如下所示:
Reference:
官方代码说明:https://www.khronos.org/opencl/
Opencv(编译OCL模块)+VSC+MinGW环境搭建:https://www.cnblogs.com/uestc-mm/p/12758110.html
TI官方OpenCV3.1-Release:https://git.ti.com/cgit/opencv/tiopencv/?h=tiopencvrelease_3.1
TI官方AM57x Processor SDK Linux(OpenCV支持):http://software-dl.ti.com/processor-sdk-linux/esd/docs/latest/linux/Foundational_Components_OpenCV.html