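CUDA Programming Study Notes
These notes mainly cover programming against OpenCV's CUDA implementation. Descriptions and call interfaces of commonly used functions will be added over time.
1. cv::cuda::GpuMat class member functions
1.1 The upload function
First overload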
void cv::cuda::GpuMat::upload ( InputArray arr );
Performs data upload to GpuMat (blocking call). This function copies data from host memory to device memory. Because it is a blocking call, the copy operation is guaranteed to be finished when this function returns.
Second overload
void cv::cuda::GpuMat::upload ( InputArray arr, Stream &stream );
Performs data upload to GpuMat (non-blocking call). This function copies data from host memory to device memory. Because it is a non-blocking call, this function may return before the copy operation has finished. The copy operation may be overlapped with operations in other non-default streams if stream is not the default stream and the host-side source arr is a HostMem allocated with the HostMem::PAGE_LOCKED option.
2. cv::cuda::Stream class member functions
#include <opencv2/core/cuda.hpp>
typedef void (*StreamCallback)(int status, void *userData);
void cv::cuda::Stream::waitForCompletion();
Blocks the current CPU thread until all operations in the stream are complete.
3. pthread thread-related functions
3.1 pthread_cond_broadcast
#include <pthread.h>
int pthread_cond_signal(pthread_cond_t *cond);
int pthread_cond_broadcast(pthread_cond_t *cond);
These two functions are used to unblock threads blocked on a condition variable. The pthread_cond_signal() call unblocks at least one of the threads that are blocked on the specified condition variable cond (if any threads are blocked on cond). The pthread_cond_broadcast() call unblocks all threads currently blocked on the specified condition variable cond.
pthread_cond_signal(&cond) wakes at least one of the threads currently waiting in pthread_cond_wait(&cond, &mutex).
pthread_cond_broadcast(&cond) wakes all of the threads currently waiting in pthread_cond_wait(&cond, &mutex).
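The broadcast behaviour can be sketched with a small "gate" that several threads wait on. This is only an illustration: the Gate struct and the functions waitAtGate and releaseAllWaiters are hypothetical names made up for this example. Each waiter re-checks a predicate in a loop, which is the usual way to guard against spurious wakeups.

```cpp
#include <pthread.h>
#include <vector>

// Hypothetical shared state for the example.
struct Gate {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    bool open;
    int woken;
};

// Thread body: wait until the gate is open, then count itself as woken.
static void* waitAtGate(void* arg) {
    Gate* gate = static_cast<Gate*>(arg);
    pthread_mutex_lock(&gate->mutex);
    while (!gate->open) {
        // Atomically releases the mutex while waiting, re-acquires it on wakeup.
        pthread_cond_wait(&gate->cond, &gate->mutex);
    }
    ++gate->woken;
    pthread_mutex_unlock(&gate->mutex);
    return nullptr;
}

// Starts threadCount waiters, opens the gate with one broadcast,
// and returns how many threads got through.
int releaseAllWaiters(int threadCount) {
    Gate gate;
    pthread_mutex_init(&gate.mutex, nullptr);
    pthread_cond_init(&gate.cond, nullptr);
    gate.open = false;
    gate.woken = 0;

    std::vector<pthread_t> threads(threadCount);
    for (pthread_t& t : threads) {
        pthread_create(&t, nullptr, waitAtGate, &gate);
    }

    pthread_mutex_lock(&gate.mutex);
    gate.open = true;
    pthread_cond_broadcast(&gate.cond);  // wake every thread blocked on cond
    pthread_mutex_unlock(&gate.mutex);

    for (pthread_t& t : threads) {
        pthread_join(t, nullptr);
    }

    int woken = gate.woken;
    pthread_cond_destroy(&gate.cond);
    pthread_mutex_destroy(&gate.mutex);
    return woken;
}
```

If pthread_cond_broadcast were replaced with a single pthread_cond_signal here, only one blocked waiter would be guaranteed to wake, and the joins could hang on the rest.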
3.2 pthread_exit
#include <pthread.h>
void pthread_exit(void *retval);
The pthread_exit() function terminates the calling thread and returns a value via retval that (if the thread is joinable) is available to another thread in the same process that calls pthread_join().
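Calling pthread_exit ends the calling thread; this is a voluntary action taken by the thread itself.
Because the threads of a process share its data segment, the resources held by a joinable thread are generally not released automatically when it terminates; another thread can call pthread_join() to synchronize with it and release them.
retval is the return value of the thread that calls pthread_exit(); another thread can retrieve it through a function such as pthread_join().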
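The hand-off of retval from pthread_exit to pthread_join might look like the following. This is a minimal sketch; computeSquare and squareInThread are hypothetical names, and the result is heap-allocated so that it outlives the exiting thread's stack.

```cpp
#include <pthread.h>

// Thread body: computes a result and hands it back through pthread_exit.
static void* computeSquare(void* arg) {
    int value = *static_cast<int*>(arg);
    // Allocate on the heap: the thread's stack is gone after it exits.
    int* result = new int(value * value);
    pthread_exit(result);  // terminates the calling thread, does not return
}

// Runs computeSquare in a new thread and collects its return value.
// Returns -1 if the thread could not be created or joined.
int squareInThread(int value) {
    pthread_t thread;
    if (pthread_create(&thread, nullptr, computeSquare, &value) != 0) {
        return -1;
    }

    void* retval = nullptr;
    // pthread_join waits for the thread, releases its resources,
    // and stores the pointer that was passed to pthread_exit.
    if (pthread_join(thread, &retval) != 0) {
        return -1;
    }

    int* result = static_cast<int*>(retval);
    int squared = *result;
    delete result;
    return squared;
}
```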