CUDA Programming (3): Hello World that prints blockIdx.x, threadIdx.x, and threadIdx.y

#include <stdio.h>

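// Kernel: every thread prints its block index and its 2D thread index.
__global__ void hello_from_gpu()
{
    const int b = blockIdx.x;    // index of this block within the grid
    const int tx = threadIdx.x;  // thread index along x within the block
    const int ty = threadIdx.y;  // thread index along y within the block
    // std::cout is not available in device code, so printf is used instead.
    printf("Hello World from block-%d and thread-(%d, %d)!\n", b, tx, ty);
}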

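int main(void)
{
    // Each block has 2 x 4 = 8 threads; with 2 blocks the grid runs 16 threads.
    const dim3 block_size(2, 4);
    hello_from_gpu<<<2, block_size>>>();
    // Wait for the kernel to finish so the device-side printf output is flushed.
    cudaDeviceSynchronize();
    return 0;
}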

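Compile and run:

nvcc hello5.cu -o hello5
./hello5

Output (the order in which the two blocks print is not guaranteed and may differ between runs):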
Hello World from block-1 and thread-(0, 0)!
Hello World from block-1 and thread-(1, 0)!
Hello World from block-1 and thread-(0, 1)!
Hello World from block-1 and thread-(1, 1)!
Hello World from block-1 and thread-(0, 2)!
Hello World from block-1 and thread-(1, 2)!
Hello World from block-1 and thread-(0, 3)!
Hello World from block-1 and thread-(1, 3)!
Hello World from block-0 and thread-(0, 0)!
Hello World from block-0 and thread-(1, 0)!
Hello World from block-0 and thread-(0, 1)!
Hello World from block-0 and thread-(1, 1)!
Hello World from block-0 and thread-(0, 2)!
Hello World from block-0 and thread-(1, 2)!
Hello World from block-0 and thread-(0, 3)!
Hello World from block-0 and thread-(1, 3)!
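
Within each block, the output above lists the threads with threadIdx.x changing fastest: (0, 0), (1, 0), (0, 1), (1, 1), and so on. That matches how CUDA flattens a 2D block into a linear thread ID, tx + ty * blockDim.x. The following is only a host-side sketch of that rule in plain C++ (it also compiles with nvcc); `Dim2`, `flat_thread_id`, and `thread_order` are hypothetical names introduced here for illustration and are not part of the CUDA API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Host-side stand-in for CUDA's dim3 (hypothetical; only x and y are used).
struct Dim2 { int x; int y; };

// Flattened ID of thread (tx, ty) inside a block of size block_dim.
// Mirrors the CUDA rule for a 2D block: id = tx + ty * blockDim.x.
int flat_thread_id(Dim2 block_dim, int tx, int ty)
{
    assert(tx >= 0 && tx < block_dim.x);
    assert(ty >= 0 && ty < block_dim.y);
    return tx + ty * block_dim.x;
}

// List the (tx, ty) pairs of one block in increasing flattened-ID order.
std::vector<std::string> thread_order(Dim2 block_dim)
{
    std::vector<std::string> order(block_dim.x * block_dim.y);
    for (int ty = 0; ty < block_dim.y; ++ty)
        for (int tx = 0; tx < block_dim.x; ++tx)
            order[flat_thread_id(block_dim, tx, ty)] =
                "(" + std::to_string(tx) + ", " + std::to_string(ty) + ")";
    return order;
}
```

Calling `thread_order({2, 4})` for the block size used in this post yields the pairs in the same per-block sequence as the printed lines. The order across different blocks is a separate matter and depends on how the GPU schedules them.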

posted @ luoganttcc