0%

【CUDA编程】随机数生成

随机数在一般编程中都会涉及,特别是在仿真过程中。我之前有几篇文章介绍了如何通过CPU函数调用产生随机数,这里我介绍如何在CUDA中产生随机数。在CUDA中,主要使用cuRAND库中的函数来产生随机数,其链接文档为https://docs.nvidia.com/cuda/curand/index.html


通过文档可知,cuRAND库提供两种方式的随机数生成:host与device。关于具体如何生成随机数,我们可参考文档中的Host API ExampleDevice API Examples。其中Host API Example的例子非常简单,这里不具体描述,唯一需要注意的点是,在VS中可能会出现error LNK2019的错误,此时需要在项目属性中添加依赖库,具体可以网上搜索答案。本文主要介绍如何参考Device API Examples的例子,来具体实现在核函数中产生随机数。为了便于介绍,我们首先给出了具体的代码实现:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
#include"cuda_runtime.h"
#include"device_launch_parameters.h"
#include"curand_kernel.h"// this lib shoulb be included
#include<ctime>
#include<iostream>
#include<random>

using namespace std;

//-------------------generate random numbers-------//
__device__ float generate(curandState *globalState, int ind)
{
curandState localState = globalState[ind];
float RANDOM = curand_uniform(&localState);// uniform distribution
globalState[ind] = localState;
return RANDOM;
}

__global__ void setup_kernel(curandState *state, unsigned long seed)
{
int ix = threadIdx.x + blockIdx.x*blockDim.x;
int iy = threadIdx.y + blockIdx.y*blockDim.y;
int idx = iy * blockDim.x*gridDim.x + ix;
curand_init(seed, idx, 0, &state[idx]);// initialize the state
}

//-------------This is our kernel function where the random numbers generated------//
__global__ void our_kernel(curandState *globalState,int nx,int ny)
{
int ix = threadIdx.x + blockIdx.x*blockDim.x;
int iy = threadIdx.y + blockIdx.y*blockDim.y;
int idx = iy * blockDim.x*gridDim.x + ix;

if (ix < nx&&iy < ny)
{
int k = generate(globalState, idx) * 100000;
printf("%d\n", k);
}


}

int main()
{
int nx = 5;
int ny = 2;// generate nx*ny random numbers

int blockx = 32;
int blocky = 1;
dim3 block(blockx, blocky);//(32,1)

int gridx = (nx + block.x - 1) / block.x;
int gridy = (ny + block.y - 1) / block.y;
dim3 grid(gridx,gridy); //(1,10)


int N = gridx*gridy*blockx*blocky;// the number of states

//--------------------//
curandState* devStates;

cudaMalloc(&devStates, N * sizeof(curandState));

srand(time(0));
int seed = rand();

// Initialize the states
setup_kernel <<<grid, block>>> (devStates, seed);

our_kernel << <grid, block >> > (devStates,nx,ny);


cudaDeviceReset();
return 0;
}

从上面代码可以看出,主要的有两点:一是首先使用setup_kernel来初始化,二是使用generate来产生随机数。最需要注意的是,setup_kernelour_kernel运行的维度一致<<<grid, block>>>。并且,devStates的维度要相同,这样确保devStates为每一个thread都生成对应的state.

最后给出上述代码结果图:

图1

坚持原创技术分享,您的支持将鼓励我继续创作!