CUDA C Programming Guide | Programming Model

  • Threads within a block: these threads can cooperate by sharing data through some shared memory and by synchronizing their execution to coordinate memory accesses. More precisely, one can specify synchronization points in the kernel by calling the __syncthreads() intrinsic function; __syncthreads() acts as a barrier at which all threads in the block must wait before any is allowed to proceed. The Shared Memory section of the guide gives an example of using shared memory. In addition to __syncthreads(), the Cooperative Groups API provides a rich set of thread-synchronization primitives. For this cooperation to be efficient, shared memory is a low-latency memory (much like an L1 cache).
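
The barrier behavior described above can be illustrated with a minimal sketch. It is not taken from the guide: the kernel name `reverseInBlock`, the size `N`, and the single-block launch are assumptions chosen for illustration. Each thread writes one element into shared memory, waits at `__syncthreads()`, and then reads an element that a different thread wrote. Without the barrier, that read could happen before the other thread's write.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical size for illustration: one block of N threads.
constexpr int N = 64;

// Reverses an N-element array in place using a single thread block.
__global__ void reverseInBlock(int *data)
{
    __shared__ int tile[N];      // visible to every thread in the block
    int t = threadIdx.x;

    tile[t] = data[t];           // write phase: each thread stages one element
    __syncthreads();             // barrier: no thread continues until all have written
    data[t] = tile[N - 1 - t];   // read phase: read an element written by another thread
}

int main()
{
    int host[N];
    for (int i = 0; i < N; ++i) host[i] = i;

    int *dev = nullptr;
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dev, host, sizeof(host), cudaMemcpyHostToDevice);

    reverseInBlock<<<1, N>>>(dev);   // one block, N threads
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(dev);
        return 1;
    }

    cudaMemcpy(host, dev, sizeof(host), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Verify on the host that element i now holds N - 1 - i.
    for (int i = 0; i < N; ++i) {
        if (host[i] != N - 1 - i) {
            std::printf("mismatch at index %d\n", i);
            return 1;
        }
    }
    std::printf("reversed OK\n");
    return 0;
}
```

The sketch can be built with `nvcc` and needs a CUDA-capable device to run. Note that `__syncthreads()` synchronizes only the threads of one block, which is why the example launches a single block.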