High Performance Computing 25 SP OpenCL Programming

OpenCL stands for Open Computing Language.

  • An open, royalty-free standard, defined as an extension of the C language.
  • For parallel programming of heterogeneous systems using GPUs, CPUs, the Cell Broadband Engine (CBE), DSPs and other processors, including embedded mobile devices.
  • Managed by the Khronos Group.

Anatomy of OpenCL

  • Platform layer API
  • Runtime API
  • Language specification

Compilation Model

OpenCL uses a dynamic (runtime) compilation model, like OpenGL.

  1. The code is compiled to an intermediate representation (IR).
  2. The IR is compiled to machine code for execution.

In dynamic compilation, step 1 is usually done only once and the IR is stored. The application then loads the IR and performs step 2 at run time.

Execution Model

OpenCL code is organized into

  • Kernel: the basic unit of executable code, similar to a C function.
  • Program: a collection of compute kernels and internal functions.

The host program invokes a kernel over an index space called an NDRange.

NDRange stands for N-Dimensional Range; the index space can have one, two, or three dimensions.

A single kernel instance at a point of this index space is called a work item. Work items are further grouped into work groups.
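
The relation between work items and work groups comes down to simple index arithmetic. The following is a minimal sketch in plain C that models this arithmetic for one dimension; it is not OpenCL API code, and the names `WorkItemId`, `locate_work_item`, and `global_from_group` are hypothetical, chosen only for illustration. It assumes a zero global offset. In a real kernel the same values are available through the built-ins `get_global_id`, `get_group_id`, and `get_local_id`.

```c
#include <assert.h>
#include <stddef.h>

/* Position of one work item along a single NDRange dimension. */
typedef struct {
    size_t global_id; /* index in the whole NDRange         */
    size_t group_id;  /* index of the enclosing work group  */
    size_t local_id;  /* index inside that work group       */
} WorkItemId;

/* Split a global index into (group, local) for a given work-group size.
   Assumes a zero global offset. */
WorkItemId locate_work_item(size_t global_id, size_t local_size)
{
    WorkItemId id;
    assert(local_size > 0);
    id.global_id = global_id;
    id.group_id  = global_id / local_size;
    id.local_id  = global_id % local_size;
    return id;
}

/* Inverse mapping: rebuild the global index from (group, local). */
size_t global_from_group(size_t group_id, size_t local_id, size_t local_size)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

For example, with a work-group size of 4, the work item with global index 10 is local item 2 of work group 2.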

OpenCL Memory Model

OpenCL defines multiple distinct address spaces. Address spaces can be collapsed depending on the device's memory subsystem.

Address spaces:

  • Private: private to a work item.
  • Local: local to a work group.
  • Global: accessible by all work items in all work groups.
  • Constant: a read-only region of global memory.
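
How the four address spaces are typically used together can be sketched with a per-work-group sum. The code below is a sequential plain-C model, not OpenCL C: the comments mark which OpenCL address space each variable would live in, and the names `group_sums`, `LOCAL_SIZE`, and `scale` are hypothetical. In a real kernel the loops over `lid` would run as parallel work items, separated by `barrier(CLK_LOCAL_MEM_FENCE)`.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_SIZE 4 /* work items per work group (assumed, power of two) */

/* Would be __constant: read-only for every work item. */
static const int scale = 2;

/* input and output would be __global buffers, visible to all work groups.
   n must be a multiple of LOCAL_SIZE; output needs n / LOCAL_SIZE slots. */
void group_sums(const int *input, int *output, size_t n)
{
    size_t group, lid, stride;
    assert(n % LOCAL_SIZE == 0);

    for (group = 0; group < n / LOCAL_SIZE; ++group) {
        /* Would be __local: one scratch array shared by a work group. */
        int scratch[LOCAL_SIZE];

        for (lid = 0; lid < LOCAL_SIZE; ++lid) {
            /* Would be __private: each work item's own temporary. */
            int value = input[group * LOCAL_SIZE + lid] * scale;
            scratch[lid] = value;
        }
        /* A barrier would go here in a real kernel. */

        /* Tree reduction in local memory; one barrier per step. */
        for (stride = LOCAL_SIZE / 2; stride > 0; stride /= 2) {
            for (lid = 0; lid < stride; ++lid) {
                scratch[lid] += scratch[lid + stride];
            }
        }

        /* Work item 0 publishes the group's result to global memory. */
        output[group] = scratch[0];
    }
}
```

The design point the sketch illustrates is that intermediate results stay in the work group's local scratch array, and only one value per work group is written back to global memory.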

[Figure: comparison of the OpenCL memory model with CUDA]

[Figure: memory regions as seen from the host and from the kernel]

Programming Model

Data Parallel Programming Model

  1. Define an N-dimensional computation domain.
  2. Group work-items together into work-groups.
  3. Execute multiple work-groups in parallel.
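
These steps imply a small sizing calculation on the host. OpenCL 1.x requires the global size in each dimension to be a multiple of the work-group size, so the global size is commonly rounded up and the kernel skips the padded indices. A minimal sketch in plain C follows; `round_up_global_size` and `num_work_groups` are hypothetical helper names, not OpenCL API functions. The resulting values are what a host would pass as `global_work_size` and `local_work_size` to `clEnqueueNDRangeKernel`.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local_size that covers n elements. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return (n + local_size - 1) / local_size * local_size;
}

/* Number of work groups launched along one dimension. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

For 1000 elements and a work-group size of 64, this gives a global size of 1024 and 16 work groups; the 24 padding work items would return early after comparing their global index with 1000.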

Task Parallel Programming Model

The data-parallel execution model must be implemented by all OpenCL compute devices, whereas support for the task-parallel model is optional and left to the vendor.

Some compute devices, such as CPUs, can also execute task-parallel compute kernels. Such a task:

  • Executes as a single work item.
  • Can be a compute kernel written in OpenCL.
  • Can be a native function.

OpenCL Framework

[Figure: the basic OpenCL program structure]

Contexts are used to contain and manage the state of the "world": the devices, program objects, kernels, and memory objects that the host works with.

A command queue coordinates the execution of kernels.

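Author: Ricardo Ren
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Readers are welcome to repost them without asking permission, but please credit "Ricardo's Blog".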
