OpenCL

Entering the Era of High-Performance Computing

Halo9Pan / @Halo9Pan

Agenda

  • What OpenCL is
  • OpenCL terminology

OpenCL

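  • OpenCL (Open Computing Language)
  • A framework for writing programs that run across heterogeneous platforms
  • A heterogeneous platform may consist of CPUs, GPUs, and other kinds of processors
  • It consists of a C99-based language for writing kernels (functions that run on OpenCL devices) and a set of APIs for defining and controlling the platform
  • OpenCL provides parallel computing based on both data partitioning and task partitioning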
OpenCL

OpenCL Architecture

OpenCL Class Diagram

OpenCL UML Class Diagram

Terminology

Application

The combination of the programs running on the host and on OpenCL devices

Blocking and Non-Blocking Enqueue API calls

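A non-blocking enqueue call places a command on a command-queue and returns to the host immediately, without waiting for the command to finish

A blocking enqueue call does not return to the host until the command has completed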

Barrier
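Command-queue barrier

The OpenCL API provides a command-queue barrier

A barrier command ensures that all commands already enqueued on the command-queue have finished executing before any later command in that queue begins

Work-group barrier

OpenCL C provides a built-in work-group barrier function

The built-in barrier synchronizes the work-items running in a work-group

The barrier is executed by the kernel running on the device

Every work-item in the work-group must reach the barrier before any of them may continue past it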

Buffer Object

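A memory object that stores a linear sequence of bytes

A kernel running on a device accesses a buffer through a pointer

The host manipulates buffers through the OpenCL API

A buffer object encapsulates the following information:

  • Size in bytes
  • Properties that describe usage and the region to allocate from
  • The buffer data
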
Built-in Kernel

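A built-in kernel is a kernel that runs on an OpenCL device or a custom device, implemented by fixed-function hardware or firmware rather than compiled from OpenCL C

An application can query which built-in kernels a device or custom device supports

A program can contain either kernels written in OpenCL C or built-in kernels, but not both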

Command

An OpenCL operation submitted to a command-queue for execution

For example, a command may execute a kernel on a device or manipulate a memory object

Command-queue

An object that holds the commands to be executed on a specific device

A command-queue is created on a specific device within a context

Commands are placed in the queue in order, but they may execute either in-order or out-of-order

Command-queue Barrier

See Barrier

Compute Device Memory

One or more memories attached to the compute device

Compute Unit

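An OpenCL device has one or more compute units

A work-group executes on a single compute unit

A compute unit is composed of one or more processing elements and local memory

A compute unit may also include dedicated texture filter units that its processing elements can access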

Concurrency

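Concurrency lets multiple tasks in a system be active and make progress at the same time, but the programmer must expose and schedule those tasks in code and keep data consistent under concurrent access

OpenCL is built for concurrency, but controlling that concurrency and synchronizing data remain the responsibility of the application code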

Constant Memory

A region of global memory that holds the constants used while a kernel executes

Constant memory is allocated by the host

Context

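The environment within which kernels execute and within which memory management and synchronization are defined

A context includes a set of devices, the memory accessible to those devices, the corresponding memory properties, and one or more command-queues used to schedule kernel execution or operations on memory objects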

Custom Device

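A custom device implements the OpenCL runtime but does not support programs written in OpenCL C

A custom device may not be programmable hardware, but it is often more efficient for the tasks it is built for; a DSP is one example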

Data Parallel Programming Model

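A programming model in which the instructions of a single program are applied concurrently to many elements of a data structure; in OpenCL, to each point of an abstract index space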

Device

A device is a collection of compute units

Commands are queued to a device through a command-queue

GPUs, multi-core CPUs, DSPs, and the Cell/B.E. processor can all be OpenCL devices

Event Object

An event object encapsulates the status of an operation such as a command, and can be used to synchronize operations within a context

Event Wait List

An event wait list is a list of event objects that must complete before a particular command is allowed to begin executing
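
The readiness rule behind an event wait list can be modelled on the host without any OpenCL calls. The sketch below is only an illustration: the demo_ names are hypothetical, and the status values merely mirror the queued, submitted, running, and complete states that OpenCL reports for a command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an event and its execution status. */
typedef enum { DEMO_QUEUED, DEMO_SUBMITTED, DEMO_RUNNING, DEMO_COMPLETE } demo_status;
typedef struct { demo_status status; } demo_event;

/* A command may start only when every event in its wait list is complete.
 * An empty wait list means there is nothing to wait for. */
bool demo_can_start(demo_event *const *wait_list, size_t num_events)
{
    assert(wait_list != NULL || num_events == 0);
    for (size_t i = 0; i < num_events; i++) {
        if (wait_list[i]->status != DEMO_COMPLETE)
            return false;
    }
    return true;
}
```

In the real API the same role is played by the num_events_in_wait_list and event_wait_list parameters that appear on every clEnqueue* call.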

Global ID

A global ID uniquely identifies a work-item across the whole index space of a kernel execution

A global ID is an N-dimensional value that starts at (0, 0, … 0)

Global Memory

A memory region accessible to all work-items executing in a context

The host accesses global memory through commands such as read, write, and map

GL share group

An OpenCL application can use OpenGL buffer, texture, and renderbuffer objects as OpenCL memory objects, which lets OpenCL and OpenGL share data efficiently

OpenCL kernels can read and write these OpenGL memory objects

Handle

A handle is an opaque reference to an object allocated by OpenCL; every operation on an object goes through its handle

Host

The host interacts with the context through the OpenCL API

The host is typically the machine to which one or more devices are attached

Host pointer

A host pointer is a pointer to memory in the host's address space

Image Object

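A memory object that stores a two- or three-dimensional structured array

In a kernel, image data can be accessed only through the built-in read and write functions, not through pointers

Read functions use a sampler

An image object encapsulates the following information:

  • Dimensions of the image
  • Description of each element in the image
  • Properties that describe usage and the region to allocate from
  • The image data
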
In-order Execution

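An OpenCL execution model in which the commands in a command-queue execute in the order they were submitted, each one completing before the next begins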

Kernel

A kernel is a function declared in a program and executed on an OpenCL device

A kernel is identified in the program by the __kernel or kernel qualifier

Kernel Object

A kernel object encapsulates a specific __kernel function declared in a program together with the argument values used when executing it

Local ID

A local ID uniquely identifies a work-item within a given work-group executing a kernel

A local ID is an N-dimensional value that starts at (0, 0, … 0)
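
The relation between global ID, work-group ID, and local ID can be written down directly. The following is a minimal one-dimensional sketch in plain C, assuming a uniform work-group size; the function names are illustrative and are not OpenCL API calls. The same formula applies independently in each dimension of an N-dimensional range.

```c
#include <assert.h>
#include <stddef.h>

/* Global ID of a work-item in one dimension:
 * group_id * local_size + local_id + global_offset. */
size_t global_id_1d(size_t group_id, size_t local_size,
                    size_t local_id, size_t global_offset)
{
    assert(local_id < local_size);
    return group_id * local_size + local_id + global_offset;
}

/* Work-group that a given global ID belongs to. */
size_t group_id_1d(size_t global_id, size_t local_size, size_t global_offset)
{
    assert(local_size > 0 && global_id >= global_offset);
    return (global_id - global_offset) / local_size;
}

/* Position of a given global ID inside its work-group. */
size_t local_id_1d(size_t global_id, size_t local_size, size_t global_offset)
{
    assert(local_size > 0 && global_id >= global_offset);
    return (global_id - global_offset) % local_size;
}
```

Inside a kernel the corresponding values are returned by the OpenCL C built-ins get_global_id, get_group_id, and get_local_id.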

Local Memory

A memory region associated with a work-group and accessible only by the work-items in that work-group

Marker

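A command that tags all commands enqueued before it in the command-queue

The marker command returns an event that the application can wait on

For example, the application can wait until every command queued before the marker has completed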

Memory Objects

A handle to a reference-counted region of global memory

See also: Local Memory, Global Memory, Buffer Object, Image Object

Memory Regions (or Pools)

A distinct address space in OpenCL

Memory regions may overlap in physical memory, but OpenCL treats them as logically distinct

See also: Local Memory, Global Memory

Out-of-Order Execution

An OpenCL execution model in which the commands in a command-queue may execute in any order

The execution order is controlled through command-queue barriers and event wait lists

Parent device

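An OpenCL device can be partitioned to create sub-devices; the device that is partitioned is the parent device of those sub-devices

Not every parent device is a root device. The sub-devices created from a root device can be partitioned further. In that case the first-level sub-devices are the parent devices of the second-level sub-devices, but they are not root devices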
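
The parent and root relationship is a simple tree. The sketch below models it with a hypothetical demo_device struct whose parent pointer is NULL for a root device; it does not call OpenCL. In the real API the parent of a device can be queried with clGetDeviceInfo and CL_DEVICE_PARENT_DEVICE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a device: parent is NULL for a root device. */
typedef struct demo_device {
    const struct demo_device *parent;
} demo_device;

/* A root device is one that was not created by partitioning another device. */
int demo_is_root(const demo_device *dev)
{
    assert(dev != NULL);
    return dev->parent == NULL;
}

/* Walk up the parent chain until the root device is reached. */
const demo_device *demo_root_of(const demo_device *dev)
{
    assert(dev != NULL);
    while (dev->parent != NULL)
        dev = dev->parent;
    return dev;
}
```

With a root, a first-level sub-device, and a second-level sub-device, the first-level sub-device is a parent but not a root, which is exactly the case described above.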

Platform

The host plus a collection of devices managed by OpenCL, which lets an application share resources and execute kernels on the devices

Private Memory

A memory region private to a work-item

Data in one work-item's private memory cannot be accessed by any other work-item

Processing Element

A virtual scalar processor

A work-item may execute on one or more processing elements

Program

An OpenCL program consists of a set of kernels

A program may also contain auxiliary functions called by the __kernel functions, and constant data

Program Object

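A program object encapsulates the following information:

  • A reference to the associated context
  • The program source or binary
  • The build options, the build log, and the devices the program was built for
  • The number of kernel objects currently attached
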
Reference Count

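The lifetime of an OpenCL object is determined by its reference count, an internal count of the number of references to the object

When an object is created in OpenCL, its reference count is set to 1. The retain APIs (clRetainContext, clRetainCommandQueue...) increment the reference count, and the release APIs (clReleaseContext, clReleaseCommandQueue...) decrement it

Once the reference count reaches 0, OpenCL reclaims the object's resources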
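
The retain and release life cycle can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not how any OpenCL implementation stores its objects: demo_object and the demo_ functions are hypothetical, and demo_live_objects exists only so the effect of the final release can be observed.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of hypothetical objects that have not been reclaimed yet. */
static int demo_live_objects = 0;

typedef struct demo_object {
    unsigned ref_count;
} demo_object;

/* Creation sets the reference count to 1. */
demo_object *demo_create(void)
{
    demo_object *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->ref_count = 1;
    demo_live_objects++;
    return obj;
}

/* Retain increments the count. */
void demo_retain(demo_object *obj)
{
    assert(obj != NULL);
    obj->ref_count++;
}

/* Release decrements the count and reclaims the object at 0. */
void demo_release(demo_object *obj)
{
    assert(obj != NULL && obj->ref_count > 0);
    if (--obj->ref_count == 0) {
        demo_live_objects--;
        free(obj);
    }
}
```

Every retain must be balanced by one release, plus one release for the reference handed out at creation.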

Relaxed Consistency

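A memory consistency model in which the memory contents visible to different work-items or commands may differ, except at a barrier or other synchronization point, where they are made consistent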

Resource

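In the usual sense, a resource is a context, command-queue, program, kernel, or memory object

Computational resources are the hardware elements that carry out the computation, such as the host, devices, compute units, and processing elements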

Retain, Release

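Incrementing (retain) or decrementing (release) the reference count of an OpenCL object

The retain and release mechanism ensures that the system does not reclaim an object before all operations that use it have completed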

Root device

A root device is an OpenCL device that was not created by partitioning another device, so it has no parent device

Sampler

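When an image object is read in a kernel, a sampler describes how it is sampled

The image read functions take a sampler as an argument

A sampler specifies the image addressing mode (for example, how coordinates outside the image are handled), the filter mode, and whether the image coordinates are normalized or unnormalized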
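
To make the three sampler settings concrete, here is a hedged host-side sketch of how a one-dimensional coordinate could be turned into a pixel index for nearest filtering with clamp-to-edge addressing. The function name is hypothetical and the code is a simplified model, not the OpenCL implementation of read_imagef.

```c
#include <assert.h>

/* Map a coordinate to a pixel index in [0, width - 1].
 * normalized != 0: coord is in [0, 1) across the image and is scaled by width.
 * normalized == 0: coord is already expressed in pixels.
 * Out-of-range coordinates are clamped to the nearest edge pixel. */
int sample_index_clamp(float coord, int normalized, int width)
{
    assert(width > 0);
    float u = normalized ? coord * (float)width : coord;
    int i = (int)u;          /* truncates toward zero */
    if ((float)i > u)        /* turn truncation into floor for negatives */
        i--;
    if (i < 0)
        i = 0;
    if (i > width - 1)
        i = width - 1;
    return i;
}
```

For an 8-pixel row, a normalized coordinate of 0.5 selects pixel 4, and any coordinate past the right edge selects pixel 7.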

SIMD: Single Instruction Multiple Data

A kernel executes concurrently on multiple processing elements, each with its own data and all sharing one program counter

All processing elements execute a strictly identical sequence of instructions

SPMD: Single Program Multiple Data

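A kernel executes concurrently on multiple processing elements, each with its own data and its own program counter

Hence, although all computational resources run the same kernel, each maintains its own program counter, and because of branches in the kernel the actual sequence of instructions can differ considerably across the set of processing elements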

Task Parallel Programming Model

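A computation is expressed as multiple concurrent tasks, each task being a kernel that runs in a work-group containing a single work-item

The concurrent tasks can run different kernels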

Thread-safe

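An OpenCL API call is considered thread-safe if the internal state managed by OpenCL remains consistent when the call is made simultaneously by multiple host threads

Thread-safe OpenCL API calls can be made from multiple host threads of an application without the application having to coordinate those threads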

Work-group

A collection of work-items that execute on the same compute unit

The work-items in a work-group execute the same kernel and share local memory and work-group barriers

Work-group Barrier

See Barrier

Work-item

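One of a collection of parallel executions of a kernel, invoked on a device by a command

A work-group executes on a single compute unit, while a work-item executes on one or more processing elements

Each work-item has a unique global ID and a local ID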

Architecture

OpenCL Architecture

  • Platform (hardware) model
  • Execution model
  • Memory model
  • Programming model

Platform (hardware) model

OpenCL Platform Model

Execution model

NDRange

NDRange index space

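Context and Command Queues

A context contains the following:
  • Devices: the OpenCL devices used by the host
  • Kernels: the OpenCL functions that run on OpenCL devices
  • Program Objects: the OpenCL C source and binaries that implement the kernels
  • Memory Objects

Types of commands in a command queue:
  • Kernel execution commands: execute a kernel on the processing elements of a device
  • Memory commands: transfer data between memory objects, or map memory objects into the host address space
  • Synchronization commands: constrain the order in which commands execute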

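Memory model

          Global              Constant            Local               Private
Host      Dynamic allocation  Dynamic allocation  Dynamic allocation  No allocation
          Read/write access   Read/write access   No access           No access
Kernel    No allocation       Static allocation   Static allocation   Static allocation
          Read/write access   Read-only access    Read/write access   Read/write access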

Memory architecture

Conceptual OpenCL device architecture with processing elements (PE), compute units and devices

Programming model

  • Data Parallel Programming Model
  • Task Parallel Programming Model

Platform API

Querying Platform Info

clGetPlatformIDs

cl_int clGetPlatformIDs (cl_uint num_entries,
                         cl_platform_id *platforms,
                         cl_uint *num_platforms)
						
clGetPlatformInfo

cl_int clGetPlatformInfo (cl_platform_id platform,
                          cl_platform_info param_name,
                          size_t param_value_size,
                          void *param_value,
                          size_t *param_value_size_ret)
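
The param_value_size, param_value, and param_value_size_ret triple is normally used in two calls: first ask for the size, then fetch the value. The sketch below shows that idiom against mock_get_info, a hypothetical stand-in that only imitates the parameter shape of clGetPlatformInfo so that the example runs without an OpenCL installation; the returned string and the error code -1 are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a clGet*Info call (not an OpenCL function). */
int mock_get_info(size_t param_value_size, void *param_value,
                  size_t *param_value_size_ret)
{
    static const char name[] = "Mock Platform";
    if (param_value != NULL) {
        if (param_value_size < sizeof name)
            return -1;                      /* caller's buffer is too small */
        memcpy(param_value, name, sizeof name);
    }
    if (param_value_size_ret != NULL)
        *param_value_size_ret = sizeof name; /* size including the final '\0' */
    return 0;
}

/* Two-call idiom: query the size, allocate, then query the value.
 * The caller owns the returned string and must free it. */
char *query_string(void)
{
    size_t size = 0;
    if (mock_get_info(0, NULL, &size) != 0)
        return NULL;
    assert(size > 0);
    char *value = malloc(size);
    if (value == NULL)
        return NULL;
    if (mock_get_info(size, value, NULL) != 0) {
        free(value);
        return NULL;
    }
    return value;
}
```

The same pattern applies to the other query functions in this deck that take these three parameters, such as clGetDeviceInfo and clGetProgramBuildInfo.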
						

Querying Devices

clGetDeviceIDs

cl_int clGetDeviceIDs (cl_platform_id platform,
                       cl_device_type device_type,
                       cl_uint num_entries,
                       cl_device_id *devices,
                       cl_uint *num_devices)
						
clGetDeviceInfo

cl_int clGetDeviceInfo (cl_device_id device,
                        cl_device_info param_name,
                        size_t param_value_size,
                        void *param_value,
                        size_t *param_value_size_ret)
						

Partitioning a Device

clCreateSubDevices

cl_int clCreateSubDevices (cl_device_id in_device,
                           const cl_device_partition_property *properties,
                           cl_uint num_devices,
                           cl_device_id *out_devices,
                           cl_uint *num_devices_ret)
						
clRetainDevice

cl_int clRetainDevice (cl_device_id device)
						
clReleaseDevice

cl_int clReleaseDevice (cl_device_id device)
						

Contexts

clCreateContext

cl_context
clCreateContext (const cl_context_properties *properties,
                 cl_uint num_devices,
                 const cl_device_id *devices,
                 void (CL_CALLBACK *pfn_notify)(const char *errinfo,
                                                const void *private_info, size_t cb,
                                                void *user_data),
                 void *user_data,
                 cl_int *errcode_ret)
						
clGetContextInfo

cl_int clGetContextInfo (cl_context context,
                         cl_context_info param_name,
                         size_t param_value_size,
                         void *param_value,
                         size_t *param_value_size_ret)
						

Contexts

clCreateContextFromType

cl_context
clCreateContextFromType (const cl_context_properties *properties,
                         cl_device_type device_type,
                         void (CL_CALLBACK *pfn_notify)(const char *errinfo,
                                                        const void *private_info, size_t cb,
                                                        void *user_data),
                         void *user_data,
                         cl_int *errcode_ret)
						
clRetainContext

cl_int clRetainContext (cl_context context)
						
clReleaseContext

cl_int clReleaseContext (cl_context context)
						

Runtime API

Command Queues

Command Queues

clCreateCommandQueue

cl_command_queue clCreateCommandQueue (cl_context context,
                                       cl_device_id device,
                                       cl_command_queue_properties properties,
                                       cl_int *errcode_ret)
						
clRetainCommandQueue

cl_int clRetainCommandQueue (cl_command_queue command_queue)
						
clReleaseCommandQueue

cl_int clReleaseCommandQueue (cl_command_queue command_queue)
						
clGetCommandQueueInfo

cl_int clGetCommandQueueInfo (cl_command_queue command_queue,
                              cl_command_queue_info param_name,
                              size_t param_value_size,
                              void *param_value,
                              size_t *param_value_size_ret)
						

Buffer

Buffer Objects

clCreateBuffer

cl_mem clCreateBuffer (cl_context context,
                       cl_mem_flags flags,
                       size_t size,
                       void *host_ptr,
                       cl_int *errcode_ret)
						
clCreateSubBuffer

cl_mem clCreateSubBuffer (cl_mem buffer,
                          cl_mem_flags flags,
                          cl_buffer_create_type buffer_create_type,
                          const void *buffer_create_info,
                          cl_int *errcode_ret)
						

Buffer Objects

clEnqueueReadBuffer

cl_int clEnqueueReadBuffer (cl_command_queue command_queue,
                            cl_mem buffer,
                            cl_bool blocking_read,
                            size_t offset,
                            size_t size,
                            void *ptr,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)
						

Buffer Objects

clEnqueueWriteBuffer

cl_int clEnqueueWriteBuffer (cl_command_queue command_queue,
                             cl_mem buffer,
                             cl_bool blocking_write,
                             size_t offset,
                             size_t size,
                             const void *ptr,
                             cl_uint num_events_in_wait_list,
                             const cl_event *event_wait_list,
                             cl_event *event)
						

Buffer Objects

clEnqueueReadBufferRect

Enqueues a command that reads a 2D or 3D rectangular region from a buffer object into host memory


cl_int clEnqueueReadBufferRect (cl_command_queue command_queue,
                                cl_mem buffer,
                                cl_bool blocking_read,
                                const size_t *buffer_origin,
                                const size_t *host_origin,
                                const size_t *region,
                                size_t buffer_row_pitch,
                                size_t buffer_slice_pitch,
                                size_t host_row_pitch,
                                size_t host_slice_pitch,
                                void *ptr,
                                cl_uint num_events_in_wait_list,
                                const cl_event *event_wait_list,
                                cl_event *event)
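
The origin, region, and pitch parameters are easier to follow with a host-side model. The sketch below imitates the addressing of a rectangular read in plain C on ordinary byte arrays: read_rect and rect_offset are hypothetical helpers, not OpenCL functions. It assumes region[0] is a width in bytes, region[1] a height in rows, and region[2] a depth in slices, and that a pitch of 0 means tightly packed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte offset of an origin: z * slice_pitch + y * row_pitch + x. */
size_t rect_offset(const size_t origin[3], size_t row_pitch, size_t slice_pitch)
{
    return origin[2] * slice_pitch + origin[1] * row_pitch + origin[0];
}

/* Copy a rectangular region from `buffer` to `host`, row by row. */
void read_rect(const unsigned char *buffer, unsigned char *host,
               const size_t buffer_origin[3], const size_t host_origin[3],
               const size_t region[3],
               size_t buffer_row_pitch, size_t buffer_slice_pitch,
               size_t host_row_pitch, size_t host_slice_pitch)
{
    assert(buffer != NULL && host != NULL);
    /* A pitch of 0 is replaced by the tightly packed value. */
    if (buffer_row_pitch == 0)   buffer_row_pitch = region[0];
    if (buffer_slice_pitch == 0) buffer_slice_pitch = region[1] * buffer_row_pitch;
    if (host_row_pitch == 0)     host_row_pitch = region[0];
    if (host_slice_pitch == 0)   host_slice_pitch = region[1] * host_row_pitch;

    size_t src0 = rect_offset(buffer_origin, buffer_row_pitch, buffer_slice_pitch);
    size_t dst0 = rect_offset(host_origin, host_row_pitch, host_slice_pitch);

    for (size_t z = 0; z < region[2]; z++) {
        for (size_t y = 0; y < region[1]; y++) {
            memcpy(host + dst0 + z * host_slice_pitch + y * host_row_pitch,
                   buffer + src0 + z * buffer_slice_pitch + y * buffer_row_pitch,
                   region[0]);
        }
    }
}
```

Reading a 2x2 region at origin (1, 1, 0) from a 4x4 byte grid with a row pitch of 4 picks the bytes at offsets 5, 6, 9, and 10.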
						

Buffer Objects

clEnqueueWriteBufferRect

Enqueues a command that writes a 2D or 3D rectangular region from host memory to a buffer object


cl_int clEnqueueWriteBufferRect (cl_command_queue command_queue,
                                 cl_mem buffer,
                                 cl_bool blocking_write,
                                 const size_t *buffer_origin,
                                 const size_t *host_origin,
                                 const size_t *region,
                                 size_t buffer_row_pitch,
                                 size_t buffer_slice_pitch,
                                 size_t host_row_pitch,
                                 size_t host_slice_pitch,
                                 const void *ptr,
                                 cl_uint num_events_in_wait_list,
                                 const cl_event *event_wait_list,
                                 cl_event *event)
						

Buffer Objects

clEnqueueCopyBuffer

Enqueues a command that copies the buffer object identified by src_buffer to the buffer object identified by dst_buffer


cl_int clEnqueueCopyBuffer (cl_command_queue command_queue,
                            cl_mem src_buffer,
                            cl_mem dst_buffer,
                            size_t src_offset,
                            size_t dst_offset,
                            size_t size,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)
						

Buffer Objects

clEnqueueCopyBufferRect

Enqueues a command that copies a 2D or 3D rectangular region from the buffer object identified by src_buffer to the buffer object identified by dst_buffer


cl_int clEnqueueCopyBufferRect (cl_command_queue command_queue,
                                cl_mem src_buffer,
                                cl_mem dst_buffer,
                                const size_t *src_origin,
                                const size_t *dst_origin,
                                const size_t *region,
                                size_t src_row_pitch,
                                size_t src_slice_pitch,
                                size_t dst_row_pitch,
                                size_t dst_slice_pitch,
                                cl_uint num_events_in_wait_list,
                                const cl_event *event_wait_list,
                                cl_event *event)
						

Buffer Objects

clEnqueueFillBuffer

Enqueues a command that fills a buffer object with the pattern pattern, which is pattern_size bytes long


cl_int clEnqueueFillBuffer (cl_command_queue command_queue,
                            cl_mem buffer,
                            const void *pattern,
                            size_t pattern_size,
                            size_t offset,
                            size_t size,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)
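
What the fill command does to the buffer can be modelled on the host. The following sketch repeats a pattern over a byte range; fill_pattern is a hypothetical helper and the -1 error code is made up. It assumes, as the OpenCL specification requires for clEnqueueFillBuffer, that offset and size are multiples of pattern_size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Repeat `pattern` over buffer[offset .. offset + size).
 * Returns 0 on success, -1 if the arguments are not acceptable. */
int fill_pattern(unsigned char *buffer, size_t buffer_size,
                 const void *pattern, size_t pattern_size,
                 size_t offset, size_t size)
{
    assert(buffer != NULL);
    if (pattern == NULL || pattern_size == 0)
        return -1;
    if (offset % pattern_size != 0 || size % pattern_size != 0)
        return -1;
    if (offset > buffer_size || size > buffer_size - offset)
        return -1;
    for (size_t i = 0; i < size; i += pattern_size)
        memcpy(buffer + offset + i, pattern, pattern_size);
    return 0;
}
```

Filling 4 bytes at offset 2 with a 2-byte pattern writes the pattern twice and leaves the bytes on either side untouched.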
						

Buffer Objects

clEnqueueMapBuffer

Enqueues a command that maps a region of the buffer object identified by buffer into the host address space and returns a pointer to the mapped region


void * clEnqueueMapBuffer (cl_command_queue command_queue,
                           cl_mem buffer,
                           cl_bool blocking_map,
                           cl_map_flags map_flags,
                           size_t offset,
                           size_t size,
                           cl_uint num_events_in_wait_list,
                           const cl_event *event_wait_list,
                           cl_event *event,
                           cl_int *errcode_ret)
						

Image

Image Objects

Image format descriptor


typedef struct _cl_image_format {
        cl_channel_order  image_channel_order;
        cl_channel_type   image_channel_data_type;
} cl_image_format;
						

Image descriptor


typedef struct _cl_image_desc {
        cl_mem_object_type  image_type;
        size_t              image_width;
        size_t              image_height;
        size_t              image_depth;
        size_t              image_array_size;
        size_t              image_row_pitch;
        size_t              image_slice_pitch;
        cl_uint             num_mip_levels;
        cl_uint             num_samples;
        cl_mem              buffer;
} cl_image_desc;
						

Image Objects

clCreateImage

Creates a 1D image, 1D image buffer, 1D image array, 2D image, 2D image array, or 3D image object


cl_mem clCreateImage (cl_context context,
                      cl_mem_flags flags,
                      const cl_image_format *image_format,
                      const cl_image_desc *image_desc,
                      void *host_ptr,
                      cl_int *errcode_ret)
						
clGetSupportedImageFormats

Gets the image formats supported by the OpenCL implementation


cl_int clGetSupportedImageFormats (cl_context context,
                                   cl_mem_flags flags,
                                   cl_mem_object_type image_type,
                                   cl_uint num_entries,
                                   cl_image_format *image_formats,
                                   cl_uint *num_image_formats)
						

Image Objects

clEnqueueReadImage

Enqueues a command that reads from an image or image array object into host memory


cl_int clEnqueueReadImage (cl_command_queue command_queue,
                           cl_mem image,
                           cl_bool blocking_read,
                           const size_t *origin,
                           const size_t *region,
                           size_t row_pitch,
                           size_t slice_pitch,
                           void *ptr,
                           cl_uint num_events_in_wait_list,
                           const cl_event *event_wait_list,
                           cl_event *event)
						

Image Objects

clEnqueueWriteImage

Enqueues a command that writes data from host memory to an image or image array object


cl_int clEnqueueWriteImage (cl_command_queue command_queue,
                            cl_mem image,
                            cl_bool blocking_write,
                            const size_t *origin,
                            const size_t *region,
                            size_t input_row_pitch,
                            size_t input_slice_pitch,
                            const void * ptr,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)
						

Image Objects

clEnqueueCopyImage

Enqueues a command that copies image objects; src_image and dst_image can be 1D, 2D, or 3D images, or 1D or 2D image arrays


cl_int clEnqueueCopyImage (cl_command_queue command_queue,
                           cl_mem src_image,
                           cl_mem dst_image,
                           const size_t *src_origin,
                           const size_t *dst_origin,
                           const size_t *region,
                           cl_uint num_events_in_wait_list,
                           const cl_event *event_wait_list,
                           cl_event *event)
						

Image Objects

clEnqueueFillImage

Enqueues a command that fills an image object with the color given by fill_color


cl_int clEnqueueFillImage (cl_command_queue command_queue,
                           cl_mem image,
                           const void *fill_color,
                           const size_t *origin,
                           const size_t *region,
                           cl_uint num_events_in_wait_list,
                           const cl_event *event_wait_list,
                           cl_event *event)
						

Image Objects

clEnqueueCopyImageToBuffer

Enqueues a command that copies the image object src_image to the buffer object dst_buffer


cl_int clEnqueueCopyImageToBuffer (cl_command_queue command_queue,
                                   cl_mem src_image,
                                   cl_mem dst_buffer,
                                   const size_t *src_origin,
                                   const size_t *region,
                                   size_t dst_offset,
                                   cl_uint num_events_in_wait_list,
                                   const cl_event *event_wait_list,
                                   cl_event *event)
						

Image Objects

clEnqueueCopyBufferToImage

Enqueues a command that copies the buffer object src_buffer to the image object dst_image


cl_int clEnqueueCopyBufferToImage (cl_command_queue command_queue,
                                   cl_mem src_buffer,
                                   cl_mem dst_image,
                                   size_t src_offset,
                                   const size_t *dst_origin,
                                   const size_t *region,
                                   cl_uint num_events_in_wait_list,
                                   const cl_event *event_wait_list,
                                   cl_event *event)
						

Image Objects

clEnqueueMapImage

Enqueues a command that maps a region of the image object image into the host address space and returns a pointer to the mapped region


void * clEnqueueMapImage (cl_command_queue command_queue,
                          cl_mem image,
                          cl_bool blocking_map,
                          cl_map_flags map_flags,
                          const size_t *origin,
                          const size_t *region,
                          size_t *image_row_pitch,
                          size_t *image_slice_pitch,
                          cl_uint num_events_in_wait_list,
                          const cl_event *event_wait_list,
                          cl_event *event,
                          cl_int *errcode_ret)
						

Image Objects

clGetImageInfo

Gets information about the image object image


cl_int clGetImageInfo (cl_mem image,
                       cl_image_info param_name,
                       size_t param_value_size,
                       void *param_value,
                       size_t *param_value_size_ret)
						

Memory

Memory Objects

clRetainMemObject

cl_int clRetainMemObject (cl_mem memobj)
						
clReleaseMemObject

cl_int clReleaseMemObject (cl_mem memobj)
						
clSetMemObjectDestructorCallback

Registers a callback function that is called when the memory object is destroyed


cl_int clSetMemObjectDestructorCallback (cl_mem memobj,
                                         void (CL_CALLBACK *pfn_notify)(cl_mem memobj,
                                                                        void *user_data),
                                         void *user_data)
						

Memory Objects

clEnqueueUnmapMemObject

Enqueues a command that unmaps a previously mapped region of a memory object


cl_int clEnqueueUnmapMemObject (cl_command_queue command_queue,
                                cl_mem memobj,
                                void *mapped_ptr,
                                cl_uint num_events_in_wait_list,
                                const cl_event *event_wait_list,
                                cl_event *event)
						
clEnqueueMigrateMemObjects

Enqueues a command that migrates a set of memory objects to the device associated with command_queue


cl_int clEnqueueMigrateMemObjects (cl_command_queue command_queue,
                                   cl_uint num_mem_objects,
                                   const cl_mem *mem_objects,
                                   cl_mem_migration_flags flags,
                                   cl_uint num_events_in_wait_list,
                                   const cl_event *event_wait_list,
                                   cl_event *event)
						

Memory Objects

clGetMemObjectInfo

Gets information about the memory object memobj


cl_int clGetMemObjectInfo (cl_mem memobj,
                           cl_mem_info param_name,
                           size_t param_value_size,
                           void *param_value,
                           size_t *param_value_size_ret)
						

Sampler

Sampler Objects

clCreateSampler

Creates a sampler object (cl_sampler)


cl_sampler clCreateSampler (cl_context context,
                            cl_bool normalized_coords,
                            cl_addressing_mode addressing_mode,
                            cl_filter_mode filter_mode,
                            cl_int *errcode_ret)
						
clRetainSampler

cl_int clRetainSampler (cl_sampler sampler)
						
clReleaseSampler

cl_int clReleaseSampler (cl_sampler sampler)
						

Sampler Objects

clGetSamplerInfo

Gets information about the sampler object sampler


cl_int clGetSamplerInfo (cl_sampler sampler,
                         cl_sampler_info param_name,
                         size_t param_value_size,
                         void *param_value,
                         size_t *param_value_size_ret)
						

Program

Program Objects

clCreateProgramWithSource

Creates a program object from the source text in strings


cl_program clCreateProgramWithSource (cl_context context,
                                      cl_uint count,
                                      const char **strings,
                                      const size_t *lengths,
                                      cl_int *errcode_ret)
						
clCreateProgramWithBinary

Creates a program object from the binary contents in binaries


cl_program clCreateProgramWithBinary (cl_context context,
                                      cl_uint num_devices,
                                      const cl_device_id *device_list,
                                      const size_t *lengths,
                                      const unsigned char **binaries,
                                      cl_int *binary_status,
                                      cl_int *errcode_ret)
						

Program Objects

clCreateProgramWithBuiltInKernels

Creates a program object from the built-in kernels named in kernel_names


cl_program clCreateProgramWithBuiltInKernels (cl_context context,
                                              cl_uint num_devices,
                                              const cl_device_id *device_list,
                                              const char *kernel_names,
                                              cl_int *errcode_ret)
						
clRetainProgram

cl_int clRetainProgram (cl_program program)
						
clReleaseProgram

cl_int clReleaseProgram (cl_program program)
						

Program Objects

clBuildProgram

Builds a program object; a build performs the compile and link steps together in a single call


cl_int clBuildProgram (cl_program program,
                       cl_uint num_devices,
                       const cl_device_id *device_list,
                       const char *options,
                       void (CL_CALLBACK *pfn_notify)(cl_program program,
                                                      void *user_data),
                       void *user_data)
						

Program Objects

clCompileProgram

Compiles a program object


cl_int clCompileProgram (cl_program program,
                         cl_uint num_devices,
                         const cl_device_id *device_list,
                         const char *options,
                         cl_uint num_input_headers,
                         const cl_program *input_headers,
                         const char **header_include_names,
                         void (CL_CALLBACK *pfn_notify)(cl_program program,
                                                        void *user_data),
                         void *user_data)
						

Program Objects

clLinkProgram

Links compiled program objects into a new program object


cl_program clLinkProgram (cl_context context,
                          cl_uint num_devices,
                          const cl_device_id *device_list,
                          const char *options,
                          cl_uint num_input_programs,
                          const cl_program *input_programs,
                          void (CL_CALLBACK *pfn_notify)(cl_program program,
                                                         void *user_data),
                          void *user_data,
                          cl_int *errcode_ret)
						

Compiler Options

Preprocessor options
  • -D name
  • -D name=definition
  • -I dir
Math intrinsics options
  • -cl-single-precision-constant
  • -cl-denorms-are-zero
  • -cl-fp32-correctly-rounded-divide-sqrt
Optimization options
  • -cl-opt-disable
  • -cl-mad-enable
  • -cl-no-signed-zeros
  • -cl-unsafe-math-optimizations
  • -cl-finite-math-only
  • -cl-fast-relaxed-math

Compiler Options

Options to request or suppress warnings
  • -w
  • -Werror
OpenCL C version
  • -cl-std=
Kernel argument information
  • -cl-kernel-arg-info
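
These options are passed as one space-separated string in the options parameter of clBuildProgram or clCompileProgram. The sketch below assembles such a string with snprintf; make_build_options and the TILE_SIZE macro are hypothetical, and the particular combination of flags is only an example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Build an options string such as
 *   "-D TILE_SIZE=16 -cl-std=CL1.2 -cl-fast-relaxed-math -Werror"
 * Returns 0 on success, -1 if the output buffer is too small. */
int make_build_options(char *out, size_t out_size, int tile_size, int fast_math)
{
    assert(out != NULL && out_size > 0);
    int n = snprintf(out, out_size, "-D TILE_SIZE=%d -cl-std=CL1.2%s -Werror",
                     tile_size, fast_math ? " -cl-fast-relaxed-math" : "");
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Checking the snprintf result matters here, because a silently truncated option string would change how the program is compiled.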

Linker Options

Library linking options
  • -create-library
  • -enable-link-options
Program linking options
  • -cl-denorms-are-zero
  • -cl-no-signed-zeroes
  • -cl-unsafe-math-optimizations
  • -cl-finite-math-only
  • -cl-fast-relaxed-math

Program Objects

clUnloadPlatformCompiler

Allows the implementation to release the resources used for compiling program objects on the platform


cl_int clUnloadPlatformCompiler (cl_platform_id platform)
						
clGetProgramInfo

Gets information about a program object


cl_int clGetProgramInfo (cl_program program,
                         cl_program_info param_name,
                         size_t param_value_size,
                         void *param_value,
                         size_t *param_value_size_ret)
						

Program Objects

clGetProgramBuildInfo

Gets build information for a program object


cl_int clGetProgramBuildInfo (cl_program program,
                              cl_device_id device,
                              cl_program_build_info param_name,
                              size_t param_value_size,
                              void *param_value,
                              size_t *param_value_size_ret)
						

Kernel

Kernel Objects

clCreateKernel

Creates a kernel object


cl_kernel clCreateKernel (cl_program program,
                          const char *kernel_name,
                          cl_int *errcode_ret)
						
clCreateKernelsInProgram

Creates kernel objects for all the kernels in program


cl_int clCreateKernelsInProgram (cl_program program,
                                 cl_uint num_kernels,
                                 cl_kernel *kernels,
                                 cl_uint *num_kernels_ret)
						
clRetainKernel

cl_int clRetainKernel (cl_kernel kernel)
						
clReleaseKernel

cl_int clReleaseKernel (cl_kernel kernel)
						

Kernel Objects

clSetKernelArg

Sets the value of a kernel argument


cl_int clSetKernelArg (cl_kernel kernel,
                       cl_uint arg_index,
                       size_t arg_size,
                       const void *arg_value)
						
clGetKernelArgInfo

Gets information about a kernel argument


cl_int clGetKernelArgInfo (cl_kernel kernel,
                           cl_uint arg_indx,
                           cl_kernel_arg_info param_name,
                           size_t param_value_size,
                           void *param_value,
                           size_t *param_value_size_ret)
						

Kernel Objects

clGetKernelInfo

Gets information about a kernel object


cl_int clGetKernelInfo (cl_kernel kernel,
                        cl_kernel_info param_name,
                        size_t param_value_size,
                        void *param_value,
                        size_t *param_value_size_ret)
						
clGetKernelWorkGroupInfo

Gets work-group information about a kernel object for a given device


cl_int clGetKernelWorkGroupInfo (cl_kernel kernel,
                                 cl_device_id device,
                                 cl_kernel_work_group_info param_name,
                                 size_t param_value_size,
                                 void *param_value,
                                 size_t *param_value_size_ret)
						

Executing

Executing Kernels

clEnqueueNDRangeKernel

Enqueues a command to execute a kernel on a device (Data Parallel Programming Model)


cl_int clEnqueueNDRangeKernel (cl_command_queue command_queue,
                               cl_kernel kernel,
                               cl_uint work_dim,
                               const size_t *global_work_offset,
                               const size_t *global_work_size,
                               const size_t *local_work_size,
                               cl_uint num_events_in_wait_list,
                               const cl_event *event_wait_list,
                               cl_event *event)
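
When local_work_size is passed explicitly, OpenCL 1.x requires each global_work_size entry to be evenly divisible by the matching local_work_size entry; otherwise the call fails with CL_INVALID_WORK_GROUP_SIZE. A common workaround is to round the global size up and let the kernel skip the extra work-items. The helper below is a minimal sketch of that rounding for one dimension; the name `round_up_global_size` is hypothetical and not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API call).
 * n:          number of elements the kernel has to process
 * local_size: the local_work_size chosen for this dimension
 * Returns the smallest multiple of local_size that is >= n,
 * suitable as the global_work_size entry for that dimension. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    size_t remainder = n % local_size;
    return remainder == 0 ? n : n + (local_size - remainder);
}
```

With a padded global size, the kernel is assumed to receive the real element count as an argument and to return early when get_global_id(0) is not below it.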
						

Executing Kernels

clEnqueueTask

Enqueues a command to execute a kernel on a device using a single work-item (Task Parallel Programming Model)


cl_int clEnqueueTask (cl_command_queue command_queue,
                      cl_kernel kernel,
                      cl_uint num_events_in_wait_list,
                      const cl_event *event_wait_list,
                      cl_event *event)
						

Executing Kernels

clEnqueueNativeKernel

Enqueues a command to execute a native kernel (a native C/C++ function not compiled by the OpenCL compiler)


cl_int clEnqueueNativeKernel (cl_command_queue command_queue,
                              void (CL_CALLBACK *user_func)(void *),
                              void *args,
                              size_t cb_args,
                              cl_uint num_mem_objects,
                              const cl_mem *mem_list,
                              const void **args_mem_loc,
                              cl_uint num_events_in_wait_list,
                              const cl_event *event_wait_list,
                              cl_event *event)
						

Event

Event Objects

Kernel execution commands (each can return an event object)

clEnqueueNDRangeKernel

clEnqueueTask

clEnqueueNativeKernel

Event Objects

Memory object commands (each can return an event object)

clEnqueueReadBuffer

clEnqueueWriteBuffer

clEnqueueMapBuffer

clEnqueueUnmapMemObject

clEnqueueReadBufferRect

clEnqueueWriteBufferRect

clEnqueueReadImage

clEnqueueWriteImage

clEnqueueMapImage

clEnqueueCopyBuffer

clEnqueueCopyImage

clEnqueueCopyBufferRect

clEnqueueCopyBufferToImage

clEnqueueCopyImageToBuffer

clEnqueueMarkerWithWaitList

clEnqueueBarrierWithWaitList

Event Objects

clCreateUserEvent

cl_event clCreateUserEvent (cl_context context, cl_int *errcode_ret)
						
clSetUserEventStatus

cl_int clSetUserEventStatus (cl_event event, cl_int execution_status)
						
clWaitForEvents

cl_int clWaitForEvents (cl_uint num_events, const cl_event *event_list)
						
clGetEventInfo

cl_int clGetEventInfo (cl_event event,
                       cl_event_info param_name,
                       size_t param_value_size,
                       void *param_value,
                       size_t *param_value_size_ret)
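
Querying CL_EVENT_COMMAND_EXECUTION_STATUS yields CL_QUEUED, CL_SUBMITTED, CL_RUNNING, CL_COMPLETE, or a negative error code. The sketch below shows how a host program might interpret that value. The `STATUS_*` constants are local stand-ins assumed to mirror the values cl.h assigns to the corresponding `CL_*` names, and both function names are hypothetical.

```c
#include <assert.h>

/* Local stand-ins for the cl.h values of CL_COMPLETE, CL_RUNNING,
 * CL_SUBMITTED and CL_QUEUED, so the sketch compiles without OpenCL. */
enum {
    STATUS_COMPLETE  = 0,
    STATUS_RUNNING   = 1,
    STATUS_SUBMITTED = 2,
    STATUS_QUEUED    = 3
};

/* Hypothetical helper: readable name for an execution status. */
const char *exec_status_name(int status)
{
    switch (status) {
    case STATUS_QUEUED:    return "CL_QUEUED";
    case STATUS_SUBMITTED: return "CL_SUBMITTED";
    case STATUS_RUNNING:   return "CL_RUNNING";
    case STATUS_COMPLETE:  return "CL_COMPLETE";
    default:               return status < 0 ? "error" : "unknown";
    }
}

/* Hypothetical helper: nonzero once the command has completed or
 * terminated abnormally (negative status), zero while it is pending. */
int is_finished(int status)
{
    assert(status <= STATUS_QUEUED); /* larger values are not defined */
    return status <= STATUS_COMPLETE;
}
```

A negative status counts as finished here because a command that terminated with an error will not make further progress.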
						

Event Objects

clSetEventCallback

cl_int clSetEventCallback (cl_event event,
                           cl_int command_exec_callback_type,
                           void (CL_CALLBACK *pfn_event_notify)(cl_event event,
                                                                cl_int event_command_exec_status,
                                                                void *user_data),
                           void *user_data)
						
clRetainEvent

cl_int clRetainEvent (cl_event event)
						
clReleaseEvent

cl_int clReleaseEvent (cl_event event)
						

Markers, Barriers and Waiting for Events

Markers

clEnqueueMarkerWithWaitList

cl_int clEnqueueMarkerWithWaitList (cl_command_queue command_queue,
                                    cl_uint num_events_in_wait_list,
                                    const cl_event *event_wait_list,
                                    cl_event *event)
						

Barriers

clEnqueueBarrierWithWaitList

cl_int clEnqueueBarrierWithWaitList (cl_command_queue command_queue,
                                     cl_uint num_events_in_wait_list,
                                     const cl_event *event_wait_list,
                                     cl_event *event)
						

Out-of-order Execution of Kernels and Memory Object Commands

Profiling Operations on Memory Objects and Kernels

Profiling

clGetEventProfilingInfo

cl_int clGetEventProfilingInfo (cl_event event,
                                cl_profiling_info param_name,
                                size_t param_value_size,
                                void *param_value,
                                size_t *param_value_size_ret)
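
CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END each return a cl_ulong device time counter in nanoseconds, provided the command-queue was created with CL_QUEUE_PROFILING_ENABLE. The sketch below turns two such readings into milliseconds. `elapsed_ms` is a hypothetical helper, and `uint64_t` stands in for cl_ulong so the code compiles without the OpenCL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenCL API call).
 * start_ns: value read with CL_PROFILING_COMMAND_START
 * end_ns:   value read with CL_PROFILING_COMMAND_END
 * Returns the command's execution time in milliseconds. */
double elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    return (double)(end_ns - start_ns) / 1.0e6;
}
```

The readings are only meaningful once the command has reached CL_COMPLETE, so the host would normally call clWaitForEvents on the event before querying them.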
						

Flush and Finish

Flush

clFlush

cl_int clFlush (cl_command_queue command_queue)
						

Finish

clFinish

cl_int clFinish (cl_command_queue command_queue)
						

OpenCL C

Querying Platform Info

clGetPlatformIDs

cl_int clGetPlatformIDs (cl_uint num_entries,
                         cl_platform_id *platforms,
                         cl_uint *num_platforms)
						
clGetPlatformInfo

cl_int clGetPlatformInfo (cl_platform_id platform,
                          cl_platform_info param_name,
                          size_t param_value_size,
                          void *param_value,
                          size_t *param_value_size_ret)
						

The End

Halo9Pan