Common/GPU

-iree-codegen-expand-gpu-ops

Expands high-level GPU ops, such as clustered gpu.subgroup_reduce.

-iree-codegen-gpu-apply-tiling-level

Pass to tile tensor ops based on tiling configs

Options

-tiling-level      : Tiling level to tile. Supported levels are 'reduction' and 'thread'
-allow-zero-slices : Allow pad fusion to generate zero-size slices

-iree-codegen-gpu-bubble-resource-casts

Bubbles iree_gpu.buffer_resource_cast ops upwards.

-iree-codegen-gpu-check-resource-usage

Checks GPU-specific resource usage constraints, such as shared memory limits

-iree-codegen-gpu-combine-value-barriers

Combines iree_gpu.value_barrier ops

-iree-codegen-gpu-create-fast-slow-path

Create separate fast and slow paths to handle padding

-iree-codegen-gpu-decompose-horizontally-fused-gemms

Decomposes a horizontally fused GEMM back into its constituent GEMMs

-iree-codegen-gpu-distribute

Pass to distribute scf.forall ops using upstream patterns.

-iree-codegen-gpu-distribute-copy-using-forall

Pass to distribute copies to threads.

-iree-codegen-gpu-distribute-forall

Pass to distribute scf.forall ops.

-iree-codegen-gpu-distribute-scf-for

Distribute tiled loop nests to invocations

Options

-use-block-dims : Use gpu.block_dim ops to query distribution sizes.
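
As a rough illustration of what distributing a loop to invocations means, the sketch below uses a cyclic scheme: each invocation starts at its own offset and strides by the total invocation count. The function name and the cyclic scheme are assumptions made for illustration, not a description of what this pass emits.

```python
def distribute_cyclic(lower, upper, step, thread_id, num_threads):
    """Return the iterations of range(lower, upper, step) owned by one thread.

    Hypothetical helper: models a cyclic distribution, which is only one of
    several possible ways to map loop iterations onto GPU invocations.
    """
    # Each thread starts at its own offset and strides by the thread count.
    return list(range(lower + thread_id * step, upper, step * num_threads))


# Hypothetical loop: 10 iterations shared by 4 threads.
assignments = [distribute_cyclic(0, 10, 1, t, 4) for t in range(4)]
```

Every iteration is owned by exactly one thread, so the union of the per-thread lists covers the original loop with no overlap.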

-iree-codegen-gpu-distribute-shared-memory-copy

Pass to distribute shared memory copies to threads.

-iree-codegen-gpu-fuse-and-hoist-parallel-loops

Greedily fuses and hoists parallel loops.

-iree-codegen-gpu-generalize-named-ops

Convert named Linalg ops to linalg.generic ops

-iree-codegen-gpu-greedily-distribute-to-threads

Greedily distributes all remaining tilable ops to threads

-iree-codegen-gpu-infer-memory-space

Pass to infer and set the memory space for all alloc_tensor ops.

-iree-codegen-gpu-lower-to-ukernels

Lower suitable ops to previously-selected microkernels

-iree-codegen-gpu-multi-buffering

Pass to do multi buffering.

Options

-num-buffers : Number of buffers to use.
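
The general idea behind multi-buffering can be sketched as rotating loop iterations through a fixed number of buffer slots, so that consecutive iterations do not write the same storage. The function below is a hypothetical model of that rotation, assuming a simple modulo assignment; it does not reflect how the pass rewrites allocations.

```python
def multi_buffered_schedule(num_iterations, num_buffers):
    """Map each loop iteration to the buffer slot it would use.

    Hypothetical model: iteration i uses slot i % num_buffers.
    """
    return [i % num_buffers for i in range(num_iterations)]


# With three buffers, iterations cycle through slots 0, 1, 2.
slots = multi_buffered_schedule(5, 3)
```

With at least two buffers, neighbouring iterations always land in different slots, which is what allows a copy for the next iteration to overlap with compute on the current one.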

-iree-codegen-gpu-pack-to-intrinsics

Packs matmul-like operations and converts them to iree_gpu.multi_mma

-iree-codegen-gpu-pad-operands

Pass to pad operands of ops with padding configuration provided.

-iree-codegen-gpu-pipelining

Pass to do software pipelining.

Options

-epilogue-peeling    : Try to use an unpeeled epilogue when false, a peeled epilogue otherwise.
-pipeline-depth      : Number of stages
-schedule-index      : Allows picking a different schedule for the pipelining transformation.
-transform-file-name : Optional filename containing a transform dialect specification to apply. If left empty, the IR is assumed to contain one top-level transform dialect operation somewhere in the module.
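
Software pipelining in general overlaps the load for a later iteration with the compute of an earlier one. The sketch below models that ordering for a loop with separate load and compute steps. The function, the event names, and the reading of "depth" as the number of loads in flight are all assumptions for illustration; the actual schedules chosen by the pass may differ.

```python
def pipelined_order(num_iterations, depth):
    """Return the order of (kind, iteration) events in a pipelined loop.

    Hypothetical model: loads run depth - 1 iterations ahead of compute.
    """
    events = []
    # Prologue: issue the first depth - 1 loads before any compute.
    for i in range(min(depth - 1, num_iterations)):
        events.append(("load", i))
    # Steady state: load ahead, then compute the current iteration.
    for i in range(num_iterations):
        ahead = i + depth - 1
        if ahead < num_iterations:
            events.append(("load", ahead))
        events.append(("compute", i))
    return events


order = pipelined_order(4, 2)
```

In this model each iteration is still loaded before it is computed; only the interleaving changes.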

-iree-codegen-gpu-promote-matmul-operands

Pass to insert copies with a different thread configuration on matmul operands

-iree-codegen-gpu-reduce-bank-conflicts

Pass to try to reduce the number of bank conflicts by padding memref.alloc ops.

Options

-padding-bits : Padding size (in bits) to introduce between rows.
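
To see why padding between rows can help, the sketch below counts how many rows of a single column map to the same shared memory bank, with and without padding. The bank count (32), the bank width (32 bits), and the function name are assumptions for illustration; real hardware parameters and the pass's own heuristics may differ.

```python
from collections import Counter

NUM_BANKS = 32        # assumed number of shared memory banks
BANK_WIDTH_BITS = 32  # assumed width of one bank word


def max_bank_conflict(row_bits, padding_bits, num_rows):
    """Worst-case number of rows in column 0 that hit the same bank."""
    stride_bits = row_bits + padding_bits
    banks = Counter(
        ((row * stride_bits) // BANK_WIDTH_BITS) % NUM_BANKS
        for row in range(num_rows)
    )
    return max(banks.values())


# A 32x32 tile of 32-bit elements, read down one column.
unpadded = max_bank_conflict(32 * 32, 0, 32)
padded = max_bank_conflict(32 * 32, 32, 32)
```

Under these assumptions the unpadded tile places every row of the column in the same bank, while one word of padding per row spreads the rows across all banks.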

-iree-codegen-gpu-reuse-shared-memory-allocs

Pass to reuse shared memory allocations with no overlapping liveness.

-iree-codegen-gpu-tensor-alloc

Pass to create allocations for some tensor values to use GPU shared memory

-iree-codegen-gpu-tensor-tile

Pass to tile tensor (linalg) ops within a GPU workgroup

Options

-distribute-to-subgroup : Distribute the workloads to subgroups if true, otherwise distribute to threads.

-iree-codegen-gpu-tensor-tile-to-serial-loops

Pass to tile reduction dimensions for certain GPU ops

Options

-coalesce-loops : Collapse the generated loops into a single loop
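
Collapsing a loop nest into a single loop generally means iterating over one linear index and recovering the original indices by division and remainder. The sketch below shows that equivalence for a two-deep nest; the function names are hypothetical and the code only models the idea, not the IR the pass produces.

```python
def nested_indices(m, n):
    """Indices visited by a two-deep loop nest with bounds m and n."""
    return [(i, j) for i in range(m) for j in range(n)]


def coalesced_indices(m, n):
    """Same indices, visited by a single loop over m * n linear positions."""
    visited = []
    for linear in range(m * n):
        # Delinearize: quotient is the outer index, remainder the inner one.
        i, j = divmod(linear, n)
        visited.append((i, j))
    return visited


same_order = nested_indices(3, 4) == coalesced_indices(3, 4)
```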

-iree-codegen-gpu-tile

Tile Linalg ops with tensor semantics to invocations

-iree-codegen-gpu-tile-reduction

Pass to tile linalg reduction dimensions.

-iree-codegen-gpu-vector-alloc

Pass to create allocations for contraction inputs to copy to GPU shared memory

-iree-codegen-gpu-verify-distribution

Pass to verify writes before resolving distributed contexts.

-iree-codegen-reorder-workgroups

Reorder workgroup ids for better cache reuse

Options

-strategy : Workgroup reordering strategy, one of: '' (none), 'transpose'
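
One way a transposed traversal of a 2-D workgroup grid can be expressed is to linearize the original (x, y) id in row-major order and then delinearize it using the y count, so that consecutive workgroups walk down a column instead of along a row. The function below is a hypothetical sketch of that remapping; whether the 'transpose' strategy computes ids exactly this way is an assumption.

```python
def transpose_workgroup_id(id_x, id_y, count_x, count_y):
    """Remap a workgroup id so that traversal order is transposed.

    Hypothetical helper: linearize row-major, then split by count_y.
    """
    linear = id_y * count_x + id_x
    new_x, new_y = divmod(linear, count_y)
    return new_x, new_y


# Remap every workgroup of a hypothetical 4 x 2 grid.
remapped = [
    transpose_workgroup_id(x, y, 4, 2) for y in range(2) for x in range(4)
]
```

The remapping is a bijection on the grid, so every workgroup still runs exactly once; only the association between hardware ids and tiles changes.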

-iree-codegen-vector-reduction-to-gpu

Convert vector reduction to GPU ops.