Common/GPU
-iree-codegen-expand-gpu-ops
Expands high-level GPU ops, such as clustered gpu.subgroup_reduce.
-iree-codegen-gpu-apply-tiling-level
Pass to tile tensor ops based on tiling configs
Options
-tiling-level : Tiling level to tile. Supported levels are 'reduction' and 'thread'
-allow-zero-slices : Allow pad fusion to generate zero size slices
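
As a rough illustration of the 'reduction' tiling level above: tiling the reduction dimension of a matmul turns it into a serial scf.for loop over K-slices that carries the accumulator. This is a hand-written sketch, not literal pass output; the shapes, the tile size of 8, and the %lhs/%rhs/%acc values (and the index constants %c0/%c8/%c128) are made up for illustration.

    // Before: untiled matmul on tensors.
    %0 = linalg.matmul ins(%lhs, %rhs : tensor<32x128xf32>, tensor<128x32xf32>)
                       outs(%acc : tensor<32x32xf32>) -> tensor<32x32xf32>

    // After (conceptually): the K dimension is tiled by 8 into a serial loop.
    %tiled = scf.for %k = %c0 to %c128 step %c8
        iter_args(%arg = %acc) -> (tensor<32x32xf32>) {
      %lhs_slice = tensor.extract_slice %lhs[0, %k] [32, 8] [1, 1]
          : tensor<32x128xf32> to tensor<32x8xf32>
      %rhs_slice = tensor.extract_slice %rhs[%k, 0] [8, 32] [1, 1]
          : tensor<128x32xf32> to tensor<8x32xf32>
      %partial = linalg.matmul
          ins(%lhs_slice, %rhs_slice : tensor<32x8xf32>, tensor<8x32xf32>)
          outs(%arg : tensor<32x32xf32>) -> tensor<32x32xf32>
      scf.yield %partial : tensor<32x32xf32>
    }
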
-iree-codegen-gpu-bubble-resource-casts
Bubbles iree_gpu.buffer_resource_cast ops upwards.
-iree-codegen-gpu-check-resource-usage
Checks GPU specific resource usage constraints like shared memory limits
-iree-codegen-gpu-combine-value-barriers
Combines iree_gpu.value_barrier ops
-iree-codegen-gpu-create-fast-slow-path
Create separate fast and slow paths to handle padding
-iree-codegen-gpu-decompose-horizontally-fused-gemms
Decomposes a horizontally fused GEMM back into its constituent GEMMs
-iree-codegen-gpu-distribute
Pass to distribute scf.forall ops using upstream patterns.
-iree-codegen-gpu-distribute-copy-using-forall
Pass to distribute copies to threads.
-iree-codegen-gpu-distribute-forall
Pass to distribute scf.forall ops.
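
The scf.forall distribution passes above expect the loop to carry a thread mapping attribute; distribution conceptually removes the loop and replaces its induction variable with the thread id. A hand-written sketch of the idea (not literal pass output; the trip count of 64 and the loop body are placeholders):

    // Before: a forall marked for distribution to threads along x.
    scf.forall (%i) in (64) {
      // ... per-thread work indexed by %i ...
    } {mapping = [#gpu.thread<x>]}

    // After (conceptually): the loop is gone and %i becomes the thread id.
    %tid_x = gpu.thread_id x
    // ... per-thread work indexed by %tid_x ...
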
-iree-codegen-gpu-distribute-scf-for
Distribute tiled loop nests to invocations
Options
-use-block-dims : Use gpu.block_dim ops to query distribution sizes.
-iree-codegen-gpu-distribute-shared-memory-copy
Pass to distribute shared memory copies to threads.
-iree-codegen-gpu-fuse-and-hoist-parallel-loops
Greedily fuses and hoists parallel loops.
-iree-codegen-gpu-generalize-named-ops
Convert named Linalg ops to linalg.generic ops
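
For example, generalizing linalg.matmul yields the equivalent linalg.generic with the contraction's indexing maps and iterator types spelled out (the shapes and %A/%B/%C values here are illustrative):

    // Before: named op.
    %0 = linalg.matmul ins(%A, %B : tensor<4x8xf32>, tensor<8x16xf32>)
                       outs(%C : tensor<4x16xf32>) -> tensor<4x16xf32>

    // After: equivalent linalg.generic.
    %0 = linalg.generic {
        indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                         affine_map<(d0, d1, d2) -> (d2, d1)>,
                         affine_map<(d0, d1, d2) -> (d0, d1)>],
        iterator_types = ["parallel", "parallel", "reduction"]}
        ins(%A, %B : tensor<4x8xf32>, tensor<8x16xf32>)
        outs(%C : tensor<4x16xf32>) {
    ^bb0(%a: f32, %b: f32, %acc: f32):
      %mul = arith.mulf %a, %b : f32
      %add = arith.addf %acc, %mul : f32
      linalg.yield %add : f32
    } -> tensor<4x16xf32>
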
-iree-codegen-gpu-greedily-distribute-to-threads
Greedily distributes all remaining tilable ops to threads
-iree-codegen-gpu-infer-memory-space
Pass to infer and set the memory space for all alloc_tensor ops.
-iree-codegen-gpu-lower-to-ukernels
Lower suitable ops to previously-selected microkernels
-iree-codegen-gpu-multi-buffering
Pass to do multi buffering.
Options
-num-buffers : Number of buffers to use.
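
Multi-buffering conceptually adds a leading dimension of size num-buffers to an allocation that is reused across loop iterations, so one buffer can be filled while another is still being read. A hand-written sketch with num-buffers=2 (not literal pass output; %iter stands for the loop induction variable and %c2 for an index constant):

    // Before: one workgroup buffer reused by every loop iteration.
    %buf = memref.alloc() : memref<64x64xf32, #gpu.address_space<workgroup>>

    // After (conceptually): a leading buffer dimension of size 2, with each
    // iteration addressing the slot selected by (iteration mod 2).
    %buf2 = memref.alloc() : memref<2x64x64xf32, #gpu.address_space<workgroup>>
    %slot = arith.remui %iter, %c2 : index
    %view = memref.subview %buf2[%slot, 0, 0] [1, 64, 64] [1, 1, 1]
        : memref<2x64x64xf32, #gpu.address_space<workgroup>>
        to memref<64x64xf32, strided<[64, 1], offset: ?>, #gpu.address_space<workgroup>>
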
-iree-codegen-gpu-pack-to-intrinsics
Packs matmul-like operations and converts them to iree_gpu.multi_mma
-iree-codegen-gpu-pad-operands
Pass to pad operands of ops that have a padding configuration provided.
-iree-codegen-gpu-pipelining
Pass to do software pipelining.
Options
-epilogue-peeling : Try to use an unpeeled epilogue when false, a peeled epilogue otherwise.
-pipeline-depth : Number of stages
-schedule-index : Allows picking a different schedule for the pipelining transformation.
-transform-file-name : Optional filename containing a transform dialect specification to apply. If left empty, the IR is assumed to contain one top-level transform dialect operation somewhere in the module.
-iree-codegen-gpu-promote-matmul-operands
Pass to insert copies of matmul operands with a different thread configuration
-iree-codegen-gpu-reduce-bank-conflicts
Pass to try to reduce the number of bank conflicts by padding memref.alloc ops.
Options
-padding-bits : Padding size (in bits) to introduce between rows.
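
Conceptually, the pass widens the innermost dimension of a shared-memory allocation so consecutive rows start at different banks, then restores the original shape with a subview. A hand-written sketch assuming padding-bits=64 on an f32 buffer, i.e. two extra elements per row (not literal pass output):

    // Before: a row stride of 128 f32 keeps a whole column in the same bank.
    %a = memref.alloc() : memref<32x128xf32, #gpu.address_space<workgroup>>

    // After (conceptually): each row is padded by 2 elements and users see the
    // original 32x128 shape through a subview.
    %padded = memref.alloc() : memref<32x130xf32, #gpu.address_space<workgroup>>
    %a = memref.subview %padded[0, 0] [32, 128] [1, 1]
        : memref<32x130xf32, #gpu.address_space<workgroup>>
        to memref<32x128xf32, strided<[130, 1]>, #gpu.address_space<workgroup>>
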
-iree-codegen-gpu-reuse-shared-memory-allocs
Pass to reuse shared memory allocations with no overlapping liveness.
-iree-codegen-gpu-tensor-alloc
Pass to create allocations for some tensor values to use GPU shared memory
-iree-codegen-gpu-tensor-tile
Pass to tile tensor (linalg) ops within a GPU workgroup
Options
-distribute-to-subgroup : Distribute the workloads to subgroups if true, otherwise distribute to threads.
-iree-codegen-gpu-tensor-tile-to-serial-loops
Pass to tile reduction dimensions for certain GPU ops
Options
-coalesce-loops : Collapse the generated loops into a single loop
-iree-codegen-gpu-tile
Tile Linalg ops with tensor semantics to invocations
-iree-codegen-gpu-tile-reduction
Pass to tile linalg reduction dimensions.
-iree-codegen-gpu-vector-alloc
Pass to create allocations for contraction inputs to copy to GPU shared memory
-iree-codegen-gpu-verify-distribution
Pass to verify writes before resolving distributed contexts.
-iree-codegen-reorder-workgroups
Reorder workgroup ids for better cache reuse
Options
-strategy : Workgroup reordering strategy, one of: '' (none), 'transpose'
-iree-codegen-vector-reduction-to-gpu
Convert vector reduction to GPU ops.
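
The core pattern behind such a lowering is a butterfly reduction across a subgroup built from gpu.shuffle xor: each step combines a lane's partial value with the value of the lane whose id differs in one bit. A hand-written sketch for a 32-wide subgroup summing one f32 (not literal pass output; %x is the per-lane input):

    %width = arith.constant 32 : i32
    %c16 = arith.constant 16 : i32
    %c8 = arith.constant 8 : i32
    %r0, %valid0 = gpu.shuffle xor %x, %c16, %width : f32
    %s0 = arith.addf %x, %r0 : f32
    %r1, %valid1 = gpu.shuffle xor %s0, %c8, %width : f32
    %s1 = arith.addf %s0, %r1 : f32
    // ... continue with offsets 4, 2, 1; the final sum is uniform across lanes ...
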