Common
-iree-codegen-add-fast-math-flags
Add fast math flags to all the operations supporting them, given a floating-point mode.
-iree-codegen-block-dynamic-dimensions
Expand dynamic dimensions that are known to be multiples of statically known values.
-iree-codegen-bubble-up-ordinal-ops
Bubbles up ordinal ops to allow for workgroup count computation
Pass to bubble up ordinal operations to allow workgroup count computation based on slices to correlate back to workload computation.
-iree-codegen-bufferize-copy-only-dispatches
Bufferize dispatches that copy to/from interfaces to convert to a linalg.copy op
Pass to bufferize dispatches that are copying from one interface to
another. This will create a linalg.generic op which is a copy that can
then be used by backends to handle appropriately.
-iree-codegen-bufferize-dispatch-tensor-load-store
Bufferize the iree_tensor_ext.dispatch.tensor.load/store ops at dispatch boundaries
Pass to bufferize the edges of dispatch regions, converting iree_tensor_ext.dispatch.tensor.load ops to iree_codegen.load_from_memref, and iree_tensor_ext.dispatch.tensor.store ops to iree_codegen.store_to_memref.
-iree-codegen-canonicalize-scf-for
Adhoc canonicalization of selected loop-carried values/dependencies for scf.for ops
-iree-codegen-cleanup-buffer-alloc-view
Performs cleanups over HAL interface/buffer allocation/view operations
-iree-codegen-concretize-pad-result-shape
Concretizes tensor.pad op's result shape if its source op implements OffsetSizeAndStrideOpInterface.
-iree-codegen-config-tracking-canonicalize
Codegen specific canonicalization pass that tracks lowering configs
Options
-test-convergence : Fails if the patterns fail to converge
-iree-codegen-convert-bf16-to-uint16-buffers
Convert BF16 buffer ops and conversions to simulated behavior with uint16.
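To make the idea of storing bf16 values in 16-bit unsigned integers concrete, here is a minimal Python sketch. It is not the pass's implementation: the helper names are hypothetical, and the conversion truncates the f32 mantissa rather than rounding, which is a simplifying assumption.

```python
import struct


def f32_to_bf16_bits(value: float) -> int:
    """Return the bf16 bit pattern of `value` as an unsigned 16-bit integer.

    Assumption: plain truncation of the low 16 bits of the f32 encoding
    (no rounding), chosen only to keep the sketch short.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    return bits >> 16


def bf16_bits_to_f32(bits: int) -> float:
    """Reinterpret a 16-bit pattern as bf16 and widen it to f32."""
    (value,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return value


# The buffer only ever holds integers; bf16 meaning is applied on load.
buffer = [f32_to_bf16_bits(x) for x in (1.0, -2.0)]
decoded = [bf16_bits_to_f32(b) for b in buffer]
```

The point of the sketch is that the storage type never needs to be a floating-point type: the same 16 bits can be kept in an integer buffer and reinterpreted at the use site.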
-iree-codegen-convert-hal-descriptor-type-to-gpu-address-space
Convert #hal.descriptor_type to #gpu.address_space
-iree-codegen-convert-to-destination-passing-style
Transforms the code to make the dispatch use destination-passing style
Converts entry point function within dispatch regions to use destination-passing style, which is better suited for the upstream comprehensive bufferization pass.
Options
-convert-inputs-to-destinations         : Controls whether to adjust consumers to convert one of its inputs to a destination
-use-war-for-cooperative-matrix-codegen : Workaround (WAR) for failure in cooperative matrix codegen pipelines. See #10648.
-iree-codegen-convolution-to-igemm
Transforms convolution operations into an implicit GEMM format.
-iree-codegen-decompose-affine-ops
Decompose affine.apply operations into sub affine.apply
Decompose affine.apply operations into sub affine.apply where each
sub expression references values that are defined in the same loop scope.
The subexpressions are then stitched back together following the loop
nest order.
The goal of this pass is to break down affine.apply expressions such
that the resulting subexpressions can be hoisted out of their respective
loops.
For example, suppose we have
%res = affine.apply
         affine_map<()[s0, s1, s2] -> (s0 * 1024 + s1 * 32 + s2)>()
           [%loopVariant, %inv1, %inv2]
%inv1 and %inv2 are loop invariant and %loopVariant is not.
This will produce the following subexpressions:
// Loop invariant computations first.
%inv1x32 =
  affine.apply affine_map<()[s0] -> (s0 * 32)>()[%inv1]
%inv1x32_plus_inv2 =
  affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()[%inv1x32, %inv2]
// Loop variant computation next.
%loopVariantx1024 =
  affine.apply affine_map<()[s0] -> (s0 * 1024)>()[%loopVariant]
// Compose things back together.
%res =
  affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()
    [%loopVariantx1024, %inv1x32_plus_inv2]
%inv1x32_plus_inv2 can be hoisted out of the loop.
This pass requires scf.for structures to still be present; otherwise
the breakdown is meaningless.
Note: The decomposition performed by this pass will be undone by
canonicalization. Make sure to lower the resulting ops before that.
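The arithmetic behind the decomposition can be checked with a small Python sketch. Plain integers stand in for affine.apply results and an ordinary Python loop stands in for scf.for; the function names are hypothetical and nothing here reflects the pass's actual implementation.

```python
def original(loop_variant: int, inv1: int, inv2: int) -> int:
    """s0 * 1024 + s1 * 32 + s2, evaluated as one expression."""
    return loop_variant * 1024 + inv1 * 32 + inv2


def decomposed(trip_count: int, inv1: int, inv2: int) -> list[int]:
    """Evaluate the decomposed form for every iteration of a loop."""
    # Loop-invariant subexpression: computed once, outside the loop.
    inv1x32_plus_inv2 = inv1 * 32 + inv2
    results = []
    for loop_variant in range(trip_count):
        # Loop-variant subexpression: stays inside the loop.
        loop_variant_x1024 = loop_variant * 1024
        # Stitch the pieces back together.
        results.append(loop_variant_x1024 + inv1x32_plus_inv2)
    return results


values = decomposed(4, inv1=3, inv2=5)
```

Both forms produce the same values for every iteration; the decomposed form simply moves one multiply and one add out of the loop body.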
-iree-codegen-decompose-boundary-pack-unpack-ops
Wrapper for DecomposePackUnPackOpsPass to decompose ops at function boundaries
Options
-tile-outer-to-one : Always apply tiling to make outer dimension be ones
-iree-codegen-decompose-convolution-to-lower-dim-ops
Decomposes linalg convolution ops to lower dim ops
-iree-codegen-decompose-linalg-generic
Decomposes linalg generic ops into individual ops
It is sometimes advantageous to operate on generic ops which contain at most one non-yield body operation. This is most often the case when needing to materialize individual ops (which some backends require). Note that this is often an extreme pessimization unless it is part of a lowering flow which was designed for it.
Operates on tensor based linalg ops.
-iree-codegen-decompose-memrefs
Decomposes memrefs
-iree-codegen-decompose-pack-unpack-ops
Decompose pack/unpack ops into vectorizable ops
Options
-tile-outer-to-one : Always apply tiling to make outer dimension be ones
-use-only-reshapes : Use decomposition into reshape ops, even when packing unit dimensions.
-iree-codegen-decompose-softmax
Decomposes softmax op into a sequence of linalg ops
Options
-use-fusion : Whether to use the internal pass fusion logic for the exp function. See #15862.
-iree-codegen-drop-vector-unit-dims
Pass to drop vector unit dims.
-iree-codegen-emulate-narrow-type
Emulate narrow integer operations using wide integer operations
A pass to emulate memref load operations that use narrow integer types with equivalent operations on supported wide integer types.
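As a rough illustration of what emulating a narrow load with a wide load involves, the following Python sketch reads unsigned 4-bit values out of byte storage. The helper names and the low-nibble-first packing order are assumptions made for the example, not a description of the layout the pass uses.

```python
def pack_i4(values):
    """Pack unsigned 4-bit values two per byte (assumed: low nibble first)."""
    assert len(values) % 2 == 0 and all(0 <= v < 16 for v in values)
    return bytes(
        values[i] | (values[i + 1] << 4) for i in range(0, len(values), 2)
    )


def load_i4(packed, index):
    """Emulated narrow load: wide load, then shift and mask."""
    byte = packed[index // 2]   # load the containing 8-bit element
    shift = (index % 2) * 4     # bit offset of the requested nibble
    return (byte >> shift) & 0xF


packed = pack_i4([1, 2, 15, 0])
unpacked = [load_i4(packed, i) for i in range(4)]
```

Each narrow access becomes an index computation on the wide element plus a shift and a mask, which is the general shape of the rewrite.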
-iree-codegen-erase-dead-alloc-and-stores
Erase alloc ops if all the uses are just stores
-iree-codegen-erase-hal-descriptor-type-from-memref
Erase #hal.descriptor_type from MemRef memory space
-iree-codegen-expand-strided-metadata
Resolve memref.extract_strided_metadata operations
Options
-allow-subview-expansion : Enables expansion of memref.subview ops
-allow-unresolved        : Allow unresolved strided metadata op (for testing)
-iree-codegen-extract-address-computation
Extract address computations from memory accesses
Extract the address computation from the instructions with memory accesses such that these memory accesses use only a base pointer.
For instance,
memref.load %base[%off0, ...]
will be rewritten into:
%new_base = memref.subview %base[%off0,...][1,...][1,...]
memref.load %new_base[%c0,...]
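The equivalence of the two forms can be sketched in Python with a toy strided view over a flat list. `StridedView` is a hypothetical stand-in for a memref, introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class StridedView:
    """Toy stand-in for a strided memref over a flat buffer."""
    data: list
    offset: int
    strides: tuple

    def load(self, *indices):
        linear = self.offset + sum(i * s for i, s in zip(indices, self.strides))
        return self.data[linear]

    def subview(self, *offsets):
        # Fold the offsets into the base; unit sizes and strides are implied.
        shift = sum(o * s for o, s in zip(offsets, self.strides))
        return StridedView(self.data, self.offset + shift, self.strides)


base = StridedView(list(range(12)), offset=0, strides=(4, 1))  # 3x4, row-major
direct = base.load(2, 1)            # load with the address math in the access
new_base = base.subview(2, 1)       # address math extracted into the view
via_subview = new_base.load(0, 0)   # the access itself uses only zeros
```

After the rewrite, the load carries no address arithmetic of its own, so the extracted computation can be optimized (for example hoisted or shared) independently of the access.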
-iree-codegen-flatten-memref-subspan
Flatten n-D MemRef subspan ops to 1-D ones and fold byte offsets
Flattens n-D MemRef subspan ops to 1-D MemRef and folds the byte offsets on subspan ops to the consumer load/store ops, in preparation for lowering to backends that require linearized access.
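The index arithmetic implied by flattening can be sketched as follows. This assumes a row-major layout and a byte offset that is a multiple of the element size; both are assumptions of the example, and the function names are hypothetical.

```python
def linearize(indices, shape):
    """Row-major linear index of `indices` within an array of `shape`."""
    linear = 0
    for index, dim in zip(indices, shape):
        linear = linear * dim + index
    return linear


def flattened_index(indices, shape, byte_offset, element_bytes):
    """Index into the 1-D view once the subspan byte offset is folded in."""
    # Assumption: the byte offset is aligned to the element size.
    assert byte_offset % element_bytes == 0
    return byte_offset // element_bytes + linearize(indices, shape)


# Element [2, 1] of a 3x4 array of 4-byte elements that starts 64 bytes
# into the underlying buffer.
index = flattened_index((2, 1), (3, 4), byte_offset=64, element_bytes=4)
```

The n-D indices and the subspan offset collapse into a single linear index on the consumer load or store, which is what backends with linearized access expect.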
-iree-codegen-fold-affinemin-in-distributed-loops
Fold affine.min ops in distributed loops
-iree-codegen-fold-tensor-extract-op
Fold tensor.extract operations prior to lowering to LLVM
After running the upstream TensorConstantBufferize pass, remove tensor_loads introduced for use only in tensor_extract. These can be folded to use a load of the created memref object that holds the constant values.
-iree-codegen-fuse-tensor-pad-with-consumer
Fuse tensor.pad op into its consumer op's tiled loop nest
-iree-codegen-generic-vectorization
Pass to perform vectorization on tensor/linalg ops.
Options
-enable-vector-masking        : Enable vector masking during vectorization.
-use-configured-vector-sizes  : Control whether the op lowering config represents a set of masked vector sizes
-vectorize-copies             : Enable vectorization of linalg.copy operations.
-vectorize-padding            : Rewrite all tensor.pad ops in the function to vector form.
-vectorize-gather-accesses    : Enable vectorization of operations that may generate vector.gather operations.
-vectorize-to-transfer-gather : Enables vectorization of gather-like operations that may generate iree_vector_ext.transfer_gather
-enable-cleanup               : Enable cleanups after vectorization. The patterns touch the structure generated from tiling, so this affects later steps like bufferization and vector hoisting.
-generate-contract            : Enable conversion for reduction ops to contraction ops.
-fold-cast-into-contract      : Enable folding casting ops into vector.contract.
-max-vector-size              : Max vector size allowed to avoid creating large vectors.
-iree-codegen-hoist-statically-bound-allocations
Hoist statically bound alloca ops to the entry block of functions
Options
-vscale-min : Minimum possible value of vscale.
-vscale-max : Maximum possible value of vscale (a value of zero means unbounded).
-iree-codegen-hoist-vector-extract-insert-slice
Hoist unrolled vector (extract, insert) pairs out of scf.for op
-iree-codegen-instrument-memory-accesses
Instruments memory reads and writes for address tracking when dispatch instrumentation is enabled.
-iree-codegen-iree-bufferize-constants
Convert from arith.constant on tensors to buffers
-iree-codegen-iree-comprehensive-bufferize
Convert from Linalg ops on tensors to buffers
Options
-test-analysis-only : Only runs inplaceability analysis (for testing purposes only)
-print-conflicts    : Annotates IR with RaW conflicts. Requires test-analysis-only.
-iree-codegen-link-tuning-specs
Link nested transform dialect tuning specs named sequences into a single entry point
Given a module with multiple nested tuning specs, introduce a new named sequence that includes all the other tuning spec entry points. The order of inclusion is the same as the order in which these nested tuning specs appear in the IR.
A tuning spec entry point is a transform.named_sequence op annotated with the
iree_codegen.tuning_spec unit attribute. We require it to perform in-place op
modification and not consume the handle.
-iree-codegen-lower-executable-using-transform-dialect
Lower executables using the transform dialect recipe provided in the module.
-iree-codegen-lower-ukernel-ops-to-calls
Lower micro-kernel wrapper ops into function calls
-iree-codegen-lowering-config-interpreter
Pass to apply lowering config annotated strategies.
This pass runs the transform dialect interpreter and applies the named sequence transformation specified by lowering configs annotated on operations.
-iree-codegen-materialize-device-encoding
Materialize the encoding for tensor as specified by the backend.
Options
-test-cl-gpu-target : Flag used for lit-testing GPU target only. Not for general usage
-iree-codegen-materialize-encoding-into-nop
Drop the encodings from tensor types with encodings.
-iree-codegen-materialize-encoding-into-padding
Materialize #iree_encoding.pad_encoding_layout attributes.
Handles padding introduced by pad_encoding_layout encoding layouts, which
requires iree_tensor_ext.dispatch.tensor.load/.store to be adjusted to account for
padding regions.
Materializes any other encoding layouts into nop.
-iree-codegen-materialize-host-encoding
Materialize the encoding for tensor as specified by the backend.
-iree-codegen-materialize-tuning-specs
Load tuning spec transform dialect libraries and encode them in the module
Links all available tuning spec transform dialect modules into a single
tuning spec. Next, serializes this tuning spec to bytecode and attaches it
as a module attribute. We do this so that the full tuning spec is always
encoded in the program IR and can be checked with --mlir-print-ir-after-all
(or equivalent). The alternative would be to add the tuning spec as a
submodule in the compiled program, but this may result in the tuning spec
being inadvertently visited by other passes that attempt to walk the outer
module. Serialization makes the tuning spec opaque and prevents this from
happening.
This attribute is expected to be short-lived and removed by
iree-codegen-materialize-user-configs.
-iree-codegen-materialize-user-configs
Sets the lowering configs and translation info from user configs
-iree-codegen-math-transform
Apply math ops transformations: approximations, rewrites to other math ops, operand casts.
-iree-codegen-memrefcopy-to-linalg
Convert memref.copy to linalg op
-iree-codegen-normalize-loop-bounds
Normalize the loop bounds of scf.for and scf.forall
Normalizes the iteration range of scf.for and scf.forall loops to
[0, ub) += 1.
Options
-normalize-for    : Enable normalization for `scf.for` loops
-normalize-forall : Enable normalization for `scf.forall` loops
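The effect of normalizing a loop to the range [0, ub) with unit step can be sketched in Python. The sketch assumes a positive step and uses hypothetical function names; it shows only the arithmetic, not how the pass rewrites IR.

```python
def normalized_trip_count(lb: int, ub: int, step: int) -> int:
    """Number of iterations of a loop from lb to ub (exclusive) by step."""
    assert step > 0  # assumption of this sketch
    return max(0, -(-(ub - lb) // step))  # ceiling division


def original_ivs(lb, ub, step):
    """Induction variable values of the original loop."""
    return list(range(lb, ub, step))


def normalized_ivs(lb, ub, step):
    """Same values, produced by a loop over [0, n) with step 1."""
    n = normalized_trip_count(lb, ub, step)
    # The original induction variable is recovered as lb + i * step.
    return [lb + i * step for i in range(n)]


ivs = normalized_ivs(3, 20, 4)
```

The normalized loop iterates over a zero-based, unit-step range, and every use of the original induction variable is replaced by `lb + i * step`.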
-iree-codegen-optimize-tensor-insert-extract-slices
Optimize tensor.insert_slice/tensor.extract_slice operations (e.g. hoist and fold)
Options
-fold-identity-slices : Enable folding of identity tensor.*_slice ops.
-iree-codegen-optimize-vector-transfer
Run optimization transformations on vector transfer operations
Options
-flatten            : Flatten the vector type of vector transfers where possible (contiguous row-major data).
-redundant-hoisting : Enables use of redundant vector transfer hoisting.
-iree-codegen-pad-dynamic-alloc
Pass to pad dynamic alloc into static one.
-iree-codegen-propagate-dispatch-size-bounds
Pass to annotate workitem and workgroup IDs with known bounds
-iree-codegen-propagate-reshapes-by-expansion
Propagates reshaping operations by expansion.
Pass to propagate reshapes by expansion through all ops without explicit lowering configurations.
-iree-codegen-reconcile-translation-info
Reconcile information (like workgroup_size, subgroup_size) across TranslationInfo set on each function in the dispatch, merge them, and set them at the appropriate places in the surrounding HAL ops.
-iree-codegen-rematerialize-parallel-ops
Pass to rematerialize and merge parallel ops into consumers.
-iree-codegen-remove-single-iteration-loop
Remove distributed loop with single iteration.
-iree-codegen-replace-slow-min-max-ops
Replace slow min/max operations that propagate NaNs and distinguish between +/-0.0 with faster min/max operations that ignore them.
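The difference between the two flavors of min can be illustrated with a Python sketch. The two functions below are simplified models written for this example; they are not the exact semantics of any particular MLIR op.

```python
import math


def minimum_nan_propagating(a: float, b: float) -> float:
    """'Slow' min: propagates NaN and orders -0.0 below +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == 0.0 and b == 0.0:
        # Both are zeros: prefer the negative one.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def minimum_fast(a: float, b: float) -> float:
    """'Fast' min: one compare-and-select, no NaN or signed-zero handling."""
    return a if a < b else b


slow_nan = minimum_nan_propagating(math.nan, 1.0)
fast_nan = minimum_fast(math.nan, 1.0)
slow_zero = minimum_nan_propagating(-0.0, 0.0)
fast_zero = minimum_fast(-0.0, 0.0)
```

The fast version skips the extra checks, so its result for NaN or signed-zero inputs depends on operand order. That is acceptable only when the program does not rely on those cases.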
-iree-codegen-resolve-swizzle-hints
Resolves iree_codegen.swizzle_hint ops
-iree-codegen-split-full-partial-transfer
Split a vector.transfer operation into an in-bounds (i.e., no out-of-bounds masking) fastpath and a slowpath.
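The fast path and slow path split can be sketched in Python on a 1-D list. The function name and the fill-then-copy slow path are illustrative assumptions; the sketch shows one possible slow-path strategy, not the pass's generated code.

```python
def transfer_read(buffer, start, width, padding=0):
    """Read `width` elements from `buffer` at `start`, padding out of bounds."""
    in_bounds = start + width <= len(buffer)
    if in_bounds:
        # Fast path: the whole read is in bounds, so no masking is needed.
        return buffer[start:start + width]
    # Slow path: fill a temporary with the padding value, then copy in
    # whatever part of the read is actually valid.
    result = [padding] * width
    valid = max(0, len(buffer) - start)
    result[:valid] = buffer[start:start + valid]
    return result


data = [1, 2, 3, 4, 5]
full = transfer_read(data, 0, 4)
partial = transfer_read(data, 3, 4)
```

Splitting pays off when most reads take the fast path, since the in-bounds check is cheap compared with masking every access.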
Options
-split-transfers : Split vector transfers between slow (masked) and fast (unmasked) variants. Possible options are:
    none [default]: keep unsplit vector.transfer and pay the price
    linalg-copy: use linalg.fill + linalg.generic for the slow path
    vector-transfers: use extra small unmasked vector.transfers for the slow path
-iree-codegen-strip-compilation-info
Remove all the lowering configuration and translation info attributes.
-iree-codegen-test-executable-preprocessing
Tests iree-hal-preprocess-executables-with behavior.
-iree-codegen-test-partitionable-loops-interface
Test the PartitionableLoopsInterface
-iree-codegen-tile-and-distribute-to-workgroups
Tile and distribute operations to workgroups
Options
-max-workgroup-parallel-dims : Maximum number of dims to distribute workgroups across.
-distribution-method         : Pick the distribution method. See linalg::DistributionMethod for details
-iree-codegen-tile-and-distribute-to-workgroups-using-forall-op
Tile and distribute operation to workgroups (using scf.forall op)
Options
-transpose-workgroup : Swaps the workgroup mapping attribute x and y. Only swaps when the loop bounds are static.
-iree-codegen-tile-large-tensors
Greedily tiles all linalg ops that are beyond a certain size
Options
-max-vector-size : Maximum static size to tile to (i.e. all remaining ops will be smaller)
-iree-codegen-type-propagation
Propagate the type of tensor to avoid load/stores of illegal bit widths
-iree-codegen-unroll-annotated-loops
Unrolls all scf.for loops marked with unroll_loop
-iree-codegen-vector-transfer-lowering
Pass to lower transfer ops to simpler ops like vector.load, vector.store, vector.broadcast, and a set of scf ops.
Options
-enable-scalable-lowerings : Enables scalable vector specific transfer lowerings
-iree-codegen-vectorize-memref-copy
Vectorizes memref copy operations.
-iree-codegen-vectorize-tensor-pad
Vectorize a very specific form of tensor.pad with control flows
-iree-codegen-verify-workgroup-distribution
Pass to verify proper distribution to workgroups.
Pass to verify that all writes to global memory are explicitly mapped to workgroups. This means that in cases where we use loops (scf.forall) to manage distribution to workgroups, we require that all ops with write side effects are contained within a workgroup distributed loop.
-iree-convert-accgemm-to-gemm
Convert accumulating GEMMs to GEMMs post dispatch creation.
-iree-convert-bf16-arith-to-f32
Convert bf16 arithmetic operations to f32
-iree-convert-unsupported-float-arith
Convert arith operations on unsupported (source) float types to the target type. The source and target types are populated based on the target architecture.
-iree-eliminate-empty-tensors
Eliminate tensor.empty ops to avoid buffer allocations
-iree-loop-invariant-code-motion
Performs LICM on loops guaranteed to have >= 1 trip
This is a mirror of the upstream LICM pass that restricts to loops that are guaranteed to have at least one trip. This currently only supports loops that expose a lower and upper bound as the generic loop-like interface does not expose a way to query for trip count.
Additionally, code motion of scf.forall ops with mappings is always unsafe
and is explicitly disabled.
-iree-transform-dialect-interpreter
Pass to apply transform dialect operations.
This pass runs the transform dialect interpreter and applies the named
sequence transformation specified by the provided name (defaults to
TransformDialect::kTransformEntryPointSymbolName (i.e. __transform_main)).
Options
-entry-point       : Entry point of the pass pipeline.
-library-file-name : File path to load a library of transform dialect strategies from.