Common

`-iree-codegen-add-fast-math-flags`

Add fast math flags to all the operations supporting them, given a floating-point mode.

`-iree-codegen-block-dynamic-dimensions`

Expand dynamic dimensions that are known to be multiples of statically known values.
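As a rough illustration of the index arithmetic behind this kind of expansion, the sketch below models a dynamic dimension of size `n` that is known to be a multiple of a static block size. The helper names `expand_index` and `collapse_index` and the block size of 16 are hypothetical and chosen only for illustration; this is not the pass's implementation.

```python
from typing import Tuple


def expand_index(i: int, block: int) -> Tuple[int, int]:
    """Map a flat index to (outer, inner) coordinates of the expanded shape."""
    return divmod(i, block)


def collapse_index(outer: int, inner: int, block: int) -> int:
    """Inverse of expand_index: recover the flat index."""
    return outer * block + inner


# Hypothetical sizes: a dynamic dimension n known to be a multiple of 16.
n, block = 48, 16
assert n % block == 0
# The single dynamic dimension becomes (dynamic outer, static inner).
expanded_shape = (n // block, block)
```

Because the inner extent is static, later steps can reason about it at compile time even though the outer extent stays dynamic.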

`-iree-codegen-bubble-up-ordinal-ops`

Bubbles up ordinal ops to allow for workgroup count computation

Pass to bubble up ordinal operations to allow workgroup count computation based on slices to correlate back to workload computation.

`-iree-codegen-bufferize-copy-only-dispatches`

Bufferize dispatches that copy to/from interfaces to convert to a linalg.copy op

Pass to bufferize dispatches that are copying from one interface to another. This will create a linalg.generic op which is a copy that can then be used by backends to handle appropriately.

`-iree-codegen-bufferize-dispatch-tensor-load-store`

Bufferize the iree_tensor_ext.dispatch.tensor.load/store ops at dispatch boundaries

Pass to bufferize the edges of dispatch regions, converting iree_tensor_ext.dispatch.tensor.load ops to iree_codegen.load_from_memref, and iree_tensor_ext.dispatch.tensor.store ops to iree_codegen.store_to_memref.

`-iree-codegen-canonicalize-scf-for`

Ad hoc canonicalization of selected loop-carried values/dependencies for scf.for ops

`-iree-codegen-cleanup-buffer-alloc-view`

Performs cleanups over HAL interface/buffer allocation/view operations

`-iree-codegen-concretize-pad-result-shape`

Concretizes tensor.pad op's result shape if its source op implements OffsetSizeAndStrideOpInterface.

`-iree-codegen-config-tracking-canonicalize`

Codegen specific canonicalization pass that tracks lowering configs

Options

-test-convergence : Fails if the patterns fail to converge

`-iree-codegen-convert-bf16-to-uint16-buffers`

Convert BF16 buffer ops and conversions to simulated behavior with uint16.
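To make the storage idea concrete, here is a minimal sketch of carrying a bf16 value as a plain 16-bit unsigned integer. It relies on the fact that bf16 is the upper 16 bits of an IEEE f32. The helper names are hypothetical, and the truncating conversion is an assumption made for simplicity (real conversions typically round); the sketch does not reproduce what the pass emits.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return a bf16 bit pattern as a uint16 by truncating an f32 (assumption:
    truncation rather than round-to-nearest-even)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bf16_bits_to_f32(bits: int) -> float:
    """Reinterpret a uint16 bf16 bit pattern as an f32 value."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return x


# The buffer holds uint16 values; only the interpretation is bf16.
buffer_u16 = [f32_to_bf16_bits(v) for v in (1.0, -2.0, 0.5)]
decoded = [bf16_bits_to_f32(b) for b in buffer_u16]
```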

`-iree-codegen-convert-hal-descriptor-type-to-gpu-address-space`

Convert #hal.descriptor_type to #gpu.address_space

`-iree-codegen-convert-to-destination-passing-style`

Transforms the code to make the dispatch use destination-passing style

Converts the entry point functions within dispatch regions to use destination-passing style, which is better suited for the upstream comprehensive bufferization pass.

Options

-convert-inputs-to-destinations         : Controls whether to adjust consumers to convert one of its inputs to a destination
-use-war-for-cooperative-matrix-codegen : WAR for failure in Cooperative matrix codegen pipelines. See #10648.

`-iree-codegen-convolution-to-igemm`

Transforms convolution operations into an implicit GEMM format.
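The general idea of an implicit GEMM can be sketched in a few lines: the convolution is viewed as a matrix product whose left-hand matrix is never materialized, because each of its entries is read from the image through index arithmetic. The functions below are a hypothetical single-channel, stride-1, unpadded model for illustration only and say nothing about the layouts or ops the pass actually produces.

```python
def conv2d_direct(img, flt):
    """Reference 2-D convolution (cross-correlation), stride 1, no padding."""
    H, W = len(img), len(img[0])
    KH, KW = len(flt), len(flt[0])
    OH, OW = H - KH + 1, W - KW + 1
    return [[sum(img[oh + kh][ow + kw] * flt[kh][kw]
                 for kh in range(KH) for kw in range(KW))
             for ow in range(OW)] for oh in range(OH)]


def conv2d_igemm(img, flt):
    """Same result, written as an (M x K) times (K) product with
    M = OH * OW and K = KH * KW; the M x K matrix is never built."""
    H, W = len(img), len(img[0])
    KH, KW = len(flt), len(flt[0])
    OH, OW = H - KH + 1, W - KW + 1
    M, K = OH * OW, KH * KW
    flt_flat = [flt[k // KW][k % KW] for k in range(K)]
    out = [0] * M
    for m in range(M):
        oh, ow = divmod(m, OW)          # GEMM row -> output pixel
        for k in range(K):
            kh, kw = divmod(k, KW)      # GEMM reduction index -> filter tap
            out[m] += img[oh + kh][ow + kw] * flt_flat[k]
    return [out[r * OW:(r + 1) * OW] for r in range(OH)]
```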

`-iree-codegen-decompose-affine-ops`

Decompose affine.apply operations into sub affine.apply

Decompose affine.apply operations into sub affine.apply where each sub expression references values that are defined in the same loop scope. The sub expressions are then stitched back together following the loop nest order. The goal of this pass is to break down affine.apply expressions such that the resulting sub expressions can be hoisted out of their respective loops. For example, suppose we have:

%res = affine.apply
         affine_map<()[s0, s1, s2] -> (s0 * 1024 + s1 * 32 + s2)>()
           [%loopVariant, %inv1, %inv2]

Where %inv1 and %inv2 are loop invariant and %loopVariant is not. This will produce the following subexpressions:

// Loop invariant computations first.
%inv1x32 =
  affine.apply affine_map<()[s0] -> (s0 * 32)>()[%inv1]
%inv1x32_plus_inv2 =
  affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()[%inv1x32, %inv2]
// Loop variant computation next.
%loopVariantx1024 =
  affine.apply affine_map<()[s0] -> (s0 * 1024)>()[%loopVariant]
// Compose things back together.
%res =
  affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()
    [%loopVariantx1024, %inv1x32_plus_inv2]

Now the sequence of instructions leading to and including %inv1x32_plus_inv2 can be hoisted out of the loop. This pass requires scf.for structures to still be around otherwise the break down will be meaningless. Note: The decomposition performed by this pass will be undone by canonicalization. Make sure to lower the resulting ops before that.
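The arithmetic of this decomposition can be checked with a small model. The sketch below is only an illustration in plain Python (the function names are hypothetical): it computes the loop-invariant part once outside the loop and confirms that the recombined result equals the original expression on every iteration.

```python
def original(loop_variant: int, inv1: int, inv2: int) -> int:
    """The undecomposed expression: s0 * 1024 + s1 * 32 + s2."""
    return loop_variant * 1024 + inv1 * 32 + inv2


def decomposed_loop(trip_count: int, inv1: int, inv2: int):
    """Evaluate the expression for each iteration using the decomposed form."""
    # Loop-invariant sub expressions, hoisted out of the loop.
    inv1x32 = inv1 * 32
    inv1x32_plus_inv2 = inv1x32 + inv2
    results = []
    for loop_variant in range(trip_count):
        # Only the loop-variant part is recomputed per iteration.
        loop_variant_x1024 = loop_variant * 1024
        results.append(loop_variant_x1024 + inv1x32_plus_inv2)
    return results
```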

`-iree-codegen-decompose-boundary-pack-unpack-ops`

Wrapper for DecomposePackUnPackOpsPass to decompose ops at function boundaries

Options

-tile-outer-to-one : Always apply tiling to make the outer dimensions be ones

`-iree-codegen-decompose-convolution-to-lower-dim-ops`

Decomposes linalg convolution ops to lower dim ops

`-iree-codegen-decompose-linalg-generic`

Decomposes linalg generic ops into individual ops

It is sometimes advantageous to operate on generic ops which contain at most one non-yield body operation. This is most often the case when needing to materialize individual ops (which some backends require). Note that this is often an extreme pessimization unless it is part of a lowering flow which was designed for it.

Operates on tensor based linalg ops.

`-iree-codegen-decompose-memrefs`

Decomposes memrefs

`-iree-codegen-decompose-pack-unpack-ops`

Decompose pack/unpack ops into vectorizable ops

Options

-tile-outer-to-one : Always apply tiling to make the outer dimensions be ones
-use-only-reshapes : Use decomposition into reshape ops, even when packing unit dimensions.

`-iree-codegen-decompose-softmax`

Decomposes softmax op into a sequence of linalg ops

Options

-use-fusion : Whether to use the internal pass fusion logic for the exp function. See #15862.
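For orientation, the standard numerically stable softmax can be written as a short sequence of reductions and elementwise steps, which is the kind of structure a decomposition into simpler ops exposes. The sketch below is an assumption about the general shape of such a sequence, not a transcription of the ops the pass generates; the function name is hypothetical.

```python
import math


def softmax_decomposed(row):
    """Softmax over one row, written as separate reduction/elementwise steps."""
    row_max = max(row)                          # reduction: max
    shifted = [x - row_max for x in row]        # elementwise: subtract the max
    exps = [math.exp(x) for x in shifted]       # elementwise: exp
    total = sum(exps)                           # reduction: sum
    return [e / total for e in exps]            # elementwise: divide
```

Subtracting the row maximum before exponentiating keeps the intermediate values bounded, so large inputs do not overflow.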

`-iree-codegen-drop-vector-unit-dims`

Pass to drop vector unit dims.

`-iree-codegen-emulate-narrow-type`

Emulate narrow integer operations using wide integer operations

A pass to emulate memref load operations that use narrow integer types with equivalent operations on supported wide integer types.
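A minimal model of narrow-type emulation: a signed 4-bit element is read by loading the containing 8-bit value, then shifting and masking. The function name `load_i4` is hypothetical, and the nibble order (even indices in the low nibble) is an assumption made for this sketch rather than a statement about the layout the pass uses.

```python
def load_i4(packed: bytes, index: int) -> int:
    """Load a signed 4-bit value from a buffer of packed 8-bit values."""
    byte = packed[index // 2]              # wide (8-bit) load
    shift = (index % 2) * 4                # assumption: even index -> low nibble
    nibble = (byte >> shift) & 0xF         # extract the 4-bit field
    return nibble - 16 if nibble >= 8 else nibble   # sign-extend to a wide int


# Two bytes hold four i4 elements.
packed = bytes([0x21, 0xF3])
values = [load_i4(packed, i) for i in range(4)]
```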

`-iree-codegen-erase-dead-alloc-and-stores`

Erase alloc ops if all the uses are just stores

`-iree-codegen-erase-hal-descriptor-type-from-memref`

Erase #hal.descriptor_type from MemRef memory space

`-iree-codegen-expand-strided-metadata`

Resolve memref.extract_strided_metadata operations

Options

-allow-subview-expansion : Enables expansion of memref.subview ops
-allow-unresolved        : Allow unresolved strided metadata op (for testing)

`-iree-codegen-extract-address-computation`

Extract address computations from memory accesses

Extract the address computation from the instructions with memory accesses such that these memory accesses use only a base pointer.

For instance,

memref.load %base[%off0, ...]

will be rewritten into:

%new_base = memref.subview %base[%off0,...][1,...][1,...]
memref.load %new_base[%c0,...]
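The equivalence behind this rewrite can be modeled with a toy strided buffer. In the sketch below, `MemRef`, `load`, and `subview` are hypothetical Python stand-ins (not MLIR APIs): taking a subview at the access offsets folds the address computation into the base, so the load itself uses only zero indices.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MemRef:
    storage: List[int]       # flat backing storage
    offset: int              # element offset of the view's origin
    strides: Sequence[int]   # element stride per dimension


def load(m: MemRef, indices: Sequence[int]) -> int:
    """Load with the address computed inside the access."""
    return m.storage[m.offset + sum(i * s for i, s in zip(indices, m.strides))]


def subview(m: MemRef, offsets: Sequence[int]) -> MemRef:
    """Model of a unit-size, unit-stride subview: only the base offset moves."""
    new_offset = m.offset + sum(o * s for o, s in zip(offsets, m.strides))
    return MemRef(m.storage, new_offset, m.strides)


base = MemRef(list(range(12)), 0, (4, 1))    # a 3x4 row-major buffer
direct = load(base, (2, 3))
via_subview = load(subview(base, (2, 3)), (0, 0))
```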

`-iree-codegen-flatten-memref-subspan`

Flatten n-D MemRef subspan ops to 1-D ones and fold byte offsets

Flattens n-D MemRef subspan ops to 1-D MemRef and folds the byte offsets on subspan ops to the consumer load/store ops, in preparation for lowering to backends that require linearized access.
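The linearized access this prepares for can be sketched as follows. The helpers are hypothetical, and the sketch assumes a contiguous row-major layout and a byte offset that is a multiple of the element size; it models the index arithmetic only, not the pass itself.

```python
def linearize(indices, shape):
    """Row-major linearization of n-D indices into a single 1-D index."""
    flat = 0
    for i, d in zip(indices, shape):
        flat = flat * d + i
    return flat


def flat_element_index(indices, shape, byte_offset, elem_bytes):
    """Fold a subspan byte offset into the 1-D index used by the load/store."""
    assert byte_offset % elem_bytes == 0, "offset must be element-aligned"
    return byte_offset // elem_bytes + linearize(indices, shape)
```

For example, index `(1, 2, 3)` in a `2x3x4` buffer of 4-byte elements that starts at byte offset 64 maps to element `16 + 23 = 39` of the flat buffer.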

`-iree-codegen-fold-affinemin-in-distributed-loops`

Fold affine.min ops in distributed loops

`-iree-codegen-fold-tensor-extract-op`

Fold tensor.extract operations prior to lowering to LLVM

After running the upstream TensorConstantBufferize pass, remove tensor_loads introduced for use only in tensor_extract. These can be folded to use a load of the created memref object that holds the constant values.

`-iree-codegen-fuse-tensor-pad-with-consumer`

Fuse tensor.pad op into its consumer op's tiled loop nest

`-iree-codegen-generic-vectorization`

Pass to perform vectorization on tensor/linalg ops.

Options

-enable-vector-masking        : Enable vector masking during vectorization.
-use-configured-vector-sizes  : Control whether the op lowering config represents a set of masked vector sizes
-vectorize-copies             : Enable vectorization of linalg.copy operations.
-vectorize-padding            : Rewrite all tensor.pad ops in the function to vector form.
-vectorize-gather-accesses    : Enable vectorization of operations that may generate vector.gather operations.
-vectorize-to-transfer-gather : Enables vectorization of gather-like operations that may generate iree_vector_ext.transfer_gather
-enable-cleanup               : Enable cleanups after vectorization. The patterns touch the structure generated from tiling so it affects later steps like bufferization and vector hoisting.
-generate-contract            : Enable conversion for reduction ops to contraction ops.
-fold-cast-into-contract      : Enable folding casting ops into vector.contract.
-max-vector-size              : Max vector size allowed to avoid creating large vectors.

`-iree-codegen-hoist-statically-bound-allocations`

Hoist statically bound alloca ops to the entry block of functions

Options

-vscale-min : Minimum possible value of vscale.
-vscale-max : Maximum possible value of vscale (a value of zero means unbounded).

`-iree-codegen-hoist-vector-extract-insert-slice`

Hoist unrolled vector (extract, insert) pairs out of scf.for op

`-iree-codegen-instrument-memory-accesses`

Instruments memory reads and writes for address tracking when dispatch instrumentation is enabled.

`-iree-codegen-iree-bufferize-constants`

Convert from arith.constant on tensors to buffers

`-iree-codegen-iree-comprehensive-bufferize`

Convert from Linalg ops on tensors to buffers

Options

-test-analysis-only : Only runs inplaceability analysis (for testing purposes only)
-print-conflicts    : Annotates IR with RaW conflicts. Requires test-analysis-only.

`-iree-codegen-link-tuning-specs`

Link nested transform dialect tuning specs named sequences into a single entry point

Given a module with multiple nested tuning specs, introduce a new named sequence that includes all the other tuning spec entry points. The order of inclusion is the same as the order in which these nested tuning specs appear in the IR.

A tuning spec entry point is a transform.named_sequence op annotated with the iree_codegen.tuning_spec unit attribute. We require it to perform in-place op modification and not consume the handle.

`-iree-codegen-lower-executable-using-transform-dialect`

Lower executables using the transform dialect recipe provided in the module.

`-iree-codegen-lower-ukernel-ops-to-calls`

Lower micro-kernel wrapper ops into function calls

`-iree-codegen-lowering-config-interpreter`

Pass to apply lowering config annotated strategies.

This pass runs the transform dialect interpreter and applies the named sequence transformation specified by lowering configs annotated on operations.

`-iree-codegen-materialize-device-encoding`

Materialize the encoding for tensor as specified by the backend.

Options

-test-cl-gpu-target : Flag used for lit-testing GPU target only. Not for general usage

`-iree-codegen-materialize-encoding-into-nop`

Drop the encodings from tensor types with encodings.

`-iree-codegen-materialize-encoding-into-padding`

Materialize #iree_encoding.pad_encoding_layout attributes.

Handles padding introduced by pad_encoding_layout encoding layouts, which requires iree_tensor_ext.dispatch.tensor.load/.store to be adjusted to account for padding regions. Materializes any other encoding layouts into nop.

`-iree-codegen-materialize-host-encoding`

Materialize the encoding for tensor as specified by the backend.

`-iree-codegen-materialize-tuning-specs`

Load tuning spec transform dialect libraries and encode them in the module

Links all available tuning spec transform dialect modules into a single tuning spec. Next, serializes this tuning spec to bytecode and attaches it as a module attribute. We do this so that the full tuning spec is always encoded in the program IR and can be checked with --mlir-print-ir-after-all (or equivalent). The alternative would be to add the tuning spec as a submodule in the compiled program, but this may result in the tuning spec being inadvertently visited by other passes that attempt to walk the outer module. Serialization makes the tuning specs opaque and prevents it from happening.

This attribute is expected to be short-lived and removed by iree-codegen-materialize-user-configs.

`-iree-codegen-materialize-user-configs`

Sets the lowering configs and translation info from user configs

`-iree-codegen-math-transform`

Apply math ops transformations: approximations, rewrites to other math ops, operand casts.

`-iree-codegen-memrefcopy-to-linalg`

Convert memref.copy to linalg op

`-iree-codegen-normalize-loop-bounds`

Normalize the loop bounds of scf.for and scf.forall

Normalizes the iteration range of scf.for and scf.forall loops to [0, ub) += 1.

Options

-normalize-for    : Enable normalization for `scf.for` loops
-normalize-forall : Enable normalization for `scf.forall` loops
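The normalization itself is simple arithmetic. The sketch below models it for a single loop with a positive step (an assumption of this example; the function name is hypothetical): the loop runs over `[0, trip_count)` with unit step, and the original induction variable is recovered inside the body as `lb + i * step`.

```python
def normalized_loop(lb: int, ub: int, step: int):
    """Yield the original induction values from a [0, trip_count) += 1 loop.

    Assumes step > 0.
    """
    trip_count = max(0, -(-(ub - lb) // step))   # ceildiv(ub - lb, step)
    for i in range(trip_count):                  # normalized loop: [0, n) += 1
        yield lb + i * step                      # recover the original value
```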

`-iree-codegen-optimize-tensor-insert-extract-slices`

Optimize tensor.insert_slice/tensor.extract_slice operations (e.g. hoist and fold)

Options

-fold-identity-slices : Enable folding of identity tensor.*_slice ops.

`-iree-codegen-optimize-vector-transfer`

Run optimization transformations on vector transfer operations

Options

-flatten            : Flatten the vector type of vector transfers where possible (contiguous row-major data).
-redundant-hoisting : Enables use of redundant vector transfer hoisting.

`-iree-codegen-pad-dynamic-alloc`

Pass to pad dynamic alloc into static one.

`-iree-codegen-propagate-dispatch-size-bounds`

Pass to annotate workitem and workgroup IDs with known bounds

`-iree-codegen-propagate-reshapes-by-expansion`

Propagates reshaping operations by expansion.

Pass to propagate reshapes by expansion through all ops without explicit lowering configurations.

`-iree-codegen-reconcile-translation-info`

Reconcile information (like workgroup_size, subgroup_size) across TranslationInfo set on each function in the dispatch, merge them, and set them at the appropriate places in the surrounding HAL ops

`-iree-codegen-rematerialize-parallel-ops`

Pass to rematerialize and merge parallel ops into consumers.

`-iree-codegen-remove-single-iteration-loop`

Remove distributed loop with single iteration.

`-iree-codegen-replace-slow-min-max-ops`

Replace slow min/max operations that propagate NaNs and distinguish between +/-0.0 with faster min/max operations that ignore them.
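The difference between the two flavors of min can be seen in a small model. Both functions below are hypothetical stand-ins: `slow_min` propagates NaNs and orders `-0.0` before `+0.0`, while `fast_min` is a plain compare-and-select with no special cases. Which concrete ops the pass swaps is not shown here and should not be inferred from this sketch.

```python
import math


def slow_min(a: float, b: float) -> float:
    """Min that propagates NaN and treats -0.0 as smaller than +0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b == 0.0:
        # Distinguish signed zeros: prefer the negative one.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def fast_min(a: float, b: float) -> float:
    """Compare-and-select min: no NaN check and no signed-zero handling."""
    return a if a < b else b
```

With a NaN operand, `slow_min` always returns NaN, whereas `fast_min(nan, 1.0)` returns `1.0` because the comparison is false and the second operand is selected.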

`-iree-codegen-resolve-swizzle-hints`

Resolves iree_codegen.swizzle_hint ops

`-iree-codegen-split-full-partial-transfer`

Split a vector.transfer operation into an in-bounds (i.e., no out-of-bounds masking) fastpath and a slowpath.
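A rough model of the fast-path/slow-path split for a 1-D read is sketched below. The function `transfer_read` and its padding parameter are hypothetical Python stand-ins; the sketch shows only the control-flow idea of checking bounds once and taking an unmasked path when the whole vector is in bounds.

```python
def transfer_read(buf, start: int, width: int, pad=0):
    """Read `width` elements from `buf` at `start`, padding out-of-bounds lanes."""
    if start + width <= len(buf):
        # Fast path: fully in bounds, no per-lane masking needed.
        return buf[start:start + width]
    # Slow path: per-lane bounds check, out-of-bounds lanes get the pad value.
    out = [pad] * width
    for lane in range(width):
        if start + lane < len(buf):
            out[lane] = buf[start + lane]
    return out
```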

Options

-split-transfers : Split vector transfers between slow (masked) and fast (unmasked) variants. Possible options are:
    none [default]: keep unsplit vector.transfer and pay the price
    linalg-copy: use linalg.fill + linalg.generic for the slow path
    vector-transfers: use extra small unmasked vector.transfers for the slow path

`-iree-codegen-strip-compilation-info`

Remove all the lowering configuration and translation info attributes.

`-iree-codegen-test-executable-preprocessing`

Tests iree-hal-preprocess-executables-with behavior.

`-iree-codegen-test-partitionable-loops-interface`

Test the PartitionableLoopsInterface

`-iree-codegen-tile-and-distribute-to-workgroups`

Tile and distribute operations to workgroups

Options

-max-workgroup-parallel-dims : Maximum number of dims to distribute workgroups across.
-distribution-method         : Pick the distribution method. See linalg::DistributionMethod for details

`-iree-codegen-tile-and-distribute-to-workgroups-using-forall-op`

Tile and distribute operation to workgroups (using scf.forall op)

Options

-transpose-workgroup : Swaps the workgroup mapping attribute x and y. Only swaps when the loop bounds are static.

`-iree-codegen-tile-large-tensors`

Greedily tiles all linalg ops that are beyond a certain size

Options

-max-vector-size : Maximum static size to tile to (i.e. all remaining ops will be smaller)

`-iree-codegen-type-propagation`

Propagate the type of tensor to avoid load/stores of illegal bit widths

`-iree-codegen-unroll-annotated-loops`

Unrolls all scf.for loops marked with unroll_loop

`-iree-codegen-vector-transfer-lowering`

Pass to lower transfer ops to simpler ops like vector.load, vector.store, vector.broadcast, and a set of scf ops.

Options

-enable-scalable-lowerings : Enables scalable vector specific transfer lowerings

`-iree-codegen-vectorize-memref-copy`

Vectorizes memref copy operations.

`-iree-codegen-vectorize-tensor-pad`

Vectorize a very specific form of tensor.pad with control flows

`-iree-codegen-verify-workgroup-distribution`

Pass to verify proper distribution to workgroups.

Pass to verify that all writes to global memory are explicitly mapped to workgroups. This means that in cases where we use loops (scf.forall) to manage distribution to workgroups, we require that all ops with write side effects are contained within a workgroup distributed loop.

`-iree-convert-accgemm-to-gemm`

Convert accumulating GEMMs to GEMMs post dispatch creation.

`-iree-convert-bf16-arith-to-f32`

Convert bf16 arithmetic operations to f32

`-iree-convert-unsupported-float-arith`

Convert arith operations on unsupported (source) float types to the target type. The source and target types are populated based on the target architecture.

`-iree-eliminate-empty-tensors`

Eliminate tensor.empty ops to avoid buffer allocations

`-iree-loop-invariant-code-motion`

Performs LICM on loops guaranteed to have >= 1 trip

This is a mirror of the upstream LICM pass that restricts to loops that are guaranteed to have at least one trip. This currently only supports loops that expose a lower and upper bound as the generic loop-like interface does not expose a way to query for trip count.

Additionally, code motion of scf.forall ops with mappings is always unsafe and is explicitly disabled.

`-iree-transform-dialect-interpreter`

Pass to apply transform dialect operations.

This pass runs the transform dialect interpreter and applies the named sequence transformation specified by the provided name (defaults to TransformDialect::kTransformEntryPointSymbolName (i.e. __transform_main)).

Options

-entry-point       : Entry point of the pass pipeline.
-library-file-name : File path to load a library of transform dialect strategies from.
