Common/CPU
-iree-codegen-cpu-lower-to-ukernels
Separate out parts of the IR that lower to a micro-kernel
Options
-skip-intermediate-roundings : Allow skipping intermediate roundings, e.g. in f16 ukernels internally doing f32 arithmetic.
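The numerical effect of skipping intermediate roundings can be illustrated with a small sketch. This is not IREE code: the function names are hypothetical, Python's `float` (f64) stands in for the wider f32 accumulator, and `struct`'s `"e"` format is used to round to IEEE binary16.

```python
import struct


def round_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE binary16 (f16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_round_each_step(a, b):
    """Dot product that rounds to f16 after every multiply and every add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_f16(acc + round_f16(x * y))
    return acc


def dot_round_once(a, b):
    """Dot product that accumulates in a wider type and rounds to f16 once.

    Assumption: Python's f64 float stands in for an f32 accumulator here.
    """
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y
    return round_f16(acc)


ones = [1.0] * 4096
stepwise = dot_round_each_step(ones, ones)
single = dot_round_once(ones, ones)
print(stepwise, single)
```

With 4096 ones, the stepwise version stalls once the accumulator reaches 2048, because 2049 is not representable in f16, while the single-rounding version returns the exact sum. The two results differ, which is presumably why the behavior is opt-in.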
-iree-codegen-cpu-prepare-ukernels
Rank-reduce operations to fit the requirements of existing ukernels. For example, batch_mmt4d ops are decomposed to mmt4d ops.
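As a rough illustration of why this decomposition is valid, a batch_mmt4d can be viewed as one mmt4d per batch index. The sketch below models that relationship on nested Python lists; it is not the pass's implementation, the function names are only illustrative, and the operand layouts (`lhs[M1][K1][M0][K0]`, `rhs[N1][K1][N0][K0]`, `acc[M1][N1][M0][N0]`, with a leading batch dimension for the batched form) follow the usual mmt4d convention.

```python
def mmt4d(lhs, rhs, acc):
    """acc[m1][n1][m0][n0] += sum over k1, k0 of
    lhs[m1][k1][m0][k0] * rhs[n1][k1][n0][k0]."""
    for m1, lhs_row in enumerate(lhs):
        for n1, rhs_row in enumerate(rhs):
            # zip pairs up the K1 tiles of the two operands.
            for lhs_tile, rhs_tile in zip(lhs_row, rhs_row):
                for m0, lhs_vec in enumerate(lhs_tile):
                    for n0, rhs_vec in enumerate(rhs_tile):
                        acc[m1][n1][m0][n0] += sum(
                            x * y for x, y in zip(lhs_vec, rhs_vec)
                        )
    return acc


def batch_mmt4d(lhs, rhs, acc):
    """Batched form: slice off the leading batch dimension and run a
    rank-reduced mmt4d on each slice."""
    for b in range(len(lhs)):
        mmt4d(lhs[b], rhs[b], acc[b])
    return acc


# Batch of 2; every tile dimension is 1 except K0 = 2.
lhs = [[[[[1, 2]]]], [[[[3, 4]]]]]
rhs = [[[[[5, 6]]]], [[[[7, 8]]]]]
acc = [[[[[0]]]], [[[[0]]]]]
result = batch_mmt4d(lhs, rhs, acc)
print(result)
```

Because each batch slice is independent, a ukernel only has to implement the 4-D case; the batch dimension is handled outside it.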