Intuitively, I understand why it is a hard problem. Micro-optimizations have deterministic properties that are simple enough that optimality is all but certain. Macro-optimization heuristics do create minor incidental pessimizations, but on average the major optimization effects bury them below the noise floor.
In the middle are optimizations that are too complex to guarantee an improvement but too small in effect to offset any incidental pessimization. In these cases it is easy to inadvertently make things worse, and most of the surprising performance variance I see falls into this gap.
It is also where codegen from different compilers seems to disagree the most, which suggests that the “correct” codegen is far from obvious to a compiler.
Can you give some examples?
Perhaps you mean small granular choices which occur widely throughout the code?
KazeEmmanuar did a great job analyzing exactly this, so we don't have to!
mov eax, 0x00043201
test cl, 8
setz al
shl cl, 2
shr eax, cl
and eax, 15
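The snippet doesn't say what `cl` holds. Assuming it is the number of leading 1 bits of the UTF-8 lead byte (0 to 8, the same quantity the rustc listing computes with `not` + `lzcnt`), the Rust model below replays it instruction by instruction. It also shows what `test cl, 8` / `setz al` are for: x86 masks a 32-bit shift count to 5 bits, so for 0xFF (cl = 8) a shift of 32 would become a shift of 0 and return 1; in exactly that case the low byte is cleared so the result is 0. Rust's `wrapping_shr` masks the count the same way. The function names here are hypothetical. (Assembled, the six instructions take 5 + 3 + 3 + 3 + 2 + 3 = 19 bytes.)

```rust
/// Step-by-step model of the hand-written x86 sequence (hypothetical helper).
/// ASSUMPTION: `cl` is the number of leading 1 bits of the lead byte (0..=8).
fn x86_sequence_model(cl: u8) -> u32 {
    // mov eax, 0x00043201 ; nibble i = sequence length for i leading 1 bits
    let mut eax: u32 = 0x0004_3201;
    // test cl, 8 ; setz al -> al = 1 if bit 3 of cl is clear (cl < 8), else 0.
    // The low byte is already 0x01, so this only changes eax when cl == 8.
    let al: u32 = if cl & 8 == 0 { 1 } else { 0 };
    eax = (eax & !0xFF) | al;
    // shl cl, 2 ; shr eax, cl -> the 32-bit shift count is masked to 5 bits,
    // which is exactly what wrapping_shr does.
    eax = eax.wrapping_shr(u32::from(cl << 2));
    // and eax, 15
    eax & 15
}

/// Plain reference: UTF-8 sequence length from the lead byte, 0 if it cannot start one.
fn reference_len(b: u8) -> u32 {
    match b {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => 0, // continuation bytes 0x80..=0xBF, and 0xF8..=0xFF
    }
}

fn main() {
    for b in 0..=255u8 {
        let cl = b.leading_ones() as u8;
        assert_eq!(x86_sequence_model(cl), reference_len(b), "lead byte {b:#04x}");
    }
    println!("model matches the reference for all 256 byte values");
}
```

Dropping the `setz` line from the model (keeping eax at 0x00043201) makes the 0xFF case return 1, which is the bug the trick avoids.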
Something similar may be possible on ARM64, but I suspect it will be more than 19 bytes ;-)
For Zen5, rustc generates the following:
utf8_sequence_length_lookup:
shl edi, 24
mov ecx, 274945
not edi
lzcnt eax, edi
shl al, 2
shrx rax, rcx, rax
and al, 7
ret
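The Rust source behind this listing isn't reproduced in the thread. A plausible version, offered as an assumption and not checked against that exact output, indexes the same packed nibble table (274945 = 0x43201) by the count of leading 1 bits and masks with 7, which is why no `setz`-style fix-up is needed: the shift happens on a 64-bit value, so a shift of 32 simply yields 0.

```rust
/// Hypothetical source for `utf8_sequence_length_lookup`.
/// Returns the length of the UTF-8 sequence starting with lead byte `b`,
/// or 0 if `b` cannot start one (continuation byte, or 5+ leading 1 bits).
pub fn utf8_sequence_length_lookup(b: u8) -> u32 {
    // Nibble i (from the low end) = length for i leading 1 bits: 1, 0, 2, 3, 4, 0, ...
    const TABLE: u64 = 0x4_3201; // == 274945, the constant in the listing
    let ones = b.leading_ones(); // 0..=8, so the shift is at most 32 on a u64
    ((TABLE >> (ones * 4)) & 7) as u32
}

/// Straightforward reference implementation to compare against.
fn utf8_sequence_length_match(b: u8) -> u32 {
    match b {
        0x00..=0x7F => 1,
        0xC0..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF7 => 4,
        _ => 0, // 0x80..=0xBF (continuation) and 0xF8..=0xFF
    }
}

fn main() {
    for b in 0..=255u8 {
        assert_eq!(
            utf8_sequence_length_lookup(b),
            utf8_sequence_length_match(b),
            "byte {b:#04x}"
        );
    }
    println!("lookup agrees with the match for all 256 lead bytes");
}
```

Like the assembly versions, this only looks at the count of leading 1 bits, so lead bytes that strict UTF-8 forbids (0xC0, 0xC1, 0xF5 to 0xF7) are not rejected.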
https://rust.godbolt.org/z/hz1eKjnaG
There can be competition between optimizations. At a given moment, the preconditions for both optimization O1 and optimization O2 may hold, but once you apply O1, the precondition for O2 may no longer hold, and vice versa.
In such cases, the best solution is to pick the better of O1 and O2.
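A toy sketch of this kind of competition, in Rust to match the listing above. Everything here is hypothetical: the `Expr` IR, both rewrites, and the cost numbers are made up for illustration. O1 strength-reduces `x * 8` to `x << 3`; O2 cancels `(x * c) / c` to `x`, which assumes the multiplication cannot overflow (true for mathematical integers, not for wrapping machine arithmetic). On `(a * 8) / 8` both preconditions hold, but applying O1 first removes the `Mul` that O2 matches on, so the sketch runs both orders and keeps the cheaper result.

```rust
// Hypothetical toy IR, for illustration only.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Var(&'static str),
    Const(i64),
    Mul(Box<Expr>, Box<Expr>),
    Div(Box<Expr>, Box<Expr>),
    Shl(Box<Expr>, u32),
}
use Expr::*;

// Made-up cost model: division is expensive, shifts are cheap.
fn cost(e: &Expr) -> u32 {
    match e {
        Var(_) | Const(_) => 0,
        Shl(x, _) => 1 + cost(x),
        Mul(a, b) => 3 + cost(a) + cost(b),
        Div(a, b) => 20 + cost(a) + cost(b),
    }
}

// O1: x * 2^k -> x << k (only handles the constant on the right).
fn strength_reduce(e: &Expr) -> Expr {
    match e {
        Mul(a, b) => {
            let a = strength_reduce(a);
            match **b {
                Const(c) if c > 0 && (c & (c - 1)) == 0 => Shl(Box::new(a), c.trailing_zeros()),
                _ => Mul(Box::new(a), Box::new(strength_reduce(b))),
            }
        }
        Div(a, b) => Div(Box::new(strength_reduce(a)), Box::new(strength_reduce(b))),
        Shl(x, k) => Shl(Box::new(strength_reduce(x)), *k),
        leaf => leaf.clone(),
    }
}

// O2: (x * c) / c -> x. ASSUMPTION: the multiplication cannot overflow.
fn cancel_mul_div(e: &Expr) -> Expr {
    match e {
        Div(a, b) => {
            if let (Mul(x, c1), Const(c2)) = (&**a, &**b) {
                if **c1 == Const(*c2) {
                    return cancel_mul_div(x);
                }
            }
            Div(Box::new(cancel_mul_div(a)), Box::new(cancel_mul_div(b)))
        }
        Mul(a, b) => Mul(Box::new(cancel_mul_div(a)), Box::new(cancel_mul_div(b))),
        Shl(x, k) => Shl(Box::new(cancel_mul_div(x)), *k),
        leaf => leaf.clone(),
    }
}

// Try both orders and keep whichever result the cost model prefers.
fn best_of_both_orders(e: &Expr) -> Expr {
    let o1_then_o2 = cancel_mul_div(&strength_reduce(e));
    let o2_then_o1 = strength_reduce(&cancel_mul_div(e));
    if cost(&o2_then_o1) < cost(&o1_then_o2) { o2_then_o1 } else { o1_then_o2 }
}

fn main() {
    // (a * 8) / 8
    let e = Div(Box::new(Mul(Box::new(Var("a")), Box::new(Const(8)))), Box::new(Const(8)));
    let first = cancel_mul_div(&strength_reduce(&e));
    let second = strength_reduce(&cancel_mul_div(&e));
    println!("O1 then O2: {:?} (cost {})", first, cost(&first));
    println!("O2 then O1: {:?} (cost {})", second, cost(&second));
    println!("picked:     {:?}", best_of_both_orders(&e));
}
```

Here O1-then-O2 is stuck at `Div(Shl(Var("a"), 3), Const(8))` with cost 21, while O2-then-O1 reduces to `Var("a")`, which is what gets picked.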