This is an embarrassing context to admit, but here goes.
Back when Parrot was a thing and the Perl 6 people were targeting it, I profiled the prelude of Perl 6 to optimize startup time and discovered two things:
- the first basic block of the prelude was thousands of instructions long (not surprising)
- the compiler had to allocate thousands of registers because the prelude instructions used virtual registers
The prelude emitted two instructions, one right after another, for every symbol: load a named symbol from a library, then make it available. I forget all of the details, but each of those instructions used one string register and one PMC register. Because register allocation used the dominance frontier method, the size of the basic block and the total number of symbolic registers dominated the allocator's running time.
I suggested a change to the prelude emitter to reuse actual registers instead of allocating new virtual ones, and compilation sped up quite a bit.
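
A rough sketch of the difference, in Python rather than PIR. The opcode names (`set`, `load_symbol`, `export_symbol`), the register spellings, and both emitter functions are hypothetical, made up for illustration; this is not Parrot's real instruction set or the actual prelude emitter. It only shows why one style hands the allocator a number of registers proportional to the number of symbols while the other hands it a constant number.

```python
import re

# Matches a register operand such as $S12 (virtual) or P0 (actual).
# The spellings are an assumption for this sketch.
REGISTER = re.compile(r"\$?[SP]\d+")


def emit_with_virtual_registers(symbols):
    """Hypothetical emitter: a fresh virtual register pair per symbol."""
    code = []
    for i, name in enumerate(symbols):
        s, p = f"$S{i}", f"$P{i}"
        code.append(("set", s, repr(name)))   # put the symbol name in a string register
        code.append(("load_symbol", p, s))    # load the named symbol from the library
        code.append(("export_symbol", s, p))  # make it available
    return code


def emit_with_reused_registers(symbols):
    """Hypothetical emitter: the same two actual registers for every symbol."""
    code = []
    for name in symbols:
        code.append(("set", "S0", repr(name)))
        code.append(("load_symbol", "P0", "S0"))
        code.append(("export_symbol", "S0", "P0"))
    return code


def distinct_registers(code):
    """Count the distinct registers an allocator would have to consider."""
    registers = set()
    for _opcode, *operands in code:
        for operand in operands:
            if REGISTER.fullmatch(operand):
                registers.add(operand)
    return len(registers)


if __name__ == "__main__":
    symbols = [f"sym{i}" for i in range(1000)]
    virtual = emit_with_virtual_registers(symbols)
    reused = emit_with_reused_registers(symbols)
    print(len(virtual), distinct_registers(virtual))
    print(len(reused), distinct_registers(reused))
```

Both versions emit the same number of instructions in one long basic block. The only thing that changes is how many distinct registers the allocator sees: two per symbol in the first case, two in total in the second.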