Tiled Hacker news on React Router

JIT: So you want to be faster than an interpreter on modern CPUs

129 points - last Sunday at 7:08 PM

Source

klipklop
yesterday at 10:21 PM
A shame operating systems like iOS/iPadOS do not allow JIT. iPad Pro's have such fast CPU's that you cant even use fully because of decisions like this.
stmw
yesterday at 10:15 PM
Good read. But a word of caution - the "JIT vs interpreter" comparisons often favor the interpreter when the JIT is inplemented as more-or-less simple inlining of the interpreter code. (Here called "copy-and-patch" but a decades-only approach). I've had fairly senior engineers try to convince me that this is true even for Java VMs. It's not in general, at least not with the right kind of JIT compiler design.
gr4vityWall
yesterday at 8:50 PM
That was a pretty interesting read.
My take is that you can get pretty far these days with a simple bytecode interpreter. Food for thought if your side project could benefit from a DSL!
neerajsi
today at 1:21 AM
From the previous article in the series, it looks like the biggest impediment to just using full llvm to compile the query is that they didn't find a good way to cache the results across invocations.
Sql server hekaton punted this problem in a seemingly effective way by requiring the client to use stored procedures to get full native compilation. Not sure though if they recompile if the table statistics indicate a different query plan is needed.
gary_0
today at 12:12 AM
> This is called branch prediction, it has been the source of many fun security issues...
No, that's speculative execution you just described. Branch prediction was implemented long before out-of-order CPUs were a thing, as you need branch prediction to make the most of pipelining (eg. fetching and decoding a new instruction while you're still executing the previous one--if you predict branches, you're more likely to keep the pipeline full).
scrash
today at 4:14 AM
The issues with branch prediction aren't really as much of a thing in modern interpreters, I can really recommend reading https://inria.hal.science/hal-01100647/document
imtringued
yesterday at 11:21 PM
I'm not really interested in building an interpreter, but the part about scalar out of order execution got me thinking. The opcode sequencing logic of an interpreter is inherently serial and an obvious bottleneck (step++; goto step->label; requires an add, then a fetch and then a jump, pretty ugly).
Why not do the same thing the CPU does and fetch N jump addresses at once?
Now the overhead is gone and you just need to figure out how to let the CPU fetch the chain of instructions that implement the opcodes.
You simply copy the interpreter N times, store N opcode jump addresses in N registers and each interpreter copy is hardcoded to access its own register during the computed goto.

JIT: So you want to be faster than an interpreter on modern CPUs

klipklop

ivankra

Pulcinella

pjscott

duped

bencyoung

saagarjha

Pulcinella

saagarjha

almostgotcaught

pjc50

stmw

hoten

chromatic

_cogg

ack_complete

stmw

gr4vityWall

neerajsi

gary_0

Arnavion

monocasa

scrash

titzer

imtringued

saagarjha