Tiled Hacker news on React Router

How much slower is random access, really?

83 points - last Monday at 2:49 PM

Source

andersa
yesterday at 11:01 PM
Note this is not true random access in the manner it occurs in most programs. By having a contiguous array of indices to look at, that array can be prefetched as it goes, and speculative execution will take care of loading many upcoming indices of the target array in parallel.
A more interesting example might be if each slot in the target array has the next index to go to in addition to the value, then you will introduce a dependency chain preventing this from happening.
archi42
today at 10:13 AM
What surprises me is the 24 GB of DDR4 DRAM on a dual channel memory controller? AFAIK there are only 8 GB or 16 GB modules, no 12 GB modules. At least I can only find 12 GB DDR5 modules listed, but not DDR4.
This means: The system likely uses 3x 8 GB modules. As a result, one channel has two modules with 16 GB total, while the other channel has only a single 8 GB module.
Not sure how big this impact is with the given memory access patterns and assuming [mostly] exclusive single-threaded access. It's just something I noted, and could be a source of unexpected artifacts.
porcoda
yesterday at 11:56 PM
The RandomAccess (or GUPS) benchmark (see: https://ieeexplore.ieee.org/document/4100365) was looking at measuring machines on this kind of workload. In high performance computing this was important for graph calculations and was one of the things the Cray (formerly Tera) MTA machine was particularly good at. I suppose this benchmark wouldn’t be very widely known outside HPC circles.
today at 9:15 AM
JonChesterfield
today at 5:32 AM
Random access is catastrophically slower because of the successive cache misses when the prefetcher fails to guess what you're doing.
One hint in the same article that random access is not cheap, in contrast with the conclusion, was noticing that the shuffle was unacceptably slow on large data sets.
Still, good to see peformance measurements, especially where the curves look roughly like you'd hope them to.
anonymousDan
today at 9:37 AM
Would a better benchmark not just use some kind of pseudo randomly generated sequence to avoid having two arrays?
Animats
today at 5:26 AM
If, of course, you have the CPU and its caches all to yourself.
b0a04gl
today at 6:12 AM
longtime back ran a model scoring job where each row did feature lookups across diff memory regions. access pattern changed every run since it depended on prev feature values. avg latency stayed fine but total run time kept jumping sometimes 6s, sometimes 8s on same input. perf counters looked flat but thread scheduling kept shifting. reordered the lookups to cut down mem jumps. after that threads aligned better, scheduler ran smoother, run time got stable across batches. gain wasn’t from faster mem, imo i observed it's not the memory is slower but rather how less predictable it's
Adhyyan1252
yesterday at 10:33 PM
Love this analysis! Was expecting random to be much slower. 4x is not bad at all
yesterday at 11:37 PM
o11c
today at 5:09 AM
Hm, no discussion of cache line size, page size, or the limits of cache associativity?
Surac
today at 6:10 AM
is this not just a memory test for the burst capacitiy and access strategy of the dram controller?
FpUser
today at 2:17 AM
I did another type of experiment which evaluates benefits of branch prediction on AMD 9950X on contiguous array with 1,000,000 elements. Calculated sum adding element if it is bigger than 125 (50% of 256). Difference between random and sorted was 10 times. I guess branch prediction plays a huge role as well.
forrestthewoods
yesterday at 11:13 PM
Here’s an older blog post of mine on roughly the same topic:
https://www.forrestthewoods.com/blog/memory-bandwidth-napkin...
I’m not sure I agree with the data presentation format. “time per element” doesn’t seem like the right metric.
today at 12:58 AM

How much slower is random access, really?

andersa

wtallis

less_less

gpderetta

deepsun

wtallis

jltsiren

eru

jltsiren

eru

hansvm

jiggawatts

kortilla

cb321

gpderetta

cb321

delusional

phi-go

shakna

delusional

wtallis

archi42

porcoda

jandrewrogers

porcoda

jandrewrogers

JonChesterfield

anonymousDan

Animats

b0a04gl

Adhyyan1252

Nevermark

o11c

Surac

FpUser

Andys

bee_rider

forrestthewoods

klank

forrestthewoods

alain94040

delusional

forrestthewoods

petermcneeley