Summing ASCII encoded integers on Haswell at almost the speed of memcpy
Summing ASCII encoded integers on Haswell at almost the speed of memcpy Jul 12, 2024 “Print the sum of 50 million ASCII-encoded integers uniformly sampled from [0, 2³¹−1], separated by a single new line and sent to standard input.” On the surface, a trivial problem. But what if you wanted to...
Hasnain says:
This approach is nuts. Learnt a lot about algorithm design and also about high performance micro-optimizations
"The program is over-fit to the input spec and the particular host it runs on (Intel Xeon E3-1271 v3 @ 3.60GHz, 512MB RAM, Ubuntu 20.04). Given the CPU, it only uses SIMD instructions up to AVX2, no AVX512. It assumes the input is exactly according to the spec and hence does zero error handling and even on such input will only produce correct results with probability < 1, though very close to 1, depending on the parameters you choose."
Posted on 2024-07-13T22:47:19+0000