placeholder

High throughput Fizz Buzz

Fizz Buzz is a common challenge given during interviews. The challenge goes something like this: Write a program that prints the numbers from 1 to n. If a number is divisible by 3, write Fizz inst...

Click to view the original at codegolf.stackexchange.com

Hasnain says:

This (the first comment which has an answer) is a work of art and should be in some hall of fame. I learnt a lot reading through (half of, before I gave up) this

I loved the comment where someone says this should be a thesis and the author responds that this was harder than their MS thesis.

“This program aims for the maximum possible single-threaded performance. In terms of the FizzBuzz calculation itself, it is intended to sustain a performance of 64 bytes of FizzBuzz per 4 clock cycles (and is future-proofed where possible to be able to run faster if the relevant processor bottleneck – L2 cache write speed – is ever removed). This is faster than a number of standard functions. In particular, it's faster than memcpy, which presents interesting challenges when it comes to I/O (if you try to output using write then the copies in write will take up almost all the runtime – replacing the I/O routine here with write causes the performance on my CPU to drop by a factor of 5). As such, I needed to use much more obscure system calls to keep I/O-related copies to a minimum (in particular, the generated FizzBuzz text is only sent to main memory if absolutely necessary; most of the time it's stored in the processor's L2 cache and piped into the target program from there, which is why reading it from a sibling CPU can boost performance – the physical connection to the L2 cache is shorter and higher bandwidth than it would be to a more distant CPU).”

Posted on 2021-10-29T06:47:36+0000