Analyzing Data 180,000x Faster with Rust
How to hash, index, profile, multi-thread, and SIMD your way to incredible speeds.
Hasnain says:
“So how far did we come? The original Python program was going to take 2.9 years to complete at k=5. Our final Rust program only takes 8 minutes on the same dataset. That is roughly a 180,000x speedup. A summary of the key optimizations:
Use Rust’s compiler optimizations.
Hash numbers instead of strings.
Use (indexed) vectors instead of hashmaps.
Use bit-sets for efficient membership tests.
Use SIMD for efficient bit-sets.
Use multi-threading to split the work over many cores.
Use batching to avoid a bottleneck at work distribution.”
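To make a few of the items in that summary concrete, here is a minimal Rust sketch of three of them: bit-sets for membership tests, multi-threading, and batched work distribution. It is not the article's code. The `BitSet` type, the `best_pair` function, and the toy task (finding the two sets with the largest overlap) are hypothetical names and assumptions chosen for illustration. The sketch uses plain `u64` words rather than explicit SIMD, so it only hints at the SIMD step.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical fixed-size bit-set backed by 64-bit words.
/// Members are small integer indices, not strings or hashmap keys.
#[derive(Clone)]
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    /// Creates an empty set able to hold indices in `0..capacity`.
    fn new(capacity: usize) -> Self {
        BitSet { words: vec![0; (capacity + 63) / 64] }
    }

    fn insert(&mut self, idx: usize) {
        self.words[idx / 64] |= 1u64 << (idx % 64);
    }

    /// Membership test: one index computation and one bit mask.
    fn contains(&self, idx: usize) -> bool {
        self.words[idx / 64] & (1u64 << (idx % 64)) != 0
    }

    /// Counts indices present in both sets, 64 candidates per AND.
    fn intersection_count(&self, other: &BitSet) -> u32 {
        self.words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| (a & b).count_ones())
            .sum()
    }
}

/// Toy workload (an assumption, not the article's task): find the pair
/// (i, j), i < j, whose sets share the most members. Threads claim
/// `batch_size` values of `i` at a time from a shared atomic counter,
/// so they coordinate once per batch instead of once per item.
fn best_pair(
    sets: &[BitSet],
    n_threads: usize,
    batch_size: usize,
) -> Option<(usize, usize, u32)> {
    let n_threads = n_threads.max(1);
    let batch_size = batch_size.max(1); // a zero batch would never advance
    let next = AtomicUsize::new(0);

    let per_thread: Vec<Option<(usize, usize, u32)>> = thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..n_threads {
            handles.push(s.spawn(|| {
                let mut best: Option<(usize, usize, u32)> = None;
                loop {
                    // Claim the next batch of outer indices.
                    let start = next.fetch_add(batch_size, Ordering::Relaxed);
                    if start >= sets.len() {
                        break;
                    }
                    let end = (start + batch_size).min(sets.len());
                    for i in start..end {
                        for j in (i + 1)..sets.len() {
                            let n = sets[i].intersection_count(&sets[j]);
                            if best.map_or(true, |(_, _, b)| n > b) {
                                best = Some((i, j, n));
                            }
                        }
                    }
                }
                best
            }));
        }
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Merge each thread's local best into a global best.
    per_thread.into_iter().flatten().max_by_key(|&(_, _, n)| n)
}
```

Calling `best_pair(&sets, 8, 64)` would split the outer loop across eight threads in batches of 64 indices. Each thread keeps a local best and the results are merged once at the end, so the only shared state on the hot path is the batch counter. The word-by-word loop in `intersection_count` is the kind of code that explicit SIMD could process several words at a time.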