Lecture 7: CPU affinity, function call overheads, and going parallel
September 18, 2018
We continue the "closest" running example, identifying several more performance bottlenecks. Finally, we make a first attempt at parallelizing the program. After fixing a race condition, the parallel version turned out far slower than the original: runtime went from 0.7 seconds single-threaded to 2.5 minutes on 64 hardware threads.