The performance of a computer program is affected by many factors beyond the familiar big-O complexity of the algorithm. In this class, we cover some of those factors, along with the tools and techniques you need to detect, diagnose, and fix performance bugs in explicitly and implicitly concurrent programs.
The picture above shows the internals of a modern computer chip, an Intel Xeon 7500. If you look closely, you can see 16 almost identical core tiles surrounding the large, shared last-level cache (LLC). (The image is also tiled horizontally, for layout purposes.) Sixteen cores naturally introduce 16-way concurrency, but as you’ll find out in this class, a modern CPU is much more concurrent than that. Only by understanding how your program interacts with this extremely concurrent machine can you take maximum advantage of your available computing resources.