We discuss the full single-threaded processing pipeline, then study the performance impact of speculation on one small program. Finally, we use mmap() to improve the throughput of our running example “closest” by caching…
We continue the “closest” running example, identifying several more performance bottlenecks. Finally, we make a first attempt at parallelizing the program, which, after fixing a race condition, brought the runtime down from 0.7 seconds single-threaded…
A first look at how the memory hierarchy (primarily caches, store buffers, and RAM) supports out-of-order execution alongside concurrent access to memory by threads running on different CPUs, while providing…
We discuss the OpenCL execution model and run several experiments to better understand its semantics. Watch out: the interpretation of an experiment with __local memory is incorrect; this is addressed in Lecture 24.
We review a misunderstanding from the previous lecture, then design a small OpenCL program to experiment with performance trade-offs in GPU programming.