Leilani Battle, University of Maryland
Introduction to Data Visualization
Datasets are becoming increasingly large and complex, requiring intuitive ways to explore and interpret them quickly and efficiently. Here, a picture is worth a thousand words: visualizations transform data into images that are easier to understand, reason about, and communicate than raw numbers and raw text. In these lectures, we will explore a variety of methods for designing and implementing effective visualizations, based on principles from the fields that have shaped human-computer interaction and data science.
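As a small illustration of that point, the sketch below renders a handful of numbers that are hard to scan as text into a chart instead. It assumes Python with matplotlib, a choice made for this example only; the lectures do not prescribe a toolkit, and the data are made up.

```python
# A minimal sketch: the same data as raw numbers vs. a picture.
# Assumes matplotlib is installed; the data are hypothetical.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
errors = [12, 9, 31, 8, 30, 29]   # hard to scan as a list of numbers

# As a bar chart, the outlier months are visible at a glance.
fig, ax = plt.subplots()
ax.bar(months, errors)
ax.set_xlabel("Month")
ax.set_ylabel("Error count")
ax.set_title("Errors per month")
plt.show()
```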
Dave Levin, University of Maryland
Measuring and Improving the Web’s Public Key Infrastructure
The importance of the web’s public key infrastructure (PKI) cannot be overstated: it allows us to determine with whom we are communicating online, serving as the basis for HTTPS. While much of the PKI is automated, a surprising amount of it is left to human decision-making: website administrators, browser manufacturers, certificate authorities, and content delivery networks must all perform certain tasks to secure the web. But do they all do what they must? In this lecture, we will investigate measurement techniques for understanding where and why security breaks down in the PKI, and develop new methods to fix it.
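To give a flavor of such measurements, here is a minimal sketch (mine, not the lecture's) that retrieves and inspects one site's certificate using Python's standard ssl module; the hostname is a placeholder.

```python
# A minimal PKI measurement sketch: fetch a site's certificate and
# report its issuing CA and time to expiry. Uses only Python's
# standard library; "example.com" is a placeholder hostname.
import socket
import ssl
import time

def leaf_cert(host: str, port: int = 443) -> dict:
    """Fetch the server's leaf certificate, validated against the
    system trust store during the TLS handshake."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()

cert = leaf_cert("example.com")
issuer = dict(rdn[0] for rdn in cert["issuer"])  # flatten the issuer RDNs
days_left = (ssl.cert_time_to_seconds(cert["notAfter"]) - time.time()) / 86400
print("issuing CA:       ", issuer.get("organizationName"))
print("days until expiry:", round(days_left))
```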
Rupak Majumdar, MPI-SWS
Random Testing with Theoretical Guarantees
Random testing, or fuzzing, is an effective way to catch bugs in concurrent and distributed systems. This is surprising, as the space of executions is enormous and our intuition would suggest that bad behaviors would only be found by extremely unlikely coincidences. In this lecture series, we shall explore what sort of theoretical guarantees one can provide for different classes of random testing procedures. At a practical level, we shall look at different random testing techniques for concurrent programs. At a theoretical level, we shall connect the testing algorithms with insights from combinatorics and randomized algorithms. The goal will be, wherever possible, to provide theoretical guarantees on the performance of a specific testing algorithm.
I will describe two concrete scenarios. First, I will describe bugs in distributed systems caused by network partition faults. In many practical instances, these bugs occur because two or three key nodes, such as leaders or replicas, cannot communicate, or because the leading node finds itself in a block of the partition without a quorum. In this case, I will show, using the probabilistic method, that a small set of randomly chosen tests covers all “small partition” scenarios with high probability.
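The following toy sketch (a simplified stand-in, not the lecture's exact construction) makes the union-bound flavor of that argument concrete: with n nodes, a uniformly random bipartition separates any fixed pair of nodes with probability 1/2, so roughly log2(#pairs/delta) random partitions separate every pair at least once with probability at least 1 - delta.

```python
# Toy illustration of the probabilistic-method argument (simplified,
# not the lecture's exact construction). Scenario: a bug manifests
# whenever some fixed pair of nodes is separated. A uniformly random
# bipartition separates any fixed pair with probability 1/2, so by a
# union bound, m = ceil(log2(#pairs / delta)) random tests separate
# every pair at least once with probability >= 1 - delta.
import itertools
import math
import random

n = 10                     # number of nodes (hypothetical)
delta = 0.01               # allowed failure probability
pairs = list(itertools.combinations(range(n), 2))

m = math.ceil(math.log2(len(pairs) / delta))
print(f"{m} random partitions suffice with probability >= {1 - delta}")

uncovered = set(pairs)     # pairs never yet separated by any test
for _ in range(m):
    side = [random.random() < 0.5 for _ in range(n)]  # random bipartition
    uncovered = {(i, j) for (i, j) in uncovered if side[i] == side[j]}
print("pairs never separated:", len(uncovered))  # 0 with high probability
```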
Second, I will consider bugs that arise from unexpected schedules (interleavings) of concurrent events. Again, many bugs depend only on the relative ordering of a small number of events (the bug’s “depth”). In this case, I will show a testing algorithm that prioritizes low-depth interleavings, and a randomized testing algorithm that guarantees a lower bound on the probability of sampling any behavior of bug depth k, for fixed k. These algorithms are based on combinatorial insights from the theory of partial orders, such as the notion of dimension and its generalization to d-hitting families, as well as results on online chain partitioning.
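One published algorithm of this kind is PCT (Burckhardt et al., ASPLOS 2010), which guarantees that any bug of depth d is triggered with probability at least 1/(n·k^(d-1)) for n threads and k events. The sketch below is a simplified rendering of that idea on a toy program, not the lecture's exact algorithm.

```python
# A simplified PCT-style randomized scheduler (after Burckhardt et
# al.): give threads random priorities, always run the enabled thread
# with the highest priority, and lower the running thread's priority
# at d-1 randomly chosen steps. The toy program below is hypothetical.
import random

def pct_schedule(threads, d):
    """threads: list of per-thread event lists; d: target bug depth.
    Returns one randomly sampled interleaving of all events."""
    n = len(threads)
    k = sum(len(t) for t in threads)          # total number of events
    prio = random.sample(range(d, d + n), n)  # distinct initial priorities
    change = sorted(random.sample(range(k), d - 1))  # priority-change points
    pcs = [0] * n                             # per-thread program counters
    trace = []
    for step in range(k):
        enabled = [t for t in range(n) if pcs[t] < len(threads[t])]
        t = max(enabled, key=lambda u: prio[u])  # run highest priority
        trace.append(threads[t][pcs[t]])
        pcs[t] += 1
        if step in change:                    # demote the running thread
            prio[t] = d - 1 - change.index(step)
    return trace

# Two toy threads of three events each; sample one depth-2 interleaving.
print(pct_schedule([["a1", "a2", "a3"], ["b1", "b2", "b3"]], d=2))
```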
Adrian Sampson, Cornell University
Designing Programming Languages for Heterogeneous Hardware
After five decades of exponential progress, general-purpose processors are no longer getting appreciably faster or more efficient each year. Ordinary CPUs are ceding ground to specialized architectures that do fewer things but do them better: GPUs, FPGAs, Google’s TPU, the dozens of fixed-function pipelines on any smartphone SoC, and so on. Today, each computational unit tends to have its own one-off programming model and a set of ad hoc APIs for coordinating between units. The status quo presents an urgent challenge to programming language designers: we need languages that can address a collection of heterogeneous hardware resources as a unified computational fabric.
This course is an introduction to programming language and type system design with a focus on modern heterogeneous hardware. We will cover both the theoretical, formal tools for defining language semantics and practical approaches to implementation. In the last phase of the course, we will use these tools to study recent research on programming languages for GPUs, FPGAs, and custom hardware accelerators.
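As a small taste of those formal tools, here is a toy type checker for a three-construct expression language. It is a generic pedagogical sketch in Python; it is hypothetical and is not the course's formal system or implementation language.

```python
# A toy static semantics (generic sketch, not the course's system):
# a type checker for a tiny language with literals, addition, and
# conditionals, the kind of formal artifact used to pin down which
# programs a language accepts.
from dataclasses import dataclass
from typing import Union

@dataclass
class Lit:                     # integer or boolean literal
    value: Union[int, bool]

@dataclass
class Add:                     # left + right
    left: "Expr"
    right: "Expr"

@dataclass
class If:                      # if cond then `then` else `else_`
    cond: "Expr"
    then: "Expr"
    else_: "Expr"

Expr = Union[Lit, Add, If]

def typecheck(e: Expr) -> str:
    """Return "int" or "bool", raising TypeError on ill-typed terms."""
    if isinstance(e, Lit):     # check bool first: bool is a subtype of int
        return "bool" if isinstance(e.value, bool) else "int"
    if isinstance(e, Add):
        if typecheck(e.left) == typecheck(e.right) == "int":
            return "int"
        raise TypeError("+ expects two ints")
    if isinstance(e, If):
        if typecheck(e.cond) != "bool":
            raise TypeError("condition must be bool")
        t_then, t_else = typecheck(e.then), typecheck(e.else_)
        if t_then != t_else:
            raise TypeError("branches must have the same type")
        return t_then
    raise TypeError(f"unknown expression: {e!r}")

print(typecheck(If(Lit(True), Add(Lit(1), Lit(2)), Lit(0))))  # -> int
```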
Fred Schneider, Cornell University
Isolation and Flow of Information
Computer security is concerned with ensuring that accesses to services and information satisfy given policies. These lectures will provide an introduction to the area and then focus on two important classes of mechanisms typically involved: (i) means to enforce isolation, through time-multiplexing, space-multiplexing, and measured principals, and (ii) means to control information flow, including reclassification and flow-based access control.
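As a minimal sketch of flow-based access control (a generic two-level example, not the lectures' formalism), the code below permits an assignment only when the source's label flows upward to the target's label in the lattice Low ⊑ High; the variable names are hypothetical.

```python
# Flow-based access control over the two-point lattice Low ⊑ High:
# information may only flow upward, so writing High-labeled data into
# a Low-labeled sink is rejected. Variable names are hypothetical.
LEVEL = {"Low": 0, "High": 1}

def flows_to(src: str, dst: str) -> bool:
    """src ⊑ dst: may data labeled src flow into a sink labeled dst?"""
    return LEVEL[src] <= LEVEL[dst]

labels = {"password": "High", "banner": "Low", "log": "Low"}

def check_assign(target: str, source: str) -> None:
    if not flows_to(labels[source], labels[target]):
        raise PermissionError(f"illegal flow: {source} -> {target}")

check_assign("log", "banner")        # Low -> Low: allowed
try:
    check_assign("log", "password")  # High -> Low: rejected
except PermissionError as err:
    print(err)
# Reclassification (e.g., declassifying a hashed password) would be a
# deliberate, policy-mediated exception to this rule.
```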
Adish Singla, MPI-SWS
Machine Teaching
Much research in machine learning has focused on designing algorithms for discovering knowledge from data and learning a target model. Machine teaching leverages this work to optimally teach machines and humans. For example, consider a situation where a “teacher” already knows the learning goal and wants to steer the “learner” towards this target as quickly as possible. What is the optimal set of demonstrations and lessons to provide? Similar ideas are used in a very different setting, namely, adversarial attacks on machine learning systems, where an attacker or hacking algorithm (the “teacher”) manipulates a machine learning system (the “learner”) by maliciously modifying the training data. This lecture will provide an overview of machine teaching and cover the following three aspects: (i) how machine teaching differs from machine learning, (ii) the problem space of machine teaching, and (iii) recent work on developing teaching algorithms for human learners.
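A classic worked example of the first point, standard in the machine-teaching literature (though not necessarily the one used in the lecture), is teaching a one-dimensional threshold classifier: a learner querying labels on its own needs about log2(n) examples, while a teacher who already knows the threshold needs only the two examples that straddle it.

```python
# Machine learning vs. machine teaching on a 1-D threshold classifier
# over n points: label(x) = 1 iff x >= threshold. The threshold value
# below is a hypothetical target.
import math

n = 1024
threshold = 600

def label(x: int) -> int:
    return int(x >= threshold)

# Learner alone: binary search for the decision boundary.
lo, hi, queries = 0, n - 1, 0
while lo < hi:
    mid = (lo + hi) // 2
    queries += 1
    if label(mid):
        hi = mid           # boundary is at mid or to its left
    else:
        lo = mid + 1       # boundary is to the right of mid
assert lo == threshold

# Optimal teacher: the two examples straddling the boundary suffice.
teaching_set = [(threshold - 1, 0), (threshold, 1)]

print(f"learner alone:   {queries} labeled examples (~log2(n) = {int(math.log2(n))})")
print(f"optimal teacher: {len(teaching_set)} examples: {teaching_set}")
```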