Tutorials | ITW2021

Code-Based Cryptography

Sven Puchinger and Antonia Wachter-Zeh (Technical University of Munich, Germany)
Currently-used public-key cryptosystems based on number-theoretic problems are threatened by the possibility that large-scale quantum computers will be built in the near future; Shor’s algorithm is able to solve these seemingly hard problems in polynomial time on such computers. In this context, the National Institute of Standards and Technology (NIST) has initiated a standardization process for public key encryption (PKE) schemes, key encapsulation mechanisms (KEM), and signatures. The standardization process has now reached Round 3, where lattice-based and code-based cryptography play a prominent role. These code-based cryptosystems include most prominently the classical McEliece system. Their security is based on hard computational problems in coding theory, and encryption and decryption often correspond to en- and decoding of an error-correcting code.

The goal of this tutorial is to provide an overview of important existing code-based cryptosystems, their security, and current challenges in the area of code-based cryptosystems. We will thereby consider amongst others systems based on Goppa, Moderate-Density-Parity-Check (MDPC), and rank-metric codes.

Design and Decoding of Polar Codes with Large Kernels

Peter Trifonov and Grigorii Trofimiuk (ITMO University, Russia)

Polar codes with large kernels were recently shown to asymptotically achieve optimal scaling exponent. However, these codes were believed to be impractical due to lack of efficient decoding algorithms. In this tutorial we present a toolset for design and decoding of polar codes with large kernels. In particular, detailed treatment of recursive trellis and window kernel processing algorithms will be provided. These algorithms together with the successive cancellation list decoder allow one to obtain the same performance with lower complexity compared to polar codes with Arikan kernel. We present also techniques for construction of polar codes with large kernels, as well as kernels with good polarization properties and low decoding complexity.

Private Computation with Applications to Machine Learning

Anoosheh Heidarzadeh and Alex Sprintson (Texas A&M University, USA)

With the advent of cloud computing, many different types of our data such as medical records, financial transactions, and access logs are stored at remote servers across the globe. This data is being constantly used by a broad range of applications that perform computation remotely for a variety of purposes, including financial reports, statistical analysis, and business analytics. Many of these applications leverage Machine Learning (ML) algorithms for data analysis and decision making. In such applications, it is critical to hide the identity of the data items used in the computation from the cloud operators. We refer to this notion of privacy as data access privacy. The data access privacy provides guarantees to the data owners that the cloud provider does not know whether their data is used by various computing applications. We refer to the process of computation with private data access as private computation. The objective of private computation is to minimize the amount of data that needs to be downloaded from the remote provider(s) while guaranteeing data access privacy. Private computation generalizes classical Private Information Retrieval (PIR) problem by enabling the applications to perform remote computation on the data, and is generally more efficient than first retrieving data privately (using a PIR scheme) and then performing computation on the retrieved data.

Private computation is a fascinating topic with many breakthrough results reported over the recent years by the information and coding theory community. That said, many interesting problems in this field remain open, providing a fertile ground for future research. The goal of this tutorial is to provide a comprehensive survey of the broad range of private computation problems. We discuss both the state-of-the-art achievability schemes and converse proof techniques as well as open research problems. First, we will discuss the Private Linear Computation (PLC) problem and the Private Linear Transformation (PLT) problem whose goal is to compute a single linear combination and multiple linear combinations of a subset of data items, respectively. These problems are motivated by several ML applications including a linear transformation for dimensionality reduction and training linear models for regression or classification. We will discuss several variations of the PLC and PLT problems, which include different types of data access privacy, the presence of side information, the single-provider and multiple-provider scenarios, and the presence of colluding providers. We will also discuss the general private computation scenarios and the related open problems.

Modeling and Optimization of Latency in Erasure-Coded Storage Systems

Vaneet Aggarwal (Purdue University, USA), Tian Lan (George Washington University, USA) and Parimal Parag (Indian Institute of Science, Bangalore, India)

As consumers are increasingly engaged in social networking and E-commerce activities, businesses grow to rely on Big Data analytics for intelligence, and traditional IT infrastructure continues to migrate to the cloud and edge, these trends cause distributed data storage demand to rise at an unprecedented speed. Erasure coding has seen itself quickly emerged as a promising technique to reduce storage cost while providing similar reliability as replicated systems, widely adopted by companies like Facebook, Microsoft and Google. However, it also brings new challenges in characterizing and optimizing the access latency when erasure codes are used in distributed storage. The aim of this tutorial is to provide a review of recent progress (both theoretical and practical) on systems that employ erasure codes for distributed storage.

In this tutorial, we will first identify the key challenges and taxonomy of the research problems and then give an overview of different approaches that have been developed to quantify and model latency of erasure-coded storage. This includes recent work leveraging MDS-Reservation, Fork-Join, Probabilistic, and Delayed-Relaunch scheduling policies, as well as their applications to characterize access latency (e.g., mean, tail, asymptotic latency) of erasure-coded distributed storage systems. We bridge the gap between theory and practice, and discuss lessons learned from prototype implementation. In particular, we will discuss exemplary implementations of erasure-coded storage, illuminate key design degrees of freedom and tradeoffs. Open problems for future research will also be discussed.