The fifth London Symposium on Information Theory (LSIT) will be held on 30 and 31 May 2019 at King's College London. The symposium continues the tradition of the first four editions of LSIT, held in London in 1950, 1952, 1955, and 1960 and attended by the likes of Claude Shannon and Alan Turing.

This will be a two-day event, with the first day devoted to exploring the intersection of machine learning and information theory and the second day dedicated to novel applications of information theory. We will have two keynote presentations per day, as well as two invited sessions and a poster session.

For more information, see the event programme.

**How to be involved:**

Authors interested in presenting a poster at LSIT should send a title, list of authors, and a brief description of the problem and main results to lsitposter@gmail.com by **31 December 2018**. Decisions will be sent by 31 January 2019.

To register as an attendee, please visit our Eventbrite page.

**Schedule:**

**May 30, 2019**

**09.15–09.30 Welcome**

**09.30–10.30 Keynote Talk**

Michael Gastpar (EPFL)

*Title*: Information Measures, Learning and Generalization

*Abstract*: There has been growing interest in the connections between information measures and the generalization error of learning algorithms. In this talk, we survey recent progress on bounding the generalization error via various information measures, including mutual information, lautum information, and maximal leakage. We also outline a few applications of this theory, such as bounds on the sample complexity of algorithms. (Joint work with Amedeo Roberto Esposito and Ibrahim Issa.)
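The bounds surveyed in this talk typically have a simple canonical form: the expected generalization gap is controlled by the mutual information between the learned weights and the training set. A minimal numeric sketch, assuming the well-known Xu–Raginsky-style bound for a σ-sub-Gaussian loss (the numbers and function names are illustrative, not from the talk):

```python
import math

# Bound on the expected generalization gap for a sigma-sub-Gaussian loss:
#   |E[gen(W, S)]| <= sqrt(2 * sigma^2 * I(W; S) / n)
# where I(W; S) is the mutual information (in nats) between the learned
# weights W and the n training samples S.

def mi_generalization_bound(mi_bits, n, sigma=1.0):
    """Evaluate the bound, taking I(W; S) in bits for convenience."""
    mi_nats = mi_bits * math.log(2)
    return math.sqrt(2 * sigma**2 * mi_nats / n)

# For a fixed information budget, the bound shrinks as 1/sqrt(n):
for n in (100, 1_000, 10_000):
    print(n, round(mi_generalization_bound(mi_bits=8.0, n=n), 4))
```
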

**10.45–12.50 Invited Session on Information Theory and Data-Driven Methods**

Bernhard Geiger (Graz University of Technology)

*Title*: How (Not) To Train Your Neural Network Using the Information Bottleneck Principle

*Abstract*: The information bottleneck theory of neural networks has received a lot of attention in both machine learning and information theory. At the heart of this theory is the assumption that a good classifier creates representations that are minimal sufficient statistics, i.e., they share only as much mutual information with the input features as is necessary to correctly identify the class label. Indeed, it has been claimed that information-theoretic compression is a possible cause of generalization performance and a consequence of learning the weights using stochastic gradient descent. On the one hand, the claims set forth by this theory have been heavily disputed based on conflicting empirical evidence: there exist classes of invertible neural networks with state-of-the-art generalization performance; the compression phase also appears in full-batch learning; and information-theoretic compression is an artifact of using a saturating activation function. On the other hand, several authors report that training neural networks using a cost function derived from the information bottleneck principle leads to representations that have desirable properties and yields improved operational capabilities, such as generalization performance and adversarial robustness.

In this work we provide yet another perspective on the information bottleneck theory of neural networks. With a focus on training deterministic (i.e., non-Bayesian) neural networks, we show that the information bottleneck framework suffers from two important shortcomings: First, for continuously distributed input features, the information-theoretic compression term is infinite for almost every choice of network weights, making this term problematic during optimization. The second and more important issue is that the information bottleneck functional is invariant under bijective transforms of the representation. Optimizing a neural network w.r.t. this functional thus yields representations that are informative about the class label, but that may still fail to satisfy desirable properties, such as allowing to use simple decision functions or being robust against small perturbations of the input feature. We show that there exist remedies for these shortcomings: Including a decision rule or softmax layer, making the network stochastic by adding noise, or replacing the terms in the information bottleneck functional by more well-behaved quantities. We conclude by showing that the successes reported about training neural networks using the information bottleneck framework can be attributed to exactly these remedies.
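For readers unfamiliar with the functional under discussion: the information bottleneck trades off compression I(X;T) against prediction I(T;Y) in a single Lagrangian. A toy sketch with discrete distributions (the variable names and the two example joints are my own, purely illustrative):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as an array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(p_joint):
    """I(A;B) = H(A) + H(B) - H(A,B) for a joint pmf given as a 2-D array."""
    return (entropy(p_joint.sum(axis=1)) + entropy(p_joint.sum(axis=0))
            - entropy(p_joint.ravel()))

def ib_objective(p_xt, p_ty, beta):
    """Information bottleneck Lagrangian: minimise I(X;T) - beta * I(T;Y)."""
    return mutual_information(p_xt) - beta * mutual_information(p_ty)

# A representation T that copies a fair bit X is maximally informative
# about X but incompressible; here T is also independent of the label Y.
p_copy = np.array([[0.5, 0.0], [0.0, 0.5]])       # X = T: I(X;T) = 1 bit
p_indep = np.array([[0.25, 0.25], [0.25, 0.25]])  # T independent of Y: I(T;Y) = 0
print(ib_objective(p_copy, p_indep, beta=1.0))
```
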

Jonathan Scarlett (National University of Singapore)

*Title*: Converse Bounds for Gaussian Process Bandit Optimization

*Abstract*: Gaussian processes provide a versatile framework for sequential black-box function optimization via noisy point queries, with applications including automatic hyperparameter tuning and robotics. While theoretical performance guarantees (i.e., upper bounds on regret) have long been known for various practical algorithms, algorithm-independent converse bounds (i.e., lower bounds on regret) previously remained largely lacking. In this talk, we overview recent developments in establishing such converse bounds using information theory, via reductions to multiple hypothesis testing and applications of Le Cam's method or Fano's inequality. We identify both cases where the bounds have near-optimal scaling laws, and cases where significant gaps remain.

Pablo Piantanida (CentraleSupélec)

*Title*: Deep Directed Information-Based Learning for Privacy-Preserving Smart Meter Data Release

*Abstract*: The explosion of data collection has raised serious privacy concerns among users, due to the possibility that sharing data may also reveal sensitive information. The main goal of a privacy-preserving mechanism is to prevent a malicious third party from inferring sensitive information while keeping the shared data useful. This talk will explore this problem in the context of time series data, and of smart meter (SM) power consumption measurements in particular. Although the mutual information (MI) between private and released variables has been used as a common information-theoretic privacy measure, it fails to capture the causal time dependencies present in power consumption time series data. To overcome this limitation, we introduce the directed information (DI) as a more meaningful privacy measure in the considered setting, and propose a novel loss function. The optimization is then performed using an adversarial framework in which two recurrent neural networks (RNNs), referred to as the releaser and the attacker, are trained with opposite goals. Our empirical studies on real-world datasets of SM measurements, in the worst-case scenario where the attacker has access to the entire training dataset, validate the proposed method and show the existing trade-offs between privacy and utility.

Changho Suh (KAIST)

*Title*: Matrix completion with graph side information

*Abstract*: Recommender systems provide users with appropriate items based on their revealed preferences, such as ratings and likes/dislikes. Due to their wide applicability, these systems have received significant attention in the machine learning and data mining communities. The great success of social media during the last decade has raised the possibility of further improving recommender systems with the help of additional information available in applications. One such source of additional information is a social network, such as Facebook's friendship graph. There has been a proliferation of recommendation algorithms that exploit such graph side information to improve performance significantly. However, we lack a theoretical understanding of the maximal gain that one can achieve from such side information.

In this talk, I will present our recent progress towards a theoretical understanding of the role of such side information in the context of matrix-completion-based recommender systems. As an initial effort, we consider a simple setting in which there are n users and m items, and ratings are binary (likes/dislikes). As side information, we assume that only a one-sided graph (e.g., a social graph) is given, generated according to a stochastic block model. We also assume that there are two clusters, each of which has the same ratings over items. Under this setting, we characterize the fundamental limit on the number of revealed ratings (called the minimal sample complexity) required for reliably estimating missing entries. As a consequence of this result, we show that the gain due to side information can be an order of magnitude larger than what is achievable without such side information, for many scenarios of interest in which the size of the rating matrix is highly asymmetric. We also develop a computationally efficient algorithm that achieves the minimal sample complexity.

This is joint work with Kwangjun Ahn (my former student, who is expected to join MIT EECS as a PhD student), Dr. Kangwook Lee (my postdoc, who is expected to join UW Madison ECE as an assistant professor), and Hyunseung Cha (a collaborator at KakaoBrain, which funded this project).

Camilla Hollanti (Aalto University)

*Title*: In the quest for the capacity of private information retrieval from coded and colluding servers

*Abstract*: Private information retrieval (PIR) addresses the question of how to retrieve data items from a database or cloud without disclosing information about the identity of the data items retrieved. The area has received renewed attention in the context of PIR from coded storage, where the files are distributed over the servers according to a storage code instead of mere replication. Alongside the basic principles of PIR, we will review recent capacity results and derive the coded and colluding PIR capacity in the case of strongly linear schemes.

The talk is based on joint work with Lukas Holzbaur and Ragnar Freij-Hollanti: arXiv:1903.12552.

**14.10–15.10 Keynote Talk**

Phil Schniter (Ohio State University)

*Title*: Recent Advances in Approximate Message Passing

*Abstract*: The approximate message passing (AMP) algorithm proposed by Donoho, Maleki, and Montanari is a computationally efficient iterative approach to sparse reconstruction and related problems. AMP has a remarkable property: for large i.i.d. sub-Gaussian measurement matrices, its per-iteration behavior is rigorously characterized by a scalar state evolution whose fixed points, when unique, are Bayes optimal. However, AMP is fragile in that even small deviations from the i.i.d. sub-Gaussian model can cause the algorithm to diverge. In this talk, I will describe the "vector AMP" (VAMP) algorithm, which also has a rigorous scalar state evolution. VAMP's state evolution, however, holds under a much broader class of large random measurement matrices: those that are right-rotationally invariant. I will also describe a non-parametric version of VAMP that can cope with an unknown prior and/or likelihood, plug-and-play extensions of VAMP, and connections between VAMP and interpretable deep neural networks.
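As background for this talk, the basic AMP iteration for sparse recovery alternates a soft-thresholding denoiser with an Onsager-corrected residual update. A minimal sketch (the threshold schedule, parameter values, and function names are my own illustrative choices, not Schniter's):

```python
import numpy as np

def soft_threshold(v, tau):
    """Componentwise soft-thresholding denoiser."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def amp(A, y, n_iters=30, alpha=1.4):
    """Basic AMP for y = A x with a large i.i.d. Gaussian A."""
    m, n = A.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(n_iters):
        tau = alpha * np.sqrt(np.mean(z ** 2))  # threshold set from residual energy
        x = soft_threshold(x + A.T @ z, tau)
        # Onsager correction: old residual times the average denoiser
        # derivative, which for soft thresholding is ||x||_0 / m.
        z = y - A @ x + z * (np.count_nonzero(x) / m)
    return x

rng = np.random.default_rng(0)
m, n, k = 250, 500, 10
A = rng.standard_normal((m, n)) / np.sqrt(m)       # i.i.d. sub-Gaussian matrix
x0 = np.zeros(n)
x0[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_hat = amp(A, A @ x0)
print(np.linalg.norm(x_hat - x0) / np.linalg.norm(x0))  # relative recovery error
```

Without the Onsager term, the update reduces to plain iterative soft thresholding, and the clean scalar state-evolution characterization mentioned in the abstract is lost.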

**15.20–17.00 Invited Session on Statistical Signal Processing**

Organised by Ramji Venkataramanan (University of Cambridge)

Po-Ling Loh (University of Wisconsin-Madison)

*Title*: Teaching and learning in uncertainty

*Abstract*: We investigate a simple model for social learning with two characters: a teacher and a student. The teacher's goal is to teach the student the state of the world Theta. However, the teacher herself is not certain about Theta and needs to simultaneously learn it and teach it. We examine several natural strategies the teacher may employ to make the student learn as fast as possible when the state of the world lies in {0,1} and transmissions occur through a binary channel. Our primary technical contribution is analyzing the exact learning rates for these strategies by studying the large deviation properties of the sign of a transient random walk on the integer grid. We also discuss a Gaussian variant of this problem and contrast the conclusions reached in the binary vs. Gaussian settings. This is joint work with Varun Jog.

Cynthia Rush (Columbia University)

*Title*: SLOPE is better than LASSO

*Abstract*: This talk considers the high-dimensional linear regression problem, where one wishes to recover an unknown signal from noisy linear observations in a regime of linear sparsity, that is, where the fraction of non-zero entries in the signal tends to a constant. We demonstrate that solving this problem with a generalized version of LASSO -- the sorted L1 penalized regression, or SLOPE -- produces solutions with the same, if not better, properties as LASSO. Our analysis uses tools from approximate message passing theory, which allow for an asymptotically exact statistical characterization of the SLOPE solution. This is joint work with Zhiqi Bu, Jason Klusowski, and Weijie Su.
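The sorted-L1 penalty underlying SLOPE pairs the largest regularization weight with the largest coefficient magnitude, and reduces to the plain LASSO penalty when all weights are equal. A minimal sketch of the penalty itself (function and variable names are illustrative):

```python
import numpy as np

def sorted_l1_penalty(beta, lambdas):
    """SLOPE penalty: sum_i lambda_(i) * |beta|_(i), both sorted decreasingly."""
    lam = np.sort(np.asarray(lambdas, dtype=float))[::-1]
    mags = np.sort(np.abs(np.asarray(beta, dtype=float)))[::-1]
    return float(lam @ mags)

beta = [3.0, -1.0, 2.0]
print(sorted_l1_penalty(beta, [0.9, 0.5, 0.1]))  # heavier weight on larger coefficients
print(sorted_l1_penalty(beta, [0.5, 0.5, 0.5]))  # equal weights: plain LASSO, 0.5 * ||beta||_1
```
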

Jean Barbier (EPFL)

*Title*: Mutual Information for the Dense Stochastic Block Model: A Direct Proof

*Abstract*: We rigorously derive a single-letter variational expression for the mutual information of the asymmetric two-groups stochastic block model in the dense graph regime. Existing proofs in the literature are indirect, as they involve mapping the model to a rank-one matrix estimation problem whose mutual information is then determined by a combination of methods (e.g., interpolation, cavity, algorithmic, spatial coupling). In this contribution we provide a self-contained direct method using only the recently introduced adaptive interpolation method.

Galen Reeves (Duke University)

*Title*: The geometry of community detection via the MMSE matrix

*Abstract*: The information-theoretic limits of community detection have been studied extensively for network models with high levels of symmetry or homogeneity. In this talk, I will present a new approach that applies to a broader class of network models that allow for variability in the sizes and behaviors of the different communities, and thus better reflect the behaviors observed in real-world networks. The results show that the ability to detect communities can be described succinctly in terms of a matrix of effective signal-to-noise ratios that provides a geometrical representation of the relationships between the different communities. This characterization follows from a matrix version of the I-MMSE relationship and generalizes the concept of an effective scalar signal-to-noise ratio introduced in previous work.

**May 31, 2019**

**09.30–10.30 Keynote Talk**

Kannan Ramchandran (Berkeley)

*Title*: Beyond communications: codes offer a CLEAR advantage (**C**omputing, **LEA**rning, and **R**ecovery)

*Abstract*: Seven decades after Claude Shannon's groundbreaking work, codes are now an indispensable part of modern communications and storage systems. But do they have a role in the modern age of big data deluge? Can codes help address the challenge of scale in speeding up computation, learning and inference by exploiting the underlying structure such as sparsity? In this talk, we will explore how coding theory can offer an exciting playground for diverse problem settings, illustrated through the lens of sparse-graph codes. We will show how this forms the core of a unified architecture featuring a divide-and-conquer strategy based on fast peeling-based decoding. We will highlight our approach to computational tasks such as massive-scale Fourier and Walsh decompositions, sparse polynomial learning, compressed sensing, group testing, and learning mixtures of sparse linear regressions. Applications include fast acquisition for MRI, hyper-graph sketching, fast neighbor discovery in the IoT, and spectrum sensing for cognitive radio. We will also survey how codes can speed up distributed machine learning in today’s cloud computing platforms by rendering them robust to system noise in the form of straggling compute nodes, with a particular focus on server-less systems such as Amazon's AWS Lambda.

**10.45–12.50 Invited Session on Information Theory and Frontiers in Communications**

Ayfer Özgür (Stanford)

*Title*: Distributed Learning under Communication Constraints

*Abstract*: We develop an information-theoretic framework for learning high-dimensional distributions and their parameters under communication constraints. We use this framework to prove tight minimax bounds for distributed estimation of common statistical models (such as the product Bernoulli model, multinomial model, Gaussian location model). We then apply our results to distributed training of deep neural networks and provide an experimental evaluation of our algorithms.

Mark Wilde (Louisiana State University)

*Title*: A tale of quantum data processing and recovery

*Abstract*: I will tell the story of how a recent physically meaningful refinement of the quantum data processing inequality for quantum relative entropy came about. Starting from the 2012 conjecture of Andreas Winter and Ke Li, and continuing with the 2014 seminal work of Omar Fawzi and Renato Renner on sharpening the strong subadditivity of quantum entropy, the theorem on improving quantum data processing for relative entropy came about from an international collaboration in summer 2015, during which I traveled from Delft to Zurich to Barcelona to Urbana-Champaign and finally back home to Baton Rouge. The main accomplishment can be summarized informally as follows: if the decrease in quantum relative entropy between two quantum states after a quantum physical evolution is relatively small, then it is possible to perform a recovery operation, such that one can perfectly recover one state while approximately recovering the other. This can be interpreted as quantifying how well one can reverse a quantum physical evolution. The theorem has a number of applications in quantum information theory, which have to do with providing physically meaningful improvements to many known entropy inequalities.

Michèle Wigger (Telecom ParisTech)

*Title*: Networks with Mixed Delay-Constraints

*Abstract*: The talk considers networks with mixed-delay constraints, where some data streams have to be sent over the network without further delay (called "fast" data), whereas the transmission of the other data streams (called "slow" data) can profit from cooperation between transmitters or between receivers. We consider different network models and investigate how the overall performance (sum-rate) depends on the rate of the "fast" data. In certain networks and configurations, the sum-rate degrades whenever the rate of the "fast" data is non-zero; here, imposing a stringent delay constraint on some data streams comes at the cost of decreasing the overall performance of the system. We show that for other networks, in particular when both the transmitters and the receivers can cooperate, the overall performance of the system can be kept constant even when "fast" data are transmitted at their highest reliable rates. To derive these results, in the talk we present both coding schemes and information-theoretic converses. (The talk is based on joint work with Homa Nikbakht and Shlomo Shamai.)

Aaron Wagner (Cornell University)

*Title*: How to Use Feedback to Improve Fixed-Rate Coding Performance over DMCs

*Abstract*: We introduce a novel mechanism, called timid/bold coding, by which feedback can be used to improve coding performance over some discrete memoryless channels (DMCs). For a certain class of DMCs, called compound dispersion channels, we show that timid/bold coding allows for an improved second-order coding rate compared with coding without feedback. For DMCs that are not compound dispersion, we show that feedback does not improve the second-order coding rate. Thus we completely determine the class of DMCs for which feedback improves the second-order coding rate. An upper bound on the second-order coding rate is provided for compound-dispersion DMCs. The main results are obtained by relating the analysis of feedback codes to certain controlled diffusions.

Ofer Shayevitz (Tel Aviv University)

*Title*: The minimax quadratic risk of distributed correlation estimation

*Abstract*: Alice and Bob observe infinitely many i.i.d. copies of unit-variance (Gaussian or binary) random variables with unknown correlation. By interactively exchanging k bits, Bob wants to produce an estimate of the correlation. We show that the minimax quadratic risk associated with this task is asymptotically 1/(2k ln 2), and can be achieved by a one-way (non-interactive) protocol. We also prove that the Ω(1/k) lower bound holds locally, i.e., even when the correlation is restricted to any small sub-interval of [-1,1]. Our proof relies on symmetric strong data-processing inequalities and various tensorization techniques from information-theoretic interactive common-randomness extraction. (Joint work with Uri Hadar, Jingbo Liu and Yury Polyanskiy.)
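The asymptotic risk in this result is a simple closed form in the communication budget k. A quick numeric illustration of 1/(2k ln 2) (my own, just evaluating the stated formula):

```python
import math

def minimax_quadratic_risk(k):
    """Asymptotic minimax quadratic risk when k bits are exchanged: 1 / (2 k ln 2)."""
    return 1.0 / (2.0 * k * math.log(2.0))

# Each tenfold increase in the bit budget shrinks the risk tenfold:
for k in (1, 10, 100):
    print(k, round(minimax_quadratic_risk(k), 5))
```
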

**14.10–15.10 Keynote Talk**

Yiannis Kontoyiannis (University of Cambridge)

*Title*: Bayesian inference for discrete time series using context trees

*Abstract*: Discrete time series, often consisting of streaming "big" data sets, are becoming very common in modern biological, medical, communications, and financial applications. Consequently, there is increasingly pressing demand for methods that can perform inferential and learning tasks in a real-time, sequential fashion. We will describe how the family of context tree sources, a class of variable-memory models studied extensively in information theory, can be used for an extremely broad array of statistical and learning tasks across a variety of applications. We will show how the context tree weighting (CTW) and related algorithms, developed by Willems, Shtarkov, Tjalkens and their collaborators since the early 1990s, can be generalized and extended to provide methodological and algorithmic tools for effective Bayesian inference on discrete time series. After describing the new, general Bayesian setting, we will present both deterministic and simulation-based procedures for several tasks, including prediction, model selection, anomaly and change-point detection, estimation, and content recognition. Results illustrating the performance of the resulting methods on a variety of different data sets will be presented.
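A core ingredient of the CTW algorithm mentioned above is the Krichevsky–Trofimov (KT) sequential estimator, which each context-tree node uses to assign probability to the symbols falling in its context. A minimal sketch of the KT estimate for a binary string (illustrative only; this is the per-node building block, not the full CTW mixture):

```python
def kt_probability(bits):
    """KT sequential estimate of a binary string:
    after seeing `zeros` 0s and `ones` 1s, P(next = 1) = (ones + 1/2) / (zeros + ones + 1)."""
    prob = 1.0
    zeros = ones = 0
    for b in bits:
        p_one = (ones + 0.5) / (zeros + ones + 1.0)
        prob *= p_one if b == 1 else 1.0 - p_one
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return prob

print(kt_probability([0]))     # 1/2
print(kt_probability([0, 0]))  # 1/2 * 3/4 = 3/8
print(kt_probability([0, 1]))  # 1/2 * 1/4 = 1/8
```

CTW then mixes these per-node KT estimates over all context trees up to a given depth, which is what yields its strong sequential-prediction guarantees.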


**15.20–17.00 Invited Session on Post-Quantum Cryptography**

Organised by Cong Ling (Imperial College London)

Shun Watanabe (Tokyo University of Agriculture and Technology)

*Title*: Change of Measure Argument for Strong Converse and Application to Parallel Repetition Theorem

*Abstract*: The strong converse for a coding theorem shows that the optimal asymptotic rate possible with vanishing error cannot be improved by allowing a fixed error. The strong converse for multi-terminal problems typically requires sophisticated techniques such as the blowing-up lemma, and it has been regarded as a difficult problem. In this talk, we describe a recently introduced method for proving strong converses; it involves a change of measure after replacing hard information constraints with soft ones. We also apply the method to provide a new proof of the nonsignaling multiprover parallel repetition theorem, which underlies many hardness amplification results in computer science and cryptography. (This is joint work with Himanshu Tyagi.)

Qian Guo (University of Bergen)

*Title*: Decryption Failure Attacks on Post-Quantum Cryptographic Primitives with Error-Correcting Codes

*Abstract*: In this talk, we discuss the impact of decryption failures on the chosen-ciphertext security of several post-quantum cryptographic schemes using error-correcting codes (ECC). The targeted schemes fall into two classes: code-based schemes using sparse codes with iterative decoding algorithms (e.g., McEliece-type schemes based on low-density parity-check (LDPC) or moderate-density parity-check (MDPC) codes), and lattice-based schemes using ECC to reduce the decryption error rate (e.g., LAC). In both cases, the adversary can fully recover the secret key given sufficiently many decryption failures.

Leo Ducas (CWI)

*Title*: Polynomial Time Bounded Distance Decoding near Minkowski's Bound in Discrete Logarithm Lattices

*Abstract*: We propose a concrete family of dense lattices of arbitrary dimension n in which the lattice Bounded Distance Decoding (BDD) problem can be solved in deterministic polynomial time. This construction is directly adapted from the Chor-Rivest cryptosystem (1988). The lattice construction requires discrete logarithm computations that can be carried out in deterministic polynomial time for well-chosen parameters. Each lattice comes with a deterministic polynomial time decoding algorithm able to decode up to a large radius; namely, we reach a decoding radius within O(log n) of Minkowski's bound, for both the l1 and l2 norms.

Thomas Prest (PQShield)

*Title*: Unifying Leakage Models on a Rényi Day

*Abstract*: In the last decade, several works have focused on finding the best way to model circuit leakage in order to obtain provably secure implementations. One of the most realistic models is the noisy leakage model, introduced in (Prouff-Rivain'13) and (Duc-Dziembowski-Faust'14) together with secure constructions. These works suffer from various limitations, in particular the use of ideal leak-free gates in (Prouff-Rivain'13) and an important loss (in the size of the field) in the reduction of (Duc-Dziembowski-Faust'14). In this work, we provide new strategies for proving the security of masked implementations, and we start by unifying the different noisiness metrics used in prior works by relating all of them to a standard notion in information theory: the pointwise mutual information. Based on this new interpretation, we define two new natural metrics and analyze the security of known compilers with respect to them. In particular, we prove (1) a tighter bound for reducing the noisy leakage model to the probing model using our first new metric, and (2) better bounds for amplification-based security proofs using the second metric. To support that the improvements we obtain are not merely a consequence of using alternative metrics, we show that for a concrete representation of leakage (e.g., "Hamming weight + Gaussian noise"), our approach significantly improves the parameters compared to prior works. Finally, using the Rényi divergence, we quantify concretely the advantage of an adversary in attacking a block cipher as a function of the number of leakage acquisitions available to it. (Joint work with Dahmun Goudarzi, Ange Martinelli and Alain Passelègue.)
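The pointwise mutual information that this talk uses to unify the noisiness metrics has a simple definition: the log-ratio of a joint probability to the product of the marginals at a single outcome pair. A toy sketch (function name and example numbers are my own):

```python
import math

def pointwise_mutual_information(p_xy, p_x, p_y):
    """PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ), in bits."""
    return math.log2(p_xy / (p_x * p_y))

# Perfectly correlated fair bits: (0, 0) is twice as likely as under independence.
print(pointwise_mutual_information(p_xy=0.5, p_x=0.5, p_y=0.5))   # 1.0 bit
# Independent variables give PMI = 0 at every outcome:
print(pointwise_mutual_information(p_xy=0.25, p_x=0.5, p_y=0.5))  # 0.0
```

Averaging the PMI over the joint distribution recovers the ordinary mutual information, which is what makes it a natural common denominator for the leakage metrics discussed above.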

The Faculty of Natural & Mathematical Sciences at King’s College London has a Code of Conduct which we expect participants at our events to abide by. This is intended to ensure an inclusive and productive environment and can be read here.