Privacy in Machine Learning

Virtual NeurIPS 2021 Workshop (direct link to live stream)


This one-day workshop focuses on privacy-preserving machine learning techniques for large-scale data analysis, in both the distributed and centralized settings, and on scenarios that highlight the importance of and need for these techniques (e.g., via privacy attacks). There is growing interest from the Machine Learning (ML) community in leveraging cryptographic techniques such as Multi-Party Computation (MPC) and Homomorphic Encryption (HE) for secure computation during training and inference, as well as Differential Privacy (DP) for limiting the privacy risks from the trained model itself. We encourage both theory- and application-oriented submissions exploring the range of approaches listed below.

  • Privacy-preserving machine learning
  • Differential privacy: theory, applications, and implementations
  • Statistical and information-theoretic notions of privacy, including DP relaxations
  • Empirical and theoretical comparisons between different notions of privacy
  • Privacy-preserving data sharing, anonymization, and privacy of synthetic data
  • Privacy attacks
  • Federated and decentralized privacy-preserving algorithms
  • Policy-making aspects of data privacy
  • Secure multi-party computation techniques for machine learning
  • Learning on encrypted data, homomorphic encryption
  • Privacy in autonomous systems
  • Online social networks privacy
  • Privacy and private learning in computer vision and natural language processing tasks
  • Programming languages for privacy-preserving data analysis
  • Relations of privacy with fairness, transparency and adversarial robustness
  • Machine unlearning and data-deletion

Call For Papers & Important Dates

Download Full CFP

Submission deadline: September 17, 2021 (UTC) (extended from September 16)
Notification of acceptance: October 19, 2021 (extended from October 15)
Video and slides submission deadline (for accepted papers): November 1, 2021
Event date: December 14, 2021
Contact:

Submission Instructions

Submissions in the form of extended abstracts must be at most 4 pages long (not including references; additional supplementary material may be submitted but may be ignored by reviewers), non-anonymized, and adhere to the NeurIPS format. We encourage the submission of work that is new to the privacy-preserving machine learning community. Submissions based solely on work that has been previously published in conferences on machine learning and related fields are not suitable for the workshop. On the other hand, we allow submission of works currently under submission, as well as relevant works recently published in privacy and security venues. Submission of work under review at NeurIPS 2021 is allowed, but this must be disclosed at submission time. Submissions accepted to the NeurIPS main conference may be deprioritized in selecting oral presentations. The workshop will not have formal proceedings, but authors of accepted abstracts can choose to have a link to arXiv or a PDF added to the workshop webpage.

Submit Your Abstract Here

Invited Speakers

  • Helen Nissenbaum (Cornell Tech)
  • Emiliano de Cristofaro (University College London)
  • Kristin Lauter (Facebook AI Research)
  • Aaron Roth (UPenn / Amazon)


Block A (Paris time)
10:20-10:30 Welcome
10:30-11:00 Invited talk: Emiliano de Cristofaro (University College London)
Privacy in Machine Learning -- It's Complicated
11:00-11:15 Q&A: Emiliano de Cristofaro
11:15-11:30 Break
11:30-11:45 Differential Privacy via Group Shuffling (contributed talk)   
Amir Mohammad Abouei, Clement Louis Canonne
The past decade has seen data privacy emerge as a fundamental and pressing issue. Among the tools developed to tackle it, differential privacy has emerged as a central and principled framework, with specific variants capturing various threat models. In particular, the recently proposed shuffle model of differential privacy allows for promising tradeoffs between accuracy and privacy. However, the shuffle model may not be suitable in all situations, as it relies on a distributed setting where all users can coordinate and trust (or simulate) a joint shuffling algorithm. To address this, we introduce a new model, the group shuffle model, in which users are partitioned into several groups, each group having its own local shuffler. We investigate the privacy/accuracy tradeoffs in our model, by comparing it to both the shuffle and local models of privacy, which it in some sense interpolates between. In addition to general relations between group shuffle, shuffle, and local privacy, we provide a detailed comparison of the cost and benefit of the group shuffle model, by providing both upper and lower bounds for the specific task of binary summation.
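To fix intuition for the setting described in this abstract, the following is a minimal illustrative sketch (not the authors' code) of binary summation in a group shuffle model: each user applies randomized response to their bit, each group's shuffler permutes its members' reports, and the analyzer debiases the aggregate. All function names and the partition into two groups are assumptions made here for illustration.

```python
import math
import random

def randomized_response(bit, eps):
    """Local randomizer: report the true bit with probability e^eps / (e^eps + 1)."""
    p = math.exp(eps) / (math.exp(eps) + 1)
    return bit if random.random() < p else 1 - bit

def group_shuffle(bits, groups, eps):
    """Each group's shuffler permutes its own members' randomized reports,
    hiding which user within the group sent which report."""
    reports = []
    for group in groups:
        group_reports = [randomized_response(bits[i], eps) for i in group]
        random.shuffle(group_reports)
        reports.extend(group_reports)
    return reports

def debiased_sum(reports, eps):
    """Unbiased estimate of the true binary sum from the noisy reports."""
    p = math.exp(eps) / (math.exp(eps) + 1)
    n = len(reports)
    return (sum(reports) - n * (1 - p)) / (2 * p - 1)
```

With a single group this degenerates to the standard shuffle model, and with singleton groups to the local model, which matches the interpolation the abstract describes.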
11:45-12:00 SoK: Privacy-preserving Clustering (Extended Abstract) (contributed talk)   
Aditya Hegde, Helen Möllering, Thomas Schneider, Hossein Yalame
Clustering is a popular unsupervised machine learning technique that groups similar input elements into clusters. In many applications, sensitive information is clustered that should not be leaked. Moreover, nowadays it is often required to combine data from multiple sources to increase the quality of the analysis as well as to outsource complex computation to powerful cloud servers. This calls for efficient privacy-preserving clustering. In this work, we systematically analyze the state-of-the-art in privacy-preserving clustering. We implement and benchmark today's four most efficient fully private clustering protocols by Cheon et al. (SAC'19), Meng et al. (ArXiv'19), Mohassel et al. (PETS'20), and Bozdemir et al. (ASIACCS'21) with respect to communication, computation, and clustering quality.
12:00-12:15 Contributed talks Q&A
12:15-12:30 Coffee Break
12:30-13:30 Poster Session
13:30-14:15 Panel
Block B (LA time)
8:20-8:30 Welcome
8:30-9:00 Invited talk: Helen Nissenbaum (Cornell Tech)
Practical Privacy, Fairness, Ethics, Policy
9:00-9:30 Invited talk: Aaron Roth (UPenn / Amazon)
Machine Unlearning.
9:30-10:00 Q&A: Helen Nissenbaum and Aaron Roth
10:00-10:15 Break
10:15-11:15 Poster Session
11:15-11:30 Break
11:30-12:00 Invited talk: Kristin Lauter (Facebook AI Research):
ML on Encrypted Data.
12:00-12:15 Q&A: Kristin Lauter
12:15-12:30 Privacy-Aware Rejection Sampling (contributed talk)   
Jordan Awan, Vinayak Rao
Differential privacy (DP) offers strong protection against adversaries with arbitrary side-information and computational power. However, many implementations of DP mechanisms leave themselves vulnerable to side-channel attacks, such as timing attacks. Many privacy mechanisms, such as the exponential mechanism, do not lend themselves to exact implementation; when sampling methods such as MCMC or rejection sampling are used instead, the runtime can leak privacy. In this work, we quantify the privacy cost due to the runtime of a rejection sampler in terms of (ε, δ)-DP. We also propose three modifications to the rejection sampling algorithm to protect against timing attacks by making the runtime independent of the data. We further use our techniques to develop an adaptive rejection sampler for log-Hölder densities, which also has data-independent runtime.
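The side channel this abstract targets is easy to see in a classic rejection sampler, where the number of loop iterations is geometric with mean equal to the envelope constant, so the runtime carries information about the (data-dependent) target density. The sketch below is a generic rejection sampler written here for illustration, not a mechanism from the paper; all names and signatures are assumptions.

```python
import random

def rejection_sample(target_pdf, proposal_sample, proposal_pdf, m, rng):
    """Classic rejection sampler for target_pdf, using a proposal with density
    proposal_pdf and envelope constant m (target_pdf <= m * proposal_pdf).
    The iteration count is geometric with mean m, so when the target depends
    on private data, the runtime itself can leak information."""
    iterations = 0
    while True:
        iterations += 1
        x = proposal_sample(rng)
        # Accept x with probability target_pdf(x) / (m * proposal_pdf(x)).
        if rng.random() <= target_pdf(x) / (m * proposal_pdf(x)):
            return x, iterations
```

For example, sampling the density 2x on [0, 1] from a uniform proposal with m = 2 accepts each draw with probability x, so the loop runs twice on average; a different data-dependent target would change that average, which is precisely the kind of runtime dependence the paper's modifications remove.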
12:30-12:45 Population Level Privacy Leakage in Binary Classification with Label Noise (contributed talk)
Robert Istvan Busa-Fekete, Andres Munoz Medina, Umar Syed, Sergei Vassilvitskii
We study the privacy limitations of label differential privacy. Label differential privacy has emerged as an intermediate trust model between local and central differential privacy, where only the label of each training example is protected (and the features are assumed to be public). We show that the guarantees provided by label DP are significantly weaker than they appear, as an adversary can "un-noise" the perturbed labels. Formally, we show that the privacy loss has a close connection with the Jeffreys divergence of the conditional distribution between positive and negative labels, which allows an explicit formulation of the trade-off between utility and privacy in this setting. Our results suggest how to select public features that optimize this trade-off. We further show, however, that there is no free lunch: instances where label differential privacy guarantees are strong are exactly those where a good classifier does not exist. We complement the negative results with a non-parametric estimator for the true privacy loss, and apply our techniques on large-scale benchmark data to demonstrate how to achieve a desired privacy protection.
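The "un-noising" attack described in this abstract can be illustrated with a small sketch (written here for illustration; names and signatures are assumptions, not the paper's code): labels are protected with randomized response, but an adversary who knows a feature-based prior P(y=1 | x) can combine it with the noisy label via Bayes' rule, and when the features are informative the posterior on the true label becomes sharp despite the label noise.

```python
import math
import random

def flip_label(y, eps, rng):
    """Label DP via randomized response on the binary label:
    keep the true label with probability e^eps / (1 + e^eps)."""
    p_keep = math.exp(eps) / (1 + math.exp(eps))
    return y if rng.random() < p_keep else 1 - y

def posterior_true_label(noisy_y, prior_pos, eps):
    """Adversary's Bayes posterior that the true label is 1, combining the
    released noisy label with a feature-based prior P(y=1 | x) = prior_pos."""
    p_keep = math.exp(eps) / (1 + math.exp(eps))
    like_pos = p_keep if noisy_y == 1 else 1 - p_keep       # P(noisy_y | y=1)
    like_neg = 1 - p_keep if noisy_y == 1 else p_keep       # P(noisy_y | y=0)
    return (like_pos * prior_pos
            / (like_pos * prior_pos + like_neg * (1 - prior_pos)))
```

With an uninformative prior of 0.5 the posterior equals the randomized-response keep probability, but with a confident feature-based prior (say 0.99) the adversary recovers the true label with high probability even when the noisy label disagrees, which is the weakness the abstract formalizes.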
12:45-13:00 Simple Baselines Are Strong Performers for Differentially Private Natural Language Processing (contributed talk)   
Xuechen Li, Florian Tramer, Percy Liang, Tatsunori Hashimoto
Differentially private learning has seen limited success for deep learning models of text, resulting in a perception that differential privacy may be incompatible with the language model fine-tuning paradigm. We demonstrate that this perception is inaccurate and that with the right setup, high performing private models can be learned on moderately-sized corpora by directly fine-tuning with differentially private optimization. Our work highlights the important role of hyperparameters, task formulations, and pretrained models. Our analyses also show that the low performance of naive differentially private baselines in prior work is attributable to suboptimal choices in these factors. Empirical results reveal that differentially private optimization does not suffer from dimension-dependent performance degradation with pretrained models and achieves performance on-par with state-of-the-art private training procedures and strong non-private baselines.
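The differentially private optimization this abstract refers to follows the standard DP-SGD recipe: clip each example's gradient, average, and add Gaussian noise calibrated to the clipping bound. The following is a minimal NumPy sketch of one such step for illustration only; the function name and parameters are assumptions made here, not the authors' implementation.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD step: clip each example's gradient to L2 norm clip_norm,
    average the clipped gradients, add Gaussian noise scaled to the clipping
    bound, and take a gradient step."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g / max(1.0, norm / clip_norm))  # per-example clipping
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0,
                       noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```

The abstract's point about hyperparameters maps directly onto this recipe: the clipping norm, noise multiplier, and batch size all enter the update, and poor choices of these (rather than DP itself) explain the weak baselines in prior work.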
13:00-13:15 Canonical Noise Distributions and Private Hypothesis Tests (contributed talk)   
Jordan Awan, Salil Vadhan
In the setting of f-DP, we propose the concept of a canonical noise distribution (CND), which captures whether an additive privacy mechanism is tailored for a given tradeoff function f, and give a construction of a CND for an arbitrary tradeoff function f. We show that private hypothesis tests are intimately related to CNDs, allowing for the release of private p-values at no additional privacy cost as well as the construction of uniformly most powerful (UMP) tests for binary data. We apply our techniques to difference-of-proportions testing.
13:15-13:45 Q&A for four contributed talks
13:45-14:30 Panel   
14:30-14:40 Closing

Accepted Papers

Abhinav Aggarwal, Shiva Kasiviswanathan, Zekun Xu, Oluwaseyi Feyisetan, Nathanael Teissier
Reconstructing Test Labels from Noisy Loss Scores (Extended Abstract)    [openreview] [Visit Poster at Spot B1]
Xinyu Tang, Saeed Mahloujifar, Liwei Song, Virat Shejwalkar, Milad Nasr, Amir Houmansadr, Prateek Mittal
A Novel Self-Distillation Architecture to Defeat Membership Inference Attacks    [openreview] [Visit Poster at Spot F3]
Cecilia Ferrando, Jennifer Gillenwater, Alex Kulesza
Combining Public and Private Data    [openreview] [Visit Poster at Spot G1]
Virat Shejwalkar, Huseyin A Inan, Amir Houmansadr, Robert Sim
Membership Inference Attacks Against NLP Classification Models    [openreview] [Visit Poster at Spot H0]
Yongqin Wang, Edward Suh, Wenjie Xiong, Brian Knott, Benjamin Lefaudeux, Murali Annavaram, Hsien-Hsin Lee
Characterizing and Improving MPC-based Private Inference for Transformer-based Models    [openreview] [Visit Poster at Spot H2]
Andres Munoz Medina, Matthew Joseph, Jennifer Gillenwater, Mónica Ribero
A Joint Exponential Mechanism for Differentially Private Top-k Set    [openreview] [Visit Poster at Spot I0]
Karan Chadha, John Duchi, Rohith Kuditipudi
Private Confidence Sets    [openreview] [Visit Poster at Spot I1]
Rachel Cummings, Vitaly Feldman, Audra McMillan, Kunal Talwar
Mean Estimation with User-level Privacy under Data Heterogeneity    [openreview] [Visit Poster at Spot J0]
Dmitrii Usynin, Alexander Ziller, Moritz Knolle, Andrew Trask, Kritika Prakash, Daniel Rueckert, Georgios Kaissis
An automatic differentiation system for the age of differential privacy    [arxiv] [Visit Poster at Spot A0]
Ashkan Yousefpour, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen, Sayan Ghosh, Akash Bharadwaj, Jessica Zhao, Graham Cormode, Ilya Mironov
Opacus: User-Friendly Differential Privacy Library in PyTorch    [openreview] [Visit Poster at Spot D3]
Robert Istvan Busa-Fekete, Andres Munoz Medina, Umar Syed, Sergei Vassilvitskii
Population Level Privacy Leakage in Binary Classification with Label Noise (oral)    [openreview] [Visit Poster at Spot F2]
Aditya Hegde, Helen Möllering, Thomas Schneider, Hossein Yalame
SoK: Privacy-preserving Clustering (Extended Abstract) (oral)    [openreview] [Visit Poster at Spot H1]


Workshop organizers

  • Borja Balle (DeepMind)
  • Giovanni Cherubin (Alan Turing Institute)
  • Kamalika Chaudhuri (UC San Diego and Facebook AI Research)
  • Antti Honkela (University of Helsinki)
  • Jonathan Lebensold (Mila and McGill University)
  • Casey Meehan (UC San Diego)
  • Mijung Park (University of British Columbia)
  • Yu-Xiang Wang (UC Santa Barbara)
  • Adrian Weller (Alan Turing Institute & Cambridge University)
  • Yuqing Zhu (UC Santa Barbara)

Program Committee

  • Carmela Troncoso (EPFL)
  • Bogdan Kulynych (EPFL)
  • Florian Tramer (Stanford)
  • Samuel Gordon (George Mason University)
  • James Henry Bell (Alan Turing Institute)
  • Adria Gascon (Google)
  • Antti Koskela (University of Helsinki)
  • James Foulds (University of Maryland Baltimore County)
  • Catuscia Palamidessi (INRIA)
  • Jordan Awan (Purdue)
  • Di Wang (KAUST)
  • Feargus Pendlebury (ICSI)
  • Om Thakkar (Boston University)
  • Bolin Ding (Alibaba Group)
  • Anne-Sophie Charest (Laval University)
  • Theresa Stadler (EPFL)
  • Esfandiar Mohammadi (Universität zu Lübeck)
  • Tejas Kulkarni (Aalto University)
  • Bargav Jayaraman (University of Virginia)
  • Kallista Bonawitz (Google)
  • Graham Cormode (University of Warwick)
  • Rachel Cummings (Georgia Institute of Technology)
  • Morten Dahl (OpenMined & Dropout Labs)
  • Olga Ohrimenko (Melbourne University)
  • Martine De Cock (University of Washington)
  • Christos Dimitrakakis (University of Oslo, Norway)
  • Reza Shokri (National University of Singapore)
  • Jamie Hayes (DeepMind)
  • Matthew Jagielski (Northeastern University)
  • Peter Kairouz (Google)
  • Matthew Reimherr (Pennsylvania State University)
  • Gautam Kamath (University of Waterloo)
  • Marcel Keller (Data61)
  • Richard Nock (Data61 & Australian National University)
  • Jacob Imola (UCSD)
  • Jan Ramon (INRIA)
  • Vasilios Mavroudis (Alan Turing Institute)
  • Luca Melis (Facebook)
  • Jinshuo Dong (Northwestern University)
  • Phillipp Schoppmann (Google)
  • Or Sheffet (Bar Ilan University, Technion)
  • Kana Shimizu (Waseda University)
  • Congzheng Song (Apple)
  • Ananth Raghunathan (Facebook)
  • Stratis Ioannidis (Northeastern University)
  • Samee Zahur (University of Virginia)
  • Mohammad Yaghini (University of Toronto)
  • Ali Shahin Shamsabadi (Vector)