Virtual NeurIPS 2021 Workshop
This one-day workshop focuses on privacy-preserving machine learning techniques for large-scale data analysis, both in the distributed and centralized settings, and on scenarios that highlight the importance and need for these techniques (e.g., via privacy attacks). There is growing interest from the Machine Learning (ML) community in leveraging cryptographic techniques such as Multi-Party Computation (MPC) and Homomorphic Encryption (HE) for secure computation during training and inference, as well as Differential Privacy (DP) for limiting the privacy risks from the trained model itself. We encourage both theory and application-oriented submissions exploring a range of approaches listed below.
Submission deadline: September 17, 2021 (UTC) (extended from September 16)
Notification of acceptance: October 19, 2021 (extended from October 15)
Video and slides submission deadline (for accepted papers): November 1, 2021
Event date: December 14, 2021
Contact: privacymlworkshop@gmail.com
Submissions in the form of extended abstracts must be at most 4 pages long (not including references; additional supplementary material may be submitted but may be ignored by reviewers), non-anonymized, and adhere to the NeurIPS format. We encourage the submission of work that is new to the privacy-preserving machine learning community. Submissions based solely on work that has been previously published in conferences on machine learning and related fields are not suitable for the workshop. On the other hand, we allow submission of work currently under review as well as relevant work recently published in privacy and security venues. Submission of work under review at NeurIPS 2021 is allowed, but this must be disclosed at submission time. Submissions accepted to the NeurIPS main conference may be deprioritized in selecting oral presentations. The workshop will not have formal proceedings, but authors of accepted abstracts can choose to have a link to arXiv or a PDF added to the workshop webpage.
Block A (Paris time) | |
10:20-10:30 | Welcome |
10:30-11:00 | Invited talk: Emiliano de Cristofaro (University College London): Privacy in Machine Learning -- It's Complicated |
11:00-11:15 | Q&A: Emiliano de Cristofaro |
11:15-11:30 | Break |
11:30-11:45 |
Differential Privacy via Group Shuffling
(contributed talk)
Amir Mohammad Abouei, Clement Louis Canonne |
The past decade has seen data privacy emerge as a fundamental and pressing issue. Among the
tools developed to tackle it, differential privacy has emerged as a central and principled
framework, with specific variants capturing various threat models. In particular, the recently
proposed shuffle model of differential privacy allows for promising tradeoffs between accuracy
and privacy. However, the shuffle model may not be suitable in all situations, as it relies on
a distributed setting where all users can coordinate and trust (or simulate) a joint shuffling algorithm.
To address this, we introduce a new model, the group shuffle model, in which users are partitioned
into several groups, each group having its own local shuffler. We investigate the privacy/accuracy
tradeoffs in our model, by comparing it to both the shuffle and local models of privacy, which
it in some sense interpolates between. In addition to general relations between group shuffle,
shuffle, and local privacy, we provide a detailed comparison of the cost and benefit of the
group shuffle model, by providing both upper and lower bounds for the specific task of binary summation.
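For readers unfamiliar with the setting, the following is a minimal illustrative sketch (not the authors' protocol) of binary summation in a group shuffle setting: each user applies ε-randomized response locally, each group's shuffler permutes its reports, and the analyzer debiases the aggregate. One group recovers the shuffle model and one group per user the local model; all function names are hypothetical.

```python
import numpy as np

def randomized_response(bit, eps, rng):
    """epsilon-DP randomized response for a single bit."""
    p_keep = np.exp(eps) / (np.exp(eps) + 1.0)
    return bit if rng.random() < p_keep else 1 - bit

def group_shuffle_binary_sum(bits, num_groups, eps, rng=None):
    """Toy group-shuffle protocol for binary summation.

    Each user locally randomizes their bit, each group's shuffler permutes
    its reports, and the analyzer only sees the shuffled reports per group.
    num_groups=1 corresponds to the shuffle model; num_groups=len(bits)
    degenerates to the local model.
    """
    rng = rng or np.random.default_rng(0)
    groups = np.array_split(np.asarray(bits), num_groups)
    shuffled_reports = []
    for g in groups:
        reports = np.array([randomized_response(b, eps, rng) for b in g])
        rng.shuffle(reports)  # the group's local shuffler
        shuffled_reports.append(reports)

    # Debias the aggregate of the noisy reports to estimate sum(bits).
    n = len(bits)
    p = np.exp(eps) / (np.exp(eps) + 1.0)
    noisy_sum = sum(r.sum() for r in shuffled_reports)
    return (noisy_sum - n * (1 - p)) / (2 * p - 1)

bits = np.random.default_rng(1).integers(0, 2, size=1000)
print(bits.sum(), round(group_shuffle_binary_sum(bits, num_groups=10, eps=1.0), 1))
```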
11:45-12:00 |
SoK: Privacy-preserving Clustering (Extended Abstract)
(contributed talk)
Aditya Hegde, Helen Möllering, Thomas Schneider, Hossein Yalame |
Clustering is a popular unsupervised machine learning technique that groups similar input
elements into clusters. In many applications, the data being clustered is sensitive and must
not be leaked. Moreover, it is often necessary to combine data from multiple sources
to increase the quality of the analysis as well as to outsource complex computation to powerful
cloud servers. This calls for efficient privacy-preserving clustering. In this work, we
systematically analyze the state-of-the-art in privacy-preserving clustering. We implement
and benchmark today's four most efficient fully private clustering protocols by Cheon et al.
(SAC'19), Meng et al. (ArXiv'19), Mohassel et al. (PETS'20), and Bozdemir et al. (ASIACCS'21)
with respect to communication, computation, and clustering quality.
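None of the four benchmarked protocols is reproduced here; as generic background, the sketch below shows additive secret sharing over a ring, the kind of MPC building block that such privacy-preserving clustering protocols rely on for distance and sum computations. The names and parameters are illustrative only.

```python
import secrets

MOD = 2 ** 64  # arithmetic secret sharing over the ring Z_{2^64}

def share(x, n_parties=3):
    """Split integer x into n additive shares that sum to x mod 2^64."""
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

# Each party adds its shares locally; only the final sum is ever revealed.
a_shares, b_shares = share(42), share(100)
sum_shares = [(a + b) % MOD for a, b in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # 142, without any party seeing 42 or 100
```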
12:00-12:15 | Contributed talks Q&A |
12:15-12:30 | Coffee Break |
12:30-13:30 | Poster Session |
13:30-14:15 | Panel |
Block B (LA time) | |
8:20-8:30 | Welcome |
8:30-9:00 | Invited talk: Helen Nissenbaum (Cornell Tech): Practical Privacy, Fairness, Ethics, Policy |
9:00-9:30 | Invited talk: Aaron Roth (UPenn / Amazon): Machine Unlearning |
9:30-10:00 | Q&A: Helen Nissenbaum and Aaron Roth |
10:00-10:15 | Break |
10:15-11:15 | Poster Session |
11:15-11:30 | Break |
11:30-12:00 | Invited talk: Kristin Lauter (Facebook AI Research): ML on Encrypted Data |
12:00-12:15 | Q&A: Kristin Lauter |
12:15-12:30 |
Privacy-Aware Rejection Sampling
(contributed talk)
Jordan Awan, Vinayak Rao |
Differential privacy (DP) offers strong protection against adversaries with arbitrary side-information
and computational power. However, many implementations of DP mechanisms leave themselves vulnerable
to side-channel attacks, such as timing attacks. Because many privacy mechanisms, such as the exponential
mechanism, do not lend themselves to easy implementations, sampling methods such as MCMC or
rejection sampling are often used, and their runtime can leak privacy. In this work, we quantify the privacy
cost due to the runtime of a rejection sampler in terms of (ε, δ)-DP. We also propose three modifications
to the rejection sampling algorithm to protect against timing attacks by making the runtime
independent of the data. We also use our techniques to develop an adaptive rejection sampler
for log-Hölder densities, which also has data-independent runtime.
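As a hedged illustration of the general idea (not the authors' three modifications), the sketch below runs a rejection sampler for a fixed number of proposal rounds so that its runtime does not depend on the data; the function names and the toy Gaussian example are assumptions made for this example.

```python
import numpy as np

def fixed_rounds_rejection_sampler(log_target, sample_proposal, log_proposal,
                                   log_M, num_rounds, rng):
    """Rejection sampling with a data-independent number of rounds.

    Always evaluates exactly num_rounds proposals, so the wall-clock runtime
    does not depend on the target density (no timing side channel). Returns
    the first accepted draw, or None if every round rejects; the caller must
    budget for that failure probability.
    """
    accepted = None
    for _ in range(num_rounds):  # fixed loop length
        x = sample_proposal(rng)
        ok = np.log(rng.random()) < log_target(x) - log_proposal(x) - log_M
        if ok and accepted is None:
            accepted = x         # keep looping anyway to preserve the runtime
    return accepted

# Toy example: sample a standard normal using an N(0, 2^2) proposal.
rng = np.random.default_rng(0)
log_target = lambda x: -0.5 * x ** 2        # unnormalized N(0, 1)
log_proposal = lambda x: -0.125 * x ** 2    # unnormalized N(0, 4)
sample_proposal = lambda rng: 2.0 * rng.standard_normal()
draw = fixed_rounds_rejection_sampler(log_target, sample_proposal,
                                      log_proposal, log_M=0.0,
                                      num_rounds=50, rng=rng)
print(draw)
```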
12:30-12:45 |
Population Level Privacy Leakage in Binary Classification with Label Noise
(contributed talk)
Robert Istvan Busa-Fekete, Andres Munoz Medina, Umar Syed, Sergei Vassilvitskii |
We study the privacy limitations of label differential privacy. Label differential privacy has
emerged as an intermediate trust model between local and central differential privacy, where
only the label of each training example is protected (and the features are assumed to be public).
We show that the guarantees provided by label DP are significantly weaker than they appear,
as an adversary can "un-noise" the perturbed labels. Formally we show that the privacy loss
has a close connection with Jeffreys' divergence of the conditional distribution between
positive and negative labels, which allows explicit formulation of the trade-off between
utility and privacy in this setting. Our results suggest how to select public features that
optimize this trade-off. Nevertheless, we show that there is no free lunch --- instances where
label differential privacy guarantees are strong are exactly those where a good classifier
does not exist. We complement the negative results with a non-parametric estimator for the
true privacy loss, and apply our techniques on large-scale benchmark data to demonstrate how
to achieve a desired privacy protection.
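As an illustrative sketch of the "un-noising" argument (not the authors' estimator), the code below applies randomized response to binary labels and shows how an adversary who knows the conditional class probability p(y=1|x) can compute a posterior over the true label; all numbers and names are hypothetical.

```python
import numpy as np

def flip_prob(eps):
    """Randomized response on binary labels satisfies eps-label-DP when the
    true label is kept with probability e^eps / (e^eps + 1)."""
    return 1.0 / (np.exp(eps) + 1.0)

def posterior_true_label(noisy_label, p_y1_given_x, eps):
    """Adversary's posterior P(y = 1 | noisy label, x), using the public
    features only through the conditional class probability p(y=1|x)."""
    rho = flip_prob(eps)
    like_y1 = (1 - rho) if noisy_label == 1 else rho  # P(noisy label | y=1)
    like_y0 = rho if noisy_label == 1 else (1 - rho)  # P(noisy label | y=0)
    num = like_y1 * p_y1_given_x
    return num / (num + like_y0 * (1 - p_y1_given_x))

# If the features are very predictive (p(y=1|x) = 0.95), even a noisy label
# barely hides the truth: the posterior stays close to 0 or 1.
for eps in [0.5, 1.0, 2.0]:
    print(eps, round(posterior_true_label(1, 0.95, eps), 3))
```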
12:45-13:00 |
Simple Baselines Are Strong Performers for Differentially Private Natural Language Processing
(contributed talk)
Xuechen Li, Florian Tramer, Percy Liang, Tatsunori Hashimoto |
Differentially private learning has seen limited success for deep learning models of text,
resulting in a perception that differential privacy may be incompatible with the language
model fine-tuning paradigm. We demonstrate that this perception is inaccurate and that with
the right setup, high performing private models can be learned on moderately-sized corpora by
directly fine-tuning with differentially private optimization. Our work highlights the important
role of hyperparameters, task formulations, and pretrained models. Our analyses also show that
the low performance of naive differentially private baselines in prior work is attributable
to suboptimal choices in these factors. Empirical results reveal that differentially private
optimization does not suffer from dimension-dependent performance degradation with pretrained
models and achieves performance on par with state-of-the-art private training procedures and
strong non-private baselines.
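The paper's exact fine-tuning setup is not reproduced here; as background, the sketch below shows a single differentially private optimization step (per-example gradient clipping plus Gaussian noise, the standard DP-SGD recipe) on a toy logistic regression in NumPy. All names and hyperparameters are illustrative.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD step for logistic regression on a minibatch.

    Per-example gradients are clipped to clip_norm and Gaussian noise of
    scale noise_multiplier * clip_norm is added before averaging, the
    standard recipe behind differentially private optimization.
    """
    rng = rng or np.random.default_rng(0)
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    per_example_grads = (preds - y)[:, None] * X           # shape (B, d)

    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)

    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    grad = (clipped.sum(axis=0) + noise) / len(X)
    return w - lr * grad

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
for _ in range(100):
    w = dp_sgd_step(w, X, y, rng=rng)
```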
13:00-13:15 |
Canonical Noise Distributions and Private Hypothesis Tests
(contributed talk)
Jordan Awan, Salil Vadhan |
In the setting of f-DP, we propose the concept of a canonical noise distribution (CND), which
captures whether an additive privacy mechanism is tailored for a given tradeoff function f, and give a construction
of a CND for an arbitrary tradeoff function f. We show that private hypothesis tests are intimately
related to CNDs, allowing for the release of private p-values at no additional privacy cost as
well as the construction of uniformly most powerful (UMP) tests for binary data. We apply our
techniques to difference of proportions testing.
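The CND construction itself is not reproduced here; as background on the f-DP framework the abstract assumes, the sketch below evaluates two textbook tradeoff functions: the one characterizing pure ε-DP and the Gaussian DP tradeoff G_mu. Function names are illustrative, and scipy is assumed to be available.

```python
import numpy as np
from scipy.stats import norm

def tradeoff_pure_dp(alpha, eps):
    """Tradeoff function f_eps(alpha) = max(0, 1 - e^eps * alpha,
    e^(-eps) * (1 - alpha)) characterizing eps-DP in the f-DP framework."""
    return np.maximum(0.0, np.maximum(1 - np.exp(eps) * alpha,
                                      np.exp(-eps) * (1 - alpha)))

def tradeoff_gaussian_dp(alpha, mu):
    """G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu), the tradeoff function of
    mu-Gaussian DP; additive N(0, (sensitivity/mu)^2) noise achieves it."""
    return norm.cdf(norm.ppf(1 - alpha) - mu)

alpha = np.linspace(0.0, 1.0, 6)
print(tradeoff_pure_dp(alpha, eps=1.0))
print(tradeoff_gaussian_dp(alpha, mu=1.0))
```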
13:15-13:45 | Q&A for four contributed talks |
13:45-14:30 | Panel |
14:30-14:40 | Closing |