Facebook Releases Open Source Differential Privacy Library
Facebook recently introduced Opacus, a free, open source library for training deep learning models with differential privacy. The tool is designed for simplicity, flexibility and speed. It offers a simple, user-friendly API that lets machine learning practitioners make a training pipeline private by adding as few as two lines to their code.
Check out the Opacus source code here.
Introducing Opacus
Over the years, differential privacy has become the leading notion of privacy for statistical analysis. It allows complex computations to be performed on large data sets while limiting what can be learned about any individual data point.
Differentially Private Stochastic Gradient Descent (DP-SGD), a modification of SGD, guarantees differential privacy with each update of the model parameters. Instead of calculating the average gradient over a batch of samples, a DP-SGD implementation calculates the gradient of each sample, clips its norm, aggregates the clipped gradients into the batch gradient, and adds Gaussian noise.
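The update described above can be sketched in a few lines of plain Python. This is a minimal illustration and not Opacus code: the function names `clip_to_norm` and `dp_sgd_step` and the parameters `max_norm` and `noise_multiplier` are assumptions chosen for this example, and the per-sample gradients are taken as given.

```python
import math
import random


def clip_to_norm(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dp_sgd_step(params, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each per-sample gradient, sum, add noise, average."""
    batch_size = len(per_sample_grads)
    clipped = [clip_to_norm(g, max_norm) for g in per_sample_grads]
    # Sum the clipped gradients coordinate by coordinate.
    summed = [sum(column) for column in zip(*clipped)]
    # Gaussian noise with standard deviation noise_multiplier * max_norm.
    std = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    # Average over the batch and take a plain gradient step.
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]
```

Because every clipped gradient has norm at most `max_norm`, no single sample can move the sum by more than that amount, which is what lets the noise scale be chosen independently of the data.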
The image below illustrates the DP-SGD algorithm: the single-colored lines represent the per-sample gradients, the width of each line shows its norm, and the multicolored lines show the aggregated gradients.
However, deep learning frameworks such as TensorFlow or PyTorch do not expose intermediate calculations, including per-sample gradients, mainly for efficiency reasons. As a result, users only have access to the gradients averaged over a batch. A simple way to implement DP-SGD is therefore to split each batch into micro-batches of size one, calculate the gradient on each micro-batch, clip, and add noise.
Here is a code snippet that generates per-sample gradients via micro-batching:
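A minimal sketch of micro-batching, assuming NumPy in place of a deep learning framework. The hypothetical `batch_gradient` function stands in for a framework's backward pass, which only returns the gradient averaged over whatever batch it is given; the linear model and squared-error loss are chosen purely for illustration.

```python
import numpy as np


def batch_gradient(w, X, y):
    """Gradient of mean squared error for a linear model, averaged over the batch.

    Stands in for a framework's backward pass, which only exposes this average.
    """
    residual = X @ w - y                    # shape (batch,)
    return 2.0 * (X.T @ residual) / len(y)  # shape (features,)


def per_sample_grads_microbatch(w, X, y):
    """Recover per-sample gradients by running one backward pass per sample."""
    grads = []
    for i in range(len(y)):
        # A micro-batch of size one: its "average" gradient is the sample's own.
        grads.append(batch_gradient(w, X[i:i + 1], y[i:i + 1]))
    return np.stack(grads)                  # shape (batch, features)
```

The loop makes one call per sample, so a batch of 256 costs 256 separate backward passes instead of one.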
Although micro-batching gives correct per-sample gradients, it can be very slow in practice because it underutilizes hardware accelerators such as GPUs and TPUs, which are optimized for batched, data-parallel computations. This is where Opacus comes in: instead of micro-batching, it implements a vectorized computation that improves performance.
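To give a sense of what a vectorized per-sample gradient computation can look like, here is a sketch for a single linear layer, assuming NumPy. It illustrates the general idea only and is not taken from the Opacus source; the function name and argument names are hypothetical.

```python
import numpy as np


def per_sample_grads_linear(activations, backprops):
    """Per-sample weight gradients of a linear layer y = x @ W.T, all at once.

    activations: layer inputs, shape (batch, in_features)
    backprops:   gradient of each sample's own loss with respect to the layer
                 output, shape (batch, out_features)
    Returns an array of shape (batch, out_features, in_features).
    """
    # For each sample the weight gradient is the outer product of its
    # backpropagated signal and its input; einsum computes all of them
    # in a single batched operation instead of a Python loop.
    return np.einsum("bo,bi->boi", backprops, activations)
```

Summing the result over the batch axis gives the usual batch gradient, so the per-sample values are a finer-grained view of a quantity the backward pass already computes.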
Here are some of the key design principles and features of Opacus:
- Simplicity: Opacus offers a compact and easy-to-use API for researchers and engineers. In other words, users don’t need to know the details of DP-SGD to train their ML models with differential privacy.
- Flexibility: Opacus supports rapid prototyping by users familiar with PyTorch and Python.
- Speed: Opacus seeks to minimize the performance overhead of DP-SGD by supporting vectorized computation.
In addition to this, other key features include privacy accounting, model validation, Poisson sampling, vectorized computation, virtual steps, custom layers, and secure random number generation.
Privacy accounting: Opacus provides out-of-the-box privacy tracking with an accountant based on Rényi Differential Privacy. It keeps track of the privacy budget spent at any point in time, which enables early stopping and real-time monitoring.
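The arithmetic behind a Rényi-based accountant can be sketched as follows. This is a deliberately simplified illustration: it uses the standard Rényi bound for the plain Gaussian mechanism and ignores the privacy amplification from sampling that a real accountant exploits, so the numbers it produces are illustrative and more pessimistic than what a production accountant would report. All function names are hypothetical.

```python
import math


def gaussian_rdp(alpha, noise_multiplier, steps):
    """Renyi DP of order alpha after `steps` Gaussian-mechanism releases.

    Assumes sensitivity 1 relative to the clipping norm and no subsampling,
    so each step costs alpha / (2 * noise_multiplier**2) and costs add up.
    """
    return steps * alpha / (2.0 * noise_multiplier ** 2)


def rdp_to_epsilon(noise_multiplier, steps, delta, alphas):
    """Convert accumulated RDP into an (epsilon, delta) bound.

    Every order alpha > 1 gives a valid bound; the smallest one is reported.
    """
    return min(
        gaussian_rdp(a, noise_multiplier, steps) + math.log(1.0 / delta) / (a - 1.0)
        for a in alphas
    )


def max_steps_within_budget(noise_multiplier, delta, target_epsilon, alphas, limit=10_000):
    """Early stopping: count how many steps fit before epsilon exceeds the target."""
    steps = 0
    while (steps < limit and
           rdp_to_epsilon(noise_multiplier, steps + 1, delta, alphas) <= target_epsilon):
        steps += 1
    return steps
```

Calling `rdp_to_epsilon` after every optimizer step gives the real-time monitoring described above, and `max_steps_within_budget` shows how the same quantity can drive an early-stopping rule.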
Model validation: Before training a model, Opacus validates that the model is compatible with DP-SGD.
Poisson sampling: Opacus supports uniform sampling of batches (also known as Poisson sampling). This means that each data point is independently added to the batch with a probability equal to the sampling rate, so the batch size varies from step to step.
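A minimal sketch of this sampling scheme in plain Python, with a hypothetical `poisson_sample` helper that returns the indices selected for one batch:

```python
import random


def poisson_sample(dataset_size, sample_rate, rng):
    """Return the indices of one batch; each index is included independently."""
    # Every data point gets its own coin flip with success probability
    # sample_rate, so the batch size is random with mean
    # dataset_size * sample_rate.
    return [i for i in range(dataset_size) if rng.random() < sample_rate]
```

Unlike shuffling the data and cutting it into fixed-size chunks, this can produce batches of different sizes, including an occasional empty one, which training code has to tolerate.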
Vectorized computation: Opacus makes efficient use of hardware accelerators such as GPUs and TPUs.
Virtual steps: To make the most of available memory, Opacus provides an option to decouple the physical batch size from the logical batch size.
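One way to picture this decoupling, assuming NumPy and hypothetical helper names: the clipped per-sample gradients of several small physical batches are accumulated, and noise is added only once, when the logical batch is complete. This is a sketch of the idea, not of how Opacus implements it.

```python
import numpy as np


def clipped_sum(per_sample_grads, max_norm):
    """Clip each row to L2 norm max_norm and sum the rows."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / (norms + 1e-12))
    return (per_sample_grads * scale).sum(axis=0)


def virtual_step(physical_batches, max_norm, noise_multiplier, rng):
    """Noisy average gradient for one logical batch split into physical batches."""
    total = 0.0
    count = 0
    for grads in physical_batches:  # each chunk only has to fit in memory alone
        total = total + clipped_sum(grads, max_norm)
        count += len(grads)
    # Noise is added once per logical batch, not once per physical batch.
    noise = rng.normal(0.0, noise_multiplier * max_norm, size=np.shape(total))
    return (total + noise) / count
```

With the noise switched off, splitting a logical batch into any number of physical batches yields the same average gradient as processing it in one piece.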
Custom layers: Opacus is flexible and supports a variety of layers, including convolutions, LSTMs, multi-head attention, normalization, and embedding layers. When using a custom PyTorch layer, users can provide a method that calculates per-sample gradients for that layer and register it with a simple decorator provided by Opacus.
Secure generation of random numbers: Opacus offers a cryptographically secure (but slower) pseudo-random number generator (CSPRNG) for security-critical code.
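The trade-off between a seeded generator and a cryptographically secure one can be illustrated with Python's standard library. This sketch is not related to the generator Opacus uses; `gaussian_noise` and its `secure` flag are hypothetical names for this example.

```python
import random


def gaussian_noise(size, std, secure=False, seed=None):
    """Draw `size` Gaussian samples from a seeded PRNG or the OS entropy source."""
    # random.SystemRandom reads from os.urandom: its output is not
    # reproducible and is slower to produce, but it cannot be predicted
    # from a seed. random.Random(seed) is fast and repeatable.
    rng = random.SystemRandom() if secure else random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(size)]
```

A seeded generator is convenient for debugging and reproducible experiments, while the secure variant is the safer choice when an attacker who guesses the seed could subtract the noise.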
Other Differential Privacy Learning Libraries
Besides Opacus, other differential privacy learning libraries include PyVacy and TensorFlow Privacy, which provide implementations of DP-SGD for PyTorch and TensorFlow respectively. Another framework, BackPACK, can be used for DP-SGD and leverages Jacobians for efficiency. It currently supports only fully connected or convolutional layers and several activation layers; recurrent and residual layers are not yet supported.
As a PyTorch library for training deep learning models with differential privacy, Opacus is designed for simplicity, flexibility and speed. It is currently maintained as an open source project, supported by Facebook's privacy-preserving machine learning team. Going forward, the team is looking to add several extensions and upgrades, including more flexibility for custom components, efficiency improvements, and better integration with the PyTorch community through projects such as PyTorch Lightning.