Daniele Romanini et al. PyVertical
The paper introduces PyVertical, a framework for vertical federated learning using split neural networks (SplitNNs). The framework supports training neural networks on data that is vertically partitioned across multiple owners, while raw data remains on each owner's device. Vertical partitioning means that different entities hold different features of the same data subjects. Private Set Intersection (PSI) identifies and links the data points common to these datasets without revealing the rest. To validate PyVertical, the authors trained a split neural network on the MNIST dataset, with the data samples split across two data owners and a third party (the data scientist) overseeing the training.
Introduction
- The paper highlights the challenge of using data held in isolated silos for machine learning, particularly when that data is sensitive or legally protected.
- The paper introduces Vertical Federated Learning (VFL), which differs from the more common horizontal federated learning. Under horizontal partitioning, each owner holds different samples described by the same features; under vertical partitioning, each owner holds different features of the same samples. The example given is a patient's medical data being held by different medical institutions.
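The difference between the two partitioning schemes can be made concrete with a small sketch. The patient records, feature names, and helper functions below are invented for illustration and do not come from the paper.

```python
# Toy dataset: one row per patient, one column per feature.
# All identifiers and values are made up for illustration.
records = {
    "p1": {"age": 34, "blood_pressure": 120, "glucose": 5.1},
    "p2": {"age": 58, "blood_pressure": 140, "glucose": 6.3},
    "p3": {"age": 41, "blood_pressure": 130, "glucose": 5.6},
    "p4": {"age": 67, "blood_pressure": 150, "glucose": 7.0},
}


def horizontal_split(data, ids_a):
    """Each owner holds different samples, but every feature of them."""
    owner_a = {k: v for k, v in data.items() if k in ids_a}
    owner_b = {k: v for k, v in data.items() if k not in ids_a}
    return owner_a, owner_b


def vertical_split(data, features_a):
    """Each owner holds the same samples, but different features of them."""
    owner_a = {k: {f: x for f, x in v.items() if f in features_a}
               for k, v in data.items()}
    owner_b = {k: {f: x for f, x in v.items() if f not in features_a}
               for k, v in data.items()}
    return owner_a, owner_b


h_a, h_b = horizontal_split(records, {"p1", "p2"})
v_a, v_b = vertical_split(records, {"age"})
```

In the vertical case both owners know about every patient, yet neither sees the other's columns, which is why the records must first be linked by a shared identifier.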
Contributions
- The authors build on previous work to combine Split Neural Networks (SplitNNs) and PSI for Vertical Federated Learning.
- The paper introduces PyVertical, a novel open-source framework for training on vertically partitioned datasets.
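The following is a minimal sketch of how a split network with two data owners and a data scientist could be trained, written in plain Python rather than with PyVertical itself. It assumes that each owner holds one scalar feature per sample, that the data scientist holds the labels, and that every segment is a single linear weight; the class names `OwnerSegment` and `ScientistSegment` are hypothetical. The point it illustrates is that only intermediate activations and their gradients cross the boundary, never the raw features.

```python
class OwnerSegment:
    """Model segment that lives with a data owner, next to its raw features."""

    def __init__(self, features, weight=0.5):
        self.features = features  # raw data: stays inside this object
        self.weight = weight

    def forward(self, i):
        # Only this intermediate activation is sent to the data scientist.
        return self.weight * self.features[i]

    def backward(self, i, grad_activation, lr):
        # Chain rule: d(loss)/d(weight) = d(loss)/d(activation) * feature.
        self.weight -= lr * grad_activation * self.features[i]


class ScientistSegment:
    """Final segment, assumed to be held together with the labels."""

    def __init__(self, labels, n_owners, weight=0.5):
        self.labels = labels
        self.weights = [weight] * n_owners

    def step(self, i, activations, lr):
        pred = sum(w * a for w, a in zip(self.weights, activations))
        error = pred - self.labels[i]
        # Gradients w.r.t. the incoming activations are returned to the owners.
        grads = [error * w for w in self.weights]
        # Update the scientist's own weights after computing those gradients.
        self.weights = [w - lr * error * a
                        for w, a in zip(self.weights, activations)]
        return 0.5 * error ** 2, grads


# Made-up data: the label depends on features held by two different owners.
x1 = [0.1, 0.4, 0.8, 0.3, 1.0, 0.6]
x2 = [0.9, 0.2, 0.5, 0.7, 0.1, 1.0]
labels = [2 * a + 3 * b for a, b in zip(x1, x2)]

owners = [OwnerSegment(x1), OwnerSegment(x2)]
scientist = ScientistSegment(labels, n_owners=2)

epoch_losses = []
for epoch in range(200):
    total = 0.0
    for i in range(len(labels)):
        activations = [o.forward(i) for o in owners]
        loss, grads = scientist.step(i, activations, lr=0.1)
        for o, g in zip(owners, grads):
            o.backward(i, g, lr=0.1)
        total += loss
    epoch_losses.append(total / len(labels))
```

The sketch presumes that index `i` refers to the same data subject for both owners and the data scientist. Establishing that alignment without exposing non-shared records is the role PSI plays in the framework.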
Background and Related Work
PSI (Private Set Intersection)
PSI is a cryptographic technique that lets two parties compute the elements common to their datasets without revealing anything about the elements they do not share.
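One classic way to build PSI is with commutative blinding in the style of Diffie-Hellman: each party raises hashed items to a secret exponent, and items match only after both exponents have been applied. The sketch below is a toy illustration of that idea and is not necessarily the protocol PyVertical uses. The modulus `P`, the helper `hash_to_group`, and the function `psi_intersection` are assumptions chosen for readability, both parties are simulated in one process, and the parameters are not secure enough for real use.

```python
import hashlib
import secrets

# Toy modulus (a Mersenne prime). Real protocols use vetted groups,
# typically elliptic curves; this choice is for illustration only.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an identifier to an integer modulo P."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % P


def psi_intersection(set_a, set_b):
    """Return the items of set_a that also appear in set_b."""
    key_a = secrets.randbelow(P - 3) + 2  # A's secret exponent
    key_b = secrets.randbelow(P - 3) + 2  # B's secret exponent

    items_a = sorted(set_a)
    # A -> B: A's items, blinded once with A's key.
    a_once = [pow(hash_to_group(x), key_a, P) for x in items_a]
    # B -> A: A's items blinded a second time (order preserved),
    # plus B's own items blinded once with B's key.
    a_twice = [pow(v, key_b, P) for v in a_once]
    b_once = [pow(hash_to_group(y), key_b, P) for y in set_b]
    # A applies its key to B's items; equal items now have equal values
    # because (h^a)^b == (h^b)^a modulo P.
    b_twice = {pow(v, key_a, P) for v in b_once}
    return {x for x, v in zip(items_a, a_twice) if v in b_twice}


ids_owner_1 = {"alice", "bob", "carol"}
ids_owner_2 = {"bob", "carol", "dave"}
shared = psi_intersection(ids_owner_1, ids_owner_2)
```

In a vertical federated setting, the resulting shared identifiers would be used to select and order the rows that each owner contributes to training.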