Vertical Federated Learning
Vertical Federated Learning, or feature-based federated learning, is a type of Federated Learning where multiple parties collaborate to train a shared machine learning model without directly exchanging their raw data. Unlike Horizontal Federated Learning, where different parties have different samples (or data points) but share the same feature set, in Vertical Federated Learning, various parties possess different subsets of features for the same collection of samples.
Example
Consider two companies, a bank and a retail store, that want to collaborate on a machine learning model to predict customer spending behavior. The bank has financial features like income, credit score, and loan history, while the retail store has behavioral features like purchase history, product preferences, and online engagement metrics. Both companies have data on the same set of customers (i.e., the same sample set) but have collected different kinds of information (i.e., different feature sets).
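The partition described in this example can be pictured with a small sketch. The customer IDs, feature names, and values below are hypothetical and exist only to illustrate the layout: both tables are keyed by the same customers, yet they share no feature columns.

```python
# Hypothetical toy records; every ID and value is made up for illustration.
bank_features = {
    "c001": {"income": 52000, "credit_score": 710},
    "c002": {"income": 88000, "credit_score": 640},
    "c003": {"income": 41000, "credit_score": 755},
}
retail_features = {
    "c001": {"monthly_purchases": 4, "site_visits": 12},
    "c002": {"monthly_purchases": 9, "site_visits": 30},
    "c003": {"monthly_purchases": 1, "site_visits": 3},
}


def feature_names(table):
    """Return the set of feature names that appear in a party's table."""
    return {name for row in table.values() for name in row}


# Vertical partitioning: same sample set, disjoint feature sets.
same_samples = set(bank_features) == set(retail_features)
shared_feature_names = feature_names(bank_features) & feature_names(retail_features)

print("same customers:", same_samples)
print("features held by both parties:", shared_feature_names)
```

In a horizontal setting the picture would be reversed: the two tables would have the same columns but different customer IDs.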
How it Works
- Initialization. A global model is initialized, either on a central server or by one of the parties acting as coordinator.
- Local Computation. Each party computes local model updates using only its own features and its share of the model parameters. Because the parties hold different features for the same samples, these local computations can be aligned record by record to the same individuals.
- Secure Aggregation. The local updates from each party are securely aggregated into an updated global model, often using cryptographic techniques such as Secure Multi-Party Computation (SMPC) or Homomorphic Encryption.
- Iteration. The local computation and secure aggregation steps are repeated until the model converges or another stopping criterion is met.
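The loop above can be sketched for one simple case: a logistic regression whose weights are split across the parties. This is a minimal illustration under stated assumptions, not a reference implementation. The `Party` class, the `train` function, the toy data, and the learning rate are all hypothetical, and the aggregation step is done in plaintext here, whereas a real deployment would protect it with SMPC or Homomorphic Encryption.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Party:
    """Holds one vertical slice of the features and the matching weights."""

    def __init__(self, features):
        self.features = features  # rows are aligned by sample index
        self.weights = [0.0] * len(features[0])

    def partial_scores(self):
        # Local computation: this party's contribution to each sample's logit.
        return [sum(w * x for w, x in zip(self.weights, row)) for row in self.features]

    def apply_update(self, residuals, lr):
        # Gradient of the logistic loss with respect to this party's weights only.
        n = len(self.features)
        for j in range(len(self.weights)):
            grad = sum(r * row[j] for r, row in zip(residuals, self.features)) / n
            self.weights[j] -= lr * grad


def train(parties, labels, rounds=200, lr=0.5):
    """Coordinator role. ASSUMPTION: plaintext aggregation, no cryptography."""
    eps = 1e-12  # guards the logarithms below
    losses = []
    for _ in range(rounds):
        partials = [p.partial_scores() for p in parties]
        logits = [sum(vals) for vals in zip(*partials)]  # aggregation step
        probs = [sigmoid(z) for z in logits]
        residuals = [p - y for p, y in zip(probs, labels)]
        losses.append(-sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                           for p, y in zip(probs, labels)) / len(labels))
        for party in parties:
            party.apply_update(residuals, lr)
    return losses


# Hypothetical, pre-scaled toy data for four aligned customers.
bank = Party([[0.2, 0.9], [0.8, 0.4], [0.5, 0.7], [0.9, 0.2]])
retail = Party([[0.1], [0.9], [0.3], [0.8]])
labels = [0, 1, 0, 1]

losses = train([bank, retail], labels)
print("first loss:", losses[0], "last loss:", losses[-1])
```

Only per-sample partial scores and residuals cross party boundaries in this sketch; the feature rows themselves stay inside each `Party`. Those exchanged values can still leak information, which is why the aggregation step is normally protected.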
Key Advantages
- Data Privacy. Raw data never leaves each party's premises; only model updates or intermediate results are exchanged, which limits the exposure of sensitive records.
- Feature Utilization. Allows for more comprehensive models that utilize features from multiple parties, leading to potentially more accurate and insightful results.
- Reduced Dimensionality. Each party processes only its own set of features, which reduces the per-party computational burden.
Challenges
- Complexity. Secure aggregation methods like SMPC can be computationally expensive and complicated to implement.
- Alignment. All parties must have data on the same set of samples, which might not always be feasible or straightforward to achieve.
- Communication Overhead. Exchanging model updates can be bandwidth-intensive, especially when secure cryptographic methods are employed.
- Trust and Governance. A secure and mutually agreed-upon protocol is needed to ensure that no party cheats or gains an unfair advantage.
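To make the alignment challenge concrete, the sketch below matches records by comparing salted hashes of customer IDs instead of the IDs themselves. It is a deliberately naive stand-in: salted hashing of guessable IDs is not a private set intersection protocol and can be attacked by dictionary lookup. The helper names, the salt, and the IDs are hypothetical.

```python
import hashlib


def blind_ids(ids, shared_salt):
    """Map salted SHA-256 digest -> original ID for one party's records."""
    return {
        hashlib.sha256((shared_salt + i).encode("utf-8")).hexdigest(): i
        for i in ids
    }


def align(own_ids, other_digests, shared_salt):
    """Return this party's IDs whose digests also appear in the other party's set."""
    own = blind_ids(own_ids, shared_salt)
    return sorted(own[d] for d in own.keys() & other_digests)


SALT = "demo-salt"  # hypothetical value agreed on by both parties out of band
bank_ids = ["c001", "c002", "c003", "c005"]
retail_ids = ["c002", "c003", "c004", "c005"]

# Each party publishes only digests, never the raw IDs.
bank_digests = set(blind_ids(bank_ids, SALT))
retail_digests = set(blind_ids(retail_ids, SALT))

bank_aligned = align(bank_ids, retail_digests, SALT)
retail_aligned = align(retail_ids, bank_digests, SALT)
print(bank_aligned, retail_aligned)
```

Both parties arrive at the same sorted list of common customers, which gives them a shared row order for training. Customers held by only one party are dropped from the training set.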
Applications
Vertical Federated Learning is especially useful in sectors where entities hold different data on the same individuals or items. Examples include collaborations between healthcare providers and research institutions, banks and retail companies, or telecom companies and content providers.
By leveraging Vertical Federated Learning, organizations can create more comprehensive models than would be possible using only their own data, all while keeping raw data private. This enables them to extract richer insights and create more value from their collaborative efforts.