Qiang Yang et al. Federated Machine Learning: Concept and Applications

Intro

The authors aim to extend the concept of federated learning to cover all privacy-preserving decentralized collaborative machine learning techniques.
simple definition
- $N$ parties federate their data without exposing it to each other, and the resulting model should perform close to a model trained as if all the data had been gathered in one place.
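"Close to" can be made precise. A sketch, with $V_{FED}$, $V_{SUM}$ and $\delta$ as notation assumed here: let $V_{FED}$ be the performance of the federated model and $V_{SUM}$ the performance of a model trained on the pooled data. For a non-negative real number $\delta$, the requirement is

$$
\left| V_{FED} - V_{SUM} \right| < \delta ,
$$

in which case the federated learning algorithm is said to have $\delta$-accuracy loss.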
Secure Multi-Party Computation. Parties jointly compute a function over their inputs while keeping those inputs private.
Differential Privacy. Adds random noise to the data, so that any individual's data is hard to identify from the aggregate results.
Homomorphic Encryption. Allows computation on encrypted data without decrypting it.
Indirect Information Leakage
- Blockchained FL architectures have also been considered in this context.
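Of the privacy techniques listed above, differential privacy is the easiest to sketch in a few lines. The example below is a minimal illustration, not a method taken from the paper: it uses the Laplace mechanism on an aggregate (a mean), which is one common variant of "adding noise". The function names, the clipping bounds, and the value of `epsilon` are all assumptions made for the example.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Mean of `values` with Laplace noise calibrated to epsilon (hypothetical helper)."""
    # Clip so that one individual can shift the mean by at most (upper - lower) / n.
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
ages = [23, 35, 41, 58, 62]
noisy = private_mean(ages, lower=0, upper=100, epsilon=0.5, rng=rng)
print(noisy)  # a noisy estimate of the true mean 43.8
```

A smaller `epsilon` means more noise and stronger privacy; a larger one brings the result closer to the true mean.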

Feature
Horizontal Federated Learning. sample-based FL
- Deep Gradient Compression reduces the communication cost of exchanging gradients.
- Assumes honest participants and security against an honest-but-curious server.
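A minimal sketch of sample-based training, assuming federated averaging as the aggregation rule (these notes do not name one). Each client holds different samples of the same feature space, trains locally, and sends only its model parameter to the server. The one-parameter model `y = w * x`, the learning rate, and the data are made up for illustration.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """A few gradient steps on one client's private (x, y) samples."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server side: average client models, weighted by sample count."""
    total = sum(client_sizes)
    return sum(w * s for w, s in zip(client_weights, client_sizes)) / total

# Two clients with the same feature space but different samples (true w is 3).
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]

w_global = 0.0
for _ in range(50):
    # Raw data never leaves a client; only the updated weight is sent.
    local = [local_update(w_global, data) for data in clients]
    w_global = federated_average(local, [len(data) for data in clients])

print(round(w_global, 4))
```

In this sketch the server sees plaintext weights, so it only covers the "honest participants" assumption; protecting against a curious server would need secure aggregation or encryption on top.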
Vertical Federated Learning. feature-based FL
- assumes honest-but-curious participants
Federated Transfer Learning
- Security measures are similar to VFL

Training in the vertical setting needs an intermediary collaborator C between data owners A and B.
Step 1: Collaborator C creates an encryption key pair and sends the public key to A and B.
Step 2: A and B encrypt and exchange the intermediate results needed for the gradient and loss calculations.
Step 3: A and B each compute their encrypted gradients and add a random mask; B also computes the encrypted loss. A and B send the encrypted values to C.
Step 4: C decrypts and sends the decrypted gradients and loss back to A and B; A and B unmask the gradients and update the model parameters accordingly.
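The four steps can be traced end to end on toy data. The sketch below is an illustration under several assumptions, not the paper's implementation: it uses a toy Paillier cryptosystem (additively homomorphic) with insecurely small keys, a linear model with one integer feature per party, an unscaled squared-error gradient `sum(d_i * x_i)`, and no regularization. All function and variable names are hypothetical. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import secrets

P, Q = 2**31 - 1, 2**61 - 1  # two Mersenne primes; far too small for real security

def keygen():
    """Return (public modulus n, secret key) for a toy Paillier scheme."""
    n = P * Q
    lam = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
    return n, (lam, pow(lam, -1, n))

def encrypt(n, m):
    """Encrypt integer m (negative values are stored modulo n)."""
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (1 + (m % n) * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, sk, c):
    lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # map back to a signed integer

def add(n, c1, c2):
    """Ciphertext of the sum of the two plaintexts."""
    return c1 * c2 % (n * n)

def mul_plain(n, c, k):
    """Ciphertext of the plaintext multiplied by the known integer k."""
    return pow(c, k % n, n * n)

def masked_gradient(n, enc_d, features):
    """Encrypted sum_i d_i * x_i plus a fresh random mask."""
    mask = secrets.randbelow(10**6)
    acc = encrypt(n, mask)
    for c, x in zip(enc_d, features):
        acc = add(n, acc, mul_plain(n, c, x))
    return acc, mask

# Toy aligned data: A holds feature xa; B holds feature xb and the labels y.
xa, xb, y = [1, 2, 3], [2, 0, 1], [5, 4, 7]
theta_a, theta_b = 1, 1

# Step 1: C creates the key pair and publishes n.
n, sk = keygen()

# Step 2: A and B encrypt and exchange intermediate results.
ua = [theta_a * x for x in xa]
vb = [theta_b * x - t for x, t in zip(xb, y)]   # B's prediction minus label
enc_ua = [encrypt(n, u) for u in ua]              # A -> B
enc_la = encrypt(n, sum(u * u for u in ua))       # A -> B
enc_vb = [encrypt(n, v) for v in vb]              # B -> A

# Step 3: encrypted residuals, masked gradients, and B's encrypted loss.
enc_d = [add(n, ca, cb) for ca, cb in zip(enc_ua, enc_vb)]
enc_ga, mask_a = masked_gradient(n, enc_d, xa)    # computed by A
enc_gb, mask_b = masked_gradient(n, enc_d, xb)    # computed by B
enc_loss = add(n, enc_la, encrypt(n, sum(v * v for v in vb)))
for ca, v in zip(enc_ua, vb):
    enc_loss = add(n, enc_loss, mul_plain(n, ca, 2 * v))

# Step 4: C decrypts; A and B remove their own masks.
grad_a = decrypt(n, sk, enc_ga) - mask_a
grad_b = decrypt(n, sk, enc_gb) - mask_b
loss = decrypt(n, sk, enc_loss)
print(grad_a, grad_b, loss)  # prints -15 -7 17
```

The masks matter because C sees decrypted values: without them C would learn each party's exact gradient. The loss is assembled from the expansion of `(u_A + u_B - y)^2`, so B never needs A's predictions in the clear.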

Privacy-preserving machine learning is designed to perform learning while keeping data private.

Federated learning: A decentralized collaborative machine learning method.
Secure multi-party computation (SMC): Provides privacy guarantees.
Secure multi-party decision trees, k-means, Naive Bayes classifier, etc.: Algorithms for various machine learning tasks with privacy preservation.
Homomorphic encryption: Allows computation on encrypted data without decryption, ensuring privacy.
Yao's garbled circuits: A protocol for secure two-party computation.
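To make the SMC entry concrete, here is a minimal sketch of one simple building block, additive secret sharing, used to compute a sum without any party revealing its input. This is an illustrative choice, not a technique singled out in these notes; the modulus and function names are assumptions, and the sketch assumes parties follow the protocol and do not collude.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this prime (assumed choice)

def share(secret, n_parties):
    """Split an integer into random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def secure_sum(inputs):
    """Each party shares its input; only sums of shares are ever published."""
    n = len(inputs)
    # Row i holds the shares party i hands out; column j is what party j receives.
    dealt = [share(x, n) for x in inputs]
    partial = [sum(dealt[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial) % MODULUS

inputs = [12, 30, 7]
total = secure_sum(inputs)
print(total)  # prints 49
```

Any single share is a uniformly random number, so a party learns nothing about another party's input from the shares it receives; only the final total is revealed.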

Distributed ML focuses on distributed storage and operation.
It uses tools such as the parameter server to store data and allocate computation across distributed worker nodes efficiently.
Differences with Federated Learning:
- Data privacy emphasis in Federated Learning.
- Distributed ML centralizes control, while Federated Learning decentralizes control to data owners.

Federated learning serves as a protocol for edge computing.
To optimize learning, focus on determining the best trade-off between local updates and global aggregation.

Federated database systems integrate multiple databases.
Differences with Federated Learning:
- No privacy mechanisms in federated database interactions.
- Federated learning aims to create a unified model across different data owners with privacy.

Personalize services such as product recommendations.
Challenges: Data privacy, security, and heterogeneity across different entities (e.g., banks, social networks, e-shops).
Solution: Federated learning can train models without sharing raw data, thus overcoming privacy barriers.

Detect multi-party borrowing, which is a risk to the financial industry.
Federated learning can help find malicious borrowers without exposing user lists.

Challenges: Sensitive medical data scattered across isolated centers.
Solution: Federated learning combined with transfer learning can share model insights without sharing patient data.

The conventional big-data approach: aggregate the data, compute models in the cloud, then download and use the results.