
Qiang Yang et al., Federated Machine Learning: Concept and Applications

Intro

  • Data exists in isolated islands
  • Stronger data privacy and security are required

First federated learning framework proposed by Google in 2016

  • Horizontal FL
  • Vertical FL
  • Federated Transfer Learning

Traditional Data Processing Models

  • aka Simple Data Transaction Model
  • 3 parties
    • Data Collector
    • Data Sanitizer
    • ML Trainer
  • Privacy concerns

Overview

  • Aims to extend federated learning to all privacy-preserving decentralized collaborative machine learning techniques.
  • Simple definition
    • N parties federate their data without exposing it to each other, attaining performance closely comparable to that of a model trained as if all the data were gathered in one place.
  • Secure Multi-Party Computation. Parties jointly compute a function over their inputs while keeping those inputs private.
  • Differential Privacy. Adds random noise to the data so that any individual's data is hard to identify in the aggregate results.
  • Homomorphic Encryption. Allows computation on encrypted data without decrypting it first.
  • Indirect Information Leakage
    • Blockchained FL architectures have also been considered
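
Two of the techniques above can be sketched in a few lines. This is a minimal illustration rather than the paper's construction: additive secret sharing stands in for secure multi-party computation, and the Laplace mechanism on a count stands in for differential privacy. All names here (`share`, `secure_sum`, `private_count`, the modulus, the sample inputs) are assumptions made for the example.

```python
import random

MOD = 2**61 - 1  # large modulus for share arithmetic (an arbitrary choice)

def share(secret, n_parties):
    """Split an integer into n additive shares that sum to it modulo MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def secure_sum(secrets_in):
    """Each party shares its input; only per-party partial sums are pooled."""
    n = len(secrets_in)
    all_shares = [share(s, n) for s in secrets_in]
    # Party j adds up the j-th share it received from every party.
    partial = [sum(row[j] for row in all_shares) % MOD for j in range(n)]
    return sum(partial) % MOD

def laplace_noise(scale):
    """Laplace(0, scale) sample: the difference of two exponential draws."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_count(true_count, epsilon, sensitivity=1):
    """Laplace mechanism for a counting query (one common way to add noise)."""
    return true_count + laplace_noise(sensitivity / epsilon)

inputs = [52_000, 61_000, 48_000]        # one private value per party
total = secure_sum(inputs)               # the sum, without pooling raw inputs
noisy = private_count(120, epsilon=0.5)  # a count with calibrated noise
```

No single share reveals anything about its party's input, yet the partial sums still add up to the true total. The noisy count differs on every run; a smaller `epsilon` means more noise and stronger privacy.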

Categorization of Federated Learning

Architecture

Horizontal Federated Learning

  • Participants compute model updates (e.g., gradients) locally and send them to the server
  • Server aggregates the updates into a global model and distributes it to the participants
  • Participants update their local models
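
The loop above can be sketched as weighted model averaging. This is a simplified sketch under several assumptions: a one-parameter linear model, participants that send updated weights rather than gradients, and no encryption or secure aggregation of what is sent. The function names and the toy data are hypothetical.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """A few gradient-descent steps on one participant's private data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(local_weights, sizes):
    """Server step: average local weights, weighted by local dataset size."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_weights, sizes)) / total

def train(clients, rounds=20):
    w_global = 0.0
    for _ in range(rounds):
        # Participants start from the global model and train locally.
        local = [local_update(w_global, data) for data in clients]
        # Server aggregates and redistributes the new global model.
        w_global = federated_average(local, [len(d) for d in clients])
    return w_global

# Two participants whose data follow y = 3x; raw points never leave a client.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (3.0, 9.0)],
]
w = train(clients)
```

Only the scalar weight crosses the network in each round. In a real deployment the updates would also be protected, for example with the masking or encryption techniques listed in the overview.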

Vertical Federated Learning

  • We need an intermediary collaborator.
  • Step 1: Collaborator C creates an encryption key pair and sends the public key to A and B.
  • Step 2: A and B encrypt and exchange the intermediate results for gradient and loss calculations.
  • Step 3: A and B each compute encrypted gradients and add a random mask; B also calculates the encrypted loss; A and B send the encrypted values to C.
  • Step 4: C decrypts and sends the decrypted gradients and loss back to A and B; A and B unmask the gradients and update the model parameters accordingly.
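
The four steps can be traced with a toy additively homomorphic scheme. The sketch below uses a Paillier-style cryptosystem with small fixed primes, integer-only features and weights, and a plain squared-error linear model; these are all assumptions chosen to keep the example short, and the code is not secure or production-ready. Every function name and data value is hypothetical.

```python
import math
import random

P, Q = 2**31 - 1, 2**61 - 1  # two Mersenne primes; far too small for real use

def keygen():
    n, phi = P * Q, (P - 1) * (Q - 1)
    return n, (phi, pow(phi, -1, n))  # public key n, private key (phi, mu)

def encrypt(n, m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, priv, c):
    phi, mu = priv
    m = (pow(c, phi, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # map back to signed integers

def add(n, c1, c2):      # Enc(a) * Enc(b) decrypts to a + b
    return c1 * c2 % (n * n)

def mul_plain(n, c, k):  # Enc(a) ** k decrypts to a * k
    return pow(c, k % n, n * n)

x_a, w_a = [1, 2, 3], 2  # party A: one feature per sample, and its weight
x_b, w_b = [4, 0, 1], 1  # party B: one feature per sample, and its weight
y = [5, 6, 4]            # labels, held only by B

# Step 1: C creates the key pair; A and B receive only the public key n.
n, priv = keygen()

# Step 2: A and B exchange encrypted intermediate results.
u_a = [w_a * x for x in x_a]
u_b = [w_b * x - t for x, t in zip(x_b, y)]
enc_u_a = [encrypt(n, u) for u in u_a]          # A -> B
enc_sq_a = encrypt(n, sum(u * u for u in u_a))  # A -> B, needed for the loss
enc_u_b = [encrypt(n, u) for u in u_b]          # B -> A
enc_d = [add(n, ca, cb) for ca, cb in zip(enc_u_a, enc_u_b)]  # residuals

# Step 3: each party computes its encrypted gradient and adds a random mask.
def masked_gradient(xs, mask):
    g = encrypt(n, mask)
    for c, x in zip(enc_d, xs):
        g = add(n, g, mul_plain(n, c, x))
    return g

mask_a, mask_b = random.randrange(1, 10**6), random.randrange(1, 10**6)
enc_g_a = masked_gradient(x_a, mask_a)  # A -> C
enc_g_b = masked_gradient(x_b, mask_b)  # B -> C
# B's encrypted loss: sum(u_a^2) + sum(u_b^2) + 2 * sum(u_a * u_b).
enc_loss = add(n, enc_sq_a, encrypt(n, sum(u * u for u in u_b)))
for c, u in zip(enc_u_a, u_b):
    enc_loss = add(n, enc_loss, mul_plain(n, c, 2 * u))

# Step 4: C decrypts; A and B remove their own masks.
g_a = decrypt(n, priv, enc_g_a) - mask_a
g_b = decrypt(n, priv, enc_g_b) - mask_b
loss = decrypt(n, priv, enc_loss)
```

C only ever sees masked gradients, and A and B only see each other's ciphertexts. With the toy data the residuals are `[1, -2, 3]`, so the unmasked gradients match the plaintext sums of residual times feature for each party.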

Privacy-preserving Machine Learning

Privacy-preserving machine learning is designed to perform learning while keeping data private.

Techniques

Federated Learning vs. Other Concepts

Distributed Machine Learning

  • Distributed ML focuses on distributed storage and operation.
  • Uses tools such as a parameter server to store data and distribute computation efficiently.
  • Differences with Federated Learning:

Edge Computing

  • Federated learning serves as a protocol for edge computing.
  • To optimize learning, determine the best trade-off between local updates and global aggregation.

Federated Database Systems

  • These systems integrate multiple databases.
  • Differences with Federated Learning:
    • No privacy mechanisms in federated database interactions.
    • Federated learning aims to build a unified model across different data owners while preserving privacy.

Applications of Federated Learning

Smart Retail

  • Personalize services such as product recommendations.
  • Challenges: Data privacy, security, and heterogeneity across different entities (e.g., banks, social networks, e-shops).
  • Solution: Federated learning can train models without sharing raw data, thus overcoming privacy barriers.

Finance

  • Detect multi-party borrowing, which is a major risk to the industry.
  • Federated learning can help find malicious borrowers without exposing user lists.
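
One way to picture "finding shared borrowers without exposing user lists" is a private set intersection. The sketch below uses a Diffie-Hellman-style double blinding over a toy prime modulus. It is an illustrative assumption, not the mechanism the paper specifies, and the modulus, helper names, and user IDs are all hypothetical.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # Mersenne prime used as a toy group modulus

def hash_to_group(user_id):
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P

def new_key():
    """Secret exponent coprime to P - 1, so blinding is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def blind(values, key):
    return [pow(v, key, P) for v in values]

lender_a = ["alice", "bob", "carol", "dave"]
lender_b = ["erin", "bob", "dave", "frank"]
key_a, key_b = new_key(), new_key()

# A -> B: A's IDs, hashed and blinded with A's key.
a_once = blind([hash_to_group(u) for u in lender_a], key_a)
# B -> A: B's IDs blinded with B's key, plus A's values blinded again (same order).
b_once = blind([hash_to_group(u) for u in lender_b], key_b)
a_twice = blind(a_once, key_b)
# A blinds B's values with its own key and compares the double-blinded values.
b_twice = set(blind(b_once, key_a))
shared = [u for u, v in zip(lender_a, a_twice) if v in b_twice]
```

Because exponentiation commutes, an ID held by both lenders ends up with the same double-blinded value on both sides, while IDs held by only one lender stay hidden behind the other party's key.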

Smart Healthcare

  • Challenges: Sensitive medical data scattered across isolated centers.
  • Solution: Federated learning combined with transfer learning can share model insights without sharing patient data.

Federated Learning as a Business Model

Traditional Approach

Aggregate data, use cloud computing to compute models, then use results.

With Federated Learning

  • Data stays where it is; only model insights are shared.
  • Privacy and data security are prioritized.
  • Offers a new paradigm for big data applications.
  • Can use blockchain for profit allocation in a data alliance.
  • Calls for establishing standards for federated learning for faster adoption.