
Project Florence

Project Florence is a research project on vertical federated learning (VFL).


Can't we just ensemble them?

  1. Federated learning and split learning
    • Discuss split learning versus federated learning
    • Federated learning is expected to converge to a better optimum than an ensemble of independently trained models
    • Split learning involves training parts of the network at different sites
  2. Vertical partitioning of data
    • Vertical partitioning of features across different sites can lead to poor individual predictors
    • Training a model that combines the data in a more sophisticated way may perform better
    • Focus on implementations that do not require training parts of the network at a central node
  3. Next steps
    • Look into existing implementations of split learning and vertical partitioning
    • Focus on approaches using deep learning rather than classical models
    • Assume the record linkage problem is solved and focus on the training approach
  4. Action items
    • Search for relevant papers that meet the criteria
    • Filter out papers using classical models instead of neural networks
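
A toy illustration of the "can't we just ensemble them?" question, assuming two sites that each hold one binary feature and a label that is the XOR of the two. The data, the names (`RECORDS`, `fit_local`) and the XOR setup are hypothetical and chosen only to make the point from item 2 concrete: each site alone is a poor predictor, averaging the two does not help, and only a model that combines both features fits the labels. The `joint` line stands in for a jointly trained model; it is not a VFL implementation.

```python
from collections import defaultdict

# Hypothetical toy data: (feature at site A, feature at site B, label).
# The label is the XOR of the two features, so neither feature alone is informative.
RECORDS = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]


def fit_local(feature_index):
    """Best single-site predictor: the empirical P(label = 1 | local feature value)."""
    counts = defaultdict(lambda: [0, 0])  # feature value -> [positives, total]
    for record in RECORDS:
        value, label = record[feature_index], record[2]
        counts[value][0] += label
        counts[value][1] += 1
    return {value: pos / total for value, (pos, total) in counts.items()}


site_a = fit_local(0)  # trained only on site A's column
site_b = fit_local(1)  # trained only on site B's column

# Ensemble: average the two independently trained predictors.
ensemble = [(site_a[xa] + site_b[xb]) / 2 for xa, xb, _ in RECORDS]

# Joint model stand-in: a function that sees both features can represent XOR exactly.
joint = [xa ^ xb for xa, xb, _ in RECORDS]
labels = [y for _, _, y in RECORDS]

print("ensemble scores:", ensemble)
print("joint matches labels:", joint == labels)
```

In this construction every ensemble score is 0.5, i.e. no better than chance, which is the extreme case of the weak individual predictors described above. Real vertically partitioned data will usually sit somewhere between this case and the case where ensembling is good enough.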


  1. Vertical Federated Learning
    • The goal is to train a model using data from multiple sites without sharing the raw data.
    • Each site may hold different features (columns) for the same entities, with some features overlapping between sites.
    • The challenge is training parts of the network using the data available at each site.
  2. Record Linkage
    • Matching records across sites to identify which records represent the same entity.
    • Can be done using properties like name, address, phone number, and string similarity.
  3. Inference
    • Once the model is trained, inference is done globally using all available data for an entity, not just at one site.
  4. Potential Conferences
    • NeurIPS 2024, with its submission deadline in May 2024, is a good target conference. Venues with earlier deadlines may be too soon.
  5. Meeting Plans
    • Thursdays at 2 p.m. at ISI or remotely if needed.
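
The record linkage step from item 2 can be sketched with string similarity alone. This is a minimal sketch, not the project's method: the field names, the sample records, the greedy one-to-one matching and the 0.8 threshold are all assumptions made for illustration. It uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so formatting differences do not dominate."""
    return " ".join(text.lower().split())


def similarity(rec_a, rec_b, fields=("name", "address")):
    """Mean character-level similarity over the compared fields (0.0 to 1.0)."""
    scores = [
        SequenceMatcher(None, normalize(rec_a[f]), normalize(rec_b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def link_records(site_a, site_b, threshold=0.8):
    """Greedy one-to-one linkage: pair each site-A record with its best unused site-B match."""
    links = {}
    used = set()
    for id_a, rec_a in site_a.items():
        best_id, best_score = None, threshold  # assumed cutoff; tune on real data
        for id_b, rec_b in site_b.items():
            if id_b in used:
                continue
            score = similarity(rec_a, rec_b)
            if score >= best_score:
                best_id, best_score = id_b, score
        if best_id is not None:
            links[id_a] = best_id
            used.add(best_id)
    return links


# Hypothetical records held at two sites, keyed by each site's local ID.
site_a = {
    "a1": {"name": "John Smith", "address": "12 Main Street"},
    "a2": {"name": "Alice Wong", "address": "98 Harbor Road"},
}
site_b = {
    "b7": {"name": "Jon Smith", "address": "12 Main St"},
    "b9": {"name": "Robert Lee", "address": "401 Pine Avenue"},
}

links = link_records(site_a, site_b)
print(links)
```

With these sample records only `a1` and `b7` are linked; `a2` has no counterpart above the threshold and stays unmatched. The linked ID pairs are what a training step would use to align each entity's features across sites.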