Federated learning (also known as collaborative learning) is a machine learning method in which multiple parties (called clients) work together to train a model without combining their data in one central location. A key feature of federated learning is that data can be heterogeneous: because each client's data stays local, the data held by different clients may differ in size and distribution.
Federated learning focuses on challenges such as protecting data privacy, using only necessary data, and ensuring proper data access rights. It is used in many areas, including defense, telecommunications, the Internet of Things, and pharmaceuticals.
Definition
Federated learning is a method used to train a machine learning model, such as a deep neural network, on data stored across different devices or locations without directly sharing the data itself. Each device trains a model on its own data and then shares model parameters (such as weights and biases) with the others at regular intervals. These shared parameters are used to build a single, unified model that all devices can use.
The main difference between federated learning and distributed learning is how they handle the data on each device. Distributed learning assumes that the data on all devices is similar in type and size, and it focuses on using many computers to speed up the training process. Federated learning, however, is designed to work with data that varies greatly in type and size. It also deals with devices that may not always be reliable, such as smartphones or Internet of Things (IoT) devices, which often have weaker connections and limited power compared to powerful computers in data centers used in distributed learning.
The goal of federated learning is to combine data from all devices into one shared model. This involves two main tasks:
1. Minimizing an objective function that aggregates the losses computed on each device's data into a single global training goal.
2. Ensuring that all devices agree on the final model's parameters, so they all end up with the same model after training.
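The aggregation in the first task is often a sample-weighted average of the devices' local results. A minimal numerical sketch (all names and values here are illustrative, not from any specific library):

```python
import numpy as np

def aggregate(local_weights, sample_counts):
    """Sample-weighted average of per-device parameter vectors."""
    counts = np.asarray(sample_counts, dtype=float)
    stacked = np.stack(local_weights)              # (num_devices, num_params)
    return (counts[:, None] * stacked).sum(axis=0) / counts.sum()

# Three devices with different amounts of data and different local results.
local = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([2.0, 2.0])]
global_model = aggregate(local, sample_counts=[10, 30, 60])
# Devices holding more data pull the shared model toward their parameters.
```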
In a centralized federated learning setup, a central computer (called a server) manages the process. It selects which devices participate and collects updates from them to combine into the shared model. However, this setup can slow down the process because all updates must go through the central server.
In a decentralized federated learning setup, devices work together directly without a central server. This avoids problems if the server fails, but the way devices are connected (like in a network) can affect how well the model is trained. Examples of this include systems using blockchain technology.
In many real-world situations, not all devices are the same. Some devices might use the same model structure but have different types of data, which is called Personalized Federated Learning (PFL). Others may have completely different model structures, such as smartphones and IoT devices, which is called Heterogeneous Federated Learning. A challenge in PFL is making sure each device's model stays unique while still learning from other devices.
Most current federated learning methods assume all devices use the same model structure. A newer approach called HeteroFL has been developed to handle devices with very different computing power and communication abilities. HeteroFL allows devices to train models that vary in complexity and handle different types of data, while still creating a single, accurate model that works for all devices.
Main features
The process of federated learning repeats a series of steps between a central server and local devices, and each step is called a federated learning round. During each round, the central server sends the current global model to local devices. These devices then train the model using their own data to create possible model updates. Afterward, the server combines these updates into one global update and applies it to the global model.
In this method, a central server is used to combine model updates, while local devices train models based on the server’s instructions. Other methods can achieve the same results without a central server by using peer-to-peer communication, such as gossip or consensus techniques.
A federated learning round includes one cycle of the learning process. The process can be described as follows:
- Initialization: The server selects a type of machine learning model (such as linear regression, neural networks, or boosting) to train on local devices. The model is then set up, and devices wait for the server to assign tasks.
- Client Selection: Some local devices are chosen to begin training using their data. These devices receive the current model, while others wait for the next round.
- Configuration: The server instructs selected devices to train the model on their data in a specific way, such as using mini-batch updates for gradient descent.
- Reporting: Each selected device sends its trained model to the server. The server combines these models, sends the updated model back to the devices, and handles issues like disconnected devices or missing updates. The process then repeats by selecting new devices for the next round.
- Termination: When a stopping condition is met (such as reaching a set number of rounds or achieving a target model accuracy), the server combines the final updates and completes the global model.
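The round structure above can be sketched end to end. In this toy example (the function names, least-squares loss, and client data are illustrative assumptions), each round selects a subset of clients, trains locally, and averages the updates by sample count:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_train(w, data, lr=0.1, steps=5):
    """Configuration step: a device runs a few gradient-descent
    steps on its own least-squares data."""
    X, y = data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_round(global_w, client_data, fraction=0.5):
    """Client selection, local training, and reporting for one round."""
    k = max(1, int(fraction * len(client_data)))
    selected = rng.choice(len(client_data), size=k, replace=False)
    updates, counts = [], []
    for i in selected:
        updates.append(local_train(global_w.copy(), client_data[i]))
        counts.append(len(client_data[i][1]))
    counts = np.asarray(counts, dtype=float)
    # The server combines updates, weighting by each device's data size.
    return sum(c * u for c, u in zip(counts, updates)) / counts.sum()

# Initialization: three clients whose data follows the same line y = 2x.
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 1))
    clients.append((X, 2.0 * X[:, 0] + 0.01 * rng.normal(size=20)))

w = np.zeros(1)
for _ in range(30):        # termination: fixed number of rounds
    w = federated_round(w, clients)
```

After enough rounds the shared weight approaches the common slope of 2, even though no client ever shares its raw data.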
The process described above assumes that all model updates happen at the same time. Newer methods in federated learning address situations where updates happen at different times or when models change during training. Unlike synchronous methods, which wait until all layers of a neural network are computed before sharing updates, asynchronous methods share updates as soon as a layer’s computation is complete. These methods are called split learning and can be used during both training and testing, whether a central server is involved or not.
Most federated learning setups assume that data across local devices is independent and identically distributed (i.i.d.), but in practice this rarely holds. Training results can therefore vary with uneven data amounts or differing data distributions (in features or labels) across devices. To better characterize how non-i.i.d. data affects learning, a 2019 study by Peter Kairouz et al. describes the following categories:
- Covariate shift: Data on different devices may have different feature distributions. For example, in a handwriting-recognition task, the same digits or letters might be written in different styles.
- Prior probability shift: Labels on different devices may have different distributions. For instance, image datasets might vary by region or population group.
- Concept drift (same label, different features): Devices may share the same labels but use different features. For example, images of the same object might look different due to weather conditions.
- Concept shift (same features, different labels): Devices may share the same features but assign different labels. For example, the same text might be interpreted as having different sentiments in different contexts.
- Unbalanced data: Some devices may have much more data than others.
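For experiments, such non-i.i.d. conditions are often simulated by partitioning a dataset so that each client receives a skewed label mix. A sketch of one common approach, Dirichlet-based label splitting (the function name and parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def dirichlet_label_split(labels, num_clients, alpha=0.5):
    """Give each client a skewed mix of labels (prior probability shift).
    Smaller alpha means stronger skew between clients."""
    clients = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Split this class's samples according to random Dirichlet shares.
        props = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

labels = np.repeat([0, 1, 2], 100)    # a balanced three-class dataset
parts = dirichlet_label_split(labels, num_clients=4, alpha=0.3)
# Each client now holds a different, skewed share of each class.
```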
To reduce accuracy loss caused by non-i.i.d. data, more advanced data normalization techniques can be used instead of standard batch normalization.
Algorithmic hyper-parameters
The way local results are combined and how devices share information can differ from the centralized model described earlier. This creates many types of federated learning methods, such as systems without a central server or random communication between devices.
One important type is networks without a central organizer. In this setup, there is no single server that sends tasks to devices or collects their results. Instead, each device shares its results with a few randomly chosen peers, which then aggregate the information locally. This limits the number of messages exchanged, which can sometimes make training faster and use less computing power.
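Gossip-style averaging, one way such serverless networks combine results, can be sketched with scalar "models": pairs of randomly chosen devices repeatedly average their values until the network approaches consensus. All names and values here are illustrative:

```python
import random

def gossip_step(values, rng):
    """One gossip exchange: two random devices average their values."""
    i, j = rng.sample(range(len(values)), 2)
    values[i] = values[j] = (values[i] + values[j]) / 2.0

# Each device starts with a different local scalar "model".
values = [0.0, 4.0, 8.0]
rng = random.Random(0)
for _ in range(200):
    gossip_step(values, rng)
# Repeated pairwise averaging preserves the mean and drives the
# devices toward consensus without any central server.
```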
After choosing the network topology, various hyper-parameters can be tuned to improve learning (in addition to the model's own hyper-parameters):
- Number of learning rounds: T
- Total number of devices used: K
- How many devices are used in each step: C
- Size of data batches used in each learning step: B
Other settings related to the model can also be adjusted, such as:
- Number of training steps before combining results: N
- Learning rate for each device: η
These settings must be tuned to the constraints of the learning task, such as available computing power, memory, and network bandwidth. For example, randomly selecting only a fraction C of devices in each round lowers computation cost and may help prevent overfitting, much as the random sampling in stochastic gradient descent can reduce overfitting.
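A minimal sketch of how the sampling parameters interact (the function name is illustrative): in each of T rounds, a fraction C of the K devices is drawn at random to participate:

```python
import random

def schedule_rounds(T, K, C, seed=0):
    """For each of T rounds, sample the devices that will participate."""
    rng = random.Random(seed)
    m = max(1, int(C * K))      # C controls how many of the K devices join
    return [rng.sample(range(K), m) for _ in range(T)]

# 5 rounds over 10 devices, with 30% participating each round.
rounds = schedule_rounds(T=5, K=10, C=0.3)
```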
Limitations
Federated learning requires frequent communication between devices during training, so participating devices need sufficient computing power and memory, as well as adequate bandwidth, to exchange model parameters. The method avoids transmitting raw data, which can consume substantial resources in traditional centralized learning. Still, the devices typically used in federated learning, such as IoT devices or smartphones connected over Wi-Fi, have limited communication capacity, so even though model updates are cheaper to send than raw data, constrained links can still be a bottleneck.
Federated learning faces several statistical challenges:
- Differences between local datasets: Each device might have data that is not representative of the whole population, and the amount of data on each device may vary greatly;
- Changes over time: The way data is distributed on each device may change as time passes;
- Interoperability: the datasets on different devices must be compatible with one another;
- Each device’s dataset may need regular updates;
- Because training data is never seen centrally, attackers may be able to inject hidden flaws (backdoors) into the global model without detection;
- Without access to all training data, it is harder to spot unfair biases, such as those based on age, gender, or sexual orientation;
- Loss of model updates from device failures can harm the global model;
- Some devices may lack labels or annotations needed for training;
- Heterogeneity among the types of devices used for processing.
Governance issues can also affect how well federated learning works, especially when different organizations are involved. Most federated learning systems use a central server to manage the process, which raises questions about who controls the system, who owns the model, and who makes important decisions. In situations where multiple groups work together, especially in competitive or cooperative data markets, unclear governance rules can cause confusion about who is responsible for checking model updates, reviewing system behavior, and handling security problems. These issues might discourage groups from sharing models and could make it harder to follow rules, limiting how widely the technology is used.
Another governance issue is how to fairly share benefits as more groups join a federated learning system. As more participants add data, the model’s accuracy may reach a point where new contributions help less. This can create problems in shared projects because early contributors might feel later contributions are less valuable, even though they still benefit from the shared model. These imbalances can affect whether groups want to join or stay in a project and make it harder to create fair rules for sharing work, ownership, or rewards. Setting clear rules for how contributions are measured and how benefits are shared is an important challenge and an area needing further study.
Algorithms
Many methods have been developed to improve how data is shared and models are trained across different devices in federated learning.
Stochastic gradient descent is a technique used in deep learning. It calculates changes (called gradients) using a random part of the full dataset and then updates the model based on these changes.
Federated stochastic gradient descent is similar to this method but applies it to federated learning. Instead of using a random part of the dataset, it uses a random group of devices (nodes), with each device using all its data. The central server then averages the gradients based on how much data each device has and uses this average to update the model.
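A sketch of one FedSGD round for a least-squares model (the function name, data, and learning rate are illustrative assumptions): each device computes a full gradient on its local data, and the server averages the gradients weighted by sample counts before taking a single step:

```python
import numpy as np

def fedsgd_step(w, device_data, lr=0.1):
    """One FedSGD round: gradient averaging weighted by data size."""
    grads, counts = [], []
    for X, y in device_data:
        grads.append(X.T @ (X @ w - y) / len(y))   # full local gradient
        counts.append(len(y))
    counts = np.asarray(counts, dtype=float)
    # Average gradients in proportion to each device's data size.
    avg_grad = sum(c * g for c, g in zip(counts, grads)) / counts.sum()
    return w - lr * avg_grad

# Two devices: one with two samples of y = 2x, one with a single sample.
device_data = [(np.array([[1.0], [1.0]]), np.array([2.0, 2.0])),
               (np.array([[2.0]]), np.array([4.0]))]
w1 = fedsgd_step(np.zeros(1), device_data)
```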
Federated averaging (FedAvg) is a generalization of FedSGD. It allows each device to perform multiple local updates on its data before sharing the updated model weights with the server, rather than a single gradient step. This reduces the need for frequent communication and works well if all devices start from the same initial weights. FedAvg has since been improved with adaptive optimizers such as Adam and AdaGrad, which often outperform the basic version.
Federated Proximal (FedProx) improves FedAvg by adding a special term to the local model updates. This term helps control how much each device’s model changes, which is useful when the data on different devices is very different (non-IID). This reduces problems caused by devices having very different data.
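The proximal term can be sketched concretely: the local objective gains a penalty (mu/2) * ||w - w_global||^2, whose gradient mu * (w - w_global) pulls each local model back toward the global one. This toy least-squares example (names and values are illustrative assumptions) shows that a larger mu keeps the local update closer to the global model:

```python
import numpy as np

def fedprox_local(w_global, X, y, mu=0.1, lr=0.1, steps=50):
    """Local least-squares update with a FedProx-style proximal penalty.
    The extra gradient term mu * (w - w_global) discourages the local
    model from drifting far from the current global model."""
    w = w_global.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y) + mu * (w - w_global)
        w = w - lr * grad
    return w

w_global = np.zeros(1)
X = np.array([[1.0], [1.0]])
y = np.array([1.0, 1.0])                          # local optimum is w = 1
w_free = fedprox_local(w_global, X, y, mu=0.0)    # drifts to the local optimum
w_prox = fedprox_local(w_global, X, y, mu=10.0)   # stays near the global model
```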
Federated learning methods can struggle when data is unevenly distributed across devices. In 2021, Acar et al. created FedDyn, a method that adjusts each device’s local loss function to align with the overall goal. This helps devices work together even when data is very different, and it ensures the final model performs well.
To reduce the time each device spends calculating, FedDynOneGD modifies FedDyn. It calculates only one gradient per device per round, then updates the global model. This makes the process faster and allows calculations to happen at the same time on each device. FedDynOneGD has the same guarantees about how well it works as FedDyn but uses less local computation.
Federated learning models often perform poorly when data is not the same across devices. To fix this, Vahidian et al. introduced Sub-FedAvg, which uses a technique called hybrid pruning to create personalized models. This method helps balance communication efficiency, limited resources, and accurate results for each device.
Sub-FedAvg also applies the "lottery ticket hypothesis" from central training to federated learning. This hypothesis asks whether each device can find a small, effective part of its model (a "winning ticket") that works well for its specific data. Sub-FedAvg shows that this is possible and provides methods to find these personalized parts.
IDA (Inverse Distance Aggregation) is a federated learning method that weights each client's update by the inverse of its parameters' distance from the average model. This reduces the influence of outlier clients with unusual data and can speed up model training.
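A minimal sketch of the idea behind inverse-distance weighting (the function name and details are illustrative, not the published algorithm): clients whose parameters lie far from the mean receive proportionally less weight in the aggregate:

```python
import numpy as np

def ida_aggregate(local_weights, eps=1e-8):
    """Weight each client's parameters by inverse distance to the mean,
    so outlier clients influence the aggregate less."""
    stacked = np.stack(local_weights)
    mean = stacked.mean(axis=0)
    dist = np.linalg.norm(stacked - mean, axis=1) + eps
    coef = (1.0 / dist) / (1.0 / dist).sum()
    return coef @ stacked

# Two agreeing clients and one outlier: the outlier is down-weighted.
local = [np.array([1.0, 1.0]), np.array([1.0, 1.0]), np.array([7.0, 7.0])]
agg = ida_aggregate(local)
```

The result lies closer to the two agreeing clients than a plain average would, illustrating how the method dampens unusual contributions.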
Few methods exist for hybrid federated learning, where devices have only parts of both data samples and features. However, this is important in real-world situations. In 2024, HyFDCA was introduced to solve problems in this setting. It builds on an existing method called CoCoA and works for both horizontal and vertical federated learning scenarios.
HyFDCA has several advantages:
- It guarantees convergence (progress toward the correct solution) in hybrid federated learning, whether all devices participate or only some do. It matches the convergence rate of FedAvg and is faster than other methods in some cases.
- It includes steps to protect the privacy of data on each device.
- It outperforms other methods like HyFEM and FedAvg in many tests, using fewer resources and achieving better results.
Another method for hybrid federated learning is HyFEM, introduced in 2020. It balances local and global model accuracy but requires adjusting a parameter to work well. Unlike HyFDCA, which builds a global model that matches a central training result, HyFEM creates separate local and global models that need manual adjustments. HyFEM works with complex models like deep learning, while HyFDCA is designed for simpler problems like logistic regression.
HyFDCA was tested against HyFEM and FedAvg using real datasets like MNIST, Covtype, and News20. It performed better in most cases, achieving lower loss values and higher accuracy with less time. It also requires tuning only one parameter (the number of inner steps), while FedAvg and HyFEM require tuning three or four parameters. This makes HyFDCA easier to use in practice.
Current research topics
Federated learning became an important area of study in 2015 and 2016, with early research focusing on a method called federated averaging in telecommunication systems. Before this, a research paper titled "A Framework for Multi-source Prefetching Through Adaptive Weight" introduced a way to combine predictions from models trained at three different points in a request-response cycle. Another key research area is reducing the amount of communication needed during federated learning. From 2017 to 2018, studies focused on creating strategies to manage resources, especially to lower communication needs between devices using methods like gossip algorithms, and on analyzing how well systems can protect data privacy. Other research aims to reduce the amount of data sent between devices through techniques such as sparsification and quantization, which simplify or compress machine learning models before sharing them. Creating very compact deep neural network (DNN) designs is important for learning on devices and at the edge of networks. Recent work highlights the need for energy-efficient systems in federated learning and the importance of compressing deep learning models during training.
New research is beginning to use real-world communication conditions, moving away from earlier studies that assumed perfect communication channels. Another active area of study is developing federated learning methods to train different models on devices with varying levels of computing power and then combine them into a single strong model for making predictions.
A new learning framework called Assisted learning was recently created to help devices improve their learning without sharing private data, models, or learning goals. Unlike federated learning, which often needs a central system to coordinate learning, Assisted learning allows devices to learn and improve together without relying on a single global model.
Use cases
Federated learning is often used when individuals or organizations need to train models using large datasets. However, they may not be able to share the actual data due to legal, strategic, or economic reasons. The technology requires strong connections between local servers and enough computing power for each device involved.
Self-driving cars use many machine learning tools to operate. These include computer vision to identify obstacles and systems that adjust speed based on road conditions, such as rough surfaces. Because there may be many self-driving cars and they must react quickly to real-world situations, traditional cloud-based methods could create safety risks. Federated learning can help by reducing the amount of data transferred and speeding up learning.
In Industry 4.0, machine learning is widely used to improve industrial processes while ensuring safety. However, protecting sensitive data is crucial for industries and manufacturers. Federated learning can help because it does not share private information. It is also used to predict PM2.5 levels, supporting smart city applications.
Federated learning aims to solve data-governance and privacy problems by training models collaboratively without exchanging raw data. Traditional approaches that pool data from multiple sources raise privacy and data-protection concerns, so training models across multiple medical institutions without moving the data has become a key technology. A study published in Nature Digital Medicine in September 2020 explored how federated learning might improve digital health and highlighted challenges that still need solutions. Recently, 20 institutions worldwide tested federated learning's usefulness for training AI models: in a paper titled Federated Learning for Predicting Clinical Outcomes in Patients with COVID-19, researchers showed that a federated AI model accurately predicted oxygen needs of patients with COVID-19. Another study, A Systematic Review of Federated Learning in the Healthcare Area, discussed challenges specific to medical data.
A group of industry and academic organizations created MedPerf, an open-source platform that tests medical AI models using real-world data. The platform uses federated evaluation to protect patient privacy and relies on diverse committees to set neutral, clinically important benchmarks.
Robotics uses many machine learning methods, including perception, decision-making, and control. As robotic systems handle more complex tasks, such as autonomous navigation, the need for machine learning increases. Federated learning improves traditional training methods. In one study, mobile robots learned to navigate different environments using federated learning, improving their ability to adapt. Another study used federated learning to help robots navigate with limited communication bandwidth, a common challenge in real-world tasks. A third study applied federated learning to teach vision-based navigation, helping robots transfer learning from simulations to real-world settings.
Federated learning is changing biometric recognition by allowing models to train across different data sources while keeping data private. It avoids sharing sensitive information like fingerprints, facial images, or iris scans, addressing privacy and legal issues. This method improves model accuracy and helps with fragmented data, making it useful for applications like facial and iris recognition. However, federated learning faces challenges, such as differences in models and data, high computational costs, and risks like security attacks. Future work includes creating personalized federated learning systems, improving efficiency, and expanding use in areas like detecting fake biometric data and assessing data quality.