Cyberattacks are becoming more frequent each year, and data breaches are becoming more costly. Whether companies seek to protect AI systems during development or use algorithms to improve their security posture, they must mitigate cybersecurity risks. Federated learning has the potential to do both.
What is federated learning?
Federated learning is an approach to AI development in which multiple parties train a single model independently. Each party downloads the current primary model from a central cloud server, trains its own configuration on a local server, and uploads the result upon completion. In this way, participants can collaborate remotely without exposing raw data or individual model parameters.
A centralized algorithm then weighs each separately trained configuration by the number of samples it was trained on and aggregates them to create a single global model. All information remains on each participant’s local server or device; the central repository evaluates updates instead of processing raw data.
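This weighted aggregation is commonly implemented as federated averaging (FedAvg). Here is a minimal Python sketch of the idea; the data structures are illustrative assumptions rather than any particular framework’s API.

```python
import numpy as np

def federated_average(updates):
    """Combine locally trained weights into one global model.

    `updates` is a list of (weights, num_samples) pairs, where `weights`
    is a list of numpy arrays from one participant's local training run.
    Each contribution is weighted by the number of samples behind it.
    """
    total = sum(n for _, n in updates)
    num_layers = len(updates[0][0])
    return [
        sum(w[layer] * (n / total) for w, n in updates)
        for layer in range(num_layers)
    ]

# Example: three participants trained on 100, 400 and 500 local samples.
rng = np.random.default_rng(0)
updates = [([rng.normal(size=(4, 2))], n) for n in (100, 400, 500)]
global_weights = federated_average(updates)
```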
Federated learning is rapidly gaining popularity because it addresses common development-related security concerns, and it offers performance benefits as well. Research shows this technique can improve an image classification model’s accuracy by up to 20%, a substantial increase.
Horizontal federated learning
There are two types of federated learning. The traditional option is horizontal federated learning, in which data is partitioned across devices. The datasets share a feature space but contain different samples. This lets edge nodes collaboratively train machine learning (ML) models without directly sharing information.
Vertical federated learning
With vertical federated learning, the opposite is true: the features differ, but the samples are the same. Features are distributed across participants, with each holding different attributes about the same set of entities. Since just one party holds the complete set of sample labels, this approach preserves privacy.
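The difference between the two partitioning schemes is easiest to see in a toy split. The dataset below, including its column names, is invented purely for illustration:

```python
import pandas as pd

# One toy dataset: rows are customers, columns are features.
data = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "login_count": [12, 7, 31, 2],
    "failed_logins": [0, 3, 1, 9],
    "fraud_label": [0, 0, 0, 1],
})

# Horizontal split: same columns (feature space), different rows (samples).
bank_a = data.iloc[:2]   # customers 1 and 2
bank_b = data.iloc[2:]   # customers 3 and 4

# Vertical split: same rows (entities), different columns (features).
# Note that only one party holds the complete set of labels.
telecom = data[["customer_id", "login_count"]]
insurer = data[["customer_id", "failed_logins", "fraud_label"]]
```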
How federated learning strengthens cybersecurity
Traditional development is prone to security gaps. Algorithms need extensive, relevant datasets to remain accurate, and involving multiple departments and vendors creates openings for attackers. They can exploit the lack of visibility and the broad attack surface to inject bias, conduct prompt engineering or exfiltrate sensitive training data.
As algorithms take on cybersecurity roles, their performance directly affects an organization’s security posture. Research shows that a sudden influx of new data can cause an abrupt drop in model accuracy. An AI system may seem accurate yet fail when tested elsewhere because it learned to take false shortcuts that produce convincing-looking results.
Because AI cannot think critically or genuinely consider context, its accuracy diminishes over time. Although ML models evolve as they absorb new information, their performance stagnates if their decision-making is built on shortcuts. This is where federated learning comes in.
Privacy and security are among the most notable benefits of training a centralized model with disparate updates. Because every participant works independently, no one has to share proprietary or sensitive information to progress training. Moreover, the less data that is transferred, the lower the risk of man-in-the-middle (MITM) attacks.
All updates are encrypted for secure aggregation. Secure multi-party computation hides them behind various encryption schemes, lowering the chances of a breach or MITM attack. Doing so enhances collaboration while minimizing risk, ultimately improving security posture.
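One standard building block behind secure aggregation is pairwise additive masking: each pair of participants shares a random mask that one adds and the other subtracts, so the server never sees an individual update, yet the masks cancel when everything is summed. The sketch below is a simplification; production protocols derive the masks from key agreement and handle participants dropping out.

```python
import numpy as np

rng = np.random.default_rng(42)
true_updates = {pid: rng.normal(size=3) for pid in ("a", "b", "c")}

# Each ordered pair (i, j) with i < j agrees on a shared random mask r_ij.
pids = sorted(true_updates)
masks = {(i, j): rng.normal(size=3) for i in pids for j in pids if i < j}

# Participant i adds r_ij to its update; participant j subtracts it.
masked = {}
for pid in pids:
    m = true_updates[pid].copy()
    for (i, j), r in masks.items():
        if pid == i:
            m += r
        elif pid == j:
            m -= r
    masked[pid] = m

# The server only ever sees masked updates, yet the masks cancel in the sum.
assert np.allclose(sum(masked.values()), sum(true_updates.values()))
```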
An often-overlooked advantage of federated learning is speed. It has much lower latency than its centralized counterpart. Because training happens locally rather than on a central server, algorithms can detect, classify, and respond to threats faster. Minimal delays and rapid data transmission let cybersecurity professionals handle malicious actors with ease.
Considerations for cybersecurity professionals
Before leveraging this training method, AI engineers and cybersecurity teams need to consider several technical, security, and operational factors.
Resource usage
Developing AI is expensive. Teams building their own models should expect to spend anywhere from $5 million to $200 million upfront, plus upwards of $5 million annually in maintenance. The financial commitment is significant even when costs are distributed among multiple parties. Business leaders should also account for the ongoing costs of cloud and edge computing.
Federated learning can also be computationally intensive, introducing potential bandwidth, storage, or processing constraints. While the cloud enables on-demand scalability, cybersecurity teams risk vendor lock-in if they are not careful. Strategic hardware and vendor selection is paramount.
Participant trust
Although disparate training is secure, it lacks transparency, raising concerns about intentional bias and malicious injection. A consensus mechanism is essential for approving model updates before the centralized algorithm aggregates them. This way, teams can minimize the threat of tampering without sacrificing confidentiality or exposing sensitive information.
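One simple form such a consensus mechanism could take is a majority vote among validators, each scoring a candidate update against its own private held-out data. The following sketch is purely illustrative, with synthetic data standing in for real validation sets; production systems would use more robust, Byzantine-fault-tolerant protocols.

```python
import numpy as np

rng = np.random.default_rng(7)

def loss(weights, X, y):
    """Mean squared error of a linear model on one validator's data."""
    return float(np.mean((X @ weights - y) ** 2))

# Each validator keeps a private held-out set (synthetic here).
true_w = np.array([1.0, -2.0])
validators = []
for _ in range(5):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    validators.append((X, y))

def approve_update(candidate_w, current_w, tolerance=0.05):
    """Majority vote: accept only if most validators see no regression."""
    votes = sum(
        loss(candidate_w, X, y) <= loss(current_w, X, y) * (1 + tolerance)
        for X, y in validators
    )
    return votes > len(validators) / 2

current = np.array([0.9, -1.8])
print(approve_update(np.array([0.98, -1.95]), current))  # honest update: True
print(approve_update(np.array([5.0, 3.0]), current))     # poisoned update: False
```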
Training data security
While this machine learning training method can improve a company’s security posture, it is never 100% secure. Developing models in the cloud comes with risks of insider threats, human error, and data loss. Redundancy is key. Teams should create backups to avoid interruptions and roll back updates if necessary.
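On the redundancy point, even a lightweight version registry for accepted global models makes rollbacks straightforward. Below is a minimal sketch, assuming models are represented as plain weight lists:

```python
import copy

class ModelRegistry:
    """Version every accepted global model so any round can be rolled back."""

    def __init__(self):
        self.versions = []

    def save(self, weights):
        self.versions.append(copy.deepcopy(weights))
        return len(self.versions) - 1  # version number

    def rollback(self, version):
        return copy.deepcopy(self.versions[version])

registry = ModelRegistry()
v0 = registry.save([0.9, -1.8])   # model before a suspect aggregation round
v1 = registry.save([5.0, 3.0])    # a round later found to be poisoned
weights = registry.rollback(v0)   # restore the last known-good model
```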
Decision-makers should also review their training datasets’ sources. Dataset borrowing is widespread in the ML community, raising well-founded concerns about model misalignment. On Papers With Code, more than 50% of task communities use borrowed datasets at least 57.8% of the time. Moreover, 50% of the datasets there come from just 12 universities.
Applications of federated learning in cybersecurity
Once the primary algorithm aggregates and weighs participants’ updates, it can be reshared for whatever application it was trained for. Cybersecurity teams can use it for threat detection. The advantage here is twofold: while attackers are left guessing because they cannot easily exfiltrate data, professionals pool insights for highly accurate output.
Federated learning is ideal for adjacent applications like threat classification and detecting indicators of compromise. The AI’s large dataset sizes and extensive training build its knowledge base, curating expansive expertise. Cybersecurity professionals can use the model as a unified defense mechanism to protect broad attack surfaces.
ML models, especially those that make predictions, are prone to drift over time as concepts evolve or variables become less relevant. With federated learning, teams can periodically update their model with varied features or data samples, resulting in more accurate, timely insights.
Leveraging federated learning for cybersecurity
Companies should consider leveraging federated learning, whether they want to secure training datasets or use AI for threat detection. This technique could improve accuracy and performance and strengthen security postures, as long as teams strategically navigate potential insider threats and breach risks.
Zac Amos is the features editor at ReHack.