Consumers and regulators are paying increasing attention to the privacy of data collection and use. The General Data Protection Regulation (GDPR) took effect in Europe in 2018 and affects any company conducting business there.
Under the GDPR, businesses must take greater care when gathering, storing, using, and transferring customer data. In the United States, California enacted the California Consumer Privacy Act (CCPA), which gives consumers the right to ask businesses what data they hold about them and to request that it be deleted.
AI systems (machine learning and deep learning) need large amounts of data to train their models, and this data frequently contains personal information.
Because data privacy and security have become such vital concerns under these new regulations and policies, new machine learning approaches such as federated learning (FL) have been developed in part to address them.
Assuming a basic knowledge of what federated learning is, this article details its privacy and security features.
Federated Learning Security and Privacy Overview
Google first proposed the concept of federated learning in 2017. Its main feature is that it allows data scientists to train a shared statistical model using local data sets held on decentralized servers or devices.
Although all participants train the same model, they never need to share sensitive data with coworkers or research teams or upload it to the cloud. By keeping data in local stores, federated learning reduces the security and privacy risks of conventional centralized machine learning, which requires data sets to live on a single server.
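To make this concrete, here is a minimal sketch of the federated averaging idea in Python with NumPy: each client trains a simple model on its own private data, and the server averages only the returned weights. The function and variable names are illustrative, not part of any particular FL framework.

```python
import numpy as np

def local_train(weights, X, y, lr=0.1, epochs=5):
    """Train a simple linear model on a client's private data.
    The raw data (X, y) never leaves this function."""
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w

def federated_round(global_weights, clients):
    """One round of federated averaging: every client trains locally,
    and the server averages the weights they send back."""
    client_weights = [local_train(global_weights, X, y) for X, y in clients]
    return np.mean(client_weights, axis=0)

# Example: three clients, each holding its own private data set.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

weights = np.zeros(2)
for _ in range(20):
    weights = federated_round(weights, clients)
print(weights)  # approaches [2.0, -1.0] without pooling any raw data
```

Note that the server in this sketch only ever sees weight vectors; the per-client data sets stay inside `local_train`.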
FL's approach to user privacy protection, which separates the data held on end-user devices from the aggregation of model parameters (such as deep learning network weights) at a central server, has attracted considerable interest.
FL's central goal is to learn a global model jointly without compromising data privacy. Compared with training on a data set held in a data center, FL provides clear privacy advantages.
Even an "anonymized" data set stored on a server can compromise user privacy when it is linked with other data sets. With FL, in contrast, the only information communicated consists of minimal updates that improve the accuracy of a particular machine learning model.
The updates themselves can be ephemeral and will never contain more information than the raw training data.
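The snippet below illustrates this point: what a client actually transmits in a typical round is a weight delta, not the data, and the server can discard each delta after aggregation. The names here are illustrative only.

```python
import numpy as np

def compute_update(global_weights, local_weights):
    """The only artifact that leaves the device is this delta:
    the difference between locally trained and global weights."""
    return local_weights - global_weights

def apply_updates(global_weights, updates):
    """Server side: average the ephemeral updates and discard them.
    No raw training examples are ever received or stored."""
    return global_weights + np.mean(updates, axis=0)

global_w = np.array([0.5, -0.2])
# Suppose two clients trained locally and produced these weights.
local_w_a = np.array([0.6, -0.25])
local_w_b = np.array([0.55, -0.15])

updates = [compute_update(global_w, w) for w in (local_w_a, local_w_b)]
print(apply_updates(global_w, updates))  # [0.575, -0.2]
```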
As an example use case, NVIDIA recently added FL to its autonomous driving platform. Because geographic landscapes and driving conditions differ from region to region, OEMs must train each model on a variety of driving data sets.
The company's DGX edge platform lets each OEM retrain the shared model on its local data. The local training results are then transmitted back to the FL server over a secure channel to update the shared model.
The Future of Federated Learning Security and Privacy
FL is a thriving area of machine learning research, and researchers are working hard to improve the methodology's capacity to meet privacy and security requirements. For instance, the notion of privacy described above applies at a local or global level across all of the network's devices.
In practice, however, privacy requirements may vary between devices, or even between data points on a single device, so it may be necessary to define privacy at a finer granularity.
One suggestion is to substitute sample-specific privacy guarantees for user-specific ones, offering a weaker level of privacy in exchange for more accurate models. Developing techniques to handle hybrid device-specific or sample-specific privacy constraints looks promising.
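As a rough sketch of what sample-level protection can look like, the snippet below applies the common recipe of per-sample gradient clipping plus Gaussian noise, in the spirit of differentially private SGD. The clip norm and noise scale are illustrative assumptions, not recommended settings.

```python
import numpy as np

def private_gradient(per_sample_grads, clip_norm=1.0, noise_scale=0.5, seed=0):
    """Bound each sample's influence via clipping, then add calibrated
    noise. This limits how much any single data point can reveal."""
    rng = np.random.default_rng(seed)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_scale * clip_norm / len(per_sample_grads),
                       size=mean_grad.shape)
    return mean_grad + noise

# Each row is one training sample's gradient on the local model.
grads = np.array([[0.9, -0.4], [2.5, 1.0], [0.1, 0.3]])
print(private_gradient(grads))
```

The trade-off mentioned above is visible here: a smaller clip norm or larger noise scale strengthens the per-sample guarantee but degrades the gradient, and with it the model's accuracy.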
Another promising but complex and difficult FL direction is training deep learning models in parallel on distributed data sets while maintaining data privacy.
To preserve privacy while enabling parallel training, one group of researchers has created the federated learning framework (FEDF). Using the framework, a model can learn from several geographically dispersed training data sets, which may belong to different owners.
Crucially, neither the training data sets nor the intermediate results are ever exposed.
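FEDF's actual protocol is more involved, but the sketch below conveys the basic pattern using Python's standard library: separate owners train in parallel in their own processes, and only model parameters, never raw data, cross process boundaries. All names and settings are illustrative assumptions, not the FEDF implementation.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def owner_train(args):
    """Runs inside each owner's own process; the private data set
    stays here, and only the updated weights are returned."""
    weights, seed = args
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(100, 3))          # stand-in for private data
    y = X @ np.array([1.0, 0.0, -1.0])
    w = weights.copy()
    for _ in range(10):
        w -= 0.1 * X.T @ (X @ w - y) / len(y)
    return w

if __name__ == "__main__":
    global_w = np.zeros(3)
    for _ in range(5):  # five rounds of parallel training
        with ProcessPoolExecutor(max_workers=3) as pool:
            results = list(pool.map(owner_train,
                                    [(global_w, s) for s in range(3)]))
        global_w = np.mean(results, axis=0)
    print(global_w)  # approaches [1.0, 0.0, -1.0]
```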
Conclusion
This article discussed security and privacy in federated learning, an important new approach to distributed machine learning that is gaining acceptance.
As regulations such as the GDPR and CCPA raise the bar for machine learning security and privacy, new approaches like federated learning show great promise.
By training shared statistical models on local data sets held on decentralized devices or servers, federated learning can address several significant issues regarding personal data privacy and security.