♊️ GemiNews 🗞️
(dev)
Editing article
Title
Summary
Content
<h3>Secure Together — Federated Learning for Decentralized Security on GCP</h3><p>Integrating security mechanisms to enhance organizational security posture with FL</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/0*yCOc3CDRbhGEVJwn.jpeg" /></figure><p>As I might have emphasized enough, I am not a machine learning guy, nor am I the AI boss who can talk shop about models and all the jargon that goes with them. But you can rest assured that if you’re reading this article to learn, you will learn: if I could, you can as well.</p><p>Federated Learning (FL) enables cooperative training on decentralized data. By keeping sensitive data on individual devices or inside organizational silos, this approach promotes security and privacy in security-sensitive applications. Google Cloud provides a stable platform for implementing FL workflows, which makes it a desirable choice for building decentralized security solutions.</p><p>This article explores the fundamental ideas of Federated Learning (FL), looks at how it can support decentralized security on Google Cloud, and presents use cases along with tools and code samples.</p><h3>Understanding FL</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*if1Dz3C2Ej8Nx9D5.png" /></figure><p>Traditional machine learning algorithms frequently require large volumes of data to be gathered in one central location for training. This approach raises privacy issues, particularly when handling sensitive data such as medical records or financial transactions. Federated learning offers a strong alternative.</p><p>In FL, the training procedure is managed by a central coordinator that never has direct access to the individual data points. The workflow breaks down as follows (a minimal sketch of the aggregation step follows the list):</p><ul><li>Model Distribution: The coordinator distributes a preliminary global model to the participating devices or organizations.</li><li>Local Training: Each participant trains the model locally on its own data. Because the raw data never leaves the device or silo, this localized training preserves privacy.</li><li>Model Updates: Instead of sending raw data, participants send the coordinator only the model updates (gradients), greatly cutting down on communication overhead.</li><li>Global Model Aggregation: The coordinator aggregates the received updates and applies them to improve the global model.</li><li>Iteration: Steps 1–4 are repeated for a number of rounds, iteratively improving the global model without jeopardizing data privacy.</li></ul>
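<p>To make the aggregation step concrete, here is a minimal, framework-free sketch of federated averaging (FedAvg), the canonical FL aggregation rule. It is an illustration under assumed shapes and names, not code from a specific library: the coordinator averages the participants’ weights, weighted by how many local samples each participant trained on.</p><pre>import numpy as np<br><br>def federated_average(updates, sample_counts):<br>    # updates: one list of weight arrays per participant<br>    # sample_counts: number of local training samples per participant<br>    total = float(sum(sample_counts))<br>    # Weight each participant's contribution by its share of the data<br>    return [<br>        sum(w[layer] * (n / total) for w, n in zip(updates, sample_counts))<br>        for layer in range(len(updates[0]))<br>    ]<br><br># One training round: three participants, two weight tensors each<br>rng = np.random.default_rng(0)<br>updates = [[rng.normal(size=(4, 2)), rng.normal(size=(2,))] for _ in range(3)]<br>global_model = federated_average(updates, sample_counts=[100, 50, 250])</pre><p>In a real deployment the updates travel over a secure channel and the averaged weights are redistributed for the next round; frameworks such as TensorFlow Federated implement this loop for you.</p>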
<h4>So what are the benefits?</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*cUasHA5oNG5hzkMY.png" /></figure><p>FL offers a number of benefits for building privacy-preserving, secure solutions on Google Cloud:</p><ul><li>Enhanced Data Privacy: By keeping data decentralized, FL reduces the risk of data breaches and unauthorized access. Organizations handling sensitive security data, such as threat intelligence or user behavior patterns, benefit especially from this.</li><li>Improved Regulatory Compliance: By minimizing data collection and sharing, FL can help businesses comply with stringent data privacy laws such as the California Consumer Privacy Act and the General Data Protection Regulation.</li><li>Collaborative Threat Intelligence Sharing: FL allows security teams from different organizations to collaborate securely. Without disclosing their proprietary threat intelligence datasets, they can jointly train a threat detection model, promoting a more thorough understanding of the evolving threat landscape.</li><li>On-Device Security Training: FL enables security models to be trained directly on user devices. This protects user privacy while enabling real-time, personalized threat detection and anomaly identification.</li><li>Federated Learning with Secure Multi-party Computation (SMC): FL can be combined with SMC techniques to perform secure computations on sensitive data distributed among several parties. This opens opportunities for sophisticated, privacy-preserving analytics in security applications.</li></ul><h3>Getting to work</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/453/0*Nk2dhSQSVM-GBvrX.jpg" /></figure><p>Let’s talk about some of the ways we can use FL to strengthen security posture.</p><h4><strong>Collaborative Malware Detection</strong></h4><p>Conventional malware detection frequently relies on signature-based techniques, which compare files against known malicious patterns. However, signature-based methods struggle to identify zero-day attacks, where attackers employ novel tactics.</p><p>Collaborative malware detection addresses this limitation by sharing threat intelligence among various systems (a toy sketch of such an exchange follows the list). This intelligence may consist of:</p><ul><li>File hashes of known malware: By exchanging file hashes, systems can swiftly recognize malware that has already been encountered elsewhere.</li><li>Data from behavioral analysis: Exchanging information about how files interact with the system makes it easier to spot suspicious patterns of behavior.</li><li>Indicators of Compromise (IOCs): Collective defense is strengthened when information related to malware campaigns, such as URLs, IP addresses, and domain names, is shared.</li></ul><p>By pooling this shared intelligence, collaborative detection systems are better able to recognize new malware variants and emerging threats.</p>
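<p>As a minimal illustration of the first item, here is a hypothetical sketch (the digests and helper names are placeholders, not from a real feed) in which each organization contributes only SHA-256 digests of known-bad files, so the samples themselves never leave their silos:</p><pre>import hashlib<br><br>def file_sha256(path: str) -> str:<br>    # Hash the file contents; only this digest is ever shared<br>    with open(path, "rb") as f:<br>        return hashlib.sha256(f.read()).hexdigest()<br><br># Each organization publishes digests of malware it has already seen<br>org_a_hashes = {"9f86d081884c7d65...", "60303ae22b998861..."}  # placeholder digests<br>org_b_hashes = {"fd61a03af4f77d87...", "9f86d081884c7d65..."}<br><br># The shared blocklist is the union of everyone's contributions<br>shared_blocklist = org_a_hashes | org_b_hashes<br><br>def is_known_malware(path: str) -> bool:<br>    return file_sha256(path) in shared_blocklist</pre><p>In practice such digests would live in a shared, access-controlled datastore; FL then adds the behavioral model on top of this simple exchange.</p>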
<p><strong><em>Prepping ourselves</em></strong></p><ul><li>Collect Data: Compile a wide range of benign and malware samples, such as PE and APK files. Public malware datasets are available online, but make sure to observe ethical and legal requirements.</li></ul><pre>import apache_beam as beam<br><br>class IngestMalware(beam.DoFn):<br>    def process(self, element):<br>        # element: one CSV line of sample metadata, e.g. "filename,source"<br>        file_name, source = element.split(',')<br>        # download_and_save_malware is a user-supplied helper that fetches<br>        # the sample from its source and uploads it to Cloud Storage<br>        download_and_save_malware(file_name, source)<br>        yield {'filePath': f'gs://your-bucket/{file_name}'}<br><br>with beam.Pipeline() as pipeline:<br>    malware_data = (<br>        pipeline<br>        | 'ReadMetadata' >> beam.io.ReadFromText('path/to/metadata.csv', skip_header_lines=1)<br>        | 'IngestMalware' >> beam.ParDo(IngestMalware())<br>    )</pre><ul><li>Data Labeling: Assign a malicious or benign label to every file. Security experts can do this manually, or crowdsourcing platforms can help.</li><li>Data Preprocessing: Clean and prepare the data according to the requirements of the selected machine learning model. This could entail formatting, normalization, and feature extraction.</li></ul><pre>import kfp.components as comp<br><br># NOTE: each "comp...." below is an elided component factory in the original;<br># in practice you would build these with comp.create_component_from_func<br><br># Download and pre-process internal security data<br>download_security_data = comp....(source="internal_security_logs")<br>preprocess_security_data = comp....(inputs=[download_security_data.outputs["data"]])<br><br># Download and pre-process public threat intelligence data<br>download_threat_intel = comp....(source="public_threat_feed_url")<br>preprocess_threat_intel = comp....(inputs=[download_threat_intel.outputs["data"]])<br><br># Merge both pre-processed datasets<br>merged_data = comp....(inputs=[preprocess_security_data.outputs["data"], preprocess_threat_intel.outputs["data"]])<br><br># Assemble the components into a pipeline (schematic; the real KFP API wires<br># components inside a function decorated with @kfp.dsl.pipeline)<br>training_pipeline = comp.pipeline(<br>    name="data_preprocessing_pipeline",<br>    description="Preprocesses data for malware detection model training",<br>    components=[<br>        download_security_data,<br>        preprocess_security_data,<br>        download_threat_intel,<br>        preprocess_threat_intel,<br>        merged_data,<br>    ],<br>)</pre><p>I know you guys are professionals, so we won’t delve deeper into this with code. Moving on!</p><p><strong><em>Training Our Model</em></strong></p><ul><li>Select a Model: Choose a machine learning model appropriate for the format of your data (e.g., image-style classification for executables, NLP for scripts). TensorFlow and scikit-learn models are popular options.</li><li>Create a Training Script: Write a Python script that loads, preprocesses, and trains the model on your labeled data. Use Vertex AI Training for resource management and distributed training, as in the sketch below.</li></ul><pre>from google.cloud import aiplatform<br><br># A minimal sketch using the Vertex AI SDK; adjust names and URIs to your setup<br>project = "your-project-id"<br>location = "us-central1"<br><br>aiplatform.init(project=project, location=location)<br><br># Endpoint that will later serve the trained model<br>endpoint = aiplatform.Endpoint.create(display_name="malware-detection-endpoint")<br><br># Register the labeled feature data (e.g., a CSV produced by the<br># preprocessing pipeline) as a managed dataset<br>dataset = aiplatform.TabularDataset.create(<br>    display_name="malware-dataset",<br>    gcs_source="gs://your-bucket/labeled_features.csv",<br>)<br><br># Custom training job that runs your training script on managed hardware<br>job = aiplatform.CustomTrainingJob(<br>    display_name="malware-detection-training",<br>    script_path="train.py",  # the training script from the previous step<br>    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",<br>)<br><br>job.run(<br>    dataset=dataset,<br>    training_fraction_split=0.8,  # 80/10/10 train/validation/test split<br>    validation_fraction_split=0.1,<br>    test_fraction_split=0.1,<br>    machine_type="n1-standard-4",  # adjust machine type as needed<br>)<br><br># Retrain periodically (e.g., every 30 days via Cloud Scheduler) to stay<br># up to date; monitor progress in the Vertex AI console</pre><p><strong><em>Alert generation</em></strong></p><p>This code sample shows a Cloud Function triggered by a Pub/Sub message that carries a malware detection finding from Vertex AI. Based on the collaborative detection results, the function reads the threat type of the finding and, if it indicates malware, generates an alert.</p><pre>import base64<br>import json<br><br>def analyze_malware_finding(event, context):<br>    # Decode the Pub/Sub message payload (Pub/Sub base64-encodes event data)<br>    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))<br>    finding = payload["finding"]<br><br>    # Check if the finding indicates malware based on collaborative detection results<br>    if finding["threat_type"] == "MALWARE":<br>        # Generate an alert with details from the finding<br>        alert_message = f"Potential Malware Detected: {finding['file_hash']}"<br>        # Send the alert using a notification service (e.g., Cloud Monitoring)</pre>
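<p>To exercise this function end to end, you could publish a synthetic finding to the triggering topic. A hypothetical sketch, where the topic name and message schema are assumptions matching the function above:</p><pre>import json<br>from google.cloud import pubsub_v1<br><br>publisher = pubsub_v1.PublisherClient()<br># "malware-findings" is an assumed topic name<br>topic_path = publisher.topic_path("your-project-id", "malware-findings")<br><br># Assumed finding schema, matching what the Cloud Function expects<br>finding = {<br>    "finding": {<br>        "threat_type": "MALWARE",<br>        "file_hash": "9f86d081884c7d65...",  # placeholder digest<br>    }<br>}<br><br># Pub/Sub delivers the payload base64-encoded to the function<br>future = publisher.publish(topic_path, json.dumps(finding).encode("utf-8"))<br>print(f"Published finding: {future.result()}")</pre>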
<p><strong><em>Alert Integration (Cloud Monitoring API)</em></strong></p><pre>from google.cloud import monitoring_v3<br><br>project = "your-project-id"<br>client = monitoring_v3.AlertPolicyServiceClient()<br><br># Define the alert policy details<br>alert_policy = monitoring_v3.AlertPolicy(<br>    display_name="malware_detection_alert",<br>    # ... conditions, combiner, and notification channels go here<br>)<br><br># Create the alert policy under the project<br>client.create_alert_policy(<br>    name=f"projects/{project}",<br>    alert_policy=alert_policy,<br>)</pre><p><strong>Note: </strong>This is a simplified overview. You’ll need to fill in the details based on your specific requirements and chosen tools. Refer to the Vertex AI and Cloud Monitoring documentation for comprehensive instructions and code examples.</p><h3>Resources</h3><ul><li>Vertex AI Pipelines: <a href="https://cloud.google.com/vertex-ai/docs/pipelines/introduction">https://cloud.google.com/vertex-ai/docs/pipelines/introduction</a></li><li>Custom Training in Vertex AI: <a href="https://cloud.google.com/vertex-ai/docs/training/overview">https://cloud.google.com/vertex-ai/docs/training/overview</a></li><li>Cloud Monitoring Metrics: <a href="https://cloud.google.com/monitoring/api/metrics_gcp">https://cloud.google.com/monitoring/api/metrics_gcp</a></li><li>Alerting Policies in Cloud Monitoring: <a href="https://cloud.google.com/monitoring/alerts">https://cloud.google.com/monitoring/alerts</a></li><li>Federated Learning comic from Google AI: <a href="https://federated.withgoogle.com/">https://federated.withgoogle.com/</a></li></ul><h3>Get in Touch</h3><p><a href="https://imranfosec.linkb.org/">Imran Roshan</a></p><hr><p><a href="https://medium.com/google-cloud/secure-together-federated-learning-for-decentralized-security-on-gcp-4c6219ba8f09">Secure Together — Federated Learning for Decentralized Security on GCP</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>
Author
Link
Published date
Image url
Feed url
Guid
Hidden blurb
--- !ruby/object:Feedjira::Parser::RSSEntry title: Secure Together — Federated Learning for Decentralized Security on GCP url: https://medium.com/google-cloud/secure-together-federated-learning-for-decentralized-security-on-gcp-4c6219ba8f09?source=rss----e52cf94d98af---4 author: Imran Roshan categories: - technology - ai - google-cloud-platform - machine-learning - python published: 2024-03-28 10:20:14.000000000 Z entry_id: !ruby/object:Feedjira::Parser::GloballyUniqueIdentifier is_perma_link: 'false' guid: https://medium.com/p/4c6219ba8f09 carlessian_info: news_filer_version: 2 newspaper: Google Cloud - Medium macro_region: Blogs rss_fields: - title - url - author - categories - published - entry_id - content
Language
Active
Ricc internal notes
Imported via /Users/ricc/git/gemini-news-crawler/webapp/db/seeds.d/import-feedjira.rb on 2024-03-31 22:53:31 +0200. Content is EMPTY here. Entried: title,url,author,categories,published,entry_id,content. TODO add Newspaper: filename = /Users/ricc/git/gemini-news-crawler/webapp/db/seeds.d/../../../crawler/out/feedjira/Blogs/Google Cloud - Medium/2024-03-28-Secure__Together — Federated_Learning_for_Decentralized_Security-v2.yaml
Ricc source