<div class="block-paragraph_advanced"><p><span style="font-style: italic; vertical-align: baseline;"><strong>Editor’s note</strong>: Today’s blog post was written in collaboration with Google Cloud Premier Partner </span><a href="https://www.virtusa.com/" rel="noopener" target="_blank"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">Virtusa</span></a><span style="font-style: italic; vertical-align: baseline;">, which specializes in building cloud-native, microservices-based, pre-built solutions and APIs on Kubernetes, along with machine learning and data-oriented applications, and also offers managed cloud services and cloud operations design.</span></p>
<hr/>
<p><span style="vertical-align: baseline;">In the ever-evolving world of data engineering and analytics, traditional centralized data architectures are facing limitations in scalability, agility, and governance. To address these challenges, a new paradigm, called “data mesh,” has emerged, which allows organizations to take a decentralized approach to data architecture. This blog post explores data mesh as a concept and delineates the ways that </span><a href="https://cloud.google.com/dataplex"><span style="text-decoration: underline; vertical-align: baseline;">Dataplex</span></a><span style="vertical-align: baseline;">, a data fabric capability within the BigQuery suite, can be used to realize the benefits of this decentralized data architecture.</span></p>
<p><span style="vertical-align: baseline;">Data mesh is an architectural framework that treats data as a product and decentralizes data ownership and infrastructure. It makes teams across an organization responsible for their own data domains, allowing for greater autonomy, scalability, and data democratization. Instead of relying on a centralized data team, individual domain teams take ownership of their data products, including their quality, schema, and governance. This distributed responsibility model leads to improved data discovery, easier data integration, and faster insights.</span></p>
<p><span style="vertical-align: baseline;">Figure 1 gives an overview of the fundamental building blocks of a data mesh.</span></p></div>
<div class="block-image_full_width">
    <div class="article-module h-c-page">
      <div class="h-c-grid">
    <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3">
        <img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-1.max-1000x1000.png"
          alt="Image-1">
        <figcaption class="article-image__caption "><p data-block-key="ib6z8">Figure 1: Representation of a data mesh concept</p></figcaption>
    </figure>
      </div>
    </div>
</div>
<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Let’s look at the core principles of data mesh architecture and how they change the way we manage and use data.</span></p></div>
<div class="block-image_full_width">
    <div class="article-module h-c-page">
      <div class="h-c-grid">
    <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3">
        <img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-2.max-1000x1000.png"
          alt="Image-2">
        <figcaption class="article-image__caption "><p data-block-key="ib6z8">Figure 2: Core principles of data mesh architecture</p></figcaption>
    </figure>
      </div>
    </div>
</div>
<div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Domain-oriented ownership:</strong><span style="vertical-align: baseline;"> Data mesh decentralizes data ownership, allocating responsibility to individual domains or business units within an organization. Each domain manages its own data, including data quality, access controls, and governance. This empowers domain experts and fosters a sense of ownership and accountability. The principle aligns data management with the specific needs and knowledge of each domain, supporting better data quality and decision-making.</span></p>
<p><strong style="vertical-align: baseline;">Self-serve data infrastructure:</strong><span style="vertical-align: baseline;"> In a data mesh architecture, data infrastructure is treated as a product that provides self-serve capabilities to domain teams. Instead of relying on a centralized data team or platform, domain teams have the autonomy to choose and manage their own data storage, processing, and analysis tools. This approach allows teams to tailor their data infrastructure to their specific requirements, accelerating their workflows and reducing dependencies on centralized resources.</span></p>
<p><strong style="vertical-align: baseline;">Federated computational governance: </strong><span style="vertical-align: baseline;">Data governance in a data mesh is not dictated by a central authority; rather, it follows a federated model. Domain teams collaboratively define and enforce data governance practices that align with their domain requirements. This approach ensures that governance decisions are made by those closest to the data, and it allows for flexibility in adapting to domain-specific needs. Federated computational governance promotes trust, accountability, and agility in managing data assets.</span></p>
<p><strong style="vertical-align: baseline;">Data as a product:</strong><span style="vertical-align: baseline;"> Data in a data mesh is treated as a product, and data platforms are built and managed with a product mindset. This means focusing on providing value to the end users (domain teams) and continuously iterating and improving the data infrastructure based on feedback. When teams adopt a product thinking approach, data platforms become user-friendly, reliable, and scalable. They evolve in response to changing requirements and deliver tangible value to the organization.</span></p>
<h3><strong style="vertical-align: baseline;">Understanding Dataplex</strong></h3>
<p><span style="vertical-align: baseline;">Dataplex is a cloud-native, intelligent data fabric platform designed to simplify and streamline the management, integration, and analysis of large and complex data sets. It offers a unified approach to data governance, data discovery, and data lineage, enabling organizations to gain more value from their data.</span></p></div>
<div class="block-image_full_width">
    <div class="article-module h-c-page">
      <div class="h-c-grid">
    <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3">
        <img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-3.max-1000x1000.png"
          alt="Image-3">
        <figcaption class="article-image__caption "><p data-block-key="ib6z8">Figure 3: Google Cloud Dataplex capabilities</p></figcaption>
    </figure>
      </div>
    </div>
</div>
<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Key features and benefits of Dataplex include data integration from various sources into a unified data fabric; robust data governance capabilities that help ensure security and compliance; intelligent data discovery tools for enhanced data visibility and accessibility; scalability and flexibility to handle large volumes of data in real time; multi-cloud support for leveraging data across different cloud providers; and efficient metadata management for improved data organization and accessibility.</span></p>
<h3><strong style="vertical-align: baseline;">Steps to implement data mesh using Dataplex</strong></h3>
<p><strong style="vertical-align: baseline;">Step 1: Create a data lake and define the data domain.</strong></p>
<p><span style="vertical-align: baseline;">In this step, we set up a data lake on Google Cloud and establish the data domain, which refers to the scope and boundaries of the data that will be stored and managed in the data lake. A data lake is a centralized repository that allows you to store structured, semi-structured, and unstructured data in its native format, making it a flexible and scalable solution for big data storage and analytics.</span></p>
<p><span style="vertical-align: baseline;">The following diagram illustrates domains as Dataplex lakes, each owned by distinct data producers. Within their respective domains, data producers maintain control over creation, curation, and access. Data consumers, in turn, can request access to these lakes (domains) or to specific zones (subdomains) to conduct their analysis.</span></p></div>
<div class="block-image_full_width">
    <div class="article-module h-c-page">
      <div class="h-c-grid">
    <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3">
        <img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-4.max-1000x1000.png"
          alt="Image-4">
        <figcaption class="article-image__caption "><p data-block-key="ib6z8">Figure 4: Decentralized data with defined ownership</p></figcaption>
    </figure>
      </div>
    </div>
</div>
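<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">As a rough illustration of the domain-to-lake mapping described in Step 1, the following minimal Python sketch models one lake per domain and derives the hierarchical resource path that Dataplex uses for lakes. The project ID, region, domain names, team names, and the <code>-lake</code> naming convention are hypothetical and chosen only for the example; the sketch does not call any Google Cloud API.</span></p></div>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lake:
    """One Dataplex lake standing in for one data mesh domain."""
    project: str   # hypothetical project ID
    location: str  # hypothetical region
    lake_id: str   # lake ID derived from the domain name (assumed convention)
    owner: str     # hypothetical owning team (the data producer)

    @property
    def name(self) -> str:
        # Lakes are addressed by a hierarchical resource path of this shape.
        return f"projects/{self.project}/locations/{self.location}/lakes/{self.lake_id}"


def lakes_for_domains(project: str, location: str, owners: dict[str, str]) -> list[Lake]:
    """Build one lake per domain, sorted by domain name for stable output."""
    return [
        Lake(project, location, f"{domain}-lake", owner)
        for domain, owner in sorted(owners.items())
    ]


lakes = lakes_for_domains(
    "example-project",
    "us-central1",
    {"sales": "sales-data-team", "finance": "finance-data-team"},
)
for lake in lakes:
    print(lake.name, "owned by", lake.owner)
```

<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Keeping the owner next to the lake definition makes the producer responsible for each domain explicit, which is the point of the decentralized model.</span></p></div>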
<div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Step 2: Create and define zones in your data lake.</strong></p>
<p><span style="vertical-align: baseline;">In this step, we divide the data lake into zones. Each zone serves a specific purpose and has well-defined characteristics. Zones help organize data based on factors like data type, access requirements, and processing needs. Creating data zones improves data governance, security, and efficiency within the data lake environment.</span></p>
<p><span style="vertical-align: baseline;">Common data zones include the following:</span></p>
<ul>
<li aria-level="1" style="list-style-type: disc; vertical-align: baseline;">
<p role="presentation"><strong style="vertical-align: baseline;">Raw zone:</strong><span style="vertical-align: baseline;"> This zone is dedicated to ingesting and storing raw, unprocessed data. It is the landing zone for new data as it enters the data lake. Data in this zone is typically kept in its native format, making it ideal for data archival and data lineage purposes.</span></p>
</li>
<li aria-level="1" style="list-style-type: disc; vertical-align: baseline;">
<p role="presentation"><strong style="vertical-align: baseline;">Curated zone:</strong><span style="vertical-align: baseline;"> The curated zone is where data is prepared and cleansed before it moves to other zones. This zone may involve data transformation, normalization, or deduplication to ensure data quality.</span></p>
</li>
<li aria-level="1" style="list-style-type: disc; vertical-align: baseline;">
<p role="presentation"><strong style="vertical-align: baseline;">Transformed zone:</strong><span style="vertical-align: baseline;"> The transformed zone holds high-quality, transformed, and structured data that is ready for consumption by data analysts and other users. Data in this zone is organized and optimized for analytical purposes.</span></p>
</li>
</ul></div>
<div class="block-image_full_width">
    <div class="article-module h-c-page">
      <div class="h-c-grid">
    <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3">
        <img
            src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Qm8Yf3U.max-1000x1000.png"
          alt="Image-5">
        <figcaption class="article-image__caption "><p data-block-key="ib6z8">Figure 5: Data zones inside a data lake</p></figcaption>
    </figure>
      </div>
    </div>
</div>
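<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The zone layout in Step 2 could be written down as a small plan before anything is created. The sketch below is one possible way to do that in plain Python. Dataplex zones have a type of either RAW or CURATED, so mapping both the curated and the transformed stages of this article onto the CURATED type is an assumption made for illustration, as are the zone IDs and the lake path.</span></p></div>

```python
from dataclasses import dataclass
from enum import Enum


class ZoneType(Enum):
    RAW = "RAW"          # data in any format, as it lands
    CURATED = "CURATED"  # structured, cleaned data


@dataclass(frozen=True)
class Zone:
    zone_id: str         # hypothetical zone ID
    zone_type: ZoneType
    purpose: str


# Assumption: the cleansing stage and the analytics-ready stage both use CURATED.
ZONE_PLAN = [
    Zone("raw", ZoneType.RAW, "landing area for unprocessed data"),
    Zone("curated", ZoneType.CURATED, "cleansed and deduplicated data"),
    Zone("transformed", ZoneType.CURATED, "analytics-ready data"),
]


def zone_names(lake_name: str, zones: list[Zone]) -> list[str]:
    """Return the full resource path of each zone under the given lake."""
    return [f"{lake_name}/zones/{zone.zone_id}" for zone in zones]


# Hypothetical lake path used only for the example.
lake = "projects/example-project/locations/us-central1/lakes/sales-lake"
for zone, name in zip(ZONE_PLAN, zone_names(lake, ZONE_PLAN)):
    print(f"{name} [{zone.zone_type.value}]: {zone.purpose}")
```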
<div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Step 3: Add assets to the data lake zones.</strong></p>
<p><span style="vertical-align: baseline;">In this step, we add assets to the different data lake zones. Assets are the data files, data sets, or resources that are ingested into the data lake and stored within their respective zones. By adding assets to the zones, you populate the data lake with data that can be used for analysis, reporting, and other data-driven processes.</span></p>
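<p><span style="vertical-align: baseline;">To make the idea of an asset more concrete, here is a minimal sketch that describes assets as references to a Cloud Storage bucket or a BigQuery data set placed in a zone. The helper <code>make_asset</code>, the asset IDs, and the project, bucket, and data set names are hypothetical; only the two resource type names and the shape of the resource paths follow the Dataplex API.</span></p>

```python
from dataclasses import dataclass

# Resource types an asset can reference, with the path shape for each.
RESOURCE_PATTERNS = {
    "STORAGE_BUCKET": "projects/{project}/buckets/{resource_id}",
    "BIGQUERY_DATASET": "projects/{project}/datasets/{resource_id}",
}


@dataclass(frozen=True)
class Asset:
    asset_id: str       # hypothetical asset ID
    zone_id: str        # zone the asset is attached to
    resource_type: str  # key of RESOURCE_PATTERNS
    resource_name: str  # path of the underlying bucket or data set


def make_asset(asset_id: str, zone_id: str, resource_type: str,
               project: str, resource_id: str) -> Asset:
    """Describe an asset, rejecting resource types the sketch does not know."""
    if resource_type not in RESOURCE_PATTERNS:
        raise ValueError(f"unsupported resource type: {resource_type}")
    name = RESOURCE_PATTERNS[resource_type].format(
        project=project, resource_id=resource_id
    )
    return Asset(asset_id, zone_id, resource_type, name)


# Hypothetical assets: a landing bucket in the raw zone, a data set in a curated zone.
assets = [
    make_asset("orders-landing", "raw", "STORAGE_BUCKET", "example-project", "sales-landing"),
    make_asset("orders-clean", "curated", "BIGQUERY_DATASET", "example-project", "sales_curated"),
]
for asset in assets:
    print(asset.zone_id, asset.resource_type, asset.resource_name)
```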
<p><strong style="vertical-align: baseline;">Step 4: Secure your data lake. </strong></p>
<p><span style="vertical-align: baseline;">In this step, we implement robust security measures to safeguard your data lake and the sensitive data it holds. A secure data lake is crucial for protecting sensitive information, helping to ensure compliance with data regulations, and maintaining the trust of your users and stakeholders.</span></p>
<p><span style="vertical-align: baseline;">The security model in Dataplex enables you to control access for performing the following tasks:</span></p>
<ul>
<li aria-level="1" style="list-style-type: disc; vertical-align: baseline;">
<p role="presentation"><span style="vertical-align: baseline;">Managing a data lake, which involves tasks such as creating and associating assets, defining zones, and setting up additional data lakes</span></p>
</li>
<li aria-level="1" style="list-style-type: disc; vertical-align: baseline;">
<p role="presentation"><span style="vertical-align: baseline;">Retrieving data linked to a data lake via the mapped asset (e.g., BigQuery data sets and storage buckets)</span></p>
</li>
<li aria-level="1" style="list-style-type: disc; vertical-align: baseline;">
<p role="presentation"><span style="vertical-align: baseline;">Retrieving metadata associated with the data linked to a data lake</span></p>
</li>
</ul>
<p><span style="vertical-align: baseline;">The administrator of a data lake manages access to Dataplex resources (including the lake, zones, and assets) by assigning the necessary basic and predefined roles. Metadata roles allow principals to access and examine metadata, including table schemas. Data roles grant the privilege to read or write data in the underlying resources referenced by the assets within the data lake.</span></p>
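<p><span style="vertical-align: baseline;">The three kinds of access listed above can be sketched as a mapping from task to predefined Dataplex IAM role, grouped into IAM-style bindings. This is an illustrative sketch only: the task labels, the member addresses, and the choice of which role backs each task are assumptions, and the code builds plain dictionaries rather than applying any policy.</span></p>

```python
from collections import defaultdict

# Assumed mapping from the tasks in the text to predefined Dataplex roles.
ROLE_FOR_TASK = {
    "manage_lake": "roles/dataplex.admin",
    "read_metadata": "roles/dataplex.metadataReader",
    "read_data": "roles/dataplex.dataReader",
    "write_data": "roles/dataplex.dataWriter",
}


def build_bindings(grants: list[tuple[str, str]]) -> list[dict]:
    """Group (member, task) grants into bindings, one per role."""
    members_by_role = defaultdict(set)
    for member, task in grants:
        members_by_role[ROLE_FOR_TASK[task]].add(member)
    return [
        {"role": role, "members": sorted(members)}
        for role, members in sorted(members_by_role.items())
    ]


# Hypothetical principals for one domain.
bindings = build_bindings([
    ("group:sales-data-team@example.com", "manage_lake"),
    ("group:analysts@example.com", "read_metadata"),
    ("group:analysts@example.com", "read_data"),
    ("serviceAccount:etl@example-project.iam.gserviceaccount.com", "write_data"),
])
for binding in bindings:
    print(binding["role"], "->", ", ".join(binding["members"]))
```

<p><span style="vertical-align: baseline;">Separating metadata access from data access in this way lets consumers discover what exists in a domain before the producer grants them access to the data itself.</span></p>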
<h3><strong style="vertical-align: baseline;">Advantages of building a data mesh</strong></h3>
<p><strong style="vertical-align: baseline;">Improved data ownership and accountability:</strong><span style="vertical-align: baseline;"> One of the primary advantages of a data mesh is the shift in data ownership and accountability to individual domain teams. By decentralizing data governance, each team becomes responsible for the quality, integrity, and security of their data products. </span></p>
<p><strong style="vertical-align: baseline;">Agility and flexibility:</strong><span style="vertical-align: baseline;"> Data meshes empower domain teams to be autonomous in their decision-making, allowing them to respond swiftly to evolving business needs. This agility enables faster time to market for new data products and iterative improvements to existing ones. </span></p>
<p><strong style="vertical-align: baseline;">Scalability and reduced bottlenecks:</strong><span style="vertical-align: baseline;"> A data mesh eliminates scalability bottlenecks by distributing data processing and analysis across domain teams. Each team can independently scale its data infrastructure based on its specific needs, ensuring efficient handling of increasing data volumes.</span></p>
<p><strong style="vertical-align: baseline;">Enhanced data discoverability and accessibility:</strong><span style="vertical-align: baseline;"> Data meshes emphasize metadata management, enabling better data discoverability and accessibility. With comprehensive metadata, teams can easily locate and understand available data assets. </span></p>
<p><strong style="vertical-align: baseline;">Empowerment and collaboration:</strong><span style="vertical-align: baseline;"> By distributing data knowledge and decision-making authority, domain experts are empowered to make data-driven decisions aligned with their business objectives. </span></p>
<p><strong style="vertical-align: baseline;">Scalable data infrastructure:</strong><span style="vertical-align: baseline;"> With the rise of cloud technologies, data meshes can take advantage of scalable cloud-native infrastructure. Leveraging cloud services, such as serverless computing and elastic storage, enables organizations to scale their data infrastructure on-demand, ensuring optimal performance and cost-efficiency.</span></p>
<p><strong style="vertical-align: baseline;">Comprehensive and robust data governance:</strong><span style="vertical-align: baseline;"> Dataplex offers an extensive solution for data governance, ensuring security, compliance, and transparency throughout the data lifecycle. With fine-grained access controls, encryption, and policy-driven data management, Dataplex enhances data security and facilitates adherence to regulatory requirements. The platform provides visibility into the entire data lifecycle through lineage tracking, promoting transparency and accountability. Organizations can enforce standardized governance policies, ensuring consistency and reliability across their data landscape. Dataplex's tools for data quality monitoring and centralized data catalog governance further contribute to effective data governance practices.</span></p>
<h3><strong style="vertical-align: baseline;">Learn more</strong></h3>
<p><span style="vertical-align: baseline;">By embracing the principles of decentralization, data ownership, and autonomy, businesses can unlock a range of benefits, including improved data quality, greater accountability, and enhanced agility, scalability, and decision-making. Adopting this approach can help position organizations to drive growth, innovation, and competitive advantage. Learn more about </span><a href="https://cloud.google.com/partners/ai"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud’s open generative AI partner ecosystem</span></a><span style="vertical-align: baseline;">. To get started with Google Cloud and Virtusa and to learn more about building a data mesh using Dataplex, </span><a href="https://cloud.google.com/contact/"><span style="text-decoration: underline; vertical-align: baseline;">contact us</span></a><span style="vertical-align: baseline;"> today.</span></p></div>

--- !ruby/object:Feedjira::Parser::RSSEntry
title: 'Data democratization with Dataplex: Implementing a data mesh architecture'
rss_fields:
- summary
- author
- url
- title
- categories
- published
- entry_id
summary: "<div class=\"block-paragraph_advanced\"><p><span style=\"font-style: italic;
  vertical-align: baseline;\"><strong>Editor’s note</strong>: Today’s blog post was
  written in collaboration with Google Cloud Premier Partner </span><a href=\"https://www.virtusa.com/\"
  rel=\"noopener\" target=\"_blank\"><span style=\"font-style: italic; text-decoration:
  underline; vertical-align: baseline;\">Virtusa</span></a><span style=\"font-style:
  italic; vertical-align: baseline;\">, which specializes in building cloud-native,
  microservices-based, pre-built solutions and APIs on Kubernetes, as well as machine
  learning and data-oriented applications, as well as offering managed cloud services
  and cloud operations design.</span></p>\n<hr/>\n<p><span style=\"vertical-align:
  baseline;\">In the ever-evolving world of data engineering and analytics, traditional
  centralized data architectures are facing limitations in scalability, agility, and
  governance. To address these challenges, a new paradigm, called “data mesh,” has
  emerged, which allows organizations to take a decentralized approach to data architecture.
  This blog post explores data mesh as a concept and delineates the ways that </span><a
  href=\"https://cloud.google.com/dataplex\"><span style=\"text-decoration: underline;
  vertical-align: baseline;\">Dataplex</span></a><span style=\"vertical-align: baseline;\">,
  a data fabric capability within the BigQuery suite, can be used to realize the benefits
  of this decentralized data architecture.</span></p>\n<p><span style=\"vertical-align:
  baseline;\">Data mesh is an architectural framework that promotes the idea of treating
  data as a product and decentralizes data ownership and infrastructure. It enables
  teams across an organization to be responsible for their own data domains, allowing
  for greater autonomy, scalability, and data democratization. Instead of relying
  on a centralized data team, individual teams or data products take ownership of
  their data, including its quality, schema, and governance. This distributed responsibility
  model leads to improved data discovery, easier data integration, and faster insights.</span></p>\n<p><span
  style=\"vertical-align: baseline;\">The illustration in Figure 1 is an overview
  of data mesh’s fundamental building blocks.</span></p></div>\n<div class=\"block-image_full_width\">\n\n\n\n\n\n\n
  \ \n    <div class=\"article-module h-c-page\">\n      <div class=\"h-c-grid\">\n
  \ \n\n    <figure class=\"article-image--large\n      \n      \n        h-c-grid__col\n
  \       h-c-grid__col--6 h-c-grid__col--offset-3\n        \n        \n      \"\n
  \     >\n\n      \n      \n        \n        <img\n            src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-1.max-1000x1000.png\"\n
  \       \n          alt=\"Image-1\">\n        \n        </a>\n      \n        <figcaption
  class=\"article-image__caption \"><p data-block-key=\"ib6z8\">Figure 1: Representation
  of a data mesh concept</p></figcaption>\n      \n    </figure>\n\n  \n      </div>\n
  \   </div>\n  \n\n\n\n\n</div>\n<div class=\"block-paragraph_advanced\"><p><span
  style=\"vertical-align: baseline;\">Let’s discuss the core principles of data mesh
  architecture, then understand how they change the way we manage and leverage data.</span></p></div>\n<div
  class=\"block-image_full_width\">\n\n\n\n\n\n\n  \n    <div class=\"article-module
  h-c-page\">\n      <div class=\"h-c-grid\">\n  \n\n    <figure class=\"article-image--large\n
  \     \n      \n        h-c-grid__col\n        h-c-grid__col--6 h-c-grid__col--offset-3\n
  \       \n        \n      \"\n      >\n\n      \n      \n        \n        <img\n
  \           src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-2.max-1000x1000.png\"\n
  \       \n          alt=\"Image-2\">\n        \n        </a>\n      \n        <figcaption
  class=\"article-image__caption \"><p data-block-key=\"ib6z8\">Figure 2: Core principals
  of data mesh architecture</p></figcaption>\n      \n    </figure>\n\n  \n      </div>\n
  \   </div>\n  \n\n\n\n\n</div>\n<div class=\"block-paragraph_advanced\"><p><strong
  style=\"vertical-align: baseline;\">Domain-oriented ownership:</strong><span style=\"vertical-align:
  baseline;\"> Data mesh emphasizes decentralizing data ownership and the allocation
  of responsibility to individual domains or business units within an organization.
  Each domain takes responsibility for managing its own data, including data quality,
  access controls, and governance. By doing so, domain experts are empowered, fostering
  a sense of ownership and accountability. This principle aligns data management with
  the specific needs and knowledge of each domain, ensuring better data quality and
  decision-making.</span></p>\n<p><strong style=\"vertical-align: baseline;\">Self-serve
  data infrastructure:</strong><span style=\"vertical-align: baseline;\"> In a data
  mesh architecture, data infrastructure is treated as a product that provides self-serve
  capabilities to domain teams. Instead of relying on a centralized data team or platform,
  domain teams have the autonomy to choose and manage their own data storage, processing,
  and analysis tools. This approach allows teams to tailor their data infrastructure
  to their specific requirements, accelerating their workflows and reducing dependencies
  on centralized resources.</span></p>\n<p><strong style=\"vertical-align: baseline;\">Federated
  computational governance: </strong><span style=\"vertical-align: baseline;\">Data
  governance in a data mesh is not dictated by a central authority; rather, it follows
  a federated model. Each domain team collaboratively defines and enforces data governance
  practices that align with their specific domain requirements. This approach ensures
  that governance decisions are made by those closest to the data, and it allows for
  flexibility in adapting to domain-specific needs. Federated computational governance
  promotes trust, accountability, and agility in managing data assets.</span></p>\n<p><strong
  style=\"vertical-align: baseline;\">Data as a product:</strong><span style=\"vertical-align:
  baseline;\"> Data in a data mesh is treated as a product, and data platforms are
  built and managed with a product mindset. This means focusing on providing value
  to the end users (domain teams) and continuously iterating and improving the data
  infrastructure based on feedback. When teams adopt a product thinking approach,
  data platforms become user-friendly, reliable, and scalable. They evolve in response
  to changing requirements and deliver tangible value to the organization.</span></p>\n<h3><strong
  style=\"vertical-align: baseline;\">Understanding Dataplex</strong></h3>\n<p><span
  style=\"vertical-align: baseline;\">Dataplex is a cloud-native, intelligent data
  fabric platform designed to simplify and streamline the management, integration,
  and analysis of large and complex data sets. It offers a unified approach to data
  governance, data discovery, and data lineage, enabling organizations to gain more
  value from their data.</span></p></div>\n<div class=\"block-image_full_width\">\n\n\n\n\n\n\n
  \ \n    <div class=\"article-module h-c-page\">\n      <div class=\"h-c-grid\">\n
  \ \n\n    <figure class=\"article-image--large\n      \n      \n        h-c-grid__col\n
  \       h-c-grid__col--6 h-c-grid__col--offset-3\n        \n        \n      \"\n
  \     >\n\n      \n      \n        \n        <img\n            src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-3.max-1000x1000.png\"\n
  \       \n          alt=\"Image-3\">\n        \n        </a>\n      \n        <figcaption
  class=\"article-image__caption \"><p data-block-key=\"ib6z8\">Figure 3: Google Cloud
  Dataplex capabilities</p></figcaption>\n      \n    </figure>\n\n  \n      </div>\n
  \   </div>\n  \n\n\n\n\n</div>\n<div class=\"block-paragraph_advanced\"><p><span
  style=\"vertical-align: baseline;\">Key features and benefits of Dataplex</span><strong
  style=\"font-style: italic; vertical-align: baseline;\"> </strong><span style=\"vertical-align:
  baseline;\">include data integration from various sources into a unified data fabric;
  robust data governance capabilities that help ensure security and compliance; intelligent
  data discovery tools for enhanced data visibility and accessibility; scalability
  and flexibility to handle large volumes of data in real-time; multi-cloud support
  for leveraging data across different cloud providers; and efficient metadata management
  for improved data organization and accessibility.</span></p>\n<h3><strong style=\"vertical-align:
  baseline;\">Steps to implement data mesh using Dataplex</strong></h3>\n<p><strong
  style=\"vertical-align: baseline;\">Step 1: Create a data lake and define the data
  domain.</strong></p>\n<p><span style=\"vertical-align: baseline;\">In this step,
  we set up a data lake on Google Cloud and establish the data domain, which refers
  to the scope and boundaries of the data that will be stored and managed in the data
  lake. A data lake is a centralized repository that allows you to store structured,
  semi-structured, and unstructured data in its native format, making it a flexible
  and scalable solution for big data storage and analytics.</span></p>\n<p><span style=\"vertical-align:
  baseline;\">The following diagram illustrates domains as Dataplex lakes, each owned
  by distinct data producers. Within their respective domains, data producers maintain
  control over creation, curation, and access. Conversely, data consumers have the
  ability to request access to these lakes (domains) or specific zones (subdomains)
  to conduct their analysis.</span></p></div>\n<div class=\"block-image_full_width\">\n\n\n\n\n\n\n
  \ \n    <div class=\"article-module h-c-page\">\n      <div class=\"h-c-grid\">\n
  \ \n\n    <figure class=\"article-image--large\n      \n      \n        h-c-grid__col\n
  \       h-c-grid__col--6 h-c-grid__col--offset-3\n        \n        \n      \"\n
  \     >\n\n      \n      \n        \n        <img\n            src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-4.max-1000x1000.png\"\n
  \       \n          alt=\"Image-4\">\n        \n        </a>\n      \n        <figcaption
  class=\"article-image__caption \"><p data-block-key=\"ib6z8\">Figure 4: Decentralized
  data with defined ownership</p></figcaption>\n      \n    </figure>\n\n  \n      </div>\n
  \   </div>\n  \n\n\n\n\n</div>\n<div class=\"block-paragraph_advanced\"><p><strong
  style=\"vertical-align: baseline;\">Step 2: Create zones in your data lake and define
  the data zones.</strong></p>\n<p><span style=\"vertical-align: baseline;\">In this
  step, we divide the data lake into zones. Each zone serves a specific purpose and
  has well-defined characteristics. Zones help organize data based on factors like
  data type, access requirements, and processing needs. Creating data zones provides
  better data governance, security, and efficiency within the data lake environment. </span></p>\n<p><span
  style=\"vertical-align: baseline;\">Common data zones include the following:</span></p>\n<ul>\n<li
  aria-level=\"1\" style=\"list-style-type: disc; vertical-align: baseline;\">\n<p
  role=\"presentation\"><strong style=\"vertical-align: baseline;\">Raw zone</strong><strong
  style=\"font-style: italic; vertical-align: baseline;\">:</strong><span style=\"vertical-align:
  baseline;\"> This zone is dedicated to ingesting and storing raw, unprocessed data.
  It is the landing zone for new data as it enters the data lake. Data in this zone
  is typically kept in its native format, making it ideal for data archival and data
  lineage purposes.</span></p>\n</li>\n<li aria-level=\"1\" style=\"list-style-type:
  disc; vertical-align: baseline;\">\n<p role=\"presentation\"><strong style=\"vertical-align:
  baseline;\">Curated zone:</strong><span style=\"vertical-align: baseline;\"> The
  curated zone is where data is prepared and cleansed before it moves to other zones.
  This zone may involve data transformation, normalization, or deduplication to ensure
  data quality.</span></p>\n</li>\n<li aria-level=\"1\" style=\"list-style-type: disc;
  vertical-align: baseline;\">\n<p role=\"presentation\"><strong style=\"vertical-align:
  baseline;\">Transformed zone:</strong><span style=\"vertical-align: baseline;\">
  The transformed zone holds high-quality, transformed, and structured data that is
  ready for consumption by data analysts and other users. Data in this zone is organized
  and optimized for analytical purposes.</span></p>\n</li>\n</ul></div>\n<div class=\"block-image_full_width\">\n\n\n\n\n\n\n
  \ \n    <div class=\"article-module h-c-page\">\n      <div class=\"h-c-grid\">\n
  \ \n\n    <figure class=\"article-image--large\n      \n      \n        h-c-grid__col\n
  \       h-c-grid__col--6 h-c-grid__col--offset-3\n        \n        \n      \"\n
  \     >\n\n      \n      \n        \n        <img\n            src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Qm8Yf3U.max-1000x1000.png\"\n
  \       \n          alt=\"Image-5\">\n        \n        </a>\n      \n        <figcaption
  class=\"article-image__caption \"><p data-block-key=\"ib6z8\">Figure 5: Data zones
  inside a data lake</p></figcaption>\n      \n    </figure>\n\n  \n      </div>\n
  \   </div>\n  \n\n\n\n\n</div>\n<div class=\"block-paragraph_advanced\"><p><strong
  style=\"vertical-align: baseline;\">Step 3: Add assets to the data lake zones.</strong></p>\n<p><span
  style=\"vertical-align: baseline;\">In this step, we add assets to
  the different data lake zones. Assets are the data files, datasets, or other resources
  that are ingested into the data lake and stored within their respective zones.
  Adding assets to the zones populates the data lake with data that can
  be used for analysis, reporting, and other data-driven processes.</span></p>\n<p><strong
  style=\"vertical-align: baseline;\">Step 4: Secure your data lake. </strong></p>\n<p><span
  style=\"vertical-align: baseline;\">In this step, we implement robust security measures
  to safeguard your data lake and the sensitive data it holds. A secure data lake
  is crucial for protecting sensitive information, helping to ensure compliance with
  data regulations, and maintaining the trust of your users and stakeholders.</span></p>\n<p><span
  style=\"vertical-align: baseline;\">The security model in Dataplex enables you to
  control access for performing the following tasks:</span></p>\n<ul>\n<li aria-level=\"1\"
  style=\"list-style-type: disc; vertical-align: baseline;\">\n<p role=\"presentation\"><span
  style=\"vertical-align: baseline;\">Managing a data lake, which involves tasks such
  as creating and associating assets, defining zones, and setting up additional data
  lakes</span></p>\n</li>\n<li aria-level=\"1\" style=\"list-style-type: disc; vertical-align:
  baseline;\">\n<p role=\"presentation\"><span style=\"vertical-align: baseline;\">Accessing
  data linked to a data lake through its mapped assets (e.g., BigQuery datasets and Cloud Storage
  buckets)</span></p>\n</li>\n<li aria-level=\"1\" style=\"list-style-type: disc;
  vertical-align: baseline;\">\n<p role=\"presentation\"><span style=\"vertical-align:
  baseline;\"> Retrieving metadata associated with the data linked to a data lake</span></p>\n</li>\n</ul>\n<p><span
  style=\"vertical-align: baseline;\">The administrator of a data lake manages access
  to Dataplex resources (including the lake, zones, and assets) by assigning the necessary
  basic and predefined roles. Metadata roles allow principals to view and examine
  metadata, including table schemas. Data roles grant the privilege to read or write
  data in the underlying resources referenced by the assets
  within the data lake.</span></p>\n<h3><strong style=\"vertical-align: baseline;\">Advantages
  of building a data mesh</strong></h3>\n<p><strong style=\"vertical-align: baseline;\">Improved
  data ownership and accountability:</strong><span style=\"vertical-align: baseline;\">
  One of the primary advantages of a data mesh is the shift in data ownership and
  accountability to individual domain teams. By decentralizing data governance, each
  team becomes responsible for the quality, integrity, and security of their data
  products. </span></p>\n<p><strong style=\"vertical-align: baseline;\">Agility and
  flexibility:</strong><span style=\"vertical-align: baseline;\"> Data meshes empower
  domain teams to be autonomous in their decision-making, allowing them to respond
  swiftly to evolving business needs. This agility enables faster time to market for
  new data products and iterative improvements to existing ones. </span></p>\n<p><strong
  style=\"vertical-align: baseline;\">Scalability and reduced bottlenecks:</strong><span
  style=\"vertical-align: baseline;\"> A data mesh reduces scalability bottlenecks
  by distributing data processing and analysis across domain teams. Each team can
  independently scale its data infrastructure based on its specific needs, so that
  growing data volumes are handled efficiently.</span></p>\n<p><strong style=\"vertical-align:
  baseline;\">Enhanced data discoverability and accessibility:</strong><span style=\"vertical-align:
  baseline;\"> Data meshes emphasize metadata management, enabling better data discoverability
  and accessibility. With comprehensive metadata, teams can easily locate and understand
  available data assets. </span></p>\n<p><strong style=\"vertical-align: baseline;\">Empowerment
  and collaboration:</strong><span style=\"vertical-align: baseline;\"> Distributing
  data knowledge and decision-making authority empowers domain experts to make
  data-driven decisions aligned with their business objectives. </span></p>\n<p><strong
  style=\"vertical-align: baseline;\">Scalable data infrastructure:</strong><span
  style=\"vertical-align: baseline;\"> Data meshes
  can take advantage of scalable cloud-native infrastructure. Cloud services
  such as serverless computing and elastic storage enable organizations to scale
  their data infrastructure on demand, helping to maintain performance and cost-efficiency.</span></p>\n<p><strong
  style=\"vertical-align: baseline;\">Comprehensive and robust data governance:</strong><span
  style=\"vertical-align: baseline;\"> Dataplex offers an
  extensive solution for data governance, ensuring security, compliance, and transparency
  throughout the data lifecycle. With fine-grained access controls, encryption, and
  policy-driven data management, Dataplex enhances data security and facilitates adherence
  to regulatory requirements. The platform provides visibility into the entire data
  lifecycle through lineage tracking, promoting transparency and accountability. Organizations
  can enforce standardized governance policies, ensuring consistency and reliability
  across their data landscape. Dataplex's tools for data quality monitoring and centralized
  data catalog governance further contribute to effective data governance practices.</span></p>\n<h3><strong
  style=\"vertical-align: baseline;\">Learn more</strong></h3>\n<p><span style=\"vertical-align:
  baseline;\">By embracing the principles of decentralization, data ownership, and
  autonomy, businesses can unlock a range of benefits, including improved data quality,
  greater accountability, and enhanced agility, scalability, and decision-making.
  Adopting a data mesh can also position organizations to drive
  growth, innovation, and competitive advantage. Learn
  more about </span><a href=\"https://cloud.google.com/partners/ai\"><span style=\"text-decoration:
  underline; vertical-align: baseline;\">Google Cloud’s open generative AI partner
  ecosystem</span></a><span style=\"vertical-align: baseline;\">. To get started with
  Google Cloud and Virtusa and to learn more about building a data mesh using Dataplex,
  </span><a href=\"https://cloud.google.com/contact/\"><span style=\"text-decoration:
  underline; vertical-align: baseline;\">contact us</span></a><span style=\"vertical-align:
  baseline;\"> today.</span></p></div>"
url: https://cloud.google.com/blog/products/data-analytics/using-bigquery-dataplex-to-build-a-data-mesh/
author: Suhrid Saran
published: 2024-05-13 16:00:00.000000000 Z
carlessian_info:
  news_filer_version: 2
  newspaper: Google Cloud Blog
  macro_region: Technology
entry_id: !ruby/object:Feedjira::Parser::GloballyUniqueIdentifier
  guid: https://cloud.google.com/blog/products/data-analytics/using-bigquery-dataplex-to-build-a-data-mesh/
categories:
- Partners
- Data Analytics
