Editing article
Title
Summary
<div class="block-paragraph_advanced"><p><span style="font-style: italic; vertical-align: baseline;"><strong>Editor’s note</strong>: Today’s blog post was written in collaboration with Google Cloud Premier Partner </span><a href="https://www.virtusa.com/" rel="noopener" target="_blank"><span style="font-style: italic; text-decoration: underline; vertical-align: baseline;">Virtusa</span></a><span style="font-style: italic; vertical-align: baseline;">, which specializes in building cloud-native, microservices-based, pre-built solutions and APIs on Kubernetes, building machine learning and data-oriented applications, and offering managed cloud services and cloud operations design.</span></p> <hr/> <p><span style="vertical-align: baseline;">In the ever-evolving world of data engineering and analytics, traditional centralized data architectures are facing limitations in scalability, agility, and governance. To address these challenges, a new paradigm, called “data mesh,” has emerged, which allows organizations to take a decentralized approach to data architecture. This blog post explores data mesh as a concept and delineates the ways that </span><a href="https://cloud.google.com/dataplex"><span style="text-decoration: underline; vertical-align: baseline;">Dataplex</span></a><span style="vertical-align: baseline;">, a data fabric capability within the BigQuery suite, can be used to realize the benefits of this decentralized data architecture.</span></p> <p><span style="vertical-align: baseline;">Data mesh is an architectural framework that promotes the idea of treating data as a product and decentralizes data ownership and infrastructure. It enables teams across an organization to be responsible for their own data domains, allowing for greater autonomy, scalability, and data democratization. Instead of relying on a centralized data team, individual domain teams take ownership of their data, including its quality, schema, and governance. 
This distributed responsibility model leads to improved data discovery, easier data integration, and faster insights.</span></p> <p><span style="vertical-align: baseline;">The illustration in Figure 1 is an overview of data mesh’s fundamental building blocks.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-1.max-1000x1000.png" alt="Image-1"> <figcaption class="article-image__caption "><p data-block-key="ib6z8">Figure 1: Representation of a data mesh concept</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Let’s discuss the core principles of data mesh architecture, then look at how they change the way we manage and leverage data.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-2.max-1000x1000.png" alt="Image-2"> <figcaption class="article-image__caption "><p data-block-key="ib6z8">Figure 2: Core principles of data mesh architecture</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Domain-oriented ownership:</strong><span style="vertical-align: baseline;"> Data mesh emphasizes decentralizing data ownership and the allocation of responsibility to individual domains or business units within an organization. Each domain takes responsibility for managing its own data, including data quality, access controls, and governance. Doing so empowers domain experts and fosters a sense of ownership and accountability. 
This principle aligns data management with the specific needs and knowledge of each domain, ensuring better data quality and decision-making.</span></p> <p><strong style="vertical-align: baseline;">Self-serve data infrastructure:</strong><span style="vertical-align: baseline;"> In a data mesh architecture, data infrastructure is treated as a product that provides self-serve capabilities to domain teams. Instead of relying on a centralized data team or platform, domain teams have the autonomy to choose and manage their own data storage, processing, and analysis tools. This approach allows teams to tailor their data infrastructure to their specific requirements, accelerating their workflows and reducing dependencies on centralized resources.</span></p> <p><strong style="vertical-align: baseline;">Federated computational governance: </strong><span style="vertical-align: baseline;">Data governance in a data mesh is not dictated by a central authority; rather, it follows a federated model. Each domain team collaboratively defines and enforces data governance practices that align with their specific domain requirements. This approach ensures that governance decisions are made by those closest to the data, and it allows for flexibility in adapting to domain-specific needs. Federated computational governance promotes trust, accountability, and agility in managing data assets.</span></p> <p><strong style="vertical-align: baseline;">Data as a product:</strong><span style="vertical-align: baseline;"> Data in a data mesh is treated as a product, and data platforms are built and managed with a product mindset. This means focusing on providing value to the end users (domain teams) and continuously iterating and improving the data infrastructure based on feedback. When teams adopt a product thinking approach, data platforms become user-friendly, reliable, and scalable. 
They evolve in response to changing requirements and deliver tangible value to the organization.</span></p> <h3><strong style="vertical-align: baseline;">Understanding Dataplex</strong></h3> <p><span style="vertical-align: baseline;">Dataplex is a cloud-native, intelligent data fabric platform designed to simplify and streamline the management, integration, and analysis of large and complex data sets. It offers a unified approach to data governance, data discovery, and data lineage, enabling organizations to gain more value from their data.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-3.max-1000x1000.png" alt="Image-3"> <figcaption class="article-image__caption "><p data-block-key="ib6z8">Figure 3: Google Cloud Dataplex capabilities</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">Key features and benefits of Dataplex</span><strong style="font-style: italic; vertical-align: baseline;"> </strong><span style="vertical-align: baseline;">include data integration from various sources into a unified data fabric; robust data governance capabilities that help ensure security and compliance; intelligent data discovery tools for enhanced data visibility and accessibility; scalability and flexibility to handle large volumes of data in real time; multi-cloud support for leveraging data across different cloud providers; and efficient metadata management for improved data organization and accessibility.</span></p> <h3><strong style="vertical-align: baseline;">Steps to implement a data mesh using Dataplex</strong></h3> <p><strong style="vertical-align: baseline;">Step 1: Create a data lake and define the data domain.</strong></p> <p><span style="vertical-align: 
baseline;">In this step, we set up a data lake on Google Cloud and establish the data domain, which refers to the scope and boundaries of the data that will be stored and managed in the data lake. A data lake is a centralized repository that allows you to store structured, semi-structured, and unstructured data in its native format, making it a flexible and scalable solution for big data storage and analytics.</span></p> <p><span style="vertical-align: baseline;">The following diagram illustrates domains as Dataplex lakes, each owned by distinct data producers. Within their respective domains, data producers maintain control over creation, curation, and access. Data consumers, in turn, can request access to these lakes (domains) or specific zones (subdomains) to conduct their analysis.</span></p></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-4.max-1000x1000.png" alt="Image-4"> <figcaption class="article-image__caption "><p data-block-key="ib6z8">Figure 4: Decentralized data with defined ownership</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Step 2: Create zones in your data lake and define the data zones.</strong></p> <p><span style="vertical-align: baseline;">In this step, we divide the data lake into zones. Each zone serves a specific purpose and has well-defined characteristics. Zones help organize data based on factors like data type, access requirements, and processing needs. Creating data zones provides better data governance, security, and efficiency within the data lake environment. 
</span></p> <p><span style="vertical-align: baseline;">Common data zones include the following:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Raw zone</strong><strong style="font-style: italic; vertical-align: baseline;">:</strong><span style="vertical-align: baseline;"> This zone is dedicated to ingesting and storing raw, unprocessed data. It is the landing zone for new data as it enters the data lake. Data in this zone is typically kept in its native format, making it ideal for data archival and data lineage purposes.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Curated zone:</strong><span style="vertical-align: baseline;"> The curated zone is where data is prepared and cleansed before it moves to other zones. This zone may involve data transformation, normalization, or deduplication to ensure data quality.</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><strong style="vertical-align: baseline;">Transformed zone:</strong><span style="vertical-align: baseline;"> The transformed zone holds high-quality, transformed, and structured data that is ready for consumption by data analysts and other users. 
Data in this zone is organized and optimized for analytical purposes.</span></p> </li> </ul></div> <div class="block-image_full_width"> <div class="article-module h-c-page"> <div class="h-c-grid"> <figure class="article-image--large h-c-grid__col h-c-grid__col--6 h-c-grid__col--offset-3 " > <img src="https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Qm8Yf3U.max-1000x1000.png" alt="Image-5"> <figcaption class="article-image__caption "><p data-block-key="ib6z8">Figure 5: Data zones inside a data lake</p></figcaption> </figure> </div> </div> </div> <div class="block-paragraph_advanced"><p><strong style="vertical-align: baseline;">Step 3: Add assets to the data lake zones.</strong></p> <p><span style="vertical-align: baseline;">In this step, we add assets to the different data lake zones. Assets are the data files, data sets, or resources that are ingested into the data lake and stored within their respective zones. By adding assets to the zones, you populate the data lake with valuable data that can be used for analysis, reporting, and other data-driven processes.</span></p> <p><strong style="vertical-align: baseline;">Step 4: Secure your data lake. </strong></p> <p><span style="vertical-align: baseline;">In this step, we implement robust security measures to safeguard your data lake and the sensitive data it holds. 
A secure data lake is crucial for protecting sensitive information, helping to ensure compliance with data regulations, and maintaining the trust of your users and stakeholders.</span></p> <p><span style="vertical-align: baseline;">The security model in Dataplex enables you to control access for performing the following tasks:</span></p> <ul> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Managing a data lake, which involves tasks such as creating and associating assets, defining zones, and setting up additional data lakes</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Retrieving data linked to a data lake via the mapped asset (e.g., BigQuery data sets and storage buckets)</span></p> </li> <li aria-level="1" style="list-style-type: disc; vertical-align: baseline;"> <p role="presentation"><span style="vertical-align: baseline;">Retrieving metadata associated with the data linked to a data lake</span></p> </li> </ul> <p><span style="vertical-align: baseline;">The administrator of a data lake manages access to Dataplex resources (including the lake, zones, and assets) by assigning the necessary basic and predefined roles. Metadata roles can access and examine metadata, including table schemas. Data roles grant the privilege to read or write data in the underlying resources referenced by the assets within the data lake.</span></p> <h3><strong style="vertical-align: baseline;">Advantages of building a data mesh</strong></h3> <p><strong style="vertical-align: baseline;">Improved data ownership and accountability:</strong><span style="vertical-align: baseline;"> One of the primary advantages of a data mesh is the shift in data ownership and accountability to individual domain teams. 
By decentralizing data governance, each team becomes responsible for the quality, integrity, and security of their data products. </span></p> <p><strong style="vertical-align: baseline;">Agility and flexibility</strong><span style="vertical-align: baseline;">: Data meshes empower domain teams to be autonomous in their decision-making, allowing them to respond swiftly to evolving business needs. This agility enables faster time to market for new data products and iterative improvements to existing ones. </span></p> <p><strong style="vertical-align: baseline;">Scalability and reduced bottlenecks:</strong><span style="vertical-align: baseline;"> A data mesh eliminates scalability bottlenecks by distributing data processing and analysis across domain teams. Each team can independently scale its data infrastructure based on its specific needs, ensuring efficient handling of increasing data volumes.</span></p> <p><strong style="vertical-align: baseline;">Enhanced data discoverability and accessibility:</strong><span style="vertical-align: baseline;"> Data meshes emphasize metadata management, enabling better data discoverability and accessibility. With comprehensive metadata, teams can easily locate and understand available data assets. </span></p> <p><strong style="vertical-align: baseline;">Empowerment and collaboration:</strong><span style="vertical-align: baseline;"> By distributing data knowledge and decision-making authority, domain experts are empowered to make data-driven decisions aligned with their business objectives. </span></p> <p><strong style="vertical-align: baseline;">Scalable data infrastructure:</strong><span style="vertical-align: baseline;"> With the rise of cloud technologies, data meshes can take advantage of scalable cloud-native infrastructure. 
Leveraging cloud services, such as serverless computing and elastic storage, enables organizations to scale their data infrastructure on-demand, ensuring optimal performance and cost-efficiency.</span></p> <p><strong style="vertical-align: baseline;">Comprehensive and robust</strong><span style="vertical-align: baseline;"> </span><strong style="vertical-align: baseline;">data governance</strong><span style="vertical-align: baseline;">:</span><span style="font-style: italic; vertical-align: baseline;"> </span><span style="vertical-align: baseline;">Dataplex offers an extensive solution for data governance, ensuring security, compliance, and transparency throughout the data lifecycle. With fine-grained access controls, encryption, and policy-driven data management, Dataplex enhances data security and facilitates adherence to regulatory requirements. The platform provides visibility into the entire data lifecycle through lineage tracking, promoting transparency and accountability. Organizations can enforce standardized governance policies, ensuring consistency and reliability across their data landscape. Dataplex's tools for data quality monitoring and centralized data catalog governance further contribute to effective data governance practices.</span></p> <h3><strong style="vertical-align: baseline;">Learn more</strong></h3> <p><span style="vertical-align: baseline;">By embracing the principles of decentralization, data ownership, and autonomy, businesses can unlock a range of benefits, including improved data quality, greater accountability, and enhanced agility, scalability, and decision-making. Embracing this innovative approach can position organizations at the forefront of the data revolution, driving growth, innovation, and a competitive advantage. 
Learn more about </span><a href="https://cloud.google.com/partners/ai"><span style="text-decoration: underline; vertical-align: baseline;">Google Cloud’s open generative AI partner ecosystem</span></a><span style="vertical-align: baseline;">. To get started with Google Cloud and Virtusa and to learn more about building a data mesh using Dataplex, </span><a href="https://cloud.google.com/contact/"><span style="text-decoration: underline; vertical-align: baseline;">contact us</span></a><span style="vertical-align: baseline;"> today.</span></p></div>
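<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">The four implementation steps described above (one lake per domain, zones inside the lake, assets attached to zones, and role-based access) can be sketched as a small in-memory model. This is a minimal illustration only and does not call the Dataplex API: every class, role, action name, and resource string below is a hypothetical placeholder, and the role-to-action mapping is an assumption rather than Dataplex's actual predefined roles.</span></p></div>

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    """Illustrative roles only; NOT Dataplex's predefined IAM roles."""
    ADMIN = "admin"
    DATA_READER = "data_reader"
    DATA_WRITER = "data_writer"
    METADATA_READER = "metadata_reader"


# Hypothetical mapping from each role to the actions it permits.
ALLOWED_ACTIONS = {
    Role.ADMIN: {"manage"},
    Role.DATA_READER: {"read_data"},
    Role.DATA_WRITER: {"read_data", "write_data"},
    Role.METADATA_READER: {"read_metadata"},
}

# Zone kinds named in the article.
ZONE_KINDS = {"raw", "curated", "transformed"}


@dataclass
class Asset:
    name: str
    resource: str  # placeholder for e.g. a BigQuery data set or storage bucket


@dataclass
class Zone:
    name: str
    kind: str
    assets: list = field(default_factory=list)


@dataclass
class Lake:
    domain: str  # Step 1: one lake per data domain
    owner: str
    zones: dict = field(default_factory=dict)
    grants: dict = field(default_factory=dict)

    def add_zone(self, name, kind):
        """Step 2: divide the lake into zones."""
        if kind not in ZONE_KINDS:
            raise ValueError(f"unknown zone kind: {kind}")
        zone = Zone(name, kind)
        self.zones[name] = zone
        return zone

    def add_asset(self, zone_name, asset):
        """Step 3: attach an asset to an existing zone."""
        self.zones[zone_name].assets.append(asset)

    def grant(self, principal, role):
        """Step 4: assign a role to a principal on this lake."""
        self.grants.setdefault(principal, set()).add(role)

    def can(self, principal, action):
        """Return True if any role granted to the principal allows the action."""
        roles = self.grants.get(principal, set())
        return any(action in ALLOWED_ACTIONS[role] for role in roles)


def build_sales_lake():
    """Build a hypothetical 'sales' domain lake for demonstration."""
    lake = Lake(domain="sales", owner="sales-data-team")
    lake.add_zone("landing", "raw")
    lake.add_zone("reporting", "transformed")
    lake.add_asset("landing", Asset("orders_raw", "example-bucket-placeholder"))
    lake.grant("sales-data-team", Role.ADMIN)
    lake.grant("analyst", Role.METADATA_READER)
    return lake


if __name__ == "__main__":
    lake = build_sales_lake()
    print(lake.can("analyst", "read_metadata"))
    print(lake.can("analyst", "read_data"))
```

<div class="block-paragraph_advanced"><p><span style="vertical-align: baseline;">In this sketch the analyst can inspect metadata but cannot read the underlying data until the domain owner grants a data role, which mirrors the separation between metadata roles and data roles described in Step 4.</span></p></div>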
Content
empty
Author
Link
Published date
Image url
Feed url
Guid
Hidden blurb
--- !ruby/object:Feedjira::Parser::RSSEntry title: 'Data democratization with Dataplex: Implementing a data mesh architecture' rss_fields: - summary - author - url - title - categories - published - entry_id summary: "<div class=\"block-paragraph_advanced\"><p><span style=\"font-style: italic; vertical-align: baseline;\"><strong>Editor’s note</strong>: Today’s blog post was written in collaboration with Google Cloud Premier Partner </span><a href=\"https://www.virtusa.com/\" rel=\"noopener\" target=\"_blank\"><span style=\"font-style: italic; text-decoration: underline; vertical-align: baseline;\">Virtusa</span></a><span style=\"font-style: italic; vertical-align: baseline;\">, which specializes in building cloud-native, microservices-based, pre-built solutions and APIs on Kubernetes, as well as machine learning and data-oriented applications, as well as offering managed cloud services and cloud operations design.</span></p>\n<hr/>\n<p><span style=\"vertical-align: baseline;\">In the ever-evolving world of data engineering and analytics, traditional centralized data architectures are facing limitations in scalability, agility, and governance. To address these challenges, a new paradigm, called “data mesh,” has emerged, which allows organizations to take a decentralized approach to data architecture. This blog post explores data mesh as a concept and delineates the ways that </span><a href=\"https://cloud.google.com/dataplex\"><span style=\"text-decoration: underline; vertical-align: baseline;\">Dataplex</span></a><span style=\"vertical-align: baseline;\">, a data fabric capability within the BigQuery suite, can be used to realize the benefits of this decentralized data architecture.</span></p>\n<p><span style=\"vertical-align: baseline;\">Data mesh is an architectural framework that promotes the idea of treating data as a product and decentralizes data ownership and infrastructure. 
It enables teams across an organization to be responsible for their own data domains, allowing for greater autonomy, scalability, and data democratization. Instead of relying on a centralized data team, individual teams or data products take ownership of their data, including its quality, schema, and governance. This distributed responsibility model leads to improved data discovery, easier data integration, and faster insights.</span></p>\n<p><span style=\"vertical-align: baseline;\">The illustration in Figure 1 is an overview of data mesh’s fundamental building blocks.</span></p></div>\n<div class=\"block-image_full_width\">\n\n\n\n\n\n\n \ \n <div class=\"article-module h-c-page\">\n <div class=\"h-c-grid\">\n \ \n\n <figure class=\"article-image--large\n \n \n h-c-grid__col\n \ h-c-grid__col--6 h-c-grid__col--offset-3\n \n \n \"\n \ >\n\n \n \n \n <img\n src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-1.max-1000x1000.png\"\n \ \n alt=\"Image-1\">\n \n </a>\n \n <figcaption class=\"article-image__caption \"><p data-block-key=\"ib6z8\">Figure 1: Representation of a data mesh concept</p></figcaption>\n \n </figure>\n\n \n </div>\n \ </div>\n \n\n\n\n\n</div>\n<div class=\"block-paragraph_advanced\"><p><span style=\"vertical-align: baseline;\">Let’s discuss the core principles of data mesh architecture, then understand how they change the way we manage and leverage data.</span></p></div>\n<div class=\"block-image_full_width\">\n\n\n\n\n\n\n \n <div class=\"article-module h-c-page\">\n <div class=\"h-c-grid\">\n \n\n <figure class=\"article-image--large\n \ \n \n h-c-grid__col\n h-c-grid__col--6 h-c-grid__col--offset-3\n \ \n \n \"\n >\n\n \n \n \n <img\n \ src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-2.max-1000x1000.png\"\n \ \n alt=\"Image-2\">\n \n </a>\n \n <figcaption class=\"article-image__caption \"><p data-block-key=\"ib6z8\">Figure 2: Core principals of data mesh architecture</p></figcaption>\n \n 
</figure>\n\n \n </div>\n \ </div>\n \n\n\n\n\n</div>\n<div class=\"block-paragraph_advanced\"><p><strong style=\"vertical-align: baseline;\">Domain-oriented ownership:</strong><span style=\"vertical-align: baseline;\"> Data mesh emphasizes decentralizing data ownership and the allocation of responsibility to individual domains or business units within an organization. Each domain takes responsibility for managing its own data, including data quality, access controls, and governance. By doing so, domain experts are empowered, fostering a sense of ownership and accountability. This principle aligns data management with the specific needs and knowledge of each domain, ensuring better data quality and decision-making.</span></p>\n<p><strong style=\"vertical-align: baseline;\">Self-serve data infrastructure:</strong><span style=\"vertical-align: baseline;\"> In a data mesh architecture, data infrastructure is treated as a product that provides self-serve capabilities to domain teams. Instead of relying on a centralized data team or platform, domain teams have the autonomy to choose and manage their own data storage, processing, and analysis tools. This approach allows teams to tailor their data infrastructure to their specific requirements, accelerating their workflows and reducing dependencies on centralized resources.</span></p>\n<p><strong style=\"vertical-align: baseline;\">Federated computational governance: </strong><span style=\"vertical-align: baseline;\">Data governance in a data mesh is not dictated by a central authority; rather, it follows a federated model. Each domain team collaboratively defines and enforces data governance practices that align with their specific domain requirements. This approach ensures that governance decisions are made by those closest to the data, and it allows for flexibility in adapting to domain-specific needs. 
Federated computational governance promotes trust, accountability, and agility in managing data assets.</span></p>\n<p><strong style=\"vertical-align: baseline;\">Data as a product:</strong><span style=\"vertical-align: baseline;\"> Data in a data mesh is treated as a product, and data platforms are built and managed with a product mindset. This means focusing on providing value to the end users (domain teams) and continuously iterating and improving the data infrastructure based on feedback. When teams adopt a product thinking approach, data platforms become user-friendly, reliable, and scalable. They evolve in response to changing requirements and deliver tangible value to the organization.</span></p>\n<h3><strong style=\"vertical-align: baseline;\">Understanding Dataplex</strong></h3>\n<p><span style=\"vertical-align: baseline;\">Dataplex is a cloud-native, intelligent data fabric platform designed to simplify and streamline the management, integration, and analysis of large and complex data sets. 
It offers a unified approach to data governance, data discovery, and data lineage, enabling organizations to gain more value from their data.</span></p></div>\n<div class=\"block-image_full_width\">\n\n\n\n\n\n\n \ \n <div class=\"article-module h-c-page\">\n <div class=\"h-c-grid\">\n \ \n\n <figure class=\"article-image--large\n \n \n h-c-grid__col\n \ h-c-grid__col--6 h-c-grid__col--offset-3\n \n \n \"\n \ >\n\n \n \n \n <img\n src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-3.max-1000x1000.png\"\n \ \n alt=\"Image-3\">\n \n </a>\n \n <figcaption class=\"article-image__caption \"><p data-block-key=\"ib6z8\">Figure 3: Google Cloud Dataplex capabilities</p></figcaption>\n \n </figure>\n\n \n </div>\n \ </div>\n \n\n\n\n\n</div>\n<div class=\"block-paragraph_advanced\"><p><span style=\"vertical-align: baseline;\">Key features and benefits of Dataplex</span><strong style=\"font-style: italic; vertical-align: baseline;\"> </strong><span style=\"vertical-align: baseline;\">include data integration from various sources into a unified data fabric; robust data governance capabilities that help ensure security and compliance; intelligent data discovery tools for enhanced data visibility and accessibility; scalability and flexibility to handle large volumes of data in real-time; multi-cloud support for leveraging data across different cloud providers; and efficient metadata management for improved data organization and accessibility.</span></p>\n<h3><strong style=\"vertical-align: baseline;\">Steps to implement data mesh using Dataplex</strong></h3>\n<p><strong style=\"vertical-align: baseline;\">Step 1: Create a data lake and define the data domain.</strong></p>\n<p><span style=\"vertical-align: baseline;\">In this step, we set up a data lake on Google Cloud and establish the data domain, which refers to the scope and boundaries of the data that will be stored and managed in the data lake. 
A data lake is a centralized repository that allows you to store structured, semi-structured, and unstructured data in its native format, making it a flexible and scalable solution for big data storage and analytics.</span></p>\n<p><span style=\"vertical-align: baseline;\">The following diagram illustrates domains as Dataplex lakes, each owned by distinct data producers. Within their respective domains, data producers maintain control over creation, curation, and access. Conversely, data consumers have the ability to request access to these lakes (domains) or specific zones (subdomains) to conduct their analysis.</span></p></div>\n<div class=\"block-image_full_width\">\n\n\n\n\n\n\n \ \n <div class=\"article-module h-c-page\">\n <div class=\"h-c-grid\">\n \ \n\n <figure class=\"article-image--large\n \n \n h-c-grid__col\n \ h-c-grid__col--6 h-c-grid__col--offset-3\n \n \n \"\n \ >\n\n \n \n \n <img\n src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/Image-4.max-1000x1000.png\"\n \ \n alt=\"Image-4\">\n \n </a>\n \n <figcaption class=\"article-image__caption \"><p data-block-key=\"ib6z8\">Figure 4: Decentralized data with defined ownership</p></figcaption>\n \n </figure>\n\n \n </div>\n \ </div>\n \n\n\n\n\n</div>\n<div class=\"block-paragraph_advanced\"><p><strong style=\"vertical-align: baseline;\">Step 2: Create zones in your data lake and define the data zones.</strong></p>\n<p><span style=\"vertical-align: baseline;\">In this step, we divide the data lake into zones. Each zone serves a specific purpose and has well-defined characteristics. Zones help organize data based on factors like data type, access requirements, and processing needs. Creating data zones provides better data governance, security, and efficiency within the data lake environment. 
</span></p>\n<p><span style=\"vertical-align: baseline;\">Common data zones include the following:</span></p>\n<ul>\n<li aria-level=\"1\" style=\"list-style-type: disc; vertical-align: baseline;\">\n<p role=\"presentation\"><strong style=\"vertical-align: baseline;\">Raw zone</strong><strong style=\"font-style: italic; vertical-align: baseline;\">:</strong><span style=\"vertical-align: baseline;\"> This zone is dedicated to ingesting and storing raw, unprocessed data. It is the landing zone for new data as it enters the data lake. Data in this zone is typically kept in its native format, making it ideal for data archival and data lineage purposes.</span></p>\n</li>\n<li aria-level=\"1\" style=\"list-style-type: disc; vertical-align: baseline;\">\n<p role=\"presentation\"><strong style=\"vertical-align: baseline;\">Curated zone:</strong><span style=\"vertical-align: baseline;\"> The curated zone is where data is prepared and cleansed before it moves to other zones. This zone may involve data transformation, normalization, or deduplication to ensure data quality.</span></p>\n</li>\n<li aria-level=\"1\" style=\"list-style-type: disc; vertical-align: baseline;\">\n<p role=\"presentation\"><strong style=\"vertical-align: baseline;\">Transformed zone:</strong><span style=\"vertical-align: baseline;\"> The transformed zone holds high-quality, transformed, and structured data that is ready for consumption by data analysts and other users. 
Data in this zone is organized and optimized for analytical purposes.</span></p>\n</li>\n</ul></div>\n<div class=\"block-image_full_width\">\n\n\n\n\n\n\n \ \n <div class=\"article-module h-c-page\">\n <div class=\"h-c-grid\">\n \ \n\n <figure class=\"article-image--large\n \n \n h-c-grid__col\n \ h-c-grid__col--6 h-c-grid__col--offset-3\n \n \n \"\n \ >\n\n \n \n \n <img\n src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/5_Qm8Yf3U.max-1000x1000.png\"\n \ \n alt=\"Image-5\">\n \n </a>\n \n <figcaption class=\"article-image__caption \"><p data-block-key=\"ib6z8\">Figure 5: Data zones inside a data lake</p></figcaption>\n \n </figure>\n\n \n </div>\n \ </div>\n \n\n\n\n\n</div>\n<div class=\"block-paragraph_advanced\"><p><strong style=\"vertical-align: baseline;\">Step 3: Add assets to the data lake zones.</strong></p>\n<p><span style=\"vertical-align: baseline;\"> In this step, we focus on adding assets to the different data lake zones. Assets refer to the data files, data sets, or resources that are ingested into the data lake and stored within their respective zones. By adding assets to the zones, you populate the data lake with valuable data that can be utilized for analysis, reporting, and other data-driven processes.</span></p>\n<p><strong style=\"vertical-align: baseline;\">Step 4: Secure your data lake. </strong></p>\n<p><span style=\"vertical-align: baseline;\">In this step, we implement robust security measures to safeguard your data lake and the sensitive data it holds. 
A secure data lake is crucial for protecting sensitive information, helping to ensure compliance with data regulations, and maintaining the trust of your users and stakeholders.</span></p>\n<p><span style=\"vertical-align: baseline;\">The security model in Dataplex enables you to control access for performing the following tasks:</span></p>\n<ul>\n<li aria-level=\"1\" style=\"list-style-type: disc; vertical-align: baseline;\">\n<p role=\"presentation\"><span style=\"vertical-align: baseline;\">Managing a data lake, which involves tasks such as creating and associating assets, defining zones, and setting up additional data lakes</span></p>\n</li>\n<li aria-level=\"1\" style=\"list-style-type: disc; vertical-align: baseline;\">\n<p role=\"presentation\"><span style=\"vertical-align: baseline;\">Retrieving data linked to a data lake via the mapped assets (e.g., BigQuery datasets and Cloud Storage buckets)</span></p>\n</li>\n<li aria-level=\"1\" style=\"list-style-type: disc; vertical-align: baseline;\">\n<p role=\"presentation\"><span style=\"vertical-align: baseline;\">Retrieving metadata associated with the data linked to a data lake</span></p>\n</li>\n</ul>\n<p><span style=\"vertical-align: baseline;\">The administrator of a data lake manages access to Dataplex resources (including the lake, zones, and assets) by assigning the necessary basic and predefined roles. Metadata roles grant the ability to view metadata, including table schemas. Data roles grant the privilege to read or write data in the underlying resources referenced by the assets within the data lake.</span></p>\n<h3><strong style=\"vertical-align: baseline;\">Advantages of building a data mesh</strong></h3>\n<p><strong style=\"vertical-align: baseline;\">Improved data ownership and accountability:</strong><span style=\"vertical-align: baseline;\"> One of the primary advantages of a data mesh is the shift in data ownership and accountability to individual domain teams. 
By decentralizing data governance, each team becomes responsible for the quality, integrity, and security of their data products. </span></p>\n<p><strong style=\"vertical-align: baseline;\">Agility and flexibility</strong><span style=\"vertical-align: baseline;\">: Data meshes empower domain teams to be autonomous in their decision-making, allowing them to respond swiftly to evolving business needs. This agility enables faster time to market for new data products and iterative improvements to existing ones. </span></p>\n<p><strong style=\"vertical-align: baseline;\">Scalability and reduced bottlenecks:</strong><span style=\"vertical-align: baseline;\"> A data mesh eliminates scalability bottlenecks by distributing data processing and analysis across domain teams. Each team can independently scale its data infrastructure based on its specific needs, ensuring efficient handling of increasing data volumes.</span></p>\n<p><strong style=\"vertical-align: baseline;\">Enhanced data discoverability and accessibility:</strong><span style=\"vertical-align: baseline;\"> Data meshes emphasize metadata management, enabling better data discoverability and accessibility. With comprehensive metadata, teams can easily locate and understand available data assets. </span></p>\n<p><strong style=\"vertical-align: baseline;\">Empowerment and collaboration:</strong><span style=\"vertical-align: baseline;\"> By distributing data knowledge and decision-making authority, domain experts are empowered to make data-driven decisions aligned with their business objectives. </span></p>\n<p><strong style=\"vertical-align: baseline;\">Scalable data infrastructure:</strong><span style=\"vertical-align: baseline;\"> With the rise of cloud technologies, data meshes can take advantage of scalable cloud-native infrastructure. 
Leveraging cloud services, such as serverless computing and elastic storage, enables organizations to scale their data infrastructure on demand, helping to optimize performance and cost efficiency.</span></p>\n<p><strong style=\"vertical-align: baseline;\">Comprehensive and robust</strong><span style=\"vertical-align: baseline;\"> </span><strong style=\"vertical-align: baseline;\">data governance</strong><span style=\"vertical-align: baseline;\">:</span><span style=\"font-style: italic; vertical-align: baseline;\"> </span><span style=\"vertical-align: baseline;\">Dataplex offers an extensive solution for data governance, supporting security, compliance, and transparency throughout the data lifecycle. With fine-grained access controls, encryption, and policy-driven data management, Dataplex enhances data security and facilitates adherence to regulatory requirements. The platform provides visibility into the entire data lifecycle through lineage tracking, promoting transparency and accountability. Organizations can enforce standardized governance policies, promoting consistency and reliability across their data landscape. Dataplex's tools for data quality monitoring and centralized data catalog governance further contribute to effective data governance practices.</span></p>\n<h3><strong style=\"vertical-align: baseline;\">Learn more</strong></h3>\n<p><span style=\"vertical-align: baseline;\">By embracing the principles of decentralization, data ownership, and autonomy, businesses can unlock a range of benefits, including improved data quality, greater accountability, and enhanced agility, scalability, and decision-making. Embracing this innovative approach can position organizations at the forefront of the data revolution, driving growth, innovation, and a competitive advantage. 
Learn more about </span><a href=\"https://cloud.google.com/partners/ai\"><span style=\"text-decoration: underline; vertical-align: baseline;\">Google Cloud’s open generative AI partner ecosystem</span></a><span style=\"vertical-align: baseline;\">. To get started with Google Cloud and Virtusa and to learn more about building a data mesh using Dataplex, </span><a href=\"https://cloud.google.com/contact/\"><span style=\"text-decoration: underline; vertical-align: baseline;\">contact us</span></a><span style=\"vertical-align: baseline;\"> today.</span></p></div>" url: https://cloud.google.com/blog/products/data-analytics/using-bigquery-dataplex-to-build-a-data-mesh/ author: Suhrid Saran published: 2024-05-13 16:00:00.000000000 Z carlessian_info: news_filer_version: 2 newspaper: Google Cloud Blog macro_region: Technology entry_id: !ruby/object:Feedjira::Parser::GloballyUniqueIdentifier guid: https://cloud.google.com/blog/products/data-analytics/using-bigquery-dataplex-to-build-a-data-mesh/ categories: - Partners - Data Analytics
Ricc internal notes
Imported via /usr/local/google/home/ricc/git/gemini-news-crawler/webapp/db/seeds.d/import-feedjira.rb on 2024-05-13 19:31:24 +0200. Content is EMPTY here. Entried: summary,author,url,title,categories,published,entry_id. TODO add Newspaper: filename = /usr/local/google/home/ricc/git/gemini-news-crawler/webapp/db/seeds.d/../../../crawler/out/feedjira/Technology/Google Cloud Blog/2024-05-13-Data_democratization_with_Dataplex:_Implementing_a_data_mesh_arc-v2.yaml