<h3>Intro</h3><p>Many of you have already used a <strong>Large Language Model (LLM)</strong> from Generative AI. These models are great at certain creative tasks such as content generation, text summarization, and entity extraction, but that is not sufficient for enterprises that need to:</p><ul><li>provide accurate and up-to-date information (reducing hallucination)</li><li>offer contextual user experiences</li><li>offer secure and governed access to the data</li></ul><p>This is where the <strong>Retrieval-Augmented Generation (RAG)</strong> technique comes in to fulfill those requirements. It combines the power of LLMs with the ability to reference external knowledge sources, by incorporating the following two systems:</p><ul><li><strong>Retrieval</strong>: When a user asks a question, RAG first searches through a database of documents or text to find relevant passages.</li><li><strong>Generation</strong>: The retrieved information is then sent along as context in the LLM prompt, effectively grounding the LLM’s language understanding with specific knowledge in order to generate a more informed and accurate answer.</li></ul><p>So how does the RAG retrieval system find the relevant knowledge? Welcome to the world of embeddings and vector search.</p><ul><li><strong>Vector embeddings</strong> are numerical representations of text that capture the semantic meaning and relationships between words and concepts. You would use a pre-trained model to generate embeddings. For example, the Google Vertex <a href="https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings">textembedding-gecko model</a> generates a 768-dimensional embedding, while a <a href="https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-embeddings">multimodal embedding model</a> generates a 128-, 256-, 512-, or 1408-dimensional embedding.</li><li><strong>Vector search</strong> comes into play by comparing the user’s query embedding to the vectors representing documents or passages in the knowledge base. This comparison uses similarity metrics to find the most relevant pieces of information based on their semantic closeness to the query.</li></ul><p>With these concepts explained, you can implement RAG with the following steps:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/794/1*F0Yz_slYGzqsOaqf6ipHTQ.png" /></figure><ul><li>Break down large documents or a text corpus using a suitable <a href="https://medium.com/@anuragmishra_27746/five-levels-of-chunking-strategies-in-rag-notes-from-gregs-video-7b735895694d">chunking strategy</a></li><li>Generate embeddings for each chunk using a selected embedding model</li><li>Store the chunked data and vector embeddings together in a vector database</li><li>The user posts a prompt query</li><li>Use the same pre-trained embedding model to generate a vector embedding for the user query</li><li>Use the query embedding to search for the most similar embeddings in the vector database, then retrieve the corresponding data chunks</li><li>Create a new prompt for the LLM by incorporating the retrieved chunked text alongside the original user query</li></ul><p>Vector embeddings need to be stored in a vector database before you can search over them. But adding a dedicated vector database to your software stack increases complexity, cost, and the learning curve.</p>
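<p>To make these steps concrete, here is a minimal, self-contained sketch of the retrieval and generation loop in Python. The embed and generate callables stand in for whichever embedding model and LLM you choose, and a plain list plays the role of the vector database; a real system replaces the brute-force cosine scan with an indexed vector store such as the products below:</p><pre>import math

def cosine_similarity(a, b):
    # Semantic closeness of two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def build_index(chunks, embed):
    # "Vector database": store each chunk together with its embedding.
    return [(embed(chunk), chunk) for chunk in chunks]

def answer(query, index, embed, generate, top_k=5):
    query_embedding = embed(query)  # same model used at indexing time
    # Retrieval: rank stored chunks by similarity to the query embedding.
    best = sorted(index,
                  key=lambda item: cosine_similarity(query_embedding, item[0]),
                  reverse=True)[:top_k]
    context = "\n".join(chunk for _, chunk in best)
    # Generation: ground the LLM prompt with the retrieved context.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)</pre>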
<p>The great news is that most of the GCP data products already support vectors out of the box, which means users no longer need to choose between vector queries and other critical database functionality. For example, all GCP transactional databases aim to fully support vector features in the near future:</p><ul><li>AlloyDB (GA)</li><li>Cloud SQL for PostgreSQL (GA)</li><li>Cloud SQL for MySQL (Preview)</li><li>Spanner (Preview)</li><li>Memorystore for Redis (Preview)</li><li>Firestore (Preview)</li><li>Bigtable (Preview)</li></ul><p>Here I will showcase the vector implementation across 3 main data product families on GCP:</p><ul><li><strong>AlloyDB</strong> — Transactional database</li><li><strong>BigQuery</strong> — Enterprise data warehouse</li><li><strong>Vertex AI Vector Search</strong> — Machine learning platform</li></ul><blockquote>Disclaimer</blockquote><blockquote>I work as a Data Analytics practice lead at Google Cloud. This article is my own opinion and does not reflect the views of my employer.</blockquote><blockquote>Please note that by the time you read this article, the information may already be obsolete, as GenAI is a fast-developing domain and Google Cloud is actively releasing new product features in this space.</blockquote><h3>AlloyDB</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HAe09_TeYj5Dhbg6EWs7Bg.png" /></figure><p>AlloyDB is a fully managed, PostgreSQL-compatible, cloud-native database service built to deliver superior performance, scalability, and high availability for the most demanding enterprise workloads. It now comes with the AlloyDB AI feature suite, which provides the semantic and predictive power of ML models for your data out of the box.</p><h4>Setup</h4><ul><li>Make sure you already have an <a href="https://cloud.google.com/alloydb/docs/cluster-create">AlloyDB cluster and instance set up</a>.</li><li>Enable the <strong>Vertex AI integration</strong> and <strong>pgvector</strong> extensions in your AlloyDB instance:</li></ul><pre>psql "host=$INSTANCE_IP user=alloydb_user dbname=vector_db" -c "CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE"

psql "host=$INSTANCE_IP user=alloydb_user dbname=vector_db" -c "CREATE EXTENSION IF NOT EXISTS vector"</pre><h4>Embeddings generation</h4><ul><li>Create a new column with the type <strong>vector</strong> to store the embeddings:<br>- The vector dimension should match the model that you use; for example, the textembedding-gecko model has 768 dimensions.<br>- AlloyDB implements embeddings as arrays of real values, but it can automatically cast from a real array to a vector value.</li></ul><pre>ALTER TABLE my_products ADD COLUMN embedding_column VECTOR(768);</pre><ul><li>To generate an embedding, use the <strong>embedding()</strong> function:<br>- To use the <strong>textembedding-gecko model</strong>, the AlloyDB cluster must reside in <strong>region us-central1</strong> to match the region of the model.<br>- You can <a href="https://cloud.google.com/alloydb/docs/ai/invoke-predictions">invoke predictions</a> to get around the region restriction.<br>- 003 is the latest version of the textembedding-gecko model. Note that it is always advisable to specify the version tag to avoid mistakes, as a newly published model may return different embeddings.</li></ul><pre>SELECT embedding('textembedding-gecko@003', 'Google Pixel 8 Pro redefines smartphone photography with its advanced AI-powered camera system');</pre><ul><li>To generate the embedding value based on another column:</li></ul><pre>UPDATE my_products SET embedding_column = embedding('textembedding-gecko@003', product_description);</pre><ul><li>Alternatively, you can create an embedding column whose default value is generated from another column:</li></ul><pre>ALTER TABLE my_products ADD COLUMN embedding_column vector GENERATED ALWAYS AS (embedding('textembedding-gecko@003', product_description)) STORED;</pre>
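<p>As a quick sanity check, you can call the same embedding model directly from Python with the Vertex AI SDK, for example to confirm the 768 dimensions used when sizing the vector column above. This is a minimal sketch, assuming the google-cloud-aiplatform package is installed; the project and location values are placeholders:</p><pre>from google.cloud import aiplatform
from vertexai.language_models import TextEmbeddingModel

# Placeholders: your project, and a region where the model is available.
aiplatform.init(project="PROJECT_ID", location="us-central1")

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
result = model.get_embeddings(["Google Pixel 8 Pro redefines smartphone photography"])[0]
print(len(result.values))  # expect 768 for textembedding-gecko</pre>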
<h4>Vector index</h4><ul><li>By default pgvector performs exact nearest-neighbor search, which provides perfect recall. It also supports approximate nearest-neighbor search through HNSW or IVFFlat indexes. AlloyDB provides built-in optimizations for pgvector by adding a scalar quantization feature (SQ8) to IVF index creation that can significantly speed up queries.<br>- SQ8 supports vectors with up to 8000 dimensions.<br>- You can choose among 3 distance functions: vector_l2_ops (L2 distance), vector_ip_ops (inner product), or vector_cosine_ops (cosine distance).</li></ul><pre>CREATE INDEX embedding_column_idx ON my_products
  USING ivf (embedding_column vector_l2_ops)
  WITH (lists = 20, quantizer = 'SQ8');</pre><h4>Vector search</h4><ul><li>Perform vector search using the pgvector nearest-neighbor operator <-> to find the database rows with the most semantically similar embeddings:</li></ul><pre>SELECT product_name FROM my_products
  ORDER BY embedding_column <-> embedding('textembedding-gecko@003', 'I need a phone that provides the best photography quality')::vector
  LIMIT 10;</pre>
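<p>Since AlloyDB is PostgreSQL-compatible, an application can run the same search through any standard PostgreSQL driver. Here is a minimal sketch with psycopg2; the connection parameters are placeholders, and the embedding() call still executes inside AlloyDB:</p><pre>import psycopg2

# Placeholders: point these at your AlloyDB instance and database.
conn = psycopg2.connect(host="INSTANCE_IP", user="alloydb_user", dbname="vector_db")

user_query = "I need a phone that provides the best photography quality"
with conn, conn.cursor() as cur:
    # %s parameterizes the user text; the embedding is computed in-database.
    cur.execute(
        """
        SELECT product_name FROM my_products
        ORDER BY embedding_column <-> embedding('textembedding-gecko@003', %s)::vector
        LIMIT 10
        """,
        (user_query,),
    )
    for (product_name,) in cur.fetchall():
        print(product_name)</pre>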
<p>Check the following links for more information:</p><ul><li><a href="https://cloud.google.com/alloydb/docs/ai/work-with-embeddings">Work with vector embeddings on AlloyDB</a></li><li><a href="https://github.com/pgvector/pgvector#indexing">pgvector indexing</a></li></ul><h3>BigQuery</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*2XV9VBLwcU7bdryyp-U5DQ.png" /></figure><h4>Setup</h4><ul><li>BigQuery is a serverless service, so no resource setup is needed.</li><li>Create a remote connection to Vertex AI remote models:</li></ul><pre>bq mk --connection --location=US --project_id={PROJECT_ID} --connection_type=CLOUD_RESOURCE vertex_embeddings</pre><ul><li>Grant the ‘Vertex AI User’ role to the service account of the created connection:</li></ul><pre>gcloud projects add-iam-policy-binding {PROJECT_ID} \
  --member='serviceAccount:{CONNECTION_SERVICE_ACCOUNT}' \
  --role='roles/aiplatform.user'</pre><h4>Embeddings generation</h4><ul><li>Create a remote embedding model to represent the hosted textembedding-gecko model:</li></ul><pre>CREATE OR REPLACE MODEL test_embeddings.llm_embedding_model
  REMOTE WITH CONNECTION `us.vertex_embeddings`
  OPTIONS(ENDPOINT='textembedding-gecko@003');</pre><ul><li>You can now generate text embeddings using the <a href="https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-generate-embedding">ML.GENERATE_EMBEDDING</a> function:<br>- We use data from the public dataset table imdb.reviews in this example.<br>- The text_embedding column is of type ARRAY<FLOAT> with 768 dimensions.</li></ul><pre>CREATE OR REPLACE TABLE test_embeddings.embedded_reviews
AS SELECT content AS review, text_embedding
FROM
  ML.GENERATE_TEXT_EMBEDDING(
    MODEL `test_embeddings.llm_embedding_model`,
    (SELECT review AS content
     FROM `bigquery-public-data.imdb.reviews` LIMIT 8000
    ),
    STRUCT(TRUE AS flatten_json_output)
  );</pre><h4>Vector index</h4><ul><li>Create a vector index on the embeddings column. A vector index enables approximate nearest-neighbor search to improve vector search performance.<br>- Currently supported distance types are EUCLIDEAN (L2) and COSINE.<br>- Currently only IVF is supported as the index type.<br>- The created index is fully managed by BigQuery; it is refreshed automatically as data changes.<br>- The metadata of the vector index is available via the <a href="https://cloud.google.com/bigquery/docs/information-schema-vector-indexes">INFORMATION_SCHEMA.VECTOR_INDEXES</a> view.</li></ul><pre>CREATE VECTOR INDEX embedded_reviews_idx ON test_embeddings.embedded_reviews(text_embedding) OPTIONS(distance_type = 'EUCLIDEAN', index_type = 'IVF');</pre><h4>Vector search</h4><ul><li>Use the VECTOR_SEARCH function to perform text similarity search:<br>- It first generates an embedding from the text query, then compares it to the column test_embeddings.embedded_reviews.text_embedding.</li></ul><pre>SELECT
  *
FROM
  VECTOR_SEARCH( TABLE `test_embeddings.embedded_reviews`, 'text_embedding', (
    SELECT
      ml_generate_embedding_result,
      content AS query
    FROM
      ML.GENERATE_EMBEDDING( MODEL `test_embeddings.llm_embedding_model`,
        (
          SELECT 'Our family enjoyed this movie, especially the kids were so fascinated by the magical world' AS content
        ))
    ),
    top_k => 5);</pre>
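<p>From an application, the same similarity search can be issued with the google-cloud-bigquery client. A minimal sketch, assuming the dataset and model created above; VECTOR_SEARCH returns the matched base rows along with a distance column:</p><pre>from google.cloud import bigquery

client = bigquery.Client()  # uses Application Default Credentials

sql = """
SELECT base.review AS review, distance
FROM VECTOR_SEARCH(
  TABLE `test_embeddings.embedded_reviews`, 'text_embedding',
  (SELECT ml_generate_embedding_result, content AS query
   FROM ML.GENERATE_EMBEDDING(
     MODEL `test_embeddings.llm_embedding_model`,
     (SELECT @user_query AS content))),
  top_k => 5)
"""

job_config = bigquery.QueryJobConfig(query_parameters=[
    bigquery.ScalarQueryParameter(
        "user_query", "STRING",
        "Our family enjoyed this movie, especially the kids were so fascinated by the magical world"),
])
for row in client.query(sql, job_config=job_config).result():
    print(f"{row.distance:.4f}  {row.review[:80]}")</pre>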
<p>Check the following links for more information:</p><ul><li><a href="https://cloud.google.com/bigquery/docs/vector-search">Search embeddings with vector search</a></li><li>A <a href="https://github.com/steveloh/demo/blob/main/bigquery/notebook/BigQuery%20Embedding%20and%20Vector%20Search.ipynb">notebook</a> that I created to showcase embedding and vector search in BigQuery.</li></ul><h3>Vertex AI Vector Search</h3><p>Vertex AI is a unified machine learning platform that simplifies and accelerates the end-to-end process of building, deploying, and managing ML models at scale. Vector Search (previously known as Matching Engine) provides highly scalable and performant vector similarity search.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6Qe6HxbxDsnrxlpgz5BGIA.png" /></figure><p>The following code snippets are in Python.</p><h4>Setup</h4><ul><li>Import and initialize the aiplatform package:</li></ul><pre>from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)</pre><ul><li>Vector Search does not provide services to generate embeddings. You can, for example, generate embeddings via BigQuery, then export the embeddings to a file in a storage bucket, before importing them into Vector Search.</li></ul><h4>Vector index</h4><ul><li>Create a vector index endpoint, which is a server instance that accepts query requests for your index:</li></ul><pre>my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name=f"index-endpoint-{PROJECT_ID}",
    public_endpoint_enabled=True,
)</pre><ul><li>Create a vector search index:<br>- EMBEDDING_BUCKET_URI is where you store the files with embeddings; read here about the required <a href="https://cloud.google.com/vertex-ai/docs/vector-search/setup/format-structure">input data format and structure</a>.<br>- approximate_neighbors_count specifies the number of neighbors to find through approximate search before exact reordering is performed.<br>- See here for the available <a href="https://cloud.google.com/vertex-ai/docs/vector-search/configuring-indexes#distance-measure-type">distance measure types</a>.</li></ul><pre>my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name=DISPLAY_NAME,
    contents_delta_uri=EMBEDDING_BUCKET_URI,
    dimensions=768,
    approximate_neighbors_count=10,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
)</pre><ul><li>Deploy the index to the index endpoint:</li></ul><pre>my_index_endpoint = my_index_endpoint.deploy_index(
    index=my_index, deployed_index_id=DEPLOYED_INDEX_ID
)</pre><h4>Vector search</h4><ul><li>Now you can search the vector index using a query embedding:</li></ul><pre># get the query embedding
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
query = "Our family enjoyed this movie, especially the kids were so fascinated by the magical world"
query_embeddings = model.get_embeddings([query])[0]

# query the index endpoint to find the 3 nearest neighbors
response = my_index_endpoint.find_neighbors(
    deployed_index_id=my_index_endpoint.deployed_indexes[0].id,
    queries=[query_embeddings.values],
    num_neighbors=3,
)</pre><p>I have created a <a href="https://github.com/steveloh/demo/blob/main/vertex/vector-search/vertex-vector-search-python-call.ipynb">notebook</a> to demonstrate how to do vector search in Vertex AI.</p><h3>Summary</h3><p>2023 was the booming year of GenAI foundation models, while this year organizations will focus on building applications that harness value from these models. This may include accelerating an organization’s access to insights, improving productivity, streamlining operations and business processes, and building innovative products and services. Vector storage and vector search are the backbone for storing and organizing the rich semantic information that grounds generative AI models. <strong>Their ability to handle various structures of data, power meaningful search, scale efficiently, and support rapid development makes them the ideal engine for the next generation of AI innovation.</strong></p><hr><p><a href="https://medium.com/google-cloud/making-sense-of-vector-search-and-embeddings-across-gcp-products-46cedad68934">Making sense of Vector Search and Embeddings across GCP products</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>