♊️ GemiNews 🗞️ (dev)


🗞️Making sense of Vector Search and Embeddings across GCP products

🗿Semantically Similar Articles (by :title_embedding)

Making sense of Vector Search and Embeddings across GCP products

2024-04-03 - Steve Loh (from Google Cloud - Medium)


[Blogs] 🌎 https://medium.com/google-cloud/making-sense-of-vector-search-and-embeddings-across-gcp-products-46cedad68934?source=rss----e52cf94d98af---4 [🧠] [v2] article_embedding_description: {:llm_project_id=>"Unavailable", :llm_dimensions=>nil, :article_size=>17043, :llm_embeddings_model_name=>"textembedding-gecko"}
[🧠] [v1/3] title_embedding_description: {:ricc_notes=>"[embed-v3] Fixed on 9oct24. Only seems incompatible at first glance with embed v1.", :llm_project_id=>"unavailable possibly not using Vertex", :llm_dimensions=>nil, :article_size=>17043, :poly_field=>"title", :llm_embeddings_model_name=>"textembedding-gecko"}
[🧠] [v1/3] summary_embedding_description:
[🧠] As per bug https://github.com/palladius/gemini-news-crawler/issues/4 we can state this article belongs to title/summary version: v3 (very few articles updated on 9oct24)

🗿article.to_s

------------------------------
Title: Making sense of Vector Search and Embeddings across GCP products

Author: Steve Loh
PublishedDate: 2024-04-03
Category: Blogs
NewsPaper: Google Cloud - Medium
Tags: vector-search, google-cloud-platform, embedding, generative-ai, machine-learning
{"id"=>5259,
"title"=>"Making sense of Vector Search and Embeddings across GCP products",
"summary"=>nil,
"content"=>"

Intro

Many of you have already used Large Language Models (LLMs) from Generative AI. These models are great at creative tasks such as content generation, text summarization, and entity extraction, but that alone is not sufficient for enterprises that need to:

  • provide accurate and up-to-date information (reducing hallucination)
  • offer contextual user experiences
  • offer secure and governed access to the data

Hence comes the Retrieval-Augmented Generation (RAG) technique to fulfill those requirements. It combines the power of LLMs with the ability to reference external knowledge sources by incorporating the following two systems:

  • Retrieval: When a user asks a question, RAG first searches through a database of documents or text to find relevant passages.
  • Generation: The user then sends the retrieved information along as context in the LLM prompt, grounding the LLM’s language understanding with specific knowledge so that it generates a more informed and accurate answer.

So how does the RAG retrieval system find the relevant knowledge? Welcome to the world of embeddings and vector search.

  • Vector embeddings are numerical representations of text that capture the semantic meaning and relationships between words and concepts. You use a pre-trained model to generate embeddings. For example, the Google Vertex textembedding-gecko model generates a 768-dimensional embedding, while a multimodal embedding model generates a 128-, 256-, 512-, or 1408-dimensional embedding.
  • Vector search comes into play by comparing the user’s query embedding to the vectors representing documents or passages in the knowledge base. This comparison uses similarity metrics to find the most relevant pieces of information based on their semantic closeness to the query; a minimal sketch of such a comparison follows.

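To make the similarity metric concrete, here is a minimal, self-contained sketch of cosine similarity over toy embeddings (numpy is an assumed dependency here; real embeddings would come from a model such as textembedding-gecko):

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: dot product of the two vectors, normalized by their lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-dimensional embeddings; textembedding-gecko would return 768 dimensions.
query = np.array([0.1, 0.9, 0.2, 0.0])
docs = {
    "doc_a": np.array([0.1, 0.8, 0.3, 0.1]),
    "doc_b": np.array([0.9, 0.0, 0.1, 0.4]),
}
# Rank documents by semantic closeness to the query.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # doc_a ranks first: its embedding points in nearly the same direction
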
Now with these concepts explained, you can implement RAG with the following steps (an end-to-end sketch follows the list):

\"\"
  • Break down large documents or text corpora using a suitable chunking strategy
  • Generate embeddings for each chunk using a selected embedding model
  • Store the chunked data and vector embeddings together in a vector database
  • The user posts a prompt query
  • Use the same pre-trained embedding model to generate a vector embedding for the user query
  • Use the query embedding to search for the most similar embeddings in the vector database, then retrieve the corresponding data chunks
  • Create a new prompt for the LLM by incorporating the retrieved chunked text alongside the original user query

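Below is a minimal, self-contained sketch of these seven steps. A toy hashing function stands in for the embedding model and an in-memory list stands in for the vector database; a real pipeline would call a model such as textembedding-gecko and a managed vector store.

import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in: hash words into a 16-dim bag-of-words vector, then L2-normalize.
    # A real pipeline would call an embedding model such as textembedding-gecko.
    v = np.zeros(16)
    for w in text.lower().split():
        v[hash(w) % 16] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

# Steps 1-3: chunk the corpus, embed each chunk, store (embedding, text) pairs.
corpus = "Pixel 8 Pro has an AI-powered camera. AlloyDB supports pgvector. " * 3
chunks = [corpus[i:i + 60] for i in range(0, len(corpus), 60)]
store = [(embed(c), c) for c in chunks]

# Steps 4-6: embed the user query with the SAME model, rank chunks by similarity.
query = "best phone camera"
q = embed(query)
top = sorted(store, key=lambda pair: float(np.dot(q, pair[0])), reverse=True)[:2]

# Step 7: build the grounded prompt for the LLM (the generation call is omitted).
context = "\n".join(text for _, text in top)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)
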
Vector embeddings need to be stored in a vector database before you can search them. But adding a dedicated vector database to your software stack increases complexity, cost, and the learning curve. The great news is that most GCP data products already support vectors out of the box, which means users no longer need to choose between vector queries and other critical database functionality. For example, all GCP transactional databases aim to fully support vector features in the near future:

  • AlloyDB (GA)
  • Cloud SQL for PostgreSQL (GA)
  • Cloud SQL for MySQL (Preview)
  • Spanner (Preview)
  • Memorystore for Redis (Preview)
  • Firestore (Preview)
  • Bigtable (Preview)

Here I will showcase vector implementations across three main data product families on GCP:

  • AlloyDB — Transactional database
  • BigQuery — Enterprise data warehouse
  • Vertex AI Vector Search — Machine learning platform
Disclaimer
I work as a Data Analytics practice lead at Google Cloud. This article is my own opinion and does not reflect the views of my employer.
Please note that by the time you read this article, the information may already be outdated: GenAI is a fast-developing domain, and Google Cloud is actively releasing new product features in this space.

AlloyDB

\"\"

AlloyDB is a fully managed, PostgreSQL-compatible, cloud-native database service built to deliver superior performance, scalability, and high availability for the most demanding enterprise workloads. It now comes with the AlloyDB AI feature suite, which provides the semantic and predictive power of ML models for your data out of the box.

Setup

psql "host=$INSTANCE_IP user=alloydb_user dbname=vector_db" -c "CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE"

psql "host=$INSTANCE_IP user=alloydb_user dbname=vector_db" -c "CREATE EXTENSION IF NOT EXISTS vector"

Embeddings generation

  • Create a new column of type vector to store the embeddings:
    - The vector dimension should match the model that you use. For example, the textembedding-gecko model has 768 dimensions.
    - AlloyDB implements embeddings as arrays of real values, but it automatically casts from a real array to a vector value.
ALTER TABLE my_products ADD COLUMN embedding_column VECTOR(768);
  • To generate an embedding, use the embedding() function:
    - To use the textembedding-gecko model, the AlloyDB cluster must reside in the us-central1 region to match the region of the model.
    - You can invoke predictions to get around the region restriction.
    - 003 is the latest version of the textembedding-gecko model. It is always advisable to specify the version tag to avoid mistakes, as a newly published model may return different embeddings.
SELECT embedding('textembedding-gecko@003', 'Google Pixel 8 Pro redefines smartphone photography with its advanced AI-powered camera system');
  • To generate an embedding value based on another column:
UPDATE my_products SET embedding_column = embedding('textembedding-gecko@003', product_description);
  • Alternatively, you can also create an embedding column with default value generated from another column:
ALTER TABLE my_products ADD COLUMN embedding_column vector GENERATED ALWAYS AS (embedding('textembedding-gecko@003', product_description)) STORED;

Vector index

  • By default, pgvector performs exact nearest-neighbor search, which provides perfect recall. It also supports approximate nearest-neighbor search through HNSW or IVFFlat indexes. AlloyDB provides built-in optimizations for pgvector by adding a scalar quantization feature (SQ8) to IVF index creation that can significantly speed up queries.
    - SQ8 supports vectors with up to 8000 dimensions.
    - You can choose among three distance functions: vector_l2_ops (L2 distance), vector_ip_ops (inner product), or vector_cosine_ops (cosine distance).
CREATE INDEX embedding_column_idx ON my_products
USING ivf (embedding_column vector_l2_ops)
WITH (lists = 20, quantizer = 'SQ8');

Vector search

  • Perform a vector search using the pgvector nearest-neighbor operator <-> to find the database rows with the most semantically similar embeddings:
SELECT product_name FROM my_products
ORDER BY embedding_column <-> embedding('textembedding-gecko@003', 'I need a phone that provides the best photography quality')::vector
LIMIT 10;

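For completeness, here is a minimal sketch of running that same query from Python. The psycopg2 driver, the placeholder IP, and the password handling are assumptions, not part of the original article:

import psycopg2

# Connection details mirror the psql commands above; the literal IP is a placeholder.
conn = psycopg2.connect(host="10.0.0.5", user="alloydb_user", dbname="vector_db")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT product_name FROM my_products
        ORDER BY embedding_column
            <-> embedding('textembedding-gecko@003', %s)::vector
        LIMIT 10;
        """,
        ("I need a phone that provides the best photography quality",),
    )
    for (name,) in cur.fetchall():
        print(name)
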
Check the following links for more information:
  • Work with vector embeddings on AlloyDB
  • pgvector indexing

BigQuery

\"\"

Setup

  • BigQuery is serverless, so no resource setup is needed.
  • Create a remote connection to Vertex AI remote models:
bq mk --connection --location=US --project_id={PROJECT_ID}  --connection_type=CLOUD_RESOURCE vertex_embeddings
  • Grant the ‘Vertex AI User’ role to the service account of the created connection:
gcloud projects add-iam-policy-binding {PROJECT_ID} \
  --member='serviceAccount:{CONNECTION_SERVICE_ACCOUNT}' \
  --role='roles/aiplatform.user'

Embeddings generation

  • Create a remote embedding model to represent the hosted textembedding-gecko model:
CREATE OR REPLACE MODEL test_embeddings.llm_embedding_model
REMOTE WITH CONNECTION `us.vertex_embeddings`
OPTIONS(ENDPOINT='textembedding-gecko@003');
  • You can now generate text embeddings using the ML.GENERATE_EMBEDDING function:
    - This example uses data from the public dataset table imdb.reviews.
    - The text_embedding column is of type ARRAY<FLOAT64> with 768 dimensions.
CREATE OR REPLACE TABLE test_embeddings.embedded_reviews AS
SELECT content AS review, text_embedding
FROM ML.GENERATE_TEXT_EMBEDDING(
  MODEL `test_embeddings.llm_embedding_model`,
  (SELECT review AS content
   FROM `bigquery-public-data.imdb.reviews`
   LIMIT 8000),
  STRUCT(TRUE AS flatten_json_output)
);

Vector index

  • Create a vector index on the embeddings column. A vector index enables approximate nearest-neighbor (ANN) search, which improves vector search performance.
    - Currently supported distance types are EUCLIDEAN (L2) and COSINE.
    - Currently only IVF is supported as the index type.
    - The created index is fully managed by BigQuery; it is refreshed automatically as data changes.
    - The metadata of the vector index is available via the INFORMATION_SCHEMA.VECTOR_INDEXES view, as shown in the sketch below.
CREATE VECTOR INDEX embedded_reviews_idx
ON test_embeddings.embedded_reviews(text_embedding)
OPTIONS(distance_type = 'EUCLIDEAN', index_type = 'IVF');
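
To verify that the index was created and is being refreshed, you can query that view. A minimal sketch using the BigQuery Python client (the column names follow the INFORMATION_SCHEMA.VECTOR_INDEXES documentation; treat them as assumptions):

from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials and project
rows = client.query(
    """
    SELECT table_name, index_name, index_status, coverage_percentage
    FROM test_embeddings.INFORMATION_SCHEMA.VECTOR_INDEXES
    """
).result()
for row in rows:
    print(row.table_name, row.index_name, row.index_status, row.coverage_percentage)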

Vector search

  • Use the VECTOR_SEARCH function to perform a text similarity search:
    - It first generates an embedding from the text query, then compares it to the column test_embeddings.embedded_reviews.text_embedding.
SELECT *
FROM VECTOR_SEARCH(
  TABLE `test_embeddings.embedded_reviews`,
  'text_embedding',
  (SELECT ml_generate_embedding_result,
          content AS query
   FROM ML.GENERATE_EMBEDDING(
     MODEL `test_embeddings.llm_embedding_model`,
     (SELECT 'Our family enjoyed this movie, especially the kids were so fascinated by the magical world' AS content))),
  top_k => 5);

Check the following links for more information:
  • Search embeddings with vector search
  • A notebook that I created to showcase embeddings and vector search in BigQuery

Vertex AI Vector Search

Vertex AI is a unified machine learning platform that simplifies and accelerates the end-to-end process of building, deploying, and managing ML models at scale. Vector Search (previously known as Matching Engine) provides highly scalable and performant vector similarity search.

\"\"

The following code snippets are in Python.

Setup

  • Import and initialize the aiplatform package:
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)
  • Vector Search does not generate embeddings itself. You can, for example, generate embeddings in BigQuery, export them to a file in a Cloud Storage bucket, and then import them into Vector Search; a sketch of the expected file format follows.
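
A minimal sketch of preparing such a file. The rows are hypothetical, and the file name is an assumption; the JSON-lines record shape ("id" plus "embedding") follows the Vector Search input data format:

import json

# Hypothetical rows exported from BigQuery: (id, 768-dimensional embedding) pairs.
rows = [("review_1", [0.12] * 768), ("review_2", [0.34] * 768)]

# Vector Search batch ingestion reads JSON-lines records with "id" and "embedding"
# fields; write them to a local file, then upload it under EMBEDDING_BUCKET_URI.
with open("embeddings_0001.json", "w") as f:
    for rid, emb in rows:
        f.write(json.dumps({"id": rid, "embedding": emb}) + "\n")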

Vector index

  • Create a vector index endpoint, which is a server instance that accepts query requests for your index.
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name=f"index-endpoint-{PROJECT_ID}",
    public_endpoint_enabled=True,
)
  • Create a vector search index:
    - EMBEDDING_BUCKET_URI is where you store the files with the embeddings; read here about the required input data format and structure.
    - approximate_neighbors_count specifies the number of neighbors to find through approximate search before exact reordering is performed.
    - See here for the available distance measure types.

my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name=DISPLAY_NAME,
    contents_delta_uri=EMBEDDING_BUCKET_URI,
    dimensions=768,
    approximate_neighbors_count=10,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
)
  • Deploy the index to the index endpoint:
my_index_endpoint = my_index_endpoint.deploy_index(
    index=my_index,
    deployed_index_id=DEPLOYED_INDEX_ID,
)

Vector search

  • Now you can search the vector index using a query embedding:
# Get the query embedding (TextEmbeddingModel comes from the vertexai SDK).
from vertexai.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
query = "Our family enjoyed this movie, especially the kids were so fascinated by the magical world"
query_embeddings = model.get_embeddings([query])[0]

# Query the index endpoint to find the 3 nearest neighbors.
response = my_index_endpoint.find_neighbors(
    deployed_index_id=my_index_endpoint.deployed_indexes[0].id,
    queries=[query_embeddings.values],
    num_neighbors=3,
)

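The response contains one list of matches per query. A minimal sketch of reading it (the attribute names follow the aiplatform SDK's MatchNeighbor type; treat them as assumptions):

# Each MatchNeighbor carries the stored datapoint id and its distance
# to the query embedding.
for neighbor in response[0]:
    print(neighbor.id, neighbor.distance)
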
I have created a notebook to demonstrate how to do vector search in Vertex AI.

Summary

2023 was the boom year for GenAI foundation models; this year, organizations will focus on building applications that harness value from these models. This may include accelerating an organization’s access to insights, improving productivity, streamlining operations and business processes, and building innovative products and services. Vector storage and vector search are the backbone for storing and organizing the rich semantic information used to ground generative AI models. Their ability to handle various structures of data, power meaningful search, scale efficiently, and support rapid development makes them the ideal engine for the next generation of AI innovation.

\"\"

Making sense of Vector Search and Embeddings across GCP products was originally published in Google Cloud - Community on Medium, where people are continuing the conversation by highlighting and responding to this story.

",
"author"=>"Steve Loh",
"link"=>"https://medium.com/google-cloud/making-sense-of-vector-search-and-embeddings-across-gcp-products-46cedad68934?source=rss----e52cf94d98af---4",
"published_date"=>Wed, 03 Apr 2024 04:49:10.000000000 UTC +00:00,
"image_url"=>nil,
"feed_url"=>"https://medium.com/google-cloud/making-sense-of-vector-search-and-embeddings-across-gcp-products-46cedad68934?source=rss----e52cf94d98af---4",
"language"=>nil,
"active"=>true,
"ricc_source"=>"feedjira::v1",
"created_at"=>Wed, 03 Apr 2024 11:09:01.221646000 UTC +00:00,
"updated_at"=>Mon, 21 Oct 2024 18:28:38.128693000 UTC +00:00,
"newspaper"=>"Google Cloud - Medium",
"macro_region"=>"Blogs"}