"title"=>"Fine Tuning Large Language Models: How Vertex AI Takes LLMs to the Next Level",
"summary"=>nil,
"content"=>"
Introduction
Imagine a world where language models understand your business or industry’s specific needs and can generate text perfectly tailored to your unique domain or task. This isn’t science fiction; it’s the power of fine-tuning, and Google Cloud’s Vertex AI is making it accessible to everyone.
Why Fine-Tuning Matters
Tuning a foundation model can improve its performance. Foundation models are trained for general purposes and sometimes don’t perform tasks as well as you’d like them to. This might be because the tasks you want the model to perform are specialized and difficult to teach a model through prompt design alone. In these cases, model tuning can improve the model’s performance on those specific tasks, and it can also help the model adhere to specific output requirements when instructions alone aren’t sufficient.
Think of large language models (LLMs) as incredibly talented students. They’ve learned a vast amount of information and can perform many tasks, but they excel when given specialized training. Fine-tuning is that training, allowing you to adapt a pre-trained LLM to your specific needs, whether it’s:
Generating creative content: Imagine an LLM that writes marketing copy in your brand voice or composes poems in the style of your favorite poet.
Summarizing complex information: Need to quickly grasp the key points of a lengthy research paper or news article in a particular domain or specialty like medicine or law? A fine-tuned LLM can do that for you.
Translating languages with nuance: Go beyond literal translations and capture the cultural context and subtle meanings with a fine-tuned LLM.
This article provides an overview of model tuning, describes the tuning options available on Vertex AI, and walks through fine-tuning one of the models using the supervised tuning approach. More details about model customization are available here.
Use case: From News Articles to Headlines
Let’s see how this works in practice. Imagine you want to automatically generate headlines for news articles. Using Vertex AI, you can fine-tune a large language model to generate concise, summarized titles in the specific style and format that a news channel follows.
We will use the BBC FULLTEXT DATA (made available through the BigQuery public dataset bigquery-public-data.bbc_news.fulltext). We will fine-tune the LLM text-bison@002 into a new model called “bbc-news-summary-tuned” and compare its results with the responses from the base model. A sample JSONL file is provided for this implementation; feel free to upload it to your Cloud Storage bucket to execute the fine-tuning steps:
Prepare your data: Start with a dataset of news articles and their corresponding headlines, like the BBC News dataset used in the example code.
Fine-tune a pre-trained model: Choose a base model like “text-bison@002” and fine-tune it on your news data using Vertex AI’s Python SDK.
Evaluate the results: Compare the performance of your fine-tuned model with the base model to see the improvement in headline generation quality.
Deploy and use your model: Make your fine-tuned model available through an API endpoint and start generating headlines for new articles automatically.
For this, we are going to use the supervised tuning approach. Supervised tuning improves the performance of a model by teaching it a new skill: a dataset containing hundreds of labeled examples is used to teach the model to mimic a desired behavior or task. We are going to provide a labeled dataset of input text (prompt) and output text (response) to teach the model how to customize its responses for our specific use case, as illustrated by the sample record below.
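Each line in the training JSONL pairs an article (with the summarization prompt prepended) with the headline we want the model to produce. Here is a minimal, illustrative sketch of one record, assuming the input_text/output_text schema used for supervised tuning of text models; the values in angle brackets are placeholders, not real data:
{"input_text": "Summarize this text to generate a title: \n <full text of a news article>", "output_text": "<headline in the publication's house style>"}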
Let’s dive in!
Vertex AI: Your Fine-Tuning Partner
Vertex AI provides a comprehensive suite of tools and services to guide you through the entire fine-tuning journey:
Vertex AI Pipelines: Streamline your workflow by building and managing end-to-end machine learning pipelines, including data preparation, model training, evaluation, and deployment.
Vertex AI Evaluation Services: Assess the performance of your fine-tuned model with metrics tailored to your specific task, ensuring it meets your quality standards.
Vertex AI Model Registry: Keep track of all your models, including different versions and their performance metrics, in a centralized repository.
Vertex AI Endpoints: Deploy your fine-tuned model as an API endpoint, making it easily accessible for integration into your applications.
High Level Flow Diagram
This diagram represents the flow of data and the steps involved in the implementation; the owner of each step is noted in the text beneath it.
Industry Use cases
With Vertex AI, the possibilities are endless. You can fine-tune LLMs for sentiment analysis, chatbot development, code generation, and much more. This technology is democratizing access to powerful language models, allowing businesses and individuals to unlock new levels of creativity and efficiency.
Hands-on Time
This implementation uses the Vertex AI Python SDK for Generative AI models. You can also perform fine-tuning in other ways: via HTTP / curl, the Java SDK, or the Google Cloud console.
In 5 easy steps, you can fine-tune and evaluate your model for your customized responses!
1. Install and Import Dependencies
!pip install google-cloud-aiplatform
!pip install --user datasets
!pip install --user google-cloud-pipeline-components
Follow the rest of the steps as shown in the .ipynb file in the repo. Make sure you replace PROJECT_ID and BUCKET_NAME with your own values.
import os
import sys
import uuid
import json
import warnings

import kfp
import pandas as pd
import vertexai
from google.auth import default
from google.cloud import aiplatform
from datasets import load_dataset
from vertexai.preview.language_models import TextGenerationModel, EvaluationTextSummarizationSpec

# Keep the notebook output readable
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
warnings.filterwarnings('ignore')

# Replace these with your own project, region and bucket
PROJECT_ID = "YOUR_PROJECT_ID"
REGION = "europe-west4"
BUCKET_NAME = "YOUR_BUCKET"

# Initialize the Vertex AI SDK
vertexai.init(project=PROJECT_ID, location=REGION)
2. Prepare & Load Training Data
Replace YOUR_BUCKET with your bucket and upload the sample TRAIN.jsonl training data file to it.
# Load the JSONL training data (one JSON record per line) into a pandas DataFrame
json_url = 'https://storage.googleapis.com/YOUR_BUCKET/TRAIN.jsonl'
df = pd.read_json(json_url, lines=True)
print(df)
3. Fine Tune a Large Language Model
# Display name for the tuned model (shown in the Vertex AI Model Registry)
model_display_name = 'bbc-finetuned-model'  # @param {type:"string"}

# Load the pre-trained foundation model and launch a supervised tuning job on it
tuned_model = TextGenerationModel.from_pretrained("text-bison@002")
tuned_model.tune_model(
    training_data=df,
    train_steps=100,
    tuning_job_location="europe-west4",
    tuned_model_location="europe-west4",
)
The code above takes the pre-trained model “text-bison@002” and tunes it with the data frame holding the training data we loaded in the previous step.
This step will take a few hours to complete. You can always track the progress of the fine-tuning pipeline through the pipeline job link printed in this step.
4. Predict with the new Fine Tuned Model
Once the fine tuning job is complete, you will be able to predict with your new model.
response = tuned_model.predict("Summarize this text to generate a title: \\n Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk. They say that the shrinking space on aeroplanes is not only uncomfortable it it's putting our health and safety in danger. More than squabbling over the arm rest, shrinking space on planes putting our health and safety in danger? This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans.")
print(response.text)
Here is the output:
Predict with Base Model (text-bison@002) for comparison
base_model = TextGenerationModel.from_pretrained("text-bison@002")
response = base_model.predict("Summarize this text to generate a title: \\n Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk. They say that the shrinking space on aeroplanes is not only uncomfortable it it's putting our health and safety in danger. More than squabbling over the arm rest, shrinking space on planes putting our health and safety in danger? This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans.")
print(response.text)
Here is the output:
Even though both titles generated look appropriate, the first one (generated with the fine-tuned model) is more in tune with the style of titles used in the dataset in question.
Load the fine-tuned model
Loading a model that you have just fine-tuned is easy: in step 3, the tuning call runs in the scope of the code itself, so the variable tuned_model still holds the tuned model. But what if you want to invoke a model that was tuned in the past?
To do this, you can invoke the get_tuned_model() method on the LLM class with the full resource name of the deployed fine-tuned model from the Vertex AI Model Registry.
tuned_model_1 = TextGenerationModel.get_tuned_model("projects/273845608377/locations/europe-west4/models/4220809634753019904")
print(tuned_model_1.predict("YOUR_PROMPT"))
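If you don’t have the resource name handy, one way to discover it (a sketch, assuming the list_tuned_model_names() helper available on text models in the SDK) is to list the models tuned from the base model:
# List the resource names of all models tuned from this base model
base_model = TextGenerationModel.from_pretrained("text-bison@002")
for tuned_name in base_model.list_tuned_model_names():
    print(tuned_name)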
5. Model Evaluation
This is a big topic in itself, and we will reserve a detailed discussion for another day. For now, let’s see how to get some evaluation metrics for the fine-tuned model and compare them against the base model.
Load the EVALUATION dataset:
json_url = 'https://storage.googleapis.com/YOUR_BUCKET/EVALUATE.jsonl'
df = pd.read_json(json_url, lines=True)
print(df)
Evaluate:
# Define the evaluation specification for a text summarization task on the fine tuned model
task_spec = EvaluationTextSummarizationSpec(
    task_name="summarization",
    ground_truth_data=df,
)
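To launch the evaluation itself, pass this spec to the model’s evaluate() method; here is a minimal sketch, assuming the evaluate() helper exposed on text models in the preview SDK:
# Run the evaluation on the fine tuned model; this launches a Vertex AI evaluation pipeline
evaluation_metrics = tuned_model.evaluate(task_spec=task_spec)
print(evaluation_metrics)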
This step will take a few minutes to complete. You can track its progress using the pipeline job link in the step result. Once it completes, you will be able to view the evaluation result:
rougeLSum: This is the ROUGE-L score for the summary. ROUGE-L is a recall-based metric that measures the overlap between a generated summary and a reference summary. It is calculated by taking the length of the longest common subsequence (LCS) between the two summaries and dividing it by the length of the reference summary.
The rougeLSum score in this run is 0.36600753600753694, which means the generated summary has roughly a 36.6% overlap with the reference summary.
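To make the metric concrete, here is a small illustrative sketch of ROUGE-L recall as described above, using simple whitespace tokenization (the real rougeLSum implementation also handles sentence splitting, stemming and other details; the example strings are invented, not model output):
def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence over token lists
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_recall(candidate, reference):
    # LCS length divided by the length of the reference summary
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    return lcs_length(cand_tokens, ref_tokens) / len(ref_tokens)

print(rouge_l_recall(
    "shrinking plane seats putting passengers at risk",
    "experts question if shrinking plane seats are putting passengers at risk",
))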
If you run the evaluation step on the base model as well, you will observe that the summarization score is higher for the fine-tuned model.
You can find the evaluation results in the Cloud Storage output directory that you specified when creating the evaluation job. The file is named evaluation_metrics.json. For tuned models, you can also view evaluation results in the Google Cloud console on the Vertex AI Model Registry page.
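For example, one quick way to pull those metrics into the notebook (a sketch, assuming a hypothetical output directory gs://YOUR_BUCKET/eval_output/; adjust the path to whatever you specified for the evaluation job):
from google.cloud import storage

# Hypothetical output location; replace with the directory you passed to the evaluation job
bucket_name = "YOUR_BUCKET"
blob_path = "eval_output/evaluation_metrics.json"

client = storage.Client(project=PROJECT_ID)
metrics_blob = client.bucket(bucket_name).blob(blob_path)
metrics = json.loads(metrics_blob.download_as_text())
print(metrics)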
Important Considerations
Model Support: Always check the model documentation for the latest information on which models and versions support tuning.
Rapid Development: The field of LLMs advances quickly. A newer, more powerful model could potentially outperform a fine-tuned model built on an older base. The good news is that you can apply these fine-tuning techniques to newer models when the capability becomes available.
LoRA: LoRA is a technique for efficiently fine-tuning LLMs. It works by introducing trainable, low-rank decomposition matrices into the existing pre-trained model’s layers. Instead of updating all the parameters of a massive LLM, LoRA learns smaller matrices that are added to or multiplied with the original model’s weight matrices, which significantly reduces the number of trainable parameters during fine-tuning. Read more about it here.
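As a purely illustrative sketch of the idea (not Vertex AI’s implementation), a LoRA-style update keeps the original weight matrix frozen and learns two small low-rank factors whose product forms the update:
import numpy as np

d, k, r = 1024, 1024, 8            # original weight shape and a small rank r
W = np.random.randn(d, k)          # frozen pre-trained weights
A = np.random.randn(d, r) * 0.01   # trainable low-rank factor
B = np.random.randn(r, k) * 0.01   # trainable low-rank factor

# Effective weights during fine-tuning: W stays fixed, only A and B are updated
W_effective = W + A @ B

print("full parameters:     ", W.size)           # 1,048,576
print("trainable parameters:", A.size + B.size)  # 16,384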
Conclusion
Fine-tuning is a powerful technique that allows you to customize LLMs to your domain and tasks. With Vertex AI, you have the tools and resources you need to fine-tune your models efficiently and effectively. Explore the GitHub repositories and experiment with the sample code to experience fine-tuning and evaluation firsthand. Consider how fine-tuned LLMs can address your specific needs, from generating targeted marketing copy to summarizing complex documents or translating languages with cultural nuance. Utilize the comprehensive suite of tools and services offered by Vertex AI to build, train, evaluate, and deploy your fine-tuned models with ease.
Register for the upcoming season of Code Vipassana (Season 6) where we will be building these out in instructor-led virtual hands-on sessions.
Also, if you are coming to Google Cloud NEXT on the 9th, 10th and 11th of April 2024 in Las Vegas, see this in action or just come say hi at our Innovator’s Hive End to End AI demo station!
","author"=>"Abirami Sukumaran",
"link"=>"https://medium.com/google-cloud/fine-tuning-large-language-models-how-vertex-ai-takes-llms-to-the-next-level-3c113f4007da?source=rss----e52cf94d98af---4",
"published_date"=>Mon, 08 Apr 2024 06:27:19.000000000 UTC +00:00,
"image_url"=>nil,
"feed_url"=>"https://medium.com/google-cloud/fine-tuning-large-language-models-how-vertex-ai-takes-llms-to-the-next-level-3c113f4007da?source=rss----e52cf94d98af---4",
"language"=>nil,
"active"=>true,
"ricc_source"=>"feedjira::v1",
"created_at"=>Tue, 09 Apr 2024 12:41:33.804107000 UTC +00:00,
"updated_at"=>Mon, 21 Oct 2024 20:03:23.147932000 UTC +00:00,
"newspaper"=>"Google Cloud - Medium",
"macro_region"=>"Blogs"}