♊️ GemiNews 🗞️
(dev)
Editing article
Title
Summary
Content
<p>My obsession with Gemma continues. Folks new to the Gemma models can revisit my previous blog <a href="https://medium.com/google-cloud/gemma-open-models-from-google-0045263e53d2">post</a>.</p>
<p>In brief, Gemma is a family of lightweight, state-of-the-art (SOTA) open models built with the same research and technology that powers the Google Gemini models.</p>
<p>In this blog we will get started with fine-tuning Gemma using LoRA.</p>
<p>First, a bit of background on fine-tuning. One reason fine-tuning is gaining traction is that Large Language Models (LLMs) are not trained on task- or domain-specific data. LLMs, often called foundation models, are pre-trained on a massive, internet-scale corpus of text. Fully retraining a pre-trained LLM is technically challenging, with the cost of computational resources being one of the major concerns.</p>
<p>Let's look at the benefits of fine-tuning:</p>
<ol><li>Fine-tuning a pre-trained model is much faster and more cost-effective, since it requires far fewer computational resources.</li><li>It gives better performance on domain-specific tasks, especially for industry use cases in financial services, insurance, healthcare, and so on.</li><li>It also helps democratize GenAI models for individual users, i.e. developers and others with limited computational power.</li></ol>
<p>Next, Parameter-Efficient Fine-Tuning, a.k.a. PEFT. PEFT is a subset of fine-tuning that updates only a small subset of the model's parameters instead of all of them, thereby reducing computational and memory requirements while still improving the performance of the base (foundation) LLM on specific tasks. This is especially useful when adapting large models from Google such as Gemini and its variants, PaLM, and the open Gemma models.</p>
<p>We will explore fine-tuning Gemma with <strong>LoRA</strong>. <strong>LoRA</strong> stands for Low-Rank Adaptation of Large Language Models. It is a technique that greatly reduces the number of trainable parameters for downstream tasks by freezing the weights of the base model and injecting a small number of new, trainable weights into it.</p>
<p><strong>Crucial point to consider:</strong> in LoRA, the starting-point hypothesis is important. It assumes that the pre-trained model's weights are already close to an optimal solution for the downstream task, so only a small, low-rank update is needed.</p>
<p>Advantages of using LoRA as a fine-tuning technique (a minimal sketch of the idea follows this list):</p>
<ol><li>Reduced parameter and memory footprint: LoRA significantly reduces the number of trainable parameters, making fine-tuning much more memory-efficient and computationally cheaper.</li><li>Faster fine-tuning, since only a small number of added weights are trained, and those adapters are tiny compared to the base model.</li><li>Maintained performance: LoRA has been shown to stay close to traditional full fine-tuning on several tasks.</li></ol>
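<p>To make the low-rank idea concrete, here is a minimal NumPy sketch of what LoRA does to a single weight matrix. This is illustrative only (a toy example of my own, not the KerasNLP implementation): the pre-trained matrix W is frozen, and only the two small factors A and B are trained.</p>
<pre>
import numpy as np

# Toy LoRA sketch: one frozen weight matrix plus a trainable low-rank update.
d_in, d_out, rank = 2048, 2048, 4

W = np.random.randn(d_in, d_out) * 0.02   # frozen pre-trained weights
A = np.random.randn(d_in, rank) * 0.01    # trainable low-rank factor
B = np.zeros((rank, d_out))               # trainable factor, initialized to zero

def lora_forward(x):
    # Output of the adapted layer: x @ W (frozen) + x @ A @ B (trainable update)
    return x @ W + (x @ A) @ B

full_params = W.size            # 4,194,304 weights in the full matrix
lora_params = A.size + B.size   # 16,384 weights, roughly 0.4% of the full matrix
print(f"LoRA trains {lora_params} of {full_params} parameters")
</pre>
<p>With rank 4, the update adds well under one percent of the original matrix's parameters, which is why the trainable-parameter count drops so sharply when LoRA is enabled later in this walkthrough.</p>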
<p><strong>So let's get started with fine-tuning the Gemma model with LoRA.</strong></p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UkJdQkyIoYu3I-OH3bJYoA.jpeg" /></figure>
<p>For this demo I will be using a Google Colab notebook to get some horsepower from a T4 GPU.</p>
<p><strong>Step 1: Get access to Gemma</strong></p>
<p>To complete this Colab, you will first need to complete the setup instructions at <a href="https://ai.google.dev/gemma/docs/setup">Gemma setup</a>. The Gemma setup instructions show you how to do the following:</p>
<ul><li>Get access to Gemma on <a href="https://kaggle.com/">kaggle.com</a>.</li><li>Select a Colab runtime with sufficient resources to run the Gemma 2B model.</li><li>Generate and configure a Kaggle username and API key.</li></ul>
<p>After you've completed the Gemma setup, move on to the next section, where you'll set environment variables for your Colab environment.</p>
<p><strong>Step 2: Select the runtime</strong></p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/468/0*6QanOb6XI087IwsU" /></figure>
<h4><strong>Step 3: Configure your secrets, i.e. your Kaggle username and key, in the Account tab</strong></h4>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/834/0*m_Kg4FsICCsvEH73" /></figure>
<p><strong>Step 4: Select the data for fine-tuning from Hugging Face: the <a href="https://huggingface.co/datasets/databricks/databricks-dolly-15k">Databricks Dolly 15k dataset</a>.</strong> This dataset contains 15,000 high-quality human-generated prompt/response pairs specifically designed for fine-tuning LLMs. A brief screenshot of the dataset:</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/997/0*wmgP3pZMciNG5Ix5" /></figure>
<p><strong>Step 5: Set the environment variables by running the commands below in Colab</strong></p>
<pre>
import os
from google.colab import userdata

os.environ["KAGGLE_USERNAME"] = userdata.get('username')
os.environ["KAGGLE_KEY"] = userdata.get('key')
</pre>
<p><strong>Step 6: Install the dependencies</strong></p>
<pre>
!pip install -q -U keras-nlp
!pip install -q -U "keras>=3"
</pre>
<p><strong>Step 7: Select the backend. You may choose PyTorch, TensorFlow, or JAX.</strong></p>
<pre>
os.environ["KERAS_BACKEND"] = "jax"  # Or "torch" or "tensorflow".

# Avoid memory fragmentation on the JAX backend.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00"
</pre>
<p><strong>Step 8: Import the packages, i.e. Keras and KerasNLP</strong></p>
<pre>
import keras
import keras_nlp
</pre>
<p><strong>Step 9: Download the dataset from Hugging Face</strong></p>
<pre>
!wget -O databricks-dolly-15k.jsonl https://huggingface.co/datasets/databricks/databricks-dolly-15k/resolve/main/databricks-dolly-15k.jsonl
</pre>
<p><strong>Step 10: For this demo I will use a subset of 1,000 examples instead of all 15,000. For better fine-tuning you may use more examples.</strong></p>
<pre>
import json

data = []
with open("databricks-dolly-15k.jsonl") as file:
    for line in file:
        features = json.loads(line)
        # Filter out examples with context, to keep it simple.
        if features["context"]:
            continue
        # Format the entire example as a single string.
        template = "Instruction:\n{instruction}\n\nResponse:\n{response}"
        data.append(template.format(**features))

# Only use 1000 training examples, to keep it fast.
data = data[:1000]
</pre>
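<p>Before loading the model, it can help to confirm that the formatting worked as intended. This quick check is my own addition, not part of the original steps:</p>
<pre>
# Optional sanity check: how many examples survived the filter, and what one looks like.
print(len(data))   # 1000 after the slice above
print(data[0])     # "Instruction:\n...\n\nResponse:\n..."
</pre>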
<p><strong>Step 11: Now it's time to load the Gemma 2B base model. You may also try the Gemma 7B base model.</strong></p>
<pre>
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
gemma_lm.summary()
</pre>
<p>You will see the summary output below if everything is working fine.</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/716/0*df55cDERngDI3ig_" /></figure>
<p><strong>Step 12: Let's run inference with the model before fine-tuning.</strong></p>
<p>Pass the prompt "What should I do on a trip to Europe?":</p>
<pre>
prompt = template.format(
    instruction="What should I do on a trip to Europe?",
    response="",
)
sampler = keras_nlp.samplers.TopKSampler(k=5, seed=2)
gemma_lm.compile(sampler=sampler)
print(gemma_lm.generate(prompt, max_length=256))
</pre>
<p><strong>You will see a very generic, bland, and not-so-great response from the base model, as shown below:</strong></p>
<pre>
Instruction:
What should I do on a trip to Europe?

Response:
It's easy, you just need to follow these steps:
First you must book your trip with a travel agency.
Then you must choose a country and a city.
Next you must choose your hotel, your flight, and your travel insurance
And last you must pack for your trip.
</pre>
<p><strong>Step 13: Let's fine-tune with LoRA using the Databricks Dolly 15k dataset.</strong></p>
<p>The LoRA rank controls the expressiveness and precision of the fine-tuning adjustments. A lower rank means fewer trainable parameters and less computational power required, but also a less precise adaptation. You may start with a rank of 4 or 8 for demo/experimentation purposes.</p>
<pre>
gemma_lm.backbone.enable_lora(rank=4)
gemma_lm.summary()
</pre>
<p>Enabling LoRA reduces the number of trainable parameters significantly:</p>
<pre>
Total params: 2,507,536,384 (9.34 GB)
Trainable params: 1,363,968 (5.20 MB)
Non-trainable params: 2,506,172,416 (9.34 GB)
</pre>
<p><strong>Be patient while you run the section below in the Colab notebook: it takes some time, and you will see the loss decrease as training progresses. epochs=1 means the model makes a single pass over the 1,000 training examples.</strong></p>
<pre>
gemma_lm.preprocessor.sequence_length = 512

# AdamW is a common optimizer for transformer models.
optimizer = keras.optimizers.AdamW(
    learning_rate=5e-5,
    weight_decay=0.01,
)
# Exclude layernorm and bias terms from weight decay.
optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=optimizer,
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.fit(data, epochs=1, batch_size=1)
</pre>
<p><strong>The output from the step above shows a significant reduction in loss with just 1,000 examples.</strong></p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*qf-k1hlFro2K4DUh" /></figure>
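<p>At this point the LoRA-adapted weights live only in the Colab session. As an optional extra step (not covered in the original walkthrough), you can persist them with the standard Keras 3 weights API and restore them later; the file name here is just an example.</p>
<pre>
# Save the fine-tuned weights (Keras 3 expects the .weights.h5 suffix).
gemma_lm.save_weights("gemma_2b_dolly_lora.weights.h5")

# Later, or in a fresh session: rebuild the model, enable LoRA with the same
# rank so the variable structure matches, then load the saved weights.
restored = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
restored.backbone.enable_lora(rank=4)
restored.load_weights("gemma_2b_dolly_lora.weights.h5")
</pre>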
<p><strong>Step 14: Let's run inference again after fine-tuning.</strong></p>
<p>Pass the same prompt again, "What should I do on a trip to Europe?":</p>
<pre>
prompt = template.format(
    instruction="What should I do on a trip to Europe?",
    response="",
)
sampler = keras_nlp.samplers.TopKSampler(k=5, seed=2)
gemma_lm.compile(sampler=sampler)
print(gemma_lm.generate(prompt, max_length=256))
</pre>
<p><strong>Let me know your results. They should be better than before fine-tuning.</strong></p>
<p>That's it, folks, on Gemma fine-tuning with LoRA. Stay tuned for more updates coming your way on QLoRA.</p>
<p><strong>Signing off… Pritam</strong></p>
<hr><p><a href="https://medium.com/google-cloud/fine-tuning-gemma-with-lora-on-gcp-5d25dbab9e0e">Fine tuning Gemma with LoRA on GCP</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>
Author
Link
Published date
Image url
Feed url
Guid
Hidden blurb
--- !ruby/object:Feedjira::Parser::RSSEntry
title: Fine tuning Gemma with LoRA on GCP
published: 2024-04-16 00:04:52.000000000 Z
categories:
- google-cloud-platform
- finetune-llm
- lora
- machine-learning
- gemma
url: https://medium.com/google-cloud/fine-tuning-gemma-with-lora-on-gcp-5d25dbab9e0e?source=rss----e52cf94d98af---4
entry_id: !ruby/object:Feedjira::Parser::GloballyUniqueIdentifier
  is_perma_link: 'false'
  guid: https://medium.com/p/5d25dbab9e0e
carlessian_info:
  news_filer_version: 2
  newspaper: Google Cloud - Medium
  macro_region: Blogs
rss_fields:
- title
- published
- categories
- url
- entry_id
- content
- author
author: pritam sahoo
Language
Active
Ricc internal notes
Imported via /Users/ricc/git/gemini-news-crawler/webapp/db/seeds.d/import-feedjira.rb on 2024-04-16 21:08:50 +0200. Content is EMPTY here. Entried: title,published,categories,url,entry_id,content,author. TODO add Newspaper: filename = /Users/ricc/git/gemini-news-crawler/webapp/db/seeds.d/../../../crawler/out/feedjira/Blogs/Google Cloud - Medium/2024-04-16-Fine_tuning_Gemma_with_LoRA_on_GCP-v2.yaml
Ricc source