♊️ GemiNews 🗞️
(dev)
Editing article
Title
Summary
Content
<p>My obsession with Gemma continues. Folks new to the Gemma models can revisit my previous blog <a href="https://medium.com/google-cloud/gemma-open-models-from-google-0045263e53d2">post</a>.</p>
<p>In brief, Gemma is a family of lightweight, state-of-the-art (SOTA) open models built with the same research and technology that powers the Google Gemini models.</p>
<p>In this blog we will get started with fine-tuning Gemma using LoRA.</p>
<p>First, a bit of background on fine-tuning. One reason fine-tuning is gaining traction is that Large Language Models (LLMs) are not trained on task- or domain-specific data. LLMs, often called foundation models, are pre-trained on a massive, internet-scale corpus of text. Fully retraining a pre-trained LLM is technically challenging, with the cost of computational resources being one of the major concerns.</p>
<p>Let's look at the benefits of fine-tuning:</p>
<ol><li>Fine-tuning a pre-trained model is much faster and more cost-effective, since it requires far fewer computational resources.</li><li>It gives better performance on domain-specific tasks, especially for industry use cases in financial services, insurance, healthcare, and so on.</li><li>It also helps democratize GenAI models for individual users, i.e. developers and others with limited computational power.</li></ol>
<p>Next, Parameter-Efficient Fine-Tuning, a.k.a. PEFT. PEFT is a subset of fine-tuning that updates only a small subset of the model's parameters instead of all of them, thereby reducing computational and memory requirements while still improving the performance of the base (foundation) LLM on specific tasks. This is especially useful when adapting large models from Google such as Gemini and its variants, PaLM, and the open Gemma models.</p>
<p>We will explore fine-tuning Gemma with <strong>LoRA</strong>. <strong>LoRA</strong> stands for Low-Rank Adaptation of Large Language Models. It is a technique that greatly reduces the number of trainable parameters for downstream tasks by freezing the weights of the base model and injecting a small number of new, trainable weights into it.</p>
<p><strong>Crucial point to consider:</strong> in LoRA, the starting-point hypothesis is important. It assumes that the pre-trained model's weights are already close to an optimal solution for the downstream task, so only a small, low-rank update is needed.</p>
<p>Advantages of using LoRA as a fine-tuning technique (a minimal sketch of the idea follows this list):</p>
<ol><li>Reduced parameter and memory footprint: LoRA significantly reduces the number of trainable parameters, making fine-tuning much more memory-efficient and computationally cheaper.</li><li>Faster fine-tuning, since only a small number of added weights are trained, and those adapters are tiny compared to the base model.</li><li>Maintained performance: LoRA has been shown to stay close to traditional full fine-tuning on several tasks.</li></ol>
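<p>To make the low-rank idea concrete, here is a minimal NumPy sketch of what LoRA does to a single weight matrix. This is illustrative only (a toy example of my own, not the KerasNLP implementation): the pre-trained matrix W is frozen, and only the two small factors A and B are trained.</p>
<pre>
import numpy as np

# Toy LoRA sketch: one frozen weight matrix plus a trainable low-rank update.
d_in, d_out, rank = 2048, 2048, 4

W = np.random.randn(d_in, d_out) * 0.02   # frozen pre-trained weights
A = np.random.randn(d_in, rank) * 0.01    # trainable low-rank factor
B = np.zeros((rank, d_out))               # trainable factor, initialized to zero

def lora_forward(x):
    # Output of the adapted layer: x @ W (frozen) + x @ A @ B (trainable update)
    return x @ W + (x @ A) @ B

full_params = W.size            # 4,194,304 weights in the full matrix
lora_params = A.size + B.size   # 16,384 weights, roughly 0.4% of the full matrix
print(f"LoRA trains {lora_params} of {full_params} parameters")
</pre>
<p>With rank 4, the update adds well under one percent of the original matrix's parameters, which is why the trainable-parameter count drops so sharply when LoRA is enabled later in this walkthrough.</p>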
<p><strong>So let's get started with fine-tuning the Gemma model with LoRA.</strong></p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UkJdQkyIoYu3I-OH3bJYoA.jpeg" /></figure>
<p>For this demo I will be using a Google Colab notebook to get some horsepower from a T4 GPU.</p>
<p><strong>Step 1: Get access to Gemma</strong></p>
<p>To complete this Colab, you will first need to complete the setup instructions at <a href="https://ai.google.dev/gemma/docs/setup">Gemma setup</a>. The Gemma setup instructions show you how to do the following:</p>
<ul><li>Get access to Gemma on <a href="https://kaggle.com/">kaggle.com</a>.</li><li>Select a Colab runtime with sufficient resources to run the Gemma 2B model.</li><li>Generate and configure a Kaggle username and API key.</li></ul>
<p>After you've completed the Gemma setup, move on to the next section, where you'll set environment variables for your Colab environment.</p>
<p><strong>Step 2: Select the runtime</strong></p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/468/0*6QanOb6XI087IwsU" /></figure>
<h4><strong>Step 3: Configure your secrets, i.e. your Kaggle username and key, in the Account tab</strong></h4>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/834/0*m_Kg4FsICCsvEH73" /></figure>
<p><strong>Step 4: Select the data for fine-tuning from Hugging Face: the <a href="https://huggingface.co/datasets/databricks/databricks-dolly-15k">Databricks Dolly 15k dataset</a>.</strong> This dataset contains 15,000 high-quality human-generated prompt/response pairs specifically designed for fine-tuning LLMs. A brief screenshot of the dataset:</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/997/0*wmgP3pZMciNG5Ix5" /></figure>
<p><strong>Step 5: Set the environment variables by running the commands below in Colab</strong></p>
<pre>
import os
from google.colab import userdata

os.environ["KAGGLE_USERNAME"] = userdata.get('username')
os.environ["KAGGLE_KEY"] = userdata.get('key')
</pre>
<p><strong>Step 6: Install the dependencies</strong></p>
<pre>
!pip install -q -U keras-nlp
!pip install -q -U "keras>=3"
</pre>
<p><strong>Step 7: Select the backend. You may choose PyTorch, TensorFlow, or JAX.</strong></p>
<pre>
os.environ["KERAS_BACKEND"] = "jax"  # Or "torch" or "tensorflow".

# Avoid memory fragmentation on the JAX backend.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = "1.00"
</pre>
<p><strong>Step 8: Import the packages, i.e. Keras and KerasNLP</strong></p>
<pre>
import keras
import keras_nlp
</pre>
<p><strong>Step 9: Download the dataset from Hugging Face</strong></p>
<pre>
!wget -O databricks-dolly-15k.jsonl https://huggingface.co/datasets/databricks/databricks-dolly-15k/resolve/main/databricks-dolly-15k.jsonl
</pre>
<p><strong>Step 10: For this demo I will use a subset of 1,000 examples instead of all 15,000. For better fine-tuning you may use more examples.</strong></p>
<pre>
import json

data = []
with open("databricks-dolly-15k.jsonl") as file:
    for line in file:
        features = json.loads(line)
        # Filter out examples with context, to keep it simple.
        if features["context"]:
            continue
        # Format the entire example as a single string.
        template = "Instruction:\n{instruction}\n\nResponse:\n{response}"
        data.append(template.format(**features))

# Only use 1000 training examples, to keep it fast.
data = data[:1000]
</pre>
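<p>Before loading the model, it can help to confirm that the formatting worked as intended. This quick check is my own addition, not part of the original steps:</p>
<pre>
# Optional sanity check: how many examples survived the filter, and what one looks like.
print(len(data))   # 1000 after the slice above
print(data[0])     # "Instruction:\n...\n\nResponse:\n..."
</pre>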
<p><strong>Step 11: Now it's time to load the Gemma 2B base model. You may also try the Gemma 7B base model.</strong></p>
<pre>
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
gemma_lm.summary()
</pre>
<p>You will see the summary output below if everything is working fine.</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/716/0*df55cDERngDI3ig_" /></figure>
<p><strong>Step 12: Let's run inference with the model before fine-tuning.</strong></p>
<p>Pass the prompt "What should I do on a trip to Europe?":</p>
<pre>
prompt = template.format(
    instruction="What should I do on a trip to Europe?",
    response="",
)
sampler = keras_nlp.samplers.TopKSampler(k=5, seed=2)
gemma_lm.compile(sampler=sampler)
print(gemma_lm.generate(prompt, max_length=256))
</pre>
<p><strong>You will see a very generic, bland, and not-so-great response from the base model, as shown below:</strong></p>
<pre>
Instruction:
What should I do on a trip to Europe?

Response:
It's easy, you just need to follow these steps:
First you must book your trip with a travel agency.
Then you must choose a country and a city.
Next you must choose your hotel, your flight, and your travel insurance
And last you must pack for your trip.
</pre>
<p><strong>Step 13: Let's fine-tune with LoRA using the Databricks Dolly 15k dataset.</strong></p>
<p>The LoRA rank controls the expressiveness and precision of the fine-tuning adjustments. A lower rank means fewer trainable parameters and less computational power required, but also a less precise adaptation. You may start with a rank of 4 or 8 for demo/experimentation purposes.</p>
<pre>
gemma_lm.backbone.enable_lora(rank=4)
gemma_lm.summary()
</pre>
<p>Enabling LoRA reduces the number of trainable parameters significantly:</p>
<pre>
Total params: 2,507,536,384 (9.34 GB)
Trainable params: 1,363,968 (5.20 MB)
Non-trainable params: 2,506,172,416 (9.34 GB)
</pre>
<p><strong>Be patient while you run the section below in the Colab notebook: it takes some time, and you will see the loss decrease as training progresses. epochs=1 means the model makes a single pass over the 1,000 training examples.</strong></p>
<pre>
gemma_lm.preprocessor.sequence_length = 512

# AdamW is a common optimizer for transformer models.
optimizer = keras.optimizers.AdamW(
    learning_rate=5e-5,
    weight_decay=0.01,
)
# Exclude layernorm and bias terms from weight decay.
optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=optimizer,
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.fit(data, epochs=1, batch_size=1)
</pre>
<p><strong>The output from the step above shows a significant reduction in loss with just 1,000 examples.</strong></p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*qf-k1hlFro2K4DUh" /></figure>
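<p>At this point the LoRA-adapted weights live only in the Colab session. As an optional extra step (not covered in the original walkthrough), you can persist them with the standard Keras 3 weights API and restore them later; the file name here is just an example.</p>
<pre>
# Save the fine-tuned weights (Keras 3 expects the .weights.h5 suffix).
gemma_lm.save_weights("gemma_2b_dolly_lora.weights.h5")

# Later, or in a fresh session: rebuild the model, enable LoRA with the same
# rank so the variable structure matches, then load the saved weights.
restored = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
restored.backbone.enable_lora(rank=4)
restored.load_weights("gemma_2b_dolly_lora.weights.h5")
</pre>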
<p><strong>Step 14: Let's run inference again after fine-tuning.</strong></p>
<p>Pass the same prompt again, "What should I do on a trip to Europe?":</p>
<pre>
prompt = template.format(
    instruction="What should I do on a trip to Europe?",
    response="",
)
sampler = keras_nlp.samplers.TopKSampler(k=5, seed=2)
gemma_lm.compile(sampler=sampler)
print(gemma_lm.generate(prompt, max_length=256))
</pre>
<p><strong>Let me know your results. They should be better than before fine-tuning.</strong></p>
<p>That's it, folks, on Gemma fine-tuning with LoRA. Stay tuned for more updates coming your way on QLoRA.</p>
<p><strong>Signing off… Pritam</strong></p>
<hr><p><a href="https://medium.com/google-cloud/fine-tuning-gemma-with-lora-on-gcp-5d25dbab9e0e">Fine tuning Gemma with LoRA on GCP</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>
Author
Link
Published date
Image url
Feed url
Guid
Hidden blurb
--- !ruby/object:Feedjira::Parser::RSSEntry
title: Fine tuning Gemma with LoRA on GCP
published: 2024-04-16 00:04:52.000000000 Z
categories:
- google-cloud-platform
- finetune-llm
- lora
- machine-learning
- gemma
url: https://medium.com/google-cloud/fine-tuning-gemma-with-lora-on-gcp-5d25dbab9e0e?source=rss----e52cf94d98af---4
entry_id: !ruby/object:Feedjira::Parser::GloballyUniqueIdentifier
  is_perma_link: 'false'
  guid: https://medium.com/p/5d25dbab9e0e
carlessian_info:
  news_filer_version: 2
  newspaper: Google Cloud - Medium
  macro_region: Blogs
rss_fields:
- title
- published
- categories
- url
- entry_id
- content
- author
author: pritam sahoo
Language
Active
Ricc internal notes
Imported via /Users/ricc/git/gemini-news-crawler/webapp/db/seeds.d/import-feedjira.rb on 2024-04-16 21:08:50 +0200. Content is EMPTY here. Entried: title,published,categories,url,entry_id,content,author. TODO add Newspaper: filename = /Users/ricc/git/gemini-news-crawler/webapp/db/seeds.d/../../../crawler/out/feedjira/Blogs/Google Cloud - Medium/2024-04-16-Fine_tuning_Gemma_with_LoRA_on_GCP-v2.yaml
Ricc source