♊️ GemiNews 🗞️ (dev)

Demo 1: Embeddings + Recommendation Demo 2: Bella RAGa Demo 3: NewRetriever Demo 4: Assistant function calling

🗞️Hey Gemini, explain me these pictures — in bash ‍

🗿Semantically Similar Articles (by :title_embedding)

Hey Gemini, explain me these pictures — in bash ‍

2023-12-14 - Riccardo Carlesso (from Riccardo Carlesso - Medium)

Hey Gemini, explain me these pictures — in bash 🧑🏽‍💻Gemini is finally out!And… it’s multimodal! Which means, you can use both images and text to ask for questions (the output is only text — for now 😉).Note: all the code snippets (and images) in here are available in my GitHub repo.Let’s get started!Hey Gemini, describe what you see..Let’s start by an album by my favourite band:One of my favourite albums of all time — will Gemini recognize it?$ ./gemini-generic.sh images/genesis-selling-england.jpg Describe what you see# 🤌 QUESTION: Describe what you see# 🌡️ TEMPERATURE: 0.2 # 👀 Examining image images/genesis-selling-england.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 96x96, segment length 16, baseline, precision 8, 536x528, components 3. # ♊ Gemini no Saga answer for you:The cover of Genesis album Selling England by the Pound features a painting by British artist Paul Whitehead. The painting depicts a group of people in a park, with a man sleeping on a bench in the foreground. The people are allwearing clothes from the 1920s or 1930s, and the painting has a nostalgic, almost surreal feel to it. The colors are muted and the figures are slightly blurred, which gives the painting a dreamlike quality. The painting is also full of symbolism, with the sleeping man representing England and the people around him representing the different aspects of English society. The paintinghas been interpreted in many different ways, but it is generally seen as acommentary on the state of England in the 1970s.♊ The cover of Genesis’ album Selling England by the Pound features a painting by British artist Paul Whitehead. The painting depicts a group of people in a park, with a man sleeping on a bench in the foreground. The people are all wearing clothes from the 1920s or 1930s, and the painting has a nostalgic, almost surreal feel to it. The colors are muted and the figures are slightly blurred, which gives the painting a dreamlike quality. The painting is also full of symbolism, with the sleeping man representing England and the people around him representing the different aspects of English society. The painting has been interpreted in many different ways, but it is generally seen as a commentary on the state of England in the 1970s.A quick googling confirms that Paul Whitehead actually covered one of my favourite album of all times. If you love Genesis too and want to see me play Firth of Fifth, please check my youtube video.Let’s try this cat now:This is a cat from Google sample imagesGemini: A cat is walking in the snow. The cat is brown and white and has a long tail. The cat is looking at the camera. The snow is white and is covering the ground.How did I do it?If you check my bash script (gemini-generic.sh), you can see the request looks like this:data=$(_base64_encode_mac_or_linux "$IMAGE") # Mac or Linux should both work!# ...cat > "$REQUEST_FILE" <<EOF{'contents': { 'role': 'USER', 'parts': [ {'text': '$QUESTION'}, {'inline_data': { 'data': '$data', 'mime_type':'image/jpeg'}}] }}EOFThe hardest part is actually the base64conversion, and making it work both on my Mac and on my Linux (as our grand-parents say in Italy “You wanted /bin/bash.? Now you pedal!”).An Italian image, explained in ItalianThe city of XXX, in Italy. Seen from the YYY palace (surprise!). Picture from my colleague Daria../gemini-explain-image.sh images/italian-town.jpg[..] This is a view of the city of Trento, Italy from the Buonconsiglio Castle.Gemini (🇬🇧): This is a view of the city of Trento, Italy from the Buonconsiglio Castle.Let’s ask the same in Italian:$ GENERATE_MP3=true ./gemini-explain-image-italian.sh images/italian-town.jpg# 🤌 QUESTION: Descrivimi cosa vedi in questa immagine# 🌡️ TEMPERATURE: 0.2# 👀 Examining image images/italian-town.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=1, software=Google], baseline, precision 8, 926x1230, components 3.# ♊ Gemini no Saga answer for you: La foto mostra una loggia con delle colonne in pietra che incorniciano la vista di una città. La città è circondata da montagne e si possono vedere i tetti delle case e le torri delle chiese. Il cielo è azzurro e ci sono delle nuvole bianche.# TTS_LANG: it-ITWritten .tmp.tts-output.json. curl_ret=0t.audio.encoded: ASCII text, with very long lines (65536), with no line terminatorst.mp3: MPEG ADTS, layer III, v2, 32 kbps, 24 kHz, Monauralt.mp3: MPEG ADTS, layer III, v2, 32 kbps, 24 kHz, MonauralAll good. MP3 created: 't.La foto mostra una loggia con delle colonne in pie.mp3'Gemini (🇮🇹): La foto mostra una loggia con delle colonne in pietra che incorniciano la vista di una città. La città è circondata da montagne e si possono vedere i tetti delle case e le torri delle chiese. Il cielo è azzurro e ci sono delle nuvole bianche.As you see, Italian is more verbose (my wife says the same about me!) but it’s not aware of the Buonconsiglio Palace. Interesting! I presume the Italian model has less training material to learn from than the English one.Makes sense.Btw, I highly recommend Trento, I was cycling around there: great views and great wines!Comparing two imagesHow about comparing two images?Gemini constellation vs Saint Seiya’s Gemini saint.Some sample code is in my Makefile for convenience. try this:ricc@derek:~/git/genai-googlecloud-scripts/07-bash-gemini-multimodal-jam$ 🐼 !makmake compare-two-geminis ./gemini-generic-two-pics.sh images/gemini-constellation.png images/saga-blue-hair.jpg♊️ Question: Can you highlight similarity and differences between the two? Also, do you recognize the same person in both of them? 👀 Examining image1 images/gemini-constellation.png: images/gemini-constellation.png: PNG image data, 1675 x 1302, 8-bit/color RGBA, non-interlaced. 👀 Examining image2 images/saga-blue-hair.jpg: images/saga-blue-hair.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 193x261, components 3. ♊️ Describing attached image: The two images are of the constellation Gemini and the anime character Gemini Saga. The constellation is said to represent the twins Castor and Pollux, while the character Gemini Saga is a Gemini Saint in the anime series Saint Seiya. Both images depict two figures that are connected to each other. The constellation is made up of stars, while the character is a human.♊ Response: ️ The two images are of the constellation Gemini and the anime character Gemini Saga. The constellation is said to represent the twins Castor and Pollux, while the character Gemini Saga is a Gemini Saint in the anime series Saint Seiya. Both images depict two figures that are connected to each other. The constellation is made up of stars, while the character is a human.Today we’ve learnt something we certainly didn’t know: a constellation is made of stars, while the guy on the right is human 😃Introducing Audio!Why don’t we throw some audio in the mix?I’ve added a tts.sh , which creates an MP3 out of an english (or Italian!) text given in ARGV. Convenient uh?My kids beside American Santa Klaus (we live in Zurich, there’s a Santa for every language and ethnicity!)A few weeks ago my kids went to visit Santa Klaus.. the American Santa — they’ll tell you there’s another who speaks German instead, he’s called Samiklaus, they still can’t decide if its the same guy — my kids are X and Y year old; how old? Let’s ask Gemini!$ make age-test# equivalent to:$ GENERATE_MP3=true ./gemini-generic.sh images/ricc-family-with-santa.jpg Tell me the age of the people you see, from left to right.# 🤌 QUESTION: Tell me the age of the people you see, from left to right.# 🌡️ TEMPERATURE: 0.2# 👀 Examining image images/ricc-family-with-santa.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=3, software=Google], baseline, precision 8, 1164x826, components 3.# ♊ Gemini no Saga answer for you:1. 30-352. 2-33. 40-454. 2-35. 60-65[..]All good. MP3 created [..]Now I’m older than 40–45 (46) so technically speaking this is a mistake, but as a human interaction I will consider it as a success 😉Now this is not the best of MP3s, but it works! Check out ricc-family-with-santa.jpg.mp3.You can also find Trento in Italian here: italian-town.jpg.mp3.Something useful now: understand a diagram!A colleague told me yesterday: the power of Gemini is that not only “he” understands text and images, also the very complex relationship between items in a diagram. This morning my Accrux earphones wouldn’t work, so I decided to understand how they work. I downloaded the User Manual, uploaded to Gemini as image, and asked Gemini a few questions.I’m not so young anymore so I can’t read this unless it’s printed on A3. And I respect Amazon too much!From UI, this is easy peasy. Upload image and ask question:And what about bash? Let’s do the same from CLI:$ make read-instruction-manual-for-me [..]./gemini-generic.sh images/instruction-manuals/Acrux-User-Manual-4700503.png '1. How do i TURN it on? 2. Where is the power button located? 3. Is this the one called ANC?'# 🤌 QUESTION: 1. How do i TURN it on? 2. Where is the power button located? 3. Is this the one called ANC?# 🌡️ TEMPERATURE: 0.2 # 👀 Examining image images/instruction-manuals/Acrux-User-Manual-4700503.png: PNG image data, 1664 x 929, 8-bit/color RGBA, non-interlaced. # ♊ Gemini no Saga answer for you:1. Long press the power button for 2 seconds.2. The power button is located on the right earcup.3. Yes, this is the one called ANC.# Note: No mp3 file generated (use GENERATE_MP3=true to generate one)My question (I actually asked the questions one by one, so think of it as an avalanche-chat):How do i TURN it on?Where is the power button located?Is this the one called ANC?# ♊ Gemini answer:1. Long press the power button for 2 seconds.2. The power button is located on the right earcup.3. Yes, this is the one called ANC.Bingo!An unexpected Games of Thrones plot twistThis is the avatar I use in Google. I’m really curious to see what gemini can infer from here, also because I used some filters.Riccardo in the Amsterdam office, a few years/kilos ago.$ ./gemini-explain-image.sh images/ricc-logo.png[..]This is a photo of a man standing behind a Google Cloud Platform cutout.The man is smiling and wearing a shirt that says, “That’s what I do,I drink and I know things.” The background is a brick wall with blueand white accents.♊ Gemini: This is a photo of a man standing behind a Google Cloud Platform cutout. The man is smiling and wearing a shirt that says, “That’s what I do,I drink and I know things.” The background is a brick wall with blueand white accents.And I thought! Of course, this is my favourite Games of Thrones t-shirt. Let’s see if he knows:Q: Do you recognize the quote in this person tshirt?$ GENERATE_MP3=true ./gemini-generic.sh images/ricc-logo.png Do you recognize the quote in this person tshirt[..] "That's what I do, I drink and I know things" is a quote from the TV show Game of Thrones, said by the character Tyrion Lannister.♊ Response: “That’s what I do, I drink and I know things” is a quote from the TV show Game of Thrones, said by the character Tyrion Lannister.And you also have the MP3 if you wish: images/ricc-logo.png.mp3Chapeau, Gemini!ConclusionsCalling Gemini is easy, fun and powerful!Easy. If you can invoke Gemini via curl, imagine how easy it is to invoke it with your favourite language! Try out our SDK in Python, Node, Java, Go. For everything else there’s REST/cURL!Fun. look at how many use cases I’ve shown in this page: comparing two images, asking questions abuot an image, a writing, age of people, .. possibilities are endless.Powerful. Multi-modal is a game changer. And for what I hear, more modes are coming. 😉Did you like this? Want more of this? Write it in the comments!RiccardoHey Gemini, explain me these pictures — in bash 🧑🏽‍💻 was originally published in DevOps.dev on Medium, where people are continuing the conversation by highlighting and responding to this story.

[Blogs] 🌎 https://blog.devops.dev/hey-gemini-explain-me-these-pictures-in-bash-06c03d0d0512?source=rss-b5293b96912f------2 [🧠] [v2] article_embedding_description: {:llm_project_id=>"Unavailable", :llm_dimensions=>nil, :article_size=>16677, :llm_embeddings_model_name=>"textembedding-gecko"}
[🧠] [v1/3] title_embedding_description: {:ricc_notes=>"[embed-v3] Fixed on 9oct24. Only seems incompatible at first glance with embed v1.", :llm_project_id=>"unavailable possibly not using Vertex", :llm_dimensions=>nil, :article_size=>16677, :poly_field=>"title", :llm_embeddings_model_name=>"textembedding-gecko"}
[🧠] [v1/3] summary_embedding_description:
[🧠] As per bug https://github.com/palladius/gemini-news-crawler/issues/4 we can state this article belongs to titile/summary version: v3 (very few articles updated on 9oct24)

🗿article.to_s

------------------------------
Title: Hey Gemini, explain me these pictures — in bash ‍
[content]
Hey Gemini, explain me these pictures — in bash 🧑🏽‍💻Gemini is finally out!And… it’s multimodal! Which means, you can use both images and text to ask for questions (the output is only text — for now 😉).Note: all the code snippets (and images) in here are available in my GitHub repo.Let’s get started!Hey Gemini, describe what you see..Let’s start by an album by my favourite band:One of my favourite albums of all time — will Gemini recognize it?$ ./gemini-generic.sh images/genesis-selling-england.jpg Describe what you see# 🤌  QUESTION: Describe what you see# 🌡️ TEMPERATURE: 0.2 # 👀 Examining image images/genesis-selling-england.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 96x96, segment length 16, baseline, precision 8, 536x528, components 3. # ♊ Gemini no Saga answer for you:The cover of Genesis album Selling England by the Pound features a painting by British artist Paul Whitehead. The painting depicts a group of people in a park, with a man sleeping on a bench in the foreground. The people are allwearing clothes from the 1920s or 1930s, and the painting has a nostalgic, almost surreal feel to it. The colors are muted and the figures are slightly blurred, which gives the painting a dreamlike quality. The painting is also full of symbolism, with the sleeping man representing England and the people around him representing the different aspects of English society. The paintinghas been interpreted in many different ways, but it is generally seen as acommentary on the state of England in the 1970s.♊ The cover of Genesis’ album Selling England by the Pound features a painting by British artist Paul Whitehead. The painting depicts a group of people in a park, with a man sleeping on a bench in the foreground. The people are all wearing clothes from the 1920s or 1930s, and the painting has a nostalgic, almost surreal feel to it. The colors are muted and the figures are slightly blurred, which gives the painting a dreamlike quality. The painting is also full of symbolism, with the sleeping man representing England and the people around him representing the different aspects of English society. The painting has been interpreted in many different ways, but it is generally seen as a commentary on the state of England in the 1970s.A quick googling confirms that Paul Whitehead actually covered one of my favourite album of all times. If you love Genesis too and want to see me play Firth of Fifth, please check my youtube video.Let’s try this cat now:This is a cat from Google sample imagesGemini: A cat is walking in the snow. The cat is brown and white and has a long tail. The cat is looking at the camera. The snow is white and is covering the ground.How did I do it?If you check my bash script (gemini-generic.sh), you can see the request looks like this:data=$(_base64_encode_mac_or_linux "$IMAGE") # Mac or Linux should both work!# ...cat > "$REQUEST_FILE" <<EOF{'contents': {      'role': 'USER',      'parts': [        {'text': '$QUESTION'},        {'inline_data': {            'data': '$data',            'mime_type':'image/jpeg'}}]    }}EOFThe hardest part is actually the base64conversion, and making it work both on my Mac and on my Linux (as our grand-parents say in Italy “You wanted /bin/bash.? Now you pedal!”).An Italian image, explained in ItalianThe city of XXX, in Italy. Seen from the YYY palace (surprise!). Picture from my colleague Daria../gemini-explain-image.sh images/italian-town.jpg[..] This is a view of the city of Trento, Italy from the Buonconsiglio Castle.Gemini (🇬🇧): This is a view of the city of Trento, Italy from the Buonconsiglio Castle.Let’s ask the same in Italian:$ GENERATE_MP3=true ./gemini-explain-image-italian.sh images/italian-town.jpg# 🤌  QUESTION: Descrivimi cosa vedi in questa immagine# 🌡️  TEMPERATURE: 0.2# 👀 Examining image images/italian-town.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=1, software=Google], baseline, precision 8, 926x1230, components 3.# ♊ Gemini no Saga answer for you: La foto mostra una loggia con delle colonne in pietra che incorniciano la vista di una città. La città è circondata da montagne e si possono vedere i tetti delle case e le torri delle chiese. Il cielo è azzurro e ci sono delle nuvole bianche.# TTS_LANG: it-ITWritten .tmp.tts-output.json. curl_ret=0t.audio.encoded: ASCII text, with very long lines (65536), with no line terminatorst.mp3:           MPEG ADTS, layer III, v2,  32 kbps, 24 kHz, Monauralt.mp3: MPEG ADTS, layer III, v2,  32 kbps, 24 kHz, MonauralAll good. MP3 created: 't.La foto mostra una loggia con delle colonne in pie.mp3'Gemini (🇮🇹): La foto mostra una loggia con delle colonne in pietra che incorniciano la vista di una città. La città è circondata da montagne e si possono vedere i tetti delle case e le torri delle chiese. Il cielo è azzurro e ci sono delle nuvole bianche.As you see, Italian is more verbose (my wife says the same about me!) but it’s not aware of the Buonconsiglio Palace. Interesting! I presume the Italian model has less training material to learn from than the English one.Makes sense.Btw, I highly recommend Trento, I was cycling around there: great views and great wines!Comparing two imagesHow about comparing two images?Gemini constellation vs Saint Seiya’s Gemini saint.Some sample code is in my Makefile for convenience. try this:ricc@derek:~/git/genai-googlecloud-scripts/07-bash-gemini-multimodal-jam$ 🐼 !makmake compare-two-geminis ./gemini-generic-two-pics.sh  images/gemini-constellation.png   images/saga-blue-hair.jpg♊️ Question: Can you highlight similarity and differences between the two? Also, do you recognize the same person in both of them? 👀 Examining image1 images/gemini-constellation.png: images/gemini-constellation.png: PNG image data, 1675 x 1302, 8-bit/color RGBA, non-interlaced.  👀 Examining image2 images/saga-blue-hair.jpg: images/saga-blue-hair.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 193x261, components 3. ♊️ Describing attached image: The two images are of the constellation Gemini and the anime character Gemini Saga. The constellation is said to represent the twins Castor and Pollux, while the character Gemini Saga is a Gemini Saint in the anime series Saint Seiya. Both images depict two figures that are connected to each other. The constellation is made up of stars, while the character is a human.♊ Response: ️ The two images are of the constellation Gemini and the anime character Gemini Saga. The constellation is said to represent the twins Castor and Pollux, while the character Gemini Saga is a Gemini Saint in the anime series Saint Seiya. Both images depict two figures that are connected to each other. The constellation is made up of stars, while the character is a human.Today we’ve learnt something we certainly didn’t know: a constellation is made of stars, while the guy on the right is human 😃Introducing Audio!Why don’t we throw some audio in the mix?I’ve added a tts.sh , which creates an MP3 out of an english (or Italian!) text given in ARGV. Convenient uh?My kids beside American Santa Klaus (we live in Zurich, there’s a Santa for every language and ethnicity!)A few weeks ago my kids went to visit Santa Klaus.. the American Santa — they’ll tell you there’s another who speaks German instead, he’s called Samiklaus, they still can’t decide if its the same guy — my kids are X and Y year old; how old? Let’s ask Gemini!$ make age-test# equivalent to:$ GENERATE_MP3=true ./gemini-generic.sh images/ricc-family-with-santa.jpg Tell me the age of the people you see, from left to right.# 🤌  QUESTION: Tell me the age of the people you see, from left to right.# 🌡️  TEMPERATURE: 0.2# 👀 Examining image images/ricc-family-with-santa.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=3, software=Google], baseline, precision 8, 1164x826, components 3.# ♊ Gemini no Saga answer for you:1. 30-352. 2-33. 40-454. 2-35. 60-65[..]All good. MP3 created [..]Now I’m older than 40–45 (46) so technically speaking this is a mistake, but as a human interaction I will consider it as a success 😉Now this is not the best of MP3s, but it works! Check out ricc-family-with-santa.jpg.mp3.You can also find Trento in Italian here: italian-town.jpg.mp3.Something useful now: understand a diagram!A colleague told me yesterday: the power of Gemini is that not only “he” understands text and images, also the very complex relationship between items in a diagram. This morning my Accrux earphones wouldn’t work, so I decided to understand how they work. I downloaded the User Manual, uploaded to Gemini as image, and asked Gemini a few questions.I’m not so young anymore so I can’t read this unless it’s printed on A3. And I respect Amazon too much!From UI, this is easy peasy. Upload image and ask question:And what about bash? Let’s do the same from CLI:$ make read-instruction-manual-for-me [..]./gemini-generic.sh images/instruction-manuals/Acrux-User-Manual-4700503.png '1. How do i TURN it on? 2. Where is the power button located? 3. Is this the one called ANC?'# 🤌  QUESTION: 1. How do i TURN it on? 2. Where is the power button located? 3. Is this the one called ANC?# 🌡️  TEMPERATURE: 0.2 # 👀 Examining image images/instruction-manuals/Acrux-User-Manual-4700503.png: PNG image data, 1664 x 929, 8-bit/color RGBA, non-interlaced. # ♊ Gemini no Saga answer for you:1. Long press the power button for 2 seconds.2. The power button is located on the right earcup.3. Yes, this is the one called ANC.# Note: No mp3 file generated (use GENERATE_MP3=true to generate one)My question (I actually asked the questions one by one, so think of it as an avalanche-chat):How do i TURN it on?Where is the power button located?Is this the one called ANC?# ♊ Gemini answer:1. Long press the power button for 2 seconds.2. The power button is located on the right earcup.3. Yes, this is the one called ANC.Bingo!An unexpected Games of Thrones plot twistThis is the avatar I use in Google. I’m really curious to see what gemini can infer from here, also because I used some filters.Riccardo in the Amsterdam office, a few years/kilos ago.$ ./gemini-explain-image.sh images/ricc-logo.png[..]This is a photo of a man standing behind a Google Cloud Platform cutout.The man is smiling and wearing a shirt that says, “That’s what I do,I drink and I know things.” The background is a brick wall with blueand white accents.♊ Gemini: This is a photo of a man standing behind a Google Cloud Platform cutout. The man is smiling and wearing a shirt that says, “That’s what I do,I drink and I know things.” The background is a brick wall with blueand white accents.And I thought! Of course, this is my favourite Games of Thrones t-shirt. Let’s see if he knows:Q: Do you recognize the quote in this person tshirt?$ GENERATE_MP3=true ./gemini-generic.sh images/ricc-logo.png Do you recognize the quote in this person tshirt[..] "That's what I do, I drink and I know things" is a quote from the TV show Game of Thrones, said by the character Tyrion Lannister.♊ Response: “That’s what I do, I drink and I know things” is a quote from the TV show Game of Thrones, said by the character Tyrion Lannister.And you also have the MP3 if you wish: images/ricc-logo.png.mp3Chapeau, Gemini!ConclusionsCalling Gemini is easy, fun and powerful!Easy. If you can invoke Gemini via curl, imagine how easy it is to invoke it with your favourite language! Try out our SDK in Python, Node, Java, Go. For everything else there’s REST/cURL!Fun. look at how many use cases I’ve shown in this page: comparing two images, asking questions abuot an image, a writing, age of people, .. possibilities are endless.Powerful. Multi-modal is a game changer. And for what I hear, more modes are coming. 😉Did you like this? Want more of this? Write it in the comments!RiccardoHey Gemini, explain me these pictures — in bash 🧑🏽‍💻 was originally published in DevOps.dev on Medium, where people are continuing the conversation by highlighting and responding to this story.
[/content]

Author: Riccardo Carlesso
PublishedDate: 2023-12-14
Category: Blogs
NewsPaper: Riccardo Carlesso - Medium
Tags: bash, gemini, curl, genai, google
{"id"=>202,
"title"=>"Hey Gemini, explain me these pictures — in bash ‍",
"summary"=>nil,
"content"=>"

Hey Gemini, explain me these pictures — in bash 🧑🏽‍💻

Gemini is finally out!

And… it’s multimodal! Which means, you can use both images and text to ask for questions (the output is only text — for now 😉).

Note: all the code snippets (and images) in here are available in my GitHub repo.

Let’s get started!

Hey Gemini, describe what you see..

Let’s start by an album by my favourite band:

\"\"
One of my favourite albums of all time — will Gemini recognize it?
$ ./gemini-generic.sh images/genesis-selling-england.jpg Describe what you see
# 🤌 QUESTION: Describe what you see
# 🌡️ TEMPERATURE: 0.2
# 👀 Examining image images/genesis-selling-england.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 96x96, segment length 16, baseline, precision 8, 536x528, components 3.
# ♊ Gemini no Saga answer for you:
The cover of Genesis album Selling England by the Pound features a painting
by British artist Paul Whitehead. The painting depicts a group of people in
a park, with a man sleeping on a bench in the foreground. The people are all
wearing clothes from the 1920s or 1930s, and the painting has a nostalgic,
almost surreal feel to it. The colors are muted and the figures are slightly
blurred, which gives the painting a dreamlike quality. The painting is also
full of symbolism, with the sleeping man representing England and the people
around him representing the different aspects of English society. The painting
has been interpreted in many different ways, but it is generally seen as a
commentary on the state of England in the 1970s.

♊ The cover of Genesis’ album Selling England by the Pound features a painting by British artist Paul Whitehead. The painting depicts a group of people in a park, with a man sleeping on a bench in the foreground. The people are all wearing clothes from the 1920s or 1930s, and the painting has a nostalgic, almost surreal feel to it. The colors are muted and the figures are slightly blurred, which gives the painting a dreamlike quality. The painting is also full of symbolism, with the sleeping man representing England and the people around him representing the different aspects of English society. The painting has been interpreted in many different ways, but it is generally seen as a commentary on the state of England in the 1970s.

A quick googling confirms that Paul Whitehead actually covered one of my favourite album of all times. If you love Genesis too and want to see me play Firth of Fifth, please check my youtube video.

Let’s try this cat now:

\"\"
This is a cat from Google sample images

Gemini: A cat is walking in the snow. The cat is brown and white and has a long tail. The cat is looking at the camera. The snow is white and is covering the ground.

How did I do it?

If you check my bash script (gemini-generic.sh), you can see the request looks like this:

data=$(_base64_encode_mac_or_linux "$IMAGE") # Mac or Linux should both work!
# ...
cat > "$REQUEST_FILE" <<EOF
{'contents': {
'role': 'USER',
'parts': [
{'text': '$QUESTION'},
{'inline_data': {
'data': '$data',
'mime_type':'image/jpeg'}}]
}

}
EOF

The hardest part is actually the base64conversion, and making it work both on my Mac and on my Linux (as our grand-parents say in Italy “You wanted /bin/bash.? Now you pedal!”).

An Italian image, explained in Italian

\"\"
The city of XXX, in Italy. Seen from the YYY palace (surprise!). Picture from my colleague Daria.
./gemini-explain-image.sh images/italian-town.jpg
[..]
This is a view of the city of Trento, Italy from the Buonconsiglio Castle.

Gemini (🇬🇧): This is a view of the city of Trento, Italy from the Buonconsiglio Castle.

Let’s ask the same in Italian:

$ GENERATE_MP3=true ./gemini-explain-image-italian.sh images/italian-town.jpg
# 🤌 QUESTION: Descrivimi cosa vedi in questa immagine
# 🌡️ TEMPERATURE: 0.2
# 👀 Examining image images/italian-town.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=1, software=Google], baseline, precision 8, 926x1230, components 3.
# ♊ Gemini no Saga answer for you:
La foto mostra una loggia con delle colonne in pietra che incorniciano la vista di una città.
La città è circondata da montagne e si possono vedere i tetti delle case e le torri delle chiese.
Il cielo è azzurro e ci sono delle nuvole bianche.
# TTS_LANG: it-IT
Written .tmp.tts-output.json. curl_ret=0
t.audio.encoded: ASCII text, with very long lines (65536), with no line terminators
t.mp3: MPEG ADTS, layer III, v2, 32 kbps, 24 kHz, Monaural
t.mp3: MPEG ADTS, layer III, v2, 32 kbps, 24 kHz, Monaural
All good. MP3 created: 't.La foto mostra una loggia con delle colonne in pie.mp3'

Gemini (🇮🇹): La foto mostra una loggia con delle colonne in pietra che incorniciano la vista di una città. La città è circondata da montagne e si possono vedere i tetti delle case e le torri delle chiese. Il cielo è azzurro e ci sono delle nuvole bianche.

As you see, Italian is more verbose (my wife says the same about me!) but it’s not aware of the Buonconsiglio Palace. Interesting! I presume the Italian model has less training material to learn from than the English one.

Makes sense.

\"\"
Btw, I highly recommend Trento, I was cycling around there: great views and great wines!

Comparing two images

How about comparing two images?

\"\"
Gemini constellation vs Saint Seiya’s Gemini saint.

Some sample code is in my Makefile for convenience. try this:

ricc@derek:~/git/genai-googlecloud-scripts/07-bash-gemini-multimodal-jam$ 🐼 !mak
make compare-two-geminis
./gemini-generic-two-pics.sh images/gemini-constellation.png images/saga-blue-hair.jpg
♊️ Question: Can you highlight similarity and differences between the two? Also, do you recognize the same person in both of them?
👀 Examining image1 images/gemini-constellation.png: images/gemini-constellation.png: PNG image data, 1675 x 1302, 8-bit/color RGBA, non-interlaced.
👀 Examining image2 images/saga-blue-hair.jpg: images/saga-blue-hair.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 193x261, components 3.
♊️ Describing attached image:
The two images are of the constellation Gemini and the anime character Gemini Saga. The constellation is said to represent the twins Castor and Pollux, while the character Gemini Saga is a Gemini Saint in the anime series Saint Seiya. Both images depict two figures that are connected to each other. The constellation is made up of stars, while the character is a human.

♊ Response: ️ The two images are of the constellation Gemini and the anime character Gemini Saga. The constellation is said to represent the twins Castor and Pollux, while the character Gemini Saga is a Gemini Saint in the anime series Saint Seiya. Both images depict two figures that are connected to each other. The constellation is made up of stars, while the character is a human.

Today we’ve learnt something we certainly didn’t know: a constellation is made of stars, while the guy on the right is human 😃

Introducing Audio!

Why don’t we throw some audio in the mix?

I’ve added a tts.sh , which creates an MP3 out of an english (or Italian!) text given in ARGV. Convenient uh?

\"\"
My kids beside American Santa Klaus (we live in Zurich, there’s a Santa for every language and ethnicity!)

A few weeks ago my kids went to visit Santa Klaus.. the American Santa — they’ll tell you there’s another who speaks German instead, he’s called Samiklaus, they still can’t decide if its the same guy — my kids are X and Y year old; how old? Let’s ask Gemini!

$ make age-test
# equivalent to:
$ GENERATE_MP3=true ./gemini-generic.sh images/ricc-family-with-santa.jpg Tell me the age of the people you see, from left to right.
# 🤌 QUESTION: Tell me the age of the people you see, from left to right.
# 🌡️ TEMPERATURE: 0.2
# 👀 Examining image images/ricc-family-with-santa.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=3, software=Google], baseline, precision 8, 1164x826, components 3.
# ♊ Gemini no Saga answer for you:
1. 30-35
2. 2-3
3. 40-45
4. 2-3
5. 60-65
[..]
All good. MP3 created [..]

Now I’m older than 40–45 (46) so technically speaking this is a mistake, but as a human interaction I will consider it as a success 😉

Now this is not the best of MP3s, but it works! Check out ricc-family-with-santa.jpg.mp3.

You can also find Trento in Italian here: italian-town.jpg.mp3.

Something useful now: understand a diagram!

A colleague told me yesterday: the power of Gemini is that not only “he” understands text and images, also the very complex relationship between items in a diagram. This morning my Accrux earphones wouldn’t work, so I decided to understand how they work. I downloaded the User Manual, uploaded to Gemini as image, and asked Gemini a few questions.

\"\"
I’m not so young anymore so I can’t read this unless it’s printed on A3. And I respect Amazon too much!

From UI, this is easy peasy. Upload image and ask question:

\"\"

And what about bash? Let’s do the same from CLI:

$ make read-instruction-manual-for-me 
[..]
./gemini-generic.sh images/instruction-manuals/Acrux-User-Manual-4700503.png '1. How do i TURN it on? 2. Where is the power button located? 3. Is this the one called ANC?'
# 🤌 QUESTION: 1. How do i TURN it on? 2. Where is the power button located? 3. Is this the one called ANC?
# 🌡️ TEMPERATURE: 0.2
# 👀 Examining image images/instruction-manuals/Acrux-User-Manual-4700503.png: PNG image data, 1664 x 929, 8-bit/color RGBA, non-interlaced.
# ♊ Gemini no Saga answer for you:
1. Long press the power button for 2 seconds.
2. The power button is located on the right earcup.
3. Yes, this is the one called ANC.
# Note: No mp3 file generated (use GENERATE_MP3=true to generate one)

My question (I actually asked the questions one by one, so think of it as an avalanche-chat):

  1. How do i TURN it on?
  2. Where is the power button located?
  3. Is this the one called ANC?

# ♊ Gemini answer:
1. Long press the power button for 2 seconds.
2. The power button is located on the right earcup.
3. Yes, this is the one called ANC.

Bingo!

An unexpected Games of Thrones plot twist

This is the avatar I use in Google. I’m really curious to see what gemini can infer from here, also because I used some filters.

\"\"
Riccardo in the Amsterdam office, a few years/kilos ago.
$ ./gemini-explain-image.sh images/ricc-logo.png
[..]
This is a photo of a man standing behind a Google Cloud Platform cutout.
The man is smiling and wearing a shirt that says, “That’s what I do,
I drink and I know things.” The background is a brick wall with blue
and white accents.

Gemini: This is a photo of a man standing behind a Google Cloud Platform cutout. The man is smiling and wearing a shirt that says, “That’s what I do,
I drink and I know things.” The background is a brick wall with blue
and white accents.

And I thought! Of course, this is my favourite Games of Thrones t-shirt. Let’s see if he knows:

Q: Do you recognize the quote in this person tshirt?

$ GENERATE_MP3=true ./gemini-generic.sh images/ricc-logo.png Do you recognize the quote in this person tshirt
[..]
"That's what I do, I drink and I know things" is a quote from the TV show Game of Thrones,
said by the character Tyrion Lannister.

Response: “That’s what I do, I drink and I know things” is a quote from the TV show Game of Thrones, said by the character Tyrion Lannister.

And you also have the MP3 if you wish: images/ricc-logo.png.mp3

\"\"

Chapeau, Gemini!

Conclusions

Calling Gemini is easy, fun and powerful!

  • Easy. If you can invoke Gemini via curl, imagine how easy it is to invoke it with your favourite language! Try out our SDK in Python, Node, Java, Go. For everything else there’s REST/cURL!
  • Fun. look at how many use cases I’ve shown in this page: comparing two images, asking questions abuot an image, a writing, age of people, .. possibilities are endless.
  • Powerful. Multi-modal is a game changer. And for what I hear, more modes are coming. 😉

Did you like this? Want more of this? Write it in the comments!

Riccardo

\"\"

Hey Gemini, explain me these pictures — in bash 🧑🏽‍💻 was originally published in DevOps.dev on Medium, where people are continuing the conversation by highlighting and responding to this story.

",
"author"=>"Riccardo Carlesso",
"link"=>"https://blog.devops.dev/hey-gemini-explain-me-these-pictures-in-bash-06c03d0d0512?source=rss-b5293b96912f------2",
"published_date"=>Thu, 14 Dec 2023 15:16:16.000000000 UTC +00:00,
"image_url"=>nil,
"feed_url"=>"https://blog.devops.dev/hey-gemini-explain-me-these-pictures-in-bash-06c03d0d0512?source=rss-b5293b96912f------2",
"language"=>nil,
"active"=>true,
"ricc_source"=>"feedjira::v1",
"created_at"=>Sun, 31 Mar 2024 09:22:22.307697000 UTC +00:00,
"updated_at"=>Mon, 21 Oct 2024 15:33:53.788239000 UTC +00:00,
"newspaper"=>"Riccardo Carlesso - Medium",
"macro_region"=>"Blogs"}
Edit this article
Back to articles