"title"=>"Deterministic Generative AI with Gemini Function Calling in Java",
"summary"=>nil,
"content"=>"
Generative AI models are remarkable at understanding and responding to natural language. But what if you need precise, predictable outputs for critical tasks like address standardization? Traditional generative models can sometimes provide different responses at different times for the same prompts, potentially leading to inconsistencies. That’s where Gemini’s Function Calling capability shines, allowing you to deterministically control elements of the AI’s response.
In this blog, we’ll illustrate this concept with the address completion and standardization use case. For this we will be building a Java Cloud Function that:
- Takes latitude and longitude coordinates
- Calls the Google Maps Geocoding API to get corresponding addresses
- Uses the Gemini 1.0 Pro Function Calling feature to deterministically standardize and summarize those addresses in the specific format we need
Let’s dive in!
Gemini Function Calling
Gemini Function Calling stands out in the Generative AI era because it lets you blend the flexibility of generative language models with the precision of traditional programming. Here’s how it works:
Defining Functions: You describe functions as if you were explaining them to a coworker. These descriptions include:
- The function’s name (e.g., “getAddress”)
- The parameters it expects (e.g., “latlng” as a string)
- The type of data it returns (e.g., a list of address strings)
“Tools” for Gemini: You package function descriptions in the form of API specification into “Tools”. Think of a tool as a specialized toolbox Gemini can use to understand the functionality of the API.
Gemini as API Orchestrator: When you send a prompt to Gemini, it can analyze your request and recognize where it can use the tools you’ve provided. Gemini then acts as a smart orchestrator:
- Generates API Parameters: It produces the necessary parameters to call your defined functions.
- Calls External APIs (your responsibility): Gemini doesn’t call the API on your behalf. You call the API yourself, using the function name and parameters that Gemini function calling has generated for you.
- Processes Results: Gemini feeds the results from your API calls back into its generation, letting it incorporate structured information into its final response which you can process in any way you desire for your application.
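To make the division of labor concrete, here is a minimal sketch in plain Java of the step you own: routing the function name and arguments that Gemini generated to your own code. The FunctionCallDispatcher class and its register and dispatch methods are hypothetical names introduced for illustration, not part of the Vertex AI SDK, and the handler below returns a placeholder string instead of calling a real API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper (not part of the Vertex AI SDK): routes a
// model-generated function call to the local code that does the real work.
final class FunctionCallDispatcher {
    // Maps a declared function name to its local handler.
    private final Map<String, Function<Map<String, String>, String>> handlers = new HashMap<>();

    void register(String name, Function<Map<String, String>, String> handler) {
        handlers.put(name, handler);
    }

    // Runs the handler that matches the function name the model asked for.
    String dispatch(String functionName, Map<String, String> callArgs) {
        Function<Map<String, String>, String> handler = handlers.get(functionName);
        if (handler == null) {
            throw new IllegalArgumentException("No handler registered for: " + functionName);
        }
        return handler.apply(callArgs);
    }

    public static void main(String[] unused) {
        FunctionCallDispatcher dispatcher = new FunctionCallDispatcher();
        // Placeholder handler: a real one would call the Geocoding API here.
        dispatcher.register("getAddress", a -> "address lookup for " + a.get("latlng"));
        System.out.println(
            dispatcher.dispatch("getAddress", Map.of("latlng", "40.714224,-73.961452")));
    }
}
```

The model only ever supplies the name and the arguments. Whether the call runs, and what it returns, stays entirely in your code.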
High Level Flow Diagram
This diagram represents the flow of data and steps involved in the implementation. Please note that the owner for the respective step is mentioned in the text underneath.
Industry Use cases and why it matters
Below are some examples and industry-specific use cases for function calling with Gemini.
- Geocoding (Our Use Case): You’ve seen how to define a “getAddress” function and use it within the context of address standardization.
- Data Validation: Imagine a function called “validateEmail” that takes a string and checks it against an email validation service. Gemini can help you formulate the parameters string so you can call the email validation API to ensure the quality of generated responses. Remember, Gemini does not make the API call for you.
- Fact-Checking: Define a “lookupFact” function. Gemini could use this to consult a trusted knowledge base, making its responses more reliable within specific domains.
Why Function Calling Matters
Bridging Two Worlds: LLMs can’t know everything, especially private information, customer details, or news more recent than their knowledge cut-off date. Function calling lets a model extend its knowledge and capabilities, bridging the gap between open-ended generative AI and the deterministic execution of traditional code.
Controlled Creativity: Function Calling injects structured, predictable elements into Gemini’s creative process, ideal for critical use cases or maintaining consistency.
Building Agents: Chain together multiple function calls and Gemini processing steps, enabling multi-stage generative AI workflows.
Create the Java Cloud Function
This is a Gen 2 Cloud Function implementation. It invokes the Gemini model to orchestrate the input for Function Calling, calls the API, processes the API response in another Gemini call, and is deployed to a REST endpoint.
Setup
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
- You will use Cloud Shell, a command-line environment running in Google Cloud that comes preloaded with bq. From the Cloud Console, click Activate Cloud Shell on the top right corner.
- Just for support with building and delivering the app, let’s enable Duet AI. Navigate to Duet AI Marketplace to enable the API. You can also run the following command in the Cloud Shell terminal:
gcloud services enable cloudaicompanion.googleapis.com --project PROJECT_ID
5. Enable necessary APIs for this implementation if you haven’t already.
As an alternative to the gcloud command, you can enable them through the console using this link.
Java Cloud Function
- Open Cloud Shell Terminal and navigate to the root directory or your default workspace path.
- Click the Cloud Code Sign In icon from the bottom left corner of the status bar and select the active Google Cloud Project that you want to create the Cloud Functions in.
- Click the same icon again and this time select the option to create a new application.
- In the Create New Application pop-up, select Cloud Functions application:
5. Select Java: Hello World option from the next pop-up:
6. Provide a name for the project in the project path. In this case, “GeminiFunctionCalling”.
7. You should see the project structure opened up in a new Cloud Shell Editor view:
8. Now go ahead and add the necessary dependencies within the <dependencies>… </dependencies> tag in the pom.xml file. You can access the entire pom.xml from this project’s github repository.
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-vertexai</artifactId>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.10</version>
</dependency>
9. You can access the entire HelloWorld.java (or whatever you changed it to) class from the github link. Let’s understand Function Calling by breaking down this class:
Prompt Input:
In this example, this is what the input prompt looks like:
“What's the address for the latlong value 40.714224,-73.961452”
You can find the below code snippet relevant to the input prompt in the file:
String promptText = "What's the address for the latlong value '" + latlngString + "'?"; //40.714224,-73.961452
API Specification / Signature Definition:
We decided to use the Reverse Geocoding API. In this example, this is what the API spec looks like:
/* Declare the function for the API that we want to invoke (Geo coding API) */
FunctionDeclaration functionDeclaration = FunctionDeclaration.newBuilder()
.setName("getAddress")
.setDescription("Get the address for the given latitude and longitude value.")
.setParameters(
Schema.newBuilder()
.setType(Type.OBJECT)
.putProperties("latlng", Schema.newBuilder()
.setType(Type.STRING)
.setDescription("This must be a string of latitude and longitude coordinates separated by comma")
.build())
.addRequired("latlng")
.build())
.build();
Gemini to orchestrate the prompt with the API specification:
This is the part where we send the prompt input and the API spec to Gemini:
// Add the function to a "tool"
Tool tool = Tool.newBuilder()
.addFunctionDeclarations(functionDeclaration)
.build();
// Invoke the Gemini model with the use of the tool to generate the API parameters from the prompt input.
GenerativeModel model = GenerativeModel.newBuilder()
.setModelName(modelName)
.setVertexAi(vertexAI)
.setTools(Arrays.asList(tool))
.build();
GenerateContentResponse response = model.generateContent(promptText);
Content responseJSONCnt = response.getCandidates(0).getContent();
The response is the orchestrated set of parameters for the API. The output from this step looks like this:
role: "model"
parts {
function_call {
name: "getAddress"
args {
fields {
key: "latlng"
value {
string_value: "40.714224,-73.961452"
}
}
}
}
}
The parameter that needs to be passed to the Reverse Geocoding API is this:
“latlng=40.714224,-73.961452”
From the Content object, get the Part and call its hasFunctionCall() method to check whether the LLM returned a function call request. Call getFunctionCall() to get a FunctionCall object. Use the hasArgs() method to check whether there are arguments, then call getArgs() to get the actual arguments as a protobuf Struct object. Match the orchestrated result to the format “latlng=VALUE”. Refer to the full code here.
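The last step, turning the arguments into the “latlng=VALUE” format, can be sketched with the standard library alone. This sketch assumes the argument names and string values have already been copied out of the protobuf Struct into a plain Map. GeocodingParams and toQueryString are hypothetical names used only for illustration and do not come from the original project.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a URL query string from function-call arguments.
final class GeocodingParams {
    // Turns each argument into a URL-encoded "key=value" pair, joined by "&".
    static String toQueryString(Map<String, String> callArgs) {
        return callArgs.entrySet().stream()
            .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    }

    public static void main(String[] unused) {
        // Assumed input: the single "latlng" argument generated by the model.
        Map<String, String> callArgs = new LinkedHashMap<>();
        callArgs.put("latlng", "40.714224,-73.961452");
        // prints latlng=40.714224%2C-73.961452
        System.out.println(toQueryString(callArgs));
    }
}
```

The comma comes out percent-encoded as %2C, which is standard URL encoding for a query parameter value. Encoding the values also keeps an unexpected model-generated character from changing the structure of the request URL.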
Invoke the API:
At this point you have everything you need to invoke the API. The part of the code that does it is below:
// Create a request
String url = API_STRING + "?key=" + API_KEY + params;
java.net.http.HttpRequest request = java.net.http.HttpRequest.newBuilder()
.uri(URI.create(url))
.GET()
.build();
// Send the request and get the response
java.net.http.HttpResponse<String> httpresponse = client.send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
// Save the response
String jsonResult = httpresponse.body().toString();
The string jsonResult holds the stringified response from the reverse Geocoding API. It looks something like this (the output is formatted and truncated for readability):
“...277 Bedford Ave, Brooklyn, NY 11211, USA; 279 Bedford Ave, Brooklyn, NY 11211, USA; 277 Bedford Ave, Brooklyn, NY 11211, USA;...”
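One way to get from the raw JSON to a semicolon-separated string like the one above is to collect every formatted_address value in the response. The sketch below is an assumption about how that could be done, not the code from the original project: it uses a regular expression so that it runs without extra libraries, and it does not handle escaped quotes inside values. FormattedAddressExtractor is a hypothetical name. In a real function, a JSON parser such as Gson, which is already in the pom.xml, would be the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls formatted_address values out of a Geocoding API response.
final class FormattedAddressExtractor {
    // Matches "formatted_address": "<value>" and captures the value.
    // Simplification: values containing escaped quotes are not supported.
    private static final Pattern FORMATTED_ADDRESS =
        Pattern.compile("\"formatted_address\"\\s*:\\s*\"([^\"]*)\"");

    // Returns all formatted addresses found in the JSON, joined with "; ".
    static String joinFormattedAddresses(String json) {
        List<String> addresses = new ArrayList<>();
        Matcher matcher = FORMATTED_ADDRESS.matcher(json);
        while (matcher.find()) {
            addresses.add(matcher.group(1));
        }
        return String.join("; ", addresses);
    }

    public static void main(String[] unused) {
        // Shortened, hand-written sample in the shape of a Geocoding API response.
        String json = "{\"results\":[{\"formatted_address\":\"277 Bedford Ave, Brooklyn, NY 11211, USA\"},"
            + "{\"formatted_address\":\"Brooklyn, NY, USA\"}],\"status\":\"OK\"}";
        System.out.println(joinFormattedAddresses(json));
    }
}
```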
Process the API response and prepare the prompt:
The below code processes the response from the API and prepares the prompt with instructions on how to process the response:
// Provide an answer to the model so that it knows what the result
// of a "function call" is.
String promptString =
"You are an AI address standardizer for assisting with standardizing addresses accurately. Your job is to give the accurate address in the standard format as a JSON object containing the fields DOOR_NUMBER, STREET_ADDRESS, AREA, CITY, TOWN, COUNTY, STATE, COUNTRY, ZIPCODE, LANDMARK by leveraging the address string that follows in the end. Remember the response cannot be empty or null. ";
Content content =
ContentMaker.fromMultiModalData(
PartMaker.fromFunctionResponse(
"getAddress",
Collections.singletonMap("address", formattedAddress)));
String contentString = content.toString();
String address = contentString.substring(contentString.indexOf("string_value: \\"") + "string_value: \\"".length(), contentString.indexOf('"', contentString.indexOf("string_value: \\"") + "string_value: \\"".length()));
List<SafetySetting> safetySettings = Arrays.asList(
SafetySetting.newBuilder()
.setCategory(HarmCategory.HARM_CATEGORY_HATE_SPEECH)
.setThreshold(SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH)
.build(),
SafetySetting.newBuilder()
.setCategory(HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT)
.setThreshold(SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH)
.build()
);
Invoke Gemini and return the standardized address:
The below code passes the processed output from the above step as prompt to Gemini:
GenerativeModel modelForFinalResponse = GenerativeModel.newBuilder()
.setModelName(modelName)
.setVertexAi(vertexAI)
.build();
GenerateContentResponse finalResponse = modelForFinalResponse.generateContent(promptString + ": " + address, safetySettings);
System.out.println("promptString + content: " + promptString + ": " + address);
// See what the model replies now
System.out.println("Print response: ");
System.out.println(finalResponse.toString());
String finalAnswer = ResponseHandler.getText(finalResponse);
System.out.println(finalAnswer);
The finalAnswer variable has the standardized address in JSON format. Sample output below:
{"replies":["{ \\"DOOR_NUMBER\\": null, \\"STREET_ADDRESS\\": \\"277 Bedford Ave\\", \\"AREA\\": \\"Brooklyn\\", \\"CITY\\": \\"New York\\", \\"TOWN\\": null, \\"COUNTY\\": null, \\"STATE\\": \\"NY\\", \\"COUNTRY\\": \\"USA\\", \\"ZIPCODE\\": \\"11211\\", \\"LANDMARK\\": null} null}"]}
10. Now that you have understood how Gemini Function Calling works with the address standardization use case, go ahead and deploy the Cloud Function.
gcloud functions deploy gemini-fn-calling --gen2 --region=us-central1 --runtime=java11 --source=. --entry-point=cloudcode.helloworld.HelloWorld --trigger-http
The result of this deploy command is a REST URL in the format below:
https://us-central1-YOUR_PROJECT_ID.cloudfunctions.net/gemini-fn-calling
11. Test this Cloud Function by running the following command from the terminal:
gcloud functions call gemini-fn-calling --region=us-central1 --gen2 --data '{"calls":[["40.714224,-73.961452"]]}'
Response for a random sample prompt:
'{"replies":["{ \\"DOOR_NUMBER\\": \\"277\\", \\"STREET_ADDRESS\\": \\"Bedford Ave\\", \\"AREA\\":
null, \\"CITY\\": \\"Brooklyn\\", \\"TOWN\\": null, \\"COUNTY\\": \\"Kings County\\", \\"STATE\\":
\\"NY\\", \\"COUNTRY\\": \\"USA\\", \\"ZIPCODE\\": \\"11211\\", \\"LANDMARK\\": null}}```"]}'
The request and response parameters of this Cloud Function are implemented in a way that is compatible with BigQuery’s remote function invocation. It can be directly consumed from BigQuery data in-place. It means that if your data input (lat and long data) lives in BigQuery then you can call the remote function on the data and get the function response which can be stored or processed within BigQuery directly. To learn how to leverage this Cloud Function in performing in-place LLM insights on your database, refer to this blog.
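The shape of that contract is simple: the request carries a "calls" array with one entry per row, and the response must carry a "replies" array with one entry per call, in the same order. Below is a minimal sketch of building the response envelope by hand. RemoteFunctionReply is a hypothetical name, and the escaping covers only backslashes, double quotes, and newlines. A JSON library such as Gson would handle every case.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: wraps per-row results in the {"replies":[...]} envelope.
final class RemoteFunctionReply {
    // Quotes a value as a JSON string. Only backslashes, double quotes, and
    // newlines are escaped; other control characters are not handled here.
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Produces one reply per result, keeping the order of the incoming calls.
    static String toRepliesJson(List<String> results) {
        return results.stream()
            .map(RemoteFunctionReply::quote)
            .collect(Collectors.joining(",", "{\"replies\":[", "]}"));
    }

    public static void main(String[] unused) {
        // Each element stands for the standardized address JSON of one input row.
        System.out.println(toRepliesJson(List.of("{\"CITY\": \"Brooklyn\"}")));
    }
}
```

Because each reply is itself a JSON string, the quotes inside it are escaped, which is why the sample responses above show backslashes in front of every inner quote.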
Conclusion
This project has demonstrated the power of Gemini Function Calling, transforming a generative AI task into a deterministic, reliable process. If you work with generative AI, don’t let its sometimes-unpredictable nature hold you back. Use the power of the Gemini 1.0 Pro Function Calling feature to create applications that are as innovative as they are dependable. Start exploring how you can incorporate this feature into your own work! Are there datasets you could validate, information gaps you could fill, or tasks that could be automated with structured calls embedded within your generative AI responses? Here is the link to the repo. For further reading, refer to the documentation for Vertex AI, BigQuery Remote Functions, and Cloud Functions for more in-depth guidance in these areas.
Deterministic Generative AI with Gemini Function Calling in Java was originally published in Google Cloud - Community on Medium, where people are continuing the conversation by highlighting and responding to this story.
","author"=>"Abirami Sukumaran",
"link"=>"https://medium.com/google-cloud/using-gemini-function-calling-in-java-for-deterministic-generative-ai-responses-4c86a5ab80a9?source=rss----e52cf94d98af---4",
"published_date"=>Wed, 03 Apr 2024 04:48:45.000000000 UTC +00:00,
"image_url"=>nil,
"feed_url"=>"https://medium.com/google-cloud/using-gemini-function-calling-in-java-for-deterministic-generative-ai-responses-4c86a5ab80a9?source=rss----e52cf94d98af---4",
"language"=>nil,
"active"=>true,
"ricc_source"=>"feedjira::v1",
"created_at"=>Wed, 03 Apr 2024 11:08:57.196305000 UTC +00:00,
"updated_at"=>Mon, 21 Oct 2024 18:28:35.069687000 UTC +00:00,
"newspaper"=>"Google Cloud - Medium",
"macro_region"=>"Blogs"}