Gemini

The Gemini component is an AI component that allows users to connect to Google's Gemini multimodal AI models. It can carry out the following tasks:

- Chat
- Cache

Release Stage

Alpha

Configuration

The component definition and tasks are defined in the definition.yaml and tasks.yaml files respectively.

Setup

In order to communicate with Google, the following connection details need to be provided. You may specify them directly in a pipeline recipe as key-value pairs within the component's setup block, or you can create a Connection from the Integration Settings page and reference the whole setup as setup: ${connection.<my-connection-id>}.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| API Key | `api-key` | string | Fill in your Gemini API key. To find your keys, visit the Gemini API Keys page. |
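
For example, a minimal sketch of the setup block in a recipe might look like the following; `gemini-api-key` and `my-gemini-connection` are placeholder names, not values defined by the component:

component:
  gemini:
    type: gemini
    task: TASK_CHAT
    # Option 1: key-value pairs directly in the recipe
    setup:
      api-key: ${secret.gemini-api-key} # placeholder secret name
    # Option 2 (instead of the block above): reference a saved Connection
    # setup: ${connection.my-gemini-connection}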

Supported Tasks

Chat

Gemini's multimodal models understand text and images. They generate text outputs in response to prompts that can include text and images. The inputs to these models are also referred to as "prompts". Designing a prompt is how you guide the model, usually by providing instructions or examples to successfully complete a task.

| Input | Field ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_CHAT` |
| Model (required) | `model` | string | ID of the model to use. The value is one of the following: `gemini-2.5-pro`: Optimized for enhanced thinking and reasoning, multimodal understanding, advanced coding, and more. `gemini-2.5-flash`: Optimized for adaptive thinking, cost efficiency. `gemini-2.5-flash-lite`: Optimized for most cost-efficient model supporting high throughput. `gemini-2.5-flash-image-preview`: Optimized for precise, conversational image generation and editing. Enum values: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`, `gemini-2.5-flash-image-preview`. |
| Stream | `stream` | boolean | Whether to incrementally stream the response using server-sent events (SSE). |
| Prompt | `prompt` | string | The main text instruction or query for the model. |
| Images | `images` | array[string] | URI references or base64 content of input images. |
| Audio | `audio` | array[string] | URI references or base64 content of input audio. |
| Videos | `videos` | array[string] | URI references or base64 content of input videos. |
| Documents | `documents` | array[string] | URI references or base64 content of input documents. Different vendors might have different constraints on the document format. For example, Gemini supports only PDF. |
| System Message | `system-message` | string | Instruction to set the assistant's behavior, tone, or persona. Different vendors might name this field differently. |
| Chat History | `chat-history` | array[object] | Conversation history; each message includes a role and content. |
| Max Output Tokens | `max-output-tokens` | integer | The maximum number of tokens to generate in the model output. |
| Temperature | `temperature` | number | A parameter that controls the randomness and creativity of a large language model's output by adjusting the probability of the next word it chooses. A low temperature (e.g., near 0) produces more deterministic, focused, and consistent text, while a high temperature (e.g., near 1) leads to more creative, random, and varied output. |
| Top-P | `top-p` | number | A parameter, also known as nucleus sampling, that controls the randomness and creativity of the generated text by selecting a dynamic subset of tokens. It works by sorting all possible next tokens by their probability, and then summing their probabilities from highest to lowest until the cumulative sum reaches the specified top-p value (a number between 0 and 1). The model then randomly selects the next token only from this "nucleus" of high-probability tokens. A higher top-p value creates a larger, more diverse set of possible words, leading to more creative and potentially unpredictable output, while a lower top-p value restricts the choice to a smaller, more focused set of highly probable words, resulting in more factual and conservative output. |
| Top-K | `top-k` | integer | A text generation parameter that limits the selection of the next token to the K most probable tokens, discarding the rest to control randomness and maintain coherence. By specifying a fixed number of top tokens, top-k acts as a "safety net," preventing nonsensical choices, but a small K can also stifle creativity and lead to repetitive outputs. It is often used in conjunction with other parameters like temperature and top-p to fine-tune the LLM's output. Note that OpenAI and Mistral models don't have top-k exposed. |
| Seed | `seed` | integer | A random seed used to control the stochasticity of text generation to produce repeatable outputs. |
| Contents | `contents` | array[object] | The input contents to the model. Each item represents a user or model turn composed of parts (text or images). |
| Tools | `tools` | array[object] | Tools available to the model, e.g., function declarations. |
| Tool Config | `tool-config` | object | Configuration for tool usage and function calling. |
| Safety Settings | `safety-settings` | array[object] | Safety settings for content filtering. |
| System Instruction | `system-instruction` | object | A system instruction to guide the model behavior. |
| Generation Config | `generation-config` | object | Generation configuration for the request. |
| Cached Content | `cached-content` | string | The name of a cached content to use as context. Format: `cachedContents/{cachedContent}`. |
Input Objects in Chat

Chat History

Conversation history; each message includes a role and content.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Parts | `parts` | array | Parts of the content. |
| Role | `role` | string | The producer of the content. Must be either 'user' or 'model'. Useful to set for multi-turn conversations, otherwise can be left blank or unset. Optional. The value is one of the following: USER: User content. MODEL: Model content. Enum values: `USER`, `MODEL`. |
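
As a sketch built only from the fields documented above, a two-turn history could be passed as:

chat-history:
  - role: USER
    parts:
      - text: What is the capital of France?
  - role: MODEL
    parts:
      - text: The capital of France is Paris.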

Parts

Parts of the content.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Thought | `thought` | boolean | Indicates if the part is a thought from the model. |
| Thought Signature | `thought-signature` | string | Opaque signature for the thought (base64-encoded bytes). |
| Video Metadata | `video-metadata` | object | Optional video metadata (only with blob or fileData video content). |

Video Metadata

Optional video metadata (only with blob or fileData video content).

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Offset | `end-offset` | string | The end offset of the video (duration string, e.g. "3.5s"). |
| FPS | `fps` | number | Frame rate of the video sent to the model. Range (0.0, 24.0]. |
| Start Offset | `start-offset` | string | The start offset of the video (duration string, e.g. "3.5s"). |

Contents

The input contents to the model. Each item represents a user or model turn composed of parts (text or images).

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Parts | `parts` | array | Parts of the content. |
| Role | `role` | string | The producer of the content. Must be either 'user' or 'model'. Useful to set for multi-turn conversations, otherwise can be left blank or unset. Optional. The value is one of the following: USER: User content. MODEL: Model content. Enum values: `USER`, `MODEL`. |

Parts

Parts of the content.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Thought | `thought` | boolean | Indicates if the part is a thought from the model. |
| Thought Signature | `thought-signature` | string | Opaque signature for the thought (base64-encoded bytes). |
| Video Metadata | `video-metadata` | object | Optional video metadata (only with blob or fileData video content). |

Video Metadata

Optional video metadata (only with blob or fileData video content).

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Offset | `end-offset` | string | The end offset of the video (duration string, e.g. "3.5s"). |
| FPS | `fps` | number | Frame rate of the video sent to the model. Range (0.0, 24.0]. |
| Start Offset | `start-offset` | string | The start offset of the video (duration string, e.g. "3.5s"). |

Tools

Tools available to the model, e.g., function declarations.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Code Execution | `code-execution` | object | Tool that executes code generated by the model, and automatically returns the result to the model. |
| Function Declarations | `function-declarations` | array | Functions the model may call. |
| Google Search | `google-search` | object | GoogleSearch tool type. Tool to support Google Search in Model. Powered by Google. |
| Google Search Retrieval | `google-search-retrieval` | object | Tool to retrieve public web data for grounding, powered by Google. |
| URL Context | `url-context` | object | Tool to support URL context retrieval. |

Function Declarations

Functions the model may call.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Description | `description` | string | A brief description of the function. |
| Name | `name` | string | The name of the function to call. |
| Parameters | `parameters` | object | Describes the parameters to this function. Reflects the Open API 3.03 Parameter Object. string Key: the name of the parameter. Parameter names are case sensitive. Schema Value: the Schema defining the type used for the parameter. |

Parameters

Describes the parameters to this function. Reflects the Open API 3.03 Parameter Object string Key: the name of the parameter. Parameter names are case sensitive. Schema Value: the Schema defining the type used for the parameter.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Any Of | `anyOf` | array | Value must satisfy any of the sub-schemas. |
| Default | `default` | object | Default value for the field (ignored for validation). |
| Description | `description` | string | Optional description of the schema. |
| Enum | `enum` | array | Enum values for STRING with enum format. |
| Format | `format` | string | Optional format of the data. |
| Items | `items` | object | Schema of elements for ARRAY type. |
| Max Items | `max-items` | integer | Maximum number of elements for ARRAY type. |
| Max Length | `max-length` | integer | Maximum length for STRING type. |
| Max Properties | `max-properties` | integer | Maximum number of properties for OBJECT type. |
| Maximum | `maximum` | number | Maximum value for INTEGER/NUMBER types. |
| Min Items | `min-items` | integer | Minimum number of elements for ARRAY type. |
| Min Length | `min-length` | integer | Minimum length for STRING type. |
| Min Properties | `min-properties` | integer | Minimum number of properties for OBJECT type. |
| Minimum | `minimum` | number | Minimum value for INTEGER/NUMBER types. |
| Nullable | `nullable` | boolean | Indicates if the value may be null. |
| Pattern | `pattern` | string | Regex pattern constraint for STRING type. |
| Properties | `properties` | object | Properties for OBJECT type. |
| Property Ordering | `property-ordering` | array | Order of properties for OBJECT type (non-standard). |
| Required | `required` | array | Required properties for OBJECT type. |
| Title | `title` | string | Optional title of the schema. |
| Type | `type` | string | Required data type of the schema. Enum values: `TYPE_UNSPECIFIED`, `STRING`, `NUMBER`, `INTEGER`, `BOOLEAN`, `ARRAY`, `OBJECT`. |
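
Putting the two tables above together, a hypothetical `get_weather` function could be declared as sketched below; the function name, description, and property names are illustrative and not part of the component:

tools:
  - function-declarations:
      - name: get_weather
        description: Look up the current weather for a city.
        parameters:
          type: OBJECT
          properties:
            city:
              type: STRING
              description: Name of the city to look up.
          required:
            - city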

Google Search Retrieval

Tool to retrieve public web data for grounding, powered by Google.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Dynamic Retrieval Config | `dynamic-retrieval-config` | object | Specifies the dynamic retrieval configuration for the given source. |

Dynamic Retrieval Config

Specifies the dynamic retrieval configuration for the given source.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Dynamic Threshold | `dynamic-threshold` | number | The threshold to be used in dynamic retrieval. If not set, a system default value is used. |
| Mode | `mode` | string | The mode of the predictor to be used in dynamic retrieval. The value is one of the following: MODE_UNSPECIFIED: Always trigger retrieval. MODE_DYNAMIC: Run retrieval only when system decides it is necessary. Enum values: `MODE_UNSPECIFIED`, `MODE_DYNAMIC`. |

Google Search

GoogleSearch tool type. Tool to support Google Search in Model. Powered by Google.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Time Range Filter | `time-range-filter` | object | Filter search results to a specific time range. If customers set a start time, they must set an end time (and vice versa). |

Time Range Filter

Filter search results to a specific time range. If customers set a start time, they must set an end time (and vice versa).

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Time | `end-time` | string | Exclusive end of the interval. If specified, a Timestamp matching this interval will have to be before the end. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
| Start Time | `start-time` | string | Inclusive start of the interval. If specified, a Timestamp matching this interval will have to be the same or after the start. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |

Tool Config

Configuration for tool usage and function calling.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Function Calling Config | `function-calling-config` | object | Configuration for specifying function calling behavior. |

Function Calling Config

Configuration for specifying function calling behavior.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Allowed Function Names | `allowed-function-names` | array | A set of function names that, when provided, limits the functions the model will call. This should only be set when the Mode is ANY or VALIDATED. Function names should match [FunctionDeclaration.name]. When set, model will predict a function call from only allowed function names. |
| Mode | `mode` | string | Specifies the mode in which function calling should execute. If unspecified, the default value will be set to AUTO. The value is one of the following: MODE_UNSPECIFIED: Unspecified function calling mode. This value should not be used. AUTO: Default model behavior, model decides to predict either a function call or a natural language response. ANY: Model is constrained to always predicting a function call only. If "allowedFunctionNames" are set, the predicted function call will be limited to any one of "allowedFunctionNames", else the predicted function call will be any one of the provided "functionDeclarations". NONE: Model will not predict any function call. Model behavior is same as when not passing any function declarations. VALIDATED: Model decides to predict either a function call or a natural language response, but will validate function calls with constrained decoding. If "allowedFunctionNames" are set, the predicted function call will be limited to any one of "allowedFunctionNames", else the predicted function call will be any one of the provided "functionDeclarations". Enum values: `MODE_UNSPECIFIED`, `AUTO`, `ANY`, `NONE`. |
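
For instance, to force the model to call only the hypothetical `get_weather` function declared earlier, the tool config could be sketched as:

tool-config:
  function-calling-config:
    mode: ANY
    allowed-function-names:
      - get_weather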

Safety Settings

Safety settings for content filtering.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Harm Category | `category` | string | The category of a rating for safety. The value is one of the following: HARM_CATEGORY_UNSPECIFIED: Category is unspecified. HARM_CATEGORY_DEROGATORY: PaLM - Negative or harmful comments targeting identity and/or protected attribute. HARM_CATEGORY_TOXICITY: PaLM - Content that is rude, disrespectful, or profane. HARM_CATEGORY_VIOLENCE: PaLM - Describes scenarios depicting violence against an individual or group, or general descriptions of gore. HARM_CATEGORY_SEXUAL: PaLM - Contains references to sexual acts or other lewd content. HARM_CATEGORY_MEDICAL: PaLM - Promotes unchecked medical advice. HARM_CATEGORY_DANGEROUS: PaLM - Dangerous content that promotes, facilitates, or encourages harmful acts. HARM_CATEGORY_HARASSMENT: Gemini - Harassment content. HARM_CATEGORY_HATE_SPEECH: Gemini - Hate speech and content. HARM_CATEGORY_SEXUALLY_EXPLICIT: Gemini - Sexually explicit content. HARM_CATEGORY_DANGEROUS_CONTENT: Gemini - Dangerous content. HARM_CATEGORY_CIVIC_INTEGRITY: Gemini - Content that may be used to harm civic integrity. DEPRECATED: use enableEnhancedCivicAnswers instead. Enum values: `HARM_CATEGORY_UNSPECIFIED`, `HARM_CATEGORY_DEROGATORY`, `HARM_CATEGORY_TOXICITY`, `HARM_CATEGORY_VIOLENCE`, `HARM_CATEGORY_SEXUAL`, `HARM_CATEGORY_MEDICAL`, `HARM_CATEGORY_DANGEROUS`, `HARM_CATEGORY_HARASSMENT`, `HARM_CATEGORY_HATE_SPEECH`, `HARM_CATEGORY_SEXUALLY_EXPLICIT`, `HARM_CATEGORY_DANGEROUS_CONTENT`. |
| Harm Block Threshold | `threshold` | string | Block at and beyond a specified harm probability. The value is one of the following: HARM_BLOCK_THRESHOLD_UNSPECIFIED: Threshold is unspecified. BLOCK_LOW_AND_ABOVE: Content with NEGLIGIBLE will be allowed. BLOCK_MEDIUM_AND_ABOVE: Content with NEGLIGIBLE and LOW will be allowed. BLOCK_ONLY_HIGH: Content with NEGLIGIBLE, LOW, and MEDIUM will be allowed. BLOCK_NONE: All content will be allowed. OFF: Turn off the safety filter. Enum values: `HARM_BLOCK_THRESHOLD_UNSPECIFIED`, `BLOCK_LOW_AND_ABOVE`, `BLOCK_MEDIUM_AND_ABOVE`, `BLOCK_ONLY_HIGH`, `BLOCK_NONE`, `OFF`. |
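
A sketch using only the category and threshold field IDs above (the chosen categories and thresholds are illustrative):

safety-settings:
  - category: HARM_CATEGORY_HARASSMENT
    threshold: BLOCK_MEDIUM_AND_ABOVE
  - category: HARM_CATEGORY_DANGEROUS_CONTENT
    threshold: BLOCK_ONLY_HIGH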

System Instruction

A system instruction to guide the model behavior.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Parts | `parts` | array | Parts of the content. |
| Role | `role` | string | The producer of the content. Must be either 'user' or 'model'. Useful to set for multi-turn conversations, otherwise can be left blank or unset. Optional. The value is one of the following: USER: User content. MODEL: Model content. Enum values: `USER`, `MODEL`. |

Parts

Parts of the content.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Thought | `thought` | boolean | Indicates if the part is a thought from the model. |
| Thought Signature | `thought-signature` | string | Opaque signature for the thought (base64-encoded bytes). |
| Video Metadata | `video-metadata` | object | Optional video metadata (only with blob or fileData video content). |

Video Metadata

Optional video metadata (only with blob or fileData video content).

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Offset | `end-offset` | string | The end offset of the video (duration string, e.g. "3.5s"). |
| FPS | `fps` | number | Frame rate of the video sent to the model. Range (0.0, 24.0]. |
| Start Offset | `start-offset` | string | The start offset of the video (duration string, e.g. "3.5s"). |

Generation Config

Generation configuration for the request.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Candidate Count | `candidate-count` | integer | Number of candidates to generate. |
| Enable Enhanced Civic Answers | `enable-enhanced-civic-answers` | boolean | Enables enhanced civic answers. |
| Frequency Penalty | `frequency-penalty` | number | Frequency penalty applied proportional to the number of times a token has been seen. |
| Logprobs | `logprobs` | integer | Number of top logprobs to return at each decoding step (1-5). Only valid if `response-logprobs` is true. |
| Max Output Tokens | `max-output-tokens` | integer | The maximum number of tokens to generate in the response. |
| Media Resolution | `media-resolution` | string | Media resolution for multimodal generation. Controls how many tokens are budgeted for media understanding and reframing. The value is one of the following: MEDIA_RESOLUTION_UNSPECIFIED: Media resolution has not been set. MEDIA_RESOLUTION_LOW: Media resolution set to low (64 tokens). MEDIA_RESOLUTION_MEDIUM: Media resolution set to medium (256 tokens). MEDIA_RESOLUTION_HIGH: Media resolution set to high (zoomed reframing with 256 tokens). Enum values: `MEDIA_RESOLUTION_UNSPECIFIED`, `MEDIA_RESOLUTION_LOW`, `MEDIA_RESOLUTION_MEDIUM`, `MEDIA_RESOLUTION_HIGH`. |
| Presence Penalty | `presence-penalty` | number | Presence penalty applied to next-token logprobs if token already seen. |
| Response Logprobs | `response-logprobs` | boolean | If true, export the logprobs results in response. |
| Response MIME Type | `response-mime-type` | string | Desired response MIME type (e.g., application/json for JSON mode). |
| Response Modalities | `response-modalities` | array | Requested modalities of the response. Empty means text only. |
| Response Schema | `response-schema` | object | JSON Schema to constrain the response when using JSON mode. |
| Seed | `seed` | integer | Seed used in decoding. |
| Speech Config | `speech-config` | object | Speech generation configuration. |
| Stop Sequences | `stop-sequences` | array | List of sequences that will stop further token generation. |
| Temperature | `temperature` | number | Sampling temperature, controls randomness. |
| Thinking Config | `thinking-config` | object | Config for thinking features. |
| Top K | `top-k` | number | Top-k sampling cutoff. |
| Top P | `top-p` | number | Nucleus sampling probability mass. |
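
As an illustrative sketch, a generation-config combining several of the scalar options above (values chosen arbitrarily):

generation-config:
  temperature: 0.7
  top-p: 0.95
  max-output-tokens: 1024
  seed: 42
  stop-sequences:
    - "END"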

Response Schema

JSON Schema to constrain the response when using JSON mode.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Any Of | `anyOf` | array | Value must satisfy any of the sub-schemas. |
| Default | `default` | object | Default value for the field (ignored for validation). |
| Description | `description` | string | Optional description of the schema. |
| Enum | `enum` | array | Enum values for STRING with enum format. |
| Format | `format` | string | Optional format of the data. |
| Items | `items` | object | Schema of elements for ARRAY type. |
| Max Items | `max-items` | integer | Maximum number of elements for ARRAY type. |
| Max Length | `max-length` | integer | Maximum length for STRING type. |
| Max Properties | `max-properties` | integer | Maximum number of properties for OBJECT type. |
| Maximum | `maximum` | number | Maximum value for INTEGER/NUMBER types. |
| Min Items | `min-items` | integer | Minimum number of elements for ARRAY type. |
| Min Length | `min-length` | integer | Minimum length for STRING type. |
| Min Properties | `min-properties` | integer | Minimum number of properties for OBJECT type. |
| Minimum | `minimum` | number | Minimum value for INTEGER/NUMBER types. |
| Nullable | `nullable` | boolean | Indicates if the value may be null. |
| Pattern | `pattern` | string | Regex pattern constraint for STRING type. |
| Properties | `properties` | object | Properties for OBJECT type. |
| Property Ordering | `property-ordering` | array | Order of properties for OBJECT type (non-standard). |
| Required | `required` | array | Required properties for OBJECT type. |
| Title | `title` | string | Optional title of the schema. |
| Type | `type` | string | Required data type of the schema. Enum values: `TYPE_UNSPECIFIED`, `STRING`, `NUMBER`, `INTEGER`, `BOOLEAN`, `ARRAY`, `OBJECT`. |
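
For JSON mode, response-mime-type and response-schema can be combined. The sketch below constrains the output to a hypothetical object with two properties; the property names are illustrative:

generation-config:
  response-mime-type: application/json
  response-schema:
    type: OBJECT
    properties:
      title:
        type: STRING
      rating:
        type: INTEGER
    required:
      - title
      - rating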

Speech Config

Speech generation configuration.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Language Code | `language-code` | string | Language code (BCP 47) for speech synthesis. Enum values: `de-DE`, `en-AU`, `en-GB`, `en-IN`, `en-US`, `es-US`, `fr-FR`, `hi-IN`, `pt-BR`, `ar-XA`, `es-ES`, `fr-CA`, `id-ID`, `it-IT`, `ja-JP`, `tr-TR`, `vi-VN`, `bn-IN`, `gu-IN`, `kn-IN`, `ml-IN`, `mr-IN`, `ta-IN`, `te-IN`, `nl-NL`, `ko-KR`, `cmn-CN`, `pl-PL`, `ru-RU`, `th-TH`. |
| Multi Speaker Voice Config | `multi-speaker-voice-config` | object | Configuration for the multi-speaker setup. Mutually exclusive with `voice-config`. |
| Voice Config | `voice-config` | | Configuration for the voice to use. Union type. |

Multi Speaker Voice Config

Configuration for the multi-speaker setup. Mutually exclusive with voice-config.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Speaker Voice Configs | `speaker-voice-configs` | array | All the enabled speaker voices. |

Speaker Voice Configs

All the enabled speaker voices.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Speaker | `speaker` | string | The name of the speaker to use. Should match the name used in the prompt. |
| Voice Config | `voice-config` | | Configuration for the voice to use. Union type. |

Thinking Config

Config for thinking features.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Include Thoughts | `include-thoughts` | boolean | Whether to include thoughts in the response when available. |
| Thinking Budget | `thinking-budget` | integer | The number of thought tokens the model should generate. |
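
Thinking options slot into the same generation-config block, for example (the budget value is illustrative):

generation-config:
  thinking-config:
    include-thoughts: true
    thinking-budget: 1024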
The parts Object

Parts

parts must fulfill one of the following schemas:

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Text | `text` | string | Inline text content. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Blob | `blob` | object | Raw media bytes. Text should use the 'text' field instead. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Function Call | `function-call` | object | Predicted function call with name and arguments. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Function Response | `function-response` | object | Result of a function call with name and structured response. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| File Data | `file-data` | object | URI-based data reference with MIME type. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Executable Code | `executable-code` | object | Code generated by the model that is meant to be executed. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Code Execution Result | `code-execution-result` | object | Result of executing the ExecutableCode. |
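
As a sketch, a single contents turn mixing a text part with a file reference might look like the following; note that the nested keys of file-data (mime-type, file-uri) are an assumption based on the Gemini API's FileData object and should be checked against tasks.yaml:

contents:
  - role: USER
    parts:
      - text: Describe this diagram.
      - file-data:                                    # URI-based data reference with MIME type
          mime-type: image/png                        # assumed key name
          file-uri: https://example.com/diagram.png   # assumed key name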
| Output | Field ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Texts (optional) | `texts` | array[string] | Simplified text output extracted from candidates. Each string represents the concatenated text content from the corresponding candidate's parts, including thought processes when include-thoughts is enabled. This field provides easy access to the generated text without needing to traverse the candidate structure. Updated in real-time during streaming. |
| Images (optional) | `images` | array[image/webp] | Images output extracted and converted from candidates. This field provides easy access to the generated images as base64-encoded strings. The original binary data is removed from the candidates field to prevent raw binary exposure in JSON output. This field is only available when the model supports image generation. |
| Usage (optional) | `usage` | object | Token usage statistics: prompt tokens, completion tokens, total tokens, etc. |
| Candidates (optional) | `candidates` | array[object] | Complete candidate objects from the model containing rich metadata and structured content. Each candidate includes safety ratings, finish reason, token counts, citations, content parts (including thought processes when include-thoughts is enabled), and other detailed information. This provides full access to all response data beyond just text. Updated incrementally during streaming with accumulated content and latest metadata. |
| Usage Metadata (optional) | `usage-metadata` | object | Metadata on the generation request's token usage. |
| Prompt Feedback (optional) | `prompt-feedback` | object | Feedback on the prompt including any safety blocking information. |
| Model Version (optional) | `model-version` | string | The model version used to generate the response. |
| Response ID (optional) | `response-id` | string | Identifier for this response. |
Output Objects in Chat

Candidates

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Average Logprobs | `avg-logprobs` | number | Average log probability score of the candidate. |
| Citation Metadata | `citation-metadata` | object | Citation metadata for generated content, listing sources. |
| Content | `content` | object | Base structured datatype with producer role and ordered parts. |
| Finish Reason | `finish-reason` | string | Reason why the model stopped generating for a candidate. The value is one of the following: FINISH_REASON_UNSPECIFIED: Default value. This value is unused. STOP: Natural stop point of the model or provided stop sequence. MAX_TOKENS: The maximum number of tokens as specified in the request was reached. SAFETY: The response candidate content was flagged for safety reasons. RECITATION: The response candidate content was flagged for recitation reasons. LANGUAGE: The response candidate content was flagged for using an unsupported language. OTHER: Unknown reason. BLOCKLIST: Token generation stopped because the content contains forbidden terms. PROHIBITED_CONTENT: Token generation stopped for potentially containing prohibited content. SPII: Token generation stopped because the content potentially contains Sensitive Personally Identifiable Information (SPII). MALFORMED_FUNCTION_CALL: The function call generated by the model is invalid. IMAGE_SAFETY: Token generation stopped because generated images contain safety violations. UNEXPECTED_TOOL_CALL: Model generated a tool call but no tools were enabled in the request. TOO_MANY_TOOL_CALLS: Model called too many tools consecutively, thus the system exited execution. Enum values: `FINISH_REASON_UNSPECIFIED`, `STOP`, `MAX_TOKENS`, `SAFETY`, `RECITATION`, `LANGUAGE`, `OTHER`, `BLOCKLIST`, `PROHIBITED_CONTENT`, `SPII`, `MALFORMED_FUNCTION_CALL`, `IMAGE_SAFETY`, `UNEXPECTED_TOOL_CALL`, `TOO_MANY_TOOL_CALLS`. |
| Grounding Attributions | `grounding-attributions` | array | Attribution information for sources that contributed to a grounded answer. |
| Grounding Metadata | `grounding-metadata` | object | Metadata returned to client when grounding is enabled. |
| Index | `index` | integer | Position of the candidate in the returned list. |
| Logprobs Result | `logprobs-result` | object | Log probabilities for generated tokens. |
| Safety Ratings | `safety-ratings` | array | Safety ratings applied to this candidate. |
| Token Count | `token-count` | integer | Token count for this candidate. |
| URL Context Metadata | `url-context-metadata` | object | Metadata related to URL context retrieval tool. |

Content

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Parts | `parts` | array | Parts of the content. |
| Role | `role` | string | The producer of the content. Must be either 'user' or 'model'. Useful to set for multi-turn conversations, otherwise can be left blank or unset. Optional. The value is one of the following: USER: User content. MODEL: Model content. Enum values: `USER`, `MODEL`. |

Parts

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Thought | `thought` | boolean | Indicates if the part is a thought from the model. |
| Thought Signature | `thought-signature` | string | Opaque signature for the thought (base64-encoded bytes). |
| Video Metadata | `video-metadata` | object | Optional video metadata (only with blob or fileData video content). |

Video Metadata

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Offset | `end-offset` | string | The end offset of the video (duration string, e.g. "3.5s"). |
| FPS | `fps` | number | Frame rate of the video sent to the model. Range (0.0, 24.0]. |
| Start Offset | `start-offset` | string | The start offset of the video (duration string, e.g. "3.5s"). |

Safety Ratings

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Blocked | `blocked` | boolean | Whether the content was blocked by this rating. |
| Harm Category | `category` | string | Harm category. |
| Probability | `probability` | string | Probability level of harm. |

Citation Metadata

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Citations | `citations` | array | Citations to sources for a specific response. |

Citations

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Index | `end-index` | integer | Optional. End of the attributed segment, exclusive. |
| License | `license` | string | Optional. License for the GitHub project that is attributed as a source for segment. License info is required for code citations. |
| Start Index | `start-index` | integer | Optional. Start of segment of the response that is attributed to this source. Index indicates the start of the segment, measured in bytes. |
| URI | `uri` | string | Optional. URI that is attributed as a source for a portion of the text. |

Grounding Attributions

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Content | `content` | object | Grounding source content that makes up this attribution. |
| Source ID | `source-id` | object | Identifier for the source contributing to this attribution. |

Content

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Parts | `parts` | array | Parts of the content. |
| Role | `role` | string | The producer of the content. Must be either 'user' or 'model'. Useful to set for multi-turn conversations, otherwise can be left blank or unset. Optional. The value is one of the following: USER: User content. MODEL: Model content. Enum values: `USER`, `MODEL`. |

Parts

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Thought | `thought` | boolean | Indicates if the part is a thought from the model. |
| Thought Signature | `thought-signature` | string | Opaque signature for the thought (base64-encoded bytes). |
| Video Metadata | `video-metadata` | object | Optional video metadata (only with blob or fileData video content). |

Video Metadata

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Offset | `end-offset` | string | The end offset of the video (duration string, e.g. "3.5s"). |
| FPS | `fps` | number | Frame rate of the video sent to the model. Range (0.0, 24.0]. |
| Start Offset | `start-offset` | string | The start offset of the video (duration string, e.g. "3.5s"). |

Source ID

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Grounding Passage ID | `grounding-passage` | object | Identifier for an inline passage. |
| Semantic Retriever Chunk | `semantic-retriever-chunk` | object | Identifier for a Chunk fetched via Semantic Retriever. |

Grounding Passage ID

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Part Index | `part-index` | integer | Index of the part within the GroundingPassage.content. |
| Passage ID | `passage-id` | string | ID of the passage matching the request's GroundingPassage.id. |

Semantic Retriever Chunk

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Chunk | `chunk` | string | Name of the Chunk containing the attributed text. |
| Source | `source` | string | Name of the Semantic Retriever source (e.g. corpora/123). |

Logprobs Result

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Top Candidates | `top-candidates` | array | Length = total number of decoding steps. |

Top Candidates

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Log Probability | `logprob` | number | The candidate's log probability. |
| Token | `token` | string | The candidate's token string value. |
| Token ID | `token-id` | integer | The candidate's token id value. |

URL Context Metadata

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| URL Metadata | `url-metadata` | array | List of URL context. |

URL Metadata

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Retrieved URL | `retrieved-url` | string | URL retrieved by the tool. |
| URL Retrieval Status | `url-retrieval-status` | string | Retrieval status for URL-based context. The value is one of the following: URL_RETRIEVAL_STATUS_UNSPECIFIED: Default value. This value is unused. URL_RETRIEVAL_STATUS_SUCCESS: URL retrieval was successful. URL_RETRIEVAL_STATUS_ERROR: URL retrieval failed due to an error. URL_RETRIEVAL_STATUS_PAYWALL: URL retrieval failed because the content is behind a paywall. URL_RETRIEVAL_STATUS_UNSAFE: URL retrieval failed because the content is unsafe. Enum values: `URL_RETRIEVAL_STATUS_UNSPECIFIED`, `URL_RETRIEVAL_STATUS_SUCCESS`, `URL_RETRIEVAL_STATUS_ERROR`, `URL_RETRIEVAL_STATUS_PAYWALL`, `URL_RETRIEVAL_STATUS_UNSAFE`. |

Grounding Metadata

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Grounding Chunks | `grounding-chunks` | array | Supporting references retrieved from grounding source. |
| Grounding Supports | `grounding-supports` | array | List of grounding support. |
| Retrieval Metadata | `retrieval-metadata` | object | Retrieval metadata for grounding flow. |
| Search Entry Point | `search-entry-point` | object | Google search entry for follow-up web searches. |
| Web Search Queries | `web-search-queries` | array | Web search queries for follow-up search. |

Grounding Chunks

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Web | `web` | object | Web grounding chunk. |

Web

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Title | `title` | string | Title of the chunk. |
| URI | `uri` | string | URI reference of the chunk. |

Grounding Supports

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Confidence Scores | `confidence-scores` | array | Confidence scores aligned with groundingChunkIndices. |
| Grounding Chunk Indices | `grounding-chunk-indices` | array | Indices into groundingChunks that support the claim. |
| Segment | `segment` | object | Segment of the content. |

Segment

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Index | `end-index` | integer | End byte offset in the Part (exclusive). |
| Part Index | `part-index` | integer | Index of the Part in its parent Content. |
| Start Index | `start-index` | integer | Start byte offset in the Part (inclusive). |
| Text | `text` | string | Text of the segment. |

Search Entry Point

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Rendered Content | `rendered-content` | string | Web content snippet suitable for embedding. |
| SDK Blob | `sdk-blob` | string | Base64-encoded JSON of <search term, search url> tuples. |

Retrieval Metadata

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Google Search Dynamic Retrieval Score | `google-search-dynamic-retrieval-score` | number | Likelihood [0,1] that Google Search could help answer the prompt. |

Usage Metadata

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Cache Tokens Details | `cache-tokens-details` | array | Output only. List of modalities of the cached content in the request input. |
| Cached Content Token Count | `cached-content-token-count` | integer | Number of tokens in the cached part of the prompt (the cached content). |
| Candidates Token Count | `candidates-token-count` | integer | Total number of tokens across all the generated response candidates. |
| Candidates Tokens Details | `candidates-tokens-details` | array | List of modalities that were returned in the response. |
| Prompt Token Count | `prompt-token-count` | integer | Number of tokens in the prompt. When cachedContent is set, this is still the total effective prompt size, meaning this includes the number of tokens in the cached content. |
| Prompt Tokens Details | `prompt-tokens-details` | array | List of modalities that were processed in the request input. |
| Thoughts Token Count | `thoughts-token-count` | integer | Number of tokens of thoughts for thinking models. |
| Tool-use Prompt Token Count | `tool-use-prompt-token-count` | integer | Number of tokens present in tool-use prompt(s). |
| Tool-use Prompt Tokens Details | `tool-use-prompt-tokens-details` | array | List of modalities that were processed for tool-use request inputs. |
| Total Token Count | `total-token-count` | integer | Total token count for the generation request (prompt + response candidates). |

Prompt Tokens Details

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Modality | `modality` | string | Content Part modality. Indicates the media type of a content part. The value is one of the following: MODALITY_UNSPECIFIED: Unspecified modality. TEXT: Plain text. IMAGE: Image. VIDEO: Video. AUDIO: Audio. DOCUMENT: Document, e.g. PDF. Enum values: `MODALITY_UNSPECIFIED`, `TEXT`, `IMAGE`, `VIDEO`, `AUDIO`, `DOCUMENT`. |
| Token Count | `token-count` | integer | Number of tokens. |

Cache Tokens Details

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Modality | `modality` | string | Content Part modality. Indicates the media type of a content part. The value is one of the following: MODALITY_UNSPECIFIED: Unspecified modality. TEXT: Plain text. IMAGE: Image. VIDEO: Video. AUDIO: Audio. DOCUMENT: Document, e.g. PDF. Enum values: `MODALITY_UNSPECIFIED`, `TEXT`, `IMAGE`, `VIDEO`, `AUDIO`, `DOCUMENT`. |
| Token Count | `token-count` | integer | Number of tokens. |

Candidates Tokens Details

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Modality | `modality` | string | Content Part modality. Indicates the media type of a content part. The value is one of the following: MODALITY_UNSPECIFIED: Unspecified modality. TEXT: Plain text. IMAGE: Image. VIDEO: Video. AUDIO: Audio. DOCUMENT: Document, e.g. PDF. Enum values: `MODALITY_UNSPECIFIED`, `TEXT`, `IMAGE`, `VIDEO`, `AUDIO`, `DOCUMENT`. |
| Token Count | `token-count` | integer | Number of tokens. |

Tool Use Prompt Tokens Details

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Modality | `modality` | string | Content Part modality. Indicates the media type of a content part. The value is one of the following: MODALITY_UNSPECIFIED: Unspecified modality. TEXT: Plain text. IMAGE: Image. VIDEO: Video. AUDIO: Audio. DOCUMENT: Document, e.g. PDF. Enum values: `MODALITY_UNSPECIFIED`, `TEXT`, `IMAGE`, `VIDEO`, `AUDIO`, `DOCUMENT`. |
| Token Count | `token-count` | integer | Number of tokens. |

Prompt Feedback

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Block Reason | `block-reason` | string | Specifies the reason why the prompt was blocked. The value is one of the following: BLOCK_REASON_UNSPECIFIED: Default value. This value is unused. SAFETY: Prompt was blocked due to safety reasons. Inspect safetyRatings to understand which safety category blocked it. OTHER: Prompt was blocked due to unknown reasons. BLOCKLIST: Prompt was blocked due to the terms which are included from the terminology blocklist. PROHIBITED_CONTENT: Prompt was blocked due to prohibited content. IMAGE_SAFETY: Candidates blocked due to unsafe image generation content. Enum values: `BLOCK_REASON_UNSPECIFIED`, `SAFETY`, `OTHER`, `BLOCKLIST`, `PROHIBITED_CONTENT`, `IMAGE_SAFETY`. |
| Safety Ratings | `safety-ratings` | array | Safety rating for a piece of content. The safety rating contains the category of harm and the harm probability level in that category for a piece of content. Content is classified for safety across a number of harm categories and the probability of the harm classification is included here. |

Safety Ratings

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Blocked | `blocked` | boolean | Whether the content was blocked by this rating. |
| Harm Category | `category` | string | Harm category. |
| Probability | `probability` | string | Probability level of harm. |

Cache

Context caching allows you to cache input tokens and reference them in subsequent requests, reducing costs and improving performance for repetitive large contexts. This task supports creating, listing, getting, updating, and deleting cached content with proper time-to-live (TTL) management. The minimum input token count for context caching is 1,024 for 2.5 Flash and 4,096 for 2.5 Pro models.

| Input | Field ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Task ID (required) | `task` | string | `TASK_CACHE` |
| Operation (required) | `operation` | string | The cache operation to perform. The value is one of the following: create: Create a new cached content. list: List all cached contents. get: Retrieve a specific cached content. update: Update an existing cached content (only expiration time can be updated). delete: Delete a cached content. Enum values: `create`, `list`, `get`, `update`, `delete`. |
| Model (required) | `model` | string | ID of the model to use for caching. Required for create operations. The model is immutable after creation. The value is one of the following: `gemini-2.5-pro`: Optimized for enhanced thinking and reasoning, multimodal understanding, advanced coding, and more. `gemini-2.5-flash`: Optimized for adaptive thinking, cost efficiency. `gemini-2.0-flash-lite`: Optimized for most cost-efficient model supporting high throughput. Enum values: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.0-flash-lite`. |
| Cache Name | `cache-name` | string | [GET, UPDATE, DELETE] The name of the cached content for get, update, or delete operations. Format: `cachedContents/{cachedContent}`. Required for get, update, and delete operations. |
| Prompt | `prompt` | string | [CREATE] The main text instruction or query to be cached for create operations. |
| Images | `images` | array[string] | [CREATE] URI references or base64 content of input images to be cached for create operations. |
| Audio | `audio` | array[string] | [CREATE] URI references or base64 content of input audio to be cached for create operations. |
| Videos | `videos` | array[string] | [CREATE] URI references or base64 content of input videos to be cached for create operations. |
| Documents | `documents` | array[string] | [CREATE] URI references or base64 content of input documents to be cached for create operations. Different vendors might have different constraints on the document format. For example, Gemini supports only PDF. |
| System Message | `system-message` | string | [CREATE] A system message to guide model behavior for create operations. Takes precedence over `system-instruction`. |
| Display Name | `display-name` | string | [CREATE] Optional. The user-provided name of the cached content for create operations. |
| System Instruction | `system-instruction` | object | [CREATE] Optional. A system instruction to guide the model behavior for create operations. |
| Contents | `contents` | array[object] | The input contents to cache for create operations. Each item represents a user or model turn composed of parts (text or images). This is the main content that will be cached for reuse in subsequent requests. |
| Tools | `tools` | array[object] | [CREATE] Optional. Tools available to the model for create operations, e.g., function declarations. |
| Tool Config | `tool-config` | object | Configuration for tool usage and function calling. |
| TTL | `ttl` | string | [CREATE, UPDATE] Time to live duration for the cached content in Google Duration format. A duration in seconds with up to nine fractional digits, ending with 's'. Example: "3.5s". Must be at least 60 seconds. Maximum is 7 days (604800 seconds). |
| Expire Time | `expire-time` | string | [CREATE, UPDATE] Absolute expiration time for the cached content in RFC3339 format. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
| Page Size | `page-size` | integer | [LIST] Optional. The maximum number of cached contents to return for list operations. Default is 50. |
| Page Token | `page-token` | string | [LIST] Optional. A page token from a previous list operation for pagination. |
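
A sketch of a create operation that caches a large document for one hour; the component ID, variable name, and display name are illustrative:

component:
  gemini-cache:
    type: gemini
    task: TASK_CACHE
    input:
      operation: create
      model: gemini-2.5-flash
      system-message: You are an expert on the attached manual.
      documents:
        - ${variable.manual:base64}
      display-name: manual-cache
      ttl: 3600s
    setup: ${connection.gemini}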
Input Objects in Cache

System Instruction

[CREATE] Optional. A system instruction to guide the model behavior for create operations.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Parts | `parts` | array | Parts of the content. |
| Role | `role` | string | The producer of the content. Must be either 'user' or 'model'. Useful to set for multi-turn conversations, otherwise can be left blank or unset. Optional. The value is one of the following: USER: User content. MODEL: Model content. Enum values: `USER`, `MODEL`. |

Parts

Parts of the content.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Thought | `thought` | boolean | Indicates if the part is a thought from the model. |
| Thought Signature | `thought-signature` | string | Opaque signature for the thought (base64-encoded bytes). |
| Video Metadata | `video-metadata` | object | Optional video metadata (only with blob or fileData video content). |

Video Metadata

Optional video metadata (only with blob or fileData video content).

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Offset | `end-offset` | string | The end offset of the video (duration string, e.g. "3.5s"). |
| FPS | `fps` | number | Frame rate of the video sent to the model. Range (0.0, 24.0]. |
| Start Offset | `start-offset` | string | The start offset of the video (duration string, e.g. "3.5s"). |

Contents

The input contents to cache for create operations. Each item represents a user or model turn composed of parts (text or images). This is the main content that will be cached for reuse in subsequent requests.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Parts | `parts` | array | Parts of the content. |
| Role | `role` | string | The producer of the content. Must be either 'user' or 'model'. Useful to set for multi-turn conversations, otherwise can be left blank or unset. Optional. The value is one of the following: USER: User content. MODEL: Model content. Enum values: `USER`, `MODEL`. |

Parts

Parts of the content.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Thought | `thought` | boolean | Indicates if the part is a thought from the model. |
| Thought Signature | `thought-signature` | string | Opaque signature for the thought (base64-encoded bytes). |
| Video Metadata | `video-metadata` | object | Optional video metadata (only with blob or fileData video content). |

Video Metadata

Optional video metadata (only with blob or fileData video content).

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Offset | `end-offset` | string | The end offset of the video (duration string, e.g. "3.5s"). |
| FPS | `fps` | number | Frame rate of the video sent to the model. Range (0.0, 24.0]. |
| Start Offset | `start-offset` | string | The start offset of the video (duration string, e.g. "3.5s"). |

Tools

[CREATE] Optional. Tools available to the model for create operations, e.g., function declarations.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Code Execution | `code-execution` | object | Tool that executes code generated by the model, and automatically returns the result to the model. |
| Function Declarations | `function-declarations` | array | Functions the model may call. |
| Google Search | `google-search` | object | GoogleSearch tool type. Tool to support Google Search in Model. Powered by Google. |
| Google Search Retrieval | `google-search-retrieval` | object | Tool to retrieve public web data for grounding, powered by Google. |
| URL Context | `url-context` | object | Tool to support URL context retrieval. |

Function Declarations

Functions the model may call.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Description | `description` | string | A brief description of the function. |
| Name | `name` | string | The name of the function to call. |
| Parameters | `parameters` | object | Describes the parameters to this function. Reflects the Open API 3.03 Parameter Object. string Key: the name of the parameter. Parameter names are case sensitive. Schema Value: the Schema defining the type used for the parameter. |

Parameters

Describes the parameters to this function. Reflects the Open API 3.03 Parameter Object string Key: the name of the parameter. Parameter names are case sensitive. Schema Value: the Schema defining the type used for the parameter.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Any Of | `anyOf` | array | Value must satisfy any of the sub-schemas. |
| Default | `default` | object | Default value for the field (ignored for validation). |
| Description | `description` | string | Optional description of the schema. |
| Enum | `enum` | array | Enum values for STRING with enum format. |
| Format | `format` | string | Optional format of the data. |
| Items | `items` | object | Schema of elements for ARRAY type. |
| Max Items | `max-items` | integer | Maximum number of elements for ARRAY type. |
| Max Length | `max-length` | integer | Maximum length for STRING type. |
| Max Properties | `max-properties` | integer | Maximum number of properties for OBJECT type. |
| Maximum | `maximum` | number | Maximum value for INTEGER/NUMBER types. |
| Min Items | `min-items` | integer | Minimum number of elements for ARRAY type. |
| Min Length | `min-length` | integer | Minimum length for STRING type. |
| Min Properties | `min-properties` | integer | Minimum number of properties for OBJECT type. |
| Minimum | `minimum` | number | Minimum value for INTEGER/NUMBER types. |
| Nullable | `nullable` | boolean | Indicates if the value may be null. |
| Pattern | `pattern` | string | Regex pattern constraint for STRING type. |
| Properties | `properties` | object | Properties for OBJECT type. |
| Property Ordering | `property-ordering` | array | Order of properties for OBJECT type (non-standard). |
| Required | `required` | array | Required properties for OBJECT type. |
| Title | `title` | string | Optional title of the schema. |
| Type | `type` | string | Required data type of the schema. Enum values: `TYPE_UNSPECIFIED`, `STRING`, `NUMBER`, `INTEGER`, `BOOLEAN`, `ARRAY`, `OBJECT`. |

Google Search Retrieval

Tool to retrieve public web data for grounding, powered by Google.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Dynamic Retrieval Config | `dynamic-retrieval-config` | object | Specifies the dynamic retrieval configuration for the given source. |

Dynamic Retrieval Config

Specifies the dynamic retrieval configuration for the given source.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Dynamic Threshold | `dynamic-threshold` | number | The threshold to be used in dynamic retrieval. If not set, a system default value is used. |
| Mode | `mode` | string | The mode of the predictor to be used in dynamic retrieval. The value is one of the following: MODE_UNSPECIFIED: Always trigger retrieval. MODE_DYNAMIC: Run retrieval only when system decides it is necessary. Enum values: `MODE_UNSPECIFIED`, `MODE_DYNAMIC`. |

Google Search

GoogleSearch tool type. Tool to support Google Search in Model. Powered by Google.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Time Range Filter | `time-range-filter` | object | Filter search results to a specific time range. If customers set a start time, they must set an end time (and vice versa). |

Time Range Filter

Filter search results to a specific time range. If customers set a start time, they must set an end time (and vice versa).

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| End Time | `end-time` | string | Exclusive end of the interval. If specified, a Timestamp matching this interval will have to be before the end. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |
| Start Time | `start-time` | string | Inclusive start of the interval. If specified, a Timestamp matching this interval will have to be the same or after the start. Uses RFC 3339, where generated output will always be Z-normalized and use 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted. Examples: "2014-10-02T15:01:23Z", "2014-10-02T15:01:23.045123456Z" or "2014-10-02T15:01:23+05:30". |

Tool Config

Configuration for tool usage and function calling.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Function Calling Config | `function-calling-config` | object | Configuration for specifying function calling behavior. |

Function Calling Config

Configuration for specifying function calling behavior.

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Allowed Function Names | `allowed-function-names` | array | A set of function names that, when provided, limits the functions the model will call. This should only be set when the Mode is ANY or VALIDATED. Function names should match [FunctionDeclaration.name]. When set, model will predict a function call from only allowed function names. |
| Mode | `mode` | string | Specifies the mode in which function calling should execute. If unspecified, the default value will be set to AUTO. The value is one of the following: MODE_UNSPECIFIED: Unspecified function calling mode. This value should not be used. AUTO: Default model behavior, model decides to predict either a function call or a natural language response. ANY: Model is constrained to always predicting a function call only. If "allowedFunctionNames" are set, the predicted function call will be limited to any one of "allowedFunctionNames", else the predicted function call will be any one of the provided "functionDeclarations". NONE: Model will not predict any function call. Model behavior is same as when not passing any function declarations. VALIDATED: Model decides to predict either a function call or a natural language response, but will validate function calls with constrained decoding. If "allowedFunctionNames" are set, the predicted function call will be limited to any one of "allowedFunctionNames", else the predicted function call will be any one of the provided "functionDeclarations". Enum values: `MODE_UNSPECIFIED`, `AUTO`, `ANY`, `NONE`. |

The parts Object

Parts

parts must fulfill one of the following schemas:

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Text | `text` | string | Inline text content. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Blob | `blob` | object | Raw media bytes. Text should use the 'text' field instead. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Function Call | `function-call` | object | Predicted function call with name and arguments. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Function Response | `function-response` | object | Result of a function call with name and structured response. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| File Data | `file-data` | object | URI-based data reference with MIME type. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Executable Code | `executable-code` | object | Code generated by the model that is meant to be executed. |

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Code Execution Result | `code-execution-result` | object | Result of executing the ExecutableCode. |
| Output | Field ID | Type | Description |
| :--- | :--- | :--- | :--- |
| Operation (optional) | `operation` | string | The cache operation that was performed. |
| Cached Content (optional) | `cached-content` | object | [CREATE, GET, UPDATE] The cached content object for create, get, and update operations. Not returned for delete operations. |
| Cached Contents (optional) | `cached-contents` | array[object] | [LIST] List of cached contents for list operations. |
| Next Page Token (optional) | `next-page-token` | string | [LIST] Token for retrieving the next page of results for list operations. |
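
The resource name returned by a create operation can then be fed into the Chat task's cached-content field. The sketch below assumes the recipe engine can reference the nested name field of the cached-content output; component IDs and variable names are illustrative:

component:
  create-cache:
    type: gemini
    task: TASK_CACHE
    input:
      operation: create
      model: gemini-2.5-flash
      documents:
        - ${variable.manual:base64}
      ttl: 3600s
    setup: ${connection.gemini}
  chat:
    type: gemini
    task: TASK_CHAT
    input:
      model: gemini-2.5-flash  # should match the model the cache was created with
      prompt: ${variable.prompt}
      cached-content: ${create-cache.output.cached-content.name}
    setup: ${connection.gemini}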
Output Objects in Cache

Cached Content

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Create Time | `create-time` | string | Creation time of the cached content in RFC3339 format. |
| Display Name | `display-name` | string | Optional. The user-provided name of the cached content. |
| Expire Time | `expire-time` | string | Expiration time of the cached content in RFC3339 format. |
| Model | `model` | string | The name of the Model to use for cached content. |
| Name | `name` | string | The resource name referring to the cached content. Format: `cachedContents/{cachedContent}`. |
| Update Time | `update-time` | string | Last update time of the cached content in RFC3339 format. |
| Usage Metadata | `usage-metadata` | object | Token usage statistics for the cached content. |

Usage Metadata

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Audio Duration Seconds | `audio-duration-seconds` | integer | Duration of audio in seconds. |
| Image Count | `image-count` | integer | Number of images. |
| Text Count | `text-count` | integer | Number of text characters. |
| Total Token Count | `total-token-count` | integer | Total number of tokens that the cached content consumes. |
| Video Duration Seconds | `video-duration-seconds` | integer | Duration of video in seconds. |

Cached Contents

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Create Time | `create-time` | string | Creation time of the cached content in RFC3339 format. |
| Display Name | `display-name` | string | Optional. The user-provided name of the cached content. |
| Expire Time | `expire-time` | string | Expiration time of the cached content in RFC3339 format. |
| Model | `model` | string | The name of the Model to use for cached content. |
| Name | `name` | string | The resource name referring to the cached content. Format: `cachedContents/{cachedContent}`. |
| Update Time | `update-time` | string | Last update time of the cached content in RFC3339 format. |
| Usage Metadata | `usage-metadata` | object | Token usage statistics for the cached content. |

Usage Metadata

| Field | Field ID | Type | Note |
| :--- | :--- | :--- | :--- |
| Audio Duration Seconds | `audio-duration-seconds` | integer | Duration of audio in seconds. |
| Image Count | `image-count` | integer | Number of images. |
| Text Count | `text-count` | integer | Number of text characters. |
| Total Token Count | `total-token-count` | integer | Total number of tokens that the cached content consumes. |
| Video Duration Seconds | `video-duration-seconds` | integer | Duration of video in seconds. |

Example Recipes

version: v1beta
component:
  gemini:
    type: gemini
    task: TASK_CHAT
    input:
      model: gemini-2.5-pro
      stream: ${variable.stream}
      prompt: ${variable.prompt}
      images:
        - ${variable.image:base64}
      documents:
        - ${variable.document:base64}
      system-message: You are a helpful assistant.
      temperature: 1
      top-p: 1
    setup: ${connection.gemini}
variable:
  prompt:
    title: Prompt
    description: Prompt to instruct the model
    type: string
  document:
    title: Document
    description: Document to convert to Markdown
    type: document
  image:
    title: Image
    description: Image for the model to analyze
    type: image
  stream:
    title: Enable Stream
    description: Whether to enable streaming
    type: boolean
output:
  texts:
    title: texts[0]
    value: ${gemini.output.texts[0]}
  usage:
    title: usage
    value: ${gemini.output.usage}
  candidates:
    title: candidates
    value: ${gemini.output.candidates}
  usage-metadata:
    title: usage-metadata
    value: ${gemini.output.usage-metadata}
  prompt-feedback:
    title: prompt-feedback
    value: ${gemini.output.prompt-feedback}
  model-version:
    title: model-version
    value: ${gemini.output.model-version}
  response-id:
    title: response-id
    value: ${gemini.output.response-id}