Instill Model

The Instill Model component is an AI component that allows users to connect to the AI models served on the Instill Core platform. It can carry out the following tasks:

Classification
Instance Segmentation
Keypoint
Detection
OCR
Semantic Segmentation
Text Generation
Text Generation Chat
Text to Image
Visual Question Answering
Chat
Embedding

Release Stage

Alpha

Configuration

The component definition and tasks are defined in the definition.yaml and tasks.yaml files respectively.

Supported Tasks

Classification

Classify images into predefined categories.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_CLASSIFICATION`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Image (required)	`image-base64`	string	Image base64.

Output	Field ID	Type	Description
Category	`category`	string	The predicted category of the input.
Score	`score`	number	The confidence score of the predicted category of the input.
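As an illustration of the input table above, the following minimal sketch assembles the three required fields for a classification request. The helper name `build_classification_input` and the model name `example-classifier` are hypothetical, and the sketch assumes the `image-base64` field takes plain base64 text without a data-URI prefix, which the table does not specify.

```python
import base64


def build_classification_input(image_bytes: bytes, model_name: str) -> dict:
    """Assemble the input fields listed in the Classification table."""
    return {
        "task": "TASK_CLASSIFICATION",
        "model-name": model_name,
        # Assumption: plain base64 text, no "data:image/...;base64," prefix.
        "image-base64": base64.b64encode(image_bytes).decode("ascii"),
    }


# "example-classifier" is a placeholder, not a real model on the platform.
payload = build_classification_input(b"abc", "example-classifier")
print(payload)
```

The same shape applies to the other image tasks below, with only the `task` value changing.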

Instance Segmentation

Detect, localize and delineate multiple objects in images.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_INSTANCE_SEGMENTATION`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Image (required)	`image-base64`	string	Image base64.

Output	Field ID	Type	Description
Objects	`objects`	array[object]	A list of detected instance bounding boxes.

Output Objects in Instance Segmentation

Objects

Field	Field ID	Type	Note
Bounding Box	`bounding-box`	object	The detected bounding box in (left, top, width, height) format.
Category	`category`	string	The predicted category of the bounding box.
RLE	`rle`	string	Run Length Encoding (RLE) of instance mask within the bounding box.
Score	`score`	number	The confidence score of the predicted instance object.

Bounding Box

Field	Field ID	Type	Note
Height	`height`	number	Bounding box height value
Left	`left`	number	Bounding box left x-axis value
Top	`top`	number	Bounding box top y-axis value
Width	`width`	number	Bounding box width value
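Many drawing and cropping utilities expect corner coordinates rather than the (left, top, width, height) format described above. A minimal sketch of the conversion, using a hypothetical helper name and made-up sample values:

```python
def bbox_to_corners(bounding_box: dict) -> tuple:
    """Convert a (left, top, width, height) box to (left, top, right, bottom)."""
    left = bounding_box["left"]
    top = bounding_box["top"]
    right = left + bounding_box["width"]
    bottom = top + bounding_box["height"]
    return (left, top, right, bottom)


# Sample values are invented for illustration.
box = {"left": 10, "top": 20, "width": 30, "height": 40}
print(bbox_to_corners(box))  # (10, 20, 40, 60)
```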

Keypoint

Detect and localize multiple keypoints of objects in images.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_KEYPOINT`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Image (required)	`image-base64`	string	Image base64.

Output	Field ID	Type	Description
Objects	`objects`	array[object]	A list of keypoint objects. Each keypoint object includes all the pre-defined keypoints of a detected object.

Output Objects in Keypoint

Objects

Field	Field ID	Type	Note
Bounding Box	`bounding-box`	object	The detected bounding box in (left, top, width, height) format.
Keypoints	`keypoints`	array	A keypoint group is composed of a list of pre-defined keypoints of a detected object.
Score	`score`	number	The confidence score of the predicted object.

Keypoints

Field	Field ID	Type	Note
Visibility Score	`v`	number	Visibility score of the keypoint.
X Coordinate	`x`	number	X coordinate of the keypoint.
Y Coordinate	`y`	number	Y coordinate of the keypoint.

Bounding Box

Field	Field ID	Type	Note
Height	`height`	number	Bounding box height value
Left	`left`	number	Bounding box left x-axis value
Top	`top`	number	Bounding box top y-axis value
Width	`width`	number	Bounding box width value
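A common post-processing step is to keep only the keypoints whose visibility score `v` is high enough. The sketch below assumes the output shape given in the tables above; the helper name and the 0.5 threshold are illustrative choices, not values defined by the component.

```python
def visible_keypoints(obj: dict, min_visibility: float = 0.5) -> list:
    """Return (x, y) pairs for keypoints with visibility >= min_visibility."""
    return [
        (kp["x"], kp["y"])
        for kp in obj["keypoints"]
        if kp["v"] >= min_visibility
    ]


# Invented sample of one detected object.
detected = {
    "score": 0.9,
    "bounding-box": {"left": 0, "top": 0, "width": 100, "height": 200},
    "keypoints": [
        {"x": 12, "y": 30, "v": 0.95},
        {"x": 40, "y": 80, "v": 0.10},
    ],
}
print(visible_keypoints(detected))  # [(12, 30)]
```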

Detection

Detect and localize multiple objects in images.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_DETECTION`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Image (required)	`image-base64`	string	Image base64.

Output	Field ID	Type	Description
Objects	`objects`	array[object]	A list of detected objects.

Output Objects in Detection

Objects

Field	Field ID	Type	Note
Bounding Box	`bounding-box`	object	The detected bounding box in (left, top, width, height) format.
Category	`category`	string	The predicted category of the bounding box.
Score	`score`	number	The confidence score of the predicted category of the bounding box.

Bounding Box

Field	Field ID	Type	Note
Height	`height`	number	Bounding box height value
Left	`left`	number	Bounding box left x-axis value
Top	`top`	number	Bounding box top y-axis value
Width	`width`	number	Bounding box width value
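Detection output is often filtered by confidence and grouped by category before further use. A minimal sketch under the output shape listed above; the function name and the default threshold are assumptions made for illustration.

```python
from collections import defaultdict


def group_detections(objects: list, min_score: float = 0.5) -> dict:
    """Group bounding boxes by category, dropping low-confidence detections."""
    grouped = defaultdict(list)
    for obj in objects:
        if obj["score"] >= min_score:
            grouped[obj["category"]].append(obj["bounding-box"])
    return dict(grouped)


# Invented sample output.
objects = [
    {"category": "dog", "score": 0.92,
     "bounding-box": {"left": 5, "top": 5, "width": 50, "height": 40}},
    {"category": "cat", "score": 0.30,
     "bounding-box": {"left": 60, "top": 10, "width": 20, "height": 20}},
]
print(group_detections(objects))
```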

OCR

Detect and recognize text in images.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_OCR`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Image (required)	`image-base64`	string	Image base64.

Output	Field ID	Type	Description
Objects	`objects`	array[object]	A list of detected bounding boxes.

Output Objects in OCR

Objects

Field	Field ID	Type	Note
Bounding Box	`bounding-box`	object	The detected bounding box in (left, top, width, height) format.
Score	`score`	number	The confidence score of the predicted object.
Text	`text`	string	Text string recognised per bounding box.

Bounding Box

Field	Field ID	Type	Note
Height	`height`	number	Bounding box height value
Left	`left`	number	Bounding box left x-axis value
Top	`top`	number	Bounding box top y-axis value
Width	`width`	number	Bounding box width value
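The OCR output lists one text string per bounding box, with no stated ordering. The sketch below arranges the strings in a rough reading order by sorting on `top` and then `left`. This is a simple heuristic chosen for illustration, not something the component does itself, and it will not handle skewed or multi-column layouts.

```python
def ocr_text_in_reading_order(objects: list, min_score: float = 0.0) -> list:
    """Return recognized strings sorted top-to-bottom, then left-to-right."""
    kept = [o for o in objects if o["score"] >= min_score]
    kept.sort(key=lambda o: (o["bounding-box"]["top"], o["bounding-box"]["left"]))
    return [o["text"] for o in kept]


# Invented sample output.
objects = [
    {"text": "world", "score": 0.9,
     "bounding-box": {"left": 80, "top": 10, "width": 60, "height": 20}},
    {"text": "hello", "score": 0.8,
     "bounding-box": {"left": 10, "top": 10, "width": 60, "height": 20}},
    {"text": "again", "score": 0.7,
     "bounding-box": {"left": 10, "top": 50, "width": 60, "height": 20}},
]
print(" ".join(ocr_text_in_reading_order(objects)))  # hello world again
```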

Semantic Segmentation

Classify image pixels into predefined categories.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_SEMANTIC_SEGMENTATION`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Image (required)	`image-base64`	string	Image base64.

Output	Field ID	Type	Description
Stuffs	`stuffs`	array[object]	A list of RLE binary masks.

Output Objects in Semantic Segmentation

Stuffs

Field	Field ID	Type	Note
Category	`category`	string	Category text string corresponding to each stuff mask.
RLE	`rle`	string	Run Length Encoding (RLE) of each stuff mask within the image.
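The tables above do not spell out how the `rle` string is encoded. The sketch below assumes, purely for illustration, a comma-separated list of run lengths that alternate between background (0) and foreground (1), starting with background. Check the actual model output before relying on this; the pixel order (row-major or column-major) is also not specified here, so the function returns a flat list.

```python
def decode_rle(rle: str) -> list:
    """Expand an assumed comma-separated run-length string into a flat 0/1 list."""
    counts = [int(c) for c in rle.split(",") if c.strip()]
    mask = []
    value = 0  # Assumption: the first run describes background pixels.
    for run in counts:
        mask.extend([value] * run)
        value = 1 - value
    return mask


print(decode_rle("2,3,1"))  # [0, 0, 1, 1, 1, 0]
```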

Text Generation

Generate texts from input text prompts.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_TEXT_GENERATION`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Prompt (required)	`prompt`	string	The prompt text.
System Message	`system-message`	string	The system message sets the behavior of the assistant. For example, you can modify the assistant's personality or give specific instructions about how it should behave throughout the conversation. By default, the model uses the generic message "You are a helpful assistant."
Seed	`seed`	integer	The seed.
Temperature	`temperature`	number	The temperature for sampling.
Max New Tokens	`max-new-tokens`	integer	The maximum number of tokens for the model to generate.

Output	Field ID	Type	Description
Text	`text`	string	Text.
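Only `task`, `model-name` and `prompt` are required for this task. The sketch below builds the input and leaves out any optional field that was not supplied. The helper name and the model name `example-llm` are hypothetical.

```python
def build_text_generation_input(model_name, prompt, system_message=None,
                                seed=None, temperature=None, max_new_tokens=None):
    """Assemble Text Generation input, omitting unset optional fields."""
    fields = {
        "task": "TASK_TEXT_GENERATION",
        "model-name": model_name,
        "prompt": prompt,
    }
    optional = {
        "system-message": system_message,
        "seed": seed,
        "temperature": temperature,
        "max-new-tokens": max_new_tokens,
    }
    fields.update({k: v for k, v in optional.items() if v is not None})
    return fields


# "example-llm" is a placeholder model name.
print(build_text_generation_input("example-llm", "Summarize RLE in one line.",
                                  temperature=0.2, max_new_tokens=64))
```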

Text Generation Chat

Generate texts from input text prompts and chat history.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_TEXT_GENERATION_CHAT`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Prompt (required)	`prompt`	string	The prompt text.
System Message	`system-message`	string	The system message sets the behavior of the assistant. For example, you can modify the assistant's personality or give specific instructions about how it should behave throughout the conversation. By default, the model uses the generic message "You are a helpful assistant."
Prompt Images	`prompt-images`	array[string]	The prompt images.
Chat History	`chat-history`	array[object]	External chat history, i.e. the previous messages in the conversation. When this field is populated, System Message is ignored. Each message should follow the format: {"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"}.
Seed	`seed`	integer	The seed.
Temperature	`temperature`	number	The temperature for sampling.
Max New Tokens	`max-new-tokens`	integer	The maximum number of tokens for the model to generate.

Input Objects in Text Generation Chat

Chat History

External chat history, i.e. the previous messages in the conversation. When this field is populated, System Message is ignored. Each message should follow the format: {"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"}.

Field	Field ID	Type	Note
Content	`content`	array	The message content.
Role	`role`	string	The message role, i.e. 'system', 'user' or 'assistant'.

Content

The message content.

Field	Field ID	Type	Note
Image URL	`image-url`	object	The image URL
Text	`text`	string	The text content.
Type	`type`	string	The type of the content part. Enum values `text` `image-url`

Image URL

The image URL

Field	Field ID	Type	Note
URL	`url`	string	Either a URL of the image or the base64 encoded image data.

Output	Field ID	Type	Description
Text	`text`	string	Text.
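The nested tables above describe each chat-history message as a `role` plus a `content` array of parts, where each part is either `text` or `image-url`. The following sketch builds messages in that shape. The helper names and the example URL are hypothetical, and the exact structure accepted by a given model should be confirmed against tasks.yaml.

```python
import json

ROLES = {"system", "user", "assistant"}


def text_part(text: str) -> dict:
    """A content part of type `text`."""
    return {"type": "text", "text": text}


def image_part(url: str) -> dict:
    """A content part of type `image-url`; `url` may be a URL or base64 data."""
    return {"type": "image-url", "image-url": {"url": url}}


def message(role: str, *parts: dict) -> dict:
    """One chat-history entry with a validated role."""
    if role not in ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return {"role": role, "content": list(parts)}


chat_history = [
    message("user", text_part("What is in this picture?"),
            image_part("https://example.com/cat.jpg")),  # placeholder URL
    message("assistant", text_part("A cat sitting on a sofa.")),
]
print(json.dumps(chat_history, indent=2))
```

Because System Message is ignored when `chat-history` is populated, any system instruction would need to be included as a `system` message in the history itself.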

Text to Image

Generate images from input text prompts.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_TEXT_TO_IMAGE`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Prompt (required)	`prompt`	string	The prompt text.
Samples	`samples`	integer	The number of generated samples, default is 1.
Seed	`seed`	integer	The seed, default is 0.
Negative Prompt	`negative-prompt`	string	Keywords of what you do not wish to see in the output image.
Aspect Ratio	`aspect-ratio`	string	Controls the aspect ratio of the generated image. Defaults to 1:1. Enum values `16:9` `1:1` `21:9` `2:3` `3:2` `4:5` `5:4` `9:16` `9:21`

Output	Field ID	Type	Description
Images	`images`	array[image/jpeg]	Images.
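Since `aspect-ratio` accepts only the listed enum values, it can be worth validating it before sending a request. A minimal sketch, using the defaults stated in the table; the helper name and the model name `example-t2i` are hypothetical.

```python
ASPECT_RATIOS = {"16:9", "1:1", "21:9", "2:3", "3:2", "4:5", "5:4", "9:16", "9:21"}


def build_text_to_image_input(model_name, prompt, samples=1, seed=0,
                              negative_prompt=None, aspect_ratio="1:1"):
    """Assemble Text to Image input, rejecting unknown aspect ratios."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    fields = {
        "task": "TASK_TEXT_TO_IMAGE",
        "model-name": model_name,
        "prompt": prompt,
        "samples": samples,
        "seed": seed,
        "aspect-ratio": aspect_ratio,
    }
    if negative_prompt is not None:
        fields["negative-prompt"] = negative_prompt
    return fields


# "example-t2i" is a placeholder model name.
print(build_text_to_image_input("example-t2i", "a lighthouse at dusk",
                                negative_prompt="blurry", aspect_ratio="16:9"))
```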

Visual Question Answering

Answer questions based on a prompt and an image.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_VISUAL_QUESTION_ANSWERING`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Prompt (required)	`prompt`	string	The prompt text.
System Message	`system-message`	string	The system message sets the behavior of the assistant. For example, you can modify the assistant's personality or give specific instructions about how it should behave throughout the conversation. By default, the model uses the generic message "You are a helpful assistant."
Prompt Images	`prompt-images`	array[string]	The prompt images.
Chat History	`chat-history`	array[object]	External chat history, i.e. the previous messages in the conversation. When this field is populated, System Message is ignored. Each message should follow the format: {"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"}.
Seed	`seed`	integer	The seed.
Temperature	`temperature`	number	The temperature for sampling.
Max New Tokens	`max-new-tokens`	integer	The maximum number of tokens for the model to generate.

Input Objects in Visual Question Answering

Chat History

Field	Field ID	Type	Note
Content	`content`	array	The message content.
Role	`role`	string	The message role, i.e. 'system', 'user' or 'assistant'.

Content

The message content.

Field	Field ID	Type	Note
Image URL	`image-url`	object	The image URL
Text	`text`	string	The text content.
Type	`type`	string	The type of the content part. Enum values `text` `image-url`

Image URL

The image URL

Field	Field ID	Type	Note
URL	`url`	string	Either a URL of the image or the base64 encoded image data.

Output	Field ID	Type	Description
Text	`text`	string	Text.

Chat

Generate texts from input text prompts and chat history.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_CHAT`
Model Name (required)	`model-name`	string	The Instill Model model to be used.
Prompt (required)	`prompt`	string	The prompt text.
System Message	`system-message`	string	The system message sets the behavior of the assistant. For example, you can modify the assistant's personality or give specific instructions about how it should behave throughout the conversation. By default, the model uses the generic message "You are a helpful assistant."
Prompt Images	`prompt-images`	array[string]	The prompt images.
Chat History	`chat-history`	array[object]	External chat history, i.e. the previous messages in the conversation. When this field is populated, System Message is ignored. Each message should follow the format: {"role": "The message role, i.e. 'system', 'user' or 'assistant'", "content": "message content"}.
Seed	`seed`	integer	The seed.
Temperature	`temperature`	number	The temperature for sampling.
Max New Tokens	`max-new-tokens`	integer	The maximum number of tokens for the model to generate.

Input Objects in Chat

Chat History

Field	Field ID	Type	Note
Content	`content`	array	The message content.
Role	`role`	string	The message role, i.e. 'system', 'user' or 'assistant'.

Content

The message content.

Field	Field ID	Type	Note
Image URL	`image-url`	object	The image URL
Text	`text`	string	The text content.
Type	`type`	string	The type of the content part. Enum values `text` `image-url`

Image URL

The image URL

Field	Field ID	Type	Note
URL	`url`	string	Either a URL of the image or the base64 encoded image data.

Output	Field ID	Type	Description
Text	`text`	string	Text.

Embedding

Generate vector embeddings from input data, which can be text or images. Each input is converted into a dense, fixed-length numerical representation that captures its essential features. These embeddings are typically used in machine learning tasks to represent complex data in a more structured, simplified form.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_EMBEDDING`
Data (required)	`data`	object	Input data.
Parameter	`parameter`	object	Input parameter.

Input Objects in Embedding

Data

Input data.

Field	Field ID	Type	Note
Embeddings	`embeddings`	array	List of input data to be embedded.
Model	`model`	string	The model to be used for generating embeddings, in the format `namespace/model-name/version`, e.g. `abrc/yolov7-stomata/v0.1.0`. You can find the version on the Versions tab of the model page.

Parameter

Input parameter.

Field	Field ID	Type	Note
Dimensions	`dimensions`	integer	Number of dimensions in the output embedding vectors.
Data Format	`format`	string	The data format of the embeddings. Defaults to float. Enum values `float` `base64`
Input Type	`input-type`	string	The type of input data to be embedded (e.g., query, document).
Truncate	`truncate`	string	How to handle inputs longer than the max token length. Defaults to 'End'. Enum values `None` `End` `Start`

The embeddings Object

Embeddings

embeddings must fulfill one of the following schemas:

`Text`

Field	Field ID	Type	Note
Text Content	`text`	string	When the input is text, the raw text is tokenized and processed into a dense, fixed-length vector that captures semantic information such as word meanings and relationships. These text embeddings enable tasks like sentiment analysis, search, or classification.
Text	`type`	string	Must be `"text"`

`Image URL`

Field	Field ID	Type	Note
Image URL	`image-url`	string	When the input is an image from a URL, the image is first fetched from the URL and then decoded into its original format. It is then processed into a fixed-length vector representing essential visual features like shapes and colors. These image embeddings are useful for tasks like image classification or similarity search, providing structured numerical data for complex visual inputs.
Image URL	`type`	string	Must be `"image-url"`

`Image Base64`

Field	Field ID	Type	Note
Image File	`image-base64`	string	When the input is an image in base64 format, the base64-encoded data is first decoded into its original image form. The image is then processed and transformed into a dense, fixed-length numerical vector, capturing key visual features like shapes, colors, or textures.
Image File	`type`	string	Must be `"image-base64"`
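In each of the three schemas above, the `type` value matches the name of the field that carries the payload (`text`, `image-url` or `image-base64`). The sketch below uses that observation to validate items and assemble the full Embedding input. The helper name and the model identifier are hypothetical; the `format` and `truncate` defaults follow the Parameter table.

```python
EMBEDDING_TYPES = {"text", "image-url", "image-base64"}
FORMATS = {"float", "base64"}
TRUNCATE = {"None", "End", "Start"}


def build_embedding_input(model, items, dimensions=None, fmt="float",
                          input_type=None, truncate="End"):
    """Assemble TASK_EMBEDDING input after checking each item's schema."""
    for item in items:
        kind = item.get("type")
        # Each schema stores its payload under a key equal to its `type`.
        if kind not in EMBEDDING_TYPES or kind not in item:
            raise ValueError(f"invalid embedding item: {item!r}")
    if fmt not in FORMATS or truncate not in TRUNCATE:
        raise ValueError("unsupported format or truncate value")
    parameter = {"format": fmt, "truncate": truncate}
    if dimensions is not None:
        parameter["dimensions"] = dimensions
    if input_type is not None:
        parameter["input-type"] = input_type
    return {
        "task": "TASK_EMBEDDING",
        "data": {"model": model, "embeddings": list(items)},
        "parameter": parameter,
    }


items = [
    {"type": "text", "text": "a photo of a leaf"},
    {"type": "image-url", "image-url": "https://example.com/leaf.jpg"},  # placeholder URL
]
# The model identifier below is a placeholder in namespace/model-name/version form.
print(build_embedding_input("example-namespace/example-embedder/v0.1.0", items))
```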

Output	Field ID	Type	Description
Data	`data`	object	Output data.

Output Objects in Embedding

Data

Field	Field ID	Type	Note
Embeddings	`embeddings`	array	List of generated embeddings.

Embeddings

Field	Field ID	Type	Note
Created	`created`	integer	The Unix timestamp (in seconds) of when the embedding was created.
Index	`index`	integer	The index of the embedding vector in the array.
Embedding Vector	`vector`	array	The embedding vector.
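To use the output, the vectors can be put back in input order via `index` and then compared. The sketch below assumes `format` was left at `float`, so each `vector` is a list of numbers; the helper names and the sample values are invented for illustration.

```python
import math


def vectors_in_order(data: dict) -> list:
    """Return embedding vectors sorted by their `index` field."""
    ordered = sorted(data["embeddings"], key=lambda e: e["index"])
    return [e["vector"] for e in ordered]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Invented sample output with two tiny vectors.
data = {
    "embeddings": [
        {"index": 1, "created": 1700000000, "vector": [0.0, 1.0]},
        {"index": 0, "created": 1700000000, "vector": [1.0, 0.0]},
    ]
}
first, second = vectors_in_order(data)
print(cosine_similarity(first, second))  # 0.0
```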