OpenAI

The OpenAI component is an AI component that allows users to connect to the AI models served on the OpenAI Platform. It can carry out the following tasks:

Text Generation
Text Embeddings
Speech Recognition
Text to Speech
Text to Image

Release Stage

Alpha

Configuration

The component definition and tasks are defined in the definition.yaml and tasks.yaml files respectively.

Setup

To communicate with OpenAI, the following connection details need to be provided. You may specify them directly in a pipeline recipe as key-value pairs within the component's `setup` block, or you can create a Connection from the Integration Settings page and reference the whole setup as `setup: ${connection.<my-connection-id>}`.

Field	Field ID	Type	Note
API Key	`api-key`	string	Fill in your OpenAI API key. To find your keys, visit the API Keys page of your OpenAI account.
Organization ID	`organization`	string	Specify which organization is used for the requests. Usage will count against the specified organization's subscription quota.
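
The two ways of supplying the setup described above can be sketched as follows. This is a minimal illustration rather than a complete recipe: the component IDs, the `org-example` organization value, and the `my-openai` connection ID are hypothetical placeholders, and the secret name is borrowed from the example recipes in this document.

```yaml
component:
  # Option 1: inline key-value pairs inside the setup block.
  openai-inline:
    type: openai
    task: TASK_TEXT_GENERATION
    input:
      model: gpt-4o-mini
      prompt: Say hello.
    setup:
      api-key: ${secret.INSTILL_SECRET}
      organization: org-example   # hypothetical organization ID
  # Option 2: reference a Connection created on the Integration Settings page.
  openai-connected:
    type: openai
    task: TASK_TEXT_GENERATION
    input:
      model: gpt-4o-mini
      prompt: Say hello.
    setup: ${connection.my-openai}   # hypothetical connection ID
```

Referencing a Connection keeps credentials out of the recipe itself, which may be preferable when a recipe is shared.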

Supported Tasks

Text Generation

OpenAI's text generation models (often called generative pre-trained transformers or large language models) have been trained to understand natural language, code, and images. The models provide text outputs in response to their inputs. The inputs to these models are also referred to as "prompts". Designing a prompt is essentially how you "program" a large language model, usually by providing instructions or some examples of how to successfully complete a task.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_TEXT_GENERATION`
Model (required)	`model`	string	ID of the model to use. Enum values `o1` `o1-preview` `o1-mini` `gpt-4o-mini` `gpt-4o` `gpt-4o-2024-05-13` `gpt-4o-2024-08-06` `gpt-4-turbo` `gpt-4-turbo-2024-04-09` `gpt-4-0125-preview` `gpt-4-turbo-preview` `gpt-4-1106-preview` `gpt-4-vision-preview` `gpt-4` `gpt-4-0314` `gpt-4-0613` `gpt-4-32k` `gpt-4-32k-0314` `gpt-4-32k-0613` `gpt-3.5-turbo` `gpt-3.5-turbo-16k` `gpt-3.5-turbo-0301` `gpt-3.5-turbo-0613` `gpt-3.5-turbo-1106` `gpt-3.5-turbo-0125` `gpt-3.5-turbo-16k-0613`
Prompt (required)	`prompt`	string	The prompt text.
System Message	`system-message`	string	The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. By default, the model uses the generic message "You are a helpful assistant.".
Image	`images`	array[string]	The images.
Chat History	`chat-history`	array[object]	Incorporate external chat history, specifically previous messages within the conversation. Please note that System Message will be ignored and will not have any effect when this field is populated. Each message should adhere to the format `{"role": "The message role, one of 'system', 'user' or 'assistant'", "content": "message content"}`.
Temperature	`temperature`	number	What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or `top-p` but not both.
N	`n`	integer	How many chat completion choices to generate for each input message. Note that you will be charged based on the number of generated tokens across all of the choices. Keep `n` as `1` to minimize costs.
Max Tokens	`max-tokens`	integer	The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length.
Response Format	`response-type`	object	Response format.
Top P	`top-p`	number	An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or `temperature` but not both.
Presence Penalty	`presence-penalty`	number	Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Frequency Penalty	`frequency-penalty`	number	Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
Prediction	`prediction`	object	Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content.
Tools	`tools`	array[object]	A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.
Tool Choice	`tool-choice`	any	Controls which (if any) tool is called by the model. 'none' means the model will not call any tool and instead generates a message. 'auto' means the model can pick between generating a message or calling one or more tools. 'required' means the model must call one or more tools.
Reasoning Effort	`reasoning-effort`	string	Constrains effort on reasoning for reasoning models. Currently supported values are minimal, low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response. Enum values `minimal` `low` `medium` `high`
Verbosity	`verbosity`	string	Constrains the verbosity of the model's response. Lower values will result in more concise responses, while higher values will result in more verbose responses. Currently supported values are low, medium, and high. Enum values `low` `medium` `high`

Input Objects in Text Generation

Chat History

Incorporate external chat history, specifically previous messages within the conversation. Please note that System Message will be ignored and will not have any effect when this field is populated. Each message should adhere to the format `{"role": "The message role, one of 'system', 'user' or 'assistant'", "content": "message content"}`.

Field	Field ID	Type	Note
Content	`content`	array	The message content.
Role	`role`	string	The message role, i.e. 'system', 'user' or 'assistant'.

Content

The message content.

Field	Field ID	Type	Note
Image URL	`image-url`	object	The image URL.
Text	`text`	string	The text content.
Type	`type`	string	The type of the content part. Enum values `text` `image-url`

Image URL

The image URL.

Field	Field ID	Type	Note
URL	`url`	string	Either a URL of the image or the base64 encoded image data.
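
The nested Chat History, Content, and Image URL objects above can be combined as in the sketch below. It shows only the `input` block of a Text Generation component. The message texts and the image URL are made-up placeholders, and the choice of `gpt-4o-mini` for image input is an assumption. Because System Message is ignored when chat history is populated, the system instruction is carried as the first history message instead.

```yaml
input:
  model: gpt-4o-mini
  prompt: What colour is the animal in the picture?
  chat-history:
    - role: system
      content:
        - type: text
          text: You are a concise assistant.
    - role: user
      content:
        - type: text
          text: Here is a photo from my trip.
        - type: image-url
          image-url:
            url: https://example.com/wombat.jpg   # placeholder URL; base64 image data is also accepted
    - role: assistant
      content:
        - type: text
          text: Thanks, I can see the photo.
```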

Prediction

Configuration for a Predicted Output, which can greatly improve response times when large parts of the model response are known ahead of time. This is most common when you are regenerating a file with only minor changes to most of the content.

Field	Field ID	Type	Note
Content	`content`	string	The content that should be matched when generating a model response. If generated tokens would match this content, the entire model response can be returned much more quickly.
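
As a rough illustration of the file-regeneration case, the `input` block below passes the existing file both in the prompt and as the predicted content. The `source` variable is hypothetical and would need to be declared in the recipe's `variable` section. Whether a given model supports Predicted Outputs is not stated in this document, so the `gpt-4o` choice is an assumption.

```yaml
input:
  model: gpt-4o   # assumed to support Predicted Outputs
  prompt: |-
    Rename the function `add` to `sum_two` in the file below and return the whole file:
    ${variable.source}
  prediction:
    # Most of the response is expected to match the original file.
    content: ${variable.source}
```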

Tools

A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A max of 128 functions are supported.

Field	Field ID	Type	Note
Function	`function`	object	The function to call.

Function

The function to call.

Field	Field ID	Type	Note
Description	`description`	string	A description of what the function does, used by the model to choose when and how to call the function.
Name	`name`	string	The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
Parameters	`parameters`	object	The parameters the function accepts, described as a JSON Schema object. Omitting parameters defines a function with an empty parameter list.
Strict	`strict`	boolean	Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the parameters field.
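
A tool definition built from the Tools and Function fields above might look like the following `input` block. The `get_weather` function, its description, and its `city` parameter are hypothetical, and the JSON Schema under `parameters` is only one plausible shape.

```yaml
input:
  model: gpt-4o-mini
  prompt: What is the weather in Taipei today?
  tools:
    - function:
        name: get_weather   # hypothetical function name
        description: Look up the current weather for a city.
        strict: true
        parameters:
          type: object
          properties:
            city:
              type: string
              description: City name, for example "Taipei".
          required:
            - city
          additionalProperties: false
  tool-choice: auto   # the model decides between replying and calling a tool
```

The component only returns the call the model proposes. Running the function itself is left to whatever consumes the pipeline output.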

The response-type Object

Response Type

response-type must fulfill one of the following schemas:

`Text`

Field	Field ID	Type	Note
Type	`type`	string	Must be `"text"`

`JSON Object`

Field	Field ID	Type	Note
Type	`type`	string	Must be `"json_object"`

`JSON Schema`

Field	Field ID	Type	Note
JSON Schema	`json-schema`	string	Set up the schema of the structured output.
Type	`type`	string	Must be `"json_schema"`

Output	Field ID	Type	Description
Texts	`texts`	array[string]	Texts.
Tool Calls (optional)	`tool-calls`	array[object]	The tool calls generated by the model, such as function calls.
Usage (optional)	`usage`	object	Usage statistics related to the query.

Output Objects in Text Generation

Usage

Field	Field ID	Type	Note
Completion token details	`completion-token-details`	object	Breakdown of tokens used in a completion.
Completion tokens	`completion-tokens`	integer	Total number of tokens used (completion).
Prompt token details	`prompt-token-details`	object	Breakdown of tokens used in the prompt.
Prompt tokens	`prompt-tokens`	integer	Total number of tokens used (prompt).
Total tokens	`total-tokens`	integer	Total number of tokens used (prompt + completion).

Prompt Token Details

Field	Field ID	Type	Note
Audio tokens	`audio-tokens`	integer	Audio input tokens present in the prompt.
Cached tokens	`cached-tokens`	integer	Cached tokens present in the prompt.

Completion Token Details

Field	Field ID	Type	Note
Accepted prediction tokens	`accepted-prediction-tokens`	integer	When using Predicted Outputs, the number of tokens in the prediction that appeared in the completion.
Audio tokens	`audio-tokens`	integer	Audio tokens generated by the model.
Reasoning tokens	`reasoning-tokens`	integer	Tokens generated by the model for reasoning.
Rejected prediction tokens	`rejected-prediction-tokens`	integer	When using Predicted Outputs, the number of tokens in the prediction that did not appear in the completion. However, like reasoning tokens, these tokens are still counted in the total completion tokens for purposes of billing, output, and context window limits.

Tool Calls

Field	Field ID	Type	Note
Function	`function`	object	The function that the model called.
Type	`type`	string	The type of the tool. Currently, only function is supported.

Function

Field	Field ID	Type	Note
Arguments	`arguments`	string	The arguments to call the function with, as generated by the model in JSON format. Note that the model does not always generate valid JSON, and may hallucinate parameters not defined by your function schema. Validate the arguments in your code before calling your function.
Name	`name`	string	The name of the function to call.
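
To surface all three Text Generation outputs from a pipeline, an `output` section along these lines could be used. It follows the pattern of the example recipes in this document and assumes a component with the ID `openai-0`. The output keys and titles are arbitrary.

```yaml
output:
  answers:
    title: Answers
    value: ${openai-0.output.texts}
  tool-calls:
    title: Tool Calls
    value: ${openai-0.output.tool-calls}   # optional, only present when the model calls a tool
  usage:
    title: Usage
    value: ${openai-0.output.usage}        # optional token usage statistics
```

Since `arguments` is a JSON string that the model may get wrong, downstream code should parse and validate it before invoking the named function.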

Text Embeddings

Turn text into numbers, unlocking use cases like search.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_TEXT_EMBEDDINGS`
Model (required)	`model`	string	ID of the model to use. Enum values `text-embedding-ada-002` `text-embedding-3-small` `text-embedding-3-large`
Text (required)	`text`	string	The text.
Dimensions	`dimensions`	integer	The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.

Output	Field ID	Type	Description
Embedding	`embedding`	array[number]	Embedding of the input text.
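
A minimal embedding recipe, modelled on the example recipes in this document, is sketched below. The component ID `embed-0`, the `text` variable, and the choice of 256 dimensions are illustrative assumptions.

```yaml
version: v1beta
component:
  embed-0:
    type: openai
    task: TASK_TEXT_EMBEDDINGS
    input:
      model: text-embedding-3-small
      text: ${variable.text}
      dimensions: 256   # optional; only for text-embedding-3 and later models
    setup:
      api-key: ${secret.INSTILL_SECRET}
variable:
  text:
    title: Text
    description: The text to embed
    type: string
output:
  embedding:
    title: Embedding
    value: ${embed-0.output.embedding}
```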

Speech Recognition

Turn audio into text.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_SPEECH_RECOGNITION`
Model (required)	`model`	string	ID of the model to use. Only `whisper-1` is currently available. Enum values `whisper-1`
Audio (required)	`audio`	audio/*	The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
Prompt	`prompt`	string	An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
Language	`language`	string	The language of the input audio. Supplying the input language in ISO-639-1 format will improve accuracy and latency.
Temperature	`temperature`	number	The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.

Output	Field ID	Type	Description
Text	`text`	string	Generated text.
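
The fragment below sketches how the Speech Recognition inputs fit together. It assumes an `audio` variable is declared elsewhere in the recipe (its declaration is not shown because the variable type for audio is not covered in this document). The component ID, language, and prompt text are placeholders.

```yaml
component:
  stt-0:
    type: openai
    task: TASK_SPEECH_RECOGNITION
    input:
      model: whisper-1
      audio: ${variable.audio}   # assumed to be declared in the variable section
      language: en               # ISO-639-1 code of the spoken language
      prompt: A product demo that mentions Instill and OpenAI.
      temperature: 0
    setup:
      api-key: ${secret.INSTILL_SECRET}
output:
  transcript:
    title: Transcript
    value: ${stt-0.output.text}
```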

Text to Speech

Turn text into lifelike spoken audio.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_TEXT_TO_SPEECH`
Model (required)	`model`	string	One of the available TTS models: `tts-1` or `tts-1-hd`. Enum values `tts-1` `tts-1-hd`
Text (required)	`text`	string	The text to generate audio for. The maximum length is 4096 characters.
Voice (required)	`voice`	string	The voice to use when generating the audio. Supported voices are `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer`. Enum values `alloy` `echo` `fable` `onyx` `nova` `shimmer`
Response Format	`response-type`	string	The format of the generated audio. Supported formats are `mp3`, `opus`, `aac`, and `flac`. Enum values `mp3` `opus` `aac` `flac`
Speed	`speed`	number	The speed of the generated audio. Select a value from `0.25` to `4.0`. `1.0` is the default.

Output	Field ID	Type	Description
Audio (optional)	`audio`	audio/wav	AI generated audio.
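
A Text to Speech component using only the required inputs plus `speed` could be configured roughly as follows. The component ID and the `text` variable are hypothetical, and the variable is assumed to be declared elsewhere in the recipe.

```yaml
component:
  tts-0:
    type: openai
    task: TASK_TEXT_TO_SPEECH
    input:
      model: tts-1
      text: ${variable.text}   # at most 4096 characters
      voice: alloy
      speed: 1.0               # allowed range is 0.25 to 4.0
    setup:
      api-key: ${secret.INSTILL_SECRET}
output:
  audio:
    title: Audio
    value: ${tts-0.output.audio}
```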

Text to Image

Generate or manipulate images with DALL·E.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_TEXT_TO_IMAGE`
Model (required)	`model`	string	The model to use for image generation. Enum values `dall-e-2` `dall-e-3`
Prompt (required)	`prompt`	string	A text description of the desired image(s). The maximum length is 1000 characters for `dall-e-2` and 4000 characters for `dall-e-3`.
N	`n`	integer	The number of images to generate. Must be between 1 and 10. For `dall-e-3`, only `n=1` is supported.
Quality	`quality`	string	The quality of the image that will be generated. `hd` creates images with finer details and greater consistency across the image. This param is only supported for `dall-e-3`. Enum values `standard` `hd`
Size	`size`	string	The size of the generated images. Must be one of `256x256`, `512x512`, or `1024x1024` for `dall-e-2`. Must be one of `1024x1024`, `1792x1024`, or `1024x1792` for `dall-e-3` models. Enum values `256x256` `512x512` `1024x1024` `1792x1024` `1024x1792`
Style	`style`	string	The style of the generated images. Must be one of `vivid` or `natural`. Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural causes the model to produce more natural, less hyper-real looking images. This param is only supported for `dall-e-3`. Enum values `vivid` `natural`

Output	Field ID	Type	Description
Images	`results`	array[object]	Generated results.

Output Objects in Text to Image

Images

Field	Field ID	Type	Note
Generated Image	`image`	image/webp	Generated image.
Revised Prompt	`revised-prompt`	string	Revised prompt.
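
The model-specific constraints in the input table matter most when switching between models. As an illustration, a `dall-e-2` configuration can request several smaller images but should leave out `quality` and `style`, which are only supported for `dall-e-3`. The component ID and prompt below are placeholders.

```yaml
component:
  image-0:
    type: openai
    task: TASK_TEXT_TO_IMAGE
    input:
      model: dall-e-2
      prompt: A watercolour painting of a lighthouse at dawn
      n: 4             # dall-e-2 allows 1 to 10; dall-e-3 only supports 1
      size: 512x512    # valid for dall-e-2 only
    setup:
      api-key: ${secret.INSTILL_SECRET}
output:
  images:
    title: Images
    value: ${image-0.output.results}
```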

Example Recipes

version: v1beta
component:
  mistral-0:
    type: mistral-ai
    task: TASK_TEXT_GENERATION_CHAT
    input:
      max-new-tokens: 100
      model-name: open-mixtral-8x22b
      prompt: |-
        Generate a Picasso-inspired image based on the following user input:

        ${variable.prompt}

        Using the specified Picasso period: ${variable.period}


        Transform this input into a detailed text-to-image prompt by:

        1. Identifying the key elements or subjects in the user's description

        2. Adding artistic elements and techniques specific to the ${variable.period} period of Picasso's work

        3. Including cubist or abstract features characteristic of the ${variable.period}

        4. Suggesting a composition or scene layout typical of Picasso's work from this era

        Enhance the prompt with vivid, descriptive language and specific Picasso-style elements from the ${variable.period}. The final prompt should begin with "Create an image in the style of Picasso's ${variable.period} period:" followed by the enhanced description.
      safe: false
      system-message: You are a helpful assistant.
      temperature: 0.7
      top-k: 10
      top-p: 0.5
    setup:
      api-key: ${secret.INSTILL_SECRET}
  openai-0:
    type: openai
    task: TASK_TEXT_TO_IMAGE
    input:
      model: dall-e-3
      n: 1
      prompt: |-
        Using this primary color palette: ${variable.colour}

        ${mistral-0.output.text}
      quality: standard
      size: 1024x1024
      style: vivid
    setup:
      api-key: ${secret.INSTILL_SECRET}
variable:
  colour:
    title: Colour
    description: Describe the main colour to use, e.g. blue, random
    type: string
    instill-ui-order: 1
  period:
    title: Period
    description: |
      Input different Picasso periods, e.g. Blue, Rose, African, Synthetic Cubism, etc.
    type: string
  prompt:
    title: Prompt
    description: Input prompt here, e.g. "A cute baby wombat"
    type: string
output:
  image:
    title: Image
    value: ${openai-0.output.results}

version: v1beta
component:
  openai:
    type: openai
    task: TASK_TEXT_GENERATION
    input:
      model: gpt-4o-mini
      n: 1
      prompt: |-
        Talk about this topic in ${variable.language} in a concise and beginner-friendly way:
        ${variable.prompt}
      response-format:
        type: text
      system-message: You are a helpful assistant.
      temperature: 1
      top-p: 1
    setup:
      api-key: ${secret.INSTILL_SECRET}
variable:
  language:
    title: Language
    description: Input a language, e.g. Chinese, Japanese, French, etc.
    type: string
  prompt:
    title: Prompt
    description: Write the topic you want to ask about here, e.g. "Tell me about small LLMs"
    type: string
output:
  result:
    title: Result
    value: ${openai.output.texts}