The Instill Artifact component is a data component that allows users to access files and perform RAG-based search and retrieval through catalogs in the Instill Core platform.
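It can carry out the following tasks: Upload File, Upload Files, Get Files Metadata, Get Chunks Metadata, Get File In Markdown, Match File Status, Retrieve, and Ask.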
To use the Artifact component on a self-hosted deployment of Instill Core, you need to set up an OpenAI API key.
You can do this by setting the OPENAI_API_KEY environment variable.
For details, refer to configuring-the-embedding-feature.
Note: on Instill Cloud, you do not need to set up the OpenAI API key.
Release stage: Alpha
The component definition and tasks are defined in the definition.yaml and tasks.yaml files respectively.
Upload File

Upload a file to a catalog and process it into chunks.

| Input | Field ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_UPLOAD_FILE` |
| Options (required) | `options` | object | Choose whether to upload the file to an existing catalog or to create a new catalog. |

The `options` object must fulfill one of the following schemas:

Existing Catalog

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the existing catalog to upload the file to. |
| File | `file` | string | Base64 encoded PDF/DOCX/DOC/PPTX/PPT/HTML file to be uploaded into the catalog. |
| File Name | `file-name` | string | Name of the file, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
| Namespace | `namespace` | string | Your namespace. You can find it in the namespace-switching tab. |
| Option | `option` | string | Must be "existing catalog". |

Create New Catalog

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the new catalog you want to create. |
| Description | `description` | string | Description of the catalog. |
| File | `file` | string | Base64 encoded PDF/DOCX/DOC/PPTX/PPT/HTML file to be uploaded into the catalog. |
| File Name | `file-name` | string | Name of the file, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
| Namespace | `namespace` | string | Your namespace. You can find it in the namespace-switching tab. |
| Option | `option` | string | Must be "create new catalog". |
| Tags | `tags` | array | Tags for the catalog. |

| Output | Field ID | Type | Description |
| --- | --- | --- | --- |
| File | `file` | object | Result of uploading the file into the catalog. |
| Status | `status` | boolean | Whether file processing was triggered; `true` if it succeeded. |

Output Objects in Upload File

File

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the catalog the file was uploaded to. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |
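As a minimal sketch of how the Upload File fields could map onto a recipe, the following uploads one file to an existing catalog. It assumes the same `v1beta` recipe layout as the example recipes on this page. The component ID `upload` and the variable names are illustrative, and declaring the Base64 content as a `string` variable is an assumption.

```yaml
version: v1beta
variable:
  namespace:
    title: Namespace
    type: string
  catalog:
    title: Catalog
    type: string
  file:
    title: File
    # Assumption: the Base64 encoded file content is passed in as a string.
    description: Base64 encoded file content
    type: string
  file-name:
    title: File Name
    description: Name of the file, including the extension
    type: string
component:
  upload:  # illustrative component ID
    type: instill-artifact
    task: TASK_UPLOAD_FILE
    input:
      options:
        option: existing catalog
        namespace: ${variable.namespace}
        catalog-id: ${variable.catalog}
        file: ${variable.file}
        file-name: ${variable.file-name}
output:
  status:
    title: Status
    value: ${upload.output.status}
  file:
    title: File
    value: ${upload.output.file}
```

The `file` output carries the `file-uid` that the metadata, markdown, and status tasks take as input.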
Upload Files

Upload multiple files to a catalog and process them into chunks.

| Input | Field ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_UPLOAD_FILES` |
| Options (required) | `options` | object | Choose whether to upload the files to an existing catalog or to create a new catalog. |

The `options` object must fulfill one of the following schemas:

Existing Catalog

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the existing catalog to upload the files to. |
| File Names | `file-names` | array | Names of the files, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
| Files | `files` | array | Base64 encoded PDF/DOCX/DOC/PPTX/PPT/HTML files to be uploaded into the catalog. |
| Namespace | `namespace` | string | Your namespace. You can find it in the namespace-switching tab. |
| Option | `option` | string | Must be "existing catalog". |

Create New Catalog

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the new catalog you want to create. |
| Description | `description` | string | Description of the catalog. |
| File Names | `file-names` | array | Names of the files, including the extension (e.g. example.pdf). The length of this field is limited to 100 characters. |
| Files | `files` | array | Base64 encoded PDF/DOCX/DOC/PPTX/PPT/HTML files to be uploaded into the catalog. |
| Namespace | `namespace` | string | Your namespace. You can find it in the namespace-switching tab. |
| Option | `option` | string | Must be "create new catalog". |
| Tags | `tags` | array | Tags for the catalog. |

| Output | Field ID | Type | Description |
| --- | --- | --- | --- |
| Files | `files` | array[object] | Metadata of the files in the catalog. |
| Status | `status` | boolean | Whether file processing was triggered; `true` only if all files succeeded. |

Output Objects in Upload Files

Files

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the catalog the files were uploaded to. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |
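The "create new catalog" schema could be exercised along these lines. This is a sketch under assumptions rather than a verified recipe: the component ID, variable names, file names, and tag are made up for illustration, and it assumes that references can be used as individual elements of an array input.

```yaml
version: v1beta
variable:
  namespace:
    title: Namespace
    type: string
  catalog:
    title: Catalog
    description: ID of the catalog to create
    type: string
  first-file:
    title: First File
    description: Base64 encoded content of the first file
    type: string
  second-file:
    title: Second File
    description: Base64 encoded content of the second file
    type: string
component:
  upload-many:  # illustrative component ID
    type: instill-artifact
    task: TASK_UPLOAD_FILES
    input:
      options:
        option: create new catalog
        namespace: ${variable.namespace}
        catalog-id: ${variable.catalog}
        description: Catalog created from a recipe
        tags:
          - example
        # Assumption: each array element may be a reference.
        files:
          - ${variable.first-file}
          - ${variable.second-file}
        # Hypothetical file names, listed in the same order as the files.
        file-names:
          - first.pdf
          - second.pdf
output:
  files:
    title: Files
    value: ${upload-many.output.files}
  status:
    title: Status
    value: ${upload-many.output.status}
```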
Get Files Metadata

Get the metadata of the files in a catalog.

| Input | Field ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_GET_FILES_METADATA` |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace-switching tab. |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog whose files you want to list. |

| Output | Field ID | Type | Description |
| --- | --- | --- | --- |
| Files | `files` | array[object] | Metadata of the files in the catalog. |

Output Objects in Get Files Metadata

Files

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Catalog ID | `catalog-id` | string | ID of the catalog the file was uploaded to. |
| Create Time | `create-time` | string | Creation time of the file in ISO 8601 format. |
| File Name | `file-name` | string | Name of the file. |
| Type | `file-type` | string | Type of the file. |
| File UID | `file-uid` | string | Unique identifier of the file. |
| Size | `size` | number | Size of the file in bytes. |
| Update Time | `update-time` | string | Update time of the file in ISO 8601 format. |
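A recipe that lists the files of a catalog might look like the sketch below. It follows the `v1beta` layout of the example recipes on this page; the component ID `list-files` and the variable names are illustrative.

```yaml
version: v1beta
variable:
  namespace:
    title: Namespace
    type: string
  catalog:
    title: Catalog
    type: string
component:
  list-files:  # illustrative component ID
    type: instill-artifact
    task: TASK_GET_FILES_METADATA
    input:
      namespace: ${variable.namespace}
      catalog-id: ${variable.catalog}
output:
  files:
    title: Files
    value: ${list-files.output.files}
```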
Get Chunks Metadata

Get the metadata of the chunks of a file in a catalog.

| Input | Field ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_GET_CHUNKS_METADATA` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog that contains the file. |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace-switching tab. |
| File UID (required) | `file-uid` | string | The unique identifier of the file. |

| Output | Field ID | Type | Description |
| --- | --- | --- | --- |
| Chunks | `chunks` | array[object] | Metadata of the chunks of the file in the catalog. |

Output Objects in Get Chunks Metadata

Chunks

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Chunk UID | `chunk-uid` | string | The unique identifier of the chunk. |
| Create Time | `create-time` | string | The creation time of the chunk in ISO 8601 format. |
| End Position | `end-position` | integer | The end position of the chunk in the file. |
| File UID | `original-file-uid` | string | The unique identifier of the file. |
| Retrievable | `retrievable` | boolean | The retrievable status of the chunk. |
| Start Position | `start-position` | integer | The start position of the chunk in the file. |
| Token Count | `token-count` | integer | The token count of the chunk. |
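To inspect how a file was chunked, a recipe along these lines could be used. It is a sketch assuming the `v1beta` layout of the example recipes on this page; the component ID and variable names are illustrative, and the file UID is expected to come from an earlier upload or metadata call.

```yaml
version: v1beta
variable:
  namespace:
    title: Namespace
    type: string
  catalog:
    title: Catalog
    type: string
  file-uid:
    title: File UID
    description: Unique identifier of a file already in the catalog
    type: string
component:
  list-chunks:  # illustrative component ID
    type: instill-artifact
    task: TASK_GET_CHUNKS_METADATA
    input:
      namespace: ${variable.namespace}
      catalog-id: ${variable.catalog}
      file-uid: ${variable.file-uid}
output:
  chunks:
    title: Chunks
    value: ${list-chunks.output.chunks}
```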
Get File In Markdown

Get the content of a file in Markdown format.

| Input | Field ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_GET_FILE_IN_MARKDOWN` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog that contains the file. |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace-switching tab. |
| File UID (required) | `file-uid` | string | The unique identifier of the file. |

| Output | Field ID | Type | Description |
| --- | --- | --- | --- |
| File UID | `original-file-uid` | string | The unique identifier of the file. |
| Content | `content` | string | The content of the file in Markdown format. |
| Create Time | `create-time` | string | The creation time of the source file in ISO 8601 format. |
| Update Time | `update-time` | string | The update time of the source file in ISO 8601 format. |
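The Markdown conversion of a file could be fetched with a recipe like the following sketch, which assumes the `v1beta` layout of the example recipes on this page. The component ID `to-markdown` and the variable names are illustrative.

```yaml
version: v1beta
variable:
  namespace:
    title: Namespace
    type: string
  catalog:
    title: Catalog
    type: string
  file-uid:
    title: File UID
    type: string
component:
  to-markdown:  # illustrative component ID
    type: instill-artifact
    task: TASK_GET_FILE_IN_MARKDOWN
    input:
      namespace: ${variable.namespace}
      catalog-id: ${variable.catalog}
      file-uid: ${variable.file-uid}
output:
  content:
    title: Content
    value: ${to-markdown.output.content}
```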
Match File Status

Check whether processing of the specified file is done.

| Input | Field ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_MATCH_FILE_STATUS` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog that contains the file whose processing status you want to check. |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace-switching tab. |
| File UID (required) | `file-uid` | string | The unique identifier of the file. |

| Output | Field ID | Type | Description |
| --- | --- | --- | --- |
| Status | `succeeded` | boolean | The status of the file processing; `true` if it succeeded. |
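Because uploads only trigger processing, a separate status check is useful before searching a file. A possible recipe, assuming the `v1beta` layout of the example recipes on this page, with an illustrative component ID and variable names:

```yaml
version: v1beta
variable:
  namespace:
    title: Namespace
    type: string
  catalog:
    title: Catalog
    type: string
  file-uid:
    title: File UID
    type: string
component:
  check-status:  # illustrative component ID
    type: instill-artifact
    task: TASK_MATCH_FILE_STATUS
    input:
      namespace: ${variable.namespace}
      catalog-id: ${variable.catalog}
      file-uid: ${variable.file-uid}
output:
  succeeded:
    title: Succeeded
    # Note that the output field ID is "succeeded", not "status".
    value: ${check-status.output.succeeded}
```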
Retrieve

Search the chunks in a catalog.

| Input | Field ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_RETRIEVE` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog to search. |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace-switching tab. |
| Text Prompt (required) | `text-prompt` | string | The prompt string used to search the chunks. |
| Top K | `top-k` | integer | The number of top chunks to return. Must be between 1 and 20; the default is 5. |
| File UIDs | `file-uids` | array[string] | Optional list of file UIDs to filter the results by. The elements of the list must be UUID-formatted strings. |
| File Media Type | `file-media-type` | string | The media type to filter by (an enum value); leave empty for all. |
| Content Type | `content-type` | string | The content type to filter by (an enum value); leave empty for all. |

| Output | Field ID | Type | Description |
| --- | --- | --- | --- |
| Chunks | `chunks` | array[object] | Chunks data from smart search. |

Output Objects in Retrieve

Chunks

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Chunk UID | `chunk-uid` | string | The unique identifier of the chunk. |
| Chunk Reference | `reference` | object | Reference to the position of the chunk within the original file. |
| Similarity | `similarity-score` | number | The similarity score of the chunk. |
| Source File Name | `source-file-name` | string | The name of the source file. |
| Source File UID | `source-file-uid` | string | The UID of the source file. |
| Text Content | `text-content` | string | The text content of the chunk. |

Chunk Reference

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| File Position | `end` | object | Position within a file as coordinates in a specific unit. The number of dimensions of the coordinates depends on the unit type. |
| File Position | `start` | object | Position within a file as coordinates in a specific unit. The number of dimensions of the coordinates depends on the unit type. |

File Position (applies to both `end` and `start`)

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Coordinates | `coordinates` | array | Position value. |
| Unit | `unit` | string | Unit of measurement for a position within a file. Enum values: `UNIT_CHARACTER`, `UNIT_PAGE`, `UNIT_TIME_MS`, `UNIT_PIXEL`. |
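A retrieval recipe could be sketched as follows, assuming the `v1beta` layout of the example recipes on this page. The component ID `search` and the variable names are illustrative; `top-k` is set to 3 only to show that it can be overridden within the 1 to 20 range.

```yaml
version: v1beta
variable:
  namespace:
    title: Namespace
    type: string
  catalog:
    title: Catalog
    type: string
  prompt:
    title: Prompt
    description: Text used to search the chunks
    type: string
component:
  search:  # illustrative component ID
    type: instill-artifact
    task: TASK_RETRIEVE
    input:
      namespace: ${variable.namespace}
      catalog-id: ${variable.catalog}
      text-prompt: ${variable.prompt}
      top-k: 3  # optional; defaults to 5
output:
  chunks:
    title: Chunks
    value: ${search.output.chunks}
```

Each returned chunk carries its `similarity-score`, `text-content`, and a `reference` locating it in the source file.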
Ask

Answer questions based on the files in a catalog.

| Input | Field ID | Type | Description |
| --- | --- | --- | --- |
| Task ID (required) | `task` | string | `TASK_ASK` |
| Catalog ID (required) | `catalog-id` | string | ID of the catalog to search. |
| Namespace (required) | `namespace` | string | Your namespace. You can find it in the namespace-switching tab. |
| Question (required) | `question` | string | The question to answer. |
| Top K | `top-k` | integer | The number of top answers to return. Must be between 1 and 20; the default is 5. |
| File UIDs | `file-uids` | array[string] | Optional list of file UIDs to filter the results by. The elements of the list must be UUID-formatted strings. |

| Output | Field ID | Type | Description |
| --- | --- | --- | --- |
| Answer | `answer` | string | Answers data from smart search. |
| Chunks (optional) | `chunks` | array[object] | Chunks data used to answer the question. |

Output Objects in Ask

Chunks

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Chunk UID | `chunk-uid` | string | The unique identifier of the chunk. |
| Chunk Reference | `reference` | object | Reference to the position of the chunk within the original file. |
| Similarity | `similarity-score` | number | The similarity score of the chunk. |
| Source File Name | `source-file-name` | string | The name of the source file. |
| Source File UID | `source-file-uid` | string | The UID of the source file. |
| Text Content | `text-content` | string | The text content of the chunk. |

Chunk Reference

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| File Position | `end` | object | Position within a file as coordinates in a specific unit. The number of dimensions of the coordinates depends on the unit type. |
| File Position | `start` | object | Position within a file as coordinates in a specific unit. The number of dimensions of the coordinates depends on the unit type. |

File Position (applies to both `end` and `start`)

| Field | Field ID | Type | Note |
| --- | --- | --- | --- |
| Coordinates | `coordinates` | array | Position value. |
| Unit | `unit` | string | Unit of measurement for a position within a file. Enum values: `UNIT_CHARACTER`, `UNIT_PAGE`, `UNIT_TIME_MS`, `UNIT_PIXEL`. |
```yaml
version: v1beta
component:
  artifact-0:
    type: instill-artifact
    task: TASK_ASK
    input:
      catalog-id: ${variable.catalog_name}
      namespace: ${variable.namespace}
      question: ${variable.question}
      top-k: 5
variable:
  catalog_name:
    title: catalog-name
    description: The name of your catalog, e.g. "instill-ai"
    type: string
  namespace:
    title: namespace
    description: The namespace of your catalog, e.g. "instill-ai"
    type: string
  question:
    title: question
    description: The question to ask your catalog, e.g. "What is Instill AI doing?", "What is Artifact?"
    type: string
output:
  answer:
    title: answer
    value: ${artifact-0.output.answer}
```
Sync files from Google Drive to Instill Catalog.
```yaml
version: v1beta
variable:
  namespace:
    title: Namespace
    type: string
  catalog:
    title: Catalog
    type: string
  folder-link:
    title: Folder Link
    type: string
component:
  read-folder:
    type: google-drive
    input:
      shared-link: ${variable.folder-link}
      read-content: true
    setup:
      refresh-token: ${secret.refresh-token-gd}
    task: TASK_READ_FOLDER
  sync:
    type: instill-artifact
    input:
      namespace: ${variable.namespace}
      catalog-id: ${variable.catalog}
      third-party-files: ${read-folder.output.files}
    task: TASK_SYNC_FILES
output:
  sync-result:
    title: Sync Result
    value: ${sync.output}
```