Pinecone

The Pinecone component is a data component that allows users to build and search vector datasets. It can carry out the following tasks:

Query
Upsert
Batch Upsert
Rerank

Release Stage

Alpha

Configuration

The component definition and tasks are defined in the definition.yaml and tasks.yaml files respectively.

Setup

In order to communicate with Pinecone, the following connection details need to be provided. You may specify them directly in a pipeline recipe as key-value pairs within the component's setup block, or you can create a Connection from the Integration Settings page and reference the whole setup as setup: ${connection.<my-connection-id>}.

Field	Field ID	Type	Note
API Key (required)	`api-key`	string	Fill in your Pinecone API key. You can create an API key in the Pinecone Console.
Pinecone Index URL	`url`	string	Fill in your Pinecone index URL. It is in the form.
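
The two ways of supplying the setup can be sketched as follows. This is a minimal illustration, not the component's implementation: the helper names `inline_setup` and `connection_reference` and the connection ID `my-pinecone` are hypothetical, and the values are placeholders. Only the field IDs `api-key` and `url` and the `${connection.<my-connection-id>}` reference form come from the text above.

```python
from typing import Dict, Optional


def inline_setup(api_key: str, url: Optional[str] = None) -> Dict[str, str]:
    """Build a setup block that carries the connection details inline."""
    if not api_key:
        raise ValueError("`api-key` is required")
    setup = {"api-key": api_key}
    if url is not None:
        setup["url"] = url  # optional field
    return setup


def connection_reference(connection_id: str) -> str:
    """Build the string that references a stored Connection as the whole setup."""
    return "${connection." + connection_id + "}"


# Placeholder values, not real credentials.
inline = inline_setup("<your-api-key>", "<your-index-url>")
referenced = connection_reference("my-pinecone")  # hypothetical connection ID

print(inline)
print(referenced)
```

Referencing a Connection keeps the API key out of the recipe itself, which is usually preferable when recipes are shared.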

Supported Tasks

Query

Retrieve the ids of the most similar items in a namespace, along with their similarity scores.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_QUERY`
ID	`id`	string	The unique ID of the vector to be used as a query vector. If present, the vector parameter will be ignored.
Vector (required)	`vector`	array[number]	An array of dimensions for the query vector.
Top K (required)	`top-k`	integer	The number of results to return for each query.
Namespace	`namespace`	string	The namespace to query.
Filter	`filter`	object	The filter to apply. You can use vector metadata to limit your search. See Pinecone's metadata filtering documentation for details.
Minimum Score	`min-score`	number	Exclude results whose score is below this value.
Include Metadata	`include-metadata`	boolean	Indicates whether metadata is included in the response as well as the IDs.
Include Values	`include-values`	boolean	Indicates whether vector values are included in the response.

Output	Field ID	Type	Description
Namespace	`namespace`	string	The namespace of the query.
Matches	`matches`	array[object]	The matches returned for the query.

Output Objects in Query

Matches

Field	Field ID	Type	Note
ID	`id`	string	The ID of the matched vector.
Metadata	`metadata`	json	The metadata stored with the matched vector.
Score	`score`	number	A measure of similarity between this vector and the query vector. The higher the score, the more similar they are.
Values	`values`	array	Vector data values.
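
As a rough sketch of how the Query fields fit together, the snippet below assembles a task input from the field IDs in the table above and mimics the documented `min-score` behaviour on a made-up `matches` array. The helper names, the `genre` metadata key, and the sample scores are all assumptions for illustration; the component applies `min-score` itself, so the local filter is only there to show the semantics.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_query_input(
    vector: Sequence[float],
    top_k: int,
    namespace: Optional[str] = None,
    metadata_filter: Optional[Dict[str, Any]] = None,
    min_score: Optional[float] = None,
    include_metadata: bool = False,
    include_values: bool = False,
) -> Dict[str, Any]:
    """Assemble a TASK_QUERY input using the documented field IDs."""
    if top_k < 1:
        raise ValueError("`top-k` must be a positive integer")
    payload: Dict[str, Any] = {
        "task": "TASK_QUERY",
        "vector": list(vector),
        "top-k": top_k,
        "include-metadata": include_metadata,
        "include-values": include_values,
    }
    # Optional fields are only added when the caller supplies them.
    if namespace is not None:
        payload["namespace"] = namespace
    if metadata_filter is not None:
        payload["filter"] = metadata_filter
    if min_score is not None:
        payload["min-score"] = min_score
    return payload


def apply_min_score(matches: List[Dict[str, Any]], min_score: float) -> List[Dict[str, Any]]:
    """Mirror the documented rule: drop matches whose score is below min_score."""
    return [m for m in matches if m["score"] >= min_score]


query = build_query_input(
    vector=[0.1, 0.2, 0.3],
    top_k=3,
    namespace="docs",
    metadata_filter={"genre": {"$eq": "manual"}},  # hypothetical metadata key
    min_score=0.5,
    include_metadata=True,
)

# Made-up output in the documented shape (`id`, `score`, `metadata`).
sample_matches = [
    {"id": "doc-1", "score": 0.92, "metadata": {"genre": "manual"}},
    {"id": "doc-2", "score": 0.71, "metadata": {"genre": "manual"}},
    {"id": "doc-3", "score": 0.40, "metadata": {"genre": "manual"}},
]
kept_ids = [m["id"] for m in apply_min_score(sample_matches, query["min-score"])]
print(kept_ids)
```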

Upsert

Writes vectors into a namespace. If a new value is upserted for an existing vector ID, it will overwrite the previous value. This task will soon be replaced by TASK_BATCH_UPSERT, which extends its functionality.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_UPSERT`
ID (required)	`id`	string	The unique ID of the vector.
Values (required)	`values`	array[number]	An array of dimensions for the vector to be saved.
Namespace	`namespace`	string	The namespace to write the vector into.
Metadata	`metadata`	object	The vector metadata.

Output	Field ID	Type	Description
Upserted Count	`upserted-count`	integer	Number of records modified or added.
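
The overwrite behaviour described above can be illustrated with a toy in-memory stand-in. `ToyIndex` is a hypothetical class written for this example, not the Pinecone client or the component's code, and treating a missing namespace as an empty string is an assumption. It accepts inputs shaped like the table above and returns an `upserted-count`.

```python
from typing import Any, Dict


class ToyIndex:
    """In-memory stand-in used only to illustrate upsert semantics."""

    def __init__(self) -> None:
        # namespace -> vector ID -> stored record
        self._namespaces: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def upsert(self, task_input: Dict[str, Any]) -> Dict[str, int]:
        namespace = task_input.get("namespace", "")
        records = self._namespaces.setdefault(namespace, {})
        # Writing to an existing ID replaces the previous record.
        records[task_input["id"]] = {
            "values": list(task_input["values"]),
            "metadata": dict(task_input.get("metadata", {})),
        }
        return {"upserted-count": 1}

    def fetch(self, vector_id: str, namespace: str = "") -> Dict[str, Any]:
        return self._namespaces[namespace][vector_id]

    def count(self, namespace: str = "") -> int:
        return len(self._namespaces.get(namespace, {}))


index = ToyIndex()
first = index.upsert(
    {"task": "TASK_UPSERT", "id": "vec-1", "values": [0.1, 0.2], "namespace": "docs"}
)
second = index.upsert(
    {
        "task": "TASK_UPSERT",
        "id": "vec-1",
        "values": [0.9, 0.8],
        "namespace": "docs",
        "metadata": {"source": "manual"},
    }
)

print(index.count("docs"))
print(index.fetch("vec-1", "docs"))
```

After the second call the namespace still holds a single record for `vec-1`, now carrying the newer values and metadata.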

Batch Upsert

Writes vectors into a namespace. If a new value is upserted for an existing vector ID, it will overwrite the previous value.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_BATCH_UPSERT`
Vectors (required)	`vectors`	array[object]	Array of vectors to upsert.
Namespace	`namespace`	string	The namespace to write the vectors into.

Input Objects in Batch Upsert

Vectors

Array of vectors to upsert.

Field	Field ID	Type	Note
ID	`id`	string	The unique ID of the vector.
Metadata	`metadata`	object	The vector metadata. This is a set of key-value pairs that can be used to store additional information about the vector. The values can have the following types: string, number, boolean, or array of strings.
Values	`values`	array	An array of dimensions for the vector to be saved.

Output	Field ID	Type	Description
Upserted Count	`upserted-count`	integer	Number of records modified or added.
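
A batch input can be assembled and checked locally before it is sent. The sketch below validates metadata against the value types listed in the Vectors table (string, number, boolean, or array of strings). The helper names and sample vectors are hypothetical, and whether the component performs the same check itself is not stated here.

```python
from typing import Any, Dict, List, Optional


def is_valid_metadata_value(value: Any) -> bool:
    """Accept string, number, boolean, or a list made up only of strings."""
    if isinstance(value, (str, bool, int, float)):
        return True
    if isinstance(value, list):
        return all(isinstance(item, str) for item in value)
    return False


def build_batch_upsert_input(
    vectors: List[Dict[str, Any]], namespace: Optional[str] = None
) -> Dict[str, Any]:
    """Assemble a TASK_BATCH_UPSERT input using the documented field IDs."""
    for vector in vectors:
        for key, value in vector.get("metadata", {}).items():
            if not is_valid_metadata_value(value):
                raise ValueError(
                    f"metadata key {key!r} on vector {vector.get('id')!r} "
                    "has an unsupported type"
                )
    payload: Dict[str, Any] = {"task": "TASK_BATCH_UPSERT", "vectors": vectors}
    if namespace is not None:
        payload["namespace"] = namespace
    return payload


batch = build_batch_upsert_input(
    [
        {"id": "vec-1", "values": [0.1, 0.2], "metadata": {"tags": ["a", "b"], "year": 2024}},
        {"id": "vec-2", "values": [0.3, 0.4], "metadata": {"draft": False}},
    ],
    namespace="docs",
)
print(len(batch["vectors"]))

# A nested object is not one of the listed metadata types, so it is rejected.
try:
    build_batch_upsert_input([{"id": "bad", "values": [0.0], "metadata": {"nested": {"k": 1}}}])
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```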

Rerank

Rerank documents, such as text passages, according to their relevance to a query. The input is a list of documents and a query. The output is a list of documents, sorted by relevance to the query.

Input	Field ID	Type	Description
Task ID (required)	`task`	string	`TASK_RERANK`
Query (required)	`query`	string	The query to rerank the documents.
Documents (required)	`documents`	array[string]	The documents to rerank.
Top N	`top-n`	integer	The number of results to return sorted by relevance. Defaults to the number of inputs.

Output	Field ID	Type	Description
Reranked Documents	`documents`	array[string]	The documents sorted by relevance to the query.
Scores	`scores`	array[number]	The relevance score of the documents normalized between 0 and 1.
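
The Rerank output arrives as two arrays, so a consumer will typically want to pair them. The sketch below builds an input from the documented field IDs and zips a made-up output. It assumes that `scores[i]` belongs to `documents[i]`, which the tables above suggest but do not state explicitly; the helper names and sample values are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_rerank_input(
    query: str, documents: List[str], top_n: Optional[int] = None
) -> Dict[str, Any]:
    """Assemble a TASK_RERANK input using the documented field IDs."""
    payload: Dict[str, Any] = {
        "task": "TASK_RERANK",
        "query": query,
        "documents": list(documents),
    }
    # When `top-n` is omitted, the documented default is the number of inputs.
    if top_n is not None:
        payload["top-n"] = top_n
    return payload


def pair_results(output: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Pair each reranked document with its score (assumes parallel arrays)."""
    documents, scores = output["documents"], output["scores"]
    if len(documents) != len(scores):
        raise ValueError("`documents` and `scores` differ in length")
    return list(zip(documents, scores))


rerank_input = build_rerank_input(
    query="How do I reset my password?",
    documents=["Billing overview", "Password reset steps", "Release notes"],
    top_n=2,
)

# Made-up output in the documented shape, not a real model response.
sample_output = {
    "documents": ["Password reset steps", "Billing overview"],
    "scores": [0.97, 0.12],
}
pairs = pair_results(sample_output)
for document, score in pairs:
    print(f"{score:.2f}  {document}")
```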