# Configuration

## Configure the Embedding Feature

To enable the embedding feature in Artifact, you must configure your deployment with a valid OpenAI secret key. This key is required by the Process Files API, which uses embedding models to encode text data. For now, the OpenAI API is the only supported embedding option; in the future, we plan to offer additional options, including local embedding solutions.

Artifact delegates embedding tasks to a pipeline that connects to OpenAI, so the secret key is configured through the `CFG_COMPONENT_SECRETS_OPENAI_APIKEY` environment variable in the `pipeline-backend` service.
### On Docker Compose

- Open the `.env.component` file: Locate and open the `.env.component` file in your project directory.
- Add the OpenAI secret key: Insert the following line into the `.env.component` file, replacing `sk-XXX` with your actual OpenAI secret key:

  ```
  # Set the OpenAI secret key for the embedding feature
  CFG_COMPONENT_SECRETS_OPENAI_APIKEY='sk-XXX'
  ```

- Restart Instill Core: After setting the environment variable, restart the `instill-core` service to apply the changes.
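The steps above can be sketched as a quick shell session. This is a minimal sketch assuming the stock Docker Compose setup; the file location and the restart invocation may differ in your deployment:

```shell
# Append the key to .env.component (replace sk-XXX with your real key).
echo "CFG_COMPONENT_SECRETS_OPENAI_APIKEY='sk-XXX'" >> .env.component

# Restart so the service picks up the new environment variable.
# Adjust the compose invocation to match how you launch Instill Core, e.g.:
# docker compose down && docker compose up -d
```

Quoting the value keeps characters in the key from being interpreted by the shell or the env-file parser.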
On Kubernetes
Upgrade your Helm installation with the CFG_COMPONENT_SECRETS_OPENAI_APIKEY
, replacing sk-XXX
with your actual OpenAI secret key:
helm install core instill-ai/core --devel --namespace instill-ai --create-namespace \
--set tags.prometheusStack=true \
--set 'pipelineBackend.extraEnv[0].name=CFG_COMPONENT_SECRETS_OPENAI_APIKEY' \
--set 'pipelineBackend.extraEnv[0].value='sk-XXX'
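To confirm the variable actually reached the pipeline service, you can inspect the pod environment. The deployment name below is a hypothetical example, not taken from the chart; list the deployments in the namespace first to find the real one:

```shell
# List deployments to find the pipeline-backend workload's actual name.
kubectl -n instill-ai get deployments

# Hypothetical verification (substitute the deployment name you found above):
kubectl -n instill-ai exec deploy/core-pipeline-backend -- \
  printenv CFG_COMPONENT_SECRETS_OPENAI_APIKEY
```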
## Anonymized Usage Collection

To help us better understand how Instill Core CE is used and how it can be improved, we collect and report anonymized usage statistics.

### What Data is Collected

We value your privacy, so we collect only anonymous data: a small set of details from your Instill Core instance that gives us insight into how to improve Instill Core CE without being invasive.
When a new Instill Core instance starts, the usage client in each of the `pipeline-backend`, `model-backend`, and `mgmt-backend` services requests a new session, and our usage server returns a token used for future reporting. For each session, we collect `Session` data, which includes basic information about the service and the system it is running on:
- name of the service to collect data from, e.g., `SERVICE_PIPELINE` for `pipeline-backend`
- edition of the service to identify the deployment, e.g., `local-ce` for a local community edition deployment
- version of the service, e.g., `0.5.0-alpha`
- architecture of the system the service is running on, e.g., `amd64`
- operating system the service is running on, e.g., `Linux`
- uptime in seconds to identify the rough lifespan of the service
Each session is assigned a random UUID for tracking and identification. Each session then collects and sends its own `SessionReport` data every 10 minutes:

- `MgmtUsageData` reports data for the `mgmt-backend` session:
  - UUID of the onboarded User
  - a list of user metadata
- `PipelineUsageData` reports data for the `pipeline-backend` session of the onboarded User:
  - UUID of the onboarded User
  - a list of pipeline trigger metadata
- `ModelUsageData` reports data for the `model-backend` session of the onboarded User:
  - UUID of the onboarded User
  - a list of model trigger metadata
You can check the full usage data structs in protobufs. These data do not allow us to track Personal Data but enable us to measure session counts and usage statistics.
### Implementation

The anonymous usage report client library is in `usage-client`. To limit risk exposure, we keep the usage server implementation private for now. In summary, the `Session` data and `SessionReport` sent from each session get updated in the usage server.
Opting out
Instill Core CE usage collection helps the entire community. We'd appreciate it if you left it enabled. However, if you want to opt out, you can disable it by overriding the .env
file in Core:
USAGE_ENABLED=false
This will disable the Instill Core usage collection for the entire project.
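As a minimal sketch, assuming a Docker Compose deployment where Core reads its `.env` file from the project root:

```shell
# Append the opt-out flag to Core's .env file
# (this sketch does not check whether the flag is already present).
echo "USAGE_ENABLED=false" >> .env

# Restart the stack so the change takes effect, e.g.:
# docker compose down && docker compose up -d
```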
### Acknowledgements

Our anonymized usage collection was inspired by KrakenD's "How we built our telemetry service", and we would like to acknowledge that their design helped us bootstrap our usage collection project.