Prepare Model

To prepare a model version to be served in Model, follow these steps on your local system:

  1. Install the latest Python SDK (a quick import check is sketched after this list):
pip install instill-sdk
  2. Create an empty folder for your custom model and run the init command to generate the required files, with sample code and comments that describe their function:
instill init
  3. Modify the model config file (instill.yaml) to describe your model's dependencies.
  4. Modify the model.py file, which defines the model class that will be decorated into a servable model with the Python SDK.
  5. Organize the repository files into a valid model layout.
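
Before moving on, you can optionally sanity-check the installation by importing the helpers that model.py will use later. This is a minimal check, not a required step; the import paths are the same ones used in the full example further below:

# quick check that the Instill SDK helpers used later in model.py are importable
from instill.helpers import parse_task_chat_to_chat_input, construct_task_chat_output
from instill.helpers.ray_config import instill_deployment, InstillDeployable

print("instill-sdk helpers imported successfully")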

Model Configuration

Model configuration is handled within the instill.yaml file that accompanies the model. It describes the model's necessary dependency information and is crucial for reproducibility, sharing, and discoverability.

In the instill.yaml file, you can specify the following details:

build:
  # (Required) Set to true if your model requires GPU.
  gpu: true

  # (Optional) LLM runtime to use.
  # Supported: mlc-llm, vllm, transformers
  # llm_runtime: mlc-llm

  # (Optional) Python version
  # Supported: >=3.9 and <3.13; defaults to 3.12
  # python_version: "3.12"

  # (Optional) CUDA version if `gpu` is set to true
  # Supported: >=12.2 and <=12.8; defaults to 12.8
  # cuda_version: "12.8"

  # (Optional) A list of python packages in the format {package-name}=={version}
  # python_packages:
  #   - numpy==1.21.0
  #   - pandas==2.1.0

  # (Optional) A list of system packages from the apt package manager
  # system_packages:
  #   - curl
  #   - libglib2.0-0

Below is an example instill.yaml for the TinyLlama model:

build:
  gpu: true
  python_version: "3.11" # this example only supports Python 3.11
  cuda_version: "12.1"
  python_packages:
    - torch==2.2.1
    - transformers==4.36.2
    - accelerate==0.25.0

Model Layout

To deploy a model in Model, we suggest preparing the model files in a layout similar to the following:

.
├── instill.yaml
├── model.py
├── <additional_modules>
└── <weights>
    ├── <weight_file_1>
    ├── <weight_file_2>
    ├── ...
    └── <weight_file_n>

The layout above shows a typical model served in Model, consisting of:

  • instill.yaml - model config file that describes the model dependencies
  • model.py - a decorated model class that contains custom inference logic
  • <additional_modules> - a directory that holds supporting Python modules if necessary
  • <weights> - a directory that holds the weight files if necessary

You can name the <weights> and <additional_modules> folders freely, provided that model.py can locate and load their contents.
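
For example, a common pattern is for model.py to resolve those paths relative to its own location, so the layout keeps working regardless of where the model image is built or served from. A minimal sketch (the weights folder and config.json names are placeholders, not required names):

import os

# resolve paths relative to model.py so weight files can be located
# regardless of the working directory the model is served from
MODEL_DIR = os.path.dirname(os.path.abspath(__file__))

# placeholder folder/file names; use whatever your model.py expects
WEIGHTS_DIR = os.path.join(MODEL_DIR, "weights")
CONFIG_PATH = os.path.join(WEIGHTS_DIR, "config.json")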

Prepare Model Code

To implement a custom model that can be deployed and served in Model, you only need to construct a simple model class within the model.py file.

The custom model class is required to contain two methods:

  1. __init__: This is where the model loading process is defined, allowing the weights to be stored in memory and yielding faster auto-scaling behavior.
  2. __call__: This is the inference request entry point, and is where you implement your model inference logic.
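
Stripped of any task-specific logic, the class skeleton looks roughly like this; MyModel and the placeholder method bodies are illustrative only, while the decorator and deployment objects are the same ones used in the full example below:

from instill.helpers.ray_config import instill_deployment, InstillDeployable


@instill_deployment
class MyModel:
    def __init__(self):
        # load weights/tokenizer once here so they stay in memory
        # across requests
        ...

    async def __call__(self, request):
        # parse the request, run inference, and construct the
        # task-specific response here
        ...


# global entry point for deployment, as in the full example below
entrypoint = InstillDeployable(MyModel).get_deployment_handle()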

You will also need to determine which AI task spec your model will follow. The SDK provides helper functions for each task:

  • parse_task_xxx_to_xxx_input converts the request input into an easy-to-use dataclass
  • construct_task_xxx_output converts various standard-type outputs into the response format

Below is a simple example implementation of the TinyLlama model with explanations:

# import necessary packages
import time
import torch
from transformers import pipeline

# import SDK helper functions that
# parse the input request into an easy-to-use dataclass
# and convert output data into the response class
from instill.helpers import (
    parse_task_chat_to_chat_input,
    construct_task_chat_output,
)

# ray_config package hosts the decorators and deployment object for the model class
from instill.helpers.ray_config import instill_deployment, InstillDeployable


# use instill_deployment decorator to convert the model class to a servable model
@instill_deployment
class TinyLlama:
    # within the __init__ function, set up the model instance with the desired framework, in this
    # case it is a transformers pipeline
    def __init__(self):
        self.pipeline = pipeline(
            "text-generation",
            model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
            torch_dtype=torch.bfloat16,
        )

    # __call__ handles the trigger request from Model
    async def __call__(self, request):
        # use helper package to parse the request and get the corresponding input
        # for the Chat task
        conversation_inputs = await parse_task_chat_to_chat_input(request=request)

        # construct necessary Chat task output variables
        finish_reasons = []
        indexes = []
        created = []
        messages = []
        for i, inp in enumerate(conversation_inputs):
            prompt = self.pipeline.tokenizer.apply_chat_template(
                inp.messages,
                tokenize=False,
                add_generation_prompt=True,
            )

            # inference
            sequences = self.pipeline(
                prompt,
                max_new_tokens=inp.max_tokens,
                do_sample=True,
                temperature=inp.temperature,
                top_p=inp.top_p,
            )

            output = (
                sequences[0]["generated_text"]
                .split("<|assistant|>\n")[-1]
                .strip()
            )

            messages.append([{"content": output, "role": "assistant"}])
            finish_reasons.append(["length"])
            indexes.append([i])
            created.append([int(time.time())])

        return construct_task_chat_output(
            request=request,
            finish_reasons=finish_reasons,
            indexes=indexes,
            messages=messages,
            created_timestamps=created,
        )

# now simply declare a global entry point for deployment
entrypoint = InstillDeployable(TinyLlama).get_deployment_handle()

Once all the required model files are prepared, refer to the Build Model Image and Push Model Image pages for further information about creating and pushing your custom model image to Model in Instill Core.