Llama Stack

Llama Stack is an open-source library that standardizes the core building blocks of AI application development. It codifies best practices across the Llama ecosystem. Specifically, it provides:

  • Unified API layer for Inference, RAG, Agents, Tools, Safety, Evals, and Telemetry.
  • Plugin architecture to support the rich ecosystem of different API implementations in various environments, including local development, on-premises, cloud, and mobile.
  • Prepackaged verified distributions which offer a one-stop solution for developers to get started quickly and reliably in any environment.
  • Multiple developer interfaces like CLI and SDKs for Python, TypeScript, iOS, and Android.
  • Standalone applications as examples for how to build production-grade AI applications with Llama Stack.

To try Llama Stack locally, run:

curl -LsSf https://github.com/meta-llama/llama-stack/raw/main/install.sh | bash

Learn more about how to get started with Llama Stack in this guide.

Chat model

We recommend configuring Llama 4 Maverick as your chat model.

config.yaml
models:
  - name: Llama 4 Maverick
    provider: llamastack
    model: meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
    apiBase: http://<llama stack endpoint>/v1/openai/v1/
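Because `apiBase` points at Llama Stack's OpenAI-compatible endpoint, any OpenAI-style HTTP client can talk to it. The sketch below builds a chat-completion request using only the Python standard library; the host and port (`localhost:8321`) are an assumed local deployment, and the request is constructed but not sent.

```python
import json
import urllib.request

# Assumed local Llama Stack endpoint; substitute your own host/port.
API_BASE = "http://localhost:8321/v1/openai/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat-completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request(
    "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    "Say hello in one word.",
)
# To actually send it against a running server:
#   urllib.request.urlopen(req)
```

This only illustrates the request shape; in practice you would use an OpenAI-compatible SDK with `base_url` set to the same `apiBase` value.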

Autocomplete model

We recommend configuring CodeLlama 7B as your autocomplete model.

config.yaml
models:
  - name: CodeLlama 7B
    provider: llamastack
    model: codellama:7b
    apiBase: http://<llama stack endpoint>/v1/openai/v1/
    roles:
      - autocomplete
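Autocomplete traffic typically goes to the OpenAI-compatible completions endpoint with a prefix/suffix pair for fill-in-the-middle. The sketch below only builds such a request; the host/port and whether the serving backend honors `suffix` are assumptions about your deployment.

```python
import json
import urllib.request

# Assumed local Llama Stack endpoint; substitute your own host/port.
API_BASE = "http://localhost:8321/v1/openai/v1"

def build_fim_request(model: str, prefix: str, suffix: str) -> urllib.request.Request:
    """Build (but do not send) a completion request with a fill-in-the-middle suffix."""
    payload = {
        "model": model,
        "prompt": prefix,
        "suffix": suffix,  # honored only if the backend supports FIM
        "max_tokens": 64,
    }
    return urllib.request.Request(
        f"{API_BASE}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_fim_request("codellama:7b", "def add(a, b):\n    return ", "\n")
```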

Embeddings model

By default, Llama Stack uses all-MiniLM-L6-v2 as the embeddings model.

config.yaml
models:
  - name: all-MiniLM-L6-v2
    provider: llamastack
    model: all-MiniLM-L6-v2
    apiBase: http://<llama stack endpoint>/v1/openai/v1/
    roles:
      - embed
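Embeddings follow the same OpenAI-compatible pattern as chat: a POST with a model ID and a list of input strings. The sketch below builds such a request without sending it; the host/port is an assumed local deployment.

```python
import json
import urllib.request

# Assumed local Llama Stack endpoint; substitute your own host/port.
API_BASE = "http://localhost:8321/v1/openai/v1"

def build_embeddings_request(model: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        f"{API_BASE}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embeddings_request("all-MiniLM-L6-v2", ["hello world", "llama stack"])
```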

Reranking model

Llama Stack does not yet support a reranking API.