More | Continue

📄️ SambaNova Cloud

The SambaNova Cloud is a cloud platform for running large AI models with the world record open source models performance. You can follow the instructions in this blog post to configure your setup.

📄️ Ask Sage

To get an Ask Sage API key login to the Ask Sage platform (If you don't have an account, you can create one here) and follow the instructions in the Ask Sage Docs: Ask Sage API Key

📄️ Cerebras Inference

Cerebras Inference uses specialized silicon to provides fast inference.

📄️ Cloudflare Workers AI

Cloudflare Workers AI can be used for both chat and tab autocompletion in Continue. Here is an example of Cloudflare Workers AI configuration:

📄️ Cohere

Before using Cohere, visit the Cohere dashboard to create an API key.

DeepInfra provides inference for open-source models at very low cost. To get started with DeepInfra, obtain your API key here. Then, find the model you want to use here and copy the name of the model. Continue can then be configured to use the DeepInfra LLM class, like the example here:

📄️ Flowise

Flowise is a low-code/no-code drag & drop tool with the aim to make it easy for people to visualize and build LLM apps. Continue can then be configured to use the Flowise LLM class, like the example here:

📄️ Function Network

Private, Affordable User-Owned AI

📄️ Groq

Check if your chosen model is still supported by referring to the model documentation. If a model has been deprecated, you may encounter a 404 error when attempting to use it.

📄️ HuggingFace Inference Endpoints

Hugging Face Inference Endpoints are an easy way to setup instances of open-source language models on any cloud. Sign up for an account and add billing here, access the Inference Endpoints here, click on “New endpoint”, and fill out the form (e.g. select a model like WizardCoder-Python-34B-V1.0), and then deploy your model by clicking “Create Endpoint”. Change ~/.continue/config.json to look like this:

📄️ IPEX-LLM

IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc A-Series, Flex and Max) with very low latency.

📄️ Kindo

Kindo offers centralized control over your organization's AI operations, ensuring data protection and compliance with internal policies while supporting various commercial and open-source models. To get started, sign up here, create an API key in Settings > API > API Keys, and choose a model from the list of supported models in the "Available Models" tab or copy and paste the config in Plugins > Your Configuration.

📄️ LlamaCpp

Run the llama.cpp server binary to start the API server. If running on a remote server, be sure to set host to 0.0.0.0:

📄️ Llamafile

A llamafile is a self-contained binary that can run an open-source LLM. You can configure this provider in your config.json as follows:

📄️ LM Studio

LM Studio is an application for Mac, Windows, and Linux that makes it easy to locally run open-source models and comes with a great UI. To get started with LM Studio, download from the website, use the UI to download a model, and then start the local inference server. Continue can then be configured to use the LMStudio LLM class:

📄️ Moonshot

Moonshot AI provides high-quality large language model services with competitive pricing and excellent performance.

📄️ Morph

Morph provides a fast apply model that helps you quickly and accurately apply code changes from chat suggestions to your files. It's optimized for speed and precision when integrating generated code into your existing codebase. You can sign up for Morph's generous free tier here. Then, update your configuration file as follows:

📄️ Msty

Msty is an application for Windows, Mac, and Linux that makes it really easy to run online as well as local open-source models, including Llama-2, DeepSeek Coder, etc. No need to fidget with your terminal, run a command, or anything. Just download the app from the website, click a button, and you are up and running. Continue can then be configured to use the Msty LLM class:

📄️ nCompass

The nCompass Technologies API exposes an extremely fast inference engine for open-source language models. You can sign up here, copy your API key on the initial welcome screen, and then hit the play button on any model from the nCompass Models list. Change ~/.continue/config.json to look like this:

📄️ Nebius AI Studio

You can get an API key from the Nebius AI Studio API keys page

📄️ Novita

Novita AI offers an affordable, reliable, and simple inference platform with scalable LLM API, empowering developers to build AI applications. Try the Novita AI Llama 3 API Demo today!. You can sign up here, copy your API key on the Key Management, and then hit the play button on any model from the Novita AI Models list. Change ~/.continue/config.json to look like this:

📄️ NVIDIA

View the docs to learn how to get an API key.

📄️ OpenRouter

OpenRouter is a unified interface for commercial and open-source models, giving you access to the best models at the best prices. You can sign up here, create your API key on the keys page, and then choose a model from the list of supported models.

📄️ OpenVINO™ Model Server

OpenVINO™ Mode Server is scalable inference server for models optimized with OpenVINO™ for Intel CPU, iGPU, GPU and NPU.

📄️ OVHcloud AI Endpoints

OVHcloud AI Endpoints is a serverless inference API that provides access to a curated selection of models (e.g., Llama, Mistral, Qwen, Deepseek). It is designed with security and data privacy in mind and is compliant with GDPR.

📄️ Relace

Relace provides a fast apply model through their API that helps you reliably and almost instantly apply chat suggestions to your codebase. You can sign up and obtain an API key here. Then, change your configuration file to look like this:

📄️ ReplicateLLM

Replicate is a great option for newly released language models or models that you've deployed through their platform. Sign up for an account here, copy your API key, and then select any model from the Replicate Streaming List. Change ~/.continue/config.json to look like this:

📄️ AWS SageMaker

SageMaker can be used for both chat and embedding models. Chat models are supported for endpoints deployed with LMI, and embedding models are supported for endpoints deployed with HuggingFace TEI

📄️ Scaleway

Scaleway Generative APIs give you instant access to leading AI models hosted in European data centers, ideal for developers requiring low latency, full data privacy, and compliance with EU AI Act.

📄️ SiliconFlow

You can get an API key from the Silicon Cloud.

📄️ TextGenWebUI

TextGenWebUI is a comprehensive, open-source language model UI and local server. You can set it up with an OpenAI-compatible server plugin, and then configure it like this:

📄️ Together

The Together API is a cloud platform for running large AI models. You can sign up here, copy your API key on the initial welcome screen, and then hit the play button on any model from the Together Models list. Change ~/.continue/config.json to look like this:

📄️ Venice

Venice.AI is a privacy focused generative AI platform, allowing users to interact with open-source LLMs without storing any private user data. To get started with Venice's API, either purchase a pro account, stake $VVV to obtain daily inference allotments or fund your account with USD and head over to https//venice.ai/api.

📄️ vLLM

vLLM is an open-source library for fast LLM inference which typically is used to serve multiple users at the same time. It can also be used to run a large model on multiple GPU:s (e.g. when it doesn´t fit in a single GPU). Run their OpenAI-compatible server using vllm serve. See their server documentation and the engine arguments documentation.

📄️ IBM watsonx

watsonx, developed by IBM, offers a variety of pre-trained AI foundation models that can be used for natural language processing (NLP), computer vision, and speech recognition tasks.