📄️ SambaNova Cloud
SambaNova Cloud is a cloud platform for running large AI models with world-record Llama 3.1 70B/405B performance. Follow the instructions in this blog post to configure your setup.
📄️ Ask Sage
To get an Ask Sage API key, log in to the Ask Sage platform (if you don't have an account, you can create one here) and follow the instructions in the Ask Sage Docs: Ask Sage API Key.
📄️ Cerebras Inference
Cerebras Inference uses specialized silicon to provide fast inference.
📄️ Cloudflare Workers AI
Cloudflare Workers AI can be used for both chat and tab autocompletion in Continue. To set up Cloudflare Workers AI, add the following to your config.json file:
📄️ Cohere
Before using Cohere, visit the Cohere dashboard to create an API key.
📄️ DeepInfra
DeepInfra provides inference for open-source models at very low cost. To get started with DeepInfra, obtain your API key here. Then, find the model you want to use here and copy the name of the model. Continue can then be configured to use the DeepInfra LLM class, like the example here:
📄️ Flowise
Flowise is a low-code/no-code drag-and-drop tool that aims to make it easy to visualize and build LLM apps. Continue can be configured to use the Flowise LLM class, like the example here:
📄️ Free Trial
The "free-trial" provider lets new users quickly try out the best experience in Continue using our API keys through a secure proxy server. To prevent abuse, we will ask you to sign in with GitHub, which you can read more about below.
📄️ Function Network
Private, Affordable User-Owned AI
📄️ Groq
Groq provides the fastest available inference for open-source language models, including the entire Llama 3.1 family.
📄️ HuggingFace Inference Endpoints
Hugging Face Inference Endpoints are an easy way to set up instances of open-source language models on any cloud. Sign up for an account and add billing here, access the Inference Endpoints here, click “New endpoint”, fill out the form (e.g., select a model like WizardCoder-Python-34B-V1.0), and then deploy your model by clicking “Create Endpoint”. Change ~/.continue/config.json to look like this:
📄️ IPEX-LLM
IPEX-LLM is a PyTorch library for running LLMs on Intel CPUs and GPUs (e.g., a local PC with an iGPU, or a discrete GPU such as Arc A-Series, Flex, or Max) with very low latency.
📄️ Kindo
Kindo offers centralized control over your organization's AI operations, ensuring data protection and compliance with internal policies while supporting various commercial and open-source models. To get started, sign up here, create an API key in Settings > API > API Keys, and choose a model from the list of supported models in the "Available Models" tab or copy and paste the config in Plugins > Your Configuration.
📄️ LlamaCpp
Run the llama.cpp server binary to start the API server. If running on a remote server, be sure to set host to 0.0.0.0:
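Once the server is running, Continue needs to know where to reach it. The fragment below is a minimal sketch of a config.json entry, assuming the server listens on llama.cpp's default port 8080 on the same machine; the `title` and `model` values are placeholders, and for a remote server `localhost` would be replaced with that server's address.

```json
{
  "models": [
    {
      "title": "Llama CPP",
      "provider": "llama.cpp",
      "model": "MODEL_NAME",
      "apiBase": "http://localhost:8080"
    }
  ]
}
```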
📄️ Llamafile
A llamafile is a self-contained binary that can run an open-source LLM. You can configure this provider in your config.json as follows:
📄️ LM Studio
LM Studio is an application for Mac, Windows, and Linux that makes it easy to locally run open-source models and comes with a great UI. To get started with LM Studio, download from the website, use the UI to download a model, and then start the local inference server. Continue can then be configured to use the LMStudio LLM class:
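A minimal sketch of such a config.json entry, assuming the LM Studio local inference server is running with its default settings; the `title` and `model` values are placeholders and should match the model loaded in LM Studio.

```json
{
  "models": [
    {
      "title": "LM Studio",
      "provider": "lmstudio",
      "model": "MODEL_NAME"
    }
  ]
}
```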
📄️ Msty
Msty is an application for Windows, Mac, and Linux that makes it really easy to run online as well as local open-source models, including Llama-2, DeepSeek Coder, etc. No need to fiddle with your terminal or run a command. Just download the app from the website, click a button, and you are up and running. Continue can then be configured to use the Msty LLM class:
📄️ Nebius AI Studio
You can get an API key from the Nebius AI Studio API keys page.
📄️ Novita
Novita AI offers an affordable, reliable, and simple inference platform with a scalable LLM API, empowering developers to build AI applications. Try the Novita AI Llama 3 API Demo today! You can sign up here, copy your API key on the Key Management page, and then hit the play button on any model from the Novita AI Models list. Change ~/.continue/config.json to look like this:
📄️ NVIDIA
View the docs to learn how to get an API key.
📄️ OpenRouter
OpenRouter is a unified interface for commercial and open-source models, giving you access to the best models at the best prices. You can sign up here, create your API key on the keys page, and then choose a model from the list of supported models.
📄️ ReplicateLLM
Replicate is a great option for newly released language models or models that you've deployed through their platform. Sign up for an account here, copy your API key, and then select any model from the Replicate Streaming List. Change ~/.continue/config.json to look like this:
📄️ AWS SageMaker
SageMaker can be used for both chat and embedding models. Chat models are supported for endpoints deployed with LMI, and embedding models are supported for endpoints deployed with HuggingFace TEI.
📄️ Scaleway
Scaleway Generative APIs give you instant access to leading AI models hosted in European data centers, ideal for developers requiring low latency, full data privacy, and compliance with the EU AI Act.
📄️ SiliconFlow
You can get an API key from Silicon Cloud.
📄️ TextGenWebUI
TextGenWebUI is a comprehensive, open-source language model UI and local server. You can set it up with an OpenAI-compatible server plugin, and then configure it in your config.json like this:
📄️ Together
The Together API is a cloud platform for running large AI models. You can sign up here, copy your API key on the initial welcome screen, and then hit the play button on any model from the Together Models list. Change ~/.continue/config.json to look like this:
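A minimal sketch of the resulting config.json, assuming the `together` provider name; `MODEL_NAME` and `YOUR_API_KEY` are placeholders for the model chosen from the Together Models list and the key copied from the welcome screen.

```json
{
  "models": [
    {
      "title": "Together",
      "provider": "together",
      "model": "MODEL_NAME",
      "apiKey": "YOUR_API_KEY"
    }
  ]
}
```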
📄️ vLLM
vLLM is an open-source library for fast LLM inference that is typically used to serve multiple users at the same time. It can also be used to run a large model on multiple GPUs (e.g., when it doesn't fit in a single GPU). Run their OpenAI-compatible server using vllm serve. See their server documentation and the engine arguments documentation.
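With the server started, Continue can be pointed at its OpenAI-compatible endpoint. The fragment below is a minimal sketch of a config.json entry, assuming the server runs locally on vLLM's default port 8000; the `title` and `model` values are placeholders, and `model` should match the model name passed to vllm serve.

```json
{
  "models": [
    {
      "title": "vLLM",
      "provider": "vllm",
      "model": "MODEL_NAME",
      "apiBase": "http://localhost:8000/v1"
    }
  ]
}
```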
📄️ IBM watsonx
watsonx, developed by IBM, offers a variety of pre-trained AI foundation models that can be used for natural language processing (NLP), computer vision, and speech recognition tasks.