Select providers
Continue makes it easy to use different providers for serving your chat, autocomplete, and embeddings models.
To select the ones you want to use, add them to your `config.json`.
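For example, a `config.json` that uses separate models for chat, autocomplete, and embeddings might look like the sketch below (the model names are illustrative; check each provider's page for the exact fields it supports):

```json
{
  "models": [
    {
      "title": "Llama 3",
      "provider": "ollama",
      "model": "llama3"
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder 2",
    "provider": "ollama",
    "model": "starcoder2:3b"
  },
  "embeddingsProvider": {
    "provider": "ollama",
    "model": "nomic-embed-text"
  }
}
```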
Self-hosted
Local
You can run a model on your local computer using:
- Ollama
- LM Studio
- Llama.cpp
- KoboldCpp (OpenAI compatible server)
- llamafile (OpenAI compatible server)
- LocalAI (OpenAI compatible server)
- Text generation web UI (OpenAI compatible server)
- FastChat (OpenAI compatible server)
- llama-cpp-python (OpenAI compatible server)
- TensorRT-LLM (OpenAI compatible server)
- IPEX-LLM (Local LLM on Intel GPU)
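Local servers are typically reached over a URL on your own machine. As a sketch, a llama.cpp server running on its default port might be configured like this (the model name and port are illustrative):

```json
{
  "models": [
    {
      "title": "Llama.cpp",
      "provider": "llama.cpp",
      "model": "MODEL_NAME",
      "apiBase": "http://localhost:8080"
    }
  ]
}
```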
Remote
You can deploy a model in your AWS, GCP, Azure, or other clouds using:
- HuggingFace TGI
- vLLM
- SkyPilot
- Anyscale Private Endpoints (OpenAI compatible API)
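Self-deployed endpoints that expose an OpenAI-compatible API (such as a vLLM server) can generally be reached by setting an `apiBase`. A minimal sketch, assuming a hypothetical endpoint URL:

```json
{
  "models": [
    {
      "title": "vLLM",
      "provider": "openai",
      "model": "MODEL_NAME",
      "apiBase": "https://your-vllm-endpoint.example.com/v1"
    }
  ]
}
```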
SaaS
You can access both open-source and commercial LLMs via:
Open-source models
You can run open-source LLMs with cloud services like:
- Codestral API
- Together
- HuggingFace Inference Endpoints
- Anyscale Endpoints (OpenAI compatible API)
- Replicate
- Deepinfra
- Groq (OpenAI compatible API)
- AWS Bedrock
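Hosted open-source providers usually require only a provider name, a model, and an API key. A sketch for Together (the model identifier is illustrative):

```json
{
  "models": [
    {
      "title": "Llama 3 70B",
      "provider": "together",
      "model": "llama3-70b",
      "apiKey": "YOUR_API_KEY"
    }
  ]
}
```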
Commercial models
You can use commercial LLMs via APIs using:
- Anthropic API
- OpenAI API
- Azure OpenAI Service
- Google Gemini API
- Mistral API
- Voyage AI API
- Cohere API
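Commercial APIs follow the same pattern. A sketch for the Anthropic API (the model string is illustrative; use the identifier from Anthropic's model list):

```json
{
  "models": [
    {
      "title": "Claude 3.5 Sonnet",
      "provider": "anthropic",
      "model": "claude-3-5-sonnet-20240620",
      "apiKey": "YOUR_API_KEY"
    }
  ]
}
```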
In addition to selecting providers, you will need to decide which models to use.