vLLM
Run the OpenAI-compatible server provided by vLLM using vllm serve. See their server documentation and the engine arguments documentation.
vllm serve NousResearch/Meta-Llama-3-8B-Instruct --max-model-len 1024
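Once the server is up, you can sanity-check it by listing the models it serves. This sketch assumes vLLM's default port of 8000:

curl http://localhost:8000/v1/models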
The Continue implementation uses OpenAI under the hood and automatically selects the available model. You only need to set the apiBase like this:
config.json
{
  "models": [
    {
      "title": "My vLLM OpenAI-compatible server",
      "provider": "vllm",
      "apiBase": "http://localhost:8000/v1"
    }
  ]
}
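Because the endpoint is OpenAI-compatible, you can also exercise it directly with the same kind of request Continue sends. A minimal sketch, assuming the model launched above:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "NousResearch/Meta-Llama-3-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'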