LLM Providers
Instructions on how to set up and configure Large Language Models from OpenAI, Cohere, and other providers. Here you'll learn what you need to configure and how you can customize LLMs to work efficiently with your specific use case.
Overview
All Rasa components that make use of an LLM can be configured. This includes:
- The LLM provider
- The model
- The sampling temperature
- The prompt template
and other settings. This page applies to the following components which use LLMs:
- LLMCommandGenerator
- EnterpriseSearchPolicy
- IntentlessPolicy
- ContextualResponseRephraser
- LLMIntentClassifier
OpenAI Configuration
This section describes in detail how to connect to OpenAI. Rasa is LLM agnostic and can be configured with different LLMs, but OpenAI is the default.
If you want to configure your assistant with a different LLM, you can find instructions for other LLM providers further down the page.
API Token
The API token authenticates your requests to the OpenAI API.
To configure the API token, follow these steps:
If you haven't already, sign up for an account on the OpenAI platform.
Navigate to the OpenAI Key Management page, and click on the "Create New Secret Key" button to initiate the process of obtaining your API key.
To set the API key as an environment variable, you can use the following command in a terminal or command prompt:
- Linux/MacOS
- Windows
Replace <your-api-key> with the actual API key you obtained from the OpenAI platform.
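As a minimal sketch, setting the variable on Linux/macOS looks like this (the key value is a placeholder you must replace with your own):

```shell
# Linux/macOS: set the key for the current shell session
export OPENAI_API_KEY="<your-api-key>"

# On Windows (cmd.exe) you would instead use, e.g.:
#   setx OPENAI_API_KEY "<your-api-key>"
```

Note that `export` only affects the current session; add the line to your shell profile to make it persistent.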
Model Configuration
Many LLM providers offer multiple models through their API.
The model is specified individually for each component, so you can use a combination of different models if you want to. For instance, here is how you could configure a different model for the LLMCommandGenerator and the EnterpriseSearchPolicy:
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
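As an illustrative sketch (exact keys can differ between Rasa Pro versions, and the model names here are examples only):

```yaml
# config.yml - different OpenAI models per component (illustrative)
pipeline:
  - name: LLMCommandGenerator
    llm:
      model_name: "gpt-4"

policies:
  - name: EnterpriseSearchPolicy
    llm:
      model_name: "gpt-3.5-turbo"
```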
Additional Configuration for Azure OpenAI Service
For those using Azure OpenAI Service, there are additional parameters that need to be configured:
- api_type - The type of API to use. This should be set to "azure" to indicate the use of Azure OpenAI Service. Can be set through the env var OPENAI_API_TYPE.
- api_base - The URL for your Azure OpenAI instance. An example might look like this: https://my-azure.openai.azure.com/. Can be set through the env var OPENAI_API_BASE.
- api_version - The API version to use for this operation. This follows the YYYY-MM-DD format. Can be set through the env var OPENAI_API_VERSION.
- engine / deployment - Name of the deployment for the chat model or embeddings on Azure.
- chunk_size - Size of the text chunk embeddings sent to Azure.
More detailed descriptions of these parameters can be found at the end of this section.
To configure these parameters, follow these steps:
Step 1: Configure the api_type
either as an environment variable or set it in the config file.
To create an environment variable use the following instructions:
- Linux/MacOS
- Windows
To configure the api_type
in the config file, add it in the pipeline component like this:
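A minimal sketch (the component shown is just one example of where this key can go):

```yaml
# config.yml - setting api_type in the component's llm configuration
pipeline:
  - name: LLMCommandGenerator
    llm:
      api_type: "azure"
```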
Step 2: Configure the api_base
either as an environment variable or set it in the config file.
To create an environment variable use the following instructions:
- Linux/MacOS
- Windows
To configure the api_base
in the config file, add it in the pipeline component like this:
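For example, using the instance URL shown earlier (replace it with your own instance URL):

```yaml
# config.yml - pointing the component at your Azure OpenAI instance
pipeline:
  - name: LLMCommandGenerator
    llm:
      api_type: "azure"
      api_base: "https://my-azure.openai.azure.com/"
```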
Step 3: To configure the api_version
in the config file, add it in the pipeline component like this:
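A sketch with a placeholder version string (use an API version your Azure instance actually supports):

```yaml
# config.yml - pinning the Azure OpenAI API version
pipeline:
  - name: LLMCommandGenerator
    llm:
      api_version: "2023-05-15"  # placeholder; YYYY-MM-DD format
```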
Step 4: To configure the engine
in the config file, add it in the pipeline component like this:
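A sketch with a placeholder deployment name (use the name of your own chat-model deployment on Azure):

```yaml
# config.yml - naming the Azure chat-model deployment
pipeline:
  - name: LLMCommandGenerator
    llm:
      engine: "my-chat-deployment"  # placeholder deployment name
```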
Step 5: To configure the deployment
/engine
for embeddings in the config.yml file,
add it in the pipeline component like this:
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
Using engine
field:
Using deployment
field:
Please note that you must set openai_api_type to azure in the embeddings configuration to use the deployment field.
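A sketch of the deployment variant (the deployment name is a placeholder):

```yaml
# config.yml - embeddings deployment on Azure using the deployment field
pipeline:
  - name: LLMCommandGenerator
    embeddings:
      openai_api_type: "azure"  # required when using the deployment field
      deployment: "my-embeddings-deployment"  # placeholder
```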
Step 6: To configure chunk_size
in the config file, add it in the pipeline components under embeddings
object like this:
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
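For example, lowering the chunk size below the default of 1000 (the value shown is illustrative):

```yaml
# config.yml - reducing chunk_size for restrictive Azure plans
pipeline:
  - name: LLMCommandGenerator
    embeddings:
      chunk_size: 256  # lower than the default of 1000
```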
A complete configuration of the LLMCommandGenerator
using Azure OpenAI Service might look,
for example, like this:
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
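As a sketch combining the parameters above (all deployment names, URLs, and the API version are placeholders):

```yaml
# config.yml - illustrative complete Azure OpenAI setup for the LLMCommandGenerator
pipeline:
  - name: LLMCommandGenerator
    llm:
      api_type: "azure"
      api_base: "https://my-azure.openai.azure.com/"
      api_version: "2023-05-15"
      engine: "my-chat-deployment"
    embeddings:
      openai_api_type: "azure"
      deployment: "my-embeddings-deployment"
```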
A more comprehensive example which includes:
- llm and embeddings configuration in config.yml for the following components: IntentlessPolicy, EnterpriseSearchPolicy, LLMCommandGenerator, and flow_retrieval (in 3.8.x)
- llm configuration for the rephraser in endpoints.yml (ContextualResponseRephraser)
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
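The endpoints.yml portion might be sketched as follows, assuming the rephraser is configured under the nlg endpoint (all values are placeholders):

```yaml
# endpoints.yml - illustrative rephraser LLM configuration for Azure
nlg:
  type: rephrase
  llm:
    api_type: "azure"
    api_base: "https://my-azure.openai.azure.com/"
    deployment: "my-chat-deployment"
```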
Try increasing the request_timeout value if you find langchain consistently hitting timeout warnings.
The appropriate value for this parameter may depend on your Azure instance.
How to configure the llm and embeddings fields
Azure Open AI Config Key | Rasa config sub-section(s) | Rasa config sub-section key | Description |
---|---|---|---|
api_type | llm and embeddings | api_type | The type of API to use. This should be set to "azure" to indicate the use of Azure OpenAI Service. |
api_base | llm and embeddings | api_base | The URL for your Azure OpenAI instance. An example might look like this: https://my-azure.openai.azure.com/ . |
api_version | llm and embeddings | api_version | The API version to use for this operation. See Azure docs for more information about supported API versions. |
engine | llm | engine | Name of the deployment for chat model on Azure. If you are using the chat models, you must already have an existing OpenAI deployment on Azure OpenAI. |
deployment | embeddings | engine | Name of the deployment for embeddings model on Azure. Note that you must already have an existing embeddings model OpenAI deployment on Azure OpenAI. |
chunk_size | llm and embeddings | chunk_size | Size of text chunk embeddings sent to Azure. Some Azure plans might restrict you from sending larger chunks of text for embeddings. If you see an error that says Too many inputs, you should decrease your chunk_size. By default, chunk_size is 1000, but this can be configured to a lower value under the embeddings portion in the config.yml. |
To use the deployment
parameter in the embeddings instead of engine
you must set openai_api_type
to azure
in the embeddings configuration.
Other LLMs/Embeddings
The LLM and embeddings provider can be configured separately for each component. All components default to using OpenAI.
important
If you switch to a different LLM / embedding provider, you need to go through additional installation and setup. Please note the mentioned additional requirements for each provider in their respective section.
caution
We are currently working on expanding support for other LLM providers. Configuring alternative LLM and embedding providers is supported, but the functionality has been tested with OpenAI only. The performance of your assistant may vary when using other LLMs; improvements can often be made by experimenting with the prompt.
Configuring an LLM provider
The LLM provider can be configured using the llm
property of each component.
The llm.type
property specifies the LLM provider to use.
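For example, such a configuration could look like this (sketch):

```yaml
# config.yml - switching the component's LLM provider via llm.type
pipeline:
  - name: LLMCommandGenerator
    llm:
      type: "cohere"
```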
The above configuration specifies that the LLMCommandGenerator should use the Cohere LLM provider rather than OpenAI.
important
If you switch to a different LLM provider, all default parameters for different components will be ignored and the default for the new provider is used.
For example, if a component sets temperature=0.7 and you switch to a different LLM provider, this default is ignored and it is up to you to set the temperature for the new provider.
The following LLM providers are supported:
OpenAI
Default LLM provider. Requires the OPENAI_API_KEY
environment variable to be set.
The model can be configured as an optional parameter.
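A sketch, assuming the model is selected via a model_name key (the model name is illustrative):

```yaml
# config.yml - OpenAI provider with an explicit model (illustrative)
pipeline:
  - name: LLMCommandGenerator
    llm:
      type: "openai"
      model_name: "gpt-3.5-turbo"  # optional
```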
Cohere
The Cohere client library needs to be installed, e.g. using pip install cohere.
Additionally, requires the COHERE_API_KEY
environment variable to be set.
Vertex AI
To use Vertex AI, you need to install the required package: pip install google-cloud-aiplatform.
The credentials for Vertex AI can be configured as described in the
google auth documentation.
Hugging Face Hub
The Hugging Face Hub LLM uses models from Hugging Face.
It requires additional packages to be installed: pip install huggingface_hub
.
The environment variable HUGGINGFACEHUB_API_TOKEN
needs to be set to a
valid API token.
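A sketch, assuming the model is selected via a repo_id key as in langchain's Hugging Face Hub integration (the repo shown is only an example):

```yaml
# config.yml - Hugging Face Hub provider (illustrative)
pipeline:
  - name: LLMCommandGenerator
    llm:
      type: "huggingface_hub"
      repo_id: "HuggingFaceH4/zephyr-7b-beta"  # illustrative model repo
```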
llama-cpp
To use the llama-cpp language model, you should install the required python library
pip install llama-cpp-python
. A path to the Llama model must be provided.
For more details, check out the llama-cpp project.
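A sketch, assuming the model path is passed via a model_path key as in langchain's llama-cpp integration (the path is a placeholder):

```yaml
# config.yml - local llama-cpp provider (illustrative)
pipeline:
  - name: LLMCommandGenerator
    llm:
      type: "llamacpp"
      model_path: "/path/to/model.bin"  # placeholder path
```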
Other LLM providers
If you want to use a different LLM provider, you can specify the name of the
provider in the llm.type
property according to this mapping.
Configuring an embeddings provider
The embeddings provider can be configured using the embeddings
property of each
component. The embeddings.type
property specifies the embeddings provider to use.
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
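For example, such a configuration could look like this (sketch):

```yaml
# config.yml - switching the embeddings provider via embeddings.type
pipeline:
  - name: LLMIntentClassifier
    embeddings:
      type: "cohere"
```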
The above configuration specifies that the LLMIntentClassifier should use the Cohere embeddings provider rather than OpenAI.
Only Some Components need Embeddings
Not every component uses embeddings. For example, the
ContextualResponseRephraser component does not use embeddings.
For these components, no embeddings
property is needed.
The following embeddings providers are supported:
OpenAI
Default embeddings. Requires the OPENAI_API_KEY
environment variable to be set.
The model can be configured as an optional parameter.
Cohere
Embeddings from Cohere. Requires the python package
for cohere to be installed, e.g. using pip install cohere
. The
COHERE_API_KEY
environment variable must be set. The model
can be configured as an optional parameter.
spaCy
The spacy embeddings provider uses en_core_web_sm
model to generate
embeddings. The model needs to be installed separately, e.g. using
python -m spacy download en_core_web_sm
.
Vertex AI
To use Vertex AI, you need to install the required package: pip install google-cloud-aiplatform.
The credentials for Vertex AI can be configured as described in the
google auth documentation.
Hugging Face Hub
The Hugging Face Hub embeddings provider uses models from Hugging Face.
It requires additional packages to be installed: pip install huggingface_hub
.
The environment variable HUGGINGFACEHUB_API_TOKEN
needs to be set to a
valid API token.
llama-cpp
To use the llama-cpp embeddings, you should install the required python library
pip install llama-cpp-python
. A path to the Llama model must be provided.
For more details, check out the llama-cpp project.
Huggingface
The embedding types huggingface
, huggingface_instruct
and huggingface_bge
can be used to locally run models from Huggingface. They are intended for different
kinds of embedding models. For the following models, please refer to the documentation
of Sentence Transformers library
to see the list of available parameters. Here's how to configure each of these:
huggingface
: Hugging Face Sentence-Transformer embedding models. As a prerequisite, you should install the sentence_transformers python package.
huggingface_instruct
: Huggingface instruct embedding models. You should have the sentence_transformers and InstructorEmbedding python packages installed.
huggingface_bge
: BGE models are currently among the best open source embedding models (according to the MTEB leaderboard). It requires the installation of the sentence_transformers python package.
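A sketch of a local Sentence-Transformers configuration, assuming the model is chosen via a model_name key as in the Sentence Transformers library (the model shown is only an example):

```yaml
# config.yml - locally run Hugging Face embeddings (illustrative)
pipeline:
  - name: LLMIntentClassifier
    embeddings:
      type: "huggingface"
      model_name: "sentence-transformers/all-MiniLM-L6-v2"  # illustrative
```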
FAQ
Does OpenAI use my data to train their models?
No. OpenAI does not use your data to train their models. From their website:
Data submitted through the OpenAI API is not used to train OpenAI models or improve OpenAI's service offering.