LLM Providers
Instructions on how to set up and configure Large Language Models from OpenAI, Cohere, and other providers. Here you'll learn what you need to configure and how you can customize LLMs to work efficiently with your specific use case.
Overview
All Rasa components that make use of an LLM can be configured. This includes:
- The LLM provider
- The model
- The sampling temperature
- The prompt template
and other settings. This page applies to the following components which use LLMs:
- LLMCommandGenerator
- EnterpriseSearchPolicy
- IntentlessPolicy
- ContextualResponseRephraser
- LLMIntentClassifier
OpenAI Configuration
This section describes in detail how to connect to OpenAI. Rasa is LLM agnostic and can be configured with different LLMs, but OpenAI is the default.
If you want to configure your assistant with a different LLM, you can find instructions for other LLM providers further down the page.
API Token
The API token authenticates your requests to the OpenAI API.
To configure the API token, follow these steps:
If you haven't already, sign up for an account on the OpenAI platform.
Navigate to the OpenAI Key Management page, and click on the "Create New Secret Key" button to initiate the process of obtaining your API key.
To set the API key as an environment variable, you can use the following command in a terminal or command prompt:
- Linux/MacOS
- Windows
Replace <your-api-key> with the actual API key you obtained from the OpenAI platform.
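As a minimal sketch, setting the variable on Linux/macOS looks like this (the key value is a placeholder you must replace with your own):

```shell
# Linux/macOS: set the key for the current shell session
export OPENAI_API_KEY="<your-api-key>"

# On Windows (cmd.exe) you would instead use, e.g.:
#   setx OPENAI_API_KEY "<your-api-key>"
```

Note that `export` only affects the current session; add the line to your shell profile to make it persistent.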
Model Configuration
Many LLM providers offer multiple models through their API.
The model is specified individually for each component, so you can use a combination of different models if you want to. For instance, here is how you could configure a different model for the LLMCommandGenerator and the EnterpriseSearchPolicy:
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
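As an illustrative sketch (exact keys can differ between Rasa Pro versions, and the model names here are examples only):

```yaml
# config.yml - different OpenAI models per component (illustrative)
pipeline:
  - name: LLMCommandGenerator
    llm:
      model_name: "gpt-4"

policies:
  - name: EnterpriseSearchPolicy
    llm:
      model_name: "gpt-3.5-turbo"
```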
Additional Configuration for Azure OpenAI Service
For those using Azure OpenAI Service, there are additional parameters that need to be configured:
- api_type - The type of API to use. This should be set to "azure" to indicate the use of Azure OpenAI Service. Can be set through the env var OPENAI_API_TYPE.
- api_base - The URL for your Azure OpenAI instance. An example might look like this: https://my-azure.openai.azure.com/. Can be set through the env var OPENAI_API_BASE.
- api_version - The API version to use for this operation. This follows the YYYY-MM-DD format. Can be set through the env var OPENAI_API_VERSION.
- engine / deployment - Name of the deployment for the chat model or embeddings on Azure.
- chunk_size - Size of the text chunk embeddings sent to Azure.
More detailed descriptions of these parameters can be found at the end of this section.
To configure these parameters, follow these steps:
Step 1: Configure the api_type
either as an environment variable or set it in the config file.
To create an environment variable use the following instructions:
- Linux/MacOS
- Windows
To configure the api_type
in the config file, add it in the pipeline component like this:
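A minimal sketch (the component shown is just one example of where this key can go):

```yaml
# config.yml - setting api_type in the component's llm configuration
pipeline:
  - name: LLMCommandGenerator
    llm:
      api_type: "azure"
```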
Step 2: Configure the api_base
either as an environment variable or set it in the config file.
To create an environment variable use the following instructions:
- Linux/MacOS
- Windows
To configure the api_base
in the config file, add it in the pipeline component like this:
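For example, using the instance URL shown earlier (replace it with your own instance URL):

```yaml
# config.yml - pointing the component at your Azure OpenAI instance
pipeline:
  - name: LLMCommandGenerator
    llm:
      api_type: "azure"
      api_base: "https://my-azure.openai.azure.com/"
```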
Step 3: To configure the api_version
in the config file, add it in the pipeline component like this:
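A sketch with a placeholder version string (use an API version your Azure instance actually supports):

```yaml
# config.yml - pinning the Azure OpenAI API version
pipeline:
  - name: LLMCommandGenerator
    llm:
      api_version: "2023-05-15"  # placeholder; YYYY-MM-DD format
```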
Step 4: To configure the engine
in the config file, add it in the pipeline component like this:
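A sketch with a placeholder deployment name (use the name of your own chat-model deployment on Azure):

```yaml
# config.yml - naming the Azure chat-model deployment
pipeline:
  - name: LLMCommandGenerator
    llm:
      engine: "my-chat-deployment"  # placeholder deployment name
```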
Step 5: To configure the deployment
/engine
for embeddings in the config.yml file,
add it in the pipeline component like this:
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
Using engine
field:
Using deployment
field:
Please note that you must set openai_api_type to azure in the embeddings configuration to use the deployment field.
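A sketch of the deployment variant (the deployment name is a placeholder):

```yaml
# config.yml - embeddings deployment on Azure using the deployment field
pipeline:
  - name: LLMCommandGenerator
    embeddings:
      openai_api_type: "azure"  # required when using the deployment field
      deployment: "my-embeddings-deployment"  # placeholder
```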
Step 6: To configure chunk_size
in the config file, add it in the pipeline components under embeddings
object like this:
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
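For example, lowering the chunk size below the default of 1000 (the value shown is illustrative):

```yaml
# config.yml - reducing chunk_size for restrictive Azure plans
pipeline:
  - name: LLMCommandGenerator
    embeddings:
      chunk_size: 256  # lower than the default of 1000
```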
A complete configuration of the LLMCommandGenerator
using Azure OpenAI Service might look,
for example, like this:
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
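As a sketch combining the parameters above (all deployment names, URLs, and the API version are placeholders):

```yaml
# config.yml - illustrative complete Azure OpenAI setup for the LLMCommandGenerator
pipeline:
  - name: LLMCommandGenerator
    llm:
      api_type: "azure"
      api_base: "https://my-azure.openai.azure.com/"
      api_version: "2023-05-15"
      engine: "my-chat-deployment"
    embeddings:
      openai_api_type: "azure"
      deployment: "my-embeddings-deployment"
```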
A more comprehensive example which includes:
- llm and embeddings configuration in config.yml for the following components: IntentlessPolicy, EnterpriseSearchPolicy, LLMCommandGenerator, and flow_retrieval (in 3.8.x)
- llm configuration for the rephraser in endpoints.yml (ContextualResponseRephraser)
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
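The endpoints.yml portion might be sketched as follows, assuming the rephraser is configured under the nlg endpoint (all values are placeholders):

```yaml
# endpoints.yml - illustrative rephraser LLM configuration for Azure
nlg:
  type: rephrase
  llm:
    api_type: "azure"
    api_base: "https://my-azure.openai.azure.com/"
    deployment: "my-chat-deployment"
```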
Try increasing the request_timeout value if you find langchain consistently hitting timeout warnings.
The appropriate value for this parameter may depend on your Azure instance.
How to configure the llm and embeddings fields
Azure Open AI Config Key | Rasa config sub-section(s) | Rasa config sub-section key | Description |
---|---|---|---|
api_type | llm and embeddings | api_type | The type of API to use. This should be set to "azure" to indicate the use of Azure OpenAI Service. |
api_base | llm and embeddings | api_base | The URL for your Azure OpenAI instance. An example might look like this: https://my-azure.openai.azure.com/ . |
api_version | llm and embeddings | api_version | The API version to use for this operation. See Azure docs for more information about supported API versions. |
engine | llm | engine | Name of the deployment for chat model on Azure. If you are using the chat models, you must already have an existing OpenAI deployment on Azure OpenAI. |
deployment | embeddings | engine | Name of the deployment for embeddings model on Azure. Note that you must already have an existing embeddings model OpenAI deployment on Azure OpenAI. |
chunk_size | llm and embeddings | chunk_size | Size of text chunk embeddings sent to Azure. Some Azure plans might restrict you from sending larger chunks of text for embeddings. If you see an error that says Too many inputs, you should decrease your chunk_size. By default, chunk_size is 1000, but this can be configured to a lower value under the embeddings portion in the config.yml. |
To use the deployment
parameter in the embeddings instead of engine
you must set openai_api_type
to azure
in the embeddings configuration.
Other LLMs/Embeddings
The LLM and embeddings provider can be configured separately for each component. All components default to using OpenAI.
important
If you switch to a different LLM / embedding provider, you need to go through additional installation and setup. Please note the mentioned additional requirements for each provider in their respective section.
caution
We are currently working on expanding support for other LLM providers. Configuring alternative LLM and embedding providers is supported, but the functionality has been tested with OpenAI only. The performance of your assistant may vary when using other LLMs; improvements can often be made by experimenting with the prompt.
Configuring an LLM provider
The LLM provider can be configured using the llm
property of each component.
The llm.type
property specifies the LLM provider to use.
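For example, such a configuration could look like this (sketch):

```yaml
# config.yml - switching the component's LLM provider via llm.type
pipeline:
  - name: LLMCommandGenerator
    llm:
      type: "cohere"
```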
The above configuration specifies that the LLMCommandGenerator should use the Cohere LLM provider rather than OpenAI.
important
If you switch to a different LLM provider, all default parameters for different components will be ignored and the default for the new provider is used.
For example, if a component sets temperature=0.7 and you switch to a different LLM provider, this default is ignored and it is up to you to set the temperature for the new provider.
The following LLM providers are supported:
OpenAI
Default LLM provider. Requires the OPENAI_API_KEY
environment variable to be set.
The model can be configured as an optional parameter.
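A sketch, assuming the model is selected via a model_name key (the model name is illustrative):

```yaml
# config.yml - OpenAI provider with an explicit model (illustrative)
pipeline:
  - name: LLMCommandGenerator
    llm:
      type: "openai"
      model_name: "gpt-3.5-turbo"  # optional
```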
Cohere
The Cohere client library needs to be installed, e.g. using pip install cohere.
Additionally, requires the COHERE_API_KEY
environment variable to be set.
Vertex AI
To use Vertex AI, you need to install the required package: pip install google-cloud-aiplatform.
The credentials for Vertex AI can be configured as described in the
google auth documentation.
Hugging Face Hub
The Hugging Face Hub LLM uses models from Hugging Face.
It requires additional packages to be installed: pip install huggingface_hub
.
The environment variable HUGGINGFACEHUB_API_TOKEN
needs to be set to a
valid API token.
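A sketch, assuming the model is selected via a repo_id key as in langchain's Hugging Face Hub integration (the repo shown is only an example):

```yaml
# config.yml - Hugging Face Hub provider (illustrative)
pipeline:
  - name: LLMCommandGenerator
    llm:
      type: "huggingface_hub"
      repo_id: "HuggingFaceH4/zephyr-7b-beta"  # illustrative model repo
```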
llama-cpp
To use the llama-cpp language model, you should install the required python library
pip install llama-cpp-python
. A path to the Llama model must be provided.
For more details, check out the llama-cpp project.
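A sketch, assuming the model path is passed via a model_path key as in langchain's llama-cpp integration (the path is a placeholder):

```yaml
# config.yml - local llama-cpp provider (illustrative)
pipeline:
  - name: LLMCommandGenerator
    llm:
      type: "llamacpp"
      model_path: "/path/to/model.bin"  # placeholder path
```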
Other LLM providers
If you want to use a different LLM provider, you can specify the name of the
provider in the llm.type
property according to this mapping.
Configuring an embeddings provider
The embeddings provider can be configured using the embeddings
property of each
component. The embeddings.type
property specifies the embeddings provider to use.
- Rasa Pro <=3.7.x
- Rasa Pro >=3.8.x
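For example, such a configuration could look like this (sketch):

```yaml
# config.yml - switching the embeddings provider via embeddings.type
pipeline:
  - name: LLMIntentClassifier
    embeddings:
      type: "cohere"
```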
The above configuration specifies that the LLMIntentClassifier should use the Cohere embeddings provider rather than OpenAI.
Only Some Components need Embeddings
Not every component uses embeddings. For example, the
ContextualResponseRephraser component does not use embeddings.
For these components, no embeddings
property is needed.
The following embeddings providers are supported:
OpenAI
Default embeddings. Requires the OPENAI_API_KEY
environment variable to be set.
The model can be configured as an optional parameter.
Cohere
Embeddings from Cohere. Requires the python package
for cohere to be installed, e.g. using pip install cohere
. The
COHERE_API_KEY
environment variable must be set. The model
can be configured as an optional parameter.
spaCy
The spacy embeddings provider uses en_core_web_sm
model to generate
embeddings. The model needs to be installed separately, e.g. using
python -m spacy download en_core_web_sm
.
Vertex AI
To use Vertex AI, you need to install the required package: pip install google-cloud-aiplatform.
The credentials for Vertex AI can be configured as described in the
google auth documentation.
Hugging Face Hub
The Hugging Face Hub embeddings provider uses models from Hugging Face.
It requires additional packages to be installed: pip install huggingface_hub
.
The environment variable HUGGINGFACEHUB_API_TOKEN
needs to be set to a
valid API token.
llama-cpp
To use the llama-cpp embeddings, you should install the required python library
pip install llama-cpp-python
. A path to the Llama model must be provided.
For more details, check out the llama-cpp project.
Huggingface
The embedding types huggingface
, huggingface_instruct
and huggingface_bge
can be used to locally run models from Huggingface. They are intended for different
kinds of embedding models. For the following models, please refer to the documentation
of Sentence Transformers library
to see the list of available parameters. Here's how to configure each of these:
huggingface
: Hugging Face Sentence-Transformer embedding models. As a prerequisite, you should install the sentence_transformers python package.
huggingface_instruct
: Huggingface instruct embedding models. You should have the sentence_transformers and InstructorEmbedding python packages installed.
huggingface_bge
: BGE models are currently among the best open source embedding models (according to the MTEB leaderboard). It requires the installation of the sentence_transformers python package.
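A sketch of a local Sentence-Transformers configuration, assuming the model is chosen via a model_name key as in the Sentence Transformers library (the model shown is only an example):

```yaml
# config.yml - locally run Hugging Face embeddings (illustrative)
pipeline:
  - name: LLMIntentClassifier
    embeddings:
      type: "huggingface"
      model_name: "sentence-transformers/all-MiniLM-L6-v2"  # illustrative
```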
FAQ
Does OpenAI use my data to train their models?
No. OpenAI does not use your data to train their models. From their website:
Data submitted through the OpenAI API is not used to train OpenAI models or improve OpenAI's service offering.