
Version: Main/Unreleased


Token Objects

class Token()

Used by Tokenizers which split a single message into multiple Tokens.


def __init__(text: Text,
start: int,
end: Optional[int] = None,
data: Optional[Dict[Text, Any]] = None,
lemma: Optional[Text] = None) -> None

Create a Token.


  • text - The token text.
  • start - The start index of the token within the entire message.
  • end - The end index of the token within the entire message.
  • data - Additional token data.
  • lemma - An optional lemmatized version of the token text.

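As a sketch of how this constructor might behave, the stand-in class below mirrors the documented signature (it is an illustrative minimal class, not Rasa's actual `Token` implementation; the defaulting of `end` and `lemma` is an assumption based on the parameter descriptions above):

```python
from typing import Any, Dict, Optional, Text

class Token:
    """Minimal sketch of the documented interface, not Rasa's actual class."""

    def __init__(self,
                 text: Text,
                 start: int,
                 end: Optional[int] = None,
                 data: Optional[Dict[Text, Any]] = None,
                 lemma: Optional[Text] = None) -> None:
        self.text = text
        self.start = start
        # Assumed default: if no explicit end index is given,
        # derive it from the start index and the token length.
        self.end = end if end is not None else start + len(text)
        self.data = data if data else {}
        # Assumed default: fall back to the surface text when no lemma is given.
        self.lemma = lemma or text

# "hello" occupies indices 0-5 within the message "hello world".
token = Token("hello", 0)
print(token.end)    # 5
print(token.lemma)  # hello
```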

def set(prop: Text, info: Any) -> None

Set property value.


def get(prop: Text, default: Optional[Any] = None) -> Any

Returns the value of the given property, or `default` if the property is not set.
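Together, `set` and `get` act as a small key-value store over the token's `data` dictionary. A minimal sketch of that behavior (an illustrative stand-in, not Rasa's code; the `"pos"` and `"ner"` property names are hypothetical examples):

```python
from typing import Any, Dict, Optional, Text

class Token:
    """Illustrative stand-in exposing only the set/get behavior described above."""

    def __init__(self, text: Text, start: int) -> None:
        self.text = text
        self.start = start
        self.data: Dict[Text, Any] = {}

    def set(self, prop: Text, info: Any) -> None:
        # Store an arbitrary property on the token.
        self.data[prop] = info

    def get(self, prop: Text, default: Optional[Any] = None) -> Any:
        # Look the property up, falling back to the supplied default.
        return self.data.get(prop, default)

token = Token("running", 0)
token.set("pos", "VERB")
print(token.get("pos"))       # VERB
print(token.get("ner", "O"))  # O (default, property never set)
```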


def fingerprint() -> Text

Returns a stable hash for this Token.
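"Stable" here means deterministic across interpreter runs, unlike Python's built-in `hash()`, which is salted per process. One way such a fingerprint could be computed (an assumed sketch using a hashlib digest over the token's identifying fields; Rasa's actual implementation uses its own fingerprinting helper):

```python
import hashlib
from typing import Optional, Text

def token_fingerprint(text: Text, start: int, end: int,
                      lemma: Optional[Text]) -> Text:
    # Deterministic hex digest over the token's identifying fields;
    # the same inputs always yield the same fingerprint, in any process.
    payload = f"{text}|{start}|{end}|{lemma}".encode("utf-8")
    return hashlib.md5(payload).hexdigest()

print(token_fingerprint("hello", 0, 5, "hello"))
```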

Tokenizer Objects

class Tokenizer(GraphComponent, abc.ABC)

Base class for tokenizers.


def __init__(config: Dict[Text, Any]) -> None

Construct a new tokenizer.


def create(cls, config: Dict[Text, Any], model_storage: ModelStorage,
resource: Resource,
execution_context: ExecutionContext) -> GraphComponent

Creates a new component (see parent class for full docstring).


def tokenize(message: Message, attribute: Text) -> List[Token]

Tokenizes the text of the provided attribute of the incoming message.


def process_training_data(training_data: TrainingData) -> TrainingData

Tokenizes all training data.


def process(messages: List[Message]) -> List[Message]

Tokenizes the incoming messages.
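A concrete subclass mainly needs to supply the `tokenize` logic; the base class then applies it to every message during training and inference. The standalone sketch below shows the core of a whitespace-splitting tokenizer that tracks each token's start/end offsets in the original text (an illustrative sketch with a simplified local `Token`, not Rasa's `WhitespaceTokenizer`):

```python
from typing import List, Text

class Token:
    """Simplified local token holding text and its offsets in the message."""

    def __init__(self, text: Text, start: int, end: int) -> None:
        self.text = text
        self.start = start
        self.end = end

def tokenize(text: Text) -> List[Token]:
    """Split on whitespace, recording each token's indices in the message."""
    tokens = []
    cursor = 0
    for word in text.split():
        # Find the word at or after the cursor so repeated words and
        # runs of whitespace still yield correct offsets.
        start = text.index(word, cursor)
        end = start + len(word)
        tokens.append(Token(word, start, end))
        cursor = end
    return tokens

tokens = tokenize("hello  world")
print([(t.text, t.start, t.end) for t in tokens])
# [('hello', 0, 5), ('world', 7, 12)]
```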