Version: 3.x
rasa.nlu.tokenizers.tokenizer
Token Objects
class Token()
Used by Tokenizers, which split a single message into multiple Tokens.
__init__
def __init__(text: Text,
start: int,
end: Optional[int] = None,
data: Optional[Dict[Text, Any]] = None,
lemma: Optional[Text] = None) -> None
Create a Token.
Arguments:
- text - The token text.
- start - The start index of the token within the entire message.
- end - The end index of the token within the entire message.
- data - Additional token data.
- lemma - An optional lemmatized version of the token text.
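For illustration, a minimal sketch of constructing a Token by hand. The offsets are character indices into the original message text, with end exclusive, so that message_text[start:end] equals the token text:

```python
from rasa.nlu.tokenizers.tokenizer import Token

# For the message "hello world", the first token spans characters 0..5.
token = Token(text="hello", start=0, end=5, lemma="hello")
```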
set
def set(prop: Text, info: Any) -> None
Set property value.
get
def get(prop: Text, default: Optional[Any] = None) -> Any
Returns the value of the given property, or the default if it is not set.
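A brief sketch of the property accessors ("pos" is just a hypothetical property name used for illustration):

```python
from rasa.nlu.tokenizers.tokenizer import Token

token = Token(text="flights", start=10, end=17, lemma="flight")

token.set("pos", "NOUN")          # attach an arbitrary property to the token
token.get("pos")                  # -> "NOUN"
token.get("missing", default=-1)  # -> -1, the supplied default for an unset property
```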
fingerprint
def fingerprint() -> Text
Returns a stable hash for this Token.
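Assuming the hash is derived from the token's content (text, offsets, lemma and data), tokens with identical fields should produce identical fingerprints; a rough sketch:

```python
from rasa.nlu.tokenizers.tokenizer import Token

a = Token("hello", 0, 5)
b = Token("hello", 0, 5)

# Same content -> same stable hash (useful for caching and change detection).
assert a.fingerprint() == b.fingerprint()
```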
Tokenizer Objects
class Tokenizer(GraphComponent, abc.ABC)
Base class for tokenizers.
__init__
def __init__(config: Dict[Text, Any]) -> None
Construct a new tokenizer.
create
@classmethod
def create(cls, config: Dict[Text, Any], model_storage: ModelStorage,
resource: Resource,
execution_context: ExecutionContext) -> GraphComponent
Creates a new component (see parent class for full docstring).
tokenize
@abc.abstractmethod
def tokenize(message: Message, attribute: Text) -> List[Token]
Tokenizes the text of the provided attribute of the incoming message.
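As a rough sketch of how a concrete subclass might implement this method, assuming a simple whitespace split (a real component would additionally need to be registered with the recipe and provide a default config, which is omitted here; the class name is hypothetical):

```python
from typing import List, Text

from rasa.nlu.tokenizers.tokenizer import Token, Tokenizer
from rasa.shared.nlu.training_data.message import Message


class WhitespaceLikeTokenizer(Tokenizer):
    """Hypothetical subclass that splits the attribute text on whitespace."""

    def tokenize(self, message: Message, attribute: Text) -> List[Token]:
        text = message.get(attribute) or ""
        tokens = []
        offset = 0
        for word in text.split():
            # Locate the word in the original text to get character offsets.
            start = text.index(word, offset)
            end = start + len(word)
            tokens.append(Token(word, start, end))
            offset = end
        return tokens
```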
process_training_data
def process_training_data(training_data: TrainingData) -> TrainingData
Tokenize all training data.
process
def process(messages: List[Message]) -> List[Message]
Tokenize the incoming messages.
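A sketch of how the two processing entry points are called, assuming tokenizer is an instance of a concrete subclass such as the one above:

```python
from rasa.shared.nlu.training_data.message import Message
from rasa.shared.nlu.training_data.training_data import TrainingData

message = Message(data={"text": "book a flight to Berlin"})

# process() tokenizes the messages in place and returns them.
tokenizer.process([message])

# process_training_data() tokenizes every training example and returns the TrainingData.
tokenizer.process_training_data(TrainingData([message]))
```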