Version: 3.x
rasa.nlu.tokenizers.whitespace_tokenizer
WhitespaceTokenizer Objects
@DefaultV1Recipe.register(
    DefaultV1Recipe.ComponentType.MESSAGE_TOKENIZER, is_trainable=False
)
class WhitespaceTokenizer(Tokenizer)
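Tokenizer that splits message text into tokens on whitespace. The resulting tokens are consumed by downstream components such as featurizers and entity extractors.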
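The core idea of whitespace tokenization is to split text at runs of whitespace and record each word's character span. The following is a minimal standalone sketch of that idea only. SimpleToken and whitespace_tokenize are hypothetical names, not part of Rasa. Rasa's actual implementation applies additional rules, such as the emoji handling described under remove_emoji, so this is not its code.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SimpleToken:
    """Hypothetical token type: the word plus its character span in the text."""

    text: str
    start: int
    end: int


def whitespace_tokenize(text: str) -> List[SimpleToken]:
    """Split text on runs of whitespace, keeping each word's character offsets."""
    # \S+ matches each maximal run of non-whitespace characters, so repeated
    # spaces never produce empty tokens.
    return [
        SimpleToken(m.group(), m.start(), m.end())
        for m in re.finditer(r"\S+", text)
    ]


if __name__ == "__main__":
    for token in whitespace_tokenize("book a  table"):
        print(token)
```

Keeping the start and end offsets matters because downstream components such as entity extractors report entities as character spans in the original text.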
not_supported_languages
@staticmethod
def not_supported_languages() -> Optional[List[Text]]
Returns the languages that this tokenizer does not support, or None if no language is excluded.
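A framework can consult a hook like this before it runs a component on a given language. Languages written without spaces between words, such as Chinese or Japanese, are natural candidates for exclusion from a whitespace tokenizer. The sketch below assumes that a return value of None means "no language excluded". ToyTokenizer, is_language_supported and the listed language codes are hypothetical and illustrative, not taken from Rasa.

```python
from typing import List, Optional


class ToyTokenizer:
    """Hypothetical stand-in that mimics a not_supported_languages() hook."""

    @staticmethod
    def not_supported_languages() -> Optional[List[str]]:
        # Illustrative values only; not Rasa's actual list.
        return ["zh", "ja"]


def is_language_supported(component: type, language: str) -> bool:
    """Return False if the language appears in the component's exclusion list."""
    excluded = component.not_supported_languages()
    # Assumption: None means the component places no restriction on language.
    return excluded is None or language not in excluded


print(is_language_supported(ToyTokenizer, "en"))  # True
print(is_language_supported(ToyTokenizer, "ja"))  # False
```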
get_default_config
@staticmethod
def get_default_config() -> Dict[Text, Any]
Returns the component's default config.
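Default configs are typically overlaid with the values a user sets in the pipeline configuration. The sketch below assumes a shallow merge in which user-supplied keys win. The key names intent_tokenization_flag, intent_split_symbol and token_pattern are assumed to mirror common Rasa tokenizer options, and their values are illustrative. resolve_config is a hypothetical helper, not Rasa's merging code.

```python
from typing import Any, Dict


def get_default_config() -> Dict[str, Any]:
    """Hypothetical defaults; key names are assumed, values are illustrative."""
    return {
        "intent_tokenization_flag": False,
        "intent_split_symbol": "_",
        "token_pattern": None,
    }


def resolve_config(user_config: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay user-supplied values on the defaults (shallow merge, user wins)."""
    return {**get_default_config(), **user_config}


config = resolve_config({"intent_tokenization_flag": True})
print(config)
# {'intent_tokenization_flag': True, 'intent_split_symbol': '_', 'token_pattern': None}
```

Returning a fresh dict from get_default_config on every call means callers cannot accidentally mutate shared defaults.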
__init__
def __init__(config: Dict[Text, Any]) -> None
Initializes the tokenizer with the given config.
create
@classmethod
def create(cls, config: Dict[Text, Any], model_storage: ModelStorage,
           resource: Resource,
           execution_context: ExecutionContext) -> WhitespaceTokenizer
Creates a new component (see parent class for full docstring).
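Because the component is registered with is_trainable=False, it plausibly has nothing to load from model storage. Under that assumption, a create factory can ignore its storage and context arguments and delegate to the constructor. The sketch below illustrates that classmethod-factory pattern. ToyWhitespaceTokenizer is hypothetical, and the Rasa types ModelStorage, Resource and ExecutionContext are replaced with Any so that the block runs on its own.

```python
from typing import Any, Dict


class ToyWhitespaceTokenizer:
    """Hypothetical component illustrating a create() factory classmethod."""

    def __init__(self, config: Dict[str, Any]) -> None:
        # Keep the resolved config; a real tokenizer might also prepare
        # regexes or other resources here.
        self._config = config

    @classmethod
    def create(
        cls,
        config: Dict[str, Any],
        model_storage: Any,
        resource: Any,
        execution_context: Any,
    ) -> "ToyWhitespaceTokenizer":
        # Assumption: a non-trainable component needs nothing from storage,
        # so the factory simply builds a fresh instance from the config.
        return cls(config)


tokenizer = ToyWhitespaceTokenizer.create({"token_pattern": None}, None, None, None)
print(type(tokenizer).__name__)  # ToyWhitespaceTokenizer
```

Calling cls(config) rather than naming the class directly means a subclass that inherits create gets an instance of the subclass.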
remove_emoji
def remove_emoji(text: Text) -> Text
Returns an empty string if the entire text (that is, the whole token) matches the emoji regex; otherwise returns the text unchanged.
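The key detail in this method is that the regex must match the whole token, so a token that only contains an emoji alongside other characters is kept. The sketch below shows that full-match behavior as a module-level function. EMOJI_PATTERN is a deliberately small, assumed pattern that covers common pictograph blocks, variation selector-16 and the zero-width joiner. It is not Rasa's emoji regex.

```python
import re

# Assumption: a small emoji pattern for illustration only. It covers the
# U+1F300..U+1FAFF pictograph blocks, U+2600..U+27BF symbols/dingbats,
# variation selector-16 (U+FE0F) and the zero-width joiner (U+200D).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]+")


def remove_emoji(text: str) -> str:
    """Return "" if the whole token is emoji, otherwise the token unchanged."""
    # fullmatch requires the entire string to match, so mixed tokens survive.
    if EMOJI_PATTERN.fullmatch(text) is not None:
        return ""
    return text


print(repr(remove_emoji("\U0001F600")))  # ''
print(remove_emoji("hello"))  # hello
```

Using fullmatch instead of search avoids stripping emoji from inside words or hashtags. The method only drops tokens that consist entirely of emoji.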