Version: 3.x

rasa.nlu.featurizers.sparse_featurizer.lexical_syntactic_featurizer

LexicalSyntacticFeaturizer Objects

@DefaultV1Recipe.register(
    DefaultV1Recipe.ComponentType.MESSAGE_FEATURIZER, is_trainable=True
)
class LexicalSyntacticFeaturizer(SparseFeaturizer, GraphComponent)

Extracts and encodes lexical syntactic features.

Given a sequence of tokens, this featurizer produces a sequence of features where the t-th feature encodes lexical and syntactic information about the t-th token and it's surrounding tokens.

In detail: The lexical syntactic features can be specified via a list of configurations [c_0, c_1, ..., c_n] where each c_i is a list of names of lexical and syntactic features (e.g. low, suffix2, digit). For a given tokenized text, the featurizer will consider a window of size n around each token and evaluate the given list of configurations as follows:

It will extract the features listed in c_m where m = (n-1)/2 if n is even and t0 from token t
It will extract the features listed in t2,t3 ... , from the last, second to last, ... token before token t, respectively.
It will extract the features listed t5, t5, ... for the first, second, ... token t, respectively. It will then combine all these features into one feature for position t.

Example:

If we specify t9, then for each position t the t-th feature will encode whether the token at position t is upper case, where the token at position [c_0, c_1, ..., c_n]3 is lower case and the first two characters of the token at position [c_0, c_1, ..., c_n]4.

required_components

@classmethod
def required_components(cls) -> List[Type]

Components that should be included in the pipeline before this component.

get_default_config

@staticmethod
def get_default_config() -> Dict[Text, Any]

Returns the component's default config.

init

def __init__(
    config: Dict[Text, Any],
    model_storage: ModelStorage,
    resource: Resource,
    execution_context: ExecutionContext,
    feature_to_idx_dict: Optional[Dict[Tuple[int, Text], Dict[Text,
                                                              int]]] = None
) -> None

Instantiates a new LexicalSyntacticFeaturizer instance.

validate_config

@classmethod
def validate_config(cls, config: Dict[Text, Any]) -> None

Validates that the component is configured properly.

train

def train(training_data: TrainingData) -> Resource

Trains the featurizer.

Arguments:

training_data - the training data

Returns:

the resource from which this trained component can be loaded

warn_if_pos_features_cannot_be_computed

def warn_if_pos_features_cannot_be_computed(
        training_data: TrainingData) -> None

Warn if part-of-speech features are needed but not given.

process

def process(messages: List[Message]) -> List[Message]

Featurizes all given messages in-place.

Arguments:

messages - messages to be featurized.

Returns:

The same list with the same messages after featurization.

process_training_data

def process_training_data(training_data: TrainingData) -> TrainingData

Processes the training examples in the given training data in-place.

Arguments:

training_data - the training data

Returns:

same training data after processing

create

@classmethod
def create(cls, config: Dict[Text, Any], model_storage: ModelStorage,
           resource: Resource,
           execution_context: ExecutionContext) -> LexicalSyntacticFeaturizer

Creates a new untrained component (see parent class for full docstring).

load

@classmethod
def load(cls, config: Dict[Text, Any], model_storage: ModelStorage,
         resource: Resource, execution_context: ExecutionContext,
         **kwargs: Any) -> LexicalSyntacticFeaturizer

Loads trained component (see parent class for full docstring).

persist

def persist() -> None

Persist this model (see parent class for full docstring).

LexicalSyntacticFeaturizer Objects#

required_components#

get_default_config#

__init__#

validate_config#

train#

warn_if_pos_features_cannot_be_computed#

process#

process_training_data#

create#

load#

persist#

LexicalSyntacticFeaturizer Objects

required_components

get_default_config

init

validate_config

train

warn_if_pos_features_cannot_be_computed

process

process_training_data

create

load

persist