rasa.nlu.featurizers.sparse_featurizer.lexical_syntactic_featurizer
LexicalSyntacticFeaturizer Objects
Extracts and encodes lexical syntactic features.
Given a sequence of tokens, this featurizer produces a sequence of features
where the t-th feature encodes lexical and syntactic information about the
t-th token and its surrounding tokens.
In detail: the lexical syntactic features can be specified via a list of
configurations [c_0, c_1, ..., c_{n-1}], where each c_i is a list of names of
lexical and syntactic features (e.g. low, suffix2, digit).
For a given tokenized text, the featurizer will consider a window of size n
around each token and evaluate the given list of configurations as follows:
- It will extract the features listed in c_m, where m = (n-1)/2 if n is odd
  and m = n/2 if n is even, from token t.
- It will extract the features listed in c_{m-1}, c_{m-2}, ... from the last,
  second to last, ... token before token t, respectively.
- It will extract the features listed in c_{m+1}, c_{m+2}, ... from the first,
  second, ... token after token t, respectively.
It will then combine all these features into one feature for position t.
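The index arithmetic above can be sketched in a few lines of plain Python. This is only an illustration of the rule as described here, not the library's code: the helper names window_offsets and center_index are hypothetical, and the sketch assumes that a window of size n uses exactly n configurations, so that for even n the window reaches one token further to the left than to the right.

```python
from typing import List


def center_index(window_size: int) -> int:
    """Index m of the configuration applied to the current token t.

    Integer division gives (n-1)/2 for odd n and n/2 for even n.
    """
    return window_size // 2


def window_offsets(window_size: int) -> List[int]:
    """Relative token positions covered by the window, in configuration order.

    Entry i is the offset (relative to t) of the token that configuration
    c_i is applied to. Hypothetical helper, written for illustration only.
    """
    half = window_size // 2
    # Odd sizes are symmetric around 0; even sizes stop one position
    # earlier on the right-hand side.
    return list(range(-half, half + window_size % 2))
```

With three configurations this yields the offsets [-1, 0, 1], so c_0 is applied to the token before t, c_1 to token t itself, and c_2 to the token after t. With four configurations the offsets are [-2, -1, 0, 1].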
Example:
If we specify [["low"], ["upper"], ["prefix2"]], then for each position t
the t-th feature will encode whether the token at position t is upper case,
whether the token at position t-1 is lower case, and the first two characters
of the token at position t+1.
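To make the example concrete, the following sketch evaluates a three-entry configuration, assumed here to be [["low"], ["upper"], ["prefix2"]], over a short token list. It is a simplified stand-in and not the featurizer itself: the FEATURES table and the window_features function are hypothetical, the feature functions are approximated with plain string methods, positions outside the text are simply skipped, and the result is kept as a readable dictionary per token instead of being encoded as a sparse vector.

```python
from typing import Callable, Dict, List

# Hypothetical approximations of three feature names, for illustration only.
FEATURES: Dict[str, Callable[[str], object]] = {
    "low": lambda token: token.islower(),
    "upper": lambda token: token.isupper(),
    "prefix2": lambda token: token[:2],
}


def window_features(
    tokens: List[str], configs: List[List[str]]
) -> List[Dict[str, object]]:
    """Combine the windowed features of each token into one dict per position."""
    half = len(configs) // 2
    result = []
    for t in range(len(tokens)):
        combined: Dict[str, object] = {}
        for i, names in enumerate(configs):
            offset = i - half  # position of the inspected token relative to t
            position = t + offset
            if not 0 <= position < len(tokens):
                continue  # the window reaches past the start or end of the text
            for name in names:
                combined[f"{offset}:{name}"] = FEATURES[name](tokens[position])
        result.append(combined)
    return result


tokens = ["hello", "WORLD", "again"]
feats = window_features(tokens, [["low"], ["upper"], ["prefix2"]])
# feats[1] == {"-1:low": True, "0:upper": True, "1:prefix2": "ag"}
```

For the middle token, the combined feature records that the previous token is lower case, that the token itself is upper case, and that the next token starts with "ag". The first and last tokens get fewer entries because part of their window falls outside the text.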
required_components
Components that should be included in the pipeline before this component.
get_default_config
Returns the component's default config.
__init__
Instantiates a new LexicalSyntacticFeaturizer instance.
validate_config
Validates that the component is configured properly.
train
Trains the featurizer.
Arguments:
- training_data: the training data
Returns:
the resource from which this trained component can be loaded
warn_if_pos_features_cannot_be_computed
Warns if part-of-speech features are needed but not given.
process
Featurizes all given messages in-place.
Arguments:
- messages: messages to be featurized
Returns:
The same list with the same messages after featurization.
process_training_data
Processes the training examples in the given training data in-place.
Arguments:
- training_data: the training data
Returns:
same training data after processing
create
Creates a new untrained component (see parent class for full docstring).
load
Loads trained component (see parent class for full docstring).
persist
Persists this model (see parent class for full docstring).