The class translates a set of strings into numbers using various encoding algorithms.
Public Member Functions
template<typename... ArgTypes>
StringEncoding (ArgTypes &&... args)
    Pass the given arguments to the policy constructor and create the StringEncoding object using the policy.
StringEncoding (EncodingPolicyType encodingPolicy)
    Construct the class from the given encoding policy.
StringEncoding (StringEncoding &)
    A variant of the copy constructor for non-constant objects.
StringEncoding (const StringEncoding &)
    Default copy-constructor.
StringEncoding (StringEncoding &&)
    Default move-constructor.
void Clear ()
    Clear the dictionary.
template<typename TokenizerType>
void CreateMap (const std::string &input, const TokenizerType &tokenizer)
    Initialize the dictionary using the given corpus.
const DictionaryType & Dictionary () const
    Return the dictionary.
DictionaryType & Dictionary ()
    Modify the dictionary.
template<typename OutputType, typename TokenizerType>
void Encode (const std::vector< std::string > &input, OutputType &output, const TokenizerType &tokenizer)
    Encode the given text and write the result to the given output.
const EncodingPolicyType & EncodingPolicy () const
    Return the encoding policy object.
EncodingPolicyType & EncodingPolicy ()
    Modify the encoding policy object.
StringEncoding & operator= (const StringEncoding &)=default
    Default copy assignment operator.
StringEncoding & operator= (StringEncoding &&)=default
    Default move assignment operator.
template<typename Archive>
void serialize (Archive &ar, const uint32_t)
    Serialize the class to the given archive.
The class translates a set of strings into numbers using various encoding algorithms.
The encoder writes data in either column-major or row-major order, depending on the output data type.
Template Parameters
    EncodingPolicyType    Type of the encoding algorithm itself.
    DictionaryType        Type of the dictionary.
Definition at line 35 of file string_encoding.hpp.
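The two template parameters reflect a policy-based design: the class stores a dictionary and delegates the encoding algorithm to the policy object. The following minimal sketch illustrates that split using hypothetical types (`ToyDictionary`, `ToyDictionaryPolicy`, `ToyStringEncoding`) and an assumed policy interface; it is not the library's implementation, and the real policies may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical dictionary type: maps each token to a positive integer code.
using ToyDictionary = std::unordered_map<std::string, std::size_t>;

// Hypothetical encoding policy: replaces each token with its dictionary code.
// This Encode() signature is an assumption made for the sketch.
struct ToyDictionaryPolicy
{
  std::vector<std::size_t> Encode(const std::vector<std::string>& tokens,
                                  const ToyDictionary& dictionary) const
  {
    std::vector<std::size_t> codes;
    codes.reserve(tokens.size());
    for (const std::string& token : tokens)
    {
      const auto it = dictionary.find(token);
      // Tokens missing from the dictionary are encoded as 0 in this sketch.
      codes.push_back(it == dictionary.end() ? 0 : it->second);
    }
    return codes;
  }
};

// Hypothetical container: owns the dictionary, delegates encoding to the policy.
template<typename EncodingPolicyType, typename DictionaryType>
class ToyStringEncoding
{
 public:
  explicit ToyStringEncoding(
      EncodingPolicyType encodingPolicy = EncodingPolicyType()) :
      policy(std::move(encodingPolicy)) { }

  // Read-only and mutable accessors, mirroring Dictionary() and EncodingPolicy().
  const DictionaryType& Dictionary() const { return dictionary; }
  DictionaryType& Dictionary() { return dictionary; }
  const EncodingPolicyType& EncodingPolicy() const { return policy; }
  EncodingPolicyType& EncodingPolicy() { return policy; }

  // Delegate the actual encoding to the policy object.
  std::vector<std::size_t> Encode(const std::vector<std::string>& tokens) const
  {
    return policy.Encode(tokens, dictionary);
  }

 private:
  EncodingPolicyType policy;
  DictionaryType dictionary;
};
```

The paired const and non-const accessors follow the same pattern as Dictionary() and EncodingPolicy() on this page: the const overload is selected for read-only access, and the non-const overload lets callers modify the stored object in place.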
template<typename... ArgTypes>
StringEncoding ( ArgTypes &&... args )
Pass the given arguments to the policy constructor and create the StringEncoding object using the policy.
StringEncoding ( EncodingPolicyType encodingPolicy )
Construct the class from the given encoding policy.
Parameters
    encodingPolicy    The given encoding policy.
StringEncoding ( StringEncoding< EncodingPolicyType, DictionaryType > & )
A variant of the copy constructor for non-constant objects.
StringEncoding ( const StringEncoding< EncodingPolicyType, DictionaryType > & )
Default copy-constructor.
StringEncoding ( StringEncoding< EncodingPolicyType, DictionaryType > && )
Default move-constructor.
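A minimal sketch with hypothetical types (`ToyPolicy`, `ToyEncoding`) reproduces this constructor set. The documentation does not say why the non-const copy constructor exists; the sketch assumes it is there to stop the variadic forwarding constructor from intercepting copies of non-const objects, which is a well-known pitfall of unconstrained forwarding constructors.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical policy with a two-argument constructor.
struct ToyPolicy
{
  ToyPolicy(std::string policyName, int count) :
      name(std::move(policyName)), minCount(count) { }

  std::string name;
  int minCount;
};

// Hypothetical wrapper reproducing the documented constructor set.
template<typename EncodingPolicyType>
class ToyEncoding
{
 public:
  // Forward any arguments to the policy constructor.
  template<typename... ArgTypes>
  ToyEncoding(ArgTypes&&... args) :
      policy(std::forward<ArgTypes>(args)...) { }

  // Construct from an existing policy object.
  ToyEncoding(EncodingPolicyType encodingPolicy) :
      policy(std::move(encodingPolicy)) { }

  // Non-const copy constructor: without it, copying a non-const ToyEncoding
  // would select the forwarding constructor (ArgTypes = ToyEncoding&) and try
  // to build the policy from a ToyEncoding, which does not compile.
  ToyEncoding(ToyEncoding& other) :
      ToyEncoding(static_cast<const ToyEncoding&>(other)) { }

  ToyEncoding(const ToyEncoding&) = default;
  ToyEncoding(ToyEncoding&&) = default;

  const EncodingPolicyType& EncodingPolicy() const { return policy; }

 private:
  EncodingPolicyType policy;
};
```

For a non-const lvalue argument, `ArgTypes&&` deduces `ToyEncoding&`, which is a better match than `const ToyEncoding&`, so overload resolution would otherwise pick the template. Once a non-template overload with an equally good match exists, the non-template one wins the tie.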
void Clear ( )
Clear the dictionary.
template<typename TokenizerType>
void CreateMap ( const std::string & input, const TokenizerType & tokenizer )
Initialize the dictionary using the given corpus.
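Conceptually, building the dictionary means splitting the corpus into tokens and giving each new token an integer code. The sketch below assumes a whitespace tokenizer and codes assigned from 1 in order of first appearance; both `WhitespaceTokenizer` and `CreateMapSketch` are hypothetical, and the tokenizer's interface is invented for this example rather than the one CreateMap() requires.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical tokenizer that splits on whitespace. Its interface is
// invented for this example and is not the one CreateMap() requires.
struct WhitespaceTokenizer
{
  template<typename Callback>
  void ForEachToken(const std::string& text, Callback callback) const
  {
    std::istringstream stream(text);
    std::string token;
    while (stream >> token)
      callback(token);
  }
};

// Hypothetical CreateMap(): gives every previously unseen token the next
// integer code, starting from 1, in order of first appearance.
template<typename TokenizerType>
void CreateMapSketch(const std::string& input,
                     const TokenizerType& tokenizer,
                     std::unordered_map<std::string, std::size_t>& dictionary)
{
  tokenizer.ForEachToken(input, [&dictionary](const std::string& token)
  {
    if (dictionary.count(token) == 0)
    {
      const std::size_t nextCode = dictionary.size() + 1;
      dictionary.emplace(token, nextCode);
    }
  });
}
```

Repeated tokens keep the code they received on first sight, so "the cat saw the dog" yields four entries rather than five.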
Template Parameters
    TokenizerType    Type of the tokenizer.
Parameters
    input        Corpus of text used to initialize the dictionary.
    tokenizer    The tokenizer object.
The tokenization algorithm has to be an object with two public methods:
const DictionaryType & Dictionary ( ) const [inline]
Return the dictionary.
Definition at line 124 of file string_encoding.hpp.
DictionaryType & Dictionary ( ) [inline]
Modify the dictionary.
Definition at line 126 of file string_encoding.hpp.
template<typename OutputType, typename TokenizerType>
void Encode ( const std::vector< std::string > & input, OutputType & output, const TokenizerType & tokenizer )
Encode the given text and write the result to the given output.
The encoder writes data in column-major or row-major order, depending on the output data type. If the output type is arma::mat or arma::sp_mat, the function writes the result in column-major order; if the output type is a 2D std::vector, it writes the result in row-major order.
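To make the two layouts concrete, this sketch writes the same logical values, where entry (i, j) is the code of token j in string i, into a flat column-major buffer (the storage order arma::mat uses) and into a nested std::vector. Placing one string per column and padding shorter strings with zeros are assumptions made here, and the helper names are hypothetical; no Armadillo types are used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical encoded corpus: codes[i][j] is the code of token j in string i.
using Codes = std::vector<std::vector<std::size_t>>;

// Column-major storage, the order arma::mat uses: element (row r, column c)
// sits at index c * nRows + r. This sketch assumes one column per string
// and pads shorter strings with zeros.
std::vector<double> WriteColumnMajor(const Codes& codes)
{
  std::size_t nRows = 0;
  for (const auto& tokens : codes)
    nRows = std::max(nRows, tokens.size());

  std::vector<double> buffer(nRows * codes.size(), 0.0);
  for (std::size_t i = 0; i < codes.size(); ++i)
    for (std::size_t j = 0; j < codes[i].size(); ++j)
      buffer[i * nRows + j] = static_cast<double>(codes[i][j]);
  return buffer;
}

// Row-major storage with a nested std::vector: output[i] holds the codes of
// string i, accessed as output[i][j].
std::vector<std::vector<std::size_t>> WriteRowMajor(const Codes& codes)
{
  std::vector<std::vector<std::size_t>> output(codes.size());
  for (std::size_t i = 0; i < codes.size(); ++i)
    output[i].assign(codes[i].begin(), codes[i].end());
  return output;
}
```

Under these assumptions the codes of one string are contiguous in both layouts; what changes is whether a string is a column (element (j, i) of the matrix) or a row (output[i][j]).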
Template Parameters
    OutputType       Type of the output container. The function supports the following types: arma::mat, arma::sp_mat, std::vector<std::vector<>>.
    TokenizerType    Type of the tokenizer.
Parameters
    input        Corpus of text to encode.
    output       Output container to store the result.
    tokenizer    The tokenizer object.
The tokenization algorithm has to be an object with two public methods:
const EncodingPolicyType & EncodingPolicy ( ) const [inline]
Return the encoding policy object.
Definition at line 129 of file string_encoding.hpp.
EncodingPolicyType & EncodingPolicy ( ) [inline]
Modify the encoding policy object.
Definition at line 131 of file string_encoding.hpp.
References StringEncoding< EncodingPolicyType, DictionaryType >::serialize().
StringEncoding & operator= ( const StringEncoding & ) [default]
Default copy assignment operator.
StringEncoding & operator= ( StringEncoding && ) [default]
Default move assignment operator.
template<typename Archive>
void serialize ( Archive & ar, const uint32_t )
Serialize the class to the given archive.
Referenced by StringEncoding< EncodingPolicyType, DictionaryType >::EncodingPolicy().
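The serialize(Archive&, const uint32_t) signature follows the single-function pattern used by archive libraries such as cereal, where one member template both saves and loads depending on the archive it receives. The sketch below imitates that pattern with two hypothetical text archives and a hypothetical `ToySettings` struct; it uses neither cereal nor mlpack, and real archives are considerably more elaborate.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical output archive: writes each value on its own line.
// Values containing whitespace would not round-trip with this toy format.
class ToyTextOutputArchive
{
 public:
  explicit ToyTextOutputArchive(std::ostream& stream) : os(stream) { }

  template<typename T>
  void operator()(const T& value) { os << value << '\n'; }

 private:
  std::ostream& os;
};

// Hypothetical input archive: reads values back in the same order.
class ToyTextInputArchive
{
 public:
  explicit ToyTextInputArchive(std::istream& stream) : is(stream) { }

  template<typename T>
  void operator()(T& value) { is >> value; }

 private:
  std::istream& is;
};

// Hypothetical class using the serialize(Archive&, const uint32_t) shape:
// the same member template is used for saving and for loading.
struct ToySettings
{
  std::string name;
  int minCount = 0;

  template<typename Archive>
  void serialize(Archive& ar, const std::uint32_t /* version, unused here */)
  {
    ar(name);
    ar(minCount);
  }
};
```

Because saving and loading share one function, the members are always visited in the same order, which is what lets the input archive read back exactly what the output archive wrote.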