The class translates a set of strings into numbers using various encoding algorithms.
Public Member Functions
| template<typename... ArgTypes> | |
| | StringEncoding (ArgTypes &&... args) |
| | Pass the given arguments to the policy constructor and create the StringEncoding object using the policy. |
| | StringEncoding (EncodingPolicyType encodingPolicy) |
| | Construct the class from the given encoding policy. |
| | StringEncoding (StringEncoding &) |
| | A variant of the copy constructor for non-constant objects. |
| | StringEncoding (const StringEncoding &) |
| | Default copy-constructor. |
| | StringEncoding (StringEncoding &&) |
| | Default move-constructor. |
| void | Clear () |
| | Clear the dictionary. |
| template<typename TokenizerType> | |
| void | CreateMap (const std::string &input, const TokenizerType &tokenizer) |
| | Initialize the dictionary using the given corpus. |
| const DictionaryType & | Dictionary () const |
| | Return the dictionary. |
| DictionaryType & | Dictionary () |
| | Modify the dictionary. |
| template<typename OutputType, typename TokenizerType> | |
| void | Encode (const std::vector<std::string> &input, OutputType &output, const TokenizerType &tokenizer) |
| | Encode the given text and write the result to the given output. |
| const EncodingPolicyType & | EncodingPolicy () const |
| | Return the encoding policy object. |
| EncodingPolicyType & | EncodingPolicy () |
| | Modify the encoding policy object. |
| StringEncoding & | operator= (const StringEncoding &)=default |
| | Default copy assignment operator. |
| StringEncoding & | operator= (StringEncoding &&)=default |
| | Default move assignment operator. |
| template<typename Archive> | |
| void | serialize (Archive &ar, const uint32_t) |
| | Serialize the class to the given archive. |
The class translates a set of strings into numbers using various encoding algorithms.
The encoder writes data in either column-major or row-major order, depending on the output data type.
Template Parameters
| EncodingPolicyType | Type of the encoding algorithm itself. |
| DictionaryType | Type of the dictionary. |
Definition at line 35 of file string_encoding.hpp.
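To make the role of the DictionaryType parameter concrete, here is a minimal, hypothetical stand-in for it: it assigns each distinct token a consecutive id, which an encoding policy could then turn into the final numeric output. The class name ToyDictionary, its member names, and the choice to start ids at 1 are assumptions for this sketch, not a description of mlpack's own dictionary classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical dictionary: maps each distinct token to a consecutive id.
// Starting the ids at 1 is an assumption made for this sketch.
class ToyDictionary
{
 public:
  // Add the token if it is new and return its id (existing or new).
  std::size_t AddToken(const std::string& token)
  {
    const auto it = ids.find(token);
    if (it != ids.end())
      return it->second;
    const std::size_t id = ids.size() + 1;
    ids.emplace(token, id);
    return id;
  }

  bool HasToken(const std::string& token) const
  {
    return ids.count(token) > 0;
  }

  // Return the id of a token that is known to be in the dictionary.
  std::size_t Value(const std::string& token) const
  {
    assert(HasToken(token));
    return ids.at(token);
  }

  std::size_t Size() const { return ids.size(); }

  // Forget every token, mirroring the documented Clear().
  void Clear() { ids.clear(); }

 private:
  std::unordered_map<std::string, std::size_t> ids;
};
```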
| StringEncoding | ( | ArgTypes &&... | args | ) |
Pass the given arguments to the policy constructor and create the StringEncoding object using the policy.
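The forwarding constructor can be sketched as follows, assuming a policy that is configured directly through its own constructor. ToyPolicy, ToyEncoder, and Policy() are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical policy configured through its constructor.
struct ToyPolicy
{
  ToyPolicy(int minCountIn, std::string nameIn)
      : minCount(minCountIn), name(std::move(nameIn)) {}

  int minCount;
  std::string name;
};

// Hypothetical wrapper that forwards all constructor arguments to the
// policy, in the spirit of StringEncoding(ArgTypes&&... args).
template<typename PolicyType>
class ToyEncoder
{
 public:
  template<typename... ArgTypes>
  explicit ToyEncoder(ArgTypes&&... args)
      : policy(std::forward<ArgTypes>(args)...) {}

  const PolicyType& Policy() const { return policy; }

 private:
  PolicyType policy;
};
```

For example, `ToyEncoder<ToyPolicy> encoder(2, "bow");` builds its policy as `ToyPolicy(2, "bow")`, preserving the value category of each argument.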
| StringEncoding | ( | EncodingPolicyType | encodingPolicy | ) |
Construct the class from the given encoding policy.
Parameters
| encodingPolicy | The given encoding policy. |
| StringEncoding | ( | StringEncoding< EncodingPolicyType, DictionaryType > & | ) |
A variant of the copy constructor for non-constant objects.
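A separate non-const copy constructor is needed because of overload resolution: for a non-const lvalue argument, a variadic forwarding constructor deduces ArgTypes = StringEncoding& and is a better match than StringEncoding(const StringEncoding&), so copying would otherwise try to build the policy from the whole object. The sketch below, with hypothetical CountPolicy and Encoder types, shows one plausible way to route that case back to the ordinary copy constructor; it is not claimed to be how mlpack implements it.

```cpp
#include <cassert>
#include <utility>

// Hypothetical policy holding a single setting.
struct CountPolicy
{
  CountPolicy() = default;
  explicit CountPolicy(int v) : value(v) {}
  int value = 0;
};

class Encoder
{
 public:
  // Forwarding constructor: passes every argument to the policy.
  template<typename... ArgTypes>
  explicit Encoder(ArgTypes&&... args)
      : policy(std::forward<ArgTypes>(args)...) {}

  // Without this overload, `Encoder b(a);` with a non-const `a` would select
  // the template above (ArgTypes = Encoder&) and attempt CountPolicy(Encoder&).
  // Delegating with a const reference picks the defaulted copy constructor,
  // because a non-template wins a tie against a template.
  Encoder(Encoder& other) : Encoder(static_cast<const Encoder&>(other)) {}

  Encoder(const Encoder&) = default;
  Encoder(Encoder&&) = default;

  int Value() const { return policy.value; }

 private:
  CountPolicy policy;
};
```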
| StringEncoding | ( | const StringEncoding< EncodingPolicyType, DictionaryType > & | ) |
Default copy-constructor.
| StringEncoding | ( | StringEncoding< EncodingPolicyType, DictionaryType > && | ) |
Default move-constructor.
| void Clear | ( | ) |
Clear the dictionary.
| void CreateMap | ( | const std::string & | input, |
| const TokenizerType & | tokenizer | ||
| ) |
Initialize the dictionary using the given corpus.
Template Parameters
| TokenizerType | Type of the tokenizer. |
Parameters
| input | Corpus of text used to build the dictionary. |
| tokenizer | The tokenizer object. |
The tokenization algorithm has to be an object with two public methods:
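As a hedged illustration of how a tokenizer and dictionary construction might fit together, the sketch below assumes the two methods are a call operator that removes the next token from the front of a std::string_view and returns it, and an IsTokenEmpty() check that signals the end of input. SplitByChar, the free function CreateMap, the std::unordered_map dictionary, and the 1-based ids are hypothetical choices for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical tokenizer that splits on a single delimiter character.
// Assumed interface: operator() removes the next token from the front of
// the view and returns it; IsTokenEmpty() reports the end of input.
class SplitByChar
{
 public:
  explicit SplitByChar(char delimiterIn) : delimiter(delimiterIn) {}

  std::string_view operator()(std::string_view& str) const
  {
    // Skip leading delimiters so repeated separators produce no tokens.
    const std::size_t start = str.find_first_not_of(delimiter);
    if (start == std::string_view::npos)
    {
      str = std::string_view();
      return std::string_view();
    }
    str.remove_prefix(start);

    // The token runs up to the next delimiter (or the end of the view).
    const std::size_t end = str.find(delimiter);
    const std::string_view token = str.substr(0, end);
    str.remove_prefix(token.size());
    return token;
  }

  bool IsTokenEmpty(std::string_view token) const { return token.empty(); }

 private:
  char delimiter;
};

// Sketch of dictionary construction: walk the corpus with the tokenizer and
// give each previously unseen token the next 1-based id.
template<typename TokenizerType>
void CreateMap(const std::string& input,
               const TokenizerType& tokenizer,
               std::unordered_map<std::string, std::size_t>& dictionary)
{
  std::string_view view(input);
  for (std::string_view token = tokenizer(view);
       !tokenizer.IsTokenEmpty(token);
       token = tokenizer(view))
  {
    // emplace() leaves the map unchanged if the token is already present.
    dictionary.emplace(std::string(token), dictionary.size() + 1);
  }
}
```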
| const DictionaryType& Dictionary | ( | ) | const | inline |
Return the dictionary.
Definition at line 124 of file string_encoding.hpp.
| DictionaryType& Dictionary | ( | ) | inline |
Modify the dictionary.
Definition at line 126 of file string_encoding.hpp.
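The two Dictionary() overloads follow the usual const/non-const accessor pattern, which EncodingPolicy() also uses: the compiler selects the const version for const objects and the mutable version otherwise. A minimal sketch, with a hypothetical ToyEncoding class and a std::unordered_map standing in for DictionaryType:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical holder mirroring the two Dictionary() overloads.
class ToyEncoding
{
 public:
  using DictionaryType = std::unordered_map<std::string, std::size_t>;

  // Read-only access: chosen when the object is const.
  const DictionaryType& Dictionary() const { return dictionary; }

  // Mutable access: chosen for non-const objects, so callers can edit in place.
  DictionaryType& Dictionary() { return dictionary; }

 private:
  DictionaryType dictionary;
};
```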
| void Encode | ( | const std::vector< std::string > & | input, |
| OutputType & | output, | ||
| const TokenizerType & | tokenizer | ||
| ) |
Encode the given text and write the result to the given output.
The encoder writes data in either column-major or row-major order, depending on the output data type.
If the output type is arma::mat or arma::sp_mat, the function writes the result in column-major order, so each input string occupies one column. If the output type is a 2D std::vector, the function writes it in row-major order, so each input string occupies one inner vector.
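The difference between the two layouts can be sketched with plain standard containers. The example is hypothetical: it assumes the input has already been mapped to 1-based token ids and uses bag-of-words counts as the values, which is only one possible encoding policy. EncodeColumnMajor mimics the storage of a dense column-major matrix such as arma::mat (element (row, col) at index row + col * nRows), and EncodeRowMajor mimics a 2D std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each document is a list of 1-based token ids (assumed to come from a
// dictionary). Values are bag-of-words counts, used only for illustration.

// Column-major: one column per document in a flat buffer of
// dictSize * numDocs elements, laid out column after column.
std::vector<double> EncodeColumnMajor(
    const std::vector<std::vector<std::size_t>>& docs, std::size_t dictSize)
{
  std::vector<double> out(dictSize * docs.size(), 0.0);
  for (std::size_t col = 0; col < docs.size(); ++col)
    for (std::size_t id : docs[col])
      out[(id - 1) + col * dictSize] += 1.0;  // element (id - 1, col)
  return out;
}

// Row-major: one inner vector (row) per document.
std::vector<std::vector<double>> EncodeRowMajor(
    const std::vector<std::vector<std::size_t>>& docs, std::size_t dictSize)
{
  std::vector<std::vector<double>> out(docs.size(),
                                       std::vector<double>(dictSize, 0.0));
  for (std::size_t row = 0; row < docs.size(); ++row)
    for (std::size_t id : docs[row])
      out[row][id - 1] += 1.0;
  return out;
}
```

With documents {1, 2, 1} and {2, 3} over a three-token dictionary, both functions hold the same counts; only the traversal order of the storage differs.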
Template Parameters
| OutputType | Type of the output container. The function supports the following types: arma::mat, arma::sp_mat, std::vector<std::vector<>>. |
| TokenizerType | Type of the tokenizer. |
Parameters
| input | Corpus of text to encode. |
| output | Output container to store the result. |
| tokenizer | The tokenizer object. |
The tokenization algorithm has to be an object with two public methods:
| const EncodingPolicyType& EncodingPolicy | ( | ) | const | inline |
Return the encoding policy object.
Definition at line 129 of file string_encoding.hpp.
| EncodingPolicyType& EncodingPolicy | ( | ) | inline |
Modify the encoding policy object.
Definition at line 131 of file string_encoding.hpp.
| StringEncoding& operator= | ( | const StringEncoding< EncodingPolicyType, DictionaryType > & | ) | default |
Default copy assignment operator.
| StringEncoding& operator= | ( | StringEncoding< EncodingPolicyType, DictionaryType > && | ) | default |
Default move assignment operator.
| void serialize | ( | Archive & | ar, |
| const uint32_t | |||
| ) |
Serialize the class to the given archive.