Definition of the BagOfWordsEncodingPolicy class.
Public Member Functions
template < typename Archive >
void | serialize (Archive &, const uint32_t)
Serialize the class to the given archive.
Static Public Member Functions
template < typename MatType >
static void | Encode (MatType &output, const size_t value, const size_t line, const size_t)
Performs the bag of words encoding, i.e. writes the encoded token to the output in column-major order.
template < typename ElemType >
static void | Encode (std::vector< std::vector< ElemType >> &output, const size_t value, const size_t line, const size_t)
Performs the bag of words encoding, i.e. writes the encoded token to the output in row-major order.
template < typename MatType >
static void | InitMatrix (MatType &output, const size_t datasetSize, const size_t, const size_t dictionarySize)
Initializes the output matrix (column-major order).
template < typename ElemType >
static void | InitMatrix (std::vector< std::vector< ElemType >> &output, const size_t datasetSize, const size_t, const size_t dictionarySize)
Initializes the output matrix (row-major order).
static void | PreprocessToken (size_t, size_t, size_t)
Not used by the bag of words encoding policy.
static void | Reset ()
Clear the necessary internal variables.
BagOfWordsEncodingPolicy is used as a helper class for StringEncoding. The encoder maps each dataset item to a vector of size N, where N is the total number of unique tokens. The i-th coordinate of the output vector is the number of times the i-th token occurs in the corresponding dataset item. The order in which tokens are labeled is defined by the dictionary used by the StringEncoding class. The encoder writes data in either column-major or row-major order, depending on the output data type.
Definition at line 35 of file bag_of_words_encoding_policy.hpp.
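The mapping described above can be illustrated without mlpack. The following is a minimal, self-contained sketch, not the library's implementation: `BagOfWordsSketch` is a hypothetical helper, it assumes whitespace tokenization, and it labels tokens from 0 in order of first appearance. It produces the row-major layout (one inner vector per dataset item).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not part of mlpack: encodes whitespace-separated
// strings as bag-of-words count vectors in row-major layout.
std::vector<std::vector<std::size_t>> BagOfWordsSketch(
    const std::vector<std::string>& dataset)
{
  // First pass: label each unique token in order of first appearance.
  std::unordered_map<std::string, std::size_t> dictionary;
  for (const std::string& item : dataset)
  {
    std::istringstream stream(item);
    std::string token;
    while (stream >> token)
      dictionary.emplace(token, dictionary.size());  // no-op if already present
  }

  // Second pass: one row per dataset item, one column per dictionary entry.
  std::vector<std::vector<std::size_t>> output(
      dataset.size(), std::vector<std::size_t>(dictionary.size(), 0));
  for (std::size_t line = 0; line < dataset.size(); ++line)
  {
    std::istringstream stream(dataset[line]);
    std::string token;
    while (stream >> token)
      ++output[line][dictionary.at(token)];  // count occurrences of the token
  }
  return output;
}
```

For the two-item dataset `{"a b a", "b c"}`, the sketch labels `a`, `b`, `c` as 0, 1, 2 and returns the rows `{2, 1, 0}` and `{0, 1, 1}`.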
template < typename MatType > static void Encode (MatType &output, const size_t value, const size_t line, const size_t) | inline, static |
Performs the bag of words encoding, i.e. writes the encoded token to the output. The encoder writes data in column-major order.
MatType | The output matrix type. |
output | Output matrix to store the encoded results (sp_mat or mat). |
value | The encoded token. |
line | The line number at which the encoding is performed. |
* | (index) The token index in the line. |
Definition at line 103 of file bag_of_words_encoding_policy.hpp.
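template < typename ElemType > static void Encode (std::vector< std::vector< ElemType >> &output, const size_t value, const size_t line, const size_t) | inline, static |
Performs the bag of words encoding, i.e. writes the encoded token to the output. The encoder writes data in row-major order.
Overloaded function that accepts vector<vector<ElemType>> as the output type.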
ElemType | Type of the output values. |
output | Output matrix to store the encoded results. |
value | The encoded token. |
line | The line number at which the encoding is performed. |
* | (index) The token index in the line. |
Definition at line 128 of file bag_of_words_encoding_policy.hpp.
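The difference between the two Encode overloads comes down to which index selects the dataset item. The sketch below is an illustration under stated assumptions, not the mlpack source: `ColMajorSketch` is a hypothetical stand-in for an Armadillo matrix, `EncodeSketch` is a hypothetical name, and token labels are assumed to start at 1, so `value - 1` is the coordinate that gets incremented.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical column-major matrix stand-in (not an Armadillo type):
// element (row, col) is stored at data[col * rows + row].
struct ColMajorSketch
{
  std::size_t rows, cols;
  std::vector<double> data;

  ColMajorSketch(std::size_t r, std::size_t c) :
      rows(r), cols(c), data(r * c, 0.0) { }

  double& operator()(std::size_t row, std::size_t col)
  {
    return data[col * rows + row];
  }
};

// Column-major variant: each dataset item occupies one column.
// Assumes token labels start at 1 (an assumption of this sketch).
void EncodeSketch(ColMajorSketch& output, std::size_t value, std::size_t line)
{
  output(value - 1, line) += 1;
}

// Row-major variant: each dataset item occupies one inner vector.
template<typename ElemType>
void EncodeSketch(std::vector<std::vector<ElemType>>& output,
                  std::size_t value,
                  std::size_t line)
{
  output[line][value - 1] += 1;
}
```

In both variants the unnamed fourth parameter of the documented signature (the token index in the line) is omitted, since a bag of words count does not depend on token position.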
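template < typename MatType > static void InitMatrix (MatType &output, const size_t datasetSize, const size_t, const size_t dictionarySize) | inline, static |
Initializes the output matrix. The encoder writes data in column-major order.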
MatType | The output matrix type. |
output | Output matrix to store the encoded results (sp_mat or mat). |
datasetSize | The number of strings in the input dataset. |
* | (maxNumTokens) The maximum number of tokens in the strings of the input dataset (not used). |
dictionarySize | The size of the dictionary. |
Definition at line 59 of file bag_of_words_encoding_policy.hpp.
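template < typename ElemType > static void InitMatrix (std::vector< std::vector< ElemType >> &output, const size_t datasetSize, const size_t, const size_t dictionarySize) | inline, static |
Initializes the output matrix. The encoder writes data in row-major order.
Overloaded function that stores the result in vector<vector<ElemType>>.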
ElemType | Type of the output values. |
output | Output matrix to store the encoded results. |
datasetSize | The number of strings in the input dataset. |
* | (maxNumTokens) The maximum number of tokens in the strings of the input dataset (not used). |
dictionarySize | The size of the dictionary. |
Definition at line 82 of file bag_of_words_encoding_policy.hpp.
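As a rough illustration of the row-major initialization, the sketch below sizes the output to datasetSize rows of dictionarySize zeros. `InitMatrixSketch` is a hypothetical name and the body is an assumption about what such an initializer needs to do, not a copy of the mlpack code. The column-major counterpart would presumably be a zero matrix with dictionarySize rows and datasetSize columns.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical row-major initializer (not the mlpack implementation):
// one zero-filled inner vector of length dictionarySize per dataset item.
template<typename ElemType>
void InitMatrixSketch(std::vector<std::vector<ElemType>>& output,
                      std::size_t datasetSize,
                      std::size_t /* maxNumTokens, not used */,
                      std::size_t dictionarySize)
{
  // assign() discards any previous contents, so stale counts cannot leak in.
  output.assign(datasetSize,
                std::vector<ElemType>(dictionarySize, ElemType(0)));
}
```

The maximum token count is ignored because the width of a bag of words vector depends only on the dictionary size, not on the length of any individual string.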
static void PreprocessToken (size_t, size_t, size_t) | inline, static |
This function is not used by the bag of words encoding policy.
* | (line) The line number at which the encoding is performed. |
* | (index) The token sequence number in the line. |
* | (value) The encoded token. |
Definition at line 144 of file bag_of_words_encoding_policy.hpp.
static void Reset () | inline, static |
Clear the necessary internal variables.
Definition at line 41 of file bag_of_words_encoding_policy.hpp.
template < typename Archive > void serialize (Archive &, const uint32_t) | inline |
Serialize the class to the given archive.
Definition at line 153 of file bag_of_words_encoding_policy.hpp.