DictionaryEncodingPolicy Class Reference

DictionaryEncodingPolicy is used as a helper class for StringEncoding.

Public Member Functions

template<typename Archive>
void serialize (Archive &, const uint32_t)
 Serialize the class to the given archive.

 

Static Public Member Functions

template<typename MatType>
static void Encode (MatType &output, const size_t value, const size_t line, const size_t index)
 The function performs the dictionary encoding algorithm, i.e. it writes the encoded token to the output.

 
template<typename ElemType>
static void Encode (std::vector< ElemType > &output, size_t value)
 The function performs the dictionary encoding algorithm, i.e. it writes the encoded token to the output.

 
template<typename MatType>
static void InitMatrix (MatType &output, const size_t datasetSize, const size_t maxNumTokens, const size_t)
 The function initializes the output matrix.

 
static void PreprocessToken (const size_t, const size_t, const size_t)
 The function is not used by the dictionary encoding policy.

 
static void Reset ()
 Clear the necessary internal variables.

 

Detailed Description

DictionaryEncodingPolicy is used as a helper class for StringEncoding.

The encoder assigns a positive integer to each unique token and treats the dataset as categorical. The numbers are assigned sequentially, starting from one. The order in which the tokens are labeled is defined by the dictionary used by the StringEncoding class. The encoder writes data in either column-major or row-major order, depending on the output data type: matrix output is written in column-major order, and std::vector output in row-major order.
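The labeling scheme can be illustrated without mlpack. The following is a minimal sketch, not the library's implementation: `AssignLabels` is a hypothetical helper, and it assumes that the dictionary labels tokens in order of first appearance.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not part of mlpack): maps each token to a positive
// integer label, assigned sequentially starting from one. Assumes labels
// are handed out in order of first appearance.
std::vector<std::size_t> AssignLabels(const std::vector<std::string>& tokens)
{
  std::unordered_map<std::string, std::size_t> dictionary;
  std::vector<std::size_t> labels;
  labels.reserve(tokens.size());

  for (const std::string& token : tokens)
  {
    auto it = dictionary.find(token);
    if (it == dictionary.end())
    {
      // A new token receives the next free label: current size plus one.
      it = dictionary.emplace(token, dictionary.size() + 1).first;
    }
    labels.push_back(it->second);
  }

  // One label is produced per input token.
  assert(labels.size() == tokens.size());
  return labels;
}
```

Repeated tokens receive the same label, which is what lets the encoded dataset be treated as categorical.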

Definition at line 32 of file dictionary_encoding_policy.hpp.

Member Function Documentation

◆ Encode() [1/2]

template<typename MatType>
static void Encode (MatType &output, const size_t value, const size_t line, const size_t index)
inline static

The function performs the dictionary encoding algorithm, i.e. it writes the encoded token to the output. The encoder writes data in column-major order.

Template Parameters
  MatType   The output matrix type.
Parameters
  output    Output matrix to store the encoded results (sp_mat or mat).
  value     The encoded token.
  line      The line number at which the encoding is performed.
  index     The token index in the line.

Definition at line 77 of file dictionary_encoding_policy.hpp.
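A column-major write of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the mlpack source: `ColMajorMatrix` is a hypothetical stand-in for a dense matrix type such as mat, `EncodeCell` is a hypothetical name, and the sketch assumes that the token index selects the row and the line number selects the column.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense column-major matrix.
struct ColMajorMatrix
{
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;  // Element (r, c) is stored at data[c * rows + r].

  ColMajorMatrix(std::size_t r, std::size_t c) :
      rows(r), cols(c), data(r * c, 0.0) { }

  double& operator()(std::size_t r, std::size_t c)
  {
    assert(r < rows && c < cols);
    return data[c * rows + r];
  }
};

// Sketch of the matrix overload. Assumption: the token position is the row
// and the line number is the column, so each input line fills one
// contiguous column of the output.
void EncodeCell(ColMajorMatrix& output,
                std::size_t value,
                std::size_t line,
                std::size_t index)
{
  output(index, line) = static_cast<double>(value);
}
```

Under this layout, all tokens of one line sit next to each other in memory, and cells that are never written keep their initial value.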

◆ Encode() [2/2]

template<typename ElemType>
static void Encode (std::vector< ElemType > &output, size_t value)
inline static

The function performs the dictionary encoding algorithm, i.e. it writes the encoded token to the output. This overload saves the result into the given vector to avoid padding. The encoder writes data in row-major order.

Template Parameters
  ElemType  Type of the output values.
Parameters
  output    Output vector to store the encoded line.
  value     The encoded token.

Definition at line 97 of file dictionary_encoding_policy.hpp.
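The padding-free behavior of the vector overload can be sketched as below. The names `EncodeAppend` and `EncodeLines` are hypothetical, and the sketch assumes that the overload simply appends the label to the vector holding the current line.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the vector overload (assumed behavior): append the label to
// the vector that holds the current line.
template<typename ElemType>
void EncodeAppend(std::vector<ElemType>& output, std::size_t value)
{
  output.push_back(static_cast<ElemType>(value));
}

// Hypothetical driver: each line gets its own vector, so lines of
// different lengths are stored without padding.
std::vector<std::vector<std::size_t>> EncodeLines(
    const std::vector<std::vector<std::size_t>>& labelsPerLine)
{
  std::vector<std::vector<std::size_t>> output(labelsPerLine.size());
  for (std::size_t line = 0; line < labelsPerLine.size(); ++line)
  {
    for (std::size_t value : labelsPerLine[line])
      EncodeAppend(output[line], value);

    // Each output line is exactly as long as its input line.
    assert(output[line].size() == labelsPerLine[line].size());
  }
  return output;
}
```

Because every line owns its storage, a short line costs only as many elements as it has tokens.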

◆ InitMatrix()

template<typename MatType>
static void InitMatrix (MatType &output, const size_t datasetSize, const size_t maxNumTokens, const size_t)
inline static

The function initializes the output matrix. The encoder writes data in column-major order.

Template Parameters
  MatType           The output matrix type.
Parameters
  output            Output matrix to store the encoded results (sp_mat or mat).
  datasetSize       The number of strings in the input dataset.
  maxNumTokens      The maximum number of tokens in the strings of the input dataset.
  (dictionarySize)  The size of the dictionary (not used).

Definition at line 56 of file dictionary_encoding_policy.hpp.
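The initialization step might look like the following sketch. It is not taken from the mlpack source: `DenseMatrix` and `InitZeroMatrix` are hypothetical names, and the sketch assumes that the output has one row per token position and one column per input string, with every cell set to zero. Since labels start from one, a zero cell can then be read as padding for lines shorter than maxNumTokens; that reading is an inference, not something the reference states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense column-major matrix.
struct DenseMatrix
{
  std::size_t rows = 0;
  std::size_t cols = 0;
  std::vector<double> data;  // Column-major: (r, c) is data[c * rows + r].
};

// Sketch of the initialization (assumed shape and fill value): one column
// per input string, one row per token position, all cells zero. The
// dictionary size is accepted but unused, mirroring the unnamed fourth
// parameter of InitMatrix.
void InitZeroMatrix(DenseMatrix& output,
                    std::size_t datasetSize,
                    std::size_t maxNumTokens,
                    std::size_t /* dictionarySize */)
{
  output.rows = maxNumTokens;
  output.cols = datasetSize;
  output.data.assign(maxNumTokens * datasetSize, 0.0);
  assert(output.data.size() == output.rows * output.cols);
}
```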

◆ PreprocessToken()

static void PreprocessToken (const size_t, const size_t, const size_t)
inline static

The function is not used by the dictionary encoding policy.

Parameters
  (line)   The line number at which the encoding is performed.
  (index)  The token sequence number in the line.
  (value)  The encoded token.

Definition at line 109 of file dictionary_encoding_policy.hpp.

◆ Reset()

static void Reset ()
inline static

Clear the necessary internal variables.

Definition at line 38 of file dictionary_encoding_policy.hpp.

◆ serialize()

template<typename Archive>
void serialize (Archive &, const uint32_t)
inline

Serialize the class to the given archive.

Definition at line 118 of file dictionary_encoding_policy.hpp.


The documentation for this class was generated from the following file: