StringEncoding< EncodingPolicyType, DictionaryType > Class Template Reference

The class translates a set of strings into numbers using various encoding algorithms.

Public Member Functions

template<typename... ArgTypes>
 StringEncoding (ArgTypes &&... args)
 Pass the given arguments to the policy constructor and create the StringEncoding object using the policy.

 StringEncoding (EncodingPolicyType encodingPolicy)
 Construct the class from the given encoding policy.

 StringEncoding (StringEncoding &)
 A variant of the copy constructor for non-constant objects.

 StringEncoding (const StringEncoding &)
 Default copy constructor.

 StringEncoding (StringEncoding &&)
 Default move constructor.

void Clear ()
 Clear the dictionary.

 
template<typename TokenizerType>
void CreateMap (const std::string &input, const TokenizerType &tokenizer)
 Initialize the dictionary using the given corpus.

 
const DictionaryType & Dictionary () const
 Return the dictionary.

DictionaryType & Dictionary ()
 Modify the dictionary.

 
template<typename OutputType, typename TokenizerType>
void Encode (const std::vector< std::string > &input, OutputType &output, const TokenizerType &tokenizer)
 Encode the given text and write the result to the given output.

 
const EncodingPolicyType & EncodingPolicy () const
 Return the encoding policy object.

EncodingPolicyType & EncodingPolicy ()
 Modify the encoding policy object.

 
StringEncoding & operator= (const StringEncoding &)=default
 Default copy assignment operator.

StringEncoding & operator= (StringEncoding &&)=default
 Default move assignment operator.

 
template<typename Archive>
void serialize (Archive &ar, const uint32_t)
 Serialize the class to the given archive.

 

Detailed Description


template<typename EncodingPolicyType, typename DictionaryType>
class mlpack::data::StringEncoding< EncodingPolicyType, DictionaryType >

The class translates a set of strings into numbers using various encoding algorithms.

The encoder writes its output in column-major or row-major order, depending on the output data type.

Template Parameters
  EncodingPolicyType  Type of the encoding algorithm itself.
  DictionaryType      Type of the dictionary.

Definition at line 35 of file string_encoding.hpp.
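
As a rough illustration of what translating strings into numbers can look like, the sketch below gives each distinct token a 1-based label in order of first appearance. It is not mlpack code: ToyDictionary, AddToken, Size and EncodeLine are hypothetical names, whitespace splitting stands in for a real tokenizer, and the labeling scheme itself is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a dictionary: maps each distinct token to a
// 1-based label assigned in order of first appearance.
class ToyDictionary
{
 public:
  // Return the label of the token, adding it to the dictionary if needed.
  std::size_t AddToken(const std::string& token)
  {
    assert(!token.empty());  // Empty tokens are never stored.
    auto it = labels.find(token);
    if (it != labels.end())
      return it->second;
    const std::size_t label = labels.size() + 1;
    labels.emplace(token, label);
    return label;
  }

  // Number of distinct tokens seen so far.
  std::size_t Size() const { return labels.size(); }

 private:
  std::unordered_map<std::string, std::size_t> labels;
};

// Encode one whitespace-separated line as a sequence of labels.
std::vector<std::size_t> EncodeLine(const std::string& line,
                                    ToyDictionary& dictionary)
{
  std::vector<std::size_t> encoded;
  std::istringstream stream(line);
  std::string token;
  while (stream >> token)
    encoded.push_back(dictionary.AddToken(token));
  return encoded;
}
```

Because the dictionary persists between calls, a token that appears in a later line receives the same label it was given earlier.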

Constructor & Destructor Documentation

◆ StringEncoding() [1/5]

StringEncoding ( ArgTypes &&...  args)

Pass the given arguments to the policy constructor and create the StringEncoding object using the policy.

◆ StringEncoding() [2/5]

StringEncoding ( EncodingPolicyType  encodingPolicy)

Construct the class from the given encoding policy.

Parameters
  encodingPolicy  The given encoding policy.

◆ StringEncoding() [3/5]

StringEncoding ( StringEncoding< EncodingPolicyType, DictionaryType > &  )

A variant of the copy constructor for non-constant objects.

◆ StringEncoding() [4/5]

StringEncoding ( const StringEncoding< EncodingPolicyType, DictionaryType > &  )

Default copy constructor.

◆ StringEncoding() [5/5]

StringEncoding ( StringEncoding< EncodingPolicyType, DictionaryType > &&  )

Default move constructor.
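
A likely reason for the non-const copy overload is C++ overload resolution: when a class has a variadic forwarding constructor, copying a non-const lvalue matches the template exactly and would otherwise be preferred over the const copy constructor. The sketch below reproduces this constructor set with hypothetical types (ToyPolicy and Wrapper are not mlpack names) and records which constructor ran.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical policy with a one-argument constructor.
struct ToyPolicy
{
  explicit ToyPolicy(int w = 0) : width(w) { assert(w >= 0); }
  int width;
};

// Hypothetical wrapper mirroring the five constructors listed above.
class Wrapper
{
 public:
  // Forward arbitrary arguments to the policy constructor.
  template<typename... ArgTypes>
  Wrapper(ArgTypes&&... args) :
      policy(std::forward<ArgTypes>(args)...), how("forwarding") {}

  // Without this overload, `Wrapper b(a)` with a non-const `a` would select
  // the forwarding constructor and try to build a ToyPolicy from a Wrapper.
  Wrapper(Wrapper& other) : policy(other.policy), how("copy (non-const)") {}

  Wrapper(const Wrapper& other) : policy(other.policy), how("copy (const)") {}

  Wrapper(Wrapper&& other) :
      policy(std::move(other.policy)), how("move") {}

  ToyPolicy policy;
  std::string how;  // Records which constructor ran; for illustration only.
};
```

When a template and a non-template constructor match equally well, the non-template one wins, which is why declaring the three explicit overloads is enough to keep copies and moves away from the forwarding constructor.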

Member Function Documentation

◆ Clear()

void Clear ( )

Clear the dictionary.

◆ CreateMap()

void CreateMap ( const std::string &  input,
const TokenizerType &  tokenizer 
)

Initialize the dictionary using the given corpus.

Template Parameters
  TokenizerType  Type of the tokenizer.
Parameters
  input      Corpus of text to encode.
  tokenizer  The tokenizer object.

The tokenization algorithm has to be an object with two public methods:

  1. operator() which accepts a reference to boost::string_view, extracts the next token from the given view, removes the prefix containing the extracted token and returns the token;
  2. IsTokenEmpty() that accepts a token and returns true if the given token is empty.
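
The two requirements above can be sketched as a small delimiter-based tokenizer. This is an illustrative sketch, not an mlpack class: ToySplitter and CountTokens are hypothetical names, and std::string_view (C++17) is used as a stand-in for boost::string_view so that the example has no external dependencies.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical tokenizer that splits on any of the given delimiter
// characters. std::string_view stands in for boost::string_view here.
class ToySplitter
{
 public:
  explicit ToySplitter(std::string delims) : delimiters(std::move(delims))
  {
    assert(!delimiters.empty());
  }

  // Extract the next token, remove it (and any delimiters before it) from
  // the front of the view, and return it. Returns an empty token when no
  // tokens are left.
  std::string_view operator()(std::string_view& str) const
  {
    // Skip leading delimiters.
    const std::size_t begin = str.find_first_not_of(delimiters);
    if (begin == std::string_view::npos)
    {
      str = std::string_view();
      return std::string_view();
    }
    str.remove_prefix(begin);

    // The token runs up to the next delimiter or the end of the view.
    const std::size_t end = str.find_first_of(delimiters);
    const std::string_view token = str.substr(0, end);
    str.remove_prefix(token.size());
    return token;
  }

  // True if the given token is empty, which signals the end of the input.
  static bool IsTokenEmpty(const std::string_view token)
  {
    return token.empty();
  }

 private:
  std::string delimiters;
};

// Hypothetical driver showing how a caller might consume the interface.
template<typename TokenizerType>
std::size_t CountTokens(const std::string& input,
                        const TokenizerType& tokenizer)
{
  std::string_view view(input);
  std::size_t count = 0;
  for (auto token = tokenizer(view); !tokenizer.IsTokenEmpty(token);
       token = tokenizer(view))
  {
    ++count;
  }
  return count;
}
```

The returned tokens are views into the caller's string, so the input must outlive every token taken from it.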

◆ Dictionary() [1/2]

const DictionaryType& Dictionary ( ) const
inline

Return the dictionary.

Definition at line 124 of file string_encoding.hpp.

◆ Dictionary() [2/2]

DictionaryType& Dictionary ( )
inline

Modify the dictionary.

Definition at line 126 of file string_encoding.hpp.

◆ Encode()

void Encode ( const std::vector< std::string > &  input,
OutputType &  output,
const TokenizerType &  tokenizer 
)

Encode the given text and write the result to the given output.

The encoder writes data in column-major or row-major order, depending on the output data type.

If the output type is arma::mat or arma::sp_mat, the function writes the result in column-major order. If the output type is a 2D std::vector, the function writes it in row-major order.

Template Parameters
  OutputType     Type of the output container. The function supports the following types: arma::mat, arma::sp_mat, std::vector<std::vector<>>.
  TokenizerType  Type of the tokenizer.
Parameters
  input      Corpus of text to encode.
  output     Output container to store the result.
  tokenizer  The tokenizer object.

The tokenization algorithm has to be an object with two public methods:

  1. operator() which accepts a reference to boost::string_view, extracts the next token from the given view, removes the prefix containing the extracted token and returns the token;
  2. IsTokenEmpty() that accepts a token and returns true if the given token is empty.
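
To make the difference between the two output layouts concrete, the sketch below packs row-major encoded strings (one inner vector per input string) into a flat column-major buffer. It does not use Armadillo: ToyColMajor and ToColumnMajor are hypothetical names, and both the one-column-per-input-string layout and the zero padding of shorter strings are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat column-major buffer: element (row, col) is stored at
// index row + col * nRows.
struct ToyColMajor
{
  std::size_t nRows = 0;
  std::size_t nCols = 0;
  std::vector<std::size_t> data;

  std::size_t At(const std::size_t row, const std::size_t col) const
  {
    assert(row < nRows && col < nCols);
    return data[row + col * nRows];
  }
};

// Pack row-major encodings into a column-major buffer with one column per
// input string, padding shorter strings with zeros (an assumed convention).
ToyColMajor ToColumnMajor(const std::vector<std::vector<std::size_t>>& rows)
{
  ToyColMajor out;
  out.nCols = rows.size();
  for (const auto& row : rows)
    out.nRows = std::max(out.nRows, row.size());
  out.data.assign(out.nRows * out.nCols, 0);

  for (std::size_t col = 0; col < rows.size(); ++col)
    for (std::size_t i = 0; i < rows[col].size(); ++i)
      out.data[i + col * out.nRows] = rows[col][i];
  return out;
}
```

In the row-major form, the encoding of the i-th input string is simply `rows[i]`; in the column-major form it occupies the contiguous range starting at index `i * nRows`.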

◆ EncodingPolicy() [1/2]

const EncodingPolicyType& EncodingPolicy ( ) const
inline

Return the encoding policy object.

Definition at line 129 of file string_encoding.hpp.

◆ EncodingPolicy() [2/2]

EncodingPolicyType& EncodingPolicy ( )
inline

Modify the encoding policy object.

Definition at line 131 of file string_encoding.hpp.

References StringEncoding< EncodingPolicyType, DictionaryType >::serialize().

◆ operator=() [1/2]

StringEncoding& operator= ( const StringEncoding< EncodingPolicyType, DictionaryType > &  )
default

Default copy assignment operator.

◆ operator=() [2/2]

StringEncoding& operator= ( StringEncoding< EncodingPolicyType, DictionaryType > &&  )
default

Default move assignment operator.

◆ serialize()

void serialize ( Archive &  ar,
const uint32_t   
)

Serialize the class to the given archive.

Referenced by StringEncoding< EncodingPolicyType, DictionaryType >::EncodingPolicy().


The documentation for this class was generated from the following file: