DatasetMapper< PolicyType, InputType > Class Template Reference

Auxiliary information for a dataset, including mappings to/from strings (or other types) and the datatype of each dimension.


Public Member Functions

 DatasetMapper (const size_t dimensionality=0)
 Create the DatasetMapper object with the given dimensionality.

 
 DatasetMapper (PolicyType &policy, const size_t dimensionality=0)
 Create the DatasetMapper object with the given policy and dimensionality.

 
size_t Dimensionality () const
 Get the dimensionality of the DatasetMapper object (that is, how many dimensions it has information for).

 
template<typename T>
void MapFirstPass (const InputType &input, const size_t dimension)
 Preprocessing: during a first pass of the data, pass the input on to the mapping policy if the policy needs it.

 
template<typename T>
T MapString (const InputType &input, const size_t dimension)
 Given the input and the dimension to which it belongs, return its numeric mapping.

 
size_t NumMappings (const size_t dimension) const
 Get the number of mappings for a particular dimension.

 
template<typename T>
size_t NumUnmappings (const T value, const size_t dimension) const
 Get the number of inputs that map to a given value in a given dimension.

 
const PolicyType & Policy () const
 Return the policy of the mapper.

 
PolicyType & Policy ()
 Modify the policy of the mapper (be careful!).

 
void Policy (PolicyType &&policy)
 Replace the policy of the mapper with a new policy.

 
template<typename Archive>
void serialize (Archive &ar, const uint32_t)
 Serialize the dataset information.

 
void SetDimensionality (const size_t dimensionality)
 Set the dimensionality of an existing DatasetMapper object.

 
Datatype Type (const size_t dimension) const
 Return the type of a given dimension (numeric or categorical).

 
Datatype & Type (const size_t dimension)
 Modify the type of a given dimension (be careful!).

 
template<typename T>
const InputType & UnmapString (const T value, const size_t dimension, const size_t unmappingIndex=0) const
 Return the input that corresponds to a given value in a given dimension.

 
PolicyType::MappedType UnmapValue (const InputType &input, const size_t dimension)
 Return the value that corresponds to a given input in a given dimension.

 

Detailed Description


template<typename PolicyType, typename InputType = std::string>

class mlpack::data::DatasetMapper< PolicyType, InputType >

Auxiliary information for a dataset, including mappings to/from strings (or other types) and the datatype of each dimension.

DatasetMapper objects are optionally produced by data::Load(), and store the type of each dimension (Datatype::numeric or Datatype::categorical) as well as mappings from strings to unsigned integers and vice versa.

DatasetMapper objects can also map from arbitrary types; the type to map from can be specified with the InputType template parameter. By default, the InputType parameter is std::string.

Template Parameters
PolicyType: Mapping policy used to specify MapString().
InputType: Type of input to be mapped.

Definition at line 41 of file dataset_mapper.hpp.

Constructor & Destructor Documentation

◆ DatasetMapper() [1/2]

DatasetMapper ( const size_t  dimensionality = 0)
explicit

Create the DatasetMapper object with the given dimensionality.

The dimensionality can be changed later with SetDimensionality(), but doing so resets all mappings.

◆ DatasetMapper() [2/2]

DatasetMapper ( PolicyType &  policy,
const size_t  dimensionality = 0 
)
explicit

Create the DatasetMapper object with the given policy and dimensionality.

The dimensionality can be changed later with SetDimensionality(), but doing so resets all mappings. The policy can be modified later through the non-const Policy() accessor.

Member Function Documentation

◆ Dimensionality()

size_t Dimensionality ( ) const

Get the dimensionality of the DatasetMapper object (that is, how many dimensions it has information for).

If this object was created by a call to mlpack::data::Load(), then the dimensionality will be the same as the number of rows (dimensions) in the dataset.

Referenced by LoadCSV::GetMatrixSize(), and LoadCSV::GetTransposeMatrixSize().

◆ MapFirstPass()

void MapFirstPass ( const InputType &  input,
const size_t  dimension 
)

Preprocessing: during a first pass of the data, pass the input on to the mapping policy if the policy needs it.

Parameters
input: Input to map.
dimension: Dimension to map for.

◆ MapString()

T MapString ( const InputType &  input,
const size_t  dimension 
)

Given the input and the dimension to which it belongs, return its numeric mapping.

If no mapping yet exists, the input is added to the list of mappings for the given dimension. The dimension parameter refers to the index of the dimension of the string (i.e. the row in the dataset).

Template Parameters
T: Numeric type to map to (int/double/float/etc.).
Parameters
input: Input to find/create mapping for.
dimension: Index of the dimension of the string.

Referenced by MockCategoricalData(), and mlpack::util::SetParamWithInfo().
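
The find-or-create behavior described for MapString() can be sketched with a small stand-in class. This is a hedged illustration only, not mlpack's implementation: ToyMapper is a hypothetical name, the input type is fixed to std::string, and numbering new mappings 0, 1, 2, ... within each dimension is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for illustration only; this is not mlpack's DatasetMapper.
class ToyMapper
{
 public:
  explicit ToyMapper(const std::size_t dimensionality = 0) :
      maps(dimensionality) { }

  // Return the existing mapping for `input` in `dimension`, or create one.
  // Assumption: new mappings are numbered 0, 1, 2, ... per dimension.
  std::size_t MapString(const std::string& input, const std::size_t dimension)
  {
    assert(dimension < maps.size());
    auto& table = maps[dimension];

    const auto it = table.find(input);
    if (it != table.end())
      return it->second;  // Already mapped: reuse the stored value.

    const std::size_t next = table.size();
    table.emplace(input, next);  // Not mapped yet: add it to this dimension.
    return next;
  }

  // Number of distinct inputs mapped so far in `dimension`.
  std::size_t NumMappings(const std::size_t dimension) const
  {
    assert(dimension < maps.size());
    return maps[dimension].size();
  }

 private:
  // One independent lookup table per dimension.
  std::vector<std::unordered_map<std::string, std::size_t>> maps;
};
```

In this sketch, mapping the same input twice in one dimension returns the same value, while the same input in a different dimension gets its own, independent mapping.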

◆ NumMappings()

size_t NumMappings ( const size_t  dimension) const

Get the number of mappings for a particular dimension.

If the dimension is numeric, then this will return 0.

◆ NumUnmappings()

size_t NumUnmappings ( const T  value,
const size_t  dimension 
) const

Get the number of inputs that map to a given value in a given dimension.

◆ Policy() [1/3]

const PolicyType& Policy ( ) const

Return the policy of the mapper.

Referenced by DatasetMapper< mlpack::data::IncrementPolicy, double >::serialize().

◆ Policy() [2/3]

PolicyType& Policy ( )

Modify the policy of the mapper (be careful!).

◆ Policy() [3/3]

void Policy ( PolicyType &&  policy)

Replace the policy of the mapper with a new policy.

◆ serialize()

void serialize ( Archive &  ar,
const uint32_t   
)
inline

Serialize the dataset information.

Definition at line 154 of file dataset_mapper.hpp.

◆ SetDimensionality()

void SetDimensionality ( const size_t  dimensionality)

Set the dimensionality of an existing DatasetMapper object.

This resets all mappings (but not the PolicyType).

Parameters
dimensionality: New dimensionality.

Referenced by LoadCSV::GetMatrixSize(), LoadCSV::GetTransposeMatrixSize(), and LoadBostonHousingDataset().
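
The reset semantics of SetDimensionality() (mappings are discarded, the policy is kept) can be illustrated with a simplified model. Everything named Toy* below is hypothetical and not part of mlpack. The sketch also assumes that every dimension starts out numeric and becomes categorical once an input is mapped in it, which this page does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for Datatype and PolicyType; not mlpack types.
enum class ToyDatatype { numeric, categorical };
struct ToyPolicy { bool flag = false; };

class ToyInfo
{
  using Table = std::unordered_map<std::string, std::size_t>;

 public:
  explicit ToyInfo(const ToyPolicy& policy,
                   const std::size_t dimensionality = 0) :
      policy(policy)
  {
    SetDimensionality(dimensionality);
  }

  // Resize and reset: all mappings are dropped, the policy is left untouched.
  // Assumption: every dimension is numeric until something is mapped in it.
  void SetDimensionality(const std::size_t dimensionality)
  {
    types.assign(dimensionality, ToyDatatype::numeric);
    maps.assign(dimensionality, Table());
  }

  // Map `input` in `dimension` and mark that dimension as categorical.
  std::size_t Map(const std::string& input, const std::size_t dimension)
  {
    assert(dimension < maps.size());
    types[dimension] = ToyDatatype::categorical;
    Table& table = maps[dimension];
    // emplace() keeps the existing entry if `input` is already present.
    return table.emplace(input, table.size()).first->second;
  }

  std::size_t Dimensionality() const { return types.size(); }
  const ToyPolicy& Policy() const { return policy; }

  ToyDatatype Type(const std::size_t dimension) const
  {
    assert(dimension < types.size());
    return types[dimension];
  }

  // A dimension with no mappings (for example, a numeric one) reports 0.
  std::size_t NumMappings(const std::size_t dimension) const
  {
    assert(dimension < maps.size());
    return maps[dimension].size();
  }

 private:
  ToyPolicy policy;
  std::vector<ToyDatatype> types;
  std::vector<Table> maps;
};
```

After a call to SetDimensionality() in this model, NumMappings() is 0 for every dimension again, while Policy() still returns the policy the object was constructed with.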

◆ Type() [1/2]

Datatype Type ( const size_t  dimension) const

Return the type of a given dimension (numeric or categorical).

Referenced by LoadBostonHousingDataset(), MockCategoricalData(), and mlpack::util::SetParamWithInfo().

◆ Type() [2/2]

Datatype& Type ( const size_t  dimension)

Modify the type of a given dimension (be careful!).

◆ UnmapString()

const InputType& UnmapString ( const T  value,
const size_t  dimension,
const size_t  unmappingIndex = 0 
) const

Return the input that corresponds to a given value in a given dimension.

If the value is not a valid mapping in the given dimension, a std::invalid_argument is thrown. Note that this does not remove the mapping.

If the mapping is non-unique (i.e. many strings can map to the same value), then you can pass a different value for unmappingIndex to get a different string that maps to the given value. unmappingIndex should be in the range from 0 to (NumUnmappings(value, dimension) - 1).

If the mapping is unique (which it is for DatasetInfo), then the unmappingIndex parameter can be left as the default.

Parameters
value: Mapped value for input.
dimension: Dimension to unmap string from.
unmappingIndex: Index of non-unique unmapping (optional).
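
How unmappingIndex and NumUnmappings() fit together for a non-unique mapping can be sketched as follows. This is a hedged, single-dimension model rather than mlpack's data layout: ToyReverseMap and its Add() method are hypothetical, and the example assumes unsigned integer values and std::string inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Toy reverse table for one dimension; hypothetical, not mlpack's layout.
class ToyReverseMap
{
 public:
  // Record that `input` maps to `value`; several inputs may share a value.
  void Add(const std::string& input, const std::size_t value)
  {
    reverse[value].push_back(input);
  }

  // Number of inputs that map to `value` (0 if there are none).
  std::size_t NumUnmappings(const std::size_t value) const
  {
    const auto it = reverse.find(value);
    return (it == reverse.end()) ? 0 : it->second.size();
  }

  // Return one of the inputs that map to `value`. The lookup is read-only,
  // so the mapping is not removed.
  const std::string& UnmapString(const std::size_t value,
                                 const std::size_t unmappingIndex = 0) const
  {
    const auto it = reverse.find(value);
    if (it == reverse.end())
      throw std::invalid_argument("value has no mapping in this dimension");

    // Valid indices run from 0 to NumUnmappings(value) - 1.
    assert(unmappingIndex < it->second.size());
    return it->second[unmappingIndex];
  }

 private:
  // value -> every input that maps to it, in insertion order.
  std::unordered_map<std::size_t, std::vector<std::string>> reverse;
};
```

When each value has exactly one input, as in a unique mapping, the default unmappingIndex of 0 is always the right choice in this model.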

◆ UnmapValue()

PolicyType::MappedType UnmapValue ( const InputType &  input,
const size_t  dimension 
)

Return the value that corresponds to a given input in a given dimension.

If the input has no valid mapping in the given dimension, a std::invalid_argument is thrown. Note that this does not remove the mapping.

Parameters
input: Input whose mapped value is returned.
dimension: Dimension to unmap input from.

The documentation for this class was generated from the following file: