Multihead Attention allows the model to jointly attend to information from different representation subspaces at different positions.

Public Member Functions
	MultiheadAttention ()
	Default constructor.

	MultiheadAttention (const size_t tgtSeqLen, const size_t srcSeqLen, const size_t embedDim, const size_t numHeads)
	Create the MultiheadAttention object using the specified parameters.

OutputDataType const &	AttentionMask () const
	Get the two dimensional Attention Mask.

OutputDataType &	AttentionMask ()
	Modify the two dimensional Attention Mask.

template < typename eT >
void	Backward (const arma::Mat< eT > &, const arma::Mat< eT > &gy, arma::Mat< eT > &g)
	Ordinary feed backward pass of a neural network, calculating the function f(x) by propagating x backwards through f.

OutputDataType const &	Delta () const
	Get the delta.

OutputDataType &	Delta ()
	Modify the delta.

size_t	EmbedDim () const
	Get the embedding dimension.

size_t &	EmbedDim ()
	Modify the embedding dimension.

template < typename eT >
void	Forward (const arma::Mat< eT > &input, arma::Mat< eT > &output)
	Ordinary feed forward pass of a neural network, evaluating the function f(x) by propagating the activity forward through f.

template < typename eT >
void	Gradient (const arma::Mat< eT > &input, const arma::Mat< eT > &error, arma::Mat< eT > &gradient)
	Calculate the gradient using the output delta and the input activation.

OutputDataType const &	Gradient () const
	Get the gradient.

OutputDataType &	Gradient ()
	Modify the gradient.

size_t	InputShape () const

OutputDataType const &	KeyPaddingMask () const
	Get the Key Padding Mask.

OutputDataType &	KeyPaddingMask ()
	Modify the Key Padding Mask.

size_t	NumHeads () const
	Get the number of attention heads.

size_t &	NumHeads ()
	Modify the number of attention heads.

OutputDataType const &	OutputParameter () const
	Get the output parameter.

OutputDataType &	OutputParameter ()
	Modify the output parameter.

OutputDataType const &	Parameters () const
	Get the parameters.

OutputDataType &	Parameters ()
	Modify the parameters.

void	Reset ()
	Reset the layer parameters.

template < typename Archive >
void	serialize (Archive &ar, const uint32_t)
	Serialize the layer.

size_t	SrcSeqLen () const
	Get the source sequence length.

size_t &	SrcSeqLen ()
	Modify the source sequence length.

size_t	TgtSeqLen () const
	Get the target sequence length.

size_t &	TgtSeqLen ()
	Modify the target sequence length.

size_t	WeightSize () const
	Get the size of the weights.

Detailed Description

template<typename InputDataType = arma::mat, typename OutputDataType = arma::mat, typename RegularizerType = NoRegularizer>

class mlpack::ann::MultiheadAttention< InputDataType, OutputDataType, RegularizerType >

Multihead Attention allows the model to jointly attend to information from different representation subspaces at different positions.

With a single attention head, averaging inhibits this. [arxiv.org:1706.03762v5]

The MultiheadAttention class takes the query, key and value in concatenated form: the three are concatenated into a single matrix, which is passed to the Forward function as input.

The query, key and value are matrices of shapes (embedDim * tgtSeqLen, batchSize), (embedDim * srcSeqLen, batchSize) and (embedDim * srcSeqLen, batchSize) respectively. The output is a matrix of shape (embedDim * tgtSeqLen, batchSize). The embeddings are stored consecutively.
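
The shapes above imply how large the concatenated input must be. The following is a minimal sketch of that arithmetic in plain C++, not taken from the mlpack source. It assumes the query, key and value are stacked along the rows in that order, which is consistent with all three sharing batchSize columns but is not stated explicitly on this page. QkvLayout and ComputeQkvLayout are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Half-open row ranges [begin, end) of each block inside the concatenated
// input. ASSUMPTION: query, key and value are stacked along the rows in
// that order. These names are hypothetical and not part of mlpack.
struct QkvLayout
{
  std::size_t queryBegin, queryEnd;
  std::size_t keyBegin, keyEnd;
  std::size_t valueBegin, valueEnd;
  std::size_t inputRows;   // Rows of the concatenated input matrix.
  std::size_t outputRows;  // Rows of the output matrix.
};

inline QkvLayout ComputeQkvLayout(const std::size_t tgtSeqLen,
                                  const std::size_t srcSeqLen,
                                  const std::size_t embedDim)
{
  assert(embedDim > 0 && "embedDim must be positive");

  QkvLayout layout{};
  // Query block: embedDim * tgtSeqLen rows.
  layout.queryBegin = 0;
  layout.queryEnd = embedDim * tgtSeqLen;
  // Key block: embedDim * srcSeqLen rows, directly after the query.
  layout.keyBegin = layout.queryEnd;
  layout.keyEnd = layout.keyBegin + embedDim * srcSeqLen;
  // Value block: embedDim * srcSeqLen rows, directly after the key.
  layout.valueBegin = layout.keyEnd;
  layout.valueEnd = layout.valueBegin + embedDim * srcSeqLen;
  // Total input rows and output rows, as given by the shapes above.
  layout.inputRows = layout.valueEnd;
  layout.outputRows = embedDim * tgtSeqLen;
  return layout;
}
```

Under this assumption, an input with tgtSeqLen = 2, srcSeqLen = 3 and embedDim = 4 has 4 * (2 + 2 * 3) = 32 rows, and the output has 4 * 2 = 8 rows.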

Template Parameters

InputDataType	Type of the input data (arma::colvec, arma::mat, arma::sp_mat or arma::cube).
OutputDataType	Type of the output data (arma::colvec, arma::mat, arma::sp_mat or arma::cube).
RegularizerType	Type of the regularizer to be used.

Definition at line 129 of file layer_types.hpp.

Constructor & Destructor Documentation

◆ MultiheadAttention() [1/2]

MultiheadAttention ( )

Default constructor.

◆ MultiheadAttention() [2/2]

MultiheadAttention	(	const size_t	tgtSeqLen,
		const size_t	srcSeqLen,
		const size_t	embedDim,
		const size_t	numHeads
	)

Create the MultiheadAttention object using the specified parameters.

Parameters

tgtSeqLen	Target sequence length.
srcSeqLen	Source sequence length.
embedDim	Total dimension of the model.
numHeads	Number of parallel attention heads.
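
In the formulation of the cited paper, the total model dimension is split evenly across the heads, so each head works on embedDim / numHeads features. The sketch below illustrates that relationship in plain C++. Whether and how this class validates the divisibility is not stated on this page, so the check shown here is an assumption, and HeadDim is a hypothetical helper rather than an mlpack function.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Per-head feature dimension, assuming embedDim is split evenly across
// numHeads as in the cited paper. Hypothetical helper, not part of mlpack.
inline std::size_t HeadDim(const std::size_t embedDim,
                           const std::size_t numHeads)
{
  if (embedDim == 0 || numHeads == 0 || embedDim % numHeads != 0)
  {
    throw std::invalid_argument(
        "embedDim must be a positive multiple of numHeads");
  }

  const std::size_t headDim = embedDim / numHeads;
  // Postcondition: the heads together cover the full model dimension.
  assert(headDim * numHeads == embedDim);
  return headDim;
}
```

For example, embedDim = 512 with numHeads = 8 gives 64 features per head, while embedDim = 10 with numHeads = 3 is rejected.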

Member Function Documentation

◆ AttentionMask() [1/2]

OutputDataType const& AttentionMask ( ) const

inline

Get the two dimensional Attention Mask.

Definition at line 153 of file multihead_attention.hpp.

◆ AttentionMask() [2/2]

OutputDataType& AttentionMask ( )

inline

Modify the two dimensional Attention Mask.

Definition at line 155 of file multihead_attention.hpp.
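
This page does not state the shape or the convention of the attention mask. A common convention, assumed here and not confirmed by this page, is an additive mask of shape (tgtSeqLen, srcSeqLen) that is added to the attention scores before the softmax, with 0 for allowed positions and negative infinity for blocked ones. The sketch below builds a causal mask of that kind in plain C++. MakeCausalMask is a hypothetical helper, and the row-major std::vector stands in for a matrix type.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Build an additive causal mask, stored row-major with tgtSeqLen rows and
// srcSeqLen columns. ASSUMPTION: 0 means "may attend" and -infinity means
// "blocked". Hypothetical helper, not part of mlpack.
inline std::vector<double> MakeCausalMask(const std::size_t tgtSeqLen,
                                          const std::size_t srcSeqLen)
{
  assert(tgtSeqLen > 0 && srcSeqLen > 0);

  const double blocked = -std::numeric_limits<double>::infinity();
  std::vector<double> mask(tgtSeqLen * srcSeqLen, 0.0);

  for (std::size_t t = 0; t < tgtSeqLen; ++t)
  {
    for (std::size_t s = 0; s < srcSeqLen; ++s)
    {
      // Target position t may only attend to source positions s <= t.
      if (s > t)
        mask[t * srcSeqLen + s] = blocked;
    }
  }
  return mask;
}
```

If the layer follows this convention, the resulting values would be copied into a matrix of the layer's OutputDataType and assigned through the modifying AttentionMask() accessor.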

◆ Backward()

void Backward	(	const arma::Mat< eT > &	,
		const arma::Mat< eT > &	gy,
		arma::Mat< eT > &	g
	)

Ordinary feed backward pass of a neural network, calculating the function f(x) by propagating x backwards through f, using the results from the feed forward pass.

Parameters

gy	The backpropagated error.
g	The calculated gradient.

◆ Delta() [1/2]

OutputDataType const& Delta ( ) const

inline

Get the delta.

Definition at line 168 of file multihead_attention.hpp.

◆ Delta() [2/2]

OutputDataType& Delta ( )

inline

Modify the delta.

Definition at line 170 of file multihead_attention.hpp.

◆ EmbedDim() [1/2]

size_t EmbedDim ( ) const

inline

Get the embedding dimension.

Definition at line 143 of file multihead_attention.hpp.

◆ EmbedDim() [2/2]

size_t& EmbedDim ( )

inline

Modify the embedding dimension.

Definition at line 145 of file multihead_attention.hpp.

◆ Forward()

void Forward	(	const arma::Mat< eT > &	input,
		arma::Mat< eT > &	output
	)

Ordinary feed forward pass of a neural network, evaluating the function f(x) by propagating the activity forward through f.

Parameters

input	The concatenated query, key and value matrix.
output	Resulting output activation.

◆ Gradient() [1/3]

void Gradient	(	const arma::Mat< eT > &	input,
		const arma::Mat< eT > &	error,
		arma::Mat< eT > &	gradient
	)

Calculate the gradient using the output delta and the input activation.

Parameters

input	The input data used for evaluating specified function.
error	The calculated error.
gradient	The calculated gradient.

◆ Gradient() [2/3]

OutputDataType const& Gradient ( ) const

inline

Get the gradient.

Definition at line 173 of file multihead_attention.hpp.

◆ Gradient() [3/3]

OutputDataType& Gradient ( )

inline

Modify the gradient.

Definition at line 175 of file multihead_attention.hpp.

◆ InputShape()

size_t InputShape ( ) const

inline

Definition at line 182 of file multihead_attention.hpp.

◆ KeyPaddingMask() [1/2]

OutputDataType const& KeyPaddingMask ( ) const

inline

Get the Key Padding Mask.

Definition at line 158 of file multihead_attention.hpp.

◆ KeyPaddingMask() [2/2]

OutputDataType& KeyPaddingMask ( )

inline

Modify the Key Padding Mask.

Definition at line 160 of file multihead_attention.hpp.

◆ NumHeads() [1/2]

size_t NumHeads ( ) const

inline

Get the number of attention heads.

Definition at line 148 of file multihead_attention.hpp.

◆ NumHeads() [2/2]

size_t& NumHeads ( )

inline

Modify the number of attention heads.

Definition at line 150 of file multihead_attention.hpp.

◆ OutputParameter() [1/2]

OutputDataType const& OutputParameter ( ) const

inline

Get the output parameter.

Definition at line 163 of file multihead_attention.hpp.

◆ OutputParameter() [2/2]

OutputDataType& OutputParameter ( )

inline

Modify the output parameter.

Definition at line 165 of file multihead_attention.hpp.

◆ Parameters() [1/2]

OutputDataType const& Parameters ( ) const

inline

Get the parameters.

Definition at line 178 of file multihead_attention.hpp.

◆ Parameters() [2/2]

OutputDataType& Parameters ( )

inline

Modify the parameters.

Definition at line 180 of file multihead_attention.hpp.

◆ Reset()

void Reset ( )

Reset the layer parameters.

◆ serialize()

void serialize	(	Archive &	ar,
		const uint32_t
	)

Serialize the layer.

Referenced by MultiheadAttention< InputDataType, OutputDataType, RegularizerType >::WeightSize().

◆ SrcSeqLen() [1/2]

size_t SrcSeqLen ( ) const

inline

Get the source sequence length.

Definition at line 138 of file multihead_attention.hpp.

◆ SrcSeqLen() [2/2]

size_t& SrcSeqLen ( )

inline

Modify the source sequence length.

Definition at line 140 of file multihead_attention.hpp.

◆ TgtSeqLen() [1/2]

size_t TgtSeqLen ( ) const

inline

Get the target sequence length.

Definition at line 133 of file multihead_attention.hpp.

◆ TgtSeqLen() [2/2]

size_t& TgtSeqLen ( )

inline

Modify the target sequence length.

Definition at line 135 of file multihead_attention.hpp.

◆ WeightSize()

size_t WeightSize ( ) const

inline

Get the size of the weights.

Definition at line 124 of file multihead_attention.hpp.

References MultiheadAttention< InputDataType, OutputDataType, RegularizerType >::serialize().

The documentation for this class was generated from the following files:

/home/ryan/src/mlpack.org/_src/mlpack-git/src/mlpack/methods/ann/layer/layer_types.hpp
/home/ryan/src/mlpack.org/_src/mlpack-git/src/mlpack/methods/ann/layer/multihead_attention.hpp
