Implementation of prioritized experience replay.

Classes
struct	Transition

Public Types
using	ActionType = typename EnvironmentType::Action
	Convenient typedef for action.

using	StateType = typename EnvironmentType::State
	Convenient typedef for state.

Public Member Functions
	PrioritizedReplay ()
	Default constructor.

	PrioritizedReplay (const size_t batchSize, const size_t capacity, const double alpha, const size_t nSteps=1, const size_t dimension=StateType::dimension)
	Construct an instance of the prioritized experience replay class.

void	BetaAnneal ()
	Anneal the beta parameter.

void	GetNStepInfo (double &reward, StateType &nextState, bool &isEnd, const double &discount)
	Get the reward, next state, and terminal flag for the n-th step.

const size_t &	NSteps () const
	Get the number of steps for the n-step agent.

void	Sample (arma::mat &sampledStates, std::vector< ActionType > &sampledActions, arma::rowvec &sampledRewards, arma::mat &sampledNextStates, arma::irowvec &isTerminal)
	Sample experiences according to their priorities.

arma::ucolvec	SampleProportional ()
	Sample the indices of experiences according to their priorities.

const size_t &	Size ()
	Get the number of transitions in the memory.

void	Store (StateType state, ActionType action, double reward, StateType nextState, bool isEnd, const double &discount)
	Store the given experience and set its priority.

void	Update (arma::mat target, std::vector< ActionType > sampledActions, arma::mat nextActionValues, arma::mat &gradients)
	Update the priorities of transitions and update the gradients.

void	UpdatePriorities (arma::ucolvec &indices, arma::colvec &priorities)
	Update the priorities of sampled transitions.

Detailed Description

template<typename EnvironmentType>

class mlpack::rl::PrioritizedReplay< EnvironmentType >

Implementation of prioritized experience replay.

Prioritized experience replay samples important transitions more frequently than others, which helps the agent learn more efficiently.

@article{schaul2015prioritized,
 title   = {Prioritized experience replay},
 author  = {Schaul, Tom and Quan, John and Antonoglou,
            Ioannis and Silver, David},
 journal = {arXiv preprint arXiv:1511.05952},
 year    = {2015}
 }

Template Parameters

EnvironmentType Desired task.

Definition at line 39 of file prioritized_replay.hpp.

Member Typedef Documentation

◆ ActionType

using ActionType = typename EnvironmentType::Action

Convenient typedef for action.

Definition at line 43 of file prioritized_replay.hpp.

◆ StateType

using StateType = typename EnvironmentType::State

Convenient typedef for state.

Definition at line 46 of file prioritized_replay.hpp.

Constructor & Destructor Documentation

◆ PrioritizedReplay() [1/2]

PrioritizedReplay ( )

inline

Default constructor.

Definition at line 60 of file prioritized_replay.hpp.

References alpha().

◆ PrioritizedReplay() [2/2]

PrioritizedReplay	(	const size_t	batchSize,
		const size_t	capacity,
		const double	alpha,
		const size_t	nSteps = `1`,
		const size_t	dimension = `StateType::dimension`
	)

inline

Construct an instance of the prioritized experience replay class.

Parameters

batchSize	Number of examples returned at each sample.
capacity	Total memory size in terms of number of examples.
alpha	Degree of prioritization; 0 corresponds to uniform sampling.
nSteps	Number of steps to look in the future.
dimension	The dimension of an encoded state.

Definition at line 82 of file prioritized_replay.hpp.
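
The role of alpha can be illustrated with the proportional scheme from the cited paper, where transition i is drawn with probability P(i) = p_i^alpha / sum_k p_k^alpha. The following is a minimal sketch under that assumption; SamplingProbabilities is a hypothetical helper that uses only the standard library and is not part of mlpack.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of mlpack: turn raw priorities p_i into
// sampling probabilities P(i) = p_i^alpha / sum_k p_k^alpha.
std::vector<double> SamplingProbabilities(const std::vector<double>& priorities,
                                          const double alpha)
{
  assert(!priorities.empty());

  std::vector<double> probabilities(priorities.size());
  double total = 0.0;
  for (std::size_t i = 0; i < priorities.size(); ++i)
  {
    assert(priorities[i] > 0.0);
    probabilities[i] = std::pow(priorities[i], alpha);
    total += probabilities[i];
  }

  // Normalize so that the probabilities sum to one.
  for (double& p : probabilities)
    p /= total;

  return probabilities;
}
```

With alpha = 0 every transition gets the same probability; with alpha = 1 the probabilities are directly proportional to the priorities.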

Member Function Documentation

◆ BetaAnneal()

void BetaAnneal ( )

inline

Anneal the beta parameter.

Definition at line 276 of file prioritized_replay.hpp.

Referenced by PrioritizedReplay< EnvironmentType >::Sample().
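
The cited paper anneals beta linearly toward 1 over the course of training. The sketch below shows one such schedule; AnnealBeta, initialBeta, and annealSteps are hypothetical names chosen for illustration, and the exact schedule used by this class may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical schedule, not mlpack's code: move beta from initialBeta
// toward 1 in annealSteps equal increments, then hold it at 1.
double AnnealBeta(const double beta,
                  const double initialBeta,
                  const std::size_t annealSteps)
{
  assert(annealSteps > 0);
  assert(initialBeta >= 0.0 && initialBeta <= 1.0);

  const double increment = (1.0 - initialBeta) / annealSteps;
  return std::min(1.0, beta + increment);
}
```

Calling AnnealBeta once per sampling step mirrors how Sample() invokes BetaAnneal() in this class.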

◆ GetNStepInfo()

void GetNStepInfo	(	double &	reward,
		StateType &	nextState,
		bool &	isEnd,
		const double &	discount
	)

inline

Get the reward, next state, and terminal flag for the n-th step.

Parameters

reward	Given reward.
nextState	Given next state.
isEnd	Whether the next state is a terminal state.
discount	The discount parameter.

Definition at line 171 of file prioritized_replay.hpp.

Referenced by PrioritizedReplay< EnvironmentType >::Store().
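
An n-step look-ahead is commonly computed by accumulating discounted rewards over the buffered steps and stopping early at a terminal step. The following is a minimal sketch of that idea, assuming a simplified Step record; Step and NStepReturn are hypothetical and do not reproduce mlpack's Transition struct or the body of GetNStepInfo().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-step record; states and actions are omitted here.
struct Step
{
  double reward;
  bool isEnd;
};

// Accumulate the discounted reward over the buffered steps, stopping at the
// first terminal step. Returns the index of the last step that was used, so
// the caller can take that step's next state as the n-step next state.
std::size_t NStepReturn(const std::vector<Step>& steps,
                        const double discount,
                        double& reward,
                        bool& isEnd)
{
  assert(!steps.empty());

  reward = 0.0;
  isEnd = false;
  double scale = 1.0;
  std::size_t last = 0;
  for (std::size_t i = 0; i < steps.size(); ++i)
  {
    reward += scale * steps[i].reward;
    scale *= discount;
    last = i;
    if (steps[i].isEnd)
    {
      isEnd = true;
      break;
    }
  }
  return last;
}
```

As in GetNStepInfo(), the reward and terminal flag are returned through reference parameters.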

◆ NSteps()

const size_t& NSteps ( ) const

inline

Get the number of steps for the n-step agent.

Definition at line 308 of file prioritized_replay.hpp.

References alpha().

◆ Sample()

void Sample	(	arma::mat &	sampledStates,
		std::vector< ActionType > &	sampledActions,
		arma::rowvec &	sampledRewards,
		arma::mat &	sampledNextStates,
		arma::irowvec &	isTerminal
	)

inline

Sample experiences according to their priorities.

Parameters

sampledStates	Sampled encoded states.
sampledActions	Sampled actions.
sampledRewards	Sampled rewards.
sampledNextStates	Sampled encoded next states.
isTerminal	Indicates whether the corresponding next state is a terminal state.

Definition at line 221 of file prioritized_replay.hpp.

References PrioritizedReplay< EnvironmentType >::BetaAnneal(), and PrioritizedReplay< EnvironmentType >::SampleProportional().
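
In the cited paper, prioritized sampling is paired with importance-sampling weights w_i = (N * P(i))^(-beta), normalized by the largest weight, to correct the bias that non-uniform sampling introduces. The sketch below computes such weights; ImportanceWeights is a hypothetical helper, and whether and where this class applies the weights is not stated on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of mlpack: compute normalized
// importance-sampling weights for a batch of sampled transitions.
std::vector<double> ImportanceWeights(const std::vector<double>& probabilities,
                                      const std::size_t memorySize,
                                      const double beta)
{
  assert(!probabilities.empty());

  std::vector<double> weights(probabilities.size());
  for (std::size_t i = 0; i < probabilities.size(); ++i)
  {
    assert(probabilities[i] > 0.0);
    weights[i] = std::pow(memorySize * probabilities[i], -beta);
  }

  // Scale by the largest weight so that every weight lies in (0, 1].
  const double maxWeight = *std::max_element(weights.begin(), weights.end());
  for (double& w : weights)
    w /= maxWeight;

  return weights;
}
```

With beta = 0 all weights equal 1 (no correction); with beta = 1 the correction is full.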

◆ SampleProportional()

arma::ucolvec SampleProportional ( )

inline

Sample the indices of experiences according to their priorities.

Returns: The indices to be chosen.

Definition at line 198 of file prioritized_replay.hpp.

Referenced by PrioritizedReplay< EnvironmentType >::Sample().
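
Proportional sampling is often implemented by splitting the total priority mass into one equal segment per batch element and drawing one value from each segment, then mapping each value to an index through a prefix-sum lookup. The following sketch assumes that scheme; FindPrefixSumIndex and SampleProportionalSketch are hypothetical standard-library stand-ins, and a sum tree would answer the same lookup in logarithmic time.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Return the first index whose cumulative priority exceeds `mass`.
// A linear scan keeps the sketch short.
std::size_t FindPrefixSumIndex(const std::vector<double>& priorities,
                               double mass)
{
  assert(!priorities.empty());
  for (std::size_t i = 0; i + 1 < priorities.size(); ++i)
  {
    if (mass < priorities[i])
      return i;
    mass -= priorities[i];
  }
  return priorities.size() - 1;
}

// Hypothetical stratified sampler: split the total priority into batchSize
// equal segments and draw one index from each segment.
std::vector<std::size_t> SampleProportionalSketch(
    const std::vector<double>& priorities,
    const std::size_t batchSize,
    std::mt19937& rng)
{
  assert(batchSize > 0);

  double total = 0.0;
  for (const double p : priorities)
    total += p;

  const double segment = total / batchSize;
  std::uniform_real_distribution<double> unit(0.0, 1.0);
  std::vector<std::size_t> indices(batchSize);
  for (std::size_t i = 0; i < batchSize; ++i)
    indices[i] = FindPrefixSumIndex(priorities, (i + unit(rng)) * segment);
  return indices;
}
```

Drawing one sample per segment spreads the batch across the whole priority range instead of letting all draws cluster on a few large priorities.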

◆ Size()

const size_t& Size ( )

inline

Get the number of transitions in the memory.

Returns: Actual used memory size.

Definition at line 268 of file prioritized_replay.hpp.

◆ Store()

void Store	(	StateType	state,
		ActionType	action,
		double	reward,
		StateType	nextState,
		bool	isEnd,
		const double &	discount
	)

inline

Store the given experience and set its priority.

Parameters

state	Given state.
action	Given action.
reward	Given reward.
nextState	Given next state.
isEnd	Whether the next state is a terminal state.
discount	The discount parameter.

Definition at line 122 of file prioritized_replay.hpp.

References PrioritizedReplay< EnvironmentType >::Transition::action, alpha(), PrioritizedReplay< EnvironmentType >::GetNStepInfo(), PrioritizedReplay< EnvironmentType >::Transition::isEnd, PrioritizedReplay< EnvironmentType >::Transition::nextState, PrioritizedReplay< EnvironmentType >::Transition::reward, and PrioritizedReplay< EnvironmentType >::Transition::state.
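
A fixed-capacity replay memory is typically a ring buffer, and the cited paper gives each new transition the largest priority seen so far so that it is replayed at least once. The sketch below shows only that bookkeeping; PrioritySlots is a hypothetical class, and the exact priority this class assigns in Store() is an assumption here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the priority bookkeeping of a replay buffer;
// the stored transitions themselves are omitted.
class PrioritySlots
{
 public:
  explicit PrioritySlots(const std::size_t capacity) :
      priorities(capacity, 0.0), position(0), full(false), maxPriority(1.0)
  {
    assert(capacity > 0);
  }

  // Give the new slot the largest known priority, then advance the write
  // position with wrap-around so that the oldest slot is overwritten next.
  void Store()
  {
    priorities[position] = maxPriority;
    position = (position + 1) % priorities.size();
    if (position == 0)
      full = true;
  }

  // Number of slots that currently hold a transition.
  std::size_t Size() const { return full ? priorities.size() : position; }

  // Index of the slot that the next call to Store() will write.
  std::size_t Position() const { return position; }

  // Priority currently stored in slot i.
  double Priority(const std::size_t i) const { return priorities[i]; }

 private:
  std::vector<double> priorities;
  std::size_t position;
  bool full;
  double maxPriority;
};
```

Once the buffer has wrapped around, Size() reports the capacity rather than the write position, which matches the "actual used memory size" semantics described for Size().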

◆ Update()

void Update	(	arma::mat	target,
		std::vector< ActionType >	sampledActions,
		arma::mat	nextActionValues,
		arma::mat &	gradients
	)

inline

Update the priorities of transitions and update the gradients.

Parameters

target	The learned value.
sampledActions	Agent's sampled action.
nextActionValues	Agent's next action values.
gradients	The model's gradients.

Definition at line 289 of file prioritized_replay.hpp.

References PrioritizedReplay< EnvironmentType >::Transition::action, and PrioritizedReplay< EnvironmentType >::UpdatePriorities().

◆ UpdatePriorities()

void UpdatePriorities	(	arma::ucolvec &	indices,
		arma::colvec &	priorities
	)

inline

Update the priorities of sampled transitions.

Parameters

indices	The indices of the samples to be updated.
priorities	Their corresponding priorities.

Definition at line 256 of file prioritized_replay.hpp.

References alpha().

Referenced by PrioritizedReplay< EnvironmentType >::Update().
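
In the proportional variant of the cited paper, the new priority of a sampled transition is derived from the magnitude of its temporal-difference error, p_i = (|delta_i| + epsilon)^alpha, where the small epsilon keeps every transition sampleable. The following is a minimal sketch under that assumption; UpdatePrioritiesSketch and its epsilon parameter are hypothetical and operate on plain vectors rather than Armadillo types.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of mlpack: overwrite the stored priorities
// of the sampled transitions with (|tdError| + epsilon)^alpha.
void UpdatePrioritiesSketch(std::vector<double>& stored,
                            const std::vector<std::size_t>& indices,
                            const std::vector<double>& tdErrors,
                            const double alpha,
                            const double epsilon = 1e-6)
{
  assert(indices.size() == tdErrors.size());

  for (std::size_t i = 0; i < indices.size(); ++i)
  {
    assert(indices[i] < stored.size());
    stored[indices[i]] = std::pow(std::abs(tdErrors[i]) + epsilon, alpha);
  }
}
```

Storing the already exponentiated value means that sampling only has to normalize by the sum of the stored priorities.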

The documentation for this class was generated from the following file:

/home/ryan/src/mlpack.org/_src/mlpack-git/src/mlpack/methods/reinforcement_learning/replay/prioritized_replay.hpp
