Implementation of prioritized experience replay.
Classes

    struct Transition

Public Types

    using ActionType = typename EnvironmentType::Action
        Convenient typedef for action.

    using StateType = typename EnvironmentType::State
        Convenient typedef for state.

Public Member Functions

    PrioritizedReplay()
        Default constructor.

    PrioritizedReplay(const size_t batchSize, const size_t capacity, const double alpha, const size_t nSteps = 1, const size_t dimension = StateType::dimension)
        Construct an instance of the prioritized experience replay class.

    void BetaAnneal()
        Anneal beta toward 1 to correct the sampling bias introduced by prioritization.

    void GetNStepInfo(double& reward, StateType& nextState, bool& isEnd, const double& discount)
        Get the reward, next state, and terminal flag for the n-th step.

    const size_t& NSteps() const
        Get the number of steps for the n-step agent.

    void Sample(arma::mat& sampledStates, std::vector<ActionType>& sampledActions, arma::rowvec& sampledRewards, arma::mat& sampledNextStates, arma::irowvec& isTerminal)
        Sample a batch of experience according to the transitions' priorities.

    arma::ucolvec SampleProportional()
        Sample transition indices in proportion to their priorities.

    const size_t& Size()
        Get the number of transitions in the memory.

    void Store(StateType state, ActionType action, double reward, StateType nextState, bool isEnd, const double& discount)
        Store the given experience and set its priority.

    void Update(arma::mat target, std::vector<ActionType> sampledActions, arma::mat nextActionValues, arma::mat& gradients)
        Update the priorities of the sampled transitions and update the model's gradients.

    void UpdatePriorities(arma::ucolvec& indices, arma::colvec& priorities)
        Update the priorities of sampled transitions.
Implementation of prioritized experience replay.
Prioritized experience replay replays important transitions more frequently by assigning them higher sampling priority, allowing the agent to learn more efficiently.
Template Parameters

    EnvironmentType    Desired task (environment) type.
Definition at line 39 of file prioritized_replay.hpp.
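For orientation, here is a minimal usage sketch of the buffer-filling side. It assumes mlpack's CartPole environment and the constructor and Store() signatures documented below; the header paths and the environment interface (InitialSample(), Sample(), IsTerminal()) are assumptions about the surrounding library, not something stated on this page.

    #include <mlpack/methods/reinforcement_learning/replay/prioritized_replay.hpp>
    #include <mlpack/methods/reinforcement_learning/environment/cart_pole.hpp>

    #include <cstddef>
    #include <cstdlib>

    using namespace mlpack::rl;

    int main()
    {
      // Batches of 32, capacity 10000, alpha = 0.6 (degree of prioritization).
      PrioritizedReplay<CartPole> replay(32, 10000, 0.6);

      // NOTE: the CartPole interface below is an assumption about mlpack's
      // environment API, not part of this reference page.
      CartPole env;
      CartPole::State state = env.InitialSample();

      // Fill the buffer with transitions from a random policy.
      for (size_t t = 0; t < 1000; ++t)
      {
        CartPole::Action action;
        action.action = CartPole::Action::actions(std::rand() % 2);

        CartPole::State nextState;
        const double reward = env.Sample(state, action, nextState);
        const bool isEnd = env.IsTerminal(nextState);

        replay.Store(state, action, reward, nextState, isEnd, /* discount */ 0.99);
        state = isEnd ? env.InitialSample() : nextState;
      }
    }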
using ActionType = typename EnvironmentType::Action
Convenient typedef for action.
Definition at line 43 of file prioritized_replay.hpp.
using StateType = typename EnvironmentType::State
Convenient typedef for state.
Definition at line 46 of file prioritized_replay.hpp.
      
PrioritizedReplay() [inline]

Default constructor.
      
PrioritizedReplay(const size_t batchSize, const size_t capacity, const double alpha, const size_t nSteps = 1, const size_t dimension = StateType::dimension) [inline]

Construct an instance of the prioritized experience replay class.
Parameters

    batchSize    Number of examples returned at each sample.
    capacity     Total memory size in terms of number of examples.
    alpha        How much prioritization is used (alpha = 0 gives uniform sampling).
    nSteps       Number of steps to look into the future.
    dimension    The dimension of an encoded state.
Definition at line 82 of file prioritized_replay.hpp.
      
void BetaAnneal() [inline]

Anneal beta toward 1 to correct the sampling bias introduced by prioritization.
Definition at line 276 of file prioritized_replay.hpp.
Referenced by PrioritizedReplay< EnvironmentType >::Sample().
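The schedule itself is internal to the class, but the role of beta is the standard one from prioritized experience replay: sampling by priority biases the data distribution, so each sampled transition is reweighted by an importance-sampling weight w = (N * P(i))^(-beta), with beta annealed toward 1 so the correction becomes exact late in training. A standalone sketch of that idea (the linear schedule and function names are illustrative, not mlpack's internals):

    #include <algorithm>
    #include <cmath>
    #include <cstddef>

    // Importance-sampling weight for a transition sampled with probability p
    // from a buffer holding n transitions: w = (n * p)^(-beta).
    double ImportanceWeight(const size_t n, const double p, const double beta)
    {
      return std::pow(n * p, -beta);
    }

    // Illustrative linear schedule: move beta from its initial value to 1
    // over `totalSteps` calls. The real class's schedule may differ.
    void AnnealBeta(double& beta, const double initialBeta, const size_t totalSteps)
    {
      beta = std::min(1.0, beta + (1.0 - initialBeta) / totalSteps);
    }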
      
void GetNStepInfo(double& reward, StateType& nextState, bool& isEnd, const double& discount) [inline]

Get the reward, next state, and terminal flag for the n-th step; the first three arguments are filled in place.

Parameters

    reward       Given reward.
    nextState    Given next state.
    isEnd        Whether the next state is a terminal state.
    discount     The discount parameter.
Definition at line 171 of file prioritized_replay.hpp.
Referenced by PrioritizedReplay< EnvironmentType >::Store().
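Conceptually the call folds the n most recent transitions into one: the reward becomes the discounted sum R = r_0 + d*r_1 + ... + d^(n-1)*r_(n-1), while nextState and isEnd are taken from the last step reached, stopping early at a terminal state. A standalone sketch of the reward accumulation (the Step struct and buffer are illustrative, not mlpack types):

    #include <deque>

    // Illustrative container for the n most recent steps of an episode.
    struct Step { double reward; bool isEnd; };

    // Discounted n-step return over the buffered steps, stopping at the
    // first terminal state.
    double NStepReturn(const std::deque<Step>& nStepBuffer, const double discount)
    {
      double reward = 0.0;
      double gamma = 1.0;
      for (const Step& step : nStepBuffer)
      {
        reward += gamma * step.reward;
        if (step.isEnd)
          break;  // Episode ended; later entries belong to the next episode.
        gamma *= discount;
      }
      return reward;
    }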
      
const size_t& NSteps() const [inline]

Get the number of steps for the n-step agent.
Definition at line 308 of file prioritized_replay.hpp.
References alpha().
      
void Sample(arma::mat& sampledStates, std::vector<ActionType>& sampledActions, arma::rowvec& sampledRewards, arma::mat& sampledNextStates, arma::irowvec& isTerminal) [inline]

Sample a batch of experience according to the transitions' priorities.

Parameters

    sampledStates       Sampled encoded states.
    sampledActions      Sampled actions.
    sampledRewards      Sampled rewards.
    sampledNextStates   Sampled encoded next states.
    isTerminal          Indicates whether each corresponding next state is terminal.
Definition at line 221 of file prioritized_replay.hpp.
References PrioritizedReplay< EnvironmentType >::BetaAnneal(), and PrioritizedReplay< EnvironmentType >::SampleProportional().
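A hedged usage sketch, continuing the CartPole example above; the output containers are filled by the call, one sampled transition per column of the matrices:

    arma::mat sampledStates, sampledNextStates;
    std::vector<CartPole::Action> sampledActions;
    arma::rowvec sampledRewards;
    arma::irowvec isTerminal;

    // Draw one prioritized batch from the buffer filled earlier.
    replay.Sample(sampledStates, sampledActions, sampledRewards,
                  sampledNextStates, isTerminal);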
      
arma::ucolvec SampleProportional() [inline]

Sample transition indices in proportion to their priorities.
Definition at line 198 of file prioritized_replay.hpp.
Referenced by PrioritizedReplay< EnvironmentType >::Sample().
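A sum tree typically backs this draw, making it O(log n), but the distribution it realizes can be sketched directly: index i is drawn with probability P(i) = p_i^alpha / sum_k p_k^alpha. A standalone illustration (implementations often apply the alpha exponent when priorities are stored rather than at sampling time, as assumed here):

    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    // Draw `batchSize` indices with probability proportional to priority^alpha.
    std::vector<size_t> SampleProportional(const std::vector<double>& priorities,
                                           const double alpha,
                                           const size_t batchSize)
    {
      std::vector<double> weights(priorities.size());
      for (size_t i = 0; i < priorities.size(); ++i)
        weights[i] = std::pow(priorities[i], alpha);

      static std::mt19937 rng(std::random_device{}());
      std::discrete_distribution<size_t> dist(weights.begin(), weights.end());

      std::vector<size_t> indices(batchSize);
      for (size_t i = 0; i < batchSize; ++i)
        indices[i] = dist(rng);
      return indices;
    }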
      
const size_t& Size() [inline]
Get the number of transitions in the memory.
Definition at line 268 of file prioritized_replay.hpp.
      
void Store(StateType state, ActionType action, double reward, StateType nextState, bool isEnd, const double& discount) [inline]

Store the given experience and set its priority.

Parameters

    state        Given state.
    action       Given action.
    reward       Given reward.
    nextState    Given next state.
    isEnd        Whether the next state is a terminal state.
    discount     The discount parameter.
Definition at line 122 of file prioritized_replay.hpp.
References PrioritizedReplay< EnvironmentType >::Transition::action, alpha(), PrioritizedReplay< EnvironmentType >::GetNStepInfo(), PrioritizedReplay< EnvironmentType >::Transition::isEnd, PrioritizedReplay< EnvironmentType >::Transition::nextState, PrioritizedReplay< EnvironmentType >::Transition::reward, and PrioritizedReplay< EnvironmentType >::Transition::state.
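The priority set here is what makes a fresh transition eligible for replay before its TD error has ever been measured. The usual scheme, from the original prioritized-replay paper and a reasonable reading of this method, is to assign new transitions the current maximum priority; a standalone sketch:

    #include <algorithm>
    #include <vector>

    // Assign a newly stored transition the current maximum priority (1.0 for
    // an empty buffer) so it is sampled at least once before being re-ranked.
    // Some implementations additionally raise this to the alpha power here.
    void PushWithMaxPriority(std::vector<double>& priorities)
    {
      double maxPriority = 1.0;
      for (const double p : priorities)
        maxPriority = std::max(maxPriority, p);
      priorities.push_back(maxPriority);
    }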
      
void Update(arma::mat target, std::vector<ActionType> sampledActions, arma::mat nextActionValues, arma::mat& gradients) [inline]

Update the priorities of the sampled transitions and update the model's gradients.

Parameters

    target             The learned value.
    sampledActions     The agent's sampled actions.
    nextActionValues   The agent's next action values.
    gradients          The model's gradients.
Definition at line 289 of file prioritized_replay.hpp.
References PrioritizedReplay< EnvironmentType >::Transition::action, and PrioritizedReplay< EnvironmentType >::UpdatePriorities().
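The new priorities reflect how surprising each sampled transition turned out to be; in standard prioritized replay that is the absolute TD error, plus a small epsilon so no priority reaches zero. A hedged sketch of that computation with Armadillo types matching UpdatePriorities() (the function name and epsilon value are illustrative):

    #include <armadillo>

    // New priority per sampled transition (one per column of the inputs):
    // |target - prediction| summed over the action dimension, plus epsilon
    // so every transition keeps a nonzero chance of being sampled again.
    arma::colvec TDErrorPriorities(const arma::mat& target,
                                   const arma::mat& prediction,
                                   const double epsilon = 1e-6)
    {
      return arma::sum(arma::abs(target - prediction), 0).t() + epsilon;
    }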
      
void UpdatePriorities(arma::ucolvec& indices, arma::colvec& priorities) [inline]

Update the priorities of sampled transitions.

Parameters

    indices      The indices of the samples to be updated.
    priorities   Their corresponding new priorities.
Definition at line 256 of file prioritized_replay.hpp.
References alpha().
Referenced by PrioritizedReplay< EnvironmentType >::Update().
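A hedged closing sketch tying the pieces together, reusing the replay object from the first example and the illustrative TDErrorPriorities() helper above, and assuming target and prediction matrices produced by the learner:

    // Indices of the sampled transitions, then their recomputed priorities.
    arma::ucolvec indices = replay.SampleProportional();
    arma::colvec priorities = TDErrorPriorities(target, prediction);

    // Write the new priorities back so future draws reflect them.
    replay.UpdatePriorities(indices, priorities);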