Implementation of random experience replay. More...
Classes | |
| struct | Transition |
Public Types | |
| using | ActionType = typename EnvironmentType::Action |
| Convenient typedef for action. More... | |
| using | StateType = typename EnvironmentType::State |
| Convenient typedef for state. More... | |
Public Member Functions | |
| RandomReplay () | |
| RandomReplay (const size_t batchSize, const size_t capacity, const size_t nSteps=1, const size_t dimension=StateType::dimension) | |
| Construct an instance of the random experience replay class. More... | |
| void | GetNStepInfo (double &reward, StateType &nextState, bool &isEnd, const double &discount) |
| Get the reward, next state and terminal boolean for nth step. More... | |
| const size_t & | NSteps () const |
| Get the number of steps for n-step agent. More... | |
| void | Sample (arma::mat &sampledStates, std::vector< ActionType > &sampledActions, arma::rowvec &sampledRewards, arma::mat &sampledNextStates, arma::irowvec &isTerminal) |
| Sample some experiences. More... | |
| const size_t & | Size () |
| Get the number of transitions in the memory. More... | |
| void | Store (StateType state, ActionType action, double reward, StateType nextState, bool isEnd, const double &discount) |
| Store the given experience. More... | |
| void | Update (arma::mat, std::vector< ActionType >, arma::mat, arma::mat &) |
| Update the priorities of transitions and update the gradients. More... | |
Implementation of random experience replay.
At each time step, interactions between the agent and the environment are saved to a memory buffer. When necessary, previous experiences can simply be sampled from the buffer to train the agent. Typically this is a uniform random sample, and the memory is a first-in-first-out (FIFO) buffer.
Template Parameters | |
| EnvironmentType | Desired task. |
Definition at line 44 of file random_replay.hpp.
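To make the buffer semantics concrete, here is a minimal self-contained sketch of the idea: a fixed-capacity FIFO memory with uniform random sampling. This is an illustration only, not mlpack's implementation; the names `ToyRandomReplay` and the simplified `Transition` record are assumptions, and the real class works on Armadillo matrices of encoded states.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Toy (state, action, reward) record; mlpack's Transition also holds
// the next state and a terminal flag.
struct Transition
{
  int state;
  int action;
  double reward;
};

// Sketch of a FIFO replay memory with uniform random sampling.
class ToyRandomReplay
{
 public:
  ToyRandomReplay(size_t batchSize, size_t capacity) :
      batchSize(batchSize), capacity(capacity), position(0) { }

  // Store appends until capacity is reached, then overwrites the
  // oldest entry (first-in-first-out).
  void Store(const Transition& t)
  {
    if (memory.size() < capacity)
      memory.push_back(t);
    else
      memory[position] = t;
    position = (position + 1) % capacity;
  }

  // Sample batchSize transitions uniformly at random (with replacement).
  std::vector<Transition> Sample(std::mt19937& rng) const
  {
    std::uniform_int_distribution<size_t> pick(0, memory.size() - 1);
    std::vector<Transition> batch;
    for (size_t i = 0; i < batchSize; ++i)
      batch.push_back(memory[pick(rng)]);
    return batch;
  }

  size_t Size() const { return memory.size(); }

 private:
  size_t batchSize, capacity, position;
  std::vector<Transition> memory;
};
```

Storing more transitions than `capacity` silently evicts the oldest ones, so `Size()` never exceeds `capacity`; sampling is independent of insertion order.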
| using ActionType = typename EnvironmentType::Action |
Convenient typedef for action.
Definition at line 48 of file random_replay.hpp.
| using StateType = typename EnvironmentType::State |
Convenient typedef for state.
Definition at line 51 of file random_replay.hpp.
RandomReplay() [inline]
Definition at line 62 of file random_replay.hpp.
RandomReplay(const size_t batchSize, const size_t capacity, const size_t nSteps=1, const size_t dimension=StateType::dimension) [inline]
Construct an instance of the random experience replay class.
| batchSize | Number of examples returned at each sample. |
| capacity | Total memory size in terms of number of examples. |
| nSteps | Number of steps to look into the future. |
| dimension | The dimension of an encoded state. |
Definition at line 78 of file random_replay.hpp.
void GetNStepInfo(double& reward, StateType& nextState, bool& isEnd, const double& discount) [inline]
Get the reward, next state, and terminal boolean for the nth step.
| reward | Given reward. |
| nextState | Given next state. |
| isEnd | Whether next state is terminal state. |
| discount | The discount parameter. |
Definition at line 151 of file random_replay.hpp.
Referenced by RandomReplay< EnvironmentType >::Store().
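The core of the n-step computation is folding the rewards of the pending transitions backwards with the discount factor, restarting whenever a terminal transition is hit so the return never crosses an episode boundary. The following is a standalone sketch of that accumulation, assuming a simplified `Step` record with only a reward and a terminal flag; it is not mlpack's exact implementation, which also tracks the next state alongside the reward.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Simplified transition: reward plus terminal flag (assumed names).
struct Step { double reward; bool isEnd; };

// Discounted n-step return over a window of pending transitions:
// walk backwards, folding r_i + discount * R, and restart the
// accumulation whenever a terminal transition is encountered.
double NStepReward(const std::vector<Step>& steps, double discount)
{
  double reward = steps.back().reward;
  for (int i = (int) steps.size() - 2; i >= 0; --i)
  {
    if (steps[i].isEnd)
      reward = steps[i].reward;  // restart: episode ended here
    else
      reward = steps[i].reward + discount * reward;
  }
  return reward;
}
```

For three steps of reward 1 and discount 0.5 this yields 1 + 0.5·(1 + 0.5·1) = 1.75; a terminal flag mid-window truncates the return to the rewards before the episode boundary.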
const size_t& NSteps() const [inline]
Get the number of steps for the n-step agent.
Definition at line 228 of file random_replay.hpp.
void Sample(arma::mat& sampledStates, std::vector<ActionType>& sampledActions, arma::rowvec& sampledRewards, arma::mat& sampledNextStates, arma::irowvec& isTerminal) [inline]
Sample some experiences.
| sampledStates | Sampled encoded states. |
| sampledActions | Sampled actions. |
| sampledRewards | Sampled rewards. |
| sampledNextStates | Sampled encoded next states. |
| isTerminal | Indicate whether corresponding next state is terminal state. |
Definition at line 183 of file random_replay.hpp.
const size_t& Size() [inline]
Get the number of transitions in the memory.
Definition at line 206 of file random_replay.hpp.
void Store(StateType state, ActionType action, double reward, StateType nextState, bool isEnd, const double& discount) [inline]
Store the given experience.
| state | Given state. |
| action | Given action. |
| reward | Given reward. |
| nextState | Given next state. |
| isEnd | Whether next state is terminal state. |
| discount | The discount parameter. |
Definition at line 104 of file random_replay.hpp.
References RandomReplay< EnvironmentType >::Transition::action, RandomReplay< EnvironmentType >::GetNStepInfo(), RandomReplay< EnvironmentType >::Transition::isEnd, RandomReplay< EnvironmentType >::Transition::nextState, RandomReplay< EnvironmentType >::Transition::reward, and RandomReplay< EnvironmentType >::Transition::state.
void Update(arma::mat, std::vector<ActionType>, arma::mat, arma::mat&) [inline]
Update the priorities of transitions and update the gradients. For random replay this is a no-op; the method exists for interface compatibility with prioritized replay.
| (target) | The learned value. |
| (sampledActions) | Agent's sampled actions. |
| (nextActionValues) | Agent's next action values. |
| (gradients) | The model's gradients. |
Definition at line 219 of file random_replay.hpp.