Implementation for epsilon greedy policy. More...

Public Types

using ActionType = typename EnvironmentType::Action
    Convenient typedef for action. More...

Public Member Functions

GreedyPolicy(const double initialEpsilon, const size_t annealInterval, const double minEpsilon, const double decayRate = 1.0)
    Constructor for epsilon greedy policy class. More...

void Anneal()
    Exploration probability will anneal at each step. More...

const double& Epsilon() const
    Get the current exploration probability. More...

ActionType Sample(const arma::colvec& actionValue, bool deterministic = false, const bool isNoisy = false)
    Sample an action based on given action values. More...
Implementation for epsilon greedy policy.
In general we select an action greedily based on the action value; however, with probability epsilon we select a random action instead, to encourage exploration.

Template Parameters
    EnvironmentType    The reinforcement learning task.
Definition at line 31 of file greedy_policy.hpp.
using ActionType = typename EnvironmentType::Action
Convenient typedef for action.
Definition at line 35 of file greedy_policy.hpp.
GreedyPolicy(const double initialEpsilon, const size_t annealInterval, const double minEpsilon, const double decayRate = 1.0)  [inline]

Constructor for epsilon greedy policy class.
Parameters
    initialEpsilon    The initial probability to explore (select a random action).
    annealInterval    The number of steps over which the exploration probability anneals.
    minEpsilon        Epsilon will never be less than this value.
    decayRate         The rate at which epsilon decays toward minEpsilon at each annealing step.
Definition at line 48 of file greedy_policy.hpp.
void Anneal()  [inline]

Exploration probability will anneal at each step.
Definition at line 90 of file greedy_policy.hpp.
const double& Epsilon() const  [inline]

Get the current exploration probability.

Definition at line 99 of file greedy_policy.hpp.
ActionType Sample(const arma::colvec& actionValue, bool deterministic = false, const bool isNoisy = false)  [inline]

Sample an action based on given action values.
Parameters
    actionValue      Values for each action.
    deterministic    If true, always select the action greedily.
    isNoisy          Specifies whether the network used is noisy.
Definition at line 65 of file greedy_policy.hpp.
References mlpack::math::RandInt(), and mlpack::math::Random().