Implementation of Soft Actor-Critic, a model-free off-policy actor-critic based deep reinforcement learning algorithm.
Public Types

| Member | Description |
| --- | --- |
| using ActionType = typename EnvironmentType::Action | Convenient typedef for action. |
| using StateType = typename EnvironmentType::State | Convenient typedef for state. |

Public Member Functions

| Member | Description |
| --- | --- |
| SAC(TrainingConfig &config, QNetworkType &learningQ1Network, PolicyNetworkType &policyNetwork, ReplayType &replayMethod, UpdaterType qNetworkUpdater = UpdaterType(), UpdaterType policyNetworkUpdater = UpdaterType(), EnvironmentType environment = EnvironmentType()) | Create the SAC object with given settings. |
| ~SAC() | Clean memory. |
| const ActionType & Action() const | Get the action of the agent. |
| bool & Deterministic() | Modify the training mode / test mode indicator. |
| const bool & Deterministic() const | Get the indicator of training mode / test mode. |
| double Episode() | Execute an episode. |
| void SelectAction() | Select an action given the agent's current state. |
| void SoftUpdate(double rho) | Softly copy the learning Q network parameters to the target Q network parameters. |
| StateType & State() | Modify the state of the agent. |
| const StateType & State() const | Get the state of the agent. |
| size_t & TotalSteps() | Modify total steps from beginning. |
| const size_t & TotalSteps() const | Get total steps from beginning. |
| void Update() | Update the Q and policy networks. |
Implementation of Soft Actor-Critic, a model-free off-policy actor-critic based deep reinforcement learning algorithm.
For more details, see the original soft actor-critic papers by Haarnoja et al., e.g. "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor" (2018) and "Soft Actor-Critic Algorithms and Applications" (2018).
Template Parameters

| Parameter | Description |
| --- | --- |
| EnvironmentType | The environment of the reinforcement learning task. |
| QNetworkType | The network to compute action value. |
| PolicyNetworkType | The network to produce an action given a state. |
| UpdaterType | How to apply gradients when training. |
| ReplayType | Experience replay method. |
using ActionType = typename EnvironmentType::Action

Convenient typedef for action.

using StateType = typename EnvironmentType::State

Convenient typedef for state.
SAC(TrainingConfig & config,
    QNetworkType & learningQ1Network,
    PolicyNetworkType & policyNetwork,
    ReplayType & replayMethod,
    UpdaterType qNetworkUpdater = UpdaterType(),
    UpdaterType policyNetworkUpdater = UpdaterType(),
    EnvironmentType environment = EnvironmentType())
Create the SAC object with given settings.
If you want to pass in a parameter and discard the original parameter object, you can pass the parameter directly, since the constructor takes a reference; this avoids an unnecessary copy.
Parameters

| Parameter | Description |
| --- | --- |
| config | Hyper-parameters for training. |
| learningQ1Network | The network to compute action value. |
| policyNetwork | The network to produce an action given a state. |
| replayMethod | Experience replay method. |
| qNetworkUpdater | How to apply gradients to Q network when training. |
| policyNetworkUpdater | How to apply gradients to policy network when training. |
| environment | Reinforcement learning task. |
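The sketch below shows one way to assemble an SAC agent. It assumes mlpack's Pendulum toy environment (3-dimensional state, 1-dimensional action), RandomReplay for experience replay, and ensmallen's AdamUpdate as the updater; the include paths, layer class names, layer sizes, and hyper-parameter values follow the mlpack 3.x API and are illustrative rather than authoritative.

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/init_rules/gaussian_init.hpp>
#include <mlpack/methods/ann/loss_functions/empty_loss.hpp>
#include <mlpack/methods/reinforcement_learning/sac.hpp>
#include <mlpack/methods/reinforcement_learning/environment/pendulum.hpp>
#include <mlpack/methods/reinforcement_learning/training_config.hpp>
#include <ensmallen.hpp>

using namespace mlpack::ann;
using namespace mlpack::rl;

int main()
{
  // Hyper-parameters for training (illustrative values).
  TrainingConfig config;
  config.StepSize() = 0.01;
  config.TargetNetworkSyncInterval() = 1;
  config.UpdateInterval() = 3;

  // Critic: maps a concatenated (state, action) pair to a single Q-value.
  FFN<EmptyLoss<>, GaussianInitialization>
      qNetwork(EmptyLoss<>(), GaussianInitialization(0, 0.1));
  qNetwork.Add<Linear<>>(3 + 1, 128);
  qNetwork.Add<ReLULayer<>>();
  qNetwork.Add<Linear<>>(128, 1);

  // Actor: maps a state to an action squashed into [-1, 1].
  FFN<EmptyLoss<>, GaussianInitialization>
      policyNetwork(EmptyLoss<>(), GaussianInitialization(0, 0.1));
  policyNetwork.Add<Linear<>>(3, 128);
  policyNetwork.Add<ReLULayer<>>();
  policyNetwork.Add<Linear<>>(128, 1);
  policyNetwork.Add<TanHLayer<>>();

  // Experience replay: batch size 32, capacity 10000 transitions.
  RandomReplay<Pendulum> replayMethod(32, 10000);

  // Assemble the agent; config, the networks, and the replay method are
  // taken by reference, so no copies are made here.
  SAC<Pendulum, decltype(qNetwork), decltype(policyNetwork), ens::AdamUpdate>
      agent(config, qNetwork, policyNetwork, replayMethod);

  return 0;
}
```

Since the constructor takes config, the networks, and the replay method by reference, they should remain alive for as long as the agent is used.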
~SAC()
Clean memory.
const ActionType & Action() const [inline]

Get the action of the agent.

bool & Deterministic() [inline]

Modify the training mode / test mode indicator.

const bool & Deterministic() const [inline]

Get the indicator of training mode / test mode.
double Episode()
Execute an episode.
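As a usage sketch (continuing the hypothetical `agent` constructed above, and assuming the usual mlpack agent pattern in which Episode() both collects experience and, when Deterministic() is false, performs learning updates), training and evaluation might look like:

```cpp
// Training: run episodes in stochastic (training) mode until enough
// environment steps have been taken.
agent.Deterministic() = false;
while (agent.TotalSteps() < 50000)
  agent.Episode();  // Returns the reward accumulated over the episode.

// Evaluation: switch to deterministic (test) mode and run one test episode.
agent.Deterministic() = true;
const double testReturn = agent.Episode();
```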
void SelectAction()

Select an action given the agent's current state.
void SoftUpdate(double rho)
Softly copy the learning Q network parameters to the target Q network parameters.
| rho | How "softly" should the parameters be copied. |
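The rho parameter acts as a Polyak-averaging coefficient. Below is a minimal sketch of the conventional soft-update rule, written on plain Armadillo vectors because the target networks are internal to the class (the function name and signature are illustrative, not part of the API):

```cpp
#include <armadillo>

// Each target parameter moves a fraction rho toward the corresponding
// learning-network parameter: rho = 1 copies the learning network outright,
// while a small rho changes the target network only slowly.
arma::vec SoftUpdateSketch(const arma::vec& targetParams,
                           const arma::vec& learningParams,
                           const double rho)
{
  return (1.0 - rho) * targetParams + rho * learningParams;
}
```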
StateType & State() [inline]

Modify the state of the agent.

const StateType & State() const [inline]

Get the state of the agent.

size_t & TotalSteps() [inline]

Modify total steps from beginning.

const size_t & TotalSteps() const [inline]

Get total steps from beginning.
void Update()
Update the Q and policy networks.
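For reference, the standard soft actor-critic update that this step is based on (Haarnoja et al., 2018) samples a minibatch from the replay buffer and minimizes a squared Bellman error for the critic and an entropy-regularized objective for the actor. Written with a single critic for brevity (SAC commonly uses the minimum of two Q-networks), the target and losses are:

```latex
y      = r + \gamma \left( Q_{\mathrm{target}}(s', a') - \alpha \log \pi(a' \mid s') \right),
         \quad a' \sim \pi(\cdot \mid s')
L_Q    = \mathbb{E}\left[ \left( Q(s, a) - y \right)^2 \right]
L_\pi  = \mathbb{E}\left[ \alpha \log \pi(a \mid s) - Q(s, a) \right],
         \quad a \sim \pi(\cdot \mid s)
```

Details such as the handling of the entropy coefficient alpha and the use of twin critics are internal to this implementation.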