SAC< EnvironmentType, QNetworkType, PolicyNetworkType, UpdaterType, ReplayType > Class Template Reference

Implementation of Soft Actor-Critic, a model-free off-policy actor-critic based deep reinforcement learning algorithm. More...

Public Types

using ActionType = typename EnvironmentType::Action
 Convenient typedef for action. More...

 
using StateType = typename EnvironmentType::State
 Convenient typedef for state. More...

 

Public Member Functions

 SAC (TrainingConfig &config, QNetworkType &learningQ1Network, PolicyNetworkType &policyNetwork, ReplayType &replayMethod, UpdaterType qNetworkUpdater=UpdaterType(), UpdaterType policyNetworkUpdater=UpdaterType(), EnvironmentType environment=EnvironmentType())
 Create the SAC object with given settings. More...

 
 ~SAC ()
 Clean memory. More...

 
const ActionType & Action () const
 Get the action of the agent. More...

 
bool & Deterministic ()
 Modify the training mode / test mode indicator. More...

 
const bool & Deterministic () const
 Get the indicator of training mode / test mode. More...

 
double Episode ()
 Execute an episode. More...

 
void SelectAction ()
 Select an action, given the agent's current state. More...

 
void SoftUpdate (double rho)
 Softly update the learning Q network parameters to the target Q network parameters. More...

 
StateType & State ()
 Modify the state of the agent. More...

 
const StateType & State () const
 Get the state of the agent. More...

 
size_t & TotalSteps ()
 Modify total steps from beginning. More...

 
const size_t & TotalSteps () const
 Get total steps from beginning. More...

 
void Update ()
 Update the Q and policy networks. More...

 

Detailed Description


template<typename EnvironmentType, typename QNetworkType, typename PolicyNetworkType, typename UpdaterType, typename ReplayType = RandomReplay<EnvironmentType>>
class mlpack::rl::SAC< EnvironmentType, QNetworkType, PolicyNetworkType, UpdaterType, ReplayType >

Implementation of Soft Actor-Critic, a model-free off-policy actor-critic based deep reinforcement learning algorithm.

For more details, see the following:

@misc{haarnoja2018soft,
  author = {Tuomas Haarnoja and Aurick Zhou and Kristian Hartikainen and
            George Tucker and Sehoon Ha and Jie Tan and Vikash Kumar and
            Henry Zhu and Abhishek Gupta and Pieter Abbeel and
            Sergey Levine},
  title  = {Soft Actor-Critic Algorithms and Applications},
  year   = {2018},
  url    = {https://arxiv.org/abs/1812.05905}
}
Template Parameters
EnvironmentType  The environment of the reinforcement learning task.
QNetworkType  The network to compute action value.
PolicyNetworkType  The network to produce an action given a state.
UpdaterType  How to apply gradients when training.
ReplayType  Experience replay method.

Definition at line 64 of file sac.hpp.
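
For orientation, the template arguments might be filled in with concrete mlpack types along the following lines. This is a minimal sketch, not part of the generated reference: the Pendulum environment, the FFN / EmptyLoss / GaussianInitialization network types, and the ens::AdamUpdate updater are assumptions about what this mlpack version provides.

#include <mlpack/methods/reinforcement_learning/sac.hpp>
#include <mlpack/methods/reinforcement_learning/environment/pendulum.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <ensmallen.hpp>

using namespace mlpack::ann;
using namespace mlpack::rl;

// Q and policy networks are ordinary feed-forward networks; the loss and
// initialization rules chosen here are illustrative.
using QNetwork = FFN<EmptyLoss<>, GaussianInitialization>;
using PolicyNetwork = FFN<EmptyLoss<>, GaussianInitialization>;

// ReplayType defaults to RandomReplay<EnvironmentType> and can be omitted.
using PendulumSAC = SAC<Pendulum, QNetwork, PolicyNetwork, ens::AdamUpdate>;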

Member Typedef Documentation

◆ ActionType

using ActionType = typename EnvironmentType::Action

Convenient typedef for action.

Definition at line 71 of file sac.hpp.

◆ StateType

using StateType = typename EnvironmentType::State

Convenient typedef for state.

Definition at line 68 of file sac.hpp.

Constructor & Destructor Documentation

◆ SAC()

SAC ( TrainingConfig &  config,
QNetworkType &  learningQ1Network,
PolicyNetworkType &  policyNetwork,
ReplayType &  replayMethod,
UpdaterType  qNetworkUpdater = UpdaterType(),
UpdaterType  policyNetworkUpdater = UpdaterType(),
EnvironmentType  environment = EnvironmentType() 
)

Create the SAC object with given settings.

Because the constructor takes references, you can pass a parameter object directly and discard the original afterwards; this avoids an unnecessary copy.

Parameters
config  Hyper-parameters for training.
learningQ1Network  The network to compute action value.
policyNetwork  The network to produce an action given a state.
replayMethod  Experience replay method.
qNetworkUpdater  How to apply gradients to Q network when training.
policyNetworkUpdater  How to apply gradients to policy network when training.
environment  Reinforcement learning task.
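
As an illustration only, construction could look like the following, continuing the type aliases (QNetwork, PolicyNetwork, PendulumSAC) from the sketch in the detailed description above. The hyper-parameter values and layer sizes are assumptions made for the sketch (Pendulum's state has 3 dimensions and its action 1), and which setters TrainingConfig exposes may vary between mlpack versions.

// Hyper-parameters for training; all values here are illustrative.
TrainingConfig config;
config.StepSize() = 0.001;
config.Discount() = 0.99;
config.TargetNetworkSyncInterval() = 1;
config.UpdateInterval() = 1;

// Q network: input is the concatenated (state, action) pair, output is a
// single action value (3 state dimensions + 1 action dimension).
QNetwork learningQ1Network(EmptyLoss<>(), GaussianInitialization(0, 0.1));
learningQ1Network.Add(new Linear<>(3 + 1, 128));
learningQ1Network.Add(new ReLULayer<>());
learningQ1Network.Add(new Linear<>(128, 1));

// Policy network: maps a state to an action in [-1, 1].
PolicyNetwork policyNetwork(EmptyLoss<>(), GaussianInitialization(0, 0.1));
policyNetwork.Add(new Linear<>(3, 128));
policyNetwork.Add(new ReLULayer<>());
policyNetwork.Add(new Linear<>(128, 1));
policyNetwork.Add(new TanHLayer<>());

// Experience replay: batch size 32, capacity 10000.
RandomReplay<Pendulum> replayMethod(32, 10000);

// The updaters and the environment fall back to their defaults.
PendulumSAC agent(config, learningQ1Network, policyNetwork, replayMethod);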

◆ ~SAC()

~SAC ( )

Clean memory.

Member Function Documentation

◆ Action()

const ActionType& Action ( ) const
inline

Get the action of the agent.

Definition at line 137 of file sac.hpp.

◆ Deterministic() [1/2]

bool& Deterministic ( )
inline

Modify the training mode / test mode indicator.

Definition at line 140 of file sac.hpp.

◆ Deterministic() [2/2]

const bool& Deterministic ( ) const
inline

Get the indicator of training mode / test mode.

Definition at line 142 of file sac.hpp.

◆ Episode()

double Episode ( )

Execute an episode.

Returns
Return of the episode.
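
A typical training loop simply calls Episode() repeatedly and watches the returned episode return. Assuming an agent constructed as sketched above; the episode count and stopping threshold are purely illustrative.

// Training mode: actions are sampled stochastically from the policy.
agent.Deterministic() = false;
for (size_t episode = 0; episode < 1000; ++episode)
{
  const double episodeReturn = agent.Episode();
  // Illustrative stopping criterion for a Pendulum-style task.
  if (episodeReturn > -200.0)
    break;
}

// Test mode: the policy acts deterministically.
agent.Deterministic() = true;
const double testReturn = agent.Episode();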

◆ SelectAction()

void SelectAction ( )

Select an action, given the agent's current state.

◆ SoftUpdate()

void SoftUpdate ( double  rho)

Softly update the learning Q network parameters to the target Q network parameters.

Parameters
rho  How "softly" the parameters should be copied.
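
The conventional rule behind such a soft (Polyak) update is shown below for illustration on plain Armadillo matrices, rather than on SAC's internal target network members.

#include <armadillo>

// Polyak averaging: the target parameters move a fraction rho toward the
// learning parameters on every call. A small rho keeps the target network
// slowly moving; rho = 1 copies the learning parameters outright.
void PolyakUpdate(arma::mat& targetParams,
                  const arma::mat& learningParams,
                  const double rho)
{
  targetParams = (1 - rho) * targetParams + rho * learningParams;
}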

◆ State() [1/2]

StateType& State ( )
inline

Modify the state of the agent.

Definition at line 132 of file sac.hpp.

◆ State() [2/2]

const StateType& State ( ) const
inline

Get the state of the agent.

Definition at line 134 of file sac.hpp.

◆ TotalSteps() [1/2]

size_t& TotalSteps ( )
inline

Modify total steps from beginning.

Definition at line 127 of file sac.hpp.

◆ TotalSteps() [2/2]

const size_t& TotalSteps ( ) const
inline

Get total steps from beginning.

Definition at line 129 of file sac.hpp.

◆ Update()

void Update ( )

Update the Q and policy networks.


The documentation for this class was generated from the following file:
  • /home/ryan/src/mlpack.org/_src/mlpack-git/src/mlpack/methods/reinforcement_learning/sac.hpp