Implementation of Soft Actor-Critic, a model-free off-policy actor-critic based deep reinforcement learning algorithm.
Public Types

| Member | Description |
| --- | --- |
| using ActionType = typename EnvironmentType::Action | Convenient typedef for action. |
| using StateType = typename EnvironmentType::State | Convenient typedef for state. |

Public Member Functions

| Member | Description |
| --- | --- |
| SAC(TrainingConfig &config, QNetworkType &learningQ1Network, PolicyNetworkType &policyNetwork, ReplayType &replayMethod, UpdaterType qNetworkUpdater = UpdaterType(), UpdaterType policyNetworkUpdater = UpdaterType(), EnvironmentType environment = EnvironmentType()) | Create the SAC object with given settings. |
| ~SAC() | Clean memory. |
| const ActionType & Action() const | Get the action of the agent. |
| bool & Deterministic() | Modify the training mode / test mode indicator. |
| const bool & Deterministic() const | Get the indicator of training mode / test mode. |
| double Episode() | Execute an episode. |
| void SelectAction() | Select an action given the agent's current state. |
| void SoftUpdate(double rho) | Softly copy the learning Q network parameters to the target Q network parameters. |
| StateType & State() | Modify the state of the agent. |
| const StateType & State() const | Get the state of the agent. |
| size_t & TotalSteps() | Modify total steps from beginning. |
| const size_t & TotalSteps() const | Get total steps from beginning. |
| void Update() | Update the Q and policy networks. |
Implementation of Soft Actor-Critic, a model-free off-policy actor-critic based deep reinforcement learning algorithm.
For more details, see the original soft actor-critic papers by Haarnoja et al., e.g. "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor" (2018) and "Soft Actor-Critic Algorithms and Applications" (2018).
Template Parameters

| Parameter | Description |
| --- | --- |
| EnvironmentType | The environment of the reinforcement learning task. |
| QNetworkType | The network to compute action value. |
| PolicyNetworkType | The network to produce an action given a state. |
| UpdaterType | How to apply gradients when training. |
| ReplayType | Experience replay method. |
using ActionType = typename EnvironmentType::Action

Convenient typedef for action.

using StateType = typename EnvironmentType::State

Convenient typedef for state.
SAC(TrainingConfig & config,
    QNetworkType & learningQ1Network,
    PolicyNetworkType & policyNetwork,
    ReplayType & replayMethod,
    UpdaterType qNetworkUpdater = UpdaterType(),
    UpdaterType policyNetworkUpdater = UpdaterType(),
    EnvironmentType environment = EnvironmentType())
Create the SAC object with given settings.
If you want to pass in a parameter and discard the original parameter object, you can pass the parameter directly, since the constructor takes a reference; this avoids an unnecessary copy.
Parameters

| Parameter | Description |
| --- | --- |
| config | Hyper-parameters for training. |
| learningQ1Network | The network to compute action value. |
| policyNetwork | The network to produce an action given a state. |
| replayMethod | Experience replay method. |
| qNetworkUpdater | How to apply gradients to Q network when training. |
| policyNetworkUpdater | How to apply gradients to policy network when training. |
| environment | Reinforcement learning task. |
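The sketch below shows one way to assemble an SAC agent. It assumes mlpack's Pendulum toy environment (3-dimensional state, 1-dimensional action), RandomReplay for experience replay, and ensmallen's AdamUpdate as the updater; the include paths, layer class names, layer sizes, and hyper-parameter values follow the mlpack 3.x API and are illustrative rather than authoritative.

```cpp
#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/init_rules/gaussian_init.hpp>
#include <mlpack/methods/ann/loss_functions/empty_loss.hpp>
#include <mlpack/methods/reinforcement_learning/sac.hpp>
#include <mlpack/methods/reinforcement_learning/environment/pendulum.hpp>
#include <mlpack/methods/reinforcement_learning/training_config.hpp>
#include <ensmallen.hpp>

using namespace mlpack::ann;
using namespace mlpack::rl;

int main()
{
  // Hyper-parameters for training (illustrative values).
  TrainingConfig config;
  config.StepSize() = 0.01;
  config.TargetNetworkSyncInterval() = 1;
  config.UpdateInterval() = 3;

  // Critic: maps a concatenated (state, action) pair to a single Q-value.
  FFN<EmptyLoss<>, GaussianInitialization>
      qNetwork(EmptyLoss<>(), GaussianInitialization(0, 0.1));
  qNetwork.Add<Linear<>>(3 + 1, 128);
  qNetwork.Add<ReLULayer<>>();
  qNetwork.Add<Linear<>>(128, 1);

  // Actor: maps a state to an action squashed into [-1, 1].
  FFN<EmptyLoss<>, GaussianInitialization>
      policyNetwork(EmptyLoss<>(), GaussianInitialization(0, 0.1));
  policyNetwork.Add<Linear<>>(3, 128);
  policyNetwork.Add<ReLULayer<>>();
  policyNetwork.Add<Linear<>>(128, 1);
  policyNetwork.Add<TanHLayer<>>();

  // Experience replay: batch size 32, capacity 10000 transitions.
  RandomReplay<Pendulum> replayMethod(32, 10000);

  // Assemble the agent; config, the networks, and the replay method are
  // taken by reference, so no copies are made here.
  SAC<Pendulum, decltype(qNetwork), decltype(policyNetwork), ens::AdamUpdate>
      agent(config, qNetwork, policyNetwork, replayMethod);

  return 0;
}
```

Since the constructor takes config, the networks, and the replay method by reference, they should remain alive for as long as the agent is used.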
~SAC()
Clean memory.
const ActionType & Action() const [inline]

Get the action of the agent.

bool & Deterministic() [inline]

Modify the training mode / test mode indicator.

const bool & Deterministic() const [inline]

Get the indicator of training mode / test mode.
double Episode()
Execute an episode.
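As a usage sketch (continuing the hypothetical `agent` constructed above, and assuming the usual mlpack agent pattern in which Episode() both collects experience and, when Deterministic() is false, performs learning updates), training and evaluation might look like:

```cpp
// Training: run episodes in stochastic (training) mode until enough
// environment steps have been taken.
agent.Deterministic() = false;
while (agent.TotalSteps() < 50000)
  agent.Episode();  // Returns the reward accumulated over the episode.

// Evaluation: switch to deterministic (test) mode and run one test episode.
agent.Deterministic() = true;
const double testReturn = agent.Episode();
```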
void SelectAction()

Select an action given the agent's current state.
void SoftUpdate(double rho)
Softly copy the learning Q network parameters to the target Q network parameters.
| rho | How "softly" should the parameters be copied. |
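The rho parameter acts as a Polyak-averaging coefficient. Below is a minimal sketch of the conventional soft-update rule, written on plain Armadillo vectors because the target networks are internal to the class (the function name and signature are illustrative, not part of the API):

```cpp
#include <armadillo>

// Each target parameter moves a fraction rho toward the corresponding
// learning-network parameter: rho = 1 copies the learning network outright,
// while a small rho changes the target network only slowly.
arma::vec SoftUpdateSketch(const arma::vec& targetParams,
                           const arma::vec& learningParams,
                           const double rho)
{
  return (1.0 - rho) * targetParams + rho * learningParams;
}
```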
StateType & State() [inline]

Modify the state of the agent.

const StateType & State() const [inline]

Get the state of the agent.

size_t & TotalSteps() [inline]

Modify total steps from beginning.

const size_t & TotalSteps() const [inline]

Get total steps from beginning.
void Update()
Update the Q and policy networks.
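For reference, the standard soft actor-critic update that this step is based on (Haarnoja et al., 2018) samples a minibatch from the replay buffer and minimizes a squared Bellman error for the critic and an entropy-regularized objective for the actor. Written with a single critic for brevity (SAC commonly uses the minimum of two Q-networks), the target and losses are:

```latex
y      = r + \gamma \left( Q_{\mathrm{target}}(s', a') - \alpha \log \pi(a' \mid s') \right),
         \quad a' \sim \pi(\cdot \mid s')
L_Q    = \mathbb{E}\left[ \left( Q(s, a) - y \right)^2 \right]
L_\pi  = \mathbb{E}\left[ \alpha \log \pi(a \mid s) - Q(s, a) \right],
         \quad a \sim \pi(\cdot \mid s)
```

Details such as the handling of the entropy coefficient alpha and the use of twin critics are internal to this implementation.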