
Hyperparameters in Deep RL with Optuna

[Part 1 Problem: Hyper-parameters]

[Part 2 Solution: Bayesian Search]

[Part 3: Hyper-parameter Search with Optuna]

This three-part series covers why hyperparameters matter in Deep Reinforcement Learning,

introduces several search methods for choosing hyperparameters,

and walks through how to use Optuna.

[1] Overview: Hyper-parameters in Deep RL

  • build a perfect agent to solve the Cart Pole environment using Deep Q-learning
  • the set of hyperparameters used
  • to become a real PRO in Reinforcement Learning
  • you need to learn how to tune hyperparameters
  • using the right tools
  • Optuna: an open-source library for hyperparameter search in the Python ecosystem

[2] Problem: Hyper-parameters

Machine Learning models

1. parameters

  • numbers found AFTER training your model

2. hyperparameters

  • numbers you need to set BEFORE training the model
  • exist all across ML

e.g. the learning rate in supervised machine learning problems

  • too low: training can get stuck in local minima
  • too large: training oscillates too much and never converges to the optimal parameters
  • Deep RL is even more challenging.
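
As a rough illustration of the learning-rate example above, here is a minimal sketch of plain gradient descent on the toy function f(x) = x². The function, step count, and learning-rate values are assumptions chosen for illustration only. Because this toy function is convex it has no local minima, so it only shows slow progress for a small rate and oscillation for a large one.

```python
def gradient_descent(learning_rate, steps=20, x0=1.0):
    """Minimize f(x) = x**2 (gradient is 2*x) and return the final x."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * 2 * x  # standard gradient descent update
    return x


small = gradient_descent(0.001)  # barely moves away from x0 in 20 steps
good = gradient_descent(0.3)     # shrinks quickly toward the minimum at 0
large = gradient_descent(1.1)    # overshoots, flips sign every step, and grows

print(small, good, large)
```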

  • Deep RL problems
    • have more hyperparameters than supervised ML models
    • hyperparameters in Deep RL have a huge impact on the final training outcome

How can we find good hyperparameters?

  • to find good hyperparameters we follow a trial-and-error approach
  1. choose a set of hyperparameters
  2. train the agent
  3. evaluate the agent
  4. if we are happy with the result, we are done.
    Otherwise, we choose a new set of hyperparameters and repeat the whole process.
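
The four steps above can be sketched as a simple loop. This is only a sketch: `choose_hyperparams`, `train_agent`, and `evaluate_agent` are hypothetical stand-ins (not from any library), and the score is a made-up function that peaks at a learning rate of 1e-3.

```python
import random


def choose_hyperparams(rng):
    """Hypothetical: pick a candidate set of hyperparameters."""
    return {"learning_rate": rng.choice([1e-4, 1e-3, 1e-2])}


def train_agent(hyperparams):
    """Hypothetical stand-in for a real (and expensive) training loop."""
    return {"hyperparams": hyperparams}


def evaluate_agent(agent):
    """Hypothetical stand-in for evaluation; returns a made-up score in [0, 1]."""
    lr = agent["hyperparams"]["learning_rate"]
    return max(0.0, 1.0 - abs(lr - 1e-3) * 1000)


def trial_and_error(target_score, max_trials, seed=0):
    rng = random.Random(seed)
    best_hp, best_score = None, float("-inf")
    for _ in range(max_trials):
        hp = choose_hyperparams(rng)    # 1. choose a set of hyperparameters
        agent = train_agent(hp)         # 2. train the agent
        score = evaluate_agent(agent)   # 3. evaluate the agent
        if score > best_score:
            best_hp, best_score = hp, score
        if score >= target_score:       # 4. happy with the result: stop
            break
    return best_hp, best_score


best_hp, best_score = trial_and_error(target_score=1.0, max_trials=100)
print(best_hp, best_score)
```

In a real Deep RL setting, each pass through the loop is a full training run, which is why the way the next candidate is chosen matters so much.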

[3] Grid Search

  • try all possible combinations and select the one that works best
  • this method is called grid search
  • works well for many supervised ML problems
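
A minimal sketch of grid search, assuming a small grid over two illustrative hyperparameters. `toy_score` is a hypothetical stand-in for "train and evaluate the model"; the names `learning_rate` and `gamma` and their candidate values are assumptions for the example.

```python
from itertools import product


def grid_search(grid, score_fn):
    """Score every combination in the grid and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score: highest at learning_rate=1e-3 and gamma=0.99."""
    return -abs(params["learning_rate"] - 1e-3) - abs(params["gamma"] - 0.99)


grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "gamma": [0.9, 0.99, 0.999],
}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

With 3 values for each of 2 hyperparameters, this evaluates 3 × 3 = 9 combinations.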

[4] Grid Search Problem

  • Deep RL problems
    • many more hyperparameters (e.g. 10-20)
    • each can take many possible values
    • this creates a massive grid in which to search for the best combination
      • the number of combinations grows exponentially
        • e.g. with 10 hyperparameters, each taking a value from 1 to 10: grid size = 10,000,000,000
        • even if a single training loop takes only 10 minutes: 10 minutes × 10,000,000,000 ≈ 190,258 years
        • infeasible in practice!
        • even with 100,000 parallel processes it would still take about 2 years
    • so grid search is a very inefficient method for Deep RL problems
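
The arithmetic above can be checked with a few lines of Python. The variable names are illustrative, and a year is taken as 365 days.

```python
num_hyperparameters = 10
values_per_hyperparameter = 10
minutes_per_run = 10
num_workers = 100_000

# Every hyperparameter multiplies the grid size by its number of values.
grid_size = values_per_hyperparameter ** num_hyperparameters
total_minutes = grid_size * minutes_per_run

minutes_per_year = 60 * 24 * 365
years_sequential = total_minutes // minutes_per_year  # whole years, rounded down
years_parallel = total_minutes / minutes_per_year / num_workers

print(f"grid size: {grid_size:,}")
print(f"one process: {years_sequential:,} years")
print(f"{num_workers:,} processes: {years_parallel:.1f} years")
```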

[5] Random Search

  • instead of checking each of the 𝑁 possible hyperparameter combinations
  • randomly try a subset of them of size 𝑇
    • 𝑇 is much smaller than 𝑁
  • train and evaluate the agent 𝑇 times


  • from these 𝑇 trials
    • we select the combination of hyperparameters that worked best
  • a more efficient way to search that works better than grid search
    • in terms of speed and quality of the solution
    • but it is just spinning a roulette wheel to pick the hyperparameters
    • a smarter approach is needed.
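
A minimal sketch of random search over the same kind of space as the grid example (10 hyperparameters, each from 1 to 10, so 𝑁 = 10,000,000,000), with 𝑇 = 100 trials. `toy_score` and the hyperparameter names `hp0` to `hp9` are hypothetical stand-ins for a real train-and-evaluate run.

```python
import random


def random_search(space, score_fn, num_trials, seed=0):
    """Sample num_trials random combinations and return the best one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        # Draw each hyperparameter independently and uniformly at random.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical score: highest (0) when every hyperparameter equals 5."""
    return -sum((value - 5) ** 2 for value in params.values())


space = {f"hp{i}": list(range(1, 11)) for i in range(10)}
best_params, best_score = random_search(space, toy_score, num_trials=100)
print(best_params, best_score)
```

Each trial is sampled without looking at earlier results, which is exactly the "roulette" limitation noted above.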


