Hyperparameters in Deep RL with Optuna
[Part 1, The Problem: Hyper-parameters]
[Part 2, The Solution: Bayesian Search]
[Part 3: Hyper-parameter Search with Optuna]
Across these three parts, we will cover why hyperparameters matter in Deep Reinforcement Learning,
introduce search methods for choosing good hyperparameters,
and learn how to use Optuna.
[1] Overview: Hyper-parameters in Deep RL
- build a perfect agent to solve the Cart Pole environment using Deep Q-Learning
- success depends heavily on the set of hyperparameters used
- to become a real PRO in Reinforcement Learning,
- you need to learn how to tune hyperparameters
- using the right tools
- Optuna: an open-source library for hyperparameter search in the Python ecosystem
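As a quick preview, here is a minimal Optuna sketch on a toy objective. The quadratic function is a stand-in, not the Cart Pole setup; in Deep RL the objective would train and evaluate an agent instead:

```python
import optuna

# Toy objective: find the x that minimizes (x - 2)^2.
# In Deep RL, this function would train an agent with the suggested
# hyperparameters and return its evaluation reward.
def objective(trial):
    x = trial.suggest_float("x", -10.0, 10.0)
    return (x - 2.0) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)  # close to {'x': 2.0}
```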
[2] The Problem: Hyper-parameters
Machine Learning models have two kinds of numbers:
1. parameters
- numbers found AFTER training your model
2. hyperparameters
- numbers you need to set BEFORE training the model
- they exist all around ML
ex. learning rate in supervised machine learning problems
- too low → training gets stuck in local minima
- too large → training oscillates too much and never converges to the optimal parameters
- Deep RL is even more challenging.
- Deep RL problems
- have more hyperparameters than supervised ML models
- hyperparameters in Deep RL have a huge impact on the final training outcome
How can we find good hyperparameters?
- to find good hyperparameters, we follow a trial-and-error approach
- choose a set of hyperparameters
- train the agent
- evaluate the agent
- if we are happy with the result, we are done.
Otherwise, we choose a new set of hyperparameters and repeat the whole process (see the sketch below).
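In code, the loop looks roughly like this; `train_agent` and `evaluate_agent` are hypothetical stubs standing in for your own Deep RL training and evaluation code:

```python
import random

# Hypothetical stubs: replace with real Deep RL training/evaluation.
def train_agent(hp):
    return hp  # pretend the "trained agent" is just its hyperparameters

def evaluate_agent(agent):
    return random.random()  # pretend reward

def tune(candidates, target_reward):
    best_hp, best_reward = None, float("-inf")
    for hp in candidates:                 # 1. choose a set of hyperparameters
        agent = train_agent(hp)           # 2. train the agent
        reward = evaluate_agent(agent)    # 3. evaluate the agent
        if reward > best_reward:
            best_hp, best_reward = hp, reward
        if best_reward >= target_reward:  # 4. happy with the result -> done
            return best_hp, best_reward
    return best_hp, best_reward           # otherwise: kept trying new sets
```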
[3] Grid Search
- try all possible combinations and select the one that works best
- this method is called grid search (sketched below the figure)
- works well for many supervised ML problems
![](https://blog.kakaocdn.net/dn/vqYkh/btrTALJd8nb/JZFAXijWlbDUSkpH64XUzk/img.png)
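A grid search simply enumerates the Cartesian product of all candidate values. A minimal sketch, where the hyperparameter names and candidate lists are illustrative assumptions:

```python
from itertools import product

# Illustrative search space: each key lists the candidate values
# for one hyperparameter.
grid = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128],
    "gamma": [0.95, 0.99],
}

# Grid search tries every combination: 3 * 3 * 2 = 18 trials here.
for values in product(*grid.values()):
    hp = dict(zip(grid.keys(), values))
    print(hp)  # in practice: train and evaluate the agent with hp
```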
[4] Grid Search Problem
- Deep RL problems
- many more hyperparameters (e.g. 10-20)
- each can take many possible values
- this creates a massive grid in which to search for the best combination
- number of combinations grows exponentially
- ex. 10 hyperparameters, each taking 10 possible values (1-10)? grid size = 10,000,000,000
- even if one training loop takes only 10 minutes: 10 minutes × 10,000,000,000 = 190,258 years
- practically impossible!
- even with 100,000 parallel processes, it still takes about 2 years
- therefore, grid search is a very inefficient way to solve Deep RL problems (the arithmetic is checked in the snippet below)
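A quick sanity check of the numbers above:

```python
# 10 hyperparameters x 10 candidate values each
n_combinations = 10 ** 10              # 10,000,000,000 combinations
total_minutes = n_combinations * 10    # 10 minutes per training loop
years = total_minutes / (60 * 24 * 365)
print(f"{int(years):,} years")         # 190,258 years

# even split across 100,000 parallel processes:
print(f"{years / 100_000:.1f} years")  # ~1.9 years
```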
[5] Random Search
- instead of checking each of the 𝑁 possible hyperparameter combinations
- randomly try a subset of them of size 𝑇
- 𝑇 is much smaller than 𝑁
- also train and evaluate the agent 𝑇 times
- with these 𝑇 trials,
- we select the combination of hyperparameters that worked best (see the sketch after this list)
- a more efficient way to search, and it works better than Grid Search
- in both speed and quality of the solution
- but it is still just spinning a roulette wheel to pick hyperparameters
- a smarter approach is needed
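A minimal random-search sketch over an illustrative search space (the hyperparameter names and values are assumptions, reused from the grid-search sketch above):

```python
import random

# Same illustrative search space as in the grid-search sketch.
space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128],
    "gamma": [0.95, 0.99],
}

T = 5  # number of random trials, much smaller than the full grid (N = 18)
for _ in range(T):
    hp = {name: random.choice(values) for name, values in space.items()}
    print(hp)  # in practice: train and evaluate the agent with hp
```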