Policy Gradient Methods | Reinforcement Learning Part 6

The machine learning consultancy: https://truetheta.io
Join my email list to get educational and useful articles (and nothing else!): https://mailchi.mp/truetheta/true-the...
Want to work together? See here: https://truetheta.io/about/#want-to-w...

Policy Gradient Methods are among the most effective techniques in Reinforcement Learning. In this video, we'll motivate their design, observe their behavior and understand their background theory.

SOCIAL MEDIA

LinkedIn: / dj-rich-90b91753
Twitter: / duanejrich
Github: https://github.com/Duane321

Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation

SOURCES FOR THE FULL SERIES

[1] R. Sutton and A. Barto. Reinforcement Learning: An Introduction (2nd Ed). MIT Press, 2018.
[2] H. van Hasselt et al. RL Lecture Series, DeepMind and UCL, 2021, • DeepMind x UCL RL Lecture Series - In...
[3] J. Achiam. Spinning Up in Deep Reinforcement Learning, OpenAI, 2018.

ADDITIONAL SOURCES FOR THIS VIDEO

[4] J. Achiam. Spinning Up in Deep Reinforcement Learning: Intro to Policy Optimization, OpenAI, 2018, https://spinningup.openai.com/en/late...
[5] D. Silver. Lecture 7: Policy Gradient Methods, DeepMind, 2015, • RL Course by David Silver - Lecture 7...

TIMESTAMPS

0:00 Introduction
0:50 Basic Idea of Policy Gradient Methods
2:30 A Familiar Shape
4:23 Motivating the Update Rule
10:51 Fixing the Update Rule
12:55 Example: Windy Highway
16:47 A Problem with Naive PGMs
19:43 Reinforce with Baseline
21:42 The Policy Gradient Theorem
25:20 General Comments
28:02 Thanking The Sources

LINKS

Windy Highway: https://github.com/Duane321/mutual_in...

NOTES

[1] When motivating the update rule with an animation of protopoints and theta bars, I don't specify alpha. That's because the lengths of the gradient arrows can only be interpreted on a relative basis. Their absolute numeric values can't be deduced from the animation, because some unmentioned scaling was applied to make the animation look natural. Mentioning alpha would have made this calculation possible to attempt, so I avoided it.
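For readers who want the update rule from the "Motivating the Update Rule" and "Reinforce with Baseline" chapters in runnable form, here is a minimal sketch. It is not the video's code: it assumes a hypothetical two-armed bandit with a softmax policy rather than the Windy Highway environment, and the constants (alpha, the baseline step size, the reward noise) are illustrative choices only. It shows the role the note above assigns to alpha: it scales the length of each gradient step theta <- theta + alpha * (r - b) * grad log pi(a | theta).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a 2-action bandit with a softmax policy.
true_rewards = np.array([1.0, 2.0])  # expected reward of each action
theta = np.zeros(2)                  # policy parameters
alpha = 0.1                          # learning rate ("alpha" from the note)

def softmax(x):
    z = x - x.max()                  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

baseline = 0.0                       # running-average reward baseline
for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                   # sample an action
    r = true_rewards[a] + rng.normal(scale=0.5)  # observe a noisy reward

    # For a softmax policy, grad log pi(a|theta) = one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0

    # REINFORCE-with-baseline step; alpha sets the step length.
    theta += alpha * (r - baseline) * grad_log_pi

    # Update the baseline toward the average reward to reduce variance.
    baseline += 0.05 * (r - baseline)

print(softmax(theta))  # probability mass should concentrate on action 1
```

Subtracting the baseline does not change the expected gradient (the video's "Reinforce with Baseline" chapter covers why), but it shrinks the variance of each sampled step, which is what makes the naive method practical.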
