

2025-04-14

RL for LLM

I'm writing this more for my own understanding than to teach anyone.

An LLM solves the following problem:

RL for an LLM solves the following problem:

Major doubt I have:

To do RL for an LLM, you have to train the following:

How to train these models:

RL for safety