


2025-04-14

RL for LLM

I'm writing this more for my own understanding than to teach anyone.

An LLM solves the following problem:

RL for LLMs solves the following problem:

Major doubt I have:

To do RL for an LLM, you have to train the following:

How to train these models:

RL for safety
