Files Included:
01 course-introduction (4.89 MB)
01 basics-of-instruction-tuning (10.19 MB)
02 instruction-tuning-with-hugging-face (12.13 MB)
03 reward-modeling-response-evaluation (6.75 MB)
04 reward-model-training (11.26 MB)
05 reward-modeling-with-hugging-face (14.3 MB)
01 large-language-models-llms-as-distributions (11.04 MB)
02 from-distributions-to-policies (6.76 MB)
03 reinforcement-learning-from-human-feedback-rlhf (12.14 MB)
04 proximal-policy-optimization-ppo (7.52 MB)
05 ppo-with-hugging-face (7.03 MB)
06 ppo-trainer (9.87 MB)
01 dpo-partition-function (9.33 MB)
02 dpo-optimal-solution (11.58 MB)
03 from-optimal-policy-to-dpo (11.03 MB)
04 dpo-with-hugging-face (8.31 MB)
[center]