About this episode
Seventy3: paper walkthroughs powered by NotebookLM, focused on artificial intelligence, large models, robotics algorithms, and crypto, so that everyone can keep improving alongside AI.

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Summary

As language models grow more capable, users expect their behavior to match a range of preferences, so RL pipelines increasingly combine multiple rewards. Group Relative Policy Optimization (GRPO) is commonly applied in this multi-reward setting. The paper argues that when GRPO is applied directly to normalize the combined rewards, different rollout reward combinations can collapse to the same advantage value, which weakens the training signal. To address this, the authors propose Group reward-Decoupled Normalization Policy Optimization (GDPO), which decouples the normalization of the individual rewards so that their relative differences are better preserved. The paper compares GDPO with GRPO across several tasks and reports that GDPO consistently outperforms GRPO.

Paper link: https://arxiv.org/abs/2601.05242
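
To make the contrast between GRPO and GDPO concrete, here is a minimal sketch of the two normalization orders on a toy example. It is an illustration under stated assumptions, not the paper's implementation: the function names, the `EPS` constant, and the two toy groups are hypothetical, and any further steps the paper applies after the per-reward normalization are omitted.

```python
from statistics import fmean, pstdev

EPS = 1e-8  # hypothetical small constant to avoid division by zero


def group_normalize(values):
    """Standardize one group's values: (v - mean) / (std + EPS)."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + EPS) for v in values]


def summed_reward_advantages(rewards):
    """GRPO-style order: sum each rollout's rewards, then normalize the sums."""
    totals = [sum(r) for r in rewards]
    return group_normalize(totals)


def decoupled_advantages(rewards):
    """Decoupled order: normalize each reward on its own, then add the results."""
    columns = list(zip(*rewards))  # one column per reward
    normalized = [group_normalize(list(col)) for col in columns]
    return [sum(vals) for vals in zip(*normalized)]


# Two toy groups of two rollouts, each scored by two binary rewards (assumed data).
group_a = [(0, 0), (1, 0)]  # the better rollout satisfies one reward
group_b = [(0, 0), (1, 1)]  # the better rollout satisfies both rewards

coupled_a = summed_reward_advantages(group_a)
coupled_b = summed_reward_advantages(group_b)
decoupled_a = decoupled_advantages(group_a)
decoupled_b = decoupled_advantages(group_b)

print("summed then normalized:", coupled_a, coupled_b)
print("normalized then summed:", decoupled_a, decoupled_b)
```

In this toy case the summed-reward advantages come out essentially the same for both groups (about -1 and +1), even though the better rollout in the second group satisfies both rewards. The decoupled version keeps them apart (about -1 and +1 versus about -2 and +2).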