Step-2 SFT reference policies (π_ref) used to initialize DDRO (datasets: MS MARCO and NQ; DocID types: PQ and Title+URL). Use these for fair comparisons and ablations.
Kidist Amde Mekonnen
kiyam
AI & ML interests
AI (Generative models, Computer vision, NLP), XAI
Recent Activity
updated a collection 13 days ago: DDRO-Reference-Policies (SFT)
updated a model 13 days ago: kiyam/ddro-nq-tu-sft
published a model 13 days ago: kiyam/ddro-nq-tu-sft
Organizations
None yet