scale-safety-research's Collections
Open Source RM Sycophancy
Alignment Faking Datasets
Gemma 2 9b Emergent Misalignment
Apollo Deception Probes Datasets
Helpful-Only Synthetic Documents
Open Source RM Sycophancy
Updated Jul 10, 2025
abhayesian/reward-models-biases-docs • Updated Jul 2, 2025 • 100k • 1
abhayesian/old-biased-responses • Updated Jul 10, 2025 • 9.76k • 6
abhayesian/llama-3.3-70b-reward-model-biases-merged • Text Generation • 71B • Updated Aug 13, 2025 • 1
Note: Trained only on the docs.