muses-llm/humaneval_qwen7b_gpt-4o-mini_att_iter0_att100_sol10_snap Viewer • Updated Apr 19, 2025 • 6.89k
muses-llm/humaneval_qwen7b_gpt-4o-mini_att_iter0_att20_sol10_snap Viewer • Updated Apr 18, 2025 • 1.37k • 2
muses-llm/bigcodebench_qwen7b_att_iter0_ppo_att20_sol10_rerun_worker4_relabeled_dpo_6000 Viewer • Updated Apr 16, 2025 • 7.5k • 4
muses-llm/bigcodebench_qwen7b_att_iter0_ppo_att20_sol10_rerun_worker4 Viewer • Updated Mar 12, 2025 • 15.9k • 2
muses-llm/bigcodebench_qwen7b_att_iter0_ppo_att20_sol50_rerun_worker4 Viewer • Updated Mar 11, 2025 • 926 • 1
Discover and Cure: Concept-aware Mitigation of Spurious Correlation Paper • 2305.00650 • Published May 1, 2023
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases Paper • 2404.13207 • Published Apr 19, 2024
AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning Paper • 2406.11200 • Published Jun 17, 2024