Post
112
I'm thrilled to share our new paper, "FinForge: Semi-Synthetic Financial Benchmark Generation" in AI4Finance, AAAI'26!
Key Contributions:
- FinForge Framework: A hybrid pipeline integrating manual/programmatic corpus construction with rigorous LM-based synthesis.
- FinForge-5k Dataset: A new snapshot benchmark comprising over 5,000 human-validated Q&A pairs across 11 financial subdomains, derived from a curated corpus of 100,000 verified documents (143M tokens).
- Benchmarking Results: Evaluation of state-of-the-art open and closed-source models reveals significant variance in financial reasoning capabilities, with leading models achieving approximately 80% accuracy.
Huge thanks to my co-authors @glennmatlin , Anant Gupta, Anirudh JM, Rayan Castilla, and Yi Mei Ng for this collaboration.
You can read the paper: FinForge: Semi-Synthetic Financial Benchmark Generation (2601.06747)
Key Contributions:
- FinForge Framework: A hybrid pipeline integrating manual/programmatic corpus construction with rigorous LM-based synthesis.
- FinForge-5k Dataset: A new snapshot benchmark comprising over 5,000 human-validated Q&A pairs across 11 financial subdomains, derived from a curated corpus of 100,000 verified documents (143M tokens).
- Benchmarking Results: Evaluation of state-of-the-art open and closed-source models reveals significant variance in financial reasoning capabilities, with leading models achieving approximately 80% accuracy.
Huge thanks to my co-authors @glennmatlin , Anant Gupta, Anirudh JM, Rayan Castilla, and Yi Mei Ng for this collaboration.
You can read the paper: FinForge: Semi-Synthetic Financial Benchmark Generation (2601.06747)