RECAP: Reproducing Copyrighted Data from LLMs Training with an Agentic Pipeline • Paper • 2510.25941 • Published Oct 29, 2025
LumberChunker: Long-Form Narrative Document Segmentation • Paper • 2406.17526 • Published Jun 25, 2024
DIS-CO: Discovering Copyrighted Content in VLMs Training Data • Paper • 2502.17358 • Published Feb 24, 2025
DE-COP: Detecting Copyrighted Content in Language Models Training Data • Paper • 2402.09910 • Published Feb 15, 2024