video-SALMONN 2 is an audio-visual large language model (LLM) that generates high-quality captions for videos with audio.
AI & ML interests
https://www.ee.tsinghua.edu.cn/en/
Organization Card
Department of Electronic Engineering, Tsinghua University
Models (13)
tsinghua-ee/WAVE-7B • Updated
tsinghua-ee/video_SALMONN2plus_72B_audioAlign • Updated • 1
tsinghua-ee/video_SALMONN2plus_7B_audioAlign • 9B • Updated • 404
tsinghua-ee/SALMONN • Automatic Speech Recognition • Updated • 49
tsinghua-ee/video-SALMONN-2_plus_72B • Updated • 7 • 2
tsinghua-ee/video-SALMONN-2_plus_3B • Updated • 826 • 3
tsinghua-ee/video-SALMONN-2_plus_7B • Updated • 720 • 6
tsinghua-ee/video-SALMONN-2 • Video-Text-to-Text • 9B • Updated • 188 • 1
tsinghua-ee/Speech_Quality_Assessment • Updated • 1
tsinghua-ee/F-16 • Video-Text-to-Text • Updated • 25
Datasets (8)
tsinghua-ee/ELViM • Viewer • Updated • 211
tsinghua-ee/SACRED-Bench • Viewer • Updated • 2.48k • 68
tsinghua-ee/F-16-NBA • Preview • Updated • 36
tsinghua-ee/AVUTBenchmark • Viewer • Updated • 3.28k • 5.15k • 1
tsinghua-ee/video-SALMONN_2_testset • Preview • Updated • 33
tsinghua-ee/QualiSpeech • Viewer • Updated • 14.6k • 628 • 21
tsinghua-ee/RivaBench • Viewer • Updated • 542 • 526 • 2
tsinghua-ee/SAVEBench • Preview • Updated • 101 • 3