## 📂 Project Tree
```
WSChuan-ASR
├── paraformer_large_chuan/
│   ├── config.yaml
│   ├── model.pt
│   └── infer.py
│
├── Qwen2.5-omni3B/
│   ├── added_tokens.json
│   ├── args.json
│   ├── char_template.jinja
│   ├── config.json
│   ├── generation_config.json
│   ├── merges.txt
│   ├── model-00001-of-00003.safetensors
│   ├── model-00002-of-00003.safetensors
│   ├── model-00003-of-00003.safetensors
│   ├── model.safetensors.index.json
│   ├── preprocessor_config.json
│   ├── special_tokens_map.json
│   ├── spk_dict.pt
│   ├── tokenizer_config.json
│   ├── tokenizer.json
│   ├── video_preprocessor_config.json
│   └── vocab.json
│
├── .gitattributes
└── README.md
```
## ASR Leaderboard

| Model | Model Size | WSC-Eval-ASR-Easy | WSC-Eval-ASR-Hard | WSC-Eval-ASR-Total | Magicdata-Conversation | Magicdata-Daily-Use | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **with LLM** | | | | | | | |
| Kimi-Audio | 7B | 16.65 | 28.66 | 17.66 | 24.67 | 5.77 | 18.68 |
| FireRedASR-LLM | 8.3B | 12.80 | 25.27 | 14.40 | 17.68 | 6.69 | 15.37 |
| Qwen2.5-omni | 3B | 16.94 | 26.01 | 18.20 | 20.40 | 6.32 | 17.69 |
| Qwen2.5-omni-WSC-Finetune⭐ | 3B | 14.36 | 24.14 | 15.61 | 18.45 | 6.15 | 15.74 |
| Qwen2.5-omni+internal data⭐ | 3B | 13.17 | 23.36 | 14.81 | 18.50 | 5.88 | 15.14 |
| Qwen2.5-omni-WSC-Finetune + internal data⭐ | 3B | 12.93 | 23.19 | 14.25 | 17.95 | 5.89 | 14.84 |
| **without LLM** | | | | | | | |
| SenseVoice-small | 234M | 17.43 | 28.38 | 18.39 | 23.50 | 8.77 | 19.29 |
| Whisper | 244M | 52.06 | 63.99 | 53.59 | 55.88 | 52.03 | 55.51 |
| FireRedASR-AED | 1.1B | 13.29 | 23.64 | 14.62 | 17.84 | 6.69 | 15.14 |
| Paraformer | 220M | 14.34 | 24.61 | 15.66 | 19.81 | 8.16 | 16.52 |
| Paraformer-WSC-Finetune⭐ | 220M | 12.15 | 22.60 | 13.51 | 16.60 | 8.02 | 14.58 |
| Paraformer + internal data⭐ | 220M | 11.93 | 21.82 | 13.14 | 15.61 | 6.77 | 13.85 |
| Paraformer-WSC-Finetune + internal data⭐ | 220M | 11.59 | 21.59 | 12.87 | 14.59 | 6.28 | 13.38 |
## ASR Inference
### Paraformer_large_Chuan

```bash
export CUDA_VISIBLE_DEVICES=7
root_dir=./test_data
test_sets=("WSC-Eval-ASR" "WSC-Eval-ASR-Hard" "WSC-Eval-ASR-Easy")
model_dir=./model_dir
out_rootdir=./results
mkdir -p $out_rootdir

# Run inference on each test set; hypotheses land in $out_dir/hyp.txt
for test_set in "${test_sets[@]}"; do
    out_dir=$out_rootdir/$test_set
    mkdir -p $out_dir
    python infer.py \
        --model $model_dir \
        --wav_scp_file $root_dir/$test_set/wav.scp \
        --output_dir $out_dir \
        --device "cuda" \
        --output_file $out_dir/hyp.txt
done
```
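The leaderboard numbers above are error rates (for Chinese/Sichuanese ASR these are typically character error rates). A minimal sketch of how a `hyp.txt` could be scored against a same-format reference file, assuming Kaldi-style `utt-id transcript` lines; the function names and reference-file layout here are illustrative assumptions, not part of this repo:

```python
# Minimal CER scorer for Kaldi-style "utt-id transcript" files.

def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def load_trn(path):
    """Parse 'utt-id text' lines into {utt-id: text} with spaces stripped."""
    out = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            utt, _, text = line.strip().partition(" ")
            out[utt] = text.replace(" ", "")  # score on characters
    return out

def cer(ref_file, hyp_file):
    """Character error rate (%) of hyp_file against ref_file."""
    refs, hyps = load_trn(ref_file), load_trn(hyp_file)
    errs = total = 0
    for utt, ref in refs.items():
        errs += edit_distance(ref, hyps.get(utt, ""))
        total += len(ref)
    return 100.0 * errs / max(total, 1)
```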
### Qwen2.5-Omni-3B_Chuan

```bash
python infer_qwen2.5omni.py \
    --wavs_path /path/to/your/wav.scp \
    --out_path /path/to/your/results.txt \
    --gpu 0 \
    --model /path/to/your/model
```
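Both inference scripts take a Kaldi-style `wav.scp`, i.e. one `utt-id /path/to/audio.wav` entry per line. A small helper sketch for generating one from a directory of `.wav` files; `write_wav_scp` and the file-stem-as-utt-id convention are assumptions for illustration, not utilities shipped in this repo:

```python
from pathlib import Path

def write_wav_scp(wav_dir, scp_path):
    """Write a Kaldi-style wav.scp ("utt-id abs-path" per line) for every
    .wav under wav_dir, using the file stem as the utterance id.
    Returns the number of entries written."""
    wavs = sorted(Path(wav_dir).glob("*.wav"))
    with open(scp_path, "w", encoding="utf-8") as f:
        for wav in wavs:
            f.write(f"{wav.stem} {wav.resolve()}\n")
    return len(wavs)
```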