Paper: SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding (arXiv:2505.16630)
A Multimodal Vision-Language Model for Soccer Game Understanding
SoccerChat-qwen2-vl-7b is a LoRA-finetuned version of Qwen2-VL-7B-Instruct designed for soccer video understanding and dialogue.
It is trained on the SoccerChat dataset, introduced in the paper SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding.
The model integrates video frames, event annotations, and commentary text to support question answering, commentary generation, and event-based reasoning in soccer.
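Because the model is a LoRA fine-tune, the quickstart loads the base model `Qwen/Qwen2-VL-7B-Instruct` and attaches `SimulaMet/SoccerChat-qwen2-vl-7b` on top of it as an adapter. The NumPy sketch below illustrates the standard LoRA update, W' = W + (alpha / r) B A, on a single toy weight matrix. The shapes, rank `r`, and `alpha` are illustrative assumptions, not the values used to train SoccerChat (see the paper for those), and the helper names are hypothetical.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha):
    """Linear layer with a LoRA adapter: y = x W^T + (alpha / r) * x A^T B^T.

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

def merge_lora(W, A, B, alpha):
    """Fold the low-rank update into the base weight: W' = W + (alpha / r) B A."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 32, 8, 16  # toy sizes (assumed, not SoccerChat's)
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = rng.standard_normal((d_out, r))  # LoRA initializes B to zero; random here so the update is visible
x = rng.standard_normal((4, d_in))

y_adapter = lora_forward(x, W, A, B, alpha)
y_merged = x @ merge_lora(W, A, B, alpha).T
print(np.allclose(y_adapter, y_merged))        # adapter path and merged weight agree
print(np.linalg.matrix_rank(B @ A) <= r)        # the update has rank at most r
print(A.size + B.size, "adapter params vs", W.size, "base params")
```

The adapter only has to store `A` and `B`, which is why it can be distributed separately from the 7B base weights.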
Use the code below to get started with the model.
Each query pairs a video with a text prompt.
```python
import base64  # only needed for the local-file alternative below
import os

import torch
from swift.llm import PtEngine, RequestConfig, InferRequest
from transformers import BitsAndBytesConfig

# 4-bit quantization so the 7B model fits on a free Colab T4 GPU.
# The results reported in the paper use the unquantized model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4, the recommended 4-bit type
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants to save memory
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute (the T4 lacks native bfloat16)
)

# Video sampling: exactly 24 frames per clip, with a pixel cap of 100352 (= 128 * 28 * 28).
os.environ["FPS_MIN_FRAMES"] = "24"
os.environ["FPS_MAX_FRAMES"] = "24"
os.environ["VIDEO_MAX_PIXELS"] = "100352"

# Load the base model and attach the SoccerChat LoRA adapter.
engine = PtEngine(
    model_id_or_path="Qwen/Qwen2-VL-7B-Instruct",
    adapters=["SimulaMet/SoccerChat-qwen2-vl-7b"],
    quantization_config=bnb_config,
    attn_impl="sdpa",
    max_batch_size=1,
    use_hf=True,  # download from the Hugging Face Hub rather than ModelScope
)

req_cfg = RequestConfig(max_tokens=512, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.05)

infer_requests = [
    InferRequest(messages=[{
        "role": "user",
        "content": [
            {"type": "video", "video": "https://huggingface.co/datasets/SimulaMet/SoccerChat/resolve/main/videos/MultipleEvents/100037_Shotsontarget--Balloutofplay.mp4"},
            # For a local file, pass the video as a base64 data URI instead:
            # {"type": "video", "video": "data:video/mp4;base64," + base64.b64encode(open("/localpath/video.mp4", "rb").read()).decode("utf-8")},
            {"type": "text", "text": "What is shown in the video?"},
        ],
    }]),
]

resp = engine.infer(infer_requests, req_cfg)
print(resp[0].choices[0].message.content)
```
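To query a local clip, the commented-out line in the quickstart inlines the file as a base64 data URI. The helpers below factor that pattern out and build the same single-turn `messages` structure the quickstart passes to `InferRequest`. `video_data_uri` and `build_messages` are hypothetical names for illustration, not part of ms-swift; they use only the Python standard library.

```python
import base64
from pathlib import Path

def video_data_uri(path: str) -> str:
    """Encode a local MP4 file as a base64 data URI (same format as the
    commented-out local-file line in the quickstart)."""
    data = Path(path).read_bytes()
    return "data:video/mp4;base64," + base64.b64encode(data).decode("utf-8")

def build_messages(video: str, question: str) -> list:
    """Return a single-turn `messages` list for InferRequest(messages=...).

    `video` may be an http(s) URL or a data URI produced by video_data_uri().
    """
    return [{
        "role": "user",
        "content": [
            {"type": "video", "video": video},
            {"type": "text", "text": question},
        ],
    }]
```

With the engine from the quickstart, a local query would then be `engine.infer([InferRequest(messages=build_messages(video_data_uri("/localpath/video.mp4"), "What is shown in the video?"))], req_cfg)`. Base64 encoding inflates the payload by about a third, so long clips produce large requests.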
(For the full hyperparameters and details, see the paper.)
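The quickstart pins video sampling with `FPS_MIN_FRAMES = FPS_MAX_FRAMES = 24` and caps resolution with `VIDEO_MAX_PIXELS = 100352` (128 * 28 * 28). The sketch below is a simplified illustration, not the Qwen2-VL or ms-swift preprocessing code, and its function names are hypothetical. It picks evenly spaced frame indices after clamping the frame count to `[min_frames, max_frames]`, and it estimates the visual token budget under assumptions labeled in the comments: the pixel cap applies per frame, and Qwen2-VL emits one token per 28 x 28 pixel region per pair of frames (14-pixel patches, 2 x 2 spatial merge, temporal patch size 2). The default `target_fps=2.0` is also an assumption.

```python
def sample_frame_indices(total_frames: int, fps: float, target_fps: float = 2.0,
                         min_frames: int = 24, max_frames: int = 24) -> list[int]:
    """Pick evenly spaced frame indices (simplified illustration).

    Assumption: the frame count is derived from the clip duration at
    `target_fps`, clamped to [min_frames, max_frames], and never exceeds the
    number of frames actually available.
    """
    duration = total_frames / fps
    n = round(duration * target_fps)
    n = max(min_frames, min(max_frames, n))
    n = min(n, total_frames)
    if n == 1:
        return [0]
    step = (total_frames - 1) / (n - 1)
    return [round(i * step) for i in range(n)]

def estimate_video_tokens(num_frames: int, max_pixels: int = 100352,
                          region: int = 28, temporal_patch: int = 2) -> int:
    """Rough upper bound on visual tokens for one clip.

    Assumptions: `max_pixels` caps each frame, one token covers a
    region x region pixel area, and frames are grouped `temporal_patch` at a time.
    """
    tokens_per_group = max_pixels // (region * region)
    groups = -(-num_frames // temporal_patch)  # ceiling division
    return groups * tokens_per_group

idx = sample_frame_indices(total_frames=750, fps=25.0)  # e.g. a 30 s clip at 25 fps
print(len(idx), idx[:3], idx[-1])
print(estimate_video_tokens(len(idx)))
```

Under these assumptions, 24 frames at the 100352-pixel cap come to 12 frame pairs of 128 tokens each, roughly 1,536 visual tokens per clip, which helps explain why such tight limits suit a 16 GB T4.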
If you use this model, please cite:
```bibtex
@article{Gautam2025May,
  author  = {Gautam, Sushant and Midoglu, Cise and Thambawita, Vajira and others},
  title   = {{SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding}},
  journal = {ArXiv e-prints},
  year    = {2025},
  month   = may,
  eprint  = {2505.16630},
  doi     = {10.48550/arXiv.2505.16630}
}
```