Open Voice Detect
A simple binary classification model for voice activity detection (VAD).
Model Description
open-voice-detect is a lightweight neural network designed to detect the presence of voice in audio features. This is a minimal "hello world" implementation demonstrating how to structure a PyTorch model for Hugging Face.
Model Architecture
- Input: 128-dimensional feature vector (e.g., MFCCs, mel-spectrogram features)
- Hidden layers: 2 fully connected layers with ReLU activation and dropout
- Output: 2 classes (voice present / no voice)
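The layout listed above can be sketched in PyTorch as follows. This is a hypothetical stand-in, not the repository's `OpenVoiceDetect` class: the name `VoiceDetectSketch`, the second hidden width (`hidden_size // 2`), and the dropout rate of 0.3 are all assumptions made for illustration.

```python
import torch
from torch import nn


class VoiceDetectSketch(nn.Module):
    """Hypothetical stand-in for the repository's OpenVoiceDetect class."""

    def __init__(self, input_size=128, hidden_size=64, dropout=0.3):
        super().__init__()
        self.net = nn.Sequential(
            # Hidden layer 1: 128 -> 64
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropout),
            # Hidden layer 2: 64 -> 32 (assumed width)
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            # Output layer: 2 classes (no voice / voice)
            nn.Linear(hidden_size // 2, 2),
        )

    def forward(self, x):
        # Returns raw logits of shape (batch_size, 2)
        return self.net(x)

    @torch.no_grad()
    def predict(self, x):
        # Class index with the highest logit: 0 = no voice, 1 = voice
        return self.forward(x).argmax(dim=-1)


sketch = VoiceDetectSketch()
sketch.eval()  # disable dropout for inference
logits = sketch(torch.randn(4, 128))
classes = sketch.predict(torch.randn(4, 128))
```

Calling `eval()` before inference matters here because dropout would otherwise randomly zero activations and make predictions non-deterministic.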
Intended Use
This model is a demonstration/starting point for:
- Learning how to structure models for Hugging Face
- Building voice activity detection systems
- Understanding basic audio classification
How to Use
```python
import torch
from model import OpenVoiceDetect

# Load model
model = OpenVoiceDetect(input_size=128, hidden_size=64)
# map_location="cpu" lets the checkpoint load on machines without a GPU
checkpoint = torch.load('pytorch_model.bin', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # disable dropout for inference

# Prepare input of shape (batch_size, 128)
# In practice, extract 128-dimensional features from audio
audio_features = torch.randn(1, 128)

# Get prediction (0: no voice, 1: voice)
prediction = model.predict(audio_features)
print(f"Voice detected: {prediction.item() == 1}")
```
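If a confidence score is more useful than a hard 0/1 decision, the two output values can be turned into a voice probability. The sketch below assumes the model's forward pass returns raw logits of shape `(batch_size, 2)`, which this card does not confirm; `voice_probability` and the 0.5 threshold are illustrative choices, and the logits are hand-written rather than model output.

```python
import torch


def voice_probability(logits):
    """Convert (batch, 2) logits to the probability of class 1 (voice)."""
    return torch.softmax(logits, dim=-1)[:, 1]


# Hand-written example logits; in practice these would come from model(audio_features)
logits = torch.tensor([[2.0, -1.0], [-1.5, 3.0]])
probs = voice_probability(logits)

# Raising the threshold reduces false alarms at the cost of more missed speech
decisions = probs >= 0.5
```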
Training Data
This is an untrained demo model. For production use, you would need to train on a dataset with:
- Audio samples with voice activity
- Audio samples without voice (silence, noise, music)
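A minimal training loop for such a dataset might look like the sketch below. It uses random tensors in place of real features and labels, and an `nn.Sequential` stand-in with assumed hidden widths instead of the repository's `OpenVoiceDetect` class, so it only illustrates the mechanics. The optimizer, learning rate, batch size, and epoch count are likewise assumptions.

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(0)

# Stand-in classifier with the documented 128-in / 2-out shape (hidden widths assumed)
model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(32, 2),
)

# Synthetic placeholder data: replace with real audio features and labels
features = torch.randn(256, 128)
labels = torch.randint(0, 2, (256,))  # 0: no voice, 1: voice

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class labels

model.train()  # enable dropout during training
for epoch in range(5):
    for start in range(0, len(features), 32):
        batch_x = features[start:start + 32]
        batch_y = labels[start:start + 32]
        optimizer.zero_grad()
        loss = loss_fn(model(batch_x), batch_y)
        loss.backward()
        optimizer.step()

# Wrap the weights under the 'model_state_dict' key used by the loading snippet
checkpoint_path = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
torch.save({"model_state_dict": model.state_dict()}, checkpoint_path)
```

Because the stand-in's parameter names differ from those of the real class, a checkpoint that loads with the snippet in How to Use would need to be produced by training `OpenVoiceDetect` itself.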
Model Details
- Model type: Binary classifier
- Framework: PyTorch
- Parameters: ~10K
- Input size: 128 features
- Output: Binary classification (0: no voice, 1: voice)
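The parameter count can be checked by hand: a fully connected layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The card does not state the width of the second hidden layer; the sketch below assumes 32, which gives a total close to the stated ~10K.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# Assumed layer widths: 128 -> 64 -> 32 -> 2
layer_sizes = [128, 64, 32, 2]
total = sum(linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
print(total)  # 10402
```

For comparison, keeping both hidden layers at 64 units would give 12,546 parameters.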
Limitations
- This is a demonstration model with random weights
- Not trained on real data
- Requires feature extraction preprocessing (not included)
- For production use, train on appropriate voice detection datasets
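Since feature extraction is not included, here is one simple way to obtain a 128-dimensional vector from a raw waveform using only NumPy. It is a plain log-magnitude spectrum averaged over frames, a stand-in for the MFCC or mel-spectrogram features mentioned above rather than an implementation of them. The function name, frame length, hop size, and 16 kHz sample rate are assumptions.

```python
import numpy as np


def log_spectrum_features(waveform, frame_len=254, hop=127):
    """Return a 128-dim vector: the log-magnitude spectrum averaged over frames.

    A real FFT of length 254 yields 254 // 2 + 1 = 128 frequency bins.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    # Zero-pad very short inputs so at least one full frame exists
    if waveform.shape[0] < frame_len:
        waveform = np.pad(waveform, (0, frame_len - waveform.shape[0]))
    n_frames = 1 + (waveform.shape[0] - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [waveform[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, 128)
    # Small constant avoids log(0) on silent frames
    return np.log(magnitude + 1e-6).mean(axis=0).astype(np.float32)


# One second of synthetic noise at an assumed 16 kHz sample rate
features = log_spectrum_features(np.random.randn(16000))
print(features.shape)  # (128,)
```

The result can be converted with `torch.from_numpy(features).unsqueeze(0)` to get the `(1, 128)` input shape used in How to Use. Whatever features are chosen, the same extraction must be used for both training and inference.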
License
MIT