Open Voice Detect

A simple binary classification model for voice activity detection (VAD).

Model Description

open-voice-detect is a lightweight feed-forward neural network that classifies a vector of precomputed audio features as voice or no voice. It is a minimal "hello world" implementation showing how to structure a PyTorch model for Hugging Face; the published weights are untrained.

Model Architecture

Input: 128-dimensional feature vector (e.g., MFCCs, mel-spectrogram features)
Hidden layers: 2 fully connected layers with ReLU activation and dropout
Output: 2 classes (voice present / no voice)
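
The layer list above can be sketched as a small PyTorch module. This is an illustrative stand-in, not the repository's `model.py`: the class name `VoiceDetectSketch`, the dropout rate of 0.2, and the width of the second hidden layer (half the first) are all assumptions, since the card does not state them.

```python
import torch
from torch import nn


class VoiceDetectSketch(nn.Module):
    """Hypothetical stand-in for OpenVoiceDetect; not the repository's code."""

    def __init__(self, input_size: int = 128, hidden_size: int = 64, dropout: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropout),
            # Assumption: the second hidden layer is half as wide as the first.
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            # Two logits, ordered as [no voice, voice].
            nn.Linear(hidden_size // 2, 2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

    @torch.no_grad()
    def predict(self, x: torch.Tensor) -> torch.Tensor:
        # Class index per row: 0 = no voice, 1 = voice.
        return self.forward(x).argmax(dim=-1)


model = VoiceDetectSketch().eval()  # eval() disables dropout
batch = torch.randn(4, 128)         # four random 128-dimensional feature vectors
logits = model(batch)               # shape (4, 2)
labels = model.predict(batch)       # shape (4,)
```

Returning raw logits from `forward` and taking the argmax in `predict` keeps the module usable with `nn.CrossEntropyLoss` during training, which expects unnormalized scores.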

Intended Use

This model is a demonstration/starting point for:

Learning how to structure models for Hugging Face
Building voice activity detection systems
Understanding basic audio classification

How to Use

import torch
from model import OpenVoiceDetect

# Build the model and load the checkpoint onto the CPU
model = OpenVoiceDetect(input_size=128, hidden_size=64)
checkpoint = torch.load('pytorch_model.bin', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # disable dropout for inference

# Prepare input of shape (batch_size, 128)
# Random values stand in for real features here; in practice,
# extract a 128-dimensional feature vector from audio
audio_features = torch.randn(1, 128)

# Get the predicted class (0: no voice, 1: voice)
prediction = model.predict(audio_features)
print(f"Voice detected: {prediction.item() == 1}")
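
The usage snippet feeds random values to the model because feature extraction is not part of this repository. As one possible way to get a 128-dimensional vector from raw audio using only PyTorch, the sketch below averages a log-magnitude spectrum over time. This is a swapped-in technique, not MFCCs or mel features, and the function name `spectral_features`, the 16 kHz sample rate, and the STFT settings are all assumptions.

```python
import torch


def spectral_features(waveform: torch.Tensor, n_fft: int = 254, hop_length: int = 128) -> torch.Tensor:
    """Map a mono waveform of shape (num_samples,) to a (1, 128) feature tensor."""
    window = torch.hann_window(n_fft)
    spec = torch.stft(
        waveform,
        n_fft=n_fft,
        hop_length=hop_length,
        window=window,
        return_complex=True,
    )
    # n_fft = 254 yields n_fft // 2 + 1 = 128 frequency bins: (128, num_frames).
    log_mag = torch.log1p(spec.abs())
    # Average over time, then add a batch dimension: (1, 128).
    return log_mag.mean(dim=1).unsqueeze(0)


# One second of synthetic noise at an assumed 16 kHz sample rate.
waveform = torch.randn(16000)
audio_features = spectral_features(waveform)
```

A real voice activity detector would normally classify short frames rather than a whole clip, so in practice the time average would be taken over a sliding window.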

Training Data

This is an untrained demo model; no training data was used. For production use, train it on a labeled dataset that contains both:

Audio samples with voice activity
Audio samples without voice (silence, noise, music)
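
A training run on such a dataset could follow the usual PyTorch pattern. The sketch below is a minimal illustration under stated assumptions: the features and labels are synthetic, the classifier is a hypothetical stand-in with an assumed layer layout, and the optimizer settings are arbitrary. It saves the result under the `model_state_dict` key that the loading code in How to Use reads.

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 256 feature vectors labeled 0 (no voice) or 1 (voice).
features = torch.randn(256, 128)
labels = (features[:, 0] > 0).long()  # toy rule so the loop has something to learn

# Hypothetical stand-in classifier; the real layer widths are not documented.
model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(64, 32), nn.ReLU(), nn.Dropout(0.2),
    nn.Linear(32, 2),
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

model.train()
losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)  # full-batch update
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Store the weights under the key that the loading snippet expects.
checkpoint = {"model_state_dict": model.state_dict()}
path = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
torch.save(checkpoint, path)
```

With real audio, the data would be split into training and validation sets and fed in mini-batches through a `DataLoader` rather than as one full batch.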

Model Details

Model type: Binary classifier
Framework: PyTorch
Parameters: ~10K
Input size: 128 features
Output: Binary classification (0: no voice, 1: voice)
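
The parameter count can be checked by hand: each fully connected layer has one weight per input-output pair plus one bias per output. The card gives only the input size (128), the hidden size (64), and the output size (2), so the three layouts below are assumptions about how the layers might be arranged. The helper `dense_param_count` is illustrative and not part of the repository.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Candidate layouts; none is confirmed by the model card.
single_hidden = dense_param_count([128, 64, 2])       # 8386
tapered_hidden = dense_param_count([128, 64, 32, 2])  # 10402
uniform_hidden = dense_param_count([128, 64, 64, 2])  # 12546
```

All three land within a few thousand of the stated ~10K, with the tapered layout closest.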

Limitations

This is a demonstration model with random weights, so its predictions are not meaningful
Not trained on real data
Requires feature extraction preprocessing (not included)
For production use, train on appropriate voice detection datasets

License

MIT
