Open Voice Detect

A simple binary classification model for voice activity detection (VAD).

Model Description

open-voice-detect is a lightweight neural network that classifies a pre-extracted audio feature vector as voice or no voice. It is a minimal "hello world" implementation demonstrating how to structure a PyTorch model for Hugging Face.

Model Architecture

  • Input: 128-dimensional feature vector (e.g., MFCCs, mel-spectrogram features)
  • Hidden layers: 2 fully connected layers with ReLU activation and dropout
  • Output: 2 classes (voice present / no voice)
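
The architecture above can be sketched roughly as follows. This is not the repository's model.py: the class name VoiceDetectSketch, the dropout rate of 0.3, and the reading of "2 fully connected layers" as two 64-unit hidden layers followed by a separate output layer are all assumptions made for illustration.

```python
import torch
from torch import nn


class VoiceDetectSketch(nn.Module):
    """Hypothetical stand-in for the card's model; not the actual model.py."""

    def __init__(self, input_size: int = 128, hidden_size: int = 64, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, hidden_size),   # hidden layer 1
            nn.ReLU(),
            nn.Dropout(dropout),                  # dropout rate is an assumption
            nn.Linear(hidden_size, hidden_size),  # hidden layer 2
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, 2),            # logits: [no voice, voice]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


sketch = VoiceDetectSketch()
logits = sketch(torch.randn(4, 128))  # batch of 4 random feature vectors
n_params = sum(p.numel() for p in sketch.parameters())
print(logits.shape, n_params)
```

Under these assumptions the sketch has 12,546 parameters, the same order of magnitude as the ~10K listed under Model Details.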

Intended Use

This model is a demonstration/starting point for:

  • Learning how to structure models for Hugging Face
  • Building voice activity detection systems
  • Understanding basic audio classification

How to Use

import torch
from model import OpenVoiceDetect

# Load the model; map_location keeps loading working on CPU-only machines
model = OpenVoiceDetect(input_size=128, hidden_size=64)
checkpoint = torch.load('pytorch_model.bin', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # switch off dropout for inference

# Prepare input of shape (batch_size, 128)
# In practice, extract 128-dimensional features from audio
audio_features = torch.randn(1, 128)  # random placeholder

# Get prediction (1: voice, 0: no voice)
prediction = model.predict(audio_features)
print(f"Voice detected: {prediction.item() == 1}")
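
The loading code above reads the weights from a 'model_state_dict' key, so the checkpoint file is expected to be a dict rather than a bare state dict. A minimal sketch of writing and reading a checkpoint in that layout follows; the nn.Sequential network is a hypothetical stand-in for OpenVoiceDetect, and the file is written to a temporary directory only for the demonstration.

```python
import os
import tempfile

import torch
from torch import nn

# Hypothetical stand-in network; the real class lives in model.py.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pytorch_model.bin")
    # Wrap the weights in a dict under the key the loading code reads.
    torch.save({"model_state_dict": model.state_dict()}, path)
    checkpoint = torch.load(path, map_location="cpu")

# Rebuild the same architecture and restore the saved weights.
restored = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
restored.load_state_dict(checkpoint["model_state_dict"])
model.eval()
restored.eval()

x = torch.randn(1, 128)
with torch.no_grad():
    same = torch.equal(model(x), restored(x))
print(same)
```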

Training Data

This is an untrained demo model. For production use, train it on a labeled dataset that contains both:

  • Audio samples with voice activity
  • Audio samples without voice (silence, noise, music)
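
A training loop for such a dataset might look like the sketch below. The data here is synthetic (random vectors with labels from an arbitrary rule), the network is a hypothetical stand-in without dropout, and the optimizer, learning rate, and step count are illustrative assumptions rather than recommended settings.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic stand-in data: 256 feature vectors of size 128 with made-up labels.
features = torch.randn(256, 128)
labels = (features[:, 0] > 0).long()  # 1 = "voice", 0 = "no voice" (arbitrary rule)

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
model.train()
for step in range(30):
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)  # full-batch forward pass
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

With real audio, the synthetic tensors would be replaced by extracted features and ground-truth voice labels, and a held-out split would be needed to measure accuracy.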

Model Details

  • Model type: Binary classifier
  • Framework: PyTorch
  • Parameters: ~10K
  • Input size: 128 features
  • Output: Binary classification (0: no voice, 1: voice)
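
The 0/1 output encoding can be illustrated with a small sketch that turns two-class logits into probabilities and labels. It assumes, without confirmation from the source, that predict takes the argmax over the two logits; the logit values are made up.

```python
import torch

# Example logits for a batch of 3 inputs; columns are [no voice, voice].
logits = torch.tensor([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5]])

probs = torch.softmax(logits, dim=1)  # each row sums to 1
labels = probs.argmax(dim=1)          # 0: no voice, 1: voice
voice_prob = probs[:, 1]              # probability assigned to "voice"

print(labels.tolist())  # [0, 1, 1]
```

Thresholding voice_prob at a value other than 0.5 is one way to trade missed speech against false alarms.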

Limitations

  • This is a demonstration model with random weights
  • Not trained on real data
  • Requires feature extraction preprocessing (not included)
  • For production use, train on appropriate voice detection datasets
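
Since feature extraction is not included, here is one possible way to obtain a 128-dimensional vector from a raw waveform using only PyTorch. It is a sketch under stated assumptions: the function name log_spectrum_features is hypothetical, the time-averaged log-magnitude spectrum is a swapped-in feature rather than the MFCC or mel features the card mentions, and the 16 kHz sample rate is assumed.

```python
import torch


def log_spectrum_features(waveform: torch.Tensor, n_fft: int = 254,
                          hop_length: int = 128) -> torch.Tensor:
    """Map a mono waveform of shape (num_samples,) to a (1, 128) feature tensor.

    Illustrative only: averages the log-magnitude STFT over time.
    n_fft=254 yields n_fft // 2 + 1 = 128 frequency bins.
    """
    window = torch.hann_window(n_fft)
    spec = torch.stft(waveform, n_fft=n_fft, hop_length=hop_length,
                      window=window, return_complex=True)  # (128, num_frames)
    log_mag = torch.log1p(spec.abs())        # compress the dynamic range
    return log_mag.mean(dim=1).unsqueeze(0)  # average over time -> (1, 128)


# One second of synthetic audio at an assumed 16 kHz sample rate.
waveform = torch.randn(16000)
features = log_spectrum_features(waveform)
print(features.shape)
```

Whatever features are chosen, the same extraction must be used for both training and inference.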

License

MIT
