MiloMusic_YuEGP

Running

App Files Files Community

MiloMusic_YuEGP / README.md

futurespyhi

Complete MiloMusic implementation with voice-to-song generation

658e790 3 months ago

preview code

raw

history blame contribute delete

4.36 kB

	---
	title: MiloMusic - AI Music Generation
	emoji: 🎵
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.25.0
	app_file: app.py
	pinned: false
	python_version: "3.10"
	license: mit
	short_description: AI-powered voice-to-song generation using YuE model
	---

	# MiloMusic 🎵 - Hugging Face Spaces

	[![License: BSD](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)

	## 🦙 AI-Powered Music Creation for Everyone

	MiloMusic is an innovative platform that leverages multiple AI models to democratize music creation. Whether you're a seasoned musician or have zero musical training, MiloMusic enables you to create high-quality, lyrics-focused music through natural language conversation.

	> A platform for everyone - regardless of musical training at the intersection of AI and creative expression.

	## 🚀 Features

	- Natural Language Interface - Just start talking to generate song lyrics
	- Genre & Mood Selection - Customize your music with different genres and moods
	- Iterative Creation Process - Refine your lyrics through conversation
	- High-Quality Music Generation - Transform lyrics into professional-sounding music
	- User-Friendly Interface - Intuitive UI built with Gradio

	## 🔧 Architecture

	MiloMusic employs a sophisticated multi-model pipeline to deliver a seamless music creation experience:

	### Phase 1: Lyrics Generation
	1. Speech-to-Text - User voice input is transcribed using `whisper-large-v3-turbo` (via Groq API)
	2. Conversation & Refinement - `llama-4-scout-17b-16e-instruct` handles the creative conversation, generates lyrics based on user requests, and allows for iterative refinement

	### Phase 2: Music Generation
	1. Lyrics Structuring - `Gemini flash 2.0` processes the conversation history and structures the final lyrics for music generation
	2. Music Synthesis - `YuE` (乐) transforms the structured lyrics into complete songs with vocals and instrumentation

	## 💻 Technical Stack

	- LLM Models:
	- `whisper-large-v3-turbo` (via Groq) - For speech-to-text conversion
	- `llama-4-scout-17b-16e-instruct` - For creative conversation and lyrics generation
	- `Gemini flash 2.0` - For lyrics structuring
	- `YuE` - For music generation
	- UI: Gradio 5.25.0
	- Backend: Python 3.10
	- Deployment: Hugging Face Spaces with GPU support

	## 📋 System Requirements

	- Python: 3.10 (strict requirement for YuE model compatibility)
	- CUDA: 12.4+ for GPU acceleration
	- Memory: 32GB+ RAM for model operations
	- GPU: A10G/T4 or better with 24GB+ VRAM

	## 🔍 Usage

	### Using the Interface:
	1. Select your genre, mood, and theme preferences
	2. Start talking about your song ideas
	3. The assistant will create lyrics based on your selections
	4. Give feedback to refine the lyrics
	5. When you're happy with the lyrics, click "Generate Music from Lyrics"
	6. Listen to your generated song!

	## 🔬 Performance

	Music generation typically takes:
	- GPU-accelerated: ~5-10 minutes per song
	- Quality: Professional-grade vocals and instrumentation
	- Format: High-quality audio output

	## 🛠️ Development Notes

	### Spaces-Specific Configuration:
	- Custom PyTorch build with CUDA 12.4 support
	- Flash Attention compiled from source for optimal performance
	- Specialized audio processing pipeline for cloud deployment

	### Key Components:
	- `requirements_space.txt` - Dependencies with CUDA-specific PyTorch
	- `packages.txt` - System packages for audio and compilation
	- Pre-build flash-attn installation for compatibility

	## 🚨 Important Notes

	- First run may take longer as models are downloaded and cached
	- Flash Attention compilation happens during startup (may take 10-15 minutes on first build)
	- Memory usage is high during music generation - please be patient

	## 🤝 Contributing

	Contributions are welcome! Please feel free to submit a Pull Request to the main repository.

	## 👥 Team

	- Norton Gu
	- Anakin Huang
	- Erik Wasmosy

	## 📝 License

	This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

	---

	<p align="center">
	Made with ❤️ and 🦙 (LLaMA) \| Deployed on 🤗 Hugging Face Spaces
	</p>