Spaces:
Running
Running
| title: MiloMusic - AI Music Generation | |
| emoji: π΅ | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 5.25.0 | |
| app_file: app.py | |
| pinned: false | |
| python_version: "3.10" | |
| license: mit | |
| short_description: AI-powered voice-to-song generation using YuE model | |
| # MiloMusic π΅ - Hugging Face Spaces | |
| [](https://opensource.org/licenses/BSD-3-Clause) | |
| ## π¦ AI-Powered Music Creation for Everyone | |
| MiloMusic is an innovative platform that leverages multiple AI models to democratize music creation. Whether you're a seasoned musician or have zero musical training, MiloMusic enables you to create high-quality, lyrics-focused music through natural language conversation. | |
| > A platform for everyone - regardless of musical training at the intersection of AI and creative expression. | |
| ## π Features | |
| - **Natural Language Interface** - Just start talking to generate song lyrics | |
| - **Genre & Mood Selection** - Customize your music with different genres and moods | |
| - **Iterative Creation Process** - Refine your lyrics through conversation | |
| - **High-Quality Music Generation** - Transform lyrics into professional-sounding music | |
| - **User-Friendly Interface** - Intuitive UI built with Gradio | |
| ## π§ Architecture | |
| MiloMusic employs a sophisticated multi-model pipeline to deliver a seamless music creation experience: | |
| ### Phase 1: Lyrics Generation | |
| 1. **Speech-to-Text** - User voice input is transcribed using `whisper-large-v3-turbo` (via Groq API) | |
| 2. **Conversation & Refinement** - `llama-4-scout-17b-16e-instruct` handles the creative conversation, generates lyrics based on user requests, and allows for iterative refinement | |
| ### Phase 2: Music Generation | |
| 1. **Lyrics Structuring** - `Gemini flash 2.0` processes the conversation history and structures the final lyrics for music generation | |
| 2. **Music Synthesis** - `YuE` (δΉ) transforms the structured lyrics into complete songs with vocals and instrumentation | |
| ## π» Technical Stack | |
| - **LLM Models**: | |
| - `whisper-large-v3-turbo` (via Groq) - For speech-to-text conversion | |
| - `llama-4-scout-17b-16e-instruct` - For creative conversation and lyrics generation | |
| - `Gemini flash 2.0` - For lyrics structuring | |
| - `YuE` - For music generation | |
| - **UI**: Gradio 5.25.0 | |
| - **Backend**: Python 3.10 | |
| - **Deployment**: Hugging Face Spaces with GPU support | |
| ## π System Requirements | |
| - **Python**: 3.10 (strict requirement for YuE model compatibility) | |
| - **CUDA**: 12.4+ for GPU acceleration | |
| - **Memory**: 32GB+ RAM for model operations | |
| - **GPU**: A10G/T4 or better with 24GB+ VRAM | |
| ## π Usage | |
| ### Using the Interface: | |
| 1. Select your genre, mood, and theme preferences | |
| 2. Start talking about your song ideas | |
| 3. The assistant will create lyrics based on your selections | |
| 4. Give feedback to refine the lyrics | |
| 5. When you're happy with the lyrics, click "Generate Music from Lyrics" | |
| 6. Listen to your generated song! | |
| ## π¬ Performance | |
| Music generation typically takes: | |
| - **GPU-accelerated**: ~5-10 minutes per song | |
| - **Quality**: Professional-grade vocals and instrumentation | |
| - **Format**: High-quality audio output | |
| ## π οΈ Development Notes | |
| ### Spaces-Specific Configuration: | |
| - Custom PyTorch build with CUDA 12.4 support | |
| - Flash Attention compiled from source for optimal performance | |
| - Specialized audio processing pipeline for cloud deployment | |
| ### Key Components: | |
| - `requirements_space.txt` - Dependencies with CUDA-specific PyTorch | |
| - `packages.txt` - System packages for audio and compilation | |
| - Pre-build flash-attn installation for compatibility | |
| ## π¨ Important Notes | |
| - **First run may take longer** as models are downloaded and cached | |
| - **Flash Attention compilation** happens during startup (may take 10-15 minutes on first build) | |
| - **Memory usage is high** during music generation - please be patient | |
| ## π€ Contributing | |
| Contributions are welcome! Please feel free to submit a Pull Request to the main repository. | |
| ## π₯ Team | |
| - Norton Gu | |
| - Anakin Huang | |
| - Erik Wasmosy | |
| ## π License | |
| This project is licensed under the BSD 3-Clause License - see the LICENSE file for details. | |
| --- | |
| <p align="center"> | |
| Made with β€οΈ and π¦ (LLaMA) | Deployed on π€ Hugging Face Spaces | |
| </p> |