Apply for community grant: Company project (GPU and storage)

Hugging Face GPU Grant Request

Project Title: Autonomous Multi-Agent Code Generator

Summary

This Hugging Face Space implements an autonomous multi-agent system where specialized roles—Planner, Architect, Coder, Reviewer, Tester, and Publisher—collaborate to transform natural language prompts into complete, functional Python projects. Leveraging open-source models like Qwen3-0.6B and Phi-3-mini exclusively for inference, the system generates code, tests, and documentation without any training or fine-tuning.
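
For illustration, here is a minimal sketch of how such a role-based pipeline can be wired together with Transformers. The role prompts, the `run_agent` helper, and the use of a single shared Qwen3-0.6B backbone are illustrative assumptions, not the Space's actual internals:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Each role is just a system prompt; all agents share one small model.
ROLES = {
    "Planner": "Break the user's request into concrete implementation steps.",
    "Architect": "Design the file layout and module interfaces for the plan.",
    "Coder": "Write Python code for the current step.",
    "Reviewer": "Review the code for bugs and style issues; suggest fixes.",
    "Tester": "Write pytest tests covering the generated code.",
    "Publisher": "Assemble the final project files and a README.",
}

def run_agent(role: str, task: str) -> str:
    """Run one role over the task with a chat-formatted prompt."""
    messages = [
        {"role": "system", "content": ROLES[role]},
        {"role": "user", "content": task},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)

# Pipeline: each agent's output becomes the next agent's input.
artifact = "Build a Flask app that serves a TODO list."
for role in ROLES:
    artifact = run_agent(role, artifact)
```

Chaining agent outputs sequentially keeps memory usage flat: only one model is resident at a time, and each role differs only in its system prompt.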

Motivation: Why GPU Acceleration Is Essential

CPU-only Spaces suffer from severe performance bottlenecks for LLM inference: the 3.8B-parameter Phi-3-mini demands ~10 GB of RAM and incurs multi-minute delays per generation step, making the experience frustratingly slow. A GPU upgrade (T4, A10G, or equivalent) would cut inference to seconds per turn, enabling real-time interactivity and broader adoption. This aligns with Hugging Face's GPU Grant guidelines: the Space performs inference only, with pre-trained models, to deliver free public value to the community.

Why Persistent Storage Is Required

Generated projects include zipped codebases, test suites, logs, and user-specific artifacts. To support concurrent multi-user experimentation without losing outputs when the Space restarts, 20 GB of persistent storage is necessary, enough to accommodate dozens of active sessions while preserving outputs for review and iteration.
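
For concreteness, here is a sketch of how generated artifacts could be written to the Space's persistent volume. On Spaces, persistent storage is mounted at /data; the directory layout and `persist_project` helper below are hypothetical:

```python
import shutil
from pathlib import Path

# On Spaces, only /data survives restarts; everything else is ephemeral.
DATA_DIR = Path("/data/projects")
DATA_DIR.mkdir(parents=True, exist_ok=True)

def persist_project(session_id: str, workdir: str) -> Path:
    """Zip a generated project into persistent storage for later review."""
    archive = shutil.make_archive(str(DATA_DIR / session_id), "zip", workdir)
    return Path(archive)
```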

Models in Use

  • Qwen/Qwen3-0.6B (current; efficient 0.6B dense model for core agent tasks)
  • microsoft/Phi-3-mini-4k-instruct (3.8B; planned integration for enhanced reasoning)
  • Qwen/Qwen2.5-Coder-7B-Instruct (7B; planned for specialized code generation tasks)
  • codellama/CodeLlama-7b-Instruct-hf (7B; planned for advanced syntax handling and autocompletion)
  • mistralai/Mistral-7B-Instruct-v0.3 (7B; planned for versatile instruction-following in agent orchestration)

All models are loaded via Hugging Face Transformers for lightweight, quantized inference.
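
A representative loading pattern (not necessarily the Space's exact code) uses 4-bit bitsandbytes quantization, which keeps the 7B models within a T4/A10G memory budget. Note that this path requires a CUDA GPU, which is part of why the grant matters:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization shrinks a 7B model to roughly 4-5 GB of VRAM.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "Qwen/Qwen2.5-Coder-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # places layers on the GPU when one is available
)
```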

Community Impact and Value

This Space showcases the power of orchestrating compact open-source LLMs for practical software engineering, reducing reliance on costly proprietary APIs. It offers the HF community a free, interactive playground for AI-driven code generation—fostering experimentation, education, and innovation in agentic workflows. Early prototypes have demonstrated viability for tasks like building Flask apps or data pipelines, with potential to inspire similar tools.

Future Enhancements Post-Grant

With GPU resources secured, we plan to expand the system's capabilities for more ambitious use cases:

  • Integrate support for larger open-source models, such as a 20B-parameter GPT-style OSS model (e.g., GPT-NeoX-20B or a successor) or a Qwen2.5-Coder-32B variant, enabling complex, multi-file projects with deeper reasoning.
  • Implement a dynamic model selector in the user interface (see the sketch after this list), allowing seamless switching between small (0.6B–7B) models for quick iterations and medium-sized (up to 34B) options for high-fidelity outputs, democratizing access to scaled intelligence.
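
As a minimal Gradio sketch of the planned selector (the model menu and lazy caching policy below are illustrative assumptions):

```python
import gradio as gr
from transformers import pipeline

# Illustrative menu; the real options would track the available GPU tier.
MODEL_CHOICES = [
    "Qwen/Qwen3-0.6B",
    "microsoft/Phi-3-mini-4k-instruct",
    "Qwen/Qwen2.5-Coder-7B-Instruct",
]
_pipelines = {}

def generate(prompt: str, model_id: str) -> str:
    # Lazily load and cache each pipeline so switching models stays cheap.
    if model_id not in _pipelines:
        _pipelines[model_id] = pipeline(
            "text-generation", model=model_id, device_map="auto"
        )
    result = _pipelines[model_id](prompt, max_new_tokens=256, return_full_text=False)
    return result[0]["generated_text"]

demo = gr.Interface(
    fn=generate,
    inputs=[gr.Textbox(label="Prompt"), gr.Dropdown(MODEL_CHOICES, label="Model")],
    outputs=gr.Textbox(label="Generated code"),
)
demo.launch()
```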

Specific Request

  • GPU Tier: T4 or A10G (adequate for batched inference on these small models)
  • Persistent Storage: 20 GB

@merve Hi, could you please review my request? Thank you.

@hysts Hi, could you please consider this request? Thank you.
