Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!
Key findings from our research on optimal architectures for small language models:
• Depth beats width: 32 layers outperform 12 layers at the same parameter count
• Best-in-class factuality: 47.5% on TruthfulQA
• 10x training efficiency using WSD (Warmup-Stable-Decay) conversion (see the schedule sketch below)
• Canon layers add only 0.13% parameters but improve reasoning
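For readers curious what a WSD schedule looks like, here is a minimal sketch of a Warmup-Stable-Decay learning-rate function. All hyperparameters (peak_lr, warmup_frac, decay_frac, min_lr) are illustrative placeholders, not Dhara-70M's actual settings; the appeal of the flat plateau is that it leaves a checkpoint that can be branched and decayed cheaply for a second training stage such as the diffusion conversion.

```python
import math

def wsd_lr(step, total_steps, peak_lr=3e-4, warmup_frac=0.05,
           decay_frac=0.15, min_lr=3e-5):
    """Warmup-Stable-Decay (WSD) schedule: linear warmup, flat plateau,
    then a short decay at the end. Hyperparameters are illustrative only."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr
        return peak_lr * step / max(warmup_steps, 1)
    if step < decay_start:
        # Stable plateau: constant peak_lr for most of training
        return peak_lr
    # Final decay phase: cosine from peak_lr down to min_lr
    progress = (step - decay_start) / max(total_steps - decay_start, 1)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```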
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
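As a toy illustration of how a 50-30-20 mix can be applied during training, here is a weighted-sampling sketch; the source labels and the sampling helper are hypothetical, since the post does not name the actual corpora or data loaders:

```python
import random

# Hypothetical source labels matching the stated 50-30-20 mix
MIX = {"pdfs": 0.50, "filtered_web": 0.30, "educational": 0.20}

def next_source(rng: random.Random) -> str:
    """Sample which corpus the next training document is drawn from."""
    return rng.choices(list(MIX), weights=list(MIX.values()), k=1)[0]

# Sanity check: counts should land near 5000 / 3000 / 2000
rng = random.Random(0)
counts = {s: 0 for s in MIX}
for _ in range(10_000):
    counts[next_source(rng)] += 1
print(counts)
```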
#Nesso-4B is a fine-tuned version of Qwen-4B, trained on a highly curated and balanced dataset designed specifically for multilingual agentic workflows and conversational use cases.
As shown in the video below, we simulate the new "Cowork" from #Anthropic, without any data sharing, all running on a consumer device. The model can be used to build agentic behavior in #privateAI environments.
Not every problem requires superintelligence: in many cases, intelligence at the edge is more than enough.