SIGGRAPH 2022

non-profit

AI & ML interests

None defined yet.

Recent Activity

akhaliq

submitted a paper to Daily Papers 1 day ago

MultiGen: Level-Design for Editable Multiplayer Worlds in Diffusion Game Engines

Paper • 2603.06679 • Published 6 days ago • 3

akhaliq

submitted a paper to Daily Papers 8 days ago

AVO: Agentic Variation Operators for Autonomous Evolutionary Search

Paper • 2603.24517 • Published 10 days ago • 10

Parveshiiii

posted an update 10 days ago

Post

2861

Just did something I’ve been meaning to try for ages.

In only 3 hours, on 10 billion+ tokens, I trained a custom BPE + tiktoken-style tokenizer using my new library microtok — and it hits the same token efficiency as Qwen3.

Tokenizers have always felt like black magic to me. We drop them into every LLM project, but actually training one from scratch? That always seemed way too complicated.

Turns out it doesn’t have to be.

microtok makes the whole process stupidly simple — literally just 3 lines of code. No heavy setup, no GPU required. I built it on top of the Hugging Face tokenizers library so it stays clean, fast, and actually understandable.

If you’ve ever wanted to look under the hood and build your own optimized vocabulary instead of just copying someone else’s, this is the entry point you’ve been waiting for.

I wrote up the full story, threw in a ready-to-run Colab template, and dropped the trained tokenizer on Hugging Face.

Blog → https://parveshiiii.github.io/blogs/microtok/
Trained tokenizer → Parveshiiii/microtok
GitHub repo → https://github.com/Parveshiiii/microtok
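
For readers who want to look under the hood first, here is a minimal sketch of what BPE training does, in pure Python. It is not microtok's API and not the Hugging Face tokenizers implementation; the function names (`train_bpe`, `merge_pair`, `encode`) and the toy corpus are assumptions made up for illustration, and it splits on whitespace instead of using byte-level pre-tokenization.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with the fused symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from an iterable of strings."""
    # Each whitespace-separated word starts as a tuple of single characters.
    word_counts = Counter(tuple(word) for text in corpus for word in text.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for symbols, count in word_counts.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties go to the lexicographically largest pair.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Rewrite the corpus with the chosen pair fused into one symbol.
        new_counts = Counter()
        for symbols, count in word_counts.items():
            new_counts[merge_pair(symbols, best)] += count
        word_counts = new_counts
    return merges


def encode(word, merges):
    """Tokenize one word by replaying the learned merges in training order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus (hypothetical): word frequencies are low=5, lower=2, newest=6, widest=3.
corpus = ["low " * 5 + "lower " * 2 + "newest " * 6 + "widest " * 3]
merges = train_bpe(corpus, num_merges=4)
print(merges)
print(encode("lowest", merges))
```

A production tokenizer adds byte-level fallback, special tokens, and a much faster merge loop, which is the part a library handles for you.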

akhaliq

submitted a paper to Daily Papers 17 days ago

V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising

Paper • 2603.16792 • Published 18 days ago • 3

akhaliq

submitted a paper to Daily Papers 20 days ago

Multimodal OCR: Parse Anything from Documents

Paper • 2603.13032 • Published 23 days ago • 40

Nymbo

posted an update 20 days ago

Post

6408

We should really have a release date range slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.
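
As a rough sketch of the behavior being asked for: restrict to a release-date window first, then rank by downloads inside that window. This runs on hypothetical in-memory records, not the Hub API; the field names (`created`, `downloads`), the model ids, and the numbers are all made up for illustration.

```python
from datetime import date


def filter_by_release_date(models, start, end):
    """Keep models created within [start, end], then rank them by downloads."""
    in_range = [m for m in models if start <= m["created"] <= end]
    return sorted(in_range, key=lambda m: m["downloads"], reverse=True)


# Hypothetical records; ids and download counts are invented for this example.
models = [
    {"id": "org/old-popular", "created": date(2023, 5, 1), "downloads": 900_000},
    {"id": "org/new-small", "created": date(2026, 2, 10), "downloads": 1_200},
    {"id": "org/new-big", "created": date(2026, 1, 20), "downloads": 45_000},
]

recent = filter_by_release_date(models, date(2026, 1, 1), date(2026, 3, 31))
print([m["id"] for m in recent])
```

With a plain "most downloaded" sort the 2023 model would come first; with the date window applied it drops out and only the recent models are ranked.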

AlekseyKorshuk

authored a paper about 2 months ago

Evaluation of a Robust Control System in Real-World Cable-Driven Parallel Robots

Paper • 2510.08270 • Published Oct 9, 2025 • 2

Parveshiiii

posted an update about 2 months ago

Post

334

Introducing Seekify — a truly non‑rate‑limiting search library for Python

Tired of hitting rate limits when building search features? I’ve built Seekify, a lightweight Python library that lets you perform searches without the usual throttling headaches.

🔹 Key highlights

- Simple API — plug it in and start searching instantly

- No rate‑limiting restrictions

- Designed for developers who need reliable search in projects, scripts, or apps

📦 Available now on PyPI:

pip install seekify

👉 Check out the repo: https://github.com/Parveshiiii/Seekify

I’d love feedback, contributions, and ideas for real‑world use cases. Let’s make search smoother together!

akhaliq

submitted a paper to Daily Papers about 2 months ago

SE-Bench: Benchmarking Self-Evolution with Knowledge Internalization

Paper • 2602.04811 • Published Feb 4 • 2

akhaliq

submitted 2 papers to Daily Papers 2 months ago

Visual Personalization Turing Test

Paper • 2601.22680 • Published Jan 30 • 2

Causal World Modeling for Robot Control

Paper • 2601.21998 • Published Jan 29 • 31

Parveshiiii

posted an update 2 months ago

Post

1634

🚀 Wanna train your own AI Model or Tokenizer from scratch?

Building models isn’t just for big labs anymore — with the right data, compute, and workflow, you can create **custom AI models** and **tokenizers** tailored to any domain. Whether it’s NLP, domain‑specific datasets, or experimental architectures, training from scratch gives you full control over vocabulary, embeddings, and performance.

✨ Why train your own?
- Full control over vocabulary & tokenization
- Domain‑specific optimization (medical, legal, technical, etc.)
- Better performance on niche datasets
- Freedom to experiment with architectures

⚡ The best part?
- Tokenizer training (TikToken / BPE) can be done in **just 3 lines of code**.
- Model training runs smoothly on **Google Colab notebooks** — no expensive hardware required.

📂 Try out my work:
- 🔗 https://github.com/OE-Void/Tokenizer-from_scratch
- 🔗 https://github.com/OE-Void/GPT

akhaliq

submitted a paper to Daily Papers 2 months ago

Motion 3-to-4: 3D Motion Reconstruction for 4D Synthesis

Paper • 2601.14253 • Published Jan 20 • 10

Parveshiiii

posted an update 2 months ago

Post

256

📢 The Announcement
Subject: XenArcAI is now Modotte – A New Chapter Begins! 🚀

Hello everyone,

We are thrilled to announce that XenArcAI is officially rebranding to Modotte!

Since our journey began, we’ve been committed to pushing the boundaries of AI through open-source innovation, research, and high-quality datasets. As we continue to evolve, we wanted a name that better represents our vision for a modern, interconnected future in the tech space.

What is changing?

The Name: Moving forward, all our projects, models, and community interactions will happen under the Modotte banner.

The Look: You’ll see our new logo and a fresh color palette appearing across our platforms.

What is staying the same?

The Core Team: It’s still the same people behind the scenes, including our founder, Parvesh Rawal.

Our Mission: We remain dedicated to releasing state-of-the-art open-source models and datasets.

Our Continuity: All existing models, datasets, and projects will remain exactly as they are—just with a new home.

This isn’t just a change in appearance; it’s a commitment to our next chapter of growth and discovery. We are so grateful for your ongoing support as we step into this new era.

Welcome to the future. Welcome to Modotte.

Best regards, The Modotte Team

akhaliq

submitted 3 papers to Daily Papers 3 months ago

V-DPM: 4D Video Reconstruction with Dynamic Point Maps

Paper • 2601.09499 • Published Jan 14 • 11

UM-Text: A Unified Multimodal Model for Image Understanding

Paper • 2601.08321 • Published Jan 13 • 12

ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation

Paper • 2601.03955 • Published Jan 7 • 3

Nymbo

posted an update 3 months ago

Post

2711

Genuine recommendation: You should really use this AutoHotKey macro. Save the file as macros.ahk and run it. Before sending a prompt to your coding agent, press Ctrl + Alt + 1 and paste your prompt to any regular chatbot. Then send the output to the agent. This is the actual, boring, real way to "10x your prompting". Use the other number keys to avoid repeating yourself over and over again. I use this macro prolly 100-200 times per day. AutoHotKey isn't as new or hype as a lot of other workflows, but there's a reason it's still widely used after 17 years. Don't overcomplicate it.

; Requires AutoHotkey v1.1+

; All macros are `Ctrl + Alt + <variable>`

^!1::
    Send, Please help me more clearly articulate what I mean with this message (write the message in a code block):
return

^!2::
    Send, Please make the following changes:
return

^!3::
    Send, It seems you got cut off by the maximum response limit. Please continue by picking up where you left off.
return

In my experience the past few months, Ctrl + Alt + 1 works best with Instruct models (non-thinking). Reasoning causes some models to ramble and miss the point. I've just been using GPT-5.x for this.

AlekseyKorshuk

authored a paper 3 months ago

Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia

Paper • 2512.03318 • Published Dec 3, 2025 • 4

akhaliq

submitted a paper to Daily Papers 3 months ago

FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation

Paper • 2512.24724 • Published Dec 31, 2025 • 9

Team members 100

SIGGRAPH2022's activity