

NG

SirRa1zel


AI & ML interests

Text-to-Speech, Translation, Object Detection

Recent Activity

liked a model 10 days ago

YatharthS/MiraTTS

liked a Space 11 days ago

ACE-Step/ACE-Step

liked a model 25 days ago

onnx-community/Kokoro-82M-ONNX


Organizations

None yet

upvoted a collection 26 days ago

faster-whisper

faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models. • 15 items • Updated Jul 4, 2025 • 10

upvoted a collection about 1 month ago

PP-OCRv5

PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15, 2025 • 50

upvoted a paper about 2 months ago

MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation

Paper • 2511.09611 • Published Nov 12, 2025 • 69

upvoted 2 collections 7 months ago

Common Pile v0.1 Filtered Data

An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated Jun 6, 2025 • 20

Stable Diffusion 3.5

6 items • Updated Jan 9, 2025 • 177

upvoted a paper 7 months ago

Efficient Part-level 3D Object Generation via Dual Volume Packing

Paper • 2506.09980 • Published Jun 11, 2025 • 7

upvoted 2 collections 8 months ago

LLaMA-Omni

13 items • Updated May 17, 2025 • 19

Voila

Voila: Voice-Language Foundation Models. https://voila.maitrix.org • 7 items • Updated May 6, 2025 • 24

upvoted 2 papers 8 months ago

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published May 5, 2025 • 85

PixelHacker: Image Inpainting with Structural and Semantic Consistency

Paper • 2504.20438 • Published Apr 29, 2025 • 43

upvoted a collection 9 months ago

Orpheus Multilingual Research Release

Beta Release of multilingual models. • 12 items • Updated Apr 10, 2025 • 109

upvoted a paper 9 months ago

TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes

Paper • 2503.23461 • Published Mar 30, 2025 • 94

upvoted 4 papers 10 months ago

Long-Video Audio Synthesis with Multi-Agent Collaboration

Paper • 2503.10719 • Published Mar 13, 2025 • 9

TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

Paper • 2502.19400 • Published Feb 26, 2025 • 47

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Paper • 2502.18364 • Published Feb 25, 2025 • 36

MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use

Paper • 2502.15872 • Published Feb 21, 2025 • 5

upvoted an article 11 months ago

Open-source DeepResearch – Freeing our search agents

Article • Feb 4, 2025 • 1.31k

upvoted a collection 11 months ago

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 11 items • Updated 6 days ago • 549

upvoted 2 papers 12 months ago

FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces

Paper • 2501.12909 • Published Jan 22, 2025 • 74

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21, 2025 • 84