nltpt (nltpt)

lysandre

posted an update 4 months ago

We're kick-starting work on Transformers v5 with @ArthurZ and @cyrilvallez!

v5 should be significant: we're using it as a milestone for performance optimizations, saner defaults, and a much cleaner code base worthy of 2025.

Fun fact: v4.0.0-rc-1 came out on Nov 19, 2020, nearly five years ago!

ankits0052

authored a paper 4 months ago

MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers

Paper • 2508.20453 • Published Aug 28, 2025 • 63

Xenova

posted an update 4 months ago

Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! 😍

How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold

... et voilà! 🥳

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
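The four steps above can be sketched in a few lines. This is a minimal, illustrative Python sketch rather than the demo's actual Transformers.js code: the toy 3-dimensional vectors stand in for cached DINOv3 patch features, `highlight_patches` and the `0.6` threshold are hypothetical, and taking the maximum similarity over all selected patches is one assumed way to combine selections made across frames.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def highlight_patches(frame_features, selected, threshold=0.6):
    """Return indices of patches whose best similarity to any selected
    embedding is above the threshold (hypothetical helper)."""
    if not selected:
        return []
    highlighted = []
    for index, patch in enumerate(frame_features):
        # Assumption: a patch matches if it is close to ANY selected embedding.
        best = max(cosine_similarity(patch, ref) for ref in selected)
        if best > threshold:
            highlighted.append(index)
    return highlighted


# Step 1: cached per-frame patch features (toy values, not real model output).
frames = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 0.2, 0.0], [0.8, 0.0, 0.1]],
]

# Step 2: embeddings of the selected patch(es); here, patch 0 of frame 0.
selected = [frames[0][0]]

# Steps 3 and 4: score every patch in every frame and keep those above the threshold.
masks = [highlight_patches(frame, selected) for frame in frames]
print(masks)
```

Appending an embedding from a later frame to `selected` is how a cross-frame selection would widen the match in this sketch.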

Excited to see what the community builds with it!

Xenova

posted an update 5 months ago

The next generation of AI-powered websites is going to be WILD! 🤯

In-browser tool calling & MCP is finally here, allowing LLMs to interact with websites programmatically.

To show what's possible, I built a demo using Liquid AI's new LFM2 model, powered by 🤗 Transformers.js: LiquidAI/LFM2-WebGPU

As always, the demo is open source (you can find the code under the "Files" tab), so I'm excited to see how the community builds upon this! 🚀

ankits0052

authored 14 papers 5 months ago

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

Paper • 2310.04445 • Published Oct 2, 2023

Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks

Paper • 2309.17002 • Published Sep 29, 2023 • 1

Enhancing Retrieval for ESGLLM via ESG-CID -- A Disclosure Content Index Finetuning Dataset for Mapping GRI and ESRS

Paper • 2503.10674 • Published Mar 10, 2025 • 1

Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings

Paper • 2506.20609 • Published Jun 25, 2025

ProRefine: Inference-time Prompt Refinement with Textual Feedback

Paper • 2506.05305 • Published Jun 5, 2025 • 1

Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection

Paper • 2410.03904 • Published Oct 4, 2024

Automatic Dataset Construction (ADC): Sample Collection, Data Curation, and Beyond

Paper • 2408.11338 • Published Aug 21, 2024

Harnessing Business and Media Insights with Large Language Models

Paper • 2406.06559 • Published Jun 2, 2024

Audio-visual fine-tuning of audio-only ASR models

Paper • 2312.09369 • Published Dec 14, 2023

Psychoacoustic Challenges Of Speech Enhancement On VoIP Platforms

Paper • 2310.07161 • Published Oct 11, 2023 • 1

Xenova

posted an update 5 months ago

Introducing Voxtral WebGPU: State-of-the-art audio transcription directly in your browser! 🤯
🗣️ Transcribe videos, meeting notes, songs and more
🔐 Runs on-device, meaning no data is sent to a server
🌎 Multilingual (8 languages)
🤗 Completely free (forever) & open source

That's right, we're running Mistral's new Voxtral-Mini-3B model 100% locally in-browser on WebGPU, powered by Transformers.js and ONNX Runtime Web! 🔥

Try it out yourself! 👇
webml-community/Voxtral-WebGPU

Xenova

posted an update 7 months ago

NEW: Real-time conversational AI models can now run 100% locally in your browser! 🤯

🔐 Privacy by design (no data leaves your device)
💰 Completely free... forever
📦 Zero installation required, just visit a website
⚡️ Blazingly-fast WebGPU-accelerated inference

Try it out: webml-community/conversational-webgpu

For those interested, here's how it works:
- Silero VAD for voice activity detection
- Whisper for speech recognition
- SmolLM2-1.7B for text generation
- Kokoro for text-to-speech

Powered by Transformers.js and ONNX Runtime Web! 🤗 I hope you like it!
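The four stages listed above chain together roughly as follows. This is a hedged Python sketch of the control flow only, not the demo's implementation: `VoicePipeline`, `handle_chunk`, and the toy lambdas are hypothetical stand-ins for Silero VAD, Whisper, SmolLM2-1.7B, and Kokoro, and a real-time system would presumably also stream audio and handle interruptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VoicePipeline:
    """Hypothetical container wiring the four stages together."""
    detect_speech: Callable[[List[float]], bool]   # voice activity detection
    transcribe: Callable[[List[float]], str]       # speech recognition
    generate_reply: Callable[[str], str]           # text generation
    synthesize: Callable[[str], List[float]]       # text-to-speech

    def handle_chunk(self, audio: List[float]) -> Optional[List[float]]:
        """Run one audio chunk through the pipeline; return None for silence."""
        if not self.detect_speech(audio):
            return None
        text = self.transcribe(audio)
        reply = self.generate_reply(text)
        return self.synthesize(reply)


# Toy stand-ins so the sketch runs without any models.
pipeline = VoicePipeline(
    detect_speech=lambda audio: max((abs(s) for s in audio), default=0.0) > 0.1,
    transcribe=lambda audio: "hello",
    generate_reply=lambda text: f"You said: {text}",
    synthesize=lambda text: [0.0] * len(text),  # one fake sample per character
)

silent = pipeline.handle_chunk([0.0, 0.01])       # below the toy VAD threshold
spoken = pipeline.handle_chunk([0.0, 0.5, -0.3])  # passes VAD, runs all stages
```

Gating on voice activity first means the heavier recognition and generation stages only run when someone is actually speaking.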
