MIT-SLS/USAD-Small


Add Github repo #1

by nielsr (HF Staff), opened Jun 26, 2025

base: refs/heads/main ← from: refs/pr/1


README.md +14 -11

@@ -1,13 +1,4 @@
 ---
-license: bsd-3-clause
-pipeline_tag: feature-extraction
-tags:
-- automatic-speech-recognition
-- audio-classification
-- audio
-- speech
-- music
-library_name: transformers
 datasets:
 - openslr/librispeech_asr
 - facebook/multilingual_librispeech
@@ -17,7 +8,19 @@ datasets:
 - agkphysics/AudioSet
 language:
 - en
+library_name: transformers
+license: bsd-3-clause
+pipeline_tag: feature-extraction
+tags:
+- automatic-speech-recognition
+- audio-classification
+- audio
+- speech
+- music
+- distillation
+- audio-representation
 ---
 # USAD: Universal Speech and Audio Representation via Distillation
 **Universal Speech and Audio Distillation (USAD)** is a unified **speech**, **sound**, and **music** encoder distilled from domain-specific teachers.
@@ -25,6 +28,7 @@ Trained on 126k hours of mixed data, USAD delivers competitive performance acros
 [👀 **Read Full Paper**](https://arxiv.org/abs/2506.18843)
+Code: [https://github.com/MIT-SLS/universal_audio_representation](https://github.com/MIT-SLS/universal_audio_representation)
 ---
 ## 🗂️ Models
@@ -39,7 +43,6 @@ USAD models are all transformer encoders operating at **50Hz frame rate**. The t
 ---
 ## 🚀 How To Use
 **Installation**
@@ -89,4 +92,4 @@ See [usad_model.py](https://huggingface.co/MIT-SLS/USAD-Small/blob/main/usad_mod
 ## 🙏 Acknowledgement
-Our implementation is based on the awesome [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq), [cwx-worst-one/EAT](https://github.com/cwx-worst-one/EAT), and [sooftware/conformer](https://github.com/sooftware/conformer) repositories.
+Our implementation is based on the awesome [facebookresearch/fairseq](https://github.com/facebookresearch/fairseq), [cwx-worst-one/EAT](https://github.com/cwx-worst-one/EAT), and [sooftware/conformer](https://github.com/sooftware/conformer) repositories.
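In the new front matter, the top-level keys appear in alphabetical order (`datasets`, `language`, `library_name`, `license`, `pipeline_tag`, `tags`), and two tags (`distillation`, `audio-representation`) are appended. That pattern is consistent with the metadata having been re-serialized with sorted keys, although the PR does not say how it was produced. The sketch below reproduces the reordering using only the standard library. `parse_front_matter` and `dump_front_matter` are hypothetical helpers, not part of the USAD code or of `huggingface_hub`, and they handle only the flat `key: value` / `- item` subset of YAML used in this card. The `OLD` string contains only the dataset entries visible in the diff; the lines hidden between hunks are left out.

```python
from __future__ import annotations

# Hypothetical helpers (not part of the USAD repo or huggingface_hub). They
# handle only the flat YAML subset seen in this diff: `key: value` scalars and
# `key:` followed by `- item` lists. Use a real YAML library for anything richer.


def parse_front_matter(text: str) -> dict[str, str | list[str]]:
    """Parse front matter delimited by '---' lines into a dict."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    if len(lines) < 2 or lines[0] != "---" or lines[-1] != "---":
        raise ValueError("front matter must start and end with a '---' line")
    data: dict[str, str | list[str]] = {}
    current_list: list[str] | None = None
    for line in lines[1:-1]:
        if not line:
            continue  # skip blank lines
        if line.startswith("- "):
            if current_list is None:
                raise ValueError(f"list item without a parent key: {line!r}")
            current_list.append(line[2:].strip())
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            data[key] = value  # scalar, e.g. `license: bsd-3-clause`
            current_list = None
        else:
            current_list = []  # list header, e.g. `tags:`
            data[key] = current_list
    return data


def dump_front_matter(data: dict[str, str | list[str]]) -> str:
    """Serialize back to front matter, emitting top-level keys in sorted order."""
    out = ["---"]
    for key in sorted(data):
        value = data[key]
        if isinstance(value, list):
            out.append(f"{key}:")
            out.extend(f"- {item}" for item in value)
        else:
            out.append(f"{key}: {value}")
    out.append("---")
    return "\n".join(out)


# Old front matter as shown in the diff (only the visible dataset entries).
OLD = """\
---
license: bsd-3-clause
pipeline_tag: feature-extraction
tags:
- automatic-speech-recognition
- audio-classification
- audio
- speech
- music
library_name: transformers
datasets:
- openslr/librispeech_asr
- facebook/multilingual_librispeech
- agkphysics/AudioSet
language:
- en
---
"""

card = parse_front_matter(OLD)
tags = card["tags"]
assert isinstance(tags, list)
tags.extend(["distillation", "audio-representation"])  # tags added by this PR
new_text = dump_front_matter(card)
print(new_text)
```

YAML mappings are unordered, so the reordering by itself should not change how the metadata is interpreted; the substantive metadata change in this PR is the two added tags, alongside the new Code link in the card body.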