eustlb HF Staff committed on
Commit 739f0c3 · 1 Parent(s): 5f47f66

update readme

Files changed (1)
  1. README.md +73 -1
README.md CHANGED
@@ -50,4 +50,76 @@ Notes:

`GLM-ASR-Nano-2512` can be easily integrated using the `transformers` library.
We will support `transformers 5.x` as well as inference frameworks such as `vLLM` and `SGLang`.
You can find more example code on [GitHub](https://github.com/zai-org/GLM-ASR).
### Transformers 🤗

Install `transformers` from source:
```bash
pip install git+https://github.com/huggingface/transformers
```
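
Once installed, you can check which version was picked up (a source install reports a `.dev` version string):

```python
import transformers

# A source install of transformers reports a dev version, e.g. "x.y.z.dev0"
print(transformers.__version__)
```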

#### Basic Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoProcessor

processor = AutoProcessor.from_pretrained("zai-org/GLM-ASR-Nano-2512")
model = AutoModelForSeq2SeqLM.from_pretrained("zai-org/GLM-ASR-Nano-2512", dtype="auto", device_map="auto")

# Build generation-ready inputs directly from an audio URL
inputs = processor.apply_transcription_request("https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/bcn_weather.mp3")

# Move the inputs to the model's device and match its dtype
inputs = inputs.to(model.device, dtype=model.dtype)
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=500)

# Slice off the prompt so only the newly generated transcription is decoded
decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(decoded_outputs)
```
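
Greedy decoding (`do_sample=False`) keeps the transcription deterministic; raise `max_new_tokens` if long recordings come back truncated.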

#### Using Audio Arrays Directly

You can also use audio arrays directly:

```python
from transformers import GlmAsrForConditionalGeneration, AutoProcessor
from datasets import Audio, load_dataset

processor = AutoProcessor.from_pretrained("zai-org/GLM-ASR-Nano-2512")
model = GlmAsrForConditionalGeneration.from_pretrained("zai-org/GLM-ASR-Nano-2512", dtype="auto", device_map="auto")

# Load an audio sample from a dataset, resampled to the rate the feature extractor expects
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
ds = ds.cast_column("audio", Audio(sampling_rate=processor.feature_extractor.sampling_rate))
audio_array = ds[0]["audio"]["array"]

inputs = processor.apply_transcription_request(audio_array)

inputs = inputs.to(model.device, dtype=model.dtype)
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=500)

decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(decoded_outputs)
```
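
If your audio lives in a local file, any loader that yields a mono float array at the processor's sampling rate should work. A minimal sketch using `librosa` (an extra dependency this README does not name; `sample.wav` is a placeholder path):

```python
import librosa

from transformers import GlmAsrForConditionalGeneration, AutoProcessor

processor = AutoProcessor.from_pretrained("zai-org/GLM-ASR-Nano-2512")
model = GlmAsrForConditionalGeneration.from_pretrained("zai-org/GLM-ASR-Nano-2512", dtype="auto", device_map="auto")

# librosa resamples to the rate the feature extractor expects ("sample.wav" is a placeholder)
audio_array, _ = librosa.load("sample.wav", sr=processor.feature_extractor.sampling_rate, mono=True)

inputs = processor.apply_transcription_request(audio_array)
inputs = inputs.to(model.device, dtype=model.dtype)
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=500)
print(processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True))
```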

#### Batched Inference

You can process multiple audio files at once:

```python
from transformers import GlmAsrForConditionalGeneration, AutoProcessor

processor = AutoProcessor.from_pretrained("zai-org/GLM-ASR-Nano-2512")
model = GlmAsrForConditionalGeneration.from_pretrained("zai-org/GLM-ASR-Nano-2512", dtype="auto", device_map="auto")

# Pass a list of audio URLs to transcribe them in a single batch
inputs = processor.apply_transcription_request([
    "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/bcn_weather.mp3",
    "https://huggingface.co/datasets/hf-internal-testing/dummy-audio-samples/resolve/main/obama.mp3",
])

inputs = inputs.to(model.device, dtype=model.dtype)
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=500)

decoded_outputs = processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(decoded_outputs)
```
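
The same batching should apply to in-memory arrays. A speculative sketch, assuming `apply_transcription_request` accepts a list of arrays the same way it accepts a list of URLs (not confirmed by this README):

```python
from transformers import GlmAsrForConditionalGeneration, AutoProcessor
from datasets import Audio, load_dataset

processor = AutoProcessor.from_pretrained("zai-org/GLM-ASR-Nano-2512")
model = GlmAsrForConditionalGeneration.from_pretrained("zai-org/GLM-ASR-Nano-2512", dtype="auto", device_map="auto")

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
ds = ds.cast_column("audio", Audio(sampling_rate=processor.feature_extractor.sampling_rate))

# Assumption: a list of arrays batches like the list of URLs above
inputs = processor.apply_transcription_request([ds[0]["audio"]["array"], ds[1]["audio"]["array"]])
inputs = inputs.to(model.device, dtype=model.dtype)
outputs = model.generate(**inputs, do_sample=False, max_new_tokens=500)
print(processor.batch_decode(outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True))
```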