List also RTF on CPU

#19
by csukuangfj - opened

The current table lists RTF for GPUs.

Is there a plan to list also RTF for models running on CPU?

Hugging Face for Audio org

Hi @csukuangfj , there is no plan for this at the moment.

Even the current RTFx computed on GPU should be taken with a grain of salt, as it could change depending on the compute environment. As stated in our preprint (Section 3):

"All evaluation scripts, as described in Section 2.5, were conducted on an NVIDIA A100-SXM4-80GB GPU (driver 560.28.03, CUDA 12.6), using a batch size of 64 whenever memory allowed, and reduced adaptively (48, 32, 16, . . . ) when necessary to fit in device memory."

Moreover, CPU results could similarly depend a lot on the compute environment.

Nevertheless, the RTFx values on the leaderboard are useful for relative comparisons between models under fixed conditions, rather than as absolute performance claims across arbitrary hardware.

Thank you for your reply.

"CPU results could similarly depend a lot on the compute environment."

I agree—CPU performance can indeed vary significantly based on the hardware and environment. That said, it would still be very helpful to report how the models perform on any CPU you happen to use. Even relative RTF measurements on CPU can offer valuable insights for comparison.

As long as you clearly specify the CPU model or configuration used for testing, that’s sufficient and greatly appreciated.

"All evaluation scripts, as described in Section 2.5, were conducted on an NVIDIA A100-SXM4-80GB GPU (driver 560.28.03, CUDA 12.6), using a batch size of 64 whenever memory allowed,"

If you'd like to test the RTF on CPU, it should be quite straightforward to do so in your current environment. Simply set the device to CPU instead of CUDA—for example, if you're using PyTorch, replace cuda:0 with cpu, re-run the test, record the results, and mention the specific CPU you used. I’m not sure what’s preventing you from running this experiment—it seems like a simple and informative addition.
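A minimal sketch of that switch, assuming a benchmark script that takes the device as a command-line flag. The `--device` flag and the `describe_host` helper are hypothetical names, not part of the leaderboard's scripts. In PyTorch the resulting string would typically be handed to something like `model.to(device)`.

```python
import argparse
import os
import platform


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: choose where the model runs."""
    parser = argparse.ArgumentParser(description="RTF benchmark (sketch)")
    parser.add_argument(
        "--device",
        default="cpu",
        help='device string, e.g. "cpu" or "cuda:0"',
    )
    return parser


def describe_host() -> dict:
    """Collect basic host details to report next to the timing numbers."""
    return {
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be empty on some systems
        "logical_cpus": os.cpu_count(),
        "python": platform.python_version(),
    }


# Example: parse an explicit argument list instead of the real command line.
args = build_parser().parse_args(["--device", "cpu"])
print(f"device: {args.device}")
for key, value in describe_host().items():
    print(f"{key}: {value}")
```

Reporting the host details together with the device string covers the "mention the specific CPU you used" part. `platform.processor()` can return an empty string on some systems, so the exact CPU model may still need to be filled in by hand.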

Hugging Face for Audio org

We'll look into it in the new year. There are other requests for columns and we don't want the leaderboard to be too bloated.

However, you're right, it should be relatively straightforward. The only issue is that the current scripts would be too time-consuming, since they compute RTFx alongside metrics such as WER over the whole dataset. We could instead have a dedicated script that computes RTFx over just a few audio samples, with a toggle for the hardware device. If you're able to open a PR with such a dedicated script, that would greatly help!
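As a rough sketch of what such a dedicated script could look like, assuming RTFx is taken as total audio duration divided by total processing time. The `measure_rtfx` function and the `transcribe` callable are hypothetical placeholders, not the leaderboard's code. A real script would wrap an actual model loaded on the chosen device.

```python
import time
from typing import Callable, Sequence


def measure_rtfx(
    transcribe: Callable[[object], object],
    clips: Sequence[object],
    durations_s: Sequence[float],
    warmup: int = 1,
) -> float:
    """Return total audio seconds divided by total processing seconds."""
    if len(clips) != len(durations_s):
        raise ValueError("clips and durations_s must have the same length")
    if not clips:
        raise ValueError("need at least one clip")

    # Warm-up runs are excluded from timing (caches, lazy initialisation).
    for clip in clips[:warmup]:
        transcribe(clip)

    # Note: on a GPU, pending asynchronous work would need to be
    # synchronised before reading the clock; this sketch does not do that.
    start = time.perf_counter()
    for clip in clips:
        transcribe(clip)
    elapsed = time.perf_counter() - start

    if elapsed <= 0:
        raise RuntimeError("elapsed time too small to measure")
    return sum(durations_s) / elapsed


def fake_transcribe(clip: object) -> str:
    """Stand-in for a real model call; it only sleeps briefly."""
    time.sleep(0.01)
    return ""


# Two placeholder clips, each assumed to be 5 seconds long.
rtfx = measure_rtfx(fake_transcribe, ["clip_a", "clip_b"], [5.0, 5.0])
print(f"RTFx: {rtfx:.1f}")
```

Because the timing loop is separate from any WER computation, a handful of clips is enough to get a number, and the same function can be called once per device.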
