(331/365)

The idea that

people might enjoy listening to a like approach of evaluating various

has been brought upon me.

To keep everything in the with the power of the goal is to publish the content onto .

(332/365)

The main way that I evaluate specifically is by taking a after each chapter, section, or idea.

I have noticed that with books, I can easily listen to them at twice the speed. On the other hand, books need to be listened to at normal speed.

(333/365)

These recordings then would be transformed to text using . That way the private information can be removed without spending too much time editing the audio files directly.

Further, additional information can be easily added. This would then be converted back to audio using . Before being finally published on .

(362/365)

Originally wanted to use to transcribe the . Initially tried it out over and its ‘Speech Recognition’ tool.

This took quite a while to set up, since it is not clear what kind of file format, if any, the VOSK model should have. Additionally, the recommended virtual environment setup didn’t work as expected, so I went with the global approach.

Finally scrapped the whole approach after realizing that transcribing a 26 min audio clip was taking longer than 10 min.

(363/365)

Instead opted in to using which also works with .

Although Whisper is originally written in there is a project that makes transcribing very fast. It took less than 2min to transcribe the 26 min audio clip.

github.com/ggerganov/whisper.c
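
To put the two timings side by side, here is a small sketch of the speed relative to real time (audio minutes divided by processing minutes). The function name `rtf` is made up for illustration, and the inputs are only the rough bounds mentioned in this thread (longer than 10 min, less than 2 min), so the results are bounds as well.

```shell
# Hypothetical helper: how many minutes of audio are processed per minute of compute.
rtf() {
    awk -v audio="$1" -v took="$2" 'BEGIN { printf "%.1f\n", audio / took }'
}

rtf 26 10   # first attempt: at most 2.6x real time
rtf 26 2    # whisper.c: at least 13.0x real time
```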

(364/365)

The ‘Quick Start’ section in the Readme sufficed for setting up.

The only thing I had to change in the ./models/download-ggml-model.sh script (1) was to remove the option --show-progress on line 105. It seems that GNU Wget2 2.1.0 doesn’t have that option.

Alternatively, one can replace the option with:

--progress=bar --force-progress
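
Either change can be made by hand, but a minimal sketch of doing it with sed could look like this. The helper name `patch_wget_flag` is an assumption for illustration, not something from the project, and it assumes the flag appears in the script literally as --show-progress. A backup copy with the suffix .bak is kept next to the original.

```shell
# Hypothetical helper: swap --show-progress for a replacement (or nothing) in a script.
# $1 = path to the script, $2 = optional replacement text (must not contain "/" or "&")
patch_wget_flag() {
    script=$1
    replacement=${2-}
    # -i.bak edits the file in place and keeps the original as "$script.bak"
    sed -i.bak "s/--show-progress/${replacement}/g" "$script"
}

# Usage (run from the whisper.c checkout):
# patch_wget_flag ./models/download-ggml-model.sh                                      # drop the option
# patch_wget_flag ./models/download-ggml-model.sh "--progress=bar --force-progress"   # replace it
```

If the download still fails afterwards, restoring the .bak file brings back the unmodified script.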

(365/365)

The only caveat of the project is that it only works with 16-bit WAV files.

There is a script on how to do it via the

ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
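
For more than one recording, the same command can be wrapped in a loop. This is only a sketch: the helper names `wav_name` and `to_wav16` and the -16k.wav suffix are my own assumptions, and it expects every input file to have an extension.

```shell
# Hypothetical helper: derive the output name, e.g. input.mp3 -> input-16k.wav
wav_name() {
    printf '%s\n' "${1%.*}-16k.wav"
}

# Hypothetical helper: convert each given file with the ffmpeg options from above.
to_wav16() {
    for in_file in "$@"; do
        # -ar 16000: 16 kHz sample rate, -ac 1: mono,
        # -c:a pcm_s16le: signed 16-bit little-endian PCM
        ffmpeg -i "$in_file" -ar 16000 -ac 1 -c:a pcm_s16le "$(wav_name "$in_file")"
    done
}

# Usage:
# to_wav16 input.mp3 another.m4a
```

The suffix keeps the output from colliding with an input that is already a .wav file.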
