**barefootstache** @barefootstache@qoto.org · Jul 02, 2024, 21:09

#DailyBloggingChallenge (331/365)

The idea that

> people might enjoy listening to a #podcast like approach of evaluating various #books

has been suggested to me.

To keep everything in the #Fediverse with the power of #ActivityPub, the goal is to publish the content on #FunkWhale.

**barefootstache** @barefootstache@qoto.org · Jul 02, 2024, 21:13

#DailyBloggingChallenge (332/365)

The main way I evaluate #books, specifically #AudioBooks, is by taking a #VoiceRecording after each chapter, section, or idea.

I have noticed that with #NonFiction books, I can easily listen to them at twice the speed. On the other hand, #fiction books need to be listened to at normal speed.

**barefootstache** @barefootstache@qoto.org · Jul 02, 2024, 21:18

#DailyBloggingChallenge (333/365)

These recordings would then be transcribed to text using #SpeechToText. That way, private information can be removed without spending too much time editing the audio files directly.

Further, additional information can easily be added. The text would then be converted back to audio using #TextToSpeech before finally being published on #FunkWhale.
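
The redaction step on the transcript can be sketched roughly as follows. This is only an illustration under my own assumptions: the transcript is plain text on stdin, the private terms sit one per line in a file, and both the function name `redact_transcript` and the `[redacted]` marker are hypothetical, not part of any tool mentioned here.

```shell
# redact_transcript TERMS_FILE
# Reads a transcript on stdin and replaces every literal occurrence of each
# term listed in TERMS_FILE (one per line) with "[redacted]".
# Hypothetical helper for illustration; terms are matched literally, not as regex.
redact_transcript() {
  awk -v terms_file="$1" '
    BEGIN {
      # Load the non-empty terms into an array.
      while ((getline t < terms_file) > 0)
        if (t != "") terms[++n] = t
    }
    {
      line = $0
      for (i = 1; i <= n; i++) {
        out = ""
        # index/substr keeps the match literal, so dots or dashes are safe.
        while ((p = index(line, terms[i])) > 0) {
          out = out substr(line, 1, p - 1) "[redacted]"
          line = substr(line, p + length(terms[i]))
        }
        line = out line
      }
      print line
    }'
}
```

Usage would look like `redact_transcript private-terms.txt < chapter1.txt > chapter1.clean.txt`, where both file names are placeholders.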

**barefootstache** @barefootstache@qoto.org · Aug 07, 2024, 08:18

#DailyBloggingChallenge (362/365)

Originally I wanted to use #VOSK for the #SpeechToText step. I first tried it out via #KdenLive and its 'Speech Recognition' tool.

This took quite a while to set up, since it is not clear what kind of file format, if any, the VOSK model should have. Additionally, the recommended virtual #Python environment didn't work as expected, so I went with the global approach.

Finally, I scrapped the whole approach once I realized that transcribing a 26 min audio clip was taking longer than 10 min.

**barefootstache** @barefootstache@qoto.org · Aug 07, 2024, 08:22

#DailyBloggingChallenge (363/365)

Instead, I opted for #Whisper, which also works with #KdenLive.

Although Whisper is originally written in #Python, there is a #CPP project that makes transcribing very fast. It took less than 2 min to transcribe the 26 min audio clip.

https://github.com/ggerganov/whisper.cpp
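
A minimal sketch of a transcription call, assuming the repository has already been cloned and built and a model has been downloaded. The wrapper name `transcribe` and the environment variables `WHISPER_BIN` and `WHISPER_MODEL` are my own placeholders. The defaults `./main` and `models/ggml-base.en.bin` are assumptions too; the binary name in particular depends on the whisper.cpp version.

```shell
# transcribe WAV_FILE
# Runs the whisper.cpp command-line binary on one WAV file.
# WHISPER_BIN and WHISPER_MODEL are hypothetical overrides; adjust the
# defaults to whatever your checkout actually built and downloaded.
transcribe() {
  wav=$1
  # -m: model file, -f: input audio, -otxt: also write a plain-text transcript
  "${WHISPER_BIN:-./main}" \
    -m "${WHISPER_MODEL:-models/ggml-base.en.bin}" \
    -f "$wav" \
    -otxt
}
```

Prefixing the call with `time`, as in `time transcribe clip.wav`, is an easy way to compare transcription time against the length of the clip.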

**barefootstache** @barefootstache@qoto.org · Aug 07, 2024, 08:28

#DailyBloggingChallenge (364/365)

The 'Quick Start' section in the Readme sufficed for setting up.

The only thing that I had to change in the `./models/download-ggml-model.sh` script (1) was to remove the option `--show-progress` on line 105. It seems that GNU Wget2 2.1.0 doesn't have that option.

Alternatively, one can replace the option with

`--progress=bar --force-progress`

- 1: https://github.com/ggerganov/whisper.cpp/blob/master/models/download-ggml-model.sh

#wget #bash
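
The replacement described above could also be applied with a small helper instead of editing by hand. This is a sketch, not something taken from the whisper.cpp repository: the function name `patch_progress_flag` is hypothetical, and it assumes the option appears literally as `--show-progress` in the script.

```shell
# patch_progress_flag FILE
# Swaps --show-progress for the two replacement flags mentioned above and
# keeps the untouched original as FILE.bak.
patch_progress_flag() {
  script=$1
  cp "$script" "$script.bak"
  # Read from the backup and write the patched text back to the original path.
  sed 's/--show-progress/--progress=bar --force-progress/' "$script.bak" > "$script"
}
```

Run from the repository root, the call would be `patch_progress_flag ./models/download-ggml-model.sh`. If the result misbehaves, copying the `.bak` file back restores the original.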

**barefootstache** @barefootstache@qoto.org · Aug 07, 2024, 08:31

#DailyBloggingChallenge (365/365)

The only caveat of the #Whisper project is that it only works on 16-bit #WAV files.

An #FFMPEG command can do the conversion from the #terminal:

`ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav`
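
For several recordings at once, the same command can be wrapped in a loop. A rough sketch, where the helper names `wav_name` and `to_whisper_wav` are made up for illustration and the ffmpeg options are exactly the ones from the command above:

```shell
# wav_name FILE
# Derives the output name by swapping the last extension for .wav,
# e.g. talk.mp3 -> talk.wav
wav_name() {
  printf '%s.wav\n' "${1%.*}"
}

# to_whisper_wav FILE...
# Converts each input to 16 kHz, mono, 16-bit PCM WAV next to the source file.
# Inputs that already end in .wav would map onto themselves, so skip those.
to_whisper_wav() {
  for src in "$@"; do
    ffmpeg -i "$src" -ar 16000 -ac 1 -c:a pcm_s16le "$(wav_name "$src")"
  done
}
```

A call such as `to_whisper_wav recordings/*.mp3` would then leave one `.wav` per input in the same directory (the `recordings` folder is a placeholder).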
