#DailyBloggingChallenge (331/365)
The idea that
> people might enjoy listening to a #podcast like approach of evaluating various #books
has been suggested to me.
To keep everything in the #Fediverse with the power of #ActivityPub, the goal is to publish the content on #FunkWhale.
#DailyBloggingChallenge (333/365)
These recordings would then be transcribed to text using #SpeechToText. That way, private information can be removed without spending too much time editing the audio files directly.
Further, additional information can easily be added. The text would then be converted back to audio using #TextToSpeech before finally being published on #FunkWhale.
#DailyBloggingChallenge (363/365)
Instead, I opted for #Whisper, which also works with #KdenLive.
Although Whisper was originally written in #Python, there is a #CPP project (whisper.cpp) that makes transcribing very fast. It took less than 2 min to transcribe the 26 min audio clip.
#DailyBloggingChallenge (364/365)
The 'Quick Start' section in the Readme sufficed for setting up.
The only thing that I had to change in the `./models/download-ggml-model.sh` script (1) was to remove the option `--show-progress` on line 105. It seems that GNU Wget2 2.1.0 doesn't have that option.
Alternatively, one can replace the option with
`--progress=bar --force-progress`
- 1: https://github.com/ggerganov/whisper.cpp/blob/master/models/download-ggml-model.sh
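A minimal sketch of the replacement described above, done with `sed` instead of editing the script by hand. The helper name `patch_progress_option` is hypothetical, and the sketch assumes the option appears in the script literally as `--show-progress`.

```shell
# Hypothetical helper: swap --show-progress for the two options named in the post.
patch_progress_option() {
    script="$1"
    # -i.bak edits the file in place and keeps the original as "$script.bak"
    sed -i.bak 's/--show-progress/--progress=bar --force-progress/g' "$script"
}

# Usage (path taken from the post):
# patch_progress_option ./models/download-ggml-model.sh
```

The backup copy makes it easy to revert if a later Wget version accepts the original option again.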
#DailyBloggingChallenge (365/365)
The only caveat of the #Whisper project is that it only works on 16-bit #WAV files.
An #FFMPEG command can do the conversion via the #terminal:
`ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav`
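In that command, `-ar 16000` sets a 16 kHz sample rate, `-ac 1` downmixes to mono, and `-c:a pcm_s16le` selects 16-bit signed little-endian PCM. For several recordings, the same command could be wrapped as in the sketch below. The helper name `to_whisper_wav` and the `DRY_RUN` switch are assumptions added for illustration, not part of FFmpeg or Whisper.

```shell
# Hypothetical helper wrapping the ffmpeg command from the post.
# With DRY_RUN=1 it only prints the command instead of running it.
to_whisper_wav() {
    in="$1"
    out="${in%.*}.wav"   # same base name, .wav extension
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ffmpeg -i $in -ar 16000 -ac 1 -c:a pcm_s16le $out"
    else
        ffmpeg -i "$in" -ar 16000 -ac 1 -c:a pcm_s16le "$out"
    fi
}

# Usage: convert every MP3 in the current directory
# for f in *.mp3; do to_whisper_wav "$f"; done
```

The sketch assumes the input is not already a `.wav` file, since input and output names would then collide.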
#DailyBloggingChallenge (362/365)
Originally, I wanted to use #VOSK for the #SpeechToText transcription. Initially, I tried it out via #KdenLive and its 'Speech Recognition' tool.
This took quite a while to set up, since it is not clear what kind of file format, if any, the VOSK model should have. Additionally, the recommendation of setting up a virtual #Python environment didn't work as expected, so I went with the global approach.
Finally, I scratched the whole approach once I realized that transcribing a 26 min audio clip was taking longer than 10 min.