Also here's the source if anyone's interested. It was made in just a few minutes and is about as bare-bones as it gets.
For machine learning analysis of sound, I've found nothing that beats a spectrogram.
While making this post, I wondered: can you go the other way and make audio from a spectrogram? Apparently you can. https://stackoverflow.com/questions/57967487/convert-spectrogram-to-audio-using-librosa-functions
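The linked Stack Overflow answer uses librosa's Griffin-Lim to recover audio from a magnitude-only spectrogram. Here's a simpler sketch of the same round-trip idea using only scipy: if you keep the complex STFT (magnitude plus phase), inverting it back to a waveform is essentially lossless. The 440 Hz test tone is just a stand-in signal.

```python
import numpy as np
from scipy import signal

fs = 8000
t = np.arange(fs) / fs            # 1 second of "audio"
x = np.sin(2 * np.pi * 440 * t)   # a 440 Hz test tone as a stand-in clip

# Forward: complex STFT (a spectrogram image is usually just its magnitude)
f, times, Zxx = signal.stft(x, fs=fs, nperseg=256)

# Inverse: since we kept the phase, istft reconstructs the waveform almost exactly
_, x_rec = signal.istft(Zxx, fs=fs, nperseg=256)

print(np.max(np.abs(x_rec[:len(x)] - x)))  # reconstruction error, effectively zero
```

The hard case (and what Griffin-Lim solves) is when you only have the magnitude and the phase has been thrown away, which is what most spectrogram images are.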
I've only been playing with deep learning stuff for a few months now, and it's kind of mind-blowing how fundamentally simple it is to understand with fastai.
All deep learning is this: you give the software inputs and desired outputs, and the computer creates a function that maps the inputs as closely as possible to the desired outputs.
Once this is understood, most of the difficulties and bottlenecks make intuitive sense.
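That "inputs, desired outputs, fit a function" loop can be shown in a few lines without any deep learning library at all. This toy sketch fits a line to points sampled from y = 2x + 1 by gradient descent; a neural net is the same idea with a far more flexible function and many more parameters.

```python
import numpy as np

# Inputs and desired outputs: points drawn from y = 2x + 1
xs = np.linspace(-1, 1, 100)
ys = 2 * xs + 1

# The "model": y_hat = w * x + b, starting from an arbitrary guess
w, b = 0.0, 0.0

# Gradient descent: repeatedly nudge w and b to shrink the output/target gap
lr = 0.1
for _ in range(500):
    err = (w * xs + b) - ys
    w -= lr * 2 * np.mean(err * xs)   # d(MSE)/dw
    b -= lr * 2 * np.mean(err)        # d(MSE)/db

print(round(w, 2), round(b, 2))  # → 2.0 1.0
```

The bottlenecks people hit (not enough labeled data, outputs that are hard to specify, functions that memorize instead of generalize) all fall out of this picture.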
Here is a recent project I worked on. I used fast.ai to train a machine learning model to detect audio clips that contain the sound of typing. #fastai https://www.lhackworth.com/2020/12/18/typing-detection-using-fast-ai/
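The details of my pipeline are in the blog post, but the common recipe for this kind of task is: slice the audio into short clips, render each clip as a log-spectrogram image, and hand the images to an ordinary image classifier. Here's a hedged numpy/scipy sketch of just the image step; the random-noise clip is a placeholder for a real recording, and the exact window sizes are illustrative, not what I actually used.

```python
import numpy as np
from scipy import signal

fs = 16000
rng = np.random.default_rng(0)
clip = rng.standard_normal(fs)  # placeholder for a one-second audio clip

# Power spectrogram, then log scale (dB-like) so quiet detail is visible
f, t, Sxx = signal.spectrogram(clip, fs=fs, nperseg=512)
log_S = 10 * np.log10(Sxx + 1e-10)

# Normalize to 0-255 so it can be saved and loaded as a grayscale image
img = ((log_S - log_S.min()) / (log_S.max() - log_S.min()) * 255).astype(np.uint8)

print(img.shape, img.dtype)  # frequency bins x time frames, uint8
```

Once every clip is an image like this, "does this clip contain typing?" becomes a plain two-class image problem, which is exactly the kind of thing fastai's vision tooling handles out of the box.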