@pganssle Just an FYI: now that I'm past some of the multiprocessing headaches and have found workarounds that aren't as convoluted as my earlier attempts, I can say that while it's still not my favorite WRT multiprocessing, it's a lot more pleasant than my early experiences.
@freemo Nice! Glad to hear it!
@pganssle Though I will say one thing has been infuriating me that I can't figure out, and there's nothing on Google... apparently if I create more than about 500 multiprocessing RLocks I get an out-of-memory error (and we're talking about a 64 GB system). I'm on Docker, and I'm pretty sure it's actually related to the shm size: when I upped it from 64 MB to 2 GB it could handle the 500 RLocks (before, I couldn't even get that many)... but I'm not sure, as it doesn't appear to actually **fill** the shm by nearly that much (though it does fill it by a few hundred MB now)...
Very weird problem, but eh, for now I'm just limiting my algorithm's lookback to 500 minutes and it works.
Other than that, I figured out most of the multiprocessing problems and have what used to take 2 hours running in about a minute.
@freemo Though TBH I'd kinda love to have a problem that admits an embarrassingly parallel solution as an excuse to write something significant in Rust to test out that "fearless concurrency".
Closest I've come is this: https://gitlab.com/pganssle/metadata-backup
There's a big queue that theoretically could be read in parallel, but I think the file system access ends up blocking, because adding multithreading into the mix doesn't seem to have meaningfully sped anything up.
@pganssle Massive parallelism is a lot of fun in any language; even in Python, with all its annoyances, it's still a fun task. Seeing my CPUs all light up to 100% (I have 32 CPU cores) and my run time go from 2 hours to a minute or two is very pleasurable.