@pganssle Just an FYI, now that I'm past some of the multiprocessing headaches and have found workarounds that aren't as convoluted as my earlier attempts, I can say that while Python still isn't my favorite WRT multiprocessing, it's a lot more pleasant than my early experiences suggested.

@pganssle Though I will say one thing has been infuriating me that I can't figure out, and there is nothing on Google: apparently if I create more than about 500 multiprocessing RLocks, I get an out-of-memory error (and we're talking about a 64 GB system). I'm on Docker, and I'm pretty sure it's actually related to the shm size; when I upped it from 64 MB to 2 GB, it could handle the 500 RLocks (before, I couldn't even get that many). But I'm not sure, as it doesn't appear to actually **fill** the shm by nearly that much (though it does fill it by a few hundred MB now).
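A minimal repro sketch for this kind of limit (hypothetical; on Linux, multiprocessing's locks are backed by POSIX semaphores, which live under /dev/shm, so watching "df -h /dev/shm" from another shell while this runs would show whether the locks themselves are what fills it):

import multiprocessing as mp

# Keep allocating RLocks until the OS refuses; each one is backed by
# a POSIX semaphore, created under /dev/shm on Linux.
locks = []
try:
    for _ in range(10_000):
        locks.append(mp.RLock())
except OSError as e:
    print(f"allocation failed after {len(locks)} locks: {e}")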

Very weird problem, but eh, for now I'm just limiting my algorithm's lookback to 500 minutes and it works.

Other than that, I've figured out most of the multiprocessing problems and have what used to take 2 hours to run down to a minute.

@freemo Weird. If you are doing so much in parallel and it's a big part of your operation (and you think it's worth exploring further), it might make sense to try Cython or Numba with a function that releases the GIL, then use multithreading instead of multiprocessing (rough sketch below).

Running hundreds of processes and serializing / deserializing your data probably creates a ton of overhead.
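A minimal sketch of what that Numba route could look like (assuming numba and numpy are available; all names here are illustrative). Each thread fills a private counter array inside GIL-free compiled code, and the results are summed serially at the end, which avoids both locks and data races:

import numpy as np
from concurrent.futures import ThreadPoolExecutor
from numba import njit

@njit(nogil=True)
def accumulate(counts, hits):
    # Compiled code that releases the GIL, so threads truly run in parallel.
    for i in hits:
        counts[i] += 1

counts = np.zeros(390_000, dtype=np.int64)
batches = [np.random.randint(0, counts.size, 10_000) for _ in range(32)]

def worker(hits):
    # Per-thread scratch array: nothing is shared in the hot loop, so no locks.
    local = np.zeros_like(counts)
    accumulate(local, hits)
    return local

with ThreadPoolExecutor(max_workers=32) as pool:
    for local in pool.map(worker, batches):
        counts += local  # cheap serial reduction at the end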

@pganssle Well, there are only 32 processes; there are as many locks as there are because of how I access a rather large array that all the processes share and write to. In short, the array needs an element-level lock (mostly because each element acts as a counter where += operations need to be atomic), but no one process ever needs to lock more than one cell at a time.

So I created an array of locks that is as big as the array itself, in an attempt to minimize lock contention.

It works great at minimizing contention; it runs almost as fast as with the locks not being there at all (with no locks there was minor corruption, for obvious reasons). But then there's that memory issue.

So yeah, it's not an issue with running tons of processes; I just need tons of RLocks.
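Roughly what that setup looks like (a sketch with illustrative names; the shared array and locks would need to be created before forking so the worker processes inherit them):

import multiprocessing as mp

N = 500                                  # array elements, and one lock each
counts = mp.Array('q', N, lock=False)    # shared int64 counters, no built-in lock
locks = [mp.RLock() for _ in range(N)]   # element-level locks

def increment(i, amount=1):
    # Only the one cell's lock is held, so processes rarely contend.
    with locks[i]:
        counts[i] += amount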

@freemo Hmm, weird. If there is no solution that allows you to have more than 500 RLocks, maybe you can get a larger array by locking "blocks" of the array for your atomic operations. Then you can have 500 locks spread across 20,000 elements.

Python lists hold references, so multiplying the list just duplicates references to the same 500 lock objects. You can do something like this:

import math
import multiprocessing
import random

locks = [multiprocessing.RLock() for _ in range(500)]
locks = locks * math.ceil(len(array) / 500)
random.shuffle(locks)
locks = locks[:len(array)]

Each element is randomly assigned one of the locks, so even if the access pattern is non-random, the lock distribution will be. At any given time you have 32 processes contending for 500 locks, which seems like good odds.
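With that pool in place, the element-level access pattern is unchanged; each lock now simply guards many randomly scattered cells instead of one (continuing the snippet above, with array as a hypothetical shared array):

def increment(i, amount=1):
    # locks[i] guards a scattered handful of cells, not just cell i.
    with locks[i]:
        array[i] += amount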


@pganssle Excellent suggestion, yeah, that might be what I'd go with; it should be the best way to keep contention as minimal as possible for the number of locks used. Thanks! Plus it has the advantage of scaling to arrays of enormous size, which is a possibility, as I could see a 3900 x 100 array being a perfectly reasonable max (at 390 RTH minutes per trading day, 3900 minutes represents only 2 weeks of stock data; less otherwise).
