It might be faster to convert `pr` and `t` to a dictionary (in Python), but this is still expensive because we still need to loop over every element of `ti`. If there are lots of elements in `ti`, this can become far too slow. So the idea is to vectorize this procedure. The following is the best solution I’ve come up with at this point.
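The clever part (in my opinion) is the remainder. Line 4 (`upper`) creates a matrix of indicators where the columns are the individuals and the rows indicate whether their value in `ti` was >= the corresponding `t`. Line 5 (`lower`) does a similar thing, but now checks whether `ti` is < the shifted times. When these are multiplied together in line 6 (`t_ind`), we get a matrix where the only non-zero value in each column is in the row of the last `t` at or before that person’s `ti`, which corresponds to the final time the person was observed. Multiplying by `pr` and summing down each column then picks out the matching value of `pr` for each person.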
So first, I am going to convert `t`, `ti`, and `pr` to NumPy arrays so we can use NumPy to help speed things along. After that
>>> t = np.insert(t, 0, 0)
>>> pr = np.insert(pr, 0, 1)
>>> shift_t = np.append(t, np.max(ti)+1)[1:]
>>> upper = (ti >= t[:, None]).astype(int)
>>> lower = (ti < shift_t[:, None]).astype(int)
>>> t_ind = upper * lower
>>> np.sum(t_ind * pr[:, None], axis=0)
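To make the steps above concrete, here is a minimal sketch that wraps them in a function and checks the result against a plain loop. The function names (`lookup_vectorized`, `lookup_loop`, `lookup_searchsorted`) and the sample values are made up for illustration. It assumes `t` is sorted in increasing order with strictly positive entries, that `pr` has the same length as `t`, and that `ti` is non-negative. The last function is not the approach described above but a different technique, `np.searchsorted`, included only as a cross-check that avoids building the full indicator matrix.

```python
import numpy as np


def lookup_vectorized(t, pr, ti):
    """Indicator-matrix approach from the steps above (hypothetical wrapper)."""
    ti = np.asarray(ti)
    t = np.insert(np.asarray(t), 0, 0)                  # prepend time 0 ...
    pr = np.insert(np.asarray(pr, dtype=float), 0, 1)   # ... whose value is 1
    shift_t = np.append(t, np.max(ti) + 1)[1:]          # upper bound of each interval
    upper = (ti >= t[:, None]).astype(int)              # rows: times, columns: individuals
    lower = (ti < shift_t[:, None]).astype(int)
    t_ind = upper * lower                               # one non-zero entry per column
    return np.sum(t_ind * pr[:, None], axis=0)


def lookup_loop(t, pr, ti):
    """Plain-loop baseline: last pr whose t is <= each ti, or 1 if none."""
    out = []
    for x in ti:
        value = 1.0
        for t_k, pr_k in zip(t, pr):
            if t_k <= x:
                value = pr_k
        out.append(value)
    return np.array(out)


def lookup_searchsorted(t, pr, ti):
    """Alternative technique (not the method above): binary search per ti."""
    t = np.insert(np.asarray(t), 0, 0)
    pr = np.insert(np.asarray(pr, dtype=float), 0, 1)
    idx = np.searchsorted(t, np.asarray(ti), side="right") - 1
    return pr[idx]


# Made-up sample data for illustration only.
t = [1, 3, 5]
pr = [0.9, 0.7, 0.4]
ti = [0.5, 1, 2, 3, 4.5, 7]

print(lookup_vectorized(t, pr, ti))
print(np.allclose(lookup_vectorized(t, pr, ti), lookup_loop(t, pr, ti)))
print(np.allclose(lookup_vectorized(t, pr, ti), lookup_searchsorted(t, pr, ti)))
```

One design note: the indicator matrices have one row per entry of `t` and one column per entry of `ti`, so memory grows with the product of the two lengths. That is usually fine for moderate sizes, but it is worth keeping in mind when both arrays are large.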