**Pausal Živference** @PausalZ@qoto.org · Nov 18, 2022, 18:12

I don't really have another place to write this up, so here is a computational problem I keep coming back to: getting individual probabilities of survival from an array of times and an array of probabilities.

**Pausal Živference** @PausalZ@qoto.org · Nov 18, 2022, 18:13

Let me sharpen the problem. I have 2 arrays: the probability of survival and the corresponding time. Everything that follows is written in Python.
>>> pr = [0.9, 0.8, 0.7, 0.5, 0.2, 0.1]
>>> t = [1, 2, 3, 5, 6, 9]
So S(t=1)=0.9, S(t=2)=0.8, and so on.

**Pausal Živference** @PausalZ@qoto.org · Nov 18, 2022, 18:13

Given an array of individual times
>>> ti = [0.9, 5, 7.5, 6, 9, 10]
we want to get the probability of survival at each of those times. So our output should look like
>>> [1.0, 0.5, 0.2, 0.2, 0.1, 0.1]
The first element is 1 because 0.9 falls before the first event time (t=1), and S(0)=1 by definition. So, how can we get this array?

**Pausal Živference** @PausalZ@qoto.org · Nov 18, 2022, 18:13

My first thought is that we can write a loop. We loop over each element in `ti`, use that element to find the last valid index in `t` (the last event time at or before it: for ti=7.5 that is t=6, at index 4), then use that index to look up the element in `pr`.
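A minimal sketch of that loop, using the example arrays from above. The helper name `survival_at_loop` is hypothetical, and the linear scan over `t` (assumed sorted) is just one straightforward way to find the last valid index.

```python
def survival_at_loop(ti, t, pr):
    """Look up S(x) for each x in ti by scanning the sorted event times t."""
    out = []
    for x in ti:
        s = 1.0  # S(x) = 1 before the first event time
        for tj, pj in zip(t, pr):
            if tj <= x:
                s = pj  # last event time at or before x seen so far
            else:
                break  # t is sorted, so no later time can qualify
        out.append(s)
    return out


pr = [0.9, 0.8, 0.7, 0.5, 0.2, 0.1]
t = [1, 2, 3, 5, 6, 9]
ti = [0.9, 5, 7.5, 6, 9, 10]
print(survival_at_loop(ti, t, pr))  # [1.0, 0.5, 0.2, 0.2, 0.1, 0.1]
```

The nested scan costs O(len(t)) per individual, which is what makes this slow for large `ti`.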

**Pausal Živference** @PausalZ@qoto.org · Nov 18, 2022, 18:14

It might be faster to convert `pr` and `t` to a dictionary (in Python), but this is still expensive because we still need a loop. If there are lots of elements in `ti`, this can become far too slow. So the idea is to vectorize this procedure. The following is the best solution I’ve come up with so far.
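One caveat on the dictionary idea: a dict keyed on `t` only answers exact matches (7.5 is not a key). As an aside that is not part of the thread, the standard-library `bisect` module can find the last valid index by binary search instead. It still loops over `ti` in Python, but each lookup drops to O(log len(t)). A sketch with a hypothetical helper name, assuming `t` is sorted:

```python
from bisect import bisect_right


def survival_at_bisect(ti, t, pr):
    """Binary-search each individual time in the sorted event times t."""
    out = []
    for x in ti:
        # bisect_right returns how many event times are <= x
        k = bisect_right(t, x)
        out.append(1.0 if k == 0 else pr[k - 1])  # k == 0 means x < t[0]
    return out


pr = [0.9, 0.8, 0.7, 0.5, 0.2, 0.1]
t = [1, 2, 3, 5, 6, 9]
ti = [0.9, 5, 7.5, 6, 9, 10]
print(survival_at_bisect(ti, t, pr))  # [1.0, 0.5, 0.2, 0.2, 0.1, 0.1]
```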

**Pausal Živference** @PausalZ@qoto.org · Nov 18, 2022, 18:14

So first, I am going to `import numpy as np` and convert `t`, `ti`, `pr` to NumPy arrays so we can use NumPy to help speed things along. After that:
>>> t = np.insert(t, 0, 0)
>>> pr = np.insert(pr, 0, 1)
>>> shift_t = np.append(t, np.max(ti)+1)[1:]
>>> upper = (ti >= t[:, None]).astype(int)
>>> lower = (ti < shift_t[:, None]).astype(int)
>>> t_ind = upper * lower
>>> np.sum(t_ind * pr[:, None], axis=0)
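For anyone who wants to run the snippet above end to end, here is a sketch that wraps the same seven lines in a function. The name `survival_at` and the `np.asarray(..., dtype=float)` conversions are my additions; the body is otherwise the thread's code, and it assumes `t` is sorted.

```python
import numpy as np


def survival_at(ti, t, pr):
    """The thread's indicator-matrix lookup, wrapped as a function."""
    ti = np.asarray(ti, dtype=float)
    t = np.insert(np.asarray(t, dtype=float), 0, 0)    # prepend time 0
    pr = np.insert(np.asarray(pr, dtype=float), 0, 1)  # prepend S(0) = 1
    shift_t = np.append(t, np.max(ti) + 1)[1:]         # right endpoint of each interval
    upper = (ti >= t[:, None]).astype(int)             # rows: event times, cols: individuals
    lower = (ti < shift_t[:, None]).astype(int)
    t_ind = upper * lower                              # exactly one 1 per column
    return np.sum(t_ind * pr[:, None], axis=0)         # keep the matching pr per column


pr = [0.9, 0.8, 0.7, 0.5, 0.2, 0.1]
t = [1, 2, 3, 5, 6, 9]
ti = [0.9, 5, 7.5, 6, 9, 10]
print(survival_at(ti, t, pr).tolist())  # [1.0, 0.5, 0.2, 0.2, 0.1, 0.1]
```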

**Pausal Živference** @PausalZ@qoto.org · Nov 18, 2022, 18:14

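So what does this do? Setup is done on lines 1-3. Lines 1-2 tack S(0)=1 onto the start of the `t` and `pr` arrays (time 0 with probability 1). Line 3 appends one more than the maximum time we saw in `ti`, then drops the first element (the zero), so `shift_t` holds the right endpoint of the interval that starts at each entry of `t`.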
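To make lines 1-3 concrete, here are the padded arrays for the example data, recomputed in a standalone snippet (the inputs are the example arrays from earlier in the thread, already converted to NumPy):

```python
import numpy as np

t = np.array([1, 2, 3, 5, 6, 9])
pr = np.array([0.9, 0.8, 0.7, 0.5, 0.2, 0.1])
ti = np.array([0.9, 5, 7.5, 6, 9, 10])

t = np.insert(t, 0, 0)    # event times with t=0 prepended
pr = np.insert(pr, 0, 1)  # probabilities with S(0)=1 prepended
shift_t = np.append(t, np.max(ti) + 1)[1:]  # each interval's right endpoint

print(t.tolist())        # [0, 1, 2, 3, 5, 6, 9]
print(pr.tolist())       # [1.0, 0.9, 0.8, 0.7, 0.5, 0.2, 0.1]
print(shift_t.tolist())  # [1.0, 2.0, 3.0, 5.0, 6.0, 9.0, 11.0]
```

Read pairwise, `t` and `shift_t` describe the half-open intervals [0, 1), [1, 2), ..., [9, 11), and the final endpoint 11 = max(ti) + 1 guarantees the largest individual time lands inside the last one.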

**Pausal Živference** @PausalZ@qoto.org · Nov 18, 2022, 18:14

The clever part (in my opinion) is the remainder. Line 4 creates a matrix of indicators where the columns are the individuals and the rows record whether their value in `ti` was >= the corresponding `t`. I do a similar thing in line 5, but now checking < the shifted times. When these are multiplied together in line 6, we get a matrix where the only non-zero value in each column sits in the row of the last event time at or before that person's time, i.e. the interval t[j] <= ti < shift_t[j] that contains it.
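Here is what that indicator matrix looks like for the example, using the padded `t` and `shift_t` values that lines 1-3 produce (hard-coded here so the snippet runs on its own):

```python
import numpy as np

t = np.array([0, 1, 2, 3, 5, 6, 9])         # padded event times
shift_t = np.array([1, 2, 3, 5, 6, 9, 11])  # padded right endpoints
ti = np.array([0.9, 5, 7.5, 6, 9, 10])

upper = (ti >= t[:, None]).astype(int)       # broadcasts to shape (7, 6)
lower = (ti < shift_t[:, None]).astype(int)
t_ind = upper * lower

print(t_ind)
print(t_ind.sum(axis=0).tolist())     # [1, 1, 1, 1, 1, 1]: one 1 per column
print(t_ind.argmax(axis=0).tolist())  # [0, 4, 5, 5, 6, 6]: row holding each ti
```

So 0.9 falls in the padded t=0 row, 5 in the t=5 row, 7.5 and 6 in the t=6 row, and 9 and 10 in the t=9 row.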

**Pausal Živference** @PausalZ@qoto.org · Nov 18, 2022, 18:14

Then in the final step, we multiply that by `pr` (broadcast across the columns with `pr[:, None]`). Each column now has exactly one non-zero entry, the probability at that individual's time, so summing over the rows (`axis=0`) returns it. There is probably a better way to do this, but I think it’s a clever vectorization of the problem.
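One candidate for that "better way" (my suggestion, not something from the thread): `np.searchsorted` with `side="right"` returns, for each individual time, how many event times are <= it, which is exactly the row index into the padded `pr`. It avoids building the (len(t) + 1) by len(ti) indicator matrix, so memory is O(len(ti)) and time is O(len(ti) log len(t)). A sketch, assuming `t` is sorted ascending; the function name is hypothetical:

```python
import numpy as np


def survival_at_searchsorted(ti, t, pr):
    """Vectorized step-function lookup via binary search (t must be sorted)."""
    t = np.asarray(t)
    pr = np.insert(np.asarray(pr, dtype=float), 0, 1.0)  # pr[0] is S(0) = 1
    # number of event times <= each ti; 0 means ti is before the first event
    idx = np.searchsorted(t, ti, side="right")
    return pr[idx]


pr = [0.9, 0.8, 0.7, 0.5, 0.2, 0.1]
t = [1, 2, 3, 5, 6, 9]
ti = [0.9, 5, 7.5, 6, 9, 10]
print(survival_at_searchsorted(ti, t, pr).tolist())  # [1.0, 0.5, 0.2, 0.2, 0.1, 0.1]
```

Times past the last event get index len(t), which is still valid after padding, so no upper sentinel is needed.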

**Pausal Živference** @PausalZ@qoto.org · Nov 18, 2022, 18:15

Censoring times before the first event time in `t`, and censoring times after the last event time in `t`, were the annoying parts to deal with. The first 3 lines are there primarily to set up the input arrays to handle those cases.
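To see why those setup lines matter, here is a hypothetical variant of the same indicator trick with the padding stripped out, pairing each `t[j]` only with `t[j+1]`. Individuals before the first event time, and at or after the last one, get an all-zero column:

```python
import numpy as np

t = np.array([1, 2, 3, 5, 6, 9])
pr = np.array([0.9, 0.8, 0.7, 0.5, 0.2, 0.1])
ti = np.array([0.9, 5, 7.5, 6, 9, 10])

# Unpadded intervals [t[j], t[j+1]): nothing covers ti < 1 or ti >= 9
upper = (ti >= t[:-1, None]).astype(int)
lower = (ti < t[1:, None]).astype(int)
unpadded = np.sum(upper * lower * pr[:-1, None], axis=0)
print(unpadded.tolist())  # [0.0, 0.5, 0.2, 0.2, 0.0, 0.0]
```

Compared with the expected [1.0, 0.5, 0.2, 0.2, 0.1, 0.1], the first and last two individuals come out as 0, which is exactly what prepending S(0)=1 and appending max(ti)+1 fixes.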
