Well, I think I finally got to the bottom of why the Matrix server has been slow in the past. Now that I have good stats on resources I can better address it... Should have Matrix running much smoother soon.

@trinsec

@freemo @trinsec I have heard the Matrix developers say there will be a Go implementation to replace the slow and resource-demanding Python version. Is that still in the alpha stage?

(As a jealous JVM lover, I don't trust Python 😂 )

@skyblond

That I have no idea... But if they think Go automatically means faster, then it will probably be an abysmal failure.

@trinsec

@freemo @trinsec Based on my limited knowledge of Python, the multithreading part is pretty heavy; if I recall correctly, you need a new Python process to start a new thread (sounds familiar from the JVM :ablobthinking: ). And Go is pretty good at multithreading (I mean user-mode threads). If it is not limited by IO, I would assume a Go implementation will speed up some of the processing. It might also ease the load on developers, considering Go offers some great built-in multithreading structures.

C-Python can link with and interact with multi-threaded C code no problem. See https://github.com/sdgathman/pymilter/blob/master/miltermodule.c for an example of the C code end (key object is PyThreadState).

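C-Python code itself does not run in parallel across threads by default (the global interpreter lock lets only one thread execute Python bytecode at a time), so real parallelism comes from multi-processing. (Jython is a Java Python implementation that does multi-thread.)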

HOWEVER, what is ultimately more efficient than multi-threading for pure python or most other interpreted code is co-routines. Remember those from Knuth's "Fundamental Algorithms"? Python has built-in support for very clean and efficient co-routines - key syntax is the "yield" statement. They work very well ad-hoc, and there are frameworks like the "twisted" library that provide consistent interaction between many parts.

The key limitation of co-routines is that other "threads" do not have a chance to run until the current code hits a "yield". This also means you don't need to bother with locks and stuff like with true multi-tasking, hence the efficiency for an interpreted language.
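
To make the "yield" point concrete, here is a minimal sketch using plain generators. The names `worker` and `run_round_robin` are made up for illustration; this is not how twisted does it, just the bare idea of tasks taking turns at each `yield`.

```python
from collections import deque

def worker(name, steps, log):
    """A coroutine-style task: do one unit of work, then hand control back."""
    for i in range(steps):
        log.append(f"{name}:{i}")  # no lock needed: nothing else runs until we yield
        yield

def run_round_robin(tasks):
    """Hypothetical minimal scheduler: resume each task in turn until all finish."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)            # run the task up to its next yield
        except StopIteration:
            continue              # task finished, so drop it
        queue.append(task)        # otherwise it goes to the back of the line

log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

If a task never reached a `yield`, the others would never get a turn, which is exactly the limitation described above.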

Another name for this is "cooperative multi-threading". It is harder to do in a language like C with linear stacks. Python stack frames are linked lists - and the resulting conceptual tangle of stack frames when hundreds of complex co-routine tasks cooperate gave rise to the name of the "twisted" library.

@sdgathman

As a JVM lover, I am jealous of the ability to call C code directly from CPython.

And yes, coroutines are powerful. I'm using Kotlin coroutines and they're a huge (free) improvement over Java's native threads.

@freemo Co-routines can run on multiple threads, where the "tasks" can yield and a thread from a pool can switch to another "task" without blocking. At least Kotlin coroutines can, and according to sdgathman, Python can do it too. That's the ultimate free boost you can get by just switching to another tech.

But if Python can do that, then my earlier hypothesis that switching to Go will give you a free boost is wrong.
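
For reference, a small sketch of what Python's standard `asyncio` does here, assuming Python 3.9+ (for `asyncio.to_thread`). As far as I know, coroutines on one event loop all run on the same thread, and only work explicitly handed to a thread pool runs elsewhere. The function names `task` and `blocking_work` are made up for the example.

```python
import asyncio
import threading

async def task(name, seen):
    seen[name] = threading.get_ident()   # record which thread runs this coroutine
    await asyncio.sleep(0)               # suspension point: lets other tasks run

def blocking_work():
    return threading.get_ident()         # runs on a worker thread from the default pool

async def main():
    seen = {}
    await asyncio.gather(task("a", seen), task("b", seen))
    worker_thread = await asyncio.to_thread(blocking_work)
    return seen, worker_thread

seen, worker_thread = asyncio.run(main())
print(seen["a"] == seen["b"])        # True: both coroutines ran on the event loop thread
print(worker_thread != seen["a"])    # True: the offloaded call ran on a different thread
```

So the switching between tasks is cheap, but it is not spread over a thread pool the way Kotlin's dispatchers can do it.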

You can call C code from Java with JNI (Java Native Interface) - register C functions that get run for class methods. It's no more difficult than the C-API for C-Python. https://www.baeldung.com/jni

@sdgathman

You can call C from any high-level language. The difference is that the barrier in Python is very low, while in Java it is very high. Moreover, in Java it is a very expensive operation to cross the barrier between native code and the JVM, unlike in Python.
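
As a rough illustration of how low that barrier can be, here is a sketch that calls libc's `strlen` through the standard `ctypes` module, with no C code or build step at all. It assumes a POSIX system; `ctypes` is only one of several routes (the C-API extension modules mentioned earlier are another).

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. If find_library returns None, CDLL(None) exposes
# the symbols already loaded into the process, which includes libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"matrix")
print(length)  # 6
```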

@skyblond

@skyblond - is the overhead for native methods in the JVM really so high? It was never a bottleneck for me, even for relatively low-level stuff like accessing POSIX message queues.

@sdgathman

Compared to Java functions, yes, but overall no. According to StackOverflow, each JNI call costs several nanoseconds more than a plain Java function call. The main overhead is copying data from the Java heap to native memory. But if you use something like a native buffer, there should be no such overhead.

If you load a big dataset in Java into a byte array and then want to pass it to, let's say, some BLAS implementation, then good luck copying all that data. But with native buffers, you can just pass the pointer to that buffer to the BLAS implementation and you're good to go.

stackoverflow.com/questions/13
