Also, I noticed that the JVM's implementation of SHA3-256 doesn't perform well. It manages only about 150 MB/s, so it can't keep up with my tape drive, which reads at 160 MB/s.
I have no such issue with Bouncy Castle's implementation, which is written purely in Java.
I thought a native implementation would be faster. Why isn't it? (I also tried Java 21 LTS, with no improvement.)
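
For anyone who wants to reproduce the number, here is a rough sketch of how the throughput could be measured. It is not the exact code I ran: the class name `Sha3Throughput`, the helper `measureMBps`, the 1 MiB buffer, and the iteration counts are all made up for illustration, and a single timed loop like this is only a ballpark figure, not a proper benchmark (JMH would be more trustworthy).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha3Throughput {

    // Hypothetical helper: feeds `buf` into the digest `iterations` times
    // and returns the observed throughput in MB/s (1 MB = 10^6 bytes).
    static double measureMBps(MessageDigest md, byte[] buf, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            md.update(buf);
        }
        md.digest(); // finalize so the whole hash computation is included
        long elapsedNanos = System.nanoTime() - start;
        double totalBytes = (double) buf.length * iterations;
        return (totalBytes / 1e6) / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] buf = new byte[1 << 20]; // 1 MiB of zeros; content does not affect speed
        // Uses whichever installed provider is first in line for SHA3-256.
        MessageDigest md = MessageDigest.getInstance("SHA3-256");

        measureMBps(md, buf, 64); // warm-up pass so the JIT can compile the hot loop
        double rate = measureMBps(md, buf, 256);

        System.out.printf("%s (%s): %.1f MB/s%n",
                md.getAlgorithm(), md.getProvider().getName(), rate);
    }
}
```

To compare against Bouncy Castle, the same helper can be given a digest obtained with `MessageDigest.getInstance("SHA3-256", new BouncyCastleProvider())`, assuming the Bouncy Castle provider jar is on the classpath.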