Today I spent some time on video recording with my Nikon Z50II. I asked Google's Gemini 2.5 Pro about the recording modes, and its conclusions feel pretty solid.
The conclusions are:
1. Recording in 4k30p reads the full width of the sensor and supersamples it down to 4K.
2. Recording in 4k60p applies a 1.5x crop on top of the APS-C crop, which gives a roughly 1:1 mapping from sensor pixels to video frame pixels. Because of the Bayer filter, each sensor pixel records only one of the R, G, or B channels, so without supersampling the video looks soft compared to 4k30p.
3. Recording in FHD (1080p) also uses supersampling, even with Hi-Res Zoom enabled, but the overall benefit is not worth downsizing from 4K to 1080p.
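The difference between the two 4K modes can be sketched as a simple ratio of sensor pixels to output pixels. This is only a rough sketch: the sensor width of 5568 pixels is my assumption (taken from the camera's largest still image size, not from anything above), and the 1.5x crop is treated as an exact figure even though it is probably a nominal one.

```python
# Assumed full-width photosite count of the sensor (not stated in the post).
SENSOR_WIDTH_PX = 5568
# Width of a UHD 4K video frame.
UHD_WIDTH_PX = 3840


def sampling_ratio(crop_factor: float,
                   sensor_width_px: int = SENSOR_WIDTH_PX,
                   output_width_px: int = UHD_WIDTH_PX) -> float:
    """Sensor pixels read per output pixel along the frame width."""
    return (sensor_width_px / crop_factor) / output_width_px


full = sampling_ratio(1.0)     # 4k30p: full sensor width
cropped = sampling_ratio(1.5)  # 4k60p: additional 1.5x crop

print(f"4k30p: {full:.2f} sensor pixels per output pixel")
print(f"4k60p: {cropped:.2f} sensor pixels per output pixel")
```

Under these assumptions 4k30p has about 1.45 sensor pixels per output pixel in each dimension, while 4k60p lands just under 1.0, so there is no spare data left to average out the Bayer pattern.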
In my case, details are pretty important. Today I filmed a little grebe (Tachybaptus ruficollis) from about 150 to 200 meters away. I tried both 4k30p and 4k60p, and neither could capture its details. I can only identify it because it is a black blob with some red at the neck. But in 4k60p, without supersampling, everything is soft and lacking detail, including a little egret (Egretta garzetta) about 115 meters away. In contrast, 4k30p gives a better viewing experience because everything is sharper (though smaller in the frame).
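To get a feel for why neither mode resolves the grebe, here is a back-of-the-envelope estimate of how many frame pixels the bird covers. Every number in it is an assumption for illustration: a hypothetical 250 mm lens (the focal length actually used is not recorded here), a sensor width of 23.5 mm, and a body length of about 27 cm for the grebe.

```python
# All values below are illustrative assumptions, not measurements.
SENSOR_WIDTH_MM = 23.5    # assumed APS-C sensor width
OUTPUT_WIDTH_PX = 3840    # UHD 4K frame width
FOCAL_LENGTH_MM = 250.0   # hypothetical lens focal length
SUBJECT_LENGTH_M = 0.27   # rough body length of a little grebe


def subject_width_in_frame(distance_m: float, crop_factor: float) -> float:
    """Approximate number of output-frame pixels spanned by the subject.

    Uses the distant-subject approximation:
    image size on sensor = subject size * focal length / distance.
    """
    image_mm = SUBJECT_LENGTH_M * 1000.0 * FOCAL_LENGTH_MM / (distance_m * 1000.0)
    # Width of the sensor region that is mapped onto the output frame.
    frame_width_mm = SENSOR_WIDTH_MM / crop_factor
    return image_mm / frame_width_mm * OUTPUT_WIDTH_PX


for distance in (150.0, 200.0):
    px_30p = subject_width_in_frame(distance, crop_factor=1.0)
    px_60p = subject_width_in_frame(distance, crop_factor=1.5)
    print(f"{distance:.0f} m: 4k30p ~{px_30p:.0f} px, 4k60p ~{px_60p:.0f} px")
```

With these assumed numbers the bird spans on the order of 50 to 110 pixels of a 3840-pixel-wide frame. The 4k60p crop makes it 1.5x larger in the frame, but those pixels are not supersampled, which matches the "bigger but softer" impression.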
So I prefer 4k30p, even for birds. If 30 fps is not enough, then I probably need a Z8.