It's weird that Zoom's automatic transcription can't consistently distinguish speakers at transitions - it should be able to run separately on each participant's audio stream and get the attribution right basically 100% of the time... Though maybe the problem is the mics picking up output from the loudspeakers.
And finally, the video recording:
https://www.copyright.gov/policy/technical-measures/recordings/