@paullammers@eldritch.cafe
No. Correctness of high-level program output is a given, with known hardware and language rules. Quality is another matter.
The benchmark itself is vacuous, but the human readability of the output is a useful idea. Unfortunately, it goes unmeasured except for a single example comparison at the very end. Describing the training technique is a useful contribution, provided the training data is credible; but the paper carries no replication badge of the kind ACM tends to give out, so that contribution may be complete fluff as well. A lot of neural network studies lack the statistical verification needed to count as real scientific work, sadly, so I am also suspicious of them fudging data to get good scores.
To summarize: this is a neat idea, but it is full of practical issues, and compared to what already exists, it is not that helpful a contribution.