Microsoft just published a paper - modestly titled "AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models" - that, no joke, uses readily available *online* sample SAT questions to evaluate the GPT-class LLMs. The problem, of course, is that the same data was likely used to train those same models. https://arxiv.org/pdf/2304.06364.pdf
@twitskeptic hmmm
There's a GitHub link at https://github.com/microsoft/AGIEval/blob/main/data/v1/sat-en.jsonl that has the questions. I searched the web for one of them and got a hit.
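If anyone wants to repeat that check on more than one question, here's a rough Python sketch. It assumes each line of the .jsonl file is a JSON object with the question text under a "question" key (I haven't verified the schema, so adjust the field name), and the helper names and the search URL are just my own placeholders. The sample record is made up, not copied from the repo.

```python
import json
from urllib.parse import quote_plus


def load_questions(jsonl_text, field="question"):
    """Parse JSONL text (one JSON object per line) and collect the given field.

    The default field name "question" is an assumption about the file layout.
    """
    questions = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        text = record.get(field)
        if text:
            questions.append(text)
    return questions


def exact_phrase_query(text, max_words=12):
    """Turn the first few words of a question into a quoted exact-phrase query."""
    # Drop embedded double quotes so they don't break the quoted phrase.
    words = text.replace('"', " ").split()[:max_words]
    return '"' + " ".join(words) + '"'


def search_url(query, base="https://duckduckgo.com/?q="):
    """Build a search URL; any search engine's query URL would do here."""
    return base + quote_plus(query)


# Made-up sample line standing in for the real file contents.
sample = '{"question": "Which choice best describes what happens in the passage?"}\n'

for question in load_questions(sample):
    print(search_url(exact_phrase_query(question)))
```

Paste the contents of the downloaded file in place of `sample`, open the printed URLs, and see how many come back with hits. A hit only shows the question is publicly findable, not that it was actually in any model's training set.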