Benchmarking Intelligence Better

Wednesday, 5th February 2025 @ 12:30pm

Abstract: This talk concerns the challenge of evaluating intelligence in artificial systems such as GPT. While contemporary methods provide fine-grained assessments of task performance, they often fail to distinguish genuine intelligence from sophisticated mimicry, leading to familiar, long-standing debates about what to say about “Blockheads” and the Chinese Room. I think we can make headway on this stubborn issue by thinking more carefully about how various activities come to be performed, rather than focusing only on outward performance. In the context of LLMs, this suggests a path towards “deep benchmarking”, which requires attending to the mechanisms AI systems use to complete tasks, along with careful thinking about the conditions under which those mechanisms underwrite intelligent activity. Once we do that, how do things look with respect to chatbot intelligence? In my view, there are plausible mechanisms present in contemporary LLMs that underwrite intelligent activities, but the activities in question are a good way off from anything like semantic understanding, let alone AGI.

About the presenter: Alex Grzankowski is a Reader in Philosophy at Birkbeck, University of London, working primarily in Mind, Language, Metaphysics, and Epistemology. He is also the Associate Director of the Institute of Philosophy at the School of Advanced Study and the Director of the London AI and Humanity Project.