
Research

Introspection in LLMs

Can Large Language Models introspect? In my new paper (preprint), I argue that recent philosophical proposals and empirical studies have failed to demonstrate distinctively introspective abilities in current models.

Recent philosophical work has proposed a “lightweight account” of introspection, on which a system introspects when it represents its own mental states in a way that makes these states accessible for guiding behavior. This approach has informed empirical proposals for detecting introspective abilities in current LLMs. I argue that the lightweight account is too permissive and fails to capture what is essential to genuine introspection. The paper proceeds through three increasingly concessive but individually sufficient challenges to the attribution of introspective abilities to LLMs. First, LLMs lack the persistent subject necessary for genuine introspection: current models lack the psychological continuity required for self-knowledge. Second, LLM self-reports violate the immunity to error through misidentification that characterizes genuine introspection, because they are based on public textual information that could equally support judgments about another system’s states. Third, by centering on functional self-monitoring and behavioral control, the lightweight account fails to distinguish introspection from the ubiquitous self-regulatory processes found in complex systems.

Can LLMs make trade-offs involving stipulated pain and pleasure states?

Geoff Keeling, Winnie Street, Martyna Stachaczyk, Daria Zakharova, Iulia M. Comsa, Anastasiya Sakovych, Isabella Logothetis, Zejia Zhang, Blaise Agüera y Arcas, Jonathan Birch

See our paper on arXiv


Could Large Language Models feel pain or pleasure, or could they develop granular representations of such affect states? We developed a new behavioral approach, inspired by a paradigm from the animal sentience literature, to test LLM behavior and decision-making beyond direct self-report.

I was particularly interested in the extent to which a biologically inspired behavioral approach could be adapted to testing LLMs, and whether one can plausibly claim a link to sentience candidacy via such tests. Our current conclusions are that (1) LLMs are not currently sentience candidates, and (2) the experiments we conducted could lay a path toward a portfolio of research on artificial cognition and sentience, in which several distinct lines of evidence would be required to support claims about the possibility of sentience in artificial systems.
 

This project was a collaboration between the Paradigms of Intelligence team at Google and Jonathan Birch's Foundations of Animal Sentience team at the LSE.

The epistemology of AI-driven science: The case of AlphaFold

Preprint on PhilSci-Archive


There has been much recent philosophical work on the opacity and reliability of deep learning systems in scientific practice. My paper takes this body of work as a starting point and asks what follows for what it means for something to be known to science, in light of the strong opacity at the core of the AI-driven production of scientific knowledge.


The success of AlphaFold, an AI system that predicts protein structures, poses a challenge to the traditional understanding of scientific knowledge. It generates predictions that are not empirically tested, without revealing the principles behind its predictive success. The paper presents an epistemological trilemma, forcing us to reject one of three claims: (1) AlphaFold produces scientific knowledge; (2) predictions alone are not scientific knowledge unless derivable from established scientific principles; and (3) scientific knowledge cannot be strongly opaque. The paper defends (1) and (2) and draws on Alexander Bird's functionalist, anti-individualist account of scientific knowledge to accommodate AlphaFold's production of strongly opaque knowledge in science.


I argue that AlphaFold can generate scientific knowledge and that scientific knowledge can be strongly opaque to humans, so long as that knowledge is properly functionally integrated into the collective scientific enterprise.


An arthropod style of cognition?

Embodied vs. "Higher Cognitive" Explanations of Intelligent Behavior in Portia Jumping Spiders

The paper is forthcoming in “Psychodiversity: Cognition and Sentience Beyond Humans”, Grant Ramsey (ed.).


Recent research in animal cognition reveals remarkable abilities in miniature-brained invertebrates, some of which exhibit complex behaviors that rival those of much larger-brained animals. Portia spiders display sophisticated hunting strategies, leading to debate about the cognitive mechanisms behind their behavior. 

 

“Higher-cognition” explanations that appeal to working memory and mental simulation compete with “embodied-heuristics” explanations that aim to show how Portia’s embodiment and sensory apparatus simplify the computations required. The evidence to date is indecisive: studies that may seem to shift the dial towards higher cognition, such as studies of numerosity, leave room for embodied heuristics.

 

I worked with Jonathan Birch to propose three lines of inquiry that could shift the dial: electrophysiological studies of brain mechanisms, gaze-tracking studies, and comparative evidence from other spiders.

 

A broader view we aim to advance is that intelligent behavior can be realized by different mechanisms, and that in studying the various mechanisms of animal intelligence we have good reason to allow for such variation.

 

We find it likely that a general arthropod-style mechanism allows for variation in intelligent problem-solving.


© Daria Zakharova

All rights reserved 2025
