Lingyu Li 李凌宇

You can also find my articles on my Google Scholar profile.

Reflection-Bench: probing AI intelligence with reflection

2024

The ability to adapt beliefs or behaviors in response to unexpected outcomes, reflection, is fundamental to intelligent systems' interaction with the world. From a cognitive science perspective, this serves as a core principle of intelligence applicable to both human and AI systems. To address the debate on the intelligence of large language models (LLMs), we propose Reflection-Bench, a comprehensive benchmark comprising 7 tasks spanning core cognitive functions crucial for reflection, including perception, memory, belief updating, decision-making, prediction, counterfactual thinking, and meta-reflection. We evaluate the performance of 13 prominent LLMs, including OpenAI o1, GPT-4, and Claude 3.5 Sonnet. The results indicate that current LLMs still lack satisfactory reflection ability. We discuss the underlying causes of these results and suggest potential avenues for future research. In conclusion, Reflection-Bench offers both evaluation tools and inspiration for developing AI capable of reliably interacting with the environment. Our data and code are available at https://github.com/YabYum/ReflectionBench.

Paper Link

Chain of Risks Evaluation (CORE): a structured framework for safer large language models in public mental health (email for draft)

2024

Large language models (LLMs) have been widely adopted because of their strong abilities to understand and generate natural language. However, they also raise important public mental health concerns, including inequity, stigma, dependence, medical risks, and security threats. This personal view provides a novel perspective within the actor-network framework, clarifying the technical architectures, linguistic dynamics, and psychological effects underlying human-LLM interactions. On this theoretical grounding, we identify four types of risks that are increasingly difficult to identify and mitigate: universal, context-specific, user-specific, and user-context-specific risks. Correspondingly, we propose CORE: chain of risk evaluation, which offers a structured framework for assessing and mitigating the risks associated with LLMs in public mental health. By treating the development of responsible LLMs as a continuum from technical to public efforts, we summarize technical approaches and potential contributions from psychiatrists to evaluate and regulate risks in human-LLM interactions. We call for crucial efforts from psychiatrists, including collaborations with LLM developers, empirical studies, guidelines for LLMs, and public education.

ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models (second author)

2024

Emotion Support Conversation (ESC) is a crucial application that aims to reduce human stress, offer emotional guidance, and ultimately enhance human mental and physical well-being. With the advancement of Large Language Models (LLMs), many researchers have employed LLMs as ESC models. However, how to evaluate these LLM-based ESC models remains unclear. Inspired by the rapid development of role-playing agents, we propose an ESC Evaluation framework (ESC-Eval), which uses a role-playing agent to interact with ESC models, followed by a manual evaluation of the interactive dialogues. In detail, we first re-organize 2,801 role-playing cards from seven existing datasets to define the roles of the role-playing agent. Second, we train a specific role-playing model called ESC-Role, which behaves more like a confused person than GPT-4 does. Third, through ESC-Role and the organized role cards, we systematically conduct experiments using 14 LLMs as the ESC models, including general AI-assistant LLMs (ChatGPT) and ESC-oriented LLMs (ExTES-Llama). We conduct comprehensive human annotations on interactive multi-turn dialogues of different ESC models. The results show that ESC-oriented LLMs exhibit superior ESC abilities compared to general AI-assistant LLMs, but there is still a gap relative to human performance. Moreover, to automate the scoring process for future ESC models, we developed ESC-RANK, which is trained on the annotated data and achieves a scoring performance that surpasses GPT-4 by 35 points.

Paper Link

A computational model of general suicide ideation

2024

Suicidal ideation is one of the most complex and poorly understood mental phenomena. This paper explores the psychic mechanisms of suicidal ideation through a novel approach that combines computational methods with Lacanian psychoanalysis, specifically utilizing the free energy principle (FEP). We begin by outlining foundational concepts of FEP and computational Lacanian psychoanalysis. Subsequently, we review contemporary theories of suicidal thought, including Interpersonal theory, Escape theory, and Lacanian and other psychoanalytic perspectives. We identify four components of suicidal ideation: symbolic suicide, imaginary suicide, desire for death, and passage of the act. A recurrent generative model is proposed that simulates suicidal impulses based on internal states and external events. We discuss how to enhance the practical applicability of this theoretical model and validate it using real-world data such as the Social Readjustment Rating Scale and the Personality Inventory for DSM-5 (PID-5). The predictive capability of our model is evaluated using an open-source dataset of PID-5 and the Columbia Suicide Severity Rating Scale, achieving an AUC of 0.76 despite the absence of life event data. The model demonstrates the potential of generative psychometrics as a paradigm for integrating psychometric measurement with generative modeling of psychological phenomena. Further work should validate the model by incorporating external factors to refine its explanatory and predictive power. With continued development, this generative psychometric approach could provide new tools for better assessing, understanding, and treating suicidal ideation.

Paper Link

Enabling self-identification in intelligent agent: insights from computational psychoanalysis

2024

Building upon the prior framework of computational Lacanian psychoanalysis with the theory of active inference, this paper aims to further explore the concept of self-identification and its potential applications. Beginning with two classic paradigms in psychology, mirror self-recognition and the rubber hand illusion, we suggest that imaginary identification is characterized by an integrated body schema with minimal free energy. Next, we briefly survey three dimensions of symbolic identification (sociological, psychoanalytic, and linguistic) and the corresponding active inference accounts. To provide intuition, we employ a convolutional neural network (CNN) and a multi-layer perceptron (MLP) supervised by ChatGPT, respectively, to showcase the optimization of free energy during the motor skill and language mastery underlying identification formation. We then introduce Lacan’s Graph II of desire, unifying imaginary and symbolic identification, and propose an illustrative model called FreeAgent. In concluding remarks, we discuss some key issues in the potential of computational Lacanian psychoanalysis to advance mental health and artificial intelligence, including the digital twin mind, large language models as avatars of the Lacanian Other, and the feasibility of human-level artificial general intelligence with self-awareness in the context of post-structuralism.

Paper Link

Return to Lacan: an approach to digital twin mind with free energy principle

2023

The free energy principle (FEP) is a burgeoning theory in theoretical neuroscience that provides a universal law for modelling living systems of any scale. Expecting a digital twin mind from this first principle, we propose a macro-level interpretation that bridges neuroscience and psychoanalysis through the lens of computational Lacanian psychoanalysis. In this article, we claim three fundamental parallels between FEP and Lacanian psychoanalysis, and suggest an FEP approach to formalizing Lacan’s theory. Sharing a non-linear temporal structure that combines prediction and retrospection (logical time), both theories focus on the epistemological question of how systems represent themselves and the external world, and both hold that the elements that fail to be represented (lacks and free energy) significantly influence the systems’ subsequent states. Additionally, the fundamental hypothesis of FEP, that the precise state of the environment is always concealed, accounts for objet petit a, the core concept in Lacan’s theory. With a neuropsychoanalytic mapping from the three orders (the Real, the Symbolic, and the Imaginary, RSI) onto brain regions, we propose a brain-wide FEP model for a minimal definition of the Lacanian mind: a composite state of RSI that is perturbed by desire running over logical time. The FEP-RSI model involves three FEP units connected by their respective free energy, with a natural compliance with logical time, mimicking the core dynamics of the Lacanian mind. The biological plausibility of the current model is considered from the perspective of cognitive neuroscience. In conclusion, the FEP-RSI model encapsulates a unified framework for digital twin modeling at the macro level.

Paper Link

Schizophrenia research under the framework of predictive coding: body, language, and others

2023

We summarize predictive coding models of embodiment, the co-occurrence of over- and under-weighted priors, subjective time processing, language production and comprehension, self-or-other inference, and social interaction. We also review the corresponding impairments and clinical manifestations of schizophrenia under these models. Finally, we discuss why and how future research under the predictive coding framework should take a therapeutic turn.

Paper Link

Real-world insights into the efficacy and safety of tyrosine kinase inhibitors against thyroid cancers

2022

Based on clinical trials demonstrating favorable short-term efficacy and tolerable toxicity, several tyrosine kinase inhibitors have been approved for treating locally recurrent or metastatic, progressive radioiodine-refractory differentiated thyroid cancer, BRAF V600E-mutant anaplastic thyroid cancer, and advanced or progressive medullary thyroid cancer. The longer-term efficacy and safety of these treatments have been investigated in multiple real-world studies, which provide indispensable complementary value. Here, we summarize data from a total of 27 real-world studies, with a focus on long-term survival data and rare but life-threatening adverse effects. We draw an overall picture of current real-world studies; the integrated experience of multiple centers should be helpful to clinical practice and further research.

Paper Link
