Using Large Language Models for Qualitative Analysis can Introduce Serious Bias
Bibliographic Data
| AUTHOR(S) | |
|---|---|
| AFFILIATION(S) | World Bank, Washington, DC, USA; Maastricht University, Maastricht, Netherlands |
| YEAR | Not provided |
| TYPE | Article |
| JOURNAL | Sociological Methods & Research |
| ISSN | 0049-1241 |
| E-ISSN | 1552-8294 |
| PUBLISHER | SAGE Publications (United States) |
| DOI | 10.1177/00491241251338246 |
| ADDED ON | 2025-08-18 |
Abstract
Large language models (LLMs) are quickly becoming ubiquitous, but their implications for social science research are not yet well understood. We ask whether LLMs can help code and analyse large-N qualitative data from open-ended interviews, with an application to transcripts of interviews with Rohingya refugees and their Bengali hosts in Bangladesh. We find that using LLMs to annotate and code text can introduce bias that can lead to misleading inferences. By bias we mean that the errors that LLMs make in coding interview transcripts are not random with respect to the characteristics of the interview subjects. Training simpler supervised models on high-quality human codes leads to less measurement error and bias than LLM annotations. Given that high-quality codes are necessary to assess whether an LLM introduces bias, we argue that it may be preferable to train a bespoke model on a subset of transcripts coded by trained sociologists rather than use an LLM.
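The sketch below is a hypothetical illustration, not the authors' code or data. Using synthetic stand-in transcripts and labels, it shows (1) one way to check whether disagreements between LLM codes and human "gold" codes are non-random with respect to a respondent characteristic (here a made-up refugee/host indicator), and (2) the alternative the abstract points to: training a simple supervised classifier on a human-coded subset. All names, group labels, and numbers are assumptions for illustration only.

```python
# Hypothetical sketch with synthetic data; illustrates the abstract's two ideas.
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# --- Synthetic stand-in data (purely illustrative) --------------------------
n = 400
group = rng.choice(["refugee", "host"], size=n)        # respondent characteristic
human_code = rng.integers(0, 2, size=n)                # "gold" human code (0/1)
# Simulate an LLM whose coding errors are more likely for one group:
err_prob = np.where(group == "refugee", 0.25, 0.10)
flip = rng.random(n) < err_prob
llm_code = np.where(flip, 1 - human_code, human_code)

# --- 1) Are LLM coding errors random w.r.t. respondent group? ---------------
error = (llm_code != human_code).astype(int)
table = np.array([[np.sum((group == g) & (error == e)) for e in (0, 1)]
                  for g in ("refugee", "host")])
chi2, p, _, _ = chi2_contingency(table)
print(f"error rate (refugee): {error[group == 'refugee'].mean():.2f}")
print(f"error rate (host):    {error[group == 'host'].mean():.2f}")
print(f"chi-square test of independence: p = {p:.4f}")
# A small p-value suggests errors correlate with the characteristic, so
# group comparisons based on LLM annotations could be biased.

# --- 2) Bespoke supervised model trained on the human-coded subset ----------
# Placeholder texts with a crude signal so the pipeline has something to learn.
texts = [("aid shelter food" if c == 1 else "work market trade") + f" filler{i}"
         for i, c in enumerate(human_code)]
X_train, X_test, y_train, y_test = train_test_split(
    texts, human_code, test_size=0.5, random_state=0)
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print(f"held-out accuracy of simple supervised model: {clf.score(X_test, y_test):.2f}")
```

The chi-square test is only one possible diagnostic; on real transcripts one could equally regress the coding error on subject characteristics. The classifier is a deliberately simple TF-IDF plus logistic regression pipeline, standing in for the "simpler supervised models" the abstract contrasts with LLM annotation.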