Findings

Thinking Clearly

Kevin Lewis

March 04, 2025

Experimental Evidence of the Effects of Large Language Models versus Web Search on Depth of Learning
Shiri Melumad & Jin Ho Yun
University of Pennsylvania Working Paper, January 2025

Abstract:
The effects of using large language models (LLMs) versus traditional web search on depth of learning are explored. Results from four online and laboratory experiments (N = 4,591) lend support to the prediction that when individuals learn about a topic from LLMs, they tend to develop shallower knowledge than when they learn through standard web search, even when the core information in the results is the same. This shallower knowledge stems from an inherent feature of LLMs -- the presentation of results as syntheses of information rather than as individual search links -- which makes learning more passive than in standard web search, where users actively discover and synthesize information sources themselves. In turn, when subsequently forming advice on the topic based on what they learned, those who learned from LLM syntheses (vs. standard search results) feel less invested in forming their advice and, more importantly, create advice that is sparser, less original -- and ultimately less likely to be adopted by recipients. Implications of the findings for recent research on the benefits and risks of LLMs are discussed.


The Widespread Adoption of Large Language Model-Assisted Writing Across Society
Weixin Liang et al.
Stanford Working Paper, February 2025

Abstract:
Recent advances in large language models (LLMs) have attracted significant public and policymaker interest in their adoption patterns. In this paper, we systematically analyze LLM-assisted writing across four domains -- consumer complaints, corporate communications, job postings, and international organization press releases -- from January 2022 to September 2024. Our dataset includes 687,241 consumer complaints, 537,413 corporate press releases, 304.3 million job postings, and 15,919 United Nations (UN) press releases. Using a robust population-level statistical framework, we find that LLM usage surged following the release of ChatGPT in November 2022. By late 2024, roughly 18% of financial consumer complaint text appears to be LLM-assisted, with adoption patterns spread broadly across regions and slightly higher in urban areas. For corporate press releases, up to 24% of the text is attributable to LLMs. In job postings, LLM-assisted writing accounts for just below 10% in small firms, and is even more common among younger firms. UN press releases also reflect this trend, with nearly 14% of content being generated or modified by LLMs. Although adoption climbed rapidly post-ChatGPT, growth appears to have stabilized by 2024, reflecting either saturation in LLM adoption or the increasing subtlety of more advanced models. Our study shows the emergence of a new reality in which firms, consumers, and even international organizations substantially rely on generative AI for communications.


Learning not cheating: AI assistance can enhance rather than hinder skill development
Benjamin Lira et al.
University of Pennsylvania Working Paper, February 2025

Abstract:
It is widely believed that outsourcing cognitive work to AI boosts immediate productivity at the expense of long-term human capital development. An overlooked possibility is that AI tools can support skill development by providing just-in-time, high-quality, personalized examples. In this investigation, lay forecasters predicted that practicing writing cover letters with an AI tool would impair learning compared to practicing writing letters without the tool. However, in a highly powered, pre-registered experiment, participants randomly assigned to practice writing with AI improved more on a writing test one day later than writers assigned to practice without AI. Notably, writers given access to the AI tool improved more despite exerting less effort, whether measured by time on task, keystrokes, or subjective ratings. We replicated and extended these results in a second pre-registered experiment, showing that writers given access to the AI tool again outperformed those who practiced on their own -- but performed no better than writers merely shown an AI-generated cover letter that they could not edit. Collectively, these findings constitute an existence proof that by providing personalized examples of high-quality work, AI tools can improve, rather than undermine, learning.


Serial Position Bias Among Experts: Evidence From a Cooking Competition Show
Maira Emy Reimão, Rachel Sabbadini & Eric Rego
Kyklos, forthcoming

Abstract:
The Great British Bake Off is a popular amateur cooking competition show, and its design offers an opportunity for analyzing serial position bias among expert rankings. In this paper, we use the technical challenge portion of the show to assess whether experts -- in this case, the judges in the show -- are susceptible to primacy or recency effects. We find that expert judges favor the first dish tasted in a blind test and that this pattern holds not only among judges of the British version of the show but also in other English-speaking versions. We do not find evidence of a recency effect. Our results indicate that expert assessments, regularly used in markets, are vulnerable to bias even when there are no financial incentives.


Rational and Irrational Belief in the Hot Hand: Evidence from "Jeopardy!"
Anthony Kukavica & Sridhar Narayanan
Stanford Working Paper, December 2024

Abstract:
For several decades, researchers and practitioners have wondered whether a "hot hand" exists in domains with repeated, human-controlled trials. Using a comprehensive play-by-play dataset from the game show "Jeopardy!", we demonstrate that contestants strongly believe in a hot hand effect as reflected in their wagering decisions during gameplay. In parallel, we find that a small hot hand effect also exists in contestants' actual performances. We then quantify contestants' "hot hand bias" (the degree to which their belief is irrational), finding that they overestimate the true effect by up to an order of magnitude. We also find that more successful contestants, as well as those with more quantitative and analytical training, exhibit lower levels of bias. Our paper reconciles robust findings of belief in a hot hand with a growing consensus that a small effect often exists in reality and investigates foundational mechanisms underlying these effects.


Listen for a change? A longitudinal field experiment on listening’s potential to enhance persuasion
Erik Santoro et al.
Proceedings of the National Academy of Sciences, 25 February 2025

Abstract:
Scholars and practitioners widely posit that listening to other people enhances efforts to persuade them. Listening may enhance persuasion by promoting cognitive processing, reducing defensiveness, and improving perceptions of the persuader. However, empirical tests of this widely theorized hypothesis are surprisingly scarce. We review the case for and against this hypothesis, arguing previous research has not sufficiently attended to reasons why listening may not enhance persuasion. We test this hypothesis using a preregistered, well-powered field experiment in which trained professional canvassers, acting as confederates, had ∼10 min video conversations with U.S. participants (N = 1,485) about unauthorized immigration, a salient topic of disagreement. We independently randomized whether confederates shared a persuasive narrative about an undocumented immigrant and whether they practiced high-quality nonjudgmental listening to participants’ opinions. We measured outcomes immediately after the conversation and again five weeks later. Sharing a persuasive narrative meaningfully and durably reduced prejudice and changed policy attitudes. The listening manipulation also successfully improved perceptions of the persuader and increased processing. Surprisingly, however, the listening manipulation did not enhance persuasion: Sharing a persuasive narrative was just as effective in the absence of high-quality listening. We discuss theoretical and practical implications.


Prediction that conflicts with judgment: The low absolute likelihood effect
Chengyao Sun & Robyn LeBoeuf
Journal of Experimental Psychology: General, forthcoming

Abstract:
How do people predict the outcome of an event from a set of possible outcomes? One might expect people to predict whichever outcome they believe to be most likely to arise. However, we document a robust disconnect between what people predict and what they believe to be most likely. This disconnect arises because people consider not only relative likelihood but also absolute likelihood when predicting. If people think that an outcome is the most likely to arise and also has a high absolute likelihood of arising, they regularly predict it to arise. However, if people believe that an outcome is the most likely to arise but has a low absolute likelihood (e.g., it has a 20% chance, and other outcomes have smaller chances), they less often choose it as their prediction, even though they know it is most likely. We find that, when the most likely outcome has a low absolute likelihood, the final outcome feels hard to foresee, which leads people to use arbitrary prediction strategies, such as following a gut feeling or choosing randomly, instead of predicting more logically. We further find that predictions are less likely to depart from the most likely outcome when manipulations encourage people to focus more on relative likelihood and less on the low absolute likelihood. People also exhibit a smaller disconnect when advising others than when predicting for themselves. Thus, contrary to common assumptions, predictions may often systematically depart from likelihood judgments. We discuss implications for research on judgments, predictions, and uncertainty.


Underpowered studies and exaggerated effects: A replication and re-evaluation of the magnitude of anchoring effects
Tongzhe Li et al.
Economic Inquiry, forthcoming

Abstract:
We reconsider one of the most widely studied behavioral biases: anchoring effects. We estimate that study designs in this literature, including replication studies, routinely fail to achieve statistical power of more than 30%. This study replicates an anchoring study that reported an effect size of a 31% increase in participants' bids. In the replication, we increased the design's statistical power from 46% to 96%, reducing the average exaggeration of a statistically significant result by a factor of seven. Our replication results reject the size of the original estimated effects. We find an estimated effect of 3.4% (95% CI [−3.4%, 10%]).
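
The link the authors draw between low power and exaggerated significant estimates is what Gelman and Carlin term a "Type M" (magnitude) error, and it is easy to reproduce in simulation. Below is a minimal sketch in Python; the true effect (3.4) and the two standard errors are made-up numbers chosen for illustration, not the paper's data.

    from statistics import NormalDist
    import numpy as np

    rng = np.random.default_rng(0)

    def power_and_exaggeration(true_effect, se, n_sims=200_000, z_crit=1.96):
        # Analytic power of a two-sided z-test for this effect and standard error.
        nd = NormalDist()
        power = nd.cdf(-z_crit - true_effect / se) + 1 - nd.cdf(z_crit - true_effect / se)
        # Simulate the sampling distribution of the estimate, then compute the
        # average |estimate| / |true effect| among significant results only.
        est = rng.normal(true_effect, se, n_sims)
        sig = np.abs(est / se) > z_crit
        exaggeration = np.mean(np.abs(est[sig])) / abs(true_effect)
        return power, exaggeration

    # Hypothetical numbers: the same true effect measured noisily vs. precisely.
    for se in (3.0, 1.0):
        p, m = power_and_exaggeration(3.4, se)
        print(f"se={se}: power={p:.0%}, average exaggeration factor = {m:.1f}")

With these made-up numbers, the noisy design has only about 20% power and its statistically significant estimates overstate the true effect roughly twofold; with the precise design, both problems largely disappear. That is the same phenomenon the replication documents at larger scale.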


Deliberation during online bargaining reveals strategic information
Miruna Cotet, Wenjia Joyce Zhao & Ian Krajbich
Proceedings of the National Academy of Sciences, 18 February 2025

Abstract:
A standard assumption in game theory is that decision-makers have preplanned strategies telling them what actions to take for every contingency. In contrast, nonstrategic decisions often involve an on-the-spot comparison process, with longer response times (RT) for choices between more similarly appealing options. If strategic decisions also exhibit these patterns, then RT might betray private information and alter game theory predictions. Here, we examined bargaining behavior to determine whether RT reveals private information in strategic settings. Using preexisting and experimental data from eBay, we show that both buyers and sellers take hours longer to accept bad offers and to reject good offers. We find nearly identical patterns in the two datasets, indicating a causal effect of offer size on RT. However, this relationship is half as strong for rejections as for acceptances, reducing the amount of useful private information revealed by the sellers. Counter to our predictions, buyers are discouraged by slow rejections -- they are less likely to counteroffer to slow sellers. We also show that a drift-diffusion model (DDM), traditionally limited to decisions on the order of seconds, can account for decisions on the order of hours, sometimes days. The DDM reveals that more experienced sellers are less cautious and more inclined to accept offers. In summary, strategic decisions are inconsistent with preplanned strategies. This underscores the need for game theory to incorporate RT as a strategic variable and broadens the applicability of the DDM to slow decisions.
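
For readers unfamiliar with the model: in a drift-diffusion process, noisy evidence accumulates toward one of two boundaries (here, accept vs. reject), so options closer to indifference take longer to resolve. Below is a minimal simulation sketch in Python; the drift values, boundary, and time units are illustrative assumptions, not parameters fitted to the eBay data.

    import numpy as np

    rng = np.random.default_rng(1)

    def simulate_ddm(drift, boundary=1.0, noise=1.0, dt=0.01, max_t=60.0):
        # One trial: evidence x drifts toward +boundary (accept) or
        # -boundary (reject) with Gaussian noise; returns (choice, RT).
        x, t = 0.0, 0.0
        while abs(x) < boundary and t < max_t:
            x += drift * dt + noise * np.sqrt(dt) * rng.normal()
            t += dt
        return ("accept" if x >= boundary else "reject"), t

    # Drift stands in for how far an offer sits from the seller's
    # indifference point; offers near indifference (small |drift|)
    # should produce the longest deliberation.
    for drift in (1.5, 0.3, -0.3, -1.5):
        rts = [simulate_ddm(drift)[1] for _ in range(2000)]
        print(f"drift={drift:+.1f}: mean RT = {np.mean(rts):.2f}")

Running this shows mean decision times growing markedly as drift shrinks toward zero -- the qualitative signature the authors exploit, in which the slow decisions are the ones made near indifference.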

