Education is one of AI’s most promising frontiers. With tools like ChatGPT, personalized learning support can be available to any student, anywhere, at any time.
But the education sector is still early in its understanding of the impact of AI on learning outcomes. Last year, our team set out to study the use of tools like study mode and found promising gains in student performance. But our research also raised an important question: how can we assess how AI influences a learner's progress over time, not just on a final exam?
This is a broader ecosystem challenge. To date, most research methods focus on narrow performance signals—such as test scores—and lack the ability to assess how students actually learn with AI in real-world settings, and how that use shapes outcomes over time.
To address this gap, we developed the Learning Outcomes Measurement Suite, a framework created with Estonia’s University of Tartu and the SCALE Initiative at the Stanford Accelerator for Learning to support longitudinal measurement of learning outcomes across different educational contexts.
Extensive validation is underway through a randomized controlled trial, and further research is planned with founding organizations in the Learning Lab, OpenAI’s learning research ecosystem, including researchers from Arizona State University, UCL Knowledge Lab, and MIT Media Lab (building on prior collaborative studies).
Today, we’re sharing an overview of how the measurement suite works and why it matters. Over time, we intend to publish more research and release the measurement suite as a public resource for schools, universities, and education systems worldwide.
> “This research allows us to learn quickly while also laying the groundwork for a deeper understanding of how AI can be thoughtfully integrated into schools in ways that truly matter. We want to understand how these tools can support rigorous academic learning while also cultivating higher-order thinking, creativity, curiosity, and students’ confidence in themselves as learners.”
–Susanna Loeb, Professor of Education and Faculty Director, SCALE Initiative at Stanford University
#### Origins and early research
When students use AI tools to study and learn, it can mean many different things—from going to AI for quick answers to using it to work through problems step by step with tutor-like guidance. To encourage users to engage with ChatGPT in ways that support deeper understanding and skill-building, OpenAI introduced study mode last year. Under the hood, study mode is powered by custom system instructions we’ve written in collaboration with teachers, scientists, and pedagogy experts to reflect a core set of behaviors that support true learning, not just answers—using scaffolding, checks for understanding, and guided practice.
To test whether this kind of pedagogically aligned AI interaction style translates into better learning outcomes, we ran a randomized study with over 300 college students preparing for neuroscience and microeconomics exams. While analysis is still underway, early results give us confidence that a pedagogically aligned AI interaction style, encouraged through features like study mode, can improve learning outcomes. But this research also surfaced an important reality: what really matters is whether the gains and associated productive behaviors remain durable over time.
Participants were assigned to one of three groups. A control group studied using traditional online resources such as Google Search and YouTube, with AI-generated overview features disabled, while the two remaining groups were each given access to one of two study mode variants designed to guide students through the learning process in slightly different ways. Baseline quizzes and onboarding surveys were collected ahead of time to adjust for differences in prior coursework exposure, study habits, academic confidence, and familiarity with AI tools. Students completed timed study mode sessions before each exam, with the two study mode variants counterbalanced across subjects.
This setup was designed to reflect real-world study conditions rather than a tightly controlled lab environment. Participation was not tied to exam performance, and not all students used study mode to the same extent during the nominal 40-minute sessions. This allowed us to measure and report intention-to-treat (ITT) effects: the causal impact of being offered access to the tool under realistic rollout conditions, acknowledging that engagement can vary in practice.
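To make the intention-to-treat framing concrete, here is a minimal Python sketch with invented numbers (not the study's data or its actual analysis): the ITT estimate compares outcomes by *assigned* group, deliberately ignoring how much each student actually engaged with the tool.

```python
from statistics import mean

# Hypothetical records: (assigned_group, minutes_of_actual_use, exam_score).
# Assignment, not actual usage, determines the comparison groups.
records = [
    ("control",     0, 71.0), ("control",     0, 64.5), ("control",     0, 69.0),
    ("study_mode", 38, 78.5), ("study_mode", 12, 70.0), ("study_mode", 40, 81.0),
]

def itt_effect(records):
    """Difference in mean exam score between students *offered* study mode
    and the control group; engagement (minutes of use) is ignored."""
    treated = [score for group, _, score in records if group == "study_mode"]
    control = [score for group, _, score in records if group == "control"]
    return mean(treated) - mean(control)

print(round(itt_effect(records), 2))  # → 8.33
```

The study's adjusted means additionally control for the baseline quizzes and surveys described above; this sketch omits that regression adjustment for brevity.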
We measured performance on each exam separately. In our randomized study, improvements were not uniform across subjects, and levels of engagement with study mode varied across participants.
#### Study mode (variants A & B) vs Control (no AI group): Adjusted mean exam scores
The effect remains consistent when we compare each study mode variant separately with the control.
While this reflects real-world variation, it highlighted a deeper limitation in how learning outcomes are typically measured.
Most existing evaluation approaches rely on fixed interventions assessed over short time windows, using outcomes such as test scores or final essays as primary signals. These methods are not designed to capture the core mechanism through which AI affects learning in practice: ongoing, personalized interactions that evolve alongside a learner’s own strategies, preferences, and study habits. Nor do they surface whether improvements in one capability, such as short-term recall, may come alongside trade-offs in others, such as persistence, autonomous motivation, or creative problem solving. As a result, they miss the longitudinal cognitive effects that ultimately determine whether AI meaningfully improves learning.
Because learning environments differ widely across countries, curricula, and institutional goals, outcomes from one-off studies rarely generalize across systems. Measurement approaches must therefore be flexible enough for different education systems to define what success looks like in their context, evaluate AI against their own standards, and iterate accordingly.
#### Building a better measurement system
Building on findings from OpenAI’s study mode research, we have been developing a structured system to measure AI’s impact on learners at scale and to create a mechanism for improving models based on those outcomes. It is grounded in three signals—how the model behaves, how learners respond, and what measurable cognitive outcomes result over time.
When combined, we refer to this measurement system as the Learning Outcomes Measurement Suite.
It produces important signals the education ecosystem can use: structured views of learning moments, dashboards showing how outcomes shift over time across cohorts, indicators of model performance against teaching and tutoring rubrics, and outcome measures aligned to standardized assessments and short learner questionnaires. Where available, it can incorporate partner-provided ground truth such as exam scores, classroom observations, or attendance.
All data in the measurement suite is de-identified.
It also enables our partners to understand the deeper cognitive impacts of using AI for learning, since the system can track impact over time on capabilities beyond test scores.
This reflects our broader effort to focus not only on narrow definitions of learning outcomes, such as rising test scores, but on the holistic capabilities that underpin learning. It also reflects our belief that there is no silver bullet for what to optimize: systems and educators will need to be empowered to guide trade-offs in alignment with pedagogical best practice.
#### Where we go from here
We are validating the Learning Outcomes Measurement Suite through large-scale studies before making it broadly available. This work is underway with the University of Tartu and Stanford’s SCALE Initiative across nation-scale partners like Estonia, where the measurement suite is being studied with nearly 20,000 students aged 16–18 over several months. The rollout is happening in close collaboration with local leaders to ensure safety and alignment with local curricula.
> “Estonia has always approached education not as static but as a system we continuously improve. With AI becoming part of that picture, the big question is how we measure AI's long-term impact on learning. That's what we're figuring out in collaboration with OpenAI. Students are keen to be involved in the development process, and many want to learn how to support learning with AI. It feels like a real turning point, and we’re excited to contribute methods that other education systems can reuse and build on.”
–Jaan Aru, Associate Professor at the Institute of Computer Science, University of Tartu
This work builds on a broader body of collaborative research underway. In addition to the outcomes research being conducted through founding partners in the Learning Lab, OpenAI is supporting studies at the intersection of learning and labor—examining how AI shapes students’ academic pathways, career decisions, and the ways institutions can support responsible adoption. This research is happening across Bocconi University, Innova Schools and Tuck School of Business at Dartmouth, San Diego State University, Stony Brook University, and others.
As we run longer-term studies on how students learn best with AI, we intend to share findings and work with the broader education ecosystem to ensure AI benefits learners everywhere.
Those interested in receiving updates on this work can sign up here.