What Today’s “World Models” Can and Cannot Tell Us About Safety


Posted on Tuesday 9 June 2026

In recent years, the idea of a “world model” has re-emerged as a central theme in artificial intelligence. In the first of a series on world models, Research Fellow Dr Jie Zou explores what that means for safe AI.


Artificial intelligence (AI) systems are increasingly being deployed in settings where they must interpret complex environments, anticipate change, and make decisions under uncertainty. This is where world models become important: they promise to give AI systems an internal representation of the world rather than relying solely on direct input–output reactions. For safety-critical systems, however, this raises a more challenging question. If an AI system relies on an internal model of the world to support its decisions, how do we know whether that model is adequate for safety assurance? Can it represent the assumptions, uncertainties, causal relationships, and operational limits upon which safety claims depend?

This is the issue explored in this post. Recent advances in world modelling have significantly improved how AI systems perceive, represent, and predict the world. However, safety assurance requires more than prediction. It requires confidence that a system will remain acceptably safe under uncertainty, changing conditions, and operational constraints.

Understanding the world

A world model can be broadly understood as an internal representation that allows a system to interpret its environment and anticipate how that environment may evolve over time. Rather than reacting only to immediate observations, a system equipped with a world model attempts to capture underlying structure, relationships, and dynamics that can support prediction, planning, and decision-making.

In recent years, the idea of world models has re-emerged as a central theme in artificial intelligence. Much of this renewed interest has been shaped by influential strands of research associated with Fei-Fei Li, whose work emphasises spatial and embodied understanding of the physical world, Yann LeCun, who advocates predictive representation learning through internal models of world dynamics, and Yoshua Bengio, whose recent work increasingly explores abstraction, causality, and reasoning over latent world structure. Although their approaches differ, they collectively reflect a broader shift in AI: from systems that merely react to observations toward systems that develop internal models capable of understanding, predicting, and reasoning about the world.

This shift has been transformative. Modern AI systems are increasingly able to extract structure from complex sensory data, learn representations of environments and objects, and anticipate how situations may evolve over time. In robotics, autonomous driving, and embodied AI, these capabilities are becoming foundational.

An AI-generated diagram showing the three-step flow of world models

Figure 1. The evolution of intelligent systems from reactive pattern-recognition architectures toward systems that maintain internal world representations. While modern world models have significantly improved prediction and planning capabilities, safety-critical systems introduce additional requirements related to assumptions, uncertainty, causal reasoning, operational boundaries, and assurance evidence.

Figure 1 illustrates the broader transition currently taking place in AI. Recent world-model research has moved beyond reactive input–output processing towards richer internal representations capable of supporting prediction and planning. This has enabled remarkable advances in autonomous systems, robotics, and embodied intelligence. Yet as these systems increasingly interact with the physical world, a different challenge begins to emerge.

The challenge from a safety perspective

For safety-critical systems, building a world model is only part of the problem.

Safety assurance refers to the process of developing justified confidence that a system will operate acceptably safely within its intended context. This confidence is built through evidence, analysis, testing, monitoring, and structured argumentation throughout the system lifecycle. In domains such as autonomous driving, robotics, healthcare, and other systems operating in open and uncertain environments, engineers must reason not only about what is likely to happen, but also about what could happen when assumptions are violated, sensing degrades, environmental conditions change, or rare combinations of events occur.

This introduces requirements that differ fundamentally from prediction alone.

Modern world models are highly effective at learning statistical regularities and supporting prediction or planning. They can model correlations, anticipate likely future states, and compress complex observations into useful latent representations. However, they are not typically designed to represent safety-relevant semantics explicitly. Operational assumptions, admissible behaviour, uncertainty bounds, causal dependencies, and safety constraints often remain external to the model itself.

This limitation becomes especially important when reasoning about interventions and changing operational conditions. As emphasised in causal inference research, particularly in the work of Judea Pearl, predicting what is likely to happen is fundamentally different from reasoning about what would happen if conditions were deliberately changed.

An AI-generated diagram comparing the Statistical (Correlational) Model and the Causal (Intervention) Model

Figure 2. Statistical world models can identify patterns and correlations within observed data. Safety assurance, however, often requires reasoning about interventions and changing conditions. Understanding how safety is affected when visibility, sensing confidence, or system capability changes requires causal rather than purely correlational reasoning.

Figure 2 highlights this distinction. A predictive model may learn that poor visibility is associated with increased collision risk. However, assurance often requires understanding how safety changes when visibility itself deteriorates, when sensing uncertainty increases, or when system capability degrades. These questions concern causal relationships and operational assumptions rather than prediction alone.
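The difference between the two questions can be made concrete with a toy calculation. The sketch below is a minimal illustration in Python, assuming a hypothetical structure in which bad weather both reduces visibility and makes the road slippery, and in which collision risk depends on visibility and road state. Every variable name and probability is invented for illustration and is not taken from any real dataset or from the research discussed in this post.

```python
from itertools import product

# Hypothetical probabilities, chosen only to illustrate the idea.
P_BAD_WEATHER = 0.3
P_LOW_VIS = {True: 0.8, False: 0.1}    # P(low visibility | bad weather?)
P_SLIPPERY = {True: 0.7, False: 0.05}  # P(slippery road  | bad weather?)
# P(collision | low visibility?, slippery road?)
P_COLLISION = {
    (False, False): 0.01,
    (True, False): 0.03,
    (False, True): 0.04,
    (True, True): 0.10,
}


def bernoulli(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def weight(bad_weather, slippery, intervene):
    """Joint weight of (weather, road) together with low visibility.

    When `intervene` is True, visibility is forced to be low (a do-operation),
    so the weather -> visibility dependency is removed from the product.
    """
    w = bernoulli(P_BAD_WEATHER, bad_weather)
    w *= bernoulli(P_SLIPPERY[bad_weather], slippery)
    if not intervene:
        w *= P_LOW_VIS[bad_weather]  # ordinary conditioning on low visibility
    return w


def collision_risk_low_visibility(intervene):
    """Collision probability given low visibility, observed or forced."""
    numerator = denominator = 0.0
    for bad_weather, slippery in product([True, False], repeat=2):
        w = weight(bad_weather, slippery, intervene)
        numerator += w * P_COLLISION[(True, slippery)]
        denominator += w
    return numerator / denominator


observational = collision_risk_low_visibility(intervene=False)
interventional = collision_risk_low_visibility(intervene=True)
print(f"P(collision | visibility = low)     = {observational:.4f}")
print(f"P(collision | do(visibility = low)) = {interventional:.4f}")
```

Under these assumed numbers the observational estimate is about 0.069 and the interventional estimate about 0.047. The gap arises because observing low visibility also makes bad weather, and therefore a slippery road, more likely, whereas forcing visibility to be low leaves the weather unchanged. A model that learns only the first quantity would overstate the effect of visibility itself in this toy setting.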

Consider an autonomous vehicle validated under particular environmental conditions and sensing assumptions. Once deployed, the system must continuously determine whether those assumptions remain valid in the real world. If visibility deteriorates, sensor confidence drops, or road conditions move beyond expected limits, the system must be able not only to detect these changes but also to determine their implications for safe operation.

This requires more than accurate prediction. It requires structured representations linking environmental context, uncertainty, system capability, operational assumptions, and safety constraints in ways that support both reasoning and justification.
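As a rough illustration of what making operational assumptions explicit might look like, the sketch below encodes a validated operating envelope as data and checks observed conditions against it. It is a minimal Python sketch under stated assumptions: the assumption names, the numeric limits, and the mapping from violations to operating modes are all hypothetical, and a real system would derive them from its safety case rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assumption:
    """One operational assumption, expressed as an allowed numeric range."""
    name: str
    minimum: float
    maximum: float

    def holds(self, value: float) -> bool:
        return self.minimum <= value <= self.maximum


# Hypothetical envelope under which the system is assumed to have been validated.
VALIDATED_ENVELOPE = [
    Assumption("visibility_m", 100.0, float("inf")),
    Assumption("sensor_confidence", 0.9, 1.0),
    Assumption("road_friction", 0.5, 1.0),
]


def violated_assumptions(observed: dict) -> list:
    """Return the names of assumptions that the observed conditions break.

    A missing measurement is treated as a violation, because the assumption
    can no longer be confirmed.
    """
    violations = []
    for assumption in VALIDATED_ENVELOPE:
        value = observed.get(assumption.name)
        if value is None or not assumption.holds(value):
            violations.append(assumption.name)
    return violations


def operating_mode(observed: dict) -> str:
    """Map violations to an operating mode (an illustrative policy only)."""
    violations = violated_assumptions(observed)
    if not violations:
        return "nominal"
    if len(violations) == 1:
        return "degraded"
    return "minimal_risk_manoeuvre"


clear_day = {"visibility_m": 250.0, "sensor_confidence": 0.97, "road_friction": 0.8}
heavy_fog = {"visibility_m": 60.0, "sensor_confidence": 0.95, "road_friction": 0.8}
fog_and_ice = {"visibility_m": 60.0, "sensor_confidence": 0.95, "road_friction": 0.3}

for label, conditions in [("clear day", clear_day), ("heavy fog", heavy_fog),
                          ("fog and ice", fog_and_ice)]:
    print(label, operating_mode(conditions), violated_assumptions(conditions))
```

The point of the sketch is that the assumptions are held as explicit, inspectable data rather than left implicit in a learned representation, so the same envelope could in principle be referenced by testing, monitoring, and the safety argument.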

Towards assurance-oriented world models

Crucially, these challenges do not arise only at runtime.

The same questions appear throughout the assurance lifecycle: during requirements specification, hazard analysis, simulation, testing, deployment, monitoring, and operational decision-making. Yet these activities are often supported by different models, assumptions, abstractions, and evidence structures.

As AI systems become increasingly adaptive and data-driven, maintaining consistency across these representations becomes more difficult. Perception models, planning components, simulation environments, runtime monitors, and safety cases may each encode different interpretations of uncertainty, operational boundaries, or acceptable risk.

This creates a fundamental tension between the capabilities provided by modern world models and the information required for safety assurance.

An AI-generated image comparing Prediction versus Assurance. Prediction describes the world as it is. Assurance justifies that the system remains safe within its operating limits.

Figure 3. Prediction and assurance serve complementary purposes. Prediction focuses on forecasting how the world is likely to evolve, whereas assurance focuses on determining whether system behaviour remains acceptable under uncertainty, changing conditions, and operational constraints.

As shown in Figure 3, prediction and assurance address different engineering questions. Prediction helps estimate future states of the world, while assurance seeks to establish justified confidence that the system remains within acceptable safety bounds.

The observations above do not suggest that current world-model research is misguided or insufficient. On the contrary, advances in representation learning, predictive modelling, embodied intelligence, and causal reasoning are likely to be essential foundations for future intelligent systems.

The challenge is that the objectives of prediction and the objectives of assurance are not necessarily the same.

Most existing world models are designed to help systems understand and anticipate the world. Safety assurance, however, requires something additional: representations capable of supporting explicit reasoning about assumptions, uncertainty, causal dependencies, admissible behaviour, operational constraints, and the conditions under which safety claims remain valid. This suggests the need for a different perspective on world modelling, one oriented not only around prediction and planning, but also around explicit semantics, causal structure, uncertainty representation, operational constraints, and assurance-oriented reasoning.

In the next post, we introduce Structural Causal World Models (SCWMs), an approach intended to support assurance-oriented reasoning across the system lifecycle.
