Making Autonomous Drones Safer Through Situation Coverage based Safety Testing
Posted on Wednesday 27 August 2025
As a PhD student at the University of York, my research focuses on developing a testing framework named SCALOFT: Situation Coverage Analysis using the ALOFT testbed. ALOFT is an artifact designed for the self-adaptive systems research community to investigate self-adaptive drone controllers in mine operations; it was originally developed during the 2023 Hackathon at York, where I was part of the team. SCALOFT is a systematic testing methodology designed to rigorously evaluate and ensure the safety of autonomous drones operating within the complex, unpredictable environment of underground mines.
Understanding emerging challenges
Autonomous drones are becoming essential for dangerous jobs such as inspecting collapsed buildings, monitoring construction sites, and helping during disasters. This includes underground mines, where they can go to places too dangerous for humans, collecting data and checking for potential problems.
Working with drones in underground mines isn’t just a technical challenge, it’s a safety challenge too. Autonomous systems operating in dynamic, hazardous environments such as mines face continual challenges from degraded sensors, unpredictable terrain changes, and sudden equipment faults. They must adapt in real time, handle failures gracefully without human intervention, and interact safely with nearby personnel.
That’s why it’s so important to think about safety right from the design stage. In environments where people might be nearby, we need to identify possible failure scenarios, understand their consequences, and plan ways to reduce the risks. Even if a drone can handle flying in low light, for example, we still have to be sure it behaves safely in more complex situations and that any remaining risks are kept as low as reasonably practicable.
In traditional software testing, engineers use “coverage” measures to check how thoroughly they’ve tested a program. But these measures can fall short when applied to autonomous software, because they often ignore unpredictable external factors, such as dust, noise, or human movement, that can change how a system behaves in the real world.
In the domain of autonomous drones in underground mines, there is a lack of rigorous testing frameworks that systematically explore the wide range of situations a drone may encounter. Our study addresses this gap by focusing on situation coverage-based safety testing, with the goal of developing and evaluating a systematic method for generating diverse situations and ensuring coverage.
How we’re developing industry-focused solutions
SCALOFT is built on top of the ALOFT testbed, a realistic drone simulation platform created using 3D laser scans of a mine, recreated in our research lab. It’s a 3D digital copy of the mine for the popular Gazebo simulation environment, which lets us set up and vary different situations in simulation easily.
ALOFT features a modified PX4 Vision quadcopter drone and several obstacles. The drone is equipped with a depth camera that captures both regular images and 3D data, along with motion sensors and a bump sensor to detect crashes. The simulation uses Gazebo’s full physics engine, and the quadcopter runs the PX4-Autopilot flight controller software for realistic behaviour.
The drone is controlled externally through a companion computer that talks to the flight controller and uses ROS (Robot Operating System) to navigate. It also has PX4-Avoidance software to help it avoid obstacles while flying to set waypoints. To keep things safe, the drone can detect humans using a pre-trained YOLO (You Only Look Once) object detection model that analyzes the camera feed. When it spots a person, the drone slows down to avoid any accidents.
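The slow-down behaviour described above can be sketched in a few lines. This is an illustrative assumption rather than SCALOFT’s actual code: the speed values and function name are hypothetical, and a real implementation would publish velocity setpoints via ROS.

```python
# Hedged sketch of the "slow down near people" safety behaviour.
# Speed values and names are illustrative assumptions, not SCALOFT's API.

CRUISE_SPEED = 2.0   # m/s, assumed normal waypoint-following speed
SAFE_SPEED = 0.5     # m/s, assumed cap while a person is in view

def select_speed(detected_labels):
    """Return the speed setpoint given YOLO class labels from the camera feed."""
    if "person" in detected_labels:
        return SAFE_SPEED
    return CRUISE_SPEED
```

In a real deployment this decision would run on the companion computer each time a new detection message arrives, with the chosen speed forwarded to the flight controller.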
Our SCALOFT framework shifts the focus to situation coverage testing: how a drone behaves in a variety of plausible and potentially dangerous conditions. The fundamental idea of situation coverage is to identify the environmental factors the system may encounter, to determine how they can vary, and to make sure that both individual factors and their combinations are tested.
To help ensure that our work on SCALOFT is industry-relevant, we designed the framework according to our SACE guidance. In particular, our proposed testing approach addresses the challenge posed in SACE Activity 29: “Do the test cases sufficiently cover the range of potential operating scenarios for the Autonomous system?”.
Additionally, we built SCALOFT to satisfy the SACE Verification Strategy activity, which demands a clear justification that the collected situations “cover all relevant aspects of an Operational Domain Model (ODM)”. An ODM formalises assumptions about when, where, and under what conditions the autonomous system is expected to function, enabling structured reasoning about potential situations.

In principle, the number of possible situations an autonomous system could face is infinite: lighting conditions can shift gradually or abruptly, obstacles can vary in type and placement, air quality and weather conditions may fluctuate, and the physical layout of the environment may be altered in countless ways. Testing every possible combination of these factors is not feasible. To make the problem tractable, we introduce the concept of a situation hyperspace: a structured representation of possible situations derived methodically from the ODM. This hyperspace serves as a bounded yet representative search space, within which SCALOFT systematically explores and generates situations for the UAV, ensuring that coverage is both practical and justifiable. In essence, the hyperspace acts as a bridge between the theoretically infinite situation set and the manageable subset required for rigorous verification.

The situation hyperspace is initially hypothetical. We constructed a methodological grid from the ODM and selected situations that are easier to generate in simulation; the plan is to later employ alternative methods to generate the hyperspace from the ODM, with a focus on identifying high-risk or more plausible situations.
As an initial approach, the hyperspace consists of only two elements from the ODM: the environmental element (including obstacles and lighting conditions) and mine structures (ranging from narrow to open spaces).
We will focus on five key factors that influence the drone’s behaviour:
- whether it’s dark,
- if there’s a person nearby,
- if there’s an obstacle in the way,
- whether a waypoint is close to a wall, and
- if the drone needs to turn a corner.
Each factor has two possible states, yes or no, so combining them creates 32 unique situations to test. Additional factors can be introduced later, making the hyperspace expandable for further testing. The system iteratively generates situations from the hyperspace, using a random number generator to select rows, marks the corresponding cells in the coverage grid, and simulates the UAV’s behaviour while performing safety checks. It keeps an eye on two simple safety requirements: avoid crashing, and slow down if a person is nearby. If the drone violates either of these, SCALOFT logs the incident. Each situation is recorded with a unique identifier, and flight data is collected throughout, so any safety violations such as collisions can be analysed after the flight and used to drive improvements.
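The hyperspace and selection loop described above can be illustrated with a short sketch. The factor names and selection logic here are a simplified assumption of how SCALOFT’s grid works, not its actual implementation:

```python
import itertools
import random

# Five binary factors from the situation hyperspace described above.
# Names are illustrative; 2**5 = 32 unique situations in total.
FACTORS = ["dark", "person_nearby", "obstacle", "waypoint_near_wall", "corner"]

# Enumerate every situation as a dict mapping factor -> yes/no.
HYPERSPACE = [dict(zip(FACTORS, values))
              for values in itertools.product([False, True], repeat=len(FACTORS))]

def pick_situation(covered, rng=random):
    """Randomly select a not-yet-tested situation and mark it as covered."""
    remaining = [i for i in range(len(HYPERSPACE)) if i not in covered]
    idx = rng.choice(remaining)
    covered.add(idx)
    return idx, HYPERSPACE[idx]
```

Repeatedly calling `pick_situation` until the `covered` set holds all 32 indices corresponds to filling every cell of the coverage grid; each simulated flight would then be tagged with its situation index for post-flight analysis.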
When it comes to measuring how thoroughly the drone has been tested, we plan to use Kullback-Leibler (KL) divergence rather than just a basic coverage percentage. The hope is that KL divergence will give us a better picture of how evenly our tests are spread across all possible scenarios, ensuring we’re not just focusing on a few easy or common cases but are testing fairly across the entire situation hyperspace.
However, neither a quantitative implementation nor an empirical assessment of this metric has yet been conducted.
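To make the intuition concrete, here is a minimal sketch of how such a metric might be computed, comparing the empirical distribution of executed tests against a uniform target over the situation bins. This is only an assumption about how the metric could look, since, as noted above, no implementation exists yet:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: four situation bins, tested unevenly.
uniform = [0.25, 0.25, 0.25, 0.25]        # ideal: tests spread evenly
skewed = [0.7, 0.1, 0.1, 0.1]             # actual: most tests hit one bin
unevenness = kl_divergence(skewed, uniform)
```

A divergence of zero would mean the tests are spread perfectly evenly across the hyperspace; larger values flag that testing effort is concentrated on a few situations.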
How can we assure safety in unpredictable situations?
To really test SCALOFT, we did something counterintuitive: we broke the system on purpose.
We added small faults to simulate real problems such as:
- A delay in recognizing a nearby person
- A false alarm that makes the drone think it’s about to crash
- A misjudgement that causes it to overshoot its destination
In each case, SCALOFT caught the safety violation. It flagged the issue, recorded what went wrong, and added it to a growing list of situations in which the drone has been tested.
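A fault like the detection delay above can be injected by wrapping the perception callback, a common fault-injection pattern. The following sketch is an illustrative assumption, not SCALOFT’s actual mechanism:

```python
import time

def with_delay(detector, delay_s):
    """Wrap a detection function so its result arrives delay_s seconds late.

    Simulates a perception fault: the drone learns about a nearby person
    only after the injected delay has elapsed.
    """
    def delayed(frame):
        result = detector(frame)
        time.sleep(delay_s)   # injected fault: late delivery of the detection
        return result
    return delayed
```

The same wrapping idea extends to the other faults: a wrapper that returns a spurious collision warning models the false alarm, and one that perturbs the estimated position models the overshoot.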
As autonomous systems take on bigger roles from delivery robots to self-driving cars, we need better ways to test and trust them. Not just in labs, but in the unpredictable, messy, human world.
SCALOFT gives us a framework to do that. It helps answer the question: “Has this system really been tested in the kinds of situations it will face?”. By doing so, it supports compliance with the requirements of SACE, and is thus a step towards helping autonomous system developers assure these kinds of systems in a convincing way.
What’s Next?
SCALOFT is an early step toward improving how we test autonomous drones, especially in hazardous environments. It bridges the gap between traditional software testing and the complex demands of real-world autonomy.
As this is ongoing PhD research, the next stages will focus on expanding the situation space with additional environmental and structural factors, refining the simulator, and developing more systematic testing strategies. We aim to gradually increase the fidelity and coverage of scenarios to better approximate real-world complexity.
As we continue to expand the situation space and explore new testing strategies, our goal is to make sure that when drones fly into the dark messy world, literally or metaphorically, they do so safely. In the longer term, we see opportunities for collaboration with industry and academic partners to further develop the SCALOFT methodology and ensure its applicability to real-world testing challenges.