Conformal Prediction for Interactive Planning with Smart Safety

How a new “adversarially robust” conformal prediction framework keeps autonomous systems safe—even when the world reacts back.

Published November 16, 2025 By EngiSphere Research Editors

In Brief

A recent paper introduces an iterative, adversarially robust conformal prediction framework that keeps autonomous agents safe in interactive environments by accounting for how policy updates themselves change the behavior—and distribution—of surrounding agents.

In Depth

When the World Reacts to You

Autonomous technologies—self-driving cars, warehouse robots, drone couriers—are getting better every year. But there’s still one giant challenge: the world reacts to them.

Think of a self-driving car approaching a pedestrian crossing.
The car adjusts its speed → the pedestrian reacts by slowing or speeding up → the car reacts again… and so on. It’s a loop of interactions, not a one-way prediction.

Most existing safety tools don’t handle this well—especially conformal prediction (CP), a statistical method that wraps predictions in uncertainty bounds with guaranteed coverage. CP assumes that the data you test on behaves similarly to the data you calibrated on. But interactive environments break this assumption:

When the agent changes its policy, the world’s behavior also changes—so the calibration becomes invalid.

This new paper presents the first framework that retains statistical safety guarantees even under policy-induced distribution shifts. Let’s explore how it works.

Core Contribution at a Glance

The authors propose an iterative safe-planning framework that uses adversarially robust conformal prediction to maintain statistical safety guarantees across repeated policy updates.

Their system addresses three major problems:

1. Interaction-driven distribution shifts

Traditional conformal prediction assumes the world behaves the same during calibration and deployment. But interactive agents (humans, robots, vehicles) react to the ego-agent’s actions, breaking that assumption.

2. Circular dependency (the “chicken-and-egg” problem)

Updating the policy changes the environment’s behavior.
But the safety certificates depend on environment behavior.
So updating one breaks the other.

They fix this with an iterative radius update rule that accounts for how much a policy update can influence the environment.

3. Guaranteed per-episode safety

Instead of only offering “after-convergence” guarantees like previous work, the authors provide safety at every single episode—critical for real-world robotic deployment.

Why Conformal Prediction Isn’t Enough (Yet)

Conformal Prediction is powerful:

  • It creates a prediction tube around future trajectories.
  • It guarantees that the true trajectory lies inside the tube with probability at least 1−α.

But conformal prediction requires exchangeability—roughly, calibration data must come from the same distribution as test data.

Interactive environments destroy that:

  • Change your policy → pedestrians behave differently
  • Pedestrian behavior now differs from your calibration data
  • Safety guarantees break
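
To see this concretely, here is a toy sketch (not from the paper; the distributions and the size of the shift are made up for illustration) of how a conformal quantile calibrated under one distribution loses coverage once the test distribution shifts:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1  # target miscoverage: the tube should cover with probability >= 1 - alpha

# Calibration: nonconformity scores (prediction errors) gathered under the old policy.
calib_scores = np.abs(rng.normal(loc=0.0, scale=1.0, size=500))

# Standard split-conformal quantile with the finite-sample correction.
n = len(calib_scores)
q = np.quantile(calib_scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Deployment: the environment reacts to the new policy, so errors are larger (shifted).
test_scores = np.abs(rng.normal(loc=0.8, scale=1.0, size=10_000))

print(f"coverage on calibration-like data: {np.mean(calib_scores <= q):.3f}")
print(f"coverage after the shift:          {np.mean(test_scores <= q):.3f}")  # falls below 1 - alpha
```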

So the authors ask:

Can we adjust conformal prediction tubes to remain valid even after the policy changes?

Yes—with Adversarially Robust Conformal Prediction (ACP).

Enter Adversarially Robust Conformal Prediction

ACP extends CP by assuming the data you test on may be perturbed within a known budget.
In the context of interactive planning:

The “adversary” = the environment reacting to your policy update.

So instead of recalculating CP from scratch or assuming stability, ACP adds a safety buffer that accounts for how much the environment might change in response to the new policy.

This buffer is derived through policy-to-trajectory sensitivity:

  • If you change your policy slightly
  • How much can the environment’s trajectory realistically shift?

Formally, the authors prove:

max_t || y_t(pi_{j+1}) - y_t(pi_j) || <= β_T * || pi_{j+1} - pi_j ||

This β_T is the environment's sensitivity to policy changes.

This leads to a new tube radius:

r_{j+1} = q_j + M_{j+1}

where:

  • q_j = recalibrated CP radius under the current policy
  • M_{j+1} = environment shift due to the policy update

This ensures safety before the new policy is even executed, giving strong episode-by-episode guarantees.

The Iterative Safe-Planning Framework

Each episode proceeds through the following steps:

1. Initialize

Start with a conservative but guaranteed-safe radius r_0.
This creates the first safe policy pi_0.

2. Deploy & collect data

Execute the current policy pi_j.
Collect multiple real-world (or simulated) environmental trajectories.

3. Recalibrate using CP

Using the data collected under pi_j, compute a conformal radius q_j that captures environment behavior under the current policy.

This is standard CP.
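
As a rough sketch of this step (variable names and the score definition are illustrative, not the paper's code; here the nonconformity score is the worst prediction error over the horizon, though the paper may calibrate per time step):

```python
import numpy as np

def conformal_radius(predicted, observed, alpha=0.1):
    """Conformal radius q_j from trajectories gathered under the current policy pi_j.

    predicted, observed: arrays of shape (n_trajectories, horizon, dim).
    """
    # Nonconformity score per trajectory: worst prediction error over the horizon.
    scores = np.linalg.norm(predicted - observed, axis=-1).max(axis=-1)
    n = len(scores)
    # Finite-sample conformal quantile at level 1 - alpha.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(scores, level, method="higher")

# Synthetic example: 200 five-step trajectories in 2D.
rng = np.random.default_rng(1)
pred = rng.normal(size=(200, 5, 2))
obs = pred + 0.3 * rng.normal(size=(200, 5, 2))
print(f"q_j = {conformal_radius(pred, obs):.3f}")
```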

4. Update the uncertainty radius

Here’s the magic:

r_{j+1} = q_j + β_T * || pi_{j+1} - pi_j ||

But pi_{j+1} depends on r_{j+1}—a circular dependency!
They propose two solutions:

Option A: Implicit Solver (exact but expensive)

Solve the implicit inequality:

r_{j+1} >= q_j + β_T * || pi*(r_{j+1}) - pi_j ||

Iterate until convergence.
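
A minimal fixed-point sketch of this option, assuming a hypothetical `plan(radius)` routine that returns the policy parameter vector optimized for a given tube radius (the paper's planner is an optimization problem not reproduced here); under the Lipschitz-planner assumption the iteration contracts when β_T * L_U < 1:

```python
import numpy as np

def implicit_radius_update(q_j, pi_j, plan, beta_T, tol=1e-6, max_iter=100):
    """Fixed-point iteration for r >= q_j + beta_T * ||plan(r) - pi_j||.

    plan(r) is a hypothetical planner returning policy parameters for radius r;
    each call may be expensive, which is why this option costs more than Option B.
    """
    r = q_j  # start from the freshly recalibrated CP radius
    for _ in range(max_iter):
        r_next = q_j + beta_T * np.linalg.norm(plan(r) - pi_j)
        if abs(r_next - r) < tol:
            return r_next
        r = r_next
    return r  # caller should verify convergence in practice
```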

Option B: Explicit Solver (fast, analytic, used in experiments)

Assume the planner is Lipschitz-continuous:

|| pi*(r) - pi*(r') || <= L_U * | r - r' |

This yields:

If q_j ≤ r_j (shrink):

r_{j+1} = (q_j + κ * r_j) / (1 + κ)

If q_j > r_j (expand):

r_{j+1} = (q_j - κ * r_j) / (1 - κ)

where κ = β_T * L_U is a “closed-loop sensitivity gain.”
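
The explicit rule is cheap to compute. Here is a small sketch of the shrink/expand update (variable names are mine; the formulas follow the two cases above and assume κ < 1 so the expansion case is well-posed):

```python
def explicit_radius_update(q_j, r_j, beta_T, L_U):
    """Explicit (Option B) radius update.

    q_j:    conformal radius recalibrated under the current policy pi_j
    r_j:    radius that was used when planning pi_j
    beta_T: environment sensitivity to policy changes
    L_U:    Lipschitz constant of the planner with respect to the radius
    """
    kappa = beta_T * L_U  # closed-loop sensitivity gain; assumed < 1
    if q_j <= r_j:
        # Shrink: observed uncertainty is smaller than the radius planned for.
        return (q_j + kappa * r_j) / (1 + kappa)
    # Expand: observed uncertainty exceeds the planned radius.
    return (q_j - kappa * r_j) / (1 - kappa)

# Example: recalibration gives q_j = 0.8 while the previous radius was r_j = 1.0.
print(explicit_radius_update(q_j=0.8, r_j=1.0, beta_T=0.3, L_U=0.5))  # shrinks toward q_j
```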

5. Update the policy

Solve the planning problem using the new radius r_{j+1}.
Produce the next safe policy pi_{j+1}.

6. Repeat until convergence

Radius shrinks, stabilizes → policy improves → safety is maintained.
A beautiful cycle.

Case Study: A Car and a Pedestrian

To test the method, the authors simulate a scenario with:

  • 2D self-driving car
  • Interacting pedestrian (who reacts to the car)
  • 5-step planning horizon

The pedestrian exhibits repulsive behavior: the closer the car gets, the more the pedestrian deviates from its nominal path.

The predictor, however, ignores this! It assumes a simple straight-line path.
This creates a realistic mismatch.
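
The paper's exact pedestrian dynamics are not reproduced here; the toy sketch below (all parameters made up) just illustrates the kind of mismatch described: the predictor extrapolates a straight line while the "true" pedestrian adds a repulsive deviation that grows as the car approaches.

```python
import numpy as np

def predict_straight_line(ped_pos, ped_vel, horizon=5, dt=0.1):
    """Interaction-unaware predictor: constant-velocity, straight-line rollout."""
    return np.array([ped_pos + (k + 1) * dt * ped_vel for k in range(horizon)])

def simulate_repulsive_pedestrian(ped_pos, ped_vel, car_traj, gain=0.5, dt=0.1):
    """Toy 'true' pedestrian: nominal motion plus a push away from the nearby car."""
    pos, traj = ped_pos.astype(float), []
    for car_pos in car_traj:
        away = pos - car_pos
        dist = np.linalg.norm(away) + 1e-6
        push = gain / dist**2 * (away / dist)  # deviation grows as the car gets closer
        pos = pos + dt * ped_vel + dt * push
        traj.append(pos.copy())
    return np.array(traj)

ped_pos, ped_vel = np.array([0.0, 0.0]), np.array([1.0, 0.0])
car_traj = np.linspace([2.0, -1.0], [1.0, 0.5], 5)   # car closing in over 5 steps
gap = np.linalg.norm(predict_straight_line(ped_pos, ped_vel)
                     - simulate_repulsive_pedestrian(ped_pos, ped_vel, car_traj), axis=1)
print(gap)  # per-step prediction error the CP tube has to absorb
```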

The results show:

✔ The uncertainty radius converges
✔ The planned paths improve each episode
✔ CP tubes remain valid
✔ Safety constraints hold with high probability
✔ Performance improves steadily

One of the key figures shows tube coverage and safety coverage staying above target levels across episodes.
This is exactly the type of reliability you need in real autonomous systems.

What Makes This Paper Important?

This work addresses one of the hardest problems in real-world robot learning:

How do you keep guarantees valid when your actions change the world’s behavior?

Existing CP-based planners struggle here because CP breaks under distribution shift.

This framework:

✔ accounts for policy-induced distribution shifts
✔ keeps high-confidence statistical guarantees
✔ improves performance over time
✔ provides per-episode safety (not just at convergence)
✔ works in interactive environments where classical CP fails
✔ is modular—works with many planners and predictors

This is a big step toward deploying CP-based planners in real settings involving people, cars, drones, pets, cyclists, robots… the entire interactive zoo.

Future Directions & Prospects

The paper opens many promising research paths:

1. Richer interaction models

The current predictor is fixed for simplicity.
Future systems could include:

  • learned interaction-aware predictors
  • multi-agent models
  • human intent inference models

Integrating these could drastically shrink uncertainty tubes.

2. Multi-agent interactive environments

Extending the method to environments with:

  • many pedestrians
  • multiple autonomous vehicles
  • complex social interactions

would make it applicable to real city-scale autonomy.

3. Adaptive sensitivity estimation

β_T and L_U determine how much uncertainty must expand.
Real-world systems could learn or adaptively update these values online for tighter bounds.

4. Integrating with reinforcement learning

This CP-ACP framework could be wrapped around RL agents to ensure safe policy improvement.

5. Tightening the adversarial budget

Future work might produce less conservative (tighter) bounds on policy-induced distribution shifts—allowing faster learning and better performance.

Closing Thoughts: A Safer Path Forward

This research provides a framework that:

  • acknowledges the interactive nature of the real world
  • uses adversarially robust conformal prediction
  • ensures per-episode safety in changing environments
  • guarantees stability and convergence under reasonable conditions
  • empirically validates the approach with a car–pedestrian case study

For a future where robots and humans share spaces, interact, and negotiate motion continuously, this is precisely the type of principled, mathematically grounded progress we need.


In Terms

Conformal Prediction (CP) - A method that wraps predictions in a statistical “bubble” ensuring the real outcome falls inside with a guaranteed probability.

Exchangeability - A condition where data points could be shuffled without changing their meaning — essential for CP to work.

Distribution Shift - When real-world data changes compared to training or calibration data, often breaking model assumptions.

Interactive Environment - A setting where the environment responds to the agent’s actions (e.g., a pedestrian reacting to a car).

Policy (Control Policy) - A rule or strategy that tells an autonomous system what action to take in each state.

Policy Update - Improving or modifying the control strategy — which may unintentionally influence how other agents behave.

Adversarially Robust Conformal Prediction (ACP) - An enhanced CP method designed to stay valid even when the data shifts or behaves adversarially.

Nonconformity Score - A measure of how much a real trajectory deviates from a predicted one; higher means more surprising.

Safety Set / Safety Tube - A protective region around predicted trajectories that the real trajectory must stay inside to remain safe.

Sensitivity Analysis - A technique to measure how much the environment’s behavior changes when the agent slightly adjusts its policy.

Robust Optimization - Planning that guarantees safety against all uncertainties within a predefined set — often conservative but reliable.

Contraction Analysis - A mathematical tool used to show that repeated updates in a system will eventually settle and converge.

Chance Constraint - A probabilistic safety rule, e.g., “stay collision-free with at least 90% confidence.”

Calibration Data - Previously collected trajectories used to tune the CP model’s uncertainty bounds before deployment.


Source

Omid Mirzaeedodangeh, Eliot Shekhtman, Nikolai Matni, Lars Lindemann. Safe Planning in Interactive Environments via Iterative Policy Updates and Adversarially Robust Conformal Prediction. https://doi.org/10.48550/arXiv.2511.10586

From: ETH Zürich; University of Pennsylvania.

© 2026 EngiSphere.com