UPPAAL Stratego Examples and Use Cases
Traffic Dilemma
Suppose we need to travel from Aalborg to Sydney as soon as possible, but we are not sure which mode of transport is the fastest: a bike, a car, or a train. Each of them has different speeds and risks depending on traffic conditions. Below is a stochastic timed automaton capturing the trip possibilities:
We start in the location Aalborg on the left, pack our things, and leave within 2 minutes. Then we can choose/control our preferred mode of travel:
- Bike is relatively slow but the most predictable option: it takes us to Sydney in 42 to 45 minutes and thus depends very little on uncontrollable traffic conditions.
- Car offers the possibility of an Easy and fast trip in 20 minutes if traffic permits (probability weight 10 yields a likelihood of 10/11), but there is a 1-in-11 risk of getting stuck in Heavy traffic, which can prolong the trip to up to 140 minutes.
- Train runs on a periodic schedule, so there may be a waiting time of up to 6 minutes. It is quite likely (10 out of 11) that we board the train successfully and spend up to 35 minutes in location Go before reaching Sydney. There is also a small risk (1 out of 11) that the train is canceled and we end up in Wait, where we must decide either to switch to another train or to GoBack home and make the decision again in Aalborg.
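The trade-offs between the three modes can be roughed out with a quick Monte Carlo sketch. This is illustrative only: it assumes uniform distributions over the stated intervals and a bike fallback after a cancelled train, which only approximates the stochastic semantics of the model above:

```python
import random

def bike():
    return random.uniform(42, 45)          # predictable: 42-45 minutes

def car():
    if random.random() < 10 / 11:          # Easy traffic, likelihood 10/11
        return random.uniform(0, 20)       # assumed uniform, up to 20 minutes
    return random.uniform(20, 140)         # Heavy traffic, up to 140 minutes

def train():
    t = random.uniform(0, 6)               # waiting for the periodic train
    if random.random() < 10 / 11:          # boarding succeeds
        return t + random.uniform(0, 35)   # time spent in Go
    return t + bike()                      # assumed fallback: go back and bike

for mode in (bike, car, train):
    n = 100_000
    est = sum(mode() for _ in range(n)) / n
    print(mode.__name__, round(est, 1))
```

Under these assumptions the car is the fastest on average but has the heaviest tail, which is exactly the tension the strategies below resolve.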
Given our model, we can inspect sample trip trajectories via an SMC simulate query, where the locations are encoded as different levels:
simulate 1 [<=100] {
Kim.Aalborg + 2*Kim.Bike + 4*Kim.Easy + 6*Kim.Heavy +
8*Kim.Train + 10*Kim.Go + 12*Kim.Wait + 14*Kim.Sydney }
Below are two sample trajectories computed by Uppaal Stratego:
Two sample trajectories: a normal train trip in 20 minutes (left), and a train followed by bike in 50 minutes (right).
The following queries and resulting plot show where and when the time is being spent:
simulate 100 [<=100] {
14+Kim.Aalborg, 12+Kim.Bike, 10+Kim.Easy, 8+Kim.Heavy,
6+Kim.Train, 4+Kim.Go, 2+Kim.Wait, Kim.Sydney
}
Pr[<=60](<> Kim.Sydney) // probability of reaching Sydney within 60 minutes is about 0.97 (not certain, thus not safe)
E[<=200 ; 500] (max: trip) // mean trip is about 28 minutes
Generate and evaluate a safe strategy to get to Sydney within 60 minutes:
strategy GoSafe = control: A<> Kim.Sydney && time<=60
simulate 100 [<=100] {
14+Kim.Aalborg, 12+Kim.Bike, 10+Kim.Easy, 8+Kim.Heavy,
6+Kim.Train, 4+Kim.Go, 2+Kim.Wait, Kim.Sydney
} under GoSafe
A<> (Kim.Sydney && time<=60) under GoSafe // satisfied: using GoSafe we certainly reach Sydney within 60 minutes
E[<=200; 500] (max: trip) under GoSafe // mean trip time of GoSafe is about 33 minutes
The plot shows that most of the time is spent in Bike and Sydney and some in Train; therefore Bike is the sure way to reach Sydney, but we can also risk catching a train within the first 10 minutes (if the train does not come, we fall back to the Bike option).
Learn a fast strategy by minimizing the time to get to Sydney:
strategy GoFast = minE (trip) [<=200] : <> Kim.Sydney
simulate 100 [<=100] {
14+Kim.Aalborg, 12+Kim.Bike, 10+Kim.Easy, 8+Kim.Heavy,
6+Kim.Train, 4+Kim.Go, 2+Kim.Wait, Kim.Sydney
} under GoFast
Pr[<=60](<> Kim.Sydney) under GoFast // probability is about 0.94 (rushing is risky!)
E[<=200; 500] (max: trip) under GoFast // mean trip time is 15.4 minutes (almost 2x faster than no strategy)
The plot shows that most of the time is spent in Easy and Heavy and then Sydney; therefore Car is on average the fastest choice, but we can get stuck in Heavy traffic and be late.
In order to get a strategy that is both safe and fast, we optimize GoSafe by minimizing the trip time to Sydney:
strategy GoFastSafe = minE (trip) [<=200] : <> Kim.Sydney under GoSafe
simulate 100 [<=100] {
14+Kim.Aalborg, 12+Kim.Bike, 10+Kim.Easy, 8+Kim.Heavy,
6+Kim.Train, 4+Kim.Go, 2+Kim.Wait, Kim.Sydney
} under GoFastSafe
Pr[<=60](<> Kim.Sydney) under GoFastSafe // probability is >0.99 (we could not find a counter-example)
E[<=200 ; 500] (max: trip) under GoFastSafe // mean trip time is 22.4 minutes (faster than GoSafe but not as fast as GoFast)
The result shows that Train is the safe and fast option, but occasionally one may need to go back and take the bike.
The model and queries can be downloaded here: traffic.xml.
A walkthrough video presented at ATVA 2014: ATVA14-walkthrough.wmv.
Newspaper and Travel
Model: newspaper.xml
Safe and Optimal Cruise Control
This example is dedicated to Ernst-Rüdiger Olderog [LMT15]. It demonstrates the use of symbolic techniques to synthesize a safe discrete controller for continuous dynamics, and then the use of statistical techniques to optimize the control for maximum progress, all within the single integrated tool Uppaal Stratego.
The goal of cruise control is to maintain a steady vehicle speed. As roads become more congested, an adaptive cruise control must also maintain a safe distance to the vehicle ahead. The picture below shows a setup with our Ego car and a car in Front.
Ego is equipped with a distance sensor. The distance changes dynamically based on the relative speed between the cars, and each car controls its speed by accelerating or braking. The goal is to synthesize a control strategy deciding whether to accelerate, maintain speed, brake, or even move backwards based on the measured relative distance.
Our proposed solution is to model the system as a Stochastic Priced Timed Game, abstract away the stochastic and continuous dynamics, solve the Timed Game with a safety goal (deciding when Ego should accelerate or brake so as to maintain a minimal safe distance no matter what Front chooses to do), then take the safe Ego strategy back to the original Stochastic Priced setup and optimize the safe controller strategy into a fast one by using learning techniques.
Modeling
First we model the controllable options of such a system, namely the actions of the car Ego picking its acceleration values, via solid edges:
The Front car can exercise the same actions, but for us its transitions are not controllable (dashed edges). In addition, Front may go beyond the reach of Ego's distance sensor (the Faraway location):
We introduce a scheduler to limit the decision making of both cars to once per second. The scheduler dictates that Ego makes its choice first, without knowledge of Front's choice; then Front may choose its acceleration (possibly countering Ego's choice), and after one second the discrete approximation of the velocities and distance is recalculated:
The approximate integer distance is calculated based on averages using the function below:
int distance; // approximated distance between cars
int velocityEgo, velocityFront; // approximated velocities
int accelerationEgo, accelerationFront; // discrete acceleration
void updateDiscrete()
{
    int oldVel, newVel;
    oldVel = velocityFront - velocityEgo;
    velocityEgo = velocityEgo + accelerationEgo;
    velocityFront = velocityFront + accelerationFront;
    newVel = velocityFront - velocityEgo;
    if (distance > maxSensorDistance) {
        distance = maxSensorDistance + 1;  // beyond sensor range: clamp
    } else {
        distance += (oldVel + newVel)/2;   // trapezoidal update of the gap
    }
}
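Averaging the old and new relative velocities makes the discrete update exact whenever the relative acceleration is constant within a one-second step, since the trapezoid rule integrates a linear velocity exactly. A small Python sketch (illustrative, not part of the model) checking the update against the closed-form integral:

```python
def continuous_gap_change(rel_vel, rel_acc, dt=1.0):
    # Exact change of the gap over one step with constant relative
    # acceleration: the integral of v(t), i.e. v*dt + a*dt^2/2.
    return rel_vel * dt + rel_acc * dt * dt / 2

def trapezoid_gap_change(rel_vel, rel_acc, dt=1.0):
    # The model's update: average of the relative velocity
    # before and after the step.
    new_vel = rel_vel + rel_acc * dt
    return (rel_vel + new_vel) / 2 * dt

# The two agree for every relative velocity and acceleration:
for v in range(-22, 31):
    for a in (-4, -2, 0, 2, 4):
        assert continuous_gap_change(v, a) == trapezoid_gap_change(v, a)
```

Note that the model works in integers, so `(oldVel + newVel)/2` truncates when the sum is odd; if the acceleration values are even, the sum stays even and no truncation occurs.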
The actual dynamics of continuous distance and velocities is specified by ordinary differential equations over “hybrid clocks” as an invariant expression:
hybrid clock rVelocityEgo, rVelocityFront; // continuous "real" velocities
hybrid clock rDistance;                    // continuous "real" distance between cars
hybrid clock D;                            // "cost" to be minimized: integral over distance

double distanceRate(double velFront, double velEgo, double dist)
{
    if (dist > maxSensorDistance) return 0.0;
    else return velFront - velEgo;
}
The hybrid clocks here are special: they do not influence the behavior of the model but rather monitor the "cost" of a simulation run; consequently they are abstracted away in symbolic model checking queries and enabled in statistical queries. Likewise, the stochastic features become plain non-determinism under symbolic model checking queries. The RESET location provides a special hint for statistical learning (the syntax is going to change in the future): the looping edge resets the variables which should not influence the strategy. In this case all variables except the process locations are ignored as noise.
We validate the discrete approximation by inspecting both discrete and continuous trajectories. The plot shows that discrete variables have exactly the same values as continuous ones at integer time points:
simulate 1 [<=20] { rDistance, dist, rVelocityFront - rVelocityEgo, velocityFront - velocityEgo }
However, verification shows that the distance is not always safe:
A[] distance > 5 // FALSE - unsafe
Statistical model checking also shows that the probability of a crash is unacceptably high if Ego is driven randomly:
Pr[<=100] (<> distance <= 5) // [0.85, 0.86] (95% CI)
The resulting plot shows that the probability density is highest in the beginning and the cumulative probability is over 85% after 100 seconds:
Synthesis of Safe Strategy
Controller synthesis uses symbolic game-solving techniques where Front is an opponent/adversary, Ego is the controllable player, and the goal is to maintain the safe distance. As with other symbolic techniques, the continuous dynamics is abstracted away, and the Uppaal Tiga query stores the resulting strategy under the name safe:
strategy safe = control: A[] dist > 5 // success
The synthesized strategy is validated by simulating the model behavior with controller choices taken from the safe strategy. The plot shows that the distance trajectories stay above 5, i.e. the model no longer exhibits crashes:
simulate 10 [<=100] rDistance under safe
Once we have the safe strategy, we can also estimate the smallest safe distance for any relative velocity. The plot below is the result of many infimum queries (symbolic model checking), one for each possible relative velocity value v:
inf { velocityFront - velocityEgo == v } : dist under safe
The plot shows that the distance has a quadratic dependency on the velocity when the velocity is negative (i.e. Front is approaching Ego) and is a bit more complicated when Front is moving away.
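The quadratic shape for negative relative velocities admits a simple back-of-the-envelope explanation (a sketch under assumed numbers, not values taken from the model): if the gap closes at v units per second and Ego can reduce the closing speed by at most a per one-second decision, the gap shrinks by v + (v - a) + (v - 2a) + ... before the closing speed is cancelled, which is roughly v^2/(2a) plus a linear term from the discretization. A small Python sketch with a hypothetical maximum relative deceleration a = 2:

```python
def gap_needed(closing_speed, a=2):
    """Total gap lost while braking an initial closing speed down to zero,
    one decision per second; a is a hypothetical max relative deceleration."""
    lost, v = 0, closing_speed
    while v > 0:
        lost += v      # the gap shrinks by the current closing speed
        v -= a         # Ego improves the relative velocity by a
    return lost

# The loss grows quadratically with the closing speed:
for v in (4, 8, 12, 16, 20):
    print(v, gap_needed(v))
```

For v divisible by a this sum equals exactly v^2/(2a) + v/2, hence the quadratic trend in the infimum plot.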
Maximizing Progress by Minimizing Distance
The previous section showed some properties of the safe strategy. Apart from being safe, the strategy also exhibits rich (non-deterministic) behavior, meaning there are many ways to stay safe. An immediate remark is that even though the distance is safe, it can still be arbitrarily large, i.e. Ego may choose to stay home or even drive backwards without making any progress. To guarantee maximal progress, Ego should follow Front as closely as possible, i.e. progress is maximized when the distance between the cars is minimized. A statistical learning query is used to select particular controllable transitions from the non-deterministic set of safe transitions, with the objective of minimizing the expected cumulative distance D after 100 seconds while driving under the safe strategy:
strategy safeFast = minE (D) [<=100]: <> time >= 100 under safe
The learned result is another strategy, stored under the name safeFast, which can also be checked using statistical queries. For example, here are a number of random runs using the safeFast strategy:
simulate 10 [<=100] rDistance under safeFast
Notice that the distance trajectories no longer diverge. The efficiency of the strategies is contrasted by comparing the probability distributions of the distances, which shows that under the safeFast strategy the mean distance is reduced from 138 to 24:
Pr[rDistance <= maxSensorDistance + 1] (<> time >= 100) under safe
Pr[rDistance <= maxSensorDistance + 1] (<> time >= 100) under safeFast
In fact the new strategy is so effective at following the front car that the range of relative velocities is restricted to the narrow interval [-4,4] (as opposed to [-22,30] under safe), as shown in the histogram of distances over velocities:
The box plot also shows that the mean distances stay low and only rarely diverge beyond 70.
Uppaal Stratego model: cruise.xml