The news these days is filled with the tit-for-tat conflict between Israel and Iran. Both countries are volleying missiles at each other while the defending nation uses various means to intercept and destroy the incoming missiles. Israel in particular is known for its Iron Dome, a mobile air defense system. During a 2014 war the Iron Dome was said to be nearly 90% effective. The idea of such an air defense system is currently being contemplated by the US and Canada.
Obviously, any nation would love to simply zap all missiles coming their way. But how does one build an Iron Dome? Well, there is a lot that goes into building it (primary among which might be money) but let’s focus on one aspect: how does the interceptor catch up with the missile? To answer this question, we delve into the math of pursuit curves.
Pursuit curves describe the path taken by a pursuer/chaser that is always moving directly toward a moving target. Let the pursuer’s position be given by \( \vec{P}\) and the target’s by \( \vec{T}\). We make the idealized assumption that the pursuer always moves directly toward the target’s current position. Hence,
\[ \frac{d\vec{P}}{dt}=v\frac{\vec{T}-\vec{P}}{|\vec{T}-\vec{P}|} \] where \(v\) is the speed of the pursuer, which we assume to be constant. Visually, we see that the vector \(\vec{T}-\vec{P}\) points in the direction of \(\vec{T}\) from \(\vec{P}\).
Now, if the target moves along a simple known path, we can often solve for the trajectory of the pursuer using analytical methods for Ordinary Differential Equations. However, analytical solutions quickly become intractable. Even for a simple circular target path, with the target's speed equal to the pursuer’s, we get quite intricate pursuit curves. This is demonstrated in the GIF below.
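For the curious, here is a minimal numerical sketch of such a pursuit (not tied to the GIF's exact parameters): the target circles at radius R while the pursuer integrates the ODE above with a simple Euler step. The speeds, radius, and step size below are illustrative assumptions.

```python
import numpy as np

def pursuit_curve(v=1.0, omega=1.0, R=1.0, dt=1e-3, steps=20000):
    """Euler-integrate dP/dt = v (T - P)/|T - P| for a target on a circle.

    Illustrative parameters: the target moves on a circle of radius R at
    angular speed omega; the pursuer starts at the origin with speed v,
    so here the two speeds (v and omega*R) are equal.
    """
    P = np.zeros(2)
    path = [P.copy()]
    for k in range(steps):
        t = k * dt
        T = R * np.array([np.cos(omega * t), np.sin(omega * t)])  # target position
        d = T - P
        dist = np.linalg.norm(d)
        if dist < 1e-9:            # pursuer has effectively reached the target
            break
        P = P + v * d / dist * dt  # step along the unit vector toward the target
        path.append(P.copy())
    return np.array(path)
```

Plotting `path` for equal speeds produces the kind of intricate spiraling curve described above.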
Due to environmental interactions (such as wind) or internal imperfections, the target might move somewhat randomly. Its equations of motion would then be given by the Stochastic Differential Equation:
\[ d\vec{T}=\vec{\mu}(t)dt+\vec{\sigma}(t)d\vec{W}(t) \] where \(\vec{\mu}\) is the deterministic (intended) velocity and the term \(\vec{\sigma}(t)d\vec{W}(t)\) captures the noisy aspect. \(\vec{\mu}\) can describe circular motion like before or some other chosen shape. Going forward, we will set \(\vec{\mu}\) to zero to emphasize the challenge posed by the random target, and we will take \(\vec{\sigma}\) to be a single constant. We will see that, even then, we face interesting challenges when we add in some realism.
We will now see a noisy chase. For better visibility we will make the pursuer’s effective speed slower than the effective speed of the target. By effective speed we mean the root mean squared displacement over time step \(\Delta t\) (\(\frac{\sqrt{\langle \Delta x^2\rangle}}{\Delta t }\)). (On a side note, the GIF reminds me of a Key and Peele skit).
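A noisy chase like this can be sketched with an Euler–Maruyama step for the target and the same greedy pursuit step for the pursuer. The parameter values below (pursuer speed, \(\sigma\), starting separation) are illustrative assumptions, chosen so the pursuer's speed is well below the target's effective speed of \(\sigma\sqrt{2/\Delta t}\) (the 2-D RMS displacement per unit time).

```python
import numpy as np

def noisy_chase(v=0.5, sigma=1.0, dt=1e-2, steps=2000, seed=0):
    """Greedy pursuit of a pure-Brownian target (Euler–Maruyama step).

    Illustrative parameters: with sigma=1 and dt=0.01 the target's
    effective speed sigma*sqrt(2/dt) is about 14, so v=0.5 keeps the
    pursuer much slower. The starting separation of 3 is arbitrary.
    Returns the pursuer-target distance at every step.
    """
    rng = np.random.default_rng(seed)
    P, T = np.zeros(2), np.array([3.0, 0.0])
    dists = []
    for _ in range(steps):
        T = T + sigma * np.sqrt(dt) * rng.standard_normal(2)  # Brownian increment
        d = T - P
        dist = np.linalg.norm(d)
        if dist > 1e-9:
            P = P + v * d / dist * dt  # pursuer re-aims at T every step (for now)
        dists.append(np.linalg.norm(T - P))
    return np.array(dists)
```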
We will now introduce a challenge that brings us closer to the real world. So far, we have been giving the pursuer as many observations of the target as there are time steps in the simulation. But this is unrealistic: the interceptor can only afford to watch the target a few times during the chase. Hence, with a fixed budget of observations, the target becomes a more distant goal.
There are many ways to schedule the observation times with a fixed budget of observations (M). One is to set an absolute final time T for the chase and split it evenly into M intervals. We will call this a uniform budget. A second and a third approach are to spread the observations evenly across the first and fourth quarters of the interval between \( t=0\) and \(t=T\). We will call these the front-loaded and back-loaded budgets, respectively. A fourth approach is to sample the observation times uniformly at random across the full time interval. We will call this a random budget.
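These four schedules are easy to write down. A sketch, under the assumed convention that observation times fall in the open-ended interval (0, T]:

```python
import numpy as np

def uniform_schedule(M, T):
    """M observation times spread evenly over (0, T]."""
    return np.linspace(T / M, T, M)

def front_loaded_schedule(M, T):
    """M observation times spread evenly over the first quarter, (0, T/4]."""
    return np.linspace(T / (4 * M), T / 4, M)

def back_loaded_schedule(M, T):
    """M observation times spread evenly over the fourth quarter, [3T/4, T]."""
    return np.linspace(3 * T / 4, T, M)

def random_schedule(M, T, seed=None):
    """M observation times sampled uniformly at random over (0, T)."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.uniform(0.0, T, M))
```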
With M=10, T=20 sec (I am skipping non-dimensionalization for clarity), and with pure Brownian noise like before we get the following demonstrations.
In these runs of the simulation, only the front-loaded approach seems to be doing very badly. But we only had one run for each approach, and with fairly simple target behavior. Plus, the other three methods do not seem to be doing too well either: with a modestly small catch radius, all of them would still be in pursuit. (The catch radius is the radius of the circle centered at the target; being within it signifies successful interception.)
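To get beyond one run per approach, we can Monte Carlo the chase: run each schedule many times and estimate the fraction of runs that ever come within the catch radius. A sketch, where all dynamics parameters (speeds, \(\sigma\), catch radius, starting separation) are illustrative assumptions:

```python
import numpy as np

def run_chase(obs_times, T=20.0, v=0.5, sigma=1.0, dt=0.01,
              catch_radius=0.5, seed=0):
    """One chase in which the pursuer re-aims only at the given times.

    Between observations it keeps heading toward the last observed target
    position. Returns True if it ever comes within catch_radius.
    All numeric defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    P, tgt = np.zeros(2), np.array([3.0, 0.0])
    obs = iter(np.sort(obs_times))
    next_obs = next(obs, np.inf)
    last_seen = tgt.copy()
    t = 0.0
    while t < T:
        tgt = tgt + sigma * np.sqrt(dt) * rng.standard_normal(2)
        if t >= next_obs:                  # spend one observation
            last_seen = tgt.copy()
            next_obs = next(obs, np.inf)
        d = last_seen - P
        if np.linalg.norm(d) > 1e-9:
            P = P + v * d / np.linalg.norm(d) * dt
        if np.linalg.norm(tgt - P) < catch_radius:
            return True
        t += dt
    return False

def catch_rate(schedule, runs=200):
    """Monte Carlo estimate of the interception probability for a schedule."""
    return np.mean([run_chase(schedule, seed=s) for s in range(runs)])
```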
One cannot help but wonder whether the observation times can be optimized to maximize the likelihood of being within the catch radius before the final time T, or at least getting close to this radius when the clock stops. One clever approach is a geometric sequence of observation times, whose waiting intervals grow longer and longer as the chase proceeds. This is clever for Brownian Motion (BM) because BM's uncertainty grows over time, so later observations would be increasingly misleading, and it would be best to hunker down and wait longer intervals later on. But this is predicated on a Brownian target. What if we have other motion patterns, like a Lévy walk or a sand walk from Dune? What is a good foundational method to optimize our observation times?
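A geometric schedule is easy to construct: make the waiting gaps grow by a fixed ratio and rescale so the last observation lands exactly at T. The ratio below and the end-at-T convention are assumptions for illustration:

```python
import numpy as np

def geometric_schedule(M, T, r=1.5):
    """M observation times whose gaps grow geometrically by ratio r.

    Gap k is proportional to r**k, scaled so the times end exactly at T.
    The ratio r=1.5 is an arbitrary illustrative choice.
    """
    gaps = r ** np.arange(M)
    return T * np.cumsum(gaps) / gaps.sum()
```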
The answer is AI or, more modestly, Reinforcement Learning (RL). RL is a type of machine learning in which an agent (AI) develops a policy for its actions based on positive or negative reinforcement. It looks at the state of things it is allowed to see, makes decisions, and faces consequences for those decisions. It does this many, many times, living and learning, until the training stops. For our RL agent we have the following setup:
- RL agent decision/action: waiting time until next observation
- RL agent observable: current pursuer and last known target positions, current time, and remaining observation budget
- RL agent reward: +1 if target is intercepted (within catch radius) otherwise negative of final distance between pursuer and target
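The setup above can be sketched as a small environment in the usual reset/step style. Everything numeric here (speeds, noise level, catch radius, and the simplification that the episode ends when the budget runs out) is an illustrative assumption, not the exact code behind the reported results.

```python
import numpy as np

class ObservationSchedulingEnv:
    """Sketch of the RL setup: action = waiting time until the next
    observation; state = pursuer position, last-seen target position,
    current time, remaining budget; reward = +1 on interception,
    otherwise minus the final pursuer-target distance."""

    def __init__(self, M=10, T=20.0, v=0.5, sigma=1.0, dt=0.01,
                 catch_radius=0.5, seed=0):
        self.M, self.T, self.v, self.sigma = M, T, v, sigma
        self.dt, self.catch_radius = dt, catch_radius
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.P = np.zeros(2)
        self.tgt = np.array([3.0, 0.0])   # arbitrary starting separation
        self.last_seen = self.tgt.copy()
        self.t, self.budget = 0.0, self.M
        return self._state()

    def _state(self):
        return np.concatenate([self.P, self.last_seen,
                               [self.t, self.budget]])

    def step(self, wait):
        """Advance the chase by `wait` seconds, then spend one observation."""
        horizon = min(self.t + wait, self.T)
        while self.t < horizon:
            self.tgt = self.tgt + self.sigma * np.sqrt(self.dt) * \
                self.rng.standard_normal(2)
            d = self.last_seen - self.P
            if np.linalg.norm(d) > 1e-9:
                self.P = self.P + self.v * d / np.linalg.norm(d) * self.dt
            if np.linalg.norm(self.tgt - self.P) < self.catch_radius:
                return self._state(), 1.0, True     # intercepted
            self.t += self.dt
        self.last_seen = self.tgt.copy()
        self.budget -= 1
        done = self.t >= self.T or self.budget == 0
        reward = -np.linalg.norm(self.tgt - self.P) if done else 0.0
        return self._state(), reward, done
```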
Using a standard RL algorithm (a policy gradient method) and training over a modest number of episodes we get an average total reward of -0.69 when we test the RL agent 100 times (simulate 100 chases with the agent picking the observation times). Under the same set of circumstances, the other methods get the following average total reward:
- Uniform budget: -1.65
- Front-loaded budget: -1.81
- Geometric budget: -1.07
Clearly, the RL agent wins.
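The exact algorithm and hyperparameters are not spelled out above, but a vanilla policy-gradient (REINFORCE) agent with a linear softmax policy over a few discrete waiting times gives the flavor. The chase dynamics are inlined to keep the sketch self-contained; the features, parameter values, and learning rate are all assumptions, so the rewards it earns will not match the numbers quoted above.

```python
import numpy as np

def simulate(policy_W, rng, M=10, T=20.0, v=0.5, sigma=1.0, dt=0.01,
             catch_radius=0.5, waits=(1.0, 2.0, 4.0)):
    """One episode: the policy picks a waiting time before each observation.
    Returns the episode return and the sampled (features, action, probs)."""
    P, tgt = np.zeros(2), np.array([3.0, 0.0])
    last_seen, t, budget = tgt.copy(), 0.0, M
    trajectory = []
    while t < T and budget > 0:
        feats = np.array([1.0, t / T, budget / M,
                          np.linalg.norm(last_seen - P)])
        logits = policy_W @ feats
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        a = rng.choice(len(waits), p=probs)
        trajectory.append((feats, a, probs))
        horizon = min(t + waits[a], T)
        while t < horizon:
            tgt = tgt + sigma * np.sqrt(dt) * rng.standard_normal(2)
            d = last_seen - P
            if np.linalg.norm(d) > 1e-9:
                P = P + v * d / np.linalg.norm(d) * dt
            if np.linalg.norm(tgt - P) < catch_radius:
                return 1.0, trajectory          # intercepted: reward +1
            t += dt
        last_seen, budget = tgt.copy(), budget - 1
    return -np.linalg.norm(tgt - P), trajectory  # reward: minus final distance

def reinforce(episodes=200, lr=0.05, seed=0, n_actions=3, n_feats=4):
    """Vanilla REINFORCE (no baseline) with a linear softmax policy."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_actions, n_feats))
    returns = []
    for _ in range(episodes):
        G, traj = simulate(W, rng)
        returns.append(G)
        for feats, a, probs in traj:
            grad = -np.outer(probs, feats)   # grad of log softmax, part 1
            grad[a] += feats                 # part 2: the chosen action's row
            W += lr * G * grad
    return W, returns
```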
This is just scratching the surface of the power of RL. Even with this scenario there is a lot more we can do to make the interception more likely and more efficient by giving the RL agent more authority. God willing, this will be a topic for another day.
To end on an eerie note, we will show a GIF of a pursuit with the RL agent picking the observation times, under circumstances like those of the previous GIF (so as to compare). Note that we are only plotting positions when the pursuer makes an observation/decision.