Local Volatility Model Calibration under Optimal Control
Mathwrist White Paper Series

Fengdong Du, ©Mathwrist LLC 2024

(January 28, 2024)

Abstract

High quality local volatility model calibration has always been a very challenging task in quantitative finance. In this document, we present a high level overview of the PDE constrained optimal control technique that Mathwrist recently developed as a web application for commodity underlyings. This methodology can be readily applied to equities with minor modifications.

1 Introduction

Assume an underlying asset $S(t)$ follows the dynamics

\frac{dS(t)}{S(t)}=\mu(t)dt+\sigma(S,t)dW(t)

(1)

, where the local volatility function $\sigma(S,t)$ is deterministic. If $S(t)$ is a martingale in some risk-neutral measure, Dupire [1994] showed that the local volatility function satisfies a parabolic linear partial differential equation (PDE)

\frac{1}{2}\sigma^{2}(K,T)K^{2}\frac{\partial^{2}C}{\partial K^{2}}=\frac{\partial C}{\partial T}

(2)

, given European call option price $C(K,T)$ with maturity $T$ and strike $K$ . In a more general setup where the interest rate $r(t)\neq 0$ and $S(t)$ pays a continuous dividend yield $q(t)$ , i.e. $S(t)$ follows the SDE,

\frac{dS(t)}{S(t)}=(r(t)-q(t))dt+\sigma(S,t)dW(t)

, Gatheral [2006] gives a general form of the local volatility PDE using undiscounted European call option price,

\frac{1}{2}\sigma^{2}(K,T)K^{2}\frac{\partial^{2}C}{\partial K^{2}}+(r(T)-q(T))\left(C-K\frac{\partial C}{\partial K}\right)=\frac{\partial C}{\partial T}

(3)

The local volatility model is extremely useful in many ways. First, if the local volatility function $\sigma(K,T)$ is known, we can set the initial value $C(K,T=0)=(S_{0}-K)^{+}$ and solve European call prices for all $T>0$ and all $K$ simultaneously. This is effectively a market making capability. In other words, one can quote the entire listed option market in terms of $\sigma(K,T)$ .

Secondly, it can be shown that the local variance $\sigma^{2}(K,T)$ is a risk-neutral expectation of instantaneous variance conditional on $S(T)=K$ under T-forward measure. Local volatility surfaces can be used as the foundation to construct more complicated stochastic local volatility models for exotic derivative products.

Lastly, and probably most importantly, for large option trading books and portfolio/risk management, the local volatility model is a consistent framework for option hedging and risk aggregation. In the traditional delta-hedge or delta-sigma-hedge, it is often tricky to decide what implied volatility should be used. Different practices, e.g. sticky delta, sticky moneyness, hedge ratio adjustments, vanna-volga etc., can be very confusing even to professionals when the context changes. They are often based on different assumptions and are very likely to introduce inconsistency if mixed together.

2 Existing Calibration Techniques

Perhaps the most common practice for calibrating local volatility models today is to first fit a smooth implied volatility surface, then compute and differentiate European call prices using that surface. This requires the implied volatility surface to be at least twice differentiable. Small kinks on the implied volatility surface can be greatly magnified in the local volatility surface and hence produce unusable results. There are a variety of smoothing or smooth fitting techniques. Overall, it can be very challenging to fit a smooth implied volatility surface that preserves enough accuracy to match market quotes and yet is able to handle data noise. Most fundamentally though, an implied volatility surface is not itself a mathematical model. It is not arbitrage free by construction. Additional care is necessary to prevent exploitable arbitrage opportunities.

PDEs (2) and (3) can be expressed in terms of implied volatility instead of call prices. e.g. Gatheral [2006],

\frac{\partial W}{\partial T}=V_{L}\left[1-\frac{y}{W}\frac{\partial W}{\partial y}+\frac{1}{4}\left(-\frac{1}{4}-\frac{1}{W}+\frac{y^{2}}{W^{2}}\right)\left(\frac{\partial W}{\partial y}\right)^{2}+\frac{1}{2}\frac{\partial^{2}W}{\partial y^{2}}\right]

(4)

, where $V_{L}$ is the local variance (local volatility squared), $W$ is the Black Scholes total implied variance (implied volatility squared times $T$ ) and $y=\log(K/F)$ for forward price $F$ .

Alternatively, one can attempt to fit a smooth implied volatility surface and numerically differentiate it using equation (4). But this is even trickier to do because equation (4) is highly nonlinear. Apart from nonlinearity, boundary conditions add extra difficulties if we want to numerically solve PDE (4).
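As a minimal sketch of the differentiation step described above, the snippet below rearranges equation (4) to return $V_{L}$ from $W$ and its partial derivatives, and then approximates those derivatives by central differences. It assumes $W$ is supplied as total implied variance (implied volatility squared times $T$ ), which is how Gatheral [2006] states the formula. The function names and bump sizes are illustrative choices and are not part of the Mathwrist implementation.

```python
def local_variance(y, w, dw_dT, dw_dy, d2w_dy2):
    """Rearrangement of equation (4): V_L = (dW/dT) / [bracket term]."""
    bracket = (1.0
               - (y / w) * dw_dy
               + 0.25 * (-0.25 - 1.0 / w + y * y / (w * w)) * dw_dy ** 2
               + 0.5 * d2w_dy2)
    return dw_dT / bracket


def local_variance_fd(total_var, y, T, hy=1e-3, hT=1e-4):
    """Evaluate equation (4) with central differences.

    total_var(y, T) is a user supplied (hypothetical) total implied variance
    surface; hy and hT are illustrative bump sizes.
    """
    w = total_var(y, T)
    w_up, w_dn = total_var(y + hy, T), total_var(y - hy, T)
    dw_dT = (total_var(y, T + hT) - total_var(y, T - hT)) / (2.0 * hT)
    dw_dy = (w_up - w_dn) / (2.0 * hy)
    d2w_dy2 = (w_up - 2.0 * w + w_dn) / hy ** 2
    return local_variance(y, w, dw_dT, dw_dy, d2w_dy2)


# Flat 20% implied volatility: total variance is 0.04 * T.
flat_surface = lambda y, T: 0.04 * T
print(local_variance_fd(flat_surface, y=0.1, T=0.5))  # approximately 0.04
```

For a flat surface the bracket equals 1 and the local variance equals the implied variance. With noisy inputs, the second derivative term is where small kinks in the fitted surface get amplified.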

It would be ideal to bypass the step of building an implied volatility surface and directly fit the local volatility model to market quotes. However, this can be an ill-posed “inverse problem” that generally leads to unstable calibration and delivers unexpected results. There seems to be very little published work in the direction of solving an optimal control problem under PDE constraints. Mathwrist completed its first version of proof-of-concept work in 2023 and then developed a web application for user experience. Interested users are welcome to compare the performance of our web application to other methodologies, particularly in the following aspects,

• Ability to handle large quote sets. For example, WTI crude oil options have hundreds of quotes at almost every expiry in the first year.
• Calibration turnaround time.
• Accuracy to match the market.
• Shape of the local volatility surface and implied volatility smile produced from the model.
• Vulnerability to market data noise.

3 Commodity Local Volatility

3.1 Dynamics of Future Contracts

Specifically for commodities, we assume a future contract $F(t,T_{i})$ maturing at fixed calendar time $T_{i}$ follows the SDE

\frac{dF(t,T_{i})}{F(t,T_{i})}=\alpha(t,F)\sigma_{T_{i}}\left[e^{-2\beta_{1}(T_{i}-t)}+\lambda^{2}e^{-2\beta_{2}(T_{i}-t)}\right]^{\frac{1}{2}}dW(t)

(5)

Parameters $\beta_{1}\geq 0,\beta_{2}\geq 0$ are decay factors to produce Samuelson effect or humped volatility shape. Parameter $\lambda\geq 0$ can be interpreted as a blending factor between the two exponential functions. In the special case when $\beta_{1}=0$ , $\sigma_{T_{i}}$ can be interpreted as the long term volatility and $\lambda$ behaves like a leverage factor on short term volatility. All together, $\beta_{1}$ , $\beta_{2}$ and $\lambda$ determine the overall backbone shape of the vol term structure. These parameters are shared across all future contracts.

Parameter $\sigma_{T_{i}}\geq 0$ on the other hand is specific to the future contract $F(t,T_{i})$ . It is sometimes called the “big-T” parameter. Finally, $\alpha(t,F)$ , sometimes referred to as the “small-t” function, is in our model setup a local smile scaling surface that is also common across all future contracts maturing at $T_{i},i=1,\cdots,n$ . This combined small-t and big-T approach gives rich enough volatility dynamics for commodity derivative modelling.

3.2 Calibration of $\beta_{1}$ , $\beta_{2}$ , $\lambda$ and $\sigma_{T_{i}}$

Parameters $\beta_{1}$ , $\beta_{2}$ and $\lambda$ can be calibrated from historical future prices. We take an alternative approach and fit them to the ATM implied volatility from all available future options. This step is carried out before we calibrate the smile scaling function $\alpha(t,F)$ .

Now, consider a “reduced” model with flat smile for all futures,

\frac{dF(t,T_{i})}{F(t,T_{i})}=\sigma_{T}\left[e^{-2\beta_{1}(T_{i}-t)}+\lambda^{2}e^{-2\beta_{2}(T_{i}-t)}\right]^{\frac{1}{2}}dW(t)

(6)

Note that here, we replace the contract specific $\sigma_{T_{i}},i=1,\cdots,n$ by a single parameter $\sigma_{T}$ . Introducing a common parameter $\sigma_{T}$ for all futures facilitates a stable calibration of $\beta_{1}$ , $\beta_{2}$ and $\lambda$ . These parameters determine the overall shape of the volatility backbone term structure.

Let $Y(t,T_{i})=\log\left(\frac{F(t,T_{i})}{F(0,T_{i})}\right)$ . For an option expiry $T<T_{i}$ , $Y(T,T_{i})$ has the normal distribution $Y(T,T_{i})\sim\mathcal{N}\left(-\frac{\text{Var}(T,T_{i})}{2},\text{Var}(T,T_{i})\right)$ , where

\text{Var}(T,T_{i})=\sigma_{T}^{2}\left[\frac{e^{-2\beta_{1}(T_{i}-T)}-e^{-2\beta_{1}T_{i}}}{2\beta_{1}}+\lambda^{2}\frac{e^{-2\beta_{2}(T_{i}-T)}-e^{-2\beta_{2}T_{i}}}{2\beta_{2}}\right]

(7)

Using the ATM option implied volatility quotes for all available option expiries, we construct a residual vector function $\mathbf{r}$ that measures the difference between market ATM volatility and model produced ATM volatility $\sqrt{\frac{\text{Var}(T,T_{i})}{T}}$ .
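The closed form (7) and the residual vector can be sketched as follows. This is an illustration only: the function names and the `(T, T_i, market_atm_vol)` quote layout are assumptions made here, and the `beta == 0` branch uses the limit of (7), which the formula itself does not cover.

```python
import math


def decay_integral(beta, T, T_i):
    """Integral of exp(-2*beta*(T_i - t)) for t in [0, T]."""
    if beta == 0.0:
        return T  # limit of the closed form as beta -> 0 (assumption, see lead-in)
    # exp(-2b(T_i-T)) - exp(-2b*T_i) = -exp(-2b(T_i-T)) * expm1(-2b*T)
    return -math.exp(-2.0 * beta * (T_i - T)) * math.expm1(-2.0 * beta * T) / (2.0 * beta)


def backbone_variance(sigma_T, beta1, beta2, lam, T, T_i):
    """Var(T, T_i) from equation (7)."""
    return sigma_T ** 2 * (decay_integral(beta1, T, T_i)
                           + lam ** 2 * decay_integral(beta2, T, T_i))


def model_atm_vol(sigma_T, beta1, beta2, lam, T, T_i):
    """Model ATM volatility sqrt(Var(T, T_i) / T)."""
    return math.sqrt(backbone_variance(sigma_T, beta1, beta2, lam, T, T_i) / T)


def atm_residuals(params, quotes):
    """Residual vector r(theta) with theta = (beta1, beta2, lam, sigma_T).

    quotes: list of (T, T_i, market_atm_vol) tuples (hypothetical layout).
    """
    beta1, beta2, lam, sigma_T = params
    return [model_atm_vol(sigma_T, beta1, beta2, lam, T, T_i) - market_vol
            for T, T_i, market_vol in quotes]
```

Writing the difference of exponentials through `math.expm1` avoids cancellation when a decay factor is close to zero, which matters because the optimizer may probe that region.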

Let the parameter vector be $\theta^{T}=(\beta_{1},\beta_{2},\lambda,\sigma_{T})$ . We solve a constrained and regularized nonlinear least squares problem,

	$\displaystyle\arg_{\theta}\min\|\mathbf{r}(\theta)\|_{2}^{2}+\xi\theta^{T}\mathbf{R}\theta\;\;\text{s.t. }$
	$\displaystyle-a\leq\beta_{1}-\beta_{2}\leq a$
	$\displaystyle l\leq\theta\leq u$

, where $\xi$ is a penalty factor determined by a generalized cross validation method. Matrix $\mathbf{R}$ is chosen such that $\theta^{T}\mathbf{R}\theta$ approximately measures the roughness of the backbone volatility term structure. Lastly, all constraint bounds are chosen experimentally.

Once $\beta_{1}$ , $\beta_{2}$ and $\lambda$ are resolved, we discard the calibrated $\sigma_{T}$ value and proceed to determine the big-T parameters $\sigma_{T_{i}}$ . A natural choice is to pick $\sigma_{T_{i}}$ such that the backbone term structure exactly matches the ATM implied volatility at the terminal option expiry of each future contract. In practice we found, and use, a preferable way of choosing $\sigma_{T_{i}}$ that helps to better fit $\alpha(t,F)$ in the later stage of solving the optimal control problem.
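For reference, the exact-match choice mentioned above can be written out under the flat smile assumption $\alpha\equiv 1$ . Writing $T_{i}^{*}$ for the terminal option expiry on contract $F(t,T_{i})$ and $\hat{\sigma}_{i}^{\text{ATM}}$ for the market ATM implied volatility at that expiry (both symbols are introduced here for illustration only), equating the backbone variance in the form of (7), with $\sigma_{T_{i}}$ in place of $\sigma_{T}$ , to $(\hat{\sigma}_{i}^{\text{ATM}})^{2}T_{i}^{*}$ gives

\sigma_{T_{i}}=\hat{\sigma}_{i}^{\text{ATM}}\sqrt{\frac{T_{i}^{*}}{I(T_{i}^{*},T_{i})}},\quad I(T,T_{i})=\frac{e^{-2\beta_{1}(T_{i}-T)}-e^{-2\beta_{1}T_{i}}}{2\beta_{1}}+\lambda^{2}\frac{e^{-2\beta_{2}(T_{i}-T)}-e^{-2\beta_{2}T_{i}}}{2\beta_{2}}

This is only the exact-match baseline and should not be read as the preferable choice referred to in the paragraph above.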

3.3 Implied Volatility Smile

The backbone parameters are calibrated to ATM implied volatility from market quotes. Many exchange traded future options are American options. A prerequisite “de-americanization” step is needed to compute the equivalent Black Scholes implied volatility from American option prices. There are various “de-americanization” techniques in practice, which could be a separate subject in their own right. Here, we only focus on local volatility model calibration, assuming implied volatilities are given.

Define the log moneyness variable $x=\log(F/K)$ at future price $F$ and strike $K$ . We represent a smooth implied volatility curve by an $n$ -degree Chebyshev approximation polynomial,

f(x)=\sum_{j=0}^{n}T_{j}(t(x))\beta_{j}

, where $t(x)$ maps the selected log moneyness range to the canonical domain $[-1,1]$ and $\beta_{j}$ is the coefficient of the $j$ -th Chebyshev basis function $T_{j}$ . Let $(\hat{\sigma}_{i}^{-},\hat{\sigma}_{i}^{+})$ be the market implied volatility range at quoted strike $K_{i}$ . We recover the set of basis coefficients $\beta=\{\beta_{j}\}$ by solving an elastically constrained quadratic programming problem,

	$\displaystyle\arg_{\beta}\min\beta^{T}\mathbf{R}\beta\;\;\text{s.t. }$		(8)
	$\displaystyle\hat{\sigma}_{i}^{-}\leq f(x_{i})\leq\hat{\sigma}_{i}^{+},\;\;\forall i$		(9)
	$\displaystyle 0\leq f(x_{\text{lo}})\leq u_{\text{lo}}$		(10)
	$\displaystyle 0\leq f(x_{\text{hi}})\leq u_{\text{hi}}$		(11)
	$\displaystyle c_{k}\leq f^{\prime\prime}(x_{k})\;\;\forall k$		(12)

First of all, our objective function (8) here is to minimize the curvature fluctuation on the implied volatility smile curve. The roughness matrix $\mathbf{R}$ therefore is constructed as

\beta^{T}\mathbf{R}\beta=\int_{x_{\text{lo}}}^{x_{\text{hi}}}f^{\prime\prime\prime}(x)^{2}dx
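One way to assemble such a roughness matrix is sketched below using NumPy's Chebyshev helpers. The sketch assumes $t(x)$ is the affine map from $[x_{\text{lo}},x_{\text{hi}}]$ to $[-1,1]$ , so that each derivative wrt $x$ picks up a factor $2/(x_{\text{hi}}-x_{\text{lo}})$ . The function name `roughness_matrix` is hypothetical and the construction is not necessarily the one used in the Mathwrist implementation.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def roughness_matrix(n, x_lo, x_hi):
    """R such that beta^T R beta equals the integral of f'''(x)^2 over [x_lo, x_hi]."""
    s = 2.0 / (x_hi - x_lo)  # dt/dx for the assumed affine map t(x)
    # Chebyshev coefficients of the third derivative of each basis function T_j(t).
    third = []
    for j in range(n + 1):
        e_j = np.zeros(n + 1)
        e_j[j] = 1.0
        third.append(C.chebder(e_j, 3))
    R = np.zeros((n + 1, n + 1))
    for j in range(n + 1):
        for k in range(j, n + 1):
            # Exact integral over [-1, 1] of T_j''' * T_k''' via an antiderivative.
            anti = C.chebint(C.chebmul(third[j], third[k]))
            integral = C.chebval(1.0, anti) - C.chebval(-1.0, anti)
            # f'''(x) carries s^3, dx = dt / s, hence the overall factor s^5.
            R[j, k] = R[k, j] = s ** 5 * integral
    return R


R = roughness_matrix(6, -0.5, 0.4)
beta = np.array([0.3, -0.1, 0.05, 0.02, -0.01, 0.004, 0.001])
print(beta @ R @ beta)  # roughness of the corresponding smile curve
```

The first three rows and columns of the matrix are zero because polynomials up to degree two have no third derivative. The objective therefore leaves level, slope and constant curvature of the smile unpenalized.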

Secondly, for intraday quotes, the market volatility range in linear constraint (9) is taken from bid/ask. For settlement quotes, the range can be taken as the settlement quote minus/plus some tolerance. Due to market data noise, we impose the set of constraints in (9) as elastic constraints: if there is no feasible solution satisfying all these range constraints, we then solve an elastic programming problem. For the details of elastic constraints, please refer to our NPL product optimization white paper series.

Thirdly, constraints (10) and (11) are chosen to satisfy the Lee [2003] theoretical bounds on implied volatility at extreme strikes. Lastly, we impose curvature constraints in (12) to prevent the second derivative of the smile curve from being too negative. The necessary and sufficient condition for the absence of butterfly arbitrage along the strike dimension is a positive density $\frac{\partial^{2}C}{\partial K^{2}}$ , implied from the undiscounted European call option price $C$ . When we differentiate the Black Scholes formula twice wrt strike $K$ and apply the chain rule to the implied volatility given by the smile function $f(x(K))$ , we can derive this butterfly arbitrage free condition as $f^{\prime\prime}(x)+c>0$ for some second order correction term $c$ . Because of this correction term, it is necessary to require $f^{\prime\prime}(x)$ to be not too negative, at least at some representative points $x_{k}$ . The negative curvature lower bound $c_{k}$ at those points can be reasonably estimated.
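The density condition can also be checked numerically after a smile has been fitted. The sketch below prices undiscounted Black calls from a smile function and flags strikes where the discrete second derivative of the call price is negative. It is a diagnostic written for illustration, with hypothetical function names, and it does not reproduce the analytic condition $f^{\prime\prime}(x)+c>0$ .

```python
import math


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def black_call(F, K, vol, T):
    """Undiscounted Black call price."""
    s = vol * math.sqrt(T)
    d1 = math.log(F / K) / s + 0.5 * s
    return F * norm_cdf(d1) - K * norm_cdf(d1 - s)


def butterfly_violations(smile, F, T, strikes, tol=1e-10):
    """Return strikes where the discrete d2C/dK2 is negative.

    smile(x) returns implied volatility at log moneyness x = log(F/K);
    strikes must be sorted in increasing order.
    """
    prices = [black_call(F, K, smile(math.log(F / K)), T) for K in strikes]
    bad = []
    for i in range(1, len(strikes) - 1):
        h1 = strikes[i] - strikes[i - 1]
        h2 = strikes[i + 1] - strikes[i]
        # Three point second derivative on a possibly non-uniform strike grid.
        second = 2.0 * (h2 * prices[i - 1] - (h1 + h2) * prices[i] + h1 * prices[i + 1]) \
            / (h1 * h2 * (h1 + h2))
        if second < -tol:
            bad.append(strikes[i])
    return bad


strikes = [float(k) for k in range(60, 141)]
print(butterfly_violations(lambda x: 0.3, 100.0, 1.0, strikes))                 # []
print(butterfly_violations(lambda x: 0.3 - 0.2 * abs(x), 100.0, 1.0, strikes))  # includes 100.0
```

The second smile has an inverted kink at the money, which makes the call price locally concave in strike. This is the kind of shape the curvature bound in (12) is meant to rule out.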

3.4 Normalizing the PDE

Both the original form of equation (2) and the general form (3) can be normalized to a “dimension-less” PDE. Define state variable $x=\log(F/K)$ and temporal variable $\tau=T_{i}-t$ . We have tried two transformations of the undiscounted European call prices $C(T,K)$ to a function $v(\tau,x)$ : a) $v(\tau,x)=\frac{C(T(\tau),K(x))}{K}$ and b) $v(\tau,x)=\frac{C(T(\tau),K(x))}{\sqrt{F_{i}K}}$ , used in Jackel [2013]. It appears that b) is slightly better. Under b), equation (2) becomes

\frac{\partial v}{\partial\tau}=\frac{1}{2}u^{2}(\tau,x)\left[\frac{\partial^{2}v}{\partial x^{2}}-\frac{1}{4}v\right]

(13)
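A brief sketch of why transformation b) leads to this form, holding the future price $F_{i}$ fixed: with $K=F_{i}e^{-x}$ we have $C=\sqrt{F_{i}K}\,v=F_{i}e^{-x/2}v$ and $\partial x/\partial K=-1/K$ , so applying the chain rule twice gives

K\frac{\partial C}{\partial K}=-F_{i}e^{-x/2}\left(\frac{\partial v}{\partial x}-\frac{1}{2}v\right),\quad K^{2}\frac{\partial^{2}C}{\partial K^{2}}=F_{i}e^{-x/2}\left(\frac{\partial^{2}v}{\partial x^{2}}-\frac{1}{4}v\right)

Substituting the second expression into (2), identifying the time derivative with $\partial v/\partial\tau$ and cancelling the common factor $F_{i}e^{-x/2}$ leaves the bracket in (13), with $u$ denoting the local volatility expressed in the $(\tau,x)$ coordinates.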

This transformation not only delivers better numerical performance but also brings the same initial value condition $v(0,x)=\left(e^{\frac{x}{2}}-e^{-\frac{x}{2}}\right)^{+}$ to all future contracts. From the commodity dynamics (5), we have the local volatility as a control function

u(\tau,x)=\alpha(t(\tau),K(x))\sigma_{T_{i}}\left[e^{-2\beta_{1}\tau}+\lambda^{2}e^{-2\beta_{2}\tau}\right]^{\frac{1}{2}}

After all backbone parameters $\beta_{1}$ , $\beta_{2}$ , $\lambda$ and $\sigma_{T_{i}}$ are determined, we now want to recover the smile scaling function $s(\tau,x)=\alpha(t(\tau),K(x))$ .
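To make the normalized PDE concrete, here is a minimal implicit Euler sketch that marches (13) forward from the initial condition for a given control function $u(\tau,x)$ . The grid sizes, the Dirichlet treatment of the boundaries (values pinned at the initial intrinsic values) and all function names are assumptions made for this illustration. A production solver would use a tridiagonal solve rather than the dense one used here.

```python
import math
import numpy as np


def solve_normalized_pde(u, tau_end, x_max=3.0, nx=121, nt=60):
    """Implicit Euler for dv/dtau = 0.5 * u^2 * (v_xx - v/4) on [-x_max, x_max].

    u(tau, x) must return an array of local volatilities on the grid x.
    """
    x = np.linspace(-x_max, x_max, nx)
    dx = x[1] - x[0]
    dt = tau_end / nt
    # Discrete (d^2/dx^2 - 1/4) on interior nodes. Boundary rows are left at
    # zero, so the boundary values of v stay at their initial values.
    L = np.zeros((nx, nx))
    idx = np.arange(1, nx - 1)
    L[idx, idx - 1] = 1.0 / dx ** 2
    L[idx, idx] = -2.0 / dx ** 2 - 0.25
    L[idx, idx + 1] = 1.0 / dx ** 2
    v = np.maximum(np.exp(x / 2) - np.exp(-x / 2), 0.0)  # initial condition
    for step in range(nt):
        half_u2 = 0.5 * u((step + 1) * dt, x) ** 2
        M = np.eye(nx) - dt * half_u2[:, None] * L
        v = np.linalg.solve(M, v)
    return x, v


def exact_flat_vol(x, vol, tau):
    """Closed form of v for a constant control u = vol (normalized Black price)."""
    s = vol * math.sqrt(tau)
    d1 = x / s + 0.5 * s
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return math.exp(x / 2) * cdf(d1) - math.exp(-x / 2) * cdf(d1 - s)


x, v = solve_normalized_pde(lambda tau, x: np.full_like(x, 0.3), tau_end=1.0)
atm = int(np.argmin(np.abs(x)))
print(v[atm], exact_flat_vol(x[atm], 0.3, 1.0))  # the two values should be close
```

With a constant control the solution should reproduce the Black formula up to discretization error, which gives a convenient sanity check before a state dependent $u(\tau,x)$ is plugged in.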

3.5 Optimal Control

Assume $s(\tau,x)$ is parameterized by a set of control parameters $\theta$ . From undiscounted European call option prices $\{c_{i,j}\}$ at time to expiry $\tau_{i}$ and state $x_{j}$ , transformed in the same way as $v$ , we want to find $\theta$ such that the transformed option values $v(\tau_{i},x_{j};\theta)$ are within the bid-ask range. This imposes bound constraints $c_{i,j}^{\mbox{bid}}\leq v(\tau_{i},x_{j};\theta)\leq c_{i,j}^{\mbox{ask}}$ . If we are working with settlement prices, the bound constraints are $c_{i,j}-\epsilon\leq v(\tau_{i},x_{j};\theta)\leq c_{i,j}+\epsilon$ for a tolerance setting $\epsilon$ .

In the space of control variables $\{\theta\}$ that produce $v(\tau,x)$ according to the governing law in (13), which in turn satisfies the market price constraints, we would like the local smile scaling function $s(\tau,x)$ to be smooth. Choosing a smoothness objective function $\varphi(\theta)$ suitable to the particular choice of $s(\tau,x;\theta)$ , our optimal control problem to recover $\theta$ can be loosely formulated as,

	$\displaystyle\arg_{\theta}\min\varphi(\theta)\mbox{ s.t. }$
	$\displaystyle\frac{\partial v}{\partial\tau}=\mathcal{A}(v;\theta),\;\forall\tau$
	$\displaystyle c^{\text{bid}}_{i,j}\leq v(\tau=T_{i},x_{j})\leq c^{\text{ask}}_{i,j}$

, where $\mathcal{A}(\cdot)$ is the spatial partial differentiation operator in equation (13). For example, if we solve the linear parabolic PDE (13) by a finite difference method, at each time marching step from $\tau_{l}$ to $\tau_{l+1}$ , the PDE constraint amounts to a linear equation $\mathbf{v}_{l+1}=\mathbf{T}_{l}(\theta)\mathbf{v}_{l}$ for some finite difference matrix $\mathbf{T}_{l}(\theta)$ applied to the vector $\mathbf{v}_{l}$ of state function $v(\tau_{l},x)$ values at discretized states. Collecting all matrix vector multiplications, with $\tau_{l+1}$ the final time step, we obtain the final values of the state function $v(T_{i},x)$ as $\mathbf{v}_{l+1}=\mathbf{T}_{l}(\theta)\mathbf{T}_{l-1}(\theta)\cdots\mathbf{T}_{0}(\theta)\mathbf{v}_{0}$ . Let $\mathcal{I}(\mathbf{v})$ be an interpolation function on vector $\mathbf{v}$ that returns the transformed call option values at quoted strikes. The concrete optimal control problem is now formulated as

	$\displaystyle\arg_{\theta}\min\varphi(\theta)\mbox{ s.t. }$		(14)
	$\displaystyle\mathbf{c}^{\text{bid}}\leq\mathcal{I}\left(\mathbf{T}_{l}(\theta)\mathbf{T}_{l-1}(\theta)\cdots\mathbf{T}_{0}(\theta)\mathbf{v}_{0}\right)\leq\mathbf{c}^{\text{ask}}$		(15)
	$\displaystyle 0\leq s(\tau,x)\leq s_{\text{hi}}$		(16)
	$\displaystyle c\leq\frac{\partial^{2}s(\tau,x)}{\partial x^{2}}$		(17)

Constraint (16) requires the smile scaling function to be bounded in a non-negative interval. Constraint (17) prevents the curvature of $s(\tau,x)$ from being too negative.

Note that the PDE constraint (15) is a nonlinear implicit function wrt model parameters $\theta$ . Mathwrist’s sequential quadratic programming solver is used to solve this optimal control problem. For the details of this solver, please refer to our NPL nonlinear programming white paper.

3.6 Sensitivity Calculation

The most crucial part of solving this optimal control problem is to efficiently and accurately compute the sensitivity $\frac{\partial}{\partial\theta}v(\theta;\tau,x_{j})$ . The bump and reprice technique is unacceptable in this context, since it needs at least one additional PDE solve per control parameter. Some observations help the sensitivity calculation. First, differentiating both sides of equation (13) wrt model parameters $\theta$ and rearranging terms, we have

	$\displaystyle\frac{\partial}{\partial\tau}\left(\frac{\partial v}{\partial\theta}\right)-\frac{1}{2}u^{2}(\tau,x;\theta)\left[\frac{\partial^{2}}{\partial x^{2}}\left(\frac{\partial v}{\partial\theta}\right)-\frac{1}{4}\frac{\partial v}{\partial\theta}\right]=r(\theta)$		(18)
	$\displaystyle r(\theta)=u(\theta;\tau,x)\frac{\partial}{\partial\theta}u(\theta;\tau,x)\left[\frac{\partial^{2}v}{\partial x^{2}}-\frac{1}{4}v\right]$		(19)

So the sensitivities $\frac{\partial v}{\partial\theta}$ satisfy PDEs with exactly the same coefficients as (13), except that they are inhomogeneous with the source function $r(\theta)$ . A carefully implemented PDE solver should be able to solve both $v(\tau,x;\theta)$ and $\frac{\partial}{\partial\theta}v(\theta;\tau,x)$ on the fly in the same process.
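The idea of solving (13) and (18) side by side can be sketched as below. For illustration only, the control is taken to be a single flat parameter, $u(\tau,x;\theta)=\theta$ , so that $\partial u/\partial\theta=1$ . The implicit Euler discretization, the grid sizes and the function name are assumptions of this sketch and not the actual parameterization or solver.

```python
import math
import numpy as np


def value_and_sensitivity(theta, tau_end=1.0, x_max=3.0, nx=121, nt=60):
    """March v and w = dv/dtheta together for the flat control u = theta."""
    x = np.linspace(-x_max, x_max, nx)
    dx = x[1] - x[0]
    dt = tau_end / nt
    # Discrete (d^2/dx^2 - 1/4); zero boundary rows pin v and keep w = 0 there.
    L = np.zeros((nx, nx))
    idx = np.arange(1, nx - 1)
    L[idx, idx - 1] = 1.0 / dx ** 2
    L[idx, idx] = -2.0 / dx ** 2 - 0.25
    L[idx, idx + 1] = 1.0 / dx ** 2
    u, du_dtheta = theta, 1.0
    M = np.eye(nx) - dt * 0.5 * u ** 2 * L  # same matrix for both equations
    v = np.maximum(np.exp(x / 2) - np.exp(-x / 2), 0.0)
    w = np.zeros(nx)  # the initial condition does not depend on theta
    for _ in range(nt):
        v = np.linalg.solve(M, v)
        source = u * du_dtheta * (L @ v)  # discretized r(theta) from (19)
        w = np.linalg.solve(M, w + dt * source)
    return x, v, w


theta, h = 0.3, 1e-4
x, v, w = value_and_sensitivity(theta)
_, v_up, _ = value_and_sensitivity(theta + h)
_, v_dn, _ = value_and_sensitivity(theta - h)
print(np.max(np.abs(w - (v_up - v_dn) / (2 * h))))  # small: agrees with bump and reprice
```

Because the sensitivity equation reuses the system matrix of the value equation, each extra parameter costs one additional linear solve per time step instead of a full repricing.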

Secondly, applying differentiation operation to the chain of finite difference transformations, we have

	$\displaystyle\frac{\partial}{\partial\theta}\left[\mathbf{T}_{l}(\theta)\mathbf{T}_{l-1}(\theta)\cdots\mathbf{T}_{0}(\theta)\right]=\frac{\partial\mathbf{T}_{l}}{\partial\theta}\left[\mathbf{T}_{l-1}(\theta)\cdots\mathbf{T}_{0}(\theta)\right]+$
	$\displaystyle\mathbf{T}_{l}(\theta)\frac{\partial\mathbf{T}_{l-1}}{\partial\theta}\left[\mathbf{T}_{l-2}(\theta)\cdots\mathbf{T}_{0}(\theta)\right]+\cdots+$
	$\displaystyle\mathbf{T}_{l}(\theta)\mathbf{T}_{l-1}(\theta)\cdots\mathbf{T}_{1}(\theta)\frac{\partial\mathbf{T}_{0}(\theta)}{\partial\theta}$

If we choose $s(\tau,x;\theta)$ such that $\theta$ has sensitivity locality in time, for example a subset of parameters $\theta_{1}\subset\theta$ that only affects the interval $[\tau_{0},\tau_{1}]$ , then all of the above collapses to

\displaystyle\mathbf{T}_{l}(\theta)\mathbf{T}_{l-1}(\theta)\cdots\mathbf{T}_{1}(\theta)\frac{\partial\mathbf{T}_{0}(\theta_{1})}{\partial\theta_{1}}

(20)

This amounts to solving the same PDE from a different initial value $\frac{\partial\mathbf{T}_{0}(\theta_{1})}{\partial\theta_{1}}\mathbf{v}_{0}$ , subject to appropriate boundary conditions.
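The collapse to a single term can be illustrated with a toy chain of implicit Euler step matrices in which step $k$ depends only on its own scalar parameter $\theta_{k}$ . Everything below (one flat volatility per time step, the dense inverses, the function names) is an assumption made for illustration. It relies on the identity $d(\mathbf{M}^{-1})=-\mathbf{M}^{-1}(d\mathbf{M})\mathbf{M}^{-1}$ for the derivative of a step matrix.

```python
import numpy as np


def spatial_operator(nx, dx):
    """Discrete (d^2/dx^2 - 1/4) on interior nodes, zero rows at the boundaries."""
    L = np.zeros((nx, nx))
    idx = np.arange(1, nx - 1)
    L[idx, idx - 1] = 1.0 / dx ** 2
    L[idx, idx] = -2.0 / dx ** 2 - 0.25
    L[idx, idx + 1] = 1.0 / dx ** 2
    return L


def step_matrix(theta_k, L, dt):
    """T_k = (I - 0.5 * dt * theta_k^2 * L)^{-1}, one implicit Euler step."""
    return np.linalg.inv(np.eye(L.shape[0]) - 0.5 * dt * theta_k ** 2 * L)


def step_matrix_derivative(theta_k, L, dt):
    """dT_k/dtheta_k = T_k (dt * theta_k * L) T_k."""
    T = step_matrix(theta_k, L, dt)
    return T @ (dt * theta_k * L) @ T


def final_values(thetas, L, dt, v0):
    """T_l ... T_1 T_0 v0."""
    v = v0
    for theta_k in thetas:
        v = step_matrix(theta_k, L, dt) @ v
    return v


def final_sensitivity(thetas, k, L, dt, v0):
    """Derivative of the final values wrt theta_k: only step k is differentiated."""
    v = v0
    for j, theta_j in enumerate(thetas):
        step = step_matrix_derivative if j == k else step_matrix
        v = step(theta_j, L, dt) @ v
    return v


x = np.linspace(-2.0, 2.0, 41)
L = spatial_operator(41, x[1] - x[0])
v0 = np.maximum(np.exp(x / 2) - np.exp(-x / 2), 0.0)
thetas = [0.30, 0.25, 0.35, 0.28]
dv_dtheta0 = final_sensitivity(thetas, 0, L, 0.25, v0)  # expression (20) applied to v0
```

For `k = 0` the loop applies the differentiated first step to the initial vector and then propagates the result with the ordinary step matrices, which is the "same PDE, different initial value" reading of (20).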

3.7 Accuracy and Speed

In the demo application, we calibrated the local volatility model to over 1400 WTI crude oil options across 10 different option expiry dates in 1.03 sec, with details given in table 1. Row RMSE gives the root mean square error relative to market vol. Note that there are two numbers in this row. The RMSE for the first 7 option expiries is 0.0068. As the 8th to 10th option expiries add more data noise, the RMSE goes up to 0.0117. The majority of the residual population is distributed within a very narrow range around 0 as shown in figure 1.

Machine	Lenovo Thinkpad X1 Yoga
Operating System	Windows 10 Professional, 64-bit
Processor	Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz 2.81 GHz
Installed RAM	8.00 GB (7.83 GB usable)
Turnaround Time	1.03 sec.
RMSE	(0.0068, 0.0117)

Table 1: Test Report

References

Derman and Kani [1994] E. Derman and I. Kani: Riding on a smile, Risk 7, 32–39, 1994
Dupire [1994] B. Dupire: Pricing with a smile, Risk Mag., 7(1): 18-20, 1994
Gatheral [2006] J. Gatheral: The Volatility Surface: A Practitioner’s Guide, Wiley Finance, John Wiley & Sons, 2006
Jackel [2013] P. Jackel: Let’s Be Rational, November 2013; Wilmott, pp 40-53, January 2015
Lee [2003] R. Lee: The moment formula for implied volatility at extreme strikes, Mathematical Finance. Forthcoming.
