Prof. Sheldon M. Ross IEOR221, Intro to Financial Engineering Spring, 2001

Daye, Zhongyin John Leow, Kahshin Ding, Sheng-Wei

Empirical Evaluation of Volatility Estimation: Neural Networks, Time Series, & Implied Volatility

Abstract: This paper shall attempt to forecast option prices using volatilities obtained from techniques of neural networks, time series analysis and calculations of implied volatility.


Table of Contents

Abstract
Introduction
Implied Volatility
    Straddle Formula
    Improved Quadratic Formula
Realized Volatility
    Introduction to Realized Volatility
Time Series
    Introduction to Time Series
    ARIMA Methodology
    I: Analyzing the Dow Jones Industrial Average (DJIA)
    II. Analyzing the NASDAQ
    III: Analyzing the Standard & Poor's 500 (S&P)
    IV: Conclusion
Neural Network
    Introduction to Neural Networks
    Neural Network Methodology
    Empirical Analysis
        I. NASDAQ
        II. S&P 500
        III. Dow Jones Industrial Average
        IV. Conclusion
Forecasting of Option Prices
    I. Forecasting of DJIA option prices
    II. Forecasting of NASDAQ Option Price
    III. Forecasting of S&P 500
    IV. Conclusion
Bibliography


Abstract

The paper forecasts option prices using volatilities obtained from neural network, time series, and implied volatility analysis of the Dow Jones Industrial Average (DJIA), Nasdaq100, and the Standard and Poor's 500 (S&P500). The results are compared and contrasted to determine the optimal technique, and the pitfalls and assumptions of each model are highlighted and examined.

We start with an examination of the implied volatility. In addition to the actual implied volatility that we obtain through numerical iteration from the Black-Scholes formula, we introduce two closed forms, namely the Straddle and the Improved Quadratic. The closed forms offer a more convenient and elegant representation of the implied volatility but are only approximations to the actual implied volatility; the actual implied volatility is in turn limited by the accuracy of the Black-Scholes formula. Prediction of index options over the 3/5/01 to 3/30/01 time span indicates that the actual implied volatility is indeed better, while the deviation of the predictions obtained from the Improved Quadratic method is acceptable. However, the prediction obtained from the Straddle technique is not very accurate, especially in the case of the NASDAQ 100.

Next, daily realized volatilities over the 4/1/1991 to 3/1/01 time span are obtained from the open/high/low/close (OHLC) estimator pioneered by Garman and Klass. Using this data set we performed time series and neural network analysis. Noting that financial data are often based upon past performance, the group selected the time domain approach instead of the frequency domain approach. Due to its wide use, the group ultimately selected the autoregressive integrated moving average (ARIMA) methodology for the time series analysis. The group found that AR(4) models fit the S&P500 and the Dow Jones Industrial Average best, while AR(11) fits the NASDAQ100. Forecasts are obtained and found to be less accurate than the implied volatility forecasts. The group notes that the time series method is limited by the non-stationarity of the input data set: it is difficult to satisfy the stationarity assumption of the ARIMA model, and our sample is only approximately stationary.

Noting a non-linear regression data set, the group determines to use the Multi-Layer Perceptron (MLP) method. An auto-restructuring algorithm is used to obtain the best Multi-Layer Perceptron architecture. The group finds that MLPs using logsig transfer functions model the indices best, with the S&P500 working best with 2 hidden perceptrons, the DJIA with 4, and the Nasdaq100 with 3. Again, the forecasts obtained are less satisfactory than those from implied volatility. The group notes that neural networks require high computing power: the amount of data that we can analyze and the structure of the perceptron layers are strongly limited by the computing power of our machine and the Matlab software. We believe that with a better CPU and a neural net built upon a more efficient language, much better results can be obtained.

The group finds that the relative performance of the different methods lacks consistency between the indices, though implied volatility seems to perform better than realized volatility in most cases. Option price forecasts for the DJIA are ranked as follows: Straddle, Actual, Improved Quadratic, ARIMA, and Neural Networks; for the Nasdaq100: Improved Quadratic, ARIMA, Actual, Neural Networks, and Straddle; and for the S&P500: Actual, Improved Quadratic, ARIMA, Straddle, and Neural Networks.

The group notes that the particular period forecasted, 3/5/01 to 3/30/01, is marked by strong market swings due to the foreboding economic recession and the surprising underperformance of the TMT (telecoms, media, and technology) sector. Due to the fragility of the market in this period, it is strongly influenced by policy makers. An example is the disappointing half-percentage-point Federal rate cut on March 20th, 2001, which threw our predictions off. The results seem to indicate that information closer in time is more relevant to the future: implied volatility uses the most recent information and produces the best results, while time series and neural networks look further back in time, yielding a lesser performance.

The group ultimately concludes that forecasting methods are at best approximations and none can be used alone in a practical setting. They do not take into account the behavior of policy makers, which is often the dominant factor influencing market movements. Composites of forecasting methods should be used prudently by investors, in addition to information about policies and other exogenous factors.


Introduction

Volatility is the most basic statistical risk measure. It can be used to measure the market risk of a single instrument or an entire portfolio of instruments. While volatility can be expressed in different ways, the typical definition used in finance of the volatility of a random variable is its standard deviation. In day-to-day practice, volatilities are calculated for all sorts of random financial variables: stock prices, interest rates, the market value of a portfolio, etc. The measurement of swing and movement of a price series is said to be the volatility of the series. Intuitively, a series that fluctuates a lot, or has high volatility, imparts more risk. Hence, volatility becomes a major market parameter upon which key concepts in risk management and derivatives pricing are based. There are two general methodologies for estimating the volatility of a random financial variable such as an exchange rate or stock price:

1) Implied volatility: If options are traded which have the financial variable as an underlier, it is possible to infer volatility from the prices at which those options are trading. Implied volatility reflects the option market's perception of the variable's volatility. [Latane & Rendleman, 76]

2) Realized or historical volatility: It is possible to directly estimate a financial variable's volatility from recent historical data for the variable's value.

Estimation of volatility is difficult and sometimes obscure. Techniques of volatility estimation are usually categorized as implied or realized (historical). Implied volatility is based upon the assumption that the market is efficient, and thus the market’s guess about future volatility must be the best. However, numerous papers upon the subject have pointed to the fact that market frictions and flawed assumptions in pricing models diminish the capacity of implied volatility in forecasting. (Lamoureux & Lastrapes, 93)

We plan to achieve 3 goals:

1) To forecast volatility based on historical data sets of security and option prices
2) To predict the future trend of security and option prices
3) To compare and contrast different techniques of price forecasting


Implied Volatility

Other than the derivation of volatility from the original Black-Scholes formula, numerous techniques have been proposed for calculating implied volatility. The Black-Scholes method allows us to calculate the value of the option if we input the volatility estimate. But since we can observe market option prices, we can use the option price plus our observations of other variables (strike price, time to expiration, interest rate) to compute the implied volatility. It is the market's estimation of the volatility of the underlying asset price over the life of the option.

If the assumptions underlying the Black-Scholes model are correct, every option on a particular stock with the same strike price and time to maturity should have the same implied volatility. However, options that differ by strike price but are the same in other respects do not have the same implied volatility: the volatility smile noted by practitioners refers to the observation that volatility varies with the strike price in such a way that a graph of strike price versus implied volatility sometimes resembles a smile. Thus we must choose one option to use in the calculation. In general we could select a set of options and infer implied volatility from a weighted sum of the implied volatilities of the options. Another method is to select the nearest at-the-money option to infer implied volatility. In this project, our group will use the latter method for its simplicity and accuracy.

In recent years, efforts have been made to develop a closed form for computing implied volatility. Brenner and Subrahmanyam proposed the Straddle formula in 1988, while Corrado and Miller [1996] developed the Improved Quadratic formula in 1996.

Straddle Formula

Definitions:
S = underlying asset price
K = strike price
T = time to expiration
C = price of (K, T) European call option
P = price of (K, T) European put option
σ = implied volatility
r = prevailing interest rate

The Brenner and Subrahmanyam formula to compute implied volatility is precisely accurate when the stock price is exactly equal to the discounted strike price, that is $S = Ke^{-rT}$. The formula is as follows,


$$\sigma\sqrt{T} = \sqrt{2\pi}\,\frac{C}{S}$$

Let b denote the per-unit-time cost of carry and y the dividend yield; then b − r = −y. Setting $S = S(0)e^{(b-r)T}$, the Brenner-Subrahmanyam formula can be immediately applied to European-style commodity options.

The accuracy of the Brenner-Subrahmanyam formula depends on the assumption that the stock price is equal to the discounted strike price. To improve accuracy when a stock price is not exactly equal to a discounted strike price, Brenner and Subrahmanyam suggest the following straddle formula,

$$\sigma\sqrt{T} = \sqrt{2\pi}\,\frac{(C + P)/2}{S}$$

It is not necessary for a put option to actually exist in order to use the straddle formula. By substituting the put-call parity condition for European options (i.e., $P = C + Ke^{-rT} - S$), the Brenner-Subrahmanyam straddle formula becomes,

$$\sigma\sqrt{T} = \sqrt{2\pi}\,\frac{C - \frac{S - Ke^{-rT}}{2}}{S}$$

[From the below diagram, it can be seen that the plot of implied volatilities over various stock prices gives the impression of a straddle, hence the name straddle formula.]


[Figure: implied volatility plotted against the stock price, forming a straddle shape. Source: Nelken, 97]
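As a concrete illustration, the put-call-parity form of the straddle formula can be evaluated directly. The short Python sketch below is our own (Python is used here purely for illustration); the example inputs are the DJIA figures from the results table later in this chapter and should reproduce the straddle value of about 0.20 reported there.

import math

def straddle_implied_vol(C, S, K, r, T):
    # Brenner-Subrahmanyam straddle formula with put-call parity substituted:
    # sigma * sqrt(T) = sqrt(2*pi) * (C - (S - K*exp(-r*T)) / 2) / S
    m = C - (S - K * math.exp(-r * T)) / 2.0
    return math.sqrt(2.0 * math.pi) * m / (S * math.sqrt(T))

# DJIA example inputs (see the results table below): roughly 0.20
print(straddle_implied_vol(C=210, S=10466, K=10800, r=0.05, T=0.16))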

Improved Quadratic Formula

Corrado and Miller [1996] derived an accurate formula to compute implied volatility over an even broader band of stock prices. Their formula is based on a refined quadratic approximation,

$$\Phi(z) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}}\left(z - \frac{z^{3}}{6} + \frac{z^{5}}{40} + \dots\right)$$

Substituting expansions of the normal probability functions in the Black-Scholes call price formula, but ignoring cubic and higher-order terms, yields this call price approximation.

$$C = S\left(\frac{1}{2} + \frac{d}{\sqrt{2\pi}}\right) - X\left(\frac{1}{2} + \frac{d - \sigma\sqrt{T}}{\sqrt{2\pi}}\right)$$


This approximation can be rearranged as a quadratic equation in the quantity $\sigma\sqrt{T}$.

$$\sigma^{2}T\,(S + Ke^{-rT}) - \sigma\sqrt{T}\,\sqrt{8\pi}\left(C - \frac{S - Ke^{-rT}}{2}\right) + 2(S - Ke^{-rT})\ln(Se^{rT}/K) = 0$$

Substituting the approximation $\ln(Se^{rT}/K) \approx 2(S - Ke^{-rT})/(S + Ke^{-rT})$ and solving for the largest root of the above quadratic yields

$$\sigma\sqrt{T} = \sqrt{2\pi}\,\frac{C - \frac{S - Ke^{-rT}}{2}}{S + Ke^{-rT}} + \sqrt{\,2\pi\left(\frac{C - \frac{S - Ke^{-rT}}{2}}{S + Ke^{-rT}}\right)^{2} - 4\left(\frac{S - Ke^{-rT}}{S + Ke^{-rT}}\right)^{2}}$$

Only the largest root is the correct solution, since only it reduces to the Brenner-Subrahmanyam formula when $S = Ke^{-rT}$. This simple quadratic solution performs well, but its range of accuracy can be improved substantially. Substituting a parameter α for the number 4 in the formula yields a family of formulas of varying efficiency. Corrado and Miller show that the value α = 2 gives the best result in terms of simplicity and accuracy. Therefore, they refer to the following formula as the improved quadratic formula.

$$\sigma\sqrt{T} = \frac{\sqrt{2\pi}}{S + Ke^{-rT}}\left[\left(C - \frac{S - Ke^{-rT}}{2}\right) + \sqrt{\left(C - \frac{S - Ke^{-rT}}{2}\right)^{2} - \frac{(S - Ke^{-rT})^{2}}{\pi}}\,\right]$$

Despite its simplicity, this improved quadratic formula is remarkably accurate over a wide range of stock prices. Using data on European-style call options on the Dow Jones Industrial Average, NASDAQ 100 and S&P 500 on 1st March 2001, we obtain the following results:


Index        C      S         r      K       T (years)   Straddle   Quadratic   Actual implied volatility (Newton-Raphson)
DJIA         210    10466     0.05   10800   0.16        0.20       0.19        0.19
NASDAQ 100   23.2   2412.90   0.05   2500    0.16        0.58       0.35        0.36
S&P 500      19     1234.2    0.05   1300    0.16        0.23       0.20        0.20

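As a concrete check of the table, the following Python sketch (our own illustration; the finite-difference vega used in the Newton-Raphson routine is an implementation convenience, not necessarily the method used by the group) evaluates the improved quadratic formula and the Newton-Raphson inversion of Black-Scholes on the DJIA row.

import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, T, sigma):
    # Black-Scholes price of a European call option.
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def improved_quadratic_vol(C, S, K, r, T):
    # Corrado-Miller improved quadratic formula (alpha = 2).
    X = K * math.exp(-r * T)
    m = C - (S - X) / 2.0
    disc = max(m ** 2 - (S - X) ** 2 / math.pi, 0.0)
    return math.sqrt(2.0 * math.pi) / (S + X) * (m + math.sqrt(disc)) / math.sqrt(T)

def newton_raphson_vol(C, S, K, r, T, sigma0=0.2, tol=1e-8):
    # Invert the Black-Scholes formula numerically for the implied volatility.
    sigma = sigma0
    for _ in range(100):
        price = bs_call(S, K, r, T, sigma)
        vega = (bs_call(S, K, r, T, sigma + 1e-5) - price) / 1e-5
        step = (price - C) / vega
        sigma -= step
        if abs(step) < tol:
            break
    return sigma

# DJIA inputs from the table above; both values come out near 0.19
print(improved_quadratic_vol(210, 10466, 10800, 0.05, 0.16))
print(newton_raphson_vol(210, 10466, 10800, 0.05, 0.16))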

Realized Volatility

Introduction to Realized Volatility

The easiest method to estimate volatility is simply using the classical definition of standard deviation:

$$\sigma = \sqrt{\frac{1}{n-1}\sum_{t=1}^{n}(r_{t} - \bar{r})^{2}}$$

σ = realized volatility
n = sample period
r_t = closing price of asset at time t
r̄ = mean of all closing prices in the period from t = 1 to t = n

The volatility formula discussed above is called the close-close (CC) estimator because it only uses the market closing price to estimate the volatility. A better estimator (HL) uses daily highs and lows to characterize the distribution. These extreme values give a more detailed view of the movements throughout the period, so such estimators are more efficient. Since models like Black-Scholes are based in continuous time, it is natural to want a volatility measure more broadly based on the continuity of price. The pioneers in this area were Parkinson [1980] and Garman and Klass [1980]. The form of Parkinson's estimator is

$$\sigma = k\,\sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\log\frac{H_{t}}{L_{t}}\right)^{2}}\,,\qquad k = 0.601$$

H_t = highest traded price of the day at time t
L_t = lowest traded price of the day at time t


The Garman and Klass estimator, which also uses open and close data, has the form

$$\sigma = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left[\frac{1}{2}\left(\log\frac{H_{t}}{L_{t}}\right)^{2} - (2\log 2 - 1)\left(\log\frac{C_{t}}{O_{t}}\right)^{2}\right]}$$

O_t = opening price of the day at time t
C_t = closing price of the day at time t

Ideally, with lognormal prices and continuous trading where the observed highs and lows are truly the extremes of the distribution, the efficiency gains are dramatic. The high/low (HL) estimator is five times more efficient than the CC estimator, and the open/high/low/close (OHLC) estimator is seven times more efficient. The practical importance of this improved efficiency is that five to seven times fewer observations are necessary in order to obtain the same statistical precision as the CC estimator. The random variable being estimated (i.e., volatility) has a much tighter sampling distribution.

Therefore, the project shall use the OHLC estimator to compute historical volatilities of the 3 stock indices.
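A minimal Python sketch of the two extreme-value estimators defined above (our own illustration; the daily highs, lows, opens and closes are assumed to be supplied as lists of equal length):

import math

def parkinson_vol(highs, lows):
    # sigma = 0.601 * sqrt( (1/n) * sum_t (log(H_t / L_t))^2 )
    n = len(highs)
    s = sum(math.log(h / l) ** 2 for h, l in zip(highs, lows))
    return 0.601 * math.sqrt(s / n)

def garman_klass_vol(opens, highs, lows, closes):
    # sigma = sqrt( (1/n) * sum_t [ 0.5*(log(H_t/L_t))^2
    #                               - (2*log(2) - 1)*(log(C_t/O_t))^2 ] )
    n = len(opens)
    s = 0.0
    for o, h, l, c in zip(opens, highs, lows, closes):
        s += 0.5 * math.log(h / l) ** 2 \
             - (2.0 * math.log(2.0) - 1.0) * math.log(c / o) ** 2
    return math.sqrt(s / n)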


Time Series

Introduction to Time Series

In the field of time series analysis, two separate approaches exist, commonly identified as the time domain approach and the frequency domain approach. The time domain approach is generally motivated by the presumption that correlation between adjacent points in time is best explained in terms of dependencies of the current value on past values. Stock markets and indices are moved by investors' expectations of the future, and it is our group's belief that such expectations are no doubt influenced by the behavior of past stock prices. Hence, the group will utilize a time domain approach in the time series analysis of the 3 main stock indices.

One prominent time domain approach, advocated by Box and Jenkins in the 1970s, focuses on autoregressive integrated moving average (ARIMA) models, given by the following equation [Box-Jenkins, 1970]:

$$\phi(B)(1 - B)^{d}x_{t} = \theta(B)w_{t}$$

x_t = time series data
B = backshift operator
φ(B) = autoregressive operator
d = degree of differencing
θ(B) = moving average operator
w_t = independent and identically distributed normal random variables, with zero mean and constant variance; also known as Gaussian white noise


ARIMA Methodology

In fitting ARIMA models to time series data of the indices, the following few basic steps will be observed:

First, the data will be plotted and observed for any signs of it being stationary. Stationarity can be determined by inspecting whether the mean is constant over the time series and whether the autocovariance [1] of the data is independent of time. The sample autocorrelations (ACF) between different time lags will be computed and graphed to aid us in such judgment. Should the data be non-stationary, transformations will be performed on it to make it so.

After suitably transforming the data, the next step will be to identify preliminary values of the autoregressive order, p, the order of differencing, d, and the moving average order, q. However, when differencing consecutive time data, the group is aware of the hazard of destroying the fragile structure and information that are prevalent within stock markets, which move and react more on recent news than on long-past information. Therefore, differencing will only be used as a last resort.

In deciding the values of p and q, the sample ACF and partial autocorrelation (PACF) of the series will be analyzed. Properties of the ACF and PACF indicate that the ACF cuts off after lag q, if p = 0 and q > 0 … and that the PACF cuts off after lag p, if q = 0 and p > 0. If p > 0 and q > 0, both the ACF and PACF will tail off.

After deciding the most likely values for p and q, the next step is to measure the goodness of fit for each possible ARIMA models. In this endeavor, the group will seek inspiration from the Akaike approach (AIC) [Akaike, 69, 73, 74] and Schwarz approach (SIC/ BIC) [Schwarz, 78] to goodness of fit. The AIC and SIC are obtained as follows:

$$\mathrm{AIC} = \ln\hat{\sigma}^{2}_{p+q} + \frac{2(p+q)}{n}$$

n = number of observations
σ̂²_{p+q} = sample variance of the residuals, i.e. of w_t

$$\mathrm{SIC} = \ln\hat{\sigma}^{2}_{p+q} + \frac{(p+q)\log n}{n}$$

[1] Autocovariance between data at time t and time s is defined as $\gamma_{x}(s,t) = E[(x_{s} - \mu_{s})(x_{t} - \mu_{t})]$.

Models giving lower AICs and SICs will be preferred.

In the efforts to fit a suitable model, the group is aware of the problem of overfitting by allowing too many variables into the model. Overfitting leads to less-precise estimators and adding more parameters may fit the data better but may also lead to bad forecasts. Hence, data from the trading month of February has intentionally been left out during the model fitting process. Models will be selected using data up to end of January and subsequently be used to forecast volatilities for the month of February. A mean squared error value will be computed for each model,

$$\mathrm{MSE} = \frac{\sum\left(X_{t}^{\mathrm{predict}} - X_{t}^{\mathrm{actual}}\right)^{2}}{n}\,,\quad \text{where } n = \text{number of observations}$$

Finally, taking MSE, AIC and SIC into consideration, the most suitable model will be selected. Since the goal of the group is to forecast future volatilities, the MSE will be the most important factor in the selection of the model.

After the ARIMA model is chosen, maximum likelihood estimates (MLE) of the model’s different parameters, θ and φ , will be computed. Subsequently, March’s daily volatilities will be forecasted using those estimates.
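The selection criteria above can be computed directly from the residuals. The following NumPy sketch is our own illustration (the original analysis was carried out with a statistics package): an AR(p) model is fitted to the log-volatility series by conditional least squares, AIC and SIC are computed with the formulas given earlier, and one-step-ahead forecasts of the held-out month are scored by MSE.

import numpy as np

def fit_ar(x, p):
    # Conditional least-squares fit of x_t = sum_i Phi_i * x_{t-i} + w_t (no intercept).
    X = np.column_stack([x[p - i - 1:len(x) - i - 1] for i in range(p)])
    y = x[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ phi
    return phi, resid.var()          # residual variance plays the role of sigma^2_{p+q}

def aic_sic(sigma2, p, q, n):
    aic = np.log(sigma2) + 2.0 * (p + q) / n
    sic = np.log(sigma2) + (p + q) * np.log(n) / n
    return aic, sic

def holdout_mse(x_train, x_test, phi):
    # One-step-ahead forecasts over the held-out month (e.g. February).
    p = len(phi)
    history, errs = list(x_train[-p:]), []
    for actual in x_test:
        pred = sum(phi[i] * history[-i - 1] for i in range(p))
        errs.append((pred - actual) ** 2)
        history.append(actual)
    return float(np.mean(errs))

# e.g. phi, s2 = fit_ar(np.log(vol_series), 4); aic, sic = aic_sic(s2, 4, 0, len(vol_series))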


I: Analyzing the Dow Jones Industrial Average (DJIA)

Step 1: Plotting and observing data



The volatility of the DJIA from 3/4/97 to 3/30/01 is plotted. That is a total of 1028 observations.



While we see the volatility hovering around the range of 0.01, we see the variance varying from time to time, especially at t ≈ 180, t ≈ 400 and t ≈ 790.



A further computation of the ACF and PACF, which shows signs that they depend on t, brings us to the conclusion that the series is not stationary.


Step 2: Transforming the data and identifying suitable values for p and q



We apply a logarithmic transformation to the series. Taking logs augments the smaller values and reduces the exponentially larger ones, which tends to smooth the series and make it more stationary.



After taking logs, we see that the series does not suffer from such wide sudden fluctuations.



Computations of the ACF show us that it tails off gently. Therefore, q is very likely to be zero, i.e. the model is purely autoregressive.



Looking at computations of the sample PACF, we observe that it first drops below the 95% confidence bound at lag = 5, rises above it again at lag = 8, and drops again at lag = 10.

Therefore, taking into account sampling errors, probable values of p range over 5 ± 1, 8 ± 1, and 10 ± 1. Hence, we will analyze goodness of fit for orders 4 to 11.

Step 3: Performing AIC and SIC tests



First, we obtain σ̂²_{p+q} by doing analysis on the residuals. After substituting into the equations for the AIC and SIC, we get the following:

AR[j]   σ̂²_{p+q}    SIC         AIC
4       0.1425000   -1.920471   -1.940308
5       0.1416987   -1.919125   -1.943921
6       0.1414095   -1.914182   -1.943937
7       0.1412400   -1.908396   -1.943110
9       0.1400115   -1.903161   -1.947793
10      0.1393781   -1.900710   -1.950301
11      0.1394746   -1.893033   -1.947583


Step 4: Cross-validation



We do a forecast of February volatilities using maximum likelihood estimators (MLE), and compare it against actual market data to compute every model’s respective mean squared errors.

AR[j]   MSE
4       1.091929
5       1.096610
6       1.099331
7       1.101856
9       1.106161
10      1.107262
11      1.107380

Step 5: Selection of model



Looking at the above analysis, AR(4) will be the model of choice. It has the best goodness of fit (SIC) with the empirical data, yet does not fall into the trap of overfitting as it has the best forecasting ability.


Step 6: Parameter estimation of AR(4) model



An AR(4) model is of the form

$$X_{t} = \sum_{i=1}^{4}\Phi_{i}x_{t-i} + w_{t}$$

Using MLE diagnosis, we obtain the following estimated parameters:

Φ̂1 = 0.15785946
Φ̂2 = 0.18318730
Φ̂3 = 0.16602891
Φ̂4 = 0.09909937
σ̂²_w = 0.1408858
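For illustration, the March forecasts can be generated recursively from these estimates. The sketch below (Python, our own notation) assumes the coefficients are applied to the log-volatility series and, for brevity, omits the mean/intercept term and the confidence-interval computation of the actual fit.

import numpy as np

phi = np.array([0.15785946, 0.18318730, 0.16602891, 0.09909937])  # AR(4) MLE estimates

def forecast_ar(history, phi, steps):
    # history: the most recent observations of the (log) volatility series, oldest first.
    h = list(history)
    out = []
    for _ in range(steps):
        pred = sum(phi[i] * h[-i - 1] for i in range(len(phi)))
        out.append(pred)
        h.append(pred)        # feed forecasts back in for multi-step prediction
    return out

# e.g. a 21 trading-day (one month) forecast for March:
# march_log_vol = forecast_ar(log_vol_series[-4:], phi, 21)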

Step 7: Forecasting for the month of March



Subsequently, we obtain the following forecasts of March volatilities using the AR(4) model

Note: The solid line is the actual volatilities of March. The middle dotted line is the forecasted volatilities, with the lower and upper lines indicating the 95% confidence interval levels.


Step 8: Evaluation



In the above graph, we can see the actual volatilities fluctuating mainly between the 95% forecasted confidence intervals. The actual volatility for the 2nd day slightly exceeded the interval, and this is understandable as we expect roughly a 5% error rate.

• However, the volatility of the 15th day (3/21/01) greatly exceeded the boundary. Nevertheless, upon further research, the group has found an explanation for this anomalous behavior, which will be elaborated in the later chapters.

• The MSE between the actual and predicted volatility is 4.785996e-05.


II. Analyzing the NASDAQ

Step 1: Plotting and observing data



The volatility of the NASDAQ from 12/17/97 to 3/30/01 is plotted. That is a total of 828 observations.



It is easily observed that the variance of the index increases as time passes.



Computation of the ACF and PACF shows them exhibiting a waveform, bringing us to the conclusion that the series is not stationary.


Step 2: Transforming the data and identifying suitable values for p and q



Seeking inspiration from the previous analysis on DJIA, we once again take logs on the NASDAQ series.



After taking logs, we observe a huge reduction in the erratic behavior of the variance. It is roughly constant, regardless of time period.



Computations of the ACF show us that it tails off gently. Therefore, q is very likely to be zero, i.e. the model is purely autoregressive.




Looking at computations of the sample PACF, we observe that it first drops below the 95% confidence bound at lag = 5, rises up at lag = 7 and drops after lag = 10.



Hence, very similar to the previous analysis on DJIA, we will analyze goodness of fit from values of 4 to 11.

Step 3: Performing AIC and SIC tests

The results of σ̂²_{p+q}, AIC and SIC are as follows:

AR[j]   σ̂²_{p+q}    SIC         AIC
4       0.1809747   -1.675469   -1.699220
5       0.1800834   -1.671925   -1.701613
6       0.1797493   -1.665299   -1.700925
7       0.1773992   -1.669978   -1.711541
8       0.1764660   -1.666770   -1.714271
9       0.1766264   -1.657379   -1.710818
10      0.1755060   -1.655261   -1.714637
11      0.1757096   -1.645619   -1.710933

Step 4: Cross-validation



We do a forecast of February volatilities using maximum likelihood estimators (MLE), and compare it against actual market data to compute every model’s respective mean squared errors.

AR[j]   MSE
4       0.9066020
5       0.9014114
6       0.8969225
7       0.8878579
8       0.8809299
9       0.8786902
10      0.8677804
11      0.8673764

Step 5: Selection of model



Looking at the above analysis, there are no overwhelming reasons to choose any of the models, unlike the case of DJIA. Hence, in making a value judgment of the merits of each model, the group decides that MSE holds the most weight among all three indicators. This is because the spirit of this project is to achieve as accurate a forecast of volatility as possible so as to compute future option prices. Hence, AR(11) is chosen as it gives the lowest MSE and also a reasonable goodness of fit given by the AIC.


Step 6: Parameter estimation of AR(11) model



An AR(11) model is of the form

$$X_{t} = \sum_{i=1}^{11}\Phi_{i}x_{t-i} + w_{t}$$

Using MLE diagnosis, we obtain the following estimated parameters:

Φ̂1 = 0.269388482     Φ̂7 = 0.058105307
Φ̂2 = 0.149437746     Φ̂8 = 0.051477695
Φ̂3 = 0.127055792     Φ̂9 = -0.004746733
Φ̂4 = 0.113045183     Φ̂10 = 0.086161461
Φ̂5 = 0.027045406     Φ̂11 = 0.006103403
Φ̂6 = -0.006583607
σ̂²_w = 0.1732342

Step 7: Forecasting for the month of March



Subsequently, we obtain the following forecasts of March volatilities using the AR(11) model


Note: The solid line is the actual volatilities of March. The middle dotted line is the forecasted volatilities, with the lower and upper lines indicating the 95% confidence interval levels.

Step 8: Evaluation



In the above graph, we see the actual volatilities fluctuating very close to the predicted values. In fact, the MSE is just a very small 6.194178e-05.


III: Analyzing the Standard & Poor’s 500 (S&P)

Step 1: Plotting and observing data



The volatility of the S&P from 3/4/97 to 3/30/01 is plotted. That is a total of 1028 observations.



Once again, we observe that the index is not stationary.



Computation of the ACF shows us a declining series, while PACF exhibits a waveform, bringing us to the conclusion that the series is not stationary.


Step 2: Transforming the data and identifying suitable values for p and q



We take logs again.



After taking logs, we observe a huge reduction in the erratic behavior of the variance. It is roughly constant, regardless of time period.



Computations of the ACF show us that it tails off gently in a waveform. Therefore, q is very likely to be zero, i.e. the model is purely autoregressive.



Looking at computations of the sample PACF, we observe that it first drops below the 95% confidence bound twice, at lag = 5 and lag = 8.

Hence, we will analyze goodness of fit for orders 4 to 9.


Step 3: Performing AIC and SIC tests



The results of σ̂²_{p+q}, AIC and SIC are as follows:

AR[j]   σ̂²_{p+q}    SIC         AIC
4       0.1710947   -1.737596   -1.757433
5       0.1705077   -1.734047   -1.758843
6       0.1698215   -1.731095   -1.760850
7       0.1685592   -1.731570   -1.766284
8       0.1659360   -1.740270   -1.779943
9       0.1660463   -1.732619   -1.777251

Step 4: Cross-validation



We do a forecast of February volatilities using maximum likelihood estimators (MLE), and compare it against actual market data to compute every model’s respective mean squared errors.


AR[j]   MSE
4       1.105136
5       1.108028
6       1.109197
7       1.110219
8       1.110741
9       1.110898

Step 5: Selection of model



Looking at the above analysis, we see AR(4) as the model of choice, because it has the best forecasting ability (lowest MSE).


Step 6: Parameter estimation of AR(4) model



An AR(4) model is of the form

$$X_{t} = \sum_{i=1}^{4}\Phi_{i}x_{t-i} + w_{t}$$

Using MLE diagnosis, we obtain the following estimated parameters:

Φ̂1 = 0.18829921
Φ̂2 = 0.21487707
Φ̂3 = 0.12781588
Φ̂4 = 0.09417108
σ̂²_w = 0.1697238

Step 7: Forecasting for the month of March



Subsequently, we obtain the following forecasts of March volatilities using the AR(4) model

Step 8: Evaluation




This model reminds us of that for the DJIA. The majority of the fluctuations stay within the predicted confidence intervals, except for the 2nd and 15th day.



The strange behavior of the 15th day shall be explained in the next chapter.



Nevertheless, the MSE is still a respectable 4.801759e-05.

IV: Conclusion

On 20th Mar 01, the Fed announced a half-percentage-point cut in the short-term interest rate. With worries about the impending economic slowdown and profit growth, this announcement was a disappointment, as many had expected the Fed to make a more aggressive move. Selling started late that day, and continued into 21st Mar. The Dow Jones Industrial Average tumbled 233.76 to 9,487.00, its lowest level in two years. The S&P 500 did not fare much better and slid 20.49 to 1,122.13. Hence the large increase in volatility on the 15th trading day.

While the ARIMA technique of forecasting allows us to make reasonable predictions, it can never take into account unpredictable human actions. The above-mentioned Fed announcement that ran contrary to many market players' expectations is an example of this failure. However, abnormalities tend to be short-term affairs. In the case of the DJIA and S&P 500, we notice that volatility drops back to the 'normal' level on the 16th day. Therefore, the ARIMA model is still useful to us if we intend to look at the long-term behavior of markets and their volatility.


Neural Network

Introduction to Neural Networks

Artificial intelligence is inspired by how our brain works and attempts to emulate the human mind. It takes from the brain such interesting characteristics as the ability to process incomplete information, to learn and adapt to changes, and to find patterns between data vectors of widely different attributes, and interprets them in an algorithmic fashion for a computer to understand. The most popular method used in financial forecasting is the multi-layer perceptron (MLP).

The MLP is optimal for nonlinear regression information processing. [Duda & Hart, 73] MLP, as the name suggests, is constituted of various layers. Each layer is composed of an array of perceptrons, functional objects that emulate the human neuron cells.

Each perceptron is defined as follows: R is the number of elements in the input vector, $\{p_{1}, p_{2}, \dots, p_{R}\}$ are the inputs to the perceptron, $\{w_{1,1}, w_{1,2}, \dots, w_{1,R}\}$ the weights of the connectivity, b the bias, f the transfer function, and a the output. One can follow the information processing performed by the perceptron through the diagram from input to output. Each input is multiplied by its corresponding connectivity weight and summed together with the bias to yield n, the input to the transfer function, where $n = w_{1,1}p_{1} + w_{1,2}p_{2} + \dots + w_{1,R}p_{R} + b$, and the output is $a = f(n)$. In matrix notation, the output is equal to $f(Wp + b)$. [Rosenblatt, 62]
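In code, a single perceptron is just a weighted sum passed through the transfer function. A minimal NumPy sketch (illustrative only; the names are our own):

import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def perceptron(p, w, b, f=logsig):
    # n = w_{1,1}*p_1 + ... + w_{1,R}*p_R + b ;  a = f(n) = f(W p + b)
    return f(np.dot(w, p) + b)

# example with R = 3 inputs and arbitrary weights
a = perceptron(np.array([0.2, 0.5, 0.1]), np.array([0.4, -0.3, 0.8]), b=0.1)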


While the bias and weights are obtained from training algorithms, the transfer function is often selected by the MLP architecture. Which transfer function is to be selected varies from case to case and should be chosen prudently. For an MLP network that can be trained using backpropagation, the transfer function must be nonlinear and differentiable [Rumelhart, 86], since differentiation must be performed in backpropagation. The following lists a few of the common transfer functions used for backpropagation:

[Figure: common transfer functions used for backpropagation.]

MLP neural networks used for complex non-linear data analysis must contain three or more layers. An illustration can be drawn using the hardlim transfer function, an indicator function that gives 1 or 0 depending upon the input; the function is often used in classification and decision analysis. The following describes the capability of different numbers of layers.

Structure                 Types of Decision Regions
Single-layer              Half plane bounded by hyperplane
Two-layer                 Convex open or closed regions
Three-layer (and up)      Arbitrary (complexity limited by number of perceptrons)

A single layer can only solve linearly separable decision problems, while a two-layer network can at best carve out convex decision regions. Only MLPs with three or more layers can solve decision problems of arbitrary complexity for a hardlim nonlinear function. The same behavior is also exhibited with continuous nonlinear functions, except that the decision regions are often bounded by continuous curves instead of linear line segments.

Since the three-layered MLP is the simplest neural network that can be used for arbitrary classification, it is the one most often used in practice. This practice is derived from the Occam's Razor principle [2], which purports that the simpler network will be the superior generalizer for a given performance level. A simpler network guarantees fewer weights in the network and thus a greater confidence that over-training has not occurred.

The following figure illustrates how layers are connected in a 3 layer MLP

In the figure, oi’s are output layer, hj’s are hidden layer and xk’s are the input layer. As a hidden layer perceptron receives signal from its previous neurons, the perceptron will sum up

42

Prof. Sheldon M. Ross IEOR221, Intro to Financial Engineering Spring, 2001

Daye, Zhongyin John Leow, Kahshin Ding, Sheng-Wei

signals and add bias. The result will be used to make a transformation using a differentiable nonlinear function. The transformed signal will be passed on to neurons in subsequent layers and the process is repeated. After a final result is obtained, it is compared with a target vector. The error is then propagated backwards. Weights are adjusted towards a direction to see whether the error function has been reduced. If reduction occurred, weights are changed in this direction else it is changed in the opposite. This process is repeated till the network output vector emulates the target vector to within an acceptable accuracy. This training algorithm is the backpropagation method and is initially introduced by Remelhart. [Rumelhart, 86]. A formalized algorithm is presented as follows. 1. Initialize weights randomly. 2. Propagate input vectors through network, evaluating all neurons at a layer and proceeding to the next till the output layer neurons have been reached and an output is produced. 3. Let o be the final output vector and t the target vector. Calculate the error term for each output neurons i, δ i = g (1 − o )o (t − o ) 4. Calculate error term for each hidden layer neurons j. δ j = f ' (a j )

åw δ ij

i

i

new connecting any two cells q and p. ∆wqp = ηδ qV p , where 5. Calculate the new weight wqp new old η is the learning coefficient, and wqp = ηδ qV p + α∆wqp , where α is the momentum

coefficient. 6. If epochs3 exhausted or convergence limit reached, then stop. Else go to Step 2.
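A compact NumPy sketch of the procedure above, for a network with one hidden layer of logsig perceptrons (our own illustrative re-implementation, not the Matlab code actually used; the momentum term is omitted for brevity):

import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

def train_backprop(X, T, n_hidden=2, eta=0.5, epochs=100):
    # X: (samples, inputs), T: (samples, outputs), with values scaled to [0, 1].
    rng = np.random.default_rng(0)
    W1 = rng.uniform(-0.5, 0.5, (n_hidden, X.shape[1]))      # step 1: random weights
    b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-0.5, 0.5, (T.shape[1], n_hidden))
    b2 = np.zeros(T.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, T):
            h = logsig(W1 @ x + b1)                          # step 2: forward pass
            o = logsig(W2 @ h + b2)
            delta_o = o * (1.0 - o) * (t - o)                # step 3: output error terms
            delta_h = h * (1.0 - h) * (W2.T @ delta_o)       # step 4: hidden error terms
            W2 += eta * np.outer(delta_o, h)                 # step 5: weight changes
            b2 += eta * delta_o
            W1 += eta * np.outer(delta_h, x)
            b1 += eta * delta_h
    return W1, b1, W2, b2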

Neural Network Methodology

As explained previously, multi-layer neural network is trained by using the backpropagation technique. Since backpropagation is performed by examining the mean-square-errors produced by the output against the training set, the deadliest pitfall of neural network information processing is the over-fitting of the training set. This section of the paper examines methods employed in guarding against over-fitting via dynamic restructuring of the neural network.

[2] William of Occam (or Ockham) (1284-1347) was an English philosopher and theologian. He is famous for stressing the Aristotelian principle "Pluralitas non est ponenda sine necessitate" ("plurality should not be posited without necessity") that eventually became the Occam's Razor principle.

[3] An epoch is an iteration of steps 2 to 5.


Forecasting Scheme

A myriad of forecasting schemes exist that take into account different input sets and testing sets. The following enumerates the major groups.

Single-stepping: Trains the network over a given training set. The network is tested against actual time series data and one-time-step-ahead predictions are made. The technique is best used to make one-step-ahead predictions with a limited data set.

Multi-stepping: Same as single-stepping, except that a multiple-time-step look-ahead is predicted. The technique is best used to make long-term predictions with a limited data set.

Iterated single/multi-stepping: The training set is fixed, but the test input vectors can now comprise forecasted data in order to look ahead further than one time step or one sequence of multi-stepping. The technique is best used to make predictions when the data set is large and correlation between data largely separated in time is low.

Auto single/multi-stepping: The training set is updated with the most recent data and the network re-trained continuously. The scheme tests test vectors one at a time and then adds them to the training set before proceeding with the next test vector. The technique is used the most often. It guarantees that the best solution can be obtained with the maximum information possible.

Single-stepping can only look ahead one step at a time and cannot give a collection of samples for accurate statistical analysis of the prediction. Hence, our neural network follows a multi-stepping scheme. Ultimately, the group determines to use the auto multi-stepping scheme in the rest of its neural network presentation, where the multiple-time-step look-ahead is 21 days, or 1 month of trading days. The group believes that the multi-stepping method will not match the accuracy of an auto single-stepping approach, but believes that the error will be small.


Data Preprocessing

A preliminary visual examination of the historical volatility is made to truncate older data that lacks consistency with the present data. The idea is that some historical data are not relevant to forecasting. A new policy or economic structure change would certainly render any historical financial data insufficient for forecasting after the event. Care is taken that uncommon events, like a sudden influx of volatility, etc., are omitted. Addition of uncommon patterns that are large in magnitude can often obstruct a neural network from convergence.

Next, the data set is divided into subsets of 21 days, where each subset approximates a month of trading. We begin subsetting the data from the most recent and throw away the oldest data that cannot be partitioned into 21. The data sets can be represented as $\{p_{1}, p_{2}, \dots, p_{n}\}$, where p_j is a vector of 21 elements at subset j, j = 1, ..., n, and n is the index of the most recent data, namely from Mar. 1 2001 to Mar. 31 2001, which is the set of data the group will be predicting. At first, p_1 is the input set, p_2 the training set, and p_3 the prediction set. Next, p_2 becomes the input set, p_3 the training set, and p_4 the prediction set. The subdivisions are iterated through in the fashion that if p_i is the input set, p_{i+1} is the training set, and p_{i+2} the prediction set. A neural network with an input set is trained using backpropagation with a training set, and the resulting output from a simulation of the network using the training set as input is compared with the prediction set to obtain an empirical mean-square-error.

The data vectors must be normalized to [-1, 1] ranges for transfer functions symmetric about the origin or [0, 1] ranges for an anti-symmetric one, because symmetric transfer functions tend to have a lower limit at -1 and an upper limit at 1 while anti-symmetric transfer functions have limits at 0 and 1. Let p be the data set vector and p_j an element of the vector; then,

normalization to [0, 1]:   $p_{j} = \dfrac{p_{j} - \min(p)}{\max(p) - \min(p)}$

normalization to [-1, 1]:  $p_{j} = 2\,\dfrac{p_{j} - \min(p)}{\max(p) - \min(p)} - 1$
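A short sketch of this preprocessing in Python (illustrative; the group's actual preprocessing was done in Matlab):

import numpy as np

def normalize01(p):
    # p_j -> (p_j - min(p)) / (max(p) - min(p)), i.e. scale into [0, 1]
    p = np.asarray(p, dtype=float)
    return (p - p.min()) / (p.max() - p.min())

def monthly_subsets(vol, window=21):
    # Partition the series into 21-day subsets, keeping the most recent data
    # and discarding the oldest remainder that cannot fill a full window.
    vol = np.asarray(vol, dtype=float)
    usable = len(vol) - (len(vol) % window)
    return vol[len(vol) - usable:].reshape(-1, window)

# input / training / prediction triples (p_i, p_{i+1}, p_{i+2}):
# subsets = monthly_subsets(vol_series)
# triples = [(subsets[i], subsets[i + 1], subsets[i + 2]) for i in range(len(subsets) - 2)]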


Dynamic Structuring [4]

We first train the data vectors through a series of candidate transfer functions using a hidden layer of 2 perceptrons, the smallest selectable at this point [5]. A hidden layer of 1 perceptron can take away the non-linear nature of a 3-layered network. An example is when a hardlim function is used: the second layer already yields an output chosen from amongst two values, so the third layer would be redundant since its task would be classifying a two-valued input; in this case, a 2-layered network would suffice [6]. The smallest number of hidden perceptrons is chosen in order to speed up the running time. Since volatility can never be negative, symmetric transfer functions are excluded from among our candidates. This leaves the logsig and radbas transfer functions at our disposal [7]. A few iterations are run for each transfer function, and averages are taken. This step diminishes the effect of the neural network reaching a local minimum on the solution plane. The transfer function with the least prediction-set mean-square-error is chosen, and we continue with this selection [8].

The next step in dynamic structuring is to determine complexity. Complexity increases with a greater number of hidden-layer perceptrons. A natural step for our group is therefore to train the data vectors through a series of increasing numbers of hidden-layer perceptrons. The one with the least prediction-set mean-square-error is chosen [9].

Finally, a convergence limit must be chosen. It is known that a high convergence limit can over-fit a data set while a low one can under-fit a data set. An iteration is performed accordingly, and the best convergence limit is chosen as the one with the least prediction-set mean-square-error.
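The selection loops can be sketched as follows, using scikit-learn's MLPRegressor as a stand-in for the Matlab toolbox actually used (an assumption made purely for illustration: hidden_layer_sizes plays the role of the number of hidden perceptrons, activation="logistic" the logsig transfer function, and tol the convergence limit; since the paper does not detail how the 21-day vectors are fed to the network, each day is treated here as a one-feature sample):

from itertools import product
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

def prediction_mse(p_in, p_train, p_pred, hidden, tol, seeds=(0, 1, 2)):
    # Train the net to map the input subset onto the training subset, simulate
    # it on the training subset, and compare against the prediction subset,
    # averaging over a few random initializations to dampen local minima.
    errs = []
    for s in seeds:
        net = MLPRegressor(hidden_layer_sizes=(hidden,), activation="logistic",
                           max_iter=100, tol=tol, random_state=s)
        net.fit(p_in.reshape(-1, 1), p_train)
        out = net.predict(p_train.reshape(-1, 1))
        errs.append(mean_squared_error(p_pred, out))
    return float(np.mean(errs))

# grid over complexity (hidden perceptrons) and convergence limit:
# best_hidden, best_tol = min(
#     product(range(1, 11), [0.5 * 2 ** -k for k in range(10)]),
#     key=lambda ht: prediction_mse(p_in, p_train, p_pred, *ht))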

[4] The maximum number of epochs is set at 100 throughout the demonstration due to limited CPU power. A higher number of epochs should yield a better solution.

[5] A convergence limit of 0.001 is used preliminarily at this point. The optimal convergence limit is to be calculated later.

[6] Again, a continuous non-linear function would yield a similar conclusion, except that different functions are involved.

[7] A myriad of other transfer functions exist, but this set offers a good variety of the commonly used ones.

[8] A better method might be to select a few of the small mean-square-error transfer functions and continue separately. We chose to use only the smallest due to the CPU limit.

[9] Again, a better method would be to select a few of the small mean-square-error complexities and continue separately. We chose to use only the smallest due to the CPU limit.


Empirical Analysis

I. NASDAQ



Data Preprocessing:

A plot of 10 years of NASDAQ volatility is as follows:

Volatility was fairly stable over trading days 0 to 1500, counted from 4/01/1991. It rose afterwards in clear clusters of 1500 to 1900, 1900 to 2400, and 2400 onwards. The cluster of approximately 2400 to 2500 seems to be the best for neural network training. Further scrutiny determines the range from 10/20/2000 to 3/02/2001. Hence, volatility from 10/20/2000 to 3/02/2001 will be used in our neural network training [10].

[10] A larger data set might be more appropriate here. Again, limited CPU power restricts us from using too much input data.

In the following tables, we will use the notations as denoted here: APSNM: Average prediction set normalized MSE; ATSNM: Average target set normalized MSE.

- Transfer Function Selection

        Radbas   Logsig
APSNM   0.0660   0.0545
ATSNM   0.0251   0.0412

- Complexity Selection

        1        2        3        4        5        6        7        8        9        10
APSNM   1.0000   0.0596   0.0562   0.0674   0.0813   0.0752   0.0824   0.0844   0.0824   0.0853
ATSNM   0.0316   0.0243   0.0276   0.0230   0.0336   0.0208   0.0331   0.0270   0.0186   0.0258

- Convergence Limit Selection

        0.5000   0.2500   0.1250   0.0625   0.0313   0.0156   0.0078   0.0039   0.0020   0.0010
APSNM   0.1980   0.1302   0.0571   0.0474   0.0430   0.0393   0.0435   0.0390   0.0439   0.0688
ATSNM   0.1774   0.1207   0.0662   0.0458   0.0435   0.0366   0.0435   0.0375   0.0423   0.0667

- Result

Prediction set MSE: 3.3296e-5
Target set MSE: 3.3296e-5

As seen, the neural network seems to emulate the movements of volatility, though it sometimes deviates. It seems to take the mean during the first half of the month. This indicates a varying pattern during the first half of the month and consistency later on. This can be due to monthly economic patterns governed by paydays, important announcements, etc. An investor might wait for an important announcement to occur before making decisive and regular market decisions. An incoming monthly evaluation might force traders to become more efficient, thus yielding more consistent patterns.


II. S&P 500

The volatility remains fairly stable in the period of 0 to 1400 trading days from 4/01/1991. Gradual increase occurs approximately from 1400 to 1800, till it stabilizes around 1800. A reasonable choice will be within the 1800 to 2500 range. We might also look at the variance of volatility to help us pinpoint the best data set.


Apparently, the dramatic jumps around 500 trading days from 5/12/1998 must be avoided, and the range between 500 and 600 that shows no significant volatility must be avoided as well. Avoiding significant jumps prevents wild deviation in the prediction, and avoiding the range that shows no consistent volatility fluctuation prevents undertraining. Further examination pinpoints 10/10/2000 as the starting day for our prediction.



Transfer Function Selection

        Radbas   Logsig
APSNM   0.0405   0.0346
ATSNM   0.0401   0.0246

Complexity Selection

        1        2        3        4        5        6        7        8        9        10
APSNM   0.0434   0.0359   0.0387   0.0462   0.0502   0.0467   0.0454   0.0598   0.0715   0.0705
ATSNM   0.0213   0.0233   0.0203   0.0237   0.0170   0.0165   0.0136   0.0114   0.0250   0.0156

Convergence Limit Selection

        0.5000   0.2500   0.1250   0.0625   0.0313   0.0156   0.0078   0.0039   0.0020   0.0010
APSNM   0.1916   0.1013   0.0738   0.0400   0.370    0.0344   0.0356   0.0372   0.0387   0.0353
ATSNM   0.1929   0.1133   0.0794   0.0418   0.0353   0.0255   0.0231   0.0215   0.0225   0.0221

Result

Prediction set MSE: 4.1230e-5
Target set MSE: 1.2006e-5


This time the forecast is surprisingly lower than the actual values. An interpretation can be made that the volatility increase in March is exceptionally high. The neural network emulates the past data sets but fails to predict the exceptional increase in volatility in March.


III. Dow Jones Industrial Average

One cluster exists from 0 to 1500, and another from 1500 onwards. Our data set must lie somewhere between about 1500 and 2500. Let us examine the variance to pinpoint it further.


We choose our data set to avoid the large surge in variance around 900 trading days from 3/4/1997. Further examination persuades us to use the data set from 10/20/2000 to 3/2/2001.



Transfer Function Selection
         Radbas   Logsig
APSNM    0.0260   0.0223
ATSNM    0.0135   0.0135


Complexity Selection
         1        2        3        4        5        6        7        8        9        10
APSNM    0.0195   0.0201   0.0276   0.0192   0.0312   0.0318   0.0309   0.0346   0.0304   0.0310
ATSNM    0.0112   0.0117   0.0117   0.0092   0.0168   0.0064   0.0139   0.0167   0.0094   0.0114



Convergence Limit Selection
         0.5000   0.2500   0.1250   0.0625   0.0313   0.0156   0.0078   0.0039   0.0020   0.0010
APSNM    0.1528   0.0610   0.0497   0.0314   0.0273   0.0212   0.033    0.0236   0.0257   0.0211
ATSNM    0.1443   0.0641   0.0577   0.0325   0.0266   0.0173   0.0092   0.0095   0.0187   0.0105



Result
Prediction set MSE:  4.1230e-5
Target set MSE:      1.2006e-5

The forecast seems to emulate the volatility fairly well except for the large spike on the 15th trading day. This is to be expected, since a large deviation is probably caused by unexpected events that lack correlation with past data patterns. As emphasized in the Time Series section, the spike at day 15 was indeed caused by the Fed's disappointing 0.5-point rate cut.

IV. Conclusion

The neural network is especially keen at finding patterns in data sets. However, it is unreliable during chaotic periods. To guard against this, an investor might have a market specialist examine the neural network predictions, eliminate unreasonable ones, and employ the resulting forecast in investing.

Our experiment is especially limited by CPU power and MATLAB computing speed. A better machine and language would allow the group to evaluate larger input samples and more candidate structures.


Forecasting of Option Prices

From the data above, we attempt to forecast future option prices. This is done by substituting the forecasted volatilities and implied volatilities into the Black-Scholes formula. The forecasted prices are then compared against the actual option prices obtained from the Chicago Board Options Exchange. A percentage deviation error, rather than a squared error, is computed for each model, and its mean is used to decide which model forecasts best:

Percentage deviation error = ( |Actual - Forecasted| / Actual ) x 100%
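As a concrete illustration, here is a minimal sketch (hypothetical index level, strike, rate, maturity, and market price; not the group's original code) of pricing a European call under Black-Scholes with a forecasted volatility and scoring it with the percentage deviation error defined above:

```python
# Minimal sketch of Black-Scholes pricing with a forecasted volatility plus the
# mean percentage deviation error. All inputs are hypothetical.
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call with volatility sigma."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def mean_percentage_deviation(actual, forecasted):
    """Mean of |actual - forecasted| / actual * 100%."""
    return sum(abs(a - f) / a * 100.0 for a, f in zip(actual, forecasted)) / len(actual)

# Price one call with a forecasted volatility of 35% and compare it with a
# hypothetical observed market price of 55.
forecast_price = bs_call(S=1200.0, K=1250.0, T=30 / 365, r=0.05, sigma=0.35)
print(mean_percentage_deviation([55.0], [forecast_price]))
```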

I. Forecasting of DJIA option prices

[Figure: actual DJIA option prices (dotted) versus prices forecasted from the neural network, implied volatility, and time series estimates]

[Figure: Mean Percentage Deviation for DJIA; mean percentage deviation (0 to 70) for Actual, Straddle, Improved Quadratic, ARIMA, and Neural Networks]

Analysis: In the first graph, the dotted line represents the actual option prices, and the other three lines are the predictions based on the neural network, implied volatility, and time series calculations. The implied volatility result matches the actual curve most closely, especially when the option price becomes lower, while the neural network method produces the curve furthest from the real data.


II. Forecasting of NASDAQ Option Price

[Figure: actual NASDAQ-100 option prices versus prices forecasted from each volatility estimate]

[Figure: Mean Percentage Deviation for Nasdaq100; mean percentage deviation (0 to 1400) for Actual, Straddle, Improved Quadratic, ARIMA, and Neural Networks]

Analysis: In this figure, the straddle method gives a very skewed prediction. This is because the straddle volatility is 0.58, much higher than the 0.35 computed by the improved quadratic method (page 11). Discounting the straddle method, the ARIMA, improved quadratic, and actual implied volatility methods give good predictions, while the neural network does not. One general observation is that as option prices fall, the neural network estimates tend to be skewed high.


III. Forecasting of S&P 500

[Figure: actual S&P 500 option prices versus prices forecasted from each volatility estimate]

[Figure: Mean Percentage Deviation for S&P 500; mean percentage deviation (0 to 80) for Actual, Straddle, Improved Quadratic, ARIMA, and Neural Networks]

Analysis: In this figure, the time series and neural network forecasts are both somewhat biased. The straddle once again gives a poor forecast of option prices, while the improved quadratic formula and the actual implied volatility work very well.


IV. Conclusion

Implied volatility is a timely measure: it reflects today's market perceptions. Historical volatility, on the other hand, is a retrospective measure; it reflects how volatile the variable has been in the recent past, but it is a highly objective measure. Implied volatilities can be biased, especially if they are based upon options that are thinly traded. Furthermore, historical volatility can be calculated for any variable for which historical data is tracked, whereas implied volatilities can only be estimated for a variable if options are traded on that variable.

Descriptive statistics of what volatility has been are useful, but our primary interest is in what volatility will be. If the Black-Scholes assumption that volatility is constant were true, everything would be settled. The problem is that nobody believes this assumption, as empirical evidence disputes it unequivocally. Yet there seems to be no reason to scrap the Black-Scholes model in favor of one that allows a non-constant volatility: Merton [1973] demonstrated that the average of a time-varying volatility, as long as it changes deterministically, is consistent with the Black-Scholes concept of constant volatility (that is, pricing with the average of the squared volatility over the option's life). So the proper use of historical volatility does have informational value.

As we found in the figures above, the implied volatility and time series methods give better estimates than the neural network. Implied volatility employs current information and reflects on-time volatility. The ARIMA model also weighs more recent observations more heavily, so its forecasts depend strongly on recent information. We can therefore say that the reason these two methods yield better results is their ability to capture recent market information.

While implied volatilities are useful in certain applications, they can only be calculated if there is a liquid market for a corresponding option. For example, implied volatilities can be calculated for many currencies or for the S&P 500, whereas they cannot be calculated for most municipal bonds or for the portfolio of a pension plan. For this reason, implied volatilities can be of limited usefulness to risk managers. Historical volatility estimates, on the other hand, are highly flexible. They can be applied to any instrument or portfolio for which historical data is available. They are widely used for risk management purposes, but do have limitations:


1)

There is a trade-off between basing historical volatility estimates only on the most recent data and using data from a longer sample period. Estimates based only on recent data may be timely but not statistically significant; estimates based on a long history may be statistically significant but out of date (see the sketch after this list).

2)

Historical volatility estimates can provide a false measure of risk. For example, in a thinly traded market, prices may remain unchanged for an extended period of time. This would reflect a lack of market liquidity—not a lack of market risk.

3)

For traders or portfolio managers whose positions are constantly changing, historical volatility estimates are useless. The user needs to know the riskiness of the portfolio that exists today. Historical measures speak only to the riskiness of the portfolio as it existed a month ago—or a year ago.

4)

For many instruments, historical volatility says nothing about how risky they are today. For example, the price volatility of a call option (which is different from the price volatility of the option's underlier) depends on whether the option is in-the-money or out-of-the-money. If historical volatility were estimated from a period when a call option was out-of-the-money, but the option is now in-the-money, the historical volatility would be misleading.
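To make the trade-off in point 1 concrete, the following minimal sketch (simulated prices and hypothetical window lengths, not part of the original study) estimates annualized historical volatility from daily log returns over a short and a long window; the short estimate is timely but noisy, while the long one is smoother but stale.

```python
# Minimal sketch of the window-length trade-off for historical volatility
# estimates (point 1 above). Prices are simulated; 252 trading days per year.
import math
import random

random.seed(1)
prices = [100.0]
for _ in range(500):                      # simulate a hypothetical price path
    prices.append(prices[-1] * math.exp(random.gauss(0.0, 0.015)))

log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def annualized_vol(returns):
    """Sample standard deviation of daily log returns, annualized."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var) * math.sqrt(252)

print("20-day estimate: ", annualized_vol(log_returns[-20:]))   # timely but noisy
print("250-day estimate:", annualized_vol(log_returns[-250:]))  # stable but stale
```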

Also, the group notes that the particular period forecasted (3/5/01 to 3/30/01) is marked by strong market swings due to the looming economic recession and the surprising underperformance of the TMT (telecommunications, media, and technology) sector. Because of the fragility of the market in this period, it was strongly influenced by policy makers; an example is the disappointing 0.5-point Federal rate cut on March 20th, 2001, which sent our predictions wild. The results seem to indicate that future behavior is more closely related to information nearer in time: implied volatility uses the most current information and produces the best results, while the time series and neural network methods look back over a longer period in the past and perform worse. The group ultimately concludes that forecasting methods are at best approximations and that none can be used alone in a practical setting. They do not take into account the behavior of policy makers, which is often the dominant factor influencing market movements. Composites of forecasting methods should be used prudently by investors, together with information about policies and other exogenous factors.


Bibliography

[Akaike, 69]

Akaike, H. (1969). Fitting autoregressive models for prediction. Ann. Inst. Stat. Math., 21, 243-247

[Akaike, 73]

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In 2nd Int. Symp. Inform. Theory, 267-281. B.N. Petrov and F. Csaki, eds. Budapest: Akademia Kiado.

[Akaike, 74]

Akaike, H. (1974). A new look at statistical model identification. IEEE Trans. Automat. Contr., AC-19, 716-723.

[Azoff, 94]

E. Michael Azoff. “Neural Network Time Series Forecasting of Financial Markets”, 1994 by John Wiley & Sons Ltd.

[Box-Jenkins, 70]

Box, G.E.P., G.M. Jenkins (1970). Time Series Analysis, Forecasting, and Control. Oakland, CA: Holden-Day.

[Brenner, 81]

Brenner, M., and D. Galai. “The Properties of the Estimated Risk of Common Stocks Implied by Option Prices,” WP112, University of California, Berkeley (1981).

[Corrado and Miller, 96]

Corrado, C., and T. Miller. “Volatility without Tears,” RISK (July 1996), pp. 49-52.

[Duda & Hart, 73]

R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons: New York, 1973.

[Garman & Klass, 80]

Mark B. Garman and Michael J. Klass, “On the Estimation of Security Price Volatility from Historical Data”, Journal of Business 53, 1980, pp. 67-78.

[Lamoureux & Lastrapes, 93]

Lamoureux, C., and W. Lastrapes. “Forecasting Stock Return Variance: Toward an Understanding of Stochastic Implied Volatilities,” Review of Financial Studies 3, 1993, pp. 293-326.

[Latane & Rendleman, 76]

Latane, H., and R. Rendleman. “Standard Deviation of Stock Price Ratios Implied by Option Premia,” Journal of Finance 31, May 1976, pp. 369-82.

[Merton, 73]

Merton, R. “The Theory of Rational Option Pricing,” Bell Journal of Economics and Management Science 4, 1973, pp. 141-83.


[Nelken, 97]

Israel Nelken. “Volatility in the Capital Markets,” 1997 by Glenlake Publishing Company, Ltd.

[Parkinson, 80]

Parkinson, M. “The Extreme Value Method for Estimating the Variance of the Rate of Return”, Journal of Business 53, 1980, pp. 67-78.

[Shumway & Stoffer, 00]

Robert Shumway and David Stoffer. “Time Series Analysis and Its Applications,” 2000 by Springer-Verlag.

[Rosenblatt, 62]

Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan Books: Washington, D.C. 1962.

[Rumelhart, 86]

Rumelhart, D.E., Hinton, G.E., and Williams, R.J. Learning Representations by Back-propagating Errors. Nature, vol. 323, pp. 533-536, 1986.

[Schwarz, 78]

Schwarz, G. (1978). Estimating the dimension of a model. Ann. Stat., 6, 461-464.

