TIME SERIES DATA MINING: IDENTIFYING TEMPORAL PATTERNS FOR CHARACTERIZATION AND PREDICTION OF TIME SERIES EVENTS
by Richard J. Povinelli, B.A., B.S., M.S.

A Dissertation submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Milwaukee, Wisconsin
December, 1999
This work is dedicated to my wife, Christine, our son, Christopher, and his brother, who will arrive shortly.
Acknowledgment

I would like to thank Dr. Xin Feng for the encouragement, support, and direction he has provided during the past three years. His insightful suggestions, enthusiastic endorsement, and shrewd proverbs have made the completion of this research possible. They provide an example to emulate. I owe a debt of gratitude to my committee members, Drs. Naveen Bansal, Ronald Brown, George Corliss, and James Heinen, who each have helped me to expand the breadth of my research by providing me insights into their areas of expertise. I am grateful to Marquette University for its financial support of this research, and the faculty of the Electrical and Computer Engineering Department for providing a rigorous and stimulating environment that exemplifies cura personalis. I thank Mark Palmer for many interesting, insightful, and thought-provoking conversations on my favorite topic, Time Series Data Mining, and on his, Fuzzy Optimal Control. I am indebted to him for the many hours he spent reviewing this manuscript. I am deeply grateful to my wife, Christine, for her generous editing expertise, ongoing moral support, and acceptance of my long hours away from our family.
Abstract

A new framework for analyzing time series data called Time Series Data Mining (TSDM) is introduced. This framework adapts and innovates data mining concepts to analyzing time series data. In particular, it creates a set of methods that reveal hidden temporal patterns that are characteristic and predictive of time series events. Traditional time series analysis methods are limited by the requirement of stationarity of the time series and normality and independence of the residuals. Because they attempt to characterize and predict all time series observations, traditional time series analysis methods are unable to identify complex (nonperiodic, nonlinear, irregular, and chaotic) characteristics. TSDM methods overcome limitations of traditional time series analysis techniques. A brief historical review of related fields, including a discussion of the theoretical underpinnings for the TSDM framework, is made. The TSDM framework, concepts, and methods are explained in detail and applied to real-world time series from the engineering and financial domains.
Table of Contents

Acknowledgment
Abstract
Table of Contents
List of Tables
List of Figures
Glossary

Chapter 1 Introduction
    Data Mining Analogy
    Problem Statement
    Dissertation Outline

Chapter 2 Historical Review
    ARIMA Time Series Analysis
    Genetic Algorithms
    Theoretical Underpinnings of Time Series Data Mining
    Chaotic Time Series
    Data Mining

Chapter 3 Some Concepts in Time Series Data Mining
    Events
    Event Example: Synthetic Earthquakes
    Event Example: Metal Droplet Release
    Event Example: Spikes in Stock Open Price
    Temporal Pattern and Temporal Pattern Cluster
    Phase Space and Time-Delay Embedding
    Event Characterization Function
    Augmented Phase Space
    Objective Function
    Optimization
    Summary of Concepts in Time Series Data Mining

Chapter 4 Fundamental Time Series Data Mining Method
    Time Series Data Mining Method
    TSDM Example
    TSDM Training Step 1: Frame the TSDM Goal in Terms of TSDM Concepts
    TSDM Training Step 2: Determine Temporal Pattern Length
    TSDM Training Step 3: Create Phase Space
    TSDM Training Step 4: Form Augmented Phase Space
    TSDM Training Step 5: Search for Optimal Temporal Pattern Cluster
    TSDM Testing Step 1: Create Phase Space
    TSDM Testing Step 2: Predict Events
    Repulsion Function for Moderating δ
    Statistical Tests for Temporal Pattern Cluster Significance
    Optimization Method: Genetic Algorithm

Chapter 5 Basic and Explanatory Examples
    Sinusoidal Time Series
    Noise Time Series
    Sinusoidal with Noise Time Series
    Synthetic Seismic Time Series

Chapter 6 Extended Time Series Data Mining Methods
    Multiple Time Series (TSDM-M/x)
    Multiple Temporal Patterns (TSDM-x/M)
    Other Useful TSDM Techniques
    Clustering Technique
    Filtering Technique
    Non-filtering Techniques
    Evaluating Results and Adjusting Parameters

Chapter 7 Engineering Applications
    Release Prediction Using Single Stickout Time Series
    Adjusted Release Characterization and Prediction Using Stickout
    Stickout, Release, Current and Voltage Synchronization
    Adjusted Release Characterization and Prediction Using Stickout, Voltage, and Current
    Conclusion

Chapter 8 Financial Applications of Time Series Data Mining
    ICN Time Series Using Open Price
    ICN 1990 Time Series Using Open Price
    ICN 1991 Time Series Using Open Price
    ICN Time Series Using Open Price and Volume
    ICN 1990 Time Series Using Open Price and Volume
    ICN 1991 Time Series Using Open Price and Volume
    DJIA Component Time Series
    Training Stage
    Testing Stage Results

Chapter 9 Conclusions and Future Efforts

References
List of Tables

Table 2.1 Chromosome Fitness Values
Table 2.2 Tournament Selection Example
Table 2.3 Crossover Process Example
Table 2.4 Crossover Process Example
Table 2.5 Resulting Genetic Algorithm Population
Table 5.1 Genetic Algorithm Parameters for Sinusoidal Time Series
Table 5.2 Sinusoidal Results (Observed)
Table 5.3 Sinusoidal Results (Testing)
Table 5.4 Noise Results (Observed)
Table 5.5 Noise Results (Testing)
Table 5.6 Sinusoidal with Noise Results (Observed)
Table 5.7 Sinusoidal with Noise Results (Testing)
Table 5.8 Synthetic Seismic Results (Observed)
Table 5.9 Synthetic Seismic Results (Testing)
Table 6.1 Genetic Algorithm Parameters for Linearly Increasing Time Series
Table 7.1 Event Categorization
Table 7.2 Genetic Algorithm Parameters for Recalibrated Stickout and Release Time Series
Table 7.3 Recalibrated Stickout and Release Results (Observed)
Table 7.4 Recalibrated Stickout and Release Results (Testing)
Table 7.5 Genetic Algorithm Parameters for Recalibrated Stickout and Adjusted Release Time Series
Table 7.6 Recalibrated Stickout and Adjusted Release Results (Observed)
Table 7.7 Recalibrated Stickout and Adjusted Stickout Results (Testing)
Table 7.8 Genetic Algorithm Parameters for Recalibrated Stickout, Current, Voltage, and Adjusted Release Time Series
Table 7.9 Recalibrated Stickout, Current, Voltage, and Adjusted Release Results (Observed)
Table 7.10 Recalibrated Stickout, Current, Voltage, and Adjusted Release Results (Testing)
Table 8.1 Genetic Algorithm Parameters for Filtered ICN 1990H1 Daily Open Price Time Series
Table 8.2 Filtered ICN 1990H1 Daily Open Price Results (Observed)
Table 8.3 Filtered ICN 1990H2 Daily Open Price Results (Testing)
Table 8.4 Filtered ICN 1991H1 Daily Open Price Results (Observed)
Table 8.5 Filtered ICN 1991H2 Daily Open Price Results (Testing)
Table 8.6 ICN 1990H1 Daily Open Price and Volume Results (Observed)
Table 8.7 ICN 1990H2 Daily Open Price and Volume Results (Testing)
Table 8.8 ICN 1991H1 Daily Open Price and Volume Results (Observed)
Table 8.9 ICN 1991H2 Daily Open Price and Volume Results (Testing)
Table 8.10 Dow Jones Industrial Average Components (1/2/1990 to 3/8/1991)
Table 8.11 Genetic Algorithm Parameters for DJIA Component Time Series
Table 8.12 DJIA Component Results (Observed)
Table 8.13 DJIA Component Results (Testing)
Table 8.14 Trading Results
List of Figures

Figure 1.1 Synthetic Seismic Time Series
Figure 1.2 Welding Time Series
Figure 1.3 Stock Daily Open Price and Volume Time Series
Figure 2.1 Exponential Growth Time Series
Figure 2.2 Filtered Exponential Growth Time Series
Figure 2.3 Chromosome Crossover
Figure 2.4 Attractor
Figure 3.1 Synthetic Seismic Time Series with Events
Figure 3.2 Welding Time Series
Figure 3.3 Stock Daily Open Price Time Series
Figure 3.4 Synthetic Seismic Time Series without Contaminating Noise with Temporal Pattern and Events
Figure 3.5 Synthetic Seismic Time Series with Temporal Pattern and Events
Figure 3.6 Constant Value Phase Space
Figure 3.7 Synthetic Seismic Phase Space
Figure 3.8 Welding Phase Space
Figure 3.9 Stock Daily Open Price Phase Space
Figure 3.10 Synthetic Seismic Augmented Phase Space
Figure 3.11 Welding Augmented Phase Space
Figure 3.12 Stock Daily Open Price Augmented Phase Space
Figure 3.13 Synthetic Seismic Augmented Phase Space with Highlighted Temporal Pattern Clusters
Figure 3.14 Synthetic Seismic Phase Space with Alternative Temporal Pattern Clusters
Figure 4.1 Block Diagram of TSDM Method
Figure 4.2 Synthetic Seismic Time Series (Observed)
Figure 4.3 Synthetic Seismic Phase Space (Observed)
Figure 4.4 Synthetic Seismic Augmented Phase Space (Observed)
Figure 4.5 Synthetic Seismic Phase Space with Temporal Pattern Cluster (Observed)
Figure 4.6 Synthetic Seismic Time Series with Temporal Patterns and Events Highlighted (Observed)
Figure 4.7 Synthetic Seismic Time Series (Testing)
Figure 4.8 Synthetic Seismic Phase Space (Testing)
Figure 4.9 Synthetic Seismic Time Series with Temporal Patterns and Events Highlighted (Testing)
Figure 4.10 Repulsion Force Illustration
Figure 5.1 Sinusoidal Time Series (Observed)
Figure 5.2 Sinusoidal Phase Space (Observed)
Figure 5.3 Sinusoidal Augmented Phase Space (Observed)
Figure 5.4 Sinusoidal Phase Space with Temporal Pattern Cluster (Observed)
Figure 5.5 Sinusoidal Time Series (Testing)
Figure 5.6 Sinusoidal Time Series with Predictions (Testing)
Figure 5.7 Noise Time Series (Observed)
Figure 5.8 Noise Phase Space (Observed)
Figure 5.9 Noise Augmented Phase Space (Observed)
Figure 5.10 Noise Phase Space with Temporal Pattern Cluster (Observed)
Figure 5.11 Noise Time Series (Testing)
Figure 5.12 Noise Phase Space (Testing)
Figure 5.13 Noise Augmented Phase Space (Testing)
Figure 5.14 Noise Time Series with Predictions (Testing)
Figure 5.15 Sinusoidal with Noise Time Series (Observed)
Figure 5.16 Sinusoidal with Noise Phase Space (Observed)
Figure 5.17 Sinusoidal with Noise Augmented Phase Space (Observed)
Figure 5.18 Sinusoidal with Noise Phase Space with Temporal Pattern Cluster (Observed)
Figure 5.19 Sinusoidal with Noise Time Series (Testing)
Figure 5.20 Sinusoidal with Noise Phase Space (Testing)
Figure 5.21 Sinusoidal with Noise Augmented Phase Space (Testing)
Figure 5.22 Sinusoidal with Noise Time Series with Predictions (Testing)
Figure 5.23 Synthetic Seismic Time Series (Observed)
Figure 5.24 Synthetic Seismic Phase Space (Observed)
Figure 5.25 Synthetic Seismic Augmented Phase Space (Observed)
Figure 5.26 Synthetic Seismic Phase Space with Temporal Pattern Cluster (Observed)
Figure 5.27 Synthetic Seismic Time Series (Testing)
Figure 5.28 Synthetic Seismic Phase Space (Testing)
Figure 5.29 Synthetic Seismic Augmented Phase Space (Testing)
Figure 5.30 Synthetic Seismic Phase Space with Temporal Pattern Cluster (Testing)
Figure 5.31 Synthetic Seismic Time Series with Predictions (Testing)
Figure 6.1 Block Diagram of TSDM-M/x Method
Figure 6.2 Multiple Temporal Pattern Cluster Phase Space
Figure 6.3 Multiple Cluster Solution With Too Many Temporal Pattern Clusters
Figure 6.4 Multiple Cluster Solution
Figure 6.5 Cluster Shapes of Unit Radius for Various $l_p$ Norms
Figure 6.6 Linearly Increasing Time Series (Observed)
Figure 6.7 Linearly Increasing Phase Space (Observed)
Figure 6.8 Linearly Increasing Augmented Phase Space (Observed)
Figure 6.9 Linearly Increasing Phase Space with Temporal Pattern Cluster (Observed)
Figure 6.10 Linearly Increasing Time Series (Testing)
Figure 6.11 Linearly Increasing Phase Space with Temporal Pattern Cluster (Testing)
Figure 6.12 Linearly Increasing Time Series with Predictions (Testing)
Figure 7.1 Welder
Figure 7.2 Stickout and Release Time Series
Figure 7.3 Voltage and Current Time Series
Figure 7.4 Stickout Time Series (Observed)
Figure 7.5 Recalibrated Stickout Time Series (Observed)
Figure 7.6 Recalibrated Stickout and Release Time Series (Observed)
Figure 7.7 Recalibrated Stickout Phase Space (Observed)
Figure 7.8 Stickout and Release Augmented Phase Space (Observed)
Figure 7.9 Stickout Time Series (Testing)
Figure 7.10 Stickout Sample Time Series (Testing)
Figure 7.11 Recalibrated Stickout Time Series (Testing)
Figure 7.12 Recalibrated Stickout and Release Time Series (Testing)
Figure 7.13 Recalibrated Stickout Phase Space (Testing)
Figure 7.14 Recalibrated Stickout and Release Augmented Phase Space (Testing)
Figure 7.15 Recalibrated Stickout and Adjusted Release Time Series (Observed)
Figure 7.16 Recalibrated Stickout and Adjusted Release Augmented Phase Space (Observed)
Figure 7.17 Recalibrated Stickout and Adjusted Release Time Series (Testing)
Figure 7.18 Recalibrated Stickout and Adjusted Release Augmented Phase Space (Testing)
Figure 7.19 Recalibrated Stickout, Current, Voltage, and Adjusted Release Time Series (Observed)
Figure 7.20 Recalibrated Stickout, Current, Voltage, and Adjusted Release Time Series (Testing)
Figure 8.1 ICN 1990H1 Daily Open Price Time Series (Observed)
Figure 8.2 Filtered ICN 1990H1 Daily Open Price Time Series (Observed)
Figure 8.3 Filtered ICN 1990H1 Daily Open Price Phase Space (Observed)
Figure 8.4 Augmented Phase Space of Filtered ICN 1990H1 Daily Open Price (Observed)
Figure 8.5 ICN 1990H2 Daily Open Price Time Series (Testing)
Figure 8.6 Filtered ICN 1990H2 Daily Open Price Time Series (Testing)
Figure 8.7 Filtered ICN 1990H2 Daily Open Price Phase Space (Testing)
Figure 8.8 Augmented Phase Space of Filtered ICN 1990H2 Daily Open Price (Testing)
Figure 8.9 ICN 1991H1 Daily Open Price Time Series (Observed)
Figure 8.10 Filtered ICN 1991H1 Daily Open Price Time Series (Observed)
Figure 8.11 Filtered ICN 1991H1 Daily Open Price Phase Space (Observed)
Figure 8.12 Augmented Phase Space of Filtered ICN 1991H1 Daily Open Price (Observed)
Figure 8.13 ICN 1991H2 Daily Open Price Time Series (Testing)
Figure 8.14 Filtered ICN 1991H2 Daily Open Price Time Series (Testing)
Figure 8.15 Filtered ICN 1991H2 Daily Open Price Phase Space (Testing)
Figure 8.16 Augmented Phase Space of Filtered ICN 1991H2 Daily Open Price (Testing)
Figure 8.17 ICN 1990H1 Daily Open Price and Volume Time Series (Observed)
Figure 8.18 ICN 1990H2 Daily Open Price and Volume Time Series (Testing)
Figure 8.19 ICN 1991H1 Daily Open Price and Volume Time Series (Observed)
Figure 8.20 ICN 1991H2 Daily Open Price and Volume Time Series (Testing)
Figure 8.21 DJIA Daily Open Price Time Series
Figure 8.22 α µ vs. Excess Return
Glossary

$X$, $Y$: Time series
$x_t$, $y_t$: Time series observations at time index $t$
$B$: Backshift operator
$Q$: Phase space dimension, temporal pattern length
$\mathbb{R}$, $\mathbb{R}^Q$: The set of real numbers, real Q-space
$\tau$: Embedding delay
$\mathbf{p}$: Temporal pattern
$\delta$: Temporal pattern threshold, radius of temporal pattern cluster
$d$: Distance or metric defined on the phase space
$P$: Temporal pattern cluster
$\mathbf{x}_t$: Phase space point with time index $t$
$g(\cdot)$: Event characterization function
$\Lambda$: Index set of all phase space points
$M$: Index set of phase space points within a temporal pattern cluster
$\tilde{M}$: Index set of phase space points outside a temporal pattern cluster
$c(M)$: Cluster cardinality
$c(\tilde{M})$: Non-cluster cardinality
$\mu_M$: Cluster mean eventness
$\sigma_M$: Cluster standard deviation eventness
$\mu_{\tilde{M}}$: Non-cluster mean eventness
$\sigma_{\tilde{M}}$: Non-cluster standard deviation eventness
$\mu_X$: Average eventness of all phase space points
$f(\cdot)$: Objective function
$\beta$: Percentage of the total phase space points
$b(\cdot)$: Repulsion force function, moderates $\delta$
$\mathbf{X}$: Multi-dimensional time series
$z_r$: The test statistic for the runs test
$\alpha_r$: Probability of a Type I error in rejecting the null runs test hypothesis
$z_m$: The test statistic for the difference of two independent means test
$\alpha_m$: Probability of a Type I error in rejecting the null difference of two independent means test hypothesis
Chapter 1 Introduction

The Time Series Data Mining (TSDM) framework, introduced by this dissertation, is a fundamental contribution to the fields of time series analysis and data mining. Methods based on the TSDM framework are able to successfully characterize and predict complex, nonperiodic, irregular, and chaotic time series. The TSDM methods overcome limitations (including stationarity and linearity requirements) of traditional time series analysis techniques by adapting data mining concepts for analyzing time series. This chapter reviews the definition of a time series, introduces the key TSDM concepts of events and hidden temporal patterns, and provides examples of problems the TSDM framework addresses.

A time series X is a sequence of observed data, usually ordered in time [1, p. 1].

\[ X = \{ x_t,\ t = 1, \ldots, N \}, \tag{1.1} \]

where $t$ is a time index, and $N$ is the number of observations. Time series analysis is fundamental to engineering, scientific, and business endeavors. Researchers study systems as they evolve through time, hoping to discern their underlying principles and develop models useful for predicting or controlling them. Time series analysis may be applied to the prediction of welding droplet releases and stock market price fluctuations [2, 3].

Traditional time series analysis methods such as the Box-Jenkins or Autoregressive Integrated Moving Average (ARIMA) method can be used to model such time series. However, the ARIMA method is limited by the requirement of stationarity of the time series and normality and independence of the residuals [1, 4, 5]. The statistical
characteristics of a stationary time series remain constant through time. Residuals are the errors between the observed time series and the model generated by the ARIMA method. The residuals must be uncorrelated and normally distributed. For real-world time series such as welding droplet releases and stock market prices, the conditions of time series stationarity and residual normality and independence are not met.

A severe drawback of the ARIMA approach is its inability to identify complex characteristics. This limitation occurs because of the goal of characterizing all time series observations, the necessity of time series stationarity, and the requirement of residual normality and independence.

Data Mining [6, 7] is the analysis of data with the goal of uncovering hidden patterns. Data Mining encompasses a set of methods that automate the scientific discovery process. Its uniqueness is found in the types of problems addressed: those with large data sets and complex, hidden relationships.

The new TSDM framework innovates data mining concepts for analyzing time series data. In particular, this dissertation describes a set of methods that reveal hidden patterns in time series data and overcome limitations of traditional time series analysis techniques. The TSDM framework focuses on predicting events, which are important occurrences. This allows the TSDM methods to predict nonstationary, nonperiodic, irregular time series, including chaotic deterministic time series. The TSDM methods are applicable to time series that appear stochastic, but occasionally (though not necessarily periodically) contain distinct, but possibly hidden, patterns that are characteristic of the desired events.
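The stationarity requirement discussed above can be made concrete with a rough check: if the statistical characteristics of a series stay constant through time, the mean of its first half should be close to the mean of its second half. The sketch below is only an informal illustration, not a formal stationarity test and not part of the TSDM method; the function names `half_means` and `mean_shift` are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def half_means(series: Sequence[float]) -> Tuple[float, float]:
    """Return the means of the first and second halves of a series."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    middle = len(series) // 2
    return mean(series[:middle]), mean(series[middle:])


def mean_shift(series: Sequence[float]) -> float:
    """Absolute difference between the two half means.

    A value near zero is consistent with a constant mean; a large
    value signals that the mean drifts, so the series cannot be
    stationary. (Hypothetical helper, for illustration only.)
    """
    first, second = half_means(series)
    return abs(second - first)


# A series oscillating around a fixed level keeps the same mean.
level_series = [1, -1] * 10
# A linear trend has a mean that grows with time.
trend_series = list(range(20))

print(mean_shift(level_series))
print(mean_shift(trend_series))
```

The oscillating series shows no shift between its halves, while the trending series shows a clear one, which is the kind of behavior that violates the assumption ARIMA modeling relies on.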
It is commonly assumed that the ARIMA time series models developed with past data will apply to future prediction. This is the stationarity assumption that models will not need to vary through time. ARIMA models also assume that the system generating the time series is linear, i.e., can be defined by linear differential or difference equations [8]. Unfortunately, the systems generating the time series are not necessarily linear or stationary.

In contrast, the TSDM framework and the methods built upon it can handle nonlinear and nonstationary time series. This framework is most useful for predicting events in a time series, which might include predicting when a droplet from a welder will release, when a stock price will drop, or when an induction motor adjustable speed drive system will fail. All these applications are well suited to this new framework and the methods built upon it.

The novel TSDM framework has its underpinnings in several fields. It builds upon concepts from data mining [6, 7], time series analysis [1, 4, 5], adaptive signal processing [9], wavelets [10-18], genetic algorithms [19-27], and chaos, nonlinear dynamics, and dynamical systems [28-35]. From data mining comes the focus on discovering hidden patterns. From time series analysis comes the theory for analyzing linear, stationary time series. In the end, the limitations of traditional time series analysis suggest the possibility of new methods. From adaptive signal processing comes the idea of adaptively modifying a filter to better transform a signal. This is closely related to wavelets. Building on concepts from both adaptive signal processing and wavelets, this dissertation develops the idea of a temporal pattern. From genetic algorithms comes a robust and easily applied optimization method [19]. From the study of chaos, nonlinear
dynamics, and dynamical systems comes the theoretical justification of the method, specifically Takens' Theorem [36] and Sauer's extension [37].

1.1 Data Mining Analogy

An analogy to gold mining helps clarify the problem and introduces two key data mining concepts. An analogy is the assumption that if two things are similar in one area, they will be similar in others. The use of the term data mining implies an analogy with gold mining. There are several parallels between the time series analysis problems discussed in this dissertation and this analogy.

As gold mining is the search for nuggets of gold, so data mining is the search for nuggets of information. In mining time series data, these nuggets are known as events. As gold is hidden in the ground or under water, nuggets of information are hidden in data.

The first analogy is gained by comparing the definition of the gold nuggets with the definition of information nuggets. To the inexperienced miner, gold is gold, but to a veteran prospector, the size of the gold nuggets to be uncovered makes a significant difference in how the gold mining is approached. Individual prospectors use primarily manual methods when looking for nuggets of gold that are ounces in weight [38]. Industrial mining companies may find it acceptable to look for gold at the molecular level [39]. Likewise, if a prospector is seeking silver or oil, the mining processes are different. This leads to the importance of clearly defining the nuggets of information that are desired, i.e., time series data mining requires a clear definition of the events to be mined. Without this clear definition of what is to be found, there is no way to know when either the gold nuggets or the information nuggets have been discovered.
The second analogy looks at how prospectors learn where to search for the gold nuggets. Prospectors look for specific geological formations such as quartz and ironstone, and structures such as banded iron formations [38]. They study where other prospectors have had success. They learn not to dig aimlessly, but to look for clues that a particular location might yield a gold strike. Similarly, it is necessary to define the formations that point to nuggets of information (events). In the context of time series analysis these, probably hidden, formations that identify an information strike are called temporal patterns: temporal because of the time nature of the problem and patterns because of their identifiable structure. Like gold prospectors, information prospectors understand that the clues need not be perfect; rather, the clues need only to contribute to the overall effectiveness of the prediction.

The two analogies lead us to identify two key concepts and their associated requirements for data mining time series. The first concept is that of an event, which is an important occurrence. A clear definition of an event is required. The second concept is that of a temporal pattern, which is a potentially hidden structure in a time series. The temporal patterns are required to help predict events.

With the key TSDM concepts of events and temporal patterns defined, the next section presents the types of problems addressable by the TSDM framework.

1.2 Problem Statement

Figure 1.1 illustrates a TSDM problem, where the horizontal axis represents time, and the vertical axis observations. The diamonds show the time series observations. The squares indicate observations that are deemed important events. Although the following
examples illustrate events as single observations, events are not restricted to be just single observations. The goal is to characterize and predict when important events will occur. The time series events in Figure 1.1 are nonperiodic, irregular, and contaminated with noise.

Figure 1.1 Synthetic Seismic Time Series

To make the time series more concrete, consider it a measure of seismic activity, which is generated from a randomly occurring temporal pattern (a synthetic earthquake) and a contaminating noise signal. The goal is to characterize when peak seismic activity (earthquakes) occurs and then use the characterizations of the activity for prediction.

The next example of the type of problem the TSDM framework can solve is from the engineering domain. Figure 1.2 illustrates a welding time series generated by a sensor on a welding station. Welding joins two pieces of metal by forming a joint between them.
Predicting when a droplet of metal will release from a welder allows the quality of the metal joint to be monitored and controlled. In Figure 1.2, the squares indicate the release of metal droplets. The diamonds are the stickout length of the droplet measured in pixels. The problem is to predict the releases using the stickout time series. Because of the irregular, chaotic, and noisy nature of the droplet release, prediction is impossible using traditional time series methods.

Figure 1.2 Welding Time Series

Another example problem that is addressed by the TSDM framework is the prediction of stock prices. For this problem, the goal is to find a trading-edge, which is a small advantage that allows greater than expected gains to be realized. The goal is to find hidden temporal patterns that are on average predictive of a larger than normal increase in the price of a stock. Figure 1.3 shows a time series generated by the daily open price and volume of a stock. The bars show the volume of shares traded on a particular day. The
diamonds show the daily open price. The goal is to find hidden patterns in the daily open price and volume time series that provide the desired trading-edge.

Figure 1.3 Stock Daily Open Price and Volume Time Series

Now that examples of the types of problems addressable by the TSDM framework have been presented, the next section outlines the rest of the dissertation.

1.3 Dissertation Outline

The dissertation is divided into nine chapters. Chapter 2 reviews several of the constituent technologies underlying this research including time series analysis, data mining, and genetic algorithms. Additionally, Chapter 2 presents the theoretical background for the TSDM framework, reviewing Takens' Theorem. Chapter 3 elaborates on the key TSDM concepts of events, temporal patterns, temporal pattern clusters, phase spaces and time-delay embeddings, augmented phase spaces, objective functions, and optimization.
Chapter 4 establishes the fundamental TSDM method for characterizing and predicting time series events. Chapter 5 clarifies the TSDM framework by analyzing a sequence of example time series. In Chapter 6, extensions of the TSDM method including data mining multiple time series and nonstationary temporal pattern time series are presented. Chapters 7 and 8 discuss experimental results. Chapter 7 presents results from predicting droplet releases from a welder. In Chapter 8, the experimental results from analyzing stock market open price changes are presented. The last chapter summarizes the dissertation and discusses future work.
Chapter 2 Historical Review

This chapter reviews the constituent fields underlying the Time Series Data Mining (TSDM) research. TSDM innovates concepts from time series analysis, chaos and nonlinear dynamics, data mining, and genetic algorithms. From time series analysis comes the theory for analyzing linear, stationary time series [1, 4, 5]. From dynamical systems comes the theoretical justification for the Time Series Data Mining (TSDM) methods, specifically Takens' Theorem [36] and Sauer's extension [37]. From data mining comes the focus on discovering hidden relationships and patterns [6, 7, 40-44]. From genetic algorithms comes a robust and easily applied optimization method [19, 27].

2.1 ARIMA Time Series Analysis

The Box-Jenkins [4] or Autoregressive Integrated Moving Average (ARIMA) [1, 5] methodology involves finding solutions to the difference equation

\[ \phi_p(B)\,\phi_P(B^L)\,x_t = \delta + \theta_q(B)\,\theta_Q(B^L)\,a_t \tag{2.1} \]

[5, p. 570]. The nonseasonal autoregressive operator $\phi_p(B)$ of order $p$ models low-order feedback responses. The seasonal autoregressive operator $\phi_P(B^L)$ of order $P$ models feedback responses that occur periodically at seasonal intervals. For example, given a time series of monthly data, this operator would be used to model a regressive effect that occurs every January. The nonseasonal moving average operator $\theta_q(B)$ of order $q$ models low-order weighted average responses.
The seasonal moving average operator $\theta_Q(B^L)$ of order $Q$ models seasonal weighted average responses. The terms $x_t$, $a_t$, and $\delta$ are the time series, a sequence of random shocks, and a constant, respectively.

The orders of the operators are selected ad hoc, and the parameters are calculated from the time series data using optimization methods such as maximum likelihood [4] and least squares [4]. The ARIMA method is limited by the requirement of stationarity and invertibility of the time series [5, p. 488], i.e., the system generating the time series must be time invariant and stable. Additionally, the residuals, the differences between the time series and the ARIMA model, must be independent and distributed normally [5]. Although integrative (filtering) techniques can be useful for converting nonstationary time series into stationary ones, it is not always possible to meet all of the requirements.

This review of ARIMA time series modeling examines each of the terms given in (2.1), discusses the methods for identifying the orders of the various operators, and details the various statistical methods available to test the model's adequacy. Finally, this section discusses the integrative techniques that allow some nonstationary time series to be transformed into stationary ones.

The ARIMA model is best presented in terms of the following operators [4, p. 8, 5, p. 568]. The backshift operator $B$ shifts the index of a time series observation backwards, e.g., $B z_t = z_{t-1}$ and $B^k z_t = z_{t-k}$. The nonseasonal or first difference operator, $\nabla = 1 - B$, provides a compact way of describing the first difference. The seasonal
operator $\nabla_L$ is useful for taking the difference between two periodic or seasonal time series observations. It is defined as $\nabla_L = 1 - B^L$.

Having introduced the basic operator notation, the more complex operators presented in (2.1) can be discussed. The first operator from (2.1) is the nonseasonal autoregressive operator $\phi_p(B)$ [4, p. 9, 5, p. 570], also called the Green's function [1, p. 78]. This operator captures the system's dynamical response to $a_t$, the sequence of random shocks, and previous values of the time series [1].

The second operator is the nonseasonal moving average operator $\theta_q(B)$ [5, p. 570]. It is a weighted moving average of the random shocks $a_t$.

The third operator is the seasonal autoregressive operator $\phi_P(B^L)$. It is used to model seasonal regressive effects. For example, if the time series represents the monthly sales in a toy store, it is not hard to imagine a large increase in sales just before Christmas. This seasonal autoregressive operator is used to model these seasonal effects.

The fourth operator is the seasonal moving average operator $\theta_Q(B^L)$. It also is useful in modeling seasonal effects, but instead of regressive effects, it provides a weighted average of the seasonal random shocks.

The constant $\delta = \mu\,\phi_p(B)\,\phi_P(B^L)$, where $\mu$ is the mean of the modeled stationary time series [5, p. 571].

Bowerman [5, p. 571] suggests three steps to determine the ARIMA model for a particular time series.

1. Should the constant $\delta$ be included?
2. Which of the operators $\phi_p(B)$, $\phi_P(B^L)$, $\theta_q(B)$, and $\theta_Q(B^L)$ are needed?
3. What order should each selected operator have?
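The backshift and difference operators defined above are simple to express in code. The following sketch is illustrative only; the function names `backshift` and `difference` are hypothetical, and the result is truncated at the start of the series because the earliest observations have no predecessor to subtract.

```python
from typing import List, Sequence


def backshift(z: Sequence[float], k: int = 1) -> List[float]:
    """Apply B^k to a series.

    Entry i of the result is the observation that sits k steps
    before z[i + k], so the result lines up with z[k:] and is
    k entries shorter than z.
    """
    if k < 0 or k > len(z):
        raise ValueError("k must be between 0 and len(z)")
    return list(z[: len(z) - k])


def difference(z: Sequence[float], lag: int = 1) -> List[float]:
    """Apply (1 - B^lag): z[t] - z[t - lag] for every t >= lag.

    lag=1 gives the nonseasonal first difference; lag=L gives the
    seasonal difference with period L.
    """
    shifted = backshift(z, lag)
    return [current - previous for current, previous in zip(z[lag:], shifted)]


# A linear trend becomes a constant series after one first difference.
print(difference([3, 5, 7, 9, 11]))

# A pattern repeating every 4 observations, with a slow drift,
# becomes constant after a seasonal difference with L = 4.
print(difference([10, 20, 30, 40, 11, 21, 31, 41], lag=4))
```

Both calls return constant sequences, which shows how differencing removes a linear trend or a repeating seasonal level from a series.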
The $\delta$ should be included if

\[ \left| \frac{\mu(Z)\,\sqrt{c(Z)}}{\sigma_z} \right| > 2, \tag{2.2} \]

where $\mu(Z)$ is the mean of the time series, $c(Z)$ is the number of time series observations, and $\sigma_z$ is the standard deviation of the time series.

Two statistical functions, the sample autocorrelation function (SAC) and sample partial autocorrelation function (SPAC), are used to determine the inclusion and order of the operators. The process for determining the inclusion and orders of the operators is somewhat involved and well explained in [5]. Its essence is to examine the shape of the SAC and SPAC. The procedure looks for these functions to die down or cut off after a certain number of lags. Determining whether the SAC or SPAC is dying down or cutting off requires expert judgment.

After the operators have been selected and their orders determined, the coefficients of the operators are estimated using a training time series. The coefficients are estimated using a least squares [4] or maximum likelihood method [4].

Diagnostic checking of the overall ARIMA model is done by examining the residuals [5, p. 496]. The first diagnostic check is to calculate the Ljung-Box statistic. Typically, the model is rejected when the $\alpha$ corresponding to the Ljung-Box statistic falls below a chosen significance threshold. For non-rejected models, the residual sample autocorrelation function (RSAC) and residual sample partial autocorrelation function (RSPAC) should not have absolute t-statistic values greater than two [5, p. 496]. For rejected models, the RSAC and
RSPAC can be used to suggest appropriate changes to enhance the adequacy of the models.

Classic Box-Jenkins models describe stationary time series [5, p. 437]. However, several integrative or filtering methods transform nonstationary time series into stationary ones. The simplest nonstationary time series to make stationary is a linear trend, which is nonstationary because its mean varies through time. The nonseasonal operator $\nabla$ or seasonal operator $\nabla_L$ is applied to remove the linear trend.

Figure 2.1 Exponential Growth Time Series

A slightly more complex transformation is required for an exponential trend. One method takes the logarithm of the time series and applies the appropriate nonseasonal or seasonal operator to the resulting linear trend time series. Alternatively, the $\nabla_\%$ change transform may be used, where
$$ \nabla_\% = \frac{1 - B}{B}. \tag{2.3} $$
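As a rough illustration of the percent-change operator in (2.3), the sketch below writes it out elementwise as the change from the previous observation divided by the previous observation. The function name `percent_change` and the 5% growth series are assumptions made for this example; they do not come from the text.

```python
def percent_change(series):
    """Return z_t = (x_t - x_{t-1}) / x_{t-1} for every t after the first.

    Assumes no observation is zero, since each value is used as a divisor.
    """
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]


# Hypothetical exponential-growth series: x_t = 100 * 1.05**t
x = [100 * 1.05 ** t for t in range(10)]
z = percent_change(x)

# Every transformed value is (approximately) the constant growth rate,
# so the exponential trend has been removed.
print([round(v, 6) for v in z])
```

The transformed series is one observation shorter than the input because the first observation has no predecessor.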
The transform is applied as follows:
$$ z_t = \nabla_\% x_t = \frac{1 - B}{B}\, x_t = \frac{x_t - x_{t-1}}{x_{t-1}}. \tag{2.4} $$
Figure 2.1 shows a time series with exponential growth. Figure 2.2 illustrates the transformed time series.

Figure 2.2 Filtered Exponential Growth Time Series

For time series with nonstationary variances, there are two possible solutions. The first is to replace the time series with the square root or some other appropriate root of the time series. Second, the time series may be replaced by its logarithm [5].

Given an adequate model, future time series values may be predicted using (2.1). An error confidence range may also be provided.

This section has reviewed the ARIMA or Box-Jenkins time series analysis method. The three references cited here [1, 4, 5] are excellent sources for further study of this topic. As discussed in this section, optimization methods are needed to find the
parameters for the ARIMA model. Similarly, optimization is a necessary component of the Time Series Data Mining (TSDM) framework. The next section presents the genetic algorithm optimization method used in TSDM.

2.2 Genetic Algorithms

A genetic algorithm is a stochastic optimization method based on the evolutionary process of natural selection. Although a genetic algorithm does not guarantee a global optimum, it is known to be effective in optimizing non-linear functions [19]. TSDM requires an optimization method to find optimizers for the objective functions. Genetic algorithm optimization is selected for this purpose because of its effectiveness and ease of adaptation to the objective functions posed by the TSDM framework.

This section briefly discusses the key concepts and operators used by a binary genetic algorithm [19, 22, 23, 24]. The genetic algorithm process also is discussed. The four major operators are selection, crossover, mutation, and reinsertion. The fifth operator, inversion, is used infrequently. The concepts of genetic algorithms are fitness or objective function, chromosome, fitness of a chromosome, population, and generation.

The fitness function is the function to be optimized, such as
$$ f(x) = -x^2 + 10x + 10000. \tag{2.5} $$
A chromosome is a finite sequence of 0's and 1's that encodes the independent variables appearing in the fitness function. For equation (2.5), the chromosomes represent values of x. Given an eight-bit chromosome and a two's complement encoding, the values of x for several chromosomes are given in Table 2.1.
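The encoding described above can be sketched in a few lines. The fitness function below assumes that (2.5) has the form f(x) = -x^2 + 10x + 10000, which reproduces the fitness values (-7664, -4859, 10000, 9944) that appear in the tournament example of this section. The helper names and the four example chromosomes are hypothetical and chosen only for illustration.

```python
def decode_twos_complement(bits):
    """Decode a bit string such as '11111100' as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":            # sign bit set: subtract 2**(number of bits)
        value -= 1 << len(bits)
    return value


def fitness(x):
    # Assumed form of (2.5): f(x) = -x^2 + 10x + 10000
    return -x * x + 10 * x + 10000


# Hypothetical eight-bit chromosomes (not taken from Table 2.1).
for chromosome in ("10000000", "01111111", "00000000", "11111100"):
    x = decode_twos_complement(chromosome)
    print(chromosome, x, fitness(x))
```

With eight bits, x ranges over -128 to 127, so the search space for this fitness function contains 256 candidate solutions.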
Table 2.1 Chromosome Fitness Values (columns: Chromosome; x; f(x), fitness)

The fitness is the value assigned to a chromosome by the fitness function. The population is the set of all chromosomes in a particular generation, e.g., the four chromosomes in Table 2.1 form a population. A generation is an iteration of applying the genetic algorithm operators.

The most common genetic algorithm process is defined as follows. Alternative genetic algorithm processes may reorder the operators.

    Initialization
    while stopping criteria are not met
        Selection
        Crossover
        Mutation
        Reinsertion

The initialization step creates, usually randomly, a set of chromosomes, as in Table 2.1. There are many possible stopping criteria, e.g., halting after a fixed number of generations (iterations) or when fitness values of all chromosomes are equivalent.

The selection process chooses chromosomes from the population based on fitness. One selection process is based on a roulette wheel. The roulette wheel selection process
gives each chromosome a portion of the roulette wheel based on the chromosome's fitness. The roulette wheel is spun, and the winning chromosome is placed in the mating or crossover population. Usually the individuals are selected with replacement, meaning any chromosome can win on any spin of the roulette wheel.

The second type of selection is based on a tournament. In the tournament, n chromosomes (usually two) are selected at random, normally without replacement. They compete based on fitness, and the winner is placed in the mating or crossover population. This process is repeated until there are no individuals left. The whole tournament process is run n times, where n is the number of chromosomes in each round of the tournament. The output of the selection process is a mating population, which is usually the same size as the original population.

Given the initial population from Table 2.1, a tournament without replacement is demonstrated in Table 2.2. The crossover population is formed from the winners.

Table 2.2 Tournament Selection Example (columns: Tournament Round; Competitor 1; Competitor 2; Winner. Competitor fitness values, in order: (-7664) vs. (-4859); (10000) vs. (9944); (-4859) vs. (9944); (10000) vs. (-7664))

Crossover is the process that mixes the chromosomes in a manner similar to sexual reproduction. Two chromosomes are selected from the mating population without replacement. The crossover operator combines the encoded binary format of the parent chromosomes to create offspring chromosomes. A random crossover locus is chosen, and
the parent chromosomes are split at the locus. The tails of the chromosomes are swapped, yielding new chromosomes that share the genetic material from their parents. Figure 2.3 shows the crossover process.

Figure 2.3 Chromosome Crossover (parents "head 1 | tail 1" and "head 2 | tail 2" are split at the crossover locus and recombined as "head 1 | tail 2" and "head 2 | tail 1")

A variation on the crossover process includes using a fixed rather than random locus and/or using a crossover probability that the selected pair will not be mated.

Continuing the example, the crossover process is illustrated in Table 2.3, where the crossover locus is marked.

Table 2.3 Crossover Process Example (columns: Mating Pair; Parent 1; Parent 2; Offspring 1; Offspring 2)

The mutation operator randomly changes the bits of the chromosomes. The mutation probability is usually set in the range of 0.1 to 0.01%. For the running example, the mutation process is shown in Table 2.4, where only one bit is mutated.
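The selection, crossover, and mutation operators described so far can be sketched as follows. This is a minimal illustration on bit strings, not the implementation used in this work; the function names, the bit-count fitness used in the demonstration, and the example parents are all assumptions.

```python
import random


def tournament_select(population, fitness, rng):
    """One tournament pass: pair chromosomes without replacement, keep winners.

    With an odd population size the unpaired chromosome is dropped.
    """
    shuffled = population[:]
    rng.shuffle(shuffled)
    winners = []
    for a, b in zip(shuffled[0::2], shuffled[1::2]):
        winners.append(a if fitness(a) >= fitness(b) else b)
    return winners


def crossover(parent1, parent2, locus):
    """Single-point crossover: keep each head and swap the tails at the locus."""
    return parent1[:locus] + parent2[locus:], parent2[:locus] + parent1[locus:]


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < rate else b for b in chromosome)


rng = random.Random(0)                      # fixed seed for repeatability
population = ["1111", "0000", "1100", "1000"]
mating = tournament_select(population, lambda c: c.count("1"), rng)
child1, child2 = crossover("11110000", "00001111", locus=3)
print(mating, child1, child2, mutate(child1, 0.01, rng))
```

Running the tournament pass n times, as the text describes, would restore the mating population to the size of the original population.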
Table 2.4 Mutation Process Example (columns: Pre-mutation; Post-mutation)

The reinsertion or elitism operator selects the top n chromosomes to bypass the selection, crossover, and mutation operations. By applying elitism, the top individuals pass directly from one generation to the next unmodified. This operator is used to ensure that the most fit individuals are not lost due to the stochastic nature of the selection and crossover processes.

For the example, no reinsertion is used. The next generation with fitness values is presented in Table 2.5. A comparison of Table 2.1 and Table 2.5 shows that better solutions have evolved through the genetic algorithm process.

Table 2.5 Resulting Genetic Algorithm Population (columns: Chromosome; x; f(x), fitness)

In summary, a genetic algorithm is a stochastic, global optimization method based on the evolutionary theory of survival of the fittest. The genetic algorithm applies four operators (selection, crossover, mutation, and reinsertion) to search for objective function
optimizers. The use of an optimization method will form a key component of the TSDM framework, specifically in finding the hidden temporal patterns introduced in Chapter 1. The next section presents the theoretical justification for searching for these hidden temporal patterns.

2.3 Theoretical Underpinnings of Time Series Data Mining

This section shows how Takens' Theorem provides the theoretical justification for the TSDM framework. Takens proved, with certain limitations, that the state space of an unknown system can be reconstructed [36, 37].

Theorem (Takens) [36]: Let the state space $M$ of a system be Q dimensional, $\varphi: M \to M$ be a map that describes the dynamics of the system, and $y: M \to \mathbb{R}$ be a twice continuously differentiable function, which represents the observation of a single state variable. The map $\Phi_{(\varphi, y)}: M \to \mathbb{R}^{2Q+1}$, defined by
$$ \Phi_{(\varphi, y)}(x) = \left( y(x),\; y(\varphi(x)),\; \ldots,\; y(\varphi^{2Q}(x)) \right), \tag{2.6} $$
is an embedding. An embedding is a homeomorphic mapping from one topological space to another [45], where a homeomorphic map is continuous, bijective (one-to-one and onto), and its inverse is continuous [45, p. 1280].

If the embedding is performed correctly, Takens' Theorem guarantees that the reconstructed dynamics are topologically identical to the true dynamics of the system. Therefore, the dynamical invariants also are identical [46]. Hence, given a time series X, a state space topologically equivalent to the original state space can be reconstructed by a process called time-delay embedding [28, 37].
The difficulty in the time-delay embedding process is in estimating Q, the original state space dimension. Fortunately, as shown in [2, 3, 28, 46], useful information can be extracted from the reconstructed state space even if its dimension is less than $2Q + 1$.

This dissertation uses Takens' Theorem to provide the strong theoretical justification for reconstructing state spaces using time-delay embedding. The dynamics of the reconstructed state spaces can contain the same topological information as the original state space. Therefore, characterizations and predictions based on the reconstructed state space can be as valid as those that could be performed on the original state space. This is true even for chaotic dynamics, which are discussed in the next section.

2.4 Chaotic Time Series

The most interesting time series presented in this dissertation may be classified as chaotic. (See Chapters 7 and 8.) This section provides a definition and discussion of chaotic time series.

Chaos comprises a class of signals intermediate between regular sinusoidal or quasiperiodic motions and unpredictable, truly stochastic behavior [28, p. 11]. A working definition of a chaotic time series is one generated by a nonlinear, deterministic process highly sensitive to initial conditions that has a broadband frequency spectrum [28].

The language for describing chaotic time series comes from dynamical systems theory, which studies the trajectories described by flows (differential equations) and maps (difference equations), and nonlinear dynamics, an interdisciplinary field that applies
dynamical systems theory in numerous scientific fields [30]. The key concept for describing chaotic time series is a chaotic attractor.

Let $M$ be a manifold (a smooth geometric space such as a line, smooth surface, or solid [30, p. 10]), $f: M \to M$ be a map, and
$$ S = \{ x_0 : x_0 \in S,\; f^n(x_0) \in S,\; \forall n \} \tag{2.7} $$
be an invariant set. A positively invariant set is one where $n \ge 0$. A closed invariant set $A \subset M$ is an attracting set if there exists a neighborhood U of A such that U is a positively invariant set, and $f^n(x) \to A \;\; \forall x \in U$. A dense orbit is a trajectory that passes arbitrarily close to every point in the set [30]. An attractor is defined as an attracting set that contains a dense orbit. Figure 2.4 illustrates the concept of an attractor with the arrows representing state trajectories.

Figure 2.4 Attractor

Thus, a chaotic time series is defined as one generated by observing a state variable's trajectory on a map with a chaotic attractor. Since a chaotic time series is deterministic, it is predictable. However, since it is highly dependent on initial conditions, the prediction horizon is very short. The TSDM framework provides methods that use Takens' Theorem to exploit the short-term predictability of chaotic time series. The next
section presents data mining, which leads to the idea of searching in the short time horizon where chaotic time series are predictable.

2.5 Data Mining

Weiss and Indurkhya define data mining as the search for valuable information in large volumes of data. Predictive data mining is a search for very strong patterns in big data that can generalize to accurate future decisions [7]. Similarly, Cabena, et al., define it as the process of extracting previously unknown, valid, and actionable information from large databases and then using the information to make crucial business decisions [43].

Data mining evolved from several fields, including machine learning, statistics, and database design [7]. It uses techniques such as clustering, association rules, visualization, decision trees, nonlinear regression, and probabilistic graphical dependency models to identify novel, hidden, and useful structures in large databases [6, 7].

Others who have applied data mining concepts to finding patterns in time series include Berndt and Clifford [47], Keogh [48-50], and Rosenstein and Cohen [51]. Berndt and Clifford use a dynamic time warping technique taken from speech recognition. Their approach uses a dynamic programming method for aligning the time series and a predefined set of templates.

Rosenstein and Cohen [51] also use a predefined set of templates to match a time series generated from robot sensors. Instead of using the dynamic programming methods as in [47], they employ the time-delay embedding process to match their predefined templates.
Similarly, Keogh represents the templates using piecewise linear segmentations. Local features such as peaks, troughs, and plateaus are defined using a prior distribution on expected deformations from a basic template [48]. Keogh's approach uses a probabilistic method for matching the known templates to the time series data.

The TSDM framework, initially introduced by Povinelli and Feng in [3], differs fundamentally from these approaches. The approach advanced in [47-51] requires a priori knowledge of the types of structures or temporal patterns to be discovered and represents these temporal patterns as a set of templates. Their [47-51] use of predefined templates completely prevents the achievement of the basic data mining goal of discovering useful, novel, and hidden temporal patterns.

The next chapter introduces the key TSDM concepts, which allow the TSDM methods to overcome the limitations of traditional time series methods and the more recent approaches of Berndt and Clifford [47], Keogh [48-50], and Rosenstein and Cohen [51].
Chapter 3 Some Concepts in Time Series Data Mining

Chapter 1 presented two of the important concepts in Time Series Data Mining (TSDM), i.e., events and temporal patterns. In this chapter, these concepts are explained in further detail. Other fundamental TSDM concepts such as event characterization function, temporal pattern cluster, time-delay embedding, phase space, augmented phase space, objective function, and optimization are defined and explained. The chapter also provides examples of each concept.

3.1 Events

In a time series, an event is an important occurrence. The definition of an event is dependent on the TSDM goal. In a seismic time series, an earthquake is defined as an event. Other examples of events include sharp rises or falls of a stock price or the release of a droplet of metal from a welder.

Figure 3.1 Synthetic Seismic Time Series with Events
Event Example: Synthetic Earthquakes

Figure 3.1 shows a synthetic example time series, which is useful for explaining events. Let
$$ X = \{ x_t,\; t = 1, \ldots, N \} \tag{3.1} $$
be a synthetic time series representing seismic data, where N = 100. The diamonds show the values of observations at particular time indices. The squares indicate observations that are deemed important events.

Figure 3.2 Welding Time Series (vertical axis: $x_t$ in pixels; series shown: stickout, release)

Event Example: Metal Droplet Release

Figure 3.2 shows a welding time series. Let
$$ X = \{ x_t,\; t = 400, \ldots, 600 \} \tag{3.2} $$
be a time series of metal droplet stickout lengths. The diamonds in Figure 3.2 are the stickout lengths measured in pixels. Let
$$ Y = \{ y_t,\; t = 400, \ldots, 600 \} \tag{3.3} $$
be a binary (1 for an event, 0 for a nonevent) time series of droplet releases. In Figure 3.2, the squares indicate when $y_t = 1$, i.e., when a droplet of metal has released.

Event Example: Spikes in Stock Open Price

Let $X = \{ x_t,\; t = 1, \ldots, 126 \}$ be the daily open price of a stock for a six-month period as illustrated by Figure 3.3. For this time series, the goal is to find a trading-edge, which is a small advantage that allows greater than expected gains to be realized. The stock will be bought at the open of the first day and sold at the open of the second day. The goal is to pick buy-and-sell-days that will, on average, have greater than expected price increases. Thus, the events, highlighted as squares in Figure 3.3, are those days when the price increases more than 5%.

Figure 3.3 Stock Daily Open Price Time Series
3.2 Temporal Pattern and Temporal Pattern Cluster

The next important concept within the TSDM framework is the temporal pattern. A temporal pattern is a hidden structure in a time series that is characteristic and predictive of events. The temporal pattern $\mathbf{p}$ is a real vector of length Q. The temporal pattern will be represented as a point in a Q dimensional real metric space, i.e., $\mathbf{p} \in \mathbb{R}^Q$.

The vector sense of $\mathbf{p}$ is illustrated in Figure 3.4, which shows the synthetic seismic time series without any contaminating noise. The hidden temporal pattern $\mathbf{p}$ that is characteristic of the events is highlighted with gray squares. Since the contaminating noise has been removed, the temporal pattern perfectly matches the sequence of time series observations before an event.

Figure 3.4 Synthetic Seismic Time Series without Contaminating Noise with Temporal Pattern and Events

Figure 3.5 shows the synthetic seismic time series with contaminating noise. Because of the noise, the temporal pattern does not perfectly match the time series
observations that precede events. To overcome this limitation, a temporal pattern cluster is defined as the set of all points within $\delta$ of the temporal pattern:
$$ P = \{ \mathbf{a} \in \mathbb{R}^Q : d(\mathbf{p}, \mathbf{a}) \le \delta \}, \tag{3.4} $$
where d is the distance or metric defined on the space. This defines a hypersphere of dimension Q, radius $\delta$, and center $\mathbf{p}$.

Figure 3.5 Synthetic Seismic Time Series with Temporal Pattern and Events

The observations $\{ x_{t-(Q-1)\tau}, \ldots, x_{t-2\tau}, x_{t-\tau}, x_t \}$ form a sequence that can be compared to a temporal pattern, where $x_t$ represents the current observation, and $x_{t-(Q-1)\tau}, \ldots, x_{t-2\tau}, x_{t-\tau}$ past observations. Let $\tau > 0$ be a positive integer. If t represents the present time index, then $t - \tau$ is a time index in the past, and $t + \tau$ is a time index in the future. Using this notation, time is partitioned into three categories: past, present, and future. Temporal patterns and events are placed into different time categories. Temporal patterns occur in the past and complete in the present. Events occur in the future.
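The membership test in (3.4) can be illustrated with a short sketch. The Euclidean and Manhattan metrics shown here are assumptions for the example, since the text leaves the choice of d open; the function names and sample values are likewise hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean (l2) distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    """Manhattan (l1) distance between two equal-length sequences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def in_cluster(point, pattern, delta, metric=euclidean):
    """True when d(pattern, point) <= delta, i.e., the point lies in P of (3.4)."""
    return metric(pattern, point) <= delta


pattern = (1.0, 2.0)          # hypothetical temporal pattern p with Q = 2
noisy_sequence = (1.1, 2.1)   # hypothetical observations (x_{t-1}, x_t)
print(in_cluster(noisy_sequence, pattern, delta=0.2))
print(in_cluster(noisy_sequence, pattern, delta=0.15, metric=manhattan))
```

The same sequence can fall inside the cluster under one metric and outside it under another, so the metric is part of the cluster definition.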
The next section presents the concept of a phase space, which allows sequences of time series to be easily compared to temporal patterns.

3.3 Phase Space and Time-Delay Embedding

A reconstructed phase space [28, 35, 52], called simply phase space here, is a Q-dimensional metric space into which a time series is embedded. As discussed in Chapter 2, Takens showed that if Q is large enough, the phase space is homeomorphic to the state space that generated the time series [36]. The time-delayed embedding of a time series maps a set of Q time series observations taken from X onto $\mathbf{x}_t$, where $\mathbf{x}_t$ is a vector or point in the phase space. Specifically,
$$ \mathbf{x}_t = \left( x_{t-(Q-1)\tau}, \ldots, x_{t-2\tau}, x_{t-\tau}, x_t \right)^T. $$

Figure 3.6 Constant Value Phase Space (axes: $x_{t-1}$, $x_t$; single point at (c, c))

For example, given a constant value time series $X = \{ x_t = c,\; t = 1, \ldots, N \}$ where c is a constant, the phase space has a single point as illustrated by Figure 3.6. Figure 3.7
shows a two-dimensional phase space that results from the time-delayed embedding of the synthetic seismic time series presented in Figure 3.1. The temporal pattern and temporal pattern cluster also are illustrated. For this time-delayed embedding, $\tau = 1$. Every pair of adjacent observations in the original time series forms a single point in this phase space.

Figure 3.7 Synthetic Seismic Phase Space (axes: $x_{t-1}$, $x_t$; legend: Embedded Time Series, Temporal Pattern, Temporal Pattern Cluster)

Figure 3.8 shows the two-dimensional phase space of the welding time series presented by Figure 3.2, and Figure 3.9 shows the two-dimensional phase space of the stock time series presented by Figure 3.3. Note that $\tau = 1$ for both embeddings.
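A minimal sketch of the time-delayed embedding described in this section follows. It uses 0-based time indices rather than the 1-based indices of the text, and the function name `time_delay_embed` is an assumption made for this illustration.

```python
def time_delay_embed(series, Q, tau=1):
    """Embed a series into a Q-dimensional phase space with delay tau.

    Returns (indices, points), where points[i] is
    (x_{t-(Q-1)tau}, ..., x_{t-tau}, x_t) for t = indices[i] (0-based).
    """
    if Q < 1 or tau < 1:
        raise ValueError("Q and tau must be positive integers")
    start = (Q - 1) * tau          # first index with a full history
    indices = list(range(start, len(series)))
    points = [
        tuple(series[t - k * tau] for k in range(Q - 1, -1, -1))
        for t in indices
    ]
    return indices, points


# Adjacent pairs form the points of a two-dimensional phase space (tau = 1).
idx, pts = time_delay_embed([1, 2, 3, 4, 5], Q=2, tau=1)
print(pts)

# A constant series collapses to a single distinct phase space point.
_, constant_pts = time_delay_embed([7] * 10, Q=2)
print(set(constant_pts))
```

The first (Q - 1)τ observations have no complete history, so the embedding yields N - (Q - 1)τ points from N observations.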
Figure 3.8 Welding Phase Space (axes: $x_{t-1}$, $x_t$)

Figure 3.9 Stock Daily Open Price Phase Space (axes: $x_{t-1}$, $x_t$)
To determine how well a temporal pattern or a phase space point characterizes an event requires the concept of an event characterization function as introduced in the next section.

3.4 Event Characterization Function

To link a temporal pattern (past and present) with an event (future) the "gold" or event characterization function $g(t)$ is introduced. The event characterization function represents the value of future "eventness" for the current time index. It is, to use an analogy, a measure of how much gold is at the end of the rainbow (temporal pattern). The event characterization function is defined a priori and is created to address the specific TSDM goal. The event characterization function is defined such that its value at t correlates highly with the occurrence of an event at some specified time in the future, i.e., the event characterization function is causal when applying the TSDM method to prediction problems. Non-causal event characterization functions are useful when applying the TSDM method to system identification problems.

For the time series illustrated in Figure 3.1, the goal is to predict occurrences of synthetic earthquakes. One possible event characterization function to address this goal is $g(t) = x_{t+1}$, which captures the goal of characterizing synthetic earthquakes one-step in the future.

Alternatively, predicting an event three time-steps ahead requires the event characterization function $g(t) = x_{t+3}$. A more complex event characterization function that would predict an event occurring one, two, or three time-steps ahead is
$$ g(t) = \max \{ x_{t+1}, x_{t+2}, x_{t+3} \}. \tag{3.5} $$
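The three event characterization functions above can be sketched directly. The code uses 0-based indices and raises `IndexError` when the required future observations lie beyond the end of the series; that convention, the function names, and the sample series are assumptions made for this illustration.

```python
def g_k_step(series, t, k=1):
    """g(t) = x_{t+k}: the observation k steps in the future."""
    if t + k >= len(series):
        raise IndexError("future observation lies beyond the series")
    return series[t + k]


def g_max_window(series, t, horizon=3):
    """g(t) = max{x_{t+1}, ..., x_{t+horizon}}, as in (3.5) for horizon = 3."""
    if t + horizon >= len(series):
        raise IndexError("future window extends beyond the series")
    return max(series[t + 1 : t + horizon + 1])


x = [0, 1, 5, 2, 3, 9]                 # hypothetical observations
print(g_k_step(x, 0))                  # one step ahead
print(g_k_step(x, 0, k=3))             # three steps ahead
print(g_max_window(x, 0))              # largest of the next three
```

Because each function looks only at indices after t, all three are causal in the sense used above.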
In Figure 3.2, the TSDM goal is to predict the droplet releases using the stickout time series. Specifically, the objective is to generate one time-step predictions of when metal droplets will release from a welder. In the previous event characterization functions, $g(t)$ was defined in terms of $x_t$, the same time series that contains the temporal patterns. However, in this example, the temporal patterns are discovered in a different time series from the one containing the events. Thus, the event characterization function is $g(t) = y_{t+1}$, where Y is defined by (3.3).

In Figure 3.3, the goal is to decide if the stock should be purchased today and sold tomorrow. The event characterization function that achieves this goal is
$$ g(t) = \frac{x_{t+1} - x_t}{x_t}, \tag{3.6} $$
which assigns the percentage change in the stock price for the next day to the current time index.

3.5 Augmented Phase Space

The concept of an augmented phase space follows from the definitions of the event characterization function and the phase space. The augmented phase space is a Q+1 dimensional space formed by extending the phase space with $g(t)$ as the extra dimension. Every augmented phase space point is a vector $\langle \mathbf{x}_t, g(t) \rangle \in \mathbb{R}^{Q+1}$.
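One way to build the augmented phase space is to pair each embedded point with its eventness. The sketch below does this for the percentage-change function of (3.6); the function names, the 0-based indexing, and the price values are assumptions for illustration only.

```python
def pct_change_next(series, t):
    """g(t) = (x_{t+1} - x_t) / x_t, as in (3.6); needs x_{t+1} to exist."""
    return (series[t + 1] - series[t]) / series[t]


def augmented_phase_space(series, Q, tau, g):
    """Return [(phase_point, g(t)), ...] for every t where both are defined."""
    start = (Q - 1) * tau
    augmented = []
    for t in range(start, len(series)):
        try:
            eventness = g(series, t)
        except IndexError:
            break                  # g needs observations beyond the series end
        point = tuple(series[t - k * tau] for k in range(Q - 1, -1, -1))
        augmented.append((point, eventness))
    return augmented


prices = [10.0, 11.0, 11.0, 12.1]      # hypothetical daily open prices
for point, eventness in augmented_phase_space(prices, 2, 1, pct_change_next):
    print(point, round(eventness, 4))
```

Each entry holds the Q phase space coordinates plus one eventness value, which together give the Q+1 coordinates of an augmented phase space point.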
Figure 3.10 Synthetic Seismic Augmented Phase Space (axes: $x_{t-1}$, $x_t$, g)

Figure 3.10, a stem-and-leaf plot, shows the augmented phase space for the synthetic seismic time series. The height of the leaf represents the significance of $g(t)$ for that time index. From this plot, the required temporal pattern and temporal pattern cluster are easily identified.

Figure 3.11 Welding Augmented Phase Space (axes: $x_{t-1}$, $x_t$, g)
Figure 3.12 Stock Daily Open Price Augmented Phase Space (axes: $x_{t-1}$, $x_t$, g)

Figures 3.11 and 3.12 show the augmented phase spaces for the welding time series and the Stock Daily Open Price, respectively. In both of these plots the desired temporal patterns and temporal pattern clusters are hidden. Appropriate filtering and higher order augmented phase spaces are required to allow the hidden temporal patterns in these time series to be identified. These techniques are discussed in Chapter 6.

Identifying the optimal temporal pattern cluster in the augmented phase space requires the formulation of an objective function, which is discussed in the next section.

3.6 Objective Function

The next concept is the TSDM objective function, which represents the efficacy of a temporal pattern cluster to characterize events. The objective function f maps a temporal pattern cluster P onto the real line, which provides an ordering to temporal pattern clusters according to their ability to characterize events. The objective function is constructed in such a manner that its optimizer $P^*$ meets the TSDM goal.
Figure 3.13 Synthetic Seismic Augmented Phase Space with Highlighted Temporal Pattern Clusters (axes: $x_{t-1}$, $x_t$, g; labeled clusters: Temporal Pattern Cluster $P_1$, Temporal Pattern Cluster $P_2$)

Figure 3.13 illustrates the requirement of the TSDM objective function. The temporal pattern cluster $P_1$ is obviously the best temporal pattern cluster for identifying events, while the temporal pattern cluster $P_2$ is not. The objective function must map the temporal pattern clusters such that $f(P_1) > f(P_2)$.

The form of the objective functions is application dependent, and several different objective functions may achieve the same TSDM goal. Before presenting example objective functions, several definitions are required.

The index set $\Lambda$ is the set of all time indices t of phase space points.
$$ \Lambda = \{ t : t = (Q-1)\tau + 1, \ldots, N \}, \tag{3.7} $$
where $(Q-1)\tau$ is the largest embedding time-delay, and N is the number of observations in the time series. The index set $M$ is the set of all time indices t when $\mathbf{x}_t$ is within the temporal pattern cluster, i.e.
$$ M = \{ t : \mathbf{x}_t \in P,\; t \in \Lambda \}. \tag{3.8} $$
Similarly, $\tilde{M}$, the complement of $M$, is the set of all time indices t when $\mathbf{x}_t$ is outside the temporal pattern cluster.

The average value of g, also called the average eventness, of the phase space points within the temporal pattern cluster P is
$$ \mu_M = \frac{1}{c(M)} \sum_{t \in M} g(t), \tag{3.9} $$
where $c(M)$ is the cardinality of $M$. The average eventness of the phase space points not in P is
$$ \mu_{\tilde{M}} = \frac{1}{c(\tilde{M})} \sum_{t \in \tilde{M}} g(t). \tag{3.10} $$
Consequently, the average eventness of all phase space points is given by
$$ \mu_X = \frac{1}{c(\Lambda)} \sum_{t \in \Lambda} g(t). \tag{3.11} $$
The corresponding variances are
$$ \sigma_M^2 = \frac{1}{c(M)} \sum_{t \in M} \left( g(t) - \mu_M \right)^2, \tag{3.12} $$
$$ \sigma_{\tilde{M}}^2 = \frac{1}{c(\tilde{M})} \sum_{t \in \tilde{M}} \left( g(t) - \mu_{\tilde{M}} \right)^2, \text{ and} \tag{3.13} $$
$$ \sigma_X^2 = \frac{1}{c(\Lambda)} \sum_{t \in \Lambda} \left( g(t) - \mu_X \right)^2. \tag{3.14} $$
Using these definitions, several examples of objective functions are defined below. The first objective function is the t test for the difference between two independent means [53, 54].
$$ f(P) = \frac{\mu_M - \mu_{\tilde{M}}}{\sqrt{\dfrac{\sigma_M^2}{c(M)} + \dfrac{\sigma_{\tilde{M}}^2}{c(\tilde{M})}}}, \tag{3.15} $$
where P is a temporal pattern cluster. This objective function is useful for identifying temporal pattern clusters that are statistically significant and have a high average eventness.

The next example objective function orders temporal pattern clusters according to their ability to characterize time series observations with high eventness and characterize at least a minimum number of events. The objective function
$$ f(P) = \begin{cases} \mu_M & \text{if } c(M)/c(\Lambda) \ge \beta \\ \left( \mu_M - g_0 \right) \dfrac{c(M)}{\beta\, c(\Lambda)} + g_0 & \text{otherwise,} \end{cases} \tag{3.16} $$
where $\beta$ is the desired minimum percentage cardinality of the temporal pattern cluster, and $g_0$ is the minimum eventness of the phase space points, i.e.
$$ g_0 = \min \{ g(t) : t \in \Lambda \}. \tag{3.17} $$
The parameter $\beta$ in the linear barrier function in (3.16) is chosen so that $c(M)$ is non-trivial, i.e., the neighborhood around $\mathbf{p}$ includes some percentage of the total phase space points. If $\beta = 0$, then $c(M) = 0$ or $g(i) = g(j) \;\; \forall i, j \in M$, i.e., the eventness value of all points in the temporal pattern cluster are identical. If $\beta = 0$, the temporal pattern cluster will be maximal when it contains only one point in the phase space, the point with the highest eventness. If there are many points with the highest eventness, the optimal temporal pattern cluster may contain several of these points. When $\beta = 0$, (3.16) is still defined, because $c(M)/c(\Lambda) \ge 0$ is always true.
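The two objective functions (3.15) and (3.16) can be sketched as follows. The sketch takes the eventness values inside and outside a cluster as plain lists and uses the population variances of (3.12) and (3.13). The handling of empty clusters and of a zero denominator is an assumption added for this example, as are the function names and sample values.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance, matching the 1/c(.) normalization of (3.12)."""
    m = mean(values)
    return sum((g - m) ** 2 for g in values) / len(values)


def f_ttest(g_in, g_out):
    """Objective (3.15): t statistic for the difference of two means."""
    if not g_in or not g_out:
        return -math.inf                      # assumed: reject degenerate splits
    diff = mean(g_in) - mean(g_out)
    denom = math.sqrt(variance(g_in) / len(g_in) + variance(g_out) / len(g_out))
    if denom == 0:                            # assumed: both groups have no spread
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / denom


def f_barrier(g_in, g_out, beta):
    """Objective (3.16): mean eventness with a linear barrier on cluster size."""
    if not g_in:
        return -math.inf                      # assumed: empty cluster is worst
    n_total = len(g_in) + len(g_out)          # c(Lambda)
    g0 = min(g_in + g_out)                    # minimum eventness, (3.17)
    mu = mean(g_in)
    if len(g_in) / n_total >= beta:
        return mu
    return (mu - g0) * len(g_in) / (beta * n_total) + g0


inside, outside = [4.0, 6.0], [1.0, 1.0, 3.0, 3.0]   # hypothetical eventness
print(f_ttest(inside, outside))
print(f_barrier(inside, outside, beta=0.25), f_barrier(inside, outside, beta=0.5))
```

With `beta=0.5` the cluster holds only a third of the points, so the barrier branch scales the result down toward the minimum eventness.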
The next objective function is useful when the TSDM goal requires that every event is predicted, e.g., the best solution to the welding problem will predict every droplet release. With this goal in mind, the objective function must capture the accuracy with which a temporal pattern cluster predicts all events. Since it may be impossible for a single temporal pattern cluster to perfectly predict all events, a collection $\mathcal{C}$ of temporal pattern clusters is used for this objective function. The objective function $f(\mathcal{C})$ is the ratio of correct predictions to all predictions, i.e.
$$ f(\mathcal{C}) = \frac{t_p + t_n}{t_p + t_n + f_p + f_n}, \tag{3.18} $$
where $t_p$ (true positive), $t_n$ (true negative), $f_p$ (false positive), and $f_n$ (false negative) are respectively defined as
$$ t_p = c\left( \{ \mathbf{x}_t : \exists P_i \in \mathcal{C},\; \mathbf{x}_t \in P_i \wedge g(t) = 1 \} \right), \tag{3.19} $$
$$ f_p = c\left( \{ \mathbf{x}_t : \exists P_i \in \mathcal{C},\; \mathbf{x}_t \in P_i \wedge g(t) = 0 \} \right), \tag{3.20} $$
$$ t_n = c\left( \{ \mathbf{x}_t : \mathbf{x}_t \notin P_i \;\forall P_i \in \mathcal{C} \wedge g(t) = 0 \} \right), \text{ and} \tag{3.21} $$
$$ f_n = c\left( \{ \mathbf{x}_t : \mathbf{x}_t \notin P_i \;\forall P_i \in \mathcal{C} \wedge g(t) = 1 \} \right). \tag{3.22} $$
This objective function would be used to achieve maximum event characterization and prediction accuracy for binary $g(t)$ (1 for an event, 0 for a nonevent) as with the welding time series shown in Figure 3.2.

3.7 Optimization

The key concept of the TSDM framework is to find optimal temporal pattern clusters that characterize and predict events. Thus, an optimization algorithm represented by
$$ \max_{\mathbf{p},\, \delta} f(P) \tag{3.23} $$
is necessary.

Figure 3.14 Synthetic Seismic Phase Space with Alternative Temporal Pattern Clusters (axes: $x_{t-1}$, $x_t$; legend: Embedded Time Series, Temporal Pattern Cluster 1, Temporal Pattern Cluster 2, Temporal Pattern Cluster 3)

Since different temporal pattern clusters may contain the same phase space points, as illustrated in Figure 3.14, a bias may be placed on $\delta$, the radius of the temporal pattern cluster hypersphere. Three possible biases are minimize, maximize, or moderate $\delta$. The choice of the bias is based on the types of prediction errors to be minimized. To minimize the false positive prediction errors, the error of classifying a non-event as an event, $\delta$ is minimized subject to f(P) remaining constant. This will cause the temporal pattern cluster to have as small a coverage as possible while not changing the value of the objective function. To minimize the false negative prediction errors, the error of classifying an event as a non-event, $\delta$ is maximized subject to f(P) remaining constant. This will cause
the temporal pattern cluster to have as large a coverage as possible while not changing the value of the objective function. A moderating bias would balance between the false positives and false negatives.

Thus, an optimization formulation for (3.15) and (3.16) is $\max f(P)$ subject to $\min \delta$ such that minimizing $\delta$ does not change the value of f(P). This formulation places a minimization bias on $\delta$. An optimization formulation for (3.18) is $\max f(\mathcal{C})$ subject to $\min c(\mathcal{C})$ and $\min \delta_i \;\forall P_i \in \mathcal{C}$ such that minimizing $c(\mathcal{C})$ and $\delta_i$ does not change the value of f(P). This formulation searches for a minimal set of temporal pattern clusters that is a maximizer of the objective function, and each temporal pattern cluster has a minimal radius.

3.8 Summary of Concepts in Time Series Data Mining

To review, some of the key concepts of TSDM follow. An event is defined as an important occurrence in time. The associated event characterization function $g(t)$, defined a priori, represents the value of future eventness for the current time index. Defined as a vector of length Q or equivalently as a point in a Q-dimensional space, a temporal pattern is a hidden structure in a time series that is characteristic and predictive of events.

A phase space is a Q-dimensional real metric space into which the time series is embedded. The augmented phase space is defined as a Q+1 dimensional space formed by extending the phase space with the additional dimension of $g(t)$. The objective function represents a value or fitness of a temporal pattern cluster or a collection of temporal pattern clusters. Finding optimal temporal pattern clusters that characterize and predict events is the key of the TSDM framework.
With the concepts of the TSDM framework defined, the next chapter formulates the TSDM method that searches for a single optimal temporal pattern cluster in a single dimensional time series.
Chapter 4 Fundamental Time Series Data Mining Method

This chapter details the fundamental Time Series Data Mining (TSDM) method. After reviewing the problem statement, the TSDM method will be discussed. The chapter presents a method based on an electrical field for moderating the temporal pattern cluster threshold $\delta$. Statistical tests for temporal pattern cluster significance are discussed as a means for validating the results. The chapter also presents an adaptation of a genetic algorithm to the TSDM framework. Extensions and variations of the TSDM method are presented in Chapter 6.

The key to the TSDM method is that it forgoes the need to characterize time series observations at all time indices for the advantages of being able to identify the optimal local temporal pattern clusters for predicting important events. This allows prediction of complex real-world time series using small-dimensional phase spaces.

4.1 Time Series Data Mining Method

The first step in applying the TSDM method is to define the TSDM goal, which is specific to each application, but may be stated generally as follows. Given an observed time series
$$ X = \{ x_t,\; t = 1, \ldots, N \}, \tag{4.1} $$
the goal is to find hidden temporal patterns that are characteristic of events in X, where events are specified in the context of the TSDM goal. Likewise, given a testing time series
$$ Y = \{ x_t,\; t = R, \ldots, S \}, \quad N < R < S, \tag{4.2} $$
the goal is to use the hidden temporal patterns discovered in X to predict events in Y.
[Figure 4.1 block diagram. Training stage: define TSDM goal; observed time series; select Q; define g, f, and optimization formulation; embed time series into phase space; search phase space for optimal temporal pattern cluster; evaluate training stage results. Testing stage: testing time series; embed time series into phase space; predict events.]

Figure 4.1 Block Diagram of TSDM Method

Figure 4.1 presents a block diagram of the TSDM method. Given a TSDM goal, an observed time series to be characterized, and a testing time series to be predicted, the steps in the TSDM method are:

I. Training Stage (Batch Process)
   1. Frame the TSDM goal in terms of the event characterization function, objective function, and optimization formulation.
      a. Define the event characterization function g.
      b. Define the objective function f.
      c. Define the optimization formulation, including the independent variables over which the value of the objective function will be optimized and the constraints on the objective function.
   2. Determine Q, i.e., the dimension of the phase space and the length of the temporal pattern.
   3. Transform the observed time series into the phase space using the time-delayed embedding process.
   4. Associate with each time index in the phase space an eventness represented by the event characterization function. Form the augmented phase space.
   5. In the augmented phase space, search for the optimal temporal pattern cluster, which best characterizes the events.
   6. Evaluate training stage results. Repeat training stage as necessary.
II. Testing Stage (Real Time or Batch Process)
   1. Embed the testing time series into the phase space.
   2. Use the optimal temporal pattern cluster for predicting events.
   3. Evaluate testing stage results.

With the TSDM method defined, the next section presents an example to further clarify the method's mechanisms.
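Training steps 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the helper names `embed` and `augment`, the zero-based indexing, and the default eventness $g(t) = x_{t+1}$ (the choice made in the example of the next section) are introduced here for illustration only.

```python
def embed(series, Q, tau=1):
    """Time-delay embedding (hypothetical helper).

    Maps each usable index t to the phase space point
    (x_{t-(Q-1)tau}, ..., x_{t-tau}, x_t). Indices are zero-based.
    """
    start = (Q - 1) * tau  # first index that has a full history behind it
    return [
        tuple(series[t - (Q - 1 - k) * tau] for k in range(Q))
        for t in range(start, len(series))
    ]


def augment(series, Q, tau=1, step=1):
    """Pair each phase space point with its eventness g(t) = x_{t+step}.

    The last `step` points are dropped because their eventness is unknown.
    """
    points = embed(series, Q, tau)
    start = (Q - 1) * tau
    return [
        (points[t - start], series[t + step])
        for t in range(start, len(series) - step)
    ]


if __name__ == "__main__":
    x = [0.1, 0.2, 0.9, 5.0, 0.3]
    for point, eventness in augment(x, Q=2):
        print(point, eventness)
```

Keeping the embedding separate from the eventness lookup mirrors the distinction between the phase space and the augmented phase space: the same `embed` call can be reused in the testing stage, where future values are not yet available.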
4.2 TSDM Example

This section applies the TSDM method to the synthetic seismic time series illustrated in Figure 4.2. The TSDM goal is to characterize and predict the earthquakes, i.e., the large spikes.

4.2.1 TSDM Training Step 1: Frame the TSDM Goal in Terms of TSDM Concepts

The first step in the TSDM method is to frame the data mining goal in terms of the event characterization function, objective function, and optimization formulation. Since the goal is to characterize the synthetic earthquakes, the event characterization function is $g(t) = x_{t+1}$, which allows prediction one time-step in the future.

Figure 4.2 Synthetic Seismic Time Series (Observed)

Since the temporal patterns that characterize the events are to be statistically different from other temporal patterns, the objective function is

\[ f(P) = \frac{\mu_M - \mu_{\tilde{M}}}{\sqrt{\dfrac{\sigma_M^2}{c(M)} + \dfrac{\sigma_{\tilde{M}}^2}{c(\tilde{M})}}}, \tag{4.3} \]

which orders temporal pattern clusters according to their ability to statistically differentiate between events and non-events. Here, as defined in Chapter 3, $M$ is the set of time indices of the phase space points inside $P$, $\tilde{M}$ is its complement, $c(\cdot)$ is the cardinality, and $\mu$ and $\sigma$ are the mean and standard deviation of the eventness over each set. The optimization formulation is to $\max f(P)$ subject to $\min b(P)$ such that minimizing $b(P)$ does not change the value of $f(P)$. This optimization formulation will identify the most statistically significant temporal pattern cluster with a moderate radius. The function $b$ determines a moderate $\delta$ based on an electrical field with each phase space point having a unit charge. The function $b$ measures the cumulative force applied on the surface of the temporal pattern cluster. The details of $b$ are provided later in this chapter.

4.2.2 TSDM Training Step 2: Determine Temporal Pattern Length

The length of the temporal pattern $Q$, which is also the dimension of the phase space, is chosen ad hoc. Recall that Takens' Theorem proves that if $Q = 2m + 1$, where $m$ is the original state space dimension, the reconstructed phase space is guaranteed to be topologically equivalent to the original state space, but Takens' Theorem provides no mechanism for determining $m$. Using the principle of parsimony, temporal patterns with small $Q$ are examined first. For this example, $Q = 2$, which allows a graphical presentation of the phase space.

4.2.3 TSDM Training Step 3: Create Phase Space

For this example, Figure 4.3 illustrates the phase space.
Figure 4.3 Synthetic Seismic Phase Space (Observed)

The time series $X$ is embedded into the phase space using the time-delay embedding process, where each pair of sequential points $(x_{t-1}, x_t)$ in $X$ generates a two-dimensional phase space point. If the phase space were three-dimensional, every triplet of sequential points $(x_{t-2}, x_{t-1}, x_t)$ could be selected to form the phase space. The Manhattan or $l_1$ distance is chosen as the metric for this phase space.

4.2.4 TSDM Training Step 4: Form Augmented Phase Space

The next step is to form the augmented phase space by extending the phase space with the $g(t)$ dimension as illustrated by Figure 4.4, a stem-and-leaf plot. The vertical lines represent the dimension $g$ associated with the pairs $(x_{t-1}, x_t)$. The next step will find an optimal cluster of leaves with high eventness.
Figure 4.4 Synthetic Seismic Augmented Phase Space (Observed)

4.2.5 TSDM Training Step 5: Search for Optimal Temporal Pattern Cluster

A genetic algorithm searches for the optimal temporal pattern cluster, where a temporal pattern cluster $P$ is a hypersphere with a center defined by a temporal pattern $p$ and a radius $\delta$. In Figure 4.5, the temporal pattern cluster found by the genetic algorithm is highlighted in the phase space. Comparing Figure 4.4 and Figure 4.5 shows that the optimal temporal pattern cluster is identified. The circle $P$ (recall that the phase space distance is Manhattan) in Figure 4.5 has its center at $p$ with radius $\delta$.

In Figure 4.6, the temporal pattern and events are highlighted on the time series. The […] is not present in this view, but the relationship between the time series observations matched by the temporal pattern cluster and the event observation is obvious.

It is clear from Figures 4.4, 4.5, and 4.6 that the TSDM training stage has been successful. The process of evaluating the training stage results is explained later in this chapter. Next, the testing stage applies the temporal pattern cluster $P$ to the testing time series.
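The evaluation that the search repeats for every candidate cluster can be sketched as follows. This is an illustrative sketch rather than the code used for the example: the function names are hypothetical, the population variance is assumed for $\sigma^2$, and the handling of empty or constant sets is a convention chosen here so that the function is always defined.

```python
import math
from statistics import mean, pvariance


def manhattan(a, b):
    """l1 distance between two phase space points."""
    return sum(abs(u - v) for u, v in zip(a, b))


def split_eventness(augmented, p, delta, dist=manhattan):
    """Split eventness values by membership in the cluster P = (p, delta).

    `augmented` is a list of (phase_space_point, eventness) pairs.
    """
    inside, outside = [], []
    for point, g in augmented:
        (inside if dist(point, p) <= delta else outside).append(g)
    return inside, outside


def objective_t(inside, outside):
    """Statistic of (4.3): difference of means over its standard error."""
    if not inside or not outside:
        return -math.inf  # assumption: an empty side makes the cluster useless
    diff = mean(inside) - mean(outside)
    stderr = math.sqrt(
        pvariance(inside) / len(inside) + pvariance(outside) / len(outside)
    )
    if stderr == 0.0:
        # assumption: avoid dividing by zero when both sides are constant
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / stderr


if __name__ == "__main__":
    points = [((0.0, 0.0), 1.0), ((0.1, 0.1), 3.0),
              ((5.0, 5.0), 10.0), ((5.1, 4.9), 12.0)]
    inside, outside = split_eventness(points, p=(5.0, 5.0), delta=0.5)
    print(objective_t(inside, outside))
```

Passing the distance as a parameter keeps the membership test independent of the metric, so the same sketch applies when a Euclidean phase space is used instead of the Manhattan one.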
Figure 4.5 Synthetic Seismic Phase Space with Temporal Pattern Cluster (Observed)

Figure 4.6 Synthetic Seismic Time Series with Temporal Patterns and Events Highlighted (Observed)

Figure 4.7 Synthetic Seismic Time Series (Testing)

Figure 4.8 Synthetic Seismic Phase Space (Testing)
4.2.6 TSDM Testing Step 1: Create Phase Space

The testing time series $Y$, which is shown in Figure 4.7, is the nonstationary, nonperiodic continuation of the observed time series. The time series $Y$ is embedded into the phase space as shown in Figure 4.8 using the time-delay embedding process performed in the training stage.

4.2.7 TSDM Testing Step 2: Predict Events

The last step in the TSDM method is to predict events by applying the discovered temporal pattern cluster to the testing phase space. For this example, Figure 4.9 clearly illustrates the accuracy of the temporal pattern in predicting events. The pair of connected gray squares that match sequences of time series observations before events is the temporal pattern. The black squares indicate predicted events.

Figure 4.9 Synthetic Seismic Time Series with Temporal Patterns and Events Highlighted (Testing)
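The testing stage reduces to a membership test on each embedded point. The sketch below illustrates this under assumptions made here for illustration: the function name `predict_events` is hypothetical, indices are zero-based, the Manhattan metric of the example is used, and an event is predicted `step` observations after every match, so the last predicted index may lie one step beyond the data seen so far.

```python
def manhattan(a, b):
    """l1 distance between two phase space points."""
    return sum(abs(u - v) for u, v in zip(a, b))


def predict_events(series, p, delta, tau=1, step=1):
    """Return the indices at which an event is predicted.

    The temporal pattern p has length Q; a prediction is made for index
    t + step whenever the embedded point at index t lies within delta of p.
    """
    Q = len(p)
    start = (Q - 1) * tau
    predicted = []
    for t in range(start, len(series)):
        point = tuple(series[t - (Q - 1 - k) * tau] for k in range(Q))
        if manhattan(point, p) <= delta:
            predicted.append(t + step)
    return predicted


if __name__ == "__main__":
    y = [0.1, 0.2, 1.0, 2.0, 5.0, 0.1, 0.3, 1.1, 1.9, 5.2]
    # Hypothetical pattern: a rise through 1.0 and 2.0 precedes each spike.
    print(predict_events(y, p=(1.0, 2.0), delta=0.3))
```

Because only past observations enter the membership test, the same loop can run in real time as well as in batch mode.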
This section has presented an example application of the TSDM method to the synthetic seismic time series. The next section describes in detail the function $b$ used in this example to find a moderate $\delta$.

4.3 Repulsion Function for Moderating δ

The optimization formulation in the previous section was to $\max f(P)$ subject to $\min b(P)$ such that minimizing $b(P)$ does not change the value of $f(P)$. This section explains the repulsion function $b$, which is based on the concept of an electrical field.

Figure 4.10 Repulsion Force Illustration

The minimizer of $b$ is a temporal pattern cluster with a moderate $\delta$. More precisely, $\delta^*_{\min} \le \delta^*_b \le \delta^*_{\max}$, where $\delta^*_{\min}$ is the radius of $P^*_{\min}$ (the optimal temporal pattern cluster with the smallest radius); $\delta^*_b$ is the radius of $P^*_b$ (the optimal temporal pattern cluster with the smallest $b(P)$); and $\delta^*_{\max}$ is the radius of $P^*_{\max}$ (the temporal pattern cluster with the largest radius), where $P^*_{\min}$, $P^*_b$, and $P^*_{\max}$ belong to the collection of optimal temporal pattern clusters that all contain the same phase space points. The function $b$ represents a repulsion force on the surface of the hypersphere defined by a temporal pattern cluster $P$. The points in the phase space are treated like fixed electrons that exert a force on the nearest point on the surface of the hypersphere, as illustrated in Figure 4.10.

Several intermediate results are needed to define $b$. Recall the set of all time indices of phase space points $\Lambda = \{ t : t = (Q-1)\tau + 1, \ldots, N \}$. The vector

\[ v_t = x_t - p, \quad \forall t \in \Lambda \tag{4.4} \]

is the vector from the center of the hypersphere to each phase space point. The distances to the surface of the hypersphere are

\[ h_t = \delta - \lVert v_t \rVert_p, \quad \forall t \in \Lambda, \tag{4.5} \]

using the $p$ norm of the phase space. The

\[ m_t = \frac{1}{h_t^{\,p}}, \quad \forall t \in \Lambda \tag{4.6} \]

is the force magnitude of the $t$th phase space point. The force

\[ f_t = \begin{cases} \dfrac{v_t}{\lVert v_t \rVert_p}\, m_t & \text{if } h_t \le \delta \\[2ex] -\dfrac{v_t}{\lVert v_t \rVert_p}\, m_t & \text{if } h_t > \delta \end{cases} \quad \forall t \in \Lambda \tag{4.7} \]

is the $t$th phase space point's force on the hypersphere surface. Finally,

\[ b(P) = \Bigl\lVert \sum_{t \in \Lambda} f_t \Bigr\rVert_p + \Bigl| \sum_{t \in M} m_t - \sum_{t \in \tilde{M}} m_t \Bigr| \tag{4.8} \]

is the magnitude of the sum of all forces added to the absolute value of the difference between the sum of the force magnitudes inside the temporal pattern cluster and the sum of the force magnitudes outside the temporal pattern cluster. The minimizer of $b$ is both the minimizer of the overall force and the minimizer of the difference between the forces inside and outside the temporal pattern cluster. The $\delta^*_b$ has a value between $\delta^*_{\min}$ and $\delta^*_{\max}$. The next section discusses the tests used for evaluating the statistical significance of the temporal pattern clusters.

4.4 Statistical Tests for Temporal Pattern Cluster Significance

Two statistical tests are used to verify that the TSDM goal is met. Recall that the goal was to find hidden temporal patterns that are characteristic of events in the observed time series and predictive of events in the testing time series. The first statistical test is the runs test. The runs test measures whether a binary sequence is random [54, pp. …]. A binary sequence is formed by assigning a 0 to time series observations classified as non-events and a 1 to those classified as events. Sorting the binary sequence according to the associated eventnesses forms the test sequence. For large sample sizes

\[ z = \frac{r - \left( \dfrac{2 n_0 n_1}{n_0 + n_1} + 1 \right)}{\sqrt{\dfrac{2 n_0 n_1 \left( 2 n_0 n_1 - n_0 - n_1 \right)}{(n_0 + n_1)^2 (n_0 + n_1 - 1)}}}, \tag{4.9} \]

where $r$ is the number of runs of the same element in the sequence, $n_0$ is the number of occurrences of a 0, and $n_1$ is the number of occurrences of a 1.
The test hypothesis is:

H0: The set of eventnesses associated with the temporal pattern cluster $P$, $\{ g(t) : t \in M \}$, is not different from the set of eventnesses not associated with the temporal pattern cluster $P$, $\{ g(t) : t \in \tilde{M} \}$.

Ha: The sets $\{ g(t) : t \in M \}$ and $\{ g(t) : t \in \tilde{M} \}$ are different.

The complementary error function and a two-tailed normal distribution are used to find the probability value $\alpha$ associated with $z$. The probability values are typically much better than $\alpha = 0.01$, where $\alpha$ is the probability of making a Type I error. A Type I error occurs when the null hypothesis is incorrectly rejected [53, pp. …; 54, p. 16].

The second statistical test is the z test for two independent samples [53, pp. …; 54, pp. …],

\[ z = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}}, \tag{4.10} \]

where $\bar{X}$ is the mean of $X$, $\bar{Y}$ is the mean of $Y$, $\sigma_X$ is the standard deviation of $X$, $\sigma_Y$ is the standard deviation of $Y$, $n_X$ is the number of elements in $X$, and $n_Y$ is the number of elements in $Y$. As with the runs test, the probability values are typically much better than $\alpha = 0.01$.
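The two statistics (4.9) and (4.10) and their probability values can be sketched with the Python standard library. The function names are hypothetical, the population variance is assumed for $\sigma^2$, and the input to `runs_test_z` is assumed to be the binary sequence after it has been sorted by eventness; both symbols must occur at least once.

```python
import math
from statistics import mean, pvariance


def runs_test_z(bits):
    """Large-sample runs test statistic of (4.9) for a sequence of 0s and 1s."""
    n0 = bits.count(0)
    n1 = bits.count(1)
    n = n0 + n1
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    expected = 2 * n0 * n1 / n + 1
    variance = 2 * n0 * n1 * (2 * n0 * n1 - n0 - n1) / (n ** 2 * (n - 1))
    return (runs - expected) / math.sqrt(variance)


def two_sample_z(x, y):
    """z statistic of (4.10) for two independent samples."""
    stderr = math.sqrt(pvariance(x) / len(x) + pvariance(y) / len(y))
    return (mean(x) - mean(y)) / stderr


def p_two_tailed(z):
    """Two-tailed normal probability, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


def p_upper_tail(z):
    """Single-tailed (upper) normal probability."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # 91 non-cluster points followed by 7 cluster points: only two runs.
    z_r = runs_test_z([0] * 91 + [1] * 7)
    print(z_r, p_two_tailed(z_r))
```

With 91 zeros followed by 7 ones, the sketch gives $z \approx -9.46$ and a two-tailed probability of about $3.0 \times 10^{-21}$, which is consistent with the values $z_r = -9.5$ and $\alpha_r = 3.0 \times 10^{-21}$ reported for the sinusoidal example in Table 5.2.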
The test hypothesis is:

H0: The mean of the eventnesses $\{ g(t) : t \in M \}$ associated with the temporal pattern cluster $P$ is not greater than the mean of the eventnesses $\{ g(t) : t \in \tilde{M} \}$ not associated with the temporal pattern cluster $P$.

Ha: The mean of $\{ g(t) : t \in M \}$ is greater than the mean of $\{ g(t) : t \in \tilde{M} \}$.

A single-tailed distribution is used. The next section discusses the adaptation of the genetic algorithm for the TSDM method.

4.5 Optimization Method: Genetic Algorithm

In Chapter 2, a review of the basic genetic algorithm was provided. Here the basic genetic algorithm is adapted to the TSDM framework. These adaptations include an initial Monte Carlo search and hashing of fitness values. Additionally, the multi-objective optimization capabilities of the tournament genetic algorithm are discussed.

The basic genetic algorithm presented in Chapter 2 is modified as follows.

    Create an elite population
        Randomly generate large population (n times normal population size)
        Calculate fitness
        Select the top 1/n of the population to continue
    While all fitnesses have not converged
        Selection
        Crossover
        Mutation
        Reinsertion

Initializing the genetic algorithm with the results of a Monte Carlo search has been found to improve the optimization's rate of convergence and to help in finding a good optimum.

The hashing modification reduces the computation time of the genetic algorithm by 50%. This modification is discussed in detail in [20]. Profiling the computation time of the genetic algorithm reveals that most of the computation time is used evaluating the fitness function. Because the diversity of the chromosomes diminishes as the population evolves, the fitness values of the best individuals are frequently recalculated. Efficiently storing fitness values in a hash table dramatically improves genetic algorithm performance [20].

The optimization formulation, $\max f(P)$ subject to $\min b(P)$ such that minimizing $b(\delta)$ does not change the value of $f(P)$, presents two separate optimization objectives. The two optimization objectives could be reduced to a single objective problem using a barrier function, or the tournament genetic algorithm could be applied directly. The second method is applied because the different objectives have different priorities. The primary objective is to maximize $f(P)$. The secondary objective is to minimize $b(P)$ such that minimizing $b(\delta)$ does not change the value of $f(P)$. The primary TSDM goal of finding an optimal temporal pattern cluster should never be compromised to achieve a better temporal pattern cluster shape. This is accomplished with a tournament tiebreaker system. The chromosomes compete on the primary objective of finding optimal temporal pattern clusters. If, in the tournament, two chromosomes have the same primary objective function value, the winner is determined by a tiebreaker, where the tiebreaker is the secondary optimization objective.

This chapter presented the TSDM method and showed through an example how hidden temporal patterns can be identified. Additionally, the repulsion force function, statistical characterization of the temporal pattern cluster, and adaptation of the genetic algorithm were discussed. The next chapter further illustrates the method through a series of examples.
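The three adaptations of Section 4.5 (Monte Carlo initialization, hashing of fitness values, and the tournament tiebreaker) can be sketched together. This is a simplified illustration and not the algorithm of [20]: the function name, the parameter defaults, single-point crossover, a single elite chromosome, and the convergence test (all primary values equal) are assumptions made here, and chromosomes are plain bit tuples of length at least two.

```python
import random


def tournament_ga(primary, secondary, n_bits, pop_size=20, mc_multiplier=10,
                  mutation_rate=0.02, max_generations=100, seed=0):
    """Maximize `primary`; break ties by minimizing `secondary`.

    Returns the best chromosome and the number of fitness evaluations.
    """
    rng = random.Random(seed)
    cache = {}  # hash table: chromosome -> (primary value, secondary value)

    def score(chromosome):
        if chromosome not in cache:  # evaluate each distinct chromosome once
            cache[chromosome] = (primary(chromosome), secondary(chromosome))
        return cache[chromosome]

    def rank(chromosome):
        f, b = score(chromosome)
        return (-f, b)  # smaller is better: high primary, then low secondary

    def tournament():
        a, b = rng.choice(population), rng.choice(population)
        return a if rank(a) <= rank(b) else b  # tiebreaker is inside rank()

    # Monte Carlo initialization: keep the top 1/mc_multiplier of a large pool.
    pool = [tuple(rng.randint(0, 1) for _ in range(n_bits))
            for _ in range(mc_multiplier * pop_size)]
    population = sorted(pool, key=rank)[:pop_size]

    for _ in range(max_generations):
        values = [score(c)[0] for c in population]
        if min(values) == max(values):  # all primary fitnesses have converged
            break
        children = [min(population, key=rank)]  # the elite bypasses selection
        while len(children) < pop_size:
            mother, father = tournament(), tournament()
            cut = rng.randrange(1, n_bits)  # single-point crossover
            child = mother[:cut] + father[cut:]
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        population = children  # reinsertion

    return min(population, key=rank), len(cache)


if __name__ == "__main__":
    # Toy objectives: many ones in the first half, few ones in the second half.
    best, evaluations = tournament_ga(lambda c: sum(c[:2]),
                                      lambda c: sum(c[2:]), n_bits=4)
    print(best, evaluations)
```

Comparing lexicographically on the pair (primary, secondary) guarantees that the secondary objective is consulted only when two chromosomes tie on the primary one, so the cluster shape can never be improved at the expense of the objective function value.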
Chapter 5 Basic and Explanatory Examples

This chapter presents four examples that help elucidate the capabilities and limitations of the TSDM method while clarifying its mechanisms. The first example characterizes the maximal values of a constant frequency sinusoid. The second example applies the TSDM method to a uniform density stochastic time series. The third uses a combination of a sinusoid and uniform density noise to illustrate the TSDM method's capabilities with noisy time series. The fourth example is the synthetic seismic time series.

5.1 Sinusoidal Time Series

The first observed time series, $X = \{ x_t = \sin(\omega t),\ t = 1, \ldots, N \}$, where $\omega = \pi/8$ and $N = 100$, is illustrated in Figure 5.1. For this time series, the TSDM goal is to predict the maximal points of the time series. To achieve this objective, the event characterization function is $g(t) = x_{t+1}$, which will be used for all remaining examples. The objective function (described in Chapter 3) is

\[ f(P) = \begin{cases} \mu_M & \text{if } \dfrac{c(M)}{c(\Lambda)} \ge \beta \\[2ex] (\mu_M - g_0)\, \dfrac{c(M)}{\beta\, c(\Lambda)} + g_0 & \text{otherwise,} \end{cases} \tag{5.1} \]

where $\beta =$ […]. This objective function is useful for finding temporal pattern clusters with a high average eventness, where $\beta$ is the desired minimum percentage cardinality of the temporal pattern cluster. The optimization formulation is $\max f(P)$ subject to $\min b(\delta)$ such that minimizing $b(\delta)$ does not change the value of $f(P)$. The function $b$ is described in Chapter 4.
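The piecewise objective function (5.1) can be sketched as follows. The function name is hypothetical, the values of `beta` used in the usage lines are arbitrary illustrations rather than the value used in this chapter, and `g0` is assumed here to be a floor value such as the smallest eventness in the augmented phase space (its definition is given in Chapter 3). Returning `g0` for an empty cluster is a convention chosen for this sketch.

```python
def objective_mean(inside, n_total, beta, g0):
    """Objective function of (5.1).

    inside  : eventness values of the phase space points in the cluster
    n_total : c(Lambda), the number of phase space points
    beta    : desired minimum fraction of points inside the cluster
    g0      : floor eventness toward which undersized clusters are pulled
    """
    c = len(inside)
    if c == 0:
        return g0  # assumption: limit of the penalized branch as c -> 0
    mu = sum(inside) / c
    if c / n_total >= beta:
        return mu  # cluster is large enough: use the mean eventness
    # cluster is too small: scale the mean linearly toward g0
    return (mu - g0) * c / (beta * n_total) + g0


if __name__ == "__main__":
    cluster = [1.0] * 7
    print(objective_mean(cluster, n_total=98, beta=0.05, g0=-1.0))
    print(objective_mean(cluster, n_total=98, beta=0.10, g0=-1.0))
```

The penalty branch is continuous with the unpenalized branch at $c(M) = \beta\, c(\Lambda)$, so the search is not rewarded for shrinking a cluster to a handful of high-eventness points.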
Figure 5.1 Sinusoidal Time Series (Observed)

Figure 5.2 Sinusoidal Phase Space (Observed)
Figure 5.2 presents the training stage phase space with an $l_2$ distance metric. Since the time series varies sinusoidally, it embeds to an ellipse. Figure 5.3 illustrates the augmented phase space, which further shows the elliptical nature of the phase space points.

Figure 5.3 Sinusoidal Augmented Phase Space (Observed)

The tournament genetic algorithm search parameters are presented in Table 5.1. The random search multiplier specifies the size of the Monte Carlo search used to create the initial genetic algorithm population. The population size is the number of chromosomes in the genetic algorithm population. The elite count specifies the number of chromosomes that bypass the selection, mating, and mutation steps. The gene length is the number of bits used to represent each dimension of the search space. For $Q = 2$, the chromosome is formed from three genes. The first gene is the $x_{t-1}$ dimension, the second gene is the $x_t$ dimension, and the third is the threshold $\delta$. Hence, the chromosome will have a length of 3 (genes) × 8 (gene length) = 24 (bits). The tournament size specifies the number of chromosomes that will participate in one round of the tournament selection process. The mutation rate specifies the likelihood that a particular bit in a chromosome will be mutated. The convergence criterion, with a range of [0, 1], is used to decide when to halt the genetic algorithm. The convergence criterion is the minimum ratio of the worst chromosome's fitness to the best chromosome's fitness. When the ratio is equal to or greater than the convergence criterion, the genetic algorithm is halted.

    Parameter                   Value
    Random search multiplier    1
    Population size             100
    Elite count                 1
    Gene length                 8
    Tournament size             2
    Mutation rate               0.2%
    Convergence criterion       1

Table 5.1 Genetic Algorithm Parameters for Sinusoidal Time Series

    Result                                                            Value
    Temporal pattern, $p$                                             […]
    Threshold, $\delta$                                               0.25
    Cluster cardinality, $c(M)$                                       7
    Cluster mean eventness, $\mu_M$                                   1.0
    Cluster standard deviation eventness, $\sigma_M$                  0.0
    Non-cluster cardinality, $c(\tilde{M})$                           91
    Non-cluster mean eventness, $\mu_{\tilde{M}}$                     […]
    Non-cluster standard deviation eventness, $\sigma_{\tilde{M}}$    0.69
    $z_r$                                                             -9.5
    $\alpha_r$                                                        $3.0 \times 10^{-21}$
    $z_m$                                                             15
    $\alpha_m$                                                        $5.2 \times 10^{-49}$

Table 5.2 Sinusoidal Results (Observed)

The search results are shown in Table 5.2. The first two results, temporal pattern and threshold, define the temporal pattern cluster. The cluster cardinality is the count of phase space points in the temporal pattern cluster. The cluster mean eventness is the average value of $g$ over the phase space points in the cluster. The cluster standard deviation eventness is the standard deviation of $g$ for the phase space points in the cluster. The non-cluster cardinality is the number of phase space points not in the temporal pattern cluster. The non-cluster mean eventness is the average value of $g$ over the phase space points not in the temporal pattern cluster. The non-cluster standard deviation eventness is the standard deviation of $g$ for the phase space points not in the temporal pattern cluster.

The last four results describe the statistical significance of the temporal pattern cluster using the runs test and the z test for two independent samples, which were discussed in Chapter 4. The runs test uses a 0.01 probability of Type I error ($\alpha = 0.01$). The $\alpha_r = 3.0 \times 10^{-21} < 0.01$ means that the null hypothesis can be rejected for the observed time series results.

The second statistical test is the z test for two independent samples. The two populations are the eventness of the points in the temporal pattern cluster and the eventness of the points not in the temporal pattern cluster. The z test uses a 0.01 probability of Type I error ($\alpha = 0.01$). Again, $\alpha_m = 5.2 \times 10^{-49} < 0.01$ shows that the null hypothesis can be rejected for the observed time series temporal pattern cluster.

Figure 5.4 Sinusoidal Phase Space with Temporal Pattern Cluster (Observed)

Figure 5.4 highlights the temporal pattern $p =$ […] with threshold $\delta = 0.25$ in the phase space. Comparing the temporal pattern cluster seen in Figure 5.4 to the augmented phase space in Figure 5.3 shows that the best temporal pattern cluster is identified. When the temporal pattern cluster matches a subsequence of the time series, the next time series observation is a maximal value of the sinusoid.

In the testing stage, the temporal pattern cluster is used to predict events. The testing stage time series $Y = \{ x_t = \sin(\omega t),\ t = S, \ldots, R \}$, where $\omega = \pi/8$, $S = 101$, and $R = 200$, is shown in Figure 5.5.
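The cardinalities reported for this example can be checked directly from the definition of the time series. The sketch below counts, for $Q = 2$ and $g(t) = x_{t+1}$, how many augmented phase space points exist and how many of them are followed by a maximal value of the sinusoid. The function name and the numerical tolerance are assumptions made for this illustration.

```python
import math


def count_points_and_maxima(t_first, t_last, omega=math.pi / 8, tol=1e-9):
    """Count augmented phase space points and those preceding a maximum.

    For Q = 2 a phase space point needs x_{t-1} and x_t, and the eventness
    g(t) = x_{t+1} needs one more observation, so t runs over
    t_first + 1, ..., t_last - 1.
    """
    x = {t: math.sin(omega * t) for t in range(t_first, t_last + 1)}
    indices = range(t_first + 1, t_last)
    followed_by_max = [t for t in indices if x[t + 1] > 1.0 - tol]
    return len(indices), len(followed_by_max)


if __name__ == "__main__":
    print(count_points_and_maxima(1, 100))    # observed time series
    print(count_points_and_maxima(101, 200))  # testing time series
```

The observed series yields 98 points of which 7 precede a maximum, and the testing series yields 98 points of which 6 precede a maximum. These counts agree with the cluster and non-cluster cardinalities in Table 5.2 (7 and 91) and Table 5.3 (6 and 92).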
Figure 5.5 Sinusoidal Time Series (Testing)

Since the testing time series is identical to the observed time series except for a time shift, the phase space and augmented phase space are identical to Figure 5.2 and Figure 5.3, respectively.

    Result                                                            Value
    Cluster cardinality, $c(M)$                                       6
    Cluster mean eventness, $\mu_M$                                   1.0
    Cluster standard deviation eventness, $\sigma_M$                  0.0
    Non-cluster cardinality, $c(\tilde{M})$                           92
    Non-cluster mean eventness, $\mu_{\tilde{M}}$                     […]
    Non-cluster standard deviation eventness, $\sigma_{\tilde{M}}$    0.68
    $z_r$                                                             -9.4
    $\alpha_r$                                                        $5.4 \times 10^{-21}$
    $z_m$                                                             15
    $\alpha_m$                                                        $2.0 \times 10^{-51}$

Table 5.3 Sinusoidal Results (Testing)

Figure 5.6 Sinusoidal Time Series with Predictions (Testing)

The testing stage demonstrates that the TSDM goal of predicting all maximal values in the sinusoid is met, as illustrated in Table 5.3 and Figure 5.6. The patterns discovered in the training phase and applied in the testing phase are statistically significant according to the $\alpha_r$ and $\alpha_m$ statistics. The null hypothesis can be rejected in both cases.

The data mining nature of the TSDM method is clearly demonstrated by this example. The temporal pattern cluster characterizes the sequences that lead to the observations with the highest eventness. The next example applies the TSDM method to a noise time series.

5.2 Noise Time Series

A random variable $x$ with a uniform density function generates the second example time series, where

\[ f(x) = \begin{cases} 1 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \tag{5.2} \]

is the density function [55, p. 75]. The time series $X = \{ x_t = x(t),\ t = 1, \ldots, 100 \}$ is illustrated in Figure 5.7.

Figure 5.7 Noise Time Series (Observed)

For this time series, the TSDM goal is to find a temporal pattern that is characteristic and predictive of time series observations that have high values. Because the time series is a random sequence, the expectation is that any temporal pattern cluster discovered in the training phase will not be predictive in the testing phase.

Figure 5.8 Noise Phase Space (Observed)

Figure 5.9 Noise Augmented Phase Space (Observed)
The event characterization function, objective function, and optimization formulation are the same as in the previous section. Figure 5.8 presents the Euclidean phase space. Since the time series varies randomly in a uniform manner over the range [0, 1], it embeds to an evenly scattered pattern. Figure 5.9 shows the augmented phase space, which further illustrates the scattered nature of the embedded time series.

The search parameters are described previously in Table 5.1. The training stage results are shown in Table 5.4.

    Result                                                            Value
    Temporal pattern, $p$                                             […]
    Threshold, $\delta$                                               0.21
    Cluster cardinality, $c(M)$                                       5
    Cluster mean eventness, $\mu_M$                                   0.78
    Cluster standard deviation eventness, $\sigma_M$                  0.20
    Non-cluster cardinality, $c(\tilde{M})$                           93
    Non-cluster mean eventness, $\mu_{\tilde{M}}$                     0.48
    Non-cluster standard deviation eventness, $\sigma_{\tilde{M}}$    0.28
    $z_r$                                                             […]
    $\alpha_r$                                                        $5.9 \times 10^{-1}$
    $z_m$                                                             3.1
    $\alpha_m$                                                        $8.2 \times 10^{-4}$

Table 5.4 Noise Results (Observed)

Finding a statistically significant temporal pattern in random noise is counterintuitive. However, the TSDM method found a temporal pattern cluster containing five phase space points with a mean eventness greater than the mean eventness of phase space points not contained in the temporal pattern cluster. According to $\alpha_m = 8.2 \times 10^{-4}$, the null hypothesis may be rejected, i.e., the two sets are statistically different. However, according to the runs statistical test, $\alpha_r = 5.9 \times 10^{-1}$, the two sets cannot be said to be statistically different. This means that there is some evidence that the temporal pattern is statistically significant.

Figure 5.10 highlights the temporal pattern $p =$ […] with threshold $\delta = 0.21$ in the phase space.

Figure 5.10 Noise Phase Space with Temporal Pattern Cluster (Observed)

The testing stage time series $X = \{ x_t = x(t),\ t = 101, \ldots, 200 \}$, which is a continuation of the training stage time series, is illustrated in Figure 5.11. The testing time series is transformed into the phase space as shown in Figure 5.12, and the augmented phase space is seen in Figure 5.13.
Figure 5.11 Noise Time Series (Testing)

Figure 5.12 Noise Phase Space (Testing)
Figure 5.13 Noise Augmented Phase Space (Testing)

Table 5.5 shows the statistical characterization of the testing stage results.

    Result                                                            Value
    Cluster cardinality, $c(M)$                                       8
    Cluster mean eventness, $\mu_M$                                   0.36
    Cluster standard deviation eventness, $\sigma_M$                  0.28
    Non-cluster cardinality, $c(\tilde{M})$                           90
    Non-cluster mean eventness, $\mu_{\tilde{M}}$                     0.49
    Non-cluster standard deviation eventness, $\sigma_{\tilde{M}}$    0.30
    $z_r$                                                             […]
    $\alpha_r$                                                        $6.3 \times 10^{-1}$
    $z_m$                                                             -1.3
    $\alpha_m$                                                        $9.1 \times 10^{-1}$

Table 5.5 Noise Results (Testing)
The temporal pattern cluster discovered in the training stage and applied in the testing stage is not statistically significant, as seen by the $\alpha_r$ and $\alpha_m$ statistics. The null hypothesis cannot be rejected. This is illustrated in Figure 5.14, which shows the predictions made by the testing stage.

Figure 5.14 Noise Time Series with Predictions (Testing)

In this example, the TSDM method cannot find temporal pattern clusters that are both characteristic and predictive of events in a noise time series. Figure 5.14, along with the results from Table 5.5, shows that the TSDM goal of finding a temporal pattern cluster that is predictive of time series observations whose mean value is greater than the mean value of the not-predicted observations has not been met.

Although the training stage results were significant, according to one statistical measure, in their ability to characterize events, these results did not carry over to predicting events in the testing stage. However, the next section shows that a sinusoid contaminated with noise is still predictable.

5.3 Sinusoidal with Noise Time Series

A sinusoid combined with a random variable $x$ (5.2) is illustrated by Figure 5.15, where $X = \{ x_t = \sin(\pi t / 8) + x(t),\ t = 1, \ldots, 100 \}$.

Figure 5.15 Sinusoidal with Noise Time Series (Observed)

To further characterize this time series, the signal-to-noise ratio (SNR) is measured and determined analytically. The theoretical SNR is the ratio of the signal variance to the noise variance. This would be the measured SNR for an ergodic time series as the length of the time series approached infinity. The variance of the random variable $x$ [55, p. 107] is

[…] (5.3)
The variance of the sinusoid is

[…] (5.4)

making the theoretical SNR 7.5 (8.8 dB). The measured variance of the noise is […] and of the sinusoid is 0.51, making the measured SNR 7.4 (8.7 dB) for the finite length observed time series.

For this time series, the TSDM goal is to predict the maximal values of the time series. The objective function, event characterization function, and optimization formulation remain the same as in the two previous sections.

Figure 5.16 Sinusoidal with Noise Phase Space (Observed)

Figure 5.16 presents the Euclidean phase space. Since the time series is composed of a sinusoid and a uniform density random variable, the embedding is expected to be a scattered ellipse. Figure 5.16 shows exactly this type of pattern. Figure 5.17 shows the augmented phase space, which further illustrates the scattered elliptical nature of the embedded time series.

Figure 5.17 Sinusoidal with Noise Augmented Phase Space (Observed)

The genetic algorithm search parameters are described previously in Table 5.1. The training stage results are shown in Table 5.6.

    Result                                      Value
    Temporal pattern                            […]
    Threshold                                   0.46
    Cluster cardinality                         9
    Cluster mean eventness                      1.5
    Cluster standard deviation eventness        0.36
    Non-cluster cardinality                     89
    Non-cluster mean eventness                  0.41
    Non-cluster standard deviation eventness    0.72
    $z_r$                                       -3.3
    $\alpha_r$                                  $8.8 \times 10^{-4}$
    $z_m$                                       7.7
    $\alpha_m$                                  $5.1 \times 10^{-15}$

Table 5.6 Sinusoidal with Noise Results (Observed)

According to both statistical tests, the training results are statistically significant. Figure 5.18 highlights the temporal pattern $p =$ […] with threshold $\delta = 0.46$ in the phase space. Comparing the temporal pattern cluster seen in Figure 5.18 to the augmented phase space in Figure 5.17 demonstrates that the TSDM method found a good temporal pattern cluster.

Figure 5.18 Sinusoidal with Noise Phase Space with Temporal Pattern Cluster (Observed)
Figure 5.19 illustrates the testing stage time series, which is a continuation of the observed time series. The measured variance of the noise is […] and of the sinusoid is 0.50, yielding a measured SNR of 6.0 (7.8 dB). Figure 5.20 and Figure 5.21 illustrate the phase space and the augmented phase space, respectively.

Figure 5.19 Sinusoidal with Noise Time Series (Testing)
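The SNR figures quoted in this section are ratios of variances expressed in decibels. A minimal sketch of that conversion follows; the function names are hypothetical, and the decibel formula $10 \log_{10}(\cdot)$ is assumed because it reproduces the values quoted in the text.

```python
import math


def snr(signal_variance, noise_variance):
    """Signal-to-noise ratio as a ratio of variances."""
    return signal_variance / noise_variance


def to_db(ratio):
    """Express a power (variance) ratio in decibels."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    for ratio in (7.5, 7.4, 6.0):
        print(ratio, round(to_db(ratio), 1))
```

The ratios 7.5, 7.4, and 6.0 convert to 8.8 dB, 8.7 dB, and 7.8 dB, matching the theoretical, observed, and testing values given above.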
Figure 5.20 Sinusoidal with Noise Phase Space (Testing)

Figure 5.21 Sinusoidal with Noise Augmented Phase Space (Testing)
    Result                                      Value
    Cluster cardinality                         8
    Cluster mean eventness                      1.4
    Cluster standard deviation eventness        0.47
    Non-cluster cardinality                     90
    Non-cluster mean eventness                  0.41
    Non-cluster standard deviation eventness    0.76
    $z_r$                                       […]
    $\alpha_r$                                  $6.3 \times 10^{-1}$
    $z_m$                                       5.3
    $\alpha_m$                                  $6.1 \times 10^{-8}$

Table 5.7 Sinusoidal with Noise Results (Testing)

The patterns discovered in the training phase and applied in the testing phase are statistically significant as seen by the $\alpha_m$ statistic, but not the $\alpha_r$ statistic. The cluster mean eventness is also greater than the non-cluster mean eventness. Therefore, even though one of the statistical tests is not significant, the TSDM method was able to find a significant temporal pattern cluster (although, because of the noise, not every maximal point is accurately predicted). This is illustrated in Figure 5.22, which shows the predictions and error range when the temporal pattern cluster is applied to the testing time series.
Figure 5.22 Sinusoidal with Noise Time Series with Predictions (Testing)

This example further reveals the data mining nature of the TSDM method. The temporal pattern cluster does not characterize the whole time series or every highest value; rather, it characterizes a sequence that leads to an observation with high eventness. The next section provides a further example of the TSDM method's capabilities.

5.4 Synthetic Seismic Time Series

This example analyzes in detail the previously presented synthetic seismic time series, which is generated from a randomly occurring temporal pattern, a synthetic earthquake, and a contaminating noise signal. The noise is defined by (5.2). The observed time series is illustrated in Figure 5.23. The measured variance of the contaminating noise is $3.3 \times 10^{-3}$ and of the temporal pattern with synthetic earthquake is 1.3. Without the synthetic earthquake, the variance of the temporal pattern is […]. The measured SNR is 396 (26.0 dB) for the temporal pattern and synthetic earthquake and 30.2 (14.8 dB) for the temporal pattern without the synthetic earthquake.
Figure 5.23 Synthetic Seismic Time Series (Observed)

The TSDM goal for this time series is to characterize the synthetic earthquakes one time-step ahead. To capture this goal, the event characterization function is $g(t) = x_{t+1}$, and the objective function is

\[ f(P) = \frac{\mu_M - \mu_{\tilde{M}}}{\sqrt{\dfrac{\sigma_M^2}{c(M)} + \dfrac{\sigma_{\tilde{M}}^2}{c(\tilde{M})}}}. \tag{5.5} \]

This objective function is useful for identifying temporal pattern clusters that are statistically significant and have a high average eventness. The optimization formulation is $\max f(P)$ subject to $\min b(P)$ such that minimizing $b(\delta)$ does not change the value of $f(P)$.

Composed of a temporal pattern, synthetic earthquake, and noise, the time series embeds to a set of small clusters in the phase space as illustrated in Figure 5.24. Figure 5.25 shows the augmented phase space, which clearly indicates the different eventness values associated with the small clusters of phase space points.

Figure 5.24 Synthetic Seismic Phase Space (Observed)

Figure 5.25 Synthetic Seismic Augmented Phase Space (Observed)
The search parameters are presented in Table 5.1. The training stage results are shown in Table 5.8.

    Result                                                            Value
    Temporal pattern, $p$                                             […]
    Threshold, $\delta$                                               0.37
    Cluster cardinality, $c(M)$                                       7
    Cluster mean eventness, $\mu_M$                                   4.8
    Cluster standard deviation eventness, $\sigma_M$                  […]
    Non-cluster cardinality, $c(\tilde{M})$                           91
    Non-cluster mean eventness, $\mu_{\tilde{M}}$                     0.50
    Non-cluster standard deviation eventness, $\sigma_{\tilde{M}}$    0.33
    $z_r$                                                             -9.5
    $\alpha_r$                                                        $3.0 \times 10^{-21}$
    $z_m$                                                             104
    $\alpha_m$                                                        0

Table 5.8 Synthetic Seismic Results (Observed)

The discovered temporal pattern cluster is statistically significant by both statistical tests. Figure 5.26 illustrates the temporal pattern $p =$ […] with threshold $\delta = 0.37$ in the phase space. A comparison of Figure 5.25 and Figure 5.26 demonstrates that the training stage found the best temporal pattern cluster, i.e., when a sequence of time series observations matches the temporal pattern cluster, the next observation is a synthetic earthquake.
Figure 5.26 Synthetic Seismic Phase Space with Temporal Pattern Cluster (Observed)

The synthetic seismic testing time series, a continuation of the observed time series, is illustrated in Figure 5.27. The measured variance of the noise is $3.5\times10^{-3}$ and of the temporal pattern with synthetic earthquake is 1.9. The measured variance of the temporal pattern without synthetic earthquake is [...]. The measured SNR is 536 (27 dB) for the temporal pattern with synthetic earthquake, and 29.0 (14.6 dB) for the temporal pattern without synthetic earthquake.
Figure 5.27 Synthetic Seismic Time Series (Testing)

The testing time series is transformed into the phase space as shown in Figure 5.28. The augmented phase space for the testing time series is seen in Figure 5.29.

Figure 5.28 Synthetic Seismic Phase Space (Testing)
Figure 5.29 Synthetic Seismic Augmented Phase Space (Testing)

The testing stage results presented in Table 5.9 are statistically significant as seen by the $\alpha_r$ and $\alpha_m$ statistics.

| Result | Value |
|---|---|
| Cluster cardinality, $c(M)$ | 11 |
| Cluster mean eventness, $\mu_M$ | 4.8 |
| Cluster standard deviation eventness, $\sigma_M$ | |
| Non-cluster cardinality, $c(\tilde{M})$ | 87 |
| Non-cluster mean eventness, $\mu_{\tilde{M}}$ | 0.53 |
| Non-cluster standard deviation eventness, $\sigma_{\tilde{M}}$ | 0.33 |
| $z_r$ | -9.6 |
| $\alpha_r$ | $8.5\times10^{-22}$ |
| $z_m$ | 107 |
| $\alpha_m$ | 0 |

Table 5.9 Synthetic Seismic Results (Testing)
Figure 5.30 Synthetic Seismic Phase Space with Temporal Pattern Cluster (Testing)

Figure 5.31 Synthetic Seismic Time Series with Predictions (Testing)
Figure 5.30 highlights the temporal pattern cluster in the testing phase space. Figure 5.31 clearly illustrates the prediction accuracy of the testing stage by highlighting the predictions and error range on the testing time series. This example further reveals the strength of the TSDM method: its ability to predict events.

In this chapter, the TSDM method has been applied successfully to the sinusoidal, random noise, sinusoidal with noise, and synthetic seismic example time series. Each example time series highlighted the capabilities of the TSDM method. The sinusoidal time series highlighted the event-capturing capability of the TSDM method. With the sinusoidal time series, each peak point in the time series was characterized and predicted as an event. The noise time series showed that the method correctly determined that there are no temporal patterns in random noise. The sinusoidal with noise time series showed that the method, although affected by noise, can still predict maximal values.

The synthetic seismic time series demonstrates the full power of the TSDM method. The time series is the composite of a temporal pattern, a synthetic earthquake that occurs non-periodically, and contaminating noise. With this time series, the method accurately characterized and predicted all of the events.

Chapter 6 presents several extensions to the TSDM method, including variations that search for temporal patterns in multi-dimensional time series and find multiple temporal pattern clusters. In Chapters 7 and 8, the TSDM method is applied to real-world problems.
Chapter 6 Extended Time Series Data Mining Methods

This chapter presents three extensions to the Time Series Data Mining (TSDM) method. The first variation extends the TSDM method to multi-dimensional time series by adapting the time-delay embedding process. For simplicity, it is called the TSDM-M/x (Time Series Data Mining multi-dimensional time series) method. The second TSDM extension searches for multiple temporal pattern clusters. It is called the Time Series Data Mining multiple temporal pattern (TSDM-x/M) method, where the x may be either S or M depending on the dimensionality of the time series.

Additionally, this chapter discusses alternative clustering methods and temporal pattern stationarity. In Chapter 4, the TSDM method employed a temporal pattern cluster that was formed with a hypersphere in a Manhattan phase space. By changing the distance metric associated with the phase space, alternative cluster shapes are achieved. Nonstationary temporal patterns are addressed with two techniques. The first is by applying the integrative techniques from the ARIMA method to transform nonstationary temporal pattern clusters into stationary ones. The second is through an extension to the TSDM method, called the Time Series Data Mining evolving (TSDMe) method. The chapter concludes with a discussion of diagnostics for improving TSDM results.

6.1 Multiple Time Series (TSDM-M/x)

This section discusses the TSDM-M/x method [2], which allows data from multiple sensors to be fused. The TSDM method is adapted by modifying the time-delay embedding process to incorporate observations from each dimension of a multi-dimensional time series. Intuitively, additional sensors on a system will provide additional information assuming they are not sensing the same state variable. Therefore, the time series generated by these sensors will provide a richer set of observations from which to form the reconstructed phase space. This has been shown experimentally by Povinelli and Feng [2].

The multi-dimensional time series

$$ X = \{\mathbf{x}_t,\ t = 1, \ldots, N\} \tag{6.1} $$

is a sequence of $N$ vector observations, where $\mathbf{x}_t$ is an $n$-dimensional vector. This collection of observed time series may be represented as a matrix

$$ X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,N} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,N} \\ \vdots & \vdots & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,N} \end{bmatrix}. \tag{6.2} $$

The corresponding multi-dimensional testing time series $Y$ takes the form

$$ Y = \{\mathbf{x}_t,\ t = R, \ldots, S\}, \quad N < R < S, \tag{6.3} $$

or

$$ Y = \begin{bmatrix} x_{1,R} & x_{1,R+1} & \cdots & x_{1,S} \\ x_{2,R} & x_{2,R+1} & \cdots & x_{2,S} \\ \vdots & \vdots & & \vdots \\ x_{n,R} & x_{n,R+1} & \cdots & x_{n,S} \end{bmatrix}. \tag{6.4} $$

Since the vector time series is $n$-dimensional, the dimension of the phase space is $n \cdot Q$. As with the TSDM method, a metric $d$ is defined on the phase space. The observed time series are embedded into the phase space yielding phase space points or $(nQ) \times 1$ phase space vectors

$$ \mathbf{x}_t = \left( \mathbf{x}_{t-(Q-1)\tau}^T, \ldots, \mathbf{x}_{t-2\tau}^T, \mathbf{x}_{t-\tau}^T, \mathbf{x}_t^T \right)^T, \quad t \in \Lambda, \tag{6.5} $$

where $\Lambda = \{t : t = (Q-1)\tau + 1, \ldots, N\}$. Likewise, the collection of testing time series is embedded yielding $\mathbf{y}_t$. The dimensionality of the phase space and modified embedding process are adaptations of the TSDM method required to yield the TSDM-M/x method.

Figure 6.1 Block Diagram of TSDM-M/x Method. Training stage: define TSDM goal; select Q; define g, f, and optimization formulation; normalize the observed multi-dimensional time series; embed time series into phase space; search phase space for optimal temporal pattern cluster; evaluate training stage results. Testing stage: normalize the testing multi-dimensional time series; embed time series into phase space; predict events.

As illustrated in Figure 6.1, a normalization step may be added to force each dimension of the multi-dimensional time series to have the same range. Normalization does not change the topology of the phase space, but mapping each time series onto the same range allows the use of similar search step sizes for each phase space dimension. This normalization assists the optimization routines. The normalization constant used in the training stage is retained for use in predicting events in the testing stage.

The next section presents a variation of the TSDM method that searches for multiple temporal pattern clusters.
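The embedding in (6.5) and the normalization step can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, time indices are 0-based rather than 1-based, and min-max scaling is only one way to "force each dimension to have the same range", since the text does not specify the scaling.

```python
def fit_range(series):
    """Normalization constants (offset, span) learned from a training series."""
    lo, hi = min(series), max(series)
    return lo, (hi - lo) or 1.0          # guard against a constant series

def apply_range(series, constants):
    """Scale a series with constants retained from the training stage."""
    lo, span = constants
    return [(v - lo) / span for v in series]

def embed_multi(X, Q, tau=1):
    """Embed n component series (rows of X) into an (n * Q)-dimensional phase space.

    Returns {t: point}; each point stacks the n-dimensional observations at
    t - (Q - 1) * tau, ..., t - tau, t, oldest first, as in (6.5).
    """
    N = len(X[0])
    points = {}
    for t in range((Q - 1) * tau, N):
        vec = []
        for k in range(Q - 1, -1, -1):
            vec.extend(series[t - k * tau] for series in X)
        points[t] = tuple(vec)
    return points
```

In use, `fit_range` would be called once per sensor on the observed series, and the same constants passed to `apply_range` for the testing series, so that testing values may fall slightly outside the unit range.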
Multiple Temporal Patterns (TSDM-x/M)

The TSDM method finds a single hyperspherical temporal pattern cluster. The temporal patterns to be characterized may not conform to a hyperspherical shape or may consist of multiple disjoint regions, as shown in Figure 6.2.

Figure 6.2 Multiple Temporal Pattern Cluster Phase Space (legend: high event value, low event value)

The triangles have high eventness values and the dots have low eventness values. However, there is not a single hypersphere that can contain all the high eventness phase space points and exclude all of the low eventness ones. Two temporal pattern clusters are needed. A new method for finding a collection of temporal pattern clusters also is needed.

In order to find a collection of temporal patterns, the objective function is modified to include the phase space points within each of the temporal pattern clusters $P_i \in \mathcal{C}$, $i = 1, 2, \ldots$. The example objective function given by (3.15) is extended to yield

$$ f(\mathcal{C}) = \frac{\mu_M - \mu_{\tilde{M}}}{\sqrt{\dfrac{\sigma_M^2}{c(M)} + \dfrac{\sigma_{\tilde{M}}^2}{c(\tilde{M})}}}, \tag{6.6} $$

where the index set $M$ is defined more generally, i.e.

$$ M = \{t : \mathbf{x}_t \in P_i,\ t \in \Lambda\}, \tag{6.7} $$

where $P_i \in \mathcal{C}$, $i = 1, 2, \ldots$. Similarly, $\tilde{M}$, the complement of $M$, is the set of all time indices $t$ when $\mathbf{x}_t$ is not in any $P_i$. This objective function is useful for identifying temporal pattern clusters that are statistically significant and have a high average eventness.

Another example objective function, the ratio of correct predictions to all predictions,

$$ f(\mathcal{C}) = \frac{t_p + t_n}{t_p + t_n + f_p + f_n}, \tag{6.8} $$

was first defined in (3.18) and requires no modification to work in the TSDM-x/M method.

The optimization formulation

$$ \max_{P_i} f(\mathcal{C}) \tag{6.9} $$

may be used, but it may lead to the set of temporal pattern clusters illustrated in Figure 6.3. A simpler and therefore preferable solution is illustrated in Figure 6.4.
Figure 6.3 Multiple Cluster Solution With Too Many Temporal Pattern Clusters (legend: high event value, low event value)

Figure 6.4 Multiple Cluster Solution (legend: high event value, low event value)
To achieve the preferred solution the optimization formulation is $\max f(\mathcal{C})$ subject to $\min c(\mathcal{C})$ such that minimizing $c(\mathcal{C})$ does not change the value of $f(\mathcal{C})$. A bias also may be placed on the $\delta$'s, yielding the optimization formulation $\max f(\mathcal{C})$ subject to $\min c(\mathcal{C})$ such that minimizing $c(\mathcal{C})$ does not change the value of $f(\mathcal{C})$, and $\min \delta_i\ \forall P_i$ such that minimizing $\delta_i\ \forall P_i$ does not change the value of $c(\mathcal{C})$. These staged optimizations are resolved through the genetic algorithm tournament tiebreaker system introduced in Chapter 4.

Given a TSDM goal, a target observed time series to be characterized, and a testing time series to be predicted, the steps in the TSDM-x/M method are essentially the same as the steps in the TSDM method. The modifications are that a range of phase space dimensions is chosen, and the search process is iterative. The steps of the TSDM-x/M method are given below.

I. Training Stage (Batch Process)

1. Frame the TSDM goal in terms of the event characterization function, objective function, and optimization formulation.
   a. Define the event characterization function, g.
   b. Define the objective function, f.
   c. Define the optimization formulation, including the independent variables over which the value of the objective function will be optimized and the constraints on the objective function.
   d. Define the criteria to accept a temporal pattern cluster.
2. Determine the range of Q's, i.e., the dimension of the phase space and the length of the temporal pattern.
3. Embed the observed time series into the phase space using the time-delayed embedding process.
4. Associate with each time index in the phase space an eventness represented by the event characterization function. Form the augmented phase space.
5. Search for the optimal temporal pattern cluster in the augmented phase space using the following algorithm.
      if the temporal pattern cluster meets the criteria set in 1.d then
         repeat step 5 after removing the clustered phase space points from the phase space
      elseif the range of Q is not exceeded
         increment Q and goto step 2
      else
         goto step 6
6. Evaluate training stage results. Repeat training stage as necessary.

II. Testing Stage (Real Time or Batch Process)

1. Embed the testing time series into the phase spaces.
2. Apply the temporal pattern clusters to predict events.
3. Evaluate testing stage results.

This section presented an extension of the TSDM method that allows multiple temporal pattern clusters to be discovered. The next section presents a set of techniques that allow more complicated temporal pattern clusters to be identified.
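The control flow of the training stage above (accept a cluster, remove its points, search again, and move to the next Q when no acceptable cluster remains) can be sketched as below. This is only an illustration of the loop: an exhaustive search over candidate centres and a small grid of thresholds stands in for the genetic algorithm, the score is the fraction of cluster members that are events, ties prefer larger clusters and then smaller thresholds, and points claimed at one Q are excluded at later Q's. All of these choices, and every name in the code, are assumptions rather than the method as published.

```python
def manhattan(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))

def embed(x, Q, tau=1):
    """Return {t: point} for every t with Q delayed values and a successor x[t + 1]."""
    return {t: tuple(x[t - (Q - 1 - k) * tau] for k in range(Q))
            for t in range((Q - 1) * tau, len(x) - 1)}

def best_cluster(points, g, deltas):
    """Exhaustive stand-in for the genetic algorithm search of step 5."""
    best = None
    for p in points.values():
        for delta in deltas:
            members = [t for t, pt in points.items() if manhattan(pt, p) <= delta]
            score = sum(g(t) for t in members) / len(members)
            key = (score, len(members), -delta)
            if best is None or key > best[0]:
                best = (key, p, delta, members)
    return best

def find_clusters(x, g, q_range, accept, deltas):
    """Collect temporal pattern clusters over a range of phase space dimensions."""
    clusters, claimed = [], set()
    for Q in q_range:                                   # step 2: range of Q
        points = {t: pt for t, pt in embed(x, Q).items() if t not in claimed}
        while points:                                   # step 5, repeated
            key, p, delta, members = best_cluster(points, g, deltas)
            if not accept(key[0]):                      # criteria from step 1.d
                break                                   # try the next Q
            clusters.append({"Q": Q, "p": p, "delta": delta, "members": members})
            claimed.update(members)
            for t in members:                           # remove clustered points
                del points[t]
    return clusters
```

For example, with `g = lambda t: 1 if x[t + 1] >= 4 else 0` and `accept = lambda score: score >= 1.0`, a series in which the value 1 precedes a spike only when it follows a 0 yields no acceptable cluster at Q = 1 and a single cluster with pattern (0, 1) at Q = 2.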
Other Useful TSDM Techniques

This section presents three techniques that are useful in the process of identifying optimal temporal pattern clusters. The first is a method for changing the temporal pattern cluster shape by employing different phase space metrics. The next two techniques are useful for time series with nonstationary temporal pattern clusters.

Clustering Technique

The phase space metric used in the synthetic seismic time series example from Chapter 4 was the Manhattan or $l_1$ distance. Obviously, this is not the only applicable metric. With alternative metrics, the shape of the temporal pattern cluster can be changed. The $l_p$ norms provide a simple mechanism for changing the temporal pattern cluster shape without increasing the search space dimensionality. The $l_p$ norm is defined as

$$ \|\mathbf{x}\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p} \quad \text{[56, p. 29]}. \tag{6.10} $$

Figure 6.5 illustrates five different norms: $l_{0.5}$, $l_1$, $l_2$, $l_3$, and $l_\infty$. The temporal pattern cluster is located in a two-dimensional space at (0,0) with $\delta = 1$.
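A small sketch of (6.10) shows how the choice of $p$ changes which points fall inside a unit-radius cluster centred at the origin. The function names are illustrative, and treating $p = \infty$ as the maximum absolute component is the standard limiting case rather than something stated in the text.

```python
import math

def lp_norm(x, p):
    """l_p norm of a vector; p = math.inf gives the maximum absolute component."""
    if math.isinf(p):
        return max(abs(v) for v in x)
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def in_cluster(point, centre, delta, p):
    """True when the point lies within distance delta of the centre under l_p."""
    diff = [a - b for a, b in zip(point, centre)]
    return lp_norm(diff, p) <= delta

# The point (0.6, 0.6) lies inside the unit cluster for p = 2 and p = infinity,
# but outside it for p = 1 and p = 0.5.
for p in (0.5, 1, 2, math.inf):
    print(p, in_cluster((0.6, 0.6), (0.0, 0.0), 1.0, p))
```

Because only the distance function changes, the search still runs over the same variables (the pattern and its threshold), which is why the search space dimensionality does not grow.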
Figure 6.5 Cluster Shapes of Unit Radius for Various $l_p$ Norms

When the $l_2$, Euclidean, norm is used the cluster is a circle. Using the $l_1$ and $l_\infty$ norms, the temporal pattern cluster is a square. These alternative cluster shapes are incorporated into the method by simply defining the phase space using the desired $l_p$ norm. The next section presents a technique for identifying nonstationary temporal pattern clusters.

Filtering Technique

In Chapter 2, ARIMA time series analysis was discussed. ARIMA modeling requires that the time series be stationary. TSDM's requirement is less stringent. Only the temporal pattern cluster must be stationary, i.e., the phase space points characteristic of events must remain within the temporal pattern cluster. In Chapter 2, a set of filters was presented that could transform linear and exponential trend time series, which are nonstationary, into stationary ones. These same filters also are useful for transforming time series with nonstationary temporal pattern clusters into time series with stationary temporal pattern clusters.

The following example shows how a nonstationary time series can be made stationary and the appearance of a nonstationary time series in the phase space and augmented phase space. The observed time series $X = \{x_t = 0.02t,\ t = 1, \ldots, 100\}$ is illustrated in Figure 6.6.

Figure 6.6 Linearly Increasing Time Series (Observed)

The TSDM goal is to characterize and predict all observations. Thus, the event characterization function is $g(t) = x_{t+1}$. The corresponding objective function (described in Chapter 3) is

$$ f(P) = \begin{cases} \mu_M & \text{if } c(M)/c(\Lambda) \geq \beta \\[4pt] (\mu_M - g_0)\,\dfrac{c(M)}{\beta\, c(\Lambda)} + g_0 & \text{otherwise}, \end{cases} \tag{6.11} $$

where $\beta$ = [...]. The optimization formulation is $\max f(P)$ subject to $\min \delta$.
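The piecewise form of (6.11) can be written out as a short function. The sketch below is an illustration only: $g_0$ is treated as a caller-supplied baseline eventness because its definition is in Chapter 3 and not repeated here, the value of $\beta$ used in the example is hypothetical, and returning $g_0$ for an empty cluster is an assumption.

```python
def objective_6_11(cluster_eventness, total_points, beta, g0):
    """Mean cluster eventness, scaled toward g0 when the cluster is too small.

    cluster_eventness: eventness values g(t) for t in M
    total_points:      c(Lambda), the number of phase space points
    beta:              desired minimum fraction c(M) / c(Lambda)
    g0:                baseline eventness (assumed supplied by the caller)
    """
    c_m = len(cluster_eventness)
    if c_m == 0:
        return g0                                  # assumed limit for an empty cluster
    mu = sum(cluster_eventness) / c_m
    fraction = c_m / total_points
    if fraction >= beta:
        return mu
    return (mu - g0) * fraction / beta + g0
```

With a hypothetical `beta = 0.1`, a two-point cluster with mean eventness 3 scores 3 when it covers 20% of ten points, but only about 1.4 when it covers 2% of one hundred points and `g0 = 1`, so small clusters are penalized in proportion to how far they fall short of the target fraction.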
Figure 6.7 presents the Euclidean phase space, and Figure 6.8 illustrates the augmented phase space. Since the time series has a linearly increasing value, it embeds as a line in both spaces. The linear feature of the phase space points indicates nonstationarity.

Figure 6.7 Linearly Increasing Phase Space (Observed)
Figure 6.8 Linearly Increasing Augmented Phase Space (Observed)

The genetic algorithm search parameters are presented in Table 6.1.

| Parameter | Value |
|---|---|
| Random search multiplier | 1 |
| Population size | 20 |
| Elite count | 1 |
| Gene length | 8 |
| Tournament size | 2 |
| Mutation rate | 0.2% |
| Convergence criteria | 1 |

Table 6.1 Genetic Algorithm Parameters for Linearly Increasing Time Series

The training stage results are shown in Figure 6.9, which demonstrates that the temporal pattern cluster does not capture the linearly increasing nature of the time series. This will become more evident in the testing stage of the TSDM method.
Figure 6.9 Linearly Increasing Phase Space with Temporal Pattern Cluster (Observed)

Figure 6.10 Linearly Increasing Time Series (Testing)

The testing time series is illustrated in Figure 6.10.
Figure 6.11 Linearly Increasing Phase Space with Temporal Pattern Cluster (Testing)

Figure 6.12 Linearly Increasing Time Series with Predictions (Testing)
Figure 6.11 highlights the temporal pattern cluster in the phase space. Obviously, as illustrated by Figure 6.11, the desired TSDM goal is not met, which is reinforced by Figure 6.12. The cause of the prediction failure is the lack of temporal pattern stationarity, not necessarily time series nonstationarity.

The resolution to the problem of temporal pattern nonstationarity is achieved by applying the filtering techniques discussed in Chapter 2. Applying the first difference filter to the observed time series $X$ yields $Z = \{z_t = 0.02,\ t = 2, \ldots, 100\}$, which is a constant-value time series. The problem is now trivial.

Although some time series may be made stationary through filtering techniques, these methods will not convert all nonstationary time series into stationary ones. The next section presents a method for analyzing time series with quasi-stationary temporal pattern clusters.

Non-filtering Techniques

Although stationarity usually describes the statistical characteristics of a stochastic time series [55, pp ], this dissertation introduces a more general definition. When applied to a deterministic time series, stationarity indicates that the periodicity, if the time series is periodic, and range of the time series are constant. When applied to chaotic time series, stationarity indicates that the attractors remain constant through time. Chaotic time series whose underlying attractors evolve through time are classified as nonstationary chaotic time series.

Beyond filtering to extract nonstationary temporal patterns, there are two TSDM methods presented in this section that address quasi-stationary temporal patterns, i.e., temporal patterns that are characteristic and predictive of events for a limited time window. They are called the Time Series Data Mining evolving temporal pattern (TSDMe) methods. These methods are useful for analyzing time series generated by adaptive systems such as financial markets with feedback characteristics that counteract systemic predictions.

The first method (TSDMe$_1$) uses a fixed training window and a fixed prediction window. The second method (TSDMe$_2$) uses a fixed training window and a single period prediction window. The TSDMe methods differ from the other TSDM methods in how the observed and testing time series are formed.

The TSDMe$_1$ method divides the time series into equally sized sets $X_j = \{x_t,\ t = (j-1)N + 1, \ldots, jN\}$, where $N$ is the number of observations in a subset of $X$, and $j$ is the index of the subset. The time series $X_j$ is used in the training stage. The time series $X_{j+1}$ is used in the testing stage. The length of the time window $N$ is determined experimentally such that the temporal pattern clusters remain quasi-stationary between any two adjacent time windows.

The TSDMe$_2$ method creates the overlapping observed time series as follows:

$$ X_j = \{x_t,\ t = j, \ldots, j + N\}. \tag{6.12} $$

The testing time series is formed from a single observation as follows:

$$ Y_j = \{x_t,\ t = j + N + 1\}. \tag{6.13} $$

With these changes in the formation of the observed and testing time series, any of the TSDM methods may be applied. The last section in this chapter presents a set of cases with which to diagnose and adjust the TSDM method.
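The two windowing schemes can be sketched as generators that yield (training, testing) pairs. The function names are hypothetical and the indices are 0-based, so the 1-based ranges in the definitions of $X_j$, (6.12), and (6.13) are shifted by one. In practice the testing stage would also need enough preceding observations to form the delayed coordinates of each phase space point; that detail is left out of this sketch.

```python
def tsdme1_windows(x, N):
    """Adjacent, non-overlapping windows: train on X_j, test on X_{j+1}."""
    for j in range(len(x) // N - 1):
        yield x[j * N:(j + 1) * N], x[(j + 1) * N:(j + 2) * N]

def tsdme2_windows(x, N):
    """Sliding window per (6.12)-(6.13): train on N + 1 observations starting
    at position j, then test on the single observation that follows."""
    for j in range(len(x) - N - 1):
        yield x[j:j + N + 1], x[j + N + 1]

series = list(range(1, 11))
print(list(tsdme1_windows(series, 3)))
print(list(tsdme2_windows(series, 3))[0])
```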
Evaluating Results and Adjusting Parameters

In the training stage of the TSDM methods, there is an evaluate training stage results step, which is an ad hoc evaluation of the intermediate and final results of the TSDM method. The evaluation may include visualization of the phase space and augmented phase space and review of the statistical results. Based on the ad hoc evaluation, the parameters of the method may be adjusted, alternative TSDM methods selected, and/or appropriate TSDM techniques applied. This section discusses ad hoc evaluation techniques, what issues they might discover, and possible solutions.

By parsimony, the simplest characterization of events possible is desired, i.e., as small a dimensional phase space as possible and as few temporal pattern clusters as required. The first evaluation technique is to visualize, if possible, the phase space and augmented phase space, which allows human insight to identify clustering problems. The cases that may be identified and their potential solutions are listed below.

Case 1: One cluster is identifiable, but not discovered by the TSDM method.
   Potential Solution A: Select alternative phase space metric.
   Potential Solution B: Increase genetic algorithm population size.
   Potential Solution C: Increase genetic algorithm chromosome length.
   Potential Solution D: Increase genetic algorithm mutation rate.
   Potential Solution E: Use alternative objective function.
Case 2: Multiple clusters are visualized, but not discovered by the TSDM method.
   Potential Solution A: Use TSDM-x/M method.
Case 3: No clusters are visualized.
   Potential Solution A: Try higher dimensional phase space.
   Potential Solution B: Use TSDM-x/M method.
Case 4: Phase space points cluster into a line.
   Potential Solution A: Apply filtering techniques.

The second evaluation technique is to review the statistical characteristics of the resulting temporal pattern cluster(s). These statistics include $c(M)$, $c(\tilde{M})$, $\mu_M$, $\sigma_M$, $\mu_{\tilde{M}}$, $\sigma_{\tilde{M}}$, $\mu_X$, $\alpha_r$, and $\alpha_m$. The cases that may be identified and their potential solutions are listed below.

Case 5: The cluster cardinality $c(M)$ is too large or small while using the objective function described in (4.3).
   Potential Solution A: Use the objective function described in (3.16).
Case 6: The cluster cardinality $c(M)$ is too large or small while using the objective function described in (3.16).
   Potential Solution A: Adjust the $\beta$ as appropriate.
Case 7: Either or both the $\alpha_r$ and $\alpha_m$ do not allow the null hypothesis to be rejected.
   Potential Solution A: The null hypothesis holds. No temporal patterns exist in the time series.
   Potential Solution B: Use the TSDM-x/M method to find multiple temporal patterns.
   Potential Solution C: Use a larger training time series.
   Potential Solution D: Use the TSDMe$_1$ or TSDMe$_2$ methods to see if the temporal patterns may be quasi-stationary.
   Potential Solution E: Adjust the cluster shape by using an alternative p-norm.
This section presented seven cases where the resulting temporal pattern clusters did not achieve the desired TSDM goal and potential solutions for each of these cases. This is not an exhaustive list of treatments to improve the TSDM results, but a representative sample of the most common adjustments needed.

This chapter has presented extensions to the TSDM method for finding multiple temporal patterns and analyzing multi-dimensional time series. It has also presented a set of techniques for dealing with nonstationary temporal pattern clusters. It concluded with a set of diagnostic cases and their potential resolutions. The next two chapters will apply these extended TSDM methods to real-world applications.
Chapter 7 Engineering Applications

This chapter introduces a set of real-world time series gathered from sensors on a welding station. The problem is to predict when a droplet of metal will release from a welder. The welding process joins two pieces of metal into one by making a joint between them. A current arc is created between the welder and the metal to be joined. Wire is pushed out of the welder. The tip of the wire melts, forming a metal droplet that elongates (sticks out) until it releases. The goal is to predict the moment when a droplet will release, which will allow the quality of the joint to be improved. Because of the irregular, chaotic, and event nature of the droplet release, prediction is impossible using traditional time series methods.

Figure: Welder (labels: wire, current arc, welder, droplet, metal to be joined)
Samples of the four welding time series are presented in Figure 7.2 and Figure 7.3. Obviously, they are noisy and nonstationary. Sensors on the welding station generate three of the time series. The first is the stickout of the droplet measured in pixels by an electronic camera. It is sampled at 1 kHz and comprised of approximately 5,000 observations. The second time series is the voltage measured in decivolts from the welder to the metal to be joined. The third is the current measured in amperes. The voltage and current time series are sampled at 5 kHz, synchronized to each other, and each comprised of approximately 35,000 observations.

The fourth time series indicates the release of the metal droplets. This time series was created after the sensor data was collected using a process at INEEL (Idaho National Engineering & Environmental Laboratory), which also provided the data. It is synchronized with the stickout time series and comprised of approximately 5,000 observations. The release time series indicates the events with a one indicating an event and a zero indicating a non-event.
Figure 7.2 Stickout and Release Time Series

Figure 7.3 Voltage and Current Time Series

This chapter is organized into six sections. This first section discusses the four time series that comprise the data set and provides an overview of the chapter. The second section characterizes and predicts the release events using the stickout time series. The third section characterizes and predicts events in an adjusted release time series. The fourth section presents and resolves a time series synchronization problem. As noted above, two of the sensors sampled at approximately 5 kHz, while the other sensor sampled at approximately 1 kHz. The problem is complicated further because the ratio of the sampling rates is not exactly 5:1. In the fifth section, the TSDM-M/M method is applied to data from all three sensors.

7.1 Release Prediction Using Single Stickout Time Series

This section presents the results of applying the TSDM-S/M method to characterizing and predicting droplet releases using the stickout time series. This application of the TSDM-S/M method does not require the synchronization of the stickout and release time series with the current and voltage time series to be resolved.

Figure 7.4 Stickout Time Series (Observed)
The observed stickout time series X consists of the 2,492 equally sampled observations, at t = 175 through 2,666. Figure 7.4 illustrates all observations, while Figure 7.2 provides a detailed view of a sample of the time series.

Besides the obvious nonperiodic oscillations, the stickout time series exhibits a large-scale trend. As discussed in Chapter 6, removing trends helps the method find the necessary temporal patterns. A first difference filter could be applied, but that would introduce a new synchronization problem between the release and stickout time series. Instead, a simple recalibration rule is used to remove the trend. When there is a 10-pixel drop between two consecutive observations, the second observation is recalibrated to zero. Figure 7.5 and Figure 7.6 illustrate that the trend in the stickout time series has been removed.

Figure 7.5 Recalibrated Stickout Time Series (Observed)
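One plausible reading of the recalibration rule is sketched below. The text states only that the second observation of a 10-pixel drop is recalibrated to zero; carrying that new zero level forward to the following observations until the next drop, and leaving the observations before the first drop unchanged, are both assumptions of this example, as is the function name.

```python
def recalibrate(stickout, drop=10):
    """Reset the zero level whenever consecutive raw readings fall by `drop` or more."""
    out = []
    offset = 0                       # raw value currently treated as zero
    for i, raw in enumerate(stickout):
        if i > 0 and stickout[i - 1] - raw >= drop:
            offset = raw             # this observation becomes the new zero
        out.append(raw - offset)
    return out

print(recalibrate([100, 105, 110, 98, 103, 108, 95, 99]))
# [100, 105, 110, 0, 5, 10, 0, 4]
```

Unlike a first difference filter, this keeps one output value per input observation at the same time index, so the alignment with the release series is not disturbed.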
Figure 7.6 Recalibrated Stickout and Release Time Series (Observed)

Instead of being contained within the stickout time series, the events are captured in the release time series Y, as illustrated in Figure 7.6. The release time series is defined as a binary sequence, where the ones indicate a release (event) and the zeros a non-release (non-event). The release usually occurs after a stickout value reaches a local peak and drops 10 pixels or more. However, a study of Figure 7.6 shows there are several times when this does not occur. In this section, the release time series will be used unaltered. In the next section, the release series will be recalculated to match the stickout length minimums more closely.

Now that the observed time series have been presented, the TSDM goal is restated in terms of the objective and event characterization functions. The TSDM-S/M method requires two objective functions. The first objective function describes the objective for the final result. Introduced in Chapter 3,

$$ f_1(\mathcal{C}) = \frac{t_p + t_n}{t_p + t_n + f_p + f_n} \tag{7.1} $$

has an optimal value when every event is correctly predicted. The values $t_p$, $t_n$, $f_p$, and $f_n$ are described in Table 7.1.

| | Actually an event | Actually a non-event |
|---|---|---|
| Categorized as an event | True positive, $t_p$ | False positive, $f_p$ |
| Categorized as a non-event | False negative, $f_n$ | True negative, $t_n$ |

Table 7.1 Event Categorization

The second objective function,

$$ f_2(P) = \frac{t_p}{t_p + f_p}, \tag{7.2} $$

called the positive accuracy, defines how well each $P_i$, $i = 1, 2, \ldots$ is at avoiding false positives. It is used as the objective for the intermediate steps in the TSDM-S/M training stage.

The optimization formulation for the whole training stage is $\max f(\mathcal{C})$ subject to $\min c(\mathcal{C})$ and $\min b(\delta_i)\ \forall P_i$. The optimization formulation for the intermediate steps is $\max f(P)$ subject to $\min b(\delta)$.

Figure 7.7 presents an illustrative phase space, where the Manhattan or $l_1$ distance metric is employed. The phase space points are similar to the linearly increasing phase space points, but the increase repeats instead of continuing to grow.
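The two objective functions reduce to simple ratios of the counts in Table 7.1. The sketch below uses illustrative function names; the counts in the usage lines are the training stage values that the chapter reports in Table 7.3, and they reproduce the accuracy and positive accuracy given there.

```python
def accuracy(tp, tn, fp, fn):
    """Equation (7.1): fraction of all observations categorized correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def positive_accuracy(tp, fp):
    """Equation (7.2): fraction of predicted events that are actual events."""
    return tp / (tp + fp)

# Counts from Table 7.3: tp = 101, fp = 41, tn = 2296, fn = 53
print(round(100 * accuracy(101, 2296, 41, 53), 2))    # 96.23
print(round(100 * positive_accuracy(101, 41), 2))     # 71.13
```

Because events are rare in this data set, accuracy is dominated by the true negatives, which is why the positive accuracy is tracked separately.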
Figure 7.7 Recalibrated Stickout Phase Space (Observed)

Figure 7.8 clearly shows the complexity of the augmented phase space. The events are not separable from the non-events using a two-dimensional phase space. Hence, the TSDM-S/M method, which finds multiple temporal clusters of varying dimensionality, is applied.
Figure 7.8 Stickout and Release Augmented Phase Space (Observed)

The augmented phase space is searched using a tournament genetic algorithm. The two sets of search parameters are presented in Table 7.2.

| Parameter | Set 1 | Set 2 |
|---|---|---|
| Random search multiplier | | |
| Population size | | |
| Elite count | 1 | 1 |
| Gene length | 8 | 8 |
| Tournament size | 2 | 2 |
| Mutation rate | 0.05% | 0% |
| Convergence criteria | | |

Table 7.2 Genetic Algorithm Parameters for Recalibrated Stickout and Release Time Series

The results of the search are shown in Table 7.3.
| Result | Value |
|---|---|
| Temporal pattern cluster count, $c(\mathcal{C})$ | 14 |
| Temporal pattern cluster dimensions | 1~14 |
| Clusters cardinality, $c(M)$ | 142 |
| Clusters mean eventness, $\mu_M$ | 0.71 |
| Clusters standard deviation eventness, $\sigma_M$ | 0.45 |
| Non-clusters cardinality, $c(\tilde{M})$ | 2,349 |
| Non-clusters mean eventness, $\mu_{\tilde{M}}$ | |
| Non-clusters standard deviation eventness, $\sigma_{\tilde{M}}$ | 0.15 |
| $z_r$ | -49 |
| $\alpha_r$ | 0 |
| $z_m$ | 18 |
| $\alpha_m$ | $2.4\times10^{-72}$ |
| True positives, $t_p$ | 101 |
| False positives, $f_p$ | 41 |
| True negatives, $t_n$ | 2,296 |
| False negatives, $f_n$ | 53 |
| Accuracy, $f_1(\mathcal{C})$ | 96.23% |
| Positive accuracy, $f_2(\mathcal{C})$ | 71.13% |

Table 7.3 Recalibrated Stickout and Release Results (Observed)

Fourteen temporal pattern clusters form the temporal pattern cluster collection employed to identify events. This collection contains temporal pattern clusters that vary in dimension from 1 to 14. The runs and z tests with $\alpha_r = 0$ and $\alpha_m = 2.4\times10^{-72}$ show that the two sets, clustered and non-clustered, are statistically different. However, for this problem the goal is to accurately predict droplet releases. The more meaningful statistics are the true/false positives/negatives. The statistics for accuracy indicate that 96.23% of the release observations are correctly characterized. The positive accuracy indicates that 71.13% of the release observations categorized as events are events.

The testing time series is shown in Figure 7.9 and Figure 7.10. The recalibrated stickout and release time series are shown in Figure 7.11 and Figure 7.12. The testing time series is transformed into the phase space as illustrated in Figure 7.13. The augmented phase space for the testing time series is seen in Figure 7.14.

Figure 7.9 Stickout Time Series (Testing)
Figure 7.10 Stickout Sample Time Series (Testing)

Figure 7.11 Recalibrated Stickout Time Series (Testing)
Figure 7.12 Recalibrated Stickout and Release Time Series (Testing)

Figure 7.13 Recalibrated Stickout Phase Space (Testing)
Figure 7.14 Recalibrated Stickout and Release Augmented Phase Space (Testing)

The results of applying the temporal pattern cluster collection to the testing time series are seen in Table 7.4.

| Result | Value |
|---|---|
| Clusters cardinality, $c(M)$ | 136 |
| Clusters mean eventness, $\mu_M$ | 0.74 |
| Clusters standard deviation eventness, $\sigma_M$ | 0.44 |
| Non-clusters cardinality, $c(\tilde{M})$ | 2,356 |
| Non-clusters mean eventness, $\mu_{\tilde{M}}$ | |
| Non-clusters standard deviation eventness, $\sigma_{\tilde{M}}$ | 0.15 |
| $z_r$ | -49 |
| $\alpha_r$ | 0 |
| $z_m$ | 19 |
| $\alpha_m$ | $4.0\times10^{-78}$ |
| True positives, $t_p$ | 100 |
| False positives, $f_p$ | 36 |
| True negatives, $t_n$ | 2,303 |
| False negatives, $f_n$ | 53 |
| Accuracy, $f_1(\mathcal{C})$ | 96.43% |
| Positive accuracy, $f_2(\mathcal{C})$ | 73.53% |

Table 7.4 Recalibrated Stickout and Release Results (Testing)

As with the training stage results, the testing stage results are statistically significant as seen by both the runs and z tests. The $\alpha_r$ is zero, and the $\alpha_m$ is $4.0\times10^{-78}$. More importantly, the prediction accuracy is 96.43%, and the positive accuracy is 73.53%. These results are better than those found in the characterization phase. This is significant, especially considering that the data set provider deems the stickout measurements not too reliable.

7.2 Adjusted Release Characterization and Prediction Using Stickout

This section presents results using an adjusted release time series rather than the one computed using the INEEL process. As seen in Figure 7.6, the release time series does not always correspond with the stickout data. It also does not correspond with the voltage time series presented later in the chapter. The adjusted release time series is created using a simple rule: a release has occurred after a ten-pixel drop in the stickout time series. This rule is identifying events a posteriori, while the TSDM method is predicting events a priori. A sample of the adjusted release time series is shown in Figure 7.15.
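The adjusted release rule can be sketched as a single pass over the stickout series. Marking the release on the observation at which the drop is first visible is an assumption of this example (the text does not say which of the two observations carries the event), and the function name is illustrative.

```python
def adjusted_release(stickout, drop=10):
    """Binary release series: 1 where the stickout fell by `drop` pixels or more
    relative to the previous observation, 0 elsewhere."""
    if not stickout:
        return []
    flags = [0]                               # the first observation has no predecessor
    for prev, cur in zip(stickout, stickout[1:]):
        flags.append(1 if prev - cur >= drop else 0)
    return flags

print(adjusted_release([100, 105, 110, 98, 103, 108, 95, 99]))
# [0, 0, 0, 1, 0, 0, 1, 0]
```

Because each flag depends on the drop that has already happened, this labels events after the fact; the prediction task is to anticipate those flags from the observations that precede them.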
Figure 7.15 Recalibrated Stickout and Adjusted Release Time Series (Observed) [series: stickout, release; vertical axis: x_t (pixels)]

Figure 7.16 Recalibrated Stickout and Adjusted Release Augmented Phase Space (Observed)

The TSDM goal, primary objective function, event characterization, and optimization formulation remain the same. An alternative secondary objective function,
$$f_3(P) = \begin{cases} -(t_n + f_n)^2 & \text{if } t_p = 0 \wedge f_p = 0 \\ t_p - (t_p + f_p + t_n + f_n)\, f_p & \text{otherwise} \end{cases} \qquad (7.3)$$

is introduced, which maximizes the number of true positives while penalizing any false positives.

The augmented phase space, illustrated by Figure 7.16, while still complex, is more orderly than the unadjusted release augmented phase space shown in Figure 7.8.

Five different sets of genetic algorithm parameters are used to find the temporal pattern clusters. For all sets, the elite count was one, the gene length was eight, and the tournament size was two. The other parameters are listed in Table 7.5.

Set    Random search multiplier    Population size    Mutation rate    Convergence criteria    Secondary objective function
Set    %    1    f(P)
Set    %    1    f(P)
Set    %    1    f(P)
Set    %    0.5    f(P)
Set    %    0.65    f(P)
Set    %    0.5    f(P)
Table 7.5 Genetic Algorithm Parameters for Recalibrated Stickout and Adjusted Release Time Series

The training stage results are shown in Table 7.6.

Result    Value
Temporal pattern cluster count, c( )    67
Temporal pattern cluster dimensions    1~14
Clusters cardinality, c(M)    138
Clusters mean eventness, μ_M    0.81
Clusters standard deviation eventness, σ_M    0.39
Non-clusters cardinality, c(M̃)    2,353
Non-clusters mean eventness, μ_M̃
Non-clusters standard deviation eventness, σ_M̃    0.13
z_r    -49
α_r    0
z_m    24
α_m    2.9x10^-124
True positives, t_p    112
False positives, f_p    26
True negatives, t_n    2,313
False negatives, f_n    40
Accuracy, f_1( )    97.35%
Positive accuracy, f_2( )    81.16%
Table 7.6 Recalibrated Stickout and Adjusted Release Results (Observed)

Sixty-seven temporal pattern clusters form the temporal pattern cluster collection used to identify the events. The statistical tests with α_r = 0 and α_m = 2.9x10^-124 show that the two sets, clustered and non-clustered, are statistically different. The accuracy statistic indicates that 97.35% (vs. % using the unadjusted release time series) of the release observations are correctly characterized. The positive accuracy indicates that 81.16% (vs.
% using the unadjusted release time series) of the release observations categorized as events are events.

The testing stage time series is shown in Figure 7.17. The augmented phase space for the testing time series is illustrated in Figure 7.18.

Figure 7.17 Recalibrated Stickout and Adjusted Release Time Series (Testing) [series: stickout, release]

Figure 7.18 Recalibrated Stickout and Adjusted Release Augmented Phase Space (Testing)
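The accuracy and positive accuracy reported in Table 7.6 follow directly from the four counts in that table. The short Python sketch below recomputes them; the function names are illustrative and are not taken from the dissertation.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all observations that were characterized correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def positive_accuracy(tp, fp):
    """Fraction of observations categorized as events that really are events."""
    return tp / (tp + fp)


# Counts from Table 7.6 (training stage, adjusted release time series).
tp, fp, tn, fn = 112, 26, 2313, 40
train_accuracy = accuracy(tp, fp, tn, fn)
train_positive_accuracy = positive_accuracy(tp, fp)
print(f"accuracy = {train_accuracy:.2%}")                    # 97.35%
print(f"positive accuracy = {train_positive_accuracy:.2%}")  # 81.16%
```

Note that `positive_accuracy` is undefined when nothing is categorized as an event (`tp + fp == 0`); this sketch does not guard against that case.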
The testing stage results are presented in Table 7.7.

Result    Value
Clusters cardinality, c(M)    161
Clusters mean eventness, μ_M    0.70
Clusters standard deviation eventness, σ_M    0.46
Non-clusters cardinality, c(M̃)    2,331
Non-clusters mean eventness, μ_M̃
Non-clusters standard deviation eventness, σ_M̃    0.13
z_r    -49
α_r    0
z_m    19
α_m    1.63x10^-79
True positives, t_p    113
False positives, f_p    48
True negatives, t_n    2,291
False negatives, f_n    40
Accuracy, f_1( )    96.47%
Positive accuracy, f_2( )    70.19%
Table 7.7 Recalibrated Stickout and Adjusted Release Results (Testing)

As with the training stage results, the testing stage results are statistically significant, as seen by both the runs and z tests. The α_r = 0, and the α_m = 1.63x10^-79. The prediction accuracy is 96.47% (vs. 96.43% with the unadjusted release time series), and the positive accuracy is 70.19% (vs. 73.53% with the unadjusted release time series).
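The z_m value in Table 7.7 can be approximately reproduced with a standard two-sample z statistic for the difference of means. The sketch below is an illustration, not the dissertation's code. It rests on two assumptions: that the independent means test takes this standard form, and that the non-cluster mean eventness (missing from the table as transcribed) equals f_n / c(M̃), which holds if eventness is 1 for a release and 0 otherwise.

```python
import math


def means_z(mean_in, std_in, n_in, mean_out, std_out, n_out):
    """Two-sample z statistic for the difference between cluster and non-cluster means."""
    return (mean_in - mean_out) / math.sqrt(std_in ** 2 / n_in + std_out ** 2 / n_out)


def upper_tail(z):
    """One-sided tail probability of a standard normal variable exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Values from Table 7.7; the non-cluster mean is an assumption (f_n / c(M~) = 40 / 2331).
z_m = means_z(0.70, 0.46, 161, 40 / 2331, 0.13, 2331)
print(round(z_m))        # close to the reported z_m of 19
print(upper_tail(z_m))   # a vanishingly small probability
```

Since the table entries are rounded to two decimals, the recomputed tail probability can only be expected to agree with the reported α_m in order of magnitude.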
According to the total prediction accuracy, the recalibrated stickout and adjusted release results are better, whereas according to the positive prediction accuracy, the unadjusted release time series results are better.

7.3 Stickout, Release, Current and Voltage Synchronization

The last two sections focused on using the stickout time series temporal patterns for characterization and prediction of droplet releases. The TSDM-S/M method has yielded excellent results. The next step is to use the current and voltage time series to help characterize and predict droplet releases. Unfortunately, the stickout and release time series are not synchronized with the current and voltage time series. This leaves two problems to be solved. The first is to synchronize the four time series. The second is to compensate for the different sampling rates.

The synchronization is done by matching the first and last voltage peaks with the first and last droplet releases. For the voltage time series, these observations are 973 and 25,764. For the droplet release time series, these observations are 187 and 5151.

Recall that the stickout and release time series sampling rate was reported to be 1kHz, and the current and voltage sampling rate was reported to be 5kHz. If these sampling rates were perfectly calibrated, the 1kHz time series could be up-sampled to the 5kHz rate by interpolating four additional points for each observation, or the 5kHz time series could be down-sampled by averaging five observations into one observation. However, when this is done, the time series lose synchronization. The initial synchronization was done using the first voltage spike and the first droplet release. Using the reported five-to-one sampling ratio and the last droplet release
observation of 5151, the last voltage spike should be observation 25,793, that is, 973 + 5 × (5151 − 187). It is actually observation 25,764, which is determined by visualizing the data. The true sampling rates are not exactly in a 5:1 ratio.

The problem is solved using Matlab's interp1 [57, pp ] function with the cubic spline option. This function allows conversion between arbitrary sampling rates by providing the initial time series with its sampling times and by specifying a vector with the desired sampling times. The function performs interpolation using a cubic spline. It may be used for either up-sampling or down-sampling. Both the up-sampled 5kHz and the down-sampled 1kHz time series were generated by appropriately mapping the first and last synchronization observations onto each other.

7.4 Adjusted Release Characterization and Prediction Using Stickout, Voltage, and Current

With the synchronization problem solved, the TSDM-M/M method is applied to the voltage, current, and stickout time series to characterize and predict droplet releases. The adjusted release time series is used as the indicator of events. The time series are normalized to the range [0,1], using the transformation

$$Z = \frac{X - \min(X)}{\max\bigl(X - \min(X)\bigr)}. \qquad (7.4)$$

A sample of the observed time series is shown in Figure 7.19. The TSDM goal, primary objective function, event characterization, and optimization formulation remain the same. An alternative secondary objective function,

$$f_4(P) = \frac{\mu_M - \mu_{\tilde{M}}}{\sqrt{\dfrac{\sigma_M^2}{c(M)} + \dfrac{\sigma_{\tilde{M}}^2}{c(\tilde{M})}}} \qquad (7.5)$$
also is used.

Figure 7.19 Recalibrated Stickout, Current, Voltage, and Adjusted Release Time Series (Observed) [series: stickout, release, current, voltage; vertical axis: scaled pixels, amps, decivolts]

Because the smallest phase space that can be formed using all the time series is three-dimensional, and the corresponding augmented phase space is four-dimensional, graphical illustrations are not possible. Nonetheless, these spaces are formed and searched using a tournament genetic algorithm. The set of genetic algorithm search parameters is presented in Table 7.5.

Three different sets of genetic algorithm parameters were used to find all the temporal pattern clusters. For all parameter sets, the elite count was one, the gene length was eight, the tournament size was two, and the mutation rate was 0.2%. The other parameters by set are listed in Table 7.8.
Set    Random search multiplier    Population size    Convergence criteria    Secondary objective function
Set    f(P)
Set    f(P)
Set    f(P)
Table 7.8 Genetic Algorithm Parameters for Recalibrated Stickout, Current, Voltage, and Adjusted Release Time Series

The training stage results are shown in Table 7.9.

Result    Value
Temporal pattern cluster count, c( )    62
Temporal pattern cluster dimensions    3~15
Clusters cardinality, c(M)    117
Clusters mean eventness, μ_M    0.89
Clusters standard deviation eventness, σ_M    0.32
Non-clusters cardinality, c(M̃)    2,374
Non-clusters mean eventness, μ_M̃
Non-clusters standard deviation eventness, σ_M̃    0.14
z_r    -49
α_r    0
z_m    30
α_m    7.1x
True positives, t_p    104
False positives, f_p    13
True negatives, t_n    2,326
False negatives, f_n    48
Accuracy, f_1( )    97.55%
Positive accuracy, f_2( )    88.89%
Table 7.9 Recalibrated Stickout, Current, Voltage, and Adjusted Release Results (Observed)

Figure 7.20 Recalibrated Stickout, Current, Voltage, and Adjusted Release Time Series (Testing) [series: stickout, release, current, voltage; vertical axis: scaled pixels, amps, decivolts]

Sixty-two temporal pattern clusters form the collection of temporal pattern clusters used to identify the events. This collection contains temporal pattern clusters that vary in dimension from 3 to 15. The runs and z tests with α_r = 0 and α_m = 7.1x show that the two sets, clustered and non-clustered, are statistically different. The accuracy statistic indicates that 97.55% (vs. 97.35% using just the stickout and the
adjusted release time series and vs. % using the stickout and unadjusted release time series) of the release observations are correctly characterized. The positive accuracy indicates that 88.89% (vs. 81.16% using just the stickout and the adjusted release time series and vs. % using the stickout and unadjusted release time series) of the release observations categorized as events are events.

The testing stage time series is illustrated in Figure 7.20, and the results are given in Table 7.10.

Result    Value
Clusters cardinality, c(M)    117
Clusters mean eventness, μ_M    0.67
Clusters standard deviation eventness, σ_M    0.47
Non-clusters cardinality, c(M̃)    2,375
Non-clusters mean eventness, μ_M̃
Non-clusters standard deviation eventness, σ_M̃    0.17
z_r    -49
α_r    0
z_m    14
α_m    2.1x10^-47
True positives, t_p    78
False positives, f_p    39
True negatives, t_n    2,300
False negatives, f_n    75
Accuracy, f_1( )    95.42%
Positive accuracy, f_2( )    66.67%
Table 7.10 Recalibrated Stickout, Current, Voltage, and Adjusted Release Results (Testing)

As with the training stage, the testing stage results are statistically significant, as seen by both the runs and z tests. The α_r = 0 and α_m = 2.1x10^-41. More importantly, the prediction accuracy is 95.42% (vs. 96.47% using just the stickout and the adjusted release time series and vs. 96.43% using the stickout and unadjusted release time series), and the positive accuracy is 66.67% (vs. 70.19% using just the stickout and the adjusted release time series and vs. 73.53% using the stickout and unadjusted release time series).

The prediction results using the stickout, current, and voltage time series are not as good as those using just the stickout time series. There are two possible explanations for this. The first is over-fitting. Recall that the training stage results using all three time series were better than the training results using just the stickout time series. In addition, the search space is higher dimensional and therefore sparser, because the multi-dimensional time series embeds to a higher dimensional phase space. This suggests that the training stage over-fit the temporal pattern clusters to the training stage observations, i.e., the temporal pattern clusters discovered in the training stage are too specific to the training stage time series. The second explanation is that the recalibration process has introduced noise, causing the testing results to be worse.
7.5 Conclusion

Using from one to three time series generated from sensors on a welding station, the problem of predicting when a droplet of metal will release from the welder was solved with a high degree of accuracy: from 95.42% to 96.47% total prediction accuracy and from 66.67% to 73.53% positive prediction accuracy. These results show that the TSDM method could be used in a system to control and monitor the welding seam, thereby improving the quality of the weld. The next chapter applies the TSDM methods to the financial domain.
Chapter 8 Financial Applications of Time Series Data Mining

This chapter, organized into four sections, presents significant results found by applying the Time Series Data Mining (TSDM) method to financial time series. The first section discusses the definition of events for this application and the generation of the time series. The second and third sections present the results of applying the TSDMe1-S/S and TSDMe1-M/S methods to a financial time series. The final section applies the TSDMe2-S/S method to a collection of time series.

In this chapter, the analyzed time series are neither synthetically generated as in Chapter 5, nor measured from a physical system as in Chapter 7. Instead, they are created by the dynamic interaction of millions of investors buying and selling securities through a secondary equity market such as the New York Stock Exchange (NYSE) or National Association of Securities Dealers Automated Quotation (NASDAQ) market [58]. The time series are measurements of the activity of a security, specifically a stock. The time series are the daily open price, which is the price of the first trade, and the daily volume, which is the total number of shares of the stock traded.

Before applying the TSDM framework to security price prediction, an explanation of the underlying structure of security price behavior is required, i.e., the efficient market hypothesis. The efficient market hypothesis is described using the expected return or fair game model, which puts the efficient market hypothesis on firmer theoretical grounds than using the random walk hypothesis [58, p. 210]. The expected value of a security is

$$E(P_{t+1} \mid \Phi_t) = \bigl[1 + E(r_{t+1} \mid \Phi_t)\bigr] P_t \quad \text{[58, p. 210]}, \qquad (8.1)$$
where P_t is the price of a security at time t, r_{t+1} is the one-period percent rate of return for the security during period t+1, and Φ_t is the information assumed to be fully reflected in the security price at time t.

There are three forms of the efficient market hypothesis. The weak form assumes Φ_t is all security-market information, such as the historical sequence of price, rates of return, and trading volume data [58, p. 211]. The semistrong form assumes Φ_t is all public information, which is a superset of all security-market information, including earnings and dividend announcements, price-to-earnings ratios, and economic and political news [58, p. 211]. The strong form assumes Φ_t is all public and private information, also including restricted data such as company insider information [58, p. 212].

The weak form of the efficient market hypothesis, which has been supported in the literature, applies to the current chapter. The efficient market hypothesis is verified by showing that security price time series show no autocorrelation and are random according to the runs test. In addition, tests of trading rules have generally shown that the weak form of the efficient market hypothesis holds [58, p ].

The TSDM goal is to find a trading-edge, a small advantage that allows greater than expected returns to be realized. If the weak form of the efficient market hypothesis holds, the TSDM methods should not be able to find any temporal patterns that can be exploited to achieve such a trading-edge. The TSDM goal is to find temporal pattern clusters that are, on average, characteristic and predictive of a larger than normal increase in the price of a stock.
8.1 ICN Time Series Using Open Price

This section presents the results of applying the TSDMe1-S/M method to characterizing and predicting the change in the open price of ICN, a NASDAQ traded stock. ICN is an international pharmaceutical company. Two periods, 1990 and 1991, are analyzed. The first half of 1990 is used as the observed time series and the second half as the testing time series. The 1991 time series is divided similarly.

Figure 8.1 ICN 1990H1 Daily Open Price Time Series (Observed) [vertical axis: x_t (open price in dollars); horizontal axis: dates from 1/2/1990 to 6/22/1990]

8.1.1 ICN 1990 Time Series Using Open Price

Figure 8.1 illustrates the observed time series X, which is the ICN open price for the first half of 1990 (1990H1). To identify temporal patterns that are both characteristic and predictive of events, a filter is needed. The percentage filter converts the time series into a percentage change open price time series. The filtered time series has a more consistent range, as seen in Figure 8.2, facilitating the discovery of temporal pattern clusters.
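A minimal Python sketch of such a percentage filter follows. It assumes the filter is the ordinary one-step relative change, (x_t − x_{t−1}) / x_{t−1}; the dissertation defines its filter in an earlier chapter, so this form is an assumption, and the prices below are made up.

```python
def percent_change(prices):
    """Convert a price series into one-step relative changes.

    Assumed form: (x_t - x_{t-1}) / x_{t-1}. The result is one element
    shorter than the input because the first price has no predecessor.
    """
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


# Made-up open prices in dollars, for illustration only.
open_prices = [20.0, 25.0, 20.0, 21.0]
filtered = percent_change(open_prices)
print([f"{change:+.1%}" for change in filtered])
```

Working with relative changes puts a five-dollar stock and a fifty-dollar stock on the same scale, which is the "more consistent range" the text refers to.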
Figure 8.2 Filtered ICN 1990H1 Daily Open Price Time Series (Observed) [vertical axis: x_t (filtered open price), -15% to 25%; horizontal axis: dates from 1/3/1990 to 6/25/1990]

The TSDM goal of finding a trading-edge is restated in terms of TSDM concepts. The objective function is

$$f(P) = \begin{cases} \mu_M & \text{if } c(M)/c(\Lambda) \ge \beta \\ (\mu_M - g_0)\,\dfrac{c(M)}{\beta\, c(\Lambda)} + g_0 & \text{otherwise} \end{cases}, \qquad (8.2)$$

where β = . The event characterization function is g(t) = x_{t+1}, which allows for one-step-ahead characterization and prediction. The optimization formulation is max f(P) subject to min b(δ).

Figure 8.3 presents an illustrative phase space for the filtered ICN 1990H1 daily open price time series with a Euclidean distance metric. Figure 8.4 shows the augmented phase space. The complexity of the embedding is illustrated in Figure 8.4. Clearly, the identification of a temporal pattern cluster that separates events from non-events is not possible. This will not prevent the TSDM goal of finding a trading-edge, though. The
goal is to find temporal pattern clusters that have higher objective function values and are statistically different from the phase space points outside the temporal pattern clusters.

Figure 8.3 Filtered ICN 1990H1 Daily Open Price Phase Space (Observed) [axes: x_t versus x_{t-1}, -15% to 25%]

Figure 8.4 Augmented Phase Space of Filtered ICN 1990H1 Daily Open Price (Observed) [axes: g, x_t, x_{t-1}]
The genetic algorithm search parameters are presented in Table 8.1.

Parameter    Values
Random search multiplier    10
Population size    30
Elite count    1
Gene length    6
Tournament size    2
Mutation rate    0%
Convergence criteria    1
Table 8.1 Genetic Algorithm Parameters for Filtered ICN 1990H1 Daily Open Price Time Series

The training stage results are shown in Table 8.2.

Result    Set 1    Set 2    Set 3    Combined Set
Temporal pattern cluster count, c( )
Temporal pattern cluster dimensions    ,3,5
Clusters cardinality, c(M)
Clusters mean eventness, μ_M    5.43%    3.50%    6.49%    3.37%
Clusters standard deviation eventness, σ_M    8.70%    6.95%    7.47%    6.60%
Non-clusters cardinality, c(M̃)
Non-clusters mean eventness, μ_M̃    -0.56%    -0.50%    -0.61%
Non-clusters standard deviation eventness, σ_M̃    3.60%    3.92%    3.80%    3.43%
z_r
α_r    4.71x    x    x    x10^-6
z_m
α_m    5.30x    x    x    x10^-3
Table 8.2 Filtered ICN 1990H1 Daily Open Price Results (Observed)

In each case, the cluster mean eventness is greater than the non-cluster mean eventness. However, because of the limited training set size, the probability of a Type I error (incorrectly rejecting the null hypothesis that the two sets are the same) is higher than in the previous chapters. By combining the sets, the statistical significance is increased. This type of financial time series is nonstationary on all of the levels defined in this dissertation: stochastic, deterministic, and chaotic. The patterns persist for only a short time period. This causes problems in achieving the desired 0.05 significance level.

The testing time series and the filtered testing time series are shown in Figure 8.5 and Figure 8.6, respectively. Figure 8.7 illustrates the testing phase space. The augmented phase space is seen in Figure 8.8.
Figure 8.5 ICN 1990H2 Daily Open Price Time Series (Testing) [vertical axis: x_t (open price in dollars); horizontal axis: dates from 7/2/1990 to 12/20/1990]

Figure 8.6 Filtered ICN 1990H2 Daily Open Price Time Series (Testing) [vertical axis: x_t (filtered open price), -15% to 25%; horizontal axis: dates from 7/2/1990 to 12/20/1990]
Figure 8.7 Filtered ICN 1990H2 Daily Open Price Phase Space (Testing) [axes: x_t versus x_{t-1}, -15% to 25%]

Figure 8.8 Augmented Phase Space of Filtered ICN 1990H2 Daily Open Price (Testing) [axes: g, x_t, x_{t-1}]
The testing stage results are seen in Table 8.3.

Result    Set 1    Set 2    Set 3    Combined Set
Temporal pattern cluster count, c( )
Temporal pattern cluster dimensions    ,3,5
Clusters cardinality, c(M)
Clusters mean eventness, μ_M    4.16%    0.96%    1.95%    1.48%
Clusters standard deviation eventness, σ_M    9.58%    8.41%    9.64%    7.97%
Non-clusters cardinality, c(M̃)
Non-clusters mean eventness, μ_M̃    -0.56%    -0.23%    -0.30%    -0.60%
Non-clusters standard deviation eventness, σ_M̃    4.80%    %    4.48%
z_r
α_r    2.62x    x    x    x10^-4
z_m
α_m    8.02x    x    x    x10^-1
Table 8.3 Filtered ICN 1990H2 Daily Open Price Results (Testing)

As with the training stage results, the average eventness values of time series observations inside the temporal pattern clusters are greater than the average eventness of the observations outside the temporal pattern clusters. However, for the same reasons discussed previously (sample size and temporal pattern stationarity), the statistical significance as shown by α_m is never less than . The TSDM goal is met in that a trading-edge is identified, but it is not statistically significant.
8.1.2 ICN 1991 Time Series Using Open Price

The same TSDM goal, objective function, event characterization function, and optimization formulation are applied to the 1991 open price time series. The observed time series X, the open price for the first half of 1991 (1991H1), is illustrated in Figure 8.9. Figure 8.10 shows the filtered observed time series observations. Figure 8.11 presents an illustrative phase space, and Figure 8.12 an illustrative augmented phase space. The tournament genetic algorithm search parameters are presented in Table 8.1. The training stage results are shown in Table 8.4.

Figure 8.9 ICN 1991H1 Daily Open Price Time Series (Observed) [vertical axis: x_t (open price in dollars); horizontal axis: dates from 1/2/1991 to 6/24/1991]
Figure 8.10 Filtered ICN 1991H1 Daily Open Price Time Series (Observed) [vertical axis: x_t (filtered open price), -15% to 25%; horizontal axis: dates from 1/2/1991 to 6/24/1991]

Figure 8.11 Filtered ICN 1991H1 Daily Open Price Phase Space (Observed) [axes: x_t versus x_{t-1}, -15% to 25%]
Figure 8.12 Augmented Phase Space of Filtered ICN 1991H1 Daily Open Price (Observed) [axes: g, x_t, x_{t-1}]

Result    Set 1    Set 2    Set 3    Combined Set
Temporal pattern cluster count, c( )
Temporal pattern cluster dimensions    ,3,5
Clusters cardinality, c(M)
Clusters mean eventness, μ_M    4.62%    4.41%    5.49%    3.71%
Clusters standard deviation eventness, σ_M    3.59%    %    6.65%
Non-clusters cardinality, c(M̃)
Non-clusters mean eventness, μ_M̃    0.34%    0.36%    0.42%    0.01%
Non-clusters standard deviation eventness, σ_M̃    4.79%    4.29%    4.36%    4.21%
z_r
α_r    2.95x    x    x    x10^-3
z_m
α_m    2.75x    x    x    x10^-2
Table 8.4 Filtered ICN 1991H1 Daily Open Price Results (Observed)

The training results show that a trading-edge can be found from the observed time series. However, because of the small sample size, statistical significance is more difficult to achieve. The testing stage time series is illustrated by Figure 8.13.

Figure 8.13 ICN 1991H2 Daily Open Price Time Series (Testing) [vertical axis: x_t (open price in dollars); horizontal axis: dates from 7/1/1991 to 12/19/1991]

The filtered version of the testing time series is shown in Figure 8.14. Illustrative phase and augmented phase spaces are shown in Figure 8.15 and Figure 8.16, respectively. The testing stage results are seen in Table 8.5.
Figure 8.14 Filtered ICN 1991H2 Daily Open Price Time Series (Testing) [vertical axis: x_t (filtered open price), -25% to 25%; horizontal axis: dates from 7/1/1991 to 12/19/1991]

Figure 8.15 Filtered ICN 1991H2 Daily Open Price Phase Space (Testing) [axes: x_t versus x_{t-1}, -25% to 25%]
Figure 8.16 Augmented Phase Space of Filtered ICN 1991H2 Daily Open Price (Testing) [axes: g, x_t, x_{t-1}]

Result    Set 1    Set 2    Set 3    Combined Set
Temporal pattern cluster count, c( )
Temporal pattern cluster dimensions    ,3,5
Clusters cardinality, c(M)
Clusters mean eventness, μ_M    2.06%    0.46%    0.88%    0.41%
Clusters standard deviation eventness, σ_M    7.21%    5.06%    11.93%    8.04%
Non-clusters cardinality, c(M̃)
Non-clusters mean eventness, μ_M̃    0.98%    1.2%    1.12%    1.23%
Non-clusters standard deviation eventness, σ_M̃    5.57%    5.81%    5.28%    5.16%
z_r
α_r    5.18x    x    x    x10^-2
z_m
α_m    6.01x    x    x    x10^-1
Table 8.5 Filtered ICN 1991H2 Daily Open Price Results (Testing)

For this collection of testing stage results, Set 1 has a higher cluster mean eventness than non-cluster mean eventness. Sets 2, 3, and the combined set do not. These results are presented so they may be contrasted with those in the next section, which incorporates the volume time series in predicting events. The next section demonstrates that, for the same set of possible events, including the volume time series yields better and more statistically significant temporal pattern clusters.

8.2 ICN Time Series Using Open Price and Volume

This section extends the results of applying the TSDM method to predicting the change in the open price of ICN by including the volume time series in the analysis. As with the previous section, this one is broken into two subsections addressing the 1990 and 1991 periods, respectively. Adding information in the form of a second time series enables better characterization and prediction results.

8.2.1 ICN 1990 Time Series Using Open Price and Volume

Figure 8.17 illustrates the observed time series X, the first half of 1990 (1990H1) open price and volume time series. The TSDM goal remains the same, as does the representation in TSDM concepts. The search parameters are described in Table 8.1, and the training stage results are shown in Table 8.6.
Figure 8.17 ICN 1990H1 Daily Open Price and Volume Time Series (Observed) [series: volume, open; left axis: volume of shares traded, 0K to 100K; right axis: open price in dollars; horizontal axis: dates from 1/2/90 to 6/22/90]

Result    Set 1    Set 2    Set 3    Combined Set
Temporal pattern cluster count, c( )
Temporal pattern cluster dimensions    ,6,10
Clusters cardinality, c(M)
Clusters mean eventness, μ_M    7.24%    4.85%    7.95%    5.09%
Clusters standard deviation eventness, σ_M    %    7.68%    7.15%    7.27%
Non-clusters cardinality, c(M̃)
Non-clusters mean eventness, μ_M̃    -0.55%    -0.48%    -0.63%    -0.79%
Non-clusters standard deviation eventness, σ_M̃    3.57%    3.92%    3.78%    3.38%
z_r
α_r    1.39x    x    x    x10^-2
z_m
α_m    4.54x    x    x    x10^-3
Table 8.6 ICN 1990H1 Daily Open Price and Volume Results (Observed)

In each case, the cluster mean eventness is greater than the non-cluster mean eventness. A comparison to the same time period results from Table 8.2 shows that these results are better for both the cluster mean eventness and the statistical measures. Four of the statistical tests are significant to the 0.05 α level.

The testing stage time series is shown in Figure 8.18. The testing stage results are seen in Table 8.7.

Figure 8.18 ICN 1990H2 Daily Open Price and Volume Time Series (Testing) [series: volume, open; left axis: volume of shares traded, 0K to 70K; right axis: open price in dollars; horizontal axis: dates from 7/2/90 to 12/20/90]

Result    Set 1    Set 2    Set 3    Combined Set
Temporal pattern cluster count, c( )
Temporal pattern cluster dimensions    ,6,10
Clusters cardinality, c(M)
Clusters mean eventness, μ_M    5.24%    3.14%    4.41%    3.27%
Clusters standard deviation eventness, σ_M    9.14%    10.67%    12.57%    9.44%
Non-clusters cardinality, c(M̃)
Non-clusters mean eventness, μ_M̃    -0.63%    -0.27%    -0.31%    -0.63%
Non-clusters standard deviation eventness, σ_M̃    4.84%    5.22%    5.09%    4.52%
z_r
α_r    1.57x    x    x    x10^-3
z_m
α_m    2.84x    x    x    x10^-2
Table 8.7 ICN 1990H2 Daily Open Price and Volume Results (Testing)

As with the training stage, the testing stage results achieve the goal of finding a trading-edge. The cluster mean eventness is greater than the non-cluster mean eventness. A comparison to the same time period results from Table 8.3 reveals that these results are better in both the cluster mean eventness and the statistical measures. Three of the statistical tests are significant to the 0.05 α level.

8.2.2 ICN 1991 Time Series Using Open Price and Volume

Figure 8.19 illustrates the observed time series X, the first half of 1991 (1991H1) open price and volume time series. The training stage results are shown in Table 8.8.
Figure 8.19 ICN 1991H1 Daily Open Price and Volume Time Series (Observed) [series: volume, open; left axis: volume of shares traded, 0K to 140K; right axis: open price in dollars; horizontal axis: dates from 1/2/91 to 6/24/91]

Result    Set 1    Set 2    Set 3    Combined Set
Temporal pattern cluster count, c( )
Temporal pattern cluster dimensions    ,6,10
Clusters cardinality, c(M)
Clusters mean eventness, μ_M    5.76%    10.54%    9.88%    7.87%
Clusters standard deviation eventness, σ_M    %    6.87%    7.92%    6.78%
Non-clusters cardinality, c(M̃)
Non-clusters mean eventness, μ_M̃    0.27%    0.02%    0.19%    -0.20%
Non-clusters standard deviation eventness, σ_M̃    4.65%    3.99%    4.16%    3.85%
z_r
α_r    2.95x    x    x    x10^-6
z_m
α_m    4.53x    x    x    x10^-5
Table 8.8 ICN 1991H1 Daily Open Price and Volume Results (Observed)

Again, the cluster mean eventness is greater than the non-cluster mean eventness for each set, and the results are better than the same time period results from Table 8.4, which used only the open price time series. All but one of the statistical tests are significant to the 0.05 α level, and all but two are significant to the α level.

The testing stage time series is shown in Figure 8.20, and the results are seen in Table 8.9.

Figure 8.20 ICN 1991H2 Daily Open Price and Volume Time Series (Testing) [series: volume, open; left axis: volume of shares traded, 0K to 1,600K; right axis: open price in dollars; horizontal axis: dates from 7/1/91 to 12/19/91]

Result    Set 1    Set 2    Set 3    Combined Set
Temporal pattern cluster count, c( )
Temporal pattern cluster dimensions    ,6,
Clusters cardinality, c(M)
Clusters mean eventness, μ_M    5.14%    1.26%    6.40%    3.48%
Clusters standard deviation eventness, σ_M    7.98%    15.07%    11.91%    11.07%
Non-clusters cardinality, c(M̃)
Non-clusters mean eventness, μ_M̃    0.78%    1.16%    0.92%    0.77%
Non-clusters standard deviation eventness, σ_M̃    5.45%    5.01%    5.46%    4.58%
z_r
α_r    3.75x    x    x    x10^-1
z_m
α_m    1.07x    x    x    x10^-1
Table 8.9 ICN 1991H2 Daily Open Price and Volume Results (Testing)

As with the characterization, the cluster mean eventness for each set is greater than the non-cluster mean eventness. A comparison to the same time period results (from Table 8.5) shows that these results are better in both the cluster mean eventness and the statistical measures. Recall that, in Table 8.5, only one of the sets had a cluster mean eventness that was greater than the non-cluster mean eventness. Here, all of the cluster mean eventnesses are greater. However, as seen before, the statistical significances are hampered by the limited sample size and temporal pattern stationarity.

In the next section, the ideas gained from analyzing the ICN time series are applied. For the ICN time series, the section applied a temporal pattern discovered in a
half-year's worth of data to the next half-year's worth of data. The next section will apply a half-year's worth of training to the next day's prediction. The training stage is repeated at each time-step.

8.3 DJIA Component Time Series

This section presents the results of applying the TSDMe2-S/M method to the 30 daily open price time series of the Dow Jones Industrial Average (DJIA) components from January 2, 1990, through March 8, 1991, which allows approximately 200 testing stages. The stocks in Table 8.10 make up the DJIA during this period.

Ticker    Company Name    Ticker    Company Name
AA    Aluminum of America    JNJ    Johnson & Johnson
ALD    AlliedSignal Inc.    JPM    J.P. Morgan
AXP    American Express    KO    Coca-Cola
BA    Boeing    MCD    McDonald's
CAT    Caterpillar    MMM    Minnesota Mining & Manufacturing
CHV    Chevron    MO    Philip Morris
DD    DuPont    MRK    Merck
DIS    Walt Disney    PG    Procter & Gamble
EK    Eastman Kodak    S    Sears, Roebuck
GE    General Electric    T    AT & T Corp.
GM    General Motors    TRV    Travelers (Now part of Citigroup Inc.)
GT    Goodyear Tire & Rubber    UK    Union Carbide
HWP    Hewlett-Packard    UTX    United Technologies
IBM    International Business Machines    WMT    Wal-Mart Stores
IP    International Paper    XON    Exxon
Table 8.10 Dow Jones Industrial Average Components (1/2/1990 to 3/8/1991)
Rather than graphically present each of the 30 DJIA component stocks, Figure 8.21 illustrates the DJIA. As with the ICN time series, a percentage filter is applied to each DJIA component time series to facilitate finding temporal pattern clusters.

Figure 8.21 DJIA Daily Open Price Time Series (vertical axis: DJIA; horizontal axis: date, 1/2/1990 through 2/8/1991)

The TSDM goal is to find a trading-edge. The next section shows how this goal is captured through TSDM concepts.

Training Stage

The objective function is

$$
f(P) =
\begin{cases}
\mu_M & \text{if } c(M)/c(\Lambda) \ge \beta, \\
(\mu_M - g_0)\,\dfrac{c(M)}{\beta\, c(\Lambda)} + g_0 & \text{otherwise,}
\end{cases}
\tag{8.3}
$$

where $\beta = $ . The event characterization function is $g(t) = x_{t+1}$, which allows for one-step-ahead characterization and prediction. The optimization formulation is $\max f(P)$.
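The objective function (8.3) can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation's implementation: the function and parameter names are hypothetical, $c(\Lambda)$ is taken to be the total number of phase-space points, `beta` and `g0` are passed in as plain arguments because their values are not restated here, and the empty-cluster case is handled by returning `g0`, the value the second branch approaches as $c(M) \to 0$.

```python
from typing import Sequence


def objective(cluster_eventness: Sequence[float],
              n_points: int,
              beta: float,
              g0: float) -> float:
    """Evaluate equation (8.3) for one candidate temporal pattern cluster.

    cluster_eventness: eventness g(t) of every point inside the cluster (t in M)
    n_points: c(Lambda), assumed here to be the number of phase-space points
    beta: threshold on the ratio c(M) / c(Lambda)
    g0: baseline eventness used by the small-cluster branch (assumed given)
    """
    c_m = len(cluster_eventness)
    if c_m == 0:
        # With c(M) = 0 the second branch of (8.3) reduces to g0.
        return g0
    mu_m = sum(cluster_eventness) / c_m
    if c_m / n_points >= beta:
        return mu_m
    # Small cluster: scale the gain over g0 by c(M) / (beta * c(Lambda)).
    return (mu_m - g0) * c_m / (beta * n_points) + g0
```

The second branch shrinks the score of clusters that contain too few points, so an optimizer that maximizes `objective` is discouraged from fitting a cluster around one or two unusually large price changes. The values `beta=0.05` and `g0=-0.01` used when trying the function out are illustrative only.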
Because of the large number of training processes (5,970), a graphical presentation of each step is not made. Recall that the TSDMe2 method uses a moving training window and a single observation testing window. The training window is 100 observations. The search parameters are presented in Table 8.11. The roulette selection genetic algorithm was used.

Parameter                   Values
Random search multiplier    10
Population size             30
Elite count                 1
Gene length                 6
Mutation rate               0%
Convergence criteria        1

Table 8.11 Genetic Algorithm Parameters for DJIA Component Time Series

Because of the large number of training and testing sets and because of the trading-edge goal, the results presented are of a summary nature. The statistical training results for each DJIA component are presented in Table 8.12. Of the 5,970 training processes, the cluster mean eventness ($\mu_M$) was greater than the total mean eventness ($\mu_X$) every time. For 69% of the temporal pattern clusters, the probability of a Type I error was less than 5% based on the independent means statistical test. For 49% of the temporal pattern clusters, the probability of a Type I error was less than 5% based on the runs statistical test.
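The moving training window with a single-observation testing window can be sketched as a generator. This is a hedged illustration only: `walk_forward` is a hypothetical name, and the exact index alignment between the window and the one-step-ahead event function is an assumption.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward(series: Sequence[float],
                 train_size: int = 100) -> Iterator[Tuple[List[float], float]]:
    """Yield (training_window, test_observation) pairs, one per time step.

    Each training window holds the train_size most recent observations;
    the test observation is the single value that immediately follows it.
    """
    for end in range(train_size, len(series)):
        yield list(series[end - train_size:end]), series[end]
```

Each yielded pair corresponds to one training process followed by one prediction. With 30 component series, 199 such pairs per series would be consistent with the 5,970 training processes reported above (30 x 199 = 5,970).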
Ticker    $\mu_M > \mu_X$    $\alpha_m \le 0.05$    $\alpha_r \le 0.05$
AA        100%               82%                   55%
ALD       100%               72%                   52%
AXP       100%               71%                   48%
BA        100%               70%                   42%
CAT       100%               79%                   48%
CHV       100%               54%                   34%
DD        100%               42%                   35%
DIS       100%               83%                   25%
EK        100%               55%                   18%
GE        100%               66%                   81%
GM        100%               73%                   49%
GT        100%               62%                   44%
HWP       100%               55%                   34%
IBM       100%               67%                   24%
IP        100%               80%                   78%
JNJ       100%               89%                   37%
JPM       100%               90%                   14%
KO        100%               67%                   87%
MCD       100%               62%                   62%
MMM       100%               57%                   75%
MO        100%               65%                   29%
MRK       100%               59%                   70%
PG        100%               76%                   38%
S         100%               59%                   86%
T         100%               66%                   40%
TRV       100%               78%                   63%
UK        100%               36%                   66%
UTX       100%               94%                   46%
WMT       100%               73%                   37%
XON       100%               75%                   61%
Combined  100%               69%                   49%

Table 8.12 DJIA Component Results (Observed)
Testing Stage Results

Using the 5,970 training processes, 471 events are predicted. The statistical prediction results for each DJIA component are presented in Table 8.13. The cluster mean eventness ($\mu_M$) was greater than the non-cluster mean eventness ($\mu_{\tilde{M}}$) 20 out of 30 times, or 67% of the time. For 16.7% of the temporal pattern clusters, the probability of a Type I error was less than 5% based on the independent means statistical test. For 3.3% of the temporal pattern clusters, the probability of a Type I error was less than 5% based on the runs statistical test. These low rates of statistical significance at the 5% $\alpha$ level are typical for predictions of financial time series, as seen from the previously presented ICN results.

Ticker   $c(M)$   $\mu_M$   $\sigma_M$   $c(\tilde{M})$   $\mu_{\tilde{M}}$   $\sigma_{\tilde{M}}$   $\alpha_m$   $\alpha_r$
AA                          1.652%                                            1.620%                 1.78x        x10-1
ALD                         1.428%                                            1.851%                 1.83x        x10-1
AXP                         2.058%                                            2.610%                 9.32x        x10-1
BA                          2.044%                                            2.181%                 8.52x        x10-1
CAT                         1.817%                                            2.127%                 8.08x        x10-1
CHV                         1.572%                                            1.200%                 9.92x        x10-1
DD                          1.946%                                            1.635%                 2.55x        x10-1
DIS                         1.488%                                            2.069%                 8.00x        x10-1
EK                          1.879%                                            1.998%                 8.20x        x10-1
GE                          1.410%                                            1.881%                 8.04x        x10-1
GM                          2.090%                                            1.863%                 1.29x        x10-1
GT                          2.034%                                            2.549%                 6.93x        x10-1
HWP                         1.881%                                            2.664%                 1.08x        x10-1
IBM                         1.785%                                            1.460%                 6.32x        x10-1
IP                          2.525%                                            1.587%                 6.80x        x10-1
JNJ                         1.444%                                            1.551%                 2.25x        x10-1
JPM                         1.878%                                            1.985%                 1.82x        x10-1
KO                          3.396%                                            1.807%                 8.36x        x10-1
MCD                         1.753%                                            1.977%                 4.54x        x
MMM                         1.044%                                            1.258%                 4.82x        x10-1
MO                          1.820%                                            1.641%                 6.42x        x10-1
MRK                         1.163%                                            1.580%                 4.11x        x10-2
PG                          1.615%                                            1.707%                 7.85x        x10-1
S                           2.677%                                            1.938%                 2.77x        x10-4
T                           1.797%                                            1.645%                 6.88x        x10-2
TRV                         2.449%                                            2.617%                 3.21x        x10-1
UK                          2.263%                                            1.900%                 4.30x        x10-1
UTX                         1.979%                                            1.828%                 6.33x        x10-1
WMT                         1.950%                                            2.458%                 2.77x        x10-1
XON                         1.398%                                            1.263%                 9.68x        x10-1
All                         1.970%       5,                                   1.919%                 1.38x        x10-1
Top 15                      1.966%       2,                                   1.809%                 2.27x        x10-3

Table 8.13 DJIA Component Results (Testing)
For the combined results using all predictions, the mean cluster eventness is greater than the non-cluster mean eventness. It also is statistically significant to the 0.005 $\alpha$ level according to the independent means test. However, better results can be achieved by predicting which temporal pattern clusters are more likely to yield accurate predictions. This is done by defining

$$
\alpha_\mu = \frac{(\alpha_m \le 0.05) + (\alpha_r \le 0.05)}{2}. \tag{8.4}
$$

The $\alpha_\mu$ is the average of the $\alpha_m$ and $\alpha_r$ columns of Table 8.12. The excess return,

$$
\mu_e = \mu_M - \mu_{\tilde{M}}, \tag{8.5}
$$

is the difference in the returns achieved by using the temporal pattern clusters and the complement of the temporal pattern clusters. The $\alpha_\mu$ has a 0.50 correlation with the excess return. Figure 8.22 illustrates this.

Figure 8.22 $\alpha_\mu$ vs. Excess Return (vertical axis: excess return, -2% to 2%; horizontal axis: $\alpha_\mu$, 79% down to 39%)
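As a small worked illustration of (8.4) and (8.5), the sketch below averages the two training-significance percentages for a handful of tickers and ranks them. The six rows are copied from Table 8.12; the function names and the choice of `k` are hypothetical, and the code is only a sketch of the ranking idea, not the procedure actually used to form the portfolio.

```python
from typing import Dict, List, Tuple

# (share of training runs with alpha_m significant, share with alpha_r significant),
# taken from Table 8.12 and written as fractions.
training_significance: Dict[str, Tuple[float, float]] = {
    "AA": (0.82, 0.55),
    "GE": (0.66, 0.81),
    "IP": (0.80, 0.78),
    "KO": (0.67, 0.87),
    "UK": (0.36, 0.66),
    "DD": (0.42, 0.35),
}


def alpha_mu(share_m: float, share_r: float) -> float:
    """Equation (8.4): average of the two significance shares."""
    return (share_m + share_r) / 2


def excess_return(mu_cluster: float, mu_non_cluster: float) -> float:
    """Equation (8.5): cluster mean eventness minus non-cluster mean eventness."""
    return mu_cluster - mu_non_cluster


def top_k(table: Dict[str, Tuple[float, float]], k: int) -> List[str]:
    """Tickers with the k largest alpha_mu values, best first."""
    return sorted(table, key=lambda t: alpha_mu(*table[t]), reverse=True)[:k]
```

For these six rows, IP has the largest $\alpha_\mu$ (0.79), which matches the largest value on the horizontal axis of Figure 8.22.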
The top 15 stocks are selected based on their $\alpha_\mu$. The prediction results using the portfolio formed from these top 15 DJIA components are exceptional. Using the temporal pattern clusters for the top 15 stocks, 245 predictions are made. The cluster mean eventness ($\mu_M$) was greater than the non-cluster mean eventness ($\mu_{\tilde{M}}$) 13 out of 15 times, or 87% of the time. The average predicted event had a 0.596% increase in open price. The average of the not predicted events was %. According to both statistical tests, the results are statistically significant. Using the means test, there is only a % chance of making a Type I error in rejecting the null hypothesis that the predicted events are the same as the not predicted observations. Using the runs test, there is a 0.884% chance of making a Type I error.

The best way to understand the effectiveness of the TSDM method when applied to financial time series is to show the trading results that can be achieved by applying the temporal pattern clusters discovered above. An initial investment is made as follows: If a temporal pattern cluster from any of the stocks in the portfolio predicts a high eventness, the initial investment is made in that stock for one day. If there are temporal pattern clusters for several stocks that indicate high eventness, the initial investment is split equally among the stocks. If there are no temporal pattern clusters indicating high eventness, then the initial investment is invested in a money market account with an assumed 5% annual rate of return. The training process is rerun using the new window of the 100 most recent observations. The following day, the initial investment principal plus return is invested according to the same rules. The process is repeated for the remaining investment period.
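The daily allocation rule just described can be sketched as follows. This is an illustrative simplification under stated assumptions: the names are hypothetical, the 5% annual money market rate is converted to a daily factor by compounding over an assumed 252 trading days per year, and trading costs are ignored.

```python
from typing import Dict, Iterable, Set, Tuple

MONEY_MARKET_ANNUAL = 0.05      # assumed annual rate stated in the text
TRADING_DAYS_PER_YEAR = 252     # assumption, used only for the daily conversion


def one_day_growth(predicted: Set[str], next_day_return: Dict[str, float]) -> float:
    """Growth factor for one day under the allocation rule.

    predicted: tickers whose temporal pattern cluster signals high eventness
    next_day_return: realized one-day return for each predicted ticker
    """
    if not predicted:
        # No signal: park the money in the money market account for a day.
        return (1 + MONEY_MARKET_ANNUAL) ** (1 / TRADING_DAYS_PER_YEAR)
    # One or more signals: split equally, so the day's return is the average.
    average = sum(next_day_return[t] for t in predicted) / len(predicted)
    return 1 + average


def simulate(principal: float,
             days: Iterable[Tuple[Set[str], Dict[str, float]]]) -> float:
    """Compound the principal through a sequence of trading days."""
    for predicted, returns in days:
        principal *= one_day_growth(predicted, returns)
    return principal
```

In a full simulation, the `predicted` set for each day would come from rerunning the training process on the latest 100-observation window before that day's allocation.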
The results for the investment period of May 29, 1990, through March 8, 1991, are shown in Table 8.14. This period is less than the total time frame (January 2, 1990, through March 8, 1991) because the first part of the time series is used only for training. The return of the DJIA also is given, which is slightly different from the buy and hold strategy for all DJIA components because the DJIA has a non-equal weighting among its components.

Portfolio                 Investment Method                  Return    Annualized Return
All DJIA components       Temporal Pattern Cluster           30.98%    41.18%
Top 15 DJIA components    Temporal Pattern Cluster           67.77%    93.70%
DJIA                      Buy and hold                       2.95%     3.79%
All DJIA components       Not in Temporal Pattern Cluster    0.35%     0.45%
Top 15 DJIA components    Not in Temporal Pattern Cluster    -2.94%    -3.74%
All DJIA components       Buy and hold                       3.34%     4.29%
Top 15 DJIA components    Buy and hold                       2.81%     3.60%

Table 8.14 Trading Results

An initial investment of $10,000 made on May 29, 1990, in the top 15 DJIA component stocks using the TSDM method would have grown to $16,777 at the end of March 8, 1991. One caveat to this result is that it ignores trading costs [59]. The trading cost is a percentage of the amount invested and includes both the buying and selling transaction costs along with the spread between the bid and ask. The return of the top 15 DJIA component portfolio using the temporal pattern cluster investment method is reduced to 63.73%, or 87.76% annualized, when a trading cost rate of 0.01% is applied. This
level of trading cost would require investments in the $500,000 to $1,000,000 range and access to trading systems that execute in between the bid and ask prices or have spreads of 1/16th or less. A 0.2% trading cost applied to the same portfolio results would reduce the return to 3.54%, or 4.55% annualized.

In this chapter, the TSDM method was applied to financial time series. Using temporal pattern clusters from single and multiple time series as a trading tool has yielded significant results. Even with a complex, nonstationary time series like stock price and volume, the TSDM method uncovers temporal patterns that are both characteristic and predictive.
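The sensitivity to trading costs noted earlier in this section can be illustrated with a short calculation. The sketch below is a simplification under stated assumptions: the names are hypothetical, the cost is modeled as a fixed fraction of wealth paid on each charged day, and the count of 200 charged days is purely illustrative, so the numbers are not expected to reproduce the reported cost-adjusted returns.

```python
def final_value(principal: float, total_return: float) -> float:
    """Value of an investment after earning total_return over the whole period."""
    return principal * (1 + total_return)


def cost_drag(cost_rate: float, n_charged_days: int) -> float:
    """Fraction of gross wealth retained if cost_rate is paid on each charged day."""
    return (1 - cost_rate) ** n_charged_days
```

`final_value(10_000, 0.6777)` recovers the $16,777 figure quoted for the top 15 portfolio. Under the illustrative 200 charged days, `cost_drag(0.0001, 200)` is about 0.98 while `cost_drag(0.002, 200)` is about 0.67, which shows why a strategy that trades almost daily is so much more sensitive to a 0.2% cost than to a 0.01% cost.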
Chapter 9 Conclusions and Future Efforts

Through the novel Time Series Data Mining (TSDM) framework and its associated methods, this dissertation has made an original and fundamental contribution to the fields of time series analysis and data mining. The key TSDM concepts of event, event characterization function, temporal pattern, temporal pattern cluster, time-delay embedding, phase space, augmented phase space, objective function, and optimization were reviewed, setting up the framework from which to develop TSDM methods.

Chapters 4 and 6 developed TSDM methods to find optimal temporal pattern clusters that both characterize and predict time series events. TSDM methods were created for discovering both single and multiple temporal pattern clusters in single and multi-dimensional time series. Additionally, a set of filtering and time series windowing techniques was adapted to allow prediction of nonstationary events.

This dissertation has demonstrated that methods based on the TSDM framework successfully characterize and predict complex, nonperiodic, irregular, and chaotic time series. This was done, first, through a set of explanatory and basic examples that demonstrated the TSDM process. TSDM methods were then successfully applied to characterizing and predicting complex, nonstationary, chaotic time series events from both the engineering and financial domains. Given a multi-dimensional time series generated by sensors on a welding station, the TSDM framework was able to, with a high degree of accuracy, characterize and predict metal droplet releases. In the financial domain, the TSDM framework was able to generate a trading-edge by characterizing and predicting stock price events.
Future efforts will fall into three categories: theoretical, application, and performance. Theoretical research will be conducted to determine the required dimension of the reconstructed phase space given an arbitrary number of observable states. There are many research applications for TSDM, including high frequency financial event prediction, incipient fault prediction in induction motor-drive systems, and characterization of heart fibrillation. As the time series data sets grow larger, the computational effort required to find hidden temporal patterns grows, requiring higher performance implementations of the TSDM methods.

As discussed in Chapter 2, Takens proved that a $(2Q+1)$-dimensional phase space formed using time-delay embedding is guaranteed to be an embedding of, i.e., topologically equivalent to, an original $Q$-dimensional state space. This theorem is based on using one observable state to reconstruct the state space. Povinelli and Feng showed experimentally in [2] that using multiple observable states can yield better results. The unanswered theoretical question is: What phase space dimension is required for an arbitrary number of observable states so that the phase space is topologically equivalent to the original state space? It is obvious that when all $Q$ states are observable, then the reconstructed phase space need only be $Q$-dimensional. Future research efforts will investigate the relationship between the number of observable states $n$ and the required phase space dimensionality when $1 < n < Q$.

One of the future application efforts will be to create a synergy between the research of Demerdash and Bangura, which demonstrated the powerful abilities of the Time-Stepping Coupled Finite Element-State Space (TSCFE-SS) method in predicting a priori characteristic waveforms of healthy and faulty motor performance characteristics
[60-65], and the Time Series Data Mining (TSDM) framework presented in this dissertation for characterizing and predicting incipient motor faults.

Improving computational performance will be addressed through two research directions. One direction is to investigate alternative global optimization methods such as interval branch and bound. A second parallel direction is to investigate distributed and parallel implementations of the TSDM methods.

Through the creation of the novel TSDM framework and methods, which have been validated on complex real-world time series, this dissertation has made a significant contribution to the state of the art in the fields of time series analysis and data mining.
References

[1] S. M. Pandit and S.-M. Wu, Time series and system analysis, with applications. New York: Wiley.
[2] R. J. Povinelli and X. Feng, Data Mining of Multiple Nonstationary Time Series, proceedings of Artificial Neural Networks in Engineering, St. Louis, Missouri, 1999.
[3] R. J. Povinelli and X. Feng, Temporal Pattern Identification of Time Series Data using Pattern Wavelets and Genetic Algorithms, proceedings of Artificial Neural Networks in Engineering, St. Louis, Missouri, 1998.
[4] G. E. P. Box and G. M. Jenkins, Time series analysis: forecasting and control, Rev. ed. San Francisco: Holden-Day.
[5] B. L. Bowerman and R. T. O'Connell, Forecasting and time series: an applied approach, 3rd ed. Belmont, California: Duxbury Press.
[6] U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Advances in knowledge discovery and data mining. Menlo Park, California: AAAI Press.
[7] S. M. Weiss and N. Indurkhya, Predictive data mining: a practical guide. San Francisco: Morgan Kaufmann.
[8] R. A. Gabel and R. A. Roberts, Signals and linear systems, 2nd ed. New York: Wiley.
[9] S. Haykin, Adaptive filter theory, 3rd ed. Upper Saddle River, New Jersey: Prentice Hall.
[10] C. K. Chui, An introduction to wavelets. Boston: Academic Press.
[11] C. K. Chui, Wavelets: a tutorial in theory and applications. Boston: Academic Press.
[12] I. Daubechies, Ten lectures on wavelets. Philadelphia: Society for Industrial and Applied Mathematics.
[13] E. Hernandez and G. L. Weiss, A first course on wavelets. Boca Raton, Florida: CRC Press.
[14] P. R. Massopust, Fractal functions, fractal surfaces, and wavelets. San Diego: Academic Press.
[15] T. H. Koornwinder, Wavelets: an elementary treatment of theory and applications. River Edge, New Jersey: World Scientific.
[16] G. Kaiser, A friendly guide to wavelets. Boston: Birkhäuser.
[17] G. Strang and T. Nguyen, Wavelets and filter banks. Wellesley, Massachusetts: Wellesley-Cambridge Press.
[18] R. Polikar, The Engineer's Ultimate Guide To Wavelet Analysis - The Wavelet Tutorial, 2nd ed. available at http://, 1996, cited 1 Aug.
[19] D. E. Goldberg, Genetic algorithms in search, optimization, and machine learning. Reading, Massachusetts: Addison-Wesley.
[20] R. J. Povinelli and X. Feng, Improving Genetic Algorithms Performance By Hashing Fitness Values, proceedings of Artificial Neural Networks in Engineering, St. Louis, Missouri, 1999.
[21] J. Heitkötter and D. Beasley, The Hitch-Hiker's Guide to Evolutionary Computation (FAQ for comp.ai.genetic), 5.2 ed. available at http://, 1997, cited 1 Aug.
[22] R. L. Haupt and S. E. Haupt, Practical genetic algorithms. New York: Wiley.
[23] Z. Michalewicz, Genetic algorithms + data structures = evolution programs, 3rd rev. and extended ed. Berlin: Springer.
[24] E. Walters, Design of efficient FIR digital filters using genetic algorithms, Masters Thesis, Marquette University.
[25] G. Deboeck, Trading on the edge: neural, genetic, and fuzzy systems for chaotic financial markets. New York: Wiley.
[26] G. R. Harik, E. Cantú-Paz, D. E. Goldberg, and B. L. Miller, The gambler's ruin problem, genetic algorithms, and the sizing of populations, proceedings of IEEE Conference on Evolutionary Computation, 1997.
[27] J. H. Holland, Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence, 1st MIT Press ed. Cambridge, Massachusetts: MIT Press.
[28] H. D. I. Abarbanel, Analysis of observed chaotic data. New York: Springer.
[29] A. J. Crilly, R. A. Earnshaw, and H. Jones, Applications of fractals and chaos. Berlin: Springer.
[30] N. B. Tufillaro, T. Abbott, and J. Reilly, An experimental approach to nonlinear dynamics and chaos. Redwood City, California: Addison-Wesley.
[31] E. E. Peters, Chaos and order in the capital markets: a new view of cycles, prices, and market volatility, 2nd ed. New York: Wiley.
[32] E. E. Peters, Fractal market analysis: applying chaos theory to investment and economics. New York: Wiley.
[33] R. Cawley and G.-H. Hsu, Chaotic Noise Reduction by Local-Geometric-Projection with a Reference Time Series, proceedings of The Chaos Paradigm: Developments and Applications in Engineering and Science, Mystic, Connecticut, 1993.
[34] R. Cawley, G.-H. Hsu, and L. W. Salvino, Detection and Diagnosis of Dynamics in Time Series Data: Theory of Noise Reduction, proceedings of The Chaos Paradigm: Developments and Applications in Engineering and Science, Mystic, Connecticut, 1993.
[35] J. Iwanski and E. Bradley, Recurrence plot analysis: To embed or not to embed?, Chaos, vol. 8.
[36] F. Takens, Detecting strange attractors in turbulence, proceedings of Dynamical Systems and Turbulence, Warwick, 1980.
[37] T. Sauer, J. A. Yorke, and M. Casdagli, Embedology, Journal of Statistical Physics, vol. 65.
[38] Aussie Gold History, available at http://, cited 13 Sep.
[39] Newmont - Core Gold Values, available at http://, cited 10 Sep 1998.
[40] U. M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, From Data Mining to Knowledge Discovery: An Overview, in Advances in knowledge discovery and data mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. Menlo Park, California: AAAI Press.
[41] A. A. Freitas and S. H. Lavington, Mining very large databases with parallel processing. Boston: Kluwer Academic Publishers.
[42] H. Liu and H. Motoda, Feature selection for knowledge discovery and data mining. Boston: Kluwer Academic Publishers.
[43] P. Cabena and International Business Machines Corporation, Discovering data mining: from concept to implementation. Upper Saddle River, New Jersey: Prentice Hall.
[44] P. Gray and H. J. Watson, Decision support in the data warehouse. Upper Saddle River, New Jersey: Prentice Hall.
[45] S. Iyanaga and Y. Kawada, Encyclopedic dictionary of mathematics by the Mathematical Society of Japan. Cambridge, Massachusetts: MIT Press.
[46] E. Bradley, Analysis of Time Series, in An introduction to intelligent data analysis, M. Berthold and D. Hand, Eds. New York: Springer, 1999.
[47] D. J. Berndt and J. Clifford, Finding Patterns in Time Series: A Dynamic Programming Approach, in Advances in knowledge discovery and data mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds. Menlo Park, California: AAAI Press, 1996.
[48] E. Keogh and P. Smyth, A Probabilistic Approach to Fast Pattern Matching in Time Series Databases, proceedings of Third International Conference on Knowledge Discovery and Data Mining, Newport Beach, California.
[49] E. Keogh, A Fast and Robust Method for Pattern Matching in Time Series Databases, proceedings of 9th International Conference on Tools with Artificial Intelligence (TAI '97).
[50] E. J. Keogh and M. J. Pazzani, An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback, proceedings of AAAI Workshop on Predicting the Future: AI Approaches to Time-Series Analysis, Madison, Wisconsin.
[51] M. T. Rosenstein and P. R. Cohen, Continuous Categories For a Mobile Robot, proceedings of Sixteenth National Conference on Artificial Intelligence.
[52] H. D. I. Abarbanel, R. Brown, J. J. Sidorowich, and L. S. Tsimring, The analysis of observed chaotic data in physical systems, Reviews of Modern Physics, vol. 65.
[53] E. W. Minium, Statistical reasoning in psychology and education, 2nd ed. New York: Wiley.
[54] D. J. Sheskin, Handbook of parametric and nonparametric statistical procedures. Boca Raton, Florida: CRC Press.
[55] A. Papoulis, Probability, random variables, and stochastic processes, 3rd ed. New York: McGraw-Hill.
[56] D. G. Luenberger, Optimization by vector space methods. New York: Wiley.
[57] Using MATLAB: version 5. Natick, Massachusetts: The MathWorks, Inc., 1998.
[58] F. K. Reilly and K. C. Brown, Investment analysis and portfolio management, 5th ed. Fort Worth, Texas: Dryden Press.
[59] J. D. Freeman, Behind the smoke and mirrors: Gauging the integrity of investment simulations, Financial Analysts Journal, vol. 48.
[60] J. F. Bangura and N. A. Demerdash, Simulation of Inverter-Fed Induction Motor Drives with Pulse-Width Modulation by a Time-Stepping Coupled Finite Element-Flux Linkage-Based State Space Model, IEEE Transactions on Energy Conversion, vol. 14.
[61] J. F. Bangura and N. A. Demerdash, Comparison Between Characterization and Diagnosis of Broken Bars/End-Ring Connectors and Airgap Eccentricities of Induction motors in ASDs Using a Coupled Finite Element-State Space Method, IEEE Transactions on Energy Conversion, Paper No. PE313ECa (04-99).
[62] N. A. O. Demerdash and J. F. Bangura, Characterization of Induction Motors in Adjustable-Speed Drives Using a Time-Stepping Coupled Finite-Element State-Space Method Including Experimental Validation, IEEE Transactions on Industry Applications, vol. 35.
[63] J. F. Bangura and N. A. O. Demerdash, Effects of Broken Bars/End-Ring Connectors and Airgap Eccentricities on Ohmic and Core Losses of Induction Motors in ASDs Using a Coupled Finite Element-State Space Method, IEEE Transactions on Energy Conversion, Paper No. PE312EC (04-99).
[64] J. F. Bangura, A Time-Stepping Coupled Finite Element-State Space Modeling for On-Line Diagnosis of Squirrel-Cage Induction Motor Faults, Ph.D. Dissertation, Marquette University, June.
[65] N. A. Demerdash and J. F. Bangura, A Time-Stepping Coupled Finite Element-State Space Modeling for Analysis and Performance Quality Assessment of Induction Motors in Adjustable Speed Drives Applications, proceedings of Naval Symposium on Electric Machines, Newport, Rhode Island, 1997.
TEMPORAL PATTERN IDENTIFICATION OF TIME SERIES DATA USING PATTERN WAVELETS AND GENETIC ALGORITHMS
TEMPORAL PATTERN IDENTIFICATION OF TIME SERIES DATA USING PATTERN WAVELETS AND GENETIC ALGORITHMS RICHARD J. POVINELLI AND XIN FENG Deparmen of Elecrical and Compuer Engineering Marquee Universiy, P.O.
Chapter 8: Regression with Lagged Explanatory Variables
Chaper 8: Regression wih Lagged Explanaory Variables Time series daa: Y for =1,..,T End goal: Regression model relaing a dependen variable o explanaory variables. Wih ime series new issues arise: 1. One
INTRODUCTION TO FORECASTING
INTRODUCTION TO FORECASTING INTRODUCTION: Wha is a forecas? Why do managers need o forecas? A forecas is an esimae of uncerain fuure evens (lierally, o "cas forward" by exrapolaing from pas and curren
Journal Of Business & Economics Research September 2005 Volume 3, Number 9
Opion Pricing And Mone Carlo Simulaions George M. Jabbour, (Email: [email protected]), George Washingon Universiy Yi-Kang Liu, ([email protected]), George Washingon Universiy ABSTRACT The advanage of Mone Carlo
Chapter 8 Student Lecture Notes 8-1
Chaper Suden Lecure Noes - Chaper Goals QM: Business Saisics Chaper Analyzing and Forecasing -Series Daa Afer compleing his chaper, you should be able o: Idenify he componens presen in a ime series Develop
Measuring macroeconomic volatility Applications to export revenue data, 1970-2005
FONDATION POUR LES ETUDES ET RERS LE DEVELOPPEMENT INTERNATIONAL Measuring macroeconomic volailiy Applicaions o expor revenue daa, 1970-005 by Joël Cariolle Policy brief no. 47 March 01 The FERDI is a
DOES TRADING VOLUME INFLUENCE GARCH EFFECTS? SOME EVIDENCE FROM THE GREEK MARKET WITH SPECIAL REFERENCE TO BANKING SECTOR
Invesmen Managemen and Financial Innovaions, Volume 4, Issue 3, 7 33 DOES TRADING VOLUME INFLUENCE GARCH EFFECTS? SOME EVIDENCE FROM THE GREEK MARKET WITH SPECIAL REFERENCE TO BANKING SECTOR Ahanasios
Time Series Analysis Using SAS R Part I The Augmented Dickey-Fuller (ADF) Test
ABSTRACT Time Series Analysis Using SAS R Par I The Augmened Dickey-Fuller (ADF) Tes By Ismail E. Mohamed The purpose of his series of aricles is o discuss SAS programming echniques specifically designed
Multiprocessor Systems-on-Chips
Par of: Muliprocessor Sysems-on-Chips Edied by: Ahmed Amine Jerraya and Wayne Wolf Morgan Kaufmann Publishers, 2005 2 Modeling Shared Resources Conex swiching implies overhead. On a processing elemen,
ANALYSIS AND COMPARISONS OF SOME SOLUTION CONCEPTS FOR STOCHASTIC PROGRAMMING PROBLEMS
ANALYSIS AND COMPARISONS OF SOME SOLUTION CONCEPTS FOR STOCHASTIC PROGRAMMING PROBLEMS R. Caballero, E. Cerdá, M. M. Muñoz and L. Rey () Deparmen of Applied Economics (Mahemaics), Universiy of Málaga,
The naive method discussed in Lecture 1 uses the most recent observations to forecast future values. That is, Y ˆ t + 1
Business Condiions & Forecasing Exponenial Smoohing LECTURE 2 MOVING AVERAGES AND EXPONENTIAL SMOOTHING OVERVIEW This lecure inroduces ime-series smoohing forecasing mehods. Various models are discussed,
Diane K. Michelson, SAS Institute Inc, Cary, NC Annie Dudley Zangi, SAS Institute Inc, Cary, NC
ABSTRACT Paper DK-02 SPC Daa Visualizaion of Seasonal and Financial Daa Using JMP Diane K. Michelson, SAS Insiue Inc, Cary, NC Annie Dudley Zangi, SAS Insiue Inc, Cary, NC JMP Sofware offers many ypes
Automatic measurement and detection of GSM interferences
Auomaic measuremen and deecion of GSM inerferences Poor speech qualiy and dropped calls in GSM neworks may be caused by inerferences as a resul of high raffic load. The radio nework analyzers from Rohde
Hedging with Forwards and Futures
Hedging wih orwards and uures Hedging in mos cases is sraighforward. You plan o buy 10,000 barrels of oil in six monhs and you wish o eliminae he price risk. If you ake he buy-side of a forward/fuures
Vector Autoregressions (VARs): Operational Perspectives
Vecor Auoregressions (VARs): Operaional Perspecives Primary Source: Sock, James H., and Mark W. Wason, Vecor Auoregressions, Journal of Economic Perspecives, Vol. 15 No. 4 (Fall 2001), 101-115. Macroeconomericians
Chapter 6: Business Valuation (Income Approach)
Chaper 6: Business Valuaion (Income Approach) Cash flow deerminaion is one of he mos criical elemens o a business valuaion. Everyhing may be secondary. If cash flow is high, hen he value is high; if he
USE OF EDUCATION TECHNOLOGY IN ENGLISH CLASSES
USE OF EDUCATION TECHNOLOGY IN ENGLISH CLASSES Mehme Nuri GÖMLEKSİZ Absrac Using educaion echnology in classes helps eachers realize a beer and more effecive learning. In his sudy 150 English eachers were
DYNAMIC MODELS FOR VALUATION OF WRONGFUL DEATH PAYMENTS
DYNAMIC MODELS FOR VALUATION OF WRONGFUL DEATH PAYMENTS Hong Mao, Shanghai Second Polyechnic Universiy Krzyszof M. Osaszewski, Illinois Sae Universiy Youyu Zhang, Fudan Universiy ABSTRACT Liigaion, exper
GoRA. For more information on genetics and on Rheumatoid Arthritis: Genetics of Rheumatoid Arthritis. Published work referred to in the results:
For more informaion on geneics and on Rheumaoid Arhriis: Published work referred o in he resuls: The geneics revoluion and he assaul on rheumaoid arhriis. A review by Michael Seldin, Crisopher Amos, Ryk
MACROECONOMIC FORECASTS AT THE MOF A LOOK INTO THE REAR VIEW MIRROR
MACROECONOMIC FORECASTS AT THE MOF A LOOK INTO THE REAR VIEW MIRROR The firs experimenal publicaion, which summarised pas and expeced fuure developmen of basic economic indicaors, was published by he Minisry
SPEC model selection algorithm for ARCH models: an options pricing evaluation framework
Applied Financial Economics Leers, 2008, 4, 419 423 SEC model selecion algorihm for ARCH models: an opions pricing evaluaion framework Savros Degiannakis a, * and Evdokia Xekalaki a,b a Deparmen of Saisics,
The Transport Equation
The Transpor Equaion Consider a fluid, flowing wih velociy, V, in a hin sraigh ube whose cross secion will be denoed by A. Suppose he fluid conains a conaminan whose concenraion a posiion a ime will be
Usefulness of the Forward Curve in Forecasting Oil Prices
Usefulness of he Forward Curve in Forecasing Oil Prices Akira Yanagisawa Leader Energy Demand, Supply and Forecas Analysis Group The Energy Daa and Modelling Cener Summary When people analyse oil prices,
Can Individual Investors Use Technical Trading Rules to Beat the Asian Markets?
Can Individual Invesors Use Technical Trading Rules o Bea he Asian Markes? INTRODUCTION In radiional ess of he weak-form of he Efficien Markes Hypohesis, price reurn differences are found o be insufficien
Duration and Convexity ( ) 20 = Bond B has a maturity of 5 years and also has a required rate of return of 10%. Its price is $613.
Graduae School of Business Adminisraion Universiy of Virginia UVA-F-38 Duraion and Convexiy he price of a bond is a funcion of he promised paymens and he marke required rae of reurn. Since he promised
How To Calculate Price Elasiciy Per Capia Per Capi
Price elasiciy of demand for crude oil: esimaes for 23 counries John C.B. Cooper Absrac This paper uses a muliple regression model derived from an adapaion of Nerlove s parial adjusmen model o esimae boh
The Greek financial crisis: growing imbalances and sovereign spreads. Heather D. Gibson, Stephan G. Hall and George S. Tavlas
The Greek financial crisis: growing imbalances and sovereign spreads Heaher D. Gibson, Sephan G. Hall and George S. Tavlas The enry The enry of Greece ino he Eurozone in 2001 produced a dividend in he
Appendix D Flexibility Factor/Margin of Choice Desktop Research
Appendix D Flexibiliy Facor/Margin of Choice Deskop Research Cheshire Eas Council Cheshire Eas Employmen Land Review Conens D1 Flexibiliy Facor/Margin of Choice Deskop Research 2 Final Ocober 2012 \\GLOBAL.ARUP.COM\EUROPE\MANCHESTER\JOBS\200000\223489-00\4
Morningstar Investor Return
Morningsar Invesor Reurn Morningsar Mehodology Paper Augus 31, 2010 2010 Morningsar, Inc. All righs reserved. The informaion in his documen is he propery of Morningsar, Inc. Reproducion or ranscripion
The Application of Multi Shifts and Break Windows in Employees Scheduling
The Applicaion of Muli Shifs and Brea Windows in Employees Scheduling Evy Herowai Indusrial Engineering Deparmen, Universiy of Surabaya, Indonesia Absrac. One mehod for increasing company s performance
Principal components of stock market dynamics. Methodology and applications in brief (to be updated ) Andrei Bouzaev, bouzaev@ya.
Principal componens of sock marke dynamics Mehodology and applicaions in brief o be updaed Andrei Bouzaev, [email protected] Why principal componens are needed Objecives undersand he evidence of more han one
CHARGE AND DISCHARGE OF A CAPACITOR
REFERENCES RC Circuis: Elecrical Insrumens: Mos Inroducory Physics exs (e.g. A. Halliday and Resnick, Physics ; M. Sernheim and J. Kane, General Physics.) This Laboraory Manual: Commonly Used Insrumens:
Distributing Human Resources among Software Development Projects 1
Disribuing Human Resources among Sofware Developmen Proecs Macario Polo, María Dolores Maeos, Mario Piaini and rancisco Ruiz Summary This paper presens a mehod for esimaing he disribuion of human resources
Why Did the Demand for Cash Decrease Recently in Korea?
Why Did he Demand for Cash Decrease Recenly in Korea? Byoung Hark Yoo Bank of Korea 26. 5 Absrac We explores why cash demand have decreased recenly in Korea. The raio of cash o consumpion fell o 4.7% in
DETERMINISTIC INVENTORY MODEL FOR ITEMS WITH TIME VARYING DEMAND, WEIBULL DISTRIBUTION DETERIORATION AND SHORTAGES KUN-SHAN WU
Yugoslav Journal of Operations Research 2 (22), Number, 6-7 DETERMINISTIC INVENTORY MODEL FOR ITEMS WITH TIME VARYING DEMAND, WEIBULL DISTRIBUTION DETERIORATION AND SHORTAGES KUN-SHAN WU Department of Business Administration
Stock Price Prediction Using the ARIMA Model
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Stock Price Prediction Using the ARIMA Model 1 Ayodele A. Adebiyi., 2 Aderemi O. Adewumi 1,2 School of Mathematic, Statistics & Computer
A New Type of Combination Forecasting Method Based on PLS
American Journal of Operations Research, 2012, 2, 408-416 http://dx.doi.org/10.4236/ajor.2012.23049 Published Online September 2012 (http://www.scirp.org/journal/ajor) A New Type of Combination Forecasting Method
Gene Regulatory Network Discovery from Time-Series Gene Expression Data A Computational Intelligence Approach
Gene Regulatory Network Discovery from Time-Series Gene Expression Data A Computational Intelligence Approach Nikola K. Kasabov 1, Zeke S. H. Chan 1, Vishal Jain 1, Igor Sidorov 2 and Dimiter S. Dimitrov 2 1 Knowledge
Chapter 4: Exponential and Logarithmic Functions
Chapter 4: Exponential and Logarithmic Functions Section 4.1 Exponential Functions... 15 Section 4.2 Graphs of Exponential Functions... 3 Section 4.3 Logarithmic Functions... 4 Section 4.4 Logarithmic Properties... 53 Section
11/6/2013. Chapter 14: Dynamic AD-AS. Introduction. Introduction. Keeping track of time. The model s elements
Introduction Chapter 14: Dynamic AD-AS A dynamic model of aggregate demand and aggregate supply gives us more insight into how the economy works in the short run. It is a simplified version of a DSGE model, used in cutting-edge
How To Predict A Person's Behavior
Information Theoretic Approaches for Predictive Models: Results and Analysis Monica Dinculescu Supervised by Doina Precup Abstract Learning the internal representation of partially observable environments has proven
Mathematics in Pharmacokinetics What and Why (A second attempt to make it clearer)
Mathematics in Pharmacokinetics What and Why (A second attempt to make it clearer) We have used equations for concentration (C) as a function of time (t). We will continue to use these equations since the plasma concentrations
Chapter 1.6 Financial Management
Chapter 1.6 Financial Management Part I: Objective type questions and answers 1. Simple pay back period is equal to: a) Ratio of First cost/net yearly savings b) Ratio of Annual gross cash flow/capital cost n c) = (1
Real-time Particle Filters
Real-time Particle Filters Cody Kwok Dieter Fox Marina Meilă Dept. of Computer Science & Engineering, Dept. of Statistics University of Washington Seattle, WA 9895 ckwok,fox @cs.washington.edu, [email protected] Abstract
Statistical Analysis with Little s Law. Supplementary Material: More on the Call Center Data. by Song-Hee Kim and Ward Whitt
Statistical Analysis with Little's Law Supplementary Material: More on the Call Center Data by Song-Hee Kim and Ward Whitt Department of Industrial Engineering and Operations Research Columbia University, New York, NY 17-99
Market Liquidity and the Impacts of the Computerized Trading System: Evidence from the Stock Exchange of Thailand
36 Investment Management and Financial Innovations, 4/4 Market Liquidity and the Impacts of the Computerized Trading System: Evidence from the Stock Exchange of Thailand Sorasart Sukcharoensin 1, Pariyada Srisopitsawat,
Relationships between Stock Prices and Accounting Information: A Review of the Residual Income and Ohlson Models. Scott Pirie* and Malcolm Smith**
Relationships between Stock Prices and Accounting Information: A Review of the Residual Income and Ohlson Models Scott Pirie* and Malcolm Smith** * International Graduate School of Management, University of South Australia
Individual Health Insurance April 30, 2008 Pages 167-170
Individual Health Insurance April 30, 2008 Pages 167-170 We have received feedback that this section of the text is confusing because some of the defined notation is inconsistent with comparable life insurance reserve
A Note on Using the Svensson procedure to estimate the risk free rate in corporate valuation
A Note on Using the Svensson procedure to estimate the risk free rate in corporate valuation By Sven Arnold, Alexander Lahmann and Bernhard Schwetzler October 2011 1. The risk free interest rate in corporate valuation
Making a Faster Cryptanalytic Time-Memory Trade-Off
Making a Faster Cryptanalytic Time-Memory Trade-Off Philippe Oechslin Laboratoire de Securité et de Cryptographie (LASEC) Ecole Polytechnique Fédérale de Lausanne Faculté I&C, 1015 Lausanne, Switzerland [email protected]
Inductance and Transient Circuits
Chapter H Inductance and Transient Circuits Blinn College - Physics 2426 - Terry Honan As a consequence of Faraday's law a changing current through one coil induces an EMF in another coil; this is known as mutual
ARCH 2013.1 Proceedings
Article from: ARCH 2013.1 Proceedings August 1-4, 2012 Ghislain Leveille, Emmanuel Hamel A renewal model for medical malpractice Ghislain Léveillé École d'actuariat Université Laval, Québec, Canada 47th ARC Conference
Single-machine Scheduling with Periodic Maintenance and both Preemptive and Non-preemptive jobs in Remanufacturing System
Abstract number: 05-0407 Single-machine Scheduling with Periodic Maintenance and both Preemptive and Non-preemptive jobs in Remanufacturing System Liu Biyu Chen Weida (School of Economics and Management Southeast University
Chapter 7. Response of First-Order RL and RC Circuits
Chapter 7. Response of First-Order RL and RC Circuits 7.1. The Natural Response of an RL Circuit 7.2. The Natural Response of an RC Circuit 7.3. The Step Response of RL and RC Circuits 7.4. A General Solution for Step and Natural
The Real Business Cycle paradigm. The RBC model emphasizes supply (technology) disturbances as the main source of
Prof. Harris Dellas Advanced Macroeconomics Winter 2001/01 The Real Business Cycle paradigm The RBC model emphasizes supply (technology) disturbances as the main source of macroeconomic fluctuations in a world
DEMAND FORECASTING MODELS
DEMAND FORECASTING MODELS Contents E-2. ELECTRIC BILLED SALES AND CUSTOMER COUNTS System-level Model County-level Model Eastside King County-level Model E-6. ELECTRIC PEAK HOUR LOAD FORECASTING System-level Forecast
Strategic Optimization of a Transportation Distribution Network
Strategic Optimization of a Transportation Distribution Network K. John Sophabmixay, Scott J. Mason, Manuel D. Rossetti Department of Industrial Engineering University of Arkansas 4207 Bell Engineering Center Fayetteville,
Cointegration: The Engle and Granger approach
Cointegration: The Engle and Granger approach Introduction Generally one would find most of the economic variables to be non-stationary I(1) variables. Hence, any equilibrium theories that involve these variables require
Option Put-Call Parity Relations When the Underlying Security Pays Dividends
International Journal of Business and Economics, 26, Vol. 5, No. 3, 225-23 Option Put-Call Parity Relations When the Underlying Security Pays Dividends Weiyu Guo Department of Finance, University of Nebraska Omaha,
Hotel Room Demand Forecasting via Observed Reservation Information
Proceedings of the Asia Pacific Industrial Engineering & Management Systems Conference 0 V. Kachitvichyanukul, H.T. Luong, and R. Pitakaso Eds. Hotel Room Demand Forecasting via Observed Reservation Information aragain
Nikkei Stock Average Volatility Index Real-time Version Index Guidebook
Nikkei Stock Average Volatility Index Real-time Version Index Guidebook Nikkei Inc. With the modification of the methodology of the Nikkei Stock Average Volatility Index as Nikkei Inc. (Nikkei) starts calculating and
The Kinetics of the Stock Markets
Asia Pacific Management Review (00) 7(1), 1-4 The Kinetics of the Stock Markets Hsinan Hsu * and Bin-Juin Lin ** (received July 001; revision received October 001; accepted November 001) This paper applies the
Task is a schedulable entity, i.e., a thread
Real-Time Scheduling System Model Task is a schedulable entity, i.e., a thread Time constraints of periodic task T: - s: starting point - e: processing time of T - d: deadline of T - p: period of T Periodic task T
PROFIT TEST MODELLING IN LIFE ASSURANCE USING SPREADSHEETS PART ONE
Profit Test Modelling in Life Assurance Using Spreadsheets PROFIT TEST MODELLING IN LIFE ASSURANCE USING SPREADSHEETS PART ONE Erik Alm Peter Millington 2004 Profit Test Modelling in Life Assurance Using Spreadsheets
SELF-EVALUATION FOR VIDEO TRACKING SYSTEMS
SELF-EVALUATION FOR VIDEO TRACKING SYSTEMS Hao Wu and Qinfen Zheng Centre for Automation Research Dept. of Electrical and Computer Engineering University of Maryland, College Park, MD-20742 {wh2003, qinfen}@cfar.umd.edu
Small and Large Trades Around Earnings Announcements: Does Trading Behavior Explain Post-Earnings-Announcement Drift?
Small and Large Trades Around Earnings Announcements: Does Trading Behavior Explain Post-Earnings-Announcement Drift? Devin Shanthikumar * First Draft: October, 2002 This Version: August 19, 2004 Abstract This paper
SEASONAL ADJUSTMENT. 1 Introduction. 2 Methodology. 3 X-11-ARIMA and X-12-ARIMA Methods
SEASONAL ADJUSTMENT 1 Introduction 2 Methodology 2.1 Time Series and Its Components 2.1.1 Seasonality 2.1.2 Trend-Cycle 2.1.3 Irregularity 2.1.4 Trading Day and Festival Effects 3 X-11-ARIMA and X-12-ARIMA Methods
Analogue and Digital Signal Processing. First Term Third Year CS Engineering By Dr Mukhtiar Ali Unar
Analogue and Digital Signal Processing First Term Third Year CS Engineering By Dr Mukhtiar Ali Unar Recommended Books Haykin S. and Van Veen B.; Signals and Systems, John Wiley& Sons Inc. ISBN: 0-7-380-7 Ifeachor
Niche Market or Mass Market?
Niche Market or Mass Market? Maxim Ivanov y McMaster University July 2009 Abstract The definition of a niche or a mass market is based on the ranking of two variables: the monopoly price and the product mean value.
Predicting Stock Market Index Trading Signals Using Neural Networks
Predicting Stock Market Index Trading Using Neural Networks C. D. Tilakaratne, S. A. Morris, M. A. Mammadov, C. P. Hurst Centre for Informatics and Applied Optimization School of Information Technology and Mathematical
TSG-RAN Working Group 1 (Radio Layer 1) meeting #3 Nynashamn, Sweden 22 nd 26 th March 1999
TSG-RAN Working Group 1 (Radio Layer 1) meeting #3 Nynashamn, Sweden 22 nd 26 th March 1999 RAN TSGW1#3(99)196 Agenda Item: 9.1 Source: Title: Document for: Motorola Macro-diversity for the PRACH Discussion/Decision
COMPUTATION OF CENTILES AND Z-SCORES FOR HEIGHT-FOR-AGE, WEIGHT-FOR-AGE AND BMI-FOR-AGE
COMPUTATION OF CENTILES AND Z-SCORES FOR HEIGHT-FOR-AGE, WEIGHT-FOR-AGE AND BMI-FOR-AGE The method used to construct the 2007 WHO references relied on GAMLSS with the Box-Cox power exponential distribution (Rigby
Contrarian insider trading and earnings management around seasoned equity offerings; SEOs
Journal of Finance and Accountancy Contrarian insider trading and earnings management around seasoned equity offerings; SEOs ABSTRACT Loretta Baryeh Towson University This study attempts to resolve the differences in
4. International Parity Conditions
4. International Parity Conditions 4.1 Purchasing Power Parity The Purchasing Power Parity (PPP) theory is one of the early theories of exchange rate determination. This theory is based on the concept that the demand for a country's currency
MALAYSIAN FOREIGN DIRECT INVESTMENT AND GROWTH: DOES STABILITY MATTER? Jarita Duasa 1
Journal of Economic Cooperation, 8, (007), 83-98 MALAYSIAN FOREIGN DIRECT INVESTMENT AND GROWTH: DOES STABILITY MATTER? Jarita Duasa 1 The objective of the paper is twofold. First, is to examine causal relationship
Model-Based Monitoring in Large-Scale Distributed Systems
Model-Based Monitoring in Large-Scale Distributed Systems Diploma Thesis Carsten Reimann Chemnitz University of Technology Faculty of Computer Science Operating System Group Advisors: Prof. Dr. Winfried Kalfa Dr.
Term Structure of Prices of Asian Options
Term Structure of Prices of Asian Options Jirô Akahori, Tsutomu Mikami, Kenji Yasutomi and Teruo Yokota Dept. of Mathematical Sciences, Ritsumeikan University 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan E-mail:
A Re-examination of the Joint Mortality Functions
North American Actuarial Journal Volume 6, Number 1, p.166-170 (2002) A Re-examination of the Joint Mortality Functions Abstract. Heekyung Youn, Arkady Shemyakin, Edwin Herman University of St. Thomas, Saint Paul, MN, USA Mortality
Random Walk in 1-D. 3 possible paths x vs n. -5 For our random walk, we assume the probabilities p,q do not depend on time (n) - stationary
Random Walk in 1-D Random walks appear in many contexts: diffusion is a random walk process understanding buffering, waiting times, queuing more generally the theory of stochastic processes gambling choosing the best
Supplementary Appendix for Depression Babies: Do Macroeconomic Experiences Affect Risk-Taking?
Supplementary Appendix for Depression Babies: Do Macroeconomic Experiences Affect Risk-Taking? Ulrike Malmendier UC Berkeley and NBER Stefan Nagel Stanford University and NBER September 2009 A. Details on SCF
Forecasting. Including an Introduction to Forecasting using the SAP R/3 System
Forecasting Including an Introduction to Forecasting using the SAP R/3 System by James D. Blocher Vincent A. Mabert Ashok K. Soni Munirpallam A. Venkataramanan Indiana University Kelley School of Business February
Risk Modelling of Collateralised Lending
Risk Modelling of Collateralised Lending Date: 4-11-2008 Number: 8/18 Introduction This note explains how it is possible to handle collateralised lending within Risk Controller. The approach draws on the facilities
A Natural Feature-Based 3D Object Tracking Method for Wearable Augmented Reality
A Natural Feature-Based 3D Object Tracking Method for Wearable Augmented Reality Takashi Okuma Columbia University / AIST Email: [email protected] Takeshi Kurata University of Washington / AIST Email: [email protected]
Stock Trading with Recurrent Reinforcement Learning (RRL) CS229 Application Project Gabriel Molina, SUID 5055783
Stock Trading with Recurrent Reinforcement Learning (RRL) CS229 Application Project Gabriel Molina, SUID 5055783 I. INTRODUCTION One relatively new approach to financial trading is to use machine learning algorithms to predict
Stochastic Optimal Control Problem for Life Insurance
Stochastic Optimal Control Problem for Life Insurance S. Batsukh 1, D. Nyamsuren 2 1 Department of Economics and Econometrics, Institute of Finance and Economics, Ulaanbaatar, Mongolia 2 School of Mathematics, Mongolian
WATER MIST FIRE PROTECTION RELIABILITY ANALYSIS
WATER MIST FIRE PROTECTION RELIABILITY ANALYSIS Shuzhen Xu Research Risk and Reliability Area FM Global Norwood, Massachusetts 262, USA David Fuller Engineering Standards FM Global Norwood, Massachusetts 262,
Performance Center Overview. Performance Center Overview 1
Performance Center Overview Performance Center Overview 1 ODJFS Performance Center ce Center New Performance Center Model Performance Center Project Meetings Performance Center Executive Meetings Performance Center
Optimal Stock Selling/Buying Strategy with reference to the Ultimate Average
Optimal Stock Selling/Buying Strategy with reference to the Ultimate Average Min Dai Dept of Math, National University of Singapore, Singapore Yifei Zhong Dept of Math, National University of Singapore, Singapore July
LIFE INSURANCE WITH STOCHASTIC INTEREST RATE. L. Noviyanti a, M. Syamsuddin b
LIFE INSURANCE WITH STOCHASTIC INTEREST RATE L. Noviyanti a, M. Syamsuddin b a Department of Statistics, Universitas Padjadjaran, Bandung, Indonesia b Department of Mathematics, Institut Teknologi Bandung, Indonesia Abstract.
Feasibility of Quantum Genetic Algorithm in Optimizing Construction Scheduling
Feasibility of Quantum Genetic Algorithm in Optimizing Construction Scheduling Master Thesis Baihui Song JUNE 2013 Committee members: Prof.dr.ir. M.J.C.M. Hertogh Dr. M. Blaauboer Dr. ir. H.K.M. van de Ruitenbeek
Market Analysis and Models of Investment. Product Development and Whole Life Cycle Costing
The University of Liverpool School of Architecture and Building Engineering WINDS PROJECT COURSE SYNTHESIS SECTION 3 UNIT 11 Market Analysis and Models of Investment. Product Development and Whole Life Cycle Costing
Purchasing Power Parity (PPP), Sweden before and after EURO times
School of Economics and Management Purchasing Power Parity (PPP), Sweden before and after EURO times - Unit Root Test - Cointegration Test Masters thesis in Statistics - Spring 2008 Authors: Mansoor, Rashid Smora, Ami
