IMAGE PATCH ANALYSIS AND CLUSTERING OF SUNSPOTS: A DIMENSIONALITY REDUCTION APPROACH
Kevin R. Moon (1), Jimmy J. Li (1), Véronique Delouille (2), Fraser Watson (3), Alfred O. Hero III (1)
(1) EECS Dept., University of Michigan; (2) SIDC, Royal Observatory of Belgium; (3) National Solar Observatory, USA
Introduction
• Sunspots are associated with active regions (areas of locally increased magnetic flux on the Sun)
• Sunspot and active region morphology is correlated with solar flares, i.e. sudden increases of the photon flux
  • Large flares can be disruptive to technology on Earth
  • Prediction is desirable
• The current sunspot classification scheme is the Mt. Wilson scheme
  • Based on global features identified by eye
  • Suffers from bias
• Recent work has focused on supervised classification techniques
  • Reduces human bias, but still potentially suboptimal
• Goal: build a spatially adaptive descriptive model of the sunspot and active region image modalities for predictive modeling
  • i.e. perform unsupervised classification on the images
  • Use both global and local features
Clustering
• Evidence Accumulating Clustering with MST (EAC-DC) [5]
  • Forms a metric from the hitting time of two minimal spanning trees grown sequentially from a pair of points
  • Apply spectral clustering to the resulting dissimilarity matrix
  • Found to be robust and competitive
1. Learn a dictionary from each image based on information about the intrinsic dimension and spatial and modal correlations
2. Cluster the dictionaries using EAC-DC
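The spectral clustering step can be illustrated with a minimal sketch. This is not EAC-DC: the dual-rooted tree metric of [5] is not reproduced, and the function simply takes any symmetric dissimilarity matrix as input. The function name `spectral_bipartition`, the Gaussian affinity, the median kernel width, and the restriction to two clusters are all assumptions made for illustration.

```python
import numpy as np

def spectral_bipartition(D, sigma=None):
    """Split items into two groups from a symmetric dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    if sigma is None:
        # Heuristic kernel width (an assumption): median off-diagonal dissimilarity
        sigma = np.median(D[~np.eye(n, dtype=bool)])
    # Turn dissimilarities into affinities with a Gaussian kernel
    W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-affinity
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Eigenvector of the second-smallest eigenvalue, rescaled by D^{-1/2}
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    # The sign pattern gives the two-way partition
    return (fiedler > 0).astype(int)
```

For more than two clusters, the usual extension is to run k-means on several leading eigenvectors instead of thresholding a single one.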
Results
Intrinsic Dimension Estimation Results
• Current contributions: a local feature based analysis
• We answer the following questions:
  1. How many intrinsic parameters are required to describe spatial and modal dependencies?
  2. What dependencies exist between the two image modalities and are they captured by linear correlation?
  3. What phenomena exist at different scales within the images?
• We use this information to cluster the images
• Used 3 × 3 patches
• PCA results are within 1 standard deviation of the $k$-nn results (except for the umbra in the single sunspot)
• Linear decomposition methods are sufficient
• Sunspots and magnetic fragments (regions outside of the sunspot with magnetic activity) exhibit stronger spatial and modal dependencies ($\hat{m} \le 7$)
• Multiresolution analysis also applied (see paper)

                      Background   Penumbra   Umbra
Single k-nn              8.9          4.5       3.4
Single PCA (97%)        10.1          4.3       6.3
Multiple k-nn            8.6          4.8       4.0
Multiple PCA (97%)       8.9          4.8       3.4

Figure 4: Estimated local intrinsic dimension of
Data
CCA Results
Figure 2: Examples of continuum (top) and magnetogram (bottom) images
Methods
• Image patches are used to extract local features
• Extract patches from both modalities at each pixel into a data matrix (see Figure 3)
• With $n$ image pixels and an $m \times m$ patch size, the data matrix is $2m^2 \times n$
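The patch-to-matrix mapping can be sketched as follows. The function name `patch_data_matrix` is hypothetical, and the handling of image borders is an assumption: only pixels with a complete $m \times m$ neighborhood are kept, so the number of columns is slightly smaller than the pixel count $n$.

```python
import numpy as np

def patch_data_matrix(continuum, magnetogram, m=3):
    """Stack m x m patches from both modalities into a (2*m*m, n_patches) matrix."""
    if continuum.shape != magnetogram.shape:
        raise ValueError("both modalities must share the same pixel grid")
    blocks = []
    for image in (continuum, magnetogram):
        # windows has shape (H-m+1, W-m+1, m, m): one patch per valid position
        windows = np.lib.stride_tricks.sliding_window_view(image, (m, m))
        # Flatten each patch to length m*m, then put one patch per column
        blocks.append(windows.reshape(-1, m * m).T)
    # Rows 0..m*m-1 hold the continuum patch, the remaining rows the magnetogram patch
    return np.vstack(blocks)
```

Each column then describes one pixel neighborhood in both modalities at once, which is the form the dimension estimation and correlation analyses operate on.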
Intrinsic dimension estimation
example images
• $\rho_1 \ge 0.9$ within the sunspot using 3 × 3 patches
• Magnetic fragments and edge regions have the highest correlation
• Magnetic fragments have high positive or negative correlations
• ⇒ Processing should include both modalities
• Two image modalities [1]
  • Continuum (white light)
  • Magnetogram (magnetic field strength)
• Masks mark penumbra and umbra [2]
• Sunspot group information [3], e.g. Mt. Wilson label, longitudinal extent, etc.
Figure 1: Masks for the images in Fig. 2
Table 1: Avg. est. intrinsic dimension for 20 images with single sunspots and multiple sunspots using both methods.
Figure 5: Canonical variable images using different patch sizes of the single sunspot image
Clustering Results
• Image dictionaries learned using linear methods (PCA shown)
• Some clusters are correlated with sunspot group longitudinal extent
  • Cluster 5 only includes groups with longitudinal extent ≤ 10°
  • Cluster 2 has many groups with extent between 2° and 4°
• Multidimensional scaling (MDS) used to visualize the results
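The MDS visualization step can be sketched as follows. The poster does not say which MDS variant was used, so classical (Torgerson) MDS is assumed here, and the function name `classical_mds` is hypothetical.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed items so that Euclidean distances approximate the dissimilarities in D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean dissimilarities) are clipped to zero
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

The first two columns of the returned array are the coordinates one would scatter-plot, colored by cluster assignment or by label.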
Figure 3: Mapping of image patches to a single vector
• Used to answer question 1
• Linear method (PCA)
  • The estimate is the number of eigenvalues of the covariance matrix required to account for a certain percentage of the variance (e.g. 97%)
• Non-linear method ($k$-nn local dimension estimator) [4]
  • Intuition: the $k$-nn graph approximates the shape of the data manifold
  1. Construct the $k$-nn graph of the set of $n$ points $\boldsymbol{Z}_n$
  2. Calculate the total edge length: $L_{\gamma,k}(\boldsymbol{Z}_n) = \sum_{i=1}^{n} \sum_{z \in N_{k,i}} \| z - z_i \|^{\gamma}$
     • $\gamma > 0$ and $N_{k,i}$ is the $k$-nn neighborhood of $z_i$
  3. Then for large $n$, $L_{\gamma,k}(\boldsymbol{Z}_n) = n^{\alpha(m)} c + \epsilon_n$
     • $\epsilon_n \to 0$ as $n \to \infty$, $\alpha(m) = \frac{m - \gamma}{m}$, $m$ is the intrinsic dimension, $c$ a constant
  4. Use non-linear least squares over different values of $n$ to obtain the estimate $\hat{m}$
  5. Estimate local intrinsic dimension by using smaller neighborhoods
     • Useful when data lie on different manifolds
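Both estimators can be sketched in a few lines. This is an illustration under stated assumptions, not the estimator of [4]: the function names are hypothetical, samples are stored one per row, the $k$-nn search is brute force, and the non-linear least-squares fit is replaced by a simpler straight-line fit of $\log L$ against $\log n$, whose slope approximates $\alpha(m)$.

```python
import numpy as np

def pca_dimension(Z, variance=0.97):
    """Number of covariance eigenvalues needed to explain `variance` of the total."""
    eigvals = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]  # descending
    eigvals = np.clip(eigvals, 0.0, None)
    explained = np.cumsum(eigvals) / eigvals.sum()
    # First index whose cumulative share reaches the threshold, as a count
    return int(np.searchsorted(explained, variance) + 1)

def knn_total_length(Z, k=5, gamma=1.0):
    """L_{gamma,k}(Z_n): sum of gamma-powered distances to each point's k nearest neighbors."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0, None)
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    nearest = np.partition(d2, k - 1, axis=1)[:, :k]  # k smallest squared distances per row
    return float(np.sum(nearest ** (gamma / 2.0)))

def knn_dimension(Z, k=5, gamma=1.0, seed=0):
    """Estimate m from the growth of L with n via a log-log fit (a simplification)."""
    rng = np.random.default_rng(seed)
    sizes = [len(Z) // f for f in (8, 4, 2, 1)]       # subsample sizes (an assumption)
    log_n, log_L = [], []
    for n in sizes:
        idx = rng.choice(len(Z), size=n, replace=False)
        log_n.append(np.log(n))
        log_L.append(np.log(knn_total_length(Z[idx], k, gamma)))
    alpha, _ = np.polyfit(log_n, log_L, 1)            # slope approximates alpha(m)
    return gamma / (1.0 - alpha)                      # invert alpha = (m - gamma) / m
```

A local estimate, as in step 5, would apply `knn_dimension` to the neighborhood of each point rather than to the whole data set.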
Canonical Correlation Analysis (CCA)
• Used to answer questions 2 and 3
• Finds vectors $a_i$ and $b_i$ that maximize $\rho_i = \mathrm{corr}(a_i^T \boldsymbol{x}, b_i^T \boldsymbol{y})$
• $u_i = a_i^T \boldsymbol{x}$, $v_i = b_i^T \boldsymbol{y}$ are uncorrelated with all other $u_j$ and $v_j$
• Apply with $\boldsymbol{x}$ = continuum patch and $\boldsymbol{y}$ = magnetogram patch
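CCA can be computed by whitening both blocks and taking a singular value decomposition of the whitened cross-covariance. The sketch below takes that standard route. The function name `cca`, the samples-in-rows layout, and the small ridge term `reg` (added for numerical stability) are assumptions and are not taken from the poster.

```python
import numpy as np

def cca(X, Y, reg=1e-8):
    """X: (n, p) and Y: (n, q), samples in rows. Returns (rho, A, B)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors used for whitening: Sxx = Lx Lx^T, Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Whitened cross-covariance M = Lx^{-1} Sxy Ly^{-T}
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, rho, Vt = np.linalg.svd(M, full_matrices=False)
    # Map singular vectors back to canonical vectors: a_i = Lx^{-T} u_i, b_i = Ly^{-T} v_i
    A = np.linalg.solve(Lx.T, U)
    B = np.linalg.solve(Ly.T, Vt.T)
    return rho, A, B
```

With a patch data matrix of shape $2m^2 \times n$, `X` would be the transpose of the first $m^2$ rows (continuum) and `Y` the transpose of the remaining rows (magnetogram). The singular values `rho` are the canonical correlations in decreasing order.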
Figure 6: Scatter plot of the first two MDS projections. Colors indicate the cluster assignment (left) and Mt. Wilson label (right)
• 3 clearly separable regions are visible
• Comparison with Mt. Wilson labels: normalized mutual information (NMI) = 0.1, adjusted Rand index (ARI) = 0.07
  • The low agreement is consistent with the global approach of Mt. Wilson vs. our local approach
  • Small groupings of Mt. Wilson labels
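The two agreement scores can be computed directly from the label lists, as in the minimal sketch below. The function names are hypothetical. NMI has several normalizations in use and the poster does not say which one was applied, so the geometric mean of the two entropies is assumed here.

```python
from collections import Counter
from math import comb, log, sqrt

def nmi(labels_a, labels_b):
    """Mutual information normalized by the geometric mean of the entropies."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * log(c * n / (pa[a] * pb[b])) for (a, b), c in joint.items())
    ha = -sum(c / n * log(c / n) for c in pa.values())
    hb = -sum(c / n * log(c / n) for c in pb.values())
    # Convention chosen here: 0.0 when either labeling has a single class
    return mi / sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

def ari(labels_a, labels_b):
    """Adjusted Rand index: pair-counting agreement corrected for chance."""
    n = len(labels_a)
    sum_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one class
    return (sum_joint - expected) / (max_index - expected)
```

Both scores are invariant to renaming the labels. Values near 0 indicate little agreement between the two partitions and 1 indicates identical partitions.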
Conclusion
• There are strong spatial and modal correlations within sunspots
• Linear methods are sufficient to capture these correlations
• Magnetic fragments and transition regions are the most coupled
• Image patch dictionary clustering results in clearly separable regions
Acknowledgments
This work was partially supported by NSF grant CCF-1217880 and an NSF Graduate Research Fellowship to the first author under Grant No. F031543.
References
[1] P.H. Scherrer et al., “The Solar Oscillations Investigation - Michelson Doppler Imager,” Solar Physics, vol. 162, pp. 129-188, Dec. 1995.
[2] F.T. Watson, L. Fletcher, and S. Marshall, “Evolution of sunspot properties during solar cycle 23,” Astronomy & Astrophysics, vol. 533, p. A14, Sept. 2011.
[3] http://www.swpc.noaa.gov/ftpdir/forecasts/SRS
[4] K.M. Carter, R. Raich, and A.O. Hero III, “On local intrinsic dimension estimation and its applications,” IEEE Trans. on Signal Processing, vol. 58, no. 2, pp. 650-663, 2010.
[5] L. Galluccio, O. Michel, P. Comon, M. Kliger, and A.O. Hero III, “Clustering with a new distance measure based on a dual-rooted tree,” Information Sciences, vol. 251, pp. 96-113, 2013.