Hierarchic Clustering of 3D Galaxy Distributions

Topics:
• Data
• Hierarchic clustering
• Ultrametric topology
• P-adic algebra
• Practical interest
• Testing for ultrametricity
• Lerman’s H-classifiability
• Conclusion and critique

Data
• Sloan Digital Sky Survey (SDSS) data
• Fields: RA, Dec, redshift value, reliability indicator
• 345109 galaxies with right ascension, declination and photometric redshift
• In this work we used the low-RA area near the galactic plane.

Hierarchic Clustering

[Figure: Labeled, ranked dendrogram on 8 terminal nodes, x1 to x8, with internal nodes at ranks 1 to 7. Branches labeled 0 and 1.]

Hierarchic Clustering: Metric ⇒ Ultrametric
• Hierarchical agglomeration on n observation vectors, i ∈ I.
• A series of 1, 2, …, n − 1 pairwise agglomerations of observations or clusters.
• A hierarchy is H = {q | q ∈ 2^I} such that (i) I ∈ H, (ii) i ∈ H ∀i, and (iii) for each q ∈ H, q′ ∈ H: q ∩ q′ ≠ ∅ ⇒ q ⊂ q′ or q′ ⊂ q.
• An indexed hierarchy is the pair (H, ν), where the non-negative function ν : H → ℝ⁺ satisfies (i) ν(i) = 0 if i ∈ H is a singleton, and (ii) q ⊂ q′ ⇒ ν(q) < ν(q′). The function ν is the agglomeration level.
• Take q, q′ ∈ H, let q ⊂ q″ and q′ ⊂ q″, and let q″ be the lowest-level cluster for which this is true. If we define D(q, q′) = ν(q″), then D is an ultrametric.
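As a minimal sketch of this construction, the indexed hierarchy (H, ν) can be stored as a map from clusters to levels; conditions (i) to (iii) and the monotonicity of ν can then be checked directly, and D read off as the level of the lowest cluster containing both units. The example hierarchy is an assumption for illustration: it is the 8-leaf ranked dendrogram of these slides, with its structure read off from the p-adic codes and each cluster's rank used as its level ν. The names `LEVELS`, `is_indexed_hierarchy` and `D` are hypothetical.

```python
from itertools import combinations

# Illustrative indexed hierarchy (H, nu) on I = {1, ..., 8} (an assumption):
# singletons sit at level 0, each merged cluster at its rank in the dendrogram.
LEVELS = {frozenset({i}): 0 for i in range(1, 9)}
LEVELS.update({
    frozenset({1, 2}): 1,
    frozenset({1, 2, 3}): 2,
    frozenset({4, 5}): 3,
    frozenset({4, 5, 6}): 4,
    frozenset({1, 2, 3, 4, 5, 6}): 5,
    frozenset({7, 8}): 6,
    frozenset(range(1, 9)): 7,
})


def is_indexed_hierarchy(levels, universe):
    """Check conditions (i)-(iii) on H and the two conditions on nu."""
    universe = frozenset(universe)
    if universe not in levels:                                   # (i) I is in H
        return False
    if any(frozenset({i}) not in levels for i in universe):      # (ii) singletons in H
        return False
    if any(nu != 0 for q, nu in levels.items() if len(q) == 1):  # nu(i) = 0
        return False
    for q, r in combinations(levels, 2):
        if q & r and not (q <= r or r <= q):                     # (iii) nested or disjoint
            return False
        if q < r and not levels[q] < levels[r]:                  # q inside r => nu(q) < nu(r)
            return False
        if r < q and not levels[r] < levels[q]:
            return False
    return True


def D(levels, i, j):
    """nu of the lowest-level cluster containing both i and j."""
    return min(nu for q, nu in levels.items() if i in q and j in q)


print(is_indexed_hierarchy(LEVELS, range(1, 9)))  # True
print(D(LEVELS, 5, 6), D(LEVELS, 3, 8))           # 4 7
```

Since the clusters containing both i and j form a chain with strictly increasing ν, taking the minimum level picks out the lowest such cluster q″.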

Ultrametric Spaces and Properties
• Let (E, d) be a metric space, i.e. a set E and a non-negative function d : E × E → ℝ⁺ satisfying
1. d(x, y) = d(y, x)
2. d(x, y) = 0 iff x = y
3. d(x, z) ≤ d(x, y) + d(y, z)
A space is ultrametric if, in addition, d(x, z) ≤ max(d(x, y), d(y, z)).
• A metric space (E, d) is ultrametric iff all its triangles are isosceles, with the length of the base less than or equal to that of the two equal sides.
• Every point of a ball in E is a center of that ball. Each ball in an ultrametric space is both open and closed.
• Two non-disjoint balls are concentric: one contains the other.
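The isosceles property gives a direct test of the ultrametric inequality: for every triple it suffices to check that the two longest sides are equal. This is a minimal sketch assuming the distances come as a symmetric matrix (list of lists); `is_ultrametric` and its `tol` parameter for floating-point data are hypothetical names.

```python
from itertools import combinations


def is_ultrametric(d, tol=0.0):
    """True if d(x, z) <= max(d(x, y), d(y, z)) for all x, y, z.

    d is a symmetric distance matrix. The inequality holds for every ordering
    of a triangle exactly when its two longest sides are equal (up to tol),
    i.e. the triangle is isosceles with a base no longer than its sides.
    """
    for x, y, z in combinations(range(len(d)), 3):
        _, side, longest = sorted((d[x][y], d[y][z], d[x][z]))
        if longest - side > tol:
            return False
    return True


# Distances read off a dendrogram: ultrametric.
print(is_ultrametric([[0, 1, 3], [1, 0, 3], [3, 3, 0]]))  # True
# Euclidean distances between points 0, 1 and 3 on a line: not ultrametric.
print(is_ultrametric([[0, 1, 3], [1, 0, 2], [3, 2, 0]]))  # False
```

Triples with a repeated point need no check, since d(x, z) ≤ max(0, d(x, z)) always holds.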

P-adic Coding
• For the labeled, ranked dendrogram on 8 terminal nodes shown earlier, we develop the following p-adic encoding of the terminal nodes for p = 2, traversing a path from the root:
• x1 = 0 · 2⁷ + 0 · 2⁵ + 0 · 2² + 0 · 2¹;
• x2 = 0 · 2⁷ + 0 · 2⁵ + 0 · 2² + 1 · 2¹;
• x4 = 0 · 2⁷ + 1 · 2⁵ + 0 · 2⁴ + 0 · 2³;
• x6 = 0 · 2⁷ + 1 · 2⁵ + 1 · 2⁴.
• The decimal equivalents of this p-adic representation of the terminal nodes work out as x1, x2, …, x8 = 0, 2, 4, 32, 40, 48, 128, 192.
• A p-adic encoding for xi is given by
$$x_i = \sum_{k=1}^{n-1} a_k p^k, \qquad a_k \in \{0, 1\}, \quad p^k = 2^k.$$
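The encoding can be sketched in a few lines. The root-to-leaf paths below, written as (rank k, branch a_k) pairs, are an assumption: they follow the codes listed above, with the terms for x3, x5, x7 and x8 filled in so that they reproduce the stated decimal values. Because ranks decrease along every root-to-leaf path, two leaves share all bits above the rank of their lowest common cluster and differ at that rank, so the highest differing bit of their codes recovers the ultrametric D defined on the Metric ⇒ Ultrametric slide. `PATHS`, `padic_code` and `merge_rank` are hypothetical names.

```python
# Root-to-leaf paths in the 8-leaf ranked dendrogram as (rank k, branch a_k) pairs.
# Reconstructed from the slide's codes and decimal values (an assumption).
PATHS = {
    "x1": [(7, 0), (5, 0), (2, 0), (1, 0)],
    "x2": [(7, 0), (5, 0), (2, 0), (1, 1)],
    "x3": [(7, 0), (5, 0), (2, 1)],
    "x4": [(7, 0), (5, 1), (4, 0), (3, 0)],
    "x5": [(7, 0), (5, 1), (4, 0), (3, 1)],
    "x6": [(7, 0), (5, 1), (4, 1)],
    "x7": [(7, 1), (6, 0)],
    "x8": [(7, 1), (6, 1)],
}


def padic_code(path, p=2):
    """x_i = sum of a_k * p**k over the (k, a_k) pairs on the path."""
    return sum(a * p ** k for k, a in path)


def merge_rank(code_i, code_j):
    """Rank of the lowest cluster containing both leaves: the highest bit
    at which their base-2 codes differ (0 for a leaf and itself)."""
    return (code_i ^ code_j).bit_length() - 1 if code_i != code_j else 0


codes = {name: padic_code(path) for name, path in PATHS.items()}
print(list(codes.values()))                   # [0, 2, 4, 32, 40, 48, 128, 192]
print(merge_rank(codes["x5"], codes["x6"]))   # 4
```

Note that this uses the highest differing bit, not the 2-adic valuation of the difference: x5 and x6 differ by 8 = 2³, yet they first share a cluster at rank 4.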

P-adic (Algebraic) = Ultrametric (Topology)
• Various terms are used interchangeably for analysis in and over such fields: p-adic, ultrametric, non-Archimedean, and isosceles.
• The natural geometric ordering of metric valuations is on the real line, whereas in the ultrametric case the natural ordering is a hierarchical tree.
• Ostrowski’s theorem: each non-trivial valuation on the field of the rational numbers is equivalent either to the absolute value function or to some p-adic valuation.
• Alternatively: up to equivalence, the only non-trivial norms on the rationals are the p-adic norms (p prime) and the usual norm given by the absolute value.
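To make the p-adic side of Ostrowski’s theorem concrete, here is a minimal sketch of the p-adic norm on the rationals, |x|_p = p^(−v), where v is the exponent of the prime p in x, together with a brute-force check of the ultrametric inequality |x + y|_p ≤ max(|x|_p, |y|_p) on a small grid of rationals. It uses Python’s standard `fractions.Fraction`; `p_valuation` and `padic_norm` are hypothetical names.

```python
from fractions import Fraction


def p_valuation(n, p):
    """Exponent of the prime p in the non-zero integer n."""
    n, v = abs(n), 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def padic_norm(x, p):
    """|x|_p = p**(-v) for x = p**v * a/b with a, b coprime to p; |0|_p = 0."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v = p_valuation(x.numerator, p) - p_valuation(x.denominator, p)
    return Fraction(p) ** (-v)


print(padic_norm(12, 2))               # 1/4
print(padic_norm(Fraction(1, 12), 2))  # 4

# Strong triangle inequality on a small grid of rationals.
grid = [Fraction(a, b) for a in range(-6, 7) for b in range(1, 7)]
print(all(padic_norm(x + y, 2) <= max(padic_norm(x, 2), padic_norm(y, 2))
          for x in grid for y in grid))  # True
```

Numbers divisible by high powers of p are p-adically small, which is the reverse of the ordering given by the usual absolute value.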

Practical Interest of Ultrametricity
• Hierarchies arise naturally in language syntax and, it has been claimed, in financial markets.
• Rammal et al.: ultrametricity is a natural property of high-dimensional spaces; it emerges as a consequence of randomness and of the law of large numbers.
• Again Rammal et al., and recent work of ours: sparsely coded data tend to be ultrametric. Examples include complete disjunctive coding in correspondence analysis, and categorical data coding in genomics and proteomics, speech, and other fields.
• Ultrametricity is considered to hold at very small (Planck) length scales and in superstrings (Brekke and Freund, Phys. Rep., 233, 1–66, 1993).
• It is also held to be valid for optimization spaces.

Testing for Ultrametricity
• Rammal et al.: determine the subdominant ultrametric (i.e. single-link hierarchic clustering).
• Interesting phase-space effects appear as dimensionality increases.
• However, the subdominant ultrametric gives rise to pathologies.
• E.g. the “friends of friends” chaining effect: if d(x, y) ≤ r₀ and d(y, z) ≤ r₀, single link joins x and z at a level of at most r₀, although d(x, z) can be as large as 2r₀ − ε for arbitrarily small ε > 0. Hence d(x, z) can be anomalously large relative to the ultrametric.
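The subdominant ultrametric can be computed as a minimax path closure: u(i, j) is the smallest achievable value of the largest step on any path from i to j, which coincides with the single-link (cophenetic) ultrametric. The sketch below, a Floyd–Warshall style loop over a distance matrix, is one way to compute it and not necessarily Rammal et al.’s procedure; `subdominant_ultrametric` is a hypothetical name. The example reproduces the chaining effect with r₀ = 10, using three points at 0, 10 and 19 on a line.

```python
def subdominant_ultrametric(d):
    """Minimax (bottleneck) path closure of a symmetric distance matrix d."""
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Going via k costs the larger of the two bottlenecks.
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


# Points x, y, z at 0, 10 and 19 on a line: d(x, y) = 10 = r0, d(y, z) = 9.
d = [[0, 10, 19],
     [10, 0, 9],
     [19, 9, 0]]
u = subdominant_ultrametric(d)
print(d[0][2], u[0][2])  # 19 10: x and z join at r0 although d(x, z) = 2*r0 - 1
```

The result never exceeds the original distances and is the largest ultrametric with that property, which is why it is called subdominant.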

Lerman’s H-classifiability
• A basic unifying framework for pairs of objects, and the distance valuation on them, is that of a binary relation.
• On a set E, a binary relation is a preorder if it is reflexive and transitive;
• it is an equivalence relation if it is reflexive, transitive, and symmetric;
• and it is an order if it is reflexive, transitive, and antisymmetric.
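These definitions can be checked mechanically on a finite set. In this minimal sketch a relation is a set of ordered pairs (a, b) meaning “a R b”, and `classify` is a hypothetical name. The last example mimics a distance-induced preorder: comparing units by a key with ties gives a relation that is reflexive and transitive but neither symmetric nor antisymmetric.

```python
from itertools import product


def classify(elements, rel):
    """Report which of the three structures the relation rel (a set of pairs) has."""
    reflexive = all((a, a) in rel for a in elements)
    transitive = all((a, d) in rel for (a, b), (c, d) in product(rel, rel) if b == c)
    symmetric = all((b, a) in rel for a, b in rel)
    antisymmetric = all(a == b for a, b in rel if (b, a) in rel)
    if not (reflexive and transitive):
        return "not a preorder"
    if symmetric:
        return "equivalence relation"
    if antisymmetric:
        return "order"
    return "preorder"


E = [1, 2, 3, 4]
print(classify(E, {(a, b) for a, b in product(E, E) if a <= b}))            # order
print(classify(E, {(a, b) for a, b in product(E, E) if a % 2 == b % 2}))    # equivalence relation
print(classify(E, {(a, b) for a, b in product(E, E) if a // 2 <= b // 2}))  # preorder
```

Equality satisfies all four properties and is both an equivalence and an order; this sketch checks symmetry first and so reports it as an equivalence relation.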

Lerman’s H-classifiability
• Let F denote the set of pairs of distinct units in E. A distance defines a total preorder on F: for all (x, y), (z, t) ∈ F, (x, y) ≤ (z, t) ⇔ d(x, y) ≤ d(z, t).
• A preorder is called ultrametric if ∀x, y, z ∈ E: ρ(x, y) ≤ r and ρ(y, z) ≤ r ⇒ ρ(x, z) ≤ r, for every integer r, where ρ(x, y) denotes the rank of the pair (x, y) in the preorder ω̄.
• A necessary and sufficient condition for a distance on E to be ultrametric is that the associated preorder (on E × E, or alternatively the preordonnance on E) is ultrametric.
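The rank formulation can be sketched directly: rank the pairs of distinct units by distance, giving tied distances the same rank (dense ranking, an assumption about how ties are handled), then test the implication for every integer r. Because only ranks are used, any strictly increasing transformation of the distances, such as squaring them, leaves the answer unchanged. `pair_ranks` and `is_ultrametric_preorder` are hypothetical names.

```python
from itertools import combinations, permutations


def pair_ranks(d):
    """Rank rho(x, y) of each pair of distinct units in the preorder induced by d."""
    pairs = list(combinations(range(len(d)), 2))
    levels = sorted({d[i][j] for i, j in pairs})
    rank_of = {value: r for r, value in enumerate(levels, start=1)}
    return {(i, j): rank_of[d[i][j]] for i, j in pairs}


def is_ultrametric_preorder(rho, n):
    """rho(x, y) <= r and rho(y, z) <= r imply rho(x, z) <= r, for all x, y, z and r."""
    def rank(a, b):
        return rho[(min(a, b), max(a, b))]

    for x, y, z in permutations(range(n), 3):
        for r in range(1, max(rho.values()) + 1):
            if rank(x, y) <= r and rank(y, z) <= r and rank(x, z) > r:
                return False
    return True


ultra = [[0, 1, 3], [1, 0, 3], [3, 3, 0]]
line = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]  # points 0, 1, 3 on a line
print(is_ultrametric_preorder(pair_ranks(ultra), 3))  # True
print(is_ultrametric_preorder(pair_ranks(line), 3))   # False
```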

Lerman’s H-classifiability
• We now define Lerman’s H-classifiability index, which measures how ultrametric a given metric is.
• Let M(x, y, z) be the median pair among {(x, y), (y, z), (x, z)}, and let S(x, y, z) be the highest-ranked pair of this triplet. J is the set of all such triplets of E.
• For the given preorder ω, the mapping τ sends each triplet in J to the open interval of pairs in F lying strictly between its median and highest pairs: τ : J → ]M(x, y, z), S(x, y, z)[.
• Given a triplet {x, y, z} for which (x, y) ≤ (y, z) ≤ (x, z) for the preorder ω, the interval ]M(x, y, z), S(x, y, z)[ is empty if ω is ultrametric. Relative to such a triplet, the preorder ω is “less ultrametric” the larger the cardinality of ]M(x, y, z), S(x, y, z)[, defined on ω.
• The index is
$$H(\omega) = \frac{\sum_{\{x,y,z\} \in J} \left|\,]M(x,y,z), S(x,y,z)[\,\right|}{(|F| - 3)\,|J|}.$$
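The index can be computed by brute force from the definition. This sketch assumes Euclidean distance on the input points, as in the experiments reported here, and counts |]M, S[| as the number of pairs whose distance is strictly greater than the median pair’s and strictly less than the highest pair’s, so tied pairs never fall inside the interval. `lerman_h` is a hypothetical name, not a published implementation.

```python
import bisect
import itertools
import math


def lerman_h(points):
    """Lerman's H-classifiability index of a point set, using Euclidean distance.

    H = sum over triplets of |]M, S[| / ((|F| - 3) * |J|), where F is the set of
    pairs of distinct points and J the set of triplets.
    """
    n = len(points)
    if n < 4:
        raise ValueError("need at least 4 points so that |F| - 3 > 0")
    dist = {(i, j): math.dist(points[i], points[j])
            for i, j in itertools.combinations(range(n), 2)}
    ordered = sorted(dist.values())  # the total preorder on F, as sorted distances
    total = 0
    n_triplets = 0
    for i, j, k in itertools.combinations(range(n), 3):
        _, d_median, d_highest = sorted((dist[i, j], dist[j, k], dist[i, k]))
        # Pairs strictly above the median pair M and strictly below the highest pair S;
        # the difference is negative when M and S tie, so clamp it at zero.
        between = bisect.bisect_left(ordered, d_highest) - bisect.bisect_right(ordered, d_median)
        total += max(0, between)
        n_triplets += 1
    return total / ((len(ordered) - 3) * n_triplets)


print(lerman_h([(0,), (1,), (3,), (7,)]))  # 0.08333333333333333, i.e. 1/12
```

For four points at 0, 1, 3 and 7 on a line, only the triplet {0, 3, 7} contributes (the pair at distance 6 lies strictly between 4 and 7), so H = 1/((6 − 3) · 4) = 1/12. The loop visits every triplet, so its cost grows as n³ log n; for about 500 galaxies that is some 2 × 10⁷ triplets, where a vectorized implementation would be preferable.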

Lerman’s H-classifiability
• Data sets that are “more classifiable” in an intuitive sense, i.e. that contain “sporadic islands” of denser regions of points, have a smaller value of H(ω). A prime example is Fisher’s iris data contrasted with 150 uniformly distributed values in ℝ⁴: for Fisher’s data we find H(ω) = 0.0899, whereas for 150 uniformly distributed points in a 4-dimensional hypercube we find H(ω) = 0.1835.
• Extensive tests have shown that uniform data give values of around 0.18–0.21, whereas more sparsely coded data give values of around 0.1–0.14.

Lerman’s H-classifiability
• We took 3D cylinders, defined by a tight radius around a position in RA and Dec, to limit the number of galaxies studied at any given time to around 500.
• We used data from the lower-left block of the Sloan data: low RA, near the galactic plane.
• We then used 3D uniformly distributed data to see how different the Lerman index would be.
• For Sloan data: H = 0.149837, 0.115096, 0.148676.
• For uniform data: H = 0.187662, 0.179590, 0.171903.
• Number of points in each case: 589, 554, 715.
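The slides do not give the selection code, so the following is only a sketch under stated assumptions: each galaxy is an (RA, Dec, redshift) triple used directly as a point in “Sloan space”; the cylinder radius is a flat Euclidean distance in degrees in the RA–Dec plane (ignoring the cos(Dec) compression of RA and the wrap-around at RA = 0°/360°); and the radius is chosen as the smallest one that captures a target number of galaxies. `select_cylinder` and `radius_for_count` are hypothetical names.

```python
import math


def radius_for_count(galaxies, ra0, dec0, target):
    """Smallest RA-Dec radius around (ra0, dec0) enclosing at least `target` galaxies."""
    dists = sorted(math.hypot(ra - ra0, dec - dec0) for ra, dec, _ in galaxies)
    return dists[min(target, len(dists)) - 1]


def select_cylinder(galaxies, ra0, dec0, radius):
    """Galaxies whose (RA, Dec) lies within `radius` of (ra0, dec0), at any redshift.

    In (RA, Dec, z) space this is a cylinder whose axis runs along redshift.
    """
    return [g for g in galaxies if math.hypot(g[0] - ra0, g[1] - dec0) <= radius]


# Toy (RA, Dec, z) values, for illustration only.
galaxies = [(10.0, 1.0, 0.10), (10.3, 1.4, 0.20), (13.0, 5.0, 0.10)]
r = radius_for_count(galaxies, 10.0, 1.0, target=2)
print(len(select_cylinder(galaxies, 10.0, 1.0, r)))  # 2
```

Selecting by great-circle angular separation instead would give a cone with its apex at the observer in physical space.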

Conclusions and Critique
• The Sloan data came out as more ultrametric in all cases, compared to uniformly distributed 3D values.
• However, a Euclidean distance was used for determining the Lerman index.
• Also, the cylindrical volume used in Sloan space may have biased the results (in view of the redshift value).
• Future work: replace the cylinder with a cone, and study a replacement for the Euclidean distance.

Hierarchic Clustering of 3D Galaxy Distributions - multiresolutions.com
