Nature Inspired Visualization of Unstructured Big Data

Aaditya Prakash

Motivation

• Unstructured data is ubiquitous and is assumed to account for around 80% of all data generated.1
• The lack of recognizable structure and the huge size make unstructured large datasets very challenging to work with.
• Classical visualization is not suited for big data: it is slow, memory hogging, and limited in dimensions.

Self Organizing Maps

• Unsupervised machine learning technique
• Provides dimension reduction
• Multivariate analysis
• Fast and low on memory (2D planar images)
• Reconstructing Self Organizing Maps as spider graphs for better visual interpretation

1 Unstructured Data and the 80 Percent Rule, Clarabridge Bridgepoints, 2008 Q3. http://clarabridge.com/default.aspx?tabid=137&ModuleID=635&ArticleID=551

Self Organizing Maps

• Artificial neural network proposed by Teuvo Kohonen1 that transforms the input dataset into a two-dimensional lattice
• Points in the input layer are mapped onto the 2D lattice, making each point potentially a neuron

Figure: Discriminant Function, where
x = point on the input layer
w = weight of the input point (x)
i = all the input points
j = all the neurons on the lattice
d = Euclidean distance
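The discriminant function itself was an image and is not reproduced in the text. As a minimal sketch, assuming the standard SOM form in which d_j(x) is the squared Euclidean distance between the input x and the weights w_j of neuron j, the winning neuron is the j that minimizes d_j(x). The function names and the small lattice below are illustrative and do not come from the slides.

```python
def discriminant(x, w_j):
    """Squared Euclidean distance between input x and one neuron's weights.

    Assumed form: d_j(x) = sum over i of (x_i - w_ji)^2.
    """
    return sum((x_i - w_ji) ** 2 for x_i, w_ji in zip(x, w_j))


def best_matching_unit(x, weights):
    """Return the index j of the neuron whose weights are closest to x."""
    return min(range(len(weights)), key=lambda j: discriminant(x, weights[j]))


# Hypothetical lattice of 3 neurons with 2-dimensional inputs.
weights = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]]
x = [0.9, 0.8]
winner = best_matching_unit(x, weights)
```

Here the second neuron (index 1) wins, because its weights lie closest to the input point.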

Figure: Kohonen Network

1 Kohonen, T.; "The self-organizing map," Proceedings of the IEEE, vol. 78, no. 9. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=58325&isnumber=21

Current Visualization of SOM

Fig: RapidMiner Tool (AGPL)

Fig: ‘R’, package ‘Kohonen’

Fig: ‘R’, package ‘SOM’

Shows the Kohonen Map obtained after training the Neurons

Intervariate plot of 4 frequent words in Spam

Regression of the same four words

Algorithm

1. Filter the results.
2. Make a polygon with as many sides as there are variables.
3. Set the radius of the polygon to the maximum value in the dataset.
4. Draw the grid for the polygon.
5. Make a segment inside the polygon if the strength between the two variables in the segment is greater than the specified threshold.
6. Repeat Step 5 for every variable against every other variable.
7. Color the segments based on the frequency of the variable.
8. Color the line segments based on the threshold of each variable pair plotted.
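The geometric core of the algorithm (the polygon, its radius, and the thresholded segments) can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's R implementation: the slides do not define how "strength" is measured, so the symmetric `strength` matrix, the sample `data`, and the threshold of 0.5 are all hypothetical. Filtering, grid drawing, and coloring are omitted.

```python
import math
from itertools import combinations


def polygon_vertices(n_vars, radius):
    """One vertex per variable, evenly spaced on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n_vars),
         radius * math.sin(2 * math.pi * k / n_vars))
        for k in range(n_vars)
    ]


def strong_pairs(strength, threshold):
    """Check every variable against every other; keep pairs above the threshold."""
    n = len(strength)
    return [(a, b) for a, b in combinations(range(n), 2)
            if strength[a][b] > threshold]


# Hypothetical dataset: 3 observations of 4 variables.
data = [[3, 0, 1, 2], [5, 1, 0, 4], [2, 2, 1, 1]]
radius = max(max(row) for row in data)  # radius = maximum value in the dataset
vertices = polygon_vertices(len(data[0]), radius)

# Hypothetical symmetric strength matrix (the measure is an assumption).
strength = [
    [1.0, 0.8, 0.1, 0.6],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.7],
    [0.6, 0.2, 0.7, 1.0],
]
pairs = strong_pairs(strength, 0.5)
# Each segment joins the polygon vertices of a sufficiently strong pair.
segments = [(vertices[a], vertices[b]) for a, b in pairs]
```

With these sample values, the variable pairs (0, 1), (0, 3), and (2, 3) pass the threshold, so three segments would be drawn inside the four-sided polygon.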

Spider Plots

SOM visualization in R using the algorithm given above (showing segments, i.e. inter-variable dependency)

SOM visualization in R using the algorithm given above (showing threads, i.e. inter-variable strength)

Big picture for Big Data

Conclusion

• Analyze inter-variate relations
• No need to convert unstructured data to structured data
• Advantages of machine learning and visualization in a single step
• Discover hidden relationships and potential mining opportunities

Scope

• Enhance to work with images, sound and videos
• Dynamic representation to show live changes

References

• Kohonen, T.; "The self-organizing map," Proceedings of the IEEE, vol. 78, no. 9, pp. 1464-1480, Sep 1990
• Teuvo Kohonen, Panu Somervuo, How to make large self-organizing maps for nonvectorial data, Neural Networks, Volume 15, Issues 8–9, October–November 2002
• Gail A. Carpenter, Stephen Grossberg, A massively parallel architecture for a self-organizing neural pattern recognition machine, Computer Vision, Graphics, and Image Processing, Volume 37, Issue 1, January 1987, Pages 54-115
• R. Wehrens and L.M.C. Buydens, Self- and Super-organising Maps in R: the kohonen package, J. Stat. Softw., 21(5), 2007
• Jun Yan, Self-Organizing Map (with application in gene clustering) in R
• Gordon V. Cormack. 2008. Email Spam Filtering: A Systematic Review. Found. Trends Inf. Retr. 1, 4 (April 2008)
• Anurat Chapanond, Mukkai S. Krishnamoorthy, and Bülent Yener. 2005. Graph Theoretic and Spectral Analysis of Enron Email
