Simulated Annealing Faults Tolerance & Recovery References

Measures of Fault Tolerance in Distributed Simulated Annealing

Aaditya Prakash Infosys Limited [email protected]

International Conference on Perspective of Computer Confluence with Sciences, 2012

Aaditya Prakash

Distributed Simulated Annealing


Outline

1 Simulated Annealing
    Boltzmann Equation
    Algorithm
    Distributed Simulated Annealing
2 Faults
    Design Faults
    Operational Faults
    Communication Faults
3 Tolerance & Recovery
    Tolerance
    Recovery


Probabilistic, meta-heuristic algorithm, similar to annealing in metallurgy.

$P(E) = e^{-E/kT}$, where $P(E)$ is the Energy Function, $T$ is the temperature, and $k$ is the Boltzmann constant.

The Energy Function takes a higher value at higher temperature.

Uses the Metropolis-Hastings algorithm to generate its sample space.
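To make the temperature dependence concrete, here is a minimal Python sketch that evaluates the Energy Function $P(E) = e^{-E/kT}$ at a few temperatures. The function name `boltzmann_probability` and the default `k = 1.0` are assumptions for illustration (setting k to 1 is a common simplification in optimization code), not something the slides specify.

```python
import math


def boltzmann_probability(energy: float, temperature: float, k: float = 1.0) -> float:
    """Return P(E) = exp(-E / (k * T)) for a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return math.exp(-energy / (k * temperature))


# For a fixed energy, P(E) grows towards 1 as the temperature rises.
for t in (100.0, 10.0, 1.0, 0.1):
    print(f"T = {t:>5}: P(E = 1) = {boltzmann_probability(1.0, t):.4f}")
```

Because the value lies between 0 and 1 for non-negative energies, it can be used directly as an acceptance probability in the annealing loop.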


Algorithm

Start with the system in a known configuration, at known energy E.
while T is high {
    Perturb the system slightly (go to a new location in the search space)
    Compute ∆E, the change in energy due to the perturbation
    if (∆E < 0) then
        accept this perturbation; this is the new system
    else
        accept this perturbation with probability equal to the Energy Function P(∆E)
}
Stop when equilibrium is reached or T is low.
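The loop above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the geometric cooling schedule, the parameter names (`t_high`, `t_low`, `cooling`, `steps_per_t`), the Boltzmann constant folded into the temperature, and the toy objective `(x - 3)^2` are all assumptions made for the example.

```python
import math
import random


def simulated_annealing(energy, neighbor, start, t_high=10.0, t_low=1e-3,
                        cooling=0.95, steps_per_t=50, seed=0):
    """Minimise `energy` from `start`; return the final (state, energy)."""
    rng = random.Random(seed)
    current, current_e = start, energy(start)
    t = t_high
    while t > t_low:                                  # "while T is high"
        for _ in range(steps_per_t):
            candidate = neighbor(current, rng)        # perturb slightly
            candidate_e = energy(candidate)
            delta_e = candidate_e - current_e         # change in energy
            # Improvements are always accepted; worse moves are accepted
            # with probability exp(-delta_e / T).
            if delta_e < 0 or rng.random() < math.exp(-delta_e / t):
                current, current_e = candidate, candidate_e
        t *= cooling                                  # assumed cooling schedule
    return current, current_e


def toy_energy(x):
    """Hypothetical objective with its minimum at x = 3."""
    return (x - 3.0) ** 2


def toy_neighbor(x, rng):
    """Move to a nearby point in the search space."""
    return x + rng.uniform(-0.5, 0.5)


best_x, best_e = simulated_annealing(toy_energy, toy_neighbor, start=-8.0)
print(f"x = {best_x:.3f}, energy = {best_e:.5f}")
```

The slides leave the cooling schedule open; any schedule that lowers T gradually fits the same loop.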


Search Space: the problem of local optima


Distributed Simulated Annealing (DSA)

MapReduce (Radenski 2012)

CUDA (Zbierski 2011)

OpenCL (Choong 2010)

DSA Algorithm
    Master/Host: Compare
    Cluster/Device: Search Solution
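The master/worker split can be illustrated with a small Python sketch. It uses a thread pool purely as a stand-in for the MapReduce, CUDA, or OpenCL back ends cited above; the function names `worker_search` and `master`, the toy objective, and all parameter values are assumptions for this example.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def energy(x):
    """Hypothetical objective with its minimum at x = 3."""
    return (x - 3.0) ** 2


def worker_search(seed, steps=5000, t_high=10.0, cooling=0.998):
    """Cluster/Device role: run one independent annealing chain."""
    rng = random.Random(seed)                 # each worker has its own stream
    x = rng.uniform(-10.0, 10.0)
    e, t = energy(x), t_high
    for _ in range(steps):
        cand = x + rng.uniform(-0.5, 0.5)
        cand_e = energy(cand)
        if cand_e < e or rng.random() < math.exp(-(cand_e - e) / t):
            x, e = cand, cand_e
        t *= cooling
    return e, x, seed


def master(num_workers=4):
    """Master/Host role: launch the workers and compare their results."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(worker_search, range(num_workers)))
    return min(results)                       # lowest energy wins


best_e, best_x, best_seed = master()
print(f"worker {best_seed} found x = {best_x:.3f} with energy {best_e:.5f}")
```

The workers never talk to each other here, so the only communication is the final comparison on the master.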


Sources of Design Faults

Design Faults
    Difficult search space
    No memory of best solution (unlike Tabu search)
    Pseudo-random number generator
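Two of these design faults have simple mitigations that can be sketched in Python: keeping a best-so-far record alongside the current state, and giving each chain its own seeded generator so runs are reproducible. This is one possible mitigation under assumed names (`anneal_with_memory`, `bumpy`, `step`) and parameters; the slides do not prescribe it.

```python
import math
import random


def anneal_with_memory(energy, neighbor, start, seed, steps=2000,
                       t_high=5.0, cooling=0.995):
    """Annealing chain that also remembers the best state it has visited."""
    rng = random.Random(seed)              # private, reproducible PRNG stream
    current, current_e = start, energy(start)
    best, best_e = current, current_e      # the memory that plain SA lacks
    t = t_high
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_e = energy(cand)
        if cand_e < current_e or rng.random() < math.exp(-(cand_e - current_e) / t):
            current, current_e = cand, cand_e
            if current_e < best_e:
                best, best_e = current, current_e
        t *= cooling
    return (current, current_e), (best, best_e)


def bumpy(x):
    """Hypothetical objective with many local optima."""
    return 0.1 * x * x + math.sin(5.0 * x)


def step(x, rng):
    return x + rng.uniform(-1.0, 1.0)


(final_x, final_e), (best_x, best_e) = anneal_with_memory(bumpy, step, 4.0, seed=42)
print(f"final energy {final_e:.4f}, best energy seen {best_e:.4f}")
```

The best-so-far value can never be worse than the final value, which is the point of adding the memory.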


Sources of Operational Faults

Source: Lecture Notes- Prof. Jalal Y. Kawash at Univ. of Calgary

Independent Failure
    Loss of node and loss of data
    Solved by design of MapReduce
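MapReduce handles a lost node by re-executing its tasks elsewhere. The Python sketch below imitates that idea in a single process; the function `run_with_reexecution`, the round-robin choice of the next worker, and the use of `RuntimeError` to signal a lost node are all assumptions for illustration and do not reflect any real MapReduce API.

```python
def run_with_reexecution(tasks, workers, max_attempts=3):
    """Run each task, rescheduling it on another worker if a worker fails."""
    results = {}
    for task_id, payload in tasks.items():
        for attempt in range(max_attempts):
            worker = workers[(task_id + attempt) % len(workers)]
            try:
                results[task_id] = worker(payload)
                break
            except RuntimeError:
                continue                  # node lost: try the next worker
        else:
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return results


def healthy_node(payload):
    """Hypothetical worker that completes its task."""
    return payload * payload


def dead_node(payload):
    """Hypothetical worker that always fails."""
    raise RuntimeError("node lost")


completed = run_with_reexecution({0: 2, 1: 3, 2: 4}, [dead_node, healthy_node])
print(completed)
```

Every task still completes even though half of the workers are dead, at the cost of the repeated attempts.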


Communication Faults

Unreliable Communication: incorrect result

Insecure Communication: incorrect result

Costly Communication: poor performance
If the communication overhead between nodes exceeds the ratio of the fraction of work to the total speedup, the benefits of distributing the optimization are highly compromised.

Amdahl's Law

$\text{Speedup} = \dfrac{1}{(1 - P) + \frac{P}{S}}$, where $P$ is the fraction of the work that is parallelized and $S$ is the speedup of that fraction.
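A short Python sketch shows how communication cost eats into the speedup predicted by Amdahl's Law. The first function is the law itself; the second adds the overhead as an extra term in the denominator, which is an assumed model chosen for illustration, not a formula from the slides.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Amdahl's Law: 1 / ((1 - P) + P / S)."""
    if not 0.0 <= p <= 1.0 or s <= 0.0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)


def speedup_with_overhead(p: float, s: float, overhead: float) -> float:
    """Assumed model: overhead is an extra fraction of the serial runtime."""
    if overhead < 0.0:
        raise ValueError("overhead must be non-negative")
    return 1.0 / ((1.0 - p) + p / s + overhead)


ideal = amdahl_speedup(0.9, 10.0)
costly = speedup_with_overhead(0.9, 10.0, overhead=0.81)
print(f"ideal speedup {ideal:.3f}, with communication overhead {costly:.3f}")
```

With 90% of the work parallelized over 10 nodes, an overhead equal to 81% of the serial runtime brings the speedup back to about 1, so distribution gains nothing.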


Tolerance

Adaptive vs Strategic
    Flexible Adaptive Tolerant System
        Can handle unprecedented failures
    Strategic Fault Tolerance
        Predictive handling (Marin et al 2001 - Flexible)
Pooling of Search Space - Futile
    Stochastic search
Hashing of intermediate results
    No guarantee of having searched, but quick (O(n)) verification
    MapReduce - fast at hashing
    Ganjisaffar et al., Tuning of MapReduce for DSA, achieved AUC > 90%
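One way to read "hashing of intermediate results" is to store a digest of every configuration a node has evaluated, so that a later check can verify cheaply whether a state was already searched. The Python sketch below follows that reading; the class `VisitedStates`, the helper `state_digest`, and the choice of SHA-256 over the state's `repr` are assumptions for illustration.

```python
import hashlib


def state_digest(state) -> str:
    """Stable digest of a configuration given as a sequence of numbers."""
    return hashlib.sha256(repr(tuple(state)).encode("utf-8")).hexdigest()


class VisitedStates:
    """Set of digests of configurations that have already been evaluated."""

    def __init__(self):
        self._seen = set()

    def record(self, state) -> bool:
        """Store the state's digest; return True if the state was new."""
        digest = state_digest(state)
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True

    def __contains__(self, state) -> bool:
        return state_digest(state) in self._seen

    def __len__(self) -> int:
        return len(self._seen)


visited = VisitedStates()
first_time = visited.record((1, 2, 3))
second_time = visited.record((1, 2, 3))
print(first_time, second_time, (3, 2, 1) in visited, len(visited))
```

Only digests are kept, so the table stays small even when the configurations themselves are large.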


Recovery

Cluster Replacement
    Cold/Warm Standby - no use
    Cannot perform backward error recovery
    If the temperature is still high, the next search sequence is as good as any other

Hybrid Replication Mechanism
    If the temperature is high: no result replication or broadcast
        Saves a lot of time and space
    If the temperature is low ($T < T_{Low}$): convert some searching nodes to reciprocating nodes
        Ensures that when a solution is found and the node dies, we still have a copy of the solution
        Reasoning: higher probability of finding the optimal solution at lower T. Remember P(E).

Anomaly Node Detection
    Several Machine Learning algorithms to detect anomalous nodes
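The hybrid replication rule can be sketched in Python as a role assignment that depends on the temperature. Everything concrete here is an assumption for illustration: the `Node` class, the role names `"search"` and `"replica"` (standing in for the reciprocating nodes), the `replica_fraction` parameter, and the `publish` helper.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    role: str = "search"                      # "search" or "replica"
    copies: list = field(default_factory=list)


def assign_roles(nodes, temperature, t_low, replica_fraction=0.25):
    """Above t_low every node searches; below it, some nodes hold copies."""
    if temperature >= t_low or len(nodes) < 2:
        for node in nodes:
            node.role = "search"
        return nodes
    wanted = max(1, int(len(nodes) * replica_fraction))
    n_replicas = min(len(nodes) - 1, wanted)  # always keep one searcher
    for index, node in enumerate(nodes):
        node.role = "replica" if index < n_replicas else "search"
    return nodes


def publish(nodes, solution):
    """Send a found solution to every replica; return how many stored it."""
    stored = 0
    for node in nodes:
        if node.role == "replica":
            node.copies.append(solution)
            stored += 1
    return stored


cluster = [Node(f"node-{i}") for i in range(8)]
assign_roles(cluster, temperature=5.0, t_low=1.0)
hot_copies = publish(cluster, ("candidate", 0.42))
assign_roles(cluster, temperature=0.5, t_low=1.0)
cold_copies = publish(cluster, ("candidate", 0.42))
print(hot_copies, cold_copies)
```

At high temperature nothing is copied, which saves time and space; once T drops below the threshold, a solution survives the loss of the node that found it.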


References

1. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220: 671-680, 1983.
2. Muhammad Arshad and Marius C. Silaghi. Distributed Simulated Annealing. In Distributed Constraint Problem Solving and Reasoning in Multi-Agent Systems, volume 112 of Frontiers in Artificial Intelligence and Applications. IOS Press, 2004.
3. Krishna, K., Ganeshan, K., and Ram, D. Janaki. Distributed simulated annealing algorithms for job shop scheduling. IEEE Trans. Systems Man Cybernet. 25, 7 (July 1995), 1102-1109.
4. Atanas Radenski. 2012. Distributed simulated annealing with MapReduce. In Proceedings of the 2012 European Conference on Applications of Evolutionary Computation (EvoApplications'12). Springer-Verlag.
5. F. Glover and C. McMillan (1986). "The general employee scheduling problem: an integration of MS and AI". Computers and Operations Research.
6. Yang, C., Yen, C., Tan, C., and Madden, S. (2010). Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database. ICDE.
7. Capiluppi, M. (2007). Fault Tolerance in Large Scale Systems: Hybrid and Distributed Approaches. Ph.D. Thesis, University of Bologna, Italy.
8. Rodgers, David P. (June 1985). "Improvements in multiprocessor system design". ACM SIGARCH Computer Architecture News (New York, NY, USA: ACM) 13 (3): 225-231.
9. Ganeshan, K. Designing and implementing flexible distributed problem solving systems. M.S. Thesis, Department of Computer Science and Engineering, Indian Institute of Technology, Madras, 1993.
10. Ganjisaffar, Y., Debeauvais, T., Javanmardi, S., Caruana, R., and Lopes, C. Distributed tuning of machine learning algorithms using MapReduce clusters. Proceedings of the KDD 2011 Workshop on Large-scale Data Mining, San Diego, 2011, pages 1-8.
