Kay Sripanidkulchai, Sambit Sahu, Yaoping Ruan, Anees Shaikh, and Chitra Dorai
IBM T.J. Watson Research Center

Are Clouds Ready for Large Distributed Applications?

© 2009 IBM Corporation


• What are users expecting from the cloud?
  – Establish a base-line for requirements
• Is the cloud meeting user requirements?
  – Service deployment
  – Service availability
  – Service problem resolution
• Where are opportunities?


LADIS 2009


Enterprise vs. individual customers have different requirements

Typical Enterprise Application Architecture
  ITIL System Management Eco-system
  Security and Network Components
  Scalable/High-Availability/DR Architectures
  Enterprise-Class Application Building Blocks (3-Tiered + Messaging + etc.)
  Enterprise-Class Hardware

Typical Small/Individual Application Architecture
  ?
  ?
  ?
  Application Building Blocks (3-Tiered)
  Commodity Hardware

We study three primary requirements
• How to deploy large-scale distributed services on the cloud,
• How to deliver high availability services using clouds, and
• What to do when there are problems with services running on the cloud.
• For others, see [AFG et al. 08], [WSRV09]

Are there sufficient building blocks available to enterprise users to quickly deploy their services on the cloud? March 23, 2009

[Figure: chart of available cloud images by category. Surviving labels: "Base OS", "VMWare 0%". Surviving values: 26, 530.]

Base OS and middle-ware images dominate the landscape. Where are the complex applications? Where are the multi-tier distributed applications with multiple images?



Towards supporting deployment of large-scale distributed applications…

• Service composition to support complex applications beyond single VMs.
  – Express relationships among these VMs, denoting the dependencies at configuration time and at run time,
  – Compose complex deployments from a set of single, already built VMs, and
  – Instantiate the deployment based on the above stated dependencies.
  Current status: Already headed this way with third-party services such as 3Tera and RightScale, but will eventually need a common standard.
• Transformation of existing enterprise service deployment into a cloud-based deployment
  – Discovery of application configuration and dependencies of the enterprise services to be migrated to the cloud
  – Determine the amount of infrastructure resources needed on the cloud and map application components to the resources
  – Support for provisioning the service and migrating to the cloud in an easy and quick manner, without incurring service downtime. Can we do this live?
  Current status: Discovery techniques and dependency graphs have been explored in other contexts such as problem determination. The rest is open.
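As an illustration of instantiating a deployment from stated VM dependencies, the following is a minimal sketch, not a description of how 3Tera, RightScale, or any cloud provider works. The VM names and the `DEPENDS_ON` mapping are hypothetical; the sketch only shows that a dependency graph is enough to derive a start order and to find VMs that can boot in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical multi-tier service: each VM maps to the set of VMs
# that must be running before it can be configured and started.
DEPENDS_ON = {
    "database": set(),
    "message-queue": set(),
    "app-server": {"database", "message-queue"},
    "web-server": {"app-server"},
    "load-balancer": {"web-server"},
}


def deployment_order(depends_on):
    """Return VM names so that every VM appears after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def deployment_waves(depends_on):
    """Group VMs into waves; VMs in the same wave can start in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(deployment_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

A real composition service would also need run-time dependencies (for example, passing the database address to the application server once it is known), which this sketch leaves out.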



There are gaps in service availability requirements for enterprise users

[Figure: chart of Availability (%) per site for 2007 and 2008. Sites shown: www.tobaksfakta.org, search.yahoo.com, www.amazon.com, www.cnn.com, www.walmart.com, www.matematikersamfundet.org.se.]

• Enterprise: 99.987% (~1 hour downtime/year)
• Individual/Small: 99.368% (~55 hours downtime/year)
• State-of-the-art cloud SLA at 99.95%, or ~4 hours downtime/year.
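The downtime figures quoted with each availability percentage follow from simple arithmetic. The sketch below shows the conversion, assuming a 365-day year of 8,760 hours; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Convert an availability percentage into expected hours of downtime per year."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


if __name__ == "__main__":
    for label, availability in [
        ("Individual/Small", 99.368),
        ("Cloud SLA", 99.95),
        ("Enterprise", 99.987),
    ]:
        hours = downtime_hours_per_year(availability)
        print(f"{label}: {availability}% -> {hours:.1f} hours/year")
```

Rounded to one decimal place this gives about 55.4, 4.4, and 1.1 hours per year, matching the ~55, ~4, and ~1 hour figures on the slide.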


Bridging the gap in service availability requirements

• Implementing scaling architectures in the cloud
  – Templates and rules that, based on system conditions, automatically select the appropriate architectural solution
  – Commoditize the expertise so that it can be reused by different cloud users
  Current status: components such as content delivery networks, load-balancing and automatic scaling (elasticity) are available, but best practices for how to use these components have not been established. Can the cloud just automatically do this for me?
• Extending availability beyond one cloud
  – API or framework to commoditize the construction of high availability services delivered across multiple clouds
  Current status: few service providers; too early, but already concerned about lock-in.
• Using the latest and greatest virtualization capabilities
  – Live migration to avoid downtime
  Current status: non-existent inside one cloud and across clouds. Who gets to decide when/why to migrate? The user or the cloud provider?
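To see why extending a service across more than one cloud could help close the availability gap, the sketch below computes the availability of a service that stays up as long as at least one deployment is up. It rests on a strong assumption that is not from the slides: failures in different clouds are independent, and failover is instantaneous. Correlated outages or shared dependencies would lower the result.

```python
def combined_availability(availabilities):
    """Availability of a service that is up if at least one deployment is up.

    Each entry is a fraction in [0, 1]. Failures are assumed independent.
    """
    probability_all_down = 1.0
    for availability in availabilities:
        probability_all_down *= 1.0 - availability
    return 1.0 - probability_all_down


if __name__ == "__main__":
    single_cloud = 0.9995  # the 99.95% SLA, expressed as a fraction
    two_clouds = combined_availability([single_cloud, single_cloud])
    print(f"one cloud:  {single_cloud:.4%}")
    print(f"two clouds: {two_clouds:.6%}")
```

Under these assumptions, two deployments at 99.95% each would exceed the 99.987% enterprise figure, which is one reason a common multi-cloud API or framework is attractive.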



Best practice in service problem resolution faces scaling challenges

[Figure: breakdown of Amazon EC2 Forum posts, April 1-7, 2009. Categories shown: Feature Request, HowTo/Info, Problem, Cloud Error, User Error.]

Observations
• Top problems: Instance, EBS, Security
• The same symptom presented to the user has many underlying root causes
• Resolution process is highly manual and ad-hoc; manual information sharing is error-prone and not scalable
• Users do not know what is happening in the underlying infrastructure, and the cloud provider does not know what is happening in the users' applications

Where to go next
• Define an API for information sharing between users and providers that addresses privacy concerns
• Is a minimum of a binary "your problem" vs. "my problem" query sufficient?
• Can all of a user's instances be managed together?
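One way to picture the binary "your problem" vs. "my problem" query is sketched below. Everything here is hypothetical: no such API is described in the slides, and the field names (`host_healthy`, `app_health_check_passed`, and so on) are invented for illustration. The sketch limits each side to coarse health flags, which is one possible way to respect the privacy concern.

```python
from dataclasses import dataclass
from enum import Enum


class Fault(Enum):
    PROVIDER = "my problem"   # provider-side infrastructure fault
    USER = "your problem"     # fault inside the user's instance or application
    UNKNOWN = "unknown"       # neither side reports a fault


@dataclass(frozen=True)
class ProviderView:
    """Coarse flags the provider could expose without revealing internals (hypothetical)."""
    host_healthy: bool
    storage_healthy: bool
    network_healthy: bool


@dataclass(frozen=True)
class UserView:
    """Coarse flag the user could share about the application (hypothetical)."""
    app_health_check_passed: bool


def attribute_fault(provider, user):
    """Answer the binary query for one instance, checking the provider side first."""
    infrastructure_ok = (
        provider.host_healthy and provider.storage_healthy and provider.network_healthy
    )
    if not infrastructure_ok:
        return Fault.PROVIDER
    if not user.app_health_check_passed:
        return Fault.USER
    return Fault.UNKNOWN


def attribute_all(instances):
    """Apply the query to every instance of one user: {instance_id: Fault}."""
    return {name: attribute_fault(p, u) for name, (p, u) in instances.items()}
```

Because the same symptom can have many root causes, a binary answer like this would only route the problem to the right party; it would not replace diagnosis.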


Summary

• Explored three requirements from the perspective of cloud users
  – Compared individual/small users vs. enterprise users
  – Established a base-line using publicly available data


• Service deployment
  – Current practice focuses on monolithic systems, with some initial support for more complex distributed applications underway.
  – Future work to support large-scale distributed architectures is needed.
• Service availability
  – SLAs are in place and high enough to meet individuals' needs.
  – Future work to increase availability is crucial to attract enterprise users and would also benefit individual users.
• Problem resolution
  – Current manual process faces scaling challenges.
  – Future work is needed to reduce the load on cloud support staff, such as giving cloud users enough visibility into the cloud infrastructure to independently identify the root cause of problems.
