Towards a General Framework for Secure MapReduce Computation on Hybrid Clouds

Chunwang Zhang, Ee-Chien Chang, Roland H.C. Yap
School of Computing, National University of Singapore
{chunwang, changec, ryap}@comp.nus.edu.sg

Keywords: Data security; MapReduce; hybrid clouds; information leakage

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author. Copyright is held by the owner/author(s). SoCC’13, Oct 01–03 2013, Santa Clara, CA, USA. ACM 978-1-4503-2428-1/13/10. http://dx.doi.org/10.1145/2523616.2525944

The idea of a hybrid cloud is to combine a private cloud (e.g., an organization’s in-house private datacenter) with a public cloud (e.g., Amazon EC2). Hybrid cloud computing offers increased scalability and cost-effectiveness: the private cloud handles typical workloads, and the public cloud is harnessed when additional resources are needed during peak computations. This hybrid cloud architecture has already gained adoption [1] and is still undergoing rapid development [4].

However, hybrid cloud computing needs to address the confidentiality and privacy issues on public clouds. Security and privacy are ranked as the top concerns for organizations considering moving their applications and data to the cloud [2, 4]. There are good reasons for these concerns: for example, Ristenpart et al. [8] demonstrate that confidential information can be extracted through side-channel information leakage in VMs, and many data breaches have been reported for various cloud service providers [3, 5].

On the other hand, organization data often involve both sensitive and non-sensitive information. For example, an organization’s filesystem may contain general files mixed with confidential business data. Also, many datasets for analytical tasks, such as network logs and healthcare records, may combine data from public sources with private organization data. Computations on such mixed-sensitivity data should not be carried out on the public cloud without protection against data leakage. Cryptographic techniques such as fully homomorphic encryption [7] that enable computation on encrypted data are still far from efficient for large data.

With a hybrid cloud, one solution may be to separate the computation on non-sensitive data from that on sensitive data, such that the former can be comfortably outsourced to the public cloud while the latter, possibly much smaller in size, can be easily handled on the private cloud. In this way, the computation can be carried out both securely and efficiently. However, this hybrid computing model is not supported by today’s data-intensive computing frameworks. In particular, MapReduce [6] (MR) is designed for only one cloud and does not distinguish between data and servers with differing sensitivities. A cloud user who wants to run MR jobs with mixed-sensitivity data on a hybrid cloud needs to manually split the data, compute each partition on the corresponding cloud independently, and combine the results in her own code.

What is desired is an automatic and general framework to facilitate secure computing on hybrid clouds, and we focus on MR. Sedic [9] addresses this problem to some degree but has limitations in terms of flexibility and support for complex MR jobs. We work in this direction by providing a general hybrid MR framework that deals automatically with mixed-sensitivity MR jobs while supporting new kinds of MR programming that manipulate data sensitivity. We propose a general tagged-MapReduce that (conceptually) augments each key-value pair in MR with a sensitivity tag and extends the map and reduce functions accordingly. This design: 1) allows fine-grained dataflow control during execution and supports scheduling of map and reduce tasks in the two clouds; 2) allows programmers to code sophisticated policies to guide sensitivity transformation during execution; 3) provides sensitivity information for data across multiple MR jobs, which is necessary for complex MR computations with chained jobs.
Sedic programs are a special case of our model, but Sedic cannot express all tagged-MR programs. However, a hybrid MR framework may sacrifice performance because of the security constraint. Our tagged-MR has a security model and scheduling mechanisms to deal with these issues.
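To make the tagged key-value idea more concrete, the following is a minimal single-process sketch in Python. It is not the authors' implementation: the names `Tag`, `Tagged`, `cloud_for`, `run_tagged_job`, `word_count_map`, and `sum_reduce` are hypothetical, and the two-level tag set, the "output is sensitive if any input is sensitive" policy, and the final merge step on the private side are all assumptions made for illustration. The sketch also assumes the reduce function is associative, so that per-cloud partial results can be merged.

```python
from collections import defaultdict
from enum import Enum
from typing import Any, Callable, Iterable, NamedTuple


class Tag(Enum):
    PUBLIC = 0      # assumed: may be processed on the public cloud
    SENSITIVE = 1   # assumed: must stay on the private cloud


class Tagged(NamedTuple):
    """A key-value pair augmented with a sensitivity tag."""
    key: str
    value: Any
    tag: Tag


def cloud_for(tag: Tag) -> str:
    # Hypothetical placement rule: sensitive pairs never leave the private cloud.
    return "private" if tag is Tag.SENSITIVE else "public"


def run_tagged_job(
    records: Iterable[Tagged],
    tagged_map: Callable[[Tagged], Iterable[Tagged]],
    tagged_reduce: Callable[[str, list], Tagged],
):
    # Map phase: each output pair carries the tag chosen by the map function.
    intermediate = [pair for rec in records for pair in tagged_map(rec)]

    # Shuffle: group by (cloud, key) so each group is reduced on one cloud only.
    groups = defaultdict(list)
    for pair in intermediate:
        groups[(cloud_for(pair.tag), pair.key)].append(pair)

    # Per-cloud reduce; record which keys each cloud handled.
    placement = defaultdict(set)
    partials = defaultdict(list)
    for (cloud, key), pairs in groups.items():
        placement[cloud].add(key)
        partials[key].append(tagged_reduce(key, pairs))

    # Final merge of partial results (assumed to run on the private cloud).
    results = {key: tagged_reduce(key, parts) for key, parts in partials.items()}
    return results, dict(placement)


def word_count_map(record: Tagged) -> Iterable[Tagged]:
    # Each emitted word inherits the sensitivity of its source record.
    for word in record.value.split():
        yield Tagged(word, 1, record.tag)


def sum_reduce(key: str, pairs: list) -> Tagged:
    # Example policy: the output is sensitive if any input pair is sensitive.
    tag = Tag.SENSITIVE if any(p.tag is Tag.SENSITIVE for p in pairs) else Tag.PUBLIC
    return Tagged(key, sum(p.value for p in pairs), tag)


records = [
    Tagged("public.log", "login ok login", Tag.PUBLIC),
    Tagged("patients.db", "login failed", Tag.SENSITIVE),
]
results, placement = run_tagged_job(records, word_count_map, sum_reduce)
print(results["login"])  # count 3, tagged SENSITIVE after the merge
print(placement)
```

In this sketch the tag travels with every pair, so placement decisions can be made per pair rather than per input file, and the output tags remain available as input to a later chained job. A different `tagged_reduce` could encode another sensitivity policy, for example declassifying an aggregate that the programmer considers safe to release.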

References

[1] Forecast for 2010: The Rise of Hybrid Clouds. Online at http://gigaom.com/2010/01/01/on-the-rise-of-hybrid-clouds/, 2010.
[2] AMD 2011 Global Cloud Computing Adoption, Attitudes and Approaches Study. Online at http://www.slideshare.net/AMD/amd-cloud-adoption-approaches-andattitudes-research-report, 2011.
[3] Epsilon Data Breach Highlights Cloud Computing Security Concerns. Online at http://www.eweek.com/c/a/Security/Epsilon-DataBreach-Highlights-Cloud-ComputingSecurity-Concerns-637161/, 2011.
[4] 2012 Cloud Computing Survey. Online at http://northbridge.com/2012-cloudcomputing-survey, 2012.
[5] Dropbox: Yes, we were hacked. Online at http://gigaom.com/2012/08/01/dropbox-yes-we-were-hacked/, 2012.
[6] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating System Design and Implementation, pages 137–150, 2004.
[7] C. Gentry. Fully homomorphic encryption using ideal lattices. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, pages 169–178, 2009.
[8] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM Conference on Computer and Communications Security, pages 199–212. ACM, 2009.
[9] K. Zhang, X. Zhou, Y. Chen, X. Wang, and Y. Ruan. Sedic: privacy-aware data intensive computing on hybrid clouds. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 515–526. ACM, 2011.
