Design and Implement Secure Cloud Computing System Based on Hadoop
A Thesis Submitted to the Council of the College of Science at the University of Sulaimani in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer
By Ribwar Khalid Mohammed B.Sc. Computer Science (2009), University of Sulaimani
Supervised by
Dr. Soran A. Saeed, Assistant Professor
Dr. Alaa Kh. Jumaa, Lecturer

May 2017
Jozardan 2717
In the name of Allah, the Most Gracious, the Most Merciful.
"And they ask you about the soul. Say: The soul is of the command of my Lord, and you have not been given of knowledge except a little."
(Surat Al-Isra, verse 85)
Supervisor Certification
I certify that the preparation of the thesis titled "Design and Implement Secure Cloud Computing System Based on Hadoop", accomplished by (Ribwar Khalid Mohammed), was carried out under my supervision in the College of Science at the University of Sulaimani, in partial fulfillment of the requirements for the degree of Master of Science in (Computer).
Signature:
Name: Dr. Soran A. Saeed
Title: Assistant Professor
Date: / / 2017
Signature:
Name: Dr. Alaa K. Jumaa
Title: Lecturer
Date: / / 2017
In view of the available recommendation, I forward this thesis for debate by the examining committee.
Signature:
Name: Aree Ali Mohammed
Title: Professor
Date: / / 2017
Linguistic Evaluation Certification

I hereby certify that this thesis titled "Design and Implement Secure Cloud Computing System Based on Hadoop", prepared by (Ribwar Khalid Mohammed), has been read and checked. After indicating all the grammatical and spelling mistakes, the thesis was given again to the candidate to make the adequate corrections. After the second reading, I found that the candidate had corrected the indicated mistakes. Therefore, I certify that this thesis is free from mistakes.
Signature:
Name: Soma Nawzad Abubakr
Position: English Department, College of Languages, University of Sulaimani
Date: / / 2017
Examining Committee Certification

We certify that we have read this thesis entitled "Design and Implement Secure Cloud Computing System Based on Hadoop", which was prepared by (Ribwar Khalid Mohammed), and, as Examining Committee, examined the student in its content and in what is connected with it, and in our opinion it meets the basic requirements toward the degree of Master of Science in Computer.

Signature:
Name: Dr. Sufyan T. Faraj
Title: Professor
Date: / / 2017
(Chairman)

Signature:
Name: Dr. Najmadin W. Abdulrahman
Title: Lecturer
Date: / / 2017
(Member)
Signature:
Name: Dr. Fadhil S. Abed
Title: Assistant Professor
Date: / / 2017
(Member)

Signature:
Name: Dr. Soran A. Saeed
Title: Assistant Professor
Date: / / 2017
(Member - Supervisor)
Signature:
Name: Dr. Alaa K. Jumaa
Title: Lecturer
Date: / / 2017
(Member - Co-Supervisor)

Approved by the Dean of the College of Science.
Signature:
Name: Dr. Bakhtiar Q. Aziz
Title: Professor
Date: / / 2017
Acknowledgements

First of all, my great thanks to Allah, who helped me and gave me the ability to fulfill this work. I thank everybody who helped me complete this project, especially my supervisors, Lecturer Dr. Alaa K. Jumaa and Assistant Professor Dr. Soran A. Saeed, for their help. Special thanks to my wife, Mrs. Shaida, for her encouragement, scientific notes, and the support she has shown during my study and in finalizing this work. Special thanks to my father, my mother, and all my family for their endless support, understanding, and encouragement; they have taken their part of suffering and sacrifice during the entire phase of my research. Special thanks go to those who helped me during this work. I am glad to have this work done; thanks to my colleagues and the entire faculty members.
Dedications

This thesis is dedicated to:
My mother and father, and my son,
our families,
our friends,
the Computer Department,
and all who shared any support.

Ribwar
Abstract

A proposed secure cloud computing system has been built using the Linux OS and the Hadoop package. The Hadoop package was utilized to build the area for saving and managing users' data and to enhance its security. Hadoop consists of one master (NameNode) and a number of slaves (DataNodes). The master node oversees the two key functional pieces that make up Hadoop: storing large amounts of data in the Hadoop Distributed File System (HDFS), and executing parallel computations on the stored data (MapReduce). The features of Linux and the Hadoop package support most cloud computing system requirements. The system supports user registration, and it can satisfy cloud computing requirements such as file management and data computation. When a user wants to access the cloud system, he should first visit the cloud system website and open the cloud's home page (the cloud system interface). When the user registration process is completed successfully, the cloud system administrator reads the user's data from the database and is responsible for creating the user account, creating the user space, generating the RSA public key, RSA private key, and AES secret key, and saving these keys with the uploadfile.sh program. After that, the cloud administrator sends the user name and password to the cloud's user. The cloud's user can upload files in two ways: non-secure and secure upload. A number of Java programs and shell scripts have been utilized to perform the data management and enhance the security of the proposed cloud system. The Apache web server has been used to host and manage all the cloud system web pages. The security enhancements for this system are designed and implemented using cryptographic techniques (the RSA and AES algorithms). The proposed system supports most of the cloud computing service models, such as Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Furthermore, it can support most cloud computing requirements, such as user registration and management, data management (upload, download, delete, etc.), and security issues (secure data transfer and key management). The proposed cloud computing system achieves user privacy and confidentiality by utilizing cryptographic techniques. Two cryptographic algorithms were utilized in the proposed cloud system: the RSA algorithm, which was utilized for key management among the cloud's users, and the AES algorithm, which was used for data transfer between the cloud's users and the cloud system.
CONTENTS

Abstract ..................................................................... i
Contents ..................................................................... iii
List of Tables ............................................................... vi
List of Figures .............................................................. vii
List of Abbreviations ........................................................ ix

Chapter One: General Introduction
1.1 Introduction ............................................................. 1
1.2 Definition of Cloud and Cloud Computing .................................. 2
1.3 Advantages and Disadvantages of Cloud Computing .......................... 4
1.4 Security in Cloud Computing .............................................. 6
1.5 Related Works ............................................................ 7
1.6 Aim of the Work .......................................................... 10
1.7 Thesis Layouts ........................................................... 11

Chapter Two: Cloud Computing and Hadoop
2.1 Introduction ............................................................. 12
2.2 Cloud Computing System ................................................... 12
2.2.1 Cloud Computing Deployment Models ...................................... 13
2.2.2 Cloud Computing Service Models ......................................... 15
2.2.3 Cloud Computing Attributes ............................................. 16
2.2.4 Characteristics of Cloud Computing ..................................... 17
2.2.5 Technical Components ................................................... 18
2.3 Security Issues in Cloud Computing ....................................... 20
2.4 Data Security Issues in Cloud Computing .................................. 21
2.5 Hadoop Apache Project .................................................... 24
2.5.1 Hadoop Characteristics ................................................. 25
2.5.2 Hadoop Components ...................................................... 26
2.5.2.1 Hadoop Distributed File System (HDFS) ................................ 27
2.5.2.2 MapReduce ............................................................ 28

Chapter Three: Proposed Cloud Computing System
3.1 Introduction ............................................................. 31
3.2 The Proposed System Architecture ......................................... 31
3.3 Cloud Computing Environment .............................................. 32
3.3.1 Build Cloud Computing System Using Hadoop .............................. 32
3.3.2 Hadoop Package Support for Cloud Service Models ........................ 33
3.3.3 Hadoop Installation on Linux Multiple Nodes ............................ 35
3.4 User Interface and Data Management ....................................... 35
3.4.1 User Registration ...................................................... 35
3.4.2 User's Files and Folders Permissions ................................... 37
3.5 Upload and Download Data with Cloud ...................................... 39
3.5.1 Upload Data to Cloud Server ............................................ 40
3.5.2 Download Files from Cloud System ....................................... 42
3.6 Cloud System Security Issues ............................................. 43
3.6.1 Key Management ......................................................... 45
3.6.2 Secure File Transfer over Cloud System ................................. 47

Chapter Four: The Proposed Cloud Computing System Implementation
4.1 Introduction ............................................................. 51
4.2 Main Steps for Cloud System Implementation ............................... 51
4.3 User Interface and Data Management ....................................... 52
4.3.1 User Registration Process .............................................. 53
4.4 Upload and Download Files Processes ...................................... 56
4.4.1 Upload Files Using FTP and SSH Protocols ............................... 56
4.4.2 Download Files Process ................................................. 59
4.5 Results Based on Different Packet Sizes .................................. 61
4.5.1 Results Based on Different Key Sizes ................................... 62

Chapter Five: Conclusions and Suggestions for Future Work
5.1 Conclusions .............................................................. 65
5.2 Suggestions for Future Work .............................................. 67

Appendix ..................................................................... 69
References ................................................................... 73
List of Tables

Table No.  Table Title ................................................. Page No.
4.1  Time Consumption (Different Key Size) for Encryption ............... 63
4.2  Time Consumption (Different Key Size) for Decryption ............... 64
List of Figures

Figure No.  Figure Title ................................................ Page No.
1.1   Cloud Computing ................................................... 4
2.1   Cloud Deployment Models ........................................... 15
2.2   Cloud Service Model ............................................... 16
2.3   The Cloud Computing Components .................................... 19
2.4   HDFS Architecture ................................................. 21
3.1   General Architecture for Proposed Cloud Computing System .......... 32
3.2   Proposed Cloud System with the Cloud Services ..................... 34
3.3A  Flowchart for Adding User Process (Client Side) ................... 37
3.3B  Flowchart for Adding User Process (Administrator Side) ............ 37
3.4   Cloud User - File Permissions ..................................... 38
3.5   Cloud User - Change Permissions ................................... 39
3.6   Upload and Download Files Processes in Cloud ...................... 40
3.7   Flowchart for Uploading Files to Cloud System ..................... 42
3.8   List of the Users' Files in Cloud System .......................... 43
3.9   Proposed Key Management System .................................... 47
3.10  Secure File Transfer over Cloud ................................... 48
3.11  Flowchart for Upload Secure and Non-secure Files to Cloud System .. 49
3.12  Flowchart for Download Secure and Non-secure Files to Cloud System  50
4.1   Diagram Showing the Steps for the System .......................... 52
4.2   Proposed Cloud Computing System Home Page ......................... 53
4.3   User Registration Webpage ......................................... 54
4.4   Cloud's Administrator Creating User Account ....................... 55
4.5   Files Included in the Cloud's User Space (Folder) ................. 55
4.6   FTP Web ........................................................... 56
4.7   Cloud's User Choosing Files Process ............................... 57
4.8   Encryption Process Using SecureFile.jar Program ................... 57
4.9   Cloud's User Using PuTTY Application to Access .................... 58
4.10  Cloud's User Runs uploadfile.sh and Uploads File to Hadoop ........ 59
4.11  Cloud's User Uses WinSCP Application with the SFTP Protocol ....... 60
4.12  Cloud's User Accesses Hadoop Webpage .............................. 60
4.13  Cloud's User Decrypts Encrypted Downloaded File ................... 61
4.14  Analysis with Different Key Sizes for Encryption .................. 63
4.15  Analysis with Different Key Sizes for Decryption .................. 64
List of Abbreviations

Abbreviation  Full Text
AES     Advanced Encryption Standard
API     Application Program Interface
CSP     Cloud Service Provider
CRM     Customer Relationship Management
DFS     Distributed File System
DES     Data Encryption Standard
GFS     Google File System
HDFS    Hadoop Distributed File System
IaaS    Infrastructure as a Service
LAN     Local Area Network
NIST    National Institute of Standards and Technology
NaaS    Network as a Service
OS      Operating System
PaaS    Platform as a Service
PC      Personal Computer
RAID    Redundant Array of Independent Disks
RDBMS   Relational Database Management System
RSA     Rivest, Shamir, and Adleman
SaaS    Software as a Service
TB      Terabytes
VPN     Virtual Private Network
WAN     Wide Area Network
Chapter One
General Introduction

1.1 Introduction
Over the past few years, there has been rapid progress in cloud computing. Cloud computing conveys a wide range of resources, such as computational platforms, storage, computational power, and applications, to users via the Internet. The major clouds available in the market today are those of Amazon, Google, IBM, Microsoft, etc. With a growing number of companies resorting to using resources in the cloud, there is a need to protect the data of different users. Presently, cloud computing is used tremendously in different fields. A large amount of data is generated in daily life, and to store this huge amount of data, users use cloud computing services. Some major challenges encountered by cloud computing are to secure, protect, and process the data, which is the property of the user [1]. Cloud computing is emerging as one of the most powerful and developing networking systems, used by developers as well as users. It is well suited to users who are interested in working in a networked environment. The cloud computing environment permits its resources to be distributed among servers, users, and individuals; in turn, files or data that are stored in the cloud are publicly accessible to all. Due to this open accessibility, the files or data of an individual can be used by other users of the cloud, and as a result the threat of attacks on data or files becomes more serious [2]. Cloud computing has gained popularity in recent years. The cloud facilitates the storage of various sorts of data. The cloud is highly scalable when it comes to huge data and can provide practically infinite computing resources on demand. Clients can use cloud services without any installation, and the data uploaded to the cloud is accessible from any corner of the world; all that is needed to access it is a computer with an active Internet connection [3]. Cloud computing is the notion of using remote services through a network utilizing different resources. It is basically meant to provide the maximum with the minimum resources, i.e., the end user has the minimum hardware requirement but uses the maximum capability of computing. This is available only through this technology, which requires and uses its resources in the best way. In the cloud, the end user uses just a very light device that is capable of using a network which connects it to a server at some other location. Users do not need to store data at their end, as all the data is stored on a remote server at some other place. Here comes the first advantage of cloud computing: it decreases the cost of hardware that would otherwise have been used at the end user's side [4]. The use of cloud computing has increased rapidly in many organizations; small and medium companies utilize cloud computing services for different reasons, because these services provide rapid access to their applications and decrease their infrastructure costs [5]. With the advent of the Internet in the 1990s up to the present day, facilities of widespread computing have changed the computing world in an extreme way. Computing has traveled from the notion of parallel computing to distributed computing, to grid computing, and recently to cloud computing [6].
1.2 Definition of Cloud and Cloud Computing

The term cloud refers to a network or the Internet. In other words, the cloud is something that is present at a remote location. The cloud can provide services over networks, i.e., on public networks or on private networks, such as a Wide Area Network (WAN), Local Area Network (LAN), or Virtual Private Network (VPN).
Applications such as e-mail, web conferencing, and customer relationship management (CRM) all run in the cloud [7]. Cloud computing refers to configuring, manipulating, and accessing applications online. It offers online data storage, infrastructure, and applications [7]. The term cloud computing came from Internet computing, or computation through the Internet, where many flowcharts and diagrams represent the Internet as a cloud shape [8]. The US National Institute of Standards and Technology (NIST) defines cloud computing as "a model for user convenience, on-demand network access to shared computing resources (e.g. networks, storage, applications, servers, and services) that can be quickly provisioned with minimal management effort or service provider interaction". Cloud computing can also be defined as a new service: the collection of technologies and resources supporting the use of large-scale Internet services for remote applications with good quality of service (QoS) levels [9]. It is the information technology (IT) model for computing, which consists of all the IT components (services, networking, software, and hardware) that are needed to enable the development and delivery of cloud services via the Internet or a private network [10]. Figure (1.1) shows the general diagram for a simple cloud computing system [10].
[Figure shows wired, wireless, and mobile clients connected to the cloud.]
Figure (1.1) Cloud computing [11]
1.3 Advantages and Disadvantages of Cloud Computing

The advantages of cloud computing are explained below: the main benefits for organizations in general, with a focus on some points for small businesses [12]:

§ Cost efficiency: Cloud computing is apparently the most cost-efficient method to use, maintain, and upgrade. Traditional desktop software costs organizations a lot in terms of finance; adding up the licensing fees for multiple users can prove very expensive for the establishment concerned. The cloud, on the other hand, is available at much cheaper rates and can significantly lower an organization's IT expenses. Besides, there are many one-time-payment, pay-as-you-go, and other scalable options available, which make it very reasonable for the organization in question.

§ Almost unlimited storage: Storing information in the cloud gives the user almost unlimited storage capacity.
§ Backup and recovery: Since all the data is stored in the cloud, backing it up and restoring it is relatively much easier and simpler than doing the same on a physical device. Moreover, most cloud services are usually capable enough to handle recovery of information. Hence, this makes the entire process of backup and recovery much easier than other traditional methods of data storage.

§ Automatic software integration: In the cloud, software integration is normally something that happens automatically. This means that cloud users do not have to take additional efforts to customize and combine their applications according to their own preferences; this aspect generally takes care of itself.

§ Easy access to information: Once users sign up in the cloud, they can access the information from any place where there is an Internet connection. This useful feature lets users move beyond time zone and geographic location issues.

§ Quick deployment: Finally, and most particularly, cloud computing gives the benefit of quick deployment. Once this way of working is opted for, the entire system can be fully functional in a matter of a few minutes. Of course, the amount of time taken here depends on the exact type of technology that is required for the business.

§ Easier scaling of services: It makes it less complex for enterprises to scale their services according to the demand of clients.

§ Delivery of new services: It makes conceivable new classes of applications and the delivery of new services that are interactive in nature.

In spite of its many benefits, as specified above, cloud computing also has its disadvantages. Businesses, especially smaller ones, need to be aware of these aspects before going in for this technology. The main dangers faced in cloud computing are [13]:

§ Data location: In general, cloud users are not aware of the exact location of the data center, and they also do not have any control over the physical access mechanisms to that data.

§ Investigation: Investigating an unlawful activity may be impossible in cloud environments. Cloud services are especially difficult to investigate, because data for multiple customers may be co-located and may also be spread across different data centers.

§ Data segregation: Data in the cloud is typically kept in a distributed environment together with data from other customers. Encryption cannot be assumed to be the single solution for the data isolation issue.

1.4 Security in Cloud Computing

At the present time, an increasing number of companies have moved to the cloud market to provide cloud services (e.g. Microsoft Exchange/SharePoint, Google Apps, and Amazon EC2). One can argue that this new approach is the future of network capabilities, but there are hidden costs, and security is one of them [14]. Security is considered to be one of the most critical aspects of a cloud computing environment, due to the sensitive and significant information stored in the cloud for users. Users worry about attacks on the integrity and the availability of their data in the cloud from malicious insiders and outsiders, and about any collateral damage of the cloud service [15]. According to NIST's definition, information security is the practice of protecting the integrity, confidentiality, and availability of data against malicious access and system
failure. The three main characteristics of cloud computing security are [11][14]:

• Integrity: Information is authentic, complete, and reliable. Data shall not be modified inappropriately, whether by accident or by deliberately malicious activity. One of the most significant issues related to cloud security threats is data integrity. The data kept in cloud storage may suffer damage during transfer operations from or to the cloud storage provider. The risk of attacks from both inside and outside the cloud provider exists and should be considered. Data authentication, which assures that the returned data is the same as the stored data, is extremely important.

• Confidentiality: Information can be accessed only by permitted users or distributed among authorized groups. Authentication methods, including credential confirmation, can be applied to protect data against malicious disclosure.

• Availability: This refers to the availability of data resources. Data should be available under all permitted operations, including read, write, etc.
1.5 Related Works

Several research works have been proposed and carried out on the security of cloud computing. These works focus on a number of techniques, such as cryptography. A number of them are discussed below:

K. Suthar et al. (2012) proposed a new secure technique for the cloud computing system; they deduced that the Advanced Encryption Standard (AES) is the quickest and most efficient of the cryptographic algorithms considered. When data sending was considered, there was an insignificant difference in the performance of the various symmetric-key
schemes. A study was conducted of various popular secret-key algorithms such as DES, AES, and Blowfish [16].

P. Rewagad et al. (2013) proposed using a digital signature and Diffie-Hellman key exchange blended with the AES encryption algorithm to protect the confidentiality of data stored in the cloud. Even if the key in transmission is hacked, the Diffie-Hellman key exchange renders it useless, since the key in transit is of no use without the user's private key, which is confined only to the legitimate user [17].

S. Ranjan et al. (2014) implemented RSA for both encryption and secure communication goals, whereas MD5 hashing was utilized for the digital signature and for hiding key information. This model gave security for the entire cloud computing environment. In the suggested system, an intruder cannot simply access or upload a file, because the algorithms are run on various servers at different locations. Both RSA encryption and digital signature algorithms are used, and as a result a powerful security and data integrity service system is obtained. Although the RSA encryption algorithm is quite deterministic, the MD5 algorithm makes the model highly secure [18].

V. Mahhale et al. (2014) proposed a hybrid encryption algorithm utilizing the RSA and AES algorithms for giving data security to the user in the cloud. Its biggest benefit is that it gives users keys that are created on the basis of the system time, so no intruder can even guess them. The private key and secret key are known only to the user, and hence the user's private data is not accessible to anyone, not even the cloud's administrator. The main aim behind utilizing the RSA and AES encryption algorithms was that they give three keys: a private key and a secret key for decryption, and a public key for encryption. The data after uploading is stored in an encrypted form and can be decrypted only by the private key and the secret key of the user. The main advantage of this is that the data is extremely secure on the cloud [19].
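The RSA-encryption-plus-MD5-digest arrangement described in [18] can be sketched in Java as follows. This is a hypothetical illustration, not the cited implementation: the class and method names are invented, and MD5 is kept only to mirror the cited work (it is not collision-resistant, and modern systems prefer the SHA-2 family).

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.util.Arrays;
import javax.crypto.Cipher;

// Hypothetical sketch: RSA protects a short payload, and an MD5 digest
// travels alongside the ciphertext so the receiver can check integrity
// after decryption.
public class RsaDigestSketch {

    public static KeyPair generateKeys() throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    // Returns {ciphertext, md5DigestOfPlaintext}.
    public static byte[][] encryptWithDigest(byte[] plaintext, PublicKey pub) throws Exception {
        Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        rsa.init(Cipher.ENCRYPT_MODE, pub);
        byte[] ciphertext = rsa.doFinal(plaintext); // payload must fit the RSA block size
        byte[] digest = MessageDigest.getInstance("MD5").digest(plaintext);
        return new byte[][] { ciphertext, digest };
    }

    // Decrypts and verifies the digest; throws if the data was tampered with.
    public static byte[] decryptAndVerify(byte[] ciphertext, byte[] digest, PrivateKey priv) throws Exception {
        Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        rsa.init(Cipher.DECRYPT_MODE, priv);
        byte[] plaintext = rsa.doFinal(ciphertext);
        if (!Arrays.equals(digest, MessageDigest.getInstance("MD5").digest(plaintext))) {
            throw new SecurityException("digest mismatch: data corrupted or tampered");
        }
        return plaintext;
    }

    public static void main(String[] args) throws Exception {
        KeyPair kp = generateKeys();
        byte[][] packet = encryptWithDigest("user file block".getBytes(), kp.getPublic());
        System.out.println(new String(decryptAndVerify(packet[0], packet[1], kp.getPrivate())));
    }
}
```

Note that plain RSA can only encrypt a payload smaller than the modulus (about 245 bytes for a 2048-bit key with PKCS#1 padding), which is why such schemes are normally combined with a symmetric cipher for bulk data.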
H. Mehak et al. (2014) proposed a Hadoop setup for cloud computing with an AES implementation combined with compression. Just uploading the data was not sufficient; some modifications needed to be applied to the data as it was uploaded, in order to make it unreadable and also for security purposes. In the proposed technique there was a shared key between sender and receiver, known as the private key. After performing the compression and encryption, results were computed on the basis of various parameters; the most commonly utilized parameters were processing time and space. The experimental results clearly reflect the effectiveness of the methodology in improving the security of data in a cloud environment [20].

R. Titare et al. (2015) proposed a secure data backup mechanism. The main purposes were to give secure data backup and to achieve a high level of security for the data while it is stored in the cloud. If any file is deleted or corrupted by error on the cloud server, that file can simply be recovered by utilizing the suggested system. The RC6 algorithm, a symmetric block cipher, is used in this system. The benefits of this recommended system are that it requires minimal memory space and minimal time for data encryption and decryption. The user requires keys to download the data from the cloud, so only a permitted person can download the data from the cloud [21].

N. Sengupta (2015) proposed a hybrid RSA encryption algorithm for the security of data in a cloud system. In the first phase the RSA encryption algorithm was applied, and in the second phase a Feistel encryption algorithm was applied to the output data, i.e., the ciphertext of the first phase. After the final phase, the encrypted data was sent for transmission. The hybrid RSA encryption algorithm makes the data difficult for an attacker to decrypt. Man-in-the-middle attacks are reduced by applying this proposed hybrid RSA encryption algorithm for data transfer in the cloud system [22].
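Several of the schemes above transform the data before upload; for example, [20] compresses the payload and then encrypts it under a shared AES key. A hypothetical Java sketch of that compress-then-encrypt step follows. The class and method names are invented, and ECB mode is used only to keep the sketch short; a real deployment should use an IV-based mode such as CBC or GCM.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical sketch: deflate the payload, then encrypt the compressed
// bytes with a shared AES key; the receiver reverses both steps.
public class CompressEncryptSketch {

    public static SecretKey sharedKey() throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        return kg.generateKey();
    }

    public static byte[] compressAndEncrypt(byte[] data, SecretKey key) throws Exception {
        // Step 1: compress.
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) out.write(buf, 0, deflater.deflate(buf));
        // Step 2: encrypt the compressed bytes.
        Cipher aes = Cipher.getInstance("AES/ECB/PKCS5Padding");
        aes.init(Cipher.ENCRYPT_MODE, key);
        return aes.doFinal(out.toByteArray());
    }

    public static byte[] decryptAndDecompress(byte[] blob, SecretKey key) throws Exception {
        Cipher aes = Cipher.getInstance("AES/ECB/PKCS5Padding");
        aes.init(Cipher.DECRYPT_MODE, key);
        byte[] compressed = aes.doFinal(blob);
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) out.write(buf, 0, inflater.inflate(buf));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = sharedKey();
        byte[] blob = compressAndEncrypt("repetitive cloud data data data data".getBytes(), key);
        System.out.println(new String(decryptAndDecompress(blob, key)));
    }
}
```

Compressing before encrypting is the only workable order: well-encrypted data looks random and does not compress.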
S. Kumar et al. (2016) proposed a method implementing the RSA algorithm; they utilized RSA to encrypt the data to provide security, so that only the concerned user can access it. By securing the data, only the permitted user can access it. Even if some intruder (unauthorized user) gets hold of the data accidentally or intentionally, he cannot decrypt it and recover the original data. The user data is encrypted first and then stored in the cloud. When required, the user places a request for the data to the cloud provider; the cloud provider authenticates the user and delivers the data [23].

1.6 Aim of the Work

The security of cloud computing has always been a significant aspect of the quality of service from cloud service providers. However, cloud computing poses many new security challenges which have not been well investigated. The main aim of this work is to build a proposed cloud computing system, where the Hadoop package is used to build the area for saving and managing users' data and to enhance its security. This system can be used in any organization, such as a university data center, a hospital data center, etc., and it can support a number of services to the users, such as data storage and data computation. The proposed cloud computing system achieves user privacy and confidentiality by using cryptographic techniques. Two cryptographic algorithms are used in the proposed cloud system. The first is the RSA algorithm, which is used for key management between the cloud's users, and the second is the AES algorithm, which is used for the data transfer between the cloud's users and the cloud system. The stored data in the cloud system is protected from unauthorized access: only an authorized user can access the data in the cloud computing system. Even if an attacker gets the data accidentally or intentionally, s/he cannot decrypt it and recover the genuine data.
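The RSA-for-key-management, AES-for-data split described above can be sketched as follows. This is a minimal, hypothetical Java illustration of the general hybrid (envelope) pattern, not the thesis implementation; the class and method names are invented for the example. The file data is encrypted with an AES secret key, and that key is itself encrypted with the recipient's RSA public key.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;

// Hypothetical sketch of the hybrid scheme: AES encrypts the data,
// RSA protects the AES key in transit.
public class HybridCryptoSketch {

    public static KeyPair generateRsaPair() throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    // Returns {wrappedAesKey, iv, ciphertext}.
    public static byte[][] encrypt(byte[] plaintext, PublicKey rsaPublic) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey aesKey = kg.generateKey();

        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);
        Cipher aes = Cipher.getInstance("AES/CBC/PKCS5Padding");
        aes.init(Cipher.ENCRYPT_MODE, aesKey, new IvParameterSpec(iv));
        byte[] ciphertext = aes.doFinal(plaintext);

        // Wrap (encrypt) the AES secret key with the recipient's RSA public key.
        Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        rsa.init(Cipher.ENCRYPT_MODE, rsaPublic);
        byte[] wrappedKey = rsa.doFinal(aesKey.getEncoded());

        return new byte[][] { wrappedKey, iv, ciphertext };
    }

    public static byte[] decrypt(byte[] wrappedKey, byte[] iv, byte[] ciphertext,
                                 PrivateKey rsaPrivate) throws Exception {
        // Unwrap the AES key with the RSA private key, then decrypt the data.
        Cipher rsa = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        rsa.init(Cipher.DECRYPT_MODE, rsaPrivate);
        SecretKeySpec aesKey = new SecretKeySpec(rsa.doFinal(wrappedKey), "AES");

        Cipher aes = Cipher.getInstance("AES/CBC/PKCS5Padding");
        aes.init(Cipher.DECRYPT_MODE, aesKey, new IvParameterSpec(iv));
        return aes.doFinal(ciphertext);
    }

    public static void main(String[] args) throws Exception {
        KeyPair kp = generateRsaPair();
        byte[][] packet = encrypt("cloud user data".getBytes(), kp.getPublic());
        System.out.println(new String(decrypt(packet[0], packet[1], packet[2], kp.getPrivate())));
    }
}
```

The design rationale is standard: AES handles bulk data efficiently, while RSA, which is slow and limited in payload size, only ever encrypts the short AES key, so only the holder of the RSA private key can recover the data.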
1.7 Thesis Layouts

The remaining parts of this thesis are organized into four chapters, whose contents are summarized below:

· Chapter Two: This chapter explains cloud computing, Hadoop, cloud computing deployment models, cloud computing service models, cloud computing attributes, characteristics of cloud computing, and data security issues in cloud computing.

· Chapter Three: This chapter explains the proposed cloud computing system. It includes how the proposed system is built using the Hadoop package and how security over this system is implemented.

· Chapter Four: This chapter presents the implementation of the proposed cloud computing system. It shows and explains how the proposed system works and how the user management processes are implemented.

· Chapter Five: This chapter is devoted to presenting the derived conclusions and recommendations for future work.
Chapter Two
Cloud Computing and Hadoop
Chapter Two Cloud Computing and Hadoop 2.1 Introduction The cloud system is running in the internet and the security issues in the internet which can be discovered in the cloud system. The cloud system is not different from the traditional system in the personal computer (PC) and it can meet other special and new security issues. The biggest worries about cloud computing are security and privacy. The traditional security issues, like for instance ,security vulnerabilities, virus and hack attack can likewise make risks to the cloud system and can lead more serious results because of the property of cloud computing. Malicious and hackers intruder may hack into cloud accounts and take sensitive data stored in cloud systems. The data and business application are stored in the cloud center and the cloud system must protect the resource cautiously. Cloud computing is a technology development of the popular adoption of virtualization, service oriented architecture and utility computing. Over the internet and it contains the applications, platform and services. If the systems meet the failure, fast recovery of the resource also is a problem. Systems of cloud hide the details of service implementation technology and the management. The user can’t control the process of deal with the data and the user can’t create certainly the data security by themselves[24]. 2.2 Cloud Computing System The data source storage , operation and network transform deals with the cloud system. The key data resource and privacy data are very import for the user. The cloud must give data control system for the user. The data security review likewise can be deployed in the cloud system. Data transferring to any authorized place user 12
is needed, in a form that any permitted application can use, by any authorized user, on any permitted device. Data integrity requires that only authorized users can modify the data, and confidentiality means that only permitted users can read the data. Cloud computing must provide strong user access control to strengthen licensing, quarantine, certification and other aspects of data management. In cloud computing, the provider's system serves many users and responds dynamically to changing service needs. The users do not know which servers are processing their data, where the data are located, or which networks are moving the data, because of the flexibility and scalability of the cloud system. The user cannot ensure that the cloud operates on private data in a confidential way. The cloud system can deploy cloud centers in various areas and the data can be stored in different cloud nodes. Different areas have different laws, so security management can face legal risk. Cloud computing services must therefore be improved in terms of legal protection [24].

2.2.1 Cloud Computing Deployment Models
As indicated by the National Institute of Standards and Technology (NIST), the cloud model is composed of four deployment models [25]:
1- Public Cloud: The cloud infrastructure is made available to the general public or a large industry group and is owned by an institution selling cloud services. The cloud is made available to the public on a business basis by a cloud service provider. The public cloud has a large variety of organizational and general public clients, making it easier to adapt but more vulnerable to security dangers. It is usually owned by a large company (e.g. Amazon's EC2, Google's AppEngine and Microsoft's Azure). The owner-organization makes its
infrastructure available to the general public via a multi-tenant model on a self-service basis delivered over the Internet.
2- Private Cloud: The cloud infrastructure is operated solely for a single organization. It might be managed by the organization or a third party. The private cloud gives an organization greater control over its data and resources, but it costs more than the public cloud. The private cloud is therefore more attractive to enterprises, especially mission-critical and safety-critical organizations.
3- Community Cloud: The cloud infrastructure is shared by a few organizations and supports a specific community that has shared concerns (e.g., security requirements, mission, policy, or compliance considerations). It might be managed by any one of the organizations or a third party and might exist on-premises or off-premises.
4- Hybrid Cloud: The cloud infrastructure is a composition of two or more clouds (community, private, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability. This model offers the largest degree of fault tolerance and scalability.
Figure (2.1) Cloud Deployment Models [25]

2.2.2 Cloud Computing Service Models
Cloud computing can provide a collection of services, but the three main ones are Software-as-a-Service, Platform-as-a-Service and Infrastructure-as-a-Service, together called the service models of cloud computing [26].
1- Cloud Software as a Service (SaaS): The capability given to the consumer is to use the provider's applications executing on a cloud infrastructure. The applications are accessible from different client devices via a thin client interface such as a web browser (e.g., web-based email).
2- Cloud Platform as a Service (PaaS): The capability given to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider (e.g., configurations).
3- Cloud Infrastructure as a Service (IaaS): The capability given to the consumer is to provision processing, networks, storage, and other fundamental computing resources on which the consumer is able to deploy and execute arbitrary software, which can include operating systems and applications (e.g., host firewalls).
Figure (2.2) Cloud Service Model [14]

2.2.3 Cloud Computing Attributes
Cloud computing has a number of attributes. The main attributes are as follows [27]:
1- Elasticity and scalability: Elasticity enables scalability, which means that the cloud can scale upward for peak demand and downward for lighter demand. Scalability also means that an application can scale as users are added and as application requirements change.
2- Self-service provisioning: Cloud customers can gain access to the cloud without a lengthy process; a user simply demands an amount of computing, storage, software, or processing from the service provider.
3- Standardization: Communication between services requires standardized application program interfaces (APIs). A standardized interface lets the customer link cloud services together more simply.
4- Pay-as-you-go: Customers pay only for the resources they have actually used.

2.2.4 Characteristics of Cloud Computing
A cloud computing system has several characteristics, which are described below [28]:
· Virtualization: Via cloud computing, the user is able to get service anywhere through any type of terminal, and can access or distribute it securely at any time.
· High Reliability: The cloud uses data fault tolerance to guarantee the high reliability of the service.
· Versatility: Cloud computing can produce different applications supported by the cloud, and one cloud can support different applications running simultaneously.
· On Demand Service: The cloud is a large resource pool that a user can buy from according to his/her need; the cloud is just like running water or gas, charged by the amount that the user consumes.
· Extremely Inexpensive: The centralized management of the cloud means the enterprise does not need to bear the rapidly growing management cost of the
datacenter. The versatility can increase the utilization of the available resources compared with a traditional system, so users can take full advantage of the low cost. Some benefits are listed below:
§ Cloud computing does not require high-quality equipment for the user and it is easy to use.
§ Cloud computing can realize data sharing between different devices.
§ Cloud computing provides a dependable and secure data storage center, so the user need not worry about problems such as data loss or viruses.

2.2.5 Technical Components
The key functions of a cloud management system are divided into four layers: the Resources & Network Layer, the Services Layer, the Access Layer, and the User Layer, as shown in Figure (2.3). Each layer has a set of functions [29]:
· The Resources & Network Layer manages the physical and virtual resources.
· The Services Layer includes the main categories of cloud services, namely Network as a Service (NaaS), IaaS, PaaS and SaaS, together with the service orchestration function and the cloud operational function.
· The Access Layer includes the API termination function, and the inter-cloud peering and federation function.
· The User Layer contains the End-user function, the Administration function and the Partner function.
Figure (2.3) The Cloud Computing Components [29]

Other functions like Management, Security & Privacy, etc. are considered cross-layer functions that cover all the layers. The main principle of this architecture is that all these layers are optional: a cloud provider who wants to use the reference architecture may select and implement only a subset of them. However, from the security perspective, the principle of separation requires each layer to take charge of certain responsibilities. In the event that the security controls of one layer are bypassed (e.g. the access layer), other security functions should compensate and thus should be implemented either in other layers or as cross-layer functions [29].
2.3 Security Issues in Cloud Computing
Although the cloud is one of the most sought-after technologies at present, it is also a recent technology, and like any other new technology there has not been much research on the security that the cloud provides its users. For this reason, a cloud suffers from serious security problems, including [25]:
A- Problems with Data Security
As mentioned earlier, there can be a very large amount of data on a cloud. With so much information, it becomes susceptible to losses. Data loss is a common problem with cloud storage: due to improper storage, the required information may be hard to find later, which counts as loss of data since the data cannot be discovered. Moreover, in certain situations a cloud service provider might try to reuse a service, or might use someone else's server to provide service to the user instead of using their own servers. In these cases data stealing can happen, and data stealing is a serious problem in a cloud. The actual owner of a server has complete access to that server, so when a service provider allocates such a server to a user, and the user places all his personal information on that server, the owner of the server could easily access this personal information.
B- Problem of Infection
Since any user can place data on a cloud, a malicious user could upload a virus or any other application or software that could seriously harm another user's computer or hardware device when it is downloaded. To prevent such mishaps, a great deal of security measures must be implemented.
C- Other Security Issues
Other security issues are not exclusive to a cloud. These include common security threats like hacking a user's cloud account to gather information. A strong password, together with a mechanism that makes tracing the account impossible, must be implemented.
2.4 Data Security Issues in Cloud Computing
Cloud computing is in some way the base of many small and large companies, as people use the cloud for running their businesses, transmitting their data, or storing data, so the major worry of these people is the security of their data, i.e. whether their data is secure in the cloud or not. Data security thus becomes the major obstacle to using the services of cloud computing. Some of the key features that can be considered in the context of data security are data integrity, privacy or confidentiality, and data availability [30]. These cloud computing security features are explained in the following paragraphs [30][31]:
A- Data Integrity
Integrity of data means that there should not be any tampering or alteration of the user data. To ensure data integrity in a cloud environment, the Cloud Service Provider (CSP) uses effective mechanisms: the provider must maintain the data's integrity, be accountable for the data, and be able to tell what happened to a certain dataset and at what point. The cloud provider should also make the client aware of what particular data is hosted on the cloud, the origin
and the integrity mechanisms put in place. For compliance purposes, it may be necessary to have exact records of what data was placed in a public cloud.
B- Privacy and Confidentiality
This is a major issue when storing data in a cloud environment. The Cloud Service Provider (CSP) uses mechanisms to ensure the privacy and confidentiality of data: the user data should be strictly confidential and should not be accessed by anyone other than the authorized user, not even by cloud personnel. Once the client hosts data in the cloud, there should be a guarantee that access to that data will be limited to authorized access only. Inappropriate access to sensitive customer data by cloud personnel is a risk that poses a potential threat to cloud data. Assurances should be provided to the clients, and proper practices, privacy policies and procedures should be in place to assure cloud users of their data's safety. The cloud seeker should be assured that the data hosted on the cloud will remain confidential.
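As an illustration of the integrity mechanisms discussed above, the sketch below uses a cryptographic hash to let a client detect tampering with data retrieved from the cloud. The choice of SHA-256 and the sample data are assumptions for illustration, not part of any specific provider's mechanism:

```python
import hashlib

def digest(data: bytes) -> str:
    # SHA-256 fingerprint of a data block, as a hex string
    return hashlib.sha256(data).hexdigest()

# The client records the digest before handing data to the CSP.
original = b"customer records, 2017"
stored_digest = digest(original)

# On retrieval, recomputing the digest verifies integrity.
retrieved = b"customer records, 2017"
print(digest(retrieved) == stored_digest)   # unchanged data verifies

# Any unauthorized modification changes the digest.
tampered = b"customer records, 2018"
print(digest(tampered) == stored_digest)    # tampering is detected
```

A real deployment would sign or store the digests out of the provider's reach, so the provider cannot silently recompute them after altering the data.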
C- Data Availability
Data availability is another security issue in cloud computing. The Cloud Service Provider (CSP) stores data in a distributed manner, i.e. at different locations, so it needs mechanisms to provide uninterruptible access to the data [30]. Customer data is normally stored in chunks on different servers, often
residing in different locations or in different clouds. In this case, data availability becomes a major issue, as providing uninterruptible and seamless service becomes relatively difficult.
D- Data Location and Relocation
Cloud computing initially stores the data in some place but after a while relocates it, i.e. it locates and relocates the data according to the availability and requirement of storage space, so it needs to maintain the security of the data at the different locations. Cloud computing offers a high degree of data mobility, and consumers do not always know the location of their data. However, when an enterprise has sensitive data kept on a storage device in the cloud, this requires a contractual agreement between the cloud provider and the consumer that the data will stay in a particular location or reside on a given known server. Cloud providers should also take responsibility for ensuring the security of their systems (including the data) and provide robust authentication to safeguard customers' information. Another issue is the movement of data from one location to another. Data is initially stored at an appropriate location decided by the cloud provider, but it is often moved from one place to another, since cloud providers have contracts with each other and use each other's resources.
E- Storage, Backup and Recovery
The cloud provider provides data storage, backup and recovery facilities so that, in the event of a hardware failure, the consumer can roll back to an earlier state. The user knows that the data is stored in the cloud, but what happens if the cloud's storage system gets corrupted or data theft occurs in its storage system? The provider therefore needs mechanisms to provide backup and recovery of the lost data.
When the user decides to move his data to the cloud, the cloud provider should ensure adequately resilient data storage systems. At a minimum they should be able to provide RAID (Redundant Array of Independent Disks) storage systems, although most cloud providers will store the data in multiple copies across many independent servers. In addition, most cloud providers should be able to offer backup services, which are certainly important for businesses that run cloud-based applications, so that in the event of a serious hardware failure they can roll back to an earlier state. When such data integrity requirements exist, the origin and custody of the data or information must be maintained in order to prevent tampering or the exposure of data beyond the agreed territories.

2.5 Hadoop Apache Project
Hadoop is a top-level Apache project, an open-source software framework written in the Java programming language, which allows the distributed processing of massive data sets across different sets of servers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures. Hadoop was created by Doug Cutting with the goal of creating a distributed computing framework and programming model that provides for easier development of distributed applications. The philosophy is to provide scale-out scalability over large clusters of rather cheap commodity hardware. Its creation was
motivated by, and largely based on, papers published by Google describing some of their internal systems, namely the Google File System (GFS) and Google MapReduce [32]. Hadoop is designed to be scalable, and can run on small as well as very large installations. Several programming frameworks, including Pig Latin and Hive, allow users to write applications in high-level languages (loosely based on SQL syntax) which compile into MapReduce jobs that are then executed on a Hadoop cluster. Hadoop committers today work at several different organizations, such as Hortonworks, Microsoft, Facebook, Cloudera, LinkedIn, Yahoo, eBay and many others around the world [32].
2.5.1 Hadoop Characteristics
The Hadoop Apache Project has a number of characteristics [33][32]:
1. Scalable: Hadoop relies heavily on its distributed file system, and hence comes with the capability of easily adding or removing nodes in the cluster without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.
2. Cost Effective: Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte (TB) of storage, which in turn makes it affordable to model all the data.
3. Flexible: Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any single system can provide.
4. Fault Tolerant: Fault tolerance is the ability of a system to stay functional, without interruption and without losing data, even if some of the system's components fail. One of the main goals of Hadoop is to be fault tolerant. Since a Hadoop cluster can use thousands of nodes running on commodity hardware, it becomes highly susceptible to failures. Hadoop achieves fault tolerance by data redundancy replication. It also provides the ability to monitor running tasks and automatically restart a task if it fails.
5. Built-in Redundancy: Hadoop essentially duplicates data in blocks across data nodes. For every block, a backup block of the same data is guaranteed to exist somewhere across the data nodes. The master node keeps track of these nodes and the data mapping. In case any node fails, the node where the backup data block resides takes over, making the infrastructure failsafe. A conventional Relational Database Management System (RDBMS) has the same concerns and uses terms like persistence, backup and recovery; these concerns scale upwards with Big Data.
6. Computational Tasks at the Data's Residence: By moving computation to the data, any computational queries are performed where the data resides. This avoids the overhead required to bring the data to the computational environment. Queries are computed in parallel and locally, then combined to complete the result set.

2.5.2 Hadoop Components
Hadoop consists of two main components: MapReduce (which deals with the computational operations applied to the data) and the Hadoop Distributed File System (HDFS) (which deals with reliable storage of the data) [32].
2.5.2.1 Hadoop Distributed File System (HDFS)
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size [35]. HDFS has many similarities with existing distributed file systems; however, the differences from other distributed file systems are still significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It gives high-throughput access to application data and is suitable for applications that have large data sets. HDFS was originally built as infrastructure for the Apache Nutch web search engine project, and is part of the Apache Hadoop Core project [32].
Figure (2.4) HDFS Architecture [32]
HDFS has a master/slave architecture, as shown in Figure (2.4), comprising a NameNode, which manages the cluster metadata, and DataNodes, which store the data. The master node is called the "NameNode", and the slave nodes are called "DataNodes". HDFS divides the data into fixed-size blocks (chunks) and spreads them across all DataNodes in the cluster. Each data block is typically replicated three times, with two replicas placed within the same rack and one outside. The NameNode keeps track of which DataNodes hold replicas of which block and actively monitors the number of replicas of each block. When a replica of a block is lost due to a DataNode failure or disk failure, the NameNode creates another replica of the block. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients, and also perform block creation, deletion, and replication upon instruction from the NameNode [32].
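The block-splitting and rack-aware replica placement described above can be sketched in a few lines. This is a simplified model, assuming a tiny block size and a hypothetical two-rack cluster (node names `dn1`–`dn4` are invented); real HDFS uses much larger blocks (e.g. 128 MB) and rack topology configured by the administrator:

```python
BLOCK_SIZE = 4        # bytes, for illustration only
REPLICATION = 3       # HDFS's typical replication factor

# Hypothetical cluster: two racks with two DataNodes each.
RACKS = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}

def split_into_blocks(data: bytes, size: int = BLOCK_SIZE):
    # A file is stored as fixed-size blocks (the last may be shorter).
    return [data[i:i + size] for i in range(0, len(data), size)]

def place_replicas(block_id: int):
    # Policy sketched in the text: two replicas on one rack, one outside.
    names = sorted(RACKS)
    local = names[block_id % len(names)]
    remote = names[(block_id + 1) % len(names)]
    return RACKS[local][:2] + [RACKS[remote][0]]

blocks = split_into_blocks(b"hello hdfs !")
for i in range(len(blocks)):
    print(i, place_replicas(i))
```

In the real system the NameNode performs this placement and repairs the mapping whenever a DataNode failure drops a block below its target replica count.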
2.5.2.2 MapReduce
MapReduce is the heart of Hadoop: a programming model for parallel processing of tasks on a distributed computing system, and an associated implementation for processing and generating large data sets. This programming model allows splitting a single computation task across multiple nodes or computers for distributed processing. As a single task can be broken down into multiple subparts, each handled by a separate node, the number of nodes determines the processing power of the system. Since MapReduce is a programming model rather than a particular implementation, it can be written in any programming language [36]. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution
across a set of machines, handling machine failures, and managing the required inter-machine communication [37]. When there is a large amount of data, the computations have to be distributed across hundreds or thousands of machines in order to finish in a reasonable amount of time. The issues of how to parallelize the computation, distribute the data, and handle failures conspire to obscure the original simple computation with large amounts of complex code. As a reaction to this complexity, Google designed MapReduce as a new abstraction that allows expressing the simple computations that programmers are trying to perform while hiding the messy details of parallelization, fault tolerance, data distribution and load balancing in a library [37]. Programmers find the system easy to use because MapReduce lets them focus on the application logic while it handles the messy details such as failures, application deployment, task duplication, and aggregation of results automatically. The MapReduce paradigm has become a popular way of expressing distributed data processing problems that deal with large amounts of data. MapReduce is used by a number of organizations worldwide for diverse tasks such as application log processing, user behavior analysis, processing scientific data, web crawling and indexing, etc. A MapReduce program consists of two user-specified functions, the Map and the Reduce function, which are explained below:
a. Map Phase
In the Map phase, each Mapper reads the raw input, record by record, converts it into key/value pairs [(k,v)], and feeds them to the map function, which performs a computation on each pair. The Map function operates on each of the pairs in the input and produces intermediate output in the form of new key/value pairs, depending upon how the user has defined the
Map function. The output of the map function is then passed to the reduce function as input [33].
b. Reduce Phase
The reduce function applies an aggregate function to its input, merging all intermediate values associated with each intermediate key (e.g. counting or summing values), and storing its output to disk. The reduce output is also in the form of key/value pairs. At the end of the reducer, the output is sorted according to the keys, and the function for comparing the keys is usually supplied by the user. During the execution of a MapReduce job, the input is first divided into a set of input splits. The system then applies map functions to each of the splits in parallel, spawning one task per input split; the output of each task is stored on disk for transfer to the reduce tasks. The system starts the reduce tasks once all the map tasks have completed successfully. Task or node failures are dealt with by re-launching the tasks. Data given as input to the tasks, and generated as output of the tasks, is stored in a distributed file system (HDFS, for instance) to make sure that the output of a task survives failures [34].
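The Map and Reduce phases described above can be simulated in a few lines. The word-count example below is the classic illustration; the in-memory `shuffle` function stands in for the grouping that the Hadoop framework performs between the two phases:

```python
from collections import defaultdict

def map_phase(record: str):
    # Map: turn one input record into intermediate (key, value) pairs.
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # The framework groups all values that share an intermediate key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the values for one key (here, a sum).
    return key, sum(values)

records = ["the cloud stores data", "the cloud scales"]
intermediate = [p for r in records for p in map_phase(r)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)   # {'the': 2, 'cloud': 2, 'stores': 1, 'data': 1, 'scales': 1}
```

In a real Hadoop job, each `map_phase` call would run as a task on the node holding an input split, and the shuffle would move intermediate pairs across the network to the reducers.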
Chapter Three
Proposed Cloud Computing System Design
3.1 Introduction
In cloud computing, providing security to protect data and resources is an important task. One solution to the problem of data security in a cloud computing system is cryptographic algorithms: data is stored in encrypted form, so that it cannot be accessed by unauthorized users of the cloud environment. In this chapter, the proposed system for enhancing the security of cloud computing systems is presented. The Hadoop environment has been installed on a virtual machine environment, which is used to build the cloud system. This system supports user registration and can satisfy almost all cloud computing requirements, such as file management and data computation. Most of the cloud computing system issues are presented and discussed in this chapter.

3.2 The Proposed System Architecture
In this thesis, the proposed cloud computing system was built using the Linux OS and the Hadoop package, whose features support all the cloud computing system requirements. Figure (3.1) shows the general architecture of the proposed cloud computing system.
Figure (3.1) General Architecture for Proposed Cloud Computing System

3.3 Cloud Computing Environments
The proposed system can be divided into three parts: the cloud computing environment, user interface and data management, and security issues. These parts are discussed in the next sections. The cloud computing environment concerns how the cloud properties and requirements can be satisfied. In the proposed system, the Linux OS and the Hadoop package were used to perform and satisfy all of the cloud system requirements.

3.3.1 Building a Cloud Computing System Using Hadoop
In this thesis, the Hadoop package was used to build the cloud computing system. Hadoop consists of one master (Name Node) and a number of slaves (Data Nodes). The master node oversees the two key functional pieces that make up Hadoop: storing large amounts of data in the Hadoop Distributed File System (HDFS), and running parallel computations on the stored data (MapReduce).
32
The Name Node oversees and coordinates the data storage function (HDFS), while the Job Tracker oversees and coordinates the parallel processing of data using MapReduce. The Master (Name Node) manages the file system namespace operations, such as opening, closing and renaming files and directories, determines the mapping of blocks to Data Nodes, and regulates client access to files. The Slaves (Data Nodes) are responsible for serving read and write requests from the file system's clients, and perform block creation, deletion, and replication upon instruction from the Master (Name Node).

3.3.2 Hadoop Package Support for Cloud Service Models
As mentioned in the previous chapter, a cloud computing system consists of three service models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Software as a Service (SaaS) involves the cloud provider installing and maintaining software in the cloud, with users executing the software from their cloud clients over the internet. In the proposed cloud computing system, the cloud system webpage (cloud interface) supports all the SaaS services; it allows the cloud's users to access the cloud computing system and transfer files in an easy way. Platform as a Service (PaaS) provides the users with application platforms and databases as a service. In the proposed cloud system, the Apache web server and MySQL server supply the PaaS services. Infrastructure as a Service (IaaS) takes the physical hardware and goes completely virtual (e.g. all servers, networks, storage, and system management exist in the cloud). In the proposed cloud system, the Linux OS and the Hadoop package support the cloud infrastructure and all the services needed for IaaS.
Figure (3.2) shows how the proposed cloud system supports the cloud computing services:

Cloud Computing Layers               Hadoop package
Software as a Service (SaaS)         Web page (SaaS)
Platform as a Service (PaaS)         Apache Server & MySQL Database (PaaS)
Infrastructure as a Service (IaaS)   VM Linux OS & Hadoop package (IaaS)

Figure (3.2) Proposed Cloud System with the Cloud Services
3.3.3 Hadoop Installation on Linux Multiple Nodes
Hadoop is an open-source framework that allows storing and processing Big Data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop version 2.7 was downloaded from the internet [38] and installed on Linux Debian version 7. One master machine (Name Node) and two slave machines (Data Nodes) were used in the proposed cloud system. First, the Hadoop system was installed on the master server, which is responsible for managing the cloud's users. Hadoop was then installed on the slaves, which are responsible for data storage and computation. The step-by-step instructions and commands for installing and configuring Hadoop on the master and slaves are given in Appendix A.

3.4 User Interface and Data Management
This section discusses the user interface used by the cloud's users to access the cloud computing system, as well as the management of data and files over the cloud system.

3.4.1 User Registration
To perform the cloud user registration process, each user needs to use the User Login Page. This page includes a number of fields, such as Full Name, User Name, Password and Email Address, all of which should be filled in by the user. The Login Page was designed and implemented using PHP, and the user information is saved in a database built with MySQL Server. In this module all fields must be filled in, otherwise an error message is displayed. At the beginning, a cloud user needs to be registered in the cloud
system; only then could he perform the remaining operations, so without registration no cloud operation could be carried out. In the user registration form the user must enter valid information, otherwise an error message is shown; once the user registers with proper information, he automatically receives the message "The User Successfully Registered". Every user in the cloud system has a specific space (user home folder) which contains all of the user's information (files and folders). Each user has full permission to access his files and folders, and he can change these permissions as s/he wants.

On the cloud server side, when the information for a newly registered user arrives, the cloud administrator creates a new user account. Because the proposed cloud system uses the Linux OS, the administrator uses the "adduser" command to create this account based on the user information saved in the database. The administrator then creates a specific space (folder) for the user in the Hadoop system; this space is used later to save all the user's information in the cloud. After that, the administrator creates the RSA and AES keys (used for security) and saves them in the user's space. When the administrator has created the new user and allocated a space to him, he sends the User Name and Password to the new user by email. On the administrator side, a system program written in shell script is responsible for creating the user, creating the user space, generating the RSA and AES keys, and saving these keys in the user space (folder). Figure (3.3-A) shows the flowchart for this process on the client side and Figure (3.3-B) shows it on the administrator side.
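The administrator's shell program described above is not listed in the text, so the following is only an illustrative sketch of such a script; the script name, paths, and the use of openssl for key generation are assumptions based on the appendix commands.

```shell
# Hypothetical sketch of the administrator's user-creation script; the real
# creatuser.sh used in the thesis may differ. Written to a file for inspection.
cat > create_cloud_user.sh <<'EOF'
#!/bin/bash
# Usage: ./create_cloud_user.sh <username>
set -e
USER="$1"

# 1. Create the Linux account (Section 3.4.1).
sudo adduser --disabled-password --gecos "" "$USER"

# 2. Create the user's space (folder) in the Hadoop system.
hadoop fs -mkdir "/user/$USER"
hadoop fs -chown "$USER:supergroup" "/user/$USER"

# 3. Generate the security keys and save them in the user's space.
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
        -out "/home/$USER/rsa_private.pem"           # RSA private key
openssl rsa -in "/home/$USER/rsa_private.pem" \
        -pubout -out "/home/$USER/rsa_public.pem"    # RSA public key
openssl rand -hex 32 > "/home/$USER/aes.key"         # 256-bit AES secret key
EOF
chmod +x create_cloud_user.sh
```

The script would be run by the administrator once per registered user, after reading the user's details from the MySQL database.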
Figure (3.3-A) Flowchart for adding user process (Client Side)
Figure (3.3-B) Flowchart for adding user process (Administrator Side)
3.4.2 User's Files and Folders Permissions
All users in the cloud system should have specific permissions for their files and folders; these permissions are read, write, and execute. They allow users to specify exactly who may access their files and folders, which enhances the security of the cloud system. There are three possible attributes for file access permissions (Figure 3.4):
1- r - Read permission: whether the file may be read. For a directory, this means the ability to list its contents.
2- w - Write permission: whether the file may be written to or modified. For a directory or file, this defines whether changes may be made to its contents. If write permission is not set, the user cannot delete, rename or update the file.
3- x - Execute permission: whether the file may be executed. For a directory, this attribute decides whether the user may enter it, search through it, or execute a program from it.
The commands used for modifying file permissions and ownership are chmod (to change permissions) and chown (to change ownership). When a user needs to add or remove permissions, he can use the chmod command with a "+" or "-" character ("+" to add a permission and "-" to remove one).
Figure (3.4) Cloud User - File Permissions
Users in the cloud can grant file/folder permissions to the other classes of users, such as the user's group or others. From Figure (3.5) it can be seen that there are three different classes of system users:
u – User: the owner of the file. By default, the person who created a file becomes its owner, so the user is also sometimes called the owner.
g – Group: a user group can contain multiple users. All users belonging to the group have the same access permissions to the file.
o – Other: any other user who has access to the file. Such a person has neither created the file nor belongs to a user group that could own it; in practice, setting permissions for others is also referred to as setting permissions for the world.
Through these features, a user in the cloud system can give other users in his group, or users outside it, the ability to access his files/folders according to the permissions granted to them.
Figure (3.5) Cloud User –Change Permissions
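The r/w/x attributes and the chmod "+"/"-" syntax described above can be demonstrated with a short session; the file name is illustrative.

```shell
# Demonstration of the permission attributes from Section 3.4.2.
touch report.txt
chmod 640 report.txt        # u=rw, g=r, o=none

chmod u+x report.txt        # add execute for the owner (u)  -> 740
chmod g-r report.txt        # remove read for the group (g)  -> 700
chmod o+r report.txt        # grant read to others (o)       -> 704

stat -c '%a' report.txt     # prints the octal mode: 704
```

Each chmod call combines a user class (u, g, o) with "+" or "-" and a permission letter, exactly as the text describes.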
3.5 Upload and Download Data with Cloud
Cloud users can upload and download files to/from the cloud server. The proposed cloud system provides its users with appropriate application programs for uploading and downloading files to/from the cloud system. Figure (3.6) shows the file upload and download processes in the cloud system.
Figure (3.6) Upload and Download Files Processes
3.5.1 Upload Data to Cloud Server
The cloud server is the area where the user requests and uploads files. In order to upload data, the data owner has to be registered in the cloud server. Once the data owner registers, a space is assigned to him and he can upload and download data. The proposed system provides two programs for uploading files to the cloud system: the first uploads a file to the Linux server and saves it in the user space, and the second uploads files from the Linux server to the Hadoop package.
1- Upload Data to Linux Server
Cloud users use the File Transfer Protocol (FTP) to upload files to the Linux server. FTP is the most popular protocol for transferring files from one system to another and provides a fast way to do so. Many applications available on Linux and Windows support FTP services.
When a cloud user wanted to upload a file to the cloud server, he needed to visit the cloud system web page and use the FTP page. This web page provided a file-manager interface for uploading files. It also gave the user the ability to synchronize files between his local computer and the web server, change or establish file permissions, and perform other advanced file and folder management functions. The uploaded files were then saved in the Linux user space, ready for uploading to the Hadoop system.
2- Upload Files to Hadoop System
Secure Shell (SSH) is a protocol used by cloud users to access the cloud server and run shell commands. SSH provides a secure channel over an unsecured network in a client-server architecture, connecting an SSH client application with an SSH server. Users could use the "PuTTY" application, which relies on the SSH protocol, to connect to the cloud system. PuTTY was available on the proposed cloud system website, and cloud users could download it directly. After connecting to the cloud system via SSH, users could run the "upload.sh" program (which was built and saved in the user space) to upload files from the Linux server to the Hadoop system. The "upload.sh" program is written in shell script. Figure (3.7) shows the flowchart for uploading files to the cloud system.
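The "upload.sh" helper itself is not reproduced in the text; the following is only a minimal sketch of what such a script could look like, assuming the per-user HDFS folder layout described in Section 3.4.1.

```shell
# Illustrative sketch of the "upload.sh" program described above; the actual
# script used in the thesis may differ. Written to a file for inspection.
cat > upload.sh <<'EOF'
#!/bin/bash
# Copy a file from the user's Linux space into his Hadoop (HDFS) space.
# Usage: ./upload.sh <filename>
set -e
FILE="$1"
HDFS_DIR="/user/$(whoami)"

# "hadoop fs -put" copies a local file into HDFS; it fails if a file with
# the same name already exists in the destination folder.
hadoop fs -put "$FILE" "$HDFS_DIR/"
echo "Uploaded $FILE to $HDFS_DIR"
EOF
chmod +x upload.sh
```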
Figure (3.7) Flowchart for Uploading Files to the Cloud
3.5.2 Download Files from Cloud System
In the proposed system, when a cloud user wanted to download files, he/she first had to visit the cloud system webpage and access his/her files in Hadoop. The cloud system then provided a list of the files that the user had uploaded earlier. The user selected the intended file from the list, and the file was automatically downloaded and saved on the user's local host. Figure (3.8) shows the list of the user's files in the cloud system.
Figure (3.8) The list of the Users’ Files in Cloud System.
3.6 Cloud System Security Issue
Even though cloud computing delivers a wide range of dynamic resources, security is generally perceived as the biggest problem in the cloud, and it makes users reluctant to adopt cloud computing technology.
In general, security in cloud computing consists of four characteristics: Authentication, Integrity, Availability and Confidentiality. The proposed cloud computing system achieves these four characteristics as follows:
1- Authentication (Username & Password)
Authentication is the process of identifying an individual, usually based on a username and password. A user authenticates himself to the cloud system with his unique username and password; if they are incorrect, he is not allowed to access the cloud system. The
username and password have a high degree of confidentiality, because the Linux system encrypts them using a secure hashing algorithm.
2- Integrity & Availability
Integrity, in terms of data, is the assurance that information can only be accessed or modified by those authorized to do so; there must be no tampering with or alteration of the user's data. Because users' data is very important to them, the cloud system administrator needs effective mechanisms to maintain data integrity. Data availability is another security problem in cloud computing: since the administrator stores data in a shared manner, i.e. at various locations, he must employ some mechanism to keep the data available.
Integrity and availability in the proposed cloud system are provided by the Hadoop system. Hadoop supports data integrity through a number of algorithms already built into the Hadoop Master and Slaves, so the data cannot be altered or changed. As for availability, Hadoop uses a number of Slaves (Data Nodes) to save the users' data; these Slaves are connected to each other internally, and if any one of them goes down the others can take its place. Because the data is replicated across these Slaves, it remains available at all times, even when some Slaves go down.
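The hashed storage of Linux passwords mentioned above can be illustrated with openssl's passwd helper; the password and salt below are examples only, not values from the thesis system.

```shell
# Linux stores password hashes (not the passwords themselves) in /etc/shadow.
# The "-6" option produces a SHA-512 crypt hash, one of the secure hashing
# schemes a modern Linux system uses for its password file.
openssl passwd -6 -salt examplesalt 'MySecret123'
# The output begins with "$6$", the prefix marking SHA-512 crypt hashes.
```

Because only the hash is stored, recovering the original password from a leaked /etc/shadow entry requires a brute-force search rather than a simple lookup.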
3- Confidentiality (AES and RSA)
The major problem when storing data in a cloud environment is protecting user confidentiality: the user's data must be strictly confidential and must not be accessed by anyone except those with permission to do so, so the cloud system administrator must utilize mechanisms that maintain the privacy and confidentiality of the data. The proposed cloud computing system achieves user privacy and confidentiality using cryptographic techniques. Two cryptographic algorithms were used in the proposed cloud system: the RSA algorithm (Rivest, Shamir, and Adleman), used for key management among the cloud's users, and the AES algorithm (Advanced Encryption Standard), used for data transfer between the cloud's users and the cloud system.
3.6.1 Key Management
Key management is the management of cryptographic keys in a cryptosystem. It covers the generation, storage, exchange, use, and replacement of keys. The cloud system administrator is responsible for preparing the keys for all users registered in the cloud system. Key exchange comes before any secured communication: users must first set up the details of the cryptography. In some instances this requires exchanging identical keys (in a symmetric key system); in others it requires possessing the other party's public key. While public keys can be openly exchanged (their corresponding private keys are kept secret), symmetric keys must be exchanged over a secure communication channel.
In the proposed cloud system, when the administrator created a new user and gave him a space in the system, s/he automatically generated the keys (a secret key for AES, and public and private keys for RSA) and saved them in the user's space. Only the intended user can access and download these keys, because only he has the permissions to do so; no other user can enter the user's space and see his secret keys. Thus each user in the cloud system has an AES secret key and an RSA public/private key pair. When cloud users want to share secret data in the cloud system, they can do so by sharing the AES secret key with the other users via their public and private keys. For example, user one could encrypt the AES secret key with the RSA public key of user two and send him the encrypted secret key; user two could then decrypt it with his own RSA private key. A cloud user also had the ability to change his AES secret key, which he needed when he wanted to prevent even the cloud administrator from accessing the real data in his files. So, for the key management process, the cloud user first downloaded his/her keys (RSA and AES keys) from the cloud system using the Secure File Transfer Protocol (SFTP), a secure transfer protocol available through the cloud system webpage for direct download. Furthermore, the cloud user had to download the RSA public keys of all other cloud users. This process was followed by all the cloud's users. Figure (3.9) shows the key management process in the proposed cloud computing system.
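The key-exchange step described above (user one wrapping the AES secret key with user two's RSA public key) can be sketched with openssl; the file names are illustrative, not the ones used in the thesis system.

```shell
# User two's RSA key pair (as generated for him by the administrator).
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
        -out user2_priv.pem 2>/dev/null
openssl rsa -in user2_priv.pem -pubout -out user2_pub.pem 2>/dev/null

# User one's AES secret key (256 bits, hex-encoded).
openssl rand -hex 32 > aes.key

# User one encrypts the AES key with user two's PUBLIC key and
# sends aes.key.enc over the (untrusted) network.
openssl pkeyutl -encrypt -pubin -inkey user2_pub.pem \
        -in aes.key -out aes.key.enc

# User two recovers the AES key with his PRIVATE key.
openssl pkeyutl -decrypt -inkey user2_priv.pem \
        -in aes.key.enc -out aes.key.dec

cmp aes.key aes.key.dec && echo "shared AES key recovered"
```

Only user two's private key can undo the encryption, so the AES key never travels in the clear — exactly the property the key management scheme relies on.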
Figure (3.9) Proposed Key Management System
3.6.2 Secure File Transfer over Cloud System
Once the cloud user has downloaded his keys, s/he is ready to upload and download files to/from the cloud system. Figure (3.10) shows the secure file transfer over the cloud system.
Figure (3.10) Secure File Transfer over Cloud System
When a cloud user wanted to upload a file to the cloud system, s/he could decide whether the file needed to be encrypted. If the file did not need security, the user could upload it directly by requesting the FTP page, selecting the file from the local computer, and uploading it to the cloud system. If the file was a secure file, it had to be encrypted before being sent to the cloud system: the user ran the encryption program, which selected a file and encrypted it using the AES algorithm, and the encrypted file was then ready to be uploaded through the FTP web page. The program used for file encryption was written in Java, and the cloud user could download it from the cloud system home page.
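The thesis performs this AES step with a Java program (SecureFile.jar); the openssl command below is only an equivalent sketch of that step, with an example passphrase standing in for the user's AES secret key.

```shell
# Encrypt a file with AES-256 before uploading it through the FTP page.
# "ExampleAESKey" is a stand-in for the user's real AES secret key.
echo "confidential report" > secret.txt

openssl enc -aes-256-cbc -pbkdf2 -salt \
        -in secret.txt -out secret.txt.enc \
        -pass pass:ExampleAESKey

# secret.txt.enc is now ready to be uploaded through the FTP web page;
# the server only ever stores the ciphertext.
```

The same command with -d reverses the operation after download, which is the decryption step of Section 3.6.2.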
Figure (3.11) shows the flowchart for uploading secure and non-secure files to the cloud system.
Figure (3.11) Flowchart for uploading secure and non-secure files to the cloud system
When a cloud user needed to download files from the cloud system, he requested the cloud system webpage and selected the Hadoop home page. From there, the user could browse his/her own files and any other files he had permission to browse and access, and then download a file from the Hadoop page directly.
A cloud user could download two types of files: normal files and encrypted files. To download a normal file, he selected the file, downloaded it, and saved it on his local host. To download an encrypted file, he selected the file, downloaded it, decrypted its contents (using the AES algorithm), and saved the decrypted file on his local host. Figure (3.12) shows the flowchart for downloading files from the cloud system.
Figure (3.12) Flowchart for downloading secure and non-secure files from the cloud system
Chapter Four The Proposed Cloud Computing System Implementation
4.1 Introduction
Nowadays a large amount of data is stored in the cloud, but securing that data for its entire lifetime is a difficult task. This chapter presents the implementation of the proposed cloud computing system. It shows and explains how the proposed system works and how it manages all the user processes, such as user registration and file upload and download. It also explains how the security enhancement of the cloud system, along with its key management and secure data transfer (upload/download), is achieved in the practical part.
4.2 Main Steps for Cloud System Implementation
The main idea of the proposed system is to design and implement a cloud computing system and enhance its security. The Hadoop package was used to build the area for saving and managing users' data, providing availability, integrity, large storage space and fast data computation, but other application programs are still needed to help cloud users access the system. Therefore, a web application was built in PHP and hosted on an Apache server under Linux. Additional application programs, written in Java, were built to enhance the security of the proposed cloud system. Figure (4.1) shows the main steps of the proposed cloud computing system implementation.
1. User-registration process
2. Create username, password and security keys
3. Upload file process
4. Download file process
Figure (4.1) Diagram Showing the Main Steps of the System
4.3 User Interface and Data Management
In the proposed cloud system, when a user wants to access the cloud system, he should first visit the cloud system website and open the cloud's Home Page (the cloud system interface). The Home Page facilitates navigation to the other pages on the site by providing links to prioritized and recent articles and pages. It therefore contains a number of links to the other pages of the cloud system: User Registration, Uploading Files, Cloud Server Page, and Application Programs. Cloud users use these pages for uploading and downloading files to/from the cloud system. Figure (4.2) shows the Home Page of the proposed cloud computing system. The following subsections explain how the system is implemented step by step.
Figure (4.2) Proposed Cloud Computing System Home Page
4.3.1 User Registration Process
The user registration process started when the user selected the Register-User link on the cloud system Home Page. The user registration page was then loaded in the internet browser; it contained a number of input fields representing the user information, such as username, password, email address, etc. After filling in his information on the registration page, the user clicked the Register command to send his information to the cloud server, where it was saved in the server database. Figure (4.3) shows the user registration web page.
Figure (4.3) User Registration Web Page
On the other side, once the user registration process completed successfully, the cloud system administrator read the user's data from the database and ran the creatuser.sh program, which was responsible for creating the user account, creating the user space, generating the RSA public key, RSA private key and AES secret key, and saving these keys, together with the uploadfile.sh program (responsible for uploading files from the Linux server to Hadoop), in the user space. After that, the cloud administrator sent the username and password to the cloud user. Figure (4.4) shows how the cloud administrator runs the creatuser.sh program to create a new user, and Figure (4.5) shows the contents of the user space (folder), which include the RSA public and private keys, the AES secret key, and the uploadfile.sh program.
Figure (4.4) Cloud’s Administrator creating User Account
Figure (4.5) Files included in the Cloud’s User Space (folder)
4.4 Upload and Download Files Processes
The data owner (cloud user) uploaded his files to the cloud server. For security purposes (when needed) the data owner encrypted the data files, sent them to the cloud system, and saved them in the user space. The data owner was capable of manipulating the encrypted data file and could set the access privileges for it.
4.4.1 Upload Files using FTP and SSH Protocols
As mentioned in the previous chapter, the cloud user could upload files in two ways: non-secure and secure. For the first, the cloud user accessed the upload application web page by clicking the Upload Data link on the cloud's Home Page; the FTP web page was then loaded in the browser, where the user entered the IP address or domain name of the cloud server along with his username and password and clicked the Login command. After that, the user could choose the file he wanted to send and upload it to the cloud server (Linux OS). Figure (4.6) shows the FTP web page, and Figure (4.7) shows how the cloud user chooses the files to send.
Figure (4.6) FTP Web Page
Figure (4.7) Cloud's User Choosing Files Process
For the second way (secure file upload), the cloud user had to encrypt the data file before accessing the FTP web page. S/he ran the SecureFile.jar program (downloaded earlier from the cloud's home page), which encrypted the data file using the AES algorithm; he could then access the FTP page and upload the encrypted file to the cloud server. Figure (4.8) shows the encryption process using the SecureFile.jar program.
Figure (4.8) Encryption Process using SecureFile.jar Program
When the file upload via the FTP web page completed, all the user's files were saved in the Linux server user space. These files then had to be uploaded to the Hadoop package. To do that, the cloud user accessed the Linux server and ran the uploadfile.sh program already saved in his Linux user space. The cloud user used the PuTTY application (which runs the SSH protocol) to access the Linux server; SSH required the cloud user's username and password. Once access was granted, the user could run the uploadfile.sh program and upload the file to the Hadoop package. Figures (4.9) and (4.10) show how the cloud user uploads a file from the Linux server to the Hadoop package.
Figure (4.9) Cloud's user using the PuTTY application to access the Linux server
Figure (4.10) Cloud's user running uploadfile.sh to upload a file to Hadoop
4.4.2 Download Files Process
In the proposed cloud system, there were two kinds of downloads from the cloud server. First, the cloud user had to download the RSA public key, RSA private key, and AES secret key; this could be done using the WinSCP application with the SFTP protocol. Figure (4.11) shows how the cloud user downloads the encryption keys over SFTP. Second, the cloud user could download his files from the cloud server by opening the Hadoop website through the Cloud Server link on the cloud's Home Page, selecting a file, downloading it, decrypting it (if needed), and saving it on his local host. Figures (4.12) and (4.13) show the cloud user's download process.
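The WinSCP key download is interactive; a non-interactive equivalent is an sftp batch file, sketched below. The host name and key file names are assumptions, not values from the thesis system.

```shell
# Hypothetical sftp batch file fetching the user's keys from the cloud
# server, mirroring the WinSCP/SFTP download described above.
cat > get_keys.batch <<'EOF'
get rsa_public.pem
get rsa_private.pem
get aes.key
bye
EOF

# Would be run (against a real server) as:
#   sftp -b get_keys.batch user@cloud-server
```

Because SFTP runs over SSH, the keys are protected in transit, unlike a plain FTP transfer.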
Figure (4.11) Cloud's user using the WinSCP application with the SFTP protocol
Figure (4.12) Cloud's user accessing the Hadoop webpage
Figure (4.13) Cloud's user decrypting the downloaded encrypted file
4.5 Results Based on Different Packet Sizes
Encryption time is used to calculate the throughput of an encryption scheme; it indicates the speed of encryption. Different packet sizes are used in this experiment for the AES algorithm, and the encryption time is recorded for each. The average data rate for AES is calculated from the recorded data using the following formula:

AvgTime = (1/Nb) × Σ (Mi / Ti), for i = 1 .. Nb    (4.1)

where:
AvgTime = Average Data Rate (Kb/s)
Nb = Number of Messages
Mi = Size of Message i (Kb)
Ti = Time taken to Encrypt Message Mi
The throughput of the encryption scheme is calculated using the following formula:

Throughput = Tp / Et    (4.2)

where:
Tp = Total Plaintext (Kb)
Et = Encryption Time

Calculating the throughput of an encryption algorithm is important for understanding its performance.
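Formula (4.2) can be checked against Table 1 further below: the total plaintext Tp is the sum of the input sizes, and dividing it by the recorded average time reproduces the throughput row.

```shell
# Throughput = Tp / Et (formula 4.2), using the values from Table 1.
awk 'BEGIN {
    tp = 45 + 76 + 102 + 500 + 900 + 1024          # Tp = 2647 Kb
    printf "AES-128: %.2f\n", tp / 170.4           # -> 15.53
    printf "AES-192: %.2f\n", tp / 165.33          # -> 16.01
    printf "AES-256: %.2f\n", tp / 182.33          # -> 14.52
}'
```

The results match the throughput row of Table 1 (15.53, 16, 14.5) up to rounding.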
4.5.1 Results Based on Different Key Sizes
The last performance comparison point is varying the key size of the AES algorithm. We consider the three possible AES key sizes: 128-bit, 192-bit and 256-bit keys. For AES it can be seen that a larger key size leads to a clear increase in time consumption. The experimental results for AES encryption are shown in Table 1 (time consumption for different key sizes for encryption), and Figure (4.14) shows the encryption analysis with different key sizes; the corresponding results for AES decryption are shown in Table 2, and Figure (4.15) shows the decryption analysis with different key sizes.
Table 1: Time Consumption (Different Key Size) for Encryption

Input Size (Kb)     Time (Millisecond)
                    AES 128    AES 192    AES 256
45                  33         54         65
76                  80         90         100
102                 92         103        119
500                 127        157        168
900                 210        267        292
1024                310        321        350
AvgTime             170.4      165.33     182.33
Throughput          15.53      16         14.5
Figure (4.14) Analysis with Different Key Size for Encryption
Table 2: Time Consumption (Different Key Size) for Decryption

Input Size (Kb)     Time (Millisecond)
                    AES 128    AES 192    AES 256
45                  30         49         60
76                  78         87         95
102                 88         99         110
500                 123        133        155
900                 200        255        280
1024                300        309        340
AvgTime             136.5      155.3      173.3
Throughput          19.39      17         15.27
Figure (4.15) Analysis with Different Key Size for Decryption
Chapter Five Conclusions and Future Work
5.1 Conclusions
In this thesis, a cloud computing system was built using the Hadoop package on the Linux OS, and security enhancements for this system were designed and implemented. The system supports user registration and meets cloud computing requirements such as file management and data computation. When a user wanted to access the cloud system, he first visited the cloud system website and opened the cloud's Home Page (the cloud system interface). When the user registration process completed successfully, the cloud system administrator read the user's data from the database and ran a program responsible for creating the user account, creating the user space, generating the RSA public key, RSA private key and AES secret key, and saving these keys, together with the uploadfile.sh program (which uploads files from the Linux server to Hadoop), in the user space. The administrator then sent the username and password to the cloud user.
The cloud user could upload files in two ways: non-secure and secure. For the first, the user accessed the upload application web page by clicking the Upload Data link on the cloud's Home Page; the FTP web page loaded in the browser, where the user chose the file he wanted to send and uploaded it to the cloud server (Linux OS). For the second (secure upload), the user had to encrypt the data file before accessing the FTP web page: s/he ran the SecureFile.jar program (downloaded earlier from the cloud's home page), which encrypted the data file using the AES algorithm, and then accessed the FTP page and uploaded the encrypted file to the cloud server. When the file uploading
process (using the FTP web page) was completed, all the user's files were saved in the Linux server user space. These files then had to be uploaded to the Hadoop package; to do that, the cloud user accessed the Linux server and ran the uploadfile.sh program already saved in his Linux user space. The cloud user used the PuTTY application (which runs the SSH protocol) to access the Linux server, and SSH required the cloud user's username and password. Once access was granted, the user could run the uploadfile.sh program and upload files to the Hadoop package. The cloud user also had to download his RSA public key, RSA private key, and AES secret key; this download could be done with the WinSCP application over the SFTP protocol. The proposed cloud computing system achieved user privacy and confidentiality through cryptographic techniques. Two cryptographic algorithms were used in the proposed cloud system: the RSA algorithm, used for key management among the cloud's users, and the AES algorithm, used for data transfer between the cloud's users and the cloud system.
From the system implementation and results, a number of interesting conclusions related to cloud system management and security can be drawn:
1. The proposed system consists of the Hadoop package and the Linux OS. It supports most of the cloud computing service models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). In the proposed system, the cloud system webpage (cloud interface) supports most of the SaaS services, the Apache web server and MySQL server support the PaaS services, and the Linux OS with the Hadoop package supports the cloud infrastructure and all the services needed for IaaS.
2. The proposed cloud computing system supports all the cloud computing requirements, such as:
· User registration and management.
· Data management (upload and download files).
· Security issues (secure data transfer and key management).
3. The proposed cloud computing system supports all the security characteristics (Authentication, Integrity, Availability, and Confidentiality):
· Authentication: supported by the cloud user's username and password, which have a high degree of confidentiality because the Linux system encrypts them using a secure hashing algorithm.
· Integrity: the Hadoop system supports data integrity for the proposed cloud system through a number of algorithms already built into the Hadoop Master and Slaves.
· Availability: the Hadoop system uses a number of Slaves (Data Nodes) to save users' data; these Slaves are connected internally to each other, and if any one of them goes down the other Slaves can take its place.
· Confidentiality: achieved in the proposed system using cryptographic techniques (RSA and AES).
5.2 Future Work Suggestions
This work can be extended in several directions; the following are some suggested ideas:
A. The proposed cloud computing system can be developed to host people's software applications: people will access and share their software
Chapter Five
Conclusions and Future Work
B. applications through online and access information by using the remote server networks instead of depending on primary tools and information hosted in their personal computers because of flexibility in Cloud Computing. C. Also the proposed system can be developed to perform a number of applications that are require more time and deals with huge data. Data Mining and Neural network algorithms that need to deal with the Big data which can be implemented in this system, also the Big data analysis techniques can be implemented too.
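As a concrete illustration of the confidentiality mechanism in conclusion 3, the hybrid scheme (AES for the file contents, RSA for protecting the AES secret key) can be sketched with the standard openssl command-line tool. All file names below are hypothetical examples, not the exact names used by the proposed system's scripts, and the `-pbkdf2` option assumes OpenSSL 1.1.1 or newer:

```shell
set -e

# Generate an RSA-2048 key pair (the public key would live on the cloud side).
openssl genrsa -out private.pem 2048 2>/dev/null
openssl rsa -in private.pem -pubout -out public.pem 2>/dev/null

# Create a random 256-bit AES secret key.
openssl rand -hex 32 > aes.key

# Sender: encrypt the file with AES, then wrap the AES key with the RSA public key.
echo "sensitive cloud data" > plain.txt
openssl enc -aes-256-cbc -pbkdf2 -pass file:aes.key -in plain.txt -out cipher.bin
openssl pkeyutl -encrypt -pubin -inkey public.pem -in aes.key -out aes.key.enc

# Receiver: unwrap the AES key with the RSA private key, then decrypt the file.
openssl pkeyutl -decrypt -inkey private.pem -in aes.key.enc -out aes.key.dec
openssl enc -d -aes-256-cbc -pbkdf2 -pass file:aes.key.dec -in cipher.bin -out recovered.txt

cmp plain.txt recovered.txt && echo "roundtrip OK"
```

The design point is that only the small AES key is ever RSA-encrypted while the bulk data is AES-encrypted, so the cost of the public-key operation stays constant regardless of file size.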
Appendix A
Project Source Code

1- Code list: create user accounts
Step 1: sudo adduser <username>
Step 2: sudo adduser <username> sudo
Step 3: hadoop fs -mkdir /user/<username>
Step 4: hadoop fs -chown <username>:supergroup /user/<username>
Step 5: hadoop fs -ls /user
Step 6: su <username>
Step 7: gedit .bashrc
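The seven manual steps above can be collected into one reusable wrapper. This is an illustrative sketch (the name provision_user.sh is hypothetical, and it assumes adduser and the hadoop command are on the PATH); the block below only writes the script and syntax-checks it with `sh -n`, without executing the privileged commands:

```shell
cat > provision_user.sh <<'EOF'
#!/bin/sh
set -e
user="$1"
[ -n "$user" ] || { echo "usage: $0 <username>" >&2; exit 1; }

sudo adduser "$user"                                # Step 1: create the Linux account
sudo adduser "$user" sudo                           # Step 2: add it to the sudo group
hadoop fs -mkdir "/user/$user"                      # Step 3: create the HDFS home directory
hadoop fs -chown "$user:supergroup" "/user/$user"   # Step 4: give the user ownership
hadoop fs -ls /user                                 # Step 5: verify the listing
EOF
chmod +x provision_user.sh
sh -n provision_user.sh && echo "syntax OK"
```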
2- Code list: Hadoop prerequisites
1- sudo apt-get update
2- sudo apt-get install default-jdk
3- java -version
4- sudo addgroup hadoop
5- sudo adduser --ingroup hadoop hduser
6- sudo adduser hduser sudo
7- sudo apt-get install openssh-server
8- su hduser          (change user to hduser)
9- ssh-keygen -t rsa -P ""
10- cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
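Steps 9 and 10 (passwordless SSH, which Hadoop needs in order to start its daemons on the nodes) can be tried safely in a throwaway directory before touching $HOME/.ssh. The directory name ssh-demo is illustrative; note that the destination file must be authorized_keys (plural):

```shell
set -e
mkdir -p ssh-demo
ssh-keygen -t rsa -N "" -f ssh-demo/id_rsa -q       # RSA key pair with an empty passphrase
cat ssh-demo/id_rsa.pub >> ssh-demo/authorized_keys # authorize the new public key
chmod 600 ssh-demo/authorized_keys                  # sshd rejects group/world-readable files
grep -c 'ssh-rsa' ssh-demo/authorized_keys
```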
3- Code list: Installing Hadoop on the Master server
1) wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz    (download Hadoop)
2) tar xvzf hadoop-2.7.1.tar.gz    (extract Hadoop)
3) sudo mv hadoop-2.7.1 /usr/local/hadoop    (move the Hadoop installation to the /usr/local/hadoop directory)
4) sudo chown -R hduser:hadoop /usr/local/hadoop
4- Code list: Installing Hadoop on the Slave servers
# su hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave-1:/opt/hadoop
$ scp -r hadoop hadoop-slave-2:/opt/hadoop

Configuring Hadoop on the Master server
Open the master server and configure it with the following commands:
# su hadoop
$ cd /opt/hadoop/hadoop

Configuring the master node:
$ vi etc/hadoop/masters
hadoop-master

Configuring the slave nodes:
$ vi etc/hadoop/slaves
hadoop-slave-1
hadoop-slave-2

sudo gedit ~/.bashrc
# append the lines below to the end of the file and save it
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
source ~/.bashrc
A- Edit core-site.xml (and set JAVA_HOME in the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file)
sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
B- Edit hdfs-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
  </property>
</configuration>
C- Edit yarn-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
D- Edit mapred-site.xml
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
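The configuration files of sections A–D can also be generated from a single script, which avoids typing errors in the XML. This is a sketch under the assumption that the files are staged in a local directory first (HADOOP_CONF is an illustrative variable) and then copied into /usr/local/hadoop/etc/hadoop; only core-site.xml and mapred-site.xml are shown, and hdfs-site.xml and yarn-site.xml follow the same pattern:

```shell
set -e
HADOOP_CONF=${HADOOP_CONF:-./hadoop-conf}   # illustrative staging directory
mkdir -p "$HADOOP_CONF"

# core-site.xml: default filesystem URI (section A)
cat > "$HADOOP_CONF/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# mapred-site.xml: run MapReduce on YARN (section D)
cat > "$HADOOP_CONF/mapred-site.xml" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

ls "$HADOOP_CONF"
```

Once the files are in place on the master, the cluster is normally initialized once with `hdfs namenode -format` and started with `start-dfs.sh` and `start-yarn.sh`; running `jps` afterwards should list the NameNode, DataNode, ResourceManager, and NodeManager processes.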
References
[1] Venkata Sravan Kumar, (2011) "Security Techniques for Protecting Data in Cloud Computing", M.Sc. thesis, School of Computing, Blekinge Institute of Technology, SE-371 79 Karlskrona, Sweden.
[2] D. Anil Kumar and Y. Manasa, (2015) "A Survey on Security Issues in Cloud Computing", International Journal for Development of Computer Science & Technology, vol. 3, no. 5.
[3] Poonam M. Pardeshi and Deepali R. Borade, (2015) "Improving Data Integrity for Data Storage Security in Cloud Computing", International Journal of Computer Science and Network Security, vol. 15, no. 6, pp. 75–82.
[4] Mandeep Kaur and Manish Mahajan, (2013) "Using Encryption Algorithms to Enhance the Data Security in Cloud Computing", International Journal of Communication and Computer Technologies, vol. 1, no. 12, pp. 56–59.
[5] Mohammed A. AlZain, Eric Pardede, Ben Soh and James A. Thom, (2012) "Cloud Computing Security: From Single to Multi-Clouds", 45th Hawaii International Conference on System Sciences, pp. 5490–5499, DOI: 10.1109/HICSS.2012.153.
[6] Yashpalsinh Jadeja and Kirit Modi, (2012) "Cloud Computing: Concepts, Architecture and Challenges", International Conference on Computing, Electronics and Electrical Technologies, pp. 877–880, DOI: 10.1109/ICCEET.
[7] Available: https://www.tutorialspoint.com, accessed May 2016.
[8] H. Karajeh, M. Maqableh, and T. Masa, (2014) "Security of Cloud Computing Environment", International Business Information Management Association Conference, pp. 2202–2215.
[9] Suresh Kumar R G, (2014) "Security Facet in Cloud Computing", International Journal of Advanced Research in Computer Science and Software Engineering, vol. 4, no. 5, pp. 964–967.
[10] Sudhansu Ranjan Lenka and Biswaranjan Nayak, (2014) "Enhancing Data Security in Cloud Computing Using RSA Encryption and MD5 Algorithm", International Journal of Computer Science Trends and Technology, vol. 2, no. 3, pp. 60–64.
[11] Nikisha J. Mithaiwala and Nikisha B. Jariwala, (2016) "A Study on Cloud Computing and its Security", International Journal of Innovations & Advancement in Computer Science, ISSN 2347-8616, vol. 5, no. 4, pp. 1–6.
[12] Anca Apostu, Florina Puican, Geanina Ularu, George Suciu and Gyorgy Todoran, (2012) "Study on Advantages and Disadvantages of Cloud Computing – the Advantages of Telemetry Applications in the Cloud", Recent Advances in Applied Computer Science and Digital Services, ISBN: 978-1-61804-179-1.
[13] Pooja D. Bardiya, Rutuja A. Gulhane and P. P. Karde, (2014) "Data Security using Hadoop on Cloud Computing", International Journal of Computer Science and Mobile Computing, vol. 3, no. 4, pp. 802–809.
[14] Ya Liu, (2012) "Data Security in Cloud Computing", M.Sc. thesis, Eindhoven University of Technology.
[15] Atul Patel and Kalpit Soni, (2014) "Three Major Security Issues in Single Cloud Environment", International Journal of Advanced Research in Computer Science and Software Engineering, ISSN: 2277-128X, vol. 4, no. 4, pp. 268–271.
[16] Krunal Suthar, Parmalik Kumar Gupta and Hiren Patel, (2012) "Analytical Comparison of Symmetric Encryption and Encoding Techniques for Cloud Environment", International Journal of Computer Applications (0975-8887), vol. 60, no. 19, pp. 16–19.
[17] Prashant Rewagad and Yogita Pawar, (2013) "Use of Digital Signature with Diffie-Hellman Key Exchange and AES Encryption Algorithm to Enhance Data Security in Cloud Computing", International Conference on Communication Systems and Network Technologies, DOI: 10.1109/CSNT.
[18] Sudhansu Ranjan Lenka and Biswaranjan Nayak, (2014) "Enhancing Data Security in Cloud Computing Using RSA Encryption and MD5 Algorithm", International Journal of Computer Science Trends and Technology, vol. 2, no. 3, pp. 60–64.
[19] Vishwanath S. Mahalle and Aniket K. Shahade, (2014) "Enhancing the Data Security in Cloud by Implementing Hybrid (RSA & AES) Encryption Algorithm", IEEE, pp. 146–149, DOI: 10.1109/INPAC.2014.6981152.
[20] H. Mehak and Gagandeep, (2014) "Improving Data Storage Security in Cloud using Hadoop", International Journal of Engineering Research and Applications, vol. 4, no. 9, pp. 133–138.
[21] P. Kulurkar and Ruchira H. Titare, (2015) "Data Security and Privacy in Cloud using RC6 Algorithm for Remote Data Back-up Server", International Journal of Engineering Science & Advanced Technology, vol. 5, no. 2, pp. 149–153.
[22] Nandita Sengupta, (2015) "Designing of Hybrid RSA Encryption Algorithm for Cloud Security", International Journal of Innovative Research in Computer and Communication Engineering, pp. 4146–4152.
[23] Santosh Kumar Singh, P. K. Manjhi and R. K. Tiwari, (2016) "Data Security using RSA Algorithm in Cloud Computing", International Journal of Advanced Research in Computer and Communication Engineering, ISO 3297:2007 Certified.
[24] Wentao Liu, (2012) "Research on Cloud Computing Security Problem and Strategy", Department of Computer and Information Engineering, Wuhan Polytechnic University, Wuhan, Hubei Province 430023, China.
[25] Varsha Alangar, (2013) "Cloud Computing Security and Encryption", International Journal of Advance Research in Computer Science and Management Studies, vol. 1, no. 5, pp. 58–63.
[26] S. B. Dash, H. Saini, T. C. Panda and A. Mishra, (2014) "A Theoretical Aspect of Cloud Computing Service Models and Its Security Issues: A Paradigm", Journal of Engineering Research and Applications, vol. 4, no. 6, pp. 248–254.
[27] Gurpreet Kaur and Manish Mahajan, (2013) "Analyzing Data Security for Cloud Computing Using Cryptographic Algorithms", International Journal of Engineering Research and Applications, ISSN: 2248-9622, vol. 3, no. 5, pp. 782–786.
[28] Mandeep Kaur and Manish Mahajan, (2013) "Using Encryption Algorithms to Enhance the Data Security in Cloud Computing", International Journal of Communication and Computer Technologies, vol. 1, no. 12, pp. 56–59.
[29] Kangchan Lee, (2012) "Security Threats in Cloud Computing Environments", International Journal of Security and Its Applications, vol. 6, no. 4, pp. 25–32.
[30] Shreya Srivastav and Neeraj Verma, (2015) "Improving Data Security in Cloud Computing Using RSA Algorithm and MD5 Algorithm", International Journal of Innovative Research in Science, Engineering and Technology, pp. 5450–5457.
[31] Pachipala Yellamma, Challa Narasimham and Velagapudi Sreenivas, (2013) "Data Security in Cloud Using RSA", International Conference on Computing, Communications and Networking Technologies, DOI: 10.1109/ICCCNT.2013.6726471.
[32] Mzhda Hiwa Hama, (2015) "Sentimental Analysis of Big Data Using Naïve Bayes and Neural Network", M.Sc. thesis, University of Sulaimani.
[33] Hadoop characteristics, available at: https://www-01.ibm.com/software/au/data/infosphere/hadoop/, accessed on 29/09/2016.
[34] Ketaki Subhash Raste, (2014) "Big Data Analytics - Hadoop Performance Analysis", M.Sc. thesis, San Diego State University.
[35] Konstantin Shvachko, Hairong Kuang, Sanjay Radia and Robert Chansler, (2010) "The Hadoop Distributed File System", IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–10, DOI: 10.1109/MSST.2010.5496972.
[36] Benoy Bhagattjee, (2014) "Emergence and Taxonomy of Big Data as a Service", M.Sc. thesis, Composite Information Systems Laboratory (CISL), Sloan School of Management, Massachusetts Institute of Technology, Cambridge.
[37] Jeffrey Dean and Sanjay Ghemawat, (2008) "MapReduce: Simplified Data Processing on Large Clusters", Communications of the ACM, vol. 51, no. 1, pp. 107–113.
[38] Available: https://www.hadoop.apache.org/docs/r2.7.1/, accessed May 2016.