Design and Implement Secure Cloud Computing System Based on Hadoop

A Thesis Submitted to the Council of the College of Science at the University of Sulaimani in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer

By Ribwar Khalid Mohammed B.Sc. Computer Science (2009), University of Sulaimani

Supervised by
Dr. Soran Ab. Saeed, Assistant Professor
Dr. Alaa Kh. Jumaa, Lecturer

May 2017
Jozardan 2717 (Kurdish calendar)

بِسْمِ اللَّهِ الرَّحْمَٰنِ الرَّحِيمِ
﴿وَيَسْأَلُونَكَ عَنِ الرُّوحِ قُلِ الرُّوحُ مِنْ أَمْرِ رَبِّي وَمَا أُوتِيتُمْ مِنَ الْعِلْمِ إِلَّا قَلِيلًا﴾
صَدَقَ اللَّهُ الْعَظِيمُ
سورة الإسراء، الآية ٨٥

(In the name of Allah, the Most Gracious, the Most Merciful. "And they ask you about the soul. Say: The soul is of the affair of my Lord, and you have not been given of knowledge except a little." Surah Al-Isra, verse 85.)

Supervisor Certification

I certify that the preparation of the thesis titled "Design and Implement Secure Cloud Computing System Based on Hadoop", accomplished by (Ribwar Khalid Mohammed), was carried out under my supervision at the College of Science, University of Sulaimani, in partial fulfillment of the requirements for the degree of Master of Science in (Computer).

Signature: Name: Dr. Soran A. Saeed Title: Assistant Professor Date:

/

/ 2017

Signature: Name: Dr. Alaa K. Jumaa Title: Lecturer Date:

/

/ 2017

In view of the available recommendations, I forward this thesis for debate by the examining committee.

Signature: Name: Aree Ali Mohammed Title: Professor Date:

/

/ 2017

Linguistic Evaluation Certification

I hereby certify that this thesis titled "Design and Implement Secure Cloud Computing System Based on Hadoop", prepared by (Ribwar Khalid Mohammed), has been read and checked. After all the grammatical and spelling mistakes were indicated, the thesis was given back to the candidate to make the adequate corrections. After the second reading, I found that the candidate had corrected the indicated mistakes. Therefore, I certify that this thesis is free from mistakes.

Signature: Name: Soma Nawzad Abubakr Position: English Department, College of Languages, University of Sulaimani Date:

/

/ 2017

Examining Committee Certification

We certify that we have read this thesis entitled "Design and Implement Secure Cloud Computing System Based on Hadoop", prepared by (Ribwar Khalid Mohammed), and that, as the Examining Committee, we examined the student in its content and in what is connected with it, and that in our opinion it meets the basic requirements for the degree of Master of Science in Computer.

Signature:

Signature:

Name: Dr. Sufyan T. Faraj

Name: Dr. Najmadin W. Abdulrahman

Title: Professor

Title: Lecturer

Date:

Date:

/

/ 2017

(Chairman)

/

/ 2017

(Member)

Signature:

Signature:

Name: Dr. Fadhil S. Abed

Name: Dr. Soran A. Saeed

Title: Assistant Professor

Title: Assistant Professor

Date:

Date:

/

/ 2017

(Member)

/

/ 2017

(Member- Supervisor)

Signature: Name: Dr. Alaa K. Jumaa Title: Lecturer Date:

/

/ 2017

(Member - Co-Supervisor)

Approved by the Dean of the College of Science.

Signature: Name: Dr. Bakhtiar Q. Aziz Title: Professor Date:

/

/ 2017

Acknowledgements

First of all, my great thanks to Allah, who helped me and gave me the ability to fulfill this work. I thank everybody who helped me complete this project, especially my supervisors, Assistant Professor Dr. Soran A. Saeed and Lecturer Dr. Alaa K. Jumaa, for their help. Special thanks to my wife, Shaida, for her encouragement, scientific notes, and support during my study and in finalizing this work. Special thanks to my father, my mother, and all my family for their endless support, understanding, and encouragement; they have taken their share of suffering and sacrifice during the entire phase of my research. Special thanks go to all those who helped me during this work. I am glad to have this work done; thanks to my colleagues and the entire faculty members.

Dedications

This thesis is dedicated to

my mother and father, my son,

our families

our friends

Computer Department

and all who supported me in any way.

Ribwar

Abstract

A proposed secure cloud computing system has been built using the Linux OS and the Hadoop package. The Hadoop package was utilized to build the area for saving and managing users' data and to enhance its security. Hadoop consists of one master (NameNode) and a number of slaves (DataNodes). The master node oversees the two key functional pieces that make up Hadoop: storing large amounts of data in the Hadoop Distributed File System (HDFS), and executing parallel computations on all stored data (MapReduce). The features of Linux and the Hadoop package support most cloud computing system requirements. This system has the ability to register users, and it can support cloud computing requirements like file management and data computation. When a user wants to access the cloud system, he should first visit the cloud system website and access the cloud's home page (the cloud system interface). When the user registration process is completed successfully, the cloud system administrator reads the user's data from the database and is responsible for creating the user account, creating the user space, generating the RSA public key, the RSA private key, and the AES secret key, and saving these keys with the uploadfile.sh program. After that, the cloud administrator sends the username and password to the cloud's user. The cloud's user can upload files in two ways: non-secure and secure upload. A number of Java programs and shell scripts have been utilized to perform the data management and enhance the security of the proposed cloud system. The Apache web server has been used to host and manage all the cloud system web pages. The security enhancements for this system are designed and implemented using cryptographic techniques (the RSA and AES algorithms). The proposed system supports most of the cloud computing service models, like Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). Furthermore, it can support most of the cloud computing requirements, like user registration and management, data management (upload, download, delete …etc.), and security issues (secure data transfer and key management). The proposed cloud computing system achieved user privacy, or confidentiality, by utilizing cryptographic techniques. Two cryptographic algorithms were utilized in the proposed cloud system: the first was the RSA algorithm, which was utilized for key management among the cloud's users, and the second was the AES algorithm, which was used for data transfer between the cloud's users and the cloud system.


CONTENTS

Abstract ............................................................. I
Contents ............................................................. III
List of Tables ....................................................... VI
List of Figures ...................................................... VII
List of Abbreviations ................................................ IX

Chapter One: General Introduction
1.1 Introduction ..................................................... 1
1.2 Definition of Cloud and Cloud Computing .......................... 2
1.3 Advantages and Disadvantages of Cloud Computing .................. 4
1.4 Security in Cloud Computing ...................................... 6
1.5 Related Works .................................................... 7
1.6 Aim of the Work .................................................. 10
1.7 Thesis Layout .................................................... 11

Chapter Two: Cloud Computing and Hadoop
2.1 Introduction ..................................................... 12
2.2 Cloud Computing System ........................................... 12
2.2.1 Cloud Computing Deployment Models .............................. 13
2.2.2 Cloud Computing Service Models ................................. 15
2.2.3 Cloud Computing Attributes ..................................... 16
2.2.4 Characteristics of Cloud Computing ............................. 17
2.2.5 Technical Components ........................................... 18
2.3 Security Issues in Cloud Computing ............................... 20
2.4 Data Security Issues in Cloud Computing .......................... 21
2.5 Hadoop Apache Project ............................................ 24
2.5.1 Hadoop Characteristics ......................................... 25
2.5.2 Hadoop Components .............................................. 26
2.5.2.1 Hadoop Distributed File System (HDFS) ........................ 27
2.5.2.2 MapReduce .................................................... 28

Chapter Three: Proposed Cloud Computing System
3.1 Introduction ..................................................... 31
3.2 The Proposed System Architecture ................................. 31
3.3 Cloud Computing Environment ...................................... 32
3.3.1 Build Cloud Computing System Using Hadoop ...................... 32
3.3.2 Hadoop Package Support for Cloud Service Models ................ 33
3.3.3 Hadoop Installation on Linux Multiple Nodes .................... 35
3.4 User Interface and Data Management ............................... 35
3.4.1 User Registration .............................................. 35
3.4.2 User's Files and Folders Permissions ........................... 37
3.5 Upload and Download Data with Cloud .............................. 39
3.5.1 Upload Data to Cloud Server .................................... 40
3.5.2 Download Files from Cloud System ............................... 42
3.6 Cloud System Security Issues ..................................... 43
3.6.1 Key Management ................................................. 45
3.6.2 Secure File Transfer over Cloud System ......................... 47

Chapter Four: The Proposed Cloud Computing System Implementation
4.1 Introduction ..................................................... 51
4.2 Main Steps for Cloud System Implementation ....................... 51
4.3 User Interface and Data Management ............................... 52
4.3.1 User Registration Process ...................................... 53
4.4 Upload and Download Files Processes .............................. 56
4.4.1 Upload Files Using FTP and SSH Protocols ....................... 56
4.4.2 Download Files Process ......................................... 59
4.5 Results Based on Different Packet Sizes .......................... 61
4.5.1 Results Based on Different Key Sizes ........................... 62

Chapter Five: Conclusions and Suggestions for Future Work
5.1 Conclusions ...................................................... 65
5.2 Suggestions for Future Work ...................................... 67

Appendix ............................................................. 69
References ........................................................... 73

List of Tables

Table No.  Table Title                                            Page No.
4.1        Time Consumption (Different Key Size) for Encryption   63
4.2        Time Consumption (Different Key Size) for Decryption   64

List of Figures

Figure No.  Figure Title                                                          Page No.
1.1         Cloud Computing                                                       4
2.1         Cloud Deployment Models                                               15
2.2         Cloud Service Model                                                   16
2.3         The Cloud Computing Components                                        19
2.4         HDFS Architecture                                                     21
3.1         General Architecture for Proposed Cloud Computing System              32
3.3A        Flowchart for Adding User Process (Client Side)                       37
3.3B        Flowchart for Adding User Process (Administrator Side)                37
3.2         Proposed Cloud System with the Cloud Services                         34
3.4         Cloud User - File Permissions                                         38
3.5         Cloud User - Change Permissions                                       39
3.6         Upload and Download Files Processes in Cloud                          40
3.7         Flowchart for Uploading Files to Cloud System                         42
3.8         List of the Users' Files in Cloud System                              43
3.9         Proposed Key Management System                                        47
3.10        Secure File Transfer over Cloud                                       48
3.11        Flowchart for Upload Secure and Non-Secure Files to Cloud System      49
3.12        Flowchart for Download Secure and Non-Secure Files from Cloud System  50
4.1         Diagram Showing the Steps for the System                              52
4.2         Proposed Cloud Computing System Home Page                             53
4.3         User Registration Web Page                                            54
4.4         Cloud's Administrator Creating User Account                           55
4.5         Files Included in the Cloud User's Space (Folder)                     55
4.6         FTP Web                                                               56
4.7         Cloud's User Choosing Files Process                                   57
4.8         Encryption Process Using SecureFile.jar Program                       57
4.9         Cloud's User Using PuTTY Application to Access                        58
4.10        Cloud's User Runs uploadfile.sh and Uploads File to Hadoop            59
4.11        Cloud's User Uses WinSCP Application with the SFTP Protocol           60
4.12        Cloud's User Accesses Hadoop Web Page                                 60
4.13        Cloud's User Decrypts Encrypted Downloaded File                       61
4.14        Analysis with Different Key Size for Encryption                       63
4.15        Analysis with Different Key Size for Decryption                       64

List of Abbreviations

Abbreviation   Full Text
AES            Advanced Encryption Standard
API            Application Program Interface
CSP            Cloud Service Provider
CRM            Customer Relationship Management
DFS            Distributed File System
DES            Data Encryption Standard
GFS            Global File System
HDFS           Hadoop Distributed File System
IaaS           Infrastructure as a Service
LAN            Local Area Network
NIST           National Institute of Standards and Technology
NaaS           Network as a Service
OS             Operating System
PaaS           Platform as a Service
PC             Personal Computer
RAID           Redundant Array of Independent Disks
RDBMS          Relational Database Management System
RSA            Rivest, Shamir, and Adleman
SaaS           Software as a Service
TB             Terabytes
VPN            Virtual Private Network
WAN            Wide Area Network

Chapter One
General Introduction

1.1 Introduction

Over the past few years, there has been rapid progress in cloud computing. Cloud computing conveys a wide range of resources, like computational platforms, storage, computational power, and applications, to users via the Internet. The major clouds available in the market now are Amazon, Google, IBM, Microsoft, etc. With a growing number of companies resorting to using resources in the cloud, there is a need to protect the data of different users. Presently, cloud computing is utilized extensively in different fields. A large amount of data is generated in daily life, and to store this huge amount of data, users use cloud computing services. Some major challenges being encountered by cloud computing are to secure, protect, and process the data, which is the property of the user [1].

Cloud computing is appearing as one of the most powerful and developing networking systems, used by developers as well as users. It is well fitted for users who are interested in working in a networking environment. The cloud computing environment permits its resources to be distributed among servers, users, and individuals; in turn, files or data stored in the cloud are publicly accessible to all. Due to this open accessibility, the files or data of an individual can be used by other users of the cloud, and as a result the threat of attacks on data or files becomes greater [2].

Cloud computing has gained popularity in recent years. The cloud facilitates the storage of various sorts of data. It is highly scalable when it comes to huge data and can provide infinite computing resources on demand. Clients can use cloud services without any installation, and the data uploaded to the cloud is accessible


from any corner of the world; all that is needed to access it is a computer with an active Internet connection [3].

Cloud computing is the notion of using remote services through a network utilizing different resources. It is basically meant to provide the maximum with the minimum resources, i.e., the end user has minimal hardware requirements but uses the maximum capability of computing. This is available only through this technology, which requires and uses its resources in the best way. In the cloud, the end user is just utilizing a very light device which is capable of using a network that connects it to a server at some other location. Users do not need to store the data at their end, as all the data is stored on a remote server at some other place; here comes the first advantage of cloud computing, i.e., it decreases the cost of hardware that would otherwise have been needed at the end user [4].

The use of cloud computing has increased rapidly in many organizations. Small and medium companies in particular utilize cloud computing services for different reasons, because these services provide rapid access to their applications and decrease their infrastructure costs [5]. Since the advent of the Internet in the 1990s up to the present day, the facilities of widespread computing have changed the computing world in an extreme way. Computing has traveled from the notion of parallel computing to distributed computing, to grid computing, and recently to cloud computing [6].

1.2 Definition of Cloud and Cloud Computing

The expression "cloud" refers to a network or the Internet. In other words, the cloud is something which is present at a remote location. The cloud can give services over public or private networks, i.e., a Wide Area Network (WAN), Local Area Network (LAN), or Virtual Private Network (VPN).


Applications such as e-mail, web conferencing, and customer relationship management (CRM) all run in the cloud [7]. Cloud computing refers to configuring, manipulating, and accessing applications online. It offers online data storage, infrastructure, and applications [7]. The term cloud computing came from Internet computing, or computation through the Internet, where many flowcharts and diagrams represent the Internet as a cloud shape [8]. The US National Institute of Standards and Technology (NIST) defines cloud computing as "a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, storage, applications, servers, and services) that can be rapidly provisioned with minimal management effort or service provider interaction". Cloud computing can also be defined as a new service: the collection of technologies and resources supporting the use of large-scale Internet services for remote applications with good quality of service (QoS) levels [9]. It is the information technology (IT) model for computing, which consists of all the IT components (services, networking, software, and hardware) needed to enable the development and delivery of cloud services via the Internet or a private network [10]. Figure (1.1) shows the general diagram for a simple cloud computing system [10].


Figure (1.1) Cloud Computing [11]

1.3 Advantages and Disadvantages of Cloud Computing

The advantages of cloud computing are explained below; these are the main benefits for organizations in general, with some points focusing on small businesses [12]:

§ Cost Efficiency: Cloud computing is apparently the most cost-efficient method to utilize, maintain, and upgrade. Traditional desktop software costs organizations a lot in financial terms, and adding up the licensing fees for multiple users can prove very expensive for the establishment concerned. The cloud, on the other hand, is available at much cheaper rates and can significantly lower an organization's IT expenses. Besides, there are many one-time-payment, pay-as-you-go, and other scalable options available, which makes it very reasonable for the organization in question.

§ Almost Unlimited Storage: Storing information in the cloud gives the user almost unlimited storage capacity.


§ Backup and Recovery: Since all the data is stored in the cloud, backing it up and restoring it is relatively much easier and simpler than storing it on a physical device. Moreover, most cloud services are usually capable enough to handle the recovery of information. Hence, this makes the entire process of backup and recovery much easier than other traditional methods of data storage.

§ Automatic Software Integration: In the cloud, software integration is normally something that happens automatically. This means that cloud users do not have to make additional efforts to customize and integrate their applications according to their own preferences. This aspect generally takes care of itself.

§ Easy Access to Information: Once users sign up in the cloud, they can access the information from any place where there is an Internet connection. This useful feature lets users move beyond time zone and geographic location issues.

§ Quick Deployment: Finally, and most particularly, cloud computing gives the benefit of quick deployment. Once this way of working is chosen, the entire system can be fully functional in a matter of a few minutes. Of course, the amount of time taken will depend on the exact type of technology required for the business.

§ Easier Scaling of Services: It makes it less complex for enterprises to scale their services according to the demand of clients.

§ Delivery of New Services: It makes possible new classes of applications and the delivery of new services that are interactive in nature.

In spite of its many benefits, as specified above, cloud computing also has its disadvantages. Businesses, especially smaller ones, need to be aware of these


aspects before going in for this technology. The main dangers faced in cloud computing are [13]:

§ Data Location: In general, cloud users are not aware of the exact location of the data center, and they also do not have any control over the physical access mechanisms to that data.

§ Investigation: Investigating unlawful activity may be impossible in cloud environments. Cloud services are especially difficult to investigate, because data for multiple customers may be co-located and may likewise be spread across different data centers.

§ Data Segregation: Data in the cloud is typically in a distributed environment together with data from other customers. Encryption cannot be assumed to be the single solution to the data isolation issue.

1.4 Security in Cloud Computing

At the present time, an increasing number of companies have moved to the cloud market to provide cloud services (e.g. Microsoft Exchange/SharePoint, Google Apps, and Amazon EC2). One can argue that this new approach is the future of network capabilities, but there are hidden costs, and security is one of them [14]. Security is arguably one of the most critical aspects of a cloud computing environment, due to the sensitive and significant information stored in the cloud for users. Users worry about attacks on the integrity and the availability of their data in the cloud from malicious insiders and outsiders, and about any collateral harm of cloud services [15]. According to NIST's definition, information security is the practice of maintaining the integrity, confidentiality, and availability of data against malicious access and system


failure. The three main characteristics of cloud computing security are [11][14]:

• Integrity: Information is authentic, complete, and reliable. Data shall not be modified inappropriately, whether by accident or by deliberately malicious activities. One of the most significant issues related to cloud security dangers is data integrity. The data in cloud storage may suffer damage during transfer operations to or from the cloud storage provider. The risk of attacks from both inside and outside the cloud provider exists and should be considered. Data authentication, which assures that the returned data is the same as the stored data, is extremely important.

• Confidentiality: Information can only be accessed by permitted users or distributed among authorized groups. Authentication methods, including credential confirmation, can be applied to protect data against malicious disclosure.

• Availability: This refers to the availability of data resources. Data should be available under permitted operations, including read, write, etc.

1.5 Related Works

Several research works have been proposed and carried out on the security of cloud computing. These works focus on a number of techniques, such as cryptography. A number of them are discussed below:

K. Suthar et al. (2012) proposed a new secure technique for the cloud computing system; they deduced that the Advanced Encryption Standard (AES) is a quick and efficient cryptographic algorithm. When data sending was considered, there was an insignificant difference in the performance of the various symmetric-key


plans. A study was conducted on various popular secret-key algorithms such as DES, AES, and Blowfish [16].

P. Rewagad et al. (2013) proposed using the digital signature and Diffie-Hellman key exchange blended with the AES encryption algorithm to protect the confidentiality of data stored in the cloud. Even if the key in transmission is hacked, the facility of the Diffie-Hellman key exchange renders it useless, since the key in transit is of no use without the user's private key, which is confined only to the legitimate user [17].

S. Ranjan et al. (2014) implemented RSA for both encryption and secure communication goals, whereas MD5 hashing was utilized for the digital signature and for hiding key information. This model gave security for the entire cloud computing environment. In the suggested system, an intruder cannot simply access or upload a file, because the algorithms are run on various servers at different locations. By combining the RSA encryption and digital signature algorithms, a powerful security and data integrity service system is obtained. Although the RSA encryption algorithm is quite deterministic, the MD5 algorithm makes the model highly secure [18].

V. Mahhale et al. (2014) proposed a hybrid encryption algorithm utilizing the RSA and AES algorithms for giving data security to the user in the cloud. Its biggest benefit is that it gives users keys created on the basis of system time, so no intruder can even guess them. The private key and secret key are known only to the user, and hence the user's private data was not accessible to anyone, not even the cloud's administrator. The main aim behind utilizing the RSA and AES encryption algorithms was that this gave three keys, i.e., a private key and a secret key for decryption and a public key for encryption. The data, after uploading, was stored in encrypted form and could be decrypted only by the private key and the secret key of the user. The main advantage of this is that the data is extremely secure on the cloud [19].


H. Mehak et al. (2014) proposed a Hadoop setup for cloud computing with an AES implementation combined with compression. Just uploading the data was not sufficient; some modifications had to be made to the data when it was uploaded, in order to make it unreadable and also for security purposes. In the proposed technique there was a shared key between sender and receiver, known as the private key. After performing the compression and encryption, the results were computed on the basis of various parameters; the most commonly utilized parameters were processing time and space. The experimental results clearly reflect the effectiveness of the methodology in improving the security of data in a cloud environment [20].

R. Titare et al. (2015) proposed a secure data backup mechanism. The main purposes were to give secure data backup and to achieve a high level of security for data stored in the cloud. If any file was deleted or corrupted by error on the cloud server, that file could simply be recovered by utilizing the suggested system. The RC6 algorithm, a symmetric block cipher, is used in this system. The benefits of this recommended system are that it requires minimum memory space and minimum time for data encryption and decryption. The user requires keys to download the data from the cloud, so only a permitted person can download the data from the cloud [21].

N. Sengupta (2015) proposed a hybrid RSA encryption algorithm for the security of data in a cloud system. In the first phase the RSA encryption algorithm was applied, and in the second phase the Feistel encryption algorithm was applied to the output data, i.e., the ciphertext of the first phase. After the final phase, the encrypted data was sent for transmission. The hybrid RSA encryption algorithm makes the data difficult for an attacker to decrypt. The man-in-the-middle attack was minimized by applying this proposed hybrid RSA encryption algorithm for data transfer in the cloud system [22].


S. Kumar et al. (2016) proposed a method implementing the RSA algorithm; they utilized RSA to encrypt the data to provide security, so that only the concerned user could access it. By securing the data, only the permitted user could access it; even if some intruder (unauthorized user) captures the data accidentally or intentionally, he cannot decrypt it and get back the original data. User data is encrypted first and then stored in the cloud. When required, the user places a request for the data to the cloud provider; the cloud provider authenticates the user and delivers the data [23].

1.6 Aim of the Work

The security of cloud computing has always been a significant aspect of quality of service from cloud service providers. However, cloud computing poses many new security challenges which have not been well investigated. The main aim of this work is to build a proposed cloud computing system in which the Hadoop package is used to build the area for saving and managing users' data and to enhance its security. This system can be used in any organization, like a university data center, hospital data center …etc., and it can support a number of services to the users, like data storage and data computation. The proposed cloud computing system achieves user privacy, or confidentiality, by using cryptographic techniques. Two cryptographic algorithms are used in the proposed cloud system: the first is the RSA algorithm, which is used for key management among the cloud's users, and the second is the AES algorithm, which is used in the data transfer between the cloud's users and the cloud system. The stored data in the cloud system are protected from unauthorized access; only the authorized user can access the data in the cloud computing system. Even if some attacker gets the data accidentally or intentionally, s/he cannot decrypt it and get the genuine data.
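The hybrid design described in this section (RSA for key management, AES for the data itself) can be sketched in Java, the language of the system's own programs. The following is a minimal illustrative sketch using the standard Java crypto API, not the thesis's actual SecureFile.jar code; the class name, the 2048-bit RSA and 128-bit AES key sizes, and the cipher modes are assumptions chosen for brevity.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.util.Arrays;

public class HybridCryptoSketch {
    public static void main(String[] args) throws Exception {
        // 1. Administrator side: generate the user's RSA key pair (key management).
        KeyPairGenerator rsaGen = KeyPairGenerator.getInstance("RSA");
        rsaGen.initialize(2048);
        KeyPair userKeys = rsaGen.generateKeyPair();

        // 2. Generate the AES secret key used for encrypting the user's files.
        KeyGenerator aesGen = KeyGenerator.getInstance("AES");
        aesGen.init(128);
        SecretKey aesKey = aesGen.generateKey();

        // 3. Protect the AES key with the user's RSA public key before distribution.
        Cipher rsa = Cipher.getInstance("RSA");
        rsa.init(Cipher.ENCRYPT_MODE, userKeys.getPublic());
        byte[] wrappedAesKey = rsa.doFinal(aesKey.getEncoded());

        // 4. Client side: encrypt file data with AES before uploading to the cloud.
        byte[] fileData = "confidential file contents".getBytes("UTF-8");
        Cipher aes = Cipher.getInstance("AES/ECB/PKCS5Padding"); // ECB only for brevity
        aes.init(Cipher.ENCRYPT_MODE, aesKey);
        byte[] cipherText = aes.doFinal(fileData);

        // 5. Download side: recover the AES key with the RSA private key, then decrypt.
        rsa.init(Cipher.DECRYPT_MODE, userKeys.getPrivate());
        SecretKey recovered = new SecretKeySpec(rsa.doFinal(wrappedAesKey), "AES");
        aes.init(Cipher.DECRYPT_MODE, recovered);
        byte[] plainText = aes.doFinal(cipherText);

        System.out.println(Arrays.equals(fileData, plainText)); // prints "true"
    }
}
```

In a real deployment, AES would normally run in CBC or GCM mode with a random IV rather than ECB; ECB is used here only to keep the round trip short and deterministic.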


1.7 Thesis Layout

The remaining parts of this thesis are organized into four chapters, whose contents are summarized below:

· Chapter Two: This chapter explains cloud computing, Hadoop, cloud computing deployment models, cloud computing service models, cloud computing attributes, characteristics of cloud computing, and data security issues in cloud computing.

· Chapter Three: This chapter explains the proposed cloud computing system. It includes how the proposed system is built using the Hadoop package and how the security over this system is implemented.

· Chapter Four: This chapter presents the implementation of the proposed cloud computing system. It shows and explains how the proposed system works and how the user management processes are implemented.

· Chapter Five: This chapter is devoted to presenting the derived conclusions and recommendations for future work.


Chapter Two
Cloud Computing and Hadoop

2.1 Introduction

The cloud system runs on the Internet, and the security issues found on the Internet can also be found in the cloud system. The cloud system is not only subject to the same issues as a traditional personal computer (PC) system; it can also meet other special and new security issues. The biggest worries about cloud computing are security and privacy. Traditional security issues, such as security vulnerabilities, viruses, and hacking attacks, can likewise create risks for the cloud system and can lead to more serious results because of the nature of cloud computing. Malicious hackers and intruders may break into cloud accounts and steal sensitive data stored in cloud systems. The data and business applications are stored in the cloud center, and the cloud system must protect these resources cautiously. Cloud computing is a technological development of the popular adoption of virtualization, service-oriented architecture, and utility computing over the Internet, and it contains applications, platforms, and services. If the system meets a failure, fast recovery of the resource is also a problem. Cloud systems hide the details of the service implementation technology and its management. The user cannot control the process of dealing with the data and cannot ensure data security by themselves [24].

2.2 Cloud Computing System

The cloud system deals with data source storage, operation, and network transfer. The key data resources and private data are very important to the user, so the cloud must give the user a data control system. A data security review can likewise be deployed in the cloud system. Transferring data to any authorized place

Cloud Computing and Hadoop

Chapter Two

is needed, in a form that any permitted application can utilize it, by any authorized user, on any permitted device. Data integrity requires that only authorized users can modify the data and confidentiality means that only permitted users can read the data. Cloud computing must give strong user access control to strengthen the licensing, quarantine, and certification other aspects of data management. In the cloud computing, the cloud provider system has many users in a dynamic response to modifying service needs. The users do not know which servers are processing the data and do not know what position the data have. The user do not know what networks are moving the data because of the flexibility and scalability of cloud system. The user can’t ensure data privacy operated by the cloud in a confidential way. The cloud system can deploy the cloud center in various area and the data can be stored in different cloud nodes. The various area has various law so the security management can meet the law danger. Cloud computing service must be improved in legal protection[24]. 2.2.1 Cloud Computing Deployment Models As indicated by National Institute of Standards and Technology(NIST),the cloud model is composed of four deployment models[25]: 1-Public Cloud: The cloud infrastructure is made available to the general public or a large industry group and is owned by an institution selling cloud services. The cloud is available to the public on business basis by a cloud service provider. The public cloud has a large variety of organizational and general public clients making it easier to adapt but more vulnerable to security dangers. It is usually owned by a large company (e.g. Amazon’s EC2, Google’s AppEngine and

Microsoft’s

Azure).The

13

owner-organization

makes

its

Cloud Computing and Hadoop

Chapter Two

infrastructure available to the general public via a multi-tenant model on a selfservice basis delivered over the Internet . 2- Private Cloud: The cloud infrastructure is worked only for a single organization. It might be managed by the organization or a third party. The private cloud gives an organization greater control over its data and resources, a private cloud costs more than the public cloud. Thus, the private cloud is more interesting to enterprises especially in mission and safety critical organizations. 3- Community Cloud: The cloud infrastructure is distributed by few organizations and supports a specific community that has distributed concerns (e.g., security requirements, mission , policy, or compliance considerations). It might be managed by the organizations or a third party and might present onpremises or off-premises. It might be managed by any one of the organizations or a third party. 4- Hybrid Cloud: The cloud infrastructure is a composition of two or more clouds (community, private , or public) that stay unique entities but are bound together by standardized or proprietary technology that enables data and application portability. The infrastructure is made up of a combination of two or more clouds, each might be any of the clouds said above (private, community or public). This model offers the largest degree of fault tolerance and scalability.

Figure (2.1) Cloud Deployment Models [25]

2.2.2 Cloud Computing Service Models
Cloud computing can offer a collection of services, but the three main ones are Platform as a Service, Infrastructure as a Service, and Software as a Service, together called the service models of cloud computing [26]:
1- Cloud Software as a Service (SaaS): The capability given to the consumer is to use the provider's applications executing on a cloud infrastructure. The applications are accessible from different client devices via a thin client interface such as a web browser (e.g., web-based email).
2- Cloud Platform as a Service (PaaS): The capability given to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider (e.g., configurations).
3- Cloud Infrastructure as a Service (IaaS): The capability given to the consumer is to provision processing, networks, storage, and other fundamental computing resources, where the consumer is able to deploy and execute arbitrary software, which can include operating systems and applications (e.g., host firewalls).

Figure (2.2) Cloud Service Model [14]

2.2.3 Cloud Computing Attributes
Cloud computing has a number of attributes. The main ones are described as follows [27]:
1- Elasticity and scalability: Elasticity enables scalability, which means that the cloud can scale upward for peak demand and downward for lighter demand. Scalability also means that an application can scale when users are added and when application requirements change.

2- Self-service provisioning: Cloud customers can access the cloud without a lengthy process. A user simply requests an amount of computing, storage, software, processing, or more from the service provider.
3- Standardization: There should be standardized application program interfaces (APIs) for communication between services. A standardized interface lets the customer link cloud services together more simply.
4- Pay-as-you-go: Customers pay only for the resources they have actually used.

2.2.4 Characteristics of Cloud Computing
The cloud computing system has several characteristics, mentioned below [28]:
· Virtualization: Via cloud computing, the user is able to get service anywhere through any type of terminal, and can access or share resources securely at any time.
· High Reliability: The cloud uses data fault tolerance to guarantee the high credibility of the service.
· Versatility: Cloud computing can produce different applications supported by the cloud, and one cloud can support different applications running simultaneously.
· On-Demand Service: The cloud is a large resource pool that a user can buy from according to his/her need; the cloud is just like running water or gas, charged by the amount the user consumes.
· Extremely Inexpensive: The centralized management of the cloud means the enterprise does not need to bear the rapidly increasing management cost of a datacenter. The versatility increases the utilization of the available resources compared with a traditional system, so users can take full advantage of the low cost. Some benefits are listed below:
§ Cloud computing does not require high-quality equipment on the user side and is easy to use.
§ Cloud computing can realize data sharing between different devices.
§ Cloud computing provides a dependable and secure data storage center, so users need not worry about problems such as data loss or viruses.

2.2.5 Technical Components
The key functions of a cloud management system are divided into four layers: the Resources & Network Layer, the Services Layer, the Access Layer, and the User Layer, as shown in Figure (2.3). Each layer has a set of functions [29]:
· The Resources & Network Layer manages the physical and virtual resources.
· The Services Layer includes the main categories of cloud services, namely Network as a Service (NaaS), IaaS, PaaS, and SaaS, along with the service orchestration function and the cloud operational function.
· The Access Layer includes the API termination function, and the inter-cloud peering and federation function.

· The User Layer contains the end-user function, the administration function, and the partner function.

Figure (2.3) The Cloud Computing Components [29]

Other functions, such as Management and Security & Privacy, are considered cross-layer functions that cover all the layers. The main principle of this architecture is that all these layers are optional: a cloud provider who wants to use the reference architecture may select and implement only a subset of them. From the security perspective, however, the principle of separation requires each layer to take charge of certain responsibilities. In the event that the security controls of one layer are bypassed (e.g., the access layer), other security functions should compensate and thus should be implemented either in other layers or as cross-layer functions [29].

2.3 Security Issues in Cloud Computing
Although the cloud is one of the most sought-after technologies at present, it is also a very recent one. As with any other new technology, there has not been much research on the security that the cloud provides its users. For this reason, a cloud suffers from serious security problems, including [25]:
A- Problems with Data Security
As mentioned earlier, there can be a very large amount of data on a cloud. With so much information, it becomes susceptible to losses. Data loss is a common problem with cloud storage: due to improper storage, the required information may be hard to find later, which is considered a loss of data since the data cannot be discovered. Moreover, in certain situations, a cloud service provider might try to reuse a service, using someone else's server to provide service to the user instead of using their own servers. In these cases, data stealing can happen. Data stealing is a serious problem in a cloud: the actual owner of a server has complete access to it, so when a service provider allocates that server to a user and the user places personal information on it, the owner of the server could easily access this personal information.

B- Problem of Infection
Since a user can access the data on a cloud, a malicious user could upload a virus or any other application or software that could seriously harm another user's computer or hardware device when it is downloaded. To prevent such mishaps, a great deal of security measures must be implemented.

C- Other Security Issues
Other security problems are not exclusive to the cloud. They include common threats such as hacking a user's cloud account to gather information. A sound password policy and a mechanism that makes account tracing impossible must be implemented.

2.4 Data Security Issues in Cloud Computing
Cloud computing is in some way the base of any small or large company: people use the cloud to run their businesses, transmitting their data to it or using it for storage, so their major worry is the security of that data, i.e., whether it is safe in the cloud or not. Data security thus becomes the major obstacle to using cloud computing services. Some of the key features that can be considered in the context of data security are data integrity, privacy or confidentiality, and data availability [30]. These cloud computing security features are explained in the following paragraphs [30][31]:
A- Data Integrity
Integrity of data means that there should be no tampering with or alteration of the user data, so to ensure data integrity in the cloud environment the Cloud Service Provider (CSP) must use some effective mechanisms. Along with securing the data, cloud service providers should implement mechanisms that ensure data integrity and make it possible to tell what happened to a certain dataset and at what point. The cloud provider should make the client aware of what particular data is hosted on the cloud, its origin, and the integrity mechanisms put in place; for compliance purposes, it may be necessary to have exact records of what data was placed in a public cloud.
B- Privacy and Confidentiality
This is the major issue when storing data in a cloud environment. The Cloud Service Provider (CSP) must use some mechanism to ensure the privacy and confidentiality of data, i.e., the user data should be strictly confidential and should not be accessed by anyone other than the authorized user. Moreover, it should not be accessible even to cloud personnel, so privacy and confidentiality must be maintained using different mechanisms. Once the client hosts data in the cloud, there should be a guarantee that access to that data will be limited to authorized access only, and that the provider's privacy policy is applied properly. Inappropriate access to sensitive customer data by cloud personnel is another risk that can pose a potential threat to cloud data. Assurances should be provided to the clients, and proper practices, privacy policies, and procedures should be in place to assure cloud users of their data's safety; the cloud seeker should be assured that the data hosted on the cloud will remain confidential.
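One common integrity mechanism of the kind described above is to record a cryptographic digest of each file when it is stored and to recompute and compare it on retrieval. The following is a minimal sketch of this idea (not the thesis implementation) using Python's standard hashlib; the sample contents are illustrative:

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Return the hex SHA-256 digest of a block of data."""
    return hashlib.sha256(data).hexdigest()

# Record the digest when the file content is stored in the cloud.
stored_content = b"customer record v1"
stored_digest = sha256_digest(stored_content)

# On retrieval, recompute and compare: any tampering changes the digest.
retrieved_content = b"customer record v1"
assert sha256_digest(retrieved_content) == stored_digest  # integrity holds

tampered_content = b"customer record v2"
assert sha256_digest(tampered_content) != stored_digest   # tampering detected
```

Keeping the digests on the client side, outside the provider's control, is what lets the user detect modification by the CSP itself.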

C- Data Availability
Data availability is another security issue in cloud computing. The Cloud Service Provider (CSP) stores data in a distributed manner, i.e., at different locations, so to provide uninterrupted access to the data it needs to use some mechanism [30]. Customer data is normally stored in chunks on different servers, often residing in different locations or in different clouds. In this case, data availability becomes a major legitimate issue, as providing uninterrupted and seamless access becomes relatively difficult.
D- Data Location and Relocation
Cloud computing initially stores the data in some place but after a while relocates it, i.e., it locates and relocates the data according to the availability and requirements of storage space, so it needs to maintain the security of the data at these different locations. Cloud computing offers a high degree of data mobility, and consumers do not always know the location of their data. However, when an enterprise has sensitive data kept on a storage device in the cloud, this requires a contractual agreement between the cloud provider and the consumer that the data will stay in a particular location or reside on a given known server. Cloud providers should also take responsibility for ensuring the security of systems (including data) and provide robust authentication to safeguard customers' information. Another issue is the movement of data from one location to another: data is initially stored at an appropriate location decided by the cloud provider, but it is often moved from one place to another, since cloud providers have contracts with each other and use each other's resources.
E- Storage, Backup and Recovery
The cloud provider supplies data storage, backup, and recovery facilities so that in the event of a hardware failure, the consumer can roll back to an earlier state. The user knows that the data is stored in the cloud, but what happens if the cloud's storage system gets corrupted or data theft occurs in it? The provider therefore needs a mechanism for the backup and recovery of lost data.

When the user decides to move his data to the cloud, the cloud provider should ensure adequately resilient data storage systems. At a minimum they should be able to provide RAID (Redundant Array of Independent Disks) storage systems, although most cloud providers store the data in multiple copies across many independent servers. In addition, most cloud providers should be able to offer backup services, which are certainly important for businesses that run cloud-based applications, so that in the event of a serious hardware failure they can roll back to an earlier state. When such data integrity requirements exist, the origin and custody of data or information must be maintained in order to prevent tampering or to prevent the exposure of data beyond the agreed territories.

2.5 Hadoop Apache Project
Hadoop is a top-level Apache project, an open-source software framework written in the Java programming language that allows the distributed processing of massive data sets across different sets of servers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures. Hadoop was created by Doug Cutting with the goal of building a distributed computing framework and programming model that would make it easier to develop distributed applications. The philosophy is to provide scale-out scalability over large clusters of rather cheap commodity hardware. Its creation was motivated by, and largely based on, papers published by Google describing some of their internal systems, namely the Google File System (GFS) and Google MapReduce [32]. Hadoop is designed to be scalable and can run on small as well as very large installations. Several programming frameworks, including Pig Latin and Hive, allow users to write applications in high-level languages (loosely based on SQL syntax) that compile into MapReduce jobs, which are then executed on a Hadoop cluster. Hadoop committers today work at several different organizations, such as Hortonworks, Microsoft, Facebook, Cloudera, LinkedIn, Yahoo, eBay, and many others around the world [32].

2.5.1 Hadoop Characteristics
The Hadoop Apache project has a number of characteristics [33][32]:
1. Scalable: Hadoop relies heavily on its distributed file system (HDFS), and hence comes with the capability of easily adding or removing nodes in the cluster without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.
2. Cost Effective: Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte (TB) of storage, which in turn makes it affordable to model all the data.
3. Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any single system can provide.

4. Fault Tolerant: Fault tolerance is the ability of a system to stay functional, without interruption and without losing data, even if some of its components fail. One of the main goals of Hadoop is to be fault tolerant. Since a Hadoop cluster can use thousands of nodes running on commodity hardware, it is highly susceptible to failures. Hadoop achieves fault tolerance by redundant data replication. It also provides the ability to monitor running tasks and automatically restart a task if it fails.
5. Built-in Redundancy: Hadoop duplicates data in blocks across the data nodes. For every block, a backup block of the same data is assured to exist somewhere across the data nodes. The master node keeps track of these nodes and the data mapping. If any node fails, the node holding the backup data block takes over, making the infrastructure failsafe. A conventional Relational Database Management System (RDBMS) has the same concerns and uses terms like persistence, backup, and recovery; these concerns scale upward with big data.
6. Computational Tasks at the Data's Residence: Moving computation to data means that computational queries are performed where the data resides. This avoids the overhead required to bring the data to the computational environment. Queries are computed in parallel and locally, then combined to complete the result set.

2.5.2 Hadoop Components
Hadoop consists of two main components: MapReduce, which deals with the computational operations applied to the data, and the Hadoop Distributed File System (HDFS), which deals with reliable storage of the data [32].

2.5.2.1 Hadoop Distributed File System (HDFS)
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached storage and execute user application tasks. By distributing storage and computation across many servers, the resource can grow with demand while remaining economical at every size [35]. HDFS has many similarities with existing distributed file systems; however, its differences from them are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It gives high-throughput access to application data and is suitable for applications that have large data sets. HDFS was originally built as infrastructure for the Apache Nutch web search engine project and is part of the Apache Hadoop Core project [32].

Figure (2.4) HDFS Architecture [32]

HDFS has a master/slave architecture, as shown in Figure (2.4), comprised of a NameNode, which manages the cluster metadata, and DataNodes, which store the data. The master node is called the "NameNode", and the slave nodes are called "DataNodes". HDFS divides the data into fixed-size blocks (chunks) and spreads them across all DataNodes in the cluster. Each data block is typically replicated three times, with two replicas placed within the same rack and one outside. The NameNode keeps track of which DataNodes hold replicas of which block and actively monitors the number of replicas of each block. When a replica of a block is lost due to a DataNode failure or disk failure, the NameNode creates another replica of the block. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system's clients; they also perform block creation, deletion, and replication upon instruction from the NameNode [32].
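The block-splitting and replica-placement policy just described (three replicas: two in the writer's rack, one in another rack) can be illustrated with a small simulation. This is only a sketch of the policy, not HDFS code; the rack and node names are hypothetical, and the tiny block size is for demonstration only:

```python
# Hypothetical cluster topology: rack name -> DataNodes in that rack.
racks = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}

def place_replicas(block_id: str, local_rack: str) -> list:
    """Choose 3 DataNodes: two in the writer's rack, one in another rack."""
    same_rack = racks[local_rack][:2]                  # two replicas, same rack
    other_rack = next(r for r in racks if r != local_rack)
    remote = racks[other_rack][0]                      # one replica, off-rack
    return same_rack + [remote]

# Split a file into fixed-size blocks and place each block's replicas.
block_size = 4  # bytes here; real HDFS blocks are 64 or 128 MB
data = b"abcdefgh"
blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
placement = {f"blk_{i}": place_replicas(f"blk_{i}", "rack1")
             for i in range(len(blocks))}
print(placement)
```

Losing any single DataNode (or even a whole rack) still leaves at least one replica of every block, which is what lets the NameNode re-replicate after a failure.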

2.5.2.2 MapReduce
MapReduce is the heart of Hadoop: a programming model for the parallel processing of tasks on a distributed computing system, and an associated implementation for processing and generating large data sets. This programming model allows splitting a single computation task across multiple nodes or computers for distributed processing. As a single task can be broken down into multiple subparts, each handled by a separate node, the number of nodes determines the processing power of the system. As MapReduce is an algorithm, it can be written in any programming language [36]. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication [37].

When there is a large amount of data, the computations have to be distributed across hundreds or thousands of machines in order to finish in a reasonable amount of time. The issues of how to parallelize the computation, distribute the data, and handle failures conspire to obscure the original simple computation with large amounts of complex code. In reaction to this complexity, Google designed MapReduce as a new abstraction that lets programmers express the simple computations they are trying to perform while hiding the messy details of parallelization, fault tolerance, data distribution, and load balancing in a library [37]. Programmers find the system easy to use because MapReduce allows them to focus on the application logic, while the messy details, such as handling failures, application deployment, task duplication, and aggregation of results, are handled automatically. The MapReduce paradigm has become a popular way of expressing distributed data processing problems that deal with large amounts of data. MapReduce is used by a number of organizations worldwide for diverse tasks such as application log processing, user behavior analysis, processing scientific data, and web crawling and indexing. A MapReduce program consists of two user-specified functions, the Map and Reduce functions, which are explained below:

a. Map Phase
In the Map phase, each mapper reads the raw input, record by record, converts it into key/value pairs [(k, v)], and feeds them to the map function, which performs a computation on each pair. The map function operates on each of the pairs in the input and produces intermediate output in the form of new key/value pairs, depending on how the user has defined the map function. The output of the map function is then passed to the reduce function as input [33].

b. Reduce Phase
The reduce function applies an aggregate function to its input, merging all intermediate values associated with the same intermediate key (e.g., counting or summing values), and stores its output to disk. The reduce output is also in the form of key/value pairs. At the end of the reducer, the output is sorted according to the keys, and the function for comparing keys is usually supplied by the user. During the execution of a MapReduce job, the input is first divided into a set of input splits. The system then applies map functions to each of the splits in parallel, spawning one task per input split; the output of each task is stored on disk for transfer to the reduce tasks. The system starts the reduce tasks once all the map tasks have completed successfully. Task or node failures are dealt with by re-launching the tasks. The input given to the tasks, and the output they generate, is stored in a distributed file system (HDFS, for instance) to make sure that the output of a task survives failures [34].
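The two phases above can be made concrete with the classic word-count job, simulated here in a few lines of Python. This mimics the map, shuffle/sort, and reduce flow in a single process; it is an illustrative sketch, not Hadoop API code:

```python
from collections import defaultdict

def map_phase(record: str):
    """Map: emit an intermediate (word, 1) pair for every word in the record."""
    for word in record.split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle/sort: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate the values for one key (here, a sum)."""
    return (key, sum(values))

records = ["big data big cloud", "cloud data"]          # two input splits
intermediate = [pair for r in records for pair in map_phase(r)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'big': 2, 'data': 2, 'cloud': 2}
```

In a real cluster, each record list would be one input split processed by its own map task, and the shuffle would move data between machines before the reduce tasks run.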

Chapter Three
Proposed Cloud Computing System Design

3.1 Introduction
In cloud computing, providing security to protect data and resources is an important task. One solution to the problem of data security in a cloud computing system is cryptographic algorithms: data is stored in encrypted form, so that it cannot be accessed by unauthorized users of the cloud environment. In this chapter, the proposed system for enhancing the security of cloud computing systems is presented. The Hadoop environment has been installed on a virtual machine environment, which is used to build the cloud system. This system supports user registration and can meet almost all cloud computing requirements, such as file management and data computation. Most of the cloud computing system issues are presented and discussed in this chapter.

3.2 The Proposed System Architecture
In this thesis, the proposed cloud computing system was built using the Linux OS and the Hadoop package, whose features support all the cloud computing system requirements. Figure (3.1) shows the general architecture of the proposed cloud computing system.

Figure (3.1) General Architecture of the Proposed Cloud Computing System

3.3 Cloud Computing Environments
The proposed system can be divided into three parts: the cloud computing environment, user interface and data management, and security issues. These parts are discussed in the next sections. The cloud computing environment concerns how the cloud properties and requirements can be satisfied. In the proposed system, the Linux OS and the Hadoop package were used to satisfy all of the cloud system requirements.

3.3.1 Building a Cloud Computing System Using Hadoop
In this thesis, the Hadoop package was used to build the cloud computing system. Hadoop consists of one master (NameNode) and a number of slaves (DataNodes). The master node oversees the two key functional pieces that make up Hadoop: storing large amounts of data in the Hadoop Distributed File System (HDFS), and running parallel computations on the stored data (MapReduce).

The NameNode oversees and coordinates the data storage function (HDFS), while the JobTracker oversees and coordinates the parallel processing of data using MapReduce. The master (NameNode) manages the file system namespace operations, such as opening, closing, and renaming files and directories, determines the mapping of blocks to DataNodes, and regulates client access to files. The slaves (DataNodes) are responsible for serving read and write requests from the file system's clients, and for performing block creation, deletion, and replication upon instruction from the master (NameNode).

3.3.2 Hadoop Package Support for the Cloud Service Models
As mentioned in the previous chapter, a cloud computing system consists of three service models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). SaaS involves the cloud provider maintaining and installing software in the cloud, with users executing the software from their cloud clients over the internet. In the proposed cloud computing system, the cloud system webpage (cloud interface) supports all the SaaS services; it allows the cloud's users to access the cloud computing system and transfer files in an easy way. PaaS provides users with application platforms and databases as a service. In the proposed cloud system, the Apache web server and MySQL server provide the PaaS services. IaaS takes the physical hardware and goes completely virtual (e.g., all servers, networks, storage, and system management exist in the cloud). In the proposed cloud system, the Linux OS and the Hadoop package support the cloud infrastructure and all the services needed in IaaS.

Figure (3.2) shows how the proposed cloud system can support the cloud computing services, mapping each cloud computing layer to the component of the proposed system that provides it:
· Software as a Service (SaaS): Web page
· Platform as a Service (PaaS): Apache Server & MySQL Database
· Infrastructure as a Service (IaaS): VM Linux OS & Hadoop package

Figure (3.2) Proposed Cloud System with the Cloud Services

3.3.3 Hadoop Installation on Linux Multiple Nodes Hadoop is an open-source framework that allows to store and process Big Data in a shared environment across clusters of computers utilizing simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. The Hadoop version 2.7 was downloaded from the internet [38] and installed in Linux/Debian version7. One Master Machine(Name Node) and two Slaves Machines (Data Node)were used in the Proposed cloud System. At the beginning, the Hadoop System was installed in the Master server, which was responsible for cloud’s users’ managements. Other Hadoop System was installed in the Slaves, which was responsible for data storing and computation. The instruction steps and commands for Hadoop System installation and configuration for the Master and Slaves are explained in Appendix A. 3.4 User Interface and Data Management In this section the user interface system that were used by the cloud’s user to access cloud computing system are discussed, and the data and files management over cloud System are discussed too. 3.4.1 User Registration To perform cloud user registration process, each user need to use User-LoginPage. This page includes numbers of fields like Full Name, User Name, Password and Email Address, all these fields should be filled by user. The Login-Page was designed and implemented using PHP, and user information saved in database which was built using MySQL Server. In this module all fields must be filled, otherwise error message will be displayed. At the beginning, cloud’s user needed to be registered in the cloud 35


system; only then can he perform the remaining operations, so without registration none of the cloud's operations can be performed. In the user registration form the user must enter valid information, otherwise an error message is shown; once the user registers with proper information, he automatically receives the message "The User Successfully Registered". Every user in the cloud system has a specific space (a user home folder) which contains all of the user's information (files and folders). Each user has full permission to access his own files and folders, and he can change these permissions as s/he wants. On the cloud server side, when the information for the registered user arrives, the cloud administrator must create a new user account; because the Linux OS is used in the proposed cloud system, the administrator uses the "adduser" command to create this account, based on the user information saved in the database. The cloud administrator then creates a specific space (folder) for the user in the Hadoop system; this space is used later to save all the user's information in the cloud system. After that, the cloud administrator creates RSA and AES keys (used for security) and saves them in the user space. When the cloud administrator has created a new user and allocated a space to him, he sends the username and password to the new user by email. On the administrator side, a system program was built using the shell programming language; this program is responsible for creating the user, creating the user space, generating the RSA and AES keys, and saving these keys in the user space (folder). Figure (3.3-A) shows the flowchart for the user process on the client side and Figure (3.3-B) shows this process on the cloud administrator's side.
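The administrator-side program itself is not listed in the thesis; a minimal dry-run sketch of the steps it is said to perform might look as follows. The function name, the /user/&lt;name&gt; layout and the mail step are assumptions, and by default every command is only echoed (set RUN= to execute them on a real server).

```shell
#!/bin/sh
# Hypothetical sketch of the administrator's user-creation program.
# RUN defaults to "echo", so the commands are printed, not executed;
# this lets the sketch be inspected without root or a Hadoop cluster.
create_user() {
  user="$1"
  run="${RUN-echo}"
  $run adduser "$user"                            # Linux account for the user
  $run hadoop fs -mkdir -p "/user/$user"          # the user's space in Hadoop
  $run hadoop fs -chown "$user" "/user/$user"     # only he owns that space
  $run mail -s "Your cloud account" "$user"       # send username and password
}

create_user alice    # dry run: prints the commands it would execute
```

Key generation (RSA and AES) would be appended as further steps of the same script, as described in section 3.6.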


Figure (3.3-A) Flowchart for adding user process (Client Side)

Figure (3.3-B) Flowchart for adding user process (Administrator Side)

3.4.2 User's Files and Folders Permissions
All users in the cloud system have specific permissions for their files and folders: read, write, and execute. These permissions allow a user to specify exactly who may access his files and folders, which enhances the security of the cloud system. There are three possible attributes for file access permissions (Figure 3.4):


1- r - Read permission: whether the file may be read. In the case of a directory, this means the ability to list the contents of the directory.
2- w - Write permission: whether the file may be written to or modified. For a directory or file, this defines whether changes can be made to its contents. If write permission is not set, the user will not be able to delete, rename or update a file.
3- x - Execute permission: whether the file may be executed. In the case of a directory, this attribute decides whether the user has permission to enter the directory, search through it, or execute a program from it.
The commands used for modifying file permissions and ownership are chmod (for changing permissions) and chown (for changing the ownership). When a user needs to add or remove permissions, he can use the "chmod" command with a "+" or "-" character ("+" to add a permission and "-" to remove one).

Figure (3.4) Cloud User - File Permissions

Users in the cloud can grant file/folder permissions to the other types of system users, such as the user's group or others. From Figure (3.5) it can be seen that there are three different types of system users:


u - User: the owner of the file. By default, the person who creates a file becomes its owner, so the user is also sometimes called the owner.
g - Group: a user group can contain multiple users. All users belonging to the group have the same access permissions to the file.
o - Other: any other user who has access to the file, i.e. a person who neither created the file nor belongs to a user group that owns it. In practice, when the user sets permissions for "others", this is also referred to as setting permissions for the world.
With these features, a user in the cloud system can give other users, inside or outside his group, the ability to access his files/folders according to the permissions granted to them.
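The permission model described above can be exercised directly in a shell; the file name below is illustrative, not part of the thesis system.

```shell
#!/bin/sh
# Illustrative permission handling; 'report.txt' is a made-up file name.
cd "$(mktemp -d)"                   # work in a throwaway directory
touch report.txt

chmod u+rwx,g+r,o-rwx report.txt    # owner: all; group: read; others: none
chmod g-r report.txt                # "-" revokes the group's read permission
chmod 640 report.txt                # octal form: rw- owner, r-- group, --- others

# chown changes ownership; giving a file to another user normally needs root,
# so here we only (re)assert ownership to the current user.
chown "$(id -un)" report.txt

ls -l report.txt                    # the mode column shows -rw-r-----
```

The same u/g/o letters appear in the `chmod` arguments, so a cloud user can widen or narrow access for his group and for other users independently.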

Figure (3.5) Cloud User - Change Permissions

3.5 Upload and Download Data with Cloud
A cloud user can upload and download files to/from the cloud server. The proposed cloud system supplies appropriate applications (programs) to its users for uploading and downloading files to/from the cloud system. Figure (3.6) shows the file upload and download processes in the cloud system.


Figure (3.6) Upload and Download Files Processes

3.5.1 Upload Data to Cloud Server
The cloud server is the area where the user requests and uploads files. In order to upload data, the data owner has to be registered on the cloud server. Once the data owner registers, a space is assigned to him and he can upload and download data. The proposed system supplies two programs for uploading files: the first uploads a file to the Linux server and saves it in the user space, and the second uploads files from the Linux server to the Hadoop package.
1. Upload Data to Linux Server
Cloud users use the File Transfer Protocol (FTP) to upload files to the Linux server. FTP is the most popular protocol for transferring files from one system to another and provides a fast way to do so. Many applications that support FTP are available on Linux and Windows.


When a cloud user wants to upload a file to the cloud server, he visits the cloud system web page and uses the FTP page. This web page provides a file manager interface for uploading files. It also gives the user the ability to synchronize files between his local computer and the web server, change or establish file permissions, and perform other advanced file and folder management functions. The uploaded files are saved in the Linux user space, ready to be uploaded to the Hadoop system.
2- Upload Files to Hadoop System
The Secure Shell protocol (SSH) is used by cloud users to access the cloud server and run shell commands. SSH provides a secure channel over an unsecured network in a client-server architecture, connecting an SSH client application with an SSH server. Users can use the "PuTTY" application, which is based on the SSH protocol, to connect to the cloud system; PuTTY is available on the proposed cloud system website and cloud users can download it directly. After a cloud user connects to the cloud system via SSH, he can use the "upload.sh" program (built and saved in the user space) to upload files from the Linux server to the Hadoop system. The "upload.sh" program is written in the shell script language. Figure (3.7) shows the flowchart for uploading files to the cloud system.
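The source of "upload.sh" is not listed in the thesis; a hedged sketch of what such a helper could look like is shown below. The function name and paths are assumptions, and the Hadoop command is echoed by default so the sketch can run without a cluster (set RUN= to execute it for real).

```shell
#!/bin/sh
# Hypothetical sketch of the "upload.sh" helper described in the text.
upload_to_hadoop() {
  file="$1"
  run="${RUN-echo}"    # default: dry run, just print the command
  # 'hadoop fs -put' copies a local file from the Linux user space into
  # the user's own HDFS folder.
  $run hadoop fs -put "$file" "/user/$(id -un)/$(basename "$file")"
}

upload_to_hadoop /home/alice/report.txt   # dry run: prints the hadoop command
```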


Figure (3.7) Flowchart for Uploading Files to Cloud

3.5.2 Download Files from Cloud System
In the proposed system, when a cloud user wants to download files from the cloud system, he/she must first visit the cloud system web page and access his/her files in Hadoop. The cloud system then provides a list of the files that the user uploaded earlier to the cloud. The user selects the intended file from the list, and the file is automatically downloaded and saved on the user's local host. Figure (3.8) shows the list of the users' files in the cloud system.


Figure (3.8) The list of the Users’ Files in Cloud System.

3.6 Cloud System Security Issues
Even though cloud computing delivers a wide range of dynamic resources, security is generally perceived as the biggest problem in the cloud, and it makes users reluctant to adopt cloud computing technology.

In general, the security issue in cloud computing consists of four characteristics: Authentication, Integrity, Availability and Confidentiality. The proposed cloud computing system achieves these four characteristics as follows:

1- Authentication (Username & Password)
Authentication is the process of identifying an individual, usually based on a username and password. A user authenticates himself to the cloud system with his unique username and password; if the username and password are incorrect, the user is not allowed to access the cloud system. The


username and password of the user have a high degree of confidentiality, because the Linux system stores the password hashed with a secure hashing algorithm.
2- Integrity & Availability
Integrity, in terms of data, is the assurance that information can only be accessed or modified by those authorized to do so; there must not be any tampering with or alteration of the user's data. Because users' data is very important to them, the cloud system administrator needs effective mechanisms to maintain data integrity. Data availability is another security problem in cloud computing: the cloud system administrator stores data in a shared manner, i.e. at various locations, and must utilize some mechanism to keep that data available in the cloud system. Integrity and availability in the proposed cloud system are achieved by the Hadoop system. Hadoop supports data integrity for the proposed cloud system through a number of mechanisms that already exist in the Hadoop master and slaves, so the data cannot be silently altered or changed. As for availability, the Hadoop system uses a number of slaves (DataNodes) to save users' data; these slaves are connected to each other, and if any one of them goes down the other slaves can take its place. Because the data is replicated across these slaves, it remains available even when some of the slaves go down.
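For illustration, the salted password hashing that Linux applies (in /etc/shadow) can be reproduced with a recent OpenSSL; "s3cret" is a made-up password, not one from the thesis system.

```shell
#!/bin/sh
# Linux stores login passwords as salted one-way hashes, not in clear text.
hash=$(openssl passwd -6 's3cret')    # SHA-512 crypt, as modern Linux uses
echo "$hash"                          # prints $6$<salt>$<digest>, irreversible

# hashing the same password twice gives different strings (random salt),
# which prevents precomputed-table attacks
hash2=$(openssl passwd -6 's3cret')
[ "$hash" != "$hash2" ] && echo "salted: hashes differ"
```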


3- Confidentiality (AES and RSA)
The major problem when storing data in a cloud environment is protecting user confidentiality: the user's data must remain confidential and must not be accessed by anyone except those who have permission, so the cloud system administrator must utilize mechanisms that maintain the privacy and confidentiality of the data. The proposed cloud computing system achieves user privacy and confidentiality by using cryptographic techniques. Two cryptographic algorithms are used in the proposed cloud system: the RSA algorithm (Rivest, Shamir, and Adleman), used for key management among the cloud's users, and the AES algorithm (Advanced Encryption Standard), used for data transfer between the cloud's users and the cloud system.

3.6.1 Key Management
Key management is the management of cryptographic keys in a cryptosystem. It covers the generation, storage, exchange, use, and replacement of keys. The cloud system administrator is responsible for preparing the keys for all users registered in the cloud system. Key exchange comes before any secured communication: users must set up the details of the cryptography. In some instances this may require exchanging identical keys (in the case of a symmetric key system); in others it may require possessing the other party's public key. While public keys can be openly exchanged (their corresponding private keys are kept secret), symmetric keys must be exchanged over a secure communication channel.
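The thesis does not list the key-generation commands. Assuming OpenSSL is available on the server, the per-user key material (an AES secret key plus an RSA key pair) could be produced along these lines; all file names are illustrative.

```shell
#!/bin/sh
# Illustrative per-user key generation, as saved into the user's space.
cd "$(mktemp -d)"

# AES secret key: 32 random bytes (AES-256), hex-encoded
openssl rand -hex 32 > aes.key

# RSA key pair: 2048-bit private key and the matching public key
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
        -out rsa_private.pem 2>/dev/null
openssl pkey -in rsa_private.pem -pubout -out rsa_public.pem

ls    # the three files the administrator stores in the user space
```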


In the proposed cloud system, when the cloud system administrator creates a new user and gives him a space in the system, s/he automatically generates the keys (a secret key for AES, and public and private keys for RSA) and saves them in the user space. Only the intended user can access and download these keys, because only he has the permissions to do so; no other user can access his space and see his secret keys. As a result, each user in the cloud system has an AES secret key and an RSA public/private key pair. When cloud users want to share secret data in the cloud system, they can do so by sharing the AES secret key with other users via their public and private keys. For example, user one can encrypt the AES secret key with the second user's RSA public key and send the encrypted secret key to him; the second user can then decrypt it with his RSA private key. A cloud user also has the ability to change his AES secret key; he needs this when he wants to prevent the cloud administrator from accessing the real data in his files. For the key management process, the cloud user first downloads his/her keys (the RSA and AES keys) from the cloud system using the Secure File Transfer Protocol (SFTP); this secure transfer facility is available from the cloud system web page and the cloud user can use it directly. Furthermore, the cloud user must download the RSA public keys of all other cloud users. This process is followed by all cloud users. Figure (3.9) shows the key management process over the proposed cloud computing system.
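The key-sharing step just described (wrapping the AES secret key with the recipient's RSA public key) can be sketched with OpenSSL. The file names are illustrative and the thesis's own tooling may differ.

```shell
#!/bin/sh
# Sketch of sharing an AES secret key between two users via RSA.
cd "$(mktemp -d)"

# User two's RSA key pair (in the real system these already exist)
openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
        -out user2_private.pem 2>/dev/null
openssl pkey -in user2_private.pem -pubout -out user2_public.pem

# User one encrypts his AES secret key with user two's PUBLIC key...
openssl rand -hex 32 > aes.key
openssl pkeyutl -encrypt -pubin -inkey user2_public.pem \
        -in aes.key -out aes.key.enc

# ...and user two recovers it with his PRIVATE key
openssl pkeyutl -decrypt -inkey user2_private.pem \
        -in aes.key.enc -out aes.key.dec

cmp aes.key aes.key.dec && echo "secret key shared intact"
```

Only the holder of the private key can unwrap the AES key, which is why public keys can be distributed openly to all cloud users.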


Figure (3.9) Proposed Key Management System

3.6.2 Secure File Transfer over Cloud System
Once the cloud user has downloaded his keys, s/he is ready to upload and download files to/from the cloud system. Figure (3.10) shows the secure file transfer over the cloud system.


Figure (3.10) Secure File Transfer over Cloud System

When a cloud user wants to upload a file to the cloud system, s/he can decide whether the file needs to be encrypted or not. If the file does not need security, the user can upload it directly by requesting the FTP page, selecting the file from the local computer, and uploading it to the cloud system. If the file is a secure file, it needs to be encrypted before being sent to the cloud system, so the user runs the encryption program, which is responsible for selecting a file and encrypting it with the AES algorithm; the encrypted file is then ready to be uploaded through the FTP web page. The program used for file encryption is written in the Java language and the cloud user can download it from the cloud system home page.
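The thesis's encryption program is a Java tool whose source is not listed; as a hedged stand-in, the same encrypt-before-upload and decrypt-after-download idea looks like this with OpenSSL's AES-256-CBC mode (file names and contents are illustrative).

```shell
#!/bin/sh
# Illustrative AES round trip standing in for the thesis's Java encryptor.
cd "$(mktemp -d)"
echo "confidential payroll data" > secret.txt
openssl rand -hex 32 > aes.key        # the user's AES secret key

# encrypt before upload (-pbkdf2 derives the cipher key from the secret)
openssl enc -aes-256-cbc -pbkdf2 -salt \
        -in secret.txt -out secret.txt.enc -pass file:aes.key

# after download, the same secret key decrypts the file
openssl enc -d -aes-256-cbc -pbkdf2 \
        -in secret.txt.enc -out secret.txt.dec -pass file:aes.key

cmp secret.txt secret.txt.dec && echo "round trip OK"
```

Only secret.txt.enc ever leaves the user's machine, so the cloud server (and its administrator) sees ciphertext only.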


Figure (3.11) shows the flowchart for uploading secure and non-secure files to the cloud system.

Figure (3.11) Flowchart for uploading secure and non-secure files to the cloud system

When a cloud user needs to download files from the cloud system, he requests the cloud system web page and selects the Hadoop home page. The user can then browse his/her own files, together with any other files that he has permission to browse and access, and download a file from the Hadoop page directly.


A cloud user can download two types of files: normal files and encrypted files. To download a normal file, he selects the file, downloads it, and saves it on his local host. To download an encrypted file, he selects the file, downloads it, decrypts its contents (using the AES algorithm), and saves the decrypted file on his local host. Figure (3.12) shows the flowchart for downloading files from the cloud system.

Figure (3.12) Flowchart for downloading secure and non-secure files from the cloud system


Chapter Four
The Proposed Cloud Computing System Implementation

4.1 Introduction
Nowadays a large amount of data is stored in the cloud, but it is a difficult task to keep that data secure over its lifetime. This chapter presents the implementation of the proposed cloud computing system. It shows and explains how the proposed system works and how it manages all the user processes, such as user registration and uploading and downloading files. It also explains how the security enhancement for the cloud system, along with its key management and secure data transfer (upload/download), is achieved in the practical part.

4.2 Main Steps for Cloud System Implementation
The main idea of the proposed system is to design and implement a cloud computing system with enhanced security. The Hadoop package is used to build the area for saving and managing users' data, supporting availability, integrity, large storage space and fast data computation, but additional application programs are needed to help cloud users access the cloud system. Therefore, a web application was built using the PHP language, hosted on an Apache server on Linux. Further application programs, written in the Java language, were built to enhance the security of the proposed cloud system. Figure (4.1) shows the main steps of the proposed cloud computing system implementation.


User-registration process → Create username, password and security keys → Upload file process → Download file process

Figure (4.1) Diagram Showing the Main Steps of the System

4.3 User Interface and Data Management
In the proposed cloud system, when a user wants to access the cloud system, he should first visit the cloud system website and open the cloud's Home Page (the cloud system interface). The Home Page facilitates navigation to the other pages of the site by providing links to them; these pages are User Registration, Uploading Files, Cloud Server Page and Application Programs. Cloud users use these pages for uploading and downloading files to/from the cloud system. Figure (4.2) shows the proposed cloud computing system Home Page. The following subsections explain how the system is implemented step by step.


Figure (4.2) Proposed Cloud Computing System Home Page

4.3.1 User Registration Process
The user registration process starts when the user selects the Register-User link on the cloud system Home Page. The user registration page is then loaded in the internet browser; it has a number of input fields representing the user's information, such as username, password, email address, etc. When the user has filled in his information on the registration page, he clicks the Register command to send his information to the cloud server, where it is saved in the server database. Figure (4.3) shows the user registration web page.


Figure (4.3) User Registration Web Page

On the other side, when the user registration process has completed successfully, the cloud system administrator reads the user's data from the database and runs the creatuser.sh program, which is responsible for creating the user account, creating the user space, generating the RSA public key, RSA private key and AES secret key, and saving these keys, together with the uploadfile.sh program (which is responsible for uploading files from the Linux server to Hadoop), in the user space. After that, the cloud administrator sends the username and password to the cloud user. Figure (4.4) shows the cloud administrator running the creatuser.sh program to create a new user, and Figure (4.5) shows the contents of the user space (folder), which include the RSA public and private keys, the AES secret key, and the uploadfile.sh program.


Figure (4.4) Cloud’s Administrator creating User Account

Figure (4.5) Files included in the Cloud’s User Space (folder)


4.4 Upload and Download Files Processes
The data owner (cloud user) uploads his files to the cloud server. For security purposes (if needed) the data owner encrypts the data files before sending them to the cloud system, where they are saved in the user space. The data owner is able to manipulate the encrypted data file and can set the access privileges on it.

4.4.1 Upload Files using FTP and SSH Protocols
As mentioned in the previous chapter, the cloud user can upload files in two ways: non-secure and secure. For the first, the cloud user accesses the upload application web page by clicking the Upload Data link on the cloud's Home Page; the FTP web page is then loaded in the web browser, where the cloud user enters the IP address or domain name of the cloud server, enters his username and password, and clicks the Login command. After that the user can choose the file that he wants to send and send it to the cloud server (Linux OS). Figure (4.6) shows the FTP web page, and Figure (4.7) shows the cloud user choosing the files that he wants to send.

Figure (4.6) FTP Web Page


Figure (4.7) Cloud's User Choosing Files Process

For the second way (secure upload), the cloud user has to encrypt the data file before accessing the FTP web page. S/he runs the SecureFile.jar program (downloaded earlier from the cloud's home page), which encrypts the data file using the AES algorithm; he can then access the FTP page and upload the encrypted file to the cloud server. Figure (4.8) shows the encryption process using the SecureFile.jar program.

Figure (4.8) Encryption Process using SecureFile.jar Program


When the file upload process (using the FTP web page) is completed, all the user's files are saved in the Linux server user space. These files then have to be uploaded to the Hadoop package. To do that, the cloud user accesses the Linux server and runs the uploadfile.sh program, which is already saved in the Linux user space. The cloud user uses the PuTTY application (running the SSH protocol) to access the Linux server; the SSH protocol requires the cloud user's username and password. Once access is established, the user can run the uploadfile.sh program and upload the file to the Hadoop package. Figures (4.9) and (4.10) show how the cloud user can upload a file from the Linux server to the Hadoop package.

Figure (4.9) Cloud's user using PuTTY application to access the Linux server


Figure (4.10) Cloud's user runs uploadfile.sh and uploads a file to Hadoop

4.4.2 Download Files Process
In the proposed cloud system, there are two types of download from the cloud server. First, the cloud user has to download the RSA public key, RSA private key, and AES secret key; this download can be done using the WinSCP application with the SFTP protocol. Figure (4.11) shows how the cloud user downloads the encryption keys using the SFTP protocol. Second, the cloud user can download his files from the cloud server. This is done by accessing the Hadoop website via the Cloud Server link on the cloud's Home Page, selecting a file, downloading it, decrypting it (if needed), and saving it on the user's local host. Figures (4.12) and (4.13) show the cloud user's download process.


Figure (4.11) Cloud's user uses the WinSCP application with the SFTP protocol

Figure (4.12) Cloud's user accesses the Hadoop webpage


Figure (4.13) Cloud's user decrypts an encrypted downloaded file

4.5 Results Based on Different Packet Sizes
Encryption time is used to calculate the throughput of an encryption scheme; it indicates the speed of encryption. In this experiment, different packet sizes are used with the AES algorithm and the encryption time is recorded. From the recorded data, the average data rate for AES is calculated using the following formula:

AvgTime = (1 / Nb) × Σ (Mi / Ti), for i = 1 … Nb        (4.1)

Where:
AvgTime = Average Data Rate (Kb/s)
Nb = Number of Messages
Mi = Message Size (Kb)
Ti = Time taken to Encrypt Message Mi


The throughput of the encryption scheme is calculated using the following formula:

Throughput = Tp / Et        (4.2)

Where:
Tp = Total Plaintext Size (Kb)
Et = Encryption Time

It is very important to calculate the throughput of an encryption algorithm in order to better understand its performance.
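As a worked illustration of formulas (4.1) and (4.2), the AES-128 measurements of Table 1 can be run through a short awk script. Note that this only demonstrates the mechanics of the two formulas on those inputs (sizes in Kb, times in ms); the thesis reports its own averaged figures.

```shell
#!/bin/sh
# Feed the AES-128 column of Table 1 (Kb, ms pairs) through formulas
# (4.1) and (4.2). Rates come out in Kb per millisecond.
result=$(printf '%s\n' "45 33" "76 80" "102 92" "500 127" "900 210" "1024 310" |
awk '{
       sum_rate += $1 / $2     # accumulate the Mi/Ti terms of formula (4.1)
       tp += $1; et += $2      # total plaintext and total time for (4.2)
       nb++
     }
     END {
       printf "AvgTime (4.1):    %.3f Kb/ms\n", sum_rate / nb
       printf "Throughput (4.2): %.3f Kb/ms\n", tp / et
     }')
echo "$result"
```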

4.5.1 Results Based on Different Key Sizes
The last performance comparison point is the effect of different key sizes on the AES algorithm. For AES, the three possible key sizes are considered: 128-bit, 192-bit and 256-bit keys. It can be seen that for AES a larger key size leads to a clear increase in time consumption. The experimental results for AES encryption are shown in Table 1, time consumption (different key sizes) for encryption, and Figure (4.14) shows the analysis with different key sizes for encryption; the results for AES decryption are shown in Table 2, time consumption (different key sizes) for decryption, and Figure (4.15) shows the analysis with different key sizes for decryption.


Table 1: Time Consumption (Different Key Sizes) for Encryption

                     Time (Milliseconds)
Input Size (Kb)    AES 128   AES 192   AES 256
45                 33        54        65
76                 80        90        100
102                92        103       119
500                127       157       168
900                210       267       292
1024               310       321       350
AvgTime            170.4     165.33    182.33
Throughput         15.53     16        14.5

Figure (4.14) Analysis with Different Key Size for Encryption


Table 2: Time Consumption (Different Key Sizes) for Decryption

                     Time (Milliseconds)
Input Size (Kb)    AES 128   AES 192   AES 256
45                 30        49        60
76                 78        87        95
102                88        99        110
500                123       133       155
900                200       255       280
1024               300       309       340
AvgTime            136.5     155.3     173.3
Throughput         19.39     17        15.27

Figure (4.15) Analysis with Different Key Size for Decryption


Chapter Five
Conclusions and Future Work

5.1 Conclusions
In this thesis, a cloud computing system is built using the Hadoop package on the Linux OS, and security enhancements for this system are designed and implemented. The system supports user registration and meets cloud computing requirements such as file management and data computation. When a user wants to access the cloud system, he first visits the cloud system website and opens the cloud's Home Page (the cloud system interface). When the user registration process completes successfully, the cloud system administrator reads the user's data from the database and is responsible for creating the user account, creating the user space, generating the RSA public key, RSA private key and AES secret key, and saving these keys, together with the uploadfile.sh program (which uploads files from the Linux server to Hadoop), in the user space. After that, the cloud administrator sends the username and password to the cloud user. The cloud user can upload files in two ways, non-secure and secure. For the first, the cloud user accesses the upload application web page by clicking the Upload Data link on the cloud's Home Page; the FTP web page is loaded in the web browser, after which the user chooses the file he wants to send and sends it to the cloud server (Linux OS). For the second (secure upload), the cloud user encrypts the data file before accessing the FTP web page: s/he runs the SecureFile.jar program (downloaded earlier from the cloud's home page), which encrypts the data file using the AES algorithm, and then accesses the FTP page and uploads the encrypted file to the cloud server. When the file upload

Chapter Five

Conclusions and Future Work

process (using the FTP web page) is completed, all the user's files are saved in the Linux server user space. These files then have to be uploaded to the Hadoop package; to do that, the cloud user accesses the Linux server and runs the uploadfile.sh program, which is already saved in the Linux user space. The cloud user uses the PuTTY application (running the SSH protocol) to access the Linux server; SSH requires the cloud user's username and password. Once access is established, the user can run the uploadfile.sh program and upload files to the Hadoop package. The cloud user also has to download the RSA public key, RSA private key, and AES secret key; this download can be done using the WinSCP application with the SFTP protocol. The proposed cloud computing system achieves user privacy and confidentiality by using cryptographic techniques. Two cryptographic algorithms are used in the proposed cloud system: the RSA algorithm, used for key management among the cloud's users, and the AES algorithm, used for data transfer between the cloud's users and the cloud system.

From the system implementation and results, a number of interesting conclusions related to cloud system management and security can be drawn:
1. The proposed system consists of the Hadoop package and the Linux OS. It supports the main cloud computing service models: Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). In the proposed system, the cloud system web page (cloud interface) supports most of the SaaS services, the Apache web server and MySQL server support the PaaS services, and the Linux OS with the Hadoop package supports the cloud infrastructure and the services needed for IaaS.


2. The proposed cloud computing system supports the main cloud computing requirements:
· User registration and management.
· Data management (upload and download files).
· Security (secure data transfer and key management).
3. The proposed cloud computing system supports all four security characteristics (Authentication, Integrity, Availability, and Confidentiality):
· Authentication: supported by the username and password of the cloud user; these have a high degree of confidentiality because the Linux system stores the password hashed with a secure hashing algorithm.
· Integrity: the Hadoop system supports data integrity for the proposed cloud system through mechanisms that already exist in the Hadoop master and slaves.
· Availability: the Hadoop system uses a number of slaves (DataNodes) to save users' data; these slaves are connected to each other, and if any one of them goes down the other slaves can take its place.
· Confidentiality: achieved in the proposed system by using cryptographic techniques (RSA and AES).

5.2 Future Work Suggestions
This work can be extended in several directions; the following are some suggested ideas:
A. The proposed cloud computing system can be developed to host people's software


applications online and access information through remote server networks, instead of depending on tools and information hosted on their personal computers, because of the flexibility of cloud computing.
B. The proposed system can also be extended to applications that require more time and deal with huge amounts of data. Data mining and neural network algorithms that operate on big data could be implemented on this system, as could big data analysis techniques.
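Returning to conclusion 3 above, the confidentiality mechanism (hybrid RSA and AES) can be illustrated with a minimal stand-alone sketch. This is not the thesis's own uploadfile.sh script; it is an openssl rehearsal of the same idea, and all file names and the 2048-bit key size are illustrative assumptions:

```shell
# Sketch of the hybrid RSA + AES idea: encrypt the bulk data with AES,
# then wrap (encrypt) the AES key with the user's RSA public key.
# File names and key sizes are illustrative, not from the thesis's scripts.
set -e
openssl genrsa -out user_rsa.pem 2048 2>/dev/null        # cloud user's RSA key pair
openssl rsa -in user_rsa.pem -pubout -out user_rsa.pub 2>/dev/null
openssl rand -hex 32 > aes.key                           # fresh AES-256 secret key
echo "sensitive cloud data" > data.txt
# Sender side: AES-encrypt the data, RSA-wrap the AES key.
openssl enc -aes-256-cbc -pbkdf2 -salt -in data.txt -out data.enc -pass file:aes.key
openssl pkeyutl -encrypt -pubin -inkey user_rsa.pub -in aes.key -out aes.key.enc
# Receiver side: unwrap the AES key with the RSA private key, then decrypt.
openssl pkeyutl -decrypt -inkey user_rsa.pem -in aes.key.enc -out aes.key.dec
openssl enc -d -aes-256-cbc -pbkdf2 -in data.enc -out data.dec -pass file:aes.key.dec
cmp -s data.txt data.dec && echo "round-trip OK"
```

With 2048-bit RSA, the wrapped payload is limited to roughly 245 bytes, which comfortably fits a 32-byte AES key; the bulk data can be any size, since AES handles it. This is why hybrid schemes pair the two algorithms rather than RSA-encrypting the data directly.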


Appendix A
Project Source Code

1- Code listing: create user accounts
Step 1: sudo adduser <username>
Step 2: sudo adduser <username> sudo
Step 3: hadoop fs -mkdir /user/<username>
Step 4: hadoop fs -chown <username>:supergroup /user/<username>
Step 5: hadoop fs -ls /user
Step 6: su <username>
Step 7: gedit .bashrc

2- Code listing: Hadoop prerequisites
1- sudo apt-get update
2- sudo apt-get install default-jdk
3- java -version
4- sudo addgroup hadoop
5- sudo adduser --ingroup hadoop hduser
6- sudo adduser hduser sudo
7- sudo apt-get install openssh-server
8- su hduser   (change user to hduser)
9- ssh-keygen -t rsa -P ""
10- cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
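Steps 9 and 10 above set up passwordless SSH, which Hadoop needs so the master can start daemons on the slaves without prompting for a password. The same key-generation and authorization steps can be rehearsed in a throwaway directory (the path is illustrative) without touching the real ~/.ssh:

```shell
# Rehearse steps 9-10 in a scratch directory instead of $HOME/.ssh.
tmp=$(mktemp -d)
# Step 9: generate an RSA key pair with an empty passphrase (-P "").
ssh-keygen -q -t rsa -P "" -f "$tmp/id_rsa"
# Step 10: append the public key to authorized_keys so logins presenting
# the matching private key are accepted without a password.
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"
chmod 600 "$tmp/authorized_keys"   # sshd ignores the file if it is too permissive
grep -c "^ssh-rsa" "$tmp/authorized_keys"
```

On the real cluster the key lives in hduser's own ~/.ssh, and the public key must also be appended to authorized_keys on every slave for the master to reach them.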

3- Code listing: installing Hadoop on the master server
1) wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz   (download Hadoop)
2) tar xvzf hadoop-2.7.1.tar.gz   (extract Hadoop)
3) sudo mv hadoop-2.7.1 /usr/local/hadoop   (move the Hadoop installation to the /usr/local/hadoop directory)
4) sudo chown -R hduser:hadoop /usr/local/hadoop   (give hduser ownership of the installation)
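Steps 2–4 above follow a common extract, move, and change-ownership pattern. The following stand-alone rehearsal uses a dummy archive and a scratch prefix instead of the real download and /usr/local, so the sequence can be tried anywhere without root:

```shell
# Build a dummy hadoop-2.7.1.tar.gz so the sequence can run without the
# real 200 MB download; the archive name mirrors the listing above.
work=$(mktemp -d); cd "$work"
mkdir hadoop-2.7.1 && echo "dummy" > hadoop-2.7.1/README.txt
tar czf hadoop-2.7.1.tar.gz hadoop-2.7.1 && rm -r hadoop-2.7.1

tar xzf hadoop-2.7.1.tar.gz                  # step 2: extract the archive
mkdir -p "$work/usr/local"
mv hadoop-2.7.1 "$work/usr/local/hadoop"     # step 3: move into the install prefix
# Step 4: set ownership (current user here; hduser:hadoop on a real node).
chown -R "$(id -u):$(id -g)" "$work/usr/local/hadoop"
ls "$work/usr/local/hadoop"
```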

4- Code listing: installing Hadoop on the slave servers
# su hadoop
$ cd /opt/hadoop
$ scp -r hadoop hadoop-slave-1:/opt/hadoop
$ scp -r hadoop hadoop-slave-2:/opt/hadoop

Configuring Hadoop on the master server
Open the master server and configure it with the following commands:
# su hadoop
$ cd /opt/hadoop/hadoop

Configuring the master node:
$ vi etc/hadoop/masters
hadoop-master

Configuring the slave nodes:
$ vi etc/hadoop/slaves
hadoop-slave-1
hadoop-slave-2

sudo gedit ~/.bashrc
# append the lines below to the end of the file and save it
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
source ~/.bashrc

A- Edit core-site.xml (first set JAVA_HOME in the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file)
sudo gedit /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

B- Edit hdfs-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
  </property>
</configuration>

C- Edit yarn-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>


D- Edit mapred-site.xml
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
sudo gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
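Once the four site files are edited, a quick well-formedness check helps catch an unclosed or mismatched tag before the daemons are started, and lets you eyeball the parsed name/value pairs for misspellings. This is a hypothetical helper, not part of the thesis: it writes a sample file, but on a real node you would point it at the actual core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml.

```shell
# Write a sample Hadoop site file (a stand-in for the real config paths).
cat > sample-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
# Parse it with Python's stdlib XML parser; an unclosed or mismatched tag
# would raise a ParseError here instead of failing when Hadoop starts.
python3 - <<'EOF'
import xml.etree.ElementTree as ET
root = ET.parse("sample-site.xml").getroot()
props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
assert props.get("mapreduce.framework.name") == "yarn"
print("config OK:", props)
EOF
```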



