JOURNAL OF COMPUTER SCIENCE AND ENGINEERING, VOLUME 2, ISSUE 1, JULY 2010 16

Hybrid Load Balancing in Auto-Configurable Trusted Clusters

Shakti Mishra, D.S. Kushwaha and A.K. Misra

Abstract— Clustering enables a group of independent nodes to be managed as a single system for higher availability, easier manageability and greater scalability. Achieving optimal performance in clusters requires the migration of processes to other nodes. The first step towards this is to establish a secure trusted environment that ascertains the credibility of the participating nodes working together, so that most of the nodes of the cluster do not get overloaded. Further, an optimized scheduling algorithm for migrated tasks, in line with the hybrid load balancing algorithm [2], is presented. Finally, a B-tree based architectural framework for auto-configuration in clusters is proposed to achieve load balancing along with failure recovery and fault tolerance. The experimental results indicate that priority based scheduling with hybrid reliable load balancing lowers network traffic by 80-90 percent and increases CPU utilization by 40-50 percent, with 4-9 percent lower memory and computational requirements. Moreover, based on the execution load of nodes, the participating clusters are able to configure themselves, providing resiliency against arbitrary node departure or failure.

Index Terms—Load balancing, Clusters, Trust, Scheduling, Process priority, Critical process, MOSIX.

——————————  ——————————

1 INTRODUCTION

In the present computing landscape, cluster technology gains popularity day by day. The prime reasons for this popularity are high availability, increased reliability and high performance computing. A cluster is a collection of interconnected workstations or PCs [1]. Cluster based distributed systems provide a better alternative because the cost of highly available machines such as idle workstations and personal computers is significantly lower than that of traditional supercomputers. Important uses of this technology are improving the availability of services, sharing computational workload and performing computation intensive applications; the computational requirements of various applications can thus be met effectively. Achieving optimal performance in clusters requires the migration of processes to other nodes, and this is where trust comes into play. Establishing trust in a distributed computing environment is one of the most challenging and important aspects of cluster computing. A secure trusted environment is also needed in order to ascertain the credibility of the participating nodes working together to achieve the goal of Computer Supported Cooperative Working (CSCW). Load balancing can be achieved by evenly distributing the load of heavily loaded nodes to lighter ones through process migration. There are instances when the majority of the nodes of a

cluster may be underloaded, and these nodes may accept foreign processes to execute. At the same time, these nodes may receive new local processes of their own, a scenario that may push the majority of these nodes towards being overloaded. Migration of processes and their execution demands an effective and optimized scheduling approach so that neither foreign nor home processes starve. From the above observations, the major issues for optimal cluster operation are:

- To establish trust among the nodes in a cluster.
- A hybrid load balancing algorithm for high availability among trusted nodes in a cluster.
- An optimized scheduling approach for migrated tasks to avoid starvation.
- An auto-configuration mechanism for load balanced clusters to attain high availability, scalability and fault tolerance.

1.1 Trust in Cluster based Distributed Systems

Load balancing is one of the applications of CSCW performed in cluster based distributed systems. The role of trust in such systems is crucial, primarily for the following reasons: (1) a cluster supports CSCW by evenly distributing the load of heavily loaded nodes to lighter ones, so the migration of processes from a heavily loaded system to an underloaded one needs authorization and authentication checks; (2) resource theft, another important aspect, where an unsafe node or intruder in a cluster tries to steal CPU cycles or memory of other nodes in order to execute its own processes by killing migrant processes; and (3) a perfidious node may alter the cluster statistics by sending incorrect information about itself. Similarly, a malicious node can steal or alter important information from messages meant for a migrant process. A node can also be designated as trusted or untrusted while certifying migration requests in the case of load balancing. We have designed a cooperative and dynamic trust management system [23, 26] that establishes trust at the time nodes join the cluster and detects malevolent nodes at run time by monitoring each node's activities.

————————————————
Shakti Mishra, Ph.D. Research Scholar, Motilal Nehru National Institute of Technology, Allahabad, India.
Dr. D.S. Kushwaha, Asstt. Professor, Department of Computer Science & Engg., Motilal Nehru National Institute of Technology, Allahabad, India.
Dr. A.K. Misra, Professor, Department of Computer Science & Engg., Motilal Nehru National Institute of Technology, Allahabad, India.

© 2010 JCSE http://sites.google.com/site/jcseuk/

1.2 Hybrid Load Balancing Algorithm for Trusted Clusters

The Jingle-Mingle model with the hybrid load balancing algorithm [2] is presented as the basic model for balancing the load across nodes.

1.3 Scheduling of Migrated Tasks in Hybrid Load Balancing Algorithm

In most cluster systems, submitted jobs are initially placed into a queue if no compute nodes are available, so there is no guarantee as to when these jobs will be executed. This usage policy may cause a problem for time-critical applications. The fundamental policy of each machine in CSCW is to share only idle CPU cycles and not to use the resources of a system when that machine has its own local processes to execute. We propose an optimized scheduling approach in the hybrid and reliable load balancing algorithm for a cluster based trusted distributed architecture. The nodes in a cluster could reach a situation where all the underloaded nodes leave the cluster and the remaining nodes are heavily loaded. This demands a universal spread of load across the nodes of a cluster; the auto-configuration of clusters proposed next helps us achieve this.

1.4 Auto-Configuration of Cluster

Once nodes are verified as trusted, the next step is to design a cluster of trusted nodes. We propose a model for auto-configuration and reconfiguration of clusters for load balancing, based on the CPU load of cluster members and on the concept of a B-tree, so as to increase the attainable performance, scalability, reliability and availability of the system.

2 RELATED WORK

Though trust negotiation and establishment is challenging and provides a promising approach for any node joining or leaving a cluster dynamically, very few research proposals exist to date. The Trusted Computing Group (TCG) was formed to solve security problems in individual computers; however, only a few works discuss the issue of trusted computing in clusters. Y. Wu et al. [3] propose Trusted Cluster Computing (TCC) to automatically

construct a user-trustable cluster computing environment. Trusted intermediaries [4] are systems which authenticate clients and servers, such as the Certificate Authorities in public key based systems and the KDC in Kerberos. A decentralized approach [5] is used to establish trust in the information provided by specific nodes in a distributed infrastructure based on the reputation of the nodes. An agent based approach [6] envisages on-line and off-line monitoring in order to analyze users' activity, but the authors do not consider the level of trust at the time nodes join. Since a new node requests some other node in a cluster for networked resources, these resources also need to be made secure and trusted. A survey of previous research reveals that a comprehensive approach to trustworthiness in cluster based distributed systems for load balancing is needed to address the following issues:

- Single tier identity based authorization
- Multiple entities participating in a single operation
- Determination of trust when a node joins a cluster
- Periodic identification of malicious nodes based on the activities performed

With clustering, there is a fairly tight integration among the nodes in the cluster. However, when a failure occurs in a cluster, resources are redirected and the workload is redistributed [1, 7] so as to maintain optimal throughput. Therefore, an efficient mechanism for configuring clusters to attain high availability, scalability and fault tolerance is required. The research in [8] identifies the major forces in designing a scalable infrastructure framework:

- Any node has a maximum computational ability beyond which it cannot increase its throughput, so a node running out of its load boundary needs to relocate its load to some idle node.
- Individual nodes have maximum physical performance limitations, including limitations on bus speed, the amount of memory, the number of processors, and the number of peripherals that any one node can use.
- If only one server is responsible for delivering the functionality of a component within an application, its failure results in an application failure.
- Adding nodes in a coherent manner can increase the complexity of managing and monitoring the nodes and their associated peripherals.

Many cluster configuration proposals already exist in the market. The Sun OS based Beowulf cluster [9] provides a clustering mechanism for a varying, small number of relatively inexpensive computers running the Linux operating system. In Beowulf clusters, nodes are dedicated to the cluster. Condor [10, 11] utilizes extra and idle cycles on workstations as part of a cluster, as does MOSIX [12, 13].


Load balancing in MOSIX [15] is done at the system level: the MOSIX scheduler automatically selects underloaded nodes within a cluster and migrates processes to them. Network load balancing [14] is a centralized approach for balancing the load on nodes. Clients are statically distributed among cluster hosts so that each server receives its percentage of incoming requests. The major limitation of this approach is that the algorithm does not respond to changes in the load on each cluster host; the mapping is only modified when the cluster membership changes. HP Serviceguard [17] provides an infrastructure for the design and implementation of HP clusters that can restore mission critical application services after hardware and software failures. The strategy relies entirely on hardware redundancy, with the assumption that the failure of one component is independent of the failure of other components. Lee et al. [16] propose a large scale monitoring system with a method for automatically building and restoring it in the case of node failure. A study of the basic configuration methodology of various clusters, HPVM [18], PVM [19, 20], NIMROD [21], MOSIX [15], NOW [23] and Beowulf [9], reveals that all of these clusters are configured manually. The only solution to this problem is to share the load among multiple nodes that are configured to share the workload and have idle CPU cycles. All of these works propose some mechanism for load balancing, but the scenario where all nodes in a cluster tend to overload may demand user intervention, leading to performance degradation. In order to overcome this, we propose a method for auto-configuration and reconfiguration of clusters on the basis of CPU load. A dynamic load balancing system based on a queue mechanism in PVM [24] offers the integration of a dynamic scheduling system in PVM, which handles load balancing using data and process migration.
Their mechanism provides dynamic load balancing through process migration [27] or data migration according to the property of the application, but the authors do not define this property in their proposal. The Hectiling system [28] follows a hierarchical architectural model, where a crash at a single point of failure may lead to system failure. The approach proposed in [29] combines the Remote UNIX (RU) remote execution facility with the Up-Down algorithm for the fair assignment of remote capacity to a needy system. However, the proposal does not elaborate on the issue of high priority real time critical processes or the case of frequent migration of processes. Zhang and Pande [30] discuss minimization of migration cost and define a strategy for selecting the parts of the program that should migrate. Several researchers [31] have tried to resolve issues such as longer freeze time, which may be due to unavailability of competing resources, but their approach addresses prefetching of memory pages for process migration. Others [32] have also proposed virtualization of process

migration. Barak et al. [33] propose a uniform virtual runtime environment on top of different hardware and software platforms; this is a point of contention, because every remote node has to be fine-tuned to the kind of distributed clusters and Grids used. The work in [25] addresses balancing the load by splitting processes into separate jobs and then distributing them to nodes using Mobile Agents (MA); the authors propose a pool of agents to perform this task. This approach has limitations, since an MA must support the disruptive nature of wireless links and alleviate their associated bandwidth limitation. The work also suffers from the overhead of too many message exchanges, since all nodes exchange their information with each other. All the above work provides the opportunity to address the need to resolve the following issues:

- Frequent migration of processes.
- Handling real time critical processes.
- Elimination of the requirement of a specialized utility running on each node.
- A single point failure recovery mechanism.
- Handling homogeneous and heterogeneous environments simultaneously.
- An optimized load balancing strategy with reduced network traffic load.

The heterogeneity of machines is a major hurdle in accomplishing load balancing. Machines with higher CPU processing capabilities and more memory may accommodate many processes, which is not true of machines with lower processing power. So, the appropriate scheduling of tasks among the available machines becomes a major concern in order to reduce the total waiting time and minimize the completion time with little migration overhead. The distribution process may become further complicated if real time processes arrive on a remote machine. Thus, scheduling tasks in an appropriate manner is mandatory for utilizing the full capability of cluster computing nodes with reduced completion time.
The basic motive of our proposal is to schedule jobs on various nodes so as to lower response and waiting times while also considering the priority and criticality of processes. Various approaches for scheduling tasks in clusters have been proposed and implemented by previous researchers, and each scheduling approach has its own assumptions. For a load balancing system, we propose an optimized scheduling approach for migrated tasks in trusted clusters that also considers the priority and criticality of processes.

3 PROPOSAL TO MAJOR ISSUES

3.1 Establishing Trust in Cluster based Distributed Systems

The work in [26] elaborates on a Cooperative Trust Management Framework for Load Balancing in Cluster Based Distributed Systems, which attempts to design a dynamic trust management system that detects malevolent nodes in the run time environment.


We design the system such that the Process Migration Server (PMS) is a trustworthy node which authenticates each node by itself. To accomplish this, the PMS uses three modules: (1) a Registration Module, (2) a Node Authentication Module (NAM), and (3) a Migration Module. Server processes in the PMS authenticate each node and every migration request and resource request coming from it. The authenticity of a node is verified at the time a new node joins the cluster. The PMS based authentication mechanism for load balancing is as follows.

1. If a new node wants to be part of a load balancing cluster, it has to register itself with the PMS via the Registration Module.
2. The Registration Module manages and enforces a registration agreement for the node, which means that each node has to follow the rules and policies mentioned in the agreement. To accomplish this, the Registration Module demands the resource list of the client, which includes its computation power, physical memory and I/O devices.
3. The node provides its resource list to the Registration Module. The resource list contains the objects that the node has the right to share.
4. The Registration Module stores the reference and details of the new node for use by itself and by other nodes of the system administration. The Registration Module also provides a Node_id to the newly joined node for its authentication.
5. The Registration Module sends a notification to the Node Authentication Module containing the node_id and resource list of the newly joined node.
6. Once the Node Authentication Module has received the node_id from the Registration Module, it generates a session_id restricting the lifetime of the node in the cluster.
7. The NAM also maintains a table containing the node_id, the resource_ids held by that node and the corresponding session_id.
8. The Node Authentication Module forwards the NAM table to the Migration Module, which keeps this information to verify the identity and lifetime of a requesting node. If the session of a node expires, the Migration Module can discard its requests.
9. If a node requests the PMS for a certain resource, the Migration Module of the PMS extracts the list of nodes which hold the requested resource and sends it back to the requesting client.

The PMS based authentication mechanism is particularly useful in maintaining trust between newly joined nodes of the cluster. This scheme also provides better resource management by maintaining tables at the different modules.
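The nine-step flow above can be illustrated with a minimal simulation. The module and field names (registration, node_id, session_id, NAM table) follow the text; the concrete data structures, the session TTL and the `PMS` class itself are illustrative assumptions, not the authors' implementation:

```python
import itertools
import time

class PMS:
    """Toy Process Migration Server: registration -> authentication -> migration."""
    def __init__(self, session_ttl=3600):
        self._ids = itertools.count(1)
        self.nam_table = {}          # node_id -> (resource set, session expiry)
        self.session_ttl = session_ttl

    def register(self, resource_list):
        # Steps 1-4: the node supplies its resource list and receives a Node_id.
        node_id = next(self._ids)
        # Steps 5-7: the NAM issues a session restricting the node's lifetime.
        expiry = time.time() + self.session_ttl
        self.nam_table[node_id] = (set(resource_list), expiry)
        return node_id

    def request_resource(self, node_id, resource):
        # Steps 8-9: the migration module checks identity and lifetime, then
        # returns the list of other nodes holding the requested resource.
        entry = self.nam_table.get(node_id)
        if entry is None or time.time() > entry[1]:
            return None              # unknown node or expired session: discard
        return [nid for nid, (res, _exp) in self.nam_table.items()
                if resource in res and nid != node_id]

pms = PMS()
a = pms.register(["cpu", "memory"])
b = pms.register(["cpu", "printer"])
holders = pms.request_resource(a, "printer")
```

The sketch shows why an expired or unregistered node is silently rejected: its entry either never existed in the NAM table or fails the lifetime check.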
The consistency issue is also resolved, as each module forwards its table to the next module, and the basic operation of the PMS is not disturbed by this authentication procedure. Once a node has been authenticated, it can send its status information to the PMS for further operation. It is necessary to detect malicious activities of nodes and mandatory to abort their execution immediately. This issue can be resolved by using a dynamically reconfigurable trustworthy distributed system approach [22]. A measure called TrustProb establishes the trustworthiness of a node. The initial value of TrustProb is 0; as a node participates in Computer Supported Cooperative Working (CSCW), its trust value increases up to the upper threshold value of 1. A higher value of TrustProb indicates a more trusted node of the cluster, while a lower value represents a less trusted node that may perform malicious activities. Since the maximum value of TrustProb is 1, no further increments are permitted on reaching this value. Nodes achieving a TrustProb of 1 are declared confidential nodes and are the most trustworthy entities of the distributed system. Similarly, nodes with TrustProb < 0 are considered malicious: as soon as a node is detected as malicious by the Migration Module, its session_id expires and the node is no longer allowed to be part of the cluster. The dynamic behavior of a cluster leads to the joining and leaving of nodes, so we assume that the trust building process starts with new nodes. When a new node X joins the cluster, its initial TrustProb is set to 0 by the Migration Module. Suppose X sends a request for a resource R to the PMS via the Migration Module. The module sends back the list of nodes holding resource R, with their TrustProb values, to X. X selects a node on the basis of the highest value of TrustProb, say a trusted node N1, from the given list and sends its process directly to node N1. Since a newly joined node can be potentially harmful, node N1 scans the code sent by X through a security utility routine.
Three cases arise:

Case 1 (Fig. 9)
a) N1 finds no error in the code and sends a Code Found "No Error" message to the PMS with the node_id of X, upon receiving which the Migration Module of the PMS increases the value of TrustProb(X) by 0.1.
b) N1 executes the code of X and sends the result back to X.
c) X scans the result through some security utility routine. If the result is found correct, X sends Result Found "No Error" to the PMS with the node_id of N1. This time the Migration Module of the PMS increases the TrustProb of N1 by 0.1.

Case 2 (Fig. 10)
a) N1 finds an error in the code and sends a Code Found "Error" message to the PMS, upon receiving which the PMS decreases the value of TrustProb(X) by 0.1. Since the initial TrustProb of X is 0, after decreasing it by 0.1 it becomes -0.1,


which indicates that X is a malicious node whose further involvement in the cluster should be stopped immediately by expiring its session_id.
b) N1 discards the process execution request of X.

Case 3 (Fig. 11)
a) N1 finds no error in the code and sends a Code Found "No Error" message to the PMS, upon receiving which the PMS increases the value of TrustProb(X) by 0.1.
b) N1 executes the code of X and sends the result back to X.
c) X scans the result through some security utility routine. If X finds an incorrect or suspicious result, it sends a Result Found "Error" message to the PMS, and the PMS decreases the TrustProb of N1 by 0.1.

Hence, a newly joined node can build trust only after five successful executions of remote node requests. As per our approach, scanning is the process of forming trusted nodes, and the value of TrustProb depends upon the result of the scan process. Nodes with TrustProb = 1 are the most confidential nodes of the load balancing cluster and are the most sought-after candidates for CSCW. Next, trusted nodes (0.5 ≤ TrustProb < 1) are considered trusted destination partners. In the absence of both of these, suspicious nodes are given a chance to increase their credibility.
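The TrustProb bookkeeping described above can be written down compactly. This is a minimal sketch of the update and classification rules (0.1 steps, cap at 1, expulsion below 0, thresholds 0.5 and 1); the function names and the use of rounding to avoid floating point drift are our own choices:

```python
def update_trust(trust, good=True):
    """Adjust a node's TrustProb by +/-0.1 per scan result; cap at 1.
    round() keeps the 0.1 steps free of floating point drift."""
    step = 0.1 if good else -0.1
    return min(round(trust + step, 1), 1.0)

def classify(trust):
    """Node classes used in the text, with thresholds 0, 0.5 and 1."""
    if trust < 0:
        return "malicious"      # session_id expired; node expelled
    if trust >= 1.0:
        return "confidential"   # most trustworthy entities
    if trust >= 0.5:
        return "trusted"        # trusted destination partners
    return "suspicious"

# A new node starts at 0 and needs five error-free executions to reach 0.5.
t = 0.0
for _ in range(5):
    t = update_trust(t, good=True)
```

Note how a single bad scan at TrustProb 0 immediately drops a new node below 0 and expels it, which matches Case 2 above.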

3.2 Hybrid Load Balancing Algorithm for Trusted Clusters

In the earlier Jingle-Mingle model [2], we consider a network of three clusters connected to each other via an intercommunication network; each cluster is a group of trusted nodes. There is at least one common node between two clusters, designated as the Process Migration Server (PMS). The hybrid load balancing algorithm works as follows. Each workstation (idle or busy) periodically sends its node status to the PMS, which maintains a centralized state table with entries of the form (Node_id, CPU_Status <U, A, O>, Resource_Avail <I/O, Network, Memory>, Closest_Node_Vector); U, A and O represent the Underloaded, Average and Overloaded states of the CPU. The PMS also maintains a vector of the closest nodes for each node in the cluster. The PMS group-multicasts the status of the underloaded nodes. Nodes that are overloaded, or are heading towards overload, cannot contribute to any other process, hence their status need not be broadcast. Whenever an overloaded node has to migrate tasks or processes to another node, it gathers the node status information from the periodic broadcasts received from the PMS and sends the process directly to the destination node, along with an intimation message to the PMS. This information is used by the PMS to update its entries about the location of each migrated process as well as the node status.

The destination node executes the remote process using the proposed migration strategy and the results are sent back to the source node. The source node then sends a Table_Update message to the PMS.

3.3 Scheduling of Migrated Tasks in Hybrid Load Balancing Algorithm

The proposed scheduling approach is based on the priority and criticality of processes. The term Critical Process refers to a process whose execution should not be delayed or which has strict time constraints. We consider such processes to be of the highest priority: if a process is critical, its execution should only be suspended if the node that receives it is executing its own critical local or remote process. So, all processes can be scheduled on the basis of their criticality or priority, and non-critical processes may further be scheduled in their priority order. The overall objective is to prevent any process (local or remote) from starvation and to allocate a fair share of CPU cycles.

3.3.1 Criticality Based Scheduling

Based on the above discussion, we have four types of processes, whose naming convention is listed below.

TABLE 1
PROCESS NOMENCLATURE

S.No  Process Type             Convention
1     Local Process            Li
2     Remote Process           Ri
3     Local Critical Process   CLi
4     Remote Critical Process  CRi

The scheduling of processes is based on the four cases described below.

[Case A] When an idle node receives a migration request for a (local or remote) critical process, it immediately starts its execution.

[Case B] When a remote process (Ri) is migrated to an idle node, it executes on the remote machine unless a critical local process arrives, as explained in Fig. 1. If a critical local process (CLi) arrives on the machine while a remote process is executing, it preempts the remote process to the waiting queue. However, if in the meantime the local process waits for an I/O resource, the remote process is dequeued from the waiting queue and gets a chance to execute. The Linger-Longer approach [23] provides this facility for fine-grained idle periods to run foreign jobs with very low priority.

[Case C] An underloaded node has a critical local process to execute and simultaneously receives a migration request for a critical remote process. It simply rejects the request and sends an intimation message back to the PMS and the source node.

[Case D] A local process arrives on an underutilized node currently executing a remote process; this case is explained in Fig. 2.

do {
    Round Robin scheduling between local & remote process;
} until (either the local or the remote process finishes its execution);
do {
    Allocate CPU cycles to the remote process (Ri) from the wait queue;
} until (CLi finishes with I/O);
do {
    Preempt Ri from the running queue, put it in the wait queue;
    Checkpoint after pre-emption of the remote process;
} until (CLi finishes execution);

Fig. 1. Algorithm for Case [B]
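The four cases can also be condensed into a single decision function. The case labels follow the text; the boolean state encoding, the function name and the returned action strings are illustrative assumptions:

```python
def schedule(node_idle, has_local_critical, incoming):
    """Return the action an underloaded node takes for an incoming process.

    incoming is one of "critical" (CLi/CRi), "remote" (Ri) or "local" (Li);
    the branches correspond to Cases A-D of the text.
    """
    if incoming == "critical":
        if has_local_critical:
            return "reject, intimate PMS and source"      # Case C
        return "execute immediately"                      # Case A
    if incoming == "remote":
        # Case B: Ri runs until a critical local process arrives,
        # at which point it is preempted to the waiting queue (Fig. 1).
        return "preempt to wait queue" if has_local_critical else "execute"
    if incoming == "local" and not node_idle:
        return "round robin with remote process"          # Case D (Fig. 2)
    return "execute"
```

A local process on a genuinely idle node simply executes; every other combination maps to exactly one of the four cases.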

do {
    Round Robin scheduling between local & remote process;
} until (either the local or the remote process finishes its execution);
Resume execution of the other process;

Fig. 2. Algorithm for Case [D]

3.3.2 Priority Based Scheduling Approach

We have considered different priority assignments for the CPU-overloaded and memory-overloaded conditions. In case none of the processes is critical, process scheduling is based on the type of the processes, as discussed below. In each case, the local scheduler checks the priorities of the local and remote processes. If the process priorities are equal, a round-robin approach is followed for scheduling; otherwise the process with the higher priority executes first.

a) Remote Process: A remote process with CF = True is given higher priority than a local process.

b) Running Process: A running process is given higher priority than a blocked process. Running processes are further divided into High Running, Running and Low Running processes, classified by the priorities decided by the local scheduler. There is no use in migrating sleeping, locked, swapped-out or stopped processes, as they do not consume any CPU resources.

c) Zombie Process: A zombie process is a process that has completed execution but still has an entry in the process table. The memory and other resources associated with it are reclaimed so that they can be used by other processes. As these processes do not consume any resources, they also need not be migrated.

3.3.3 Process Transfer Strategy

An overloaded node creates a child by forking the original process. It puts the original process to sleep and transfers the child process (one that has the same arguments and parameters as its parent process, excluding pp_id) to an idle node. This time, an intimation message is sent to the destination node along with the migrated process in an intimation envelope (Table 2).

TABLE 2
SIGNATURE OF INTIMATION ENVELOPE

The signature of the intimation envelope includes the following fields:

Source_id: The IP/MAC address of the source node.

Destination_id: The IP/MAC address of the destination node.

Intimation Message: The intimation message contains the following parameters:

  P_id: The process identifier, uniquely identifying each process.

  Critical Factor: A Boolean value (CF) associated with each process, enabling the local scheduler to identify a critical process.

  Priority: The type of priority of the process, as per Table 2.

  Timestamp: Each intimation message carries a timestamp whose value increases per hop; it also carries the timeout value.

  Other parameters: The run-time state of the process, which may contain the text, stack and data segments, register contexts, information about internal parameters and system queues, the heap area, message queues and communication information with other processes.
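The envelope of Table 2 maps naturally onto a small record type. This is a minimal sketch with field names taken from the text; the concrete types, the `hop()` helper and the sample addresses are our own illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class IntimationMessage:
    p_id: int                 # unique process identifier
    critical_factor: bool     # CF: lets the local scheduler spot critical processes
    priority: int
    timestamp: int            # incremented per hop
    timeout: int
    state: dict = field(default_factory=dict)  # segments, registers, queues, ...

@dataclass
class IntimationEnvelope:
    source_id: str            # IP/MAC address of the source node
    destination_id: str       # IP/MAC address of the destination node
    message: IntimationMessage

    def hop(self):
        """Advance the per-hop timestamp carried by the message."""
        self.message.timestamp += 1

env = IntimationEnvelope("10.0.0.1", "10.0.0.2",
                         IntimationMessage(p_id=42, critical_factor=True,
                                           priority=1, timestamp=0, timeout=30))
env.hop()
```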

3.4 Auto-Configuration of Cluster

We propose a new cluster auto-configuration mechanism for high availability and load balancing. It is a B-tree based, 3-level hierarchical structure of clusters. A B-tree is a data structure that keeps data values sorted and allows searches, insertions and deletions in logarithmic amortized time. Each leaf in the tree typically maps to exactly one computing node of the cluster, and each of these computing nodes has an IP address used to locate the node and communicate with it. At the leaf level, nodes are organized into clusters. When a node joins a cluster and is the only member of that cluster, it acts as the Cluster Head (CH). As other nodes join the cluster by declaring their statuses, the lowest loaded node among them becomes the new CH for that cluster. The number of nodes in a cluster is selected on the basis of the network configuration and bandwidth. The CH with the lowest load among all CHs is referred to as the Inter-Cluster Head (ICH). A root node acts as the parent of all ICHs and is termed the Process Management Server, in line with our previous proposal of the PMS in [2]. As nodes keep joining, at some point the node count in a cluster reaches its upper bound, and the cluster must split. The new cluster is formed in such a way that both clusters contain nearly equal numbers of underloaded, average loaded and overloaded nodes. The same process is repeated whenever a cluster reaches its maximum limit. Similarly, when the number of entries in an ICH reaches its maximum limit, the ICH node splits. A node can either leave a cluster or be eliminated if malicious activity is found at run time [22]. When nodes depart, the number of nodes in a cluster may fall below its minimum threshold, and cluster merging occurs. Since we have imposed a constraint on the number of overloaded nodes in a cluster, before merging two clusters we check the number of overloaded nodes in each.
If number of overloaded nodes in both of the clusters is lower than the maximum defined number of overloaded nodes in a cluster, then cluster merging occurs. If the summation of total number of nodes in cluster i and cluster i+1 reaches above the maximum threshold of cluster, then again a cluster split occurs by ensuring similar load status in each of the cluster.

4 IMPLEMENTATION AND EXPERIMENTAL STUDY For performance evaluation, we have implemented the architecture with MOSIX version 2.25.1.3 as middleware running on the Ubuntu 9.04 desktop edition. Each process is associated with a certain priority. We implemented our proposed algorithm in C++ and shell scripting. One node is designated as the PMS and runs the server program, while all other nodes run client programs. We observed all the nodes at different times under different load conditions and load variations. To set up a single cluster, a TCP connection is established between the nodes and the PMS. A PMS algorithm and a Node algorithm are devised to implement the PMS and the nodes in a cluster (Fig. 3 and Fig. 4). On each node, two threads are created. With the first thread, every node checks its own status, calculates whether it is overloaded, normally loaded or under-loaded on the basis of its CPU utilization, and sends its status back to the PMS server. On the server side, a separate thread is created whenever a new node attaches to the server; this thread continuously listens to that node and updates the node's status in the PMS state table. In our implementation, a node is considered CPU overloaded if its load factor is above 80%, under-loaded if it is below 30%, and normally loaded between 30% and 80%; the same thresholds apply to memory load.

    Bind a port to listen for new incoming connections
    While (true)
        If a new node is connected
            Create a new thread to listen to this node
            This thread periodically updates the node's entry in the PMS State table
    Another thread periodically checks and updates the PMS State table and
    group-multicasts to all overloaded nodes if there is a change in the table

Fig. 3. PMS Algorithm

    Establish a TCP connection with the server
    Create a separate thread to continuously listen to the PMS server
    In the main thread:
        While (true)
            Check own CPU utilization and send it to the PMS
            Sleep(5)
    In the other thread:
        While (true)
            Check own CPU and memory status
            Select a process to migrate on the basis of the priority table
            Select a node from the received PMS State table;
                if no such node is found, select a node from the CUNT
            If a target node was selected
                Migrate the selected process to that node
            Else
                Do nothing
            Sleep(5)

Fig. 4. Node Algorithm

Performance Evaluation of the Scheduling Approach
CPU Utilization: We first analyzed the CPU utilization of all systems while running 500-700 processes on three systems and 30-50 processes on three other systems under default MOSIX. Next, we observed the CPU utilization in the same configuration while running the proposed scheduling algorithm. Fig. 5 indicates that, using our approach, CPU utilization is 40%-50% higher than under the default MOSIX behavior.

Fig. 5. CPU Utilization Graph

Memory Usage The memory usage for the same configuration is observed for default MOSIX and the proposed approach (Fig. 6). We find that the JMM approach requires 4%-9% less memory than MOSIX. Fig. 7 shows the swap history of memory pages; the graph indicates that our proposed approach requires fewer memory pages than MOSIX.

Network Traffic We have observed the behavior of different processes on the systems and measured the total network traffic. Table 3 shows that the network traffic of the JMM approach is much lower than that of the default MOSIX behavior. The table shows the traffic recorded at different intervals. In MOSIX, every node sends its status query to every other node, generating heavy traffic. In our proposed approach, only the PMS collects node statuses and group-multicasts them to the entire cluster, resulting in a significant cut in network traffic.

Table 3. Network Traffic (in KB/sec) Comparison Chart

Fig. 6. Memory Usage Graph

Fig. 7. Memory Swap History (swap memory in percent per node, MOSIX vs. JMM)

CPU Load Variation Fig. 8 shows that the proposed approach distributes CPU load across nodes far more evenly, whereas default MOSIX exhibits high fluctuation. The CPU load across all nodes ranges from 20%-40% in the proposed approach, compared with 20%-80% under default MOSIX, thus ensuring more uniform CPU utilization.

Fig. 8. CPU Load Variation

    S.No    MOSIX    JMM    Default (No Migration)
    1.      160      0.5    0.1
    2.      120      1      2.1
    3.      86       1.8    0.2
    4.      100      2.9    0.3
    5.      60       2      1.8

5 CONCLUSION

The interaction between nodes and resource allocation with trustworthiness dimensions is a central challenge in building a trusted distributed system. We have proposed a methodology to dynamically scan the activities of nodes and thereby assign a trust probability, which in turn controls access to remotely accessible resources in a cluster based architecture. Experimental results show that the communication cost of establishing trust among entities, measured in number of messages exchanged, is reduced by 11%-18% with our approach. This work addresses issues such as single-tier identity based authorization, multiple entities participating in a single operation, determination of trust when a node joins a cluster, and periodic identification of malicious nodes based on the activities they perform. Once trust is established, our hybrid load balancing approach resolves many of the issues left open by previous researchers: remote process starvation on local process arrival, frequent migration of remote processes, selection criteria for an idle node, and single-point failure recovery. The proposed migration strategy ensures that no process starvation takes place. The selection strategy is based on the hop_count parameter, so that each node can efficiently migrate a process to one of the closest nodes without any overhead. The PMS also performs knowledge based resource management of the cluster along with failure recovery. Having addressed process migration, we have proposed a process scheduling approach based on the priority and criticality of the process. The proposed scheduling algorithm prevents any local or remote process from starving and ensures a fair share of CPU cycles. The final objective of this paper is to balance the load across clusters and thus balance the group of clusters as a whole. The work presents a B-tree based three-level hierarchical structure for auto-configuration and reconfiguration of clusters. The immediate advantage of this proposal is that it gives a node the flexibility to leave the cluster without any prior notice. Further, we have proposed mechanisms to achieve load balanced clusters through process migration in order to harness the idle capacities of nodes in various clusters. Experimental results illustrate that the cost of finding an idle or under-loaded node does not change much as the system expands. Although the number of nodes in the clusters varies, the cost of a load balancing operation is O(1).

REFERENCES
[1] R. Buyya, "High Performance Cluster Computing", Vol. 1, Pearson Education, 2008.
[2] Shakti Mishra, D.S. Kushwaha, A.K. Misra, "Jingle-Mingle: A Hybrid Reliable Load Balancing Algorithm for a Trusted Distributed Environment", in IEEE Proc. of 5th International Conference on INC, IMS and IDC, NCM 2009, Seoul, pp. 117-122, 2009.
[3] Y. Wu et al., "Automatically Constructing Trusted Cluster Computing Environment", The Journal of Supercomputing, Springer-Netherlands, July 2009.
[4] Ping Liu, Rui Zong and Sizuo Liu, "A New Model for Authentication and Authorization across Heterogeneous Trust Domains", in International Conference on Computer and Software Engineering, 2008.
[5] Yu, Zhang, "Decentralized Trust Management Based on the Reputation of Information Sources", in Proc. of 2007 IEEE International Conference on Networking and Control, London, 2007.
[6] Serhiy Skakun and Nataliya Kussul, "An Agent Approach for Providing Security in Distributed Systems", TCSET 2006.
[7] "Cluster Computing White Paper", (edited by) Mark Baker, Version 2.0, 2000.
[8] Load Balanced Cluster, MSDN Library.
[9] "Considerations in Specifying Beowulf Clusters", a white paper prepared by the CustomSystems High Performance Computing Segment, Compaq Computer Corporation, 2000.
[10] Wagstrom, "An Overview of Condor", Condor Project, Wisconsin University.
[11] Thain, D., Tannenbaum, T., Livny, M., "Condor and the Grid", in Grid Computing: Making the Global Infrastructure a Reality, edited by Berman, Hey, Fox, John Wiley & Sons, Ltd., pp. 299-336, 2003.
[12] Barak and La'adan, "The MOSIX Multicomputer Operating System for High Performance Cluster Computing", Journal of Future Generation Computer Systems, 13(4-5), pp. 361-372, 1998.
[13] Barak, La'adan, Shiloh, "Scalable Cluster Computing with MOSIX for Linux", Proc. Linux Expo '99, pp. 95-100, 1999.
[14] Network Load Balancing Technical Overview, Microsoft TechNet, 2010.
[15] Addad, F.I., Paquin, E., "MOSIX: A Cluster Load Balancing Solution for Linux", Linux Journal, 2001.
[16] Lee et al., "Large Scale Cluster Monitoring System, and Method of Automatically Building/Restoring the Same".
[17] "HP Serviceguard Cluster Configuration for HP-UX 11i or Linux Partitioned Systems", 2009.
[18] G. Sampemane, S. Pakin, Chien, "Performance Monitoring on an HPVM Cluster", Proc. of the International Conference on Parallel and Distributed Processing Techniques and Applications, June 2000.
[19] A. Geist, A. Beguelin, J. Dongarra, R. Manchek, W. Jiang, and V. Sunderam, "PVM: A Users' Guide and Tutorial for Networked Parallel Computing", MIT Press, 1994.
[20] V. S. Sunderam, "PVM: A Framework for Parallel Distributed Computing", Concurrency: Practice and Experience, 2(4), pp. 315-339, December 1990.
[21] R. Buyya, "Nimrod/G: An Architecture for the Resource Management and Scheduling System in a Global Computational Grid", HPC Asia 2000.
[22] Shakti Mishra, D.S. Kushwaha, A.K. Misra, "A Novel Approach for Building a Dynamically Reconfigurable Trustworthy System", LNCS, Springer, BAIP 2010, pp. 258-262, 2010.
[23] K. D. Ryu, J. K. Hollingsworth, "Exploiting Fine-Grained Idle Periods in Networks of Workstations", IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 7, July 2000.
[24] H. Liang, M. Faner, H. Ming, "A Dynamic Load Balancing System Based on Data Migration", Proc. of the 8th IEEE International Conference on CSCW in Design, 2003.
[25] Nehra, N., Patel, R.B., Bhatt, V.K., "A Framework for Distributed Dynamic Load Balancing in Heterogeneous Clusters", Journal of Computer Science, Science Publications, 2007.
[26] Shakti Mishra, D.S. Kushwaha, A.K. Misra, "A Cooperative Trust Management Framework for Load Balancing in Cluster Based Distributed Systems", in IEEE Proc. of the International Conference on Recent Trends in Information, Telecommunication and Computing, ITC 2010, pp. 121-125.
[27] C. Hagen, G. Alonso, "Backup and Process Migration Mechanisms in Process Support Systems", Technical Report No. 304, ETH Zurich, Institute of Information Systems, August 1998.
[28] T. Akgun, "BAG Distributed Real-Time Operating System and Task Migration", Turkish Journal of Electrical Engineering, Vol. 9, No. 2, 2001.
[29] M. J. Litzkow, M. Livny, M. W. Mutka, "Condor - A Hunter of Idle Workstations", 8th IEEE International Conference on Distributed Computing Systems, 1988.
[30] K. Zhang and S. Pande, "Efficient Application Migration under Compiler Guidance", Proc. of the 2005 ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems, 2005.
[31] Ho, R.S.C., Cho-Li Wang, Lau, F.C., "Lightweight Process Migration and Memory Prefetching in openMosix", IEEE International Symposium on Parallel and Distributed Processing (IPDPS), 2008.


[32] K. Noguchi, M. Dillencourt, L. Bic, "Efficient Global Pointers with Spontaneous Process Migration", IEEE Proc. of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), 2008.
[33] Maoz, T., Barak, A., Amar, L., "Combining Virtual Machine Migration with Process Migration for HPC on Multi-clusters and Grids", IEEE International Conference on Cluster Computing, 2008.

Shakti Mishra received her B.Tech degree from U.P. Technical University, Lucknow, India in the year 2005. After three years of teaching, she is presently pursuing a Ph.D. at Motilal Nehru National Institute of Technology, Allahabad, India under the supervision of Dr. D.S. Kushwaha and Dr. A.K. Misra. She is presently working on load balancing in distributed systems. Her areas of interest include High Performance Cluster Computing, Automata Theory and the Service Oriented Paradigm.

Fig.9. Measurement of TrustProb for nodes by PMS (Case I)

Dr. D.S. Kushwaha received his Doctorate Degree in Computer Science & Engineering from Motilal Nehru National Institute of Technology, Allahabad, India in the year 2007. He is presently working with the same Institute as Assistant Professor in the Department of Computer Science & Engineering and carries 16 years of teaching experience. His research interests include Software Engineering, High Performance Computing, security issues in distributed systems, Web Services and Service Oriented Architectures. He has over 40 publications in various international conferences & journals.

Fig. 10. Measurement of TrustProb for nodes by PMS (Case II)

Dr. A.K. Misra received his Doctorate Degree in Computer Science & Engineering from Motilal Nehru National Institute of Technology, Allahabad, India in the year 1990. He is presently working with the same Institute as Professor in the Department of Computer Science & Engineering and carries over 39 years of teaching experience. His research interests include Software Engineering, Programming Methodology, Artificial Intelligence, Data Mining, Cognitive Sciences, Object Oriented Technologies and Data Structures. He has over 65 international publications in various conferences & journals.

Fig. 11. Measurement of TrustProb for nodes by PMS (Case-III)
