IJRIT International Journal of Research in Information Technology, Volume 2, Issue 6, June 2014, Pg: 393-399

International Journal of Research in Information Technology (IJRIT)

www.ijrit.com

ISSN 2001-5569

Effect of Replication Factor on the Cost of Sub Query Allocation

Palak Sharma, Shelza, Rajinder Singh Virk, Varinder Pannu
Department of Computer Science & Engineering, Guru Nanak Dev University, Amritsar.

Abstract: The main purpose of this paper is to present the effect of different allocation plans with different replication factors on the output of a simulator that generates distributed sub query allocation plans. With advances in technology, the need for distributed databases has also increased. A major issue in distributed database design is data allocation. Data allocation is an NP-hard problem and is not easy to solve. Therefore the fragments/tables accessed by queries must be allocated to sites in such a way that the communication cost of executing queries is reduced. In the same way, replicas of different tables can also be stored so as to reduce communication costs. In this paper, we present a comparative study of the effect of the replication factor and of different allocation plans for replicas of tables on the cost of subquery allocation.

Keywords: Distributed databases, replication, comparison, sites, and sub query allocation.

I) Introduction

Because of advances in technology, networks and databases, distributed databases [8] have advanced considerably in the last few years. A distributed database is a database system in which the storage devices are not all attached to a common CPU (central processing unit); all of these storage devices are managed by a distributed database management system. Such a database may reside in the same large room, but the fragments/replicas (stored at different sites) communicate with each other through a network instead of shared memory. The biggest problem that arises while designing [4] any distributed database is fragmenting/replicating the database and then allocating those fragments/replicas to different sites. The data allocation problem [1] is one of the most severe issues. So, the main issues that distributed database management system design deals with are:

• How to fragment the tables

• How to replicate the tables

• How to allocate fragments to different sites.

Figure 1: Steps in query processing; the highlighted portion shows the stage that uses the data allocation and fragmentation plan


o Distributed query processor [2]: The main task performed by a query processor is transforming a high-level query into a low-level query plan. In distributed databases, query processing becomes much more important, because the relations involved in a query may be fragmented or replicated in a distributed system [6], and hence locality of reference needs to be increased so as to decrease communication cost. So, the function of a distributed query processor can be defined as “mapping a query on a distributed database into a sequence of operations on fragments of relations” [8]. The data used by the final low-level query plan must be completely localized and optimized, so that operations bear on local fragments and not on global relations. The transformation must be correct and efficient.

II) Problem Definition

Data allocation and fragmentation [3] is one of the major problems that arise while designing a distributed database. The data allocation and fragmentation problem is to fragment the database and allocate those fragments to sites so as to minimize the transfer costs incurred while executing queries in that environment. Deciding how to fragment, how to replicate, and how to allocate fragments to the sites is a very challenging task. An inefficient fragmentation and allocation plan can increase query execution cost tremendously. The problem of fragmentation and allocation is NP-hard in nature and is not easy to solve. The data allocation design problem is an essential issue, as a proper data allocation plan improves the performance of application processing.

Figure 2: Problem definition


The fragments should be allocated to the sites that access them most frequently, so as to reduce communication costs and processing time. Heuristics are also used to solve this problem. We are working on a stochastic [7] simulator, which:

• Performs the task of allocating sub queries to sites.

• Uses a genetic approach [5].

• Represents each candidate solution as a chromosome whose genes are the sites on which the corresponding sub queries will be executed.

• Example: The chromosome in this genetic algorithm [9] is of the form: 9 4 6 9 8 10 8 9 4 6 3 3 10 8 9 7 3 8 6 6 6

• At the end, gives an allocation plan with minimum total cost.
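As an illustration of the chromosome encoding described above, the following minimal Python sketch decodes the example chromosome into a sub query to site assignment. The function names `decode_chromosome` and `site_load`, and the 1-based numbering of sub queries, are assumptions made for this example and are not taken from the simulator itself.

```python
from collections import Counter

# Example chromosome quoted in the text: gene k holds the site
# on which sub query k is to be executed.
CHROMOSOME = [9, 4, 6, 9, 8, 10, 8, 9, 4, 6, 3, 3, 10, 8, 9, 7, 3, 8, 6, 6, 6]


def decode_chromosome(chromosome):
    """Return a dict mapping sub query number (1-based, assumed) to its site."""
    return {k: site for k, site in enumerate(chromosome, start=1)}


def site_load(chromosome):
    """Count how many sub queries the chromosome places on each site."""
    return Counter(chromosome)


assignment = decode_chromosome(CHROMOSOME)
load = site_load(CHROMOSOME)
print(assignment[1])  # site of the first sub query: 9
print(load[6])        # number of sub queries placed on site 6: 5
```

A genetic algorithm would evaluate many such chromosomes against a cost function and keep the cheaper ones; the cost function itself is not specified in this section, so it is left out of the sketch.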

One of the problems with the simulator is that it uses a static data fragmentation and allocation plan, which sometimes leads to an increase in communication costs.

A formal description of the data allocation and fragmentation problem can be given as follows. A distributed database environment E consists of a number of sites: $E = \{s_1, s_2, \ldots, s_n\}$, where $s_1, s_2, \ldots$ are sites and each site has some finite capacity. A set of fragments F can be defined as $F = \{f_1, f_2, \ldots, f_m\}$, where $f_i$ represents the ith fragment and each fragment is used by at least one site. A requirement matrix, which shows which fragment is required by which site, can be given as:

$$R = [r_{ij}]_{n \times m}$$

where $r_{ij}$ represents the requirement for fragment j by site i; this is basically a weighted matrix. Therefore, the data allocation and fragmentation problem is to generate the set of fragments/replicas F and allocate the fragments to the proper sites on the basis of the requirement matrix.
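To make the role of the requirement matrix concrete, here is a small Python sketch that places each fragment on the site with the largest requirement weight for it. This greedy rule is only an illustration of "allocate on the basis of the requirement matrix"; it is not the method used by the simulator, it ignores site capacity, and the matrix values and the name `allocate_by_requirement` are hypothetical.

```python
def allocate_by_requirement(requirement):
    """Greedy illustration: requirement[i][j] is the weight of site i's
    need for fragment j. Each fragment goes to the site that needs it most.
    Ties go to the lowest site index. Site capacity is NOT considered."""
    n_sites = len(requirement)
    n_fragments = len(requirement[0])
    allocation = {}
    for j in range(n_fragments):
        allocation[j] = max(range(n_sites), key=lambda i: requirement[i][j])
    return allocation


# Hypothetical weights for 3 sites (rows) and 4 fragments (columns).
requirement = [
    [5, 0, 2, 1],
    [1, 4, 2, 0],
    [0, 3, 6, 2],
]
print(allocate_by_requirement(requirement))  # {0: 0, 1: 1, 2: 2, 3: 2}
```

Indices here are 0-based, so fragment 0 and site 0 correspond to the first fragment and first site. A realistic allocator would also have to respect the finite capacity of each site and allow replicas, which this sketch does not attempt.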

III) Methodology

In this section we present a very simple approach to generate dynamic replicated allocation plans for a distributed database, with the aim of improving the locality of table access and hence reducing the communication cost. This strategy uses a replication [10] factor to generate a dynamic allocation plan, which is used by a stochastic [7] simulator that generates a subquery allocation plan for a distributed database using a genetic algorithm [9]. This distributed subquery allocation simulator takes a distributed query plan as input and generates a minimum cost allocation plan for its subqueries. But one of the drawbacks of this simulator is that it uses a static data allocation plan.


The next section presents the effect of different allocation plans on the minimum cost generated by this simulator. The following pseudo code has been used to generate dynamic allocation plans for the simulator. The allocation plans are used to generate subquery allocation plans for the same query, so that the minimum costs generated can be compared.

Allocation_plan[tables][sites] :- a matrix which represents allocation plan
For i=1 to no_of_tables
    For j=1 to no_of_sites
        Generate a random no (rand) between 0 and 1
        If(rand

This code generates different allocation plans depending on the number of tables, the number of sites and the replication factor. For a system with 7 tables (rows), 10 sites (columns) and a replication factor of 0.2, an allocation plan like the one shown in Figure 3 will be generated.

Figure 3: Allocation plan for 7 tables and 10 sites
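The pseudo code above is cut off at its comparison step, so the following Python sketch rests on an assumption: that a table is placed on a site whenever the random number is smaller than the replication factor. The function name `generate_allocation_plan` and the optional `rng` parameter are introduced only for this example.

```python
import random


def generate_allocation_plan(no_of_tables, no_of_sites, replication_factor, rng=None):
    """Build a tables x sites 0/1 matrix.

    ASSUMPTION: entry [i][j] is set to 1 (table i has a replica on site j)
    when a uniform random number in [0, 1) is below replication_factor.
    The original pseudo code is truncated at this comparison.
    """
    rng = rng or random.Random()
    plan = []
    for _ in range(no_of_tables):
        row = [1 if rng.random() < replication_factor else 0
               for _ in range(no_of_sites)]
        plan.append(row)
    return plan


# 7 tables, 10 sites, replication factor 0.2, as in the example in the text.
plan = generate_allocation_plan(7, 10, 0.2, rng=random.Random(0))
for row in plan:
    print(row)
```

Under this assumption each table is expected to have about `replication_factor * no_of_sites` replicas, that is, about 2 of 10 sites for a factor of 0.2. With small factors a row can come out all zeros, meaning a table with no copy anywhere; how the simulator handles that case is not stated in the source.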

IV) Comparison of cost for different allocation plans

This section presents a comparison of the output obtained by using different replication factors, in tabular as well as in graphical form. It shows how the replication factor affects the cost of sub query allocation plan generation.


Figure 4: Cost Comparison in tabular form


Figure 5: Cost comparison in graphical form

V) Conclusion

This paper compared different dynamically generated allocation plans with varying degrees of replication and presented the effect of the replication factor on the output of a stochastic sub query allocation simulator. We have presented pseudo code for generating a dynamic allocation plan on the basis of the replication factor, which decides the degree of replication in the distributed design. Different allocation plans have been generated and provided as input to the distributed sub query allocation generator, and the effect on its output has been presented. From this comparative study it can be concluded that the minimum cost is achieved with 50% replication. Either a small or a large degree of replication results in an increase in cost: with a small degree of replication the communication cost increases, and with a large degree of replication the storage cost increases.

VI) References

[1]. Apers, Peter M. G. "Data allocation in distributed database systems." ACM Transactions on Database Systems (TODS) 13.3 (1988): 263-304.
[2]. Bamnote, G. R., and S. S. Agrawal. "Introduction to Query Processing and Optimization." International Journal 3.7 (2013).
[3]. Connolly, Thomas M. Database systems: a practical approach to design, implementation, and management. Pearson Education, 2005.
[4]. Ceri, Stefano, Barbara Pernici, and Gio Wiederhold. "Distributed database design methodologies." Proceedings of the IEEE 75.5 (1987): 533-546.
[5]. Deb, Kalyanmoy, et al. "A fast and elitist multiobjective genetic algorithm: NSGA-II." IEEE Transactions on Evolutionary Computation 6.2 (2002): 182-197.
[6]. Korth, Henry F., and Abraham Silberschatz. Database system concepts. Vol. 582. New York: McGraw-Hill, 1986.
[7]. Manik, R.virk, et al. "Stochastic Analysis of DSS Queries for a Distributed Database Design." International Journal of Computer Applications 83 (2013).
[8]. Özsu, M. Tamer, and Patrick Valduriez. Principles of distributed database systems. Springer, 2011.
[9]. Whitley, Darrell. "A genetic algorithm tutorial." Statistics and Computing 4.2 (1994): 65-85.
[10]. Wiesmann, Matthias, et al. "Understanding replication in databases and distributed systems." Proceedings of the 20th International Conference on Distributed Computing Systems. IEEE, 2000.
