IJRIT International Journal of Research in Information Technology, Volume 1, Issue 11, November, 2013, Pg. 329-332

International Journal of Research in Information Technology (IJRIT)

www.ijrit.com

ISSN 2001-5569

Challenges Faced While Dealing With Data Warehouses and Various Testing Approaches Rishabh Kumar Taneja1, Sahil Sarpal2, Pulkit Arora3 1

Student, Information Technology, Maharishi Dayanand University New Delhi, Delhi, India 2 Student, Information Technology, Maharishi Dayanand University New Delhi, Delhi, India 3 Student, Information Technology, Maharishi Dayanand University New Delhi, Delhi, India 1

[email protected], [email protected], [email protected]

Abstract A data warehouse is an integrated, non – volatile, time variant and subject oriented collection of data that is used by the management for making strategic decisions. As many important decisions are based on the results obtained from a data warehouse, it is of utmost importance that the data warehouse works properly. As there is a lot of data compressed in a single place, hence there is a requirement of making sure that the system works efficiently as faults may result in data loss. For this, extensive testing is required so that none of the faults slip by. This paper discusses the challenges in the way of testing data warehouses and also the various techniques performed to test them. uu

Keywords: Data Warehouse, Data Warehouse Testing, Data Warehouse Testing Approaches, Data Warehouse Testing Methods.

1. Introduction A data warehouse is a collaborated data model that captures an entire data of an organization. It brings together from heterogeneous sources into one single destination. The data is not simply brought but together just like that; it goes through the ETL process. The ETL process stands for extracted, transformed and loaded process. There is no need to normalize the data as it required only for read only activities. The data passes through several stages each of which causes a different kind of changes to the data and in the end the data finally reaches the user in a form of a chart or a report. There should be a way of guaranteeing that the data in the sources is the same data that reaches

Rishabh Kumar Taneja, IJRIT

329

the user, and the data quality is improved; not lost. For this, there is a requirement that the functioning of a data warehouse is properly tested before its implementation. The testing of a data warehouse is very much different to that of software. Software testing is predominantly focused on program code, while data warehouse testing is directed at data and information. Data warehouse testing has a broader scope than software testing because it focuses on the correctness and usefulness of the information delivered to users. Also, data warehouse testing involves a huge data volume, which significantly impacts performance and productivity. In short, the data warehouse testing is carried out to remove the inconsistencies that occur due to data being collected in different formats from different sources. Hence, there is the requirement to test the data warehouse properly so that problems don’t occur at a later stage. This paper precisely deals with the things to be sorted out before testing a data warehouse and how to test warehouses when they have been built.

2. Challenges Faced During Data Warehouse Testing Data warehousing is totally different from other systems, such as Computer Applications or even Transactional Database Systems. Consequently, the testing techniques used for these other systems are inadequate to be used for data warehouse testing. Enlisted are some different aspects that need to be taken care of when it comes to data warehouse checking:• • • • • • • •

One of the core challenges of testing data warehouses is its flexible architecture. Data warehouse systems could have different architectures according to business requirements, data warehouse functionalities, and/or budget/time constraints. Data warehouse always deals with huge data volumes. Data warehouse testing revolves around data, that is, it is data centric. Data warehouse always answers Ad-hoc queries because of which it is impossible to test it prior to system delivery. The testing process in other systems ends with the development life-cycle while in data warehouses it continues even after the system is delivered. The total number of test cases is unlimited due to the main objective of the data warehouse that allows all possible views of data. Most of the available testing scenarios are usually driven by some user inputs while in data warehouse most of the tests are system-triggered scenarios. A data warehouse project is continues due to the decision-making process requirement for continuing constant changes.

These are some of the challenges faced while testing data warehouses. Apart from these, it is necessary that the warehouse should be reliable, robust, perform well under stressful conditions and should able to recover easily in case new problems arise.

3. Testing the Data Warehouse To understand the different testing activities, we need to subdivide testing into two distinct classifications: what is tested and how it is tested. Firstly, we will consider “what” part of data warehouse testing. Testing data quality mainly involves an accurate check on the correctness of the data loaded by the ETL (extracted, transformed and loaded) procedures and accessed by front-end tools. However, in the light of the complexity of data warehouse projects and of the close relationship between good design and good performance, we suggest that testing the design quality is almost equally important. Testing design quality mainly implies verifying that user requirements are well expressed by the conceptual schema and that the conceptual and logical schemas are also well-built. Overall, the items that are to be tested can then be summarized as follows:

Rishabh Kumar Taneja, IJRIT

330

1.

Logical Schema: A logical design is conceptual and abstract. The process of logical design involves arranging data into a series of logical relationships. It deals with defining the types of information that one may need.

2.

Conceptual Schema: A conceptual schema is an abstract definition of the whole project. It represents the series of assertions and rules pertaining to the nature of processes, entities of events.

3.

ETL Procedures: The complex procedures that are in charge of feeding the data repository starting from data sources.

4.

Database: The repository storing data.

5.

Front-end: The applications accessed by end-users to analyze data are either static reporting tools or more flexible OLAP tools.

It has been evaluated that the maximum testing time is consumed by the ETL activities and procedures as compared to other development phases. Now discussing the second coordinate, “how” part of data warehouse testing. These tests are listed below:-

1.

Functional Test: It verifies that the item is compliant with its specified business requirements.

2.

Performance Test: It checks that the item performance is satisfactory under typical workload conditions.

3.

Usability Test: It evaluates the item by letting users interact with it, in order to verify that the item is easy to use and comprehensible.

4.

Recovery Test: It checks how well an item is able to recover from crashes, hardware failures and other similar problems.

5.

Stress Test: It shows how well the item performs with peak loads of data and very heavy workloads.

6.

Regression Test: It checks that the item still functions correctly after a change has occurred.

7.

Security Test: It checks that the item protects data and maintains functionality as intended.

Astonishingly, these types of test are tightly related to six of the software quality factors, that is, correctness, usability, efficiency, reliability, integrity and flexibility. But still data warehouse testing is very much different to that of software testing. These techniques are required to be performed after the completion and deployment of the data warehouse. Only after thorough testing using the above techniques one can be sure that the data warehouse will not be inconsistent.

4. Conclusions Enough research has already been carried out to study the different testing approaches that can be carried out in data warehouse testing. The possibility of performing an effective test depends on the documentation completeness and accuracy in terms of the collected requirements and project description. In simpler words, if at the very beginning one does not specify properly the desired outcome from the system, then one cannot expect to get it right later. Another thing to keep in mind is that testing is not a one-man activity. Testing is a job required to be done by a team

Rishabh Kumar Taneja, IJRIT

331

of experts. The testing team should include testers, developers, designers, database administrators, and end-users, and it should be set up during the project planning phase. Testing of data warehouse systems is largely based on data. A successful test must rely on real data, but it also must include mock data to reproduce the most common error situations that can be encountered in ETL. At last, it is very important to say that no matter how much testing is done, sooner or later, an unexpected data fault is bound to occur. For this, it is necessary to keep in mind that testing may end someday but quality certification is an everlasting process.

5. Acknowledgments We would like to thank the entire faculty of IT Department, Dronacharya College of Engineering, Gurgaon, Haryana for their support, encouragement and patience. Special thanks to Dr. Jitender Kumar for encouraging us to write this paper.

6. References [1] Manoj Philip Mathen, “Data Warehouse Testing”, DeveloperIQ Magazine, 10 Mar 2010. [2] Naveen ElGamal, “Data Warehouse Testing”, Ph.D., Thesis State : Middle, Faculty of Computers and Information, Cairo University, Giza, Egypt, 2013. [3] Matteo Golfarelli and Stefanno Rizzi, “A Comprehensive Approach to Data Warehouse Testing”, DOLAP’09, 2009, pp. 17-24. [4] S L Gupta, Payal Pahwa and Sonali Mathur, “Classification of Data Warehouse Testing Approaches”, International Journal of Computers & Technology, Vol. 3, No. 3, 2012, pp. 381-386. [5] http://docs.oracle.com [6] http://www.learn.geekinterview.com

Rishabh Kumar Taneja, IJRIT

332

Challenges Faced While Dealing With Data Warehouses and ... - IJRIT

IJRIT International Journal of Research in Information Technology, Volume 1, ... Data warehousing is totally different from other systems, such as Computer Applications or even Transactional .... Testing is a job required to be done by a team ...

83KB Sizes 5 Downloads 333 Views

Recommend Documents

Challenges Faced While Dealing With Data Warehouses and ... - IJRIT
Keywords: Data Warehouse, Data Warehouse Testing, Data Warehouse .... Front-end: The applications accessed by end-users to analyze data are either static ...

Examining the Unique Challenges Faced by Students with ...
18 20 U.S.C. §§ 1400 et seq.; 34 CFR Parts 300 and 301; Cal. .... 36 42 U.S.C. § 12182(b(2)(A)(ii); 34 C.F.R. § 104.44(a); Southeastern Community College v.

Designing data warehouses
evaluated locally, at each DW, without accessing the (remote) data of the ... over the source relations e,f , g,h , i,q , are stored materialized at the DW, and that a.

Introduction to Databases and Data Warehouses
READ ONLINE Database Systems: Introduction to Databases and Data .... Relational database management system (RDBMS) is a type of database ... isolation levels, and the standards-based APIs provide ways for you to set isolation levels.

Living with Big Data: Challenges and ... - Research at Google
Sep 14, 2012 - ~1 network rewiring (rolling ~5% of machines down over 2-day span). ~20 rack failures ... improved worldwide placement of data and services.

Opportunities and Challenges for 4G Wireless Network - IJRIT
Major wireless service providers planning to start deployment of 4G wireless ... for the first time such as accessing the Internet anytime from anywhere, global ... Within the cable television industry, the expansion to 4G Networks is a very real pos

Opportunities and Challenges for 4G Wireless Network - IJRIT
Major wireless service providers planning to start deployment of 4G wireless networks ... broadband technology is an opportunity for the corporation to expand its horizons ... identify areas where technological improvements are required. .... service

Advances and Challenges with Data Broadcasting in ...
Wireless mesh networks (WMNs) offer a promis- ing low-cost broadband access alternative in many cities (e.g. ..... concept of a connected dominating set (CDS). For a graph G = (V, E) (where V denotes the nodes ..... extensively in computer networking

Courts While Dealing With Their Employees Are “State” Within Art 12 ...
... for respondent- High Court in both the matters. Mr. A. A. Kumbhakoni, Advocate General a/w Mr. A. B. Vagyani,. Government Pleader a/w Mr. A. P. Vanarase, AGP a/w Mr. P. P. More,. AAGP for respondent no.1- State in PIL No.72/2018. ::: Uploaded on

Presentations- Dealing with Questions and ... - UsingEnglish.com
Part One: Predicting and answering questions. Listen to your partners' presentations in small groups, asking at least two or three questions afterwards and then ...

Data Mining: Current and Future Applications - IJRIT
Language. (SQL). Oracle, Sybase,. Informix, IBM,. Microsoft. Retrospective, dynamic data delivery at record level. Data Warehousing. &. Decision Support. (1990s). "What were unit sales in. New England last. March? Drill down to. Boston. On-line analy

Data Mining: Current and Future Applications - IJRIT
(KDD), often called data mining, aims at the discovery of useful information from ..... Advanced analysis of data for extracting useful knowledge is the next natural ...

Data Mining: Current and Future Applications - IJRIT
Artificial neural networks: Non-linear predictive models that learn through training ..... Semi-supervised learning and social network analysis are other methods ...

Dealing with precise and imprecise decisions with a ...
As it is difficult to compare the re- sults of such an algorithm to classical accuracy rates, .... teriori strategy on a classical probability distribution is most of the time ...

Review on Data Warehouse, Data Mining and OLAP Technology - IJRIT
An OLAP is market-oriented which is used for data analysis by knowledge ..... The data warehouse environment supports the entire decision. Database. Source.

Review on Data Warehouse, Data Mining and OLAP Technology - IJRIT
used for transactions and query processing by clerks, clients. An OLAP is market-oriented which is used for data analysis by knowledge employees, including ...

Survey on Data Clustering - IJRIT
common technique for statistical data analysis used in many fields, including machine ... The clustering process may result in different partitioning of a data set, ...

Inviting and dealing with invitations phrases and ... - UsingEnglish.com
Suggested answers. Phrases for inviting people. (Do you have) any plans for…? (Do you) fancy/ want to…? Are you free this…/ on…?/ If you're free…,…

Survey on Data Clustering - IJRIT
Data clustering aims to organize a collection of data items into clusters, such that ... common technique for statistical data analysis used in many fields, including ...