IJRIT International Journal of Research in Information Technology, Volume 3, Issue 2, February 2015, Pg. 73-76

International Journal of Research in Information Technology (IJRIT)

www.ijrit.com

ISSN 2001-5569

An Efficient Method to Modifying the Queries for easy Information Retrieval

M. Deepa1, K. Santhi2, Nagendra Nath Mishra3

1,2 Assistant Professor (Sr), School of Information Technology, VIT University, Tamilnadu, India.
3 School of Information Technology, VIT University, Tamilnadu, India.

1 [email protected], 2 [email protected], 3 [email protected]

Abstract

Fetching data from the server to the client is often hampered by keyword mismatch and incomplete sentences. The most difficult part is the vocabulary problem, previously discussed by Furnas et al. [1987], which leads to wrong data being retrieved. The systems used for information retrieval are limited by several factors: users submit very short queries, while the documents have a complex structure and often require a query that is simple yet long enough to fetch the information. People use different synonyms to express their queries but expect the same result every time. Longer queries reduce the word-mismatch problem because they are more likely to contain keywords that help retrieve the answer from the documents in the database. Query expansion extends the query with common synonyms and phrases that have exactly the same meaning as the original words, which increases the chance of matching the relevant documents. Query modification, which draws on natural language processing (NLP), is nothing but either minimizing or expanding the query to the level at which it can easily fetch the desired information from the database.

Keywords: Natural language processing, Query minimization, Query expansion.

1. Introduction

This paper presents an efficient method that modifies users' queries by replacing words with related keywords. The replacement keywords are common in the documents and easy to match, so data can be retrieved more easily and more accurately. Several information retrieval technologies are available, but each has limitations that prevent it from satisfying the user, who typically submits a very short query to identify a document. Fetching the right information with a short query is very difficult. Documents and queries are both written in natural language and may differ in structure. This difference in structure leads to keyword mismatch and makes information retrieval more difficult. A query should be simple enough for the user to understand, and it should contain keywords that can be matched against the information in the documents.

A query can be modified as required, either by reducing its length or by adding the required keywords. These processes are known as query minimization and query expansion, respectively. The query minimization method presented in this paper removes words that are not necessary and whose removal does not affect the meaning of the query. It also replaces some words of the query with keywords where possible. Introducing these keywords does not change the meaning of the query either. The query expansion method does the same, but it also adds further keywords, which may not be strictly necessary, to make the query more accurate.


The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 describes the implementation, followed by the conclusion and future work in Sections 4 and 5. Section 6 lists the references used in this paper.

2. Related works

Query modification is a technique for rewriting a query so that it becomes more accurate for information fetching. The query modification described in this paper covers both minimization and expansion of the query. In 2010, Sansonnet, Bourda and Asfari presented a user task and profile for improving user queries. They used time-stamped data about the user to move more easily towards the information the user is interested in. The drawback of this approach was that the method could not interact with the user's preferences. Also in 2010, Koutrika proposed some improvements for query enrichment.

3. Implementation

3.1. Query Minimization

A query can be minimized to some extent without changing its meaning. Users mostly write queries in natural language, so a single query can be written in many ways with the same meaning. These structural differences introduce ambiguity and can mislead retrieval. The method uses a word-bag that stores the keywords used most often in the documents in the database. Whenever a user enters a query, the query is cross-checked against the word-bag. Cross-checking does not necessarily modify the query: a query is modified only when modification is required. If the cross-check finds that the query is already concise enough for information fetching, the modification process is skipped and the query stays as the user entered it.

During cross-checking, the query is broken into predicates, and each predicate is checked to see whether it is required or whether the query keeps its meaning after the predicate is removed. If a predicate is not required, the method simply removes the corresponding word and continues cross-checking. If the predicate is required, the word-bag is searched for an alternative word with the same meaning in its list of keywords. If such a word is found, the original word is replaced with the new keyword; if not, the original word is kept. Cross-checking continues until the last word of the query.

Algorithm:
1. Read query in natural language
2. Store in a character array
3. Separate all words and store them in different arrays
4. Remove all the articles
5. Cross-check each array with word-bag
6. If minimization is not required, go to step 9
7. Remove unnecessary words
8. Replace original words with keywords from word-bag and go to step 5
9. Repeat all the above steps for the next query
10. End

Example 1: Finding the age of American President Barack Obama
Query 1: How old is American President Barack Obama?
Query 2: What is the age of American President?
Query 3: How old is American President?
Query 4: Age of Barack Obama
………

A single query can thus be written with multiple structures that have the same meaning. All the queries listed above need a single result. Regardless of the length of the query, all of them point to a single document that holds the details of Barack Obama's age.
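The minimization steps above can be illustrated with a minimal Python sketch. The word-bag contents, the list of unnecessary words and the function name `minimize` are assumptions made for illustration; the paper does not specify them. The sketch also does not attempt to detect that two phrases name the same entity.

```python
import re

# Hypothetical word-bag: maps a word the user may type to the keyword
# assumed to be common in the documents. Entries are illustrative only.
WORD_BAG = {"old": "age", "age": "age"}

ARTICLES = {"a", "an", "the"}
# Assumed list of words whose removal does not change the query meaning.
UNNECESSARY = {"how", "what", "is", "of"}


def minimize(query: str) -> str:
    """Remove articles and unnecessary words, then replace the remaining
    words with word-bag keywords where an entry exists."""
    # Steps 1-3: read the query and separate it into lowercase words.
    words = re.findall(r"[A-Za-z]+", query.lower())
    result = []
    for word in words:
        # Steps 4 and 7: drop articles and unnecessary words.
        if word in ARTICLES or word in UNNECESSARY:
            continue
        # Steps 5 and 8: cross-check with the word-bag and replace if possible.
        result.append(WORD_BAG.get(word, word))
    return " ".join(result)


if __name__ == "__main__":
    for q in ("What is the age of American President?",
              "How old is American President?",
              "Age of Barack Obama"):
        print(minimize(q))
```

Under these assumptions, Query 2 and Query 3 of Example 1 reduce to the same minimized query, which is the behaviour the method aims for.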


Query 1 is the longest query in the list. It contains two unnecessary words, 'how' and 'is', and during cross-checking of the remaining predicates the word 'old' can easily be replaced with 'Age', which means the same. The word pairs 'American President' and 'Barack Obama' point to a single person, so one of the two can simply be removed without affecting the meaning of the query at all. If the name is removed, the final query is 'Age American President', which is short enough to fetch the age of Barack Obama. If the query is instead shortened to 'Age Barack Obama', it becomes more specific and returns the age of Barack Obama. This shows that some words can easily be replaced with a fixed word without changing the meaning.

Example 2: Query to find the location where TajMahal is situated
Query 1: Where is TajMahal situated?
Query 2: What is the location of TajMahal?
Query 3: TajMahal location.
Query 4: Location where The TajMahal is situated.
………

All the above queries try to find the location of the TajMahal. The longest query, Query 4, can be minimized by removing the article it contains. The word 'where' means the same as 'location', so 'where' can be removed. Unnecessary words like 'is' and 'situated' can also be removed while the meaning of the query stays the same.

3.2. Query Expansion

Sometimes a user enters a query that is not sufficient for data fetching because it contains very few keywords or none at all. Keywords are the most necessary part of a query for data fetching. The query entered by the user is not necessarily complete. Sometimes the user enters only a few words, based on his knowledge, which are either not enough for data fetching or lead to the wrong document, so that either wrong data or no data is retrieved.

Algorithm:
1. Read query in natural language
2. Store in a character array
3. Separate all words and store them in different arrays
4. Remove all the articles
5. Cross-check each array with word-bag
6. If expansion is not required, go to step 9
7. Remove unnecessary words
8. Replace original words with keywords from word-bag
9. Add extra keyword if required and go to step 5
10. Repeat all the above steps for the next query
11. End
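The expansion steps can be sketched in Python in a similar way. The paper does not say how the method decides that expansion is required, so this sketch assumes a simple threshold on the number of remaining terms (`min_terms`). The tables `SYNONYMS` and `RELATED`, the filler-word list and the function name `expand` are likewise hypothetical.

```python
import re

ARTICLES = {"a", "an", "the"}
UNNECESSARY = {"is", "of"}                       # assumed filler words
SYNONYMS = {"old": "age", "where": "location"}   # assumed word-bag replacements
RELATED = {                                      # assumed extra keywords per keyword
    "age": ["born", "birthday"],
    "location": ["place", "city"],
}


def expand(query: str, min_terms: int = 3) -> str:
    """Clean the query as in minimization, then add related keywords
    when fewer than min_terms terms remain (assumed criterion)."""
    words = re.findall(r"[A-Za-z]+", query.lower())
    terms = []
    for word in words:
        # Remove articles and unnecessary words.
        if word in ARTICLES or word in UNNECESSARY:
            continue
        # Replace the original word with a word-bag keyword if one exists.
        keyword = SYNONYMS.get(word, word)
        if keyword not in terms:
            terms.append(keyword)
    # Expansion is skipped when the query already has enough terms.
    if len(terms) >= min_terms:
        return " ".join(terms)
    # Add extra keywords related to the terms already present.
    for term in list(terms):
        for extra in RELATED.get(term, []):
            if extra not in terms:
                terms.append(extra)
    return " ".join(terms)


if __name__ == "__main__":
    print(expand("TajMahal location"))
    print(expand("Age of Barack Obama"))
```

With these illustrative tables, the two-word query 'TajMahal location' gains extra keywords, while 'Age of Barack Obama' is only cleaned and left unexpanded.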

4. Conclusion

This paper presents a very simple and effective method that modifies the user's query according to the requirement. It modifies the query only when modification is required. Once the query has been modified to the level required, a new query is constructed from the remaining words and any replaced words. If, however, the query is cross-checked with the word-bag and found to be already adequate for data fetching, the algorithm simply skips the modification and reconstructs the new query identical to the old one.

5. Future Work

This method can be extended further using classified word-bags. Each word-bag will contain only words related to its type. That will help in quick reconstruction of the modified query.
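One possible shape for classified word-bags is sketched below in Python. The category names, the bag contents and the selection rule (pick the bag that recognizes the most query words) are all assumptions for illustration; the paper does not define how a bag would be chosen.

```python
# Hypothetical classified word-bags: one bag per category, each mapping
# a user word to the keyword assumed to appear in the documents.
CLASSIFIED_WORD_BAGS = {
    "person": {"old": "age", "age": "age", "born": "birth"},
    "place": {"where": "location", "location": "location", "situated": "location"},
}


def pick_word_bag(words):
    """Return (category, bag) for the bag that recognizes the most words.
    On a tie, the first category in the dictionary wins."""
    def hits(category):
        bag = CLASSIFIED_WORD_BAGS[category]
        return sum(1 for w in words if w in bag)

    best = max(CLASSIFIED_WORD_BAGS, key=hits)
    return best, CLASSIFIED_WORD_BAGS[best]


if __name__ == "__main__":
    category, bag = pick_word_bag(["where", "is", "tajmahal", "situated"])
    print(category, bag["where"])
```

Only the selected bag then needs to be searched during cross-checking, which is one way the smaller, typed bags could speed up reconstruction of the modified query.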

6. References

[1] M.S. Hideo Joho, concept-based query expansion support tool, Verlag, 2004, page 42-56.
[2] Sparck Jones, Automatic keywords classification, London, 1971.
[3] Voorhees, Query expansion using lexical-semantic relation, 1994.
[4] Carpineto, C., & Romano, G. (2012). Survey on query expansion in information retrieval. ACM 44(1), 1:1–1:50.
[5] Colace, F., De Santo, M., Greco, L., & Napoletano, P. (2013). A query expansion method based on a weighted word pairs approach. In Proceedings of the 3nd Italian Information Retrieval
[6] Efthimiadis, E. N. (1996). Query expansion. (pp. 121–187).
