OCR in 3D Orientation Using a Web Cam

Piyush Goel

BSc (Hons) Artificial Intelligence with Mathematics (Industry)

2003/2004

The candidate confirms that the work submitted is their own and the appropriate credit has been given where reference has been made to the work of others. I understand that failure to attribute material which is obtained from another source may be considered as plagiarism. (Signature of student)____________________________

Summary

The aim of the project was to develop a software program to rectify perspective distortion in document images so as to make them readable by an OCR package. The objectives of the project included understanding the limitations of OCR packages and the complexities of using a web cam to capture document images. Furthermore, it required investigating and understanding the research done in the field so far, and ultimately devising an application that implemented existing or new algorithm(s) to solve the problem at hand. This report details, in a systematic manner, the various stages of the project that were carried out in order to fulfil the low-level objectives and achieve the overall goal. Finally, the report evaluates the project and the developed system and makes suggestions for taking this work further.


Acknowledgements First of all I would like to express my utmost gratitude to my parents for all their love and support which made it possible for me to pursue my studies at Leeds. Then I’d like thank Dr. Andy Bulpitt, my project supervisor, for his invaluable time to provide me with various ideas throughout the project and to help me develop the system. I would also like to thank him for providing the necessary equipment and software to carry out the project.

Finally, I would like to thank all my friends for providing me with all their support and help at all times for sharing wonderful moments with me.


Table of Contents

Summary ............................................................................................................ i
Acknowledgements ........................................................................................... ii
Table of Contents ............................................................................................. iii
List of Figures ................................................................................................... vi
Chapter 1: Introduction ...................................................................................... 1
1.1 Project Aim .................................................................................................. 1
1.2 Motivation .................................................................................................... 1
1.3 Objectives .................................................................................................... 1
1.4 Minimum Requirements .............................................................................. 1
1.5 Project Methodology .................................................................................... 2
1.5.1 The Waterfall Model ................................................................................ 2
1.5.2 The Prototype Model ................................................................................ 3
1.5.3 The Incremental Model ............................................................................ 3
1.5.4 The Spiral Model ...................................................................................... 3
1.5.5 Chosen methodology with justification .................................................... 4
1.6 Project Schedule .......................................................................................... 4
1.7 Report Outline ............................................................................................. 5
Chapter 2: Background Research ....................................................................... 6
2.1 Introduction .................................................................................................. 6
2.2 OCR Precincts ............................................................................................. 6
2.3 Intricacies of a Web Cam ............................................................................ 7
2.4 System Constraints ...................................................................................... 8
2.5 Previous Work ............................................................................................. 8
2.5.1 Introduction .............................................................................................. 8
2.5.2 Extraction of illusory clues ....................................................................... 9
2.5.3 Using Projection Profiles ........................................................................ 13
2.5.4 Conclusions on previous work ............................................................... 17
2.6 Basic Techniques Involved ........................................................................ 17
2.6.1 Image warping ........................................................................................ 18
2.6.2 Quadrilateral-to-Rectangle Mapping ...................................................... 19
2.6.3 Interpolation ........................................................................................... 21
2.7 Conclusions ................................................................................................ 23
2.8 Requirements Specification ....................................................................... 24
2.8.1 Functional System Requirements ........................................................... 24
2.8.2 Non-Functional System Requirements ................................................... 24
2.9 Summary of Background Research ........................................................... 25
Chapter 3: The Design ..................................................................................... 26
3.1 Introduction ................................................................................................ 26
3.2 Segmentation ............................................................................................. 26
3.2.1 Thresholding ........................................................................................... 27
3.2.2 Edge Detection ....................................................................................... 28
3.2.3 Determining the page corners ................................................................. 29
3.2.4 Linear Regression ................................................................................... 29
3.2.5 Mathematics Involved ............................................................................ 30
3.3 Quadrilateral-to-Rectangle Mapping ......................................................... 31
3.4 Implementation Technology ...................................................................... 31
3.4.1 Java ......................................................................................................... 31
3.4.2 MATLAB ............................................................................................... 32
3.4.3 Chosen Technology ................................................................................ 32
3.5 Design Summary ....................................................................................... 33
Chapter 4: Implementation and Testing .......................................................... 34
4.1 Introduction ................................................................................................ 34
4.2 Components Implementation and Testing ................................................. 35
4.2.1 Thresholding ........................................................................................... 35
4.2.2 Edge Detection ....................................................................................... 37
4.2.3 Regression Fitting .................................................................................. 37
4.2.4 Corner Detection .................................................................................... 38
4.2.5 Quadrilateral-to-Rectangle Mapping ...................................................... 38
4.3 System Testing ........................................................................................... 40
4.4 Test Parameters ......................................................................................... 41
4.5 Test Data .................................................................................................... 42
4.6 Test Results ................................................................................................ 42
4.7 Results Analysis ........................................................................................ 43
4.8 Summary .................................................................................................... 44
Chapter 5: Evaluation ...................................................................................... 45
5.1 Introduction ................................................................................................ 45
5.2 Evaluation of the Project ........................................................................... 45
5.3 Evaluation of the System ........................................................................... 46
5.3.1 Against functional requirements ............................................................ 46
5.3.2 Against non-functional requirements ..................................................... 46
5.4 Future Work ............................................................................................... 47
5.5 Evaluation Summary ................................................................................. 48
Chapter 6: Conclusion ..................................................................................... 49
Recapitulation of the Project Process .............................................................. 49
References ....................................................................................................... 50
Appendix A: Project Reflection ...................................................................... 52
Appendix B: Project Schedule ......................................................................... 54
Appendix C: OCR Evaluation ......................................................................... 57
Appendix D: Test Samples .............................................................................. 59
Appendix E: The Process ................................................................................ 65
Appendix F: Experiment Apparatus ................................................................ 67

List of Figures

Figure 1.1: Project Methodology: A hybrid of realistic waterfall and incremental models ........ 4
Figure 2.1: Shows vanishing points of an image of a cubical box ........................................... 9
Figure 2.2: Shows the different types of illusory clues (A, B, C, D, E) ................................. 10
Figure 2.3: Associations between elongated and compact blobs respectively ...................... 10
Figure 2.4: Association network ........................................................................................... 12
Figure 2.5: Shows the different types of vertical associations ............................................... 12
Figure 2.6: Left image shows the dense network of vertical association ............................... 13
Figure 2.7: Confidence measures for projection profiles ...................................................... 14
Figure 2.8: Geometry involved in line spacing ..................................................................... 15
Figure 2.9: Quadrilateral-to-rectangle mapping using transformation M .............................. 20
Figure 2.10: Bilinear Interpolation ........................................................................................ 22
Figure 3.1: Histograms ......................................................................................................... 27
Figure 3.2: Iterative threshold method .................................................................................. 28
Figure 3.3: A diagram showing a regression line fitting a set of points ................................. 29
Figure 4.1: Implementation and Testing process ................................................................... 35
Figure 4.2: Thresholded image with a threshold of 145 ........................................................ 36
Figure 4.3: Edge Detection image produced from Figure 4.1(b) ........................................... 37
Figure 4.4: Depicts corner detection in the edge image ........................................................ 38
Figure 4.5: Corrected version of the mapping shown in Figure 2.8 ...................................... 39
Figure 4.6: Interpolation Methods ........................................................................................ 40
Figure 4.7: Images rotated along (a) x-axis +30°, (b) x-axis -30°, (c) y-axis +30° and (d) y-axis -30° ... 41
Figure 4.8: Illustrating perspective ....................................................................................... 44


Chapter 1 : INTRODUCTION

1.1 PROJECT AIM

The aim of the project is to correct perspective distortion in document images captured using a web cam, so that the text becomes recognisable by an existing package capable of performing OCR.

1.2 MOTIVATION

The motivation for the project is the increasing use of digital web cameras for the purpose of capturing text documents to replace the bulky and unwieldy flatbed scanners. Limited research performed in the field of correcting perspective distortion in text document images is another motivating factor for the project.

1.3 OBJECTIVES

The objectives of the project that need to be completed in order to achieve the overall aim are:

• to understand the limitations of OCR packages

• to understand the complexities of using a web cam to capture document images

• to investigate the research done in the field so far

• to analyse the different approaches taken to solve the problem

• to devise an application that implements existing or new algorithm(s) to produce a solution for the concerned problem, and

• to test and evaluate the accuracy of the developed system

1.4 MINIMUM REQUIREMENTS

The minimum requirements of the system are as follows:


1. Understand the problems confronted when performing OCR on images captured using a web cam, depending upon several aspects such as distortions, varied lighting conditions and text font size.

2. Understand the research done to date to resolve perspective distortion in captured images.

3. Produce at least one algorithm to correct perspective distortion in images captured using a web cam.

4. OCR should perform successfully on images captured under normal lighting conditions.

5. OCR will be evaluated for different degrees of document rotation out of the image plane.

1.5 PROJECT METHODOLOGY

Avison [1] states that a methodology comprises phases, which in turn consist of sub-phases, that guide the builders of the system in choosing the techniques that may be apposite at each step of the project, and that assist the management, control and evaluation of the project. This section presents various project methodologies, or development cycles, that are employed to structure the development process for a project. The methodologies discussed below are the Waterfall model, the Prototype model, the Incremental model and the Spiral model.

1.5.1 The Waterfall model

Generally, Royce [2] is credited with the waterfall model. This type of software development life cycle has been described by NASA [3] as having the following steps:

1. Document the system concept
2. Identify and analyse the system requirements
3. Divide the system into sub-parts (Architectural design)
4. Design the subdivisions of the system (Detailed design)
5. Code the components and test them separately
6. Test the system as a whole
7. Install the system and operate it

This is an idealistic model in which the methodology only flows forwards, i.e., at any stage of the development, a previous stage cannot be changed. It assumes that the initial user needs, system requirements, etc., cannot be changed at a later stage and have to be laid down perfectly at the beginning of the project, which is the major drawback of the approach. A realistic waterfall model embraces


feedback loops which enable the developers to revert from one stage of the development process to the previous one. This adds flexibility to the project along with the other advantages of the waterfall model, the biggest being the setting of intermediate milestones and deadlines to ensure that the final deadline is met within the allocated time and budget.

1.5.2 The Prototype Model

In the prototype model, all the known initial requirements are first gathered from the customer. A design is then quickly made on the basis of these requirements, and a prototype is developed based on this design. The prototype is then evaluated by the customer. At this stage, if sufficient information is not known to begin the development of the actual desired product, additional iterations through the prototype lifecycle are made to refine the requirements and to improve the prototype. When enough is known to commence the development of the actual product, the prototype is discarded and the product itself is engineered. This type of lifecycle is very useful when the customer understands his requirements only vaguely and the developers are not well acquainted with the development environment. Heusser and Sait [4] describe the limitations of the approach as the inability to set milestones and the difficulty of foreseeing potential problems in the project.

1.5.3 The Incremental Model

The incremental model slightly modifies the waterfall model. The waterfall model finishes each stage of the process and then proceeds to the next phase. The incremental model does the same up to the design phase, after which, in the implementation stage, it divides the project into smaller modules that can be developed and tested separately. This approach reduces the complexity of the development to some extent, since developing smaller entities is simpler than developing the whole.

1.5.4 The Spiral Model

The spiral model is a combination of the waterfall and the prototype models [5]. It is a cyclic model comprising four stages: planning, risk analysis, development, and evaluation. The model starts by gathering the initial system requirements. The project then enters the risk analysis stage, where any potential conditions that may affect the project adversely are determined, after which the project may either be abandoned due to high risk or kept alive. If the project is decided to be


continued, it then enters the development stage, where a prototype is built on the basis of the initial requirements. The customer then evaluates the prototype and provides feedback on it. Subsequently, the project re-enters the planning stage. A new prototype is generated on the basis of the feedback after performing the risk analysis, and the spiral continues until a stable prototype is reached which matches the customer's requirements. This method of project management is used when the application is highly complex, involves a big risk factor, and the customer does not have exhaustive knowledge of the requirements to begin with.

1.5.5 Chosen methodology with justification

The project at hand has a set of minimum requirements that are subject to only minor change at a later stage, involves a complex problem that is difficult to design for but carries a low risk factor, and meeting the schedule is a high priority. Given the analysis of the individual approaches in the previous sections, and taking these project characteristics into account, a combination of the realistic waterfall model and the incremental model has been chosen as the methodology for the project lifecycle. This ensures that deadlines are met on time and that the development of the product does not become overwhelmingly complex. Figure 1.1 shows a diagrammatic view of the project methodology.

Figure 1.1: Project Methodology: A hybrid of realistic waterfall and incremental models

1.6 PROJECT SCHEDULE

Figure B.1 in Appendix B shows the planned project schedule, where the blue dates represent the milestones to be achieved as per the rules laid down by the School Of Computing, and the black dates represent the targeted completion dates set by the author.


1.7 REPORT OUTLINE

The sections of this report closely mirror the stages of the proposed methodology and are summarised as follows:

• Chapter 1 states the aim, motivation and minimum requirements of the project. It also discusses various project methodologies, justifies the one chosen for this project and, based on that methodology, states the proposed project schedule. Finally, it outlines the report.

• Chapter 2 discusses the limitations of OCR packages and web cameras and sets out the system constraints based on these limitations. It also introduces the problem being solved, details the previous work done to solve it and draws conclusions from that work. Moreover, it details the techniques involved in producing a solution. Lastly, it defines the success criteria for the project.

• Chapter 3 details the design components of the system that need to be built, in the order of their occurrence in the system. It also provides a justification for the proposed design by comparing it to the defined success criteria. Finally, it evaluates various technologies for the development of the system.

• Chapter 4 explains the implementation of the distortion correction program. It mentions the problems encountered in the development phase and the rectifications made. Lastly, it justifies the implementation against the design and the success criteria.

• Chapter 5 evaluates the project and the developed system against the requirements set at earlier stages. It also presents ideas for future work and further enhancements to this project.

• Chapter 6 presents a recap of the whole project process and draws conclusions about the project methodology and the project outcome.

• The References and Appendices follow. These sections contain additional information that complements the text presented in the report but is not essential for understanding the project.


Chapter 2 : BACKGROUND RESEARCH

2.1 INTRODUCTION

This chapter looks at the aspects of the project that need to be understood before any attempt is made to add to the existing knowledge. Firstly, the chapter describes the limitations of OCR packages and the complexities of using web cameras to capture images. Subsequently, it states the setup under which the experiment will be carried out, based on the scope defined by these limitations. The chapter then introduces the problem and reviews the attempts that have been made to solve it. It then discusses the fundamental techniques that will invariably be used in the process of developing a solution, followed by a conclusion based on the content of the entire chapter.

2.2 OCR PRECINCTS

OCR packages have a number of limitations with regard to their capabilities and the conditions under which they can perform effectively. Most OCR packages assume that images are captured using flatbed scanners, which rules out several distortions: there is no perspective distortion because the document lies flat on the scanner glass; there is no shadow effect across the document because the scanner light floods it with ample illumination; there are no great skew angles because the document is aligned with the edges of the scanner bed; and there are no lens distortions, in contrast to images captured using web cams. Thus OCR packages do not perform well on documents with the above-mentioned distortions. An OCR package called ‘PageCam’ has been used to perform document recognition in this project, because this package was available at the School Of Computing and acquiring another package from the market was beyond the financial scope of the project. It is promoted in [6] as an OCR package that caters for problems such as poor lighting conditions, low-resolution images, blur, noise and compression artefacts. It comes bundled with the “Philips PCVC750K Web Camera”, and hence the Philips camera has been used to capture images for this research. Nicel [7] collected various images using the Philips camera and performed OCR using PageCam.


He considered OCR successful if more than seventy percent of the characters of the text in an image were recognised correctly. Table C.1 in Appendix C illustrates the dataset collected to evaluate the performance of PageCam and the corresponding results. The camera was kept at a distance of sixteen inches from the document, and all the images were captured in bright illumination at a resolution of 640x480.
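That seventy-percent criterion amounts to a character-accuracy check. As an illustrative aside (this helper is hypothetical, not part of PageCam or Nicel's work), it could be sketched as follows, using an alignment so that inserted or dropped characters in the OCR output are tolerated:

```python
from difflib import SequenceMatcher

def ocr_successful(ground_truth: str, ocr_output: str, threshold: float = 0.70) -> bool:
    """Return True if more than `threshold` of the ground-truth characters
    were recognised correctly, using a longest-common-subsequence style
    alignment so that inserted or dropped characters are tolerated."""
    matcher = SequenceMatcher(None, ground_truth, ocr_output)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ground_truth) > threshold

# One misrecognised character out of 22 still counts as successful OCR:
print(ocr_successful("perspective distortion", "perspectlve distortion"))  # → True
```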

As can be seen from Table C.1, the OCR package produced varied results on the given dataset. It recognised text with a font size of 14 points or bigger but could not do so for font sizes of 12 points or smaller. Recognition was carried out effectively for documents skewed by up to ten degrees and failed completely for those skewed further. The package fared well with images oriented differently and recognised their text with great precision. For images containing perspective distortion, the OCR successfully identified text in only two instances, both of which contained very little distortion. The OCR could not cope well with illumination differences across the document.

2.3 INTRICACIES OF A WEB CAM

Web cams are a cost-effective way of capturing digital images, but they have their own limitations. The factors that affect the performance of the OCR package account for the majority of issues that occur when acquiring images using a web cam. In contrast to the advantages of flatbed scanners mentioned in section 2.2, images captured with web cams suffer from shadows due to illumination variations. The images may contain skew, as there is no marked "bed" area to align the document against. They may also contain perspective distortion if the document is held up in front of the camera rather than laid on a flat surface with the camera directly above it. Furthermore, the images may have poor picture quality due to the low resolution of the web cam, which makes small-sized text very difficult to recognise correctly as the letters become indistinct. Moreover, due to the wide angle of the camera lens, lens distortions such as barrel and pincushion distortion may creep in and reduce the efficiency of OCR. These defects need to be corrected before OCR is applied, because most OCR packages require image quality like that obtained using a flatbed scanner, which a normal web cam does not deliver.


2.4 SYSTEM CONSTRAINTS

Based on the discussion in sections 2.2 and 2.3, we can now decide which areas need to be dealt with before OCR is performed on images captured using a web cam. Although there are many such issues, time constraints make it impossible to solve most of them within the time frame set for this project. This project will therefore deal with one such area: the correction of perspective distortion caused by rotation of either the camera or the document, which results in the image plane and the document plane being non-parallel. For this purpose, other constraints need to be set so that they do not hinder the OCR process. We shall make the following assumptions for the entirety of the project:

• Image capturing will be done in uniform and optimum lighting conditions.

• The document will contain black text on a white background.

• The text font size of the document will be greater than or equal to fourteen points.

• There will be no skew present in the captured image.

• Correct orientation will be present.

• The document will contain only textual data.

2.5 PREVIOUS WORK 2.5.1 Introduction

Efforts to read text pages using OCR engines have been in practice for a long time. These efforts have dealt mainly with resolving skew in document images captured using flatbed scanners and even web cams. However, even though the problem of perspective distortion in document images captured using web cams has been recognised, it has received little attention until now. Little research is evident in this field where the plane of the paper is not fronto-parallel with the camera view. The following sections describe the methods that have been implemented to date to remove perspective distortion from document images and so make it possible for an OCR package to read the text. But before these methods are examined, let us understand the basic motive common to these approaches.


At the top level, it is understood that the methods discussed below all aim to remove the perspective distortion in document images, but at a level below this, the main objective of all these methods is to determine the vanishing points. What are vanishing points, and what is their relevance to the problem?

Vanishing Points

As we look at a pair of straight railway tracks, we observe that at a point far away these tracks appear to merge into one point. This point, at which real-world parallel lines seem to converge in the 2D plane, is called a vanishing point. When real-world 3-dimensional scenes are projected onto a 2-dimensional image plane, parallel lines are depicted as lines that meet at a vanishing point. Figure 2.1 shows an example of vanishing points in an image: a cuboid whose sides are parallel in the real world but appear to converge at vanishing points in the image.

Figure 2.1: Vanishing points in the image of a cuboid box

Vanishing points are important information to determine the orientation of the image objects. Once these points are detected in a particular image, they are used to form intersecting quadrilaterals that determine the geometry of the plane. How this is done will be discussed in more detail later in the report.

2.5.2 Extraction of illusory clues

Pilu [8] has proposed an approach in which illusory [9] clues are used to determine the skew and perspective distortions in the image. These are clues which are not directly evident in the image but can be extracted as they correspond to linear features arising from the linear arrangement of text. Pilu mentions five such clues:
(a) “vertical illusory clues”, which emerge from the apparent vertical lines due to the alignment of text;
(b) “vertical hard lines”, which correspond to actual vertical document edges;


(c) “horizontal illusory clues”, which originate from the arrangement of letters and words in text lines;
(d) “horizontal hard lines”, which correspond to actual horizontal document edges; and
(e) “quadrilaterals”, which correspond either to illusory quadrilaterals formed from the vertical and horizontal illusory clues or to actual quadrilaterals in the text.
These clues are shown in Figure 2.2.

Figure 2.2: Shows the different types of illusory clues (A,B,C,D,E) (Source: [8])

Horizontal and vertical hard lines can be found using either the Hough transform (consult [10] for details) or edge detectors such as the Sobel and Canny detectors, which are discussed in upcoming sections of this report. The illusory vertical and horizontal clues are more complicated to detect. The image is turned into blobs of text, “compact” (made of characters) or “elongated” (comprising words or portions of lines), based on the font size and resolution of the image. A blob is classified as elongated if its major axis is more than three times longer than its minor axis. Different types of blobs are shown in Figure 2.3.

Figure 2.3: (a), (b) show compact blobs and the associations between them; (c), (d) show elongated blobs and their associations with elongated and compact blobs respectively. (Source: [8])


A “Pairwise saliency measure” is calculated for all different blobs using two blob saliency features, “relative minimum distance” and “blob dimension ratio”. Blob dimension ratio (BDR) and relative minimum distance (RMD) are given as:

(Equation 2.1)

and (Equation 2.2)

where DijMIN is the minimum distance between two blobs Bi and Bj, and AxMIN and AxMAX respectively represent the minor and major axes of a blob Bx. The pairwise saliency measure is a probability measure which, for two compact blobs, is given by:

(Equation 2.3)

where N(x, µx, σx) denotes a Gaussian distribution in x with mean µx and standard deviation σx; for one or two elongated blobs the measure is given by:

(Equation 2.4)

where αij is the angle between the horizontal axis of one elongated blob and either the centre of a compact blob or the horizontal axis of another elongated blob. This pairwise saliency measure is then used to form curvilinear arcs between blobs that represent the same line of text. To do this, a greedy path-growing approach is adopted. Random seeds (blobs) are selected and arcs are formed in one direction (right or left) by joining blobs which satisfy the minimum required saliency measure, until a blob is found which does not fit with the previous ones. Then, starting at the same seed, arcs are formed in the opposite direction in the same way, ensuring that blobs which have already been used are not used again. In this manner an association network is formed in which the blobs act as nodes and the curvilinear arcs between them form their associations. Horizontal lines are then fitted to these arcs in order to determine the exact angle of the direction of the linear groups. This is done using linear regression, which is discussed in detail in section 3.2.4. The association network thus formed is depicted in Figure 2.4.
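The greedy path-growing step can be sketched as follows. This is a minimal illustration rather than Pilu's implementation: blobs are reduced to centroids, and `toy_saliency` is a hypothetical stand-in for the Gaussian measures of equations 2.3 and 2.4.

```python
import math

def grow_line_group(blobs, saliency, seed, threshold=0.1):
    """Greedy path growing: from a seed blob, grow a chain to the right,
    then to the left, adding the most salient unused neighbour on that
    side as long as its saliency with the chain end exceeds threshold."""
    used = {seed}
    chain = [seed]
    for direction in (+1, -1):                      # right, then left
        while True:
            end = chain[-1] if direction == +1 else chain[0]
            cands = [i for i in range(len(blobs))
                     if i not in used
                     and (blobs[i][0] - blobs[end][0]) * direction > 0]
            if not cands:
                break
            best = max(cands, key=lambda i: saliency(blobs[end], blobs[i]))
            if saliency(blobs[end], blobs[best]) < threshold:
                break
            used.add(best)
            if direction == +1:
                chain.append(best)
            else:
                chain.insert(0, best)
    return chain

# Hypothetical stand-in saliency: rewards similar y (same text line) and
# nearby x (adjacent words). Pilu's real measure uses Gaussians over the
# blob dimension ratio and the relative minimum distance.
def toy_saliency(a, b):
    return math.exp(-abs(a[1] - b[1])) * math.exp(-(abs(a[0] - b[0]) / 25.0) ** 2)

# Four blobs on one text line plus one outlier from another line.
blobs = [(0, 0), (20, 1), (45, 0), (70, 1), (60, 40)]
print(grow_line_group(blobs, toy_saliency, seed=0))  # [0, 1, 2, 3]
```

The outlier blob at (60, 40) is rejected because its saliency with any chain end falls below the threshold, so it would start a separate group from its own seed.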


Figure 2.4: A: original binary image, B: association network, C: the extracted curvilinear groups, D: fitted horizontal lines (Source: [8])

The fitted horizontal lines are used to find the horizontal vanishing points. The homography thus calculated from this vanishing point can be used to correct the skew in the document. After the skew is corrected using the horizontal clues, it becomes easier to find the vertical clues as a rough idea of the vertical direction is known. To find the vertical illusory clues, the same blobs are used but a different sort of association network is formed. In this network the associations are formed between blobs which lie in the near vertical direction. Figure 2.5 depicts these associations.

Figure 2.5: Shows the different types of vertical associations. (Source: [8])

The associations are refined by rejecting those that are almost impossible candidates for vertical clues, rather than choosing them by saliency measures as was done for the horizontal clues. Pilu mentions four rejection rules that are applied to the initial dense network:
(a) “the longest of two overlapping associations;
(b) left-end-to-right-end associations (and vice versa), since they cannot be formed from a justified paragraph;
(c) associations that are at too much of an angle from the vertical direction;


(d) associations of blobs of two different heights, as they are most unlikely to form part of the same paragraph.”
Using these rules, the dense network is reduced to a network of relevant associations, though quite a few insignificant ones remain. Ultimately, as in the case of the horizontal clues, a greedy split-and-merge policy is applied to bracket together all the clues in near-vertical groups to form the vertical clues. Figure 2.6 shows examples of the detected vertical clues.

Figure 2.6: Left image shows the dense network of vertical association and the right image shows the selected vertical clues. (Source: [8])

The vertical and horizontal lines thus formed are used to find the horizontal and vertical vanishing points. A homography is computed using the two vanishing points, which is then used to warp the original image based on a transformation that maps quadrilaterals to rectangles. This technique is discussed in detail in section 2.6.2. The pros and cons of the method described above are:

Pros

1. This method can correct perspective distortion in text documents even when there are no page borders present in the document image.

Cons

1. The method does not work efficiently when only one vertical illusory clue is available, which occurs when the text document is left, right or centrally justified.

2.5.3 Using Projection Profiles

Projection profiles are graphs of the number of black and white pixels along a particular axis in a binary image (one that contains only black and white pixels). Such graphs contain defined peaks and troughs corresponding to the pixel counts along the given axis. Projection profiles are used to determine both the skew and the perspective in an image. The idea is that if the lines of text


are parallel to a particular axis, then these text lines will produce peaks when a profile is created by taking the projection along the perpendicular axis. Projection profiles have proved to be an effective tool for determining the skew in fronto-parallel document images, as discussed in [11]. Here various profiles are created for different probable angles of skew and the one with minimum entropy (a measure of least error) marks the angle of skew. Clark and Mirmehdi in [12] have utilised the technique of projection profiles to determine the perspective distortion in a text image. In their approach, they formed a circular sample space of all the possible vanishing points for a text region and generated projection profiles from the viewpoint of each of these points. The basic idea behind this approach is that all the parallel text lines point towards the vanishing point, and thus the actual vanishing point will have distinct peaks for every text line, as they will have a high number of black pixels. A “confidence measure” is calculated for each projection. This confidence measure is high for a point for which more distinct peaks are produced in the projection profile. The point with the highest confidence measure is chosen as the horizontal vanishing point of the text document plane. Confidence measures for all possible vanishing points are plotted as shown in Figure 2.7(b).

Figure 2.7: (a) Sample space, (b) confidence measures for projection profiles, white arrow represents a probable vanishing point and black cross is not a vanishing point. (Source: [12])
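The core of the projection-profile idea can be sketched as below. This simplified version profiles along the image axes only (Clark and Mirmehdi project from candidate vanishing points), and `profile_confidence` is a hypothetical stand-in for their confidence measure: a profile with sharp, distinct peaks scores higher than a smeared one.

```python
import numpy as np

def projection_profile(binary, axis=1):
    """Count ink (non-zero) pixels along each row (axis=1) or each
    column (axis=0) of a binary image where text pixels are 1."""
    return binary.sum(axis=axis)

def profile_confidence(profile):
    """Stand-in confidence: profiles with sharp, distinct peaks (text
    lines seen from the true vanishing point) have high variance
    relative to their mean; smeared profiles do not."""
    mean = profile.mean()
    return 0.0 if mean == 0 else profile.var() / mean

# Two text lines drawn as horizontal bars: rows 2 and 5 are full of ink.
img = np.zeros((8, 10), dtype=int)
img[2, :] = 1
img[5, :] = 1

rows = projection_profile(img, axis=1)   # peaks at rows 2 and 5
print(rows.tolist())                     # [0, 0, 10, 0, 0, 10, 0, 0]
print(profile_confidence(rows) > profile_confidence(np.ones(8) * 2.5))  # True
```

A flat profile with the same total ink scores zero confidence, which is why a wrong candidate vanishing point (whose projection smears the text lines together) is rejected.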

Once the horizontal point is found, then the justification of the document is determined, i.e., whether the document is left, centre, right or fully justified. This is done by first marking the starting-points, the mid-points and the end-points of all the text lines. Then lines are fitted to the starting-points of all text lines, the mid-points of all text lines and the end points, using the RANSAC method (for details on the RANSAC method consult [13]). The error associated with each of these fittings determines the justification of the document. If the line fitting the starting points of the text line has the least error associated with it, then the document is said to be left justified. Similarly, it is said to be centrally justified if the line joining the mid points has the least error and right justified if the line fitting the


end points produce the smallest error. However, if the errors associated with the lines fitting the starting and ending points of the text lines are found to be nearly equal, then the document is marked as fully justified. In this case the two lines fitting the left margin points and the right margin points are used to determine the vertical vanishing point as the point of their intersection. If the document is not fully justified, further work needs to be done. The line with the least error is referred to as the baseline. The image is rotated to make the baseline vertical, which simplifies the problem: only the two-dimensional (y,z) plane has to be considered from here onwards. Figure 2.8 shows the image after making the baseline vertical.

Figure 2.8: Geometry involved in line spacing (Source: [12])

In the figure, P is the bottom of the paragraph and the text lines are at regular intervals of Q. Hence the nth line is given as (from the bottom, P): L(n) = P + nQ

(Equation 2.5)

and the projection of this line in the image plane is given as:

y(n) = f (Py + nQy) / (Pz + nQz)

(Equation 2.6)

where f is the camera's focal length, and Py and Pz, and Qy and Qz, are the y and z components of P and Q respectively. Clark and Mirmehdi state that, without loss of the nature of the projection, this scene can be scaled about the focal point O to make Pz equal to f, which gives the effect of the paragraph touching the image plane. This makes Py = y(0), which leads to the following formula from equation 2.5:


y(n) = U (1 + nV) / (1 + nW)

(Equation 2.7)

where U = y(0), V = Qy / Py and W = Qz / Pz. This makes the process independent of the focal length of the camera, and hence any camera can be used in the experiment. The position of the nth line is given by

Xn = y(n)

(Equation 2.8)

and the line spacing at position Xn is given as

Yn = y(n+1) − y(n)

(Equation 2.9)

With this formulation, any lines in the image that are not consistently spaced will stand out as having unusual spacing, but the inconsistency will not propagate through the rest of the lines. Substituting equation 2.7 into equation 2.9 gives:

(Equation 2.10)

and substituting equation 2.7 into equation 2.8 and then using equation 2.10 gives:

(Equation 2.11)

Different values of V and W are substituted into the above equation, and the pair that fits the lines with the least error is sought. However, the complexity of equation 2.11 may produce many unwanted minima, and hence it is approximated as follows:

(Equation 2.12)

This ensures that values of V and W close to the actual minima are obtained. These values are finally substituted into equation 2.7 to get the altitude of the horizon, which is given by:

y(∞) = UV/W

(Equation 2.13)
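Assuming equation 2.7 has the form reconstructed from the definitions of U, V and W above, the horizon altitude in equation 2.13 follows as the limit of y(n) as n grows without bound:

```latex
y(n) = \frac{U(1 + nV)}{1 + nW}
\qquad\Longrightarrow\qquad
y(\infty) = \lim_{n \to \infty} \frac{U(1 + nV)}{1 + nW} = \frac{UV}{W}
```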

The rotation made earlier to make the baseline upright is reversed; the resulting point is the vertical vanishing point. The vertical and horizontal vanishing points found are then used to find the lines fitting the paragraph. These intersect each other, forming a quadrilateral enclosing the paragraph.


This quadrilateral, as in the method discussed in section 2.5.2, will be mapped onto a rectangle to produce an undistorted image. The pros and cons of the described method are:

Pros

1. Can correct perspective distortion in text documents when there are no page borders present in the document image.

2. Efficiently corrects the distortion even if the paragraphs are not fully justified.

3. The method is independent of the focal length and other internal features of the camera being used.

Cons

1. The method does not work well enough, even on images taken using high-resolution cameras, to make the undistorted document suitable for OCR.

2. It is computationally expensive.

2.5.4 Conclusions on previous work

The sections above discussed two methods to remove perspective distortion from text document images, along with the other techniques involved in doing so. From the past research it appears that, apart from the two methods mentioned above, little other research has been done in the field of correcting perspective distortion in document images. Both approaches mentioned here are novel, but the second method, i.e., using projection profiles together with line spacings, has an edge over the method of finding and using illusory clues, since it can perform even when the document is not fully justified, whereas the illusory-clue method cannot. However, a full implementation of these approaches is beyond the scope of this project in terms of the time allocated for it. This project will therefore implement parts of the methods discussed above to solve the problem.

2.6 BASIC TECHNIQUES INVOLVED

A basic understanding of the problem, together with the ideas gathered from the previous work, suggests that some basic techniques will inevitably be used in this project. This section throws light on these procedures.


2.6.1 Image warping

The whole idea of the project is to remove a geometric distortion from the image. For this purpose, once the geometry is known, 'image warping' will be used to create an undistorted image. Image warping is a burgeoning field of image processing which deals with geometric alterations of images. The increasing availability of powerful computers and advanced graphics stations has broadened the vistas of image warping to create special effects in real-time video. Image warping involves mapping a set of control points from the reference image I(x, y) onto a set of points in the target image I'(x', y'). This method is also referred to as forward mapping. This transformation can be represented in the form of equations as follows:

x' = a1x² + a2y² + a3xy + a4x + a5y + a6
y' = a7x² + a8y² + a9xy + a10x + a11y + a12

(Equation 2.14)

There are twelve unknown coefficients in this equation, and it represents a quadratic warp. In such a case, six control points are required in both images to determine the unknown coefficients [14]: substituting the coordinates of these six control points into equation 2.14 gives twelve equations, sufficient to find the twelve unknowns. A quadratic warp induces complex distortions in the image being warped, for example polynomial curves. Similarly, warps of different degrees are used to induce different types of distortion; for instance, cubic warps are used to remove the pincushion and barrel distortions induced by a camera lens. In this project, once the text region or the borders of the document page are identified, quadrilateral-to-rectangle mapping can be used to warp the original image to give the undistorted text image. This warping can be done using a transformation that maps the identified quadrilateral, which represents either the border of text grouped as a paragraph or the actual page boundaries, onto a rectangle in a target image so that the undesired perspective distortion is removed. Such a mapping has been discussed by Kim et al. [15] in their work and is detailed in the next section.
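As a sketch of how the twelve coefficients of equation 2.14 can be recovered from control points (a generic least-squares construction, not a specific package's routine; with exactly six well-placed points the solution is exact):

```python
import numpy as np

def quadratic_warp_coeffs(src, dst):
    """Solve for the 12 coefficients of the quadratic warp of Eq. 2.14
    from (at least) six control-point correspondences.
    src, dst: arrays of shape (N, 2) with N >= 6."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    x, y = src[:, 0], src[:, 1]
    # One design-matrix row per point: [x^2, y^2, xy, x, y, 1]
    A = np.column_stack([x**2, y**2, x*y, x, y, np.ones_like(x)])
    ax, *_ = np.linalg.lstsq(A, dst[:, 0], rcond=None)  # a1..a6
    ay, *_ = np.linalg.lstsq(A, dst[:, 1], rcond=None)  # a7..a12
    return ax, ay

def apply_quadratic_warp(ax, ay, pt):
    x, y = pt
    basis = np.array([x**2, y**2, x*y, x, y, 1.0])
    return float(basis @ ax), float(basis @ ay)

# Identity check: mapping six points onto themselves recovers a warp
# that acts as the identity, even on points outside the control set.
pts = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
ax, ay = quadratic_warp_coeffs(pts, pts)
print(apply_quadratic_warp(ax, ay, (3, 4)))  # close to (3.0, 4.0)
```

Using least squares rather than a direct 12×12 solve also tolerates more than six (noisy) control points, which is the usual situation in practice.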


2.6.2 Quadrilateral-to-Rectangle Mapping

Quadrilateral-to-rectangle mapping implements the basic fundamentals of perspective geometry and vanishing points to perform a transformation of a quadrilateral in the source image I(x, y) to a rectangle in the target image I'(x', y'). This perspective transform has eight degrees of freedom, i.e., it involves determining eight unknown coefficients. These eight coefficients can be determined using four corresponding points in the source and target images. As Kim et al. mention, a general planar perspective transform can be written in matrix form as,

(Equation 2.15)

and for the perspective transformation M, the forward transformations will be,

(Equation 2.16)

To perform the required quadrilateral-to-rectangle mapping, a unit-square-to-quadrilateral mapping [10] is scaled, translated and reversed. The unit-square-to-quadrilateral mapping is performed between the points (0,0), (1,0), (0,1), (1,1) in the reference image and (x0', y0'), (x1', y1'), (x2', y2'), (x3', y3') in the target image. The perspective transformation is given by:

(Equation 2.17)

Where,

(Equation 2.18)

(Equation 2.19)

and,


(Equation 2.20)

(Equation 2.21)

This transformation can then be applied to map a rectangle with coordinates (x0, y0), (x1, y1), (x2, y2), (x3, y3) to a quadrilateral with coordinates (x0', y0'), (x1', y1'), (x2', y2'), (x3', y3') in the target image. The intended transformation is shown in Figure 2.9.

Figure 2.9: Quadrilateral-to-rectangle mapping using transformation M. (Source: [16])

This rectangle-to-quadrilateral mapping can be obtained by taking the unit-square-to-quadrilateral mapping and scaling and translating it as follows:

(Equation 2.22)

Here,


(Equation 2.23)

(Equation 2.24)

and,

x' = u'/w'  and  y' = v'/w'

(Equation 2.25)

Finally, this transformation can then be used to find the quadrilateral-to-rectangle mapping by reversing the mapping as:

(Equation 2.26)

This type of image warping is widely used to correct perspective distortion and hence is of great relevance to this project.
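A minimal sketch of the quadrilateral-to-rectangle idea: the eight unknowns of the planar perspective transform of equation 2.15 (with the ninth entry fixed to 1) are solved directly from four point correspondences. This is a generic construction, not Kim et al.'s exact sequence of unit-square mappings:

```python
import numpy as np

def homography_from_4pts(src, dst):
    """Solve the 8 unknowns of a planar perspective transform H (with
    h9 = 1) so that dst ~ H @ src for four point correspondences."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x*xp, -y*xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x*yp, -y*yp]); b.append(yp)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    # Homogeneous coordinates: x' = u'/w', y' = v'/w' (Eq. 2.25).
    u, v, w = H @ np.array([pt[0], pt[1], 1.0])
    return u / w, v / w

# Map a skewed quadrilateral (a "page" seen in perspective) onto a rectangle.
quad = [(10, 20), (200, 40), (220, 300), (5, 280)]
rect = [(0, 0), (100, 0), (100, 150), (0, 150)]
H = homography_from_4pts(quad, rect)
for corner in quad:
    print(warp_point(H, corner))  # the four rectangle corners, in order
```

Applying the same H to every pixel (with backward mapping, as discussed in the next section) produces the fronto-parallel view of the page.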

2.6.3 Interpolation

In forward mapping, each point of the reference image is mapped onto the target image. However, depending on the nature of the transformation, two or more points from the reference image may map onto the same point in the target image, while some points in the target image may not be mapped onto at all. This creates holes in the target image at the points which are never mapped onto, and this missing data needs to be filled in (interpolated) to produce an image without holes. The remedy to this problem is to perform a backward mapping. In backward mapping, an inverse mapping is used to trace each point (x', y') of the target image back to a point (x, y) in the reference image. Let T be the transformation applied to the initial image, so x and y are given as:


x = x' * inv(T)  and  y = y' * inv(T)

(Equation 2.27)

where inv(T) represents the inverse of the mapping T. This way it can be ensured that each target image point has some value corresponding to the original image. Nonetheless, a point in the target image may trace back to non-integer coordinates, i.e., x and y may have fractional values. Such points cannot be looked up in the initial image, since points in an image are represented by integer coordinates. They therefore have to be rounded to the nearest integer values, so that the pixel value of the integer point closest to (x, y) is assigned to the point (x', y'). This process is known as nearest-neighbour interpolation.

A more accurate method of approximating the value of the target image pixel is bilinear interpolation. In this technique, the target pixel value is taken as a weighted combination of the four nearest pixels in the reference image, where each weight decreases with the distance between the point being mapped and the corresponding nearby pixel (closer pixels contribute more). Figure 2.10 illustrates this.

Figure 2.10 : Bilinear Interpolation (Source: [14])

The bilinear interpolation function can be given as:

I'(x', y') = I(x1, y1) + [I(x2, y2) − I(x1, y1)] dx + [I(x4, y4) − I(x1, y1)] dy
+ [I(x3, y3) + I(x1, y1) − I(x4, y4) − I(x2, y2)] dx dy

(Equation 2.28)

where dx = x − x1 and dy = y − y1.
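Equation 2.28 can be implemented directly. In this sketch the four neighbours are assumed to be laid out with pixel 1 at the top-left, 2 at the top-right, 4 at the bottom-left and 3 at the bottom-right; the labelling in Figure 2.10 may differ:

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Bilinear interpolation (Eq. 2.28): the value at a non-integer
    point (x, y) is a weighted blend of the four surrounding pixels,
    the weights given by the fractional offsets dx and dy."""
    x1, y1 = int(np.floor(x)), int(np.floor(y))
    x2 = min(x1 + 1, img.shape[1] - 1)   # clamp at the image border
    y2 = min(y1 + 1, img.shape[0] - 1)
    dx, dy = x - x1, y - y1
    # Corner values: 1 = (x1,y1), 2 = (x2,y1), 4 = (x1,y2), 3 = (x2,y2).
    I1, I2 = float(img[y1, x1]), float(img[y1, x2])
    I4, I3 = float(img[y2, x1]), float(img[y2, x2])
    return (I1
            + (I2 - I1) * dx
            + (I4 - I1) * dy
            + (I3 + I1 - I2 - I4) * dx * dy)

img = np.array([[0, 10],
                [20, 30]], dtype=float)
print(bilinear_sample(img, 0.5, 0.5))  # 15.0: the average of all four pixels
```

At integer coordinates the formula reduces to the pixel value itself, so bilinear sampling is a strict refinement of nearest-neighbour lookup.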


Bilinear interpolation is computationally more expensive than nearest-neighbour interpolation, but the results produced are much better. In particular, when images are rotated using nearest-neighbour interpolation, the rotated image takes on a blocky look, making the edges appear jagged.

2.7 CONCLUSIONS

On the basis of the background research, and considering the complexity of the methods involved, a basic idea can be formed of the method the system will follow to produce the desired output. The method will involve finding and using the hard horizontal and vertical clues, as mentioned in section 2.5.2, to find the vertical (or horizontal) vanishing point of the document page. The lines found this way will be used to form a quadrilateral representing the document page. The project will then try to find the corners of the page from the quadrilateral formed. Then, the techniques discussed in section 2.6 will be applied to calculate a transformation that maps the points of the quadrilateral onto a rectangle, with the notion of interpolation in mind. The transformation calculated will be applied to the original image in order to produce an undistorted image. This initial overview of the design states that the system will aim at finding the page borders in order to correct the distortion. For this purpose, images of the document have to be taken from further away, and hence the initial constraint in section 2.4 of a fourteen-point text font size has to be revised. The camera was placed at various distances from the document to be captured, which led to an optimum distance of nineteen inches: close enough to capture the whole document while leaving some room for the rotations made to the paper to induce perspective distortion. Subsequently, the same document was captured in various font sizes with the camera at this optimum distance. The OCR responded best when the font size was thirty-two points, and hence the initial font size of fourteen points is changed to this new value.
As mentioned in the Mid-Project Report, due to a delayed start and the difficulties encountered in comprehending the complex previous work, the background research took longer than planned in the project schedule in section 1.6. To accommodate this, a new project schedule has been planned, as shown in Figure B.2 of appendix B. Here the time allocated to the design phase has


been reduced to absorb the change, because the background reading has given a basic idea of the probable design of the system.

2.8 REQUIREMENTS SPECIFICATION

A set of requirements acts as a blueprint for the system to be developed. These requirements are not only important for understanding the problem that needs to be solved, but also provide guidelines against which the system can be evaluated at various stages of the project. There are two types of system requirements: functional requirements and non-functional requirements. Fulfilled functional requirements show that the system works; satisfied non-functional requirements show the quality with which it works.

2.8.1 Functional System Requirements

Functional requirements are the “must have” requirements of the system. These requirements lay out the most basic tasks and functions the system is expected to perform. Listed below are the functional requirements of the project at hand:
• The system must correct perspective distortion in a document image that is up to 30° out of the image plane.
• The system should increase the accuracy of the OCR in reading a text document with perspective distortion.
• The OCR must be able to recognise at least seventy percent of the characters in the document image, given the constraints mentioned in sections 2.4 and 2.7.

2.8.2 Non-Functional System Requirements

“Quality is not an afterthought, it has to be built in right from the beginning” – EurIng Peter Jesty

Non-functional requirements present a schematic and viable approach to building quality into the system. According to the ISO 9126 standard [16], six quality characteristics determine the workability of a system and state the attributes against which the final product can be evaluated: functionality, reliability, usability, efficiency, maintainability and portability. Each characteristic is a broad category which is further divided into many attributes. An overview of these characteristics is given below:



• Functionality defines a set of attributes that bear on the existence of the functions that fulfil the stated needs of the user.
• Reliability defines a set of attributes that determine the capability of the system to sustain its performance under a given set of conditions.
• Usability defines a set of attributes that bear on the effort required to use the system.
• Efficiency defines a set of attributes that bear on the amount of resources used by the system to deliver a given level of performance.
• Maintainability defines a set of attributes that bear on the effort required to make changes to the system.
• Portability defines a set of attributes that bear on the capability of the system to work in a new environment.

All these attributes need to be built into the system to ensure that a fine-quality product is built, and these characteristics will be used to evaluate the system once it is developed. The first characteristic, functionality, is the same as the functional requirements of the system; hence all the other quality characteristics except this one will be considered when evaluating the system against the non-functional requirements.

2.9 SUMMARY OF BACKGROUND RESEARCH

This chapter detailed the knowledge required to understand the problem in order to devise its possible solutions. It first evaluated existing OCR packages and recognised the limitations of using web cameras for capturing images. Based on these limitations, the system scope was set. A basic introduction to the problem was given and the motive of the previous work was explained. Then the work recognised in this area was detailed. This included the approaches that have been followed to correct perspective distortion, along with the techniques complementing these methods, such as quadrilateral-to-rectangle mapping and interpolation. These methods were evaluated and conclusions were drawn to present the basic ideas that will be used to produce a solution to the problem. Following the conclusions, amendments were made to the system constraints. Finally, the chapter discussed the needs that the system is required to fulfil in order to be a success. The following chapter will discuss a detailed design of the system based on these basic ideas.


Chapter 3 : THE DESIGN

3.1 INTRODUCTION

Design is the third phase of the project according to the project methodology. This chapter discusses the design of the system in detail. First it gives an overview of the design and then discusses the various steps the system will be required to perform in order to meet the final goal. At every stage of the design, it discusses the techniques involved in the development of that component. Lastly, the chapter compares the implementation technologies that could be employed to construct the system and justifies the chosen technology. The system will be provided with an initial grey-level image with the undesired perspective distortion. To achieve the main goal of removing the perspective in the document image, the overall design of the system can be sub-divided into the following objectives:
• The original input image will be provided to the system.
• Segmentation of the features of interest from the original image will be performed.
• The corners of the document page will be detected.
• The corners of the quadrilateral representing the document page will be mapped onto a rectangle to correct the perspective distortion.
• The missing data in the output image will be interpolated.
• The output image will be passed to the OCR package.

The following sections discuss the above mentioned objectives in detail.

3.2 SEGMENTATION

In most vision applications, it is very useful to separate the parts of the image we are interested in from the unwanted ones. This is known as segmentation. Thresholding is a very convenient technique for performing segmentation. It is effective in cases where the foreground in the


image, usually the region of interest, has a grey-level intensity different to that of the background. Thresholding an image produces a binary output image. In this project, a binary output image will first be produced, after which the edges of the document need to be found. This can be done using edge detectors. These techniques are discussed in detail below.

3.2.1 Thresholding

In thresholding, a certain grey-level value is set as a limiting value. All the pixels in the initial image with grey-level intensity values below this limiting value are set to 0 in the target image (the minimum grey-level intensity, representing black), and the pixels with grey-level values above it are set to 255 (the maximum grey-level intensity, signifying white). This results in the desired binary image, with bright white foreground pixels on a dark background or vice versa. In more complex situations, multiple threshold values can be set to determine the bands of intensities that are mapped onto white or black. However, the choice of the threshold (limiting value) is critical in this process.

The threshold can be determined by looking at image histograms. A histogram is a graph showing the number of pixels with each grey-level intensity. In an 8-bit greyscale image there are 256 grey-level intensities, hence a histogram for such an image gives the number of pixels at each of these 256 intensities. An image with two distinct grey-level intensities will produce a histogram with two well-defined peaks. In such a case, the grey-level intensity at the lowest point of the trough between these two peaks can be chosen as the threshold, which effectively separates the desired regions of the image. If there are more than two peaks, then two or more threshold values can be chosen to determine the range of grey levels that need to be segmented. On the other hand, if the grey intensities are not distinctly present in the image, then overlapping peaks are produced in the histogram. Figure 3.1 shows the three types of histograms, with T1 and T2 as optimal thresholds.

Figure 3.1: Histograms with (a) two well defined peaks, (b) more than two peaks, (c) overlapping peaks. [18]
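The thresholding mapping and the histogram just described can be sketched as follows. This is an illustrative Python version (the project itself is implemented in MATLAB, as discussed later in section 3.4); the image and the threshold of 128 are arbitrary examples:

```python
def threshold(image, t):
    """Binarise a greyscale image (a list of rows of 0-255 grey levels):
    pixels below t become 0 (black), the rest 255 (white)."""
    return [[0 if p < t else 255 for p in row] for row in image]

def histogram(image):
    """Number of pixels at each of the 256 grey levels of an 8-bit image."""
    counts = [0] * 256
    for row in image:
        for p in row:
            counts[p] += 1
    return counts

img = [[10, 200], [30, 250]]
print(threshold(img, 128))  # [[0, 255], [0, 255]]
```

In practice the threshold would be read off the histogram trough rather than fixed, as the text above explains.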


DESIGN

When there are overlapping peaks in the histogram, adaptive thresholding is used to determine the threshold value. In this process, the original image is divided into smaller regions and a local threshold is set for each of these regions. The basic idea is that for naturally captured images, under varying lighting conditions, smaller regions of the image have more consistent gradients, i.e., there is less variation in grey level intensity over these smaller regions. The system constraints mentioned in section 2.4 state that uniform lighting will be present during the experiments. The project assumptions also include black text on a white background; hence global thresholding will be sufficient to accurately segment the desired region, i.e., the page, from the rest of the image. However, this threshold needs to be calculated automatically during the experiment, according to the image. This can be done using the iterative threshold method.

(a)

(b)

Figure 3.2: (a) A grey level image (b) Corresponding thresholded image using iterative threshold method.
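The iterative threshold method mentioned above can be sketched as follows. This is a Python illustration (not the project's MATLAB code) of the usual isodata-style iteration, in which the threshold converges to the midpoint of the mean grey levels of the two classes it separates:

```python
def iterative_threshold(image):
    """Isodata-style iterative threshold: start from the overall mean grey
    level, then repeatedly set the threshold to the midpoint of the mean
    grey levels of the two classes it separates, until it stabilises."""
    pixels = [p for row in image for p in row]
    t = sum(pixels) / len(pixels)          # initial guess: global mean
    while True:
        below = [p for p in pixels if p < t]
        above = [p for p in pixels if p >= t]
        if not below or not above:
            return t
        t_new = (sum(below) / len(below) + sum(above) / len(above)) / 2
        if abs(t_new - t) < 0.5:           # converged to within half a level
            return t_new
        t = t_new

# a toy image with dark text (around 20) on a bright page (around 230)
img = [[20, 25, 230], [235, 228, 22]]
t = iterative_threshold(img)
```

On a page-like image such as this, the threshold settles between the two intensity clusters, comparable to the 145-149 values reported for the real test images in Chapter 4.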

3.2.2 Edge Detection

Edge detection is a widely used technique in the image processing field. Edge detection locates the places in an image where the colour (in colour images) or grey level (in greyscale images) varies suddenly. Elements of interest such as solid objects, shapes and shadows generally produce variations in colour or grey level intensity, and finding their edges is therefore essential in order to identify and label them as separate regions. Various types of edge detectors have been developed, namely the Prewitt, Roberts, Sobel and Canny edge detectors. The simplest and most widely used is the Sobel edge detector, which will be used for this project. After the original text image has been thresholded, the borders of the document page need to be identified; this can be done using the Sobel edge detector.
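As a sketch of what the Sobel detector computes, the following Python code (illustrative only; the project implementation relies on MATLAB) convolves the two standard 3x3 Sobel kernels and combines their responses into an approximate gradient magnitude:

```python
# Sobel kernels for horizontal and vertical grey-level gradients
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(image):
    """Approximate gradient magnitude |Gx| + |Gy| at each interior pixel
    of a greyscale image given as a list of rows; borders are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(-1, 2):
                for dx in range(-1, 2):
                    p = image[y + dy][x + dx]
                    gx += GX[dy + 1][dx + 1] * p
                    gy += GY[dy + 1][dx + 1] * p
            out[y][x] = abs(gx) + abs(gy)
    return out
```

A vertical black-to-white step, such as a page border in the thresholded image, produces a strong response in the columns next to the step, while uniform regions produce zero.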


3.2.3 Determining the page corners

The next step after performing the basic image processing and segmenting out the page borders is to use the borders to determine the page corners. This can be achieved by finding the equations of the lines representing the edges of the page and subsequently finding the corners as the intersection points of these lines. To do this, we can use a technique called linear regression.

3.2.4 Linear Regression

Regression is a method used to predict a dependent variable from the information available about one or more independent variables. In linear regression, the dependent variable is a linear function of one or more independent variables. For example, when two distinct points are given in a 2D plane there is exactly one line that passes through them, but when there are three or more points there may not exist a single straight line that contains them all. In such a case, regression is used to fit the line that best fits the given set of points. This is usually done by the least-squares method. When a single line is drawn that passes close to all the points, the points not lying on the line lie either above or below it. The distances of these points from the line give the error of estimation for each point. The best fit is the line that minimises this error; but since some errors will be negative (points above the line) and some positive (points below the line), the errors are squared and the sum of their squares is minimised to determine the best-fitting line. Figure 3.3 shows an example of regression line fitting.

Figure 3.3: A diagram showing a regression line fitting a set of points (Source: [17])
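The least-squares fit described above has a simple closed form. The following Python sketch (illustrative, not the project code) computes the slope m and intercept c from the usual normal equations. Note that for a nearly vertical page edge one would fit x as a function of y instead, since the slope of y = m x + c becomes very large:

```python
def fit_line(points):
    """Least-squares fit of y = m*x + c to a list of (x, y) points,
    using the closed-form normal-equation solution."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    c = (sy - m * sx) / n
    return m, c

m, c = fit_line([(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)])
# m is close to 2 and c close to 1 for these noisy points
```

For points that lie exactly on a line, the fit recovers the line exactly; for noisy points it minimises the sum of squared vertical distances, as described in the text.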


The following explains how regression fits the problem at hand. As mentioned in Section 2.4, there will be no skew present in the document page; hence the top edge of the page can be represented by the equation y = c, where c is the y-coordinate of the centre pixel of the top edge. Similarly, the bottom edge of the page can be represented by an equation of the same form, where c is the y-coordinate of the centre pixel of the bottom edge. However, due to the perspective in the document, the left and right edges of the page will be slanted and require regression to determine their equations. Firstly, the coordinates of five equally spaced points will be taken from the left edge of the page. Then, using regression, the line that best fits these points will be determined; this line represents the left edge of the document. In the same way, a line representing the right edge will be determined. After this, four equations representing the four edges of the page will be available. These equations can then be solved in pairs to find the corners of the page. The next section details the mathematics involved.

3.2.5 Mathematics Involved

If there are two unknown variables, then a system of two equations containing these variables is required, and sufficient, to determine them. This is based on basic linear algebra principles as given by Hoffman and Kunze [19]. Let two linear equations be given by:

    y = m1 x + c1        (Equation 3.1)
    y = m2 x + c2        (Equation 3.2)

where x and y need to be determined from the known values of m1, c1, m2 and c2. From equation 3.1, x can be written (provided m1 is non-zero) as:

    x = (y - c1) / m1    (Equation 3.3)

This value of x can be substituted into equation 3.2 to give:

    y = m2 ((y - c1) / m1) + c2    (Equation 3.4)

which can be solved for y. This value of y can then be substituted back into equation 3.1 to obtain the value of x.
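Eliminating x directly gives a compact solution; the following Python sketch (illustrative) computes a corner as the intersection of two edge lines. Solving m1 x + c1 = m2 x + c2 avoids the division by m1 in equation 3.3, which would fail for a horizontal edge with m1 = 0:

```python
def intersect(m1, c1, m2, c2):
    """Intersection of y = m1*x + c1 and y = m2*x + c2.
    Setting m1*x + c1 = m2*x + c2 and solving for x gives the result
    of equations 3.1-3.4; the lines are assumed not to be parallel."""
    x = (c2 - c1) / (m1 - m2)
    y = m1 * x + c1
    return x, y

# corner of a slanted side edge (y = 2x + 1) with a horizontal
# top edge (y = 0x + 5)
print(intersect(2, 1, 0, 5))  # (2.0, 5.0)
```

Applying this to the four edge equations in pairs (left/top, right/top, left/bottom, right/bottom) yields the four page corners.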


The above mathematics can be used to find the page corners by selecting pairs of equations from the four equations that were determined using regression. The next step after locating the coordinates of the corners of the page would be to map these coordinates onto a rectangle. The following section discusses this aspect.

3.3 QUADRILATERAL- TO-RECTANGLE MAPPING

This part of the project design implements the technique discussed in section 2.6.2. The corner points of the page found in the previous section form a quadrilateral. When this quadrilateral is mapped onto a rectangle, an undistorted, upright image of the document is produced. As part of this mapping, backward mapping using either the nearest-neighbour or the bilinear interpolation scheme (discussed in section 2.6.3) will be implemented in order to fill the missing points in the output image. Which scheme is finally used will be decided after implementing both strategies and evaluating the quality of the output and the processing times. Following the steps above will produce an output image with the perspective distortion corrected, and this image will be passed to the OCR package to perform character recognition.
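The backward-mapping idea can be sketched as follows in Python (illustrative only): each output pixel is mapped back to a source location inside the quadrilateral and filled by bilinear interpolation of the four surrounding source pixels. For brevity this sketch uses a simplified bilinear warp of the four corners rather than the full projective mapping of section 2.6.2:

```python
def bilinear_sample(image, x, y):
    """Sample a greyscale image at non-integer (x, y) by bilinear
    interpolation of the four surrounding pixels."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bot = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def warp_quad_to_rect(image, corners, out_w, out_h):
    """Backward-map an out_h x out_w rectangle onto the quadrilateral
    given by corners = [top-left, top-right, bottom-right, bottom-left]
    (in cyclic order), filling each output pixel by bilinear sampling.
    A simplified bilinear warp, not the full projective mapping."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    out = [[0] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            s, t = u / (out_w - 1), v / (out_h - 1)
            # bilinear blend of the four corners gives the source point
            sx = (1-s)*(1-t)*x0 + s*(1-t)*x1 + s*t*x2 + (1-s)*t*x3
            sy = (1-s)*(1-t)*y0 + s*(1-t)*y1 + s*t*y2 + (1-s)*t*y3
            out[v][u] = bilinear_sample(image, sx, sy)
    return out
```

Backward mapping guarantees that every output pixel receives a value, which is why it is preferred over forward mapping here; swapping `bilinear_sample` for a round-to-nearest lookup gives the nearest-neighbour variant.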

3.4 IMPLEMENTATION TECHNOLOGY

The proposed design can be put into action using various programming languages. A few of these potential technologies are discussed and evaluated below:

3.4.1 Java

Java is a platform-independent object-oriented programming language invented at Sun Microsystems around 1991 and officially released in 1995 [21]. It has entities called classes, which comprise methods that process a particular input and provide an output on the basis of some computation. Classes together with their methods form objects. Java has many built-in class libraries containing classes and methods that can be used directly to build complex programs.


Java is especially renowned for creating web-based applications through small programs called applets, which are downloaded to a Java-enabled computer on which the application is to be run. The pearsoneduc imaging library contains many image processing classes; these, along with the standard Java classes, make it possible to perform image manipulation in Java.

3.4.2 MATLAB

MATLAB is an acronym for MATrix LABoratory, a language originally built by Prof. Cleve B. Moler, an expert in numerical analysis at the University of New Mexico [21]. As the name suggests, MATLAB essentially deals in the manipulation of matrices. Based on this language, The MathWorks has built the MATLAB programming environment, which is a very potent tool for matrix computation. It has numerous inbuilt functions with which complicated programs can be created with great ease. Moreover, it is a powerful tool for 2D and 3D graphics, and the Image Processing Toolbox makes it possible to perform image manipulation while maintaining the simplicity of MATLAB programming.

3.4.3 Chosen Technology

Both the technologies mentioned above are capable of performing image processing using their inbuilt class libraries and packages. However, MATLAB code is far more succinct than analogous code written in Java. Moreover, image processing involves a great deal of matrix manipulation, and MATLAB is better at carrying out calculations with matrices. Apart from these factors, the author already knows the Java programming language through the course “Object-Oriented Programming (SO21)” studied in the second year at the University of Leeds, so working in MATLAB would add to the author’s knowledge. Considering all these factors, MATLAB has been chosen as the programming language to implement the design discussed in this chapter. “For the purpose of an engineer or scientist, MATLAB has the most features and is the best development program in its class.” – IEEE Spectrum Magazine.


3.5 DESIGN SUMMARY

The chapter first presented the overall design of the system and then explained the design of the system in detail. It discussed the procedures and tools involved in the development of various design components. Lastly, the chapter compared technologies that will be employed to implement the system and justified the chosen technology. The next chapter gives a comprehensive discussion on how the proposed design is executed using the chosen development tool.


Chapter 4 : IMPLEMENTATION AND TESTING

4.1 INTRODUCTION

Following the incremental waterfall model, the next step after a feasible design has been produced is to implement and then test it. This chapter details the execution of the prescribed design, presenting a systematic implementation of the design components along with their testing. According to the project process mentioned in Section 1.5.5, the implementation of the system will be broken down into smaller parts that will be separately implemented and tested. The chosen programming language was MATLAB, and the following discussion describes how each component was implemented in that language.

Before these components are implemented, test data needs to be collected with which each of the components will be tested individually, so as to ensure that each component works well before the system is integrated as a whole. Appendix F shows the apparatus used in the project to capture images. The setup consists of a web camera mounted on top of the monitor and a TV antenna as the platform on which to place the document pages to be captured. The flexibility of the antenna allows the page to be rotated out of the image plane. The environment used to capture the images has to be kept constant in order to ensure that other factors do not impede the performance of the system. For all experiments the following constraints will be applicable:

1. The camera will be placed 19 inches from the document plane (as mentioned in section 2.7).
2. The camera will be focused on the centre of the document page.
3. Illumination will remain constant.
4. Image resolution will be kept at 640x480.
5. The font size will be 32 pts (as mentioned in section 2.7).
6. The OCR package used will be ‘Page Cam’.

A number of images were captured using the standard apparatus of the project. The images were captured at different angles of orientation of the page (0º - 50º) from the camera axis. The degree of rotation was measured using a protractor and a compass. Appendix D shows this sample test data.


IMPLEMENTATION AND TESTING

This data set is not exhaustive and a complete set will be collected at a later stage to test the integrated system. In accordance with the design mentioned in the previous chapter, the implementation will be divided into the following phases, and each of these segments will be tested before the next component is developed:

• Thresholding
• Edge Detection
• Regression fitting
• Corner detection
• Quadrilateral-to-Rectangle Mapping

Figure 4.1 shows a diagrammatic view of the implementation and testing process.

Figure 4.1 : Implementation and Testing process

4.2 COMPONENTS IMPLEMENTATION AND TESTING

4.2.1 Thresholding

The design specifies the global iterative threshold method to segment the page from the rest of the image. The optimal threshold of each image was calculated using the isodata method [22] implemented in MATLAB. This optimal threshold was then used to produce a thresholded image by mapping the pixels with grey levels below the threshold to 0 and those above it to 255. The results are shown in Figure 4.2, which illustrates a greyscale image thresholded using this method; the calculated threshold value was 145.

(a)

(b)

Figure 4.2: (a) Original greyscale image rotated at 30º from the camera axis, (b) Thresholded image with a threshold of 145.

This method was tested for all the images in the test data collected. The threshold calculated and the thresholded image thus produced, were found to be optimum in each case. Table 4.1 summarises the test results for each of the test images.

Table 4.1: Test results for iterative thresholding

Test Image | Calculated Optimal Threshold | Threshold Outcome
Image 1    | 145                          | Success
Image 2    | 146                          | Success
Image 3    | 146                          | Success
Image 4    | 147                          | Success
Image 5    | 149                          | Success


4.2.2 Edge Detection

The edges of the page were detected using the Sobel edge detector, which is implemented in MATLAB’s inbuilt function edge. This method was incorporated into the program and tested on the thresholded images produced in the previous phase. Figure 4.3 illustrates the effect of performing edge detection on the thresholded image shown in Figure 4.2(b).

Figure 4.3: Edge detection image produced from Figure 4.2(b)

Each of the test images produced good edges. A good edge is defined as a continuous edge, without any breaking points.

4.2.3 Regression Fitting

As mentioned in the design, a set of points first needs to be detected to which a regression line will be fitted. To collect such points on the left and the right edges, the following pseudocode was executed, where m and n are the dimensions of an (m x n) image:

    for each vertical position in {m/2-40, m/2-20, m/2, m/2+20, m/2+40}
        Scan the image row from left to right
        Mark the first white pixel encountered as a left edge pixel
        Mark the last white pixel encountered as a right edge pixel
        Append the pixels found to the respective arrays
    end for


The code saves the points of the left and right edges in respective arrays. These points are then used to fit regression lines. The in-built backslash operator ( \ ) in MATLAB performs the regression, finding equations of the form y = m x + c for the left and the right edges of the page. The top and the bottom edges of the page are represented by equations of the form y = c, since they are parallel to the x-axis (discussed in Chapter 3, Section 3.2.4). The following pseudocode finds the respective points (c) on the top and the bottom edges of the page.

    At the horizontal centre of the image (column n/2), scan from top to bottom
        Mark the first white pixel encountered as the top edge pixel
        Mark the last white pixel encountered as the bottom edge pixel
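The two scans above can be sketched together in Python (an illustration; the project implements them in MATLAB). White pixels are assumed to have the value 255 in the binary edge image, and the row spacing of 20 pixels follows the design:

```python
def side_edge_points(edge, num=5, spacing=20):
    """At num equally spaced rows around the vertical centre of a binary
    edge image, record the first and last white pixel of each row as
    points on the left and right page edges respectively."""
    m = len(edge)
    left, right = [], []
    for k in range(-(num // 2), num // 2 + 1):
        y = m // 2 + k * spacing
        xs = [x for x, p in enumerate(edge[y]) if p == 255]
        if xs:
            left.append((xs[0], y))
            right.append((xs[-1], y))
    return left, right

def top_bottom_points(edge):
    """Scan the centre column top to bottom for the first and last white
    pixel, giving the y-intercepts of the horizontal top/bottom edges."""
    n = len(edge[0])
    ys = [y for y, row in enumerate(edge) if row[n // 2] == 255]
    return (ys[0], ys[-1]) if ys else (None, None)
```

The returned side points feed the regression fit, and the two centre-column hits give the constants c of the horizontal top and bottom edge equations.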

4.2.4 Corner Detection

Based on the mathematics discussed in Section 3.2.5, the corners of the page were determined by writing simple code which uses the equations found earlier through regression. Figure 4.4 demonstrates corner detection on the image shown in Figure 4.2; the red circles mark the detected corner points. Note that in the given example the top left corner is not shown, as it lies outside the image.

Figure 4.4 : (a) Image from figure 4.2, (b) Depicts corner detection in the edge image

4.2.5 Quadrilateral-to-Rectangle Mapping


The final step after the detection of the corner points of the page was to map them onto a rectangle so that an undistorted image is produced. This was implemented based on the mathematics and principles mentioned in section 2.6.2, along with the mapping techniques discussed in section 2.6.3. The implementation of the quadrilateral-to-rectangle mapping technique took longer than had been expected and planned. This was due to inconsistencies in the referenced document mentioned in section 2.6.2, which were identified only after a great deal of testing and implementation of the technique. The mapping shown in section 2.6.2, which was taken from Kim et al [15], did not list the points in cyclic order, and nowhere in their work is it mentioned that the points need to be in a cyclic order. This was realised through trial and error, and Figure 4.5 shows the correct mapping.

Figure 4.5 : Corrected version of the mapping shown in Figure 2.8

Moreover, equation 2.10 given in the text was incorrect; the corrected version is given by equation 4.1 below:

(Equation 4.1)

These corrections were later found to be in accordance with another paper on the topic by Heckbert [23]. The corrected version of the code was then implemented and it succeeded in mapping a quadrilateral onto a rectangle. However, due to this delay the project process had to be rescheduled; Figure B.3 shows the revised plan. As mentioned in the design (section 3.3), both interpolation schemes were implemented and tested here. It was observed that the two techniques took almost equal computational time in MATLAB, but bilinear interpolation produced a smoother image than nearest-neighbour interpolation. Figures 4.6 (a) and (b) show the same parts of two images produced using the two interpolation methods. It can be clearly seen that the corresponding marked areas are smoother in 4.6(b); hence the bilinear method was chosen for the final implementation. Figure 4.6(c) shows the final undistorted image produced using bilinear interpolation and indicates the part used to compare the interpolation schemes.

Figure 4.6: (a) Nearest-neighbour interpolation, (b) Bi-linear interpolation, and (c) Final image produced after mapping the original image shown in Figure 4.2(a) onto a rectangle using the bi-linear method.

The code was then tested with all the images in the test data, and the system produced rectangular undistorted images for all the distorted input images.

4.3 SYSTEM TESTING

According to Deutsch [24], software development encompasses a sequence of building tasks, each of which provides ample opportunity for human error due to the inability of humans to perform and communicate perfectly. Hence, to certify that the produced system maintains quality standards, it is essential to test it for such errors. Boehm [25] suggests that system testing also verifies and validates the system: verification checks whether the product is being built right, whereas validation checks whether the right product is being built. All the individual components of the system have been implemented and tested positively. This section aims to test the system as a whole using the OCR package. This involves passing the undistorted images produced by the system to the OCR engine in order to recognise the text. The criteria for success in testing can be traced back to the functional requirements of the system as mentioned in section 2.8.1. The system is said to have passed testing if it satisfies all the functional requirements. According to the functional requirements, the success criteria are as follows:

• For a given input image with perspective distortion, the system produces an output image free of the distortion. The system does this for images with at least up to 30° rotation out of the image plane.
• The OCR accuracy is increased by the system.
• The OCR package recognises at least 70% of the characters in the improved text document image.

4.4 TEST PARAMETERS

All the conditions mentioned in section 4.1 remain applicable for the rest of the testing process. There are two parameters involved in the performance of the system:

1. The axis of rotation: assuming the camera axis to be the z-axis, the document page can be rotated either about the x-axis or the y-axis.
2. Degree of rotation: about each axis, the document can be rotated to different degrees, in both the positive and negative directions.

Figure 4.7 illustrates examples of such rotations.

(a)

(b)

(c)

(d)

Figure 4.7: Images rotated along (a) x-axis +30°, (b) x-axis -30°, (c) y-axis +30° and (d) y-axis -30°.


The contents of the document pages could be considered a further test variable, i.e., documents with non-textual content such as pictures could be used to test the system. However, the algorithm used in the system is independent of the content of the page and depends only on the page borders. Hence testing with documents containing non-textual information is not required.

4.5 TEST DATA

Test data was collected by varying the parameters mentioned in the previous section. Images were captured by rotating the document page in the range -50° to +50° about the x-axis and -40° to +40° about the y-axis. Appendix D contains samples of the test data and Appendix E contains an example of the whole process on an image rotated +20° about the y-axis.

4.6 TEST RESULTS

The table below shows the test results when the system was tested with the test data. Each result is marked as a success or a failure (green or red in the colour-coded legend), where success means at least 70% of characters were recognised after correction, in line with section 4.3.

Serial No. | Rotation axis | Degree of rotation | OCR before correction (%) | OCR after correction (%) | Result
1  | x | +10° | 100 | 100 | Success
2  | x | +20° | 96  | 97  | Success
3  | x | +30° | 96  | 97  | Success
4  | x | +40° | 77  | 88  | Success
5  | x | +45° | 70  | 75  | Success
6  | x | +50° | 61  | 66  | Failure
7  | x | -10° | 100 | 100 | Success
8  | x | -20° | 97  | 98  | Success
9  | x | -30° | 96  | 98  | Success
10 | x | -40° | 76  | 85  | Success
11 | x | -45° | 69  | 73  | Success
12 | x | -50° | 63  | 66  | Failure
13 | y | +10° | 98  | 100 | Success
14 | y | +20° | 96  | 98  | Success
15 | y | +30° | 64  | 97  | Success
16 | y | +35° | 45  | 81  | Success
17 | y | +40° | 15  | 72  | Success
18 | y | +45° | 10  | 57  | Failure
19 | y | -10° | 98  | 100 | Success
20 | y | -20° | 94  | 98  | Success
21 | y | -30° | 62  | 96  | Success
22 | y | -35° | 50  | 84  | Success
23 | y | -40° | 17  | 74  | Success
24 | y | -45° | 12  | 58  | Failure

4.7 RESULTS ANALYSIS

The system test results show successes and failures at various points. When the document was rotated about the x-axis, the OCR package performed fairly well on the original images, but the system-produced images improved its performance in every case. The system produced positive results up to 45°, where more than seventy percent of characters were recognised by the OCR engine. When the document page was rotated about the y-axis, the system showed a drastic improvement in text recognition. On the original images, with perspective distortion, the OCR package’s accuracy plummeted for rotations of more than 20° out of the image plane. In contrast, the images produced by the system succeeded for rotations of up to 40° out of the image plane.


The sudden drop in OCR performance for images rotated about the y-axis was due to the fact that in these images (as shown in Figure 4.8), the lines of text towards the top and the bottom of the page were at a large angle with respect to the horizontal. The OCR package is incapable of reading lines with more than ±2° of rotation about the z-axis [7]. The images produced by the system, after correcting the perspective distortion, had all the lines parallel to the horizontal and hence gave good results with the OCR.

Figure 4.8: Illustrating perspective

System testing shows that the program satisfies all the functional requirements and thus is marked as a success.

4.8 SUMMARY

The chapter presented a systematic implementation of the various components of the system along with their individual testing. It then detailed the process of testing the integrated system, starting with the importance of system testing. Subsequently, the variables that affect the performance of the system were identified, and the test data collected by varying these variables was described. The chapter then presented the test results gathered by testing the system on the collected data set. Finally, the test results were analysed and conclusions drawn. The next chapter evaluates the project and the developed system.


Chapter 5 : EVALUATION

5.1 INTRODUCTION

This chapter evaluates the project as a whole and the developed software system. The project is evaluated against the minimum requirements mentioned in section 1.4, and the system against the functional and non-functional requirements discussed in section 2.8. Subsequently, the chapter sheds light on possible enhancements to the system and the scope for future work in this field. The last section summarises the evaluation.

5.2 EVALUATION OF THE PROJECT

The minimum requirements were delivered in the mid-project report, were not modified, and are restated as is in section 1.4 of this report. The criterion for the success of the project was to fulfil these minimum requirements. The first minimum requirement states that the problems confronted in performing OCR, and the limitations of using web cams to capture document images, need to be understood. This requirement has been fulfilled in sections 2.2 and 2.3 of this report. The second minimum requirement was to acquire and develop an understanding of the previous work done in the field of solving perspective distortion for OCR. This requirement has been met, as detailed in sections 2.5 and 2.6. The next requirement was to produce and implement an algorithm to correct the perspective distortion. The design of such an algorithm has been discussed in Chapter 3 and its implementation in Chapter 4. During the implementation of the algorithm, various problems were encountered, as described in Chapter 4. The major problem was correcting the formulae and concepts in past work that was used as part of the system. The identification and correction of such errors was beyond the expected scope of the project and demonstrates extended effort in coping with the impediments encountered. Minimum requirements 4 and 5 state that, under normal lighting conditions, OCR should perform well for documents rotated at various degrees out of the image plane. This has been fulfilled, as detailed in section 4.3 of the report.


EVALUATION

The above evaluation shows that all the minimum requirements were met by the project, and hence the project is evaluated as a success.

5.3 EVALUATION OF THE S YSTEM

The success criteria for the system were laid down in the requirements specification in section 2.8. This included two types of requirements: functional and non-functional. The system will be evaluated separately against each.

5.3.1 Against functional requirements

The first functional requirement stated that the system should be able to correct perspective distortion in images rotated up to 30° out of the image plane. The test results given in section 4.6 show that the system could correct perspective distortion in images rotated up to ±45° out of the image plane about the x-axis and up to ±40° about the y-axis. The system has thus not only satisfied this requirement but exceeded it. The second functional requirement stated that the system should increase the accuracy of the OCR package; this is clearly evident from the test results. The third functional requirement asserted that at least seventy percent of the characters in the produced undistorted document image should be recognised. Once again, this can be seen directly from the test outcomes.

5.3.2 Against non-functional requirements

As mentioned in section 2.8.2, the system will be evaluated against the non-functional requirements, beginning from the second quality characteristic laid down by ISO 9126.

Reliability
System constraints were set as mentioned in section 2.4 and later in section 4.1. While the system is operated within these constraints, it has been found to work well, as seen in the previous section. The system is not required to work outside these settings and hence this trait is not applicable to system evaluation there.


Usability
Setting up the environment to capture images takes time. However, once this is done, the system itself is operated by typing a single command at the command prompt, which requires little effort. Hence the system is found to be easy to use.

Efficiency
From the tests carried out, it was noted that the OCR package takes about 6-7 seconds to perform OCR on the text documents. The correction algorithm takes a similar amount of time to correct the image. The trade-off between time required and quality achieved is perfectly acceptable for the results obtained. Hence the program is classified as ‘efficient’.

Maintainability
The program has been implemented in MATLAB, which is itself an easy-to-understand language. Moreover, the code produced is modular and all the modules have been commented appropriately so that they can be understood even by a novice programmer. Thus the system is ‘maintainable’.

Portability
The system is built in MATLAB and must be executed within it; hence it is as portable as the package itself. The official MATLAB website [26] states that MATLAB works on various platforms such as Windows, Linux, Unix, Macintosh and Solaris, which constitute the majority of platforms used today. The system is therefore categorised as highly ‘portable’.

The system is found to satisfy all the applicable non-functional requirements and hence is evaluated positively.

5.4 FUTURE W ORK

The project has presented an approach that begins the exploration of performing OCR on images suffering from perspective distortion, and it provides a good foundation for prospective students wanting to explore the research area further.


Nowadays, handheld devices with built-in cameras, such as PDAs and mobile phones, are becoming increasingly popular due to their portability and cost effectiveness. The project could be extended to apply perspective distortion correction to images captured using such contemporary handheld devices. These devices pose greater problems, the biggest being poor image resolution. Thouin and Chang [27] have presented an approach to restore low-resolution images for the purpose of OCR by producing strong bimodal images. More realistically, images may be captured under poor lighting conditions, which would introduce lighting variations such as shadows in the document image. This is another area that needs to be explored in order to apply the system to real-world scenarios. The system developed does not deal with images containing skew distortion. Nevertheless, the previous work covered in the background chapter mentions two approaches to restoring document images with both perspective and skew distortions, and the system could be further developed to implement these techniques. The project could also be evolved to cater for distortions due to different document orientations and varied paper qualities such as glossy, wrinkled or curved media. Furthermore, the project assumes black text on a white background, which could be extended to cater for multi-coloured documents.

5.5 EVALUATION SUMMARY

The evaluation of the project against the success criteria has shown that the overall project and system were a success. The goals set initially have all been met, and further enhancements to the system have been suggested. It is difficult to compare the results of the project with other approaches, as the particular combination of system constraints, test images and OCR package used for evaluation has not been used by other authors and can greatly affect the results. Moreover, details of the tests performed by other authors are not given in the published papers and are not available elsewhere, which makes a direct comparison of results impossible.


Chapter 6 : CONCLUSION

RECAPITULATION OF THE PROJECT PROCESS

The investigation and development process of the project closely followed the stages of a hybrid of the waterfall and incremental models to build the novel system. First, comprehensive background research was performed to understand the work done in the field of correcting perspective distortion in document images and the complexities involved in using webcams and OCR packages. Further research identified the basic image processing techniques needed to carry out such a task. Given the constraints, a complete implementation of the complex algorithms used to solve the problem was (and is) not possible; hence the important sections were identified and chosen for building the system.

The next phase was to lay out the success criteria that would be used to evaluate the completed system, and this was done through the requirements specification. The following stage was to propose a design independent of the implementation technology; this also involved evaluating the various tools that could be used to develop the system and justifying the one chosen. The implementation phase followed the design phase, adhering to the project methodology: the implementation was modularised into parts that were deployed and tested separately. Once all the modules had tested positive, they were integrated into one system, which was then tested as a whole. The system was finally evaluated against the success criteria laid out at the beginning of the process, and suggestions for future work were presented for students looking to extend the research in this field.

The methodology chosen for the project was appropriate, as it combined the timeliness provided by the waterfall model, the flexibility of a realistic model and the reduction in implementation complexity given by the incremental approach at the implementation stage. This is an ideal methodology for the type of project that was carried out. The project and the system finally satisfied all the requirements stated at the early stages and are thus considered successful in achieving their aim.


References

[1] Avison D. and Fitzgerald G., (1995), Information System Development: Methodologies, Techniques and Tools, 2nd Edition, McGraw-Hill.
[2] Royce W., (1970), Managing the development of large software systems: concepts and techniques, in: Proceedings IEEE WESCON, pp. 1-9.
[3] The Standard Waterfall Model for Systems Development, URL: http://asdwww.larc.nasa.gov/barkstrom/public/The_Standard_Waterfall_Model_For_Systems_Development.htm, [13/01/2004].
[4] General Idea of Iterative Models, URL: http://www.csis.gvsu.edu/~heusserm/CS/CS641/FinalSpiralModel97.ppt, [17/01/2004].
[5] The Spiral Model, URL: http://searchvb.techtarget.com/sDefinition/0,290660,sid8_gci755347,00.html, [18/04/2004].
[6] PageCam, URL: http://www.pagecam.com/, [18/12/2003].
[7] Nicel D., (2003), OCR from a Web Camera, Leeds: University of Leeds, School of Computer Studies, pp. 6-8.
[8] Pilu M., (2001), Extraction of illusory linear clues in perspectively skewed documents, in: Proceedings of the 2001 IEEE Computer Society Conference, vol. 1, pp. I-363 - I-368.
[9] Bruce V. and Green P. R., (1991), Visual Perception: Physiology, Psychology and Ecology, 2nd Edition, Psychology Press.
[10] Sonka M., Hlavac V. and Boyle R., (1993), Image Processing, Analysis and Machine Vision, Chapman & Hall.
[11] Messelodi S. and Modena C.M., (1999), Automatic identification and skew estimation of text lines in real scene images, Pattern Recognition, vol. 32, no. 5, pp. 791-810.
[12] Clark P. and Mirmehdi M., (2001), On the recovery of oriented documents from single images, in: Proceedings of ACIVS 2002.
[13] Bolles R. and Fischler M., (1981), A RANSAC-based approach to model fitting and its application to finding cylinders in range data, in: Proceedings of the 7th International Joint Conference on Artificial Intelligence, pp. 637-643.


[14] Efford N., (2000), Digital Image Processing: A Practical Introduction Using Java, 1st Edition, Pearson Education Ltd.
[15] Kim D., Jang B. and Hwang C., (2002), A planar perspective image matching using point correspondences and rectangle-to-quadrilateral mapping, in: Proceedings of the Fifth IEEE Southwest Symposium on Image Analysis and Interpretation, 7-9 April 2002, pp. 87-91.
[16] Software Project Management, URL: http://www.comp.leeds.ac.uk/se22/lectures/9QualityAndDesignC.pdf, [13/03/2004].
[17] Linear Regression, URL: http://people.hofstra.edu/faculty/Stefan_Waner/RealWorld/calctopic1/regression.html, [18/03/2004].
[18] Thresholding, URL: http://www.cee.hw.ac.uk/hipr/html/threshld.html, [16/03/2004].
[19] Hoffman K. and Kunze R., (1961), Linear Algebra, Englewood Cliffs: Prentice-Hall.
[20] Java, URL: http://philip.greenspun.com/wtr/dead-trees/53008.htm, [26/04/2004].
[21] MATLAB, URL: http://ccrma-www.stanford.edu/~jos/matlab/What_is_Matlab.html, [12/04/2004].
[22] Isodata method, URL: http://www.mathworks.com/matlabcentral/fileexchange/loadCategory.do?objectType=category&objectId=26, [26/03/2004].
[23] Heckbert P., (1989), Fundamentals of Texture Mapping and Image Warping, Master's thesis, UCB/CSD 89/516, CS Division, U.C. Berkeley, pp. 17-20.
[24] Deutsch M., (1982), Software Verification and Validation: Realistic Project Approaches, Prentice Hall.
[25] Boehm B., (1979), Software Engineering: R&D trends and defense needs, in: Research Directions in Software Technology, MIT Press, Cambridge.
[26] MATLAB requirements, URL: http://www.mathworks.com/products/matlab/requirements.html, [20/04/2004].
[27] Thouin P. and Chang C., (2000), A method for restoration of low-resolution document images, in: International Journal on Document Analysis and Recognition (IJDAR), Springer-Verlag, Heidelberg, vol. 2, no. 4, pp. 200-210.


Appendix A: Project Reflection

The motive of this section is to provide my reflection on the project experience, the knowledge I have gained and the lessons I have learnt, in order to advise future students undertaking this type of project.

The project area falls under a branch of Artificial Intelligence known as Document Image Processing. A few of my friends who had done projects in the past suggested that I take a project I was really interested in. After pursuing course modules like AR11 - 'Introduction to Artificial Intelligence', AI21 - 'Image and Speech Processing' and AI31 - 'Computer Vision', I had developed an understanding of, and keen interest in, the field of Artificial Intelligence, and Computer Vision in particular. Apart from this, I also had an interest in programming. A combination of these two aspects led me to the project I chose.

While choosing a project, I came across its short description and read the first line: "The project will investigate how you might perform Optical Character Recognition from a document held in front of a camera in ordinary lighting conditions." My first impression was that I had to develop an OCR tool capable of reading text. I found this extremely challenging and was stimulated to take the project. After I undertook the project and discussed it in further detail with my supervisor, I understood the real problem to be solved and, as can be seen from the project, it was quite different from my perception. This did shake my interest initially, but I regained it after realising the challenge of the real problem. The lesson I learnt, and the advice I would like to give, is that it is very important to understand the problem properly before deciding to undertake a project and before starting to develop a solution.
Due to my options and interests, I had elected modules amounting to fifty credits in my first semester and thirty in my second, apart from the forty-credit project. This proved to be a great advantage, because, owing to prior engagements with other modules and graduate job applications (which take a lot of time if you are planning to take up a job), I was late in realising the need to start the project early in my first semester. By the end of December I had done little, even towards the mid-project report. But by the end of the first semester I had completed fifty credits of modules, which left me with only three modules in the second semester and much more time to spend on the project. I found that many other students had more credits in their second semester and an equal amount of project work to do; hence it was easier for me to cope, and I had much more time to write up the report at the end. At the beginning of the first semester it seems that there are six months to do the project, and even if you make a start, you will not feel the need to hurry because of the apparent abundance of time.

Thus, I would recommend, firstly, taking more credits in the first semester than in the second. Secondly, and most importantly, it is essential to appreciate how limited the time is and to manage it appropriately throughout the final year, keeping room for unexpected demands (coursework, interviews, sickness and relatives!).

Consult past years' reports. Reading good reports gives a better idea of how to structure the report and of the general concepts used in projects, and comparing good reports with not-so-good ones gives an understanding of the dos and don'ts. But the idea is not to spend too much time on this!

I added to my technical skills by learning a new programming language, MATLAB. I had never programmed in MATLAB before this project, so it was a good learning experience; initially I had problems learning the new language, but I gradually picked it up and it came out well. Overall, this project has been a great learning experience. I am now more confident of undertaking responsibility for large-scale projects, which may soon be required of me when I enter the corporate world.


Appendix B: Project Schedule

Figure B.1: Gantt Chart of the Project Schedule


Figure B.2: Revised Project Schedule 1


Figure B.3: Revised Project Schedule 2


Appendix C: OCR Evaluation

Table C.1: Test results of ‘PageCam’ (source: [12])


Table C.1 (continued…)


Appendix D: Test Samples

The first five pictures in this appendix were used at the implementation stage to test the individual components. The rest of the images were used to test the integrated system.

Figure D.1: +10° along x-axis

Figure D.2: +20° along x-axis


Figure D.3: +30° along x-axis

Figure D.4: +40° along x-axis


Figure D.5: +45° along x-axis

Figure D.6: +50° along x-axis


Figure D.7: +10° along y-axis

Figure D.8: +20° along y-axis


Figure D.9: +30° along y-axis

Figure D.10: +35° along y-axis


Figure D.11: +40° along y-axis

Other test images, which were rotated in the negative direction, were mirror images of the images given above.


Appendix E: The Process

Figure E.1: Thresholded image for Figure D.8 which was rotated +20° along y-axis

Figure E.2: Edge Detected image


Figure E.3: Detected corners in the image

Figure E.4: Final Image produced using backward mapping and bilinear interpolation
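The stages illustrated in Figures E.1-E.4 can be sketched in code. The following is an illustrative Python/NumPy sketch, not the report's MATLAB implementation: the fixed threshold value stands in for the automatic thresholding the system uses, and the `homography_from_corners` helper with four hand-picked corner pairs is a hypothetical example of the rectangle-to-quadrilateral mapping step.

```python
# Illustrative sketch (assumptions: Python/NumPy instead of the report's
# MATLAB code; fixed threshold instead of automatic; hypothetical helper names).
import numpy as np

def threshold(gray, t=128):
    """Binarise a grayscale image with a fixed threshold (cf. Figure E.1)."""
    return (gray >= t).astype(np.uint8) * 255

def homography_from_corners(src, dst):
    """Solve the 8 unknowns of a plane projective map from 4 point pairs,
    i.e. a rectangle-to-quadrilateral (or quadrilateral-to-rectangle) mapping."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)  # h33 fixed to 1

def backward_map(img, H, out_shape):
    """For each output pixel, map back through H^-1 and bilinearly
    interpolate the four surrounding source pixels (cf. Figure E.4)."""
    Hinv = np.linalg.inv(H)
    h, w = out_shape
    out = np.zeros((h, w), dtype=float)
    for r in range(h):
        for c in range(w):
            x, y, s = Hinv @ np.array([c, r, 1.0])
            x, y = x / s, y / s          # de-homogenise
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            if 0 <= x0 < img.shape[1] - 1 and 0 <= y0 < img.shape[0] - 1:
                fx, fy = x - x0, y - y0  # fractional offsets
                out[r, c] = ((1 - fx) * (1 - fy) * img[y0, x0]
                             + fx * (1 - fy) * img[y0, x0 + 1]
                             + (1 - fx) * fy * img[y0 + 1, x0]
                             + fx * fy * img[y0 + 1, x0 + 1])
    return out
```

Mapping backwards from the output grid, rather than forwards from the source, guarantees every output pixel receives a value and avoids the holes that forward mapping leaves; this is the standard reason for the backward-mapping choice the figures illustrate.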


Appendix F: Experiment Apparatus

Figure F.1: The setting of the apparatus used in the project. Web camera mounted on top of the monitor and the document placed on a TV antenna that provides an adjustable perspective angle.


Figure F.2: Another setting

Figure F.3: The TV antenna used to place the document, and the protractor used to measure the perspective angles.
