Preface - GitHub - P.PDFKUL.COM


<OL> and </OL>, indicating an ordered list, i.e., an enumeration of list items.

    The things I hate:

      1. Moldy bread.
      2. People who drive too slow in the fast lane.

    (a) The text as viewed

    <P>The things I <EM>hate</EM>:
    <OL>
    <LI>Moldy bread.
    <LI>People who drive too slow
    in the fast lane.
    </OL>

    (b) The HTML source

Figure 5.12: An HTML document and its printed version

We also see two examples of unmatched tags: <P> and <LI>, which introduce paragraphs and list items, respectively. HTML allows, indeed encourages, that these tags be matched by </P> and </LI> at the ends of paragraphs and list items, but it does not require the matching. We have therefore left the matching tags off, to provide some complexity to the sample HTML grammar we shall develop.[3]

There are a number of classes of strings that are associated with an HTML document. We shall not try to list them all, but here are the ones essential to the understanding of text like that of Example 5.22. For each class, we shall introduce a variable with a descriptive name.

[3] Sometimes the introducing tag <x> has more information in it than just the name x for the tag. However, we shall not consider that possibility in examples.

5.3.

APPLICATIONS OF CONTEXT-FREE GRAMMARS

1. Text is any string of characters that can be literally interpreted; i.e., it has no tags. An example of a Text element in Fig. 5.12(a) is "Moldy bread."

2. Char is any string consisting of a single character that is legal in HTML text. Note that blanks are included as characters.

3. Doc represents documents, which are sequences of "elements." We define elements next, and that definition is mutually recursive with the definition of a Doc.

4. Element is either a Text string, or a pair of matching tags and the document between them, or an unmatched tag followed by a document.

5. ListItem is the <LI> tag followed by a document, which is a single list item.

6. List is a sequence of zero or more list items.

    1. Char     → a | A | ...
    2. Text     → ε | Char Text
    3. Doc      → ε | Element Doc
    4. Element  → Text | <EM> Doc </EM> | <P> Doc | <OL> List </OL> | ...
    5. ListItem → <LI> Doc
    6. List     → ε | ListItem List

Figure 5.13: Part of an HTML grammar

Figure 5.13 is a CFG that describes as much of the structure of the HTML language as we have covered.

In line (1) it is suggested that a character can be "a" or "A" or many other possible characters that are part of the HTML character set. Line (2) says, using two productions, that Text can be either the empty string, or any legal character followed by more text. Put another way, Text is zero or more characters. Note that < and > are not legal characters, although they can be represented by the sequences &lt; and &gt;, respectively. Thus, we cannot accidentally get a tag into Text.

Line (3) says that a document is a sequence of zero or more "elements." An element in turn, we learn at line (4), is either text, an emphasized document, a

CHAPTER 5.


CONTEXT-FREE GRAMMARS AND LANGUAGES

paragraph-beginning followed by a document, or a list. We have also suggested that there are other productions for Element, corresponding to the other kinds of tags that appear in HTML. Then, in line (5) we find that a list item is the <LI> tag followed by any document, and line (6) tells us that a list is a sequence of zero or more list items.
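The reading of lines (1) through (6) above can be tried out with a small recognizer. The following Python sketch is an illustration, not part of the grammar's definition: it assumes the tags are spelled exactly <EM>, </EM>, <P>, <OL>, </OL> and <LI> in upper case, treats every character other than < and > as a legal Char, and ignores the unspecified "other productions" for Element. The function names (`tokenize`, `is_doc`) are hypothetical.

```python
import re

# A token is either a whole tag or one legal character (anything but < and >).
TOKEN = re.compile(r"<[^<>]*>|[^<>]")

def tokenize(s):
    tokens = TOKEN.findall(s)
    if sum(len(t) for t in tokens) != len(s):
        raise ValueError("stray < or > in input")
    return tokens

def is_doc(s):
    """Return True if s can be derived from Doc in the grammar fragment."""
    try:
        toks = tokenize(s)
    except ValueError:
        return False
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def expect(tag):
        nonlocal pos
        if peek() != tag:
            return False
        pos += 1
        return True

    def doc():
        # Doc -> epsilon | Element Doc
        nonlocal pos
        while True:
            t = peek()
            if t is None:
                return True
            if not t.startswith("<"):        # a Char, i.e. part of a Text
                pos += 1
            elif t == "<EM>":                # Element -> <EM> Doc </EM>
                pos += 1
                if not (doc() and expect("</EM>")):
                    return False
            elif t == "<P>":                 # Element -> <P> Doc
                pos += 1
                if not doc():
                    return False
            elif t == "<OL>":                # Element -> <OL> List </OL>
                pos += 1
                if not (lst() and expect("</OL>")):
                    return False
            else:
                return True                  # token cannot begin an Element

    def lst():
        # List -> epsilon | ListItem List ; ListItem -> <LI> Doc
        nonlocal pos
        while peek() == "<LI>":
            pos += 1
            if not doc():
                return False
        return True

    return doc() and pos == len(toks)
```

Greedy parsing is safe here because none of the tokens that can follow a Doc (a closing tag, <LI>, or the end of input) can begin an Element. For instance, `is_doc("<P>The things I <EM>hate</EM>:<OL><LI>Moldy bread.</OL>")` accepts, while `is_doc("<EM>unclosed")` rejects because the matching </EM> never arrives.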

Some aspects of HTML do not require the power of context-free grammars; regular expressions are adequate. For example, lines (1) and (2) of Fig. 5.13 simply say that Text represents the same language as does the regular expression (a + A + ...)*. However, some aspects of HTML do require the power of CFG's. For instance, each pair of tags that are a corresponding beginning and ending pair, e.g., <EM> and </EM>, is like balanced parentheses, which we already know are not regular.

5.3.4 XML and Document-Type Definitions

The fact that HTML is described by a grammar is not in itself remarkable. Essentially all programming languages can be described by their own CFG's, so it would be more surprising if we could not so describe HTML. However, when we look at another important markup language, XML (eXtensible Markup Language), we find that the CFG's play a more vital role, as part of the process of using that language.

The purpose of XML is not to describe the formatting of the document; that is the job for HTML. Rather, XML tries to describe the "semantics" of the text. For example, text like "12 Maple St." looks like an address, but is it? In XML, tags would surround a phrase that represented an address; for example:

    <ADDR>12 Maple St.</ADDR>

However, it is not immediately obvious that <ADDR> means the address of a building. For instance, if the document were about memory allocation, we might expect that the <ADDR> tag would refer to a memory address. To make clear what the different kinds of tags are, and what structures may appear between matching pairs of these tags, people with a common interest are expected to develop standards in the form of a DTD (Document-Type Definition).

A DTD is essentially a context-free grammar, with its own notation for describing the variables and productions. In the next example, we shall show a simple DTD and introduce some of the language used for describing DTD's. The DTD language itself has a context-free grammar, but it is not that grammar we are interested in describing. Rather, the language for describing DTD's is essentially a CFG notation, and we want to see how CFG's are expressed in this language.

The form of a DTD is

    <!DOCTYPE name-of-DTD [
        list of element definitions
    ]>

An element definition, in turn, has the form

    <!ELEMENT element-name (description of the element)>

Element descriptions are essentially regular expressions. The basis of these expressions are:

1. Other element names, representing the fact that elements of one type can appear within elements of another type, just as in HTML we might find emphasized text within a list.

2. The special term #PCDATA, standing for any text that does not involve XML tags. This term plays the role of variable Text in Example 5.22.

The allowed operators are:

1. | standing for union, as in the UNIX regular-expression notation discussed in Section 3.3.1.

2. A comma, denoting concatenation.

3. Three variants of the closure operator, as in Section 3.3.1. These are *, the usual operator meaning "zero or more occurrences of," +, meaning "one or more occurrences of," and ?, meaning "zero or one occurrence of."

Parentheses may group operators to their arguments; otherwise, the usual precedence of regular-expression operators applies.

Example 5.23: Let us imagine that computer vendors get together to create a standard DTD that they can use to publish, on the Web, descriptions of the various PC's that they currently sell. Each description of a PC will have a model number, and details about the features of the model, e.g., the amount of RAM, number and size of disks, and so on. Figure 5.14 shows a hypothetical, very simple, DTD for personal computers.

The name of the DTD is PcSpecs. The first element, which is like the start symbol of a CFG, is PCS (list of PC specifications). Its definition, PC*, says that a PCS is zero or more PC entries.

We then see the definition of a PC element. It consists of the concatenation of five things. The first four are other elements, corresponding to the model, price, processor type, and RAM of the PC. Each of these must appear once, in that order, since the comma represents concatenation. The last constituent, DISK+, tells us that there will be one or more disk entries for a PC.

Many of the constituents are simply text; MODEL, PRICE, and RAM are of this type. However, PROCESSOR has more structure. We see from its definition that it consists of a manufacturer, model, and speed, in that order; each of these elements is simple text.


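    <!DOCTYPE PcSpecs [
        <!ELEMENT PCS (PC*)>
        <!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>
        <!ELEMENT MODEL (#PCDATA)>
        <!ELEMENT PRICE (#PCDATA)>
        <!ELEMENT PROCESSOR (MANF, MODEL, SPEED)>
        <!ELEMENT MANF (#PCDATA)>
        ...
    ]>

Figure 5.14: A DTD for personal computers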

A DISK entry is the most complex. First, a disk is either a hard disk, CD, or DVD, as indicated by the rule for element DISK, which is the OR of three other elements. Hard disks, in turn, have a structure in which the manufacturer, model, and size are specified, while CD's and DVD's are represented only by their speed.

Figure 5.15 is an example of an XML document that conforms to the DTD of Fig. 5.14. Notice that each element is represented in the document by a tag with the name of that element and a matching tag at the end, with an extra slash, just as in HTML. Thus, in Fig. 5.15 we see at the outermost level the tag <PCS>...</PCS>. Inside these tags appears a list of entries, one for each PC sold by this manufacturer; we have only shown one such entry explicitly.

Within the illustrated <PC> entry, we can easily see that the model number is 4560, the price is $2295, and it has an 800MHz Intel Pentium processor. It has 256Mb of RAM, a 30.5Gb Maxtor Diamond hard disk, and a 32x CD-ROM reader. What is important is not that we can read these facts, but that a program could read the document, and guided by the grammar in the DTD of Fig. 5.14 that it has also read, could interpret the numbers and names in Fig. 5.15 properly.
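One way a program might be "guided by the grammar" is to treat each element rule as a regular expression over the names of an element's children. The Python sketch below is only an illustration under stated assumptions: the `RULES` table restates the rules quoted in the text, the HARDDISK rule and the element name SIZE are assumed spellings for "manufacturer, model, and size," any element without an entry is treated as (#PCDATA), and the helper names `model_to_regex` and `conforms` are hypothetical. It is not a full DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# Content models for elements that have child elements. Elements that are
# missing from this table are treated as (#PCDATA): text only, no children.
RULES = {
    "PCS": "PC*",
    "PC": "MODEL, PRICE, PROCESSOR, RAM, DISK+",
    "PROCESSOR": "MANF, MODEL, SPEED",
    "DISK": "HARDDISK | CD | DVD",
    "HARDDISK": "MANF, MODEL, SIZE",   # assumed element names
    "CD": "SPEED",
    "DVD": "SPEED",
}

def model_to_regex(model):
    """Turn a content model into a regex over strings like 'MODEL;PRICE;'."""
    # Wrap each element name in a group that also consumes a ';' terminator.
    body = re.sub(r"[A-Za-z_]\w*", lambda m: "(?:%s;)" % m.group(0), model)
    # The DTD comma means concatenation, which a regex expresses by adjacency.
    return re.compile(body.replace(",", "").replace(" ", ""))

def conforms(elem):
    """Check elem and all of its descendants against RULES."""
    children = list(elem)
    model = RULES.get(elem.tag)
    if model is None:
        return not children
    seq = "".join(child.tag + ";" for child in children)
    ok_here = model_to_regex(model).fullmatch(seq) is not None
    return ok_here and all(conforms(child) for child in children)

DOC = """<PCS><PC>
  <MODEL>4560</MODEL><PRICE>$2295</PRICE>
  <PROCESSOR><MANF>Intel</MANF><MODEL>Pentium</MODEL><SPEED>800MHz</SPEED></PROCESSOR>
  <RAM>256</RAM>
  <DISK><CD><SPEED>32x</SPEED></CD></DISK>
</PC></PCS>"""

print(conforms(ET.fromstring(DOC)))
```

The operators |, *, + and ? carry over to Python's regular-expression syntax unchanged, so only the names and the commas need translating. A PC entry that omits its DISK children fails the check, since DISK+ demands at least one.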

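    <PCS>
        <PC>
            <MODEL>4560</MODEL>
            <PRICE>$2295</PRICE>
            <PROCESSOR>
                <MANF>Intel</MANF>
                <MODEL>Pentium</MODEL>
                <SPEED>800MHz</SPEED>
            </PROCESSOR>
            <RAM>256</RAM>
            <DISK><HARDDISK>
                <MANF>Maxtor</MANF>
                <MODEL>Diamond</MODEL>
                <SIZE>30.5Gb</SIZE>
            </HARDDISK></DISK>
            <DISK><CD>
                <SPEED>32x</SPEED>
            </CD></DISK>
        </PC>
        <PC>
            ...
        </PC>
    </PCS>

Figure 5.15: Part of a document obeying the structure of the DTD in Fig. 5.14

You may have noticed that the rules for the elements in DTD's like Fig. 5.14 are not quite like productions of context-free grammars. Many of the rules are of the correct form. For instance,

    <!ELEMENT PROCESSOR (MANF, MODEL, SPEED)>

is analogous to the production

    Processor → Manf Model Speed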

However, the rule

    <!ELEMENT DISK (HARDDISK | CD | DVD)>

does not have a definition for DISK that is like a production body. In this case, the extension is simple: we may interpret this rule as three productions, with the vertical bar playing the same role as it does in our shorthand for productions having a common head. Thus, this rule is equivalent to the three productions

    Disk → HardDisk | Cd | Dvd

The most difficult case is

    <!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>

where the "body" has a closure operator within it. The solution is to replace DISK+ by a new variable, say Disks, that generates, via a pair of productions, one or more instances of the variable Disk. The equivalent productions are thus:

    PC → Model Price Processor Ram Disks
    Disks → Disk | Disk Disks

There is a general technique for converting a CFG with regular expressions as production bodies to an ordinary CFG. We shall give the idea informally; you may wish to formalize both the meaning of CFG's with regular-expression productions and a proof that the extension yields no new languages beyond the CFL's. We show, inductively, how to convert a production with a regular-expression body to a collection of equivalent ordinary productions. The induction is on the size of the expression in the body.

BASIS: If the body is the concatenation of elements, then the production is already in the legal form for CFG's, so we do nothing.

INDUCTION: Otherwise, there are five cases, depending on the final operator used.

1. The production is of the form A → E1, E2, where E1 and E2 are expressions permitted in the DTD language. This is the concatenation case. Introduce two new variables, B and C, that appear nowhere else in the grammar. Replace A → E1, E2 by the productions

       A → BC
       B → E1
       C → E2

   The first production, A → BC, is legal for CFG's. The last two may or may not be legal. However, their bodies are shorter than the body of the original production, so we may inductively convert them to CFG form.

2. The production is of the form A → E1 | E2. For this union operator, replace this production by the pair of productions:

       A → E1
       A → E2

   Again, these productions may or may not be legal CFG productions, but their bodies are shorter than the body of the original. We may therefore apply the rules recursively and eventually convert these new productions to CFG form.

3. The production is of the form A → (E1)*. Introduce a new variable B that appears nowhere else, and replace this production by:

       A → BA
       A → ε
       B → E1

4. The production is of the form A → (E1)+. Introduce a new variable B that appears nowhere else, and replace this production by:

       A → BA
       A → B
       B → E1

5. The production is of the form A → (E1)?. Replace this production by:

       A → ε
       A → E1

Example 5.24: Let us consider how to convert the DTD rule

    <!ELEMENT PC (MODEL, PRICE, PROCESSOR, RAM, DISK+)>

to legal CFG productions. First, we can view the body of this rule as the concatenation of two expressions, the first of which is MODEL, PRICE, PROCESSOR, RAM and the second of which is DISK+. If we create variables for these two subexpressions, say A and B, respectively, then we can use the productions:

    PC → AB
    A → Model Price Processor Ram
    B → Disk+

Only the last of these is not in legal form. We introduce another variable C and the productions:

    B → CB | C
    C → Disk

In this special case, because the expression that A derives is just a concatenation of variables, and Disk is a single variable, we actually have no need for the variables A or C. We could use the following productions instead:

    PC → Model Price Processor Ram B
    B → Disk B | Disk
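The basis and the five inductive cases above can be sketched in Python as follows. This is an informal illustration, not a formalization: the tuple encoding of expressions ("cat", "alt", "star", "plus", "opt"), the function names, and the generated variable names X1, X2, ... are all assumptions of the sketch, and the generated names are simply assumed not to clash with existing variables.

```python
from itertools import count

def is_plain(expr):
    """Basis: a single element name, or a concatenation of element names."""
    if isinstance(expr, str):
        return True
    return expr[0] == "cat" and all(is_plain(e) for e in expr[1:])

def flatten(expr):
    """List the element names of a plain expression, left to right."""
    if isinstance(expr, str):
        return [expr]
    return [name for e in expr[1:] for name in flatten(e)]

def convert(head, expr, fresh=None):
    """Return ordinary productions (head, body) equivalent to head -> expr."""
    if fresh is None:
        fresh = ("X%d" % i for i in count(1))   # supply of new variables
    if is_plain(expr):                          # basis: already legal
        return [(head, flatten(expr))]
    op = expr[0]
    if op == "cat":                             # case 1: A -> E1, E2
        b, c = next(fresh), next(fresh)
        return ([(head, [b, c])]
                + convert(b, expr[1], fresh) + convert(c, expr[2], fresh))
    if op == "alt":                             # case 2: A -> E1 | E2
        return convert(head, expr[1], fresh) + convert(head, expr[2], fresh)
    if op == "star":                            # case 3: A -> (E1)*
        b = next(fresh)
        return [(head, [b, head]), (head, [])] + convert(b, expr[1], fresh)
    if op == "plus":                            # case 4: A -> (E1)+
        b = next(fresh)
        return [(head, [b, head]), (head, [b])] + convert(b, expr[1], fresh)
    if op == "opt":                             # case 5: A -> (E1)?
        return [(head, [])] + convert(head, expr[1], fresh)
    raise ValueError("unknown operator: %r" % (op,))

# The rule of Example 5.24, grouped as (MODEL, PRICE, PROCESSOR, RAM), DISK+.
first_four = ("cat", ("cat", ("cat", "Model", "Price"), "Processor"), "Ram")
pc_rule = ("cat", first_four, ("plus", "Disk"))

for head, body in convert("PC", pc_rule):
    print(head, "->", " ".join(body) if body else "ε")
```

With X1, X2 and X3 standing in for A, B and C, the output has the same shape as the first set of productions in Example 5.24. The sketch does not perform the final simplification that removes A and C; that step relies on noticing that their bodies are already single variables or plain concatenations.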

5.3.5 Exercises for Section 5.3

Exercise 5.3.1: Prove that if a string of parentheses is balanced, in the sense given in Example 5.19, then it is generated by the grammar B → BB | (B) | ε. Hint: Perform an induction on the length of the string.

* Exercise 5.3.2: Consider the set of all strings of balanced parentheses of two types, round and square. An example of where these strings come from is as follows. If we take expressions in C, which use round parentheses for grouping and for arguments of function calls, and use square brackets for array indexes, and drop out everything but the parentheses, we get all strings of balanced parentheses of these two types. For example,

    f(a[i]*(b[i][j],c[g(x)]),d[i])

becomes the balanced-parenthesis string ([]([][][()])[]). Design a grammar for all and only the strings of round and square parentheses that are balanced.

! Exercise 5.3.3: In Section 5.3.1, we considered the grammar

    S → ε | SS | iS | iSeS

and claimed that we could test for membership in its language L by repeatedly doing the following, starting with a string w. The string w changes during repetitions.

1. If the current string begins with e, fail; w is not in L.

2. If the string currently has no e's (it may have i's), succeed; w is in L.

3. Otherwise, delete the first e and the i immediately to its left. Then repeat these three steps on the new string.

Prove that this process correctly identifies the strings in L.

Exercise 5.3.4: Add the following forms to the HTML grammar of Fig. 5.13:

* a) A list item must be ended by a closing tag </LI>.

b) An element can be ...

! c) ...
