Non-standard Stringology: Algorithms and Complexity*

S. Muthukrishnan† and K. Palem*

Abstract

As we show problem

here,

derived

concerns string matching problems, wherein a position in the “text” (of size n) matches one in the “pattern” (of size m), based on very general relationships between the corresponding “symbols”. For example, string matching with don’t cares is a simple non-standard string matching problem, wherein text and/or pattern positions might have wildcard symbols rather than those drawn from the base alphabet Σ; these wildcards match every symbol from Σ. The main results in this paper concern the inherent complexity of a variety of non-standard string matching problems, characterized in terms of algebraic convolutions.
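As a rough illustration of how string matching with don’t cares can be phrased in terms of convolutions, the following minimal sketch counts mismatches with one convolution per alphabet symbol. It is not the algorithm of this paper: the function names `convolve` and `match_with_dont_cares`, the `wildcard` parameter, and the choice of `?` as the wildcard are assumptions made only for this example, and the quadratic-time `convolve` stands in for a fast (FFT-based) convolution.

```python
def convolve(a, b):
    """Naive polynomial convolution: c[i] = sum over j of a[i - j] * b[j]."""
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        if x:
            for j, y in enumerate(b):
                c[i + j] += x * y
    return c


def match_with_dont_cares(text, pattern, wildcard="?"):
    """Return all start positions where `pattern` matches `text`.

    `wildcard` (hypothetical choice: '?') may appear in the text and/or
    the pattern and matches every symbol.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # convention for this sketch: no positions reported
    alphabet = (set(text) | set(pattern)) - {wildcard}
    mismatches = [0] * (n - m + 1)
    for s in alphabet:
        # 1 where the text holds a concrete symbol different from s.
        t_vec = [1 if ch != s and ch != wildcard else 0 for ch in text]
        # 1 where the pattern holds s; reversed so that the convolution
        # lines up pattern[j] with text[i + j].
        p_vec = [1 if ch == s else 0 for ch in reversed(pattern)]
        c = convolve(t_vec, p_vec)
        for i in range(n - m + 1):
            mismatches[i] += c[i + m - 1]
    # A position matches exactly when no symbol contributes a mismatch.
    return [i for i, k in enumerate(mismatches) if k == 0]
```

For instance, `match_with_dont_cares("abcabd", "ab?")` reports positions 0 and 3. The sketch uses one convolution per symbol of the alphabet, so the number of convolutions grows with the alphabet size; how many convolutions are really needed is the kind of question the results above address.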

uses 0(min{7r,

Non-standard

ment.

Non-standard



For

Basic

three

string



where

problem.

convolution

the

bound

RAM

model

convolution allow

etrized *This

will

or adapting

the

encode

using

integer

us to infer

parameter

that for

any

convolutions

for of the

supported

NY

10012,

Division,

USA;

these

latter [Ko89]

that

each

eight

two

families:

(eg.,

involves text

non-

variant

counting

position),

matching

other

problems of num-

and

(in

which

nonthe

k-

is a basic example).

We

problem,

that

upon

graphs



in

match

our

the

particular,

its

comple-

bounds

the

and

induced

the

and

re-

cliques

“dominating”

“clique

edge

cov-

and

ran-

complement.

provide

improved

algorithms,

as well

running

and

lower

sizes of the

graph,

in its

also

pected

out

times

for

deterministic ae those

some

with

better

non-standard

exstring

problems.

that

n.

to (in

input

vectors.

grant 251

The

the

by NSF/DARPA

field

lems

of

with

are

also

finding

212-

P.

here

in

cal

string

Yorktown 914-984-

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

associated

the

in practice

of matching

associated

symbols

this

(more

refer

to

these

string

depend

on more

its

higher KR87]

checking

A of

that

arise

demanding

beyond

family

of string

size

classi-

are identical.

identical.

as non-standard

problem

and

of problems

go well

are

of

The

In all of these probone in the pattern

symbols

general)

that involve

AC75,Bi77,Ba78,

number

that

stand-

phrase

n.

al-

problems

typically

size

problems. “matches”

a large

from

a

problem[KMP77]

provided

well-known

of

probas

application —

a pattern

variants[KMR72, of such location

tions

of

in

these

problems

string

are examples lems, a text

naturally


the

text

In contrast,

STOC 94-5/94 Montreal, Quebec, Canada © 1994 ACM 0-89791-663-8/94/0005..$3.50



matching

dimensional

an

rich as well

often,

stringology

occurrences

a larger

is very

from

standard

all

rn

Quite

well-motivated In

we introduce

Mercer

[Ga85]

mathematical

structure.

point.

num-

stringology substantial

gorithmic

un-

Center,

Introduction

1

These

param-

704, o. Box Heights, NY 10598, USA; [email protected], 9846. palemIIMheory.stanford.edu, 415-723-4405.

at

for

matching

string

depend in the

domized

best-

solving

Research

the

from

matching

following

matching

turns

ers/partitions”

also

[email protected],

T. J. Watson

the

matching,

string It

cliques

truncated

problems

der grant number CCR-89-06949 and by NSF under ber CCR-91-03953. † Courant Institute of Mathematical Sciences, Street, New York, 998-3061. *IBM Research

in

improvement

these

algorithms

was partially

of

results

string

string

problem

standard

on

reductions,

from

threshold

matching

a variant

analogous

of mismatches

boolean We

for

the scheme

Interestingly, all of the above results are derived using the structure of the “match graph” defined by the matching relation of the given instance of the non-standard string matching problem.

generaliza-

here.

derive

mismatches

algorithms

convolutions.

non-standard

string

standard

fastest

JR})

counting

ductions

~ depends

the

by extending

are drawn

ber

including

in the

of

classical

of f2(~( IX 1)) convo-

introduce

we show,

faster

its

also

standard

model.

with

truncated research

we

algorithms

yield



function

by improving

problems

and

are proved

to this

best–known

RAM)

that

family

bound

results

algorithms

reductions

cares

(increasing)

known

boolean

don’t

a lower

model

of these

this

the

this

In the

from

We variants that

Matching:

These

match



with

we prove

lutions,

all

String

problems

matching

tions the

stringology

Complexity*

simple

example

problems

stringology matching

noif the —

— with

we

is the “don’t

cares” the

[F P74].

text

from

the

card” bol

In

and/or

underlying

symbol from

in

the

the

Unix

agrep

and

problems

from

nience,

refer

as

Since

to problems

there

advances[AL88,Ab87, phisticated standard

string

Despite

these

We

matching contrast

this

with

stringology[GS83,

in

bounds

some

our

space

is to fill

this

inherent

complexities

arising

in non-standard

provide some well

algorithms cases.

These

tremely

fast

matching

in

algebraic

sharp These

of two

will

correspond

a that

multiplying vectors

n

m,

of

resulting

alternately, a boolean

bounds

models are

to

any

. . .a~–l

convolution

polynomials

on

GF2

convolution.

in

in

(For

they widely with

is shorter the

case

a polynomial

which The

case resulting

than

involves

for imply

c where

Ci

Depending defined either we

refer

to

refer

to

in O(lmin(r,

@)) upper

known

refer

to

param-

as truncated

if the

running string

string

we show

variety

of non-standard

that

match-

algorithms when

convo-

T convolutions,

algorithms

problemsz

for

truncated

convolution in solving

of results

non-

can be improved.

bottleneck string

for

T < @,

truncated than

of existing

matching

the computational

fewer

in

improvements

improved

parameterized

times

group

summarized

that

non-standard

using

bodies

second

(also

In particular

that

can be solved

Our

matching includes

ema wide

problems. optimal

time-

the BC or PC model it can be proven that no faster algorithm exists for parametrized truncated convolution.

2 The parameter T gets mapped into a structural aspect of the non-standard string matching problems, which we will describe in the sequel.

or it

follow

lution

10n

the

convolution), we

= on on

will

convolutions.

Therefore,

Formally,

Extrun-

best

simply

conclude

algorithms

would

a single algorithm

is given.

will

reductions

RAM

standard

b = bobl . . .b~-1,

vector n -1. are

which

text)

carry-overs.

b

to the

we can now

ing

truncated

b (vector

we

convolutions

to existing the

for an

in were

)

1.1),

in

earlier

parameterized

is the

convenience,

Consequent Section

that

can be computed

truncated

convolutions.

of

defined

where

is

convo-

convolutions

m,

be It

truncated

convolutions

this

dependcould

convolution.

erized

of de-

of the in-

problem

Kosaraju[Ko89]

we show

out out

as part

Truncated

standard

earlier

As before, this

n =

parame-

are left

specified

by

work

with

ai x bj ame left

convolution 1.

interre-

defined

terms

p aramet

of these

via

of parametrized

truncated

convolutions;

bound.1

on clas-

the

a and

and is

in as

seven

based

pattern

correspond

operation

for

Informally,

vectors

the

without

a = aoal

their

convolution

of computation

the

O(@

cated

terms what

parameter

their

standard

polyno-

of all

problem,

of operation,

T =

tending

it 0/1

$\sum_j (a_{i-j} \times b_j)$, for $0 \le i \le$ the context, these convolutions a field

string

[AHiU,WC76,BG91]

to

will them

given

lower

model”

Convolution

>

group

non-standard

comparisons

operation.

vector

first

takes

effi-

for non-standard

(See Appendix).

introduced

of the

that

ex-

concern

models

replacing

convolution

that

Our

as

with

1.1 below

convolutions[AHU74]. by

“comparison

the

different

particular

here.

derived

used

times.

Section

value

more

all known

convolution

ai, some

standard

when

originally

in

strictly

are

problems,

This of the

or boolean

to see that

etrized

we provide

we introduce are

in

a polynomial

we

prob-

that

to varieties

a boundary

the field

times

schemes

on T and

upon

deterministic

and

running

of twelve

problems

sical

include

approaches,

expected

running

to

for these

complexities

Exactly

is the

bounds;

1.1 as well.

on computing

matching RAM,

for each

case

lower

improvements

that

rely

convolutions.

ing

the

any

algorithms

the

string

in-

lution

Additionally,

improved

algorithms

we relate

each ai.

of the

time

bounds

Section

Note

problems

with

easy

our

for

upper

convolutions.

put

of problems

techniques

convolutions.

summation.

of

in

algorithms

is a variation

pends

match that

in

problems.

First, these

stringology.

summarized

the complexity

family

the

that

of known

in the

in which,

thrust

in Section

algorithms

yield

it follows

matching

r,

best-known

summarized

or boolean

ter

times

in understanding,

a large

with

as randomized

of results

gap

primary

the

essentially

is, o(nm)

ducibilities

of the

in the running

(or of the

can be found

times

Second,

well.

[BG91,CC+93],

convo-

convolution

justifications

facts,

non-standard

non-standard

of problems

are

employ

that

mial

so-

very

that

problems

than

string

inher-

understanding

The

of

must

cient,

of non-

the

the

complexities

constants

these

truncated

CP91]

on the

From

additional

understood

deep

results

powerful

prob-

detailing

definitions

show

these

lems

conve-

a variety

of

cases [G G91,CH92].

paper

for

is not the

standard cluding

and

several

developments,

problems

time

of

non-standard

MR92]

structure

inherent

al-

also

of these

the running

problems.

important and

general

matching

been

solutions

matching

complexity

string

have

Fis-

a variety

polynomial

boolean

Further

in the BC or PC models

recently

(For

from

string

Ko89,AF91,

algorithmic

for

stringology.

non-standard

then,

a very

is applicable

as the

or the

respectively.

and precise

We each

are of

the

to

model

2.

and

Historically,

provided

which

models

systems—for

and

[WM92].

non-standard

we will

stringology

stringology

searching

P~

model

notion

cares

be referred

(or

11~

of the

general

will

lution

ever-y sym-

don’t

grep[KP84]

Paterson[FP74] solution

more

models

“wild-

occurrences

with

text

facility

gorithmic

ent

this

from symbol

a special

against all

non-standard in

location

a (basic)

or

match

matching

from

in

developed

Z,

under

interest

instance,

lems).

can

text,

problems

has

is to find

String

fundamental

each

either

alphabet

goal

the

of matching.

cher

problem,

@ that

X;

pattern other

this pattern

as

computational


space

tradeoffs

results

for

(both

derived

scribed in

and

Section

1.2.

improved

Our

the we

overview our

results

and

the

third

proof

each

in Sections

technical

ideas.

included

1.1

and

in this

bounds

and

this

two

section,

matching eleven to

we consider

has

problems.



In

the

each

care

them

string

as basic,

is matched

main

model,

Q(log

Fischer

from

we

above

approach

yields

an

1.1.1,

matching

1.1.2,

known

algorithm

In

1.1.3

are

discussed

respectively.

problem,

presenting

paper,

when

in terms

sumed

that

length

ing

in Sec-

we discuss

convolution

and

m

string

of positions the

context

note

the

spective

the

text

of the

For

computing

of vectors

RAM.

O(ndlog

m)

Also,

time

the

are

best

known

m(<

n)

when

on the

m de-

the

given

re-

(or takes,

both

on the



In

the

A is at

the

these

RAM

when

can

algorithm

known

B.

takes

linear

while

best

known

more

(say,

O(n@polylogm))

In most

or near-linear algorithms

(i.e.,

cases,

our

O(npolylogm))

for

1? take

quires

fastest reduction

at

gorithms

for

these

convolutions[Ab87,

time

<

the

where

subsets

these

RAM and

1X1 ~

in

above

we show

m =

be

unless

problem

take

best

re-

on the convolu-

truncated

The

that

reduced

IX 1.4 There-

this

convolutions

faster.

by

[AF91].

can

comment,

problems

again

from algorithms

@,

parameter

problems

Once

Farach

problem

@,

solved

have

the main

case

is matched

Amir

when

earlier

end-

match

case of string

the

that

S2( 1X1) boolean

IX!

be

state

associated

show

of

with

its

(segments)

now

in

is 0(n).3

convolution

least

by

positions

in the

bound

and

on our

tions

for

this

problems

a contiguous

convolutions.

adaptation

based

be

two

the

of string problem

consider

of all

we

and

problem

subsets

we

in the

non-empty

of this to

We will

match-

position

set X, specified

instance

cases,

RAM

third

problems;

sizes

truncated

fore,

The

problems,

these

Abrahamson[Ab87]

the time

asymptotically

the

convo-

is string

a specified

ordered

model,

a simple

family

with

subsets,

BC

less than

by the

log 121 + whereas

each

Q( IZ 1) (boolean)

least as hard as problem B, or B is reducible to A. By this we mean that B can be reduced to A in time taken

best–

Paterson[FP74]. exactly

is restricted

problem

in both

by

log m)

respectively)

problem

result

and

2 log IE I boolean

corresponding

with

the

and

to

we state

In



RAM.

places,

and

is a variant

these

of the

require

we say a problem

d convolutions

it requires

in

convolution O(n

above

problem,

intersection.

for

sum

algorithms

take

of the

matching the

vectors

of

convolu-

classical

takes

X.

ranges

their

results

number

n and

shorter

the

in this

In this

subset

a non-empty

non-

respectively;

re-

bound

algorithm IZI)

convolutions

takes

problem

In both

provided

vectors of

the

and

respectively)

at various

m

or polynomial

n and

that

of two

This

O(log

the

boolean

alphabet

each

points.

it is as-

context

problem

classical

to Fischer

[FP74]

with

“segment”

convolutions,

Therefore

(or takes, we mean

the

pattern

longer

boolean

of lengths

on the

RAM,

and

example, the

in n and

of (truncated) length

y.

requires

matching,

in

is that

where,

all the

the size of the

this

deriving over

is associated

of the

matching

of a problem

of convolutions,

the

algorithm

subsets.

strings

subset

it in

results:

complexity

that

uses

to

our

second

with

input

Tree pattern

and

the the

of number

each n

standard

time

they

and

alphabet

matches

convolutions.

since

due

from

The

non-standard

which in

this

is stated

for

and twelfth

threshold

text

lutions.

refer

1.1.4.

Convention

of

problems;

is the

Section

and

show

improvement

algorithm

we group

we will

Recall

base

IX! denote

Paterson[FP74]

Specifically, string

reasons, families;

the

the

@ that

let

1X1) (boolean)

and

Our



non-standard

in

from

is the

cares.

tions.

are

Complexities

three

count,

matching

tions

into

Non-

family

don’t

position

symbol

As usual,

BC

one

Lower

For conceptual

of our problems

Basic

of this

with

a symbol

don’t

symbols.

quires

abstract.

twelve

member

matching

problem,

either

and

details

illustrative

0.5 log log Ill I +0(1) In this

of

alphabet.

summary

highlighted

technical

RAM

in

other

of

model

Concerning

Results

of string

or a special

our

flavor

4 to illustrate

extended

of

The

on this

have

remaining

and

problem that

comprehensive

1.3.

of the

We

3 and

The

number

based

2.

A simple

pattern

discussion

Section

comprise algorithms

and

1.2,

Complexity

Stringology

de-

are described

and

can be understood in

reductions

and

standard

1.3.

a detailed 1.1,

Bounds

problems

of results

nature

give

1.1.1

algorithmic

deterministic

in Section

Sections

issues

the results

group

and

accompanying

related

of These

technical

first

in

time matching

corollaries group).

are described

results,

expected string

randomized

Given

not

as

in the first

which

fast

non-standard

known

O(min{lXl,

al~})

AF91].

significantly

time.

3 Clearly this sum can be significantly larger than O(n). As our results demonstrate, this restriction is sufficient to capture the dominating term in the complexity of this problem.

4 These reductions are ineffective when Z > fi in the following sense: a reduction from problem A to problem B is ineffective if it takes more time than the best known algorithms for A and B.


A

particularly

bounds

interesting

is that

we derive

ing a common

“structure”

To explain

this

ing

general

(truly)

problem. G.

of this

A

node

element

i:

range, G

that

correspond

that

text

be expressed

level,

the

complexity

completely

in the the

its match trated

match

graph

In the

BC

G and

CC(GC)

clique

cover

(not

is the

its

clique

cliques

necessarily

in the

disjoint)

edges

the

the

gsm

tion, with

is true

problem

we infer

string

fact

and

the

matching ranges,

of GC.6

can

result.

In

specific

value

order

to

union

is

an interesting

with

general its

don’t

cares

in detail,

the

mally,

in each don’t

Q(r)

boolean

tions

can

concerning to will

the

be the

in Section

string

from

a dominating elements

can

for

general

compute

This

of size

gsm

truncated

probneed

convolu-

characterization

im-

characterization

case of ir = 1 was consid-

the

context

cover

second

is aligned

this

notion

with i,

problems subsets

i,

there

output

with

from

1 and that

we get

problems,

Section

1.1.1,

of

so on.

of deter-

three

more

based

namely,

For these

if

a count mismatches;

i +

is a match,

and ranges.

exception:

determining

replacing

matching

(standard)

following

the pattern pz

problems.

of

than

Non-

thing

of four

we must

that

with

string

cares,

the

rather

there,

Mat

a variant

of counting

whether

three

family

is

position of positions

four

on the

with

don’t

problems,

we

that:

In

the

PC

Q( [X I) The

lems

model,

all

of

(polynomial)

these

best-known

algorithms

[GG88,Ab87,AF91]

problems

convolutions match

for

this

re-

for

IX I

these

bound

~

prob-

on the PC

model.

we will

In

in G to de-

generally

text

pl

@.

number

the

of

String

example

the number

quire

matching

that

Counting

matches



computation

Complexity

and

matching[Ka93]

show

the

convolutions.

of more

clique any

best–known

special

convolu-

of size ~ will

unless

Standard

non-standard

and

can

with

2)

the

and

be reduced

again

for

IX I ~

A,

parameterized)

parameter

7 once

as con-

RAM

(polynomial

tion

For-

A of size k in G (Figure

be thought

the

ito an instance

a dominating

faster.

the pattern

the structure

G is the

truncated

subgraph

the

consider

mining

3.

truncated

clique

result

graph

7, s Therefore,

Bounds

each

With

of

observa-

case of string first

with

be solved

illustrative

for

Lemma[An87].

clique

parti-

from

be reduced

convolutions

where

We now

of the

bounds

In

this

a given nodes

size in G.

(boolean)

on the previously

1.1.2

The

instance

very

to

case.

cares,

we use a different

reductions

5 Clearly

we need

. . . k}

ered.

The

subsets,

this

use of Sperner’s

result

. In the RAM, rive

with

specialization

describe

specializing

of CC(GC)

involves

with

problem

in [MR92]

of whose

cover

lower

cares,

do this,

matching

and

don’t

simply

of string The

gsm

proves

number

previous

above-mentioned with

by

on the

dis-

{1,2,

problem,

of each and every

based

and,

j > i.

of a match

IT can

an dominating

here this

i from

labels

any

with

an instance

is an edge

that

parameter

lem

graph. Since

show

7r = 0(~.

gsm

is the smallest

graph,

prop-

be labeled

range

to those

of largest

1, for

convolutions,

number

the

label

subgraph

clique

of the

GC are illus-

solving

cover

with

7r+

complement,

example

boolean

of a graph

two

k nodes

can

from with

of

of induced

graph

that

Q(cc(GC))

number

of bipartite

An

conflict

we show

at least

where

G and

GC.

its

of

structure

partition

with

3. Specifically,

model,

takes

graph

graph

following

partition

exactly

in

at a deeper

instance

the

a node

dominating

tion

problem.

that

that,

is connected

dominating

variants above,

gsm

to show

every

by

conflict

in Figure

of gsm

of

determined

“cliques” namely,

able

such

We

the pattern

mentioned

each

integers

tion

The

elements

the three

cases of the

we were

the

edge nodes

pattern,

text

from

that

matching

as special

Surprisingly,

the

with

in A has exactly in

by

other

is associAn

nodes

tinctly

G is a bipartite

from

partition

2. the

in-

graph symbol,

that

Clearly, derived

1. each

distinct

. . . . j~ in the

derived

string

each

(pattern).

jl,

and those

each

ti and all those

node

It is easy to observe

non-standard



text

i matches. nodes

for

symbols5

the

subgraph

short)

match

be an alphabet

to elements

in one partition, another.

in

between

the

the follow-

gsm for

a pattern,

is defined

of alphabet

element

with

and

could

a position

is defined

graph

G

an element

with

(or

is an induced erties:

y inherent.

has an associated

(pj ) in

or subset

ated in

ti

matching

above

characteriz-

we define

to a text

problem

the

by

is universal

better,

string

of

of them

that

structure

In addition

stance

aspect

all

n for

8It is assumed

to =

larger

each

of these

lx 1.9 T,

As

show

subgraph

that

convolu-

four

before,

the reduction

the dominating

we

truncated

problems, this

implies

is ineffective. is provided

with

the

input. This need not be always true and it can be implicitly specified, but very easy to find for problems of interest such as string matching with ranges and subsets, from the alphabet set. In general, we have to account for the additional complexity of finding this graph, should it not be specified explicitly.

9 Strictly this is true only with the exception of the counting

junction of subsets or ranges and so on. Our results extend to such general elements as well, though we omit these details here.

6 We additionally show that given a minimal clique cover of $G^c$, the input instance of gsm can be solved using $cc(G^c)$ boolean convolutions, and hence this bound is matched from above.

variant


of standard

string

matching

and

string

matching

with

that

a RAM

mial

convolutions

will

yield

fewer

algorithm

one that

than

that

for

solves

As in the previous

ber

from

1.1.1,

match

graph its

before.

complexity

analog

except of

an

instance,

Once

again,

we note

string

matching

be expressed

as special

tionally

we use an analog the to be

in the edges ●

the

PC

least

cp(GC)

in

and

some

correctly

(See Section

for the counting problem

cal

4).

(the

first

any

algorithm

is a corollary

of

our

case that

which

follows

the

trix

equation

lower

shows

mentioned

condition

is true,

must the

be

output

O otherwise.

For

threshold

matching

problems,

and

for

with

for

IXI ~ @,

the

convolution

the

(parameterized

can be reduced

a parameter

value

to derive

these

problems,

to

In I; 10 this

has

holds

for

every

technique

value

vector

convolution.

b in the

these

results

values

This

involves

k to the

defer

final

a non-

number

of zeros

instance

of trun-

a complete

version

of

characterization

corresponding

We

to the

complex-

particular our

of k.

of relating

cated

nontrivial

for

k = O or k = m – 1), but

in the

discussion

of this

Complexity

of

Tree

of

paper.

Pattern

at

above;

of

and

tree

pattern

Mat

pattern

thing

,Ko89,DGM90]. to

implying

that

on

running

time

solving

factors)

in

will

for

string

[DGM90], the

yield

BC

this

with bound

problem, (up

improving for

that

current

is optimal

bounds

and

show

matching

the

model;

better

labeled We

reducible

the

is

matching

ordered,

thereby

of

to polyit

on

the

truncated

Consequences

of

the con-

problem.

least

Additional

1.2

of course,

any

problem

volution

clsssi-

needs

problem text

trees[H085

RAM

in the

the

the the

subsets

log

match-

Our

Reductions

it

algorithm

approach,

above

corresponding 1.1.2)

is 1 and

O(nfipolylog(rn))

showed

result

for

Fischer-Paterson

bounds

this

a cer-

string

bound

that

ity

rooted

is satis-

follows This

4,

which

[Ka93]

[FP74]

general

simple

in the Section

implication.

wherein

bound

to be sketched

convolutions.

is the

Section

equation

Karloff

approach

polynomial

If this

in

than

equivalently,

position

Consider

is inter-

gsc problem,

which

position

truncated

usual

1.1.4

where

lower

of standard

problem

k);

non-standard

RAM

the

trivial

cliques

algorithm

a matrix

Recently

defined

k.

of them,

here

de-

disjoint

proof in

any

of the

text

(as

corre-

of GC.

of our

for

variant

Fischer-Paterson

Q(IX])

part

that

the no more

of gsc takes

development

via

number,

convolutions,

of this

that

In the

k (say,

We show:

number

an instance

that ing

graph.

an instance

hard

modeled

subsection),

necessarily

its

Addi-

cover

whether yields

a “threshold”

It is straightforward

can

of a graph,

of bipartite

partition

The

fied

of

number

solving

argument

solves

bound

variants

of whose of the

sketch

detail.

as

the four

clique

text

that

each

GC,

above,

for

polynomial)

The

exactly

number

development

we

num-

complement

than



defined

mentioned

polynomial

clique

is the

t ain

cover

technical

esting proof

union

model,

is the

The

that

of the

smallest

Q(cP(GC))

its

no more

determine

of the pattern

we show:

from

text.

for

variant

resulting

prob-

of the

of the

are

partition

the

is an edge

In

at

the

graph,

This

cases of gsc problems.

namely,

clique

general

problem,

count

position

subgraph

(truly)

gsm

the

results

we

(given

output

count

structural

problem.

we output

dominating

non-standard

fined

of

of the

at each

G

general

position, alignment

k mismatches

using

the above

the more

(or gsc for short)

natural

of mismatches

and

convolution

text

sponding

problems,

the

of the

counting is the

Section

truncated

each

IX I polyno-

of these

case, we derive

them

characterization string

any

n convolutions.

by specializing

lem

uses less than

solving

is trivially

the

In

ma-

this

section,

quences

true.

we will

of our

sketch

reductions

two

additional

discussed

in

conse-

the

previous

sections. ●

In

the

cated

RAM,

we

show

convolution

to an instance subgraph usual

with of the

that

any

parameter

n can

gsc problem

of size T + 1, provided

implication

(polynomial)

as in Section

with

trun-

be reduced

1.2.1

Time-Space

T < @;

this

has the

A match

1.1.2.

Complexity String

of Non-standard Mat

graph

partite that

1.1.3

Threshold

and

any

algorithm

for

(basic,

thing

with

now

These

problems

counting don’t

consider

cares.

the are

problems However,

third

family

respectively in

Section

for these

of four variants

1.1.2.

problems

problems. of the

count,

Informally, we develop

TS

=

match

(This

solves string

graph string

problemll

in

show

a non-standard

matching —

of bi-

We

problem

as well

as count-

matching[Ka93]

— bound

union

otherwise.

which

of standard

Ω(nm).

if it is a disjoint

is nontrivial

or threshold)

a nontrivial

k-mismatches

four

is trivial

cliques,

ing variant We

Tradeoff

a dominating

time

T

and the

and

is optimal).

space

This

S,

follows

for an alternate characterization based on certain “sparse” convolutions (rather than parameterized truncated convolutions). With this new characterization, an implication analogous to one stated here follows for these two problems as well.

10All

coment~

rua&

counting

variants

in the

they are omitted. 11Note that the ~t& variant standard string lem is always trivial.


in Section RAM

1.1.2

about

are pertinent

graph matching

corresponding and

the

complexity

here

as well,

to the k-mismat

of and

co~ting

&es

prob-

from

the matching

convolution show in

that

all

the

log m)

standard)

string

1~[ =

Q(m).

holds

for

non-trivial

have

be

that

and

on

cal

the

result

[Ga85]

in

[CL90

and

to the

bound

[Ab87]

when

for

input

with

the

are

performs

text

and

of the

a

the

drawn

convolution i.e.,

follows

from

k-mismatches from

the

the

matching

problem

problem

for

which

boolean

Two

1.3.1

Randomized

Results

tion

Algorithms

provide

standard

basic

the

degree

mic

in

time

fast

graph

previously

reducing

“large”

alphabet

over

modulo

1.3.2

A

the

problem to

study

open

In

as string

matching exception:

provide

an algorithm

previously,

in

the

no o nrn) stricting

seems

algorithm

shown that

in

with

the

length.

the

fore,

We

is

the

a

complexfor

is no loss of generality

queries

lem the

to m.

is at RAM. model

stringology bound

[W86].


out,

the

any

least

as hard

This to

shows study

boolean

the

the

RAM

complexity

since convolution

of

problem. model

studying

showing

prob-

“weak”

graph

as a boolean that

be-

to study problems

string

show

by from

As

stringology

also

de-

location

is the number

in

match

We

pattern

comparison

non-standard

non-trivial

the

location.

is too

each

locations

numbers

the

successful

t and

for

algorithm

text

that

model

cent ain respec-

when

algorithm

stringology

model.

problems for

for

algoprob-

from

back,

of the

of non-standard that

with

in this

hard

turns highly

standard

we show

time

in re-

it

gets

each

it uses to solve

been

one

The

sum

of the

An

matching

and pattern

for

is the

problem.

as in the BC

location.

queries

algorithm

Model.

by A,

finally

queries

uses to solve

and which

arithmetic

As

problem

which

output

complexity

since

text

complexity

has

of text

identified

that

the

string

it

by

by computing

matching

response

of

contain the

of elements,

each other

the

lems from

harder and

with elements

alignment

which

was

In

loc~

identified

of the

much

the

text

alignment

(PC)

works

sets of

from

determines

it

string

sets

p.

on

the

various queries

the number

Remark.

time; time

from

alignment

a general

two

the

various

of strings

is known. there

location,

computing

matching,

matching[AHU74]

text

way

to be intrinsically

string

other

of elements algorithm

a counting

PC model

the

same

o(nm)

of understanding

expression

the degrees

on the

two

to an alignment

location

basic

identifying

the

queries

respectively

complexity

by

0(nrn075polylogm) running

The

lem

termines

to access

identifies

which

of alignment

of each

is placed

each

from

Convolution

with

over

bits

solves

align

for

for each text

A that

the

is given,

sets

Polynomial

which

other

The

non-standard

tively

the

two

location.

matching

sets

the

of the

number

left

of equal problem

that

matching

however

are now

this

basic

direction

time &

subsets,

takes

problem

of regular 12lt ~a

for

string

from

A and

on t at i leads

rithm

Outside

is defined

symbols)

which

non-standard

step ity

alphabet

This

a

match charged

alignment

text

location

prob-

primes.

string

the

problem

no algorithm

known. than

we call

elements

than set

the renaming

Problem

of

with

(rather alphabet

over

same

p and

it is allowed

using

an

string

is not

query, it

a 0 otherwise. finally

given

Stringology

This

following

take

approach

defined

of the

Matching

which

string–subsets.

Our

to

Consider

pattern

but

only

the

and pattern

OR

algorithms

of

asked

non-standard

algorithm

on G,

response,

logical the

queries

queries

model.

t,

The

pattern

i, a 1 if p placed

and

text

alignment

output for

chosen

a generalization

in [Ab87]

whenever

O(n polylog(m))

initially via

A,

non-

polylogarith-

case.

instances

alphabet

Non-standard We

this

appropriately

String

most

take

for

set,

“smaller”

symbol

at

best–known time

for

problems

is

algorithms

polylog(m))

involves

algorithms

matching

the Our

whereas

lem

string

of

m.lz

O(n@

randomized

on

one from

elements We

type

complexity

it is charged.

each

a text

algorithm

to access the

the

of such

sequential

an

“yes-no”

the basic

model.

or the

elements,

Algorithmic

only

(BC)

computation

For

model, is allowed

number

solves

G in this

identi-

for

problem.

pattern.

1.3

is the

Convolution

text

this

using

A that

any

In

t_j match p_i?”;

“does

algorithm

the

model

matching

Boolean

for

problem.

standard

pattern

algorithm

log m)

an

the

string

form

graph

can

o(~

reduction

Definitions

Model.

that

solve

vectors

time,

of our

latter

it

Results

boolean This

since

is

Comparison

of (non-

ID I ~ 3.

sublinear

nature

RAM

computation[AHU74].

problem

Time

if the

,C92]

the

convolution

subsets

when

average.

The

a non-

problem

matching

randomly,

performed

time

with

Expected

shown

uniformly

is embodied with

is an improvement

non-standard

Sublinear

the

and

reductions

a time-space

for

matching bound

Models

a boolean

our

[Ab86]

Previously,

match graph,

1.2.2

that

instances

was known

Our

any

fact

problem

match graph.

2

for computing

the

convolution

above

of Q(rn2/

We

bound

and

boolean

of

trivial

TS

[Ab86],

that

matching

takes

Ω(nm)

such

a prob-

convolution model of

non-standard

a specific

has been

on

is a very

open

lower

for long

3

Bounds

and

Basic First

we

plexity the

String

derive model

queries.

Since

the

model

BC

boolean this

we claim is

graph

edge-disjoint the mum

cover

number

1 Solving

matching model

a

problem takes

exactly

CC(GC)

from

a universe

that

from

1 String

exactly

matching

yields is the

cover

G

G;

4

mini-

for

log IZI+O.5

on the BC

the

We

BC

derive

the conflict ing

with

tem

graph don’t

PC

cares

alignment

of another. maximum

set

of given of

size.

the

(a)

First

to

model

volution lower

clique

cover S={

elements.

for

those tom

Ki

on Sj

that

given.

S2} vertex

any

two

sets

of the

{Cl, the

i on the

. . . C’.}

top

and

let

that

each

alignment

are

We

i on

the

cover

top

a

set

and

sets in S are incomparable. universe

set

sets

on the

[1 . . . ~].

= IxI.



tc~ #

in

2

exactly

cp(GC) @

such each

to a top is we~ hewn

in a clique

partition

to

see

sufficient.

string

on the PC

alignment

two

counting

string

matching

that

S, the

problem.

by vertex

set of all

Let

set (S:,

Gi ‘s, must

that

at

necessary.

(%>$), ”””, (S4, S;) be the sets by any algorithm which solves

at

alignment

argue

(s:, s;)> identified

of GC induced

has

sets.

cp(GC) are

takes

G

vertex

now

queries

of

matching model

provided

that

We

partition number

of G.

queries,

is

a

bipartite

minimum

of its

gives

of a bipar-

clique

counting G,

con-

this

of polynomial

partition

any graph

easy

m,

The

is the

on each

cp(GC)

of GC .

Let

of vertices the given

Gi be the subS;).

We claim

be a clique

partition

flj Our

argument

strategy

for

=

if and

to

assumed

that

First, at

the

in

are two

~ong

re-

14 All

Complexity.

when


if our

assume responses

~own

lx I < @i

the

claim

various

to

either

holds.

same

that

all 1 pairs

cases:

We

to construct

The

length,

each

are in fact

for

Gi count

is static. all without

static.

m. That

There

or some

matching

is,

looking

queries14.

is a clique string

queries graph G is correct

t and p are

text namely,

alignment

the

~~o=it~

and for

a

t and a

alignment

the algorithm once

provide

a text

these strings and the match 1 rounds, the algorithm d

to be of the

it identifies

clique

answer

with after

only

vertex

is adversary-based.

an adversary

p and

consistent such that

sets are sub-

Consider

cp(G),

Since model

polynomial

number

G.

in the

PC

of edge-disjoint

vertices It

complexity

in the

the

a clique

alignment

are

the

n and

yields

match

on

query

the

that

Solving

with

all

C.} S

of

union

problem

least

terms

Recall

cliques

as

We

Sj.

{Cl,...,

bottom

String

queries.

of length

bipartite

bot-

collection

The

corresponding

this connection

in Communication

C’i have

S’i corresponding

the

(~&~~2)

alignment

as computing

denoted

pattern

C =

exists

problem

as hard

number,

graph

GC.

construct each

and

defined

assigned that

a

Consider

vertices

for

of in-

and there

matching

of vectors

whose

queries

bottom

subgraph

those

re-

states

1X1 + O(l).

bound

string

G is a collection

Proof.

system,

of cliques

which

all

a clique

of Z sets,

13 We suspect that

follows

lower of the

graph

most

Lemma.

collection.

~~ E S,

Ci ∈ C. To the set Sj

searchers

a clique

this

cover

is

from lemma

vertex

from

sets Sj such

{s1,..., some

to each

are assigned

Assume

a ground

a Sperner

C is a clique

GC

for

set S of ~

that

(-) of

on

universe

SIEI } on the

which

claim

bounds

Consider

each

E

Sperner’s

our

collection

Counting

terms

Theorem

GC is generated.

subset

vertices that

our

of

a Sperner

C =

For

sys-

from

Assign

follows.

match-

sets are in-

tight

reduction

from

that

0.510glog

in

bound

cliques

is a subset

collection

system,

that

Sl,...,

collection

the

of the

show

a unique

such

Given

asymptotic

system

two

tite

of

is a collection

any

provides

such

a Sperner

we

of string

is, no set in the collection lemma

cover

to a Sperner’s

that

by rc~,n

constructed

Lemma

(n~z)

It follows

for

a counting

is exactly

takes

queries

a clique

system

set such

size of any

G=

from

the

Sperner’s

Sperner’s

the

that

is equivalent

The that

show

for the problem

a universe

comparable,

top,

G.

cares

[An87]13.

sets from

cover

3. We

a size.

a general

of solving

model.

See Figure

follows

Sperner’s

set of size n, any

~min z log 1X1+

convolutions. Proof.

be

Matching

string on

don’t

log log 121+0(1)

denoted

K,

It

~~in.

= ~min.

Bounds

G.

queries.

with

of

can

of such

we claim Corollary

comparable.

necessarily

union

alignment

construction,

be

sets

sets has size at most

Therefore,

graph

value

of size CC(GC)

a ground

of

by this

cannot

comparable a collection

non-standard

match

the minimum

that

k,

reduction.

IX I incomparable

of

clique

our

that

#

that

m,

cover

j

Consider

n and

a clique

claim

sk,

such

number

cc(G),

basic

with

in

We

and

completes

ductions

the

i c Sj. S’j

length

whose

in any

query

of not

denoted

sets

of the

that

cliques

number,

of cliques

Theorem

of

terms

Recall

in

as computing

G, is a collection

bipartite

clique

in

make

com-

of alignment

alignment

vectors

bound

convolutions.

a bipartite

each

the

problem

number

as hard

of

a lower

boolean

that

exactly

on

matching

of the

Ci, two That

bound

string

terms

convolution

gives

lower

a basic in

of

Matching

a general

of solving

BC

Complexity

Gi is

problem

not

a clique.

here. be

In

The

the

an edge

an edge G.,

GC.

e = (r, a),

called

the

three

types.

e is not

in

any

than

one

Gi

more

of these

cases,

simple algorithm

It is much gorithm

is

harder

after

alignment to a number rithm

while

quent scenario sary

are

or the the

as follows.

using

this

about

weak

second

due to the

cancel.

That

the

correct We

swer

show

the

errors

two

the match

of its

sets

graph

matching size

It

condition an

in this adver-

cc and and

and the

by

the can

1 in concould

problem process A be the lli

with

adversary ensure

of the That

the

between must

(S;,

the

S;)

be a clique

equivalent rank

rank(A)

down

the

let the subsets

adjacency

matrix

size

in G.. and

a count

Ge,

but

function

(where

< ~~~~

rank(lkf~

model

non-standard

string

dependence

in our

to note

lower

on the

bound

intimate

re-

for

the

they

relation-

techniques

Complexity

respectively.

numbers

the

bound

and

[Lov].

are frequently

bounds

seen

in the

communication are

referred

tech-

Both

the

upper

complexity to

as cover

and

Acknowledgements

Sincere

thanks

Farach,

Howard

to Amihood

Amir,

Karloff,

Bill

Ravi

Chang,

Boppana,

Martin

and Martin

Tompa

discussions.

K.

the

size

[Ab87]

K. Abrahamson.

the

rank

). Also

for

for

each

i, let

the

edges

that

each

Symp

for

FOCS,

string

matching.

1039-1051.

Combinatorics

of

Publications,

Mi.

This

the

[AC75]

is

2dth

Finite

Sets,

1987.

reals),

A.

Ann

Aho

and

M.

A.

ACM

and

Symp

M.

Symp.

and

to

of Comput.,

18(6), M.

Figures.

1975,

Addison-

Wesley

Proc

Algorithms,

Publishers,

string search.

333-340.

Farach.

Approximate

on Discrete

Efficient

bibliographic

Efficient

Matching of

of

2nd

Ann

1991,

Aho, J. Hopcroft, and [AHU74] A. The design and analysis of computer


Al-

matching.

Theory

Corasick.

aid

of the ACM, Amir

rectangular

model.

Farach.

two-dimensional

An

dimensional )

Benson

58-68.

Comm, [AF91]

G.

independent

searching:

of

i, rank(ll~

Amir,

1992,

Gi

2 Solving the counting variant of string with don’t cares over alphabet set Σ requires in the PC

1987,

Science

A. Proc

1 ~ rank(A).

queries

Ann

Generalized

Comp.,

Anderson.

phabet

Let

sub-additivity

each

[ABF92]

in the

S~)}.

is over

J.

Oxford

matching

. . ..(Sl. and

I.

27th

for

those

Consider

string

A = ~t

From the

form.

programs.

with

402-409.

SIAM

partition

tradeoffs

constructed

programs line

1986,

of

Time-space

Abrahamson.

branching straight

Corollary matching

alignment



of IZ I are

comparison

a direct

lower

tile

strat-

1. Therefore,

IXI

our

there

[Ab86]

is exactly

at least

the

as shown

cp functions

is of size

clique

we claim

in addition, 2.

display

in Communication

lower

his

of vertices

representing

Then,

to Theorem

in

Bibliography

claim

it identifies

(S~, S~), for

as A,

clf problems

independent in

Br93]

of functions;



G and

be of same

complexity problems

alphabet

for valuable

an-

oft

In our

limits

2 in a different

S~),

number

are alphabet-independent

that

Theorem.

solves

{( S/,



get

to

details

number

which

be S =

which

problems

between

6

above.

edges

by the adversary

to write

in Theorem

algorithm

gives

discussion.

substantial

is also of interest

niques

guess

made

unlike

these

of the

[An87] It is convenient

121, which

sults.

subse-

in a manner

to

of Gc.

in the

(3)

correctness.

maximum

vertex

type

for

ship

when only

that

[KMP77, ABF92, GP92,

algo-

text

can

(nontrivial)

constructed

v is the

the

is forced

the proof

al-

the text

algorithm

carefully

the

the

of weak

preceding

A of the graph

=

ches! for

queries

the

edge of type

2, the

a strategy

egy, the pattern V2 where

types

a weak

We omit

and

in the

is that different

algorithm

earlier.

strategy

and

of mismat

alignment

a dynamic stated

(2)

of type

number

can

the

that

by the

However,

adversary

by the

matrix

rank(A)

Remarks

known

set of vertices

the

Clearly

standard stringology — algorithms with

commits

case,

of type

is, by using

with

vertex

constructing

edge.

the

edges

which

difficulties

static

corollary

We note

the

the

constructs

weak

difficulty

algorithm junction

In the

so he simply

is dynamic,

The

main

edge before

algorithm the

Two

that

of sets for

the adjacency

3.

each CTm; a

to the previous

the

Figure

5

(3)

when

be used

pair

In

p =

adversary

between the

the weak

pattern

decides

can

not

or

G=.

claim

the

this

queries.

pattern,

it

the

input.

the response

identifying

knows

the

is,

but

argument

Consider

GC in

is one

G.,

and

Tm

the

Intuitively, query;

alignment

as in

=

to

exists

Gi

is in

on this

of alignments

in an alignment

t

to argue

scanning

queries.

it

as well sets

that

there

edge, which

but

Proof.

S needs

e is in some

Gi

an error

dynamic,

sets to pick

weak

completes

commits

it is omitted

that

Otherwise

(1)

adversary

case analysis

and

we argue

of

(2)

e is in

case is easy

case,

partition

of following in

latter

former

1974.

J.

2NonACM

212-222. Ullman. algorithms.

[AHiU]

A.

Aho,

Bounds mon

and

J.

of the

problem.

[GG92]

Ullman.

longest

J. of ACM,

Vol.

and

Landau.

Fast

multidimensional

matching. 97-115.

T.

J. Baker.

and

Computer

array

Science,

81,

[GS83]

A technique string

for

matching

dimension.

extending

rapid

to arrays

SIAM

J.

Vol.

R.

[Br93]

Two

D.

Letters,

Dictionary

[Ka93]

6, No.

uniform

and

string

Symp

on Theory

Z. Galil.

matching.

Chang.

with

M.

Proc

R. Hariharan,

and

W.

Ann

for

R.

and

ACM

D.E.

and

R.

and

parallel

pattern Proc

Hariharan.

of string

IEEE

M. tern

R.

1992,

[KP84]

Karp,

IEEE

Symp.

the

Proc

[KR87]

exact

[Ga85]

IEEE

Ann

Lawler.

sublinear

[Lov]

Approximate expected

time.

R.

Crochemore

and

D. Perrin.

Two-way

pat-

Journal

of ACM,

38, 1991,

651-

116-124. [MR92]

Galil.

[GG88]

[GG91]

Z.

and Proc

E. Magen. IEEE

Faster

Ann

and

Karp

Paterson.

Products.

Open

Match-

SIAM-AMS

Problems

Galil

String

Proceed[WC76]

in Stringology.

on Eds,

Words,

Com-

A.

Springer-Verlag

Apos[WM92]

Lecture

Galil

and

ing.

Journal

Z. Galil it y of string Computing,

R.

Giancarlo.

for

approximate

of Complexity, and

and

and

and

Fast

A.

Proc

paton

Rosenberg. Patterns

dth

Ann

in ACM

1972.

R.

Pike.

The

UNIX

Prentice-Hall,

1991,

lower

On exact bounds.

Rabin.

and

NJ,

S.

random-

IBM

Development,

Flows

Promel,

Efficient

algorithms.

Communication

31(2),

Journal 249-260.

complexity

and

VLSI

Schrijver

Layout,

Eds.,

- a surKorte,

Lo-

Springer-Verlag

235-266. Muthukrishnan

and

under

FST

@

Vol.

general TCS,

652,

I. Wegener.

H.

Ramesh.

match

India,

String

relation.

LNCS,

Proc

Springer-

1992,356-367.

match-

33-72. complexSIAM

J

1008-1020.


The

Complexity

Wiley-Teubner

of Boolean

Series

in

1986.

C. string

Wong and A. Chandra. Bounds editing problem. J. of ACM,

1976,

13-16.

S. Wu

and errors.

83-91.

U.

Manber.

Fast

Communications

Func-

Computer

ence,

1992,

structures

string

4(1988),

R. Giancarlo.

matching:

Data

M.O.

matching

Paths,

allowing

1985.1-8.

algorithms

1989,

Journal

Repeated

of Comput.,

L. Lovasz.

tions.

1974.

Algorithms

and

Pratt.

SIAM

Arrays.

vey.

Verlag, [W86]

M.

7, 113-125,

and Z.

match-

FOCS,

Symp

145-150.

other

Vol.

binatorial tolic

Z. Galil,

matching.

1990,

pattern on

environment.

pattern

vasz,

1990,

and

Notes,

1993.

V.

Miller,

and

Kernighan

of Research

on FOCS,

Fischer

Z.

68-

approximately

tree

Symp.

of

on Theory

B.

ized

Symp

pattern

ings,

1982,

323-350.

R.

programming

600-609.

E.

in

M. Dubiner,

ing

for

strings.

6(1973),

matching

M.

Sci,

Pattern

of ACM,

J. Morris, in

Trees

12th

[FP74]

O’Donnell.

Manuscript,

Identification

Symp

Ann

FOCS,

optimal

Syst.

algo-

matching

On

matching.

and

matching.

tree

M.

Efficient

Knuth,

Strings,

675. [DGM90]

space

Comput.

Journal

Ann.

matching

(1990), [CP91]

of

247-256.

1984.

matching

Proc

L. Gssie-

1993.

Chang

string

Proc.

1992,

Time

algorithms

IEEE

Computing,

K. Park,

fast

dimensions.

on FOCS,

W.

Fast

Kosaraju.

tern

[KMR72]

Z. Galil,

Optimally

two

complexity

[CL90]

and

in Trees.

Proc

[KMP77]

for

Communication.

preprocessing

Cole

Symp

alphabet-

178-183.

1991,439-443.

S. Muthukrishnan,

Rytter.

on FOCS, [CH92]

FOCS,

Journal

mismatches.

S.R. ing.

case.

bound

l?3rd

Crochemore,

niec,

one

J

280-294.

H. Karloff.

[K089]

un-

length

A lower

of Comput.,

Private

R. Cole,

in

Truly matching.

J. Seiferas.

Huffman

Rapid

rithms

Park.

Symp.

and

counting

1993.

parallel

[CC+93]

IEEE

Matching

match.

Vol.

matching

alphabet-the

D. Breslauer

W.

pattern

Processing

Breslauer.

Manuscript,

[C92]

K.

dimensional

matching.

C.

SIAM

168-170.

bounded

[BG91]

dimensional

Information

5, 1977,

Ann.

Z. Galil

[H085]

7,

complex-

bounds.

95.

S. Bird.

ing.

and two

26(1983),

of more

Comput.,

On exact

upper

1992,407-437.

Galil

33rd

1978,533-541. [Bi77]

Z.

R. Giancarlo.

matching:

independent

string

exact-match one

serial

approximate

Theoretical

1991,

than

G.

and

of string

Computing,

23, [GP92]

Amir

parallel

Z. Galil ity

com-

1-12.

A.

[Ba78]

Hirschberg, complexity

subsequence

1976, [AL88]

D.

on the

text

for Vol.

Sci-

the 23,

searching

of ACM,

35,

7

Complexity

of

non-standard ing

the

basic

string

problem

on

match-

the

h(i)

RAM

define

parameterized

Parameterized alaz..

Truncated

.a~

tors.

In

and

blb2 -. .bm,

addition,

each

a function

parameter

Let m be divisible cated convolution ~,

1 ~

truncated

~ <

m,

n >

m,

index~

h

by ~. of a and

namely

the

: {1, . . .,n}

0/1

vector

~

1

1

1

a has

{1, . . .

ClC2 . . . cn_~+l,

is defined

Figure

~

1:

cated

(ai+j-1

(j mod T)+l

we refer

convolution When

@

Alignments

involving

bj)

the

arithmetic

operators x

polynomial

and

tors

@

operation

and

@

@

+,

truncated

this

to parameterized

as simply

the

the

truncated

and

@

this

convolu-

operation

are respectively

the

is called

the

boolean

consider

dividing

VI

are respectively

convolution;

Elements

trun-

is called

when

the

logical

A


Intuitively, windows

text

of length

location,

window. defined

the

operaand

truncated

that

text

location

index It

only

V,

convo-

terms with

window.

is easy

to see that

convolution

for

r

[AHU74].

truncated

convolution

We derive

the following

=

into

parameterized

X2

xl

stan-

this

is the

Kosaraju’s

--d ___ ---xk

a

is the

by Kosaraju

x4

x3

Text

of greater

m,

in

ex-

in which

1, this

extending

a is

convolution

m =

introduced

each

convolution

location

For

dis-

For

an index

are considered

a pattern

in each

into

1.

truncated

as the standard

those aligns

pattern

Figure

assigns

erized

way

the

~ aa in

h function

paramet

the same

cept

dard

the

The

ai

region.

Yk

lution.

joint

b in shaded

convolution

Pattern

tion.

with

}

The parameterized t~~n~ b with integer parameter

61
convenience,

aligns

Ui

truncated

For

1

1

vec-

follows:

Ct =

b

I Let

be two

in

I

convolution.

Convolution:


We first

= 1

Figure 2: Dominating Clique of Size k

[K089]. result

in

[K089].

Lemma

1

volution

The parametrized

of

vectors

a

and

spectively

can

be computed

O(min{r,

@})

standard

We now zed

show

truncated

matching

the

truncated b of in

length

the

connection and

n

RAM

boolean

convolutions

boolean and model

12344

con. m

12344 pattern

re.

using

elements

convolutions.

between

D 1

parametri-

non-standard


string

problems.

G

Gc Theorem

3 Solving

matching length

n, pattern

which k ~

contains 2,

terazed length

problem

in

basic

non-standard

RAM

model

p of length a given

is at least truncated

a the

boolean m

and

dominating

as hard

n, b of length

m

with

a match clique

as computing convolution

with

%

3

string text

t

graph

Example:

of

Pattern:

12#34@l

Text:

1234233g54~

G

C of size k, the parame.

of vectors

a of

T = k – 1.


Figure 3: Match Graph G and Conflict Graph Gc for the Example, String Matching with Don’t Cares
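
To make the don't-care example concrete, the following Python sketch (not part of the original paper) counts the matching positions at every alignment by direct comparison, and builds the match graph G and the conflict graph Gc over the symbols. The wildcard symbol `#`, the function names, and the sample strings are assumptions chosen for illustration. The quadratic-time loop is only the naive baseline against which the convolution-based bounds are measured; it is not one of the algorithms analysed here.

```python
WILDCARD = "#"  # assumed wildcard symbol for this sketch


def symbols_match(a, b):
    """Two symbols match if they are equal or either one is the wildcard."""
    return a == WILDCARD or b == WILDCARD or a == b


def match_counts(text, pattern):
    """Counting variant: number of matching positions at each alignment.

    Naive O(nm) comparison loop, used only as a reference baseline.
    """
    n, m = len(text), len(pattern)
    return [
        sum(symbols_match(text[i + j], pattern[j]) for j in range(m))
        for i in range(n - m + 1)
    ]


def match_graph(text_symbols, pattern_symbols):
    """Bipartite match graph: one edge per (text symbol, pattern symbol) pair that matches."""
    return {
        (a, b)
        for a in text_symbols
        for b in pattern_symbols
        if symbols_match(a, b)
    }


def conflict_graph(text_symbols, pattern_symbols):
    """Bipartite complement of the match graph: the pairs that do not match."""
    all_pairs = {(a, b) for a in text_symbols for b in pattern_symbols}
    return all_pairs - match_graph(text_symbols, pattern_symbols)


if __name__ == "__main__":
    counts = match_counts("12312", "1#")
    print(counts)
    # An alignment is an occurrence exactly when every pattern position matches.
    print([i for i, c in enumerate(counts) if c == 2])
    print(sorted(conflict_graph({"1", "2"}, {"1", "#"})))
```

An alignment is an occurrence of the pattern exactly when its count equals the pattern length. The conflict graph here is simply the set of symbol pairs missing from the match graph.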

