Non-standard #
Algorithms
Stringology: S. Muthukrishnan†
and
K.
Abstract
Palem*
As we show problem
here,
derived
concerns string matching problems, wherein a position in the “text” (of size n) matches one in the “pattern” (of size m), based on very general relationships between the corresponding “symbols”. For example, string matching with don’t cares is a simple non-standard string matching problem, wherein text and/or pattern positions might have wildcard symbols rather than those drawn from the base alphabet $\Sigma$; these wildcards match every symbol from $\Sigma$. The main results in this paper concern the inherent complexity of a variety of non-standard string matching problems, characterized in terms of algebraic convolutions.
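As a point of reference for the convolutions mentioned above, the following is a minimal sketch of the two flavors the paper relies on: a polynomial convolution, $c_i = \sum_j a_{i-j} \times b_j$ over the integers, and a boolean convolution, where OR replaces the sum and AND replaces the product. The function names are illustrative assumptions, and the quadratic-time loops are chosen only for clarity; they are not the fast algorithms whose complexity the paper studies.

```python
def polynomial_convolution(a, b):
    """c[i] = sum over j of a[i-j] * b[j], computed over the integers."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def boolean_convolution(a, b):
    """Same index pattern on 0/1 vectors, with OR for + and AND for *."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] |= ai & bj
    return c
```

For example, `polynomial_convolution([1, 2, 3], [1, 1])` multiplies the polynomials $1 + 2x + 3x^2$ and $1 + x$, while the boolean version only records which output positions receive at least one non-zero product.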
uses 0(min{7r,
Non-standard
ment.
Non-standard
●
For
Basic
three
string
—
where
problem.
convolution
the
bound
RAM
model
convolution allow
etrized *This
will
or adapting
the
encode
using
integer
us to infer
parameter
that for
any
convolutions
for of the
supported
NY
10012,
Division,
USA;
these
latter [Ko89]
that
each
eight
two
families:
(eg.,
involves text
non-
variant
counting
position),
matching
other
problems of num-
and
(in
which
nonthe
k-
is a basic example).
We
problem,
that
upon
graphs
—
in
match
our
the
particular,
its
comple-
bounds
the
and
induced
the
and
re-
cliques
“dominating”
“clique
edge
cov-
and
ran-
complement.
provide
improved
algorithms,
as well
running
and
lower
sizes of the
graph,
in its
also
pected
out
times
for
deterministic ae those
some
with
better
non-standard
exstring
problems.
that
n.
to (in
input
vectors.
grant 251
The
the
by NSF/DARPA
field
lems
of
with
are
also
finding
212-
P.
here
in
cal
string
Yorktown 914-984-
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association of Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
associated
the
in practice
of matching
associated
symbols
this
(more
refer
to
these
string
depend
on more
its
higher KR87]
checking
A of
that
arise
demanding
beyond
family
of string
size
classi-
are identical.
identical.
as non-standard
problem
and
of problems
go well
are
of
The
In all of these probone in the pattern
symbols
general)
that involve
AC75,Bi77,Ba78,
number
that
stand-
phrase
n.
al-
problems
typically
size
problems. “matches”
a large
from
a
problem[KMP77]
provided
well-known
of
probas
application —
a pattern
variants[KMR72, of such location
tions
of
in
these
problems
string
are examples lems, a text
naturally
the
text
In contrast,
STOC 94-5194 Montreal, Quebec, Canada © 1994 ACM 0-89791-663-8/84/0005..s.50
—
matching
dimensional
an
rich as well
often,
stringology
occurrences
a larger
is very
from
standard
all
rn
Quite
well-motivated In
we introduce
Mercer
[Ga85]
mathematical
structure.
point.
num-
stringology substantial
gorithmic
un-
Center,
Introduction
1
These
param-
704, o. Box Heights, NY 10598, USA;
[email protected], 9846. palemIIMheory.stanford.edu, 415-723-4405.
at
for
matching
string
depend in the
domized
best-
solving
Research
the
from
matching
following
matching
turns
ers/partitions”
also
[email protected],
T. J. Watson
the
matching,
string It
cliques
truncated
problems
der grant number CCR-89-06949 and by NSF under ber CCR-91-03953. † Courant Institute of Mathematical Sciences, Street, New York, 998-3061. *IBM Research
in
improvement
these
algorithms
was partially
of
results
string
string
problem
standard
on
reductions,
from
threshold
matching
a variant
analogous
of mismatches
boolean We
for
the scheme
Interestingly, all of the above results are derived using the structure of the “match graph” defined by the matching relation of the given instance of the non-standard string matching problem.
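To make the notion of a match graph concrete, here is a small sketch under stated assumptions: the graph is taken to be bipartite, with the distinct text elements on one side and the distinct pattern elements on the other, and an edge wherever the matching relation holds. The conflict graph is its bipartite complement. The helper names and the don't-care relation with `*` as the wildcard are hypothetical illustrations, not definitions copied from the paper.

```python
from itertools import product


def match_graph(text_elems, pattern_elems, matches):
    """Edges (t, p) of the bipartite match graph: t and p match."""
    return {(t, p) for t, p in product(text_elems, pattern_elems) if matches(t, p)}


def conflict_graph(text_elems, pattern_elems, matches):
    """Bipartite complement of the match graph: t and p do not match."""
    return {(t, p) for t, p in product(text_elems, pattern_elems) if not matches(t, p)}


def dont_care(t, p):
    """Matching relation for don't cares: '*' matches every symbol."""
    return t == "*" or p == "*" or t == p


elems = ["a", "b", "*"]
G = match_graph(elems, elems, dont_care)
Gc = conflict_graph(elems, elems, dont_care)
```

With the two-symbol alphabet above, the conflict graph contains only the two edges `("a", "b")` and `("b", "a")`; every other pair of elements matches.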
generaliza-
here.
derive
mismatches
algorithms
convolutions.
non-standard
string
standard
fastest
JR})
counting
ductions
~ depends
the
by extending
are drawn
ber
including
in the
of
classical
of f2(~( IX 1)) convo-
introduce
we show,
faster
its
also
standard
model.
with
truncated research
we
algorithms
yield
—
function
by improving
problems
and
are proved
to this
best–known
RAM)
that
family
bound
results
algorithms
reductions
cares
(increasing)
known
boolean
don’t
a lower
model
of these
this
the
this
In the
from
We variants that
Matching:
These
match
●
with
we prove
lutions,
all
String
problems
matching
tions the
stringology
Complexity*
simple
example
problems
stringology matching
noif the —
— with
we
is the “don’t
cares” the
[FP74].
text
from
the
card” bol
In
and/or
underlying
symbol from
in
the
the
Unix
agrep
and
problems
from
nience,
refer
as
Since
to problems
there
advances[AL88,Ab87, phisticated standard
string
Despite
these
We
matching contrast
this
with
stringology[GS83,
in
bounds
some
our
space
is to fill
this
inherent
complexities
arising
in non-standard
provide some well
algorithms cases.
These
tremely
fast
matching
in
algebraic
sharp These
of two
will
correspond
a that
multiplying vectors
n
m,
of
resulting
alternately, a boolean
bounds
models are
to
any
. . .a~–l
convolution
polynomials
on
GF2
convolution.
in
in
(For
they widely with
is shorter the
case
a polynomial
which The
case resulting
than
involves
for imply
c where
Ci
Depending defined either we
refer
to
refer
to
in O(lmin(r,
@)) upper
known
refer
to
param-
as truncated
if the
running string
string
we show
variety
of non-standard
that
match-
algorithms when
convo-
T convolutions,
algorithms
problemsz
for
truncated
convolution in solving
of results
non-
can be improved.
bottleneck string
for
T < @,
truncated than
of existing
matching
the computational
fewer
in
improvements
improved
parameterized
times
group
summarized
that
non-standard
using
bodies
second
(also
In particular
that
can be solved
Our
matching includes
ema wide
problems. optimal
time-
the BC or PC model it can be proven that no faster algorithm exists for parametrized truncated convolution. 2 The parameter T gets mapped into a structural aspect of the non-standard string matching problems, which we will describe in the sequel.
or it
follow
lution
10n
the
convolution), we
= on on
will
convolutions.
Therefore,
Formally,
Extrun-
best
simply
conclude
algorithms
would
a single algorithm
is given.
will
reductions
RAM
standard
b = bobl . . .b~-1,
vector n -1. are
which
text)
carry-overs.
b
to the
we can now
ing
truncated
b (vector
we
convolutions
to existing the
for an
in were
)
1.1),
in
earlier
parameterized
is the
convenience,
Consequent Section
that
can be computed
truncated
convolutions.
of
defined
where
is
convo-
convolutions
m,
be It
truncated
convolutions
this
dependcould
convolution.
erized
of de-
of the in-
problem
Kosaraju[Ko89]
we show
out out
as part
Truncated
standard
earlier
As before, this
n =
parame-
are left
specified
by
work
with
ai x bj ame left
convolution 1.
interre-
defined
terms
p aramet
of these
via
of parametrized
truncated
convolutions;
bound.1
on clas-
the
a and
and is
in as
seven
based
pattern
correspond
operation
for
Informally,
vectors
the
without
a = aoal
their
convolution
of computation
the
O(@
cated
terms what
parameter
their
standard
polyno-
of all
problem,
of operation,
T =
tending
it 0/1
$\sum_j (a_{i-j} \times b_j)$, for $0 \le i \le$ the context, these convolutions a field
string
[AHU74,WC76,BG91]
to
will them
given
lower
model”
Convolution
>
group
non-standard
comparisons
operation.
vector
first
takes
effi-
for non-standard
(See Appendix).
introduced
of the
that
ex-
concern
models
replacing
convolution
that
Our
as
with
1.1 below
convolutions[AHU74]. by
“comparison
the
different
particular
here.
derived
used
times.
Section
value
more
all known
convolution
ai, some
standard
when
originally
in
strictly
are
problems,
This of the
or boolean
to see that
etrized
we provide
we introduce are
in
a polynomial
we
prob-
that
to varieties
a boundary
the field
times
schemes
on T and
upon
deterministic
and
running
of twelve
problems
sical
include
approaches,
expected
running
to
for these
complexities
Exactly
is the
bounds;
1.1 as well.
on computing
matching RAM,
for each
case
lower
improvements
that
rely
convolutions.
ing
the
any
algorithms
the
string
in-
lution
Additionally,
improved
algorithms
we relate
each ai.
of the
time
bounds
Section
Note
problems
with
easy
our
for
upper
convolutions.
put
of problems
techniques
convolutions.
summation.
of
in
algorithms
is a variation
pends
match that
in
problems.
First, these
stringology.
summarized
the complexity
family
the
that
of known
in the
in which,
thrust
in Section
algorithms
yield
it follows
matching
r,
best-known
summarized
or boolean
ter
times
in understanding,
a large
with
as randomized
of results
gap
primary
the
essentially
is, o(nm)
ducibilities
of the
in the running
(or of the
can be found
times
Second,
well.
[BG91,CC+93],
convo-
convolution
justifications
facts,
non-standard
non-standard
of problems
are
employ
that
mial
so-
very
that
problems
than
string
inher-
understanding
The
of
must
cient,
of non-
the
the
complexities
constants
these
truncated
CP91]
on the
From
additional
understood
deep
results
powerful
prob-
detailing
definitions
show
these
lems
conve-
a variety
of
cases [GG91,CH92].
paper
for
is not the
standard cluding
and
several
developments,
problems
time
of
non-standard
MR92]
structure
inherent
al-
also
of these
the running
problems.
important and
general
matching
been
solutions
matching
complexity
string
have
Fis-
a variety
polynomial
boolean
Further
in the BC or PC models
recently
(For
from
string
Ko89,AF91,
algorithmic
for
stringology.
non-standard
then,
a very
is applicable
as the
or the
respectively.
and precise
We each
are of
the
to
model
2.
and
Historically,
provided
which
models
systems—for
and
[WM92].
non-standard
we will
stringology
stringology
searching
P~
model
notion
cares
be referred
(or
11~
of the
general
will
lution
every sym-
don’t
grep[KP84]
Paterson[FP74] solution
more
models
“wild-
occurrences
with
text
facility
gorithmic
ent
this
from symbol
a special
against all
non-standard in
location
a (basic)
or
match
matching
from
in
developed
Z,
under
interest
instance,
lems).
can
text,
problems
has
is to find
String
fundamental
each
either
alphabet
goal
the
of matching.
cher
problem,
@ that
X;
pattern other
this pattern
as
computational
space
tradeoffs
results
for
(both
derived
scribed in
and
Section
1.2.
improved
Our
the we
overview our
results
and
the
third
proof
each
in Sections
technical
ideas.
included
1.1
and
in this
bounds
and
this
two
section,
matching eleven to
we consider
has
problems.
●
In
the
each
care
them
string
as basic,
is matched
main
model,
Q(log
Fischer
from
we
above
approach
yields
an
1.1.1,
matching
1.1.2,
known
algorithm
In
1.1.3
are
discussed
respectively.
problem,
presenting
paper,
when
in terms
sumed
that
length
ing
in Sec-
we discuss
convolution
and
m
string
of positions the
context
note
the
spective
the
text
of the
For
computing
of vectors
RAM.
O(ndlog
m)
Also,
time
the
are
best
known
m(<
n)
when
on the
m de-
the
given
re-
(or takes,
both
on the
●
In
the
A is at
the
these
RAM
when
can
algorithm
known
B.
takes
linear
while
best
known
more
(say,
O(n@polylogm))
In most
or near-linear algorithms
(i.e.,
cases,
our
O(npolylogm))
for
B take
quires
fastest reduction
at
gorithms
for
these
convolutions[Ab87,
time
<
the
where
subsets
these
RAM and
1X1 ~
in
above
we show
m =
be
unless
problem
take
best
re-
on the convolu-
truncated
The
that
reduced
IX 1.4 There-
this
convolutions
faster.
by
[AF91].
can
comment,
problems
again
from algorithms
@,
parameter
problems
Once
Farach
problem
@,
solved
have
the main
case
is matched
Amir
when
earlier
end-
match
case of string
the
that
S2( 1X1) boolean
IX!
be
state
associated
show
of
with
its
(segments)
now
in
is O(n).3
convolution
least
by
positions
in the
bound
and
on our
tions
for
this
problems
a contiguous
convolutions.
adaptation
based
be
two
the
of string problem
consider
of all
we
and
problem
subsets
we
in the
non-empty
of this to
We will
match-
position
set X, specified
instance
cases,
RAM
third
problems;
sizes
truncated
fore,
The
problems,
these
Abrahamson[Ab87]
the time
asymptotically
the
convo-
is string
a specified
ordered
model,
a simple
family
with
subsets,
BC
less than
by the
log 121 + whereas
each
Q( IZ 1) (boolean)
least as hard as problem B, or B is reducible to A. By this we mean that B can be reduced to A in time taken
best–
Paterson[FP74]. exactly
is restricted
problem
in both
by
log m)
respectively)
problem
result
and
2 log IE I boolean
corresponding
with
the
and
to
we state
In
●
RAM.
places,
and
is a variant
these
of the
require
we say a problem
d convolutions
it requires
in
convolution O(n
above
problem,
intersection.
for
sum
algorithms
take
of the
matching the
vectors
of
convolu-
classical
takes
X.
ranges
their
results
number
n and
shorter
the
in this
In this
subset
a non-empty
non-
respectively;
re-
bound
algorithm IZI)
convolutions
takes
problem
In both
provided
vectors of
the
and
respectively)
at various
m
or polynomial
n and
that
of two
This
O(log
the
boolean
alphabet
each
points.
it is as-
context
problem
classical
to Fischer
[FP74]
with
“segment”
convolutions,
Therefore
(or takes, we mean
the
pattern
longer
boolean
of lengths
on the
RAM,
and
example, the
in n and
of (truncated) length
y.
requires
matching,
in
is that
where,
all the
the size of the
this
deriving over
is associated
of the
matching
of a problem
of convolutions,
the
algorithm
subsets.
strings
subset
it in
results:
complexity
that
uses
to
our
second
with
input
Tree pattern
and
the the
of number
each n
standard
time
they
and
alphabet
matches
convolutions.
since
due
from
The
non-standard
which in
this
is stated
for
and twelfth
threshold
text
lutions.
refer
1.1.4.
Convention
of
problems;
is the
Section
and
show
improvement
algorithm
we group
we will
Recall
base
IX! denote
Paterson[FP74]
Specifically, string
reasons, families;
the
the
@ that
let
1X1) (boolean)
and
Our
●
non-standard
in
from
is the
cares.
tions.
are
Complexities
three
count,
matching
tions
into
Non-
family
don’t
position
symbol
As usual,
BC
one
Lower
For conceptual
of our problems
Basic
of this
with
a symbol
don’t
symbols.
quires
abstract.
twelve
member
matching
problem,
either
and
details
illustrative
0.5 log log Ill I +0(1) In this
of
alphabet.
summary
highlighted
technical
RAM
in
other
of
model
Concerning
Results
of string
or a special
our
flavor
4 to illustrate
extended
of
The
on this
have
remaining
and
problem that
comprehensive
1.3.
of the
We
3 and
The
number
based
2.
A simple
pattern
discussion
Section
comprise algorithms
and
1.2,
Complexity
Stringology
de-
are described
and
can be understood in
reductions
and
standard
1.3.
a detailed 1.1,
Bounds
problems
of results
nature
give
1.1.1
algorithmic
deterministic
in Section
Sections
issues
the results
group
and
accompanying
related
of These
technical
first
in
time matching
corollaries group).
are described
results,
expected string
randomized
Given
not
as
in the first
which
fast
non-standard
known
O(min{lXl,
al~})
AF91].
significantly
time.
3 Clearly this sum can be significantly larger than O(n). As our results demonstrate, this restriction is sufficient to capture the dominating term in the complexity of this problem.
4 These reductions are ineffective when $|\Sigma| > \sqrt{m}$ in the following sense: a reduction from problem A to problem B is ineffective if it takes more time than the best known algorithms for A and B.
A
particularly
bounds
interesting
is that
we derive
ing a common
“structure”
To explain
this
ing
general
(truly)
problem. G.
of this
A
node
element
i:
range, G
that
correspond
that
text
be expressed
level,
the
complexity
completely
in the the
its match trated
match
graph
In the
BC
G and
CC(GC)
clique
cover
(not
is the
its
clique
cliques
necessarily
in the
disjoint)
edges
the
the
gsm
tion, with
is true
problem
we infer
string
fact
and
the
matching ranges,
of GC.6
can
result.
In
specific
value
order
to
union
is
an interesting
with
general its
don’t
cares
in detail,
the
mally,
in each don’t
Q(r)
boolean
tions
can
concerning to will
the
be the
in Section
string
from
a dominating elements
can
for
general
compute
This
of size
gsm
truncated
probneed
convolu-
characterization
im-
characterization
case of ir = 1 was consid-
the
context
cover
second
is aligned
this
notion
with i,
problems subsets
i,
there
output
with
from
1 and that
we get
problems,
Section
1.1.1,
of
so on.
of deter-
three
more
based
namely,
For these
if
a count mismatches;
i +
is a match,
and ranges.
exception:
determining
replacing
matching
(standard)
following
the pattern pz
problems.
of
than
Non-
thing
of four
we must
that
with
string
cares,
the
rather
there,
Mat
a variant
of counting
whether
three
family
is
position of positions
four
on the
with
don’t
problems,
we
that:
In
the
PC
Q( [X I) The
lems
model,
all
of
(polynomial)
these
best-known
algorithms
[GG88,Ab87,AF91]
problems
convolutions match
for
this
re-
for
IX I
these
bound
~
prob-
on the PC
model.
we will
In
in G to de-
generally
text
pl
@.
number
the
of
String
example
the number
quire
matching
that
Counting
matches
●
computation
Complexity
and
matching[Ka93]
show
the
convolutions.
of more
clique any
best–known
special
convolu-
of size ~ will
unless
Standard
non-standard
and
can
with
2)
the
and
be reduced
again
for
IX I ~
A,
parameterized)
parameter
7 once
as con-
RAM
(polynomial
tion
For-
A of size k in G (Figure
be thought
the
ito an instance
a dominating
faster.
the pattern
the structure
G is the
truncated
subgraph
the
consider
mining
3.
truncated
clique
result
graph
7, s Therefore,
Bounds
each
With
of
observa-
case of string first
with
be solved
illustrative
for
Lemma[An87].
clique
parti-
from
be reduced
convolutions
where
We now
of the
bounds
In
this
a given nodes
size in G.
(boolean)
on the previously
1.1.2
The
instance
very
to
case.
cares,
we use a different
reductions
5 Clearly
we need
. . . k}
ered.
The
subsets,
this
use of Sperner’s
result
. In the RAM, rive
with
specialization
describe
specializing
of CC(GC)
involves
with
problem
in [MR92]
of whose
cover
lower
cares,
do this,
matching
and
don’t
simply
of string The
gsm
proves
number
previous
above-mentioned with
by
on the
dis-
{1,2,
problem,
of each and every
based
and,
j > i.
of a match
IT can
an dominating
here this
i from
labels
any
with
an instance
is an edge
that
parameter
lem
graph. Since
show
7r = 0(~.
gsm
is the smallest
graph,
prop-
be labeled
range
to those
of largest
1, for
convolutions,
number
the
label
subgraph
clique
of the
GC are illus-
solving
cover
with
7r+
complement,
example
boolean
of a graph
two
k nodes
can
from with
of
of induced
graph
that
Q(cc(GC))
number
of bipartite
An
conflict
we show
at least
where
G and
GC.
its
of
structure
partition
with
3. Specifically,
model,
takes
graph
graph
following
partition
exactly
in
at a deeper
instance
the
a node
dominating
tion
problem.
that
that,
is connected
dominating
variants above,
gsm
to show
every
by
conflict
in Figure
of gsm
of
determined
“cliques” namely,
able
such
We
the pattern
mentioned
each
integers
tion
The
elements
the three
cases of the
we were
the
edge nodes
pattern,
text
from
that
matching
as special
Surprisingly,
the
with
in A has exactly in
by
other
is associAn
nodes
tinctly
G is a bipartite
from
partition
2. the
in-
graph symbol,
that
Clearly, derived
1. each
distinct
. . . . j~ in the
derived
string
each
(pattern).
jl,
and those
each
ti and all those
node
It is easy to observe
non-standard
●
text
i matches. nodes
for
symbols5
the
subgraph
short)
match
be an alphabet
to elements
in one partition, another.
in
between
the
the follow-
gsm for
a pattern,
is defined
of alphabet
element
with
and
could
a position
is defined
graph
G
an element
with
(or
is an induced erties:
y inherent.
has an associated
(pj ) in
or subset
ated in
ti
matching
above
characteriz-
we define
to a text
problem
the
by
is universal
better,
string
of
of them
that
structure
In addition
stance
aspect
all
n for
8It is assumed
to =
lager
each
of these
lx 1.9 T,
As
show
subgraph
that
convolu-
four
before,
the reduction
the dominating
we
truncated
problems, this
implies
is ineffective. is provided
with
the
input. This need not always be true, and the graph can be implicitly specified, but it is very easy to find for problems of interest such as string matching with ranges and subsets, from the alphabet set. In general, we have to account for the additional complexity of finding this graph, should it not be specified explicitly. 9 Strictly this is true only with the exception of the counting
junction of subsets or ranges and so on. Our results extend to such general elements as well, though we omit these details here. 6 We additionally show that given a minimal clique cover of $G^c$, the input instance of gsm can be solved using $cc(G^c)$ boolean convolutions, and hence this bound is matched from above.
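The upper bound in footnote 6 can be illustrated with a short sketch. It assumes that each “clique” of the conflict graph is a complete bipartite subgraph, given as a pair (set of text elements, set of pattern elements), and that the cover contains every conflicting pair. Each clique then costs one boolean correlation of two indicator vectors, which is a boolean convolution with one vector reversed. The function names are hypothetical, and the correlation is computed naively rather than with a fast convolution routine.

```python
def boolean_correlation(x, y):
    """r[i] = OR over j of (x[i+j] AND y[j]) for each alignment i of y in x."""
    n, m = len(x), len(y)
    return [int(any(x[i + j] and y[j] for j in range(m))) for i in range(n - m + 1)]


def match_positions(text, pattern, cover):
    """Alignments with no conflict, using one correlation per clique in `cover`."""
    n, m = len(text), len(pattern)
    conflict = [0] * max(n - m + 1, 0)
    for text_side, pattern_side in cover:
        # Indicator vectors: which positions hold an element of this clique.
        x = [int(t in text_side) for t in text]
        y = [int(p in pattern_side) for p in pattern]
        for i, bit in enumerate(boolean_correlation(x, y)):
            conflict[i] |= bit
    return [i for i, bit in enumerate(conflict) if not bit]


# Don't cares over {a, b} with wildcard '*': the conflicts are (a, b) and (b, a).
cover = [({"a"}, {"b"}), ({"b"}, {"a"})]
positions = match_positions("abab*b", "a*b", cover)
```

Here two correlations suffice because the conflict graph has two edges, each covered by its own single-edge clique; the pattern `a*b` matches the text `abab*b` only at alignment 2.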
variant
of standard
string
matching
and
string
matching
with
that
a RAM
mial
convolutions
will
yield
fewer
algorithm
one that
than
that
for
solves
As in the previous
ber
from
1.1.1,
match
graph its
before.
complexity
analog
except of
an
instance,
Once
again,
we note
string
matching
be expressed
as special
tionally
we use an analog the to be
in the edges ●
the
PC
least
cp(GC)
in
and
some
correctly
(See Section
for the counting problem
cal
4).
(the
first
any
algorithm
is a corollary
of
our
case that
which
follows
the
trix
equation
lower
shows
mentioned
condition
is true,
must the
be
output
O otherwise.
For
threshold
matching
problems,
and
for
with
for
IXI ~ @,
the
convolution
the
(parameterized
can be reduced
a parameter
value
to derive
these
problems,
to
In I; 10 this
has
holds
for
every
technique
value
vector
convolution.
b in the
these
results
values
This
involves
k to the
defer
final
a non-
number
of zeros
instance
of trun-
a complete
version
of
characterization
corresponding
We
to the
complex-
particular our
of k.
of relating
cated
nontrivial
for
k = O or k = m – 1), but
in the
discussion
of this
Complexity
of
Tree
of
paper.
Pattern
at
above;
of
and
tree
pattern
Mat
pattern
thing
,Ko89,DGM90]. to
implying
that
on
running
time
solving
factors)
in
will
for
string
[DGM90], the
yield
BC
this
with bound
problem, (up
improving for
that
current
is optimal
bounds
and
show
matching
the
model;
better
labeled We
reducible
the
is
matching
ordered,
thereby
of
to polyit
on
the
truncated
Consequences
of
the con-
problem.
least
Additional
1.2
of course,
any
problem
volution
clsssi-
needs
problem text
trees[H085
RAM
in the
the
the the
subsets
log
match-
Our
Reductions
it
algorithm
approach,
above
corresponding 1.1.2)
is 1 and
$O(n\sqrt{m}\,\mathrm{polylog}(m))$
showed
result
for
Fischer-Paterson
bounds
this
a cer-
string
bound
that
ity
rooted
is satis-
follows This
4,
which
[Ka93]
[FP74]
general
simple
in the Section
implication.
wherein
bound
to be sketched
convolutions.
is the
Section
equation
Karloff
approach
polynomial
If this
in
than
equivalently,
position
Consider
is inter-
gsc problem,
which
position
truncated
usual
1.1.4
where
lower
of standard
problem
k);
non-standard
RAM
the
trivial
cliques
algorithm
a matrix
Recently
defined
k.
of them,
here
de-
disjoint
proof in
any
of the
text
(as
corre-
of GC.
of our
for
variant
Fischer-Paterson
Q(IX])
part
that
the no more
of gsc takes
development
via
number,
convolutions,
of this
that
In the
k (say,
We show:
number
an instance
that ing
graph.
an instance
hard
modeled
subsection),
necessarily
its
Addi-
cover
whether yields
a “threshold”
It is straightforward
can
of a graph,
of bipartite
partition
The
fied
of
number
solving
argument
solves
bound
variants
of whose of the
sketch
detail.
as
the four
clique
text
that
each
GC,
above,
for
polynomial)
The
exactly
number
development
we
num-
complement
than
●
defined
mentioned
polynomial
clique
is the
t ain
cover
technical
esting proof
union
model,
is the
The
that
of the
smallest
Q(cP(GC))
its
no more
determine
of the pattern
we show:
from
text.
for
variant
resulting
prob-
of the
of the
are
partition
the
is an edge
In
at
the
graph,
This
cases of gsc problems.
namely,
clique
general
problem,
count
position
subgraph
(truly)
gsm
the
results
we
(given
output
count
structural
problem.
we output
dominating
non-standard
fined
of
of the
at each
G
general
position, alignment
k mismatches
using
the above
the more
(or gsc for short)
natural
of mismatches
and
convolution
text
sponding
problems,
the
of the
counting is the
Section
truncated
each
IX I polyno-
of these
case, we derive
them
characterization string
any
n convolutions.
by specializing
lem
uses less than
solving
is trivially
the
In
ma-
this
section,
quences
true.
we will
of our
sketch
reductions
two
additional
discussed
in
conse-
the
previous
sections. ●
In
the
cated
RAM,
we
show
convolution
to an instance subgraph usual
with of the
that
any
parameter
n can
gsc problem
of size T + 1, provided
implication
(polynomial)
as in Section
with
trun-
be reduced
1.2.1
Time-Space
T < @;
this
has the
A match
1.1.2.
Complexity String
of Non-standard Mat
graph
partite that
1.1.3
Threshold
and
any
algorithm
for
(basic,
thing
with
now
These
problems
counting don’t
consider
cares.
the are
problems However,
third
family
respectively in
Section
for these
of four variants
1.1.2.
problems
problems. of the
count,
Informally, we develop
TS
=
match
(This
solves string
graph string
problemll
in
show
a non-standard
matching —
of bi-
We
problem
as well
as count-
matching[Ka93]
— bound
union
otherwise.
which
of standard
Q(nrn).
if it is a disjoint
is nontrivial
or threshold)
a nontrivial
k-mismatches
four
is trivial
cliques,
ing variant We
Tradeoff
a dominating
time
T
and the
and
is optimal).
space
This
S,
follows
for an alternate characterization based on certain “sparse” convolutions (rather than parameterized truncated convolutions). With this new characterization, an implication analogous to the one stated here follows for these two problems as well.
10All
coment~
rua&
counting
variants
in the
they are omitted. 11Note that the ~t& variant standard string lem is always trivial.
in Section RAM
1.1.2
about
are pertinent
graph matching
corresponding and
the
complexity
here
as well,
to the k-mismat
of and
counting
&es
prob-
from
the matching
convolution show in
that
all
the
log m)
standard)
string
1~[ =
Q(m).
holds
for
non-trivial
have
be
that
and
on
cal
the
result
[Ga85]
in
[CL90
and
to the
bound
[Ab87]
when
for
input
with
the
are
performs
text
and
of the
a
the
drawn
convolution i.e.,
follows
from
k-mismatches from
the
the
matching
problem
problem
for
which
boolean
Two
1.3.1
Randomized
Results
tion
Algorithms
provide
standard
basic
the
degree
mic
in
time
fast
graph
previously
reducing
“large”
alphabet
over
modulo
1.3.2
A
the
problem to
study
open
In
as string
matching exception:
provide
an algorithm
previously,
in
the
no o nrn) stricting
seems
algorithm
shown that
in
with
the
length.
the
fore,
We
is
the
a
complexfor
is no loss of generality
queries
lem the
to m.
is at RAM. model
stringology bound
[W86].
out,
the
any
least
as hard
This to
shows study
boolean
the
the
RAM
complexity
since convolution
of
problem. model
studying
showing
prob-
“weak”
graph
as a boolean that
be-
to study problems
string
show
by from
As
stringology
also
de-
location
is the number
in
match
We
pattern
comparison
non-standard
non-trivial
the
location.
is too
each
locations
numbers
the
successful
t and
for
algorithm
text
that
model
certain respec-
when
algorithm
stringology
model.
problems for
for
algoprob-
from
back,
of the
of non-standard that
with
in this
hard
turns highly
standard
we show
time
in re-
it
gets
each
it uses to solve
been
one
The
sum
of the
An
matching
and pattern
for
is the
problem.
as in the BC
location.
queries
algorithm
Model.
by A,
finally
queries
uses to solve
and which
arithmetic
As
problem
which
output
complexity
since
text
complexity
has
of text
identified
that
the
string
it
by
by computing
matching
response
of
contain the
of elements,
each other
the
lems from
harder and
with elements
alignment
which
was
In
loc~
identified
of the
much
the
text
alignment
(PC)
works
sets of
from
determines
it
string
sets
p.
on
the
various queries
the number
Remark.
time; time
from
alignment
a general
two
the
various
of strings
is known. there
location,
computing
matching,
matching[AHU74]
text
way
to be intrinsically
string
other
of elements algorithm
a counting
PC model
the
same
o(nm)
of understanding
expression
the degrees
on the
two
to an alignment
location
basic
identifying
the
queries
respectively
complexity
by
$O(n m^{0.75}\,\mathrm{polylog}\, m)$ running
The
lem
termines
to access
identifies
which
of alignment
of each
is placed
each
from
Convolution
with
over
bits
solves
align
for
for each text
A that
the
is given,
sets
Polynomial
which
other
The
non-standard
tively
the
two
location.
matching
sets
the
of the
number
left
of equal problem
that
matching
however
are now
this
basic
direction
time &
subsets,
takes
problem
of regular 12lt ~a
for
string
from
A and
on t at i leads
rithm
Outside
is defined
symbols)
which
non-standard
step ity
alphabet
This
a
match charged
alignment
text
location
prob-
primes.
string
the
problem
no algorithm
known. than
we call
elements
than set
the renaming
Problem
of
with
(rather alphabet
over
same
p and
it is allowed
using
an
string
is not
query, it
a O otherwise. finally
given
Stringology
This
following
take
approach
defined
of the
Matching
which
string–subsets.
Our
to
Consider
pattern
but
only
the
and pattern
OR
algorithms
of
asked
non-standard
algorithm
on G,
response,
logical the
queries
queries
model.
t,
The
pattern
i, a 1 if p placed
and
text
alignment
output for
chosen
a generalization
in [Ab87]
whenever
O(n polylog(m))
initially via
A,
non-
polylogarith-
case.
instances
alphabet
Non-standard We
this
appropriately
String
most
take
for
set,
“smaller”
symbol
at
best–known time
for
problems
is
algorithms
polylog(m))
involves
algorithms
matching
the Our
whereas
lem
string
of
m.lz
O(n@
randomized
on
one from
elements We
type
complexity
it is charged.
each
a text
algorithm
to access the
the
of such
sequential
an
“yes-no”
the basic
model.
or the
elements,
Algorithmic
only
(BC)
computation
For
model, is allowed
number
solves
G in this
identi-
for
problem.
pattern.
1.3
is the
Convolution
text
this
using
A that
any
In
tj match pi?”;
“does
algorithm
the
model
matching
Boolean
for
problem.
standard
pattern
algorithm
log m)
an
the
string
form
graph
can
o(~
reduction
Definitions
Model.
that
solve
vectors
time,
of our
latter
it
Results
boolean This
since
is
Comparison
of (non-
ID I ~ 3.
sublinear
nature
RAM
computation[AHU74].
problem
Time
if the
,C92]
the
convolution
subsets
when
average.
The
a non-
problem
matching
randomly,
performed
time
with
Expected
shown
uniformly
is embodied with
is an improvement
non-standard
Sublinear
the
and
reductions
a time-space
for
matching bound
Models
a boolean
our
[Ab86]
Previously,
match graph,
1.2.2
that
instances
was known
Our
any
fact
problem
match graph.
2
for computing
the
convolution
above
of Q(rn2/
We
bound
and
boolean
of
trivial
TS
[Ab86],
that
matching
takes
Cl(nm)
such
a prob-
convolution model of
non-standard
a specific
has been
on
is a very
open
lower
for long
3
Bounds
and
Basic First
we
plexity the
String
derive model
queries.
Since
the
model
BC
boolean this
we claim is
graph
edge-disjoint the mum
cover
number
1 Solving
matching model
a
problem takes
exactly
CC(GC)
from
a universe
that
from
1 String
exactly
matching
yields is the
cover
G
G;
4
mini-
for
log IZI+O.5
on the BC
the
We
BC
derive
the conflict ing
with
tem
graph don’t
PC
cares
alignment
of another. maximum
set
of given of
size.
the
(a)
First
to
model
volution lower
clique
cover S={
elements.
for
those tom
Ki
on Sj
that
given.
S2} vertex
any
two
sets
of the
{Cl, the
i on the
. . . C’.}
top
and
let
that
each
alignment
are
We
i on
the
cover
top
a
set
and
sets in S are incomparable. universe
set
sets
on the
[1 . . . ~].
= IxI.
❑
tc~ #
in
2
exactly
cp(GC) @
such each
to a top is well known
in a clique
partition
to
see
sufficient.
string
on the PC
alignment
two
counting
string
matching
that
S, the
problem.
by vertex
set of all
Let
set (S:,
Gi ‘s, must
that
at
necessary.
(%>$), ”””, (S4, S;) be the sets by any algorithm which solves
at
alignment
argue
(s:, s;)> identified
of GC induced
has
sets.
cp(GC) are
takes
G
vertex
now
queries
of
matching model
provided
that
We
partition number
of G.
queries,
is
a
bipartite
minimum
of its
gives
of a bipar-
clique
counting G,
con-
this
of polynomial
partition
any graph
easy
m,
The
is the
on each
cp(GC)
of GC .
Let
of vertices the given
Gi be the subS;).
We claim
be a clique
partition
flj Our
argument
strategy
for
=
if and
to
assumed
that
First, at
the
in
are two
~ong
re-
14 All
Complexity.
when
if our
assume responses
~own
lx I < @i
the
claim
various
to
either
holds.
same
that
all 1 pairs
cases:
We
to construct
The
length,
each
are in fact
for
Gi count
is static. all without
static.
m. That
There
or some
matching
is,
looking
queries14.
is a clique string
queries graph G is correct
t and p are
text namely,
alignment
the
~~o=it~
and for
a
t and a
alignment
the algorithm once
provide
a text
these strings and the match 1 rounds, the algorithm d
to be of the
it identifies
clique
answer
with after
only
vertex
is adversary-based.
an adversary
p and
consistent such that
sets are sub-
Consider
cp(G),
Since model
polynomial
number
G.
in the
PC
of edge-disjoint
vertices It
complexity
in the
the
a clique
alignment
are
the
n and
yields
match
on
query
the
that
Solving
with
all
C.} S
of
union
problem
least
terms
Recall
cliques
aa
We
Sj.
{Cl,...,
bottom
String
queries.
of length
bipartite
bot-
collection
The
corresponding
this connection
in Communication
C’i have
S’i corresponding
the
(~&~~2)
alignment
as computing
denoted
pattern
C =
exists
problem
as hard
number,
graph
GC.
construct each
and
defined
assigned that
a
Consider
vertices
for
of in-
and there
matching
of vectors
whose
queries
bottom
subgraph
those
re-
states
|Σ| + O(1).
bound
string
G is a collection
Proof.
system,
of cliques
which
all
a clique
of Z sets,
13 We suspect that
follows
lower of the
graph
most
Lemma.
collection.
~~ E S,
Ci G C. TO the set Sj
searchers
a clique
this
cover
is
from lemma
vertex
from
sets Sj such
{s1,..., some
to each
are assigned
Assume
a ground
a Sperner
C is a clique
GC
for
set S of ~
that
(-) of
on
universe
SIEI } on the
which
claim
bounds
Consider
each
E
Sperner’s
our
collection
Counting
terms
Theorem
GC is generated.
subset
vertices that
our
of
a Sperner
C =
For
sys-
from
Assign
follows.
match-
sets are in-
tight
reduction
from
that
0.5 log log
in
bound
cliques
is a subset
collection
system,
that
Sl,...,
collection
the
of the
show
a unique
such
Given
asymptotic
system
two
tite
of
is a collection
any
provides
such
a Sperner
we
of string
is, no set in the collection lemma
cover
to a Sperner’s
that
by rc~,n
constructed
Lemma
(n~z)
It follows
for
a counting
is exactly
takes
queries
a clique
system
set such
size of any
G=
from
the
Sperner’s
Sperner’s
the
that
is equivalent
The that
show
for the problem
a universe
comparable,
top,
G.
cares
[An87]13.
sets from
cover
3. We
a size.
a general
of solving
model.
See Figure
follows
Sperner’s
set of size n, any
~min z log 1X1+
convolutions. Proof.
be
Matching
string on
don’t
log log |Σ| + O(1)
denoted
K,
It
~~in.
= ~min.
Bounds
G.
queries.
with
of
can
of such
we claim Corollary
comparable.
necessarily
union
alignment
construction,
be
sets
sets has size at most
Therefore,
graph
value
of size CC(GC)
a ground
of
by this
cannot
comparable a collection
non-standard
match
the minimum
that
k,
reduction.
IX I incomparable
of
clique
our
that
#
that
m,
cover
j
Consider
n and
a clique
claim
sk,
such
number
cc(G),
basic
with
in
We
and
completes
ductions
the
i c Sj. S’j
length
whose
in any
query
of not
denoted
sets
of the
that
cliques
number,
of cliques
Theorem
of
terms
Recall
in
as computing
G, is a collection
bipartite
clique
in
make
com-
of alignment
alignment
vectors
bound
convolutions.
a bipartite
each
the
problem
number
as hard
of
a lower
boolean
that
exactly
on
matching
of the
Ci, two That
bound
string
terms
convolution
gives
lower
a basic in
of
Matching
a general
of solving
BC
Complexity
Gi is
problem
not
a clique.
here. be
In
The
the
an edge
an edge G.,
GC.
e = (r, a),
called
the
three
types.
e is not
in
any
than
one
Gi
more
of these
cases,
simple algorithm
It is much gorithm
is
harder
after
alignment to a number rithm
while
quent scenario sary
are
or the the
as follows.
using
this
about
weak
second
due to the
cancel.
That
the
correct We
swer
show
the
errors
two
the match
of its
sets
graph
matching size
It
condition an
in this adver-
cc and and
and the
by
the can
1 in concould
problem process A be the lli
with
adversary ensure
of the That
the
between must
(S;,
the
S;)
be a clique
equivalent rank
rank(A)
down
the
let the subsets
adjacency
matrix
size
in G.. and
a count
Ge,
but
function
(where
< ~~~~
rank(lkf~
model
non-standard
string
dependence
in our
to note
lower
on the
bound
intimate
re-
for
the
they
relation-
techniques
Complexity
respectively.
numbers
the
bound
and
[Lov].
are frequently
bounds
seen
in the
communication are
referred
tech-
Both
the
upper
complexity to
as cover
and
Acknowledgements
Sincere
thanks
Farach,
Howard
to Amihood
Amir,
Karloff,
Bill
Ravi
Chang,
Boppana,
Martin
and Martin
Tompa
discussions.
K.
the
size
[Ab87]
K. Abrahamson.
the
rank
). Also
for
for
each
i, let
the
edges
that
each
$ymp
for
FOCS,
string
matching.
1039-1051.
Combinatorics
of
Publications,
Mi.
This
the
[AC75]
is
2dth
Finite
Sets,
1987.
reals),
A.
Ann
Aho
and
M.
A.
ACM
and
Symp
M.
Symp.
and
to
of Comput.,
18(6), M.
Figures.
1975,
Addison-
Wesley
Proc
Algorithms,
Publishers,
string search.
333-340.
Farach.
Approximate
on Discrete
Efficient
bibliographic
Efficient
Matching of
of
2nd
Ann
1991,
Aho, J. Hopcroft, and [AHU74] A. The design and analysis of computer
Al-
matching.
Theory
Corasick.
aid
of the ACM, Amir
rectangular
model.
Farach.
two-dimensional
An
dimensional )
Benson
58-68.
Comm, [AF91]
G.
independent
searching:
of
i, rank(ll~
Amir,
1992,
Gi
2 Solving the counting variant of string with don’t cares over alphabet set Σ requires in the PC
1987,
Science
A. Proc
1 ~ rank(A).
queries
Ann
Generalized
Comp.,
Anderson.
phabet
Let
sub-additivity
each
[ABF92]
in the
S~)}.
is over
J.
Oxford
matching
. . ..(Sl. and
I.
27th
for
those
Consider
string
A = ~t
From the
form.
programs.
with
402-409.
SIAM
partition
tradeoffs
constructed
programs line
1986,
of
Time-space
Abrahamson.
branching straight
Corollary matching
alignment
—
of IZ I are
comparison
a direct
lower
tile
strat-
1. Therefore,
IXI
our
there
[Ab86]
is exactly
at least
the
as shown
cp functions
is of size
clique
we claim
in addition, 2.
display
in Communication
lower
his
of vertices
representing
Then,
to Theorem
in
Bibliography
claim
it identifies
(S~, S~), for
as A,
clf problems
independent in
Br93]
of functions;
❑
G and
be of same
complexity problems
alphabet
for valuable
an-
oft
In our
limits
2 in a different
S~),
number
are alphabet-independent
that
Theorem.
solves
{( S/,
❑
get
to
details
number
which
be S =
which
problems
between
6
above.
edges
by the adversary
to write
in Theorem
algorithm
gives
discussion.
substantial
is also of interest
niques
guess
made
unlike
these
of the
[An87] It is convenient
|Σ|, which
sults.
subse-
in a manner
to
of Gc.
in the
(3)
correctness.
maximum
vertex
type
for
ship
when only
that
[KMP77, ABF92, GP92,
algo-
text
can
(nontrivial)
constructed
v is the
the
is forced
the proof
al-
the text
algorithm
carefully
the
the
of weak
preceding
A of the graph
=
ches! for
queries
the
edge of type
2, the
a strategy
egy, the pattern V2 where
types
a weak
We omit
and
in the
is that different
algorithm
earlier.
strategy
and
of mismat
alignment
a dynamic stated
(2)
of type
number
can
the
that
by the
However,
adversary
by the
matrix
rank(A)
Remarks
known
set of vertices
the
Clearly
standard stringology — algorithms with
commits
case,
of type
is, by using
with
vertex
constructing
edge.
the
edges
which
difficulties
static
corollary
We note
the
the
constructs
weak
difficulty
algorithm junction
In the
so he simply
is dynamic,
The
main
edge before
algorithm the
Two
that
of sets for
the adjacency
3.
each CTm; a
to the previous
the
Figure
5
(3)
when
be used
pair
In
p =
adversary
between the
the weak
pattern
decides
can
not
or
G=.
claim
the
this
queries.
pattern,
it
the
input.
the response
identifying
knows
the
is,
but
argument
Consider
GC in
is one
G.,
and
Tm
the
Intuitively, query;
alignment
as in
=
to
exists
Gi
is in
on this
of alignments
in an alignment
t
to argue
scanning
queries.
it
as well sets
that
there
edge, which
but
Proof.
S needs
e is in some
Gi
an error
dynamic,
sets to pick
weak
completes
commits
it is omitted
that
Otherwise
(1)
adversary
case analysis
and
we argue
of
(2)
e is in
case is easy
case,
partition
of following in
latter
former
1974.
J.
2NonACM
212-222. Ullman. algorithms.
[AHiU]
A.
Aho,
Bounds mon
and
J.
of the
problem.
[GG92]
Ullman.
longest
J. of ACM,
Vol.
and
Landau.
Fast
multidimensional
matching. 97-115.
T.
J. Baker.
and
Computer
array
Science,
81,
[GS83]
A technique string
for
matching
dimension.
extending
rapid
to arrays
SIAM
J.
Vol.
R.
[Br93]
Two
D.
Letters,
Dictionary
[Ka93]
6, No.
uniform
and
string
Symp
on Theory
Z. Galil.
matching.
Chang.
with
M.
Proc
R. Hariharan,
and
W.
Ann
for
R.
and
ACM
D.E.
and
R.
and
parallel
pattern Proc
Hariharan.
of string
IEEE
M. tern
R.
1992,
[KP84]
Karp,
IEEE
Symp.
the
Proc
[KR87]
exact
[Ga85]
IEEE
Ann
Lawler.
sublinear
[Lov]
Approximate expected
time.
R.
Crochemore
and
D. Perrin.
Two-way
pat-
Journal
of ACM,
38, 1991,
651-
116-124. [MR92]
Galil.
[GG88]
[GG91]
Z.
and Proc
E. Magen. IEEE
Faster
Ann
and
Karp
Paterson.
Products.
Open
Match-
SIAM-AMS
Problems
Galil
String
Proceed[WC76]
in Stringology.
on Eds,
Words,
Com-
A.
Springer-Verlag
Apos[WM92]
Lecture
Galil
and
ing.
Journal
Z. Galil it y of string Computing,
R.
Giancarlo.
for
approximate
of Complexity, and
and
and
and
Fast
A.
Proc
paton
Rosenberg. Patterns
dth
Ann
in ACM
1972.
R.
Pike.
The
UNIX
Prentice-Hall,
1991,
lower
On exact bounds.
Rabin.
and
NJ,
S.
random-
IBM
Development,
Flows
Prömel,
Efficient
algorithms.
Communication
31(2),
Journal 249-260.
complexity
and
VLSI
Schrijver
Layout,
Eds.,
- a surKorte,
Lo-
Springer-Verlag
235-266. Muthukrishnan
and
under
FST
@
Vol.
general TCS,
652,
I. Wegener.
H.
Ramesh.
match
India,
String
relation.
LNCS,
Proc
Springer-
1992,356-367.
match-
33-72. complexSIAM
J
1008-1020.
The
Complexity
Wiley-Teubner
of Boolean
Series
in
1986.
C. string
Wong and A. Chandra. Bounds editing problem. J. of ACM,
1976,
13-16.
S. Wu
and errors.
83-91.
U.
Manber.
Fast
Communications
Func-
Computer
ence,
1992,
structures
string
4(1988),
R. Giancarlo.
matching:
Data
M.O.
matching
Paths,
allowing
1985.1-8.
algorithms
1989,
Journal
Repeated
of Comput.,
L. Lovasz.
tions.
1974.
Algorithms
and
Pratt.
SIAM
Arrays.
vey.
Verlag, [W86]
M.
7, 113-125,
and Z.
match-
FOCS,
Symp
145-150.
other
Vol.
binatorial tolic
Z. Galil,
matching.
1990,
pattern on
environment.
pattern
vasz,
1990,
and
Notes,
1993.
V.
Miller,
and
Kernighan
of Research
on FOCS,
Fischer
Z.
68-
approximately
tree
Symp.
of
on Theory
B.
ized
Symp
pattern
ings,
1982,
323-350.
R.
programming
600-609.
E.
in
M. Dubiner,
ing
for
strings.
6(1973),
matching
M.
Sci,
Pattern
of ACM,
J. Morris, in
Trees
12th
[FP74]
O’Donnell.
Manuscript,
Identification
Symp
Ann
FOCS,
optimal
Syst.
algo-
matching
On
matching.
and
matching.
tree
M.
Efficient
Knuth,
Strings,
675. [DGM90]
space
Comput.
Journal
Ann.
matching
(1990), [CP91]
of
247-256.
1984.
matching
Proc
L. Gssie-
1993.
Chang
string
Proc.
1992,
Time
algorithms
IEEE
Computing,
K. Park,
fast
dimensions.
on FOCS,
W.
Fast
Kosaraju.
tern
[KMR72]
Z. Galil,
Optimally
two
complexity
[CL90]
and
in Trees.
Proc
[KMP77]
for
Communication.
preprocessing
Cole
Symp
alphabet-
178-183.
1991,439-443.
S. Muthukrishnan,
Rytter.
on FOCS, [CH92]
FOCS,
Journal
mismatches.
S.R. ing.
case.
bound
l?3rd
Crochemore,
niec,
one
J
280-294.
H. Karloff.
[K089]
un-
length
A lower
of Comput.,
Private
R. Cole,
in
Truly matching.
J. Seiferas.
Huffman
Rapid
rithms
Park.
Symp.
and
counting
1993.
parallel
[CC+93]
IEEE
Matching
match.
Vol.
matching
alphabet-the
D. Breslauer
W.
pattern
Processing
Breslauer.
Manuscript,
[C92]
K.
dimensional
matching.
C.
SIAM
168-170.
bounded
[BG91]
dimensional
Information
5, 1977,
Ann.
Z. Galil
[H085]
7,
complex-
bounds.
95.
S. Bird.
ing.
and two
26(1983),
of more
Comput.,
On exact
upper
1992,407-437.
Galil
33rd
1978,533-541. [Bi77]
Z.
R. Giancarlo.
matching:
independent
string
exact-match one
serial
approximate
Theoretical
1991,
than
G.
and
of string
Computing,
23, [GP92]
Amir
parallel
Z. Galil ity
com-
1-12.
A.
[Ba78]
Hirschberg, complexity
subsequence
1976, [AL88]
D.
on the
text
for Vol.
Sci-
the 23,
searching
of ACM,
35,
7
Complexity
of
non-standard ing
the
basic
string
problem
on
match-
the
h(i)
RAM
define
parameterized
Parameterized alaz..
Truncated
.a~
tors.
In
and
blb2 -. .bm,
addition,
each
a function
parameter
Let m be divisible cated convolution ~,
1 ~
truncated
~ <
m,
n >
m,
index~
h
by ~. of a and
namely
the
: {1, . . .,n}
0/1
vector
~
1
1
1
a has
{1, . . .
ClC2 . . . cn_~+l,
is defined
Figure
~
1:
cated
(ai+j-1
(j mod T)+l
we refer
convolution When
@
Alignments
involving
bj)
the
arithmetic
operators x
polynomial
and
tors
@
operation
and
@
@
+,
truncated
this
to parameterized
as simply
the
the
truncated
and
@
this
convolu-
operation
are respectively
the
is called
the
boolean
consider
dividing
VI
are respectively
convolution;
Elements
trun-
is called
when
the
logical
A
Y3
Y2
Y4
Intuitively, windows
text
of length
location,
window. defined
the
operaand
truncated
that
text
location
index It
only
V,
convo-
terms with
window.
is easy
to see that
convolution
for
r
[AHU74].
truncated
convolution
We derive
the following
=
into
parameterized
X2
xl
stan-
this
is the
Kosaraju’s
--d ___ ---xk
a
is the
by Kosaraju
x4
x3
Text
of greater
m,
in
ex-
in which
1, this
extending
a is
convolution
m =
introduced
each
convolution
location
For
dis-
For
an index
are considered
a pattern
in each
into
1.
truncated
as the standard
those aligns
pattern
Figure
assigns
erized
way
the
~ aa in
h function
paramet
the same
cept
dard
the
The
ai
region.
Yk
lution.
joint
b in shaded
convolution
Pattern
tion.
with
}
The parameterized t~~n~ b with integer parameter
61
convenience,
aligns
Ui
truncated
For
1
1
vec-
follows:
Ct =
b
I Let
be two
in
I
convolution.
Convolution:
a 1 Ii
W*-
w. I
We first
= 1
Figure 2: Dominating Clique of Size k
[K089]. result
in
[K089].
Lemma
1
volution
The parametrized
of
vectors
a
and
spectively
can
be computed
O(min{r,
@})
standard
We now zed
show
truncated
matching
the
truncated b of in
length
the
connection and
n
RAM
boolean
convolutions
boolean and model
12344
con. m
12344 pattern
re.
using
elements
convolutions.
between
D 1
parametri-
non-standard
2
3
4
4
elements
12344
string
problems.
G
Gc Theorem
3 Solving
matching length
n, pattern
which k ~
contains 2,
terazed length
problem
in
basic
non-standard
RAM
model
p of length a given
is at least truncated
a the
boolean m
and
dominating
as hard
n, b of length
m
with
a match clique
as computing convolution
with
%
3
string text
t
graph
Example:
of
Pattern:
12#34@l
Text:
1234233g54~
G
C of size k, the parame.
of vectors
a of
T = k – 1.
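As a point of reference for the convolutions named in this theorem, the sketch below computes the standard (untruncated) convolution c_i = ⊕_j (a_{i+j-1} ⊗ b_j) for i = 1, ..., n-m+1, with the operator pair passed in as parameters. It covers only the standard form, not the parameterized truncated variant, and it is a direct O(nm) evaluation rather than a fast algorithm. The function names are illustrative and indices are 0-based in the code.

```python
from functools import reduce
import operator


def convolution(a, b, times, plus):
    """Direct evaluation of c_i = plus over j of times(a[i+j], b[j]).

    a has length n, b has length m <= n; the result has n - m + 1 entries.
    """
    n, m = len(a), len(b)
    if m == 0 or m > n:
        raise ValueError("need 1 <= len(b) <= len(a)")
    return [
        reduce(plus, (times(a[i + j], b[j]) for j in range(m)))
        for i in range(n - m + 1)
    ]


def boolean_convolution(a, b):
    """Operators are logical AND and OR on 0/1 vectors."""
    return convolution(a, b, operator.and_, operator.or_)


def arithmetic_convolution(a, b):
    """Operators are arithmetic multiplication and addition."""
    return convolution(a, b, operator.mul, operator.add)
```

Passing the operators explicitly keeps the boolean and arithmetic cases as two instances of one definition, which mirrors how the text distinguishes them only by the choice of ⊗ and ⊕.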
Figure 3: Conflict Graph GC and Match Graph G for Example, String Matching with Don’t Cares
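String matching with don't cares, the running example, can be stated concretely as follows. This is a minimal sketch of the trivial O(nm) method only, not of any algorithm from the paper. The wildcard character `?` and the function names are assumptions made for illustration; `match_counts` reflects an assumed reading of the counting variant as the number of matching positions per alignment.

```python
def symbols_match(x, y, wildcard="?"):
    """Match relation: a wildcard matches every symbol; otherwise equality."""
    return x == wildcard or y == wildcard or x == y


def dont_care_match(text, pattern, wildcard="?"):
    """Return all 0-based alignments where every pattern position matches."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(symbols_match(text[i + j], pattern[j], wildcard) for j in range(m))
    ]


def match_counts(text, pattern, wildcard="?"):
    """For each alignment, count the pattern positions that match."""
    n, m = len(text), len(pattern)
    return [
        sum(symbols_match(text[i + j], pattern[j], wildcard) for j in range(m))
        for i in range(n - m + 1)
    ]
```

Replacing `symbols_match` with any other relation between text and pattern symbols gives the general non-standard matching problem, in which the relation corresponds to the edges of the match graph.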