Alphabet-Dependent String Searching with Wexponential Search Trees

Johannes Fischer (TU Dortmund)
Paweł Gawrychowski (University of Warsaw, supported by WCMCS)

July 1, 2015

We consider a fundamental data structure question: how to represent a tree?

(Compacted) Trie A trie is simply a tree with edges labeled by single characters. A compacted trie is created by replacing maximal chains of unary vertices with single edges labeled by (possibly long) words.

Navigation queries Given a pattern p, we want to traverse the edges of a compacted trie to find the node corresponding to p. If there is no such node, we would like to compute the longest prefix of p for which the corresponding node does exist.
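As an illustration only (the structures in the paper are far more refined), here is a minimal Python sketch of a compacted trie supporting exactly this navigation query; all names (Node, insert, navigate) are made up for the example.

class Node:
    def __init__(self):
        # maps the first character of an edge label to (full edge label, child node)
        self.children = {}

def insert(root, word):
    """Insert word into the compacted trie, splitting edges where necessary."""
    node, i = root, 0
    while i < len(word):
        first = word[i]
        if first not in node.children:
            leaf = Node()
            node.children[first] = (word[i:], leaf)
            return leaf
        label, child = node.children[first]
        j = 0                      # length of the match between label and the rest of word
        while j < len(label) and i + j < len(word) and label[j] == word[i + j]:
            j += 1
        if j < len(label):         # split the edge by introducing a middle node
            mid = Node()
            mid.children[label[j]] = (label[j:], child)
            node.children[first] = (label[:j], mid)
            child = mid
        node, i = child, i + j
    return node

def navigate(root, p):
    """Length of the longest prefix of p that ends exactly at a node of the trie."""
    node, i, answer = root, 0, 0
    while i < len(p) and p[i] in node.children:
        label, child = node.children[p[i]]
        if p[i:i + len(label)] != label:
            break                  # mismatch (or p ends) in the middle of an edge
        i += len(label)
        node, answer = child, i
    return answer

if __name__ == "__main__":
    root = Node()
    for w in ["abra", "abrakadabra", "ara", "arab"]:
        insert(root, w)
    print(navigate(root, "abrax"))   # 4 -- the node for "abra" exists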



Consider p = wewpxcwrehyzrt and the following compacted trie.

[Figure: a compacted trie with edge labels such as we, wp, xc, erw, trqw, idk, ba, sdk, ..., used to illustrate the navigation query for p.]


Splitting an edge Given an edge, we want to split it into two parts by (possibly) creating a node, and adding a new edge outgoing from this middle node.

[Figure: a small compacted trie in which one edge is split by a new middle node (edge labels in the figure: aka, abr, ra, dab).]

Notice that this covers adding a new edge outgoing from an existing node.
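A focused sketch of the split operation itself, with hypothetical names and the same Node representation as in the earlier navigation sketch (repeated here so the snippet is self-contained):

class Node:
    def __init__(self):
        self.children = {}   # first character -> (edge label, child)

def split_edge(parent, first_char, offset, new_label):
    """Split parent's edge starting with first_char after `offset` characters,
    and attach a new edge labeled new_label to the created middle node."""
    label, child = parent.children[first_char]
    assert 0 < offset < len(label) and new_label and new_label[0] != label[offset]
    mid = Node()
    mid.children[label[offset]] = (label[offset:], child)    # lower part of the old edge
    mid.children[new_label[0]] = (new_label, Node())          # the new outgoing edge
    parent.children[first_char] = (label[:offset], mid)       # upper part of the old edge
    return mid

Adding a new edge directly at an existing node is the degenerate case in which no middle node is created: simply insert (new_label, Node()) into that node's children.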


Static case Given a compacted trie, can we quickly construct a small structure which allows us to execute navigation queries efficiently?

Dynamic case Can we maintain a compacted trie so that:
1. the resulting structure is small,
2. we can execute navigation queries efficiently,
3. we can split any edge efficiently?

Parameters: the number of nodes in the compacted trie n, the size of the alphabet σ, and the length of the pattern m.


Hashing For each node store a hash table mapping characters to the corresponding outgoing edges. Randomized!

Table Or, for each node store a table of size σ mapping characters to the corresponding outgoing edges. Space usage is nσ!

BST Or, for each node store a binary search tree mapping characters to the corresponding outgoing edges. Navigation query takes O(m log σ) time!

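To make the third option concrete, here is a minimal sketch of a per-node child map kept in sorted order and searched with binary search (Python's bisect stands in for a balanced BST); names are illustrative. Each edge traversal then costs O(log σ), which is where the O(m log σ) navigation bound comes from.

import bisect

class BSTNode:
    def __init__(self):
        self.first_chars = []   # sorted first characters of the outgoing edges
        self.edges = []         # parallel list of (edge label, child) pairs

    def find_edge(self, c):
        """O(log σ) lookup of the outgoing edge starting with character c."""
        i = bisect.bisect_left(self.first_chars, c)
        if i < len(self.first_chars) and self.first_chars[i] == c:
            return self.edges[i]
        return None

    def add_edge(self, label, child):
        """Insert a new outgoing edge (assumes no edge starts with label[0] yet)."""
        i = bisect.bisect_left(self.first_chars, label[0])
        self.first_chars.insert(i, label[0])
        self.edges.insert(i, (label, child))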


Rules of the game:
1. the solution must be deterministic,
2. the space usage must be linear in n, irrespective of σ,
3. the bound on the update time must be worst-case.

Then it seems that navigation queries must necessarily take O(m·f(σ)) time for some function f of σ, for instance f(σ) = log σ, or something better if we use a more sophisticated predecessor structure. Surprisingly, this is not true.

Suffix trays of Cole, Kopelowitz, and Lewenstein ICALP’06
There exists a deterministic linear-size structure supporting navigation in O(m + log σ) time, which can be constructed in linear time.


What about the updates?

Suffix trists of Cole, Kopelowitz, and Lewenstein ICALP’06 There exists a deterministic linear-size structure supporting navigation in O(m + log σ) time and splitting edges in O(log σ) time.

Application to text indexing Consider a suffix tree of a text. After prepending a letter, one edge should be split. It is easy to locate it in amortized O(1) time, but getting a sublinear worst-case bound is not trivial!


Suffix tree oracle of Amir, Kopelowitz, Lewenstein, and Lewenstein SPIRE’05 There exists a suffix tree oracle which locates the edge in O(log n) time.

Suffix tree oracle of Breslauer and Italiano SPIRE’11 If σ = O(1), there exists a suffix tree oracle which locates the edge in O(log log n) time.


In the Word RAM model, are these O(m + log σ) and O(log σ) bounds the best possible?

Andersson and Thorup SODA’01
There exists a deterministic linear-size structure supporting navigation queries in O(m + log n / log log n) time and splitting edges in O(log n / log log n) time.

Are these bounds the best possible? Yes, if σ is unbounded in terms of n, and navigation queries actually give us the predecessor of the string.


But what if σ is non-constant, yet (significantly) smaller than n?

This paper
There exists a static deterministic linear-size structure supporting navigation in O(m + log log σ) time, which can be constructed in linear time.

This paper
There exists a deterministic linear-size structure supporting navigation in O(m + log² log σ / log log log σ) time and splitting edges in O(log² log σ / log log log σ) time.

Full version of the paper
A better suffix tree oracle that locates the edge in O(log log n + log² log σ / log log log σ) time.


To construct a static deterministic linear-size structure, we could simply try to find a perfect hashing function storing pairs (node, character).

Ružić ICALP’08
A static linear-size constant-access dictionary on a set of k keys can be deterministically constructed in time O(k log² log k). Hence we immediately get a static deterministic structure which can be constructed in close-to-linear time. Can we do better?


We store the edges outgoing from v in a few different ways depending on the size of the subtree rooted at v.

Heavy nodes
A node is heavy if its subtree contains at least s = Θ(log² log σ) leaves, and otherwise light. Furthermore, a heavy node is branching if it has more than one heavy child.


[Figure: a trie in which each node is classified as heavy or light; heavy nodes are further marked as branching, nonbranching, or heavy leaf, and a node v with its parent is highlighted.]

We classify edges into three types, and deal with each type separately:
1. from (any) branching node to a light node,
2. from a nonbranching heavy node to (any) heavy node,
3. from a branching heavy node to (any) heavy node.


Edges of type (2): there is at most one such edge per node, so they can be stored separately.

Edges of type (3): the total number of such edges is just n/s, hence we can afford the super-linear construction time. More precisely, we compute the perfect hashing function for each such node separately in O(k log² log k) = O(k log² log σ) = O(ks) time, which takes O((n/s) · s) = O(n) time in total.

Edges of type (1): we store all such edges in a predecessor structure. By combining the perfect hashing result with Willard’s x-fast tries, there exists a linear-size predecessor structure with O(log log σ) query time, which can be constructed in linear time.

Observe that any navigation query traverses an edge of type (1) at most once, hence we pay O(log log σ) just once (so far). But what happens when we reach a light node?

Each light node contains at most s leaves. We can execute a binary search over those leaves using the suffix array trick, namely in each step we achieve at least one of the following:
1. halve the current interval,
2. consume one character from the pattern.

Hence in O(m + log s) time we can locate the predecessor of the pattern among all leaves, and the search actually computes the longest prefix of the pattern which is a prefix of a string corresponding to some leaf.
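A sketch of this light-node search: a binary search over the sorted leaf strings in which every step either halves the interval or consumes fresh pattern characters, so it performs O(m + log s) character comparisons. The pairwise LCP table stands in for the LCP information a real implementation would precompute; all names are illustrative.

def lcp(a, b):
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k

def light_search(leaves, p):
    """leaves: the sorted strings of a light subtree. Returns (index of the last
    leaf strictly smaller than p, or -1, and the length of the longest prefix of
    p that is a prefix of some leaf)."""
    n = len(leaves)
    L = [[lcp(a, b) for b in leaves] for a in leaves]   # precomputed LCPs
    if p <= leaves[0]:
        return -1, lcp(p, leaves[0])
    if p > leaves[-1]:
        return n - 1, lcp(p, leaves[-1])
    lo, hi = 0, n - 1
    l, r = lcp(p, leaves[0]), lcp(p, leaves[-1])
    while hi - lo > 1:                       # invariant: leaves[lo] < p <= leaves[hi]
        mid = (lo + hi) // 2
        if l >= r:
            m = L[lo][mid]
            if m > l:                        # leaves[mid] still agrees with leaves[lo] past l
                lo = mid
            elif m < l:                      # leaves[mid] already branched off above p's prefix
                hi, r = mid, m
            else:                            # compare p and leaves[mid] starting at position l
                k = l
                while k < len(p) and k < len(leaves[mid]) and p[k] == leaves[mid][k]:
                    k += 1                   # consumes a new pattern character
                if k == len(p) or (k < len(leaves[mid]) and p[k] < leaves[mid][k]):
                    hi, r = mid, k
                else:
                    lo, l = mid, k
        else:                                # symmetric case, anchored at the right end
            m = L[mid][hi]
            if m > r:
                hi = mid
            elif m < r:
                lo, l = mid, m
            else:
                k = r
                while k < len(p) and k < len(leaves[mid]) and p[k] == leaves[mid][k]:
                    k += 1
                if k == len(p) or (k < len(leaves[mid]) and p[k] < leaves[mid][k]):
                    hi, r = mid, k
                else:
                    lo, l = mid, k
    return lo, max(l, r)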


The total time complexity for a query is O(m + log log σ + log s) = O(m + log log σ) and the total construction time is linear.


Now let us consider the dynamic case.

Reduction The general case can be reduced to maintaining a collection of trees of size O(σ) each and linear total size, so that any update/query can be efficiently translated into an update/query into at most one smaller tree. From now on we assume that n = O(σ). Instead of the simple two-level scheme we need to partition the nodes into more groups.

Levels of nodes
Let f(ℓ) = 2^((3/2)^ℓ). We say that a node v is of level ℓ when the number of leaves in its subtree belongs to [f(ℓ), 2f(ℓ+1)]. We will maintain the invariant that the level of v doesn’t exceed the level of its parent.
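A tiny sketch of these thresholds (hypothetical helper names; a real implementation would work with integer thresholds and keep a previously assigned level as long as the leaf count stays inside the allowed interval).

def f(level):
    """Doubly exponential thresholds from the slide: f(ℓ) = 2^((3/2)^ℓ)."""
    return 2 ** ((3 / 2) ** level)

def level_of(leaf_count):
    """One valid level for a node with this many leaves: the largest ℓ with
    f(ℓ) <= leaf_count (0 for very small subtrees)."""
    level = 0
    while f(level + 1) <= leaf_count:
        level += 1
    return level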


Now, we classify the edges into two types:
1. from a node to a node of the same level,
2. from a node to a node of a smaller level.


Edges of type (1) are stored in a static dictionary with constant access time. We already know that such a dictionary can be constructed in close-to-linear time, which is enough because of the way we defined the levels. More precisely, it cannot happen too often that the level of a node increases.

Edges of type (2) are stored in a dynamic dictionary structure. For this we develop a weighted variant of the exponential search trees of Andersson and Thorup.

Even without the modification, the query complexity is O(m + log³ log σ / log log log σ). This is because there are at most t = Θ(log log σ) edges of type (2) on any path descending from the root.

[Figure: a path descending from the root with the weights w_t, w_{t−1}, w_{t−2}, w_{t−3}, ... of the nodes at which the level decreases, where w_i ∈ [f(i), 2f(i+1)].]


Faster! The subsequent accesses to the dynamic dictionary structures are not completely independent.

Wexponential search trees
There exists a linear-size dynamic structure storing a collection of n weighted elements from [1, U] with the following bounds:
1. predecessor search takes O(log(log W / log w) · log log U / log log log U), where W is the current total weight and w is the weight of the predecessor,
2. inserting a new element of weight 1 takes O(log log W),
3. increasing the weight of an element of weight w by 1 takes O(log(log W / log w)).


Telescoping

Now if we use this structure instead of the standard exponential search trees, the total complexity of all queries at nodes where we decrease the current level becomes:

Σ_{i=t−1..0} log(log w_{i+1} / log w_i) · (log log U / log log log U) = log log w_t · (log log U / log log log U) ≤ log log U · (log log U / log log log U) = log² log U / log log log U

(ignoring the details necessary to show how to update the structures...)
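The telescoping step spelled out, under the assumptions w_0 ≥ 2 (so that log w_0 ≥ 1) and w_t ≤ U:

\sum_{i=0}^{t-1} \log\frac{\log w_{i+1}}{\log w_i}\cdot\frac{\log\log U}{\log\log\log U}
  = \frac{\log\log U}{\log\log\log U}\,\log\frac{\log w_t}{\log w_0}
  \le \frac{\log\log U}{\log\log\log U}\,\log\log w_t
  \le \frac{\log^2\log U}{\log\log\log U}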


Wexponential search trees
Imagine that each element of weight w is a fragment of that length, and draw all of them on a [1, W] segment. Then choose a set of roughly √W evenly spaced splitters. Store them in a static predecessor structure, and recursively build a smaller wexponential search tree for each of the resulting roughly √W subsets.
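A minimal sketch of just the splitter-selection step; it ignores the static predecessor structure, the recursion, and all rebuilding rules, and the names are illustrative.

import math

def choose_splitters(elements):
    """elements: (key, weight) pairs sorted by key. Greedily closes a block once
    its accumulated weight reaches ~sqrt(W), so there are at most about
    sqrt(W) + 1 blocks; the block boundaries act as the splitters."""
    W = sum(w for _, w in elements)
    step = max(1.0, math.sqrt(W))
    splitters, blocks, current, acc = [], [], [], 0.0
    for key, w in elements:
        current.append((key, w))
        acc += w
        if acc >= step:
            splitters.append(key)       # last key of the block becomes a splitter
            blocks.append(current)
            current, acc = [], 0.0
    if current:
        blocks.append(current)
    return splitters, blocks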

Beame and Fich STOC’99
A static predecessor search structure with O(log log σ / log log log σ) query time can be constructed in O(k^(1+ε)) time and space, where k is the number of elements.


Wexponential search trees

Intuition:
1. the larger the weight, the sooner the element is stored in a static predecessor structure,
2. rebuilding a static predecessor structure is very costly, but happens only if there have been multiple insertions/increases.

Worst-case bounds
Very complicated in the Andersson & Thorup paper. We follow the simpler idea of Bender, Cole, and Raman.


Questions?
