Parallel Approaches to the Pattern Matching Problem on the GPU
Saman Ashkiani, Nina Amenta, John D. Owens
University of California, Davis

Introduction

With an increasing amount of raw data (e.g., web and network traffic, DNA sequences), there is a growing need to search that data for patterns of interest. The pattern-matching problem is both an application in itself and an intermediate tool for other applications: web searching, network intrusion detection systems, and computational biology all require pattern matching. In this work, we consider scenarios in which both the patterns to be searched for and the text consist of characters from a finite alphabet.

Divide-and-Conquer RK (DRK)

Another way to perform the matching in parallel is to assign different parts of the text to different processors and process each part individually. The final result is then simply the union of the results of the subproblems. To maintain independence between the subproblems, consecutive subtexts overlap by m − 1 characters.

Performance Evaluation

We used an NVIDIA Tesla K40C GPU and a 3.50 GHz Intel Xeon E5-2637 v2 CPU with 16 GB of DDR3 DRAM. All parallel methods were implemented by the authors and run on the GPU; the sequential reference methods are all based on the smart library [2] and run on the CPU.

Rabin-Karp method

The main idea of the Rabin-Karp (RK) method is to hash X and every substring Y[r] into a single entity (an integer or a 2-by-2 matrix [1]) and then compare the hashed values instead. When two hashed values (fingerprints) match, the two substrings are considered identical with high probability. For any binary string $X = x_1 \dots x_m \in \{0, 1\}^m$ and prime number p, we define two classes of fingerprints:

2. GDRK1 (or GDRK2): 1st (or 2nd) class. Each thread processes a single subtext from the global memory.
3. LDRK1 (or LDRK2): 1st (or 2nd) class. First, a group of subtexts is stored in the local memory of each block; then each thread processes its own subtext.
4. HRK: 2nd class. A single subtext is stored in the local memory of each block, and the threads of the block then process it cooperatively.

RK algorithm:
1) Choose a random prime number $p < mn^2$.
2) Compute the pattern's fingerprint ($K_p(X)$ or $F_p(X)$).
3) Compute the fingerprints of all substrings of the text ($K_p(Y[r])$ or $F_p(Y[r])$ for all $1 \le r \le n - m + 1$).
4) Compare each fingerprint from step 3 against the one from step 2.
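The four steps above can be sketched for the first class of fingerprints as follows. This is a minimal sequential sketch, not the authors' implementation: the function name `rk_search`, the fixed default prime, and the use of '0'/'1' strings are assumptions made for illustration, and the random choice of p (step 1) is left to the caller.

```python
def rk_search(pattern, text, p=1_000_003):
    """Return the 1-based indices r whose fingerprint F_p(Y[r]) equals F_p(X).

    pattern and text are strings of '0'/'1' characters. p should be a prime;
    the default is a fixed prime used only for illustration (step 1 of the
    algorithm picks p at random).
    """
    x = [int(c) for c in pattern]
    y = [int(c) for c in text]
    m, n = len(x), len(y)
    if m == 0 or m > n:
        return []

    def fingerprint(bits):
        # First class: 2^(m-1) b_1 + ... + 2 b_(m-1) + b_m (mod p), by Horner's rule.
        f = 0
        for b in bits:
            f = (2 * f + b) % p
        return f

    top = pow(2, m - 1, p)          # 2^(m-1) mod p
    fx = fingerprint(x)             # step 2: fingerprint of the pattern
    fy = fingerprint(y[:m])         # fingerprint of the first window
    matches = []
    for r in range(n - m + 1):      # r is 0-based inside the loop
        if fy == fx:                # step 4: compare fingerprints
            matches.append(r + 1)
        if r + m < n:               # step 3: slide the window by one character
            fy = (2 * (fy - top * y[r]) + y[r + m]) % p
    return matches
```

Equal fingerprints indicate a match only with high probability, so a caller that needs certainty can compare the candidate substrings directly.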

In (2), for any binary value $x \in \{0, 1\}$, $A_p(x)$ is defined as the left inverse of $K_p(x)$ modulo p (i.e., $A_p(x) K_p(x) \equiv I \pmod p$, where I is the identity matrix).

Cooperative RK (CRK)

The second class of fingerprints can be computed in parallel using the parallel scan operation. Let $K = \{K_p(y_i) : 1 \le i \le n\}$ and $A = \{A_p(y_i) : 1 \le i \le n\}$ be the fingerprints and their inverses for all characters of the text, in order. We define S and T as follows:

$$S = \{K_p(y_1),\; K_p(y_1) K_p(y_2),\; \dots,\; K_p(y_1) \cdots K_p(y_n)\}, \quad (3)$$

$$T = \{I,\; A_p(y_1),\; A_p(y_2) A_p(y_1),\; \dots,\; A_p(y_{n-m+1}) \cdots A_p(y_1)\}. \quad (4)$$

S is an inclusive scan over K with right matrix multiplication modulo p as its associative operator. Similarly, T is an exclusive scan over A with left matrix multiplication modulo p as its associative operator. Indexing both from 1 (so that $T_1 = I$), the leading factors of $S_{r+m-1}$ are cancelled by $T_r$, and for $1 \le r \le n - m + 1$:

$$K_p(Y[r]) = T_r\, S_{r+m-1}. \quad (5)$$

Fig. 1: Schematic view of all approaches (Serial, Cooperative, Divide & Conquer, Hybrid)

Thus, after computing two scans (i.e., computing S and T ), we can compute any required fingerprint by computing a single matrix multiplication.
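The two scans and the final per-position product can be sketched as below. This is a sequential illustration under stated assumptions, not the authors' GPU code: `itertools.accumulate` stands in for the parallel scan primitive, the names `matmul` and `crk_search` and the fixed prime are hypothetical, and T is indexed so that `T[0]` is the identity.

```python
from itertools import accumulate

I2 = ((1, 0), (0, 1))
K = {0: ((1, 0), (1, 1)), 1: ((1, 1), (0, 1))}   # K_0 and K_1


def matmul(a, b, p):
    """2-by-2 matrix product a @ b with entries reduced modulo p."""
    return (
        ((a[0][0] * b[0][0] + a[0][1] * b[1][0]) % p,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % p),
        ((a[1][0] * b[0][0] + a[1][1] * b[1][0]) % p,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % p),
    )


def crk_search(pattern, text, p=1_000_003):
    """Return 1-based indices r with K_p(Y[r]) == K_p(X), using two scans."""
    x = [int(c) for c in pattern]
    y = [int(c) for c in text]
    m, n = len(x), len(y)
    if m == 0 or m > n:
        return []
    # Left inverses modulo p: A_0 K_0 = A_1 K_1 = I (p - 1 plays the role of -1).
    A = {0: ((1, 0), (p - 1, 1)), 1: ((1, p - 1), (0, 1))}
    # S[k-1] = K(y_1) ... K(y_k): inclusive scan, new factor multiplied on the right.
    S = list(accumulate((K[b] for b in y), lambda acc, k: matmul(acc, k, p)))
    # T[r-1] = A(y_(r-1)) ... A(y_1), T[0] = I: exclusive scan, new factor on the left.
    T = list(accumulate((A[b] for b in y[:n - m]),
                        lambda acc, a: matmul(a, acc, p), initial=I2))
    kx = I2
    for b in x:
        kx = matmul(kx, K[b], p)
    # One matrix product per position recovers the window fingerprint.
    return [r for r in range(1, n - m + 2)
            if matmul(T[r - 1], S[r + m - 2], p) == kx]
```

Because each window fingerprint depends only on two scan outputs, the final comparison is independent per position, which is what makes the cooperative formulation amenable to a parallel scan.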

Fig. 4: Avg. running time versus n; m = 64. Horizontal axis: text length (n).

Fig. 5: Speedup vs. serial RK; $n = 2^{23}$. Horizontal axis: pattern length (m).

Processing rate by text size (rows) and pattern length m (columns):

Text size   m = 2        m = 16       m = 64       m = 256      m = 16384
1 GB        32.44 GB/s   29.85 GB/s   24.38 GB/s   8.78 GB/s    0.56 GB/s
128 MB      32.15 GB/s   29.57 GB/s   24.18 GB/s   8.70 GB/s    0.55 GB/s
16 MB       30.40 GB/s   27.47 GB/s   22.77 GB/s   8.34 GB/s    0.53 GB/s

Extensions

Other sequential methods: Fig. 6 shows the average running time of our methods compared to the fastest sequential methods in the smart library [2]. If we define ρ = (number of matches × m)/n to be the density of matches in the text, Fig. 7 shows that the performance of the HASH8 algorithm (as an example of a data-dependent matching method) depends heavily on the number of matches in the text.

General characters: We can extend our methods to support general characters. For example, for the 1st class of fingerprints, with alphabet Σ and $\sigma = \lceil \log_2 |\Sigma| \rceil$:

$$F_p(Y[r+1]) \equiv 2^{\sigma}\left(F_p(Y[r]) - 2^{\sigma(m-1)} y_r\right) + y_{r+m} \pmod p. \quad (7)$$

Average running time versus pattern length for various cases. The black curve denotes the superior methods chosen from our own RK methods.
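As a small worked instance of the general-character update (7), assume a four-letter alphabet encoded as A = 0, C = 1, G = 2, T = 3 (so σ = 2), a window length m = 2, and a prime p large enough that no reduction occurs. These choices are illustrative assumptions, not values from the poster. For the text C T G, i.e. $(y_1, y_2, y_3) = (1, 3, 2)$, the outgoing character is $y_1$ and the incoming one is $y_3$:

```latex
F_p(Y[1]) = 2^{2} \cdot 1 + 3 = 7,
\qquad
F_p(Y[2]) = 2^{2}\left(7 - 2^{2 \cdot 1} \cdot 1\right) + 2 = 14 = 2^{2} \cdot 3 + 2 .
```

The updated value agrees with the fingerprint of the window (3, 2) computed from scratch.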

Fig. 2: Superior method for each pattern length (m) and text length (n). Horizontal axis: log2(m). Regions are labeled LDRK1, LDRK1 no mod, CRK, and GDRK1 no mod.

General remarks:
1. The RK algorithm is independent of the content of the text or pattern. As a result, the runtime for any text and pattern with fixed (m, n) is identical. This is usually not true for other algorithms.
2. To find the optimum parameters for each method ($L_{opt}$, $g_{opt}$, number of threads per block, etc.), we implemented an auto-tuning procedure that is run initially. Its objective is to choose the optimum parameters for each problem size, based on the characteristics of the GPU.

References

[1] R. M. Karp and M. O. Rabin, Efficient randomized pattern-matching algorithms, IBM Journal of Research and Development, 1987.
[2] S. Faro and T. Lecroq, Smart: a string matching algorithm research tool, 2011, http://www.dmi.unict.it/~faro/smart/.

Fig. 6: Fastest sequential methods. Horizontal axis: pattern length (m).

Fig. 7: HASH8 with variable density 0 ≤ ρ ≤ 1, from bottom to top. Horizontal axis: pattern length (m).

Fig. 8: Fastest matching methods with 256 characters. Horizontal axis: pattern length (m).

Legend entries in Figs. 6 to 8: SRK, SA, AOSO2, HASH5, HASH8, FJS, GRASPm, SBNDM-BMH, FS, and Pure RK methods (GPU).

Multi-pattern matching: Suppose we have a dictionary of patterns $\mathcal{X} = \{X_1, \dots, X_d\}$ over a general alphabet Σ. Our objective is to find all instances of the dictionary's patterns in a text Y of length n. We assume here that all patterns have the same length $|X_i| = m < n$ for all $1 \le i \le d$. The procedure is as follows: we compute the fingerprints of every element of the dictionary and sort them. Then, with any of the DRK methods, each fingerprint computed from the text is looked up in the sorted dictionary fingerprints by binary search. The processing rate (including the preprocessing time) of the Multi-LDRK1 method for different pattern lengths m and different numbers of patterns $|\mathcal{X}|$ is shown below:
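The sort-then-binary-search procedure described above can be sketched as follows. This is a sequential illustration under stated assumptions rather than the Multi-LDRK1 kernel: the function name `multi_pattern_search`, the choice of σ = 8 bits per character (character codes below 256), and the fixed prime $2^{61} - 1$ are all hypothetical choices, and a single loop stands in for the per-subtext GPU threads.

```python
from bisect import bisect_left


def multi_pattern_search(patterns, text, p=(1 << 61) - 1, sigma=8):
    """Return 1-based positions r where F_p(Y[r]) equals some dictionary fingerprint."""
    if not patterns:
        return []
    m, n = len(patterns[0]), len(text)
    if any(len(x) != m for x in patterns):
        raise ValueError("all patterns must have the same length m")
    if m == 0 or m > n:
        return []
    base = 1 << sigma                       # 2^sigma

    def fingerprint(s):
        # First-class fingerprint over a general alphabet, by Horner's rule.
        f = 0
        for c in s:
            f = (base * f + ord(c)) % p
        return f

    # Preprocessing: fingerprint every dictionary entry, then sort.
    table = sorted(fingerprint(x) for x in patterns)
    top = pow(base, m - 1, p)               # 2^(sigma (m-1)) mod p
    f = fingerprint(text[:m])
    hits = []
    for r in range(n - m + 1):
        i = bisect_left(table, f)           # binary search in the sorted fingerprints
        if i < len(table) and table[i] == f:
            hits.append(r + 1)
        if r + m < n:                       # general-character rolling update
            f = (base * (f - top * ord(text[r])) + ord(text[r + m])) % p
    return hits
```

Each lookup costs O(log d) comparisons for a dictionary of d patterns, so the per-character cost grows slowly with the dictionary size.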

Expansion of (5): $K_p(Y[r]) = [A_p(y_{r-1}) \cdots A_p(y_1)] \times [K_p(y_1) \cdots K_p(y_{r-1})\, K_p(y_r) \cdots K_p(y_{r+m-1})] = K_p(y_r) \cdots K_p(y_{r+m-1})$.

Processing rate: the rate (in GB/s) of the fastest methods (a horizontal line in Fig. 2) for different text sizes and pattern lengths (m).

$$K_p(Y[r+1]) \equiv A_p(y_r)\, K_p(Y[r])\, K_p(y_{r+m}) \pmod p. \quad (2)$$


5. G/L DRK1-no-mod: For all the 1st class DRK methods above and small patterns (m ≤ 64), we can skip the modulo operation in (1) without fear of overflow.



Fig. 3: Avg. running time versus m; $n = 2^{23}$. Methods shown: SRK (CPU), CRK, GDRK1-no-mod, LDRK1-no-mod, GDRK1, LDRK1, GDRK2, LDRK2, HRK.

1. CRK: 2nd class, all threads cooperatively process the text from the global memory.

Serial RK (SRK): For both classes of fingerprints, it is possible to update fingerprints sequentially instead of computing each one from scratch; the update rules are given in (1) and (2).

All parallel approaches on the GPU:

Programmers express GPU programs as parallel threads that are grouped into blocks (virtualized cores). The memory hierarchy has three levels, ordered from fastest/smallest to slowest/largest: registers, local to each thread (up to 1 KB); locally shared memory, shared by threads within a block (up to 48 KB); and globally shared memory, available to all threads (12 GB). Considering the memory hierarchy and the different classes of fingerprints, and by using CRK, DRK, or a combination of both, i.e., Hybrid RK (HRK), we can have various implementations on the GPU:

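Second class: $K_p(X) \equiv K(x_1) K(x_2) \cdots K(x_m) \pmod p$, where $K(x_i) = K_0$ if $x_i = 0$ and $K(x_i) = K_1$ if $x_i = 1$, with

$$K_0 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad K_1 = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$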
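The second-class fingerprint and its sequential update (2) can be sketched as below. This is a minimal illustration, assuming the text is a string of '0'/'1' characters; the names `matmul` and `second_class_fingerprints` and the fixed prime are hypothetical, and the left inverses $A_p(0)$, $A_p(1)$ are written out by hand with p − 1 standing for −1 modulo p.

```python
K = {0: ((1, 0), (1, 1)), 1: ((1, 1), (0, 1))}   # K_0 and K_1


def matmul(a, b, p):
    """2-by-2 matrix product a @ b with entries reduced modulo p."""
    return (
        ((a[0][0] * b[0][0] + a[0][1] * b[1][0]) % p,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % p),
        ((a[1][0] * b[0][0] + a[1][1] * b[1][0]) % p,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % p),
    )


def second_class_fingerprints(text, m, p=1_000_003):
    """Return [K_p(Y[1]), ..., K_p(Y[n-m+1])] using the sequential update."""
    y = [int(c) for c in text]
    n = len(y)
    if m == 0 or m > n:
        return []
    # Left inverses modulo p: A[x] @ K[x] is the identity.
    A = {0: ((1, 0), (p - 1, 1)), 1: ((1, p - 1), (0, 1))}
    k = ((1, 0), (0, 1))
    for b in y[:m]:                      # fingerprint of the first window
        k = matmul(k, K[b], p)
    out = [k]
    for r in range(n - m):               # drop y[r] on the left, append y[r + m] on the right
        k = matmul(matmul(A[y[r]], k, p), K[y[r + m]], p)
        out.append(k)
    return out
```

For the text 1101011010 and m = 3, the windows at positions 2, 4, and 7 all read 101 and therefore share the same fingerprint matrix.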

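First class: $F_p(X) \equiv 2^{m-1} x_1 + \dots + 2 x_{m-1} + x_m \pmod p$.

Sequential update for the first class:

$$F_p(Y[r+1]) \equiv 2\left(F_p(Y[r]) - 2^{m-1} y_r\right) + y_{r+m} \pmod p. \quad (1)$$

Subtexts used by DRK:

$$\begin{aligned} Y_1 &= y_1 y_2 \dots y_g\; y_{g+1} \dots y_{g+m-1} \\ Y_2 &= y_{g+1} \dots y_{g+m-1} \dots y_{2g}\; y_{2g+1} \dots y_{2g+m-1} \\ &\;\;\vdots \\ Y_L &= y_{(L-1)g+1} \dots y_{Lg}\; y_{Lg+1} \dots y_n \end{aligned} \quad (6)$$

Each subtext has g = (n − m + 1)/L exclusive characters plus m − 1 overlapping characters. In DRK, we process each subtext independently using the Serial RK method.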
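The subtext decomposition can be sketched as follows. This is a sequential illustration under stated assumptions, not the authors' kernels: the helper names `subtext_bounds`, `serial_rk`, and `drk_search` are hypothetical, a Python loop stands in for the one-thread-per-subtext mapping, and g is rounded up so that the last subtext is simply shorter when L does not divide n − m + 1.

```python
def subtext_bounds(n, m, L):
    """0-based half-open ranges: g exclusive start positions plus m - 1 overlap each."""
    g = -(-(n - m + 1) // L)                 # ceil((n - m + 1) / L)
    bounds = []
    for i in range(L):
        start = i * g
        end = min(start + g + m - 1, n)
        if end - start >= m:                 # skip ranges too short to hold a window
            bounds.append((start, end))
    return bounds


def serial_rk(x, y, p):
    """Serial RK on bit lists x (pattern) and y (subtext); returns 0-based offsets."""
    m, n = len(x), len(y)
    top = pow(2, m - 1, p)
    fx = fy = 0
    for i in range(m):
        fx = (2 * fx + x[i]) % p
        fy = (2 * fy + y[i]) % p
    out = []
    for r in range(n - m + 1):
        if fy == fx:
            out.append(r)
        if r + m < n:
            fy = (2 * (fy - top * y[r]) + y[r + m]) % p
    return out


def drk_search(pattern, text, L, p=1_000_003):
    """Union of per-subtext results, reported as 1-based positions in the full text."""
    x = [int(c) for c in pattern]
    y = [int(c) for c in text]
    m, n = len(x), len(y)
    if m == 0 or m > n or L < 1:
        return []
    matches = []
    for start, end in subtext_bounds(n, m, L):   # each range is an independent subproblem
        matches.extend(start + off + 1 for off in serial_rk(x, y[start:end], p))
    return matches
```

Because every window start position belongs to exactly one subtext, the union needs no de-duplication, and the m − 1 overlap guarantees that no window straddling a boundary is missed.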

String Matching Problem

Let $X = x_1 \dots x_m$ be a binary pattern of length m to be found in a binary text $Y = y_1 \dots y_n$ of length n ≥ m. If $Y[r] = y_r y_{r+1} \dots y_{r+m-1}$, the problem is to find all indices r such that Y[r] = X, for $1 \le r \le n - m + 1$.

Our objective: finding the most efficient pattern matching method on a GPU given its text and pattern size.

Average running time: with fixed text length n or pattern length m.

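Multi-LDRK1 processing rate (includes preprocessing), by number of patterns |X| (rows) and pattern length m (columns):

|X|      m = 16      m = 64      m = 256     m = 512
32       8.28 GB/s   5.43 GB/s   3.75 GB/s   2.58 GB/s
256      5.08 GB/s   3.95 GB/s   2.56 GB/s   0.98 GB/s
1024     2.44 GB/s   2.20 GB/s   1.25 GB/s   0.65 GB/s
4096     0.91 GB/s   0.86 GB/s   0.45 GB/s   0.39 GB/s
10240    0.42 GB/s   0.41 GB/s   0.21 GB/s   -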
