Bleve

Text Indexing for Go
1 February 2015
Marty Schoch

Say What?

blev-ee

bih-leev


Marty Schoch

NoSQL Document Database
Official Go SDK
Projects Using Go
  N1QL Query Language
  Secondary Indexing
  Cross Data-Center Replication

Why?

Lucene/Solr/Elasticsearch are awesome.
Could we build 50% of Lucene's text analysis, combine it with off-the-shelf KV stores, and get something interesting?

Bleve Core Ideas

Text Analysis Pipeline
  We only have to build common core
  Users customize for domain/language through interfaces
Pluggable KV storage
  No custom file format
  Plug-in Bolt, LevelDB, ForestDB, etc
Search
  Make term search work
  Almost everything else built on top of that...
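The "customize through interfaces" idea can be sketched in plain Go. The `Tokenizer`, `TokenFilter`, and `Analyzer` names below are assumptions made for this illustration and are not bleve's actual types; the point is only that a small core chains one tokenizer with any number of user-supplied filters.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Tokenizer and TokenFilter are hypothetical interfaces for this sketch;
// they are not bleve's actual types.
type Tokenizer interface {
	Tokenize(text string) []string
}

type TokenFilter interface {
	Filter(tokens []string) []string
}

// letterTokenizer splits on anything that is not a letter or digit.
type letterTokenizer struct{}

func (letterTokenizer) Tokenize(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// lowerCaseFilter normalizes case.
type lowerCaseFilter struct{}

func (lowerCaseFilter) Filter(tokens []string) []string {
	out := make([]string, 0, len(tokens))
	for _, t := range tokens {
		out = append(out, strings.ToLower(t))
	}
	return out
}

// stopFilter drops tokens found in a stop-word set.
type stopFilter struct{ stop map[string]bool }

func (f stopFilter) Filter(tokens []string) []string {
	out := make([]string, 0, len(tokens))
	for _, t := range tokens {
		if !f.stop[t] {
			out = append(out, t)
		}
	}
	return out
}

// Analyzer chains one tokenizer with any number of filters.
type Analyzer struct {
	Tokenizer Tokenizer
	Filters   []TokenFilter
}

func (a Analyzer) Analyze(text string) []string {
	tokens := a.Tokenizer.Tokenize(text)
	for _, f := range a.Filters {
		tokens = f.Filter(tokens)
	}
	return tokens
}

func main() {
	a := Analyzer{
		Tokenizer: letterTokenizer{},
		Filters: []TokenFilter{
			lowerCaseFilter{},
			stopFilter{stop: map[string]bool{"for": true, "the": true}},
		},
	}
	fmt.Println(a.Analyze("Text Indexing for Go")) // [text indexing go]
}
```

Swapping in a different tokenizer or filter list is how a domain- or language-specific analyzer would be built in this sketch.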

What is Search?

Simple Search


Advanced Search


Search Results


Faceted Search


Getting Started

Install bleve

go get github.com/blevesearch/bleve/...


Import

    package main

    import (
        "fmt"
        "log"

        "github.com/blevesearch/bleve"
    )

    // Person is the document type we will index.
    type Person struct {
        Name string
    }

    func main() {
        // Describe how documents are indexed; the defaults are fine here.
        mapping := bleve.NewIndexMapping()

        // Create a new index on disk at people.bleve.
        index, err := bleve.New("people.bleve", mapping)
        if err != nil {
            log.Fatal(err)
        }

        // Index one document under the ID "m1".
        person := Person{"Marty Schoch"}
        err = index.Index("m1", person)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println("Indexed Document")
    }

Data Model

    type Person struct {
        Name string
    }

Index Mapping

    mapping := bleve.NewIndexMapping()

Create a New Index

    index, err := bleve.New("people.bleve", mapping)
    if err != nil {
        log.Fatal(err)
    }

Index Data

    person := Person{"Marty Schoch"}
    err = index.Index("m1", person)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Indexed Document")

Open Index

    package main

    import (
        "fmt"
        "log"

        "github.com/blevesearch/bleve"
    )

    func main() {
        // Open the index created in the previous example.
        index, err := bleve.Open("people.bleve")
        if err != nil {
            log.Fatal(err)
        }

        // Build a term query, wrap it in a request, and run the search.
        query := bleve.NewTermQuery("marty")
        request := bleve.NewSearchRequest(query)
        result, err := index.Search(request)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(result)
    }

Build Query

    query := bleve.NewTermQuery("marty")

Build Request

    request := bleve.NewSearchRequest(query)

Search

    result, err := index.Search(request)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(result)
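Why does the lowercase term "marty" find a document whose name is "Marty Schoch"? As far as I understand it, the default analyzer lowercases text at index time, while a term query looks up its term exactly as given. The toy inverted index below is a sketch of that behavior under those assumptions; `invertedIndex`, `add`, and `termSearch` are names invented for the illustration, and whitespace splitting stands in for real analysis.

```go
package main

import (
	"fmt"
	"strings"
)

// invertedIndex maps each term to the IDs of documents containing it.
type invertedIndex map[string][]string

// add lowercases the text and splits it on whitespace, a stand-in for
// real text analysis.
func (ix invertedIndex) add(id, text string) {
	seen := map[string]bool{}
	for _, term := range strings.Fields(strings.ToLower(text)) {
		if !seen[term] {
			seen[term] = true
			ix[term] = append(ix[term], id)
		}
	}
}

// termSearch looks the term up exactly as given, with no analysis.
func (ix invertedIndex) termSearch(term string) []string {
	return ix[term]
}

func main() {
	ix := invertedIndex{}
	ix.add("m1", "Marty Schoch")
	fmt.Println(ix.termSearch("marty")) // [m1]
	fmt.Println(ix.termSearch("Marty")) // []
}
```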

More Realistic Examples

FOSDEM Schedule of Events (iCal)

    BEGIN:VEVENT
    METHOD:PUBLISH
    UID:2839@[email protected]
    TZID:Europe-Brussels
    DTSTART:20150201T140000
    DTEND:20150201T144500
    SUMMARY:bleve - text indexing for Go
    DESCRIPTION: Nearly every application today has a search component. But delivering high quality search results requires a long list of text analysis and indexing techniques. With the bleve library, we bring advanced text indexing and search to your Go applications. In this talk we'll examine how the bleve library brings powerful text indexing and search capabilities to Go applications.
    CLASS:PUBLIC
    STATUS:CONFIRMED
    CATEGORIES:Go
    URL:https:/fosdem.org/2015/schedule/event/bleve/
    LOCATION:K.3.401
    ATTENDEE;ROLE=REQ-PARTICIPANT;CUTYPE=INDIVIDUAL;CN="Marty Schoch":invalid:nomail
    END:VEVENT

FOSDEM Event Data Structure

    type Event struct {
        UID         string    `json:"uid"`
        Summary     string    `json:"summary"`
        Description string    `json:"description"`
        Speaker     string    `json:"speaker"`
        Location    string    `json:"location"`
        Category    string    `json:"category"`
        URL         string    `json:"url"`
        Start       time.Time `json:"start"`
        Duration    float64   `json:"duration"`
    }

Index FOSDEM Events

    count := 0
    batch := bleve.NewBatch()
    for event := range parseEvents() {
        batch.Index(event.UID, event)
        // Flush once the batch grows past 100 operations.
        if batch.Size() > 100 {
            err := index.Batch(batch)
            if err != nil {
                log.Fatal(err)
            }
            count += batch.Size()
            batch = bleve.NewBatch()
        }
    }
    // Flush whatever is left in the final, partial batch.
    if batch.Size() > 0 {
        err := index.Batch(batch)
        if err != nil {
            log.Fatal(err)
        }
        count += batch.Size()
    }
    fmt.Printf("Indexed %d Events\n", count)
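The batching pattern itself is independent of bleve: accumulate items, flush when the batch is full, and remember to flush the final partial batch. Here is a minimal standalone sketch of that pattern; `flushInBatches` and its signature are hypothetical and exist only for this example.

```go
package main

import "fmt"

// flushInBatches groups items into batches of at most size and calls flush
// for each one, including the final partial batch. It returns the number of
// items flushed. flush must not retain the slice, because it is reused.
func flushInBatches(items <-chan string, size int, flush func([]string) error) (int, error) {
	count := 0
	batch := make([]string, 0, size)
	for item := range items {
		batch = append(batch, item)
		if len(batch) >= size {
			if err := flush(batch); err != nil {
				return count, err
			}
			count += len(batch)
			batch = batch[:0]
		}
	}
	// Flush the leftover items that did not fill a whole batch.
	if len(batch) > 0 {
		if err := flush(batch); err != nil {
			return count, err
		}
		count += len(batch)
	}
	return count, nil
}

func main() {
	items := make(chan string)
	go func() {
		defer close(items)
		for i := 0; i < 250; i++ {
			items <- fmt.Sprintf("event-%d", i)
		}
	}()
	flushes := 0
	n, err := flushInBatches(items, 100, func(b []string) error {
		flushes++
		return nil
	})
	fmt.Println(n, flushes, err) // 250 3 <nil>
}
```

Forgetting to check the error of the last flush is an easy mistake, which is why the sketch returns it explicitly.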

Search FOSDEM Events

    func main() {

        index, err := bleve.Open("fosdem.bleve")
        if err != nil {
            log.Fatal(err)
        }

        q := bleve.NewTermQuery("bleve")
        req := bleve.NewSearchRequest(q)
        req.Highlight = bleve.NewHighlightWithStyle("html")
        req.Fields = []string{"summary", "speaker"}
        res, err := index.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res)
    }

Phrase Search

    func main() {

        index, err := bleve.Open("fosdem.bleve")
        if err != nil {
            log.Fatal(err)
        }

        phrase := []string{"advanced", "text", "indexing"}
        q := bleve.NewPhraseQuery(phrase, "description")
        req := bleve.NewSearchRequest(q)
        req.Highlight = bleve.NewHighlightWithStyle("html")
        req.Fields = []string{"summary", "speaker"}
        res, err := index.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res)
    }
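A phrase query needs more than "all terms are present": the terms must appear next to each other, in order. One common way to support this is to record term positions at index time. The sketch below assumes whitespace tokenization and invented names (`positions`, `indexPositions`, `matchPhrase`); it illustrates the idea and is not a description of bleve's internals.

```go
package main

import (
	"fmt"
	"strings"
)

// positions maps each term to the token offsets where it occurs in a field.
type positions map[string][]int

// indexPositions lowercases the text, splits on whitespace, and records
// the offset of every token.
func indexPositions(text string) positions {
	p := positions{}
	for i, term := range strings.Fields(strings.ToLower(text)) {
		p[term] = append(p[term], i)
	}
	return p
}

// matchPhrase reports whether the terms occur at consecutive positions.
func matchPhrase(p positions, phrase []string) bool {
	if len(phrase) == 0 {
		return false
	}
	for _, start := range p[phrase[0]] {
		ok := true
		for i := 1; i < len(phrase) && ok; i++ {
			ok = contains(p[phrase[i]], start+i)
		}
		if ok {
			return true
		}
	}
	return false
}

func contains(xs []int, x int) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

func main() {
	p := indexPositions("we bring advanced text indexing and search to your Go applications")
	fmt.Println(matchPhrase(p, []string{"advanced", "text", "indexing"})) // true
	fmt.Println(matchPhrase(p, []string{"text", "advanced", "indexing"})) // false
}
```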

Combining Queries

    func main() {

        index, err := bleve.Open("fosdem.bleve")
        if err != nil {
            log.Fatal(err)
        }

        tq1 := bleve.NewTermQuery("text")
        tq2 := bleve.NewTermQuery("search")
        q := bleve.NewConjunctionQuery([]bleve.Query{tq1, tq2})
        req := bleve.NewSearchRequest(q)
        req.Highlight = bleve.NewHighlightWithStyle("html")
        req.Fields = []string{"summary", "speaker"}
        res, err := index.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res)
    }

Combining More Queries

    func main() {

        index, err := bleve.Open("fosdem.bleve")
        if err != nil {
            log.Fatal(err)
        }

        tq1 := bleve.NewTermQuery("text")
        tq2 := bleve.NewTermQuery("search")
        tq3 := bleve.NewTermQuery("believe")
        q := bleve.NewConjunctionQuery(
            []bleve.Query{tq1, tq2, tq3})
        req := bleve.NewSearchRequest(q)
        req.Highlight = bleve.NewHighlightWithStyle("html")
        req.Fields = []string{"summary", "speaker"}
        res, err := index.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res)
    }
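A conjunction only returns documents matched by every sub-query, so a single term with no matches empties the whole result. Conceptually this is an intersection of posting lists. The sketch below uses made-up document IDs and hypothetical helper names (`intersect`, `conjunction`) to show the effect.

```go
package main

import "fmt"

// intersect returns the IDs present in both sorted posting lists.
func intersect(a, b []int) []int {
	var out []int
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, a[i])
			i++
			j++
		case a[i] < b[j]:
			i++
		default:
			j++
		}
	}
	return out
}

// conjunction intersects any number of sorted posting lists.
func conjunction(lists ...[]int) []int {
	if len(lists) == 0 {
		return nil
	}
	result := lists[0]
	for _, l := range lists[1:] {
		result = intersect(result, l)
	}
	return result
}

func main() {
	// Illustrative document IDs, not real FOSDEM data.
	text := []int{1, 3, 5, 8}   // documents containing "text"
	search := []int{3, 4, 5, 9} // documents containing "search"
	believe := []int{}          // no document contains "believe"

	fmt.Println(conjunction(text, search))          // [3 5]
	fmt.Println(conjunction(text, search, believe)) // []
}
```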

Fuzzy Query

    func main() {

        index, err := bleve.Open("fosdem.bleve")
        if err != nil {
            log.Fatal(err)
        }

        tq1 := bleve.NewTermQuery("text")
        tq2 := bleve.NewTermQuery("search")
        tq3 := bleve.NewFuzzyQuery("believe")
        q := bleve.NewConjunctionQuery(
            []bleve.Query{tq1, tq2, tq3})
        req := bleve.NewSearchRequest(q)
        req.Highlight = bleve.NewHighlightWithStyle("html")
        req.Fields = []string{"summary", "speaker"}
        res, err := index.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res)
    }
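Fuzzy matching is usually defined in terms of edit distance: how many single-character changes separate the query term from an indexed term. The sketch below computes the classic Levenshtein distance and shows that "believe" is two edits away from "bleve", which fits the `believe~2` clause in the query string example. How bleve evaluates fuzzy queries internally is not covered here; this is only the underlying measure.

```go
package main

import "fmt"

// levenshtein returns the number of single-character insertions, deletions,
// and substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

func main() {
	fmt.Println(levenshtein("believe", "bleve")) // 2
}
```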

Numeric Range Query

    func main() {

        index, err := bleve.Open("fosdem.bleve")
        if err != nil {
            log.Fatal(err)
        }

        longTalk := 110.0
        q := bleve.NewNumericRangeQuery(&longTalk, nil)
        req := bleve.NewSearchRequest(q)
        req.Highlight = bleve.NewHighlightWithStyle("html")
        req.Fields = []string{"summary", "speaker", "duration"}
        res, err := index.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res)
    }

Date Range Query

    func main() {

        index, err := bleve.Open("fosdem.bleve")
        if err != nil {
            log.Fatal(err)
        }

        lateSunday := "2015-02-01T17:30:00Z"
        q := bleve.NewDateRangeQuery(&lateSunday, nil)
        q.SetField("start")
        req := bleve.NewSearchRequest(q)
        req.Highlight = bleve.NewHighlightWithStyle("html")
        req.Fields = []string{"summary", "speaker", "start"}
        res, err := index.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res)
    }

Query Strings

    func main() {

        index, err := bleve.Open("fosdem.bleve")
        if err != nil {
            log.Fatal(err)
        }

        qString := `+description:text `
        qString += `summary:"text indexing" `
        qString += `summary:believe~2 `
        qString += `-description:lucene `
        qString += `duration:>30`

        q := bleve.NewQueryStringQuery(qString)
        req := bleve.NewSearchRequest(q)
        req.Highlight = bleve.NewHighlightWithStyle("html")
        req.Fields = []string{"summary", "speaker", "description", "duration"}
        res, err := index.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res)
    }

Default Mapping vs Custom Mapping

The default mapping has worked really well, but...

    q := bleve.NewTermQuery("haystack")
    req := bleve.NewSearchRequest(q)
    req.Highlight = bleve.NewHighlightWithStyle("html")
    req.Fields = []string{"summary", "speaker"}
    res, err := index.Search(req)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(res)

Earlier today we heard a talk named "Finding Bad Needles in Worldwide Haystacks". Will we find it if we search for "haystack"?

Custom Mapping

    // Free-text fields use the English analyzer.
    enFieldMapping := bleve.NewTextFieldMapping()
    enFieldMapping.Analyzer = "en"

    eventMapping := bleve.NewDocumentMapping()
    eventMapping.AddFieldMappingsAt("summary", enFieldMapping)
    eventMapping.AddFieldMappingsAt("description", enFieldMapping)

    // Identifier-like fields are indexed as single keyword tokens.
    kwFieldMapping := bleve.NewTextFieldMapping()
    kwFieldMapping.Analyzer = "keyword"

    eventMapping.AddFieldMappingsAt("url", kwFieldMapping)
    eventMapping.AddFieldMappingsAt("category", kwFieldMapping)

    mapping := bleve.NewIndexMapping()
    mapping.DefaultMapping = eventMapping

    index, err := bleve.New("custom.bleve", mapping)
    if err != nil {
        log.Fatal(err)
    }

Search Custom Mapping

    q := bleve.NewTermQuery("haystack")
    req := bleve.NewSearchRequest(q)
    req.Highlight = bleve.NewHighlightWithStyle("html")
    req.Fields = []string{"summary", "speaker"}
    res, err := index.Search(req)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(res)
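The difference between the two mappings comes down to what terms end up in the index. A language-aware analyzer typically stems words, so "Haystacks" and "haystack" reduce to the same term. The sketch below uses a deliberately naive plural-stripping rule (`naiveStem`, a made-up helper) rather than a real stemming algorithm, just to show why the search succeeds only when stemming is applied.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveStem lowercases a token and strips a trailing "s". Real English
// analyzers use a proper stemming algorithm; this is only an illustration.
func naiveStem(token string) string {
	t := strings.ToLower(token)
	if len(t) > 3 && strings.HasSuffix(t, "s") && !strings.HasSuffix(t, "ss") {
		return t[:len(t)-1]
	}
	return t
}

// analyze splits on whitespace and returns the set of indexed terms,
// optionally stemming each token.
func analyze(text string, stem bool) map[string]bool {
	terms := map[string]bool{}
	for _, tok := range strings.Fields(text) {
		if stem {
			terms[naiveStem(tok)] = true
		} else {
			terms[strings.ToLower(tok)] = true
		}
	}
	return terms
}

func main() {
	title := "Finding Bad Needles in Worldwide Haystacks"
	fmt.Println(analyze(title, false)["haystack"]) // false
	fmt.Println(analyze(title, true)["haystack"])  // true
}
```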

Analysis Wizard http://analysis.blevesearch.com


Precision vs Recall

Precision - are the returned results relevant?
Recall - are the relevant results returned?
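Both measures can be computed directly once you know which documents were returned and which were actually relevant. The sketch below uses made-up document IDs and assumes the returned list contains no duplicates; `precisionRecall` is a name chosen for this example.

```go
package main

import "fmt"

// precisionRecall compares returned document IDs against the relevant set.
// Precision is hits/returned; recall is hits/relevant.
func precisionRecall(returned, relevant []string) (precision, recall float64) {
	rel := make(map[string]bool, len(relevant))
	for _, id := range relevant {
		rel[id] = true
	}
	hits := 0
	for _, id := range returned {
		if rel[id] {
			hits++
		}
	}
	if len(returned) > 0 {
		precision = float64(hits) / float64(len(returned))
	}
	if len(relevant) > 0 {
		recall = float64(hits) / float64(len(relevant))
	}
	return precision, recall
}

func main() {
	returned := []string{"d1", "d2", "d3", "d4"}
	relevant := []string{"d1", "d3", "d7"}
	p, r := precisionRecall(returned, relevant)
	fmt.Printf("precision=%.2f recall=%.2f\n", p, r) // precision=0.50 recall=0.67
}
```

Stemming and fuzzy matching tend to raise recall at some cost to precision, which is the trade-off an analyzer choice has to balance.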

Faceted Search

    func main() {

        index, err := bleve.Open("custom.bleve")
        if err != nil {
            log.Fatal(err)
        }

        q := bleve.NewMatchAllQuery()
        req := bleve.NewSearchRequest(q)
        req.Size = 0
        req.AddFacet("categories",
            bleve.NewFacetRequest("category", 50))
        res, err := index.Search(req)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(res)
    }
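A term facet is essentially a count of how many matching documents carry each value of a field, truncated to the requested number of terms. The sketch below shows that computation on sample category values; apart from "Go", the category names are invented for the example, and `facetCounts` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// termCount pairs a field value with the number of documents that have it.
type termCount struct {
	Term  string
	Count int
}

// facetCounts tallies field values and returns the top size terms,
// ordered by count (descending) and then by term.
func facetCounts(values []string, size int) []termCount {
	counts := map[string]int{}
	for _, v := range values {
		counts[v]++
	}
	out := make([]termCount, 0, len(counts))
	for t, c := range counts {
		out = append(out, termCount{t, c})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Term < out[j].Term
	})
	if len(out) > size {
		out = out[:size]
	}
	return out
}

func main() {
	// One category value per matching document (sample data).
	categories := []string{"Go", "Keynotes", "Go", "Java", "Go", "Java"}
	fmt.Println(facetCounts(categories, 2)) // [{Go 3} {Java 2}]
}
```

This also suggests why the category field is mapped with the keyword analyzer: each category should count as one term rather than being split into words.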

Optional HTTP Handlers

    import "github.com/blevesearch/bleve/http"

All major bleve operations mapped
Assume JSON document bodies
See bleve-explorer sample app: https://github.com/blevesearch/bleve-explorer

Putting it All Together

FOSDEM Schedule Search http://fosdem.blevesearch.com


Performance

Micro Benchmarks

Use Go benchmarks to test/compare small units of functionality in isolation.

    $ go test -bench=. -cpu=1,2,4
    PASS
    BenchmarkBoltDBIndexing1Workers             1000     3075988 ns/op
    BenchmarkBoltDBIndexing1Workers-2           1000     4004125 ns/op
    BenchmarkBoltDBIndexing1Workers-4            500     4470435 ns/op
    BenchmarkBoltDBIndexing2Workers              500     3148049 ns/op
    BenchmarkBoltDBIndexing2Workers-2           1000     3336268 ns/op
    BenchmarkBoltDBIndexing2Workers-4           1000     3461157 ns/op
    BenchmarkBoltDBIndexing4Workers             1000     3642691 ns/op
    BenchmarkBoltDBIndexing4Workers-2            500     3130814 ns/op
    BenchmarkBoltDBIndexing4Workers-4           1000     3312662 ns/op
    BenchmarkBoltDBIndexing1Workers10Batch         1  1350916284 ns/op
    BenchmarkBoltDBIndexing1Workers10Batch-2       1  1493538328 ns/op
    BenchmarkBoltDBIndexing1Workers10Batch-4       1  1256294099 ns/op
    BenchmarkBoltDBIndexing2Workers10Batch         1  1393491792 ns/op
    BenchmarkBoltDBIndexing2Workers10Batch-2       1  1271605176 ns/op
    BenchmarkBoltDBIndexing2Workers10Batch-4       1  1343410709 ns/op
    BenchmarkBoltDBIndexing4Workers10Batch         1  1393552247 ns/op
    BenchmarkBoltDBIndexing4Workers10Batch-2       1  1144501920 ns/op
    BenchmarkBoltDBIndexing4Workers10Batch-4       1  1311805564 ns/op
    BenchmarkBoltDBIndexing1Workers100Batch        3   425731147 ns/op
    BenchmarkBoltDBIndexing1Workers100Batch-2      3   439312970 ns/op
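For readers who have not written a Go benchmark before, the general shape looks like the sketch below. The `tokenize` workload and the `BenchmarkTokenize` name are stand-ins invented for this example (the benchmarks in the talk index documents into a KV store). Normally the benchmark function would live in a `_test.go` file and run via `go test -bench=.`; here `testing.Benchmark` is used so the sketch runs as an ordinary program.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// tokenize is a stand-in workload for the unit being measured.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// BenchmarkTokenize follows the standard benchmark shape: run the unit
// under test b.N times and let the testing package compute ns/op.
func BenchmarkTokenize(b *testing.B) {
	text := "bleve - text indexing for Go"
	for i := 0; i < b.N; i++ {
		tokenize(text)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	result := testing.Benchmark(BenchmarkTokenize)
	fmt.Printf("%d iterations, %d ns/op\n", result.N, result.NsPerOp())
}
```

The `-cpu=1,2,4` flag in the output above reruns each benchmark with different GOMAXPROCS values, which is where the `-2` and `-4` name suffixes come from.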

Bleve Bench

Long(er) running test that indexes real text from Wikipedia.
Measure stats periodically, compare across time.
Does indexing performance degrade over time?
How does search performance relate to the number of matching documents?

Join the Community

Community

#bleve is a small, quiet room where you can talk to us in real time

Discuss your use case
Plan a feature implementation

Apache License v2.0, Report Issues, Submit Pull Requests

Contributors


Roadmap

Result Sorting (other than score)
Better Spell Suggest/Fuzzy Search
Performance
Prepare for 1.0 Release

Speaking

GopherCon India, February 2015 (Speaking)
GopherCon, July (Attending/Proposal to be Submitted)
Your Conference/Meetup Here!

Thank you

Marty Schoch
[email protected]
http://github.com/blevesearch/bleve
@mschoch (http://twitter.com/mschoch)
@blevesearch (http://twitter.com/blevesearch)
