Back-Off Language Model Compression

Boulos Harb, Ciprian Chelba, Jeffrey Dean, Sanjay Ghemawat
{harb,ciprianchelba,jeff,sanjay}@google.com


Outline

- Motivation: Language Model (LM) Size Matters
- Integer Trie LM Representation
- Techniques for LM Compaction:
  - N-gram Map: Block Compression
  - Probabilities and Back-off Weights: Quantization and Block Compression
- Experiments
- Conclusions and Future Work


How Big a Language Model?

A typical Voice Search LM training setup is data rich:

- vocabulary size: 1 million words, OoV rate 0.57%
- training data: 230 billion words from google.com query logs, after text normalization for ASR

Order   # n-grams   Pruning      PPL   n-gram hit-ratios
3       15M         entropy      190   47/93/100
3       7.7B        1-1-1        132   97/99/100
5       12.7B       1-1-2-2-2    108   77/88/97/99/100

That is a lot of n-grams, each carrying float values (a probability and a back-off weight)!


Is a Bigger 1st Pass LM Better? YES!

Perplexity (left) and Word Error Rate (right) as a function of LM size

[Figure: left y-axis, PPL 120-260; right y-axis, WER 17-20.5; x-axis, LM size: # n-grams (B, log scale), 10^-3 to 10^1. Both PPL and WER decrease steadily as the LM grows.]


Integer Trie LM Representation

1-1 mapping between n-grams and a dense integer range using an integer trie:

- 2 vectors that concatenate, for each n-gram context (see the sketch below):
  - cumulative diversity count
  - list of future words
- look-up time: O((n − 1) · log(V)), in practice much smaller
- once the n-gram key is identified, look up the probability and back-off weight in 2 separate arrays
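To make the two-vector layout concrete, here is a minimal Python sketch. The id scheme (unigram ids equal word ids; an n-gram extending a context takes V plus its position in the word vector) and all names are illustrative assumptions, not the exact production format.

```python
import bisect

class IntegerTrie:
    """Sketch of the integer-trie n-gram map.

    bounds[c]: cumulative diversity count, i.e. the total number of
               future words over all context nodes with id <= c.
    words:     concatenation of each context's sorted future-word list.

    A node's dense integer id doubles as the n-gram key: unigram ids
    equal their word ids, and every longer n-gram's id is V + its
    position in `words` (a hypothetical but 1-1 assignment).
    """

    def __init__(self, vocab_size, bounds, words):
        self.V = vocab_size
        self.bounds = bounds
        self.words = words

    def child(self, context_id, word_id):
        """Dense id of (context, word), or None if the n-gram is unseen."""
        lo = self.bounds[context_id - 1] if context_id > 0 else 0
        hi = self.bounds[context_id]
        i = bisect.bisect_left(self.words, word_id, lo, hi)  # O(log V)
        if i < hi and self.words[i] == word_id:
            return self.V + i
        return None

    def ngram_id(self, word_ids):
        """Walk one trie level per word: O((n - 1) * log V) overall."""
        node = word_ids[0]          # a word id is its own unigram node id
        for w in word_ids[1:]:
            node = self.child(node, w)
            if node is None:
                return None         # caller backs off to a shorter context
        return node


# Toy example: V = 3; bigrams (0,1), (0,2), (1,0); trigram (0,1,2).
trie = IntegerTrie(3, bounds=[2, 3, 3, 4, 4, 4], words=[1, 2, 0, 2])
assert trie.ngram_id([0, 1]) == 3     # first entry of `words`
assert trie.ngram_id([0, 1, 2]) == 6  # child of node 3 via bounds[3]
```

The dense id returned by ngram_id is what indexes the two separate probability and back-off weight arrays.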


Integer Trie LM Compaction

Sequence of entries in the vectors is far from memoryless.

N-gram Map: block compression for both diversity and word vectors
- GroupVar: variable integer length per block
- RandomAccess: fixed integer length per block (sketched below)
- CompressedArray: a version of Huffman coding enhanced with simple operators

Probabilities and Back-off Weights:
- linear quantization to 1 byte (sketched below)
- block compression of 4-byte bundles cast to int
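Two of these techniques are simple enough to sketch in Python. First, RandomAccess block compression: every integer within a block is stored at the same minimal byte width, so a single entry decodes in O(1). The exact layout, and the names pack_random_access and lookup, are assumptions consistent with "fixed integer length per block".

```python
def pack_random_access(values, block_len=64):
    """Pack non-negative ints into blocks, one fixed byte width per block."""
    blocks = []
    for start in range(0, len(values), block_len):
        block = values[start:start + block_len]
        width = max(1, (max(v.bit_length() for v in block) + 7) // 8)
        payload = b"".join(v.to_bytes(width, "little") for v in block)
        blocks.append((width, payload))
    return blocks

def lookup(blocks, block_len, i):
    """O(1) random access: pick the block, slice the fixed-width entry."""
    width, payload = blocks[i // block_len]
    off = (i % block_len) * width
    return int.from_bytes(payload[off:off + width], "little")
```

Per the slide, GroupVar instead allows the byte length to vary per integer within a block, and CompressedArray replaces the fixed width with Huffman-style codes.

Second, the lossy path for probabilities and back-off weights is linear quantization into 256 levels; uniform binning over the observed [min, max] range is an assumption about the exact quantizer.

```python
import numpy as np

def quantize_1byte(values):
    """Linearly quantize floats (e.g., log-probs) to 1 byte each."""
    lo, hi = float(values.min()), float(values.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0  # guard: constant array
    codes = np.round((values - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize_1byte(codes, lo, scale):
    """Recover approximate floats; the max error is scale / 2."""
    return lo + codes.astype(np.float32) * scale
```

For the lossless variant, "block compression of 4-byte bundles cast to int" can be read as reinterpreting the float32 arrays as 32-bit integers (e.g., values.view(np.uint32) in NumPy) and feeding them through the same block compressors as the n-gram map.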

Experiments

Google Search by Voice LM:

- 3-gram LM, 13.5 million n-grams
- 1.0/8.2/4.3 million 1/2/3-grams, respectively

We measure:

- storage: representation rate, in bytes/n-gram
- speed, relative to the uncompressed LM, while computing PPL on unseen test data


LM Representation Rate vs. Speed

Compression Technique     Block Length   Relative Time   Bytes per n-gram
None                      —              1.0             13.2
Quantized                 —              1.0             8.1
CMU 24b, Quantized        —              1.0             5.8
GroupVar                  8              1.4             6.3
                          64             1.9             4.8
                          256            3.4             4.6
RandomAccess              8              1.5             6.2
                          64             1.8             4.6
                          256            3.0             4.6
CompressedArray           8              2.3             5.0
                          64             5.6             3.2
                          256            16.4            3.1
  + logprob/bow arrays    256            19.0            2.6


LM Representation Rate vs. Speed

Google Search by Voice LM

[Figure: Representation Rate (B/n-gram), 3-9, vs. Time, Relative to Uncompressed, 0-10, for GroupVar, RandomAccess, and CompressedArray.]

1 billion 3-grams: 4 GB of RAM (≈4 bytes/n-gram) at acceptable lookup speed

Conclusions

- can achieve a 2.6 bytes/n-gram representation rate if speed is not a concern
- 4 bytes/n-gram at reasonable speed
- a 1st pass LM using 1 billion n-grams is feasible, with excellent results: 10% rel. reduction in WER over the 13.5 million n-gram LM baseline


Future Work

- Integrate with reachable composition decoder at real-time factor close to 1.0: Allauzen, Riley, Schalkwyk, "A Generalized Composition Algorithm for Weighted Finite-State Transducers"
- Scale up to 10 billion n-grams (40-60 GB)?

