
Semantic Hashing
Presented by: Ali Vashaee
IFT725 Neural Networks, with Hugo Larochelle
Hinton, Salakhutdinov (2009)

Outline
• What is semantic hashing
• Unlabeled data
• Multilevel autoencoder
• Results
• Conclusion

Semantic hashing

Document retrieval
• Each document is represented as a word-count vector.
• Computing document similarity directly in the word-count space can be slow for large vocabularies.
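To make that cost concrete, here is a minimal sketch of retrieval directly in word-count space: each document becomes a vector of counts over a fixed vocabulary, and the query is compared with every document by cosine similarity. The helper names, the whitespace tokenizer and the choice of cosine similarity are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter


def word_count_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in `text` (simple whitespace tokens)."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def brute_force_search(query, corpus, k):
    """Score every document and return the indices of the k best.

    Each query touches all N documents and every vocabulary entry, so the
    cost grows with N times the vocabulary length.
    """
    scores = [cosine_similarity(query, doc) for doc in corpus]
    # sorted() is stable, so ties keep their original document order.
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


vocabulary = ["neural", "network", "hashing", "document", "retrieval"]
corpus = [
    word_count_vector("neural network neural network", vocabulary),
    word_count_vector("document retrieval with hashing", vocabulary),
    word_count_vector("semantic hashing for document retrieval", vocabulary),
]
query = word_count_vector("hashing document retrieval", vocabulary)
print(brute_force_search(query, corpus, k=2))
```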

Document retrieval
• N = size of the document corpus.
• V = size of the latent representation.
• LSA: O(V log N) with a kd-tree.
• Semantic hashing: retrieval time does not depend on the size of the document corpus, and is linear in the size of the shortlist it produces.

Deep auto-encoder

[Figure: deep autoencoder with a word-count vector as input]

First-layer RBM
• A "constrained Poisson model" is used to model word-count vectors.

The constrained Poisson model

Here λ_i is the bias of the conditional Poisson model for word i, and b_j is the bias of feature j.
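A minimal sketch of the model's two conditionals, assuming the usual form of the constrained Poisson model: the Poisson rate for word i is exp(λ_i + Σ_j h_j w_ij), normalized over all words and multiplied by the document length N, and each hidden feature j is a logistic unit with bias b_j. The function names and the list-of-lists weight layout W[i][j] are illustrative assumptions.

```python
import math


def poisson_pmf(n, rate):
    """Ps(n, rate) = exp(-rate) * rate**n / n!"""
    return math.exp(-rate) * rate ** n / math.factorial(n)


def visible_rates(h, W, lam, doc_length):
    """Poisson rate for each word i given binary features h.

    W[i][j] is the weight between word i and feature j, lam[i] is the word bias.
    The rates are a softmax over words scaled by the document length N, so they
    always sum to N.
    """
    logits = [lam[i] + sum(h[j] * W[i][j] for j in range(len(h))) for i in range(len(lam))]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [doc_length * e / total for e in exps]


def hidden_probs(n, W, b):
    """p(h_j = 1 | n) = sigmoid(b_j + sum_i W[i][j] * n_i)."""
    probs = []
    for j in range(len(b)):
        x = b[j] + sum(W[i][j] * n[i] for i in range(len(n)))
        probs.append(1.0 / (1.0 + math.exp(-x)))
    return probs


counts = [3, 0, 1]                            # counts for a 3-word vocabulary, N = 4
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]    # 3 words x 2 features (example values)
lam = [0.0, 0.0, 0.0]
b = [0.0, 0.0]
p_h = hidden_probs(counts, W, b)
rates = visible_rates([1, 0], W, lam, doc_length=sum(counts))
```

Scaling the softmax by N is what makes the model "constrained": the expected counts always add up to the observed document length.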

RBM
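The layers above the first are RBMs with binary visible and hidden units. Assuming they are pretrained with one-step contrastive divergence (CD-1, an assumption here), a single update can be sketched in plain Python as follows; the function name and the W[i][j] layout are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cd1_update(v0, W, a, b, rng, lr=0.1):
    """One CD-1 step for a binary RBM, updating W, a and b in place.

    W[i][j] links visible unit i to hidden unit j; a and b are the visible
    and hidden biases. Returns the reconstruction probabilities.
    """
    nv, nh = len(a), len(b)
    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph0 = [sigmoid(b[j] + sum(v0[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Negative phase: reconstruct the visibles, then recompute hidden probabilities.
    pv1 = [sigmoid(a[i] + sum(h0[j] * W[i][j] for j in range(nh))) for i in range(nv)]
    ph1 = [sigmoid(b[j] + sum(pv1[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    # Update: data statistics minus reconstruction statistics.
    for i in range(nv):
        for j in range(nh):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        a[i] += lr * (v0[i] - pv1[i])
    for j in range(nh):
        b[j] += lr * (ph0[j] - ph1[j])
    return pv1


rng = random.Random(0)
v = [1.0, 0.0, 1.0, 0.0]
W = [[0.0, 0.0] for _ in range(4)]
a, b = [0.0] * 4, [0.0] * 2
for _ in range(500):
    recon = cd1_update(v, W, a, b, rng)
```

Training repeatedly on a single pattern drives the reconstruction toward it; in practice the update is averaged over a mini-batch.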

Unrolling

[Figure: the pretrained RBMs unrolled into a deep autoencoder, with word-count vectors at the input and the output]
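Unrolling can be sketched as follows, assuming each pretrained RBM is stored as a (weights, visible biases, hidden biases) triple: the encoder applies the weights bottom-up, and the decoder applies their transposes top-down. For brevity every layer here is logistic; in the word-count model the output layer would instead be the softmax of the constrained Poisson model. All names and example weights are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def transpose(W):
    return [list(row) for row in zip(*W)]


def logistic_layer(x, W, bias):
    """W[i][j] connects input unit i to output unit j."""
    return [sigmoid(bias[j] + sum(x[i] * W[i][j] for i in range(len(x))))
            for j in range(len(bias))]


def unroll(rbms):
    """Turn a bottom-up stack of RBMs into encoder and decoder layers.

    Each RBM is (W, visible_bias, hidden_bias). The encoder uses W with the
    hidden biases; the decoder uses transposed weights in reverse order with
    the visible biases. transpose() builds new lists, so the decoder weights
    are separate copies that fine-tuning can adjust independently.
    """
    encoder = [(W, hidden_bias) for W, _visible_bias, hidden_bias in rbms]
    decoder = [(transpose(W), visible_bias) for W, visible_bias, _hidden_bias in reversed(rbms)]
    return encoder, decoder


def forward(x, layers):
    for W, bias in layers:
        x = logistic_layer(x, W, bias)
    return x


# A toy 6-4-2 stack with deterministic example weights.
W1 = [[0.1 * (i - j) for j in range(4)] for i in range(6)]
W2 = [[0.2 * (i + j) for j in range(2)] for i in range(4)]
rbms = [(W1, [0.0] * 6, [0.0] * 4), (W2, [0.0] * 4, [0.0] * 2)]
encoder, decoder = unroll(rbms)
code = forward([1, 0, 1, 1, 0, 0], encoder)
recon = forward(code, decoder)
```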

Fine-tuning
• Conjugate gradient is used for fine-tuning.
• The input is reconstructed from the code words.
• The stochastic binary values of the hidden units are replaced by their probabilities.
• The code layer should be close to binary.
• Deterministic noise is added to force fine-tuning to find binary codes in the top layer.
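One way to read "deterministic noise" is noise drawn once per training case and then held fixed, so conjugate gradient still optimizes a deterministic objective. A minimal sketch under that assumption; the noise scale std=4.0 and the function names are hypothetical.

```python
import math
import random


def make_fixed_noise(num_cases, code_size, std, seed=0):
    """Draw Gaussian noise once per training case and code unit.

    The same noise is reused on every pass, so the objective that conjugate
    gradient sees is deterministic.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(code_size)] for _ in range(num_cases)]


def noisy_code(total_input, noise):
    """Logistic code units whose total inputs are perturbed by fixed noise."""
    return [1.0 / (1.0 + math.exp(-(x + e))) for x, e in zip(total_input, noise)]


noise = make_fixed_noise(num_cases=3, code_size=4, std=4.0)  # std=4.0 is an assumed scale
codes = noisy_code([10.0, -10.0, 0.2, -0.2], noise[0])
```

Units whose total input is large in magnitude stay on the same side of 0.5 despite the noise, while units near zero are scrambled, so the easiest way for fine-tuning to pass information through the code layer is to push those inputs toward saturation, that is, toward near-binary activities.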

Activities of the 128-bit code

• After fine-tuning, the code activities are thresholded to obtain binary values.
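Thresholding a code and packing it into a single integer, which the hashing step can then use as an address, might look like this. The 0.5 threshold and the most-significant-bit-first packing are assumptions.

```python
def binarize(code, threshold=0.5):
    """Threshold real-valued code activities into bits (threshold is assumed)."""
    return [1 if c > threshold else 0 for c in code]


def pack_bits(bits):
    """Pack a bit list, most significant bit first, into one integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


activities = [0.98, 0.03, 0.91, 0.12]
bits = binarize(activities)
address = pack_bits(bits)
```

Because fine-tuning pushes the activities close to 0 or 1, the exact threshold matters little for most units.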

Training
• Pretraining: mini-batches of 100 cases.
• Greedy pretraining for 50 epochs.
• The weights were initialized with small random values sampled from a zero-mean normal distribution with variance 0.01.
• Fine-tuning: conjugate gradients.
• After fine-tuning, the codes were thresholded to produce binary code vectors.
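The hyperparameters on this slide can be written down as a small configuration sketch. The shuffling of mini-batches, the helper names and the example layer sizes (2000-500-500 with a 128-unit code, as listed on the experiments slide) are assumptions; the RBM update itself is omitted.

```python
import math
import random

BATCH_SIZE = 100        # pretraining mini-batches of 100 cases
PRETRAIN_EPOCHS = 50    # greedy pretraining epochs per layer
INIT_VARIANCE = 0.01    # weights ~ zero-mean normal with variance 0.01


def init_weights(n_visible, n_hidden, rng):
    """Small random weights; random.gauss takes a standard deviation, so use sqrt(0.01) = 0.1."""
    std = math.sqrt(INIT_VARIANCE)
    return [[rng.gauss(0.0, std) for _ in range(n_hidden)] for _ in range(n_visible)]


def minibatches(data, batch_size, rng):
    """Yield shuffled mini-batches; the last batch may be smaller."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [data[i] for i in order[start:start + batch_size]]


def greedy_layer_pairs(layer_sizes):
    """(visible, hidden) sizes of the RBMs, trained one after another bottom-up."""
    return list(zip(layer_sizes[:-1], layer_sizes[1:]))


plan = greedy_layer_pairs([2000, 500, 500, 128])  # example sizes
```

Note the variance/standard-deviation distinction: a variance of 0.01 corresponds to a standard deviation of 0.1.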

Hashing
• Hashing produces a shortlist of similar documents in time that is independent of the size of the document collection and linear in the size of the shortlist.
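A minimal sketch of the lookup, assuming each document's short binary code is used directly as an address and a Python dict stands in for the address space: the query probes every address within a small Hamming radius of its own code and collects what it finds there. Function names, the radius and the example codes are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_table(codes):
    """Map each code (an int used as an address) to the documents stored there."""
    table = defaultdict(list)
    for doc_id, code in codes.items():
        table[code].append(doc_id)
    return table


def addresses_within(code, n_bits, radius):
    """Yield every address within Hamming distance `radius` of `code`."""
    for r in range(radius + 1):
        for positions in combinations(range(n_bits), r):
            flipped = code
            for p in positions:
                flipped ^= 1 << p
            yield flipped


def shortlist(query_code, table, n_bits, radius):
    """Collect the documents stored at nearby addresses.

    The work depends on the number of probed addresses and on the shortlist
    length, not on how many documents the table holds.
    """
    found = []
    for address in addresses_within(query_code, n_bits, radius):
        found.extend(table.get(address, []))
    return found


table = build_table({"a": 0b1010, "b": 0b1011, "c": 0b0101})
print(shortlist(0b1010, table, n_bits=4, radius=1))
```

With 20-bit codes (an example size) and radius 2, a query probes 1 + 20 + 190 = 211 addresses, whatever the size of the collection.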

Semantic hashing (Grauman et al.)


Hamming distance
• The codes need very little memory.
• The Hamming distance between binary codes is fast to compute.
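With codes stored as integers, the Hamming distance reduces to an XOR followed by a bit count, which is why it is cheap. A minimal sketch (function names hypothetical), including re-ranking a shortlist by distance:

```python
def hamming_distance(a, b):
    """Number of differing bits between two codes stored as Python ints."""
    return bin(a ^ b).count("1")


def rerank(query_code, candidates):
    """Sort (doc_id, code) pairs by Hamming distance to the query code."""
    return sorted(candidates, key=lambda item: hamming_distance(query_code, item[1]))


CODE_BITS = 128
bytes_per_code = CODE_BITS // 8   # a 128-bit code fits in 16 bytes
```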

Experimental Results

• Documents are represented as 2000-dimensional word-count vectors.
• Two text datasets: 20 Newsgroups and Reuters Corpus Volume I (RCV1-v2).
• Network: 2000-500-500, with a code layer of 128 or 30 units.

Class structure of documents
• Semantic hashing with a 128-bit code:

20 Newsgroups

• Searching through 1 million documents takes 3.6 ms with 128-bit codes.
• The same search takes 72 ms with 128-dimensional LSA.
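A quick check of the figures on this slide, assuming decimal megabytes:

```python
num_documents = 1_000_000
code_bits = 128
code_bytes = num_documents * code_bits // 8   # 16,000,000 bytes, about 16 MB
speedup = 72 / 3.6                             # LSA time divided by semantic hashing time
```

The codes for a million documents fit in about 16 MB, and the search is roughly 20 times faster than with 128-dimensional LSA.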

Reuters RCV1-v2

Image retrieval (Torralba et al.)

• Nearest neighbors retrieved from a database of 12,900,000 images.

Thank you
