Unsupervised deep clustering for semantic object retrieval

Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar

Google Brain

Motivation
Observe motion and extract moving agents. These must be entities, i.e., full objects. Unsupervised object discovery to form semantic classes of objects.

Video credit: Tinghui Zhou, https://people.eecs.berkeley.edu/~tinghuiz/projects/SfMLearner/

We (almost) know how to do SFM (with deep nets)

SFMLearner: T. Zhou et al. ’17
Unsupervised learning of depth and egomotion.
https://people.eecs.berkeley.edu/~tinghuiz/projects/SfMLearner/ https://youtu.be/RTFatijYcaU

SFMNet: S. Vijayanarasimhan et al. ’17
Additionally, learning of motion masks.

Main idea
You can extract moving objects, which will be entities. We won’t know their class but will discover their semantic affiliation. The goal is to (learn to) detect them in out-of-sample images.

Unsupervised! Clearly all these apply to weakly supervised or semi-supervised tasks.

This work
Moving objects can be used to form an embedding.

Learn an object vs. background discriminator.

Improve the embedding by forcing objects to cluster.

Differential clustering to improve embedding

[Figure: FC layer with memory units $w_1, \ldots, w_k$]

Clustering objective

$$L_K = \sum_n \min_k \left[ (x_n - w_k)^2 \right]$$

$$\min L = L_K + \alpha L_2 + \beta L_C$$

with additional $L_2$ regularization, where $L_C$ is a loss balancing the size of the clusters.
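The objective above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. The slide does not define $L_2$ or $L_C$ precisely, so here $L_2$ is assumed to be the sum of squared memory-unit weights, and $L_C$ is assumed to be the squared deviation of each cluster's share of points from the uniform share 1/K. The function name `clustering_loss` and the default weights `alpha` and `beta` are hypothetical.

```python
def squared_distance(x, w):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def clustering_loss(points, centers, alpha=0.01, beta=0.1):
    """Return (L, L_K, L_2, L_C) for embeddings `points` and memory units `centers`.

    L_K follows the slide: the sum over points of the squared distance to the
    nearest center. L_2 and L_C are ASSUMED forms, chosen for illustration.
    """
    k = len(centers)
    counts = [0] * k
    l_k = 0.0
    for x in points:
        dists = [squared_distance(x, w) for w in centers]
        nearest = min(range(k), key=dists.__getitem__)
        counts[nearest] += 1
        l_k += dists[nearest]

    # Assumed L2 term: squared norm of all memory-unit weights.
    l_2 = sum(wi ** 2 for w in centers for wi in w)

    # Assumed balance term: penalize deviation from equal cluster sizes.
    n = len(points)
    l_c = sum((c / n - 1.0 / k) ** 2 for c in counts)

    return l_k + alpha * l_2 + beta * l_c, l_k, l_2, l_c
```

The hard nearest-center counts used for the balance term are not differentiable, so a trainable version would likely need a soft assignment instead; the sketch only shows how the three terms combine.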

Experiments: Cifar

Two classes from CIFAR-10. The evaluation process uses the labels for visualization (above). The figures show accuracy per learned cluster as a function of time.

            Class dog   Class auto
Cluster 0   68.5%       17.9%
Cluster 1   31.5%       82.1%
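A table like the one above can be computed from cluster assignments and held-out labels. The sketch below assumes that each column is normalized per class, which matches the numbers on the slide (each class column sums to 100%). The function name `cluster_share_per_class` and the toy data are hypothetical.

```python
from collections import Counter


def cluster_share_per_class(labels, clusters):
    """Return {class: {cluster: percent of that class assigned to the cluster}}."""
    pair_counts = Counter(zip(labels, clusters))
    class_counts = Counter(labels)
    cluster_ids = sorted(set(clusters))
    return {
        cls: {
            cid: 100.0 * pair_counts[(cls, cid)] / class_counts[cls]
            for cid in cluster_ids
        }
        for cls in sorted(class_counts)
    }


# Toy data for illustration only; labels are used for evaluation, not training.
labels = ["dog", "dog", "dog", "dog", "auto", "auto", "auto", "auto"]
clusters = [0, 0, 0, 1, 1, 1, 1, 0]
table = cluster_share_per_class(labels, clusters)
for cls, row in table.items():
    print(cls, row)
```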

We also tried a contrastive loss (Hadsell et al.). Since the task is hard, no obvious clusters were formed.
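For reference, the contrastive loss of Hadsell et al. for a single pair of embeddings can be sketched as below. The slide does not say how pairs were formed or which margin was used, so the margin value and the function name `contrastive_loss` are assumptions for illustration.

```python
import math


def contrastive_loss(x1, x2, dissimilar, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    dissimilar=False pulls the pair together: 0.5 * D^2.
    dissimilar=True pushes the pair apart until the distance exceeds
    `margin`: 0.5 * max(0, margin - D)^2.
    """
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    if dissimilar:
        return 0.5 * max(0.0, margin - d) ** 2
    return 0.5 * d ** 2
```

A dissimilar pair that is already farther apart than the margin contributes zero loss, so only nearby dissimilar pairs and distant similar pairs drive the embedding.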

Experiments: The Cityscapes data
Segmentation masks are provided for 1/30 of the data. We use them here, but the idea is to use all unsupervised data.

From: https://www.cityscapes-dataset.com/examples/

Retrieval results: Cityscapes data
Training: build the foreground/background and clustering-objective embedding.
Testing: cluster into several groups (known annotations are used for evaluation only).

Large imbalance of data. Data is also quite noisy.
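At test time, grouping could amount to assigning each embedding to its nearest cluster center. The sketch below assumes Euclidean distance and that the centers are the learned memory units; neither detail is stated on the slide, and `assign_clusters` is a hypothetical name.

```python
def assign_clusters(embeddings, centers):
    """Return, for each embedding, the index of the nearest center."""
    def sq_dist(x, w):
        return sum((xi - wi) ** 2 for xi, wi in zip(x, w))

    return [
        min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))
        for x in embeddings
    ]
```

With heavily imbalanced data, most embeddings may land in one group, which is one reason a size-balancing term in the training loss can matter.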

Retrieval results: Cityscapes data
[Figure: retrieved examples for Class 1, Class 2, Class 3]

Note: since the data is very noisy, it is really hard to form clusters. E.g., a bicycle may have a car in the background, and a bicycle is likely to have a person on it.

Clustering results
Comparison to the baseline embedding (i.e., when discriminating background vs. object):

Classification accuracy is 66%-69% when considering the 3 main classes: person, car, bicycle.
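One common way to turn unsupervised clusters into a classification accuracy is to map each cluster to its majority class and count the matches. Whether the accuracy figure above was computed this way is not stated; the sketch below is an assumption, and `majority_vote_accuracy` is a hypothetical name.

```python
from collections import Counter, defaultdict


def majority_vote_accuracy(labels, clusters):
    """Map each cluster to its most frequent label and return the accuracy."""
    by_cluster = defaultdict(Counter)
    for label, cid in zip(labels, clusters):
        by_cluster[cid][label] += 1
    # Each cluster contributes the count of its majority label.
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(labels)
```

Labels enter only at evaluation time here, consistent with using known annotations for evaluation only.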

Summary
Can we retrieve semantically related objects from videos?
Clustering is implemented in a DNN with memory units.
Experiments with the Cityscapes dataset for moving objects.
Retrieval of meaningful classes.

Future:
This is a very challenging task (class overlap).
The base embedding is also based on noisy data.
Suggestions for datasets/embeddings on which to try the approach.

Thank you! Questions?
