Unsupervised deep clustering for semantic object retrieval
Steven Hickson, Anelia Angelova, Irfan Essa, Rahul Sukthankar

Google Brain

Motivation
Observe motion and extract moving agents. These must be entities, i.e., full objects. Unsupervised object discovery to form semantic classes of objects.

Video credit: Tinghui Zhou, https://people.eecs.berkeley.edu/~tinghuiz/projects/SfMLearner/

We (almost) know how to do SFM (with deep nets)

SFMLearner: T. Zhou et al. ’17
Unsupervised learning of depth and egomotion
https://people.eecs.berkeley.edu/~tinghuiz/projects/SfMLearner/
https://youtu.be/RTFatijYcaU

SFMNet: S. Vijayanarasimhan et al. ’17
Additionally, learning of motion masks.

Main idea
You can extract moving objects, which will be entities. We won’t know their class but will discover semantic affiliation. The goal is to (learn to) detect them in out-of-sample images.

Unsupervised! Clearly, all of this also applies to weakly supervised or semi-supervised tasks.

This work Moving objects can be used to form an embedding.

Learn an object vs background discriminator.

Improve the embedding by forcing objects to cluster.

Differential clustering to improve embedding

Figure labels: FC layer, memory units $w_1, \dots, w_k$

Clustering objective

$$L_K = \sum_n \min_k \, (x_n - w_k)^2$$

Minimize

$$L = L_K + \alpha L_2 + \beta L_C$$

where $L_2$ is an additional L2 regularization term and $L_C$ is a loss balancing the size of the clusters.
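As a rough illustration of the objective above, here is a minimal NumPy sketch. Only the nearest-center term L_K follows the slide directly. The slides do not define L_2 or L_C, so the forms used here (a squared-norm penalty on the memory units, and the squared deviation of cluster fractions from uniform) are assumptions made for illustration, as are the function names and the default values of alpha and beta.

```python
import numpy as np

def clustering_loss(x, w):
    """L_K = sum_n min_k ||x_n - w_k||^2, plus each point's nearest-center index."""
    # Pairwise squared distances between embeddings and memory units, shape (N, K).
    d2 = ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).sum(), d2.argmin(axis=1)

def balance_loss(assign, num_clusters):
    """Hypothetical L_C: squared deviation of cluster fractions from uniform."""
    frac = np.bincount(assign, minlength=num_clusters) / len(assign)
    return ((frac - 1.0 / num_clusters) ** 2).sum()

def total_loss(x, w, alpha=0.01, beta=1.0):
    """L = L_K + alpha * L_2 + beta * L_C (L_2 and L_C forms are assumptions)."""
    l_k, assign = clustering_loss(x, w)
    l_2 = (w ** 2).sum()  # assumption: L2 penalty applied to the memory units
    l_c = balance_loss(assign, w.shape[0])
    return l_k + alpha * l_2 + beta * l_c

x = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 4.0]])  # toy embeddings
w = np.array([[0.0, 0.0], [4.0, 4.0]])              # toy memory units
loss = total_loss(x, w)
```

The hard-count balance term in this sketch has no useful gradient, so a trainable version would need a soft assignment; which form the authors used is not stated in the slides.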

Experiments: Cifar

Two classes from Cifar 10. Evaluation process uses the labels for visualization (above). The figures show accuracy per learned cluster as a function of time.

            Class dog   Class auto
Cluster 0   68.5%       17.9%
Cluster 1   31.5%       82.1%
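A breakdown like the one above, where each class's percentages sum to 100% across clusters, can be computed from ground-truth labels and cluster assignments. The sketch below is one way to do it, assuming integer class labels and cluster ids; the function name and the toy data are hypothetical, and the labels are used for evaluation only.

```python
import numpy as np

def cluster_breakdown(labels, clusters, num_classes, num_clusters):
    """Return a (num_clusters, num_classes) table of percentages.

    Each column (class) sums to 100. Assumes every class has at least one sample.
    """
    labels = np.asarray(labels)
    clusters = np.asarray(clusters)
    counts = np.zeros((num_clusters, num_classes))
    # Accumulate one count per (cluster, class) pair, handling repeated pairs.
    np.add.at(counts, (clusters, labels), 1)
    return 100.0 * counts / counts.sum(axis=0, keepdims=True)

# Toy data: class 0 has three samples, class 1 has two.
table = cluster_breakdown(labels=[0, 0, 0, 1, 1],
                          clusters=[0, 0, 1, 1, 1],
                          num_classes=2, num_clusters=2)
```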

We also tried contrastive loss (Hadsell et al.). Since the task is hard, no obvious clusters were formed.
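For reference, the contrastive loss of Hadsell et al. pulls similar pairs together and pushes dissimilar pairs apart until they are at least a margin away. A minimal NumPy sketch follows. The margin value, the averaging over pairs, and the way pairs would be formed for this task are assumptions, since the slides do not specify them.

```python
import numpy as np

def contrastive_loss(x1, x2, y, margin=1.0):
    """Contrastive loss over pairs; y = 0 for similar pairs, y = 1 for dissimilar."""
    d = np.linalg.norm(x1 - x2, axis=-1)                 # Euclidean distance per pair
    similar = 0.5 * d ** 2                               # pulls similar pairs together
    dissimilar = 0.5 * np.maximum(0.0, margin - d) ** 2  # pushes dissimilar pairs apart
    return ((1 - y) * similar + y * dissimilar).mean()   # assumption: mean over pairs

x1 = np.array([[0.0, 0.0], [0.0, 0.0]])
x2 = np.array([[3.0, 4.0], [0.6, 0.8]])
y = np.array([0, 1])  # first pair similar, second pair dissimilar
pair_loss = contrastive_loss(x1, x2, y, margin=2.0)
```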

Experiments: The Cityscapes data
Segmentation masks are provided for 1/30 of the data. We use them here, but the idea is to use all of the data without supervision.

From: https://www.cityscapes-dataset.com/examples/

Retrieval results: Cityscapes data
Training: build the foreground/background and clustering-objective embedding.
Testing: cluster into several groups (known annotation used for evaluation only).

Large imbalance of data. Data is also quite noisy.

[Figure: retrieval results on Cityscapes data for Class 1, Class 2, Class 3]

Note: since the data is very noisy, it is really hard to form clusters. E.g., a bicycle may have a car in the background, and a bicycle is likely to have a person on it.

Clustering results
Comparison to the baseline embedding (i.e., when discriminating background vs. object):

Classification accuracy is 66%-69% when considering the 3 main classes: person, car, bicycle.

Summary
Can we retrieve semantically related objects from videos?
Clustering is implemented in a DNN with memory units.
Experiments with Cityscapes dataset for moving objects.
Retrieval of meaningful classes.

Future:
This is a very challenging task (class overlap).
Base embedding is also based on noisy data.
Suggestions for datasets/embeddings, where to try the approach.

Thank you! Questions?
