Improved Deep Metric Learning with Multi-class N-pair Loss Objective
Kihyuk Sohn¹ and Wendy Shang² — ¹NEC Labs, ²Oculus VR

Overview

[Figure: a deep neural network (DNN) embeds inputs as vectors f; a positive f⁺ and multiple negatives f⁻₁, …, f⁻ₙ₋₁ are compared against the anchor embedding.]
N-pair loss: deep metric learning with multiple negatives

• Deep metric learning: learning a distance metric via deep learning.
• Existing frameworks utilize only weak supervision (e.g., same / not same):

[Figure: training tuples compared — (a) triplet, (b) (N+1)-tuplet, (c) N-pair batch.]
• Learning to identify from multiple negative examples
§ The (N+1)-tuplet loss identifies a positive example from among N−1 negative examples.

§ contrastive loss [1]
§ triplet loss [2,3]
§ These require carefully designed negative data mining, which is expensive for deep networks.
• We propose a novel deep metric learning framework
§ that allows joint comparison to multiple negatives,
§ while reducing the computational burden via efficient batch construction.
• We demonstrate the superiority of the proposed loss over the triplet loss on several visual recognition benchmarks, including fine-grained object recognition and verification, image clustering and retrieval, and face verification and identification.
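For reference, the weakly supervised losses cited above are commonly written as follows (notation is mine: f = f(x) is the learned embedding, m a margin, y_ij ∈ {0,1} a same/not-same label):

```latex
% Contrastive loss [1]: pull same-class pairs together, push
% different-class pairs apart beyond a margin m.
\mathcal{L}_{\text{cont}}(x_i, x_j; f) =
  y_{ij}\,\|f_i - f_j\|_2^2
  + (1 - y_{ij})\,\max\!\bigl(0,\; m - \|f_i - f_j\|_2\bigr)^2

% Triplet loss [2,3]: the anchor-positive distance should be smaller
% than the anchor-negative distance by a margin m.
\mathcal{L}_{\text{tri}}(x, x^{+}, x^{-}; f) =
  \max\!\bigl(0,\; \|f - f^{+}\|_2^2 - \|f - f^{-}\|_2^2 + m\bigr)
```

Both losses see only one negative per update, which is what motivates the joint comparison against multiple negatives below.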

Experimental results 3-2: Face verification and identification


§ Triplet loss is a special case when N = 2.
• Efficient batch construction via N-pair examples: O(N²) → O(N)
§ N tuples of the (N+1)-tuplet loss would require N(N+1) examples to be evaluated.
§ Instead, we obtain N tuples of the (N+1)-tuplet loss by constructing a batch from N pairs drawn from N different classes. This requires only 2N examples to be evaluated.
• Multi-class N-pair (N-pair-mc) loss and one-vs-one N-pair loss (N-pair-ovo)
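Written out (with f = f(x) and inner-product similarity), the (N+1)-tuplet loss and its efficient N-pair-mc batch form are:

```latex
% (N+1)-tuplet loss: identify the positive x^+ against N-1 negatives {x_i}.
\mathcal{L}\bigl(x, x^{+}, \{x_i\}_{i=1}^{N-1}; f\bigr) =
  \log\Bigl(1 + \sum_{i=1}^{N-1}
    \exp\bigl(f^{\top} f_i - f^{\top} f^{+}\bigr)\Bigr)

% Multi-class N-pair loss over a batch of N pairs {(x_i, x_i^+)} from N
% distinct classes: the positives of the other pairs serve as negatives.
\mathcal{L}_{\text{N-pair-mc}}\bigl(\{(x_i, x_i^{+})\}_{i=1}^{N}; f\bigr) =
  \frac{1}{N} \sum_{i=1}^{N}
  \log\Bigl(1 + \sum_{j \neq i}
    \exp\bigl(f_i^{\top} f_j^{+} - f_i^{\top} f_i^{+}\bigr)\Bigr)
```

Setting N = 2 recovers a (margin-free, softmax-style) triplet comparison, which is the special case noted above.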

• Experimental setting
§ Dataset: train on the CASIA WebFace database [9] (500k images of 10k identities) and evaluate on the LFW database [10].
§ Training:
1. CASIA network trained from scratch (10 conv. layers + 4 max-pooling and 1 average-pooling layer).
2. 384 examples per batch, which corresponds to a 192-pair loss.
3. For the 320- and 720-pair loss models, the batch size is increased accordingly.
4. We replace conv + max pooling with strided conv [11] to reduce GPU memory usage for the 720-pair loss model.
§ Testing:
1. Verification: determine whether two images show the same person.
2. Closed-set identification (rank-1): the reference example exists in the gallery.
3. Open-set identification (DIR@FAR=1%): the reference may not exist in the gallery.
• Performance

§ N-pair-mc loss models consistently improve over triplet and N-pair-ovo loss models.
§ The performance gap becomes more significant on identification tasks.
§ Increasing N by increasing the batch size is important.

References
[1] Chopra et al. Learning a similarity metric discriminatively, with application to face verification. In CVPR, 2005.
[2] Weinberger et al. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2005.
[3] Schroff et al. FaceNet: A unified embedding for face recognition and clustering. In CVPR, 2015.
[4] Xie et al. Hyper-class augmented and regularized deep learning for fine-grained image classification. In CVPR, 2015.
[5] Szegedy et al. Going deeper with convolutions. In CVPR, 2015.
[6] Song et al. Deep metric learning via lifted structured feature embedding. In CVPR, 2016.
[7] Krause et al. 3D object representations for fine-grained categorization. 2013.
[8] Wah et al. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.
[9] Yi et al. Learning face representation from scratch. CoRR, abs/1411.7923, 2014.
[10] Huang et al. Towards unconstrained face recognition. In CVPR Workshop, 2008.
[11] Springenberg et al. Striving for simplicity: The all convolutional net. In ICLR Workshop, 2015.

• Convergence analysis
• Hard negative class mining
§ When the output space is small, the N-pair loss does not require hard negative data mining.
§ When the output space is large, we propose to find hard negative “classes”.
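One way to realize hard negative class mining is to build each N-pair batch from classes that are mutually confusable. The sketch below is illustrative only: the greedy rule (repeatedly add the class whose mean embedding is most similar to those already chosen), the use of class-mean embeddings, and the function name are my assumptions, not the poster's exact procedure.

```python
import numpy as np

def mine_negative_classes(class_means, n_select, rng=None):
    """Greedily pick n_select mutually confusable classes for an N-pair batch.

    class_means: (C, D) array of per-class mean embeddings (an assumption;
    any per-class representative would do). Returns a list of class indices.
    """
    rng = np.random.default_rng(rng)
    n_classes = class_means.shape[0]
    # Seed the batch with one random class.
    chosen = [int(rng.integers(n_classes))]
    while len(chosen) < n_select:
        # Similarity of every class mean to the already-chosen classes.
        sims = class_means @ class_means[chosen].T   # (C, len(chosen))
        sims[chosen, :] = -np.inf                    # never re-pick a class
        # Add the class that is hardest (most similar) against any chosen one.
        chosen.append(int(sims.max(axis=1).argmax()))
    return chosen
```

Each selected class then contributes one (anchor, positive) pair to the batch, so the negatives inside the batch are hard by construction.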

Experimental results 3-1: Fine-grained visual object recognition & verification
• Experimental setting
§ Dataset: Car-333 [4] (165k images of 333 car models) and Flower-610 (62k images of 610 flower species).
§ Training:
1. Initialized from an ImageNet-pretrained GoogLeNet [5].
2. 144 examples per batch, which corresponds to a 72-pair loss.
§ Testing:
1. Classification: k-nearest-neighbor classifier using cosine similarity.
2. Verification: rank-1 accuracy of a query against a single positive and multiple negatives.
• Performance
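The classification protocol above (k-nearest-neighbor voting under cosine similarity on the learned embeddings) can be sketched as follows; the function name and the default k are illustrative assumptions, not poster details.

```python
import numpy as np

def knn_cosine_predict(train_emb, train_labels, query_emb, k=5):
    """k-NN classification with cosine similarity over embedding vectors."""
    # L2-normalize rows so the inner product equals cosine similarity.
    a = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    b = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sims = b @ a.T                           # (n_query, n_train) similarities
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k nearest
    preds = []
    for neighbors in topk:
        labels, counts = np.unique(train_labels[neighbors], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # majority vote among the k
    return np.array(preds)
```

The rank-1 verification protocol is the k = 1 case with a gallery of one positive and multiple negatives.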


Visual recognition of unseen object classes
• Experimental setting
§ Dataset: Online Products [6] (120k images of 23k online product categories), Car-196 [7] (16k images of 196 car models), and CUB-200 [8] (12k images of 200 bird species).
§ Procedure: split each dataset by object category, i.e., no overlapping categories between train and test.
§ Training:
1. Initialized from an ImageNet-pretrained GoogLeNet.
2. 120 examples per batch, which corresponds to a 60-pair loss.
§ Testing:
1. Clustering: F1 and NMI (normalized mutual information) scores.
2. Retrieval: recall@K at different K.
• Performance


Figure: training accuracy (left) and loss (right) curves of the triplet, 192-pair-ovo, and 192-pair-mc loss models. Triplet and 192-way classification accuracy and loss are plotted.

§ The difference in learning curves becomes apparent under the 192-way classification measure.
§ The 192-pair-mc loss model reaches the final accuracy of the triplet loss model after 15k updates, and that of the 192-pair-ovo loss model after 20k updates.
• Importance of N-pair batch construction
§ N×M batch construction, where N is the number of distinct classes and M is the number of examples per class (e.g., N-pair = N×2).
§ NCA-like loss function:
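The N-pair special case of this construction (M = 2, one positive per anchor) can be sketched in NumPy as below; the row-wise log-softmax is algebraically the same as the log(1 + Σ exp(·)) form of the N-pair-mc loss, and the function name is mine, not from any released code.

```python
import numpy as np

def n_pair_mc_loss(anchors, positives):
    """Multi-class N-pair loss from an N-pair batch (NumPy sketch).

    anchors[i] and positives[i] form the only true pair for class i; the
    other N-1 positives act as negatives for anchor i. A single (N, N)
    similarity matrix therefore evaluates all N tuples from just 2N
    examples, instead of the N(N+1) a naive tuplet batch would need.
    """
    logits = anchors @ positives.T                       # (N, N) inner products
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    # Row-wise log-softmax; the diagonal entries are the true pairs, so this
    # is N-way cross-entropy with target class i for row i.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))
```

With well-separated pairs the loss approaches 0; with indistinguishable embeddings it approaches log N, the chance-level value for identifying one positive among N candidates.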


200-dim embedding vectors are connected on top of the pool5 features of GoogLeNet; an additional fully connected layer is added for training the softmax loss models. (Table footnote: recognition with a softmax classifier.)

§ 72-pair loss models improve over triplet models, even those trained with negative data mining.
§ 72-pair-mc loss models improve upon 72-pair-ovo loss models.
§ Softmax loss models are good at classification but poor at verification.

§ Performance:
§ As before, 60-pair loss models consistently improve over triplet models, and multi-class loss models outperform one-vs-one loss models.
§ Negative class mining is effective on the Online Products dataset, where the number of classes (>11k) is significantly larger than N (=60).

§ The N-pair loss achieves the best performance among the variants.
§ Although there is a performance drop, N×M loss models are still significantly better than triplet loss models.
