Learning with Deep Trees
Giulia DeSalvo², Mehryar Mohri¹,², and Umar Syed¹
¹Google Research  ²Courant Institute of Mathematical Sciences
December 13, 2014
Introduction and Problem
Decision trees: binary trees with an indicator function at each internal node and an assignment function at each leaf. Used in classification, regression, and clustering applications.
Deep trees are a significantly broader family of decision trees: the node questions are drawn from different hypothesis sets H1, ..., Hp of increasing complexity. Used to tackle harder tasks.
Challenge: H = ∪_{k=1}^{p} Hk could be very complex → learning is prone to overfitting. Is it possible to learn with node questions of varying complexity and yet not overfit?
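The deep-tree structure described above can be sketched in code. This is an illustrative toy example, not the paper's algorithm: internal nodes hold questions drawn from hypothesis sets of increasing complexity, here H1 = axis-aligned thresholds and H2 = linear halfspaces. All class and function names are hypothetical.

```python
# Toy sketch of a deep tree: node questions come from hypothesis sets of
# increasing complexity (H1: axis-aligned thresholds, H2: linear halfspaces).

class Leaf:
    """Leaf node: its assignment function returns a fixed label."""
    def __init__(self, label):
        self.label = label

    def predict(self, x):
        return self.label

class Node:
    """Internal node: routes x left/right based on a boolean question."""
    def __init__(self, question, left, right):
        self.question = question  # callable x -> bool, drawn from some Hk
        self.left = left          # followed when question(x) is True
        self.right = right        # followed when question(x) is False

    def predict(self, x):
        return self.left.predict(x) if self.question(x) else self.right.predict(x)

# H1: simple axis-aligned threshold question.
def threshold(feature, t):
    return lambda x: x[feature] <= t

# H2: richer linear-halfspace question.
def halfspace(w, b):
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b <= 0

# A small deep tree: a simple H1 question at the root,
# a more complex H2 question one level down.
tree = Node(threshold(0, 0.5),
            Node(halfspace([1.0, -1.0], 0.0), Leaf(+1), Leaf(-1)),
            Leaf(-1))

print(tree.predict([0.2, 0.9]))  # → 1  (root: 0.2 <= 0.5; then 0.2 - 0.9 <= 0)
print(tree.predict([0.9, 0.0]))  # → -1 (root question fails, go right)
```

The overfitting concern is visible even in this sketch: allowing every node to use the richest set Hp makes H very expressive, which is exactly why complexity-sensitive guarantees are needed.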
Our Contribution
1. Data-dependent theoretical guarantees for learning with deep trees. Bounds in terms of the Rademacher complexity of node questions.
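For context, guarantees of this kind typically take the following general form. This is the standard Rademacher-complexity generalization bound, shown only as an illustration of the shape of such results; the paper's bound refines this by accounting for the per-node hypothesis sets H1, ..., Hp.

```latex
% With probability at least 1 - \delta over a sample S of size m,
% for all h in H:
R(h) \;\le\; \widehat{R}_S(h) \;+\; 2\,\mathfrak{R}_m(H) \;+\; \sqrt{\frac{\log(1/\delta)}{2m}}
```

Here R(h) is the generalization error, R̂_S(h) the empirical error, and ℛ_m(H) the Rademacher complexity of H; a bound that charges each node only the complexity of its own Hk can be much tighter than one paying for the full union H.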
2. Novel algorithm for learning with deep trees that benefits from the derived guarantees.