Scientific Track

Hyperparameter Search Space Pruning - A New Component for Sequential Model-Based Hyperparameter Optimization

The optimization of hyperparameters is often done manually or exhaustively, but recent work has shown that automatic methods can optimize hyperparameters faster and even achieve better final performance. Sequential model-based optimization (SMBO) is the current state-of-the-art framework for automatic hyperparameter optimization. Currently, it consists of three components: a surrogate model, an acquisition function, and an initialization technique.
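To make the three components concrete, here is a minimal Python sketch of a generic SMBO loop, assuming a Gaussian-process surrogate, an expected-improvement acquisition function, and random initialization; the objective f and the one-dimensional search interval are illustrative placeholders, not anything from the paper.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def smbo(f, bounds, n_init=5, n_iter=20, seed=0):
        """Generic SMBO loop: initialization, surrogate model, acquisition."""
        rng = np.random.default_rng(seed)
        # Initialization technique: here, plain random sampling in the box.
        X = rng.uniform(bounds[0], bounds[1], size=(n_init, 1))
        y = np.array([f(x) for x in X])
        for _ in range(n_iter):
            surrogate = GaussianProcessRegressor().fit(X, y)
            # Acquisition: expected improvement over the best value so far.
            cand = rng.uniform(bounds[0], bounds[1], size=(1000, 1))
            mu, sigma = surrogate.predict(cand, return_std=True)
            best = y.min()
            z = (best - mu) / np.maximum(sigma, 1e-12)
            ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
            x_next = cand[np.argmax(ei)]
            X = np.vstack([X, x_next])
            y = np.append(y, f(x_next))
        return X[np.argmin(y)], y.min()

    # Example: minimize a toy one-dimensional "response surface".
    x_best, y_best = smbo(lambda x: (x[0] - 0.3) ** 2, bounds=(0.0, 1.0))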

Hyperparameter Optimization with Factorized Multilayer Perceptrons

In machine learning, hyperparameter optimization is a challenging task that is usually handled by experienced practitioners or in a computationally expensive brute-force manner such as grid search. Therefore, recent research proposes to use observed hyperparameter performance on already solved problems (i.e., data sets) in order to speed up the search for promising hyperparameter configurations in the sequential model-based optimization framework. In this paper, we propose multilayer perceptrons as surrogate models, as they are able to model highly nonlinear hyperparameter response surfaces.
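As a hedged illustration of the surrogate-model idea only (not the paper's factorized architecture), the sketch below fits a standard scikit-learn multilayer perceptron to synthetic (configuration, validation error) observations and uses it to rank unseen candidate configurations; the two-dimensional configuration space and the response surface are invented for the example.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    # Observed (hyperparameter configuration, validation error) pairs,
    # e.g. collected from runs on already solved data sets.
    configs = rng.uniform(-3, 0, size=(200, 2))   # say, log10(lr) and log10(C)
    errors = (np.sin(configs[:, 0]) * np.cos(configs[:, 1])
              + 0.05 * rng.standard_normal(200))  # a nonlinear response surface

    # An MLP surrogate can capture such highly nonlinear surfaces, which a
    # linear surrogate could not.
    surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                             random_state=0).fit(configs, errors)

    # Rank unseen candidate configurations by predicted validation error.
    candidates = rng.uniform(-3, 0, size=(1000, 2))
    best = candidates[np.argmin(surrogate.predict(candidates))]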

A Practical Approach to Reduce the Learning Bias under Covariate Shift

Covariate shift is a specific class of selection bias that arises when the marginal distributions of the input features X differ between the source and target domains while the conditional distributions of the target Y given X are the same. A common technique to deal with this problem, called importance weighting, amounts to reweighting the training instances so that they resemble the test distribution. However, this usually comes at the expense of a reduced effective sample size.
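A minimal sketch of importance weighting under these assumptions: estimate the density ratio p_target(x) / p_source(x) with a probabilistic classifier that discriminates source from target samples, then measure the resulting effective sample size. The Gaussian source and target data are illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_src = rng.normal(0.0, 1.0, size=(1000, 1))   # source (training) inputs
    X_tgt = rng.normal(1.0, 1.0, size=(1000, 1))   # shifted target inputs

    # Discriminate source vs. target; by Bayes' rule the density ratio
    # p_tgt(x) / p_src(x) is proportional to P(tgt | x) / P(src | x).
    clf = LogisticRegression().fit(
        np.vstack([X_src, X_tgt]),
        np.concatenate([np.zeros(1000), np.ones(1000)]),
    )
    p = clf.predict_proba(X_src)[:, 1]
    w = p / (1.0 - p)                  # importance weights for the training set

    # The effective sample size shrinks as the weights become uneven.
    ess = w.sum() ** 2 / (w ** 2).sum()            # at most 1000

The weights w would then be passed as per-instance weights when fitting the final model (e.g. via sample_weight in scikit-learn); the more the two marginal distributions differ, the further the effective sample size drops below the nominal 1000.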

Generalized Matrix Factorizations as a Unifying Framework for Pattern Set Mining: Complexity Beyond Blocks

Matrix factorizations are a popular tool for mining regularities from data. There are many ways to interpret a factorization, but one particularly suited to data mining exploits the fact that a matrix product can be written as a sum of rank-1 matrices. The factorization of a matrix then becomes the task of finding a small number of rank-1 matrices whose sum is a good representation of the original matrix.
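This rank-1 view is easy to verify numerically: the snippet below checks that a matrix product equals the sum of outer products of the left factor's columns with the right factor's rows, which is exactly the kind of decomposition the mining task searches over.

    import numpy as np

    rng = np.random.default_rng(0)
    B, C = rng.random((5, 3)), rng.random((3, 4))  # factors of a rank-3 product

    # B @ C is the sum of k rank-1 matrices, one per column of B / row of C;
    # in pattern mining each rank-1 term can be read as one pattern ("block").
    rank1_terms = [np.outer(B[:, k], C[k, :]) for k in range(3)]
    assert np.allclose(B @ C, sum(rank1_terms))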

Convex Factorization Machines

Factorization machines are a generic framework that can mimic most factorization models simply through feature engineering. In this way, they combine the high predictive accuracy of factorization models with the flexibility of feature engineering. Unfortunately, factorization machines involve a non-convex optimization problem, so we can only obtain a local solution, the quality of which depends on the initialization. In this paper, we propose a convex formulation of factorization machines based on the nuclear norm.
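For reference, a second-order factorization machine predicts y = w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j. The sketch below implements this prediction with the standard O(kn) reformulation of the pairwise term and checks it against the brute-force double sum; the convex reformulation itself is not shown, but roughly speaking it replaces the factor product V V^T with an unconstrained matrix penalized through its nuclear norm.

    import numpy as np

    def fm_predict(x, w0, w, V):
        """Second-order FM: y = w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j.
        The pairwise sum is computed in O(kn) via
        0.5 * sum_f ((sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2)."""
        s = V.T @ x                                       # shape (k,)
        pairwise = 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())
        return w0 + w @ x + pairwise

    rng = np.random.default_rng(0)
    n, k = 6, 3
    x, w0, w, V = rng.random(n), 0.1, rng.random(n), rng.random((n, k))

    # Brute-force check of the O(kn) identity.
    brute = w0 + w @ x + sum(V[i] @ V[j] * x[i] * x[j]
                             for i in range(n) for j in range(i + 1, n))
    assert np.isclose(fm_predict(x, w0, w, V), brute)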

BoostMF: Boosted Matrix Factorisation for Collaborative Ranking

Personalised recommender systems are widely used for information filtering in information retrieval, where matrix factorisation (MF) has become a popular model-based approach to personalised recommendation. Classical MF methods, which directly approximate low-rank factor matrices by minimising some rating-prediction criterion, do not achieve satisfactory performance on the task of top-N recommendation. In this paper, we propose a novel MF method, namely BoostMF, that formulates factorisation as a learning problem and integrates boosting into factorisation.
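As a loose, generic illustration of combining boosting with factorisation (not BoostMF's actual ranking objective, which the abstract does not spell out), one can fit rank-1 components stage by stage, each trained on the residual that the previous stages leave on the observed entries:

    import numpy as np

    def boosted_mf(R, mask, n_stages=10, n_steps=300, lr=0.02, seed=0):
        """Greedy boosting-style factorisation: each stage fits one rank-1
        component to the residual on the observed entries, then adds it in."""
        rng = np.random.default_rng(seed)
        pred = np.zeros_like(R)
        for _ in range(n_stages):
            resid = (R - pred) * mask
            u = rng.standard_normal(R.shape[0])
            v = rng.standard_normal(R.shape[1])
            for _ in range(n_steps):               # alternating gradient steps
                err = (np.outer(u, v) - resid) * mask
                u = u - lr * err @ v
                err = (np.outer(u, v) - resid) * mask
                v = v - lr * err.T @ u
            pred += np.outer(u, v)
        return pred

    rng = np.random.default_rng(1)
    R = np.outer(rng.random(8), rng.random(6))        # toy low-rank ratings
    mask = (rng.random(R.shape) < 0.7).astype(float)  # observed entries
    approx = boosted_mf(R, mask)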

Large Scale Optimization with Proximal Stochastic Newton-type Gradient Descent

In this work, we generalize and unify two recent, quite different works of Jascha and Lee by proposing the proximal stochastic Newton-type gradient (PROXTONE) method for optimizing sums of two convex functions: one the average of a huge number of smooth convex functions, the other a nonsmooth convex function. Our PROXTONE method incorporates second-order information to obtain stronger convergence results, achieving a linear convergence rate not only in the value of the objective function but also for the solution.
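A minimal sketch of one update of this kind of method, assuming the nonsmooth term is an L1 penalty (whose proximal operator is soft-thresholding) and the second-order information is a diagonal curvature estimate; PROXTONE's precise update rule is more involved than this simplification.

    import numpy as np

    def soft_threshold(z, t):
        """Proximal operator of t * ||.||_1: componentwise shrinkage."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def prox_newton_type_step(x, grad_i, diag_H, lam):
        """One proximal Newton-type step for
        F(x) = (1/n) sum_i f_i(x) + lam * ||x||_1,
        given a stochastic gradient grad_i of the smooth part and a diagonal
        curvature estimate diag_H (a stand-in for second-order information)."""
        step = x - grad_i / diag_H                 # scaled (Newton-type) step
        return soft_threshold(step, lam / diag_H)  # prox in the same metric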

HierCost: Improving Large Scale Hierarchical Classification with Cost Sensitive Learning

Hierarchical Classification (HC) is an important problem with a wide range of applications in domains such as music genre classification, protein function classification and document classification. Although several innovative classification methods have been proposed to address HC, most of them do not scale to web-scale problems. While simple methods such as top-down "pachinko"-style classification and flat classification scale well, they either have poor classification performance or do not effectively use the hierarchical information.
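As a hedged toy sketch of the cost-sensitive idea (not necessarily HierCost's exact cost function), one can train one binary classifier per leaf class and weight each negative example by its tree distance to that leaf, so that confusions between hierarchically distant classes are penalized more:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy two-level hierarchy: parents group the leaf classes.
    parent = {"rock": "music", "jazz": "music", "news": "text", "blog": "text"}
    leaves = list(parent)

    def tree_distance(a, b):
        """Number of hops between two leaves in the two-level tree."""
        if a == b:
            return 0
        return 2 if parent[a] == parent[b] else 4

    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 5))
    y = rng.choice(leaves, size=400)

    # One binary model per leaf; negatives far from the leaf in the hierarchy
    # get larger weights, so hierarchy-expensive mistakes cost more.
    models = {}
    for leaf in leaves:
        target = (y == leaf).astype(int)
        weights = np.where(target == 1, 1.0,
                           np.array([tree_distance(leaf, c) for c in y], float))
        models[leaf] = LogisticRegression().fit(X, target, sample_weight=weights)

Prediction would then assign each instance to the leaf whose binary model scores highest.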

Hash Function Learning via Codewords

In this paper we introduce a novel hash learning framework that has two main distinguishing features compared to past approaches. First, it utilizes codewords in the Hamming space as ancillary means to accomplish its hash learning task. These codewords, which are inferred from the data, attempt to capture grouping aspects of the data's hash codes. Second, and more importantly, the same framework is capable of addressing supervised, unsupervised and even semi-supervised hash learning tasks.
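A heavily simplified toy reading of the codeword idea, assuming linear hash functions h(x) = sign(Wx) and supervised group labels: alternate between computing per-group majority bit patterns (the codewords) and refitting the hash functions toward each point's assigned codeword. This sketches the flavor of such a scheme, not the paper's actual algorithm.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((300, 10))
    labels = rng.integers(0, 3, size=300)       # grouping information
    n_bits = 8

    W = rng.standard_normal((10, n_bits))       # linear hash: h(x) = sign(xW)
    for _ in range(10):
        codes = np.sign(X @ W)
        # Codewords: per-group majority bit patterns in the Hamming space.
        codewords = np.array([np.sign(codes[labels == g].sum(axis=0) + 1e-9)
                              for g in range(3)])
        # Refit the hash functions toward each point's assigned codeword
        # (a least-squares regression onto the target bits).
        W, *_ = np.linalg.lstsq(X, codewords[labels], rcond=None)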

Adaptive Stochastic Primal-Dual Coordinate Descent for Separable Saddle Point Problems

We consider a generic convex-concave saddle point problem with a separable structure, a form that covers a wide range of machine learning applications. Under this problem structure, we follow the framework of primal-dual updates for saddle point problems and incorporate stochastic block coordinate descent with adaptive stepsizes into it. We show theoretically that the proposed adaptive stepsizes potentially achieve a sharper linear convergence rate than existing methods.
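As a concrete instance under simplifying assumptions, the sketch below runs a stochastic dual-coordinate primal-dual method (in the spirit of SPDC, but with fixed rather than adaptive stepsizes, so it omits precisely the paper's contribution) on the saddle-point form of ridge regression, min_x max_y (lam/2)||x||^2 + y^T A x - (1/2)||y||^2 - b^T y, whose inner maximization recovers min_x (lam/2)||x||^2 + (1/2)||Ax - b||^2.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam = 50, 10, 0.1
    A = rng.standard_normal((n, d))
    b = rng.standard_normal(n)

    x, y = np.zeros(d), np.zeros(n)
    u = A.T @ y                                  # running value of A^T y
    tau = sigma = 0.5 / np.linalg.norm(A, 2)     # fixed (non-adaptive) stepsizes
    for _ in range(20000):
        i = rng.integers(n)                      # sample one dual block
        y_new = (y[i] + sigma * (A[i] @ x - b[i])) / (1.0 + sigma)  # dual prox
        # Primal prox step with a variance-correcting term for the block update.
        g = u + n * (y_new - y[i]) * A[i]
        x = (x - tau * g) / (1.0 + tau * lam)    # prox of (lam/2)||x||^2
        u += (y_new - y[i]) * A[i]               # keep u = A^T y current
        y[i] = y_new

    # x should approach the closed-form ridge solution:
    x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)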