Scientific Track

Scalable Metric Learning for Co-embedding

We present a general formulation of metric learning for co-embedding, where the goal is to relate objects from different sets. The framework allows metric learning to be applied to a wide range of problems---including link prediction, relation learning, multi-label tagging and ranking---while allowing training to be reformulated as convex optimization. For training we provide a fast iterative algorithm that improves the scalability of existing metric learning approaches.
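
As a rough illustration of the co-embedding idea (the paper's exact formulation may differ), objects x and y from two different sets can be related through a bilinear score with a learned matrix W:

    s(x, y) = x^\top W y = \langle U x, \, V y \rangle, \qquad W = U^\top V,

so both object types are mapped into a shared latent space where a single metric applies; optimizing over W directly, rather than over the factors U and V, is one standard route to a convex training problem.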

Learning Compact and Effective Distance Metrics with Diversity Regularization

Learning a proper distance metric is of vital importance for many distance-based applications. Distance metric learning aims to learn a set of latent factors based on which the distances between data points can be effectively measured. The number of latent factors incurs a tradeoff: too few factors are not powerful or expressive enough to measure distances, while too many factors cause high computational overhead.
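
A minimal sketch of how a diversity regularizer could act on latent factors; the Mahalanobis-style parameterization and the specific penalty below are illustrative assumptions, not necessarily the paper's exact choices:

    import numpy as np

    def factor_distance(L, x, y):
        # Distance measured through the k latent factors (rows of L):
        # d(x, y) = ||L (x - y)||^2.
        d = L @ (x - y)
        return float(d @ d)

    def diversity_penalty(L):
        # Illustrative regularizer: penalize overlap between factors so
        # that a small set of factors covers diverse directions.
        G = L @ L.T                      # k x k Gram matrix of the factors
        return float(np.sum((G - np.diag(np.diag(G))) ** 2))

Encouraging near-orthogonal factors in this way is one route to keeping the factor count small without sacrificing expressiveness.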

Joint Semi-Supervised Similarity Learning for Linear Classification

The importance of metrics in machine learning has attracted growing interest in distance and similarity learning. We study this problem in the setting where few labeled data (and potentially few unlabeled data as well) are available, a situation that arises in several practical contexts. We also provide a complete theoretical analysis of the proposed approach. This is worth noting, as the metric learning research field lacks theoretical guarantees on the generalization capacity of the classifier associated with a learned metric.
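
One common way to couple a learned similarity with a linear classifier, in the spirit of (epsilon, gamma, tau)-good similarity frameworks (the paper's exact construction may differ): learn a bilinear similarity and feed similarities to a few landmark points into a linear classifier.

    import numpy as np

    def bilinear_similarity(A, x, xp):
        # Hypothetical bilinear similarity K_A(x, x') = x^T A x'.
        return float(x @ A @ xp)

    def landmark_features(A, X, landmarks):
        # Represent each point by its similarities to landmark points;
        # a standard linear classifier is then trained on these features.
        return X @ A @ landmarks.T       # shape: (n_points, n_landmarks)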

Higher Order Fused Regularization for Supervised Learning with Grouped Parameters

We often encounter situations in supervised learning where there exist possibly overlapping groups that consist of more than two parameters. For example, we might work on parameters that correspond to words expressing the same meaning, music pieces in the same genre, and books released in the same year. Based on such auxiliary information, we could suppose that parameters in a group have similar roles in a problem and similar values.
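
For contrast, the classical fused penalty acts on pairs of parameters, \lambda \sum_{(i,j)} |w_i - w_j|. A simple higher-order analogue (an illustrative form, not necessarily the paper's exact regularizer) shrinks the spread within each group g of the group collection \mathcal{G}:

    \Omega(w) = \lambda \sum_{g \in \mathcal{G}} \Bigl( \max_{i \in g} w_i - \min_{i \in g} w_i \Bigr),

which is zero precisely when all parameters in every (possibly overlapping) group share a common value.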

Aggregation Under Bias: Renyi Divergence Aggregation and its Implementation via Machine Learning Markets

Trading in information markets, such as machine learning markets, has been shown to be an effective approach for aggregating the beliefs of different agents. In a machine learning context, aggregation commonly uses forms of linear opinion pools, or log opinion pools. It is interesting to relate information market aggregation to the machine learning setting. In this paper we introduce a spectrum of compositional methods, Renyi divergence aggregators, that interpolate between log opinion pools and linear opinion pools.
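
One standard way to write such an interpolation (the paper's parameterization may differ) is as a weighted power mean of the agents' beliefs p_i with weights w_i summing to one:

    p_\beta(x) \;\propto\; \Bigl( \sum_i w_i \, p_i(x)^{\beta} \Bigr)^{1/\beta},

where \beta = 1 recovers the linear opinion pool \sum_i w_i p_i(x), and the limit \beta \to 0 recovers the log opinion pool \prod_i p_i(x)^{w_i} (a weighted geometric mean, up to normalization).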

Sign Constrained Rectifier Networks with Applications to Pattern Decompositions

In this paper we introduce sign constrained rectifier networks (SCRN), demonstrate their universal classification power and illustrate their applications to pattern decompositions. We prove that the proposed two-hidden-layer SCRN, with sign constraints on the weights of the output layer and on those of the top hidden layer, are capable of separating any two disjoint pattern sets.
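
A minimal forward-pass sketch of such a network; the specific sign constraint enforced below (non-negative output weights) is an illustrative assumption, since the abstract does not specify the exact sign pattern:

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def scrn_forward(x, W1, b1, W2, b2, w_out, b_out):
        # Two-hidden-layer rectifier network with a sign constraint on
        # the output-layer weights (assumed non-negative here).
        assert np.all(w_out >= 0), "sign constraint on output weights"
        h1 = relu(W1 @ x + b1)            # first hidden layer
        h2 = relu(W2 @ h1 + b2)           # top hidden layer
        return float(w_out @ h2 + b_out)  # classify by the sign of the score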

Scoring and Classifying with Gated Auto-encoders

Auto-encoders are perhaps the best-known non-probabilistic methods for representation learning. They are conceptually simple and easy to train. Recent theoretical work has shed light on their ability to capture manifold structure, and drawn connections to density modelling. This has motivated researchers to seek ways of scoring auto-encoders, which has furthered their use in classification. Gated auto-encoders (GAEs) are an interesting and flexible extension of auto-encoders which can learn transformations among different images or pixel covariances within images.
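
A minimal sketch of a factored GAE in the style of relational feature learning models (layer sizes, factorization, and weight tying are assumptions, not the paper's exact architecture):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gae_mapping(U, V, W, x, y):
        # Mapping units encode the transformation relating x and y
        # (e.g., two images) via multiplicative factor interactions.
        return sigmoid(W @ ((U @ x) * (V @ y)))

    def gae_reconstruct_y(U, V, W, x, m):
        # Reconstruct y from x and the inferred mapping units m; the
        # reconstruction error is the usual training signal.
        return V.T @ ((U @ x) * (W.T @ m))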

Online Learning of Deep Hybrid Architectures for Semi-Supervised Categorization

A hybrid architecture is presented that is capable of online learning from both labeled and unlabeled samples. It combines both generative and discriminative objectives to derive a new variant of the Deep Belief Network, i.e., the Stacked Boltzmann Experts Network model. The model's training algorithm is built on principles developed from hybrid discriminative Boltzmann machines and composes deep architectures in a greedy fashion. It makes use of its inherent "layer-wise ensemble" nature to perform useful classification work.
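
A toy sketch of the layer-wise ensemble idea: each layer of the stack emits its own class distribution and the predictions are combined (the averaging rule below is an illustrative assumption):

    import numpy as np

    def layerwise_ensemble_predict(layer_probs):
        # layer_probs: list of per-layer class distributions, one per
        # layer of the stack; combine them by simple averaging.
        return np.mean(np.stack(layer_probs), axis=0)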

Difference Target Propagation

Back-propagation has been the workhorse of recent successes of deep learning, but it relies on infinitesimal effects (partial derivatives) in order to perform credit assignment. This could become a serious issue as one considers deeper and more non-linear functions, e.g., consider the extreme case of non-linearity where the relation between parameters and cost is actually discrete. Inspired by the biological implausibility of back-propagation, a few approaches have been proposed in the past that could play a similar credit assignment role.
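
The target-propagation rule commonly associated with this method (stated here from the broader literature; notation may differ from the paper) computes a target \hat{h}_{i-1} for layer i-1 from the target \hat{h}_i of layer i using a learned approximate inverse g_i of the forward mapping f_i:

    \hat{h}_{i-1} = h_{i-1} + g_i(\hat{h}_i) - g_i(h_i),

where the difference correction g_i(\hat{h}_i) - g_i(h_i) compensates for g_i being only an approximate inverse, so that \hat{h}_{i-1} = h_{i-1} whenever \hat{h}_i = h_i.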

An Empirical Investigation of Minimum Probability Flow Learning Under Different Connectivity Patterns

Energy-based models are popular in machine learning due to the elegance of their formulation and their relationship to statistical physics. Among these, the Restricted Boltzmann Machine (RBM), and its staple training algorithm contrastive divergence (CD), have been the prototype for some recent advancements in the unsupervised training of deep neural networks. However, CD has limited theoretical motivation, and can in some cases produce undesirable behaviour. Here, we investigate the performance of Minimum Probability Flow (MPF) learning for training RBMs.
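
For reference, the MPF objective as introduced by Sohl-Dickstein et al. (notation may differ slightly from this paper) sums probability flows between data states j and non-data states i:

    K(\theta) \;=\; \frac{\epsilon}{|\mathcal{D}|} \sum_{j \in \mathcal{D}} \sum_{i \notin \mathcal{D}} g_{ij} \, \exp\!\Bigl( \tfrac{1}{2}\bigl( E_j(\theta) - E_i(\theta) \bigr) \Bigr),

where E_\cdot(\theta) is the model energy and g_{ij} \in \{0, 1\} encodes which states are connected; the choice of g_{ij} is the design decision that the title's "different connectivity patterns" points at.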