Tutorial List

Monday Tutorials (7th September 2015)

Room: Infante Half Day
09:00 - 10:30; 11:00 - 12:45

Description:

Similarity and distance metrics between observations play an important role in both human cognitive processes and artificial systems for recognition and categorization. How to appropriately measure the distance or similarity for the problem at hand is crucial to the performance of many machine learning and data mining methods. This tutorial provides a comprehensive introduction to metric learning, a set of techniques to automatically learn similarity and distance functions from data. In the first part, we give a general overview of metric learning through the presentation of a few key algorithms and analytical frameworks. In particular, we cover linear and nonlinear methods and how to scale them up to large datasets, metric learning for structured data as well as the derivation of formal guarantees on the generalization performance. In the second part, we show the relevance of metric learning in the very active field of computer vision. We introduce approaches specifically designed for various tasks (image retrieval, object and face recognition, hierarchical image classification) and present experimental results on real-world computer vision datasets.
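
As a minimal illustration of the linear case (the abstract itself names no particular formulation), a common choice is a Mahalanobis-type distance d(x, y) = sqrt((x - y)^T M (x - y)) with M = L^T L, where the matrix L is what gets learned from data. The sketch below only evaluates such a distance; the matrix `stretch_first` is a hypothetical stand-in for a learned map, not the output of any algorithm covered in the tutorial.

```python
import math


def mahalanobis_distance(x, y, L):
    """Distance between x and y after applying the linear map L.

    Equivalent to sqrt((x - y)^T M (x - y)) with M = L^T L, which is
    positive semi-definite by construction.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Project the difference vector with L (one output coordinate per row of L).
    projected = [sum(l_ij * d_j for l_ij, d_j in zip(row, diff)) for row in L]
    return math.sqrt(sum(p * p for p in projected))


identity = [[1.0, 0.0], [0.0, 1.0]]       # recovers the Euclidean distance
stretch_first = [[2.0, 0.0], [0.0, 1.0]]  # hypothetical "learned" map

x, y = [0.0, 0.0], [3.0, 4.0]
d_euclid = mahalanobis_distance(x, y, identity)
d_learned = mahalanobis_distance(x, y, stretch_first)
```

With the identity map the function reduces to the ordinary Euclidean distance; a learned L rescales or mixes features so that distances better reflect the task.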

Website:
http://perso.telecom-paristech.fr/~abellet/misc/metric_learning_tutorial.html

Tutorial Organizers:
Aurelien Bellet
Matthieu Cord

Room: Infante Half Day
14:00 - 15:30; 16:00 - 17:45

Description:

From understanding the structure of data, to classification and topic modeling, graphical models are core tools in machine learning and data mining. They combine probability theory and graph theory to form a compact representation of probability distributions. In the last decade, as data stores became larger and higher-dimensional, traditional algorithms for learning graphical models from data became less and less usable because of their lack of scalability, directly reducing the potential benefits of this core technology. To scale graphical modeling techniques to the size and dimensionality of most modern data stores, data science researchers and practitioners now have to meld the most recent advances in numerous specialized fields, including graph theory, statistics, pattern mining and graphical modeling. This tutorial will cover the core building blocks that are necessary to build and use scalable graphical modeling technologies on large and high-dimensional data.

Website:
http://www.francois-petitjean.com/Research/ECML-2015-Tutorial/

Tutorial Organizers:
François Petitjean
Geoffrey I. Webb

Room: D. Maria Half Day
09:00 - 10:30; 11:00 - 12:45

Description:

Nowadays data scientists are faced with a plethora of possible approaches, and it is not at all clear which can be confidently used on the data at hand, or how approaches can be tuned efficiently to optimize performance. Determining the best approach for a particular case requires deep expertise in a wide range of data analysis methods, and few data scientists are equally skilled in all these respects. This often leads to a lot of manual trial and error which slows down progress and may lead to suboptimal solutions.

We need novel systems that adapt automatically to different and changing environments, thus reducing the burden of selecting and tuning the right techniques. This tutorial introduces and elucidates the state of the art in meta-learning, algorithm selection, and algorithm configuration. These families of techniques address the problems outlined above by learning from past applications of methods on a wide range of data to automatically recommend machine learning techniques for new sets of data and efficiently optimize their performance to the data at hand. The tutorial also discusses practical tools and systems that can be used today, as well as important open challenges that need to be solved tomorrow. 

Website:
http://metasel2015.inesctec.pt/tutorial-t3-meta-learning-algorithm-selection

Tutorial Organizers:
Pavel Brazdil
Christophe Giraud-Carrier
Joaquin Vanschoren
Lars Kotthoff

Room: D. Luís Half Day
14:00 - 15:30; 16:00 - 17:45

Description:

The Web is inundated with information in many different formats, including semi-structured and unstructured data. Machine Reading is a research area that aims to build systems that can read natural-language information, extract knowledge from it, and store that knowledge in knowledge bases. Thus, Machine Reading systems are developed to produce language-understanding technology that will automatically process text in affordable time. In this tutorial, the idea of automatically reading the Web using Machine Reading techniques will be explored. Four of the most successful Machine Reading approaches intended to Read the Web (namely the KnowItAll, YAGO, NELL and DBPedia systems) will be presented and discussed. The principles, the subtleties, and the current results of each approach will be addressed. Online resources from each approach will be explored, and future directions for each system will be pointed out. YAGO, KnowItAll, NELL and DBPedia are not the only research efforts focusing on Reading the Web. They were selected for this tutorial because they represent four different and highly relevant approaches to the problem, not because they are the only relevant ones. Although the tutorial focuses mainly on these four systems, other independent contributions to the Read the Web idea will be mentioned as related work.

Website:
http://www2.dc.ufscar.br/~estevam/readthewebtutorial.html

Tutorial Organizers:
Estevam Hruschka

Friday Tutorials (11th September 2015)

Room: Infante Half Day
10:00 - 11:15; 11:30 - 13:00

Description:

Rademacher Averages and the Vapnik-Chervonenkis dimension are fundamental concepts from statistical learning theory. They make it possible to study simultaneous deviation bounds of empirical averages from their expectations for classes of functions, by considering properties of the problem, of the dataset, and of the sampling process. In this tutorial, we survey the use of Rademacher Averages and the VC-dimension for developing sampling-based algorithms for graph analysis and pattern mining. We start from their theoretical foundations at the core of machine learning, then show a generic recipe for formulating data mining problems in a way that allows these concepts to be used in the analysis of efficient randomized algorithms for those problems. Finally, we show examples of the application of the recipe to graph problems (connectivity, shortest paths, betweenness centrality) and pattern mining. Our goal is to expose the usefulness of these techniques for the data mining researcher, and to encourage research in the area.
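
To make the central quantity concrete: for a sample x_1, ..., x_n and a class of functions F, the empirical Rademacher average is the expectation, over independent uniform random signs sigma_i in {-1, +1}, of sup over f in F of (1/n) * sum_i sigma_i * f(x_i). The sketch below estimates it by Monte Carlo for a small finite class. The function name, the toy value tables and the number of draws are illustrative assumptions, not material from the tutorial.

```python
import random


def empirical_rademacher_average(function_values, num_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[max_f (1/n) * sum_i sigma_i * f(x_i)].

    function_values: one list per function f, holding f(x_1), ..., f(x_n).
    """
    rng = random.Random(seed)
    n = len(function_values[0])
    total = 0.0
    for _ in range(num_draws):
        # Draw one vector of independent uniform random signs.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the (finite) class of the signed empirical average.
        total += max(
            sum(s * v for s, v in zip(sigma, values)) / n
            for values in function_values
        )
    return total / num_draws


# Toy class: indicator functions evaluated on a sample of 8 points.
single = [[1, 0, 1, 0, 1, 0, 1, 0]]
pair = single + [[0, 1, 0, 1, 0, 1, 0, 1]]

r_single = empirical_rademacher_average(single)
r_pair = empirical_rademacher_average(pair)
```

A class with a single function has a Rademacher average of zero in expectation, while a richer class can correlate better with random signs and so yields a larger value; this is the sense in which the quantity measures the complexity of the class on the given sample.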

Website:
http://bigdata.cs.brown.edu/vctutorial

Tutorial Organizers:
Matteo Riondato
Eli Upfal

Room: São João Half Day
10:00 - 11:15; 11:30 - 13:00

Description:

We review and summarise existing methods and literature on local pattern mining in relational data. Although data are (increasingly) often not stored in relational databases, data are, also increasingly, often relational. Our primary interest is to provide guidance on which approaches are available and in which circumstances they should be used.

We start out with an inclusive overview of multi-relational data mining, categorising approaches according to a number of dimensions: the type of data considered, algorithmic approach (e.g., heuristic vs. exhaustive), objective function or interestingness measure used (e.g., predictive vs. exploratory), and type and generality of the pattern syntax (e.g., global models vs. local patterns). Then, we focus on approaches based on local pattern mining, discussed in two parts: (1) data exploration through predictive modelling and (2) data exploration through descriptive modelling.

The goal of the tutorial is twofold: first, to ensure practitioners learn when to employ which approach, depending on the properties of the data at hand and their own intentions; and second, to ensure that researchers can mentally map the full research landscape of multi-relational data mining, with a specific focus on approaches based on local patterns. The focus is on local pattern mining approaches in order to complement recent tutorials given elsewhere on statistical relational learning.

Website:
http://www.interesting-patterns.net/forsied/making-sense-of-multi-relational-data/

Tutorial Organizers:
Jefrey Lijffijt
Eirini Spyropoulou
Tijl De Bie

Room: D. Maria Half Day
10:00 - 11:15; 11:30 - 13:00

Description:

Traditional recommender systems assume the availability of explicit ratings of items from users. However, in many applications this is not the case and only binary, positive-only user feedback is available, in the form of likes on Facebook, items bought on Amazon, videos watched on Netflix, ads/links clicked on Google, tags assigned to a photo in Flickr, etc. Recently, the number of publications on designing recommender systems that handle binary, positive-only feedback has been growing very fast. In this tutorial we discuss why collaborative filtering with binary, positive-only feedback is fundamentally different from collaborative filtering with rating data. We give an overview of the algorithms suitable for this task with an emphasis on surprising commonalities and key differences. Additionally, we provide extensive experimental comparisons among the most important algorithms. Finally, we discuss the role of these algorithms in light of the Netflix recommender system.

Website:
http://adrem.ua.ac.be/tutorial-ecmlpkdd15

Tutorial Organizers:
Bart Goethals
Kanishka Bhaduri
Koen Verstrepen

Room: Infante Half Day
14:30 - 16:00; 16:15 - 18:00

Description:

Predictive maintenance strives to anticipate equipment failures to allow for advance scheduling of corrective maintenance, thereby preventing unexpected equipment downtime and improving service quality for the customers. There is tremendous interest in industry in leveraging recent advances in machine learning and data mining to tackle this problem. Whereas the key enabling techniques for predictive maintenance (such as failure diagnostics and prediction) have received considerable emphasis in the community, the design of practical predictive maintenance systems has not enjoyed the same attention. This is partly because the lack of access to real-world use cases prevents researchers from considering the unique characteristics of the data and the nature of the problem in a practical design.

In this tutorial, we aim to fill the gap between real-world needs and technology offerings through a detailed study of the nature and requirements of real-world predictive maintenance problems, as well as a comprehensive survey of the techniques tackling these problems. We will survey the underlying data sources and feature engineering techniques, the learning scenarios and model creation and selection techniques, and will also present several real-world case studies and lessons learned.

Website:
http://www.zhuang-john-wang.com/

Tutorial Organizers:
Zhuang Wang

Room: D. Maria Half Day
14:30 - 16:00; 16:15 - 18:00

Description:

Nowadays the rate and volume of information flow are sharply increasing, forcing numerous domains to switch from offline to online information processing. Online learning is a subfield of machine learning providing the intelligence behind many online data processing systems. It has already revolutionized personalization and advertising on the Internet and is rapidly penetrating many other domains.

The tutorial will start with an overview of the classical online learning problems: prediction with expert advice, adversarial multiarmed bandits, stochastic multiarmed bandits, and bandits with side information. Then we will introduce the space of online learning problems. The space is spanned by three axes corresponding to three major parameters of online learning problems: (1) the feedback axis, measuring the amount of feedback obtained by the algorithm at every round; (2) the environmental resistance axis, measuring the resistance of the environment to the algorithm (e.g., i.i.d. or adversarial); and (3) the structural complexity axis, measuring the structural complexity of a problem.
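
As a concrete reference point for the full-information end of the feedback axis, the sketch below implements exponential weights (often called Hedge), a standard algorithm for prediction with expert advice. The abstract does not name this algorithm; it is chosen here purely for illustration, and the loss sequence, the function name `hedge` and the learning-rate setting are assumptions.

```python
import math


def hedge(loss_rounds, learning_rate):
    """Exponential weights over experts; returns (expected loss, final weights).

    loss_rounds: list of rounds, each a list with one loss in [0, 1] per expert.
    """
    num_experts = len(loss_rounds[0])
    weights = [1.0] * num_experts
    expected_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of following a randomly drawn expert this round.
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Full-information feedback: the loss of every expert is observed,
        # so every weight is updated.
        weights = [
            w * math.exp(-learning_rate * l) for w, l in zip(weights, losses)
        ]
    return expected_loss, weights


# Toy sequence: expert 0 is always right, expert 1 always wrong.
rounds = [[0.0, 1.0, 0.5]] * 50
eta = math.sqrt(8 * math.log(3) / 50)  # standard tuning for a known horizon

alg_loss, final_weights = hedge(rounds, eta)
best_expert_loss = min(sum(r[i] for r in rounds) for i in range(3))
regret = alg_loss - best_expert_loss
```

In a bandit setting only the loss of the chosen action would be observed, so the update line could not be applied to every weight directly; that difference is what the feedback axis captures.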

Most of the classical results in online learning correspond to isolated points in the space of online learning problems. In the second part of the tutorial we will survey a number of new algorithms that solve ranges of problems in the space of online learning problems. The results include algorithms that interpolate between full information and bandit feedback; algorithms that interpolate between adversarial and i.i.d. (or in some other sense “sub-adversarial”) environments in the full information and in the bandit setting; and algorithms that interpolate between stateless bandits and bandits with side information. These results open a new era in online learning research, where the researchers progress from studying isolated points in the space of online learning problems to studying ranges of problems.

Website:
https://sites.google.com/site/spaceofonlinelearningproblems/

Tutorial Organizers:
Yevgeny Seldin