Scientific Track

Fast Inbound Top-K Query for Random Walk with Restart

Random walk with restart (RWR) is widely recognized as one of the most important node proximity measures for graphs, as it captures the holistic graph structure and is robust to noise in the graph. In this paper, we study a novel query based on the RWR measure, called the inbound top-k (Ink) query. Given a query node q and a number k, the Ink query aims at retrieving k nodes in the graph that have the largest weighted RWR scores to q. Ink queries can be highly useful for various applications such as traffic scheduling, disease treatment, and targeted advertising.
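
As a rough illustration of what an Ink query asks for, the sketch below computes the RWR score of every node towards a query node by brute force and keeps the k largest. It is not the fast algorithm of the paper. It relies on the fixed-point relation s(u) = c·[u = q] + (1 − c)·mean of s(v) over the out-neighbours v of u, which gives the score of q for a walk restarting at u. The function names, the restart probability `restart`, the iteration count, the use of unweighted scores, the exclusion of q from the result, and the treatment of nodes without out-neighbours (the walk term is taken as zero) are all assumptions made for this example.

```python
import heapq


def inbound_rwr_scores(out_neighbors, q, restart=0.15, iters=100):
    """Approximate the RWR score of q as seen from every start node u.

    out_neighbors maps each node to a list of its out-neighbours; every
    neighbour must itself be a key of the mapping (assumption).
    """
    scores = {u: 0.0 for u in out_neighbors}
    for _ in range(iters):
        updated = {}
        for u, nbrs in out_neighbors.items():
            # Average score of the nodes reachable in one step from u;
            # a node with no out-neighbours contributes nothing (assumption).
            walk = sum(scores[v] for v in nbrs) / len(nbrs) if nbrs else 0.0
            updated[u] = (restart if u == q else 0.0) + (1.0 - restart) * walk
        scores = updated
    return scores


def inbound_top_k(out_neighbors, q, k, restart=0.15, iters=100):
    """Return the k nodes (other than q) with the largest RWR score to q."""
    scores = inbound_rwr_scores(out_neighbors, q, restart, iters)
    candidates = [u for u in scores if u != q]  # excluding q is an assumption
    return heapq.nlargest(k, candidates, key=scores.get)
```

Each iteration touches every edge once, so this baseline costs time proportional to the number of edges times the number of iterations per query.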

Exact Hybrid Covariance Thresholding for Joint Graphical Lasso

This paper considers the problem of estimating multiple related Gaussian graphical models from a p-dimensional dataset consisting of different classes. Our work is based upon the formulation of this problem as group graphical lasso. This paper proposes a novel hybrid covariance thresholding algorithm that can effectively identify zero entries in the precision matrices and split a large joint graphical lasso problem into small subproblems.
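
The splitting idea can be sketched as follows: screen the entries of the per-class sample covariance matrices, connect two variables whenever the screening rule keeps their entry, and treat each connected component as a separate subproblem. The screening rule used here (keep an entry if its absolute value exceeds `lam` in at least one class) is a simplified stand-in chosen for illustration, not the hybrid rule proposed in the paper; the function name and the parameter `lam` are likewise hypothetical.

```python
def threshold_components(covariances, lam):
    """Group variables into blocks using a simple covariance screen.

    covariances is a list of p x p sample covariance matrices (one per
    class), given as nested lists. Returns a sorted list of index blocks.
    """
    p = len(covariances[0])
    parent = list(range(p))  # union-find forest over the p variables

    def find(i):
        # Find the root of i, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            # Stand-in screening rule (assumption): keep the pair if any
            # class has a covariance entry larger than lam in magnitude.
            if any(abs(S[i][j]) > lam for S in covariances):
                parent[find(i)] = find(j)

    blocks = {}
    for i in range(p):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())
```

Each returned block could then be passed to a joint graphical lasso solver on its own, which is where the savings from splitting would come from.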

Estimating Potential Customers Anywhere and Anytime Based on Location-based Social Networks

Knowing the volume of customers at places and times of interest has several benefits, such as determining the locations of new retail stores and planning advertising strategies. This paper aims to estimate the number of potential customers at arbitrary query locations and times of interest in modern urban areas. Our idea is to treat existing established stores as a kind of sensor, because the human activities near retail stores characterize the geographical properties, mobility patterns, and social behaviors of the target customers.

Discovering Audience Groups and Group-Specific Influencers

Recently, user influence in social networks has been studied extensively. Many applications related to social influence depend on quantifying influence and finding the most influential users of a social network. Most existing work studies the global influence of users, i.e., the aggregated influence that a user has on the entire network. It is often overlooked that users may be significantly more influential to some audience groups than to others. In this paper, we propose AudClus, a method to detect audience groups and identify group-specific influencers simultaneously.

Towards computation of novel ideas from corpora of scientific text

In this work we present a method for the computation of novel ideas from corpora of scientific text. The system functions by first detecting concept noun-phrases within the titles and abstracts of publications using Part-Of-Speech tagging, before classifying these into sets of problem and solution phrases via a target-word matching approach.

Semi-Supervised Subspace Co-Projection for Multi-Class Heterogeneous Domain Adaptation

Heterogeneous domain adaptation aims to exploit labeled training data from a source domain for learning prediction models in a target domain under the condition that the two domains have different input feature representation spaces. In this paper, we propose a novel semi-supervised subspace co-projection method to address multi-class heterogeneous domain adaptation.

Multidimensional prediction models when the resolution context changes

Multidimensional data is systematically analysed at multiple granularities by applying aggregate and disaggregate operators (e.g., by the use of OLAP tools). For instance, in a supermarket we may want to predict sales of tomatoes for next week, but we may also be interested in predicting sales for all vegetables (higher up in the product hierarchy) for next Friday (lower down in the time dimension).
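
To make the notion of changing resolution concrete, the sketch below rolls a small set of sales records up or down along a product hierarchy and a time dimension. The hierarchy `PRODUCT_PARENT`, the record layout, the level names, and the function `roll_up` are all hypothetical and exist only for this illustration; how prediction models should be adapted across such resolutions is the subject of the paper and is not shown here.

```python
from collections import defaultdict

# Hypothetical product hierarchy: item -> category.
PRODUCT_PARENT = {"tomato": "vegetables", "lettuce": "vegetables", "apple": "fruit"}


def roll_up(records, product_level, time_level):
    """Aggregate (product, (week, day), units) records at a chosen resolution.

    product_level is "product" or "category"; time_level is "day" or "week".
    """
    totals = defaultdict(float)
    for product, (week, day), units in records:
        # Move up the product hierarchy only when the category level is requested.
        p = PRODUCT_PARENT[product] if product_level == "category" else product
        # Drop the day component when aggregating to whole weeks.
        t = week if time_level == "week" else (week, day)
        totals[(p, t)] += units
    return dict(totals)
```

Calling `roll_up(records, "category", "day")` corresponds to the "all vegetables on a given Friday" view, while `roll_up(records, "product", "week")` corresponds to the "tomatoes per week" view.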

Learning Pretopological Spaces for Lexical Taxonomy Acquisition

In this paper, we propose a new methodology for semi-supervised acquisition of lexical taxonomies. Our approach is based on the theory of pretopology that offers a powerful formalism to model semantic relations and transforms a list of terms into a structured term space by combining different discriminant criteria. In order to learn a parameterized pretopological space, we define the Learning Pretopological Spaces strategy based on genetic algorithms. In particular, rare but accurate pieces of knowledge are used to parameterize the different criteria defining the pretopological term space.

Inferring Unusual Crowd Events From Mobile Phone Call Detail Records

The pervasiveness and availability of mobile phone data offer the opportunity of discovering usable knowledge about crowd behavior in urban environments. Cities can leverage such knowledge to provide better services (e.g., public transport planning, optimized resource allocation) and a safer environment. Call Detail Record (CDR) data represents a practical data source for detecting and monitoring unusual events, considering the high level of mobile phone penetration compared with GPS-equipped and open devices.

Differentially Private Analysis of Outliers

This paper presents an investigation of differentially private analysis of distance-based outliers. Outlier detection aims to identify instances that are apparently distant from other instances. Meanwhile, the objective of differential privacy is to conceal the presence (or absence) of any particular instance. Outlier detection and privacy protection are therefore intrinsically conflicting tasks. In this paper, we present differentially private queries for counting outliers that appear in a given subspace, instead of reporting the outliers detected.
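
For orientation, the sketch below pairs a common definition of a distance-based outlier (a point with fewer than `min_neighbors` other points within distance `radius`) with the generic Laplace mechanism for releasing a noisy count. Both the outlier definition and the function names are assumptions made for this example, and the abstract does not say which mechanism the paper uses. The sensitivity is deliberately left as a caller-supplied parameter: removing a single instance can change the outlier status of several others, so it cannot simply be taken as 1 for this query.

```python
import math
import random


def count_distance_outliers(points, radius, min_neighbors):
    """Count points with fewer than min_neighbors other points within radius."""
    count = 0
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, o in enumerate(points)
            if j != i and math.dist(p, o) <= radius
        )
        if neighbors < min_neighbors:
            count += 1
    return count


def laplace_noisy_count(true_count, sensitivity, epsilon, rng=random):
    """Add Laplace noise with scale sensitivity / epsilon to a count.

    The caller must supply a valid sensitivity bound for the query
    (assumption: no bound is derived here).
    """
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise
```

A smaller `epsilon` or a larger sensitivity bound widens the noise, which is the tension between outlier analysis and privacy that the abstract describes.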