iDB Workshop

The 5th International Workshop with Mentors on Databases, Web and Information Management for Young Researchers

July 21st-23rd, 2013

Tutorials

Lecturers

Maarten de Rijke (University of Amsterdam, Netherlands)
Sourav S. Bhowmick (Nanyang Technological University, Singapore)
Roi Blanco (University of A Coruna / Yahoo! Research Barcelona, Spain)
Jun Zhu (Tsinghua University, China)

Click Models: A Survey

July 21st 13:30 - 16:45

	Maarten de Rijke (University of Amsterdam, Netherlands)
Abstract	Click data has always been an important source of information for web search engines. It is an implicit signal because we do not always understand how user behavior correlates with user satisfaction: users' clicks are biased. Click models, probabilistic models of the behavior of web search users, have been studied extensively by the IR community during the last five years; they manage to achieve good results in modeling user clicks. Click models are being used in a variety of ways: to inform ranking functions, as part of user simulations, for evaluation purposes, and to inform the definition of new metrics. In the tutorial, I will survey the most influential click models proposed in the literature and discuss recent developments. These include click models that are aimed at capturing user intent and methods for deriving metrics from click models.
Short Bio	Maarten de Rijke is full professor of Information Processing and Internet in the Informatics Institute at the University of Amsterdam. He holds MSc degrees in Philosophy and Mathematics (both cum laude), and a PhD in Theoretical Computer Science. He worked as a postdoc at CWI, before becoming a Warwick Research Fellow at the University of Warwick, UK. He joined the University of Amsterdam in 1998, and was appointed full professor in 2004. De Rijke leads the Information and Language Processing Systems group, one of the world's leading academic research groups in information retrieval. During the most recent computer science research assessment exercise, the group achieved maximal scores on all dimensions. His research focus is on intelligent information access, with projects on social media analytics, vertical search engines, machine learning for information retrieval, and semantic search. A Pionier personal innovational research incentives grant laureate (comparable to an advanced ERC grant), De Rijke has generated over 35MEuro in project funding. With an h-index of 45 he has published over 550 papers, published or edited over a dozen books, is (associate) editor for various journals and book series, and a current and former coordinator of retrieval evaluation tracks at TREC, CLEF and INEX. He is co-chair for SIGIR 2013, general chair for ECIR 2014, and program chair for information retrieval for CIKM 2015. He is the director of the University of Amsterdam's Intelligent Systems Lab (ISLA), its Center for Creation, Content and Technology (CCCT), and a board member for the Ad de Jonge Centrum voor Inlichtingen- en Veiligheidsstudies. The retrieval and language technology developed by his research group is being used by organizations around the Netherlands and beyond, and has given rise to various spin-off initiatives.

Managing Social Image Tags: Methods and Applications

July 21st 13:30 - 16:45

Sourav S. Bhowmick (Nanyang Technological University, Singapore)

Abstract

With the advances in digital photography (e.g., digital cameras and mobile phones) and social media sharing web sites, a huge amount of multimedia content is now available online. Most of these sites enable users to annotate web objects, including images, with free tags. A key consequence of the availability of such tags as meta-data is that it has created a framework that can be effectively exploited to significantly enhance our ability to understand social images. Such understanding paves the way for the creation of novel and superior techniques and applications for searching and browsing social images contributed by common users. The objective of this tutorial is to provide a comprehensive background on state-of-the-art techniques for managing tags associated with social images.

The tutorial is structured as follows. In the first part, we provide a comprehensive understanding of social image tags. We present a brief survey of studies related to the motivation behind tagging and the impact of the various tagging systems that users employ to create tags. We shall use Flickr as an example tagging system to illustrate various concepts. In the second part, we shall describe state-of-the-art techniques for measuring the effectiveness of tags in describing their annotated resources (social images). Specifically, we shall describe techniques that enable us to quantitatively measure a tag’s ability to describe the image content of social images. Note that this issue is one of the most fundamental problems in multimedia analysis, search, and retrieval. The third part of the tutorial is devoted to describing state-of-the-art techniques for discovering relationships between tags and how such knowledge is useful for various tag-based social media management applications such as tag recommendation, tag disambiguation and tag-based browsing systems. We conclude by identifying potential research directions in this area.

Short Bio

Sourav S. Bhowmick is an Associate Professor in the School of Computer Engineering, Nanyang Technological University and the Director of Data-Intensive Scalable Computing (DISCO) Lab. He is currently a Senior Visiting Professor at Fudan University, China. He was a Visiting Associate Professor at the Biological Engineering Division, Massachusetts Institute of Technology (MA, USA) from 2007 to 2013. Sourav received his Ph.D. in computer engineering in 2001. His core research expertise is in the field of data management and analytics. His current research interests are multi-disciplinary in flavour, focusing primarily on the following two types of research: (a) rethinking widely-accepted traditional data management techniques due to progress in other domains (e.g., HCI); (b) exploring novel and important data management/analytics problems at the interfaces of non-computer science disciplines such as social science and systems biology. Specifically, one of his current interests is in large scale analysis of online textual and non-textual social objects. Sourav has published more than 150 papers in major international database, data mining, and bioinformatics conferences and journals such as VLDB, IEEE ICDE, EDBT, CIDR, ACM WWW, ACM SIGMOD, ACM SIGKDD, ACM MM, ACM CIKM, ACM BCB, IEEE TKDE, VLDB Journal, Bioinformatics, and Biophysical Journal. He is serving as a PC member of various database, data mining, and bioinformatics conferences and workshops and as a reviewer for various database and data mining journals. He is serving as a program chair/co-chair of several international workshops and conferences. He is a member of the editorial boards of several international journals. Sourav has been a tutorial speaker for several international conferences such as ER 2006, APWeb 2008, WAIM 2008, PAKDD 2009 and 2011, DASFAA 2011 and 2012, and ADMA 2012.
He has received Best Paper Awards at ACM CIKM 2004 and ACM BCB 2011 for papers related to evolution mining and biological network summarization, respectively.

Mining Web Content for Enhanced Search

July 22nd 09:15 - 12:30

	Roi Blanco (University of A Coruna / Yahoo! Research Barcelona, Spain)
Abstract	Typically, Web mining approaches have focused on enhancing or learning about user seeking behavior, ranging from query log analysis and click-through usage, through employing the web graph structure for ranking, to detecting spam or web page duplicates. Lately, there has been a trend toward mining web content semantics and dynamics in order to enhance search capabilities, either by providing direct answers to users or by allowing for advanced interfaces or capabilities. In this tutorial we will look into different ways of mining textual information from Web archives, with a particular focus on how to extract and disambiguate entities, and how to put them to use in various search scenarios. Further, we will discuss how web dynamics affect information access and how to exploit them in a search context.
Short Bio	Roi Blanco is a Senior Research Scientist in Yahoo! Labs Barcelona, where he has been working since 2009. He is interested in applications of natural language processing for information retrieval, web search and mining and large-scale information access in general, publishing at international conferences in those areas. He also contributes to different industrial products like Yahoo! Search. Previously he taught computer science at A Coruna University, from which he received his Ph.D. degree (cum laude) in 2008.

Learning Bayesian Models with Posterior Regularization

July 22nd 09:15 - 12:30

	Jun Zhu (Tsinghua University, China)
Abstract	Existing Bayesian models, especially nonparametric Bayesian methods, rely heavily on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors can affect posterior distributions through Bayes' theorem, imposing posterior regularization is arguably more direct and in some cases can be more natural and easier. In this talk, I will present regularized Bayesian inference (RegBayes), a computational framework to perform posterior inference with a convex regularization on the desired post-data posterior distributions. When the convex regularization is induced from a linear operator on the posterior distributions, RegBayes can be solved with convex analysis theory. Furthermore, I will present some concrete examples, including MedLDA for learning discriminative topic representations; infinite latent support vector machines for learning discriminative latent features for classification; and others on social network analysis, matrix factorization, multi-task learning, etc. All these models explore the large-margin idea in combination with a (nonparametric) Bayesian model for discovering predictive latent representations. I will discuss both variational and Monte Carlo methods for inference.
Short Bio	Prof. Jun Zhu is an associate professor in the Department of Computer Science and Technology at Tsinghua University. His principal research interests lie in the development of statistical machine learning methods for solving scientific and engineering problems arising from artificial and biological learning, reasoning, and decision-making in the high-dimensional and dynamic worlds. Prof. Zhu received his Ph.D. in Computer Science from Tsinghua University. He did post-doctoral research in the Machine Learning Department at Carnegie Mellon University. His current work involves both the foundations of statistical learning, including theory and algorithms for probabilistic latent variable models, sparse learning in high dimensions, Bayesian nonparametrics, and large-margin learning; and the application of statistical learning in social network analysis, data mining, and multi-media data analysis.