Invited Talks

7/31 13:30 - 18:30, 8/1 13:00 - 17:15

Whom to Ask? Jury Selection for Decision Making Tasks on Microblog Services

Lei Chen (Hong Kong University of Science and Technology, Hong Kong)

7/31 13:30 - 14:30

It is common to see people obtain knowledge on micro-blog services by asking others decision-making questions. In this talk, I will present our recent study on the Jury Selection Problem (JSP), which uses crowdsourcing for decision-making tasks on micro-blog services. Specifically, the problem is to enroll a subset of the crowd under a limited budget, whose aggregated wisdom via a Majority Voting scheme has the lowest probability of drawing a wrong answer (the Jury Error Rate, JER). The challenges of this problem reside in calculating the JER and finding the optimal subset under a limited budget. Because individual error rates vary across the crowd, calculating the JER is non-trivial. In our study, we propose two efficient algorithms: a dynamic programming-based algorithm and a divide-and-conquer algorithm. For JSP, we formally propose two models, one for altruistic users (AltrM) and the other for incentive-requiring users (PayM), who require extra payment when enrolled in a task. Based on these two models, we design efficient algorithms for JSP. The efficiency and effectiveness of our proposed algorithms are verified on both synthetic and real micro-blog data.
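To make the Jury Error Rate concrete, the following is a minimal sketch of how the probability of a wrong majority vote could be computed with a simple dynamic program. It is an illustration only and not necessarily the algorithm presented in the talk. It assumes that jurors err independently and that the jury size is odd, so ties cannot occur; the function name `jury_error_rate` and its input format are hypothetical.

```python
def jury_error_rate(error_rates):
    """Probability that a majority vote is wrong.

    Assumptions (not from the talk): jurors err independently and
    the jury size is odd, so there are no ties.
    """
    n = len(error_rates)
    # dist[k] = probability that exactly k of the jurors processed so far are wrong
    dist = [1.0]
    for e in error_rates:
        nxt = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            nxt[k] += p * (1.0 - e)   # this juror answers correctly
            nxt[k + 1] += p * e       # this juror answers wrongly
        dist = nxt
    # The majority is wrong when strictly more than half of the jurors are wrong.
    return sum(dist[k] for k in range(n // 2 + 1, n + 1))


if __name__ == "__main__":
    print(jury_error_rate([0.1, 0.1, 0.1]))
```

With different individual error rates, the number of wrong voters no longer follows a plain binomial distribution, which is why the table of probabilities is built up one juror at a time. The sketch takes time quadratic in the jury size.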

Short Bio

Lei Chen received the BS degree in computer science and engineering from Tianjin University, Tianjin, China, in 1994, the MA degree from Asian Institute of Technology, Bangkok, Thailand, in 1997, and the PhD degree in computer science from the University of Waterloo, Waterloo, Ontario, Canada, in 2005. He is currently an associate professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. His research interests include crowdsourcing on social media, social media analysis, probabilistic and uncertain databases, and privacy-preserved data publishing. So far, he has published more than 150 conference and journal papers. He received best paper awards at DASFAA 2009 and 2010. He is a PC track chair for ACM SIGMM 2011, ACM CIKM 2012, and IEEE ICDE 2012. He has served as a PC member for SIGMOD, VLDB, ICDE, SIGMM, and WWW. He is a member of the ACM and IEEE. He also serves as the chairman of the ACM Hong Kong Chapter.

From Real-World Co-occurrences to Social Connections

Cyrus Shahabi (University of Southern California, USA)

7/31 14:30 - 16:00

In this talk, I first introduce the vision, research and projects of the Integrated Media Systems Center (IMSC), a graduated NSF Engineering Research Center at USC in the area of multimedia. The current research focus of IMSC is on a new geo-socio-temporal computing paradigm, termed Geo-Immersion. Geo-Immersion enables humans to capture, model and integrate real-world data into a geo-realistic virtual replica of the world for immersive data access, querying and analysis. It encompasses research in many interesting topics such as multimedia, participatory-sensing, privacy, trust, web, geospatial and temporal data management, etc. But more importantly, it brings up new fundamental research challenges in computer and social sciences to study the fusion of human behaviors in the real and virtual worlds.

Thus, for the rest of the talk, we will take a closer look at one of these challenges by focusing on how people’s behavior in the real world can be used to infer their social connections in the virtual world. Towards this end, I introduce a new geo-social model that derives social activities from the history of people’s movements in the real world, i.e., who has been where and when. In particular, from spatiotemporal histories, we infer real-world co-occurrences - being there at the same time - and then use co-occurrences to quantify social distances between any two persons. We show that straightforward approaches either do not scale or may overestimate the strength of social connections by giving too much weight to coincidences.
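As a rough illustration of what "co-occurrence" means here, the sketch below counts how often two people appear at the same place in the same time slot. This is the kind of straightforward counting baseline that the abstract warns can give too much weight to coincidences; it is not the geo-social model from the talk. The function name `count_cooccurrences`, the `(person, place, time_slot)` record format, and the discretization of space and time into places and slots are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def count_cooccurrences(visits):
    """Count, for every pair of people, how often they share a place and time slot.

    visits: iterable of (person, place, time_slot) tuples (hypothetical format).
    Returns a dict mapping (person_a, person_b), with person_a < person_b, to a count.
    """
    # Group people by the (place, time_slot) cell they were observed in.
    present = defaultdict(set)
    for person, place, slot in visits:
        present[(place, slot)].add(person)

    # Every pair observed in the same cell co-occurs once in that cell.
    counts = defaultdict(int)
    for people in present.values():
        for a, b in combinations(sorted(people), 2):
            counts[(a, b)] += 1
    return dict(counts)


if __name__ == "__main__":
    history = [
        ("ann", "cafe", 1), ("bob", "cafe", 1), ("cat", "cafe", 1),
        ("ann", "gym", 2), ("bob", "gym", 2),
    ]
    print(count_cooccurrences(history))
```

A raw count like this treats a meeting in a crowded place the same as a meeting in a rarely visited one, which is one way coincidences can inflate the apparent strength of a connection.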

Short Bio

Cyrus Shahabi is a Professor of Computer Science and Electrical Engineering and the Director of the Information Laboratory (InfoLAB) at the Computer Science Department and also the Director of the NSF's Integrated Media Systems Center (IMSC) at the University of Southern California. He was also the CTO and co-founder of a USC spin-off and an InQTel portfolio company, Geosemble Technologies, which was acquired in June 2012. He received his B.S. in Computer Engineering from Sharif University of Technology in 1989 and then his M.S. and Ph.D. degrees in Computer Science from the University of Southern California in May 1993 and August 1996, respectively. He authored two books and more than two hundred research papers in the areas of databases, GIS and multimedia. Dr. Shahabi has received funding from several agencies such as NSF, NIJ, NASA, NIH, DARPA, AFRL, and DHS as well as several industries such as Chevron, Google, HP, Intel, Microsoft, NCR and NGC. He was an Associate Editor of IEEE Transactions on Parallel and Distributed Systems (TPDS) from 2004 to 2009. He is currently on the editorial board of the VLDB Journal, IEEE Transactions on Knowledge and Data Engineering (TKDE), ACM Computers in Entertainment and Journal of Spatial Information Science. He is the founding chair of IEEE NetDB workshop and also the general co-chair of ACM GIS 2007, 2008 and 2009. He chaired the nomination committee of ACM SIGSPATIAL for the 2011-2014 terms. He is PC co-Chair of MDM 2013 and regularly serves on the program committee of major conferences such as VLDB, ACM SIGMOD, IEEE ICDE, ACM SIGKDD, and ACM Multimedia. Dr. Shahabi is a recipient of the ACM Distinguished Scientist award in 2009, the 2003 U.S. Presidential Early Career Awards for Scientists and Engineers (PECASE), the NSF CAREER award in 2002, and the 2001 Okawa Foundation Research Grant for Information and Telecommunications. He was the recipient of the US Vietnam Education Foundation (VEF) faculty fellowship award in 2011 and 2012, an organizer of the 2011 National Academy of Engineering “Japan-America Frontiers of Engineering” program, an invited speaker in the 2010 National Research Council (of the National Academies) Committee on New Research Directions for the National Geospatial-Intelligence Agency, and a participant in the 2005 National Academy of Engineering “Frontiers of Engineering” program.

Big Data From the Physical World

Xing Xie (Microsoft Research Asia, China)

7/31 16:30 - 17:30

Context awareness is a key concept in ubiquitous computing. Computing systems become more intelligent by analyzing and reacting to the physical world surrounding them. By accumulating and aggregating large-scale contextual data about the physical world from multiple users and multiple devices over a long period, we can obtain collective social intelligence from it. Based on this, more innovative Internet services can be developed to facilitate people's everyday lives. At Microsoft Research Asia, we are working on various technologies for managing big data from the physical world and building intelligence from it. In this talk, I will present our recent work in this direction, as well as other related work at Microsoft and in industry.

Short Bio

Dr. Xing Xie is a lead researcher at Microsoft Research Asia, and a guest Ph.D. advisor for the University of Science and Technology of China. He received his B.S. and Ph.D. degrees in Computer Science from the University of Science and Technology of China in 1996 and 2001, respectively. He joined Microsoft Research Asia in July 2001, working on spatial data mining, location-based services, social networks and ubiquitous computing. His research work has been reported by international media including MIT Technology Review, Seattle Times, CNN, etc. He has served on the organizing and program committees of many international conferences such as WWW, UbiComp, GIS, and KDD. During the past years, he has published over 100 refereed journal and conference papers. He has more than 50 patents filed or granted. He is a senior member of both the ACM and the IEEE. He established the ACM SIGSPATIAL China chapter and was the program co-chair of UbiComp 2011.

New Approach for Processing Ranked Subsequence Matching Based on Ranked Union

Wook-Shin Han (Kyungpook National University, Korea)

7/31 17:30 - 18:30

Ranked subsequence matching finds the top-k subsequences most similar to a given query sequence from data sequences. Recently, a new approach (referred to here as HLMJ) has been proposed for this problem, using the concept of the minimum distance matching window pair (MDMWP) and a global priority queue. By using the concept of MDMWP, HLMJ can prune many unnecessary accesses to data subsequences using a lower bound distance. However, we notice that HLMJ may incur serious performance overhead for important types of queries. In this talk, we propose a novel systematic framework to solve this problem by viewing ranked subsequence matching as ranked union. Specifically, we propose a notion of the matching subsequence equivalence class (MSEQ) and a novel lower bound called the MSEQ-distance. To completely eliminate the performance problem of HLMJ, we also propose a cost-aware density-based scheduling technique, where we consider both the density and cost of the priority queue. Extensive experimental results with many real datasets show that the proposed algorithm outperforms HLMJ and the adapted PSM, a state-of-the-art index-based merge algorithm supporting non-monotonic distance functions, by up to two to three orders of magnitude, respectively.
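For readers unfamiliar with the problem statement, the sketch below spells out what ranked subsequence matching returns, using an exhaustive scan. It is a brute-force baseline for illustration only and is unrelated to HLMJ or the MSEQ-based framework described above, which exist precisely to avoid scanning every subsequence. The function name `top_k_subsequences` and the choice of Euclidean distance over fixed-length windows are assumptions; the talk's methods may use other distance functions.

```python
import heapq
import math


def top_k_subsequences(data, query, k):
    """Return the k subsequences of `data` closest to `query`.

    data:  list of numeric sequences.
    query: numeric sequence; only windows of the same length are compared.
    Returns a sorted list of (distance, sequence_index, start_offset) tuples.
    Euclidean distance is an assumption made for this sketch.
    """
    m = len(query)
    candidates = []
    for seq_id, seq in enumerate(data):
        # Slide a window of length m over each data sequence.
        for start in range(len(seq) - m + 1):
            window = seq[start:start + m]
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
            candidates.append((dist, seq_id, start))
    # Keep the k candidates with the smallest distance.
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    sequences = [[1, 2, 3, 4], [0, 0, 2, 3]]
    print(top_k_subsequences(sequences, [2, 3], k=2))
```

The scan touches every window of every sequence, so its cost grows with the total length of the data. Lower-bound pruning of the kind the abstract describes aims to skip most of those distance computations.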

Short Bio

Professor Wook-Shin Han is currently an Associate Professor in the Department of Computer Engineering at Kyungpook National University in Korea. In the past, he worked as a post-doctoral researcher at IBM Almaden Research Center, working on parallel progressive optimization inside parallel DB2. He received the B.S. degree in Computer Engineering from Kyungpook National University in 1994, and the M.S. and Ph.D. degrees in Computer Science from Korea Advanced Institute of Science and Technology (KAIST), in 1996 and 2001, respectively. His research interests include query processing and optimization, parallel databases, similarity searching, XML databases, spatial databases, object-oriented/object-relational databases, and information retrieval. He has published in major international journals and conferences, including SIGMOD, VLDB, ICDE, WWW, IEEE TKDE, and the VLDB Journal. He has served as a PC member for VLDB, SIGMOD, ICDE, and CIKM. He has served as a PC co-chair of APWeb 2010. He is an editorial board member of several international journals including IEEE Transactions on Knowledge and Data Engineering.

Automated N-Tier System Management through Experimental Measurements

Calton Pu (Georgia Tech, USA)

8/1 13:00 - 14:00

Large N-Tier applications running in data centers and cloud environments have complex deployment requirements and dependencies that change frequently. The increasing complexity and scalability requirements of such applications demand automated configuration design, testing, deployment and monitoring of applications. In the Elba project, we have automated the n-tier application deployment, monitoring, and analysis phases through automated generation of benchmark scripts. Elba software tools include the Mulini generator, which creates deployment and monitoring scripts for several benchmarks such as RUBiS and RUBBoS. The scripts run the benchmark through many different configurations (from 3-tier to 5-tier, and several software packages such as MySQL and PostgreSQL), producing detailed data on many system resource metrics (e.g., CPU and network utilization). Statistical analysis of these metrics identifies the resource bottlenecks automatically, leading to automated adaptation. We will show detailed analyses of our data and discuss new research topics that can use the benchmark data accumulated and apply these techniques to other quality of service dimensions such as availability and power consumption. Concrete applications of this data include configuration planning and autonomic adaptation of N-tier applications.

Short Bio

Calton Pu was born in Taiwan and grew up in Brazil. He received his PhD from the University of Washington in 1986 and served on the faculty of Columbia University and Oregon Graduate Institute. Currently, he holds the position of Professor and John P. Imlay, Jr. Chair in Software at the College of Computing, Georgia Institute of Technology. He has worked on several projects in systems and database research. His contributions to systems research include program specialization and software feedback. His contributions to database research include extended transaction models and their implementation. His recent research has focused on automated system management in clouds (Elba project) and document quality, including spam processing. He has collaborated extensively with scientists and industry researchers. He has published more than 70 journal papers and book chapters, and 200 conference and refereed workshop papers. He has served on more than 120 program committees, including as PC co-chair of SRDS'95, ICDE'99, COOPIS'02, SRDS'03, DOA'07, DEBS'09, ICWS'10, CollaborateCom'11, and as co-general chair of ICDE'97, CIKM'01, ICDE'06, DEPSA'07, CEAS'07, SCC'08, CollaborateCom'08, and World Service Congress'11.

Recommendation Services for Location Based Social Networks

Wang-Chien Lee (The Pennsylvania State University, USA)

8/1 14:00 - 15:00

With the rapid development of mobile devices, wireless networks and Web 2.0 technology, a number of location-based social networking services (LBSNs), e.g., Foursquare, Whrrl, Facebook Places, Google Latitude, Loopt, and Brightkite, have emerged in recent years. These LBSNs allow users to establish cyber links to their friends or other users, and share tips and experiences of their visits to numerous points of interest (POIs), e.g., restaurants, stores, cinemas, etc.

Recommendation services, e.g., a POI recommendation service that suggests new POIs to users in order to help them explore new places and know their cities better, are essential for LBSNs and are thus receiving a lot of research interest. In this talk, I will introduce some recommendation services for LBSNs and present our research efforts and results on enabling some of these recommendation services.

Short Bio

Wang-Chien Lee is an Associate Professor of Computer Science and Engineering at Pennsylvania State University, where he leads the Intelligent Pervasive Data Access (iPDA) Research Group to pursue cross-area research in database systems, pervasive/mobile computing, and networking. He is particularly interested in developing data management techniques (including accessing, routing, indexing, caching, aggregation, dissemination, and query processing) for supporting complex queries and location-based services in a wide spectrum of networking and mobile environments such as peer-to-peer networks, mobile ad-hoc networks, wireless sensor networks, and wireless broadcast systems. Meanwhile, he also works on XML, security, information integration/retrieval, and object-oriented databases. He has published more than 200 technical papers on these topics. He has served as a TPC member for many prestigious international conferences, such as VLDB, INFOCOM, ICDE, ICNP, ICDCS, CIKM, etc. He is the program co-chair of ICDCS 2012. He co-founded MDM and served as the TPC co-chair of IEEE SUTC'06 and IEEE INFOSCALE'07 as well as the general co-chair of DASFAA'11 and MobiDE'07.

Integrating Unstructured Data in the Enterprise

Mukesh Mohania (IBM Research India, India)

8/1 15:30 - 16:30

With the full potential of structured data already exploited, leveraging unstructured data is becoming increasingly important for enterprises to gain a competitive advantage. The unstructured data in an enterprise is often related to the structured data, since both record data about the same business entities. However, they are often stored in different systems and not well integrated. This points to a need to bridge the gap between structured and unstructured data to enable integrated retrieval, management and analysis. In this session, we will highlight how unstructured data can be used to improve information discovery, entity resolution and relationship discovery in Master Data Management, and to enable conjoint analysis of data and content together for deeper business intelligence.

We will also touch upon security and privacy aspects of unstructured data and highlight the need for preventing information leakage from unstructured documents based on the access control on structured data for better information governance.

Short Bio

Mukesh Mohania received his Ph.D. in Computer Science & Engineering from the Indian Institute of Technology, Bombay, India in 1995. Currently, he is a Senior Technical Staff Member and IBM Master Inventor at IBM Research - India. He has worked extensively in the areas of distributed databases, data warehousing, data integration, and autonomic computing. He has published more than 120 papers and also filed more than 50 patents in these or related areas, of which more than 14 have already been granted. He received best paper awards at CIKM 2004 and CIKM 2005. His work on Data Quality, Information Integration, and Autonomic Computing has led to the development of new products and also influenced several existing IBM products. He has received several awards within IBM, such as "Excellence in People Management", "Outstanding Innovation Award", "Technical Accomplishment Award", "Leadership By Doing", and many more. He also received the IEEE Meritorious Service Award. He is an ACM Distinguished Scientist, and a member of the IBM Academy of Technology.

OpenSense: Open sensor networks for air quality monitoring

Karl Aberer (EPFL, Switzerland)

8/1 16:30 - 17:30

Wireless sensor networks and the publishing of sensor data on the Internet have the potential to substantially increase public awareness of and involvement in environmental sustainability. Air pollution monitoring in urban areas is a prime example of such an application, as common air pollutants have a direct effect on human health. However, making the vision of public involvement in environmental monitoring a reality today poses substantial technical challenges for the communication and information systems infrastructure, which must scale up from isolated, well-controlled systems to an open and scalable infrastructure.

In this talk we first provide an overview of the OpenSense project for air pollution monitoring. OpenSense takes a holistic, end-to-end systems perspective. The crucial insight is that in designing an open, scalable sensing system one has to consider dependencies among many system dimensions, both for modelling and control, including sensor behaviour, wireless networks, mobility, environmental models, user needs, as well as trust and privacy concerns.

In the second part of the talk we will discuss in more detail aspects of sensor data processing relevant to the OpenSense project. We will introduce model-based methods for sensor data cleaning, segmentation and multi-query processing. We will show a framework to extract semantic activity information from trajectory data and provide some initial results on the tradeoffs between privacy and sensor data accuracy in community sensing settings. Finally, we will provide an outlook on the next steps we plan to undertake within OpenSense towards realizing a community-based approach for addressing health concerns of urban populations.

Short Bio

Karl Aberer has been a full professor for Distributed Information Systems at EPFL Lausanne, Switzerland, since 2000. Since 2005 he has been the director of the Swiss National Research Center for Mobile Information and Communication Systems (NCCR-MICS, www.mics.ch). Prior to his current position, he was senior researcher at the Integrated Publication and Information Systems institute (IPSI) of GMD in Germany. He received his Ph.D. in mathematics in 1991 from the ETH Zurich. His research interests are in semantics and self-organization in information systems with applications in peer-to-peer search, semantic web, trust management and mobile and sensor networks.

He is or has been serving on the editorial boards of SIGMOD Record, VLDB Journal, ACM Transactions on Autonomous and Adaptive Systems and World Wide Web Journal, and has co-chaired, among others, the ICDE, ISWC, MDM, ODBASE, P2P, VLDB and WISE conferences.

idbWorkshop

The 4th International Workshop with Mentors on Databases, Web and Information Management for Young Researchers
