Invited Talks (13:00-16:15 Aug. 2nd, 2011)
An Effective Approach for Topic-Driven Opinion Summarization
Prof. Kam-Fai Wong (The Chinese University of Hong Kong) Slide (PDF: 1.2MB)
Topic-driven opinion summarization (TOS) plays an important role in helping users digest online opinions. It aims to extract a summary of the opinion expressions specified by a query, i.e. topic-specific opinionated information (TOI). A fundamental problem in TOS is how to represent the TOI of an opinion expression effectively so that salient opinions can be summarized to meet the user’s preferences. Existing approaches to TOS are limited either by a mismatch between topical information and its corresponding opinionated information or by an inability to measure opinionated information when it is associated with different topics (queries), which in turn seriously degrades TOS performance. In this paper, we represent a TOI by a semantically richer information unit, the word pair, constructed from a sentiment word together with its corresponding topic-specific word. We further propose a weighting scheme to measure word pairs and compute the associative score between the sentiment word and the topic word of each pair. We then integrate word pairs into a random walk model for opinion expression ranking and adopt the maximal marginal relevance method for summarization. Experimental results show that salient opinion expressions are weighted effectively and assigned top ranks for TOS, achieving a significant improvement in F value over other representations.
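The final summarization step named in the abstract, maximal marginal relevance (MMR), can be illustrated with a minimal Python sketch. This is the generic MMR selection rule, not the speaker's implementation: the function names `mmr_select` and `jaccard`, the trade-off parameter `lam`, the word-overlap similarity, and the relevance scores are all assumptions made for illustration. In the talk's setting the relevance scores would presumably come from the random walk ranking.

```python
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two sentences (illustrative choice)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mmr_select(
    candidates: Sequence[str],
    relevance: Sequence[float],
    similarity: Callable[[str, str], float],
    k: int,
    lam: float = 0.7,
) -> List[str]:
    """Greedily pick k items, trading relevance against redundancy.

    Each step selects the candidate maximizing
        lam * relevance - (1 - lam) * (max similarity to items already chosen).
    """
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        # Redundancy is 0 while nothing has been selected yet.
        redundancy = max(
            (similarity(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance[i] - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]


# Hypothetical opinion expressions with made-up relevance scores.
opinions = [
    "the battery life is great",
    "the battery life is great overall",
    "the screen is too dim",
]
scores = [0.9, 0.85, 0.6]
summary = mmr_select(opinions, scores, jaccard, k=2, lam=0.5)
```

With these toy inputs the second pick skips the near-duplicate battery sentence in favor of the less relevant but non-redundant screen sentence, which is the behavior MMR is designed to produce.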
Kam-Fai Wong obtained his PhD from Edinburgh University, Scotland, in 1987. After his PhD, he conducted research at Heriot-Watt University (Scotland), UniSys (Scotland) and ECRC (Germany). At present he is the Associate Dean (External Affairs) of the Faculty of Engineering, a professor in the Department of Systems Engineering and Engineering Management, the Director of the Centre for Innovation and Technology (CINTEC), and an Associate Director of the Centre for Entrepreneurship (CEF) at the Chinese University of Hong Kong (CUHK). He is also an Adjunct Professor at Northeastern University, China, and at Peking University, as well as a Vice President of Database School China. He is the founding Editor-in-Chief of ACM Transactions on Asian Language Information Processing (TALIP) and a member of the editorial boards of the Journal on Distributed and Parallel Databases, the International Journal on Computer Processing of Oriental Languages and the International Journal on Computational Linguistics and Chinese Language Processing. He has chaired many international academic conferences, e.g. as General Chair of IJCNLP2011, Chiang Mai, November 2011. He has published over 200 research papers and the book “Introduction to Chinese Natural Language Processing”, which is the first book about Chinese NLP in English.
On Matching Web-scale Entity Graphs
Prof. Seung-won Hwang (POSTECH, Korea) Slide (PDF: 5.7MB)
This talk introduces the problem of matching web-scale entity graphs, such as multilingual name graphs and social network graphs, to solve difficult problems such as name translation or social ID finding. While existing approaches focus on using either textual (or phonetic) similarity or Web co-occurrences, this approach combines the strengths of the two and significantly outperforms the state of the art. We present our evaluation results using real-life entity graphs.
Seung-won Hwang is an associate professor of Computer Science and Engineering at Pohang University of Science and Technology (POSTECH), Korea. Prior to joining POSTECH, she received her Ph.D. in computer science from the University of Illinois at Urbana-Champaign. Her research focuses on web-scale data management and has been published in major international journals and conferences, including ACM TODS, IEEE TKDE, SIGMOD, VLDB, SIGKDD, and ICDE.
Being David in the Data Management World of Goliaths
Prof. Sourav S Bhowmick (Nanyang Technological University, Singapore) Slide (PDF: 2.8MB)
This talk consists of two parts. In the first part, I will share with you my experience and some insights on how to undertake data management research that leads to publications in top-tier venues when you are a David in the data management world of Goliaths. By "David" I mean that one or more of the following applies to you: (a) You are not a graduate student of a "star" professor; (b) You have not graduated (or are not going to graduate) from a DB-strong university or group; or (c) You are not connected to the "visible network".
In the second part, I will share with you my views on XML and graph data management research. Specifically, I will talk about (a) what topics in these domains may lead you to top-tier publications; (b) publications vs. impact of research; and (c) what the best strategy is (in my view) when you are a David.
Sourav S Bhowmick is an Associate Professor in the School of Computer Engineering, Nanyang Technological University, and the Director of the Centre for Advanced Information Systems (CAIS). He is affiliated with the Data Management Research Group at NTU (DANTE) and the Data Mining & Machine Learning Group. He is currently a Visiting Associate Professor at the Biological Engineering Division, Massachusetts Institute of Technology (MIT), USA. He also holds the position of Singapore-MIT Alliance (SMA) Fellow in the Computation and Systems Biology program (2005-2011). Sourav received his Ph.D. in computer engineering in 2001.
Sourav's key research goal is to consistently undertake quality research that will enhance human needs and aspirations. His current research interests include tree and graph data management, HCI-aware data management, database usability, social media & web data management, data mining, and computation & systems biology. He has published more than 120 papers in major international database and data mining conferences and journals such as VLDB, IEEE ICDE, ACM WWW, ACM SIGMOD, ACM SIGKDD, ACM MM, ACM CIKM, ER, IEEE TKDE, ACM CS, Information Systems, and DKE. The common thread running through his research is a focus on going beyond papers to build usable novel data management and mining systems and prototypes. Sourav's key research contributions are summarized as follows.
His research team is the first to undertake a systematic study on mining structural evolution of tree-structured data. This work received the "Best Interdisciplinary Paper Award" at ACM CIKM 2004. Subsequently, they proposed solutions to a series of novel problems related to mining evolution of tree and graph structured data. Some of these works were published in ACM WWW 2006 and 2007, ACM SIGKDD 2006, ACM CIKM 2005, 2008, VLDB 2009, ICDE 2011.
His team is the first to propose a novel query processing paradigm that blends visual query formulation and query processing to turbo-charge system response time by exploiting the latency offered by visual interfaces. This paradigm was realized on top of XML and graph databases resulting in two novel data management systems called XBLEND and GBLENDER, respectively. The results of this work were published in ICDE 2006, DASFAA 2007, ICDE 2009, SIGMOD 2010, SIGMOD 2011.
Sourav and his collaborators recently proposed novel techniques to improve the usability of XML database systems. This resulted in the world's first shape-polymorphic XML data transformation language, called XMORPH. The results of this work were published in ICDE 2010, VLDB 2010.
His team also developed a system called XANADUE, which is the first system to detect changes to XML data using relational backends. The research results were first published in DEXA 2004 and subsequently in ACM CIKM 2005, ER 2006, and SIGMOD 2007.
Sourav is serving as a PC member of various database conferences and workshops and as a reviewer for various database journals. He has served as a program chair/co-chair of several international conferences and workshops. He is a member of the editorial boards of several international journals. Sourav has been a tutorial speaker at several international conferences such as ER 2006, APWeb 2008, WAIM 2008, PAKDD 2009 and 2011, and DASFAA 2011. His Ph.D. thesis was published as a book entitled "Web Data Management: A Warehouse Approach" (Springer-Verlag, October 2003). Sourav is a member of ACM and an affiliate member of IEEE.
Data Mining and the 'Curse of Dimensionality'
Dr. Arthur Zimek (Ludwig-Maximilians-University Munich, Germany) Slide (PDF: 2.2MB)
This talk sketches the infamous 'curse of dimensionality' in its relevance for data mining algorithms and reports on some exemplary approaches to mining high-dimensional data.
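One commonly cited facet of the curse of dimensionality is distance concentration: as dimensionality grows, the nearest and farthest neighbors of a query point become almost equally far away. The following Python sketch illustrates that effect on uniformly random data. It is an illustration only and is not taken from the talk; the function name `relative_contrast`, the uniform data distribution, the sample size and the seed are all assumptions.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in the unit cube [0, 1]^dim.
    """
    rng = random.Random(seed)  # local generator keeps the run reproducible
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(
            math.sqrt(sum((q - p) ** 2 for q, p in zip(query, point)))
        )
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


# The contrast typically shrinks as the dimensionality increases.
for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

A small relative contrast means that "nearest neighbor" carries little information, which is one reason distance-based mining methods can degrade on high-dimensional data.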
Dr. Arthur Zimek is a scientific assistant (equiv. assistant professor or lecturer) in the database systems and data mining group of Hans-Peter Kriegel at Ludwig-Maximilians-University Munich, Germany. He finished his PhD thesis in computer science on "Correlation Clustering" in summer 2008. He received the "Best Paper Honorable Mention Award" at the SIAM Int. Conf. on Data Mining (SDM) together with his co-authors in 2008 and was selected as runner-up for the "SIGKDD Doctoral Dissertation Award" in 2009. Zimek holds degrees in Theology, Philosophy and Bioinformatics and has published more than 30 peer-reviewed journal and conference papers. He serves as a reviewer for several top database and data mining journals (e.g. VLDB Journal, IEEE TKDE, ACM TKDD, Data Min. Knowl. Disc., Stat. Anal. Data Min.) and as a member of program committees for data mining and pattern recognition conferences (ACM SIGKDD 2011, ICPRAM 2012). Zimek has given tutorials at several top database and data mining conferences (e.g. KDD, ICDM, SDM, PAKDD, VLDB).