- Haixun Wang (Microsoft Research Asia)
The dream of the Semantic Web is to develop one ontology in one
language covering everything that exists. The reality is that we
have a large number of ontologies, each focused on a small
domain, that are extremely hard to integrate. Recently, a lot of
interest has been devoted to universal ontologies, either
automatically constructed or built by community effort. However,
they still have limited scope. For example, Freebase, the best-known
taxonomy built by community effort, contains about 1,500
concepts, which is a far cry from "covering everything that exists."
In this talk, I will present a universal, probabilistic taxonomy that
is more comprehensive than any existing taxonomy. Currently, it
contains over 2 million concepts harvested
automatically from a corpus of 1.68 billion web pages and 2 years'
worth of search log data. Unlike traditional knowledge bases that
treat knowledge as black and white, it enables probabilistic
interpretations of the information it contains. The probabilistic
nature also enables it to incorporate heterogeneous information in a
natural way. I will present the details of how the core taxonomy, which
contains hypernym-hyponym relationships, is constructed, and how it
models the uncertainty, ambiguity, and inconsistency inherent in
knowledge. I will also discuss potential applications, e.g.,
understanding user intent, that can benefit from the taxonomy.