MSc Thesis Defence: Song Lin
The School of Computer Science is pleased to announce the MSc Defence of Song Lin, to be presented on February 2, 2015, at 1:00 in Reynolds 219. The presentation is titled "Web Person Name Disambiguation Using Multiple Sources of Information".
Web Person Name Disambiguation Using Multiple Sources of Information
Most existing implementations of web people search resolve person name ambiguity by clustering documents over a large number of syntactic and semantic features. Extensive research has been conducted on measuring document similarity for each individual feature. However, insufficient effort has been devoted to combining these similarities. In this thesis we propose two strategies for merging multiple information sources, namely Pre-Combination and Post-Combination. Pre-combination preserves the properties of each feature type and the orthogonal relationship between different feature types by integrating the weights of the various features into a single similarity. Post-combination is dedicated to comparing the contributions of the various information sources by linearly merging independent similarities computed for each feature type. We compare traditional cluster merging methods and Chameleon methods within a framework of hierarchical agglomerative clustering. The results achieved using simplified Chameleon clustering with post-combination of keywords and within-document co-referenced named entities are competitive with cutting-edge web person name disambiguation approaches.
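The difference between the two strategies can be illustrated with a minimal sketch. It is not the thesis implementation: the use of cosine similarity, the dictionary representation of documents, the function names, and the per-type weights are all assumptions made for illustration. Pre-combination is approximated here by scaling each feature type and placing all features in one vector whose dimensions are keyed by (type, feature), so that feature types stay orthogonal. Post-combination computes one similarity per feature type and merges them linearly.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def pre_combination(doc_a, doc_b, type_weights):
    """Hypothetical pre-combination: one similarity over a single merged vector.

    Keys are (feature_type, feature) pairs, so a keyword and a named entity
    with the same surface string never share a dimension.
    """
    def merge(doc):
        return {(t, f): type_weights.get(t, 0.0) * w
                for t, feats in doc.items() for f, w in feats.items()}
    return cosine(merge(doc_a), merge(doc_b))

def post_combination(doc_a, doc_b, type_weights):
    """Hypothetical post-combination: weighted average of per-type similarities."""
    total = sum(type_weights.values())
    return sum(wt * cosine(doc_a.get(t, {}), doc_b.get(t, {}))
               for t, wt in type_weights.items()) / total

# Toy documents (invented data): feature type -> {feature: weight}
doc_a = {"keywords": {"clustering": 1.0, "web": 1.0}, "entities": {"J. Smith": 1.0}}
doc_b = {"keywords": {"clustering": 1.0}, "entities": {"J. Smith": 1.0}}
weights = {"keywords": 0.5, "entities": 0.5}  # assumed, not tuned

pre = pre_combination(doc_a, doc_b, weights)
post = post_combination(doc_a, doc_b, weights)
print(round(pre, 4), round(post, 4))
```

Because post-combination keeps a separate similarity per feature type, varying one entry of `weights` shows how much that information source contributes, which is the comparison the abstract describes.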
Advisor: Fei Song