Usage Meets Links Analysis: Towards Improving Intranet and Site Specific Search via Usage Statistics
Investigates ranking signals that combine usage information with link analysis. A number of existing and newly proposed algorithms are compared.
Thesis in pdf format. A presentation is also available.
Infrastructure
Various search engine components are implemented:
Algorithms
Methods are compared using a query set as well as by their global properties (e.g. stability, ability to provide a reliable global ordering).
Evaluation
Global comparisons
UPR and UHITS parameters are sampled. Algorithms (and parameters) are compared via:
Query dependent evaluations
Algorithms are compared via the following evaluation sets.
For each set, a set of relevant documents is identified. The positions of the relevant documents under each algorithm are noted and compared. The algorithm that ranks the relevant documents in earlier positions is deemed superior.
Results
One of the suggested algorithms, UPR (Usage Aware PageRank), outperformed all other algorithms in practically all datasets. Results suggest that usage information may play an important role in improving ranking quality, especially for site specific and intranet search domains where spam is typically not an issue.
Modifications that emphasize group behavior over individuals' behavior are proposed and compared. These can be used to increase spam resistance and reduce other undesirable effects. Results show that such filters did not affect the scores of the most popular and high quality documents, but significantly penalized perturbations resulting from a single source or a few sources.
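The query dependent comparison described above (note where each algorithm places the known relevant documents, and prefer the algorithm that places them earlier) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the summary does not say how positions are aggregated, so the mean position used here, the penalty for relevant documents missing from a ranking, and all names and data (`mean_relevant_position`, `compare`, `algo_a`, `algo_b`, the document ids) are hypothetical and are not taken from the thesis.

```python
from statistics import mean


def relevant_positions(ranking, relevant):
    """Return the 1-based positions of the relevant documents in a ranking."""
    return [pos for pos, doc in enumerate(ranking, start=1) if doc in relevant]


def mean_relevant_position(ranking, relevant):
    """Average position of the relevant documents (lower is better).

    Assumption: a relevant document that is absent from the ranking is
    counted as if it appeared just past the end of the list.
    """
    if not relevant:
        raise ValueError("the set of relevant documents must not be empty")
    positions = relevant_positions(ranking, relevant)
    missing = len(relevant) - len(positions)
    positions += [len(ranking) + 1] * missing
    return mean(positions)


def compare(rankings, relevant):
    """Order algorithm names from best (earliest relevant docs) to worst."""
    scores = {
        name: mean_relevant_position(ranking, relevant)
        for name, ranking in rankings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical data: two rankings of the same four documents.
rankings = {
    "algo_a": ["d3", "d1", "d7", "d2"],
    "algo_b": ["d1", "d2", "d3", "d7"],
}
relevant = {"d1", "d2"}

for name, score in compare(rankings, relevant):
    print(name, score)
```

In this made-up example `algo_b` is listed first because it places both relevant documents at the top of its ranking. Other aggregates, such as median position or reciprocal rank, would fit the same structure.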
Copyright © Bilgehan Uygar Oztekin. The views and opinions expressed in this page are strictly those of the page author.