ABSTRACT: The main idea of the wpath semantic similarity method is to encode both the structure of the concept taxonomy
and the statistical information of concepts. We propose a semantic similarity method, namely wpath, that combines these two
sources of information by using information content (IC) to weight the shortest path length between concepts. Conventional
corpus-based IC is computed from the distribution of concepts over a textual corpus, which requires preparing a domain corpus
containing annotated concepts and has a high computational cost. To adapt corpus-based IC methods to structured knowledge
graphs (KGs), and because instances in KGs are already extracted from textual corpora and annotated by concepts, we propose
graph-based IC, which computes IC from the distribution of concepts over instances in KGs. Through experiments performed on
well-known word similarity datasets, we show
that the wpath semantic similarity method produces a statistically significant improvement over other semantic similarity
methods. Moreover, in a real category classification evaluation, the wpath method achieves the best performance in terms
of accuracy and F-score.
Keywords- Semantic similarity, semantic relatedness, information content, knowledge graph, WordNet, DBpedia
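
The abstract describes the two ingredients of wpath only in words: an IC value derived from how concepts are distributed over KG instances, and a shortest path length weighted by that IC. The following is a minimal sketch of how they could fit together, not the authors' implementation. It assumes that graph-based IC is the negative log of the share of instances falling under a concept, and that the similarity takes the form 1 / (1 + length * k^IC(lcs)), where lcs is the least common subsumer and k is a tuning parameter in (0, 1]. The formula, the parameter k, the function names, the toy taxonomy and the instance counts are all assumptions made for illustration; none of them is stated in the abstract.

```python
import math

# Toy is-a taxonomy (child -> parent); all names and counts are hypothetical.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}

# Hypothetical number of KG instances annotated directly with each concept.
DIRECT_INSTANCES = {
    "entity": 0, "animal": 10, "plant": 10,
    "dog": 40, "cat": 30, "tree": 10,
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def graph_ic(concept):
    """Graph-based IC sketch: -log of the share of instances under the concept."""
    total = sum(DIRECT_INSTANCES.values())
    # An instance of a concept also counts as an instance of its ancestors.
    covered = sum(n for c, n in DIRECT_INSTANCES.items() if concept in ancestors(c))
    return -math.log(covered / total)


def lcs_and_path_length(c1, c2):
    """Return the least common subsumer and the shortest is-a path length."""
    chain1, chain2 = ancestors(c1), ancestors(c2)
    for steps_up, node in enumerate(chain1):
        if node in chain2:
            return node, steps_up + chain2.index(node)
    raise ValueError("concepts share no ancestor")


def wpath_like(c1, c2, k=0.8):
    """Assumed form: path length weighted by k ** IC(lcs), with 0 < k <= 1."""
    lcs, length = lcs_and_path_length(c1, c2)
    return 1.0 / (1.0 + length * k ** graph_ic(lcs))


if __name__ == "__main__":
    for pair in [("dog", "cat"), ("animal", "plant"), ("dog", "tree")]:
        print(pair, round(wpath_like(*pair), 4))
```

In this toy data, dog/cat and animal/plant are both two edges apart, yet dog/cat scores higher because their common subsumer (animal) is more specific, and therefore carries more IC, than the root. A plain path-length measure would not distinguish the two pairs.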