Pairwise semantic similarity measures

This page lists the pairwise semantic similarity measures implemented in the latest HESML version (V1R5). In each table, the first column gives the SimilarityMeasureType enum value used to instantiate the measure in HESML by calling the function MeasureFactory.getMeasure().

For a comprehensive and up-to-date review of the literature, we refer the reader to the reproducible experimental survey introduced by Lastra-Díaz et al. (2019) [1].

[1] J.J. Lastra-Díaz, J. Goikoetxea, M. Hadj Taieb, A. García-Serrano, M. Ben Aouicha, E. Agirre, A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art, Engineering Applications of Artificial Intelligence. 85 (2019) 645-665. https://doi.org/10.1016/j.engappai.2019.07.010

Path-based semantic similarity measures

The semantic similarity measures detailed in the table below are based on the length of the shortest path between concepts in an ontology. The table also includes several hybrid measures which combine an Information Content (IC) model with the length of the shortest path between concepts.

SimilarityMeasureType enum Reference
Rada R. Rada, H. Mili, E. Bicknell, M. Blettner, Development and application of a metric on semantic nets, IEEE Transactions on Systems, Man, and Cybernetics. 19 (1989) 17–30.
WuPalmer Z. Wu, M. Palmer, Verbs Semantics and Lexical Selection, in: Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, USA, 1994: pp. 133–138.
Mubaid H. Al-Mubaid, H.A. Nguyen, Measuring Semantic Similarity Between Biomedical Concepts Within Multiple Ontologies, IEEE Trans. Syst. Man Cybern. C Appl. Rev. 39 (2009) 389–398.
Hao D. Hao, W. Zuo, T. Peng, F. He, An Approach for Calculating Semantic Similarity between Words Using WordNet, in: Proc. of the Second International Conference on Digital Manufacturing Automation, IEEE, 2011: pp. 177–180.
PekarStaab V. Pekar, S. Staab, Taxonomy Learning: Factoring the Structure of a Taxonomy into a Semantic Classification Decision, in: Proceedings of the 19th International Conference on Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, USA, 2002: pp. 1–7.
LeacockChodorow C. Leacock, M. Chodorow, Combining local context and WordNet similarity for word sense identification, in: C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, MIT Press, 1998: pp. 265–283.
Li2003Strategy3 Y. Li, Z.A. Bandar, D. McLean, An approach for measuring semantic similarity between words using multiple information sources, IEEE Transactions on Knowledge and Data Engineering. 15 (2003) 871–882.
Li2003Strategy4 Y. Li, Z.A. Bandar, D. McLean, An approach for measuring semantic similarity between words using multiple information sources, IEEE Transactions on Knowledge and Data Engineering. 15 (2003) 871–882.
PedersenPath T. Pedersen, S.V.S. Pakhomov, S. Patwardhan, C.G. Chute, Measures of semantic similarity and relatedness in the biomedical domain, J. Biomed. Inform. 40 (2007) 288–299.
LiuStrategy1 X.Y. Liu, Y.M. Zhou, R.S. Zheng, Measuring Semantic Similarity in Wordnet, in: Proc. of the 2007 International Conference on Machine Learning and Cybernetics, IEEE, 2007: pp. 3431–3435.
LiuStrategy2 X.Y. Liu, Y.M. Zhou, R.S. Zheng, Measuring Semantic Similarity in Wordnet, in: Proc. of the 2007 International Conference on Machine Learning and Cybernetics, IEEE, 2007: pp. 3431–3435.
WeightedJiangConrath J.J. Lastra-Díaz, A. García-Serrano, A novel family of IC-based similarity measures with a detailed experimental survey on WordNet, Engineering Applications of Artificial Intelligence Journal. 46 (2015) 140–153.
CosineNormWeightedJiangConrath J.J. Lastra-Díaz, A. García-Serrano, A novel family of IC-based similarity measures with a detailed experimental survey on WordNet, Engineering Applications of Artificial Intelligence Journal. 46 (2015) 140–153.
CaiStrategy1 Y. Cai, Q. Zhang, W. Lu, X. Che, A hybrid approach for measuring semantic similarity based on IC-weighted path distance in WordNet, J. Intell. Inf. Syst. (2017) 1–25.
CaiStrategy2 Y. Cai, Q. Zhang, W. Lu, X. Che, A hybrid approach for measuring semantic similarity based on IC-weighted path distance in WordNet, J. Intell. Inf. Syst. (2017) 1–25.
Zhou Z. Zhou, Y. Wang, J. Gu, New model of semantic similarity measuring in WordNet, in: Proc. of the 3rd International Conference on Intelligent System and Knowledge Engineering (ISKE 2008), IEEE, 2008: pp. 256–261.
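
To make the idea behind the path-based family concrete, the following minimal sketch computes the length of the shortest path between two concepts with a breadth-first search and turns it into an inverse path-length similarity. It is an illustration only: the ToyTaxonomy class, its method names, and the example concepts are assumptions made for this page, not part of the HESML API, and the formula 1 / (1 + edges) is just one simple way of mapping a distance to a similarity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy taxonomy used only for illustration; it is not an HESML class. */
class ToyTaxonomy {
    private final Map<String, List<String>> neighbours = new HashMap<>();

    /** Registers an is-a edge; edges are stored as undirected for path search. */
    void addIsA(String child, String parent) {
        neighbours.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
        neighbours.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    /** Breadth-first search: edges on the shortest path, or -1 if unreachable. */
    int shortestPathLength(String source, String target) {
        if (!neighbours.containsKey(source) || !neighbours.containsKey(target)) {
            return -1;
        }
        Map<String, Integer> distance = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        distance.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(target)) {
                return distance.get(current);
            }
            for (String next : neighbours.get(current)) {
                if (!distance.containsKey(next)) {
                    distance.put(next, distance.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    /** Inverse path-length similarity: 1 / (1 + edges); 0 if no path exists. */
    double pathSimilarity(String a, String b) {
        int edges = shortestPathLength(a, b);
        return edges < 0 ? 0.0 : 1.0 / (1.0 + edges);
    }

    /** Hypothetical five-concept taxonomy for the usage example. */
    static ToyTaxonomy example() {
        ToyTaxonomy taxonomy = new ToyTaxonomy();
        taxonomy.addIsA("animal", "entity");
        taxonomy.addIsA("plant", "entity");
        taxonomy.addIsA("dog", "animal");
        taxonomy.addIsA("cat", "animal");
        return taxonomy;
    }
}
```

With the example taxonomy, "dog" and "cat" are two edges apart (through "animal"), whereas "dog" and "plant" are three edges apart (through "animal" and "entity"), so the first pair obtains the higher similarity.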

IC-based semantic similarity measures

Semantic similarity measures detailed in the table below are based on an Information Content (IC) model.

SimilarityMeasureType enum Reference
Resnik P. Resnik, Using information content to evaluate semantic similarity in a taxonomy, in: Proc. of the International Joint Conferences on Artificial Intelligence (IJCAI 1995), Montreal, Canada, 1995: pp. 448–453.
JiangConrath J.J. Jiang, D.W. Conrath, Semantic similarity based on corpus statistics and lexical taxonomy, in: Proceedings of International Conference Research on Computational Linguistics (ROCLING X), 1997: pp. 19–33.
Lin D. Lin, An information-theoretic definition of similarity, in: Proceedings of the 15th International Conference on Machine Learning, Madison, WI, 1998: pp. 296–304.
Zhou Z. Zhou, Y. Wang, J. Gu, New model of semantic similarity measuring in WordNet, in: Proc. of the 3rd International Conference on Intelligent System and Knowledge Engineering (ISKE 2008), IEEE, 2008: pp. 256–261.
Li2003Strategy9 Y. Li, Z.A. Bandar, D. McLean, An approach for measuring semantic similarity between words using multiple information sources, IEEE Transactions on Knowledge and Data Engineering. 15 (2003) 871–882.
CosineNormJiangConrath J.J. Lastra-Díaz, A. García-Serrano, A novel family of IC-based similarity measures with a detailed experimental survey on WordNet, Engineering Applications of Artificial Intelligence Journal. 46 (2015) 140–153.
CosineNormWeightedJiangConrath J.J. Lastra-Díaz, A. García-Serrano, A novel family of IC-based similarity measures with a detailed experimental survey on WordNet, Engineering Applications of Artificial Intelligence Journal. 46 (2015) 140–153.
WeightedJiangConrath J.J. Lastra-Díaz, A. García-Serrano, A novel family of IC-based similarity measures with a detailed experimental survey on WordNet, Engineering Applications of Artificial Intelligence Journal. 46 (2015) 140–153.
PirroSeco G. Pirró, N. Seco, Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content, in: R. Meersman, Z. Tari (Eds.), On the Move to Meaningful Internet Systems: OTM 2008, Springer, 2008: pp. 1271–1288.
FaITH G. Pirró, J. Euzenat, A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness, in: P.F. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika, L. Zhang, J.Z. Pan, I. Horrocks, B. Glimm (Eds.), Proc. of the 9th International Semantic Web Conference, ISWC 2010, Springer, Shanghai, China, 2010: pp. 615–630.
Meng2012 L. Meng, J. Gu, A New Model for Measuring Word Sense Similarity in WordNet, in: Proceedings of the 4th International Conference on Advanced Communication and Networking, ASTL, 2012: pp. 18–23.
Meng2014 L. Meng, R. Huang, J. Gu, Measuring Semantic Similarity of Word Pairs Using Path and Information Content, International Journal of Future Generation Communication & Networking. 7 (2014) 183–194.
Gao2015Strategy3 J.B. Gao, B.W. Zhang, X.H. Chen, A WordNet-based semantic similarity measurement combining edge-counting and information content theory, Eng. Appl. Artif. Intell. 39 (2015) 80–88.
CaiStrategy1 Y. Cai, Q. Zhang, W. Lu, X. Che, A hybrid approach for measuring semantic similarity based on IC-weighted path distance in WordNet, J. Intell. Inf. Syst. (2017) 1–25.
CaiStrategy2 Y. Cai, Q. Zhang, W. Lu, X. Che, A hybrid approach for measuring semantic similarity based on IC-weighted path distance in WordNet, J. Intell. Inf. Syst. (2017) 1–25.
Garla V.N. Garla, C. Brandt, Semantic similarity in the biomedical domain: an evaluation across knowledge sources, BMC Bioinformatics. 13:261 (2012). https://doi.org/10.1186/1471-2105-13-261.
CosineLin Proof of concept developed in the context of the thesis: J.J. Lastra-Díaz, Recent Advances in Ontology-based Semantic Similarity Measures and Information Content Models based on WordNet, PhD In Intelligent Systems, Universidad Nacional de Educación a Distancia (UNED), 2017. http://e-spacio.uned.es/fez/view/tesisuned:ED-Pg-SisInt-Jjlastra
ExpNormJiangConrath Proof of concept developed in the context of the thesis: J.J. Lastra-Díaz, Recent Advances in Ontology-based Semantic Similarity Measures and Information Content Models based on WordNet, PhD In Intelligent Systems, Universidad Nacional de Educación a Distancia (UNED), 2017. http://e-spacio.uned.es/fez/view/tesisuned:ED-Pg-SisInt-Jjlastra
LogisticLin Proof of concept developed in the context of the thesis: J.J. Lastra-Díaz, Recent Advances in Ontology-based Semantic Similarity Measures and Information Content Models based on WordNet, PhD In Intelligent Systems, Universidad Nacional de Educación a Distancia (UNED), 2017. http://e-spacio.uned.es/fez/view/tesisuned:ED-Pg-SisInt-Jjlastra
LogisticNormJiangConrath Proof of concept developed in the context of the thesis: J.J. Lastra-Díaz, Recent Advances in Ontology-based Semantic Similarity Measures and Information Content Models based on WordNet, PhD In Intelligent Systems, Universidad Nacional de Educación a Distancia (UNED), 2017. http://e-spacio.uned.es/fez/view/tesisuned:ED-Pg-SisInt-Jjlastra
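
As a reminder of how the three classic IC-based measures relate to each other, the sketch below writes down the standard textbook formulas of Resnik, Lin, and Jiang-Conrath as plain functions of IC values. The class and method names are assumptions chosen for illustration, the IC is taken as the negative natural logarithm of a concept probability, and the handling of the degenerate case in lin() is an arbitrary convention of this sketch; none of this is meant to mirror the HESML implementation.

```java
/** Textbook IC-based formulas; an illustrative sketch, not HESML code. */
class IcMeasuresSketch {

    /** IC(c) = -ln p(c), where p(c) is the probability of concept c, 0 < p <= 1. */
    static double informationContent(double probability) {
        if (probability <= 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in (0, 1]");
        }
        return -Math.log(probability);
    }

    /** Resnik: the IC of the most informative common ancestor (MICA). */
    static double resnik(double icMica) {
        return icMica;
    }

    /** Lin: 2 * IC(MICA) / (IC(a) + IC(b)). */
    static double lin(double icA, double icB, double icMica) {
        double sum = icA + icB;
        // Assumption of this sketch: return 0 when both concepts carry no information.
        return sum == 0.0 ? 0.0 : 2.0 * icMica / sum;
    }

    /** Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(MICA). */
    static double jiangConrathDistance(double icA, double icB, double icMica) {
        return icA + icB - 2.0 * icMica;
    }
}
```

For instance, if two concepts each have probability 1/8 and their most informative common ancestor has probability 1/2, then Lin yields 2 ln 2 / (6 ln 2) = 1/3 and the Jiang-Conrath distance is 4 ln 2.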

Feature-based semantic similarity measures

Semantic similarity measures detailed in the table below are based on different ontology features.

SimilarityMeasureType enum Reference
Sanchez2012 D. Sánchez, M. Batet, D. Isern, A. Valls, Ontology-based semantic similarity: A new feature-based approach, Expert Syst. Appl. 39 (2012) 7718–7728.
Taieb2014 M.A. Hadj Taieb, M. Ben Aouicha, A. Ben Hamadou, Ontology-based approach for measuring semantic similarity, Eng. Appl. Artif. Intell. 36 (2014) 238–261.
Taieb2014sim2 M.A. Hadj Taieb, M. Ben Aouicha, A. Ben Hamadou, Ontology-based approach for measuring semantic similarity, Eng. Appl. Artif. Intell. 36 (2014) 238–261.
Stojanovic N. Stojanovic, A. Maedche, S. Staab, R. Studer, Y. Sure, SEAL: A Framework for Developing SEmantic PortALs, in: Proceedings of the 1st International Conference on Knowledge Capture, ACM, New York, NY, USA, 2001: pp. 155–162.
WuPalmerFast Fast approximation of the Wu & Palmer measure based on the use of the exact formula for the distance between concepts on a tree-like taxonomy which uses the depth values of the concepts.
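
The depth-based formula mentioned for WuPalmerFast can be illustrated as follows. On a tree, the distance between two concepts equals depth(a) + depth(b) - 2 * depth(LCS), where LCS is their lowest common ancestor, so no path search is needed. The sketch below is a minimal illustration under stated assumptions: the TreeDepthSketch class and its methods are hypothetical, the taxonomy is assumed to be a single-rooted tree containing both concepts, and the Wu & Palmer score counts depths in nodes so that the root has depth 1. The conventions actually used by HESML may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Depth-based distance and Wu & Palmer score on a tree; not HESML code. */
class TreeDepthSketch {
    private final Map<String, String> parent = new HashMap<>();

    void addChild(String child, String parentNode) {
        parent.put(child, parentNode);
    }

    /** Depth in edges from the root (the root has depth 0). */
    int depth(String node) {
        int d = 0;
        for (String p = parent.get(node); p != null; p = parent.get(p)) {
            d++;
        }
        return d;
    }

    /** Lowest common ancestor; assumes both nodes belong to the same tree. */
    String lowestCommonAncestor(String a, String b) {
        int da = depth(a);
        int db = depth(b);
        while (da > db) { a = parent.get(a); da--; }
        while (db > da) { b = parent.get(b); db--; }
        while (!a.equals(b)) {
            a = parent.get(a);
            b = parent.get(b);
        }
        return a;
    }

    /** Exact tree distance in edges: depth(a) + depth(b) - 2 * depth(LCS). */
    int treeDistance(String a, String b) {
        return depth(a) + depth(b) - 2 * depth(lowestCommonAncestor(a, b));
    }

    /** Wu & Palmer: 2 * depth(LCS) / (depth(a) + depth(b)), depths counted in nodes. */
    double wuPalmer(String a, String b) {
        int lcsDepth = depth(lowestCommonAncestor(a, b)) + 1;
        return 2.0 * lcsDepth / ((depth(a) + 1) + (depth(b) + 1));
    }

    /** Hypothetical five-concept tree for the usage example. */
    static TreeDepthSketch example() {
        TreeDepthSketch tree = new TreeDepthSketch();
        tree.addChild("animal", "entity");
        tree.addChild("plant", "entity");
        tree.addChild("dog", "animal");
        tree.addChild("cat", "animal");
        return tree;
    }
}
```

On taxonomies with multiple inheritance the depth formula no longer gives the exact shortest-path length in general, which is consistent with the measure being described as an approximation.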

Fast AncSPL-based reformulations of path-based similarity measures

Semantic similarity measures detailed in the table below are fast reformulations of path-based measures based on the AncSPL algorithm introduced by Lastra-Díaz et al. (2020) [2].

[2] J.J. Lastra-Díaz, A. Lara-Clares, A. García-Serrano, HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey, Submitted for Publication. (2020).

SimilarityMeasureType enum in HESML Base semantic similarity measures
AncSPLCaiStrategy1 Y. Cai, Q. Zhang, W. Lu, X. Che, A hybrid approach for measuring semantic similarity based on IC-weighted path distance in WordNet, J. Intell. Inf. Syst. (2017) 1–25.
AncSPLCosineNormWeightedJiangConrath J.J. Lastra-Díaz, A. García-Serrano, A novel family of IC-based similarity measures with a detailed experimental survey on WordNet, Engineering Applications of Artificial Intelligence Journal. 46 (2015) 140–153.
AncSPLGao2015Strategy3 J.B. Gao, B.W. Zhang, X.H. Chen, A WordNet-based semantic similarity measurement combining edge-counting and information content theory, Eng. Appl. Artif. Intell. 39 (2015) 80–88.
AncSPLHao D. Hao, W. Zuo, T. Peng, F. He, An Approach for Calculating Semantic Similarity between Words Using WordNet, in: Proc. of the Second International Conference on Digital Manufacturing Automation, IEEE, 2011: pp. 177–180.
AncSPLLeacockChodorow C. Leacock, M. Chodorow, Combining local context and WordNet similarity for word sense identification, in: C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, MIT Press, 1998: pp. 265–283.
AncSPLLi2003Strategy3 Y. Li, Z.A. Bandar, D. McLean, An approach for measuring semantic similarity between words using multiple information sources, IEEE Transactions on Knowledge and Data Engineering. 15 (2003) 871–882.
AncSPLLi2003Strategy4 Y. Li, Z.A. Bandar, D. McLean, An approach for measuring semantic similarity between words using multiple information sources, IEEE Transactions on Knowledge and Data Engineering. 15 (2003) 871–882.
AncSPLLi2003Strategy9 Y. Li, Z.A. Bandar, D. McLean, An approach for measuring semantic similarity between words using multiple information sources, IEEE Transactions on Knowledge and Data Engineering. 15 (2003) 871–882.
AncSPLLiuStrategy1 X.Y. Liu, Y.M. Zhou, R.S. Zheng, Measuring Semantic Similarity in Wordnet, in: Proc. of the 2007 International Conference on Machine Learning and Cybernetics, IEEE, 2007: pp. 3431–3435.
AncSPLLiuStrategy2 X.Y. Liu, Y.M. Zhou, R.S. Zheng, Measuring Semantic Similarity in Wordnet, in: Proc. of the 2007 International Conference on Machine Learning and Cybernetics, IEEE, 2007: pp. 3431–3435.
AncSPLMeng2014 L. Meng, R. Huang, J. Gu, Measuring Semantic Similarity of Word Pairs Using Path and Information Content, International Journal of Future Generation Communication & Networking. 7 (2014) 183–194.
AncSPLMubaid H. Al-Mubaid, H.A. Nguyen, Measuring Semantic Similarity Between Biomedical Concepts Within Multiple Ontologies, IEEE Trans. Syst. Man Cybern. C Appl. Rev. 39 (2009) 389–398.
AncSPLPedersenPath T. Pedersen, S.V.S. Pakhomov, S. Patwardhan, C.G. Chute, Measures of semantic similarity and relatedness in the biomedical domain, J. Biomed. Inform. 40 (2007) 288–299.
AncSPLPekarStaab V. Pekar, S. Staab, Taxonomy Learning: Factoring the Structure of a Taxonomy into a Semantic Classification Decision, in: Proceedings of the 19th International Conference on Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, USA, 2002: pp. 1–7.
AncSPLRada R. Rada, H. Mili, E. Bicknell, M. Blettner, Development and application of a metric on semantic nets, IEEE Transactions on Systems, Man, and Cybernetics. 19 (1989) 17–30.
AncSPLWeightedJiangConrath J.J. Lastra-Díaz, A. García-Serrano, A novel family of IC-based similarity measures with a detailed experimental survey on WordNet, Engineering Applications of Artificial Intelligence Journal. 46 (2015) 140–153.
AncSPLZhou Z. Zhou, Y. Wang, J. Gu, New model of semantic similarity measuring in WordNet, in: Proc. of the 3rd International Conference on Intelligent System and Knowledge Engineering (ISKE 2008), IEEE, 2008: pp. 256–261.

Semantic similarity measures based on word embeddings

HESML implements the evaluation of pre-trained word embedding models in three different file formats as detailed in the table below. These types of similarity measures were used in the reproducible experimental survey [1] to evaluate most state-of-the-art word embeddings.

SimilarityMeasureType enum Reference
EMBWordEmbedding Text file format containing all raw vectors, which is used by most of the word embeddings reported in the literature [1].
NasariEmbedding J. Camacho-Collados, M.T. Pilehvar, R. Navigli, Nasari: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities, Artif. Intell. 240 (2016) 36–64.
UKBppvEmbedding E. Agirre, E. Alfonseca, K. Hall, J. Kravalova, M. Paşca, A. Soroa, A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches, in: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, USA, 2009: pp. 19–27.
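
Embedding-based similarity is customarily computed as the cosine of the angle between two word vectors. The sketch below illustrates this under explicit assumptions: the EmbeddingCosineSketch class is hypothetical, each line of the raw-vector text file is assumed to hold a word followed by whitespace-separated coordinates, and cosine similarity is assumed as the scoring function. The exact file layouts and scoring used by HESML for each format should be checked against the library itself.

```java
/** Cosine similarity between raw word vectors; an illustrative sketch only. */
class EmbeddingCosineSketch {

    /** Parses "word x1 x2 ... xn" (assumed layout) into the vector x1..xn. */
    static double[] parseVector(String line) {
        String[] fields = line.trim().split("\\s+");
        double[] vector = new double[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            vector[i - 1] = Double.parseDouble(fields[i]);
        }
        return vector;
    }

    /** Cosine similarity: dot(u, v) / (|u| * |v|); 0 if either vector is null-length. */
    static double cosine(double[] u, double[] v) {
        if (u.length != v.length) {
            throw new IllegalArgumentException("vectors must have the same dimension");
        }
        double dot = 0.0;
        double squaredNormU = 0.0;
        double squaredNormV = 0.0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i] * v[i];
            squaredNormU += u[i] * u[i];
            squaredNormV += v[i] * v[i];
        }
        if (squaredNormU == 0.0 || squaredNormV == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(squaredNormU * squaredNormV);
    }
}
```

Parallel vectors such as (1, 2) and (2, 4) obtain a cosine of 1, whereas orthogonal vectors such as (1, 0) and (0, 1) obtain 0.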