Biomedical Domain Benchmark
Biomedical ontologies bring unique challenges to the ontology alignment problem, and there is explicit interest in ontologies and ontology alignment within biomedicine. We therefore present a new biomedical ontology alignment testbed, which provides an important application context for the alignment research community. Because biomedical ontologies are large, the testbed can also serve as a comprehensive benchmark for large ontologies.

Existing correspondences submitted to the National Center for Biomedical Ontology (NCBO) may serve as the reference alignments for the pairs, although our analysis reveals that these mappings represent only a small fraction of the total alignment that is possible between two ontologies. New correspondences discovered during benchmarking may therefore be submitted to NCBO for curation and publication.

To create the testbed, we combed through more than 300 ontologies hosted at NCBO and the OBO Foundry and isolated a benchmark of 50 different biomedical ontology pairs. The ontology pairs are listed in the table below. Our primary criterion for including a pair in the benchmark was the presence of a sufficient number of correspondences between the ontologies in the pair, as determined from NCBO's BioPortal. We briefly describe the steps in creating the testbed:
- We selected ontologies that are available in either OWL or RDF.
- Next, we paired the ontologies and ordered the pairs by the percentage of available correspondences. This is calculated as the number of correspondences that exist in BioPortal for the pair under consideration, divided by the product of the numbers of entities in the two ontologies.
- We selected the top 100 ontology pairs and then ordered them by their joint sizes.
- We divided the ordered pairs into 5 bins of equal size and sampled uniformly at random from each bin to obtain the final 50 pairs.
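The selection steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the code used to build the testbed: the function names `coverage` and `select_benchmark_pairs` and the input dictionaries are hypothetical, "joint size" is assumed to mean the sum of the two entity counts, and drawing 10 pairs per bin is inferred from 5 bins yielding 50 pairs.

```python
import random
from itertools import combinations


def coverage(n_correspondences, n_entities_a, n_entities_b):
    """Correspondences for a pair divided by the product of the entity counts."""
    return n_correspondences / (n_entities_a * n_entities_b)


def select_benchmark_pairs(entity_counts, correspondence_counts,
                           top_k=100, n_bins=5, per_bin=10, seed=0):
    """Hypothetical sketch of the pair-selection procedure.

    entity_counts: dict mapping ontology id -> number of entities (all > 0)
    correspondence_counts: dict mapping frozenset({id_a, id_b}) -> number of
        correspondences known for that pair (missing pairs count as 0)
    """
    rng = random.Random(seed)

    # Score every unordered pair of ontologies.
    scored = []
    for a, b in combinations(sorted(entity_counts), 2):
        n_maps = correspondence_counts.get(frozenset((a, b)), 0)
        score = coverage(n_maps, entity_counts[a], entity_counts[b])
        # Assumption: joint size is the sum of the two entity counts.
        joint_size = entity_counts[a] + entity_counts[b]
        scored.append((score, joint_size, (a, b)))

    # Keep the top_k pairs by coverage, then order them by joint size.
    top = sorted(scored, key=lambda t: t[0], reverse=True)[:top_k]
    top.sort(key=lambda t: t[1])

    # Split into n_bins contiguous bins and sample uniformly from each one.
    bin_size = len(top) // n_bins
    selected = []
    for i in range(n_bins):
        start = i * bin_size
        end = start + bin_size if i < n_bins - 1 else len(top)
        bin_pairs = top[start:end]
        selected.extend(rng.sample(bin_pairs, min(per_bin, len(bin_pairs))))

    return [pair for _, _, pair in selected]
```

Binning by joint size before sampling keeps small, medium, and large pairs all represented in the final set, rather than letting one size range dominate.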
Biomedical Ontology Alignment Benchmark
|NCBO ID||Ontology||Total Classes|
|1404||Uber anatomy ontology||7294|
|1017||FlyBase Controlled Vocabulary||821|
|1038||Plant Growth and Development Stage||282|
|1007||Chemical entities of biological interest||31470|
|1013||eVOC (Expressed Sequence Annotation for Humans)||2274|
|1090||Amphibian gross anatomy||1603|
|1021||Human developmental anatomy||2314|
|1065||Tick gross anatomy||628|
|1051||Zebrafish anatomy and development||2788|
|1000||Mouse adult gross anatomy||2982|
|1068||Subcellular Anatomy Ontology (SAO)||821|
|1123||Ontology for Biomedical Investigations||3537|
|1063||Common Anatomy Reference Ontology||50|
|1568||Anatomical Entity Ontology||238|
|1001||Cereal plant gross anatomy||1270|
|1574||vertebrate Homologous Organ Groups||1184|
|1110||Teleost Anatomy Ontology||3039|
|1005||BRENDA tissue / enzyme source||5139|
|1022||Human developmental anatomy||8340|
|1027||Medaka fish anatomy and development||4358|
|1362||Hymenoptera Anatomy Ontology||1930|
|1015||Drosophila gross anatomy||7797|
|1095||Xenopus anatomy and development||1041|
|1030||Mosquito gross anatomy||1864|