Knowledge Base Construction from Pre-trained Language Models (LM-KBC)

Challenge @ 21st International Semantic Web Conference (ISWC 2022)


🔔 News

  • 16-08-2022: Winning systems in each track announced.
  • 14-07-2022: Final deadline extended until July 26, 23:59:59 AoE.
  • 11-07-2022: Test subject entities have been released here. Submit your predictions on CodaLab now to get a confirmed score (optional; the final leaderboard will be split by track, and multiple submissions are possible).
  • 04-07-2022: Submission deadline extended to July 21 (to accommodate the changes below).
  • 02-07-2022: The data format and evaluation scripts have been updated; please pull again (and read our announcement here).

Task Description

Pre-trained language models (LMs) have advanced a range of semantic tasks and have also shown promise for knowledge extraction from the models themselves. Although several works have explored this ability in a setting called LM probing, using prompting or prompt-based learning (Liu et al., 2021), the viability of knowledge base construction from LMs has not yet been systematically studied. In this challenge, we invite participants to build actual knowledge bases from LMs, for given subjects and relations. In crucial contrast to existing probing benchmarks like LAMA (Petroni et al., 2019), we make no simplifying assumptions about relation cardinalities: a subject-entity can stand in relation with zero, one, or many object-entities. Furthermore, submissions need to go beyond merely ranking predictions and make concrete decisions on which outputs to materialize. The outputs are evaluated using the established F1-score KB metric.

Formally, given a subject-entity (s) and a relation (r), the task is to predict all correct object-entities ({o1, o2, ..., ok}) using LM probing.

The challenge comes with two tracks:

  • Track 1: a BERT (BERT-base or BERT-large) track with low computational requirements.
  • Track 2: an open track, where participants can use any LM of their choice (e.g., RoBERTa, Transformer-XL, GPT-2, or BART).

🏆 Winners

Track 1: Task-specific Pre-training and Prompt Decomposition for Knowledge Graph Population with Language Models
Tianyi Li, Wenyu Huang, Nikos Papasarantopoulos, Pavlos Vougiouklis, Jeff Z. Pan
Avg. Precision: 0.766 | Avg. Recall: 0.566 | Avg. F1-score: 0.550

Track 2: Prompting as Probing: Using Language Models for Knowledge Base Construction
Dimitrios Alivanistos, Selene Baez Santamaria, Michael Cochez, Jan-Christoph Kalo, Thiviyan Thanapalasingam, Emile van Krieken
Avg. Precision: 0.798 | Avg. Recall: 0.690 | Avg. F1-score: 0.676


Dataset

We release a dataset (train and development splits) for a diverse set of 12 relations, each covering a different set of subject-entities, together with the complete list of ground-truth object-entities per subject-relation pair. The total number of object-entities varies for a given subject-relation pair. The train triples can be used for training or probing the language models in any form, while the development split can be used for hyperparameter tuning. Further details on the relations are given below:

CountryBordersWithCountry: country (s) shares a land border with another country (o)
  (Canada, CountryBordersWithCountry, [[USA, United States of America]])
  (Norway, CountryBordersWithCountry, [Finland, Sweden, Russia])
  (Mauritius, CountryBordersWithCountry, [])

CountryOfficialLanguage: country (s) has an official language (o)
  (Belarus, CountryOfficialLanguage, [Belarusian, Russian])
  (Seychelles, CountryOfficialLanguage, [French, English, [Seychellois Creole, Creole]])
  (Bosnia and Herzegovina, CountryOfficialLanguage, [Bosnian, Serbian, Croatian])

RiverBasinsCountry: river (s) has its basin in a country (o)
  (Elbe, RiverBasinsCountry, [Germany, Poland, Austria, [Czech Republic, Czechia]])
  (Drin, RiverBasinsCountry, [Albania])
  (Chari, RiverBasinsCountry, [[Central African Republic, Africa], Cameroon, Chad])

StateSharesBorderState: state (s) of a country shares a land border with another state (o)
  (Oregon, StateSharesBorderState, [California, Idaho, Washington, Nevada])
  (Florida, StateSharesBorderState, [Georgia])
  (Mexico City, StateSharesBorderState, [[State of Mexico, Mexico], Morelos])

ChemicalCompoundElement: chemical compound (s) consists of an element (o)
  (Water, ChemicalCompoundElement, [Hydrogen, Oxygen])
  (Borax, ChemicalCompoundElement, [Boron, Oxygen, Sodium])
  (Calomel, ChemicalCompoundElement, [Mercury, Chlorine])

PersonInstrument: person (s) plays an instrument (o)
  (Chester Bennington, PersonInstrument, [])
  (Ringo Starr, PersonInstrument, [Guitar, Drum, [Percussion Instrument, Percussion]])
  (Leeteuk, PersonInstrument, [Piano])

PersonLanguage: person (s) speaks a language (o)
  (Bruno Mars, PersonLanguage, [Spanish, English])
  (Aamir Khan, PersonLanguage, [Hindi, Urdu, English])
  (Alicia Keys, PersonLanguage, [English])

PersonEmployer: person (s) is or was employed by a company (o)
  (Susan Wojcicki, PersonEmployer, [Google])
  (Steve Wozniak, PersonEmployer, [[Apple Inc, Apple], [Hewlett-Packard, HP], University of Technology Sydney, [Atari, Atari Inc]])
  (Jacqueline Novogratz, PersonEmployer, [UNICEF, World Bank, Chase Bank])

PersonProfession: person (s) holds or held a profession (o)
  (Nicolas Sarkozy, PersonProfession, [Lawyer, Politician, Statesperson])
  (Shakira, PersonProfession, [[Singer-Songwriter, Singer, Songwriter], Guitarist])
  (Eminem, PersonProfession, [Rapper])

PersonPlaceOfDeath: person (s) died at a location (o)
  (Elvis Presley, PersonPlaceOfDeath, [Graceland])
  (Kofi Annan, PersonPlaceOfDeath, [Bern])
  (Angela Merkel, PersonPlaceOfDeath, [])

PersonCauseOfDeath: person (s) died due to a cause (o)
  (John Lewis, PersonCauseOfDeath, [[Pancreatic Cancer, Cancer]])
  (Pierre Nkurunziza, PersonCauseOfDeath, [[Covid-19, Covid]])
  (Neil deGrasse Tyson, PersonCauseOfDeath, [])

CompanyParentOrganization: company (s) has another company (o) as its parent organization
  (Apple Inc, CompanyParentOrganization, [])
  (Abarth, CompanyParentOrganization, [[Stellantis Italy, Stellantis]])
  (Hitachi, CompanyParentOrganization, [])

Each row in the dataset files constitutes one triple of (1) subject-entity, (2) relation, and (3) the list of all valid object-entities. For (3), we sometimes provide multiple aliases for an object-entity; outputting any one of them is sufficient for that entity. In particular, to facilitate the use of LMs like BERT (which are constrained to single-token predictions), we provide a valid single-token form for multi-token object-entities wherever such a form is meaningful. Please read the Data format section for more details. When a subject has zero valid objects, the ground truth is an empty list, e.g., (Apple Inc., CompanyParentOrganization, []).
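For illustration, here is a minimal sketch of how such a file could be read. The JSON-Lines layout and the field names SubjectEntity, Relation, and ObjectEntities are assumptions made for this sketch; the Data format section is authoritative.

```python
import json

def read_split(path):
    """Read one dataset split, assuming one JSON object per line of the form
    {"SubjectEntity": ..., "Relation": ..., "ObjectEntities": [[alias, ...], ...]}.
    """
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            # Each object-entity is a list of interchangeable aliases; an empty
            # outer list means the subject has zero valid objects, e.g.
            # ("Apple Inc.", "CompanyParentOrganization", []).
            triples.append((rec["SubjectEntity"], rec["Relation"], rec["ObjectEntities"]))
    return triples
```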

Dataset Characteristics

For each of the 12 relations, the number of unique subject-entities in the train, dev, and test splits is 100, 50, and 50, respectively. The minimum and maximum numbers of object-entities for each relation are given below. A minimum of 0 means that a subject-entity can have zero valid object-entities for that relation.

Relation Train Dev Test
CountryBordersWithCountry [0, 17] [0, 14] [0, 11]
CountryOfficialLanguage [1, 4] [1, 15] [1, 11]
StateSharesBorderState [1, 14] [1, 15] [1, 14]
RiverBasinsCountry [1, 6] [1, 10] [1, 9]
ChemicalCompoundElement [2, 6] [2, 6] [2, 6]
PersonLanguage [1, 6] [1, 5] [1, 7]
PersonProfession [1, 23] [1, 19] [1, 20]
PersonInstrument [0, 7] [0, 14] [0, 7]
PersonEmployer [1, 8] [1, 8] [1, 8]
PersonPlaceOfDeath [0, 1] [0, 1] [0, 1]
PersonCauseOfDeath [0, 3] [0, 2] [0, 2]
CompanyParentOrganization [0, 5] [0, 1] [0, 3]

Task Evaluation

We use a standard KB evaluation metric, the macro-averaged F1-score (combining precision and recall), to compare the predicted object-entities against the true object-entities on the hidden test dataset. We release a baseline implementation and an evaluation script. The baseline probes the BERT language model with a sample prompt like "China shares border with [MASK]" and selects as output the object-entities predicted at the [MASK] position with likelihood greater than or equal to 0.5. This baseline achieves a 31.08% F1-score (averaged across all subject-entities and relations) on the hidden test dataset. Participants can use the evaluation script to compute the F1-score of their own systems. For more details on LM probing and the baseline method, please check the released notebook.
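The following is a minimal sketch of such a probe and of the per-instance F1 computation, assuming the Hugging Face transformers fill-mask pipeline; the prompt and the 0.5 threshold follow the description above, while the empty-list scoring convention is an assumption, so defer to the official evaluation script and notebook.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

def probe(subject, template, threshold=0.5):
    """Keep every [MASK] prediction whose probability is at least `threshold`."""
    candidates = fill_mask(template.format(subject=subject), top_k=50)
    return [c["token_str"].strip() for c in candidates if c["score"] >= threshold]

def instance_f1(pred, gold):
    """F1 for one subject-relation pair; `gold` is a flat list of accepted
    surface forms (aliases already flattened)."""
    if not pred and not gold:
        return 1.0  # assumed convention: empty prediction matches empty truth
    if not pred or not gold:
        return 0.0
    tp = len(set(pred) & set(gold))
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r) if p + r else 0.0

# The sample prompt from above; the macro-averaged score is the mean of
# instance_f1 over all subject-relation pairs.
print(probe("China", "{subject} shares border with [MASK]."))
```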

Submission Details

Participants are required to submit:

  1. A system implementing the LM probing approach,
  2. The output for the test dataset subject-entities, and
  3. A system description in PDF format (5-12 pages, LNCS style).

All materials must be uploaded to EasyChair. Additionally, there is an optional CodaLab live leaderboard that participants can submit to. The test dataset is initially hidden to preserve the integrity of the results and will be released 10 days before the final deadline. The output files for the test subject-entities must be formatted as described here and submitted along with the system and its description. The top-performing systems will get the opportunity to present their ideas and results at the ISWC 2022 conference, and the challenge proceedings will be submitted to the CEUR publication system.
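As a small illustration, predictions could be serialized in the same assumed JSON-Lines layout as in the reader sketch above; the field names are again assumptions, and the output format linked from this page is authoritative.

```python
import json

def write_predictions(path, predictions):
    """Write one JSON object per line; `predictions` holds
    (subject, relation, [object, ...]) tuples."""
    with open(path, "w", encoding="utf-8") as f:
        for subject, relation, objects in predictions:
            row = {"SubjectEntity": subject, "Relation": relation,
                   "ObjectEntities": objects}
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

write_predictions("predictions.jsonl",
                  [("China", "CountryBordersWithCountry", ["India", "Russia"])])
```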


Organizers

Sneha Singhania, Max Planck Institute for Informatics
Tuan-Phong Nguyen, Max Planck Institute for Informatics
Simon Razniewski, Max Planck Institute for Informatics

Important Dates

Activity Dates
Dataset (train and dev) release 17 May 2022
Test subject release 11 July 2022
System + test dataset predictions + system description submission 26 July 2022 (extended from 14 July)
Winner announcement 16 August 2022
ISWC invitations 16 August 2022
ISWC presentations 23-27 October 2022


For general questions or discussion please use the Google group.