Knowledge Base Construction from Pre-trained Language Models (LM-KBC)

Challenge @ 22nd International Semantic Web Conference (ISWC 2023)


🔔 News

  • 02.12.23: Proceedings available at
  • 25.08.23: Winning systems in each track announced.
  • 09.08.23: Final deadline extended to August 14, 11:59 pm CEST, for both system and paper submissions.
  • 29.07.23: Deadline extended to August 10 for system and prediction submission, and 11 August for paper submission.
  • 26.07.23: Test data subject entities have been released on the GitHub repo. Please run git pull to get the test dataset and the updated script. Submit your predictions on CodaLab to get your scores.
  • 11.07.23: Submit your validation data predictions on CodaLab to get a score now (this is optional and test data leaderboard will be separate and released later).
  • 10.07.23: New baseline (GPT3 + Wikidata NED) added to repository.
  • 22.05.23: v1.0 of dataset (train and validation splits) released.
  • 17.05.23: Test output/system submission deadline extended to August 2, 2023. Take time to submit your strongest systems!

Task Description

Pretrained language models (LMs) like ChatGPT have advanced a range of semantic tasks and have also shown promise for knowledge extraction from the models themselves. Although several works have explored this ability in a setting called probing or prompting, the viability of knowledge base construction from LMs remains underexplored. In the 2nd edition of this challenge, we invite participants to build actual disambiguated knowledge bases from LMs, for given subjects and relations. In contrast to existing probing benchmarks like LAMA (Petroni et al., 2019), we make no simplifying assumptions on relation cardinalities, i.e., a subject-entity can stand in relation with zero, one, or many object-entities. Furthermore, submissions need to go beyond just ranking predicted surface strings and materialize disambiguated entities in the output, which will be evaluated using the established KB metrics of precision and recall.

Formally, given the input subject-entity (s) and relation (r), the task is to predict all the correct object-entities ({o1, o2, ..., ok}) using LM probing.

The challenge comes with two tracks:

  • Track 1: a small-model track with low computational requirements (<1 billion parameters)
  • Track 2: an open track, where participants can use any LM of their choice
Track 1: Small-model track (<1 billion parameters)
Participants are free to use any pretrained LM containing at most 1 billion parameters. This includes, for instance, BERT, BART, GPT-2, and variants of OPT. The input tuples can be paraphrased through prompt engineering techniques (e.g., AutoPrompt, LPAQA), and participants can also use prompt ensembles for better output generation. However, using context (e.g., verbalizing tuples using supporting sentences) is not allowed in this track.
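
As an illustration of the prompt-ensemble idea mentioned above, the following is a minimal sketch and not part of the official baselines. It assumes that each paraphrased prompt has already been run through an LM and produced a score per candidate object; the function name, the averaging rule, the threshold value, and the example scores are all hypothetical.

```python
from collections import defaultdict


def ensemble_predictions(per_prompt_scores, threshold=0.5):
    """Average candidate scores over all prompts and keep candidates
    whose mean score reaches the threshold (an assumed selection rule)."""
    totals = defaultdict(float)
    for scores in per_prompt_scores:
        for candidate, score in scores.items():
            totals[candidate] += score
    n_prompts = len(per_prompt_scores)
    # A candidate missing from a prompt's output contributes 0 for that prompt.
    averaged = {c: total / n_prompts for c, total in totals.items()}
    return sorted(c for c, mean in averaged.items() if mean >= threshold)


# Hypothetical scores from two paraphrased prompts for
# (Cologne, CityLocatedAtRiver); the numbers are made up.
scores = [
    {"Rhine": 0.9, "Danube": 0.2},
    {"Rhine": 0.7, "Elbe": 0.1},
]
result = ensemble_predictions(scores, threshold=0.5)
print(result)
```

Thresholding, rather than taking only the top-ranked candidate, lets a system return zero, one, or many objects, which matches the task's open relation cardinalities.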

Track 2: Open track
In the open track, the task is the same as in the small-model track. However,

🏆 Winners

Track System Avg. Precision Avg. Recall Avg. F1-score
1 Expanding the Vocabulary of BERT for Knowledge Base Construction (Dong Yang, XU Wang, Remzi Celebi) 0.395 0.393 0.323
2 Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata (Bohui Zhang, Ioannis Reklos, Nitisha Jain, Albert Meroño-Peñuela, Elena Simperl) 0.715 0.726 0.701


We release a dataset (train and validation) for a diverse set of 21 relations, each covering a different set of subject-entities, along with the complete list of ground-truth object-entities per subject-relation pair. The total number of object-entities varies across subject-relation pairs. The train dataset subject-relation-object triples can be used for training or probing the language models in any form, while validation can be used for hyperparameter tuning. Further details on the relations are given below:

Relation Description Example
BandHasMember band (s) has a member (o)
(Q941293, N.E.R.D., BandHasMember, [Q14313, Q706641, Q2584176], [Pharrell Williams, Chad Hugo, Shay Haley])
CityLocatedAtRiver City (s) is located at the river (o)
(Q365, Cologne, CityLocatedAtRiver, [Q584], [Rhine])
CompanyHasParentOrganisation Company (s) has another company (o) as its parent organization
(Q39898, NSU, CompanyHasParentOrganisation, [Q246], [Volkswagen])
CompoundHasParts chemical compound (s) consists of an element (o)
(Q150843, Hexadecane, CompoundHasParts, [Q623, Q556], [carbon, hydrogen])
CountryBordersCountry country (s) shares a land border with another country (o)
(Q1020, Malawi, CountryBordersCountry, [Q924, Q953, Q1029], [Tanzania, Zambia, Mozambique])
CountryHasOfficialLanguage country (s) has an official language (o)
(Q334, Singapore, CountryHasOfficialLanguage, [Q1860, Q5885, Q9237, Q727694], [English, Tamil, Malay, Standard Mandarin])
CountryHasStates country (s) has the state (o)
(Q702, Federated States of Micronesia, CountryHasStates, [Q221684, Q1785093, Q7771127, Q11342951], [Chuuk, Kosrae State, Pohnpei State, Yap State])
FootballerPlaysPosition Footballer (s) plays in the position (o)
(Q455462, Antoine Griezmann, FootballerPlaysPosition, [Q280658], [forward])
PersonCauseOfDeath person (s) died due to a cause (o)
(Q5238609, David Plotz, PersonCauseOfDeath, [ ], [ ])
PersonHasAutobiography person (s) has the autobiography (o)
(Q6279, Joe Biden, PersonHasAutobiography, [Q100221747], [Promise Me Dad])
PersonHasEmployer person (s) is or was employed by a company (o)
(Q11476943, Yōichi Shimada, PersonHasEmployer, [Q4845464], [Fukui Prefectural University])
PersonHasNobelPrize person (s) has the nobel prize (o)
(Q65989, Wolfgang Pauli, PersonHasNobelPrize, [Q38104], [Nobel Prize in Physics])
PersonHasNumberOfChildren person (s) has number of children (o)
(Q7599711, Stanley Johnson, PersonHasNumberOfChildren, [6], [6])
PersonHasPlaceOfDeath person (s) died at a location (o)
(Q4369225, Alina Pokrovskaya, PersonHasPlaceOfDeath, [ ], [ ])
PersonHasProfession person (s) held a profession (o)
(Q468043, Jon Elster, PersonHasProfession, [Q121594, Q188094, Q1238570, Q2306091, Q4964182], [professor, economist, political scientist, sociologist, philosopher])
PersonHasSpouse person (s) has spouse (o)
(Q5111202, Chrissy Teigen, PersonHasSpouse, [Q44857], [John Legend])
PersonPlaysInstrument person (s) plays an instrument (o)
(Q15994935, Emma Blackery, PersonPlaysInstrument, [Q6607, Q61285, Q17172850], [guitar, ukulele, voice])
PersonSpeaksLanguage person (s) speaks the language (o)
(Q18958964, Witold Andrzejewski, PersonSpeaksLanguage, [Q809], [Polish])
RiverBasinsCountry river (s) basins in a country (o)
(Q45403, Brahmaputra River, RiverBasinsCountry, [Q148, Q668], [People's Republic of China, India])
SeriesHasNumberOfEpisodes series (s) has (o) number of episodes
(Q12403564, Euphoria, SeriesHasNumberOfEpisodes, [10], [10])
StateBordersState state (s) shares a border with another state (o)
(Q1204, Illinois, StateBordersState, [Q1166, Q1415, Q1537, Q1546, Q1581, Q1603], [Michigan, Indiana, Wisconsin, Iowa, Missouri, Kentucky])

Each row in the dataset files consists of (1) subject-entity-id, (2) subject-entity, (3) list of all possible object-entities-id, (4) list of all possible object-entities and (5) relation. Please read the data format section for more details. When a subject has zero valid objects, the ground truth is an empty list, e.g., (Q2283, Microsoft, [ ], [ ], CompanyHasParentOrganisation).
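
The five fields listed above can be modelled as a small record type. The sketch below is only illustrative: the class name and attribute names are hypothetical, and the actual on-disk format is the one defined in the data format section of the repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Row:
    """One dataset row, with fields in the order described above.
    Attribute names are illustrative, not the official keys."""
    subject_id: str
    subject: str
    object_ids: List[str] = field(default_factory=list)
    objects: List[str] = field(default_factory=list)
    relation: str = ""

    def has_no_objects(self) -> bool:
        # True when the ground truth is an empty list.
        return len(self.object_ids) == 0


# The empty-list example from the text.
microsoft = Row("Q2283", "Microsoft", [], [], "CompanyHasParentOrganisation")

# A row with one object, taken from the relation table.
cologne = Row("Q365", "Cologne", ["Q584"], ["Rhine"], "CityLocatedAtRiver")

print(microsoft.has_no_objects(), cologne.has_no_objects())
```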

Dataset Characteristics

For each of the 21 relations, the number of unique subject-entities in the train, dev, and test splits is given in the GitHub repo. The minimum and maximum numbers of object-entities for each relation are given below. If the minimum value is 0, then the subject-entity can have zero valid object-entities for that relation.

Relation Train Val Test
BandHasMember [2, 15] [2, 16] [2, 16]
CityLocatedAtRiver [1, 9] [1, 5] [1, 9]
CompanyHasParentOrganisation [0, 5] [0, 3] [0, 5]
CompoundHasParts [2, 6] [2, 5] [2, 6]
CountryBordersCountry [1, 17] [1, 10] [1, 17]
CountryHasOfficialLanguage [1, 16] [1, 11] [1, 16]
CountryHasStates [1, 20] [1, 20] [1, 20]
FootballerPlaysPosition [1, 2] [1, 3] [1, 2]
PersonCauseOfDeath [0, 1] [0, 3] [0, 1]
PersonHasAutobiography [1, 4] [1, 4] [1, 4]
PersonHasEmployer [1, 6] [1, 13] [1, 6]
PersonHasNobelPrize [0, 1] [0, 2] [0, 1]
PersonHasNumberOfChildren [1, 1] [1, 2] [1, 1]
PersonHasPlaceOfDeath [0, 1] [0, 1] [0, 1]
PersonHasProfession [1, 11] [1, 12] [1, 11]
PersonHasSpouse [1, 3] [1, 3] [1, 3]
PersonPlaysInstrument [1, 8] [1, 8] [1, 8]
PersonSpeaksLanguage [1, 10] [1, 4] [1, 10]
RiverBasinsCountry [1, 9] [1, 5] [1, 9]
SeriesHasNumberOfEpisodes [1, 2] [1, 1] [1, 2]
StateBordersState [1, 16] [1, 12] [1, 16]

Task Evaluation

For each test instance, predictions are evaluated by calculating precision and recall against ground-truth values. The final macro-averaged F1-score is used to rank the participating systems.
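
The metric described above can be sketched as follows. This is not the official evaluation script. It makes two assumptions that should be checked against the script in the repository: an empty prediction counts as precision 1.0 and an empty ground truth counts as recall 1.0, and scores are averaged directly over instances (the official script may group by relation first). All function names are hypothetical.

```python
def precision(preds, truths):
    preds, truths = set(preds), set(truths)
    if not preds:
        return 1.0  # assumption: nothing predicted means nothing wrong
    return len(preds & truths) / len(preds)


def recall(preds, truths):
    preds, truths = set(preds), set(truths)
    if not truths:
        return 1.0  # assumption: nothing to find means nothing missed
    return len(preds & truths) / len(truths)


def f1(p, r):
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def macro_average(instances):
    """instances: list of (predicted_ids, ground_truth_ids) pairs."""
    rows = []
    for preds, truths in instances:
        p, r = precision(preds, truths), recall(preds, truths)
        rows.append((p, r, f1(p, r)))
    n = len(rows)
    return {
        "precision": sum(row[0] for row in rows) / n,
        "recall": sum(row[1] for row in rows) / n,
        "f1": sum(row[2] for row in rows) / n,
    }


# Ground truths are taken from the relation table; predictions are made up.
instances = [
    (["Q584"], ["Q584"]),                    # exact match
    (["Q924", "Q953"], ["Q924", "Q953", "Q1029"]),  # one object missed
    ([], []),                                # correctly predicted empty list
    (["Q246"], []),                          # spurious object for empty truth
]
scores = macro_average(instances)
print(scores)
```

Because F1 is computed per instance before averaging, the averaged F1 can be lower than both the averaged precision and the averaged recall.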


We provide several baselines:
  • Standard prompt for HuggingFace models with Wikidata default disambiguation: These baselines can be instantiated with various HuggingFace models (e.g., BERT, OPT), generate entity surface forms, and use the Wikidata entity disambiguation API to generate IDs.
  • Few-shot GPT-3 directly predicting IDs: This baseline uses a few samples to instruct GPT-3 to directly predict Wikidata IDs.
  • Few-shot GPT-3 w/ NED: Like above, but predicting surface forms that are disambiguated via Wikidata's default disambiguation.
Baseline performance:
Method Avg. Precision Avg. Recall Avg. F1-score
GPT-3 NED (Curie model) 0.308 0.210 0.218
GPT-3 IDs directly (Curie model) 0.126 0.060 0.061
BERT 0.368 0.161 0.142

Submission Details

Participants are required to submit:

  1. A system implementing the LM probing approach, uploaded to a public GitHub repo
  2. The output for the test dataset subject entities, in the same GitHub repo
  3. A system description in PDF format (5-12 pages, CEUR workshop style), mentioning the GitHub repo.

The PDF must be uploaded on OpenReview. Additionally, there is an optional CodaLab live leaderboard that participants can submit to. The test dataset is initially hidden to preserve the integrity of results, and will be released 1 week before the final deadline. The output files for the test subject-entities must be formatted as described here, and submitted along with the system and its description. The top-performing systems will get an opportunity to present their ideas and results during the conference, and the challenge proceedings will be submitted to the CEUR publication system.


Sneha Singhania
MPI Informatics
Simon Razniewski
Bosch Center for AI
Jeff Z. Pan
University of Edinburgh
Huawei Technology R&D UK

Presentation Schedule

The LM-KBC challenge session at ISWC will be held in Athens on Nov 10, 14:00-15:00 (GMT+3). The team-specific schedule is given below.

Time Team
14:00-14:10 Introduction
14:10-14:20 Track 1 Winner (Yang et al.)
14:20-14:30 Track 2 Winner (Zhang et al.)
14:30-14:40 Summary
14:40-15:00 Poster Session (Winners)
14:40-15:00 Poster Session Team 3 (Shrestha Ghosh)
14:40-15:00 Poster Session Team 4 (Li et al.)
14:40-15:00 Poster Session Team 5 (Biester et al.)
14:40-15:00 Poster Session Team 6 (Biswas et al.)
14:40-15:00 Poster Session Team 7 (Nayak et al.)

Important Dates

Activity Dates
Dataset (train and dev) release 15 April 2023
Release of test dataset 26 July 2023
Submission of test output and systems 14 August 2023 (previously 02 August)
Submission of system description 14 August 2023 (previously 09 August)
Winner announcement 25 August 2023
Presentations@ISWC (hybrid) early November 2023


For general questions or discussion please use the Google group.

Past Edition

1st Edition: LM-KBC 2022