Knowledge Base Construction from Pre-trained Language Models (LM-KBC)

Challenge @ 22nd International Semantic Web Conference (ISWC 2023)



  • 02.6.23: Robert Bosch GmbH has signaled that they will likely sponsor a best-paper award of over 500 Euro.
  • 22.5.23: v1.0 of dataset (train and validation splits) released.
  • 17.5.23: Test output/system submission deadline extended to August 2, 2023. Take time to submit your strongest systems!

Task Description

Pretrained language models (LMs) like ChatGPT have advanced a range of semantic tasks and have also shown promise for knowledge extraction from the models themselves. Although several works have explored this ability in a setting called probing or prompting, the viability of knowledge base construction from LMs remains underexplored. In the 2nd edition of this challenge, we invite participants to build actual disambiguated knowledge bases from LMs, for given subjects and relations. In a crucial difference from existing probing benchmarks like LAMA (Petroni et al., 2019), we make no simplifying assumptions on relation cardinalities, i.e., a subject-entity can stand in relation with zero, one, or many object-entities. Furthermore, submissions need to go beyond just ranking predicted surface strings and materialize disambiguated entities in the output, which will be evaluated using established KB metrics of precision and recall.

Formally, given the input subject-entity (s) and relation (r), the task is to predict all the correct object-entities ({o1, o2, ..., ok}) using LM probing.
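Concretely, a single test instance and a complete system prediction could look as follows. This is a minimal sketch: the field names mirror the five components of the released data format described further below, but check the repo's data-format documentation for the authoritative schema; the values are taken from the relation examples on this page.

```python
# One LM-KBC instance: the system receives a subject-entity and a
# relation, and must output *all* correct object-entities (possibly none).
instance = {
    "SubjectEntityID": "Q1020",
    "SubjectEntity": "Malawi",
    "Relation": "CountryBordersCountry",
}

# A valid prediction materializes disambiguated Wikidata IDs,
# not just ranked surface strings.
prediction = {
    **instance,
    "ObjectEntitiesID": ["Q924", "Q953", "Q1029"],
    "ObjectEntities": ["Tanzania", "Zambia", "Mozambique"],
}

print(len(prediction["ObjectEntitiesID"]))  # number of predicted objects
```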

The challenge comes with two tracks:

  • Track 1: a small-model track with low computational requirements (<1 billion parameters)
  • Track 2: an open track, where participants can use any LM of their choice
Track 1: Small-model track (<1 billion parameters)
Participants are free to use any pretrained LM containing at most 1 billion parameters. This includes, for instance, BERT, BART, GPT-2, and variants of OPT. The input tuples can be paraphrased through prompt engineering techniques (e.g., AutoPrompt, LPAQA), and participants can also use prompt ensembles for better output generation. However, using context (e.g., verbalizing tuples using supporting sentences) is not allowed in this track.
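A prompt ensemble in this track could be sketched as follows: verbalize one (subject, relation) pair with several paraphrased templates, then score each prompt with a small masked LM (e.g., a fill-mask pipeline over BERT) and aggregate the predictions. The templates here are illustrative choices, not part of the challenge specification.

```python
# Hypothetical paraphrase templates for one relation; a real system
# would maintain (and possibly auto-search) templates per relation.
TEMPLATES = {
    "CountryBordersCountry": [
        "{subject} shares a land border with [MASK].",
        "[MASK] is a neighbouring country of {subject}.",
    ],
}

def build_prompts(subject: str, relation: str) -> list[str]:
    """Instantiate every template of the ensemble for one subject."""
    return [t.format(subject=subject) for t in TEMPLATES[relation]]

prompts = build_prompts("Malawi", "CountryBordersCountry")
print(prompts[0])  # -> "Malawi shares a land border with [MASK]."
```

Each prompt would then be fed to the LM, and the per-prompt object scores merged (e.g., averaged) before thresholding to decide how many objects to emit.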

Track 2: Open track
In the open track, the task is the same as in the small-model track. However, there is no limit on model size, and participants may additionally use context (e.g., verbalizing tuples using supporting sentences), which is not allowed in Track 1.


Dataset

We release a dataset (train and validation splits) for a diverse set of 21 relations, each covering a different set of subject-entities, along with the complete list of ground-truth object-entities per subject-relation pair. The total number of object-entities varies for a given subject-relation pair. The train subject-relation-object triples can be used for training or probing the language models in any form, while the validation split can be used for hyperparameter tuning. Further details on the relations are given below:

Relation Description Example
BandHasMember band (s) has a member (o)
(Q941293, N.E.R.D., BandHasMember, [Q14313, Q706641, Q2584176], [Pharrell Williams, Chad Hugo, Shay Haley])
CityLocatedAtRiver City (s) is located at the river (o)
(Q365, Cologne, CityLocatedAtRiver, [Q584], [Rhine])
CompanyHasParentOrganisation Company (s) has another company (o) as its parent organization
(Q39898, NSU, CompanyHasParentOrganisation, [Q246], [Volkswagen])
CompoundHasParts chemical compound (s) consists of an element (o)
(Q150843, Hexadecane, CompoundHasParts, [Q623, Q556], [carbon, hydrogen])
CountryBordersCountry country (s) shares a land border with another country (o)
(Q1020, Malawi, CountryBordersCountry, [Q924, Q953, Q1029], [Tanzania, Zambia, Mozambique])
CountryHasOfficialLanguage country (s) has an official language (o)
(Q334, Singapore, CountryHasOfficialLanguage, [Q1860, Q5885, Q9237, Q727694], [English, Tamil, Malay, Standard Mandarin])
CountryHasStates country (s) has the state (o)
(Q702, Federated States of Micronesia, CountryHasStates, [Q221684, Q1785093, Q7771127, Q11342951], [Chuuk, Kosrae State, Pohnpei State, Yap State])
FootballerPlaysPosition Footballer (s) plays in the position (o)
(Q455462, Antoine Griezmann, FootballerPlaysPosition, [Q280658], [forward])
PersonCauseOfDeath person (s) died due to a cause (o)
(Q5238609, David Plotz, PersonCauseOfDeath, [ ], [ ])
PersonHasAutobiography person (s) has the autobiography (o)
(Q6279, Joe Biden, PersonHasAutobiography, [Q100221747], [Promise Me Dad])
PersonHasEmployer person (s) is or was employed by a company (o)
(Q11476943, Yƍichi Shimada, PersonHasEmployer, [Q4845464], [Fukui Prefectural University])
PersonHasNoblePrize person (s) has won the Nobel Prize (o)
(Q65989, Wolfgang Pauli, PersonHasNoblePrize, [Q38104], [Nobel Prize in Physics])
PersonHasNumberOfChildren person (s) has number of children (o)
(Q7599711, Stanley Johnson, PersonHasNumberOfChildren, [6], [6])
PersonHasPlaceOfDeath person (s) died at a location (o)
(Q4369225, Alina Pokrovskaya, PersonHasPlaceOfDeath, [ ], [ ])
PersonHasProfession person (s) held a profession (o)
(Q468043, Jon Elster, PersonHasProfession, [Q121594, Q188094, Q1238570, Q2306091, Q4964182], [professor, economist, political scientist, sociologist, philosopher])
PersonHasSpouse person (s) has spouse (o)
(Q5111202, Chrissy Teigen, PersonHasSpouse, [Q44857], [John Legend])
PersonPlaysInstrument person (s) plays an instrument (o)
(Q15994935, Emma Blackery, PersonPlaysInstrument, [Q6607, Q61285, Q17172850], [guitar, ukulele, voice])
PersonSpeaksLanguage person (s) speaks the language (o)
(Q18958964, Witold Andrzejewski, PersonSpeaksLanguage, [Q809], [Polish])
RiverBasinsCountry river (s) has a drainage basin in country (o)
(Q45403, Brahmaputra River, RiverBasinsCountry, [Q148, Q668], [People's Republic of China, India])
SeriesHasNumberOfEpisodes series (s) has (o) number of episodes
(Q12403564, Euphoria, SeriesHasNumberOfEpisodes, [10], [10])
StateBordersState state (s) shares a border with another state (o)
(Q1204, Illinois, StateBordersState, [Q1166, Q1415, Q1537, Q1546, Q1581, Q1603], [Michigan, Indiana, Wisconsin, Iowa, Missouri, Kentucky])

Each row in the dataset files consists of (1) subject-entity-id, (2) subject-entity, (3) list of all possible object-entities-id, (4) list of all possible object-entities, and (5) relation. Please read the data format section for more details. When a subject has zero valid objects, the ground truth is an empty list, e.g., (Q2283, Microsoft, [ ], [ ], CompanyHasParentOrganisation).
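Reading one such row could look like this. The sketch assumes a JSON Lines encoding with the field names below; consult the repo's data-format section for the authoritative schema. It highlights the empty-object case, which systems must handle explicitly.

```python
import json

# One dataset row with the five components listed above (assumed field
# names), here for a subject with zero valid objects.
row = json.loads(
    '{"SubjectEntityID": "Q2283", "SubjectEntity": "Microsoft", '
    '"ObjectEntitiesID": [], "ObjectEntities": [], '
    '"Relation": "CompanyHasParentOrganisation"}'
)

# An empty list is a *correct* ground truth: Microsoft has no parent
# organisation, so a perfect system must predict nothing here.
has_objects = len(row["ObjectEntitiesID"]) > 0
print(has_objects)  # -> False
```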

Dataset Characteristics

For each of the 21 relations, the number of unique subject-entities in the train, dev, and test splits is given in the GitHub repo. The minimum and maximum number of object-entities per relation is given below. If the minimum value is 0, a subject-entity can have zero valid object-entities for that relation.

Relation Train Val Test
BandHasMember [2, 15] [2, 16] [2, 16]
CityLocatedAtRiver [1, 9] [1, 5] [1, 9]
CompanyHasParentOrganisation [0, 5] [0, 3] [0, 5]
CompoundHasParts [2, 6] [2, 5] [2, 6]
CountryBordersCountry [1, 17] [1, 10] [1, 17]
CountryHasOfficialLanguage [1, 16] [1, 11] [1, 16]
CountryHasStates [1, 20] [1, 20] [1, 20]
FootballerPlaysPosition [1, 2] [1, 3] [1, 2]
PersonCauseOfDeath [0, 1] [0, 3] [0, 1]
PersonHasAutobiography [1, 4] [1, 4] [1, 4]
PersonHasEmployer [1, 6] [1, 13] [1, 6]
PersonHasNoblePrize [0, 1] [0, 2] [0, 1]
PersonHasNumberOfChildren [1, 1] [1, 2] [1, 1]
PersonHasPlaceOfDeath [0, 1] [0, 1] [0, 1]
PersonHasProfession [1, 11] [1, 12] [1, 11]
PersonHasSpouse [1, 3] [1, 3] [1, 3]
PersonPlaysInstrument [1, 8] [1, 8] [1, 8]
PersonSpeaksLanguage [1, 10] [1, 4] [1, 10]
RiverBasinsCountry [1, 9] [1, 5] [1, 9]
SeriesHasNumberOfEpisodes [1, 2] [1, 1] [1, 2]
StateBordersState [1, 16] [1, 12] [1, 16]

Task Evaluation

For each test instance, predictions are evaluated by calculating precision and recall against ground-truth values. The final macro-averaged F1-score is used to rank the participating systems.
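The official scorer lives in the GitHub repo; the sketch below only mirrors the metric as described here: per-instance precision and recall over predicted vs. ground-truth object sets, combined into a macro-averaged F1-score. The handling of the empty-answer case (an empty prediction against an empty ground truth scoring 1.0) is an assumption of this sketch.

```python
def instance_scores(pred: set, gold: set) -> tuple:
    """Precision/recall for one instance; empty-vs-empty is assumed perfect."""
    if not gold and not pred:
        return 1.0, 1.0
    precision = len(pred & gold) / len(pred) if pred else 0.0
    recall = len(pred & gold) / len(gold) if gold else 1.0
    return precision, recall

def macro_f1(pairs) -> float:
    """Average per-instance F1 over (prediction, ground truth) set pairs."""
    f1s = []
    for pred, gold in pairs:
        p, r = instance_scores(set(pred), set(gold))
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(f1s) / len(f1s)

# Two instances: one partially correct, one correctly-empty prediction.
score = macro_f1([
    ({"Q924", "Q953"}, {"Q924", "Q953", "Q1029"}),  # P = 1.0, R = 2/3
    (set(), set()),                                  # P = R = 1.0
])
print(round(score, 3))  # -> 0.9
```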


We provide several baselines:
  • Standard prompt for HuggingFace models with Wikidata default disambiguation: These baselines can be instantiated with various HuggingFace models (e.g., BERT, OPT), generate entity surface forms, and use the Wikidata entity disambiguation API to generate IDs.
  • Few-shot GPT-3 directly predicting IDs: This baseline uses a few samples to instruct GPT-3 to directly predict Wikidata IDs.
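The "Wikidata default disambiguation" step of the first baseline family can be sketched as follows: a predicted surface form is sent to Wikidata's public wbsearchentities endpoint and the top-ranked hit is kept. The helper names are illustrative, not taken from the baseline code.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def build_search_url(surface: str) -> str:
    """Query URL for Wikidata's wbsearchentities endpoint."""
    params = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "search": surface,
        "language": "en",
        "format": "json",
    })
    return f"{WIKIDATA_API}?{params}"

def pick_top_hit(hits: list[dict]) -> str | None:
    """Default disambiguation: keep the top-ranked search result."""
    return hits[0]["id"] if hits else None

def surface_to_qid(surface: str) -> str | None:
    """Resolve a surface form to a Wikidata ID (requires network access)."""
    with urllib.request.urlopen(build_search_url(surface)) as resp:
        return pick_top_hit(json.load(resp).get("search", []))

# e.g. surface_to_qid("Rhine") should resolve to "Q584"
```

Always taking the top hit is a deliberately simple policy; systems can improve on it by disambiguating with the relation's type constraints in mind.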
Baseline performance:
Method Avg. Precision Avg. Recall Avg. F1-score
GPT-3 IDs directly (Curie model) 0.126 0.060 0.061
BERT 0.368 0.161 0.142


Organizers

Sneha Singhania
MPI Informatics
Simon Razniewski
Bosch Center for AI
Jeff Z. Pan
University of Edinburgh
Huawei Technology R&D UK

Important Dates

Activity Dates
Dataset (train and dev) release 15 April 2023
Release of test dataset 26 July 2023
Submission of test output and systems 02 August 2023
Submission of system description 09 August 2023
Winner announcement 23 August 2023
Presentations@ISWC (hybrid) early November 2023


For general questions or discussion please use the Google group.

Past Edition

1st Edition: LM-KBC 2022