Knowledge Base Construction from Pre-trained Language Models (LM-KBC)

Challenge @ 22nd International Semantic Web Conference (ISWC 2023)



  • 02.6.23: Robert Bosch GmbH has signaled that they will likely sponsor a best-paper award of over 500 Euro.
  • 22.5.23: v1.0 of dataset (train and validation splits) released.
  • 17.5.23: Test output/system submission deadline extended to August 2, 2023. Take time to submit your strongest systems!

Task Description

Pretrained language models (LMs) like ChatGPT have advanced a range of semantic tasks and have also shown promise for knowledge extraction from the models themselves. Although several works have explored this ability in a setting called probing or prompting, the viability of knowledge base construction from LMs remains underexplored. In the 2nd edition of this challenge, we invite participants to build actual disambiguated knowledge bases from LMs, for given subjects and relations. In a crucial difference from existing probing benchmarks like LAMA (Petroni et al., 2019), we make no simplifying assumptions on relation cardinalities, i.e., a subject-entity can stand in relation with zero, one, or many object-entities. Furthermore, submissions need to go beyond just ranking predicted surface strings and materialize disambiguated entities in the output, which will be evaluated using established KB metrics of precision and recall.

Formally, given the input subject-entity (s) and relation (r), the task is to predict all the correct object-entities ({o1, o2, ..., ok}) using LM probing.
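Concretely, a single test instance and a complete system prediction could look as follows. This is a minimal sketch: the field names mirror the five components of the released data format described further below, but check the repo's data-format documentation for the authoritative schema; the values are taken from the relation examples on this page.

```python
# One LM-KBC instance: the system receives a subject-entity and a
# relation, and must output *all* correct object-entities (possibly none).
instance = {
    "SubjectEntityID": "Q1020",
    "SubjectEntity": "Malawi",
    "Relation": "CountryBordersCountry",
}

# A valid prediction materializes disambiguated Wikidata IDs,
# not just ranked surface strings.
prediction = {
    **instance,
    "ObjectEntitiesID": ["Q924", "Q953", "Q1029"],
    "ObjectEntities": ["Tanzania", "Zambia", "Mozambique"],
}

print(len(prediction["ObjectEntitiesID"]))  # number of predicted objects
```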

The challenge comes with two tracks:

  • Track 1: a small-model track with low computational requirements (<1 billion parameters)
  • Track 2: an open track, where participants can use any LM of their choice
Track 1: Small-model track (<1 billion parameters)
Participants are free to use any pretrained LM containing at most 1 billion parameters. This includes, for instance, BERT, BART, GPT-2, and variants of OPT. The input tuples can be paraphrased through prompt engineering techniques (e.g., AutoPrompt, LPAQA), and participants can also use prompt ensembles for better output generation. However, using context (e.g., verbalizing tuples using supporting sentences) is not allowed in this track.
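A prompt ensemble in this track could be sketched as follows: verbalize one (subject, relation) pair with several paraphrased templates, then score each prompt with a small masked LM (e.g., a fill-mask pipeline over BERT) and aggregate the predictions. The templates here are illustrative choices, not part of the challenge specification.

```python
# Hypothetical paraphrase templates for one relation; a real system
# would maintain (and possibly auto-search) templates per relation.
TEMPLATES = {
    "CountryBordersCountry": [
        "{subject} shares a land border with [MASK].",
        "[MASK] is a neighbouring country of {subject}.",
    ],
}

def build_prompts(subject: str, relation: str) -> list[str]:
    """Instantiate every template of the ensemble for one subject."""
    return [t.format(subject=subject) for t in TEMPLATES[relation]]

prompts = build_prompts("Malawi", "CountryBordersCountry")
print(prompts[0])  # -> "Malawi shares a land border with [MASK]."
```

Each prompt would then be fed to the LM, and the per-prompt object scores merged (e.g., averaged) before thresholding to decide how many objects to emit.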

Track 2: Open track
In the open track, the task is the same as in the small-model track. However, there is no limit on model size, and participants may additionally use context (e.g., verbalizing tuples using supporting sentences), which is not allowed in Track 1.


Dataset

We release a dataset (train and validation splits) for a diverse set of 21 relations, each covering a different set of subject-entities, along with the complete list of ground-truth object-entities per subject-relation pair. The total number of object-entities varies for a given subject-relation pair. The train subject-relation-object triples can be used for training or probing the language models in any form, while the validation split can be used for hyperparameter tuning. Further details on the relations are given below:

Relation Description Example
BandHasMember band (s) has a member (o)
(Q941293, N.E.R.D., BandHasMember, [Q14313, Q706641, Q2584176], [Pharrell Williams, Chad Hugo, Shay Haley])
CityLocatedAtRiver City (s) is located at the river (o)
(Q365, Cologne, CityLocatedAtRiver, [Q584], [Rhine])
CompanyHasParentOrganisation Company (s) has another company (o) as its parent organization
(Q39898, NSU, CompanyHasParentOrganisation, [Q246], [Volkswagen])
CompoundHasParts chemical compound (s) consists of an element (o)
(Q150843, Hexadecane, CompoundHasParts, [Q623, Q556], [carbon, hydrogen])
CountryBordersCountry country (s) shares a land border with another country (o)
(Q1020, Malawi, CountryBordersCountry, [Q924, Q953, Q1029], [Tanzania, Zambia, Mozambique])
CountryHasOfficialLanguage country (s) has an official language (o)
(Q334, Singapore, CountryHasOfficialLanguage, [Q1860, Q5885, Q9237, Q727694], [English, Tamil, Malay, Standard Mandarin])
CountryHasStates country (s) has the state (o)
(Q702, Federated States of Micronesia, CountryHasStates, [Q221684, Q1785093, Q7771127, Q11342951], [Chuuk, Kosrae State, Pohnpei State, Yap State])
FootballerPlaysPosition Footballer (s) plays in the position (o)
(Q455462, Antoine Griezmann, FootballerPlaysPosition, [Q280658], [forward])
PersonCauseOfDeath person (s) died due to a cause (o)
(Q5238609, David Plotz, PersonCauseOfDeath, [ ], [ ])
PersonHasAutobiography person (s) has the autobiography (o)
(Q6279, Joe Biden, PersonHasAutobiography, [Q100221747], [Promise Me Dad])
PersonHasEmployer person (s) is or was employed by a company (o)
(Q11476943, Yƍichi Shimada, PersonHasEmployer, [Q4845464], [Fukui Prefectural University])
PersonHasNoblePrize person (s) has won the Nobel Prize (o)
(Q65989, Wolfgang Pauli, PersonHasNoblePrize, [Q38104], [Nobel Prize in Physics])
PersonHasNumberOfChildren person (s) has number of children (o)
(Q7599711, Stanley Johnson, PersonHasNumberOfChildren, [6], [6])
PersonHasPlaceOfDeath person (s) died at a location (o)
(Q4369225, Alina Pokrovskaya, PersonHasPlaceOfDeath, [ ], [ ])
PersonHasProfession person (s) held a profession (o)
(Q468043, Jon Elster, PersonHasProfession, [Q121594, Q188094, Q1238570, Q2306091, Q4964182], [professor, economist, political scientist, sociologist, philosopher])
PersonHasSpouse person (s) has spouse (o)
(Q5111202, Chrissy Teigen, PersonHasSpouse, [Q44857], [John Legend])
PersonPlaysInstrument person (s) plays an instrument (o)
(Q15994935, Emma Blackery, PersonPlaysInstrument, [Q6607, Q61285, Q17172850], [guitar, ukulele, voice])
PersonSpeaksLanguage person (s) speaks the language (o)
(Q18958964, Witold Andrzejewski, PersonSpeaksLanguage, [Q809], [Polish])
RiverBasinsCountry river (s) has a drainage basin in country (o)
(Q45403, Brahmaputra River, RiverBasinsCountry, [Q148, Q668], [People's Republic of China, India])
SeriesHasNumberOfEpisodes series (s) has (o) number of episodes
(Q12403564, Euphoria, SeriesHasNumberOfEpisodes, [10], [10])
StateBordersState state (s) shares a border with another state (o)
(Q1204, Illinois, StateBordersState, [Q1166, Q1415, Q1537, Q1546, Q1581, Q1603], [Michigan, Indiana, Wisconsin, Iowa, Missouri, Kentucky])

Each row in the dataset files consists of (1) subject-entity-id, (2) subject-entity, (3) list of all possible object-entities-id, (4) list of all possible object-entities, and (5) relation. Please read the data format section for more details. When a subject has zero valid objects, the ground truth is an empty list, e.g., (Q2283, Microsoft, [ ], [ ], CompanyHasParentOrganisation).
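Reading one such row could look like this. The sketch assumes a JSON Lines encoding with the field names below; consult the repo's data-format section for the authoritative schema. It highlights the empty-object case, which systems must handle explicitly.

```python
import json

# One dataset row with the five components listed above (assumed field
# names), here for a subject with zero valid objects.
row = json.loads(
    '{"SubjectEntityID": "Q2283", "SubjectEntity": "Microsoft", '
    '"ObjectEntitiesID": [], "ObjectEntities": [], '
    '"Relation": "CompanyHasParentOrganisation"}'
)

# An empty list is a *correct* ground truth: Microsoft has no parent
# organisation, so a perfect system must predict nothing here.
has_objects = len(row["ObjectEntitiesID"]) > 0
print(has_objects)  # -> False
```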

Dataset Characteristics

For each of the 21 relations, the number of unique subject-entities in the train, dev, and test splits is given in the GitHub repo. The minimum and maximum number of object-entities per relation is given below. If the minimum value is 0, a subject-entity can have zero valid object-entities for that relation.

Relation Train Val Test
BandHasMember [2, 15] [2, 16] [2, 16]
CityLocatedAtRiver [1, 9] [1, 5] [1, 9]
CompanyHasParentOrganisation [0, 5] [0, 3] [0, 5]
CompoundHasParts [2, 6] [2, 5] [2, 6]
CountryBordersCountry [1, 17] [1, 10] [1, 17]
CountryHasOfficialLanguage [1, 16] [1, 11] [1, 16]
CountryHasStates [1, 20] [1, 20] [1, 20]
FootballerPlaysPosition [1, 2] [1, 3] [1, 2]
PersonCauseOfDeath [0, 1] [0, 3] [0, 1]
PersonHasAutobiography [1, 4] [1, 4] [1, 4]
PersonHasEmployer [1, 6] [1, 13] [1, 6]
PersonHasNoblePrize [0, 1] [0, 2] [0, 1]
PersonHasNumberOfChildren [1, 1] [1, 2] [1, 1]
PersonHasPlaceOfDeath [0, 1] [0, 1] [0, 1]
PersonHasProfession [1, 11] [1, 12] [1, 11]
PersonHasSpouse [1, 3] [1, 3] [1, 3]
PersonPlaysInstrument [1, 8] [1, 8] [1, 8]
PersonSpeaksLanguage [1, 10] [1, 4] [1, 10]
RiverBasinsCountry [1, 9] [1, 5] [1, 9]
SeriesHasNumberOfEpisodes [1, 2] [1, 1] [1, 2]
StateBordersState [1, 16] [1, 12] [1, 16]

Task Evaluation

For each test instance, predictions are evaluated by calculating precision and recall against ground-truth values. The final macro-averaged F1-score is used to rank the participating systems.
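The official scorer lives in the GitHub repo; the sketch below only mirrors the metric as described here: per-instance precision and recall over predicted vs. ground-truth object sets, combined into a macro-averaged F1-score. The handling of the empty-answer case (an empty prediction against an empty ground truth scoring 1.0) is an assumption of this sketch.

```python
def instance_scores(pred: set, gold: set) -> tuple:
    """Precision/recall for one instance; empty-vs-empty is assumed perfect."""
    if not gold and not pred:
        return 1.0, 1.0
    precision = len(pred & gold) / len(pred) if pred else 0.0
    recall = len(pred & gold) / len(gold) if gold else 1.0
    return precision, recall

def macro_f1(pairs) -> float:
    """Average per-instance F1 over (prediction, ground truth) set pairs."""
    f1s = []
    for pred, gold in pairs:
        p, r = instance_scores(set(pred), set(gold))
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(f1s) / len(f1s)

# Two instances: one partially correct, one correctly-empty prediction.
score = macro_f1([
    ({"Q924", "Q953"}, {"Q924", "Q953", "Q1029"}),  # P = 1.0, R = 2/3
    (set(), set()),                                  # P = R = 1.0
])
print(round(score, 3))  # -> 0.9
```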


We provide several baselines:
  • Standard prompt for HuggingFace models with Wikidata default disambiguation: These baselines can be instantiated with various HuggingFace models (e.g., BERT, OPT), generate entity surface forms, and use the Wikidata entity disambiguation API to generate IDs.
  • Few-shot GPT-3 directly predicting IDs: This baseline uses a few samples to instruct GPT-3 to directly predict Wikidata IDs.
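The "Wikidata default disambiguation" step of the first baseline family can be sketched as follows: a predicted surface form is sent to Wikidata's public wbsearchentities endpoint and the top-ranked hit is kept. The helper names are illustrative, not taken from the baseline code.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def build_search_url(surface: str) -> str:
    """Query URL for Wikidata's wbsearchentities endpoint."""
    params = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "search": surface,
        "language": "en",
        "format": "json",
    })
    return f"{WIKIDATA_API}?{params}"

def pick_top_hit(hits: list[dict]) -> str | None:
    """Default disambiguation: keep the top-ranked search result."""
    return hits[0]["id"] if hits else None

def surface_to_qid(surface: str) -> str | None:
    """Resolve a surface form to a Wikidata ID (requires network access)."""
    with urllib.request.urlopen(build_search_url(surface)) as resp:
        return pick_top_hit(json.load(resp).get("search", []))

# e.g. surface_to_qid("Rhine") should resolve to "Q584"
```

Always taking the top hit is a deliberately simple policy; systems can improve on it by disambiguating with the relation's type constraints in mind.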
Baseline performance:
Method Avg. Precision Avg. Recall Avg. F1-score
GPT-3 IDs directly (Curie model) 0.126 0.060 0.061
BERT 0.368 0.161 0.142


Organizers

Sneha Singhania
MPI Informatics
Simon Razniewski
Bosch Center for AI
Jeff Z. Pan
University of Edinburgh
Huawei Technology R&D UK

Important Dates

Activity Dates
Dataset (train and dev) release 15 April 2023
Release of test dataset 26 July 2023
Submission of test output and systems 02 August 2023
Submission of system description 09 August 2023
Winner announcement 23 August 2023
Presentations@ISWC (hybrid) early November 2023


For general questions or discussion please use the Google group.

Past Edition

1st Edition: LM-KBC 2022