🔔 News
- 02.12.23: Proceedings available at https://ceur-ws.org/Vol-3577/
- 25.08.23: Winning systems in each track announced.
- 09.08.23: Final deadline extension to August 14, 11:59 pm CEST time, for both system and paper submissions.
- 29.07.23: Deadline extended to August 10 for system and prediction submission, and August 11 for paper submission.
- 26.07.23: Test data subject entities have been released on the GitHub repo. Please run `git pull` to get the test dataset and the updated evaluate.py script. Submit your predictions on CodaLab to get your scores.
- 11.07.23: Submit your validation data predictions on CodaLab to get a score now (this is optional; the test data leaderboard will be separate and released later).
- 10.07.23: New baseline (GPT3 + Wikidata NED) added to repository.
- 22.05.23: v1.0 of dataset (train and validation splits) released.
- 17.05.23: Test output/system submission deadline extended to August 2, 2023. Take time to submit your strongest systems!
Task Description
Pretrained language models (LMs) like ChatGPT have advanced a range of semantic tasks and have also shown promise for knowledge extraction from the models themselves. Although several works have explored this ability in a setting called probing or prompting, the viability of knowledge base construction from LMs remains underexplored. In the 2nd edition of this challenge, we invite participants to build actual disambiguated knowledge bases from LMs, for given subjects and relations. In a crucial difference from existing probing benchmarks like LAMA (Petroni et al., 2019), we make no simplifying assumptions on relation cardinalities, i.e., a subject-entity can stand in relation with zero, one, or many object-entities. Furthermore, submissions need to go beyond just ranking predicted surface strings and must materialize disambiguated entities in the output, which will be evaluated using the established KB metrics of precision and recall.
Formally, given an input subject-entity (s) and a relation (r), the task is to predict all the correct object-entities ({o1, o2, ..., ok}) using LM probing.
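To make the input and output concrete, here is one purely illustrative instance; the variable names below are only for exposition, and the example is taken from the dataset section further down.

```python
# Illustrative view of one task instance (names are only for exposition):
# given a subject-entity and a relation, a system must return the complete set
# of disambiguated object-entities -- possibly empty -- e.g., as Wikidata QIDs.
task_input = {"SubjectEntity": "Malawi", "Relation": "CountryBordersCountry"}
expected_objects = {"Q924", "Q953", "Q1029"}  # Tanzania, Zambia, Mozambique
```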
The challenge comes with two tracks:
- Track 1: a small-model track with low computational requirements (<1 billion parameters)
- Track 2: an open track, where participants can use any LM of their choice
Track 1: Small-model track (<1 billion parameters)
Participants are free to use any pretrained LM containing at most 1 billion parameters. This includes, for instance, BERT, BART, GPT-2, and variants of OPT. The input tuples can be paraphrased through prompt engineering techniques (e.g., AutoPrompt, LPAQA), and participants can also use prompt ensembles for better output generation (a minimal probing sketch is given after the track descriptions below). However, using context (e.g., verbalizing tuples using supporting sentences) is not allowed in this track.
Track 2: Open track
In the open track, the task is the same as in the small-model track. However:
- LMs of any size, e.g., GPT-3, can be probed.
- Use of context is allowed for LM generation, e.g., context retrieval as in REALM and factual predictions with context.
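For illustration, a minimal Track 1-style probing sketch with a HuggingFace masked LM is shown below. The prompt wording, the model choice (bert-large-cased, well under 1 billion parameters), and the score threshold are assumptions to be tuned on the validation split; this is a sketch, not one of the provided baselines.

```python
# Minimal masked-LM probing sketch (illustrative only, not a provided baseline).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-large-cased")  # <1B parameters

def probe(subject: str, prompt_template: str, threshold: float = 0.05) -> list[str]:
    """Return candidate object surface forms whose LM score exceeds a threshold."""
    prompt = prompt_template.format(subject=subject, mask=fill_mask.tokenizer.mask_token)
    return [
        pred["token_str"].strip()
        for pred in fill_mask(prompt, top_k=20)
        if pred["score"] >= threshold
    ]

# Example prompt for CountryBordersCountry; paraphrases and prompt ensembles are allowed.
candidates = probe("Malawi", "{subject} shares a land border with {mask}.")
```

Note that a single mask token only yields single-token surface forms; handling multi-token objects and mapping surface forms to disambiguated Wikidata IDs remain part of the task and are what the provided baselines address.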
🏆 Winners
Track | System | Avg. Precision | Avg. Recall | Avg. F1-score |
---|---|---|---|---|
1 | Expanding the Vocabulary of BERT for Knowledge Base Construction (Dong Yang, Xu Wang, Remzi Celebi) | 0.395 | 0.393 | 0.323 |
2 | Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata (Bohui Zhang, Ioannis Reklos, Nitisha Jain, Albert Meroño-Peñuela, Elena Simperl) | 0.715 | 0.726 | 0.701 |
Dataset
We release a dataset (train and validation splits) for a diverse set of 21 relations, each covering a different set of subject-entities, along with the complete list of ground-truth object-entities per subject-relation pair. The total number of object-entities varies for a given subject-relation pair. The train subject-relation-object triples can be used for training or probing the language models in any form, while the validation split can be used for hyperparameter tuning. Further details on the relations are given below:
Relation | Description | Example |
---|---|---|
BandHasMember | band (s) has a member (o) | (Q941293, N.E.R.D., BandHasMember, [Q14313, Q706641, Q2584176], [Pharrell Williams, Chad Hugo, Shay Haley]) |
CityLocatedAtRiver | city (s) is located at the river (o) | (Q365, Cologne, CityLocatedAtRiver, [Q584], [Rhine]) |
CompanyHasParentOrganisation | company (s) has another company (o) as its parent organization | (Q39898, NSU, CompanyHasParentOrganisation, [Q246], [Volkswagen]) |
CompoundHasParts | chemical compound (s) consists of an element (o) | (Q150843, Hexadecane, CompoundHasParts, [Q623, Q556], [carbon, hydrogen]) |
CountryBordersCountry | country (s) shares a land border with another country (o) | (Q1020, Malawi, CountryBordersCountry, [Q924, Q953, Q1029], [Tanzania, Zambia, Mozambique]) |
CountryHasOfficialLanguage | country (s) has an official language (o) | (Q334, Singapore, CountryHasOfficialLanguage, [Q1860, Q5885, Q9237, Q727694], [English, Tamil, Malay, Standard Mandarin]) |
CountryHasStates | country (s) has the state (o) | (Q702, Federated States of Micronesia, CountryHasStates, [Q221684, Q1785093, Q7771127, Q11342951], [Chuuk, Kosrae State, Pohnpei State, Yap State]) |
FootballerPlaysPosition | footballer (s) plays in the position (o) | (Q455462, Antoine Griezmann, FootballerPlaysPosition, [Q280658], [forward]) |
PersonCauseOfDeath | person (s) died due to a cause (o) | (Q5238609, David Plotz, PersonCauseOfDeath, [ ], [ ]) |
PersonHasAutobiography | person (s) has the autobiography (o) | (Q6279, Joe Biden, PersonHasAutobiography, [Q100221747], [Promise Me Dad]) |
PersonHasEmployer | person (s) is or was employed by a company (o) | (Q11476943, Yōichi Shimada, PersonHasEmployer, [Q4845464], [Fukui Prefectural University]) |
PersonHasNobelPrize | person (s) has the Nobel Prize (o) | (Q65989, Wolfgang Pauli, PersonHasNobelPrize, [Q38104], [Nobel Prize in Physics]) |
PersonHasNumberOfChildren | person (s) has (o) number of children | (Q7599711, Stanley Johnson, PersonHasNumberOfChildren, [6], [6]) |
PersonHasPlaceOfDeath | person (s) died at a location (o) | (Q4369225, Alina Pokrovskaya, PersonHasPlaceOfDeath, [ ], [ ]) |
PersonHasProfession | person (s) held a profession (o) | (Q468043, Jon Elster, PersonHasProfession, [Q121594, Q188094, Q1238570, Q2306091, Q4964182], [professor, economist, political scientist, sociologist, philosopher]) |
PersonHasSpouse | person (s) has spouse (o) | (Q5111202, Chrissy Teigen, PersonHasSpouse, [Q44857], [John Legend]) |
PersonPlaysInstrument | person (s) plays an instrument (o) | (Q15994935, Emma Blackery, PersonPlaysInstrument, [Q6607, Q61285, Q17172850], [guitar, ukulele, voice]) |
PersonSpeaksLanguage | person (s) speaks the language (o) | (Q18958964, Witold Andrzejewski, PersonSpeaksLanguage, [Q809], [Polish]) |
RiverBasinsCountry | river (s) has its basin in a country (o) | (Q45403, Brahmaputra River, RiverBasinsCountry, [Q148, Q668], [People's Republic of China, India]) |
SeriesHasNumberOfEpisodes | series (s) has (o) number of episodes | (Q12403564, Euphoria, SeriesHasNumberOfEpisodes, [10], [10]) |
StateBordersState | state (s) shares a border with another state (o) | (Q1204, Illinois, StateBordersState, [Q1166, Q1415, Q1537, Q1546, Q1581, Q1603], [Michigan, Indiana, Wisconsin, Iowa, Missouri, Kentucky]) |
Each row in the dataset files consists of (1) the subject-entity ID, (2) the subject-entity, (3) the list of all possible object-entity IDs, (4) the list of all possible object-entities, and (5) the relation. Please read the data format section for more details. When a subject has zero valid objects, the ground truth is an empty list, e.g., (Q2283, Microsoft, [ ], [ ], CompanyHasParentOrganisation).
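For illustration, one row might look as follows when read from a JSON Lines file; the file path and field names in this sketch are assumptions, so please consult the data format section of the repository for the authoritative schema.

```python
import json

# Read one illustrative row from the train split (path and field names are
# assumptions; see the repository's data format section for the actual schema).
with open("data/train.jsonl") as f:
    row = json.loads(next(f))

# A row along the lines of:
# {"SubjectEntityID": "Q1020", "SubjectEntity": "Malawi",
#  "ObjectEntitiesID": ["Q924", "Q953", "Q1029"],
#  "ObjectEntities": ["Tanzania", "Zambia", "Mozambique"],
#  "Relation": "CountryBordersCountry"}
```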
Dataset Characteristics
For each of the 21 relations, the number of unique subject-entities in the train, dev, and test splits is given in the GitHub repo. The minimum and maximum numbers of object-entities for each relation are given below (a sketch for reproducing these counts appears after the table). If the minimum value is 0, the subject-entity can have zero valid object-entities for that relation.
Relation | Train [min, max] | Val [min, max] | Test [min, max] |
---|---|---|---|
BandHasMember | [2, 15] | [2, 16] | [2, 16] |
CityLocatedAtRiver | [1, 9] | [1, 5] | [1, 9] |
CompanyHasParentOrganisation | [0, 5] | [0, 3] | [0, 5] |
CompoundHasParts | [2, 6] | [2, 5] | [2, 6] |
CountryBordersCountry | [1, 17] | [1, 10] | [1, 17] |
CountryHasOfficialLanguage | [1, 16] | [1, 11] | [1, 16] |
CountryHasStates | [1, 20] | [1, 20] | [1, 20] |
FootballerPlaysPosition | [1, 2] | [1, 3] | [1, 2] |
PersonCauseOfDeath | [0, 1] | [0, 3] | [0, 1] |
PersonHasAutobiography | [1, 4] | [1, 4] | [1, 4] |
PersonHasEmployer | [1, 6] | [1, 13] | [1, 6] |
PersonHasNobelPrize | [0, 1] | [0, 2] | [0, 1] |
PersonHasNumberOfChildren | [1, 1] | [1, 2] | [1, 1] |
PersonHasPlaceOfDeath | [0, 1] | [0, 1] | [0, 1] |
PersonHasProfession | [1, 11] | [1, 12] | [1, 11] |
PersonHasSpouse | [1, 3] | [1, 3] | [1, 3] |
PersonPlaysInstrument | [1, 8] | [1, 8] | [1, 8] |
PersonSpeaksLanguage | [1, 10] | [1, 4] | [1, 10] |
RiverBasinsCountry | [1, 9] | [1, 5] | [1, 9] |
SeriesHasNumberOfEpisodes | [1, 2] | [1, 1] | [1, 2] |
StateBordersState | [1, 16] | [1, 12] | [1, 16] |
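The [min, max] counts above can be reproduced with a short script along the following lines, again assuming the illustrative JSON Lines layout and field names from the data format sketch above.

```python
import json
from collections import defaultdict

# Collect the number of ground-truth objects per relation
# (field names are assumptions; adapt to the actual schema).
object_counts = defaultdict(list)
with open("data/train.jsonl") as f:
    for line in f:
        row = json.loads(line)
        object_counts[row["Relation"]].append(len(row["ObjectEntitiesID"]))

for relation, counts in sorted(object_counts.items()):
    print(relation, [min(counts), max(counts)])
```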
Task Evaluation
For each test instance, predictions are evaluated by calculating precision and recall against the ground-truth values. The final macro-averaged F1-score is used to rank the participating systems.
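As a rough sketch, these metrics can be computed per instance from the sets of predicted and ground-truth object IDs and then averaged; the empty-set conventions below are assumptions, and the evaluate.py script in the GitHub repo is the authoritative implementation.

```python
def precision_recall(pred: set[str], gold: set[str]) -> tuple[float, float]:
    """Set-based precision/recall for one instance (empty-set handling is an assumption)."""
    if not pred and not gold:           # correctly predicting "no objects"
        return 1.0, 1.0
    hits = len(pred & gold)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall

def macro_f1(instances: list[tuple[set[str], set[str]]]) -> float:
    """Macro-average the per-instance F1-scores."""
    f1_scores = []
    for pred, gold in instances:
        p, r = precision_recall(pred, gold)
        f1_scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return sum(f1_scores) / len(f1_scores)
```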
Baselines
We provide several baselines:
- Standard prompt for HuggingFace models with Wikidata default disambiguation: These baselines can be instantiated with various HuggingFace models (e.g., BERT, OPT), generate entity surface forms, and use the Wikidata entity disambiguation API to generate IDs (a sketch of this disambiguation step is given after this list).
- Few-shot GPT-3 directly predicting IDs: This baseline uses a few samples to instruct GPT-3 to directly predict Wikidata IDs.
- Few-shot GPT-3 w/ NED: Like above, but predicting surface forms that are disambiguated via Wikidata's default disambiguation.
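The disambiguation step used by the first and third baselines can be sketched roughly as follows, assuming the public Wikidata wbsearchentities API; the actual baseline code in the repository may differ in its details.

```python
import requests

def disambiguate(surface_form: str) -> str | None:
    """Map a predicted surface form to the top-ranked Wikidata entity ID, or None."""
    response = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "wbsearchentities",
            "search": surface_form,
            "language": "en",
            "format": "json",
        },
        timeout=10,
    )
    results = response.json().get("search", [])
    return results[0]["id"] if results else None

# e.g., disambiguate("Rhine") is expected to return "Q584"
```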
Method | Avg. Precision | Avg. Recall | Avg. F1-score |
---|---|---|---|
GPT-3 NED (Curie model) | 0.308 | 0.210 | 0.218 |
GPT-3 IDs directly (Curie model) | 0.126 | 0.060 | 0.061 |
BERT | 0.368 | 0.161 | 0.142 |
Submission Details
Participants are required to submit:
- A system implementing the LM probing approach, uploaded to a public GitHub repo
- The output for the test dataset subject entities, in the same GitHub repo
- A system description in PDF format (5-12 pages, CEUR workshop style), mentioning the GitHub repo.
The PDF must be uploaded on OpenReview. Additionally, there is an optional CodaLab live leaderboard that participants can submit to. The test dataset is initially hidden to preserve the integrity of results, and will be released 1 week before the final deadline. The output files for the test subject-entities must be formatted as described here, and submitted along with the system and its description. The top-performing systems will get an opportunity to present their ideas and results during the conference, and the challenge proceedings will be submitted to the CEUR publication system.
Organizers
Huawei Technology R&D UK