- 16-11-2022: Proceedings published at https://ceur-ws.org/Vol-3274
- 16-08-2022: Winning systems in each track announced.
- 04-07-2022: Final deadline extended until July 26, 23:59:59 AoE.
- 11-07-2022: Test subject-entities have been released here. Submit your predictions on CodaLab to get a confirmed score now (optional; the final leaderboard will be split by track, and multiple submissions are possible).
- 04-07-2022: Submission deadline extended to July 21 (to accommodate below changes).
- 02-07-2022: The data format and evaluation scripts have been updated; please pull again (and read our announcement here).
Pre-trained language models (LMs) have advanced a range of semantic tasks and have also shown promise for extracting knowledge from the models themselves. Although several works have explored this ability in a setting called LM probing, using prompting or prompt-based learning (Liu et al., 2021), the viability of constructing knowledge bases from LMs has not yet been explored. In this challenge, we invite participants to build actual knowledge bases from LMs for given subjects and relations. Unlike existing probing benchmarks such as LAMA (Petroni et al., 2019), we make no simplifying assumptions about relation cardinalities, i.e., a subject-entity can stand in relation with zero, one, or many object-entities. Furthermore, submissions need to go beyond just ranking the predictions and must make concrete decisions about which outputs to materialize. The outputs are evaluated using the established F1-score KB metric.
Formally, given an input subject-entity ($s$) and relation ($r$), the task is to predict all correct object-entities ($\{o_1, o_2, \ldots, o_k\}$) using LM probing.
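As a minimal sketch of what "materializing outputs" can look like, the Python below fills a prompt template with the subject-entity and keeps every scored [MASK] filler at or above a threshold, mirroring the 0.5 cutoff of the baseline described in the evaluation section. The `PROMPTS` table, `build_prompt`, and `materialize` are hypothetical names, and the candidate scores are made up rather than produced by an actual LM.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical template table; the wording follows the baseline's sample
# prompt "China shares border with [MASK]".
PROMPTS: Dict[str, str] = {
    "CountryBordersWithCountry": "{subject} shares border with [MASK]",
}


def build_prompt(subject: str, relation: str) -> str:
    """Insert the subject-entity into the relation's prompt template."""
    return PROMPTS[relation].format(subject=subject)


def materialize(candidates: List[Tuple[str, float]], threshold: float = 0.5) -> Set[str]:
    """Keep every (token, score) candidate whose score is >= threshold.

    The result may be empty, which is how a system answers "no objects".
    """
    return {token for token, score in candidates if score >= threshold}


if __name__ == "__main__":
    print(build_prompt("Norway", "CountryBordersWithCountry"))
    # prints: Norway shares border with [MASK]
    # Made-up scores standing in for LM predictions at the [MASK] position.
    print(materialize([("Sweden", 0.71), ("Finland", 0.14), ("Russia", 0.05)]))
    # prints: {'Sweden'}
    print(materialize([("Denmark", 0.30), ("Sweden", 0.25)]))
    # prints: set()
```

If the scores are softmax probabilities at a single [MASK] position, they sum to 1, so a 0.5 cutoff can keep at most two candidates (and two only if both score exactly 0.5); subjects with many objects therefore call for a different selection rule.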
The challenge comes with two tracks:
- Track 1: a BERT (BERT-base or BERT-large) track with low computational requirements.
- Track 2: an open track, where participants can use any LM of their choice (e.g., RoBERTa, Transformer-XL, GPT-2, BART).
|Track||System||Authors|
|1||Task-specific Pre-training and Prompt Decomposition for Knowledge Graph Population with Language Models||Tianyi Li, Wenyu Huang, Nikos Papasarantopoulos, Pavlos Vougiouklis, Jeff Z. Pan|
|2||Prompting as Probing: Using Language Models for Knowledge Base Construction||Dimitrios Alivanistos, Selene Baez Santamaria, Michael Cochez, Jan-Christoph Kalo, Thiviyan Thanapalasingam, Emile van Krieken|
We release a dataset (train and development splits) for a diverse set of 12 relations, each covering a different set of subject-entities, along with the complete list of ground-truth object-entities per subject-relation pair. The number of object-entities varies across subject-relation pairs. The subject-relation-object triples in the train split can be used for training or probing the language models in any form, while the development split can be used for hyperparameter tuning. Further details on the relations, with example triples, are given below:
(Canada, CountryBordersWithCountry, [[USA, United States of America]]) (Norway, CountryBordersWithCountry, [Finland, Sweden, Russia]) (Mauritius, CountryBordersWithCountry, [])
(Belarus, CountryOfficialLanguage, [Belarusian, Russian]) (Seychelles, CountryOfficialLanguage, [French, English, [Seychellois Creole, Creole]]) (Bosnia and Herzegovina, CountryOfficialLanguage, [Bosnian, Serbian, Croatian])
(Elbe, RiverBasinsCountry, [Germany, Poland, Austria, [Czech Republic, Czechia]]) (Drin, RiverBasinsCountry, [Albania]) (Chari, RiverBasinsCountry, [[Central African Republic, Africa], Cameroon, Chad])
(Oregon, StateSharesBorderState, [California, Idaho, Washington, Nevada]) (Florida, StateSharesBorderState, [Georgia]) (Mexico City, StateSharesBorderState, [[State of Mexico, Mexico], Morelos])
(Water, ChemicalCompoundElement, [Hydrogen, Oxygen]) (Borax, ChemicalCompoundElement, [Boron, Oxygen, Sodium]) (Calomel, ChemicalCompoundElement, [Mercury, Chlorine])
(Chester Bennington, PersonInstrument, []) (Ringo Starr, PersonInstrument, [Guitar, Drum, [Percussion Instrument, Percussion]]) (Leeteuk, PersonInstrument, [Piano])
(Bruno Mars, PersonLanguage, [Spanish, English]) (Aamir Khan, PersonLanguage, [Hindi, Urdu, English]) (Alicia Keys, PersonLanguage, [English])
(Susan Wojcicki, PersonEmployer, [Google]) (Steve Wozniak, PersonEmployer, [[Apple Inc, Apple], [Hewlett-Packard, HP], University of Technology Sydney, [Atari, Atari Inc]]) (Jacqueline Novogratz, PersonEmployer, [UNICEF, World Bank, Chase Bank])
(Nicolas Sarkozy, PersonProfession, [Lawyer, Politician, Statesperson]) (Shakira, PersonProfession, [[Singer-Songwriter, Singer, Songwriter], Guitarist]) (Eminem, PersonProfession, [Rapper])
(Elvis Presley, PersonPlaceOfDeath, [Graceland]) (Kofi Annan, PersonPlaceOfDeath, [Bern]) (Angela Merkel, PersonPlaceOfDeath, [])
(John Lewis, PersonCauseOfDeath, [[Pancreatic Cancer, Cancer]]) (Pierre Nkurunziza, PersonCauseOfDeath, [[Covid-19, Covid]]) (Neil deGrasse Tyson, PersonCauseOfDeath, [])
(Apple Inc, CompanyParentOrganization, []) (Abarth, CompanyParentOrganization, [[Stellantis Italy, Stellantis]]) (Hitachi, CompanyParentOrganization, [])
Each row in the dataset files constitutes one triple consisting of (1) a subject-entity, (2) a relation, and (3) the list of all valid object-entities. For (3), we sometimes provide multiple aliases for an object-entity, in which case outputting any one of them is sufficient for that entity. In particular, to facilitate the use of LMs like BERT (which are constrained to single-token predictions), we provide a valid single-token form for multi-token object-entities wherever such a form is meaningful. Please read the Data format section for more details. When a subject has no valid objects, the ground truth is an empty list, e.g., (Apple Inc, CompanyParentOrganization, []).
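The alias convention can be sketched in Python as follows, assuming rows are already parsed into Python lists in which an alias group is a nested list, as in the example triples above. `to_alias_groups` and `matched_entities` are hypothetical helpers, and exact, case-sensitive string comparison is an assumption of this sketch; the released evaluation script may normalize strings differently.

```python
from typing import List, Set, Union

# An object-entity is a single string or a nested list of interchangeable
# aliases, e.g. ["Czech Republic", "Czechia"]; any one alias suffices.
ObjectEntry = Union[str, List[str]]


def to_alias_groups(objects: List[ObjectEntry]) -> List[Set[str]]:
    """Normalize a ground-truth object list to one alias set per entity."""
    return [{obj} if isinstance(obj, str) else set(obj) for obj in objects]


def matched_entities(predictions: Set[str], objects: List[ObjectEntry]) -> int:
    """Count ground-truth entities hit by at least one predicted alias.

    Uses exact string matching (an assumption of this sketch).
    """
    return sum(1 for group in to_alias_groups(objects) if group & predictions)


if __name__ == "__main__":
    elbe = ["Germany", "Poland", "Austria", ["Czech Republic", "Czechia"]]
    print(matched_entities({"Czechia", "Germany", "France"}, elbe))  # 2
    print(matched_entities({"Czech Republic", "Czechia"}, elbe))  # 1
    print(matched_entities(set(), []))  # 0, e.g. Apple Inc's empty list
```

Because each entity is reduced to one alias set, predicting two aliases of the same entity still counts as finding a single entity.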
For each of the 12 relations, the number of unique subject-entities in the train, dev, and test splits is 100, 50, and 50, respectively. The minimum and maximum numbers of object-entities per subject are given below for each relation, as [min, max] per split. A minimum of 0 means that a subject-entity can have zero valid object-entities for that relation.
|Relation||Train||Dev||Test|
|CountryBordersWithCountry||[0, 17]||[0, 14]||[0, 11]|
|CountryOfficialLanguage||[1, 4]||[1, 15]||[1, 11]|
|StateSharesBorderState||[1, 14]||[1, 15]||[1, 14]|
|RiverBasinsCountry||[1, 6]||[1, 10]||[1, 9]|
|ChemicalCompoundElement||[2, 6]||[2, 6]||[2, 6]|
|PersonLanguage||[1, 6]||[1, 5]||[1, 7]|
|PersonProfession||[1, 23]||[1, 19]||[1, 20]|
|PersonInstrument||[0, 7]||[0, 14]||[0, 7]|
|PersonEmployer||[1, 8]||[1, 8]||[1, 8]|
|PersonPlaceOfDeath||[0, 1]||[0, 1]||[0, 1]|
|PersonCauseOfDeath||[0, 3]||[0, 2]||[0, 2]|
|CompanyParentOrganization||[0, 5]||[0, 1]||[0, 3]|
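Ranges like those in the table can be recomputed from parsed rows with a few lines of Python. This is a minimal sketch assuming each row is a `(subject, relation, objects)` tuple in which a nested alias list counts as one object-entity; `cardinality_ranges` is a hypothetical helper, and the demo uses only a handful of the example triples, so its numbers are not the split statistics above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (subject-entity, relation, object-entities); a nested alias list is one entity.
Row = Tuple[str, str, list]


def cardinality_ranges(rows: Iterable[Row]) -> Dict[str, List[int]]:
    """Return {relation: [min, max]} object-entity counts per subject."""
    counts: Dict[str, List[int]] = defaultdict(list)
    for _subject, relation, objects in rows:
        counts[relation].append(len(objects))
    return {rel: [min(c), max(c)] for rel, c in counts.items()}


if __name__ == "__main__":
    rows = [
        ("Canada", "CountryBordersWithCountry", [["USA", "United States of America"]]),
        ("Norway", "CountryBordersWithCountry", ["Finland", "Sweden", "Russia"]),
        ("Mauritius", "CountryBordersWithCountry", []),
        ("Water", "ChemicalCompoundElement", ["Hydrogen", "Oxygen"]),
        ("Borax", "ChemicalCompoundElement", ["Boron", "Oxygen", "Sodium"]),
    ]
    print(cardinality_ranges(rows))
    # {'CountryBordersWithCountry': [0, 3], 'ChemicalCompoundElement': [2, 3]}
```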
We use a standard KBC evaluation metric, the macro-averaged F1-score (the harmonic mean of precision and recall), to compare the predicted object-entities with the true object-entities on the hidden test dataset. We release a baseline implementation and an evaluation script. The baseline model probes the BERT language model using a sample prompt like "China shares border with [MASK]" and outputs the object-entities predicted at the [MASK] position with a likelihood of at least 0.5. This baseline achieves a 31.08% F1-score (averaged across all subject-entities and relations) on the hidden test dataset. Participants can use the evaluation script to compute the F1-score when assessing the performance of their systems. For more details on LM probing and the baseline method, please check the released notebook.
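A sketch of the macro-averaged F1 in Python, assuming gold and predicted objects are keyed by `(subject, relation)` and aliases are matched by exact string comparison. The helpers `pair_scores` and `macro_f1` are hypothetical, and the empty-list handling is an assumption: an empty prediction for an empty ground truth scores 1.0, while an empty prediction or an empty ground truth on its own scores 0.0. The released evaluation script remains the authoritative definition. Since every relation has 50 test subjects, averaging over all pairs equals averaging the per-relation averages.

```python
from typing import Dict, List, Set, Tuple, Union

ObjectEntry = Union[str, List[str]]  # plain object or list of aliases
Key = Tuple[str, str]  # (subject-entity, relation)


def pair_scores(predictions: Set[str], objects: List[ObjectEntry]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one subject-relation pair."""
    groups = [{o} if isinstance(o, str) else set(o) for o in objects]
    if not predictions and not groups:
        return 1.0, 1.0, 1.0  # assumed: correctly predicting "no objects"
    if not predictions or not groups:
        return 0.0, 0.0, 0.0  # assumed convention for the undefined cases
    gold_aliases = set().union(*groups)
    # A prediction is correct if it matches any alias of any gold entity.
    correct = sum(1 for p in predictions if p in gold_aliases)
    # A gold entity is found if any of its aliases was predicted.
    found = sum(1 for g in groups if g & predictions)
    precision = correct / len(predictions)
    recall = found / len(groups)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def macro_f1(preds: Dict[Key, Set[str]], gold: Dict[Key, List[ObjectEntry]]) -> float:
    """Average per-pair F1 over every gold pair; missing predictions count as empty."""
    scores = [pair_scores(preds.get(key, set()), objs)[2] for key, objs in gold.items()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    gold = {
        ("Norway", "CountryBordersWithCountry"): ["Finland", "Sweden", "Russia"],
        ("Mauritius", "CountryBordersWithCountry"): [],
        ("Elbe", "RiverBasinsCountry"): ["Germany", "Poland", "Austria", ["Czech Republic", "Czechia"]],
    }
    preds = {
        ("Norway", "CountryBordersWithCountry"): {"Sweden", "Denmark"},  # F1 = 0.4
        ("Mauritius", "CountryBordersWithCountry"): set(),  # F1 = 1.0
        ("Elbe", "RiverBasinsCountry"): {"Germany", "Czechia"},  # F1 = 2/3
    }
    print(round(macro_f1(preds, gold), 4))  # 0.6889
```

In this sketch, predicting two aliases of the same entity counts both predictions as correct; a stricter scorer could instead penalize the duplicate.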
Participants are required to submit:
- A system implementing the LM probing approach
- The outputs for the test-dataset subject-entities
- A system description in PDF format (5-12 pages, LNCS style).
All materials must be uploaded to EasyChair. Additionally, there will be an optional CodaLab live leaderboard to which participants can submit. The test dataset is initially hidden to preserve the integrity of the results and will be released 10 days before the final deadline. The output files for the test subject-entities must be formatted as described here and submitted along with the system and its description. The top-performing systems will get an opportunity to present their ideas and results at the ISWC 2022 conference, and the challenge proceedings will be submitted to the CEUR publication system.