What's New in 2026
Open-Weight Models
Use any open-weight system up to a 32B total neural parameter budget. No single fixed model this year.
String Outputs
Predictions are evaluated as strings. No Wikidata entity disambiguation required.
Agentic Systems
Multi-step and agentic inference strategies are allowed within the same 32B budget.
At AKBC / EMNLP
For the first time, co-located with the AKBC workshop at EMNLP in Budapest.
Important Dates
| Activity | Date |
|---|---|
| Dataset (train & dev) release | April 20, 2026 |
| Dataset (test) release | TBA |
| Submission of test output, code, and system papers | August 15, 2026 |
| Acceptance and winner announcement | September 1, 2026 |
| Camera-ready papers | September 15, 2026 |
| Presentations @ EMNLP 2026 | October 2026 |
How to Participate
Build your system
Develop your approach and validate locally using the provided evaluation script.
Submit predictions
Submit your output file to the CodaLab leaderboard (link TBA).
Write your paper
Submit a system description paper and share your code publicly.
Task
Large language models contain a substantial amount of factual knowledge. Turning that knowledge into reliable knowledge base entries, however, is much harder than answering a single factual question.
Given a subject s and a relation r, predict the complete set of correct object strings {o1, o2, …, ok}. Unlike standard factual QA, a subject may have zero, one, or many correct objects. The goal is to construct a complete and precise KB entry.
Examples
Subject: Dominican Republic
Relation: countryLandBordersCountry
Output: {"Haiti"}
Subject: Mauritius
Relation: countryLandBordersCountry
Output: {}
Subject: Turing Award
Relation: awardWonBy
Output: {"Geoffrey Hinton", "Yoshua Bengio", …}
Dataset
The dataset is released in train, validation, and test splits, covering six relations with different structural properties.
| Relation | Train | Val | Test | Properties |
|---|---|---|---|---|
| countryLandBordersCountry | 68 | 68 | 67 | set-valued, null values |
| personHasCityOfDeath | 100 | 100 | 100 | null values |
| hasCapacity | 100 | 100 | 100 | numeric |
| awardWonBy | 10 | 10 | 10 | many objects |
| companyTradesAtStockExchange | 100 | 100 | 100 | null values |
| hasArea | 100 | 100 | 100 | numeric |
Dataset files, baseline code, and the evaluation script are available in the GitHub repository.
Evaluation & Output Format
Metrics
Submissions are evaluated using macro precision, macro recall, and macro F1 over predicted object sets. Predictions are evaluated as strings — no canonical entity identifiers required.
For string relations, predicted strings are normalized (lowercased, diacritics removed, punctuation stripped) and matched against the ground-truth label and its known aliases. For numeric relations (hasCapacity, hasArea), a prediction is counted as correct if it falls within a 5% relative tolerance of the ground-truth value.
The official evaluation script, evaluate.py, is provided in the challenge repository. Use it to validate your output locally before submission.
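The matching rules above can be sketched in a few lines. This is only an illustration of the described normalization, per-instance scoring, and 5% numeric tolerance; the official evaluate.py is authoritative, and the convention that an empty prediction against an empty gold set scores 1.0 is an assumption here.

```python
import string
import unicodedata

def normalize(s: str) -> str:
    """Lowercase, remove diacritics, strip punctuation (as described above)."""
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    s = s.translate(str.maketrans("", "", string.punctuation))
    return s.lower().strip()

def numeric_match(pred: float, gold: float, tol: float = 0.05) -> bool:
    """Correct if within 5% relative tolerance of the ground-truth value."""
    return abs(pred - gold) <= tol * abs(gold)

def set_prf(pred: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Precision/recall/F1 for one (subject, relation) instance.

    Assumed convention: empty prediction vs. empty gold set scores 1.0
    (the null-value case). Macro scores average these over all instances.
    """
    pred = {normalize(p) for p in pred}
    gold = {normalize(g) for g in gold}
    if not pred and not gold:
        return 1.0, 1.0, 1.0
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Note that alias matching against known alternative labels is handled by the official script and is omitted from this sketch.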
JSONL Format
Each line: one JSON object with SubjectEntity, Relation, ObjectEntities.
{"SubjectEntity": "Dominican Republic",
"Relation": "countryLandBordersCountry",
"ObjectEntities": ["Haiti"]}
{"SubjectEntity": "Mauritius",
"Relation": "countryLandBordersCountry",
"ObjectEntities": []}
Rules
Participants may use one or more open-weight neural models, provided that the total parameter count of all inference-time neural components does not exceed 32B parameters.
Closed-book setting. No web search, no RAG, no external factual corpora, no KB lookup for fact prediction.
No additional training. Fine-tuning, continued pretraining, or instruction tuning for this challenge is not allowed.
32B parameter limit across all neural components at inference time.
Quantization does not reduce the counted size.
MoE models are counted by total published parameter count, not active parameters.
Agentic systems and multi-step inference are allowed. Multiple models are summed toward the 32B limit.
Non-neural components (rule-based filtering, string normalization, deduplication, aggregation) are allowed.
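The budget rules above amount to simple bookkeeping: sum the total published parameter count of every neural component used at inference (all MoE experts, uncompressed by quantization) and compare against 32B. A trivial sketch, with hypothetical component names:

```python
# 32B inference-time budget from the challenge rules.
BUDGET = 32_000_000_000

def within_budget(components: dict[str, int]) -> bool:
    """True if the summed total parameter counts of all neural
    inference-time components fit the 32B budget. Counts must be the
    published totals (MoE: all experts; quantization does not shrink them)."""
    return sum(components.values()) <= BUDGET
```

For example, pairing a hypothetical 27B generator with a 4B verifier fits, while two mid-sized models totalling 37B does not.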
Submission
Leaderboard
Submit predictions to CodaLab (link TBA)
OpenReview
Submit your system paper on OpenReview
Code
Public GitHub repo linked in your paper
Local Eval
Validate with the eval script before submitting
Organizers
Jan-Christoph Kalo
University of Amsterdam
Tuan-Phong Nguyen
MPI for Informatics
Simon Razniewski
ScaDS.AI and TU Dresden
Bohui Zhang
King's College London