5th Edition · Shared Task

AKBC Shared Task 2026

Predicting complete knowledge base entries from language models
AKBC Workshop @ EMNLP 2026, Budapest

Closed-book
Open-weight ≤32B
6 Relations
String outputs
April 16, 2026: First website online. Check back for updates.

What's New in 2026

Open-Weight Models

Use any open-weight system up to a 32B total neural parameter budget. No single fixed model this year.

String Outputs

Predictions are evaluated as strings. No Wikidata entity disambiguation required.

Agentic Systems

Multi-step and agentic inference strategies are allowed within the same 32B budget.

At AKBC / EMNLP

For the first time, co-located with the AKBC workshop at EMNLP in Budapest.

Important Dates

Activity | Date
Dataset (train & dev) release | April 20, 2026
Dataset (test) release | TBA
Submission of test output, code, and system papers | August 15, 2026
Acceptance and winner announcement | September 1, 2026
Camera-ready papers | September 15, 2026
Presentations @ EMNLP 2026 | October 2026

How to Participate

1. Get the data

Download the dataset and baseline from the GitHub repository.

2. Build your system

Develop your approach and validate locally using the provided evaluation script.

3. Submit predictions

Submit your output file to the CodaLab leaderboard (link TBA).

4. Write your paper

Submit a system description paper and share your code publicly.

Task

Large language models contain a substantial amount of factual knowledge. Turning that knowledge into reliable knowledge base entries, however, is much harder than answering a single factual question.

Given a subject s and a relation r, predict the complete set of correct object strings {o1, o2, …, ok}. Unlike standard factual QA, a subject may have zero, one, or many correct objects. The goal is to construct a complete and precise KB entry.

Pipeline: Subject + Relation → Language Model → {Object Entity 1, Object Entity 2, …}

Examples

1 object

Subject: Dominican Republic

Relation: countryLandBordersCountry

Output: {"Haiti"}

0 objects

Subject: Mauritius

Relation: countryLandBordersCountry

Output: {}

n objects

Subject: Turing Award

Relation: awardWonBy

Output: {"Geoffrey Hinton", "Yoshua Bengio", …}

Dataset

The dataset is released in train, validation, and test splits, covering six relations with different structural properties.

Relation | Train | Val | Test | Properties
countryLandBordersCountry | 68 | 68 | 67 | set-valued, null values
personHasCityOfDeath | 100 | 100 | 100 | null values
hasCapacity | 100 | 100 | 100 | numeric
awardWonBy | 10 | 10 | 10 | many objects
companyTradesAtStockExchange | 100 | 100 | 100 | null values
hasArea | 100 | 100 | 100 | numeric

Dataset files, baseline code, and the evaluation script are available in the GitHub repository.
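Each split file uses the same JSONL record schema as submissions (one JSON object per line). As a minimal parsing sketch — the sample lines below are taken from the Examples section; the actual file names in the repository may differ:

```python
import json
from collections import Counter

def load_split(lines):
    """Parse JSONL records (one JSON object per line) into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]

# Two sample records in the released field schema.
sample = [
    '{"SubjectEntity": "Dominican Republic", "Relation": "countryLandBordersCountry", "ObjectEntities": ["Haiti"]}',
    '{"SubjectEntity": "Mauritius", "Relation": "countryLandBordersCountry", "ObjectEntities": []}',
]
rows = load_split(sample)
counts = Counter(r["Relation"] for r in rows)
```

In practice you would pass an open file handle (e.g. `load_split(open("train.jsonl", encoding="utf-8"))`) instead of the inline sample.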

Evaluation & Output Format

Metrics

Submissions are evaluated using macro precision, macro recall, and macro F1 over predicted object sets. Predictions are evaluated as strings — no canonical entity identifiers required.

For string relations, predicted strings are normalized (lowercased, diacritics removed, punctuation stripped) and matched against the ground-truth label and its known aliases. For numeric relations (hasCapacity, hasArea), a prediction is counted as correct if it falls within a 5% relative tolerance of the ground-truth value.

Evaluation Script: Available as evaluate.py in the challenge repository. Use it to validate locally before submission.
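The official evaluate.py defines the authoritative behavior; as an illustration only, here is a minimal Python sketch of the matching and per-instance scoring rules described above. The convention that an empty prediction against an empty gold set scores 1.0 is an assumption of this sketch:

```python
import string
import unicodedata

def normalize(s):
    """Lowercase, strip diacritics and punctuation -- mirrors the described string matching."""
    s = unicodedata.normalize("NFKD", s.lower())
    s = "".join(c for c in s if not unicodedata.combining(c))
    return s.translate(str.maketrans("", "", string.punctuation)).strip()

def numeric_match(pred, gold, tol=0.05):
    """A numeric prediction within 5% relative tolerance of the gold value counts as correct."""
    if gold == 0:
        return pred == 0
    return abs(pred - gold) / abs(gold) <= tol

def instance_prf(pred_set, gold_set):
    """Precision/recall/F1 for one subject-relation pair.
    Assumption: empty prediction vs. empty gold set is a perfect match."""
    if not pred_set and not gold_set:
        return 1.0, 1.0, 1.0
    tp = len(pred_set & gold_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Macro scores are then the mean of these per-instance values across all test rows.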

JSONL Format

Each line is one JSON object with the keys SubjectEntity, Relation, and ObjectEntities:

{"SubjectEntity": "Dominican Republic", "Relation": "countryLandBordersCountry", "ObjectEntities": ["Haiti"]}
{"SubjectEntity": "Mauritius", "Relation": "countryLandBordersCountry", "ObjectEntities": []}

Rules

Participants may use one or more open-weight neural models, provided that the total parameter count of all inference-time neural components does not exceed 32B parameters.

Data & Retrieval

Closed-book setting. No web search, no RAG, no external factual corpora, no KB lookup for fact prediction.

No additional training. Fine-tuning, continued pretraining, or instruction tuning for this challenge is not allowed.

Model Budget

32B parameter limit across all neural components at inference time.

Quantization does not reduce the counted size.

MoE models are counted by total published parameter count, not active parameters.

Systems & Processing

Agentic systems and multi-step inference are allowed. Multiple models are summed toward the 32B limit.

Non-neural components (rule-based filtering, string normalization, deduplication, aggregation) are allowed.
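As an illustration of such a non-neural step, a hypothetical deduplication and filtering helper (the null-answer markers to strip are an assumption, not part of the official baseline):

```python
def postprocess(raw_answers):
    """Deduplicate case-insensitively and drop empty or null-like model outputs.
    Keeps the first-seen surface form of each answer."""
    null_markers = {"none", "n/a", "unknown"}  # assumed markers, adjust per model
    seen, out = set(), []
    for a in raw_answers:
        a = a.strip().strip('"')
        key = a.lower()
        if a and key not in seen and key not in null_markers:
            seen.add(key)
            out.append(a)
    return out
```

A step like this runs after model inference and does not count toward the 32B parameter budget.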

Submission

Leaderboard

Submit predictions to CodaLab (link TBA)

OpenReview

Submit your system paper on OpenReview

Code

Public GitHub repo linked in your paper

Local Eval

Validate with the eval script before submitting

Organizers

Jan-Christoph Kalo

University of Amsterdam

Tuan-Phong Nguyen

MPI for Informatics

Simon Razniewski

ScaDS.AI and TU Dresden

Bohui Zhang

King's College London

Contact

For questions and discussion, use the Google Group (link TBA).