5th Edition · Shared Task

AKBC Shared Task 2026

Predicting complete knowledge base entries from language models
AKBC Workshop @ EMNLP 2026, Budapest

Closed-book
Open-weight ≤32B
6 Relations
String outputs
April 16, 2026: First website online. Check back for updates.

What's New in 2026

Open-Weight Models

Use any open-weight system up to a 32B total neural parameter budget. No single fixed model this year.

String Outputs

Predictions are evaluated as strings. No Wikidata entity disambiguation required.

Agentic Systems

Multi-step and agentic inference strategies are allowed within the same 32B budget.

At AKBC / EMNLP

For the first time, co-located with the AKBC workshop at EMNLP in Budapest.

Important Dates

Activity | Date
Dataset (train & dev) release | April 20, 2026
Dataset (test) release | TBA
Submission of test output, code, and system papers | August 15, 2026
Acceptance and winner announcement | September 1, 2026
Camera-ready papers | September 15, 2026
Presentations @ EMNLP 2026 | October 2026

How to Participate

1. Get the data

Download the dataset and baseline from the GitHub repository.

2. Build your system

Develop your approach and validate locally using the provided evaluation script.

3. Submit predictions

Submit your output file to the CodaLab leaderboard (link TBA).

4. Write your paper

Submit a system description paper and share your code publicly.

Task

Large language models contain a substantial amount of factual knowledge. Turning that knowledge into reliable knowledge base entries, however, is much harder than answering a single factual question.

Given a subject s and a relation r, predict the complete set of correct object strings {o1, o2, …, ok}. Unlike standard factual QA, a subject may have zero, one, or many correct objects. The goal is to construct a complete and precise KB entry.

Pipeline: Subject + Relation → Language Model → {Object Entity 1, Object Entity 2, …}

Examples

1 object

Subject: Dominican Republic

Relation: countryLandBordersCountry

Output: {"Haiti"}

0 objects

Subject: Mauritius

Relation: countryLandBordersCountry

Output: {}

n objects

Subject: Turing Award

Relation: awardWonBy

Output: {"Geoffrey Hinton", "Yoshua Bengio", …}

Dataset

The dataset is released in train, validation, and test splits, covering six relations with different structural properties.

Relation | Train | Val | Test | Properties
countryLandBordersCountry | 68 | 68 | 67 | set-valued, null values
personHasCityOfDeath | 100 | 100 | 100 | null values
hasCapacity | 100 | 100 | 100 | numeric
awardWonBy | 10 | 10 | 10 | many objects
companyTradesAtStockExchange | 100 | 100 | 100 | null values
hasArea | 100 | 100 | 100 | numeric

Dataset files, baseline code, and the evaluation script are available in the GitHub repository.
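Each split file uses the same JSONL record schema as submissions (one JSON object per line). As a minimal parsing sketch — the sample lines below are taken from the Examples section; the actual file names in the repository may differ:

```python
import json
from collections import Counter

def load_split(lines):
    """Parse JSONL records (one JSON object per line) into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]

# Two sample records in the released field schema.
sample = [
    '{"SubjectEntity": "Dominican Republic", "Relation": "countryLandBordersCountry", "ObjectEntities": ["Haiti"]}',
    '{"SubjectEntity": "Mauritius", "Relation": "countryLandBordersCountry", "ObjectEntities": []}',
]
rows = load_split(sample)
counts = Counter(r["Relation"] for r in rows)
```

In practice you would pass an open file handle (e.g. `load_split(open("train.jsonl", encoding="utf-8"))`) instead of the inline sample.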

Evaluation & Output Format

Metrics

Submissions are evaluated using macro precision, macro recall, and macro F1 over predicted object sets. Predictions are evaluated as strings — no canonical entity identifiers required.

For string relations, predicted strings are normalized (lowercased, diacritics removed, punctuation stripped) and matched against the ground-truth label and its known aliases. For numeric relations (hasCapacity, hasArea), a prediction is counted as correct if it falls within a 5% relative tolerance of the ground-truth value.

Evaluation Script: Available as evaluate.py in the challenge repository. Use it to validate locally before submission.
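The official evaluate.py defines the authoritative behavior; as an illustration only, here is a minimal Python sketch of the matching and per-instance scoring rules described above. The convention that an empty prediction against an empty gold set scores 1.0 is an assumption of this sketch:

```python
import string
import unicodedata

def normalize(s):
    """Lowercase, strip diacritics and punctuation -- mirrors the described string matching."""
    s = unicodedata.normalize("NFKD", s.lower())
    s = "".join(c for c in s if not unicodedata.combining(c))
    return s.translate(str.maketrans("", "", string.punctuation)).strip()

def numeric_match(pred, gold, tol=0.05):
    """A numeric prediction within 5% relative tolerance of the gold value counts as correct."""
    if gold == 0:
        return pred == 0
    return abs(pred - gold) / abs(gold) <= tol

def instance_prf(pred_set, gold_set):
    """Precision/recall/F1 for one subject-relation pair.
    Assumption: empty prediction vs. empty gold set is a perfect match."""
    if not pred_set and not gold_set:
        return 1.0, 1.0, 1.0
    tp = len(pred_set & gold_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

Macro scores are then the mean of these per-instance values across all test rows.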

JSONL Format

Each line is one JSON object with the keys SubjectEntity, Relation, and ObjectEntities:

{"SubjectEntity": "Dominican Republic", "Relation": "countryLandBordersCountry", "ObjectEntities": ["Haiti"]}
{"SubjectEntity": "Mauritius", "Relation": "countryLandBordersCountry", "ObjectEntities": []}

Rules

Participants may use one or more open-weight neural models, provided that the total parameter count of all inference-time neural components does not exceed 32B parameters.

Data & Retrieval

Closed-book setting. No web search, no RAG, no external factual corpora, no KB lookup for fact prediction.

No additional training. Fine-tuning, continued pretraining, or instruction tuning for this challenge is not allowed.

Model Budget

32B parameter limit across all neural components at inference time.

Quantization does not reduce the counted size.

MoE models are counted by total published parameter count, not active parameters.

Systems & Processing

Agentic systems and multi-step inference are allowed. Multiple models are summed toward the 32B limit.

Non-neural components (rule-based filtering, string normalization, deduplication, aggregation) are allowed.
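As an illustration of such a non-neural step, a hypothetical deduplication and filtering helper (the null-answer markers to strip are an assumption, not part of the official baseline):

```python
def postprocess(raw_answers):
    """Deduplicate case-insensitively and drop empty or null-like model outputs.
    Keeps the first-seen surface form of each answer."""
    null_markers = {"none", "n/a", "unknown"}  # assumed markers, adjust per model
    seen, out = set(), []
    for a in raw_answers:
        a = a.strip().strip('"')
        key = a.lower()
        if a and key not in seen and key not in null_markers:
            seen.add(key)
            out.append(a)
    return out
```

A step like this runs after model inference and does not count toward the 32B parameter budget.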

Submission

Leaderboard

Submit predictions to CodaLab (link TBA)

OpenReview

Submit your system paper on OpenReview

Code

Public GitHub repo linked in your paper

Local Eval

Validate with the eval script before submitting

Organizers

Jan-Christoph Kalo

University of Amsterdam

Tuan-Phong Nguyen

MPI for Informatics

Simon Razniewski

ScaDS.AI and TU Dresden

Bohui Zhang

King's College London

Contact

For questions and discussion, use the Google Group (link TBA).