Knowledge Base Construction from Pre-Trained Language Models

Workshop @ 24th International Semantic Web Conference (ISWC 2025)

Language models such as ChatGPT, BERT, and T5 have demonstrated remarkable results in numerous AI applications. Research has shown that these models implicitly capture vast amounts of factual knowledge within their parameters, resulting in remarkable performance on knowledge-intensive tasks. The seminal paper "Language Models as Knowledge Bases?" sparked interest in the spectrum between language models (LMs) and knowledge graphs (KGs), leading to a diverse range of research on the use of LMs for knowledge base construction, including (i) utilizing pre-trained LMs for knowledge base completion and construction tasks, (ii) performing information extraction tasks such as entity linking and relation extraction, and (iii) utilizing KGs to support LM-based applications.

The 3rd Workshop on Knowledge Base Construction from Pre-Trained Language Models (KBC-LM) aims to give space to the emerging academic community that investigates these topics, to host extended discussions around the LM-KBC Semantic Web challenge, and to enable informal exchange between researchers and practitioners.

Important Dates

Papers due: August 9, 2025 (extended from August 2)
Notification to authors: August 28, 2025
Camera-ready deadline: September 4, 2025
Workshop dates: TBA

Topics

We invite contributions on the following topics:

  • Entity recognition and disambiguation with LMs
  • Relation extraction with LMs
  • Zero-shot and few-shot knowledge extraction from LMs
  • Consistency of LMs
  • Knowledge consolidation with LMs
  • Comparisons of LMs for KBC tasks
  • Methodological contributions on training and fine-tuning LMs for KBC tasks
  • Evaluations of downstream capabilities of LM-based KGs in tasks like QA
  • Designing robust prompts for large language model probing

Submissions can be novel research contributions or already published papers (the latter will be presentation-only and not part of the workshop proceedings). Novel research papers can be either full papers (ca. 8-12 pages) or short papers presenting smaller or preliminary results (typically 3-6 pages). We also accept demo and position papers. Check out the LM-KBC challenge for further options to contribute to the workshop.

Submission and Review Process

Papers will be peer-reviewed by at least three researchers in a single-blind review process. Accepted papers will be published on CEUR (unless authors opt out). Submissions must be formatted according to this template. Please also email a hand-signed copyright form to simon.razniewski@tu-dresden.de (form if no GenAI was used / form if using GenAI).
Submit your papers on OpenReview.

Keynote Speakers

Bio: Juan Sequeda is the Principal Scientist and Head of the AI Lab at data.world. He holds a PhD in Computer Science from The University of Texas at Austin. Juan's research and industry work have been at the intersection of data and AI, with the goal of reliably creating knowledge from inscrutable data, specifically by designing and building knowledge graphs for enterprise data and metadata management. Juan is the co-author of the book "Designing and Building Enterprise Knowledge Graphs" and the co-host of Catalog and Cocktails, an honest, no-BS, non-salesy data podcast.

Juan has researched and developed technology in semantic data virtualization, graph data modeling, schema mapping, and data integration methodologies. He pioneered technology for constructing knowledge graphs from relational databases, resulting in W3C standards, research awards, patents, software, and his startup Capsenta, which was acquired by data.world in 2019. Juan is a recipient of the NSF Graduate Research Fellowship, received 2nd place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org, the Best Student Research Paper award at the 2014 International Semantic Web Conference (ISWC), the 2015 Best Transfer and Innovation Project award from the Institute for Applied Informatics, and the 2023 Best Industry Paper award at SIGMOD, and has twice more been nominated for best paper at ISWC.

Juan strives to build bridges between academia and industry as former co-chair of the LDBC Property Graph Schema Working Group, a member of the LDBC Graph Query Languages task force, and a standards editor at the World Wide Web Consortium (W3C). He remains an active member of the scientific community, serving on the editorial boards and program committees of scientific journals and conferences in the Semantic Web, Knowledge Graphs, Databases, and AI, and organizing various academic and industry conferences, including serving as General Chair of The ACM Web Conference 2023.



Toward the Era of Scientific AI: Adapting Large Language Models to Scientific Domains

This talk provides an overview of how large language models (LLMs) are being adapted to scientific domains. It begins by discussing recent trends in LLMs and highlights the development of the LLM-JP model series, fully open Japanese-centric LLMs built from scratch. The talk then introduces projects aimed at developing LLMs for the Japanese medical domain, as well as the design of model construction pipelines in the AI for Science area. Finally, it addresses current and future challenges related to these adaptations.

Bio: Akiko Aizawa is a professor and the director of the Digital Content and Media Sciences Research Division at the National Institute of Informatics (NII) in Japan. She is also an adjunct professor at the University of Tokyo and the Graduate University for Advanced Studies.

Aizawa’s research interests include natural language understanding, information retrieval, and scientific research infrastructure. She is currently working on developing a Japanese-centric medical Large Language Model as part of the Cross-Ministerial Strategic Innovation Promotion Program (SIP) focused on creating an “Integrated Health Care System.” Additionally, she serves as the Deputy Director of the Large Language Model Research and Development Center at NII.

Accepted Papers

  • OSKGC: A benchmark for Ontology Schema-based Knowledge Graph Construction from Text
    Dali Wang, Mizuho Iwaihara
  • Towards Evaluating Knowledge Graph Construction and Ontology Learning with LLMs without Test Data Leakage
    Heiko Paulheim
  • Text2KGBench-LettrIA: A Refined Benchmark for Text2Graph Systems
    Julien Plu, Oscar Moreno Escobar, Edouard Trouillez, Axelle Gapin, Pasquale Lisena, Thibault Ehrhart, Raphael Troncy
  • Noise or Nuance: An Investigation Into Useful Information and Filtering For LLM Driven AKBC
    Alex Clay, Ernesto Jiménez-Ruiz, Pranava Madhyastha
  • Towards Temporal Knowledge-Base Creation for Fine-Grained Opinion Analysis with Language Models
    Gaurav Negi, Atul Kr. Ojha, Omnia Zayed, Paul Buitelaar
  • Qualitative Coding in the Age of AI: An Ontology-Driven Approach
    Daniil Dobriy

Schedule

9:30-9:35 Welcome
9:35-10:20 Keynote 1: Akiko Aizawa
10:20-10:40 Best Paper (1x regular)
10:40-11:00 Coffee break
11:00-12:00 Paper Session 1 (3x regular)
12:20-14:00 Lunch break
14:00-14:45 Keynote 2: Juan Sequeda
14:45-14:55 LM-KBC Challenge Introduction
14:55-15:55 Paper Session 2 (challenge)
15:55-16:15 Coffee break
16:15-17:15 Paper Session 3 (3x regular)

Program Committee

  • Shrestha Ghosh, Eberhard-Karls-Universität Tübingen
  • Nicholas Popovic, Karlsruher Institut für Technologie
  • Hang Dong, University of Exeter
  • Hiba Arnaout, Technische Universität Darmstadt
  • Blerta Veseli, MPI-SWS
  • Ghanshyam Verma, University of Galway
  • Antonio De Santis, Polytechnic Institute of Milan
  • Sven Hertling, University of Mannheim
  • Janna Omeliyanenko, Bayerische Julius-Maximilians-Universität Würzburg
  • Fabian Hoppe, Vrije Universiteit Amsterdam
  • Remzi Celebi, Maastricht University
  • Manolis Koubarakis, National and Kapodistrian University of Athens
  • Maurice Funk, Universität Leipzig

Chairs

Jan-Christoph Kalo
University of Amsterdam
Simon Razniewski
ScaDS.AI & TU Dresden
Sneha Singhania
MPI Informatics
Duygu Sezen Islakoglu
Utrecht University