Empowering Wikidata editors and content with the Wikidata Quality Toolkit
09.08, 15:30–16:55 (Poland), Ohrid (9)(interpretation)
Language: English

This workshop presents the tools included in the Wikidata Quality Toolkit (WQT) and shows how to use them. The WQT consists of tools, built on top of recently published research, that aim to assist Wikidata editors in three of their daily tasks: recommending items for editors to work on (based on their expertise and edit history); detecting item references that do not support claims well (and suggesting ways of improving them); and automatically generating EntitySchemas to find items with missing information. We will demonstrate these tools, help editors use them, and gather feedback for further improvement.


The Wikidata Quality Toolkit (WQT, https://king-s-knowledge-graph-lab.github.io/WikidataQualityToolkit/) is a set of tools that various researchers, developers, and open knowledge activists and enthusiasts are currently working on to improve the quality of content in Wikidata and improve the workflows of editors in their everyday tasks. These tools are built on top of successful, recently published research about Wikidata (including a paper that received the Wikimedia Foundation Research of the Year Award in 2022). Our aim is to help transition these tools from the lab into the community and the world.

The WQT contains three tools covering the spectrum of Wikidata content quality and editor workflow improvement. Each tool addresses one task: verifying how well references support claims, automatically recommending items based on editor expertise, and generating schemas for item completion.

  • Reference Quality Verification (RQV). RQV provides an automated pipeline that verifies whether Knowledge Graph triples are supported by their documented sources. It involves text extraction, triple verbalization, sentence selection, and claim verification using rule-based methods and machine learning models. Users can verify the reference quality of a specific document or Wikidata item with this tool. Furthermore, the tool supports automatic verification of a batch of documents and Wikidata items.

  • Wikidata Game+. Wikidata Game+ builds upon the Wikidata Game by incorporating a novel recommendation system that provides personalised item recommendations for editors, relying on both item features and previous item-editor interactions. It builds representations of editors, item content, and item relations using matrix factorization, ELMo, and TransR embedding techniques.

  • EntitySchema Generator. There are numerous issues with Wikidata modeling and data quality, and inconsistent modeling of EntitySchemas is currently one of the most significant challenges. The EntitySchema Generator addresses this by using Large Language Models (LLMs) to generate reference patterns of EntitySchemas for specific topics of entities. By training on both good and bad examples, it can generate reference patterns and evaluate the quality of EntitySchemas. Additionally, it can modify inconsistent EntitySchemas based on the best generated patterns, and provide explanations and additional comments by leveraging the capabilities of LLMs.
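
To make the Reference Quality Verification stages listed above more concrete, here is a minimal Python sketch of how verbalization, sentence selection, and claim verification could fit together. It is not the WQT implementation: every name in it (`Triple`, `verbalize`, `select_sentences`, `verify`, `check_reference`) is hypothetical, text extraction is assumed to have already produced plain text, and simple token overlap and substring rules stand in for the rule-based methods and machine learning models used by the actual tool.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Triple:
    """A (subject, predicate, object) claim using human-readable labels."""
    subject: str
    predicate: str
    obj: str


def verbalize(triple: Triple) -> str:
    """Turn a triple into a plain sentence (toy template, an assumption)."""
    return f"{triple.subject} {triple.predicate} {triple.obj}"


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_sentences(claim: str, document: str, top_k: int = 2) -> list[str]:
    """Rank the source's sentences by token overlap with the verbalized claim."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    claim_tokens = tokens(claim)
    ranked = sorted(sentences, key=lambda s: len(claim_tokens & tokens(s)), reverse=True)
    return ranked[:top_k]


def verify(triple: Triple, evidence: list[str]) -> bool:
    """Rule-based stand-in: supported if one evidence sentence
    mentions both the subject and the object of the triple."""
    subj, obj = triple.subject.lower(), triple.obj.lower()
    return any(subj in s.lower() and obj in s.lower() for s in evidence)


def check_reference(triple: Triple, document: str) -> bool:
    """Run the toy pipeline: verbalize, select sentences, verify."""
    claim = verbalize(triple)
    evidence = select_sentences(claim, document)
    return verify(triple, evidence)


# Illustrative source text, assumed to be the output of text extraction.
source_text = (
    "Marie Curie was a physicist and chemist. "
    "Marie Curie was born in Warsaw in 1867. "
    "She later moved to Paris."
)

supported = check_reference(Triple("Marie Curie", "was born in", "Warsaw"), source_text)
unsupported = check_reference(Triple("Marie Curie", "was born in", "Paris"), source_text)
print(supported, unsupported)
```

In this sketch the first claim is reported as supported and the second is not, because no selected sentence mentions both "Marie Curie" and "Paris". A substring rule like this is easy to fool, which is one reason a real pipeline would rely on trained verification models rather than string matching alone.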

Session recording: https://youtu.be/BbGrkYK8FEk?list=PLhV3K_DS5YfJ1xyY0LNDNX3RKyRQEXOdB&t=22855


How does your session relate to the event themes: Collaboration of the Open?

The session reinforces "Collaboration of the Open" by leveraging synergies and building bridges for collaboration between three different open ecosystems: academia, which aims to improve scientific methods and provide innovation openly for the benefit of all society; open source developers, who can follow up on the outputs of academia (papers, early software) and turn them into scalable, globally accessible and usable tools; and the open knowledge community of the Wikidata/Wikimedia movement, who are the final users of such tools and through whom we hope to improve the quality of open knowledge resources.

What level of experience does the audience need for your session?

Everyone can participate in this session

How do you plan to deliver this session?

Hybrid with some participants in Katowice and others dialing in remotely

What other themes or topics does your session relate to? Please choose from the list of tags below.

Collaboration, Events, Grants, Product development

See also: Presentation slides

I am an Assistant Professor (Lecturer) in Computer Science at King’s College London, United Kingdom. My research revolves around culturally-informed Artificial Intelligence, in particular multimodal knowledge graphs, Web data APIs, music semantics, and knowledge representation and reasoning for digital humanities and cultural heritage.