There are many ways to contribute to and help improve DBpedia. Below you can find general ideas for contributing. In addition, feel free to post ideas for improving DBpedia in the DBpedia Forum and help us migrate ideas and approaches into this wiki.
You can improve the monthly releases in several ways:
mapping
(TODO: add a page that explains how to configure/improve a language)
Minidumps are small Wikipedia XML dumps used to test the extraction framework. Errors found in the big dumps can be reproduced on minidumps, which are tested on every commit: the defined tests run the extraction against the minidumps.
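To reproduce an error from a big dump on a minidump, the affected pages first have to be cut out of the big dump. The Python sketch below shows one way to do that with the standard library only. It assumes that a minidump is simply a MediaWiki XML export restricted to a handful of pages; `make_minidump` and `SAMPLE` are hypothetical names, not part of the extraction framework, which may ship its own tooling for creating minidumps.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def make_minidump(dump_xml: str, wanted_titles: set[str]) -> str:
    """Return a MediaWiki XML export that keeps only the pages in wanted_titles.

    Hypothetical helper: <siteinfo> and any other non-page children are kept,
    every <page> whose <title> is not listed is removed.
    """
    root = ET.fromstring(dump_xml)

    # Reuse the dump's namespace as the default namespace so the output does
    # not get auto-generated "ns0:" prefixes when it is serialized again.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])

    for child in list(root):  # iterate over a copy: we remove children below
        if local_name(child.tag) != "page":
            continue
        title = next(
            (el.text for el in child if local_name(el.tag) == "title"), None
        )
        if title not in wanted_titles:
            root.remove(child)

    return ET.tostring(root, encoding="unicode")


# A tiny, made-up export used only to illustrate the helper.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo><sitename>Wikipedia</sitename></siteinfo>
  <page><title>Berlin</title><ns>0</ns></page>
  <page><title>Leipzig</title><ns>0</ns></page>
</mediawiki>"""

if __name__ == "__main__":
    # Keeps <siteinfo> and the "Leipzig" page, drops "Berlin".
    print(make_minidump(SAMPLE, {"Leipzig"}))
```

This sketch loads the whole document into memory, which is fine for small inputs. For a real multi-gigabyte Wikipedia dump, a streaming approach such as `xml.etree.ElementTree.iterparse` would be needed instead.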
Here are several ways you can improve the testing:
Learn more on how Testing on Minidumps works or how to Integrate custom SHACL tests.
Archivo automatically crawls and tests ontologies, so check the info page of the DBpedia ontology on Archivo for the test results of each version. A red ✘ marks a failed test; clicking it opens the report listing the problems. Since most tests are SHACL tests, here is a quick guide to reading SHACL reports:
sh:ValidationResult is a failed test.
sh:focusNode points to the problematic node.
sh:resultPath is the property where the problem occurred.
sh:resultSeverity is the severity of the problem:
  sh:Info is just information and does not necessarily need a fix.
  sh:Warning is a non-critical warning and should probably be fixed.
  sh:Violation is a critical problem and should be fixed as soon as possible.
sh:sourceConstraintComponent points out what the problem is. For example, sh:NodeKindConstraintComponent means that the value reached from the focus node via the result path is not the right kind of value (e.g. an xsd:string literal instead of an IRI).
sh:resultMessage gives a short, human-readable explanation of the problem.

We are open to data contributions from the community. Feel free to contribute and publish your data on the DBpedia Databus using the Databus Maven Plugin.
Many datasets have already been contributed by the community. Here are a few examples:
TODO: add more datasets to the list
grep 'sameAs' from selected, vetted artifacts

TODO: this section does not fit here, consider moving somewhere else or …
Copied from an email to the DBpedia Board, as is
In most cases, data is currently fixed by consumers: people download the data or query the endpoint and then massage it locally. Mappings.dbpedia.org was a first successful attempt to establish a principle where you could contribute to data quality at the source. Other datasets often still have no comparable model. Data is either read-only or tedious to edit, for example by copying values manually or by ad-hoc edits with bots, which is a bad reuse pattern compared to linking/mapping and automatic ingestion. Here is a list of steps we have designed in the last two years to elevate the mappings model: