DBpedia Development Wiki devilopment bible

MARVIN is the release bot that runs the automated monthly DBpedia releases on three different servers, one each for the generic, mappings, and wikidata extractions.

The repository at https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config can be used to fork the architecture for creating extensions, developing new extractors or debugging old ones.

Fixes and patches are deployed manually via git pull from the master branch of the DBpedia Extraction Framework.

The architecture and workflow can also be forked and adapted to completely different extractions and derive operations outside of the DBpedia framework.


We thank Sören Auer and the Technische Informationsbibliothek (TIB) for providing three servers to run:

  • the main DBpedia extraction on a monthly basis
  • community-provided extractors on Wikipedia, Wikidata or other sources
  • enrichment, cleaning and parsing services, so-called Databus mods for open data on the Databus

This contribution by TIB to DBpedia & its community is a great push towards incentivizing Open Data and establishing a global and national research and innovation data infrastructure.


Downloading the wikimedia dumps


Update and Run the extraction


Deploy MARVIN on Databus


[Manual] Run Databus-Derive (clone and parse)

On the respective server there is a user marvin-fetch that has access to /data/derive, which contains the pom.xml of https://github.com/dbpedia/databus-maven-plugin/tree/master/dbpedia

# query to get all versions for derive, in XML syntax, to paste directly into pom.xml
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dct:    <http://purl.org/dc/terms/>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?derive WHERE {
    ?dataset dataid:group <https://databus.dbpedia.org/marvin/generic> .
    ?dataset dataid:artifact ?artifact .
    ?dataset dataid:version ?version .
    ?dataset dct:hasVersion "2019.08.30"^^xsd:string .
    BIND (CONCAT("<version>", STR(?artifact), "/${databus.deriveversion}</version>") AS ?derive)
}
ORDER BY ASC(?derive)
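The query above can also be run from the command line. The following is a minimal sketch, not part of MARVIN: the function names build_query and run_query are hypothetical, and the endpoint URL in SPARQL_ENDPOINT is an assumption that should be checked against the current Databus documentation before use.

```shell
#!/bin/sh
# Sketch only: build the derive query for a group and version, then send it with curl.
set -eu

# ASSUMPTION: the Databus SPARQL endpoint URL; override via the environment if it differs.
SPARQL_ENDPOINT="${SPARQL_ENDPOINT:-https://databus.dbpedia.org/repo/sparql}"

# Print the SPARQL query for a group IRI ($1) and a version string ($2).
build_query() {
  group="$1"
  version="$2"
  cat <<EOF
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dct:    <http://purl.org/dc/terms/>
PREFIX xsd:    <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?derive WHERE {
    ?dataset dataid:group <$group> .
    ?dataset dataid:artifact ?artifact .
    ?dataset dct:hasVersion "$version"^^xsd:string .
    BIND (CONCAT("<version>", STR(?artifact), "/\${databus.deriveversion}</version>") AS ?derive)
}
ORDER BY ASC(?derive)
EOF
}

# Send the query as a form-encoded POST and print the CSV result.
run_query() {
  curl -sS -f -H 'Accept: text/csv' \
    --data-urlencode "query=$(build_query "$1" "$2")" \
    "$SPARQL_ENDPOINT"
}
```

Calling run_query https://databus.dbpedia.org/marvin/generic 2019.08.30 would print one row per artifact. The CSV header and any quoting around the values need to be removed before pasting the rows into pom.xml.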
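# switch to the marvin-fetch user and attach to the existing tmux session "derive"
su marvin-fetch
tmux a -t derive
# prepare: $WHAT is the group folder (e.g. mappings), $NEWVERSION the version to derive (e.g. 2019.08.30)
cd "/data/derive/databus-maven-plugin/dbpedia/$WHAT"
git pull
mvn versions:set -DnewVersion="$NEWVERSION"
# run
mvn databus-derive:clone -Ddatabus.deriveversion="$NEWVERSION"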
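The prepare and run steps above could be wrapped in a small helper so that the group folder and version are checked before Maven is started. This is a sketch under stated assumptions, not an existing MARVIN script: derive_group, run, DRY_RUN and DERIVE_ROOT are hypothetical names, and the version check assumes versions always look like 2019.08.30.

```shell
#!/bin/sh
# Sketch only: wraps the manual prepare/run steps for one group folder.
set -eu

# Root folder taken from the manual steps; overridable for testing.
DERIVE_ROOT="${DERIVE_ROOT:-/data/derive/databus-maven-plugin/dbpedia}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# derive_group WHAT NEWVERSION
derive_group() {
  what="$1"
  newversion="$2"
  # ASSUMPTION: versions follow the YYYY.MM.DD pattern used in this document.
  case "$newversion" in
    [0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]) ;;
    *) echo "invalid version: $newversion" >&2; return 1 ;;
  esac
  # prepare
  run cd "$DERIVE_ROOT/$what"
  run git pull
  run mvn versions:set "-DnewVersion=$newversion"
  # run
  run mvn databus-derive:clone "-Ddatabus.deriveversion=$newversion"
}
```

With DRY_RUN=1 set, derive_group mappings 2019.08.30 only prints the four commands, which is a cheap way to check the arguments before starting a long-running derive inside the tmux session.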

[Manual] Pull data to the downloads.dbpedia.org server

Run the marvin-fetch.sh script in the databus/dbpedia folder:

cd /media/bigone/25TB/releases/databus-maven-plugin/dbpedia
./marvin-fetch.sh wikidata 2019.08.01
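If all three extractions need to be pulled, the call above could be repeated in a loop. The sketch below assumes that marvin-fetch.sh accepts generic and mappings as its first argument in the same way as wikidata, and that all three groups share one version; the source only shows the wikidata call, so both points should be verified first. fetch_all, FETCH_DIR and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Sketch only: call marvin-fetch.sh once per extraction group.
set -eu

# Folder taken from the manual step; overridable for testing.
FETCH_DIR="${FETCH_DIR:-/media/bigone/25TB/releases/databus-maven-plugin/dbpedia}"

# fetch_all VERSION
fetch_all() {
  version="$1"
  # ASSUMPTION: the script takes the same arguments for every group.
  for group in generic mappings wikidata; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: only show what would be executed.
      echo "./marvin-fetch.sh $group $version"
    else
      # Run in a subshell so the caller's working directory is unchanged.
      (cd "$FETCH_DIR" && ./marvin-fetch.sh "$group" "$version")
    fi
  done
}
```

Running fetch_all 2019.08.01 with DRY_RUN=1 prints the three calls without touching the server, which helps confirm the version string before a large transfer.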

Deploy cleaned files to DBpedia

cd /media/bigone/25TB/releases/databus-maven-plugin/dbpedia/mappings
mvn clean 
mvn validate
mvn -T 8 deploy