TODO Description.
TODO This.
Clone the current extraction-framework.
git clone https://github.com/dbpedia/extraction-framework
Enter the repository and install it (this can take up to 5 minutes).
mvn clean install
# mvn -T 4 clean install
The main properties of the extraction are configured in core/main/resources/universal.properties.
| Property | Description |
|---|---|
| base-dir | Location of the data, e.g. wikidumps and extracted data. |
| log-dir | Location of produced logging data. |
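
To check which values are currently set for the two properties above, a small helper such as the following can be used. This is a minimal sketch: the function name `show_dirs` and the `PROPS` variable are made up for illustration, and it assumes the file uses plain `key=value` lines.

```shell
# Path as named in the text above; override PROPS if your checkout differs.
PROPS="${PROPS:-core/main/resources/universal.properties}"

# Print the base-dir and log-dir entries of a properties file.
# Commented-out lines (starting with '#') are not matched.
show_dirs() {
  grep -E '^[[:space:]]*(base-dir|log-dir)[[:space:]]*=' "$1"
}

# Only run against the file if it exists in the current directory tree.
if [ -f "$PROPS" ]; then
  show_dirs "$PROPS"
fi
```

Run it from the repository root, or set `PROPS` to the full path of the properties file first.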
The maximum available memory for the mappingbased extraction can be set in dump/pom.xml.
<launcher>
<id>extraction</id>
<mainClass>org.dbpedia.extraction.dump.extract.Extraction</mainClass>
<jvmArgs>
...
<jvmArg>-Xmx16G</jvmArg>
...
</jvmArgs>
</launcher>
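
Since dump/pom.xml may contain more than one launcher, it can help to list every `-Xmx` setting with its line number before editing. The helper name `list_heap_settings` below is hypothetical; the sketch only relies on standard `grep`.

```shell
# List every line of a POM that contains an -Xmx setting, with line numbers.
# The '--' stops grep from treating the pattern as an option.
list_heap_settings() {
  grep -n -- '-Xmx' "$1"
}

# Run from the repository root; skipped if the file is not there.
if [ -f dump/pom.xml ]; then
  list_heap_settings dump/pom.xml
fi
```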
For a full mappingbased extraction it is recommended to have more than 100 GB of free disk space.
Furthermore, we recommend assigning at least 150 GB of memory to avoid exceptions for the larger languages (e.g. en-wiki and commons-wiki).
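
The two recommendations above can be checked before starting a long run. The following is a rough sketch, not part of the framework: `free_gb`, `mem_gb` and `DATA_DIR` are illustrative names, the memory check assumes Linux (`/proc/meminfo`), and `DATA_DIR` should point at whatever you configured as base-dir.

```shell
# Free space in GB (rounded down) on the filesystem holding directory $1.
# 'df -Pk' prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Total physical memory in GB (rounded down). Linux only: reads /proc/meminfo.
mem_gb() {
  awk '/^MemTotal:/ { print int($2 / 1024 / 1024) }' /proc/meminfo
}

DATA_DIR="${DATA_DIR:-.}"   # assumption: set this to your base-dir

# Warn if 100 GB or less is free on the data filesystem.
if [ "$(free_gb "$DATA_DIR")" -le 100 ]; then
  echo "warning: 100 GB or less free in $DATA_DIR" >&2
fi

# Warn if the machine has less than 150 GB of memory (skipped off Linux).
if [ -r /proc/meminfo ] && [ "$(mem_gb)" -lt 150 ]; then
  echo "warning: less than 150 GB of memory available" >&2
fi
```

The script only prints warnings and does not stop anything, so it is safe to run on a smaller machine when extracting a single small language.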
This section describes the setup and configuration of a default mappingbased extraction process. It will extract data from all languages that have entries in the mappings-wiki.
TODO Add: change languages=en in download.minimal.properties to languages=@mappings
# enter dump folder
cd dump/
# download all mappings-wiki related wikidumps
../run download download.10000.properties
# download ontology and mappings data
../run download-ontology
../run download-mappings
# start the extraction process
../run extraction extraction.mappings.properties
TODO Spellcheck.