TODO Description.
TODO This.
Clone the current extraction-framework.
git clone https://github.com/dbpedia/extraction-framework
Enter the repository and install it ( This can take up to 5 minutes ).
mvn clean install
# mvn -T 4 clean install
The main properties of the extraction are configured in core/main/resources/universal.properties
.
Property | Description |
---|---|
base-dir |
Location of the data, e.g. wikidumps and extracted data |
log-dir |
Location of produced logging data. |
The maximal avail memory for the mappingbased extraction can be assigned in dump/pom.xml
.
<launcher>
<id>extraction</id>
<mainClass>org.dbpedia.extraction.dump.extract.Extraction</mainClass>
<jvmArgs>
...
<jvmArg>-Xmx16G</jvmArg>
...
</jvmArgs>
</launcher>
For a full mappingbased extraction it is recommended to have more then 100 GB free disk space
.
Further we recommend to assign at least 150 GB of memory
, to avoid exceptions for the larger languages ( e.g. en-wiki and commons-wiki ).
This section describes the setup and configuration of a default mappingbased extraction process. It will extact data from all possible languages, that have an entries in the mappings-wiki.
TODO Add: change languages=en in download.minimal.properties to languages=@mappings
# enter dump folder
cd dump/
# download all mappings-wiki related wikidumps
../run download download.10000.properties
# dowload ontology and mappings data
../run download-ontology
../run download-mappings
# start the extraction process
../run extraction extraction.mappings.properties
TODO Spellcheck.