A loose list of post-processing procedures. See the current state of integration into the official DBpedia release at https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config/-/blob/master/functions.sh#L36
The script replaces all subjects/objects in a dataset by their transitive closure. All triples must use the same property. Cycles are removed. Example:
Wikipedia has a redirect mechanism that keeps links stable when article names change. However, this change is not reflected in wikitext. The goal of this and the next step is to canonicalize the occurrences of these IRIs by picking a representative (the last element in the redirect chain/path). This step computes the redirect chain (transitive links).
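The chain resolution described above can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual implementation: the function name `resolve_redirects` and the dictionary input are hypothetical, and "cycles are removed" is interpreted here as dropping every IRI whose chain runs into a cycle.

```python
def resolve_redirects(redirects):
    """Map every redirecting IRI to the last IRI of its redirect chain.

    `redirects` maps a source IRI to its direct redirect target.
    Assumption: a chain that runs into a cycle has no last element,
    so its source IRIs are dropped from the result.
    """
    resolved = {}
    for start in redirects:
        seen = {start}
        current = redirects[start]
        # Follow the chain until an IRI that does not redirect any further.
        while current in redirects:
            if current in seen:
                current = None  # cycle detected, no representative exists
                break
            seen.add(current)
            current = redirects[current]
        if current is not None:
            resolved[start] = current
    return resolved


if __name__ == "__main__":
    # Hypothetical example data: A -> B -> C is a chain, X <-> Y is a cycle.
    example = {"A": "B", "B": "C", "X": "Y", "Y": "X"}
    print(resolve_redirects(example))
```

In this sketch both `A` and `B` end up pointing at `C`, while `X` and `Y` do not appear in the result at all.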
Links:
For a usage example and parameter documentation, see the code.
Performs canonicalization of all object IRIs by replacing them with the target of their (transitive) redirects, e.g. http://dbpedia.org/resource/Barack_Obama_Jr will be replaced by http://dbpedia.org/resource/Barack_Obama. This step depends on the calculation of transitive redirects.
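As a rough sketch of this step, assume the transitive redirects are already available as a dictionary from redirecting IRI to its final target. The function name `map_object_iris` and the tuple representation of triples are hypothetical and only serve the illustration.

```python
def map_object_iris(triples, resolved):
    """Yield triples whose object IRI is replaced by its redirect target.

    `triples` is an iterable of (subject, predicate, object) tuples and
    `resolved` maps a redirecting IRI to the last IRI of its chain.
    Objects without an entry in `resolved` are passed through unchanged.
    """
    for subject, predicate, obj in triples:
        yield (subject, predicate, resolved.get(obj, obj))


if __name__ == "__main__":
    # The redirect pair is taken from the example in the text; the triple
    # itself is made up for illustration.
    resolved = {
        "http://dbpedia.org/resource/Barack_Obama_Jr":
            "http://dbpedia.org/resource/Barack_Obama",
    }
    triples = [(
        "http://dbpedia.org/resource/Michelle_Obama",
        "http://dbpedia.org/ontology/spouse",
        "http://dbpedia.org/resource/Barack_Obama_Jr",
    )]
    for triple in map_object_iris(triples, resolved):
        print(triple)
```

Only the object position is rewritten here, matching the description above; subjects and predicates are left untouched.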
For a usage example and parameter documentation, see the code.
Takes the mapping-based properties dataset and the assigned RDF types and tries to classify the statements as correct or wrong by checking domain and range against the property definitions in the DBpedia ontology. To work correctly, the object IRIs must be normalized first (MapObjectURIs post-processing).
Statements with predicate p whose subject resource has a different type than specified in rdfs:domain of p are passed to _disjointDomain files, whereas statements whose object resource is disjoint from rdfs:range are passed to _disjointRange files. Statements where the types match or are subtypes of the expected ones are passed to the regular dataset files (without content variant). See the code for more details.
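The routing described above might look roughly like the following simplified sketch. Everything in it is an assumption made for illustration: the names `classify`, `conforms`, and `superclasses` are hypothetical, the class hierarchy is modelled with a single parent per class, any type that is neither the expected class nor one of its subclasses is treated as a violation, the domain is checked before the range, and untyped resources or properties without a declared domain/range are treated as conforming. The actual code may decide these cases differently.

```python
def superclasses(cls, subclass_of):
    """Return cls together with all of its ancestors in the hierarchy."""
    result = set()
    while cls is not None and cls not in result:
        result.add(cls)
        cls = subclass_of.get(cls)
    return result


def conforms(resource, expected, types, subclass_of):
    """Check whether one of the resource's types is `expected` or a subtype."""
    if expected is None or resource not in types:
        return True  # assumption: nothing to check against
    return any(expected in superclasses(t, subclass_of)
               for t in types[resource])


def classify(triple, types, domain, range_, subclass_of):
    """Return the name of the output the statement would be routed to."""
    subject, predicate, obj = triple
    if not conforms(subject, domain.get(predicate), types, subclass_of):
        return "disjointDomain"
    if not conforms(obj, range_.get(predicate), types, subclass_of):
        return "disjointRange"
    return "regular"


if __name__ == "__main__":
    # Hypothetical miniature ontology and type assignments.
    subclass_of = {"City": "Place", "Politician": "Person"}
    domain = {"birthPlace": "Person"}
    range_ = {"birthPlace": "Place"}
    types = {"Berlin": {"City"}, "Obama": {"Politician"}}
    for t in [("Obama", "birthPlace", "Berlin"),
              ("Berlin", "birthPlace", "Obama"),
              ("Obama", "birthPlace", "Obama")]:
        print(t, classify(t, types, domain, range_, subclass_of))
```

The first statement is routed to the regular output because `Politician` is a subtype of `Person` and `City` a subtype of `Place`; the second fails the domain check and the third fails the range check.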