The DBpedia fusion process fuses internal and external data. Based on the ID management, it rewrites the original source IRIs to DBpedia global IDs. It then uses the DBpedia global IRI clusters to fuse and enrich the source datasets.
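The rewrite-then-fuse idea can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `GLOBAL_ID` lookup, the `global:berlin` identifier, the example IRIs, and the 4-tuple layout `(subject, predicate, object, source)` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical cluster lookup: source IRI -> global ID (both sides are made up).
GLOBAL_ID = {
    "http://en.example.org/resource/Berlin": "global:berlin",
    "http://de.example.org/resource/Berlin": "global:berlin",
}

def rewrite(triples, global_id=GLOBAL_ID):
    """Rewrite subject IRIs to their global ID; unknown IRIs stay unchanged."""
    return [(global_id.get(s, s), p, o, src) for s, p, o, src in triples]

def fuse(triples, global_id=GLOBAL_ID):
    """Group values and their source datasets by (global subject, predicate)."""
    fused = defaultdict(lambda: defaultdict(set))
    for s, p, o, src in rewrite(triples, global_id):
        fused[(s, p)][o].add(src)
    return fused

# Made-up input: the same entity described by two source datasets.
triples = [
    ("http://en.example.org/resource/Berlin", "ex:property", "value from en", "en.dbpedia"),
    ("http://de.example.org/resource/Berlin", "ex:property", "value from de", "de.dbpedia"),
]
fused = fuse(triples)
```

After the rewrite, both source descriptions share one subject, so their values end up side by side under the same `(subject, predicate)` key together with the datasets they came from.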
# Provenance data schema
{
  "@context": {
    "@base": "http://dbpedia.org/fusion/",
    "@vocab": "http://dbpedia.org/fusion/vocab#"
  },
  "subject": "IRI",
  "predicate": "IRI",
  "objects": [
    {
      "source": [
        {
          "@value": "graph/dataset"
        }
      ],
      "value": {
        "@value": "Literal value", "@type": "IRI"
        # xor
        "@id": "IRI"
      },
      "selected": true
    }
  ]
}
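As an illustration of the schema, the following sketch builds one provenance record in Python. The helper names (`literal`, `iri`, `provenance_record`) and all example IRIs and values are hypothetical; only the key names and the `@context` are taken from the schema above.

```python
import json

CONTEXT = {
    "@base": "http://dbpedia.org/fusion/",
    "@vocab": "http://dbpedia.org/fusion/vocab#",
}

def literal(value, datatype):
    """Value node for a literal: "@value" plus its datatype IRI."""
    return {"@value": value, "@type": datatype}

def iri(identifier):
    """Value node for a resource: "@id" only (the xor branch of the schema)."""
    return {"@id": identifier}

def provenance_record(subject, predicate, objects):
    """objects: list of (value_node, [source datasets], selected) tuples."""
    return {
        "@context": CONTEXT,
        "subject": subject,
        "predicate": predicate,
        "objects": [
            {
                "source": [{"@value": s} for s in sources],
                "value": value,
                "selected": selected,
            }
            for value, sources, selected in objects
        ],
    }

# Made-up example: two candidate values, one of them selected.
record = provenance_record(
    "http://example.org/A",
    "http://example.org/property",
    [
        (literal("first value", "http://www.w3.org/2001/XMLSchema#string"), ["en.dbpedia"], True),
        (iri("http://example.org/B"), ["de.dbpedia"], False),
    ],
)
print(json.dumps(record, indent=2))
```

Each entry in `objects` carries exactly one of the two value forms, so a consumer can tell literals from resources by checking for `@id`.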
To decide how many values are selected for a property, a cardinality-based median is calculated: the median of the number of values each subject has for that property.
Example median calculation:
----
@prefix ex: <http://example.org/> .
ex:A
    ex:property
        "first value"@en , "second value"@en .
ex:B
    ex:property
        "value of B"@en .
ex:C
    ex:property
        "value of C"@en .
----
=> sorted cardinality sequence: (1, 1, 2)
=> median for ex:property is 1
If the median for a property equals 1, only one value is selected; otherwise all values are selected.
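The median rule can be sketched in a few lines of Python. The input dictionary mirrors the example above. Using `statistics.median_low` is an assumption: the description does not say how the median is resolved for sequences of even length.

```python
from statistics import median_low

# Subject -> values of ex:property, mirroring the example above.
values_by_subject = {
    "ex:A": ["first value", "second value"],
    "ex:B": ["value of B"],
    "ex:C": ["value of C"],
}

def property_median(values_by_subject):
    """Return the median number of values per subject for one property."""
    cardinalities = sorted(len(values) for values in values_by_subject.values())
    # Assumption: median_low, so the result is always an observed cardinality.
    return median_low(cardinalities)

def select_single_value(values_by_subject):
    """True if only one value should be selected, False if all are kept."""
    return property_median(values_by_subject) == 1

print(property_median(values_by_subject))      # 1
print(select_single_value(values_by_subject))  # True
```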
To select the right value, the property values are weighted by the trustworthiness of the source datasets they originate from.
For example, en.dbpedia > de.dbpedia states that en.dbpedia is considered more trustworthy than de.dbpedia in the fusion scenario.
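A minimal sketch of preference-based selection, assuming the trust order is given as a list with the most trusted dataset first. The names `DATASET_PREFERENCE` and `pick_preferred` and the candidate values are hypothetical, and datasets missing from the list are treated as least trusted, which is also an assumption.

```python
# Most trusted dataset first, following en.dbpedia > de.dbpedia.
DATASET_PREFERENCE = ["en.dbpedia", "de.dbpedia"]

def pick_preferred(candidates, preference=DATASET_PREFERENCE):
    """Pick one value from a list of (value, [source datasets]) tuples.

    A candidate is ranked by its most trusted source; unknown datasets
    rank below every listed one. Ties keep the first candidate.
    """
    rank = {name: position for position, name in enumerate(preference)}
    unknown = len(preference)

    def best_rank(candidate):
        _, sources = candidate
        return min((rank.get(s, unknown) for s in sources), default=unknown)

    return min(candidates, key=best_rank)[0]

# Made-up candidates for one subject and property.
candidates = [
    ("value from de", ["de.dbpedia"]),
    ("value from en", ["en.dbpedia"]),
]
print(pick_preferred(candidates))  # value from en
```

This covers only the dataset-preference part; it does not take value frequency into account.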
TODO - a combined function of the weighted most frequent value and the preferred dataset's value