Nikolas Everett
2015-02-17 21:28:05 UTC
Folks: so Stas and I have spent two days talking to the Wikidata folks and
I've learned some fun stuff. I'm sending this email to this mailing list,
because, well, I just have to write this down and this list cares about
things. Maybe they'll care about this thing. I'm not sending this to
Wikidata's mailing list because I'm not really confident enough to send
this to experts yet.
Anyway, the most interesting bit that I learned so far is that the Wikidata
team claims that Wikidata doesn't describe truth. That might seem like a
silly difference at first but you start to get into trouble when you want
to query it and don't understand that. Think about it this way: there
are multiple values for Jesus's birthday
<https://www.wikidata.org/wiki/Q302#P569> and Wikidata actually doesn't
claim that any of them is true, just that they are true *according to some
sources*. Look also at George Washington's spouse
<https://www.wikidata.org/wiki/Q23#P26>. She has a qualifier - the date of
their marriage. These qualifiers are like preconditions to the truth.
Kinda. They aren't always used that way but you can sort of pretend.
But we can emerge from Cartesian doubt! Wikidata has some concept of "true
enough for most uses" called "best rank"
<https://www.wikidata.org/wiki/Help:Ranking>. It's a reasonably simple
concept that amounts to "the community decides". So the plan is to
implement queries against that first. This should be good enough for
wikigrok initially and faster to implement and query because it allows us
to ignore things like qualifiers and references.
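
To make "best rank" concrete, here is a rough sketch of the selection rule as
I understand it from the Help:Ranking page: if a property has any statements
ranked "preferred", those are the best rank; otherwise the "normal" ones are;
"deprecated" ones never are. The dict layout, the best_rank_statements name
and the sample values are all made up for illustration. This is not
Wikidata's actual data model or API, and not necessarily how we'll build it.

```python
# Hypothetical, simplified statement records. Real Wikidata statements carry
# much more (qualifiers, references, typed values); only "rank" matters here.
def best_rank_statements(statements):
    """Return the best-rank statements for a single property.

    Rule sketched from Help:Ranking: preferred statements win if any exist,
    otherwise normal ones are used; deprecated ones are never returned.
    """
    preferred = [s for s in statements if s["rank"] == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s["rank"] == "normal"]


# Made-up example: three competing values for one property of one item.
birth_dates = [
    {"value": "value A", "rank": "normal"},
    {"value": "value B", "rank": "preferred"},
    {"value": "value C", "rank": "deprecated"},
]

# With a preferred statement present, only it is selected.
best = best_rank_statements(birth_dates)

# Without any preferred statement, every normal one is selected.
no_preferred = [s for s in birth_dates if s["rank"] != "preferred"]
fallback = best_rank_statements(no_preferred)
```

Note that nothing in there looks at qualifiers or references, which is
exactly why querying against best rank should be the cheaper thing to do
first.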
Nik