Wikibase, federated


#1

Opening this thread to document and continue our discussion from yesterday with @almereyda and @toka.

Context:

  • Projects such as TransforMap or [inventaire.io](https://inventaire.io) could make use of an open, linked, and contributive database with a different scope than Wikidata, but reusing its properties, entities, and interface

  • The obvious solution would be starting new instances of Wikibase, but Wikibase currently misses federation features, namely:

    • allowing the use of both local properties/entities and remote/Wikidata ones
    • allowing the naming system to be customized, replacing the incremental P and Q ids by UUID-like ids to make importing properties/entities between instances easier
  • The Wikidata/Wikibase team has added federation to the roadmap but made clear that it would take time to get there. We couldn’t find the dedicated issue or thread though.

[Edit] I also opened this discussion on the Wikidata Chat in the hope of digging up possible earlier discussions.


(Thomas Kalka) #2

@maxlath thanks and welcome here.

I'll try to capture some ideas from yesterday's brainstorming:

  • Federation at the user interface level means: some properties and entities from other Wikibase instances will show up in autocompletion boxes and can be used just like local data.

  • We have to find where id generation and storage are handled in the Wikibase source code.

  • One simple solution to federation would be (a minimal sketch follows below this list):

    • define a list of federation sources, each with
      • a resource description
      • a resource URL
      • SPARQL (or similar) queries defining which data should be included
      • a resource priority (which data to choose if more than one source provides data for the same id)
    • define a “data origin” property for entities
    • run regular imports for the sources
      • it will help if sources provide data as a change stream
      • imports can be started by some notification system

Id generator code is here:

Other useful links:


(Thomas Kalka) #3

Unfortunately, entity ids seem to be 32-bit unsigned integers only:


(Michael Maier) #4

That shouldn’t be too hard to change, should it?

Some day Wikidata will also have more than 4 billion entries, as OSM did a few years ago, so this needs a fix anyway.


(Michael Maier) #5

We’ve set up a Wikibase instance now: https://base.transformap.co (currently unstable due to failures on the machine)


My next question is: Which properties do we use to model an ontology in our wikibase now?

  • Do we import all of Wikidata’s properties?

  • Do we just copy & paste the properties we need (e.g. instance of (P31), subclass of (P279), …)?

  • Do we create our own?

  • or do we invest some time/energy to follow the federated Wikibase approach and create/program the possibility to use/link the properties of Wikidata in our own wiki? (A sketch of the copy & paste option follows below.)
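
To illustrate the copy & paste option, here is a small Python sketch that pulls a property definition from Wikidata's Special:EntityData endpoint and keeps only the parts we would re-create locally. Actually writing it into our own Wikibase would go through the standard wbeditentity API with an edit token, which is left out of this sketch; the language list is arbitrary.

```python
"""Sketch for the 'copy the properties we need' option above."""
import requests

WIKIDATA_ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/{pid}.json"


def fetch_property(pid, languages=("en", "de", "fr")):
    """Return a stripped-down copy of a Wikidata property definition."""
    response = requests.get(
        WIKIDATA_ENTITY_DATA.format(pid=pid),
        headers={"User-Agent": "transformap-property-copy-sketch/0.1"},
        timeout=30,
    )
    response.raise_for_status()
    entity = response.json()["entities"][pid]
    return {
        "datatype": entity["datatype"],
        "labels": {l: entity["labels"][l] for l in languages if l in entity["labels"]},
        "descriptions": {l: entity["descriptions"][l]
                         for l in languages if l in entity["descriptions"]},
        # Keeping the original URI, e.g. in an 'equivalent property' statement
        # on the local copy, preserves the link back to Wikidata.
        "copied_from": "http://www.wikidata.org/entity/" + pid,
    }


if __name__ == "__main__":
    for pid in ("P31", "P279"):  # instance of, subclass of
        print(fetch_property(pid))
```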

@toka, @maxlath any suggestions on this?

Thanks :slight_smile:


(Jon Richter) #6

In conversation with @toka we found out he has already created a patch to use UUIDs instead of integer IDs. @species and I are very interested in following that route. Is the code available somewhere?
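
The patch itself is not shown in this thread; purely as an illustration of the idea (not @toka's actual code, which would live in Wikibase's PHP id generator), an id scheme based on random UUIDs would let independent instances mint ids without collisions:

```python
"""Rough illustration (not the actual Wikibase patch) of UUID-based entity ids.

With incremental ids like Q1, Q2, ... two independent Wikibase instances will
mint clashing ids, which makes importing entities between instances painful.
Ids derived from random UUIDs avoid that without any coordination.
"""
import uuid


def new_entity_id(prefix="Q"):
    """Mint an entity id that is unique across instances, e.g. 'Q-4f1c...'."""
    return "{}-{}".format(prefix, uuid.uuid4().hex)


if __name__ == "__main__":
    print(new_entity_id())      # item id
    print(new_entity_id("P"))   # property id
```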


(Michael Maier) #7

I've done some work on creating the JSON file Kei needs; for this I have created a proposal for the data structure of the SSEDAS and other taxonomies. I've used the Wikidata properties wherever possible in this draft.

You can have a look here: https://github.com/TransforMap/import-flows/blob/master/charts/SSEDAS/ssedas-tagfilter-scheme.pdf

A JSON file (which I wrote manually) that represents the current SSEDAS taxonomy in this scheme can be found here on GitHub.


(Michael Maier) #8

@toka is your code for changing the ID generator to UUIDs available somewhere?


(Jon Richter) #9

@Simon_Sarazin just mentioned a Distributed Semantic MediaWiki from 2014. Maybe this could help in finding strategies for how to federate?


(Jon Richter) #10

During #communities:world-social-forum-2016 I have been made aware of

for end-user aware curation of bodies of data.


(Adrien Labaeye) #11

I was wondering how far, on the Wikibase, we want to link properties to existing definitions when available (e.g. from Wikipedia). It was done with the Falling Fruit map: http://fallingfruit.org/data?c=forager%2Cfreegan&locale=en
It is less controversial there because it is about species that have already been defined, while our objects are much more in flux.
Still, as @toka mentioned, it would be a safe way to make sure we have definitions for many properties without having to develop a community to take care of that. Those definitions could be provided by other mappings, networks… It would also ensure that the work of the TransforMap community isn't about producing definitions, but about connecting existing definitions, helping the communities to highlight differences and convergence. This is critical if we want to be a pluralistic initiative, rather than striving for some hegemony.
I think this was expressed in different places and at different times, but I felt I'd leave it here.

I think that a video conference to introduce the Wikibase would be great; we could record it and post it on the homepage. Something explaining what is already there, how to add a new taxonomy, and how to connect that to the existing ones. I personally do not understand how that should work.
And I’m aware of this very informative page https://base.transformap.co/wiki/HowtoAddTaxonomy


#12

I don’t know where you stand on this topic now, but here is some news from discussions with the Wikidata team: from what I understood, they have several people actively working on getting federated Wikibase (that is, a Wikibase instance reusing entities from Wikidata/another instance) working, to serve the needs of structured data for Wiktionary editions.


(Jon Richter) #13

This is then the hint as to where, socially and technologically, we have to locate these developments. Thank you very much for the update.


(Josef Kreitmayer) #14

Hello @maxlath and @almereyda,

I am not familiar in depth with the federated approach. Can I assume that what the Wikibase team is working on to enable federation would also solve the UUID problem we encountered, if we want to enable the current Wikibase instance that TransforMap is hosting to move towards linked data?

In the setup that Wikibase currently offers, there is just simple incremental ID numbering. If they want to enable federation, I assume that a UUID (or similar) generator is one of the pieces they are working towards, right?

@maxlath do you have any time estimate for when they will have something stable enough to test (bleeding edge) or work with (leading edge)? It would be good to have a perspective. We currently have difficulties keeping the leading-edge technology we use (Wikibase, DB, Weblate, pads) up and running.


@almereyda what would you say about a future pacing session, to look at perspectives for 2017 and 2018? In the postponed items we have a story that we could engage with:

Taga Story #15: Produce a TransforMap Roadmap proposal until end of 2017


(Jon Richter) #15

2 posts were split to a new topic: Concluding three years of prototyping


(Jon Richter) #16

We have found this issue:


(Matt Wallis) #17

It’s now over 2 years since this thread started, and I wanted to check my understanding of where we are now.

So, there is now a Wikibase instance set up at http://base.transformap.co/. Some questions about this:

  • Is this now being used as the source of the terms being used at http://edit.transformap.co/?
  • Is this wikibase instance now regarded as the main place where taxonomic metadata for TransforMap is maintained?
  • How can I get at this metadata in a machine-readable way? I optimistically tried this:
$ curl -H 'Accept: application/rdf+xml' -L https://base.transformap.co/wiki/Item:Q176

but it returned HTML.

  • Ideally I’d like to be able to read this data in an RDF format. Can I?

I’d also like to consider turning this taxonomy data into an RDF vocabulary. Perhaps it could be added to ESSGLOBAL? Any views on this?


(Jon Richter) #18

Yes. If you watch your browser’s developer tools network tab, you will witness a SPARQL request at initial page load.

Yes. Besides the SSEDAS taxonomy, we only have a few examples to show how we intend this to work.

Unfortunately, Wikibase does not deliver pure linked data there. The editor, for instance, reads from https://base.transformap.co/wiki/Special:EntityData/Q5.json, but it could also read https://base.transformap.co/wiki/Special:EntityData/Q5.ttl if you prefer Turtle.

The SPARQL endpoint is accessible at https://query.base.transformap.co/bigdata/#query (take care, no authentication is present at the moment) and is in use by the editor, i.e. via https://query.base.transformap.co/bigdata/namespace/transformap/sparql?query= with a URL-encoded SPARQL query appended, plus a &format=json parameter defining the output format.
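
Assuming those endpoints are still reachable as given, both access paths can be scripted; the Q-id and the query below are only illustrative (a short Python sketch, not part of the editor's code):

```python
"""Sketch of the two machine-readable access paths described above."""
import requests

HEADERS = {"User-Agent": "transformap-data-access-sketch/0.1"}

# 1. Per-entity linked data: Special:EntityData serves JSON or Turtle
#    depending on the requested file extension.
turtle = requests.get(
    "https://base.transformap.co/wiki/Special:EntityData/Q5.ttl",
    headers=HEADERS, timeout=30,
)
print(turtle.text[:300])

# 2. The SPARQL endpoint: requests URL-encodes the query for us,
#    and format=json selects JSON results.
query = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "en")
} LIMIT 10
"""
results = requests.get(
    "https://query.base.transformap.co/bigdata/namespace/transformap/sparql",
    params={"query": query, "format": "json"},
    headers=HEADERS, timeout=60,
)
for row in results.json()["results"]["bindings"]:
    print(row["item"]["value"], row["label"]["value"])
```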

Anything else you would like to know?


(Matt Wallis) #19

Thanks very much for this, @almereyda.
I look forward to getting some time to look into this fully, and make proper use of all this very interesting information!


(Jon Richter) #20

It is only now, thanks to @maxlath mentioning https://wikibase-registry.wmflabs.org/ in an issue on GitHub, that I came to learn of

https://www.wikidata.org/wiki/Wikidata:WikiProject_Wikidata_for_research/Meetups/2018-04-23-25-Antwerpen

In collaboration with the European Research Council (Q1377836), Gene Wiki (Q5531528), Rhizome (Q7320757), WikiCite (Q30035267), Wikibase Community User Group (Q51033881) and others, we are meeting in Antwerp (Q12892) to explore ways to create a federated landscape of Wikibase instances federated with Wikidata.

It seems we’re a bit late to the game, but may want to get in touch later.