Proposed Bulk Import Data Workflow for SSEDAS

I have worked out a proposed workflow for bulk-importing data into “the databases” for SSEDAS.

A short introduction to the requirements for TransforMap when used in SSEDAS:

  • Each of the 26 SSEDAS partners gets a map with “their” POIs (max. 5K in total across all partners) displayed on their own website, with up to 3 custom filters for each map
  • Each POI must allow for pictures and text descriptions (in multiple languages) to be displayed
  • Some POIs also have videos to be shown.
  • Only POIs suitable for OpenStreetMap need to be handled

Some time ago in Graz, @Josef and I drafted an import workflow as follows:

  • We send out an Excel template for POIs, which is to be filled in by the partner
  • Because there are 26 partners and we cannot handle them all individually, we proposed 3 rounds (6-8 weeks apart):
      • The first one is the “initial” round. After it, we send the POIs with missing/incorrect attributes back for correction in the second round.
      • In the second round, the partners can send us the corrected POIs, or new POIs. We then again send back “incomplete” POIs for correction.
      • The third round is the “final” round and the “last chance” to correct POIs. POIs that are still missing crucial information (neither address nor coordinates, or no name/POI type) do not get imported by us.
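
The “crucial information” rule of the final round could be checked automatically after each round. Here is a minimal sketch in Python, assuming the spreadsheet rows have already been read into dicts. The column names (`name`, `poi_type`, `address`, `lat`, `lon`) and both function names are made up for illustration and are not taken from the actual template.

```python
def missing_crucial_info(poi):
    """Return the crucial fields a POI row still lacks (empty list = importable)."""
    def filled(key):
        value = poi.get(key)
        return value is not None and str(value).strip() != ""

    problems = []
    if not filled("name"):
        problems.append("name")
    if not filled("poi_type"):
        problems.append("poi_type")
    # A location needs either an address or a full coordinate pair.
    has_coords = filled("lat") and filled("lon")
    if not (filled("address") or has_coords):
        problems.append("address or coordinates")
    return problems


def sort_round(rows):
    """Split rows into importable POIs and POIs to send back for correction."""
    importable, send_back = [], []
    for row in rows:
        problems = missing_crucial_info(row)
        if problems:
            send_back.append((row, problems))
        else:
            importable.append(row)
    return importable, send_back
```

The list of problems per POI is what would go back to the partner together with the row, so they know what to correct in the next round.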

The workflow in words is the following:

  • We get the POIs back from the partner in the spreadsheet template. Photos and texts are either sent to us via email, or uploaded by the partner to our ownCloud, directly to our media server, or to Wikimedia Commons.
  • We bulk-geocode the POIs with OpenStreetMap and other free address sources.
  • If no free source is available, we generate a uMap, and the partner moves the POIs to the correct locations
  • We import the POIs to OSM, and in parallel to the TransforMap Extra-Tag-DB.
  • We upload the photos/texts in bulk to our media server
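
The geocoding part of this workflow can be sketched as follows. This is only an illustration under assumptions: the geocoders are passed in as plain functions (address in, `(lat, lon)` or `None` out), so nothing here depends on a particular service, and the names `place_pois`, `address`, `lat` and `lon` are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Coords = Tuple[float, float]
Geocoder = Callable[[str], Optional[Coords]]


def place_pois(rows: List[dict], geocoders: List[Geocoder]):
    """Try each free geocoder in turn; collect the leftovers for manual placement."""
    located, manual = [], []
    for row in rows:
        # POIs that already come with coordinates need no geocoding.
        if row.get("lat") is not None and row.get("lon") is not None:
            located.append(row)
            continue
        coords = None
        for geocode in geocoders:
            coords = geocode(row.get("address") or "")
            if coords is not None:
                break
        if coords is None:
            # No free source knows this address: the partner places it by hand.
            manual.append(row)
        else:
            located.append({**row, "lat": coords[0], "lon": coords[1]})
    return located, manual
```

The `manual` list is what would end up on the map that the partner corrects; everything in `located` can continue to the import step.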

I’ve sketched the workflow here; you can find the source files and a PDF in our ownCloud/TransforMap/projects/SSEDAS/.

[edit: added numbers corresponding to Data(base) structure suggestion, updated to protocol of monkey circle meeting]

I’ve made the following assumptions:

  • We do not display contact data of the POIs on the web maps (unless they have agreed to it)
      • It doesn’t get stored in the Extra-Tag-DB either, so we can make that 100% Open Data too
      • If we publish it anywhere, we cannot guarantee that it stays private and doesn’t get scraped, so we simply don’t store it.
  • I’ve sketched the workflow for our Extra-Tag-DB being a clone/fork of the OSM database stack, because we currently have neither DB schemas nor editors for, e.g., a Geo-Couch-based DB.
  • Videos only end up in existing video hosting services, because we don’t have the capacity to host them ourselves.

It is a rather complicated workflow, but only this way can we guarantee that:

  • Data is stored where it thematically belongs:
      • Geodata in OSM
      • Media in Wikimedia Commons (or the TransforMap Media Server)
      • Project-specific stuff (SSEDAS tags used for filtering) in our own DB
  • Data stays Open Data, as no unfree sources are used
      • We can only use free geocoding sources anyway, because unfree ones would taint our and OSM’s data license!
  • Data flow of each partner is documented and backed up after each step, so errors can be tracked and different people can work in parallel on the imports
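
To make the “data is stored where it belongs” idea more concrete, here is a rough sketch of how one spreadsheet row could be split by destination. All key names are assumptions for illustration: I pretend that project-specific columns carry an `ssedas:` prefix, that media columns are called `photo`, `description` and `video_url`, and that contact columns are called `email`, `phone` and `contact_person`. The sketch simply drops contact data and does not handle the case where a POI has agreed to publication.

```python
CONTACT_KEYS = {"email", "phone", "contact_person"}   # never stored
MEDIA_KEYS = {"photo", "description", "video_url"}    # media server / Commons
PROJECT_PREFIX = "ssedas:"                            # goes to our own DB


def split_by_destination(row):
    """Sort the columns of one POI row into the store each one belongs to."""
    out = {"osm": {}, "media": {}, "extra_tag_db": {}}
    for key, value in row.items():
        if key in CONTACT_KEYS:
            continue  # contact data is dropped, not stored anywhere
        if key in MEDIA_KEYS:
            out["media"][key] = value
        elif key.startswith(PROJECT_PREFIX):
            out["extra_tag_db"][key] = value
        else:
            out["osm"][key] = value  # everything else is treated as geodata
    return out
```

Writing the three parts to separate files after this step would also give the per-partner backup mentioned above.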

As always, this is a proposal for how I would do it; it lived solely in my head until now.
Please comment and send improvement suggestions!

  • Did I leave out details in some workflows? What needs more explanation?
  • Are there parts that are unrealistic?
  • What would you do differently/better?
  • For typos, please correct them yourselves in the sources^^

Thanks for reading until this point :slight_smile:


