Data(base) structure suggestion for SSEDAS


Hi everyone,

I’ve finished my proposal for how the data(flow) structure for SSEDAS should work. I’ve created a flow chart with yEd; you can find the sources in our Owncloud/website-dataflow-overview.

My basic ideas:

  • Existing open databases and storages should be used wherever possible
  • Data should be stored where it thematically belongs
  • Data should be duplicated as little as possible
  • The web-map running in the browser aggregates the data live, not a centralised server!
  • No background processes are needed server-side to scrape data
  • This avoids many headaches with synchronisation issues
  • Data is linked between existing data sources, not collected in a centralised data hub

Details for protocols are open for discussion!

I would be happy to receive comments and suggestions for improvement!

[Edit: Added numbers corresponding to the proposed bulk import workflow]


@species thanks for this great schematic work!

Following the idea expressed in the talk that we should plan work for specific funding so that most of the effort goes into the TransforMap commons infrastructure, could you describe which parts of this proposal are SSEDAS-specific (i.e. infrastructure that will be developed/deployed and used only for SSEDAS) and which are developments that can become part of the permanent TransforMap infrastructure (preferably describing how you see them fitting into the larger picture)?

This would help in planning the articulation with CHEST, as well as in minimizing the effort the team puts into delivering a product for a single entity (the SSEDAS consortium).

The only part to be set up specifically for SSEDAS is the Translation Webserver, which provides localisation for the web-maps.
But it can be used for other web-maps in the future too!

In the ideal case, the SSEDAS partners translate the web-map into 10 different languages for us, and we can reuse this in the future :slight_smile:

I see the TransforMap Extra Tag DB, the Taxonomy DB and the Media DB as integral building blocks of TransforMap in the future.

But going through CHEST again, I see them corresponding to the following work packages:

  • Translation Webserver: WP2 D8: […] visualize, map and integrate different taxonomies in multiple languages.
  • Taxonomy DB: WP2 D2: […] prototype […] to exchange data between different databases and map taxonomies
  • Extra Tag DB: it links to OSM (and maybe others), therefore WP2 D2: […] to exchange data between different databases
  • Media DB: Not sure, maybe part of WP2 D3-4: Reference implementation of a database […] stack […] in a distributed open data ecosystem

So on revisiting, the only part I’m not sure fits into CHEST is the Media DB :wink:


As one of the to-dos from the protocol of our latest tech circle meeting, I’m adding more flow charts, this time not an overview but with a time axis, to show how the data flows happen on specific user actions.

Time axis runs from top (start) to bottom (action finished).

These examples assume that a partner website (e.g. an SSEDAS partner) wants to display POIs carrying a tag specific to its organisation, e.g. “ssedas_partner=1”. These tags are stored in the TransforMap-Tag-DB.
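To make the selector idea concrete, here is a minimal Python sketch of how a “key=value” selector such as `ssedas_partner=1` could be matched against Tag-DB records. The record format (`{"osm": "node/101", "tags": {...}}`) and the function names are hypothetical, and whether this filtering runs inside the Tag-DB or in the web-map is still open.

```python
def parse_selector(selector: str) -> tuple[str, str]:
    """Split a 'key=value' selector such as 'ssedas_partner=1'."""
    key, sep, value = selector.partition("=")
    if not sep or not key:
        raise ValueError(f"selector must look like key=value: {selector!r}")
    return key.strip(), value.strip()


def matches(record: dict, selector: str) -> bool:
    """True if the record's TransforMap tags contain exactly key=value."""
    key, value = parse_selector(selector)
    return record.get("tags", {}).get(key) == value


def select(records: list[dict], selector: str) -> list[dict]:
    """Keep only the records of the partner chosen by `selector`."""
    return [r for r in records if matches(r, selector)]


# Hypothetical Tag-DB records: a link to an OSM object plus extra tags.
records = [
    {"osm": "node/101", "tags": {"ssedas_partner": "1"}},
    {"osm": "way/202", "tags": {"other_partner": "1"}},
]
print([r["osm"] for r in select(records, "ssedas_partner=1")])  # ['node/101']
```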

First load of a map-website

When the map-plugin is displayed for the first time, as much data as possible should be served from static files to reduce the load on the live database servers. Optimizing this step makes sense, because 90-98% of users do not interact with the map at all (zooming in/out, panning).

In words: each of the different (SSEDAS-partner) websites has a fixed starting point and bounding box (BBOX) where the map is shown. For this BBOX, daily extracts are generated from the databases and served as a static file for each partner (instead of more complicated requests to different data servers).
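A rough Python sketch of the daily extract job, assuming the Tag-DB and OSM data have already been combined into objects with `osm`, `lat`, `lon` and `tags` fields (a hypothetical format). The partner name, its BBOX values and the `.geojson` file naming are made-up examples; the output is a plain GeoJSON FeatureCollection that the map-plugin could load as a static file.

```python
import json
from pathlib import Path

# Hypothetical partner configuration: one fixed BBOX per partner website,
# given as (west, south, east, north) in WGS84 degrees (GeoJSON order).
PARTNERS = {
    "partner-a": (9.0, 46.0, 17.5, 49.5),
}


def in_bbox(lat: float, lon: float, bbox: tuple[float, float, float, float]) -> bool:
    """Simple containment test (does not handle boxes crossing the antimeridian)."""
    west, south, east, north = bbox
    return south <= lat <= north and west <= lon <= east


def build_extract(objects: list[dict], bbox: tuple[float, float, float, float]) -> dict:
    """Collect all objects inside `bbox` into a GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            "id": obj["osm"],
            # GeoJSON positions are [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [obj["lon"], obj["lat"]]},
            "properties": obj.get("tags", {}),
        }
        for obj in objects
        if in_bbox(obj["lat"], obj["lon"], bbox)
    ]
    return {"type": "FeatureCollection", "features": features}


def write_daily_extracts(objects: list[dict], out_dir: Path) -> None:
    """Meant to run once a day (e.g. from cron): one static file per partner."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, bbox in PARTNERS.items():
        extract = build_extract(objects, bbox)
        (out_dir / f"{name}.geojson").write_text(json.dumps(extract), encoding="utf-8")
```

The web server then only has to serve these files; the live databases are queried once per day per partner instead of once per page view.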

When the user starts to pan the map or zoom in, dynamic loading of content kicks in:

Dynamic loading of more data on map move

As the data is pulled live from different databases, some steps are necessary to collect all data for displaying new POIs:

  • After the user has panned/scrolled the map to a new position, the “new” area where POIs are to be downloaded is calculated.
  • Download requests are first sent to the TransforMap-Tag-DB with the BBOX and a selector (ssedas_partner=1); it returns all data in the BBOX that has the tag “ssedas_partner=1” set.
  • The web-map then parses this data: it contains links to OSM objects for each POI (and to other sources in the future).
  • It queries the (Overpass) OSM server for the linked objects.
  • The OSM server returns a set of objects.
  • The web-map matches the TransforMap objects with the OSM objects and internally generates “combined objects”, which are added to the map.
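The OSM lookup and the merge from the steps above could look roughly like this Python sketch (the calculation of the “new” area is left out). It assumes the Tag-DB links objects with `"node/123"`-style references, which is a hypothetical format. The query uses Overpass QL’s `id:` filter and `out center`, so that ways and relations also get a single coordinate for the POI icon.

```python
from collections import defaultdict


def overpass_query_for_refs(refs: list[str]) -> str:
    """Build an Overpass QL query that fetches the linked OSM objects by ID."""
    ids_by_type: dict[str, list[str]] = defaultdict(list)
    for ref in refs:
        osm_type, _, osm_id = ref.partition("/")
        if osm_type not in ("node", "way", "relation") or not osm_id.isdigit():
            raise ValueError(f"unexpected OSM reference: {ref!r}")
        ids_by_type[osm_type].append(osm_id)
    statements = "".join(
        f"{osm_type}(id:{','.join(ids)});" for osm_type, ids in sorted(ids_by_type.items())
    )
    # "out center" adds a centre coordinate to ways and relations.
    return f"[out:json];({statements});out center;"


def merge(records: list[dict], overpass_json: dict) -> list[dict]:
    """Combine Tag-DB records with the OSM elements they link to."""
    osm_by_ref = {f"{el['type']}/{el['id']}": el for el in overpass_json.get("elements", [])}
    combined = []
    for record in records:
        element = osm_by_ref.get(record["osm"])
        if element is None:
            continue  # linked OSM object deleted or missing from the response
        # nodes carry lat/lon directly; ways/relations get a "center" object
        position = element if "lat" in element else element.get("center")
        if position is None:
            continue
        combined.append({
            "osm": record["osm"],
            "lat": position["lat"],
            "lon": position["lon"],
            "osm_tags": element.get("tags", {}),
            "transformap_tags": record.get("tags", {}),
        })
    return combined
```

The query string would be sent to the Overpass API `interpreter` endpoint as its `data` parameter. Keeping OSM tags and TransforMap tags in separate fields avoids deciding here which source wins when both set the same key.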

So far, only POI icons (maybe with a name) are displayed on the map, no media content yet.

The media content gets loaded dynamically when the user clicks on a POI:

Click on POI

When the user clicks on a POI, a pop-up opens which displays further information and media about the object.

I guess only 1% of map users ever open a pop-up, so this content is loaded dynamically on request.

So when a pop-up is opened, the web-map checks whether there are any links to external media content. If there are, the content is fetched and added to the pop-up dynamically.
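A minimal Python sketch of this on-click loading, with a small cache so that reopening a pop-up does not fetch the same media again. The `media_links` field and the injected `fetch` function are hypothetical stand-ins for however links to the Media DB (or other external sources) end up being stored and requested.

```python
from typing import Callable


class PopupMediaLoader:
    """Fetch external media for a POI only when its pop-up is opened."""

    def __init__(self, fetch: Callable[[str], bytes]):
        # `fetch` is a placeholder; in the browser it would be an HTTP request.
        self._fetch = fetch
        self._cache: dict[str, bytes] = {}

    def on_popup_open(self, poi: dict) -> list[bytes]:
        media = []
        for url in poi.get("media_links", []):  # hypothetical field name
            if url not in self._cache:
                self._cache[url] = self._fetch(url)  # first click: download
            media.append(self._cache[url])  # later clicks: served from cache
        return media
```

POIs without media links cost nothing beyond the dictionary lookup, which fits the assumption that most pop-ups are never opened.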

If the map-plugin is configured not to show objects by SSEDAS tags but, e.g., to show all fair-trade objects from OSM, it works the other way round: first all OSM objects are loaded, and then the corresponding objects from the TransforMap-Tag-DB.
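For this reverse direction, the first request goes to Overpass with a tag filter and the BBOX, and the returned IDs are then looked up in the TransforMap-Tag-DB. A Python sketch of both halves, assuming `fair_trade=yes` as the example OSM tag and the same hypothetical `"node/123"` reference format; the Tag-DB request itself is left out because its interface is not specified in this proposal.

```python
def overpass_query_for_tag(key: str, value: str,
                           bbox: tuple[float, float, float, float]) -> str:
    """Overpass QL query for all nodes/ways/relations with key=value in the BBOX.

    `bbox` is (south, west, north, east), the order Overpass expects.
    """
    if '"' in key or '"' in value:
        raise ValueError("quotes in key/value are not escaped in this sketch")
    south, west, north, east = bbox
    return f'[out:json];nwr["{key}"="{value}"]({south},{west},{north},{east});out center;'


def refs_from_overpass(overpass_json: dict) -> list[str]:
    """OSM references to look up in the TransforMap-Tag-DB in the second step."""
    return [f"{el['type']}/{el['id']}" for el in overpass_json.get("elements", [])]
```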

This is the most-read thread of the last 30 days. However, it predates the API and the filter sets.

@species @almereyda @kei, could you provide some updates on this information in light of the current developments and implementations?

This is a recurring issue with Discourse posts (recently @alabaeye mentioned the same high number of visits for an old post of his), so we need to find better ways to highlight the current state of development and to mark legacy posts.