STAC Based Back End Architecture
Architectural Components
- Acquisition Server
- Data Host
- Data Catalogue
- Service API
1. Acquisition Server
Description
The Acquisition Server is responsible for adding assets to, and removing them from, the Data Catalogue. It polls at a fixed interval for new data sets, copies and transforms the data to the Data Host where necessary, and produces the STAC entry for insertion into the Data Catalogue.
Tasks
- Poll for Updated Data
- Download Ice Chart Data from 3rd party FTP
- Download Radar Data from 3rd Party hosted repositories
- Transform Ice Chart Data from its initial format into the hosted format
- Create Catalogue entries for downloaded data
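The polling task above can be sketched as a single cycle that compares an upstream listing against what is already catalogued. This is a minimal sketch under stated assumptions: `RemoteFile`, `find_new_files`, `poll_once`, `poll_forever`, and the one-hour default interval are hypothetical names and values, not part of the design. The `list_remote` and `ingest` callables stand in for the FTP or repository listing and for the download, transform, and catalogue steps. Removal of assets is not covered.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RemoteFile:
    """A file advertised by an upstream source (hypothetical record type)."""
    source: str  # e.g. "ice-chart" or "radar"
    name: str    # file name as listed upstream, used here as the unique key


def find_new_files(listing: Iterable[RemoteFile], catalogued: set[str]) -> list[RemoteFile]:
    """Return the listed files whose names are not yet in the catalogue."""
    return [f for f in listing if f.name not in catalogued]


def poll_once(
    list_remote: Callable[[], Iterable[RemoteFile]],
    catalogued: set[str],
    ingest: Callable[[RemoteFile], None],
) -> list[RemoteFile]:
    """Run one polling cycle: detect new files, ingest each, record it as catalogued."""
    new_files = find_new_files(list_remote(), catalogued)
    for remote_file in new_files:
        ingest(remote_file)  # stands in for download, transform, and catalogue entry creation
        catalogued.add(remote_file.name)
    return new_files


def poll_forever(list_remote, catalogued, ingest, interval_seconds: float = 3600.0) -> None:
    """Repeat poll_once at a fixed interval (the default interval is an assumption)."""
    while True:
        poll_once(list_remote, catalogued, ingest)
        time.sleep(interval_seconds)
```

Keeping one cycle in `poll_once` separates the "what is new" decision from the scheduling loop, so the cycle can be exercised without waiting on a timer.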
2. Data Host
Description
A data source may not be hosted in a way that supports the desired methods of data transfer, or it may lack the required permanence. In either case it may be necessary to replicate and transform the data from its original source and self-host it on LOOKNorth-controlled infrastructure. When this occurs, the Data Catalogue uses a source reference link to the hosted asset instead of the original source.
Tasks
- Store Raster Products so they are available in a streamable format
- Store Vector Products so they are available for full or partial acquisition
- Provide any necessary hosting duties for the Catalogue str
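The choice between linking to the original source and linking to a replicated copy can be sketched as a small decision function. This is an illustrative sketch only: the `SourceAsset` record, its field names, and the example URLs are assumptions, and the two boolean flags stand in for whatever checks the project actually uses to judge transfer support and permanence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceAsset:
    """Hypothetical description of one upstream asset."""
    original_href: str
    supports_required_access: bool  # original host supports the desired transfer method
    is_permanent: bool              # original host is expected to keep the file available
    hosted_href: Optional[str] = None  # set once the asset is replicated to the Data Host


def needs_replication(asset: SourceAsset) -> bool:
    """An asset is replicated when the original host falls short on either criterion."""
    return not (asset.supports_required_access and asset.is_permanent)


def catalogue_href(asset: SourceAsset) -> str:
    """Return the link the catalogue should reference for this asset."""
    if needs_replication(asset):
        if asset.hosted_href is None:
            raise ValueError("asset must be replicated to the Data Host before cataloguing")
        return asset.hosted_href
    return asset.original_href
```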
3. Data Catalogue
Description
The Data Catalogue will be a STAC-compliant organization of the available spatiotemporal products. The catalogue itself will be a series of database tables representing a single source of truth for the data available to the front-end client. It will be a living data repository, managed and updated by the Acquisition Server. In this iteration of the GeoConnections application the catalogue will hold the Sentinel-1 and ice chart data that are available for download. Catalogue entries shall comprise asset metadata and a link to the source, as defined in the STAC specification. Because the specification is still evolving, the catalogue must be designed to minimize rigidity.
Tasks
- Add Assets and Items for catalogued entries into organized volumes
- Provide URLs to the locations of queried items in the Data Store
- When available, provide URLs to well structured
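One way to keep database tables flexible against an evolving specification is to index only the few fields that queries need and store the full entry as JSON. The sketch below uses SQLite purely for illustration. The table layout, column names, and the functions `open_catalogue`, `add_item`, and `asset_urls` are assumptions, not the project's schema.

```python
import json
import sqlite3


def open_catalogue(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a catalogue database with a single items table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS items (
               id TEXT PRIMARY KEY,
               collection TEXT NOT NULL,
               item_datetime TEXT NOT NULL,
               body TEXT NOT NULL
           )"""
    )
    return conn


def add_item(conn: sqlite3.Connection, collection: str, item: dict) -> None:
    """Insert or update an entry; the whole item is kept as JSON in `body`."""
    conn.execute(
        "INSERT OR REPLACE INTO items (id, collection, item_datetime, body) VALUES (?, ?, ?, ?)",
        (item["id"], collection, item["properties"]["datetime"], json.dumps(item)),
    )
    conn.commit()


def asset_urls(conn: sqlite3.Connection, item_id: str) -> dict:
    """Return {asset key: href} for one item, or an empty dict if it is not catalogued."""
    row = conn.execute("SELECT body FROM items WHERE id = ?", (item_id,)).fetchone()
    if row is None:
        return {}
    item = json.loads(row[0])
    return {key: asset["href"] for key, asset in item.get("assets", {}).items()}
```

Because the item body is opaque to the table definition, fields added by a later STAC revision or by a custom extension do not require a schema migration in this sketch.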
4. Service API
Description
The Service API will serve as an intermediary for all communication between the catalogued data and the application. On request, it will provide a list of available data sources based on a parameterized query from the requesting device. It will also transform data to reduce load on the front-end application, for example by compressing or subsetting a larger dataset before returning the result to the client application.
Tasks
- Respond to data requests from the GeoConnections application
- Request information from Data Catalogue
- Subset and preprocess product data to minimize load on bandwidth and user device
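A parameterized query of the kind described above can be sketched as a filter over catalogued items by bounding box and time range, returning a trimmed result to limit bandwidth. This is a minimal sketch with assumptions: the function names, the `[west, south, east, north]` box order, the shape of the returned records, and the `limit` parameter are illustrative, and boxes that cross the antimeridian are not handled.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Iterable, Optional, Sequence


def parse_time(value: str) -> datetime:
    """Parse a timestamp such as 2020-01-15T00:00:00Z into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def bbox_intersects(a: Sequence[float], b: Sequence[float]) -> bool:
    """True when two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(
    items: Iterable[dict],
    bbox: Optional[Sequence[float]] = None,
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    limit: Optional[int] = None,
) -> list[dict]:
    """Return trimmed records (id, datetime, asset links) for items matching the query."""
    results: list[dict] = []
    for item in items:
        if limit is not None and len(results) >= limit:
            break
        when = parse_time(item["properties"]["datetime"])
        if bbox is not None and not bbox_intersects(item["bbox"], bbox):
            continue
        if start is not None and when < start:
            continue
        if end is not None and when > end:
            continue
        results.append({
            "id": item["id"],
            "datetime": item["properties"]["datetime"],
            "assets": {key: asset["href"] for key, asset in item.get("assets", {}).items()},
        })
    return results
```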
Architecture Diagram
Extension of the STAC specification
The STAC specification allows custom extensions that adapt the cataloguing structure to the specific needs of an application. To support many of the hosted features within the GeoConnections catalogue, a "Vector Product" asset extension will be created to best serve a large temporal archive of discrete data products.
Why STAC over WFS 3.0
The STAC spec is specifically tailored to imagery and gridded data, and the specification itself recommends WFS for vector data. GeoConnections has nevertheless decided to develop a custom extension, for three reasons. First, the catalogued sources are extremely dynamic: the server architecture is designed so that registering and collecting new data sources is a non-administrative behaviour, and supporting this makes maximum flexibility of the specification of the utmost importance. Second, at this time the targeted data sources are derived from source imagery, which fits the STAC spec and allows adjacency of the product links. Finally, the catalogue will also provide access to third-party hosted assets, which would be significantly challenging to adapt to WFS 3.0.
Catalogue Layout Hierarchy
The collected data will be catalogued in hierarchical collections generated from context, which allows the available data to be updated and parsed more cleanly. All queries will initially be routed via the root collection.
This hierarchy will comprise five levels:
- Data Types
- Region
- Time Period
- Dataset
- Asset
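Routing a query from the root collection down through these levels can be sketched as a depth-first walk over `child` and `item` links, which are link relations defined by STAC. The in-memory `EXAMPLE` catalogue, its identifiers, and the use of plain ids as `href` values are assumptions made to keep the sketch self-contained. A real catalogue would resolve URLs instead.

```python
from typing import Iterator, Tuple

# Hypothetical catalogue: each node is keyed by id, and link hrefs are node ids.
EXAMPLE = {
    "root": {"links": [{"rel": "child", "href": "ice-charts"}]},
    "ice-charts": {"links": [{"rel": "child", "href": "eastern-arctic"}]},
    "eastern-arctic": {"links": [{"rel": "child", "href": "2020-01"}]},
    "2020-01": {"links": [{"rel": "item", "href": "chart-2020-01-06"}]},
    "chart-2020-01-06": {
        "links": [],
        "assets": {"raw": {"href": "https://example.com/chart-2020-01-06.zip"}},
    },
}


def walk(nodes: dict, node_id: str, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, name) for a node, its assets, and everything beneath it."""
    node = nodes[node_id]
    yield depth, node_id
    for key in node.get("assets", {}):
        yield depth + 1, f"{node_id}:{key}"  # assets form the deepest level
    for link in node.get("links", []):
        if link["rel"] in ("child", "item"):
            yield from walk(nodes, link["href"], depth + 1)


if __name__ == "__main__":
    for depth, name in walk(EXAMPLE, "root"):
        print("  " * depth + name)
```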
Data Source Collection
This level of the catalogue will contain the various source collections that have been added to the catalogue. Examples at this level of organization are Environment Canada Ice Chart data and Sentinel-1 satellite data. Each source collection will link to its child collections: either Temporal Collections or, if the data is collected over separate geographical areas, Regional Collections.
//TODO: Add Source Collection Schema
Data Region Collection (optional)
The optional Regional Collection is used where the original data source is subset into distinct named regions. Examples are the High Arctic and Eastern Arctic collections of the Environment Canada Ice Chart data. A Regional Collection then links to Temporal Collections in the same way as a non-regional source.
//TODO: Add Region Collection Schema
Data Temporal Collection
Temporal Collections are blocks that partition the collected data into a series of collections by time period. This reduces the number of individual records that must be searched for a temporal query and marginally reduces the amount of data written when an asset is added. Once no further data is being added to a block (either because the block's time period has ended or because collection from the source or region has ended), the collection acts as if frozen and becomes an immutable archival source of truth. This is similar in concept to the chaining of blocks in blockchain technology.
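The block partitioning and freezing behaviour can be sketched as follows. The text does not state the block length, so calendar months in UTC are assumed here for illustration only. `block_id`, `block_end`, `is_frozen`, and `TemporalCollection` are hypothetical names.

```python
from datetime import datetime, timezone


def block_id(when: datetime) -> str:
    """Name the block an acquisition time falls into (monthly blocks are an assumption)."""
    return when.astimezone(timezone.utc).strftime("%Y-%m")


def block_end(block: str) -> datetime:
    """Return the first instant after the block, i.e. the start of the next month."""
    year, month = (int(part) for part in block.split("-"))
    return datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)


def is_frozen(block: str, now: datetime, source_active: bool = True) -> bool:
    """A block freezes when its period has ended or its source has stopped collecting."""
    return (not source_active) or now >= block_end(block)


class TemporalCollection:
    """Holds item ids for one block and refuses writes once the block is frozen."""

    def __init__(self, block: str) -> None:
        self.block = block
        self.item_ids: list = []

    def add(self, item_id: str, when: datetime, now: datetime) -> None:
        if is_frozen(self.block, now):
            raise ValueError(f"block {self.block} is frozen")
        if block_id(when) != self.block:
            raise ValueError(f"{item_id} does not belong to block {self.block}")
        self.item_ids.append(item_id)
```

A temporal query then only needs to open the blocks whose period overlaps the requested range, rather than scanning every record for the source.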
//TODO: Add Temporal Collection Schema
Item
An Item represents a single collection event, referenced by location and time. It is at this level that item-specific metadata is recorded, via the properties of the Item. The Item then has associated assets, which are links to the various manifestations of the collected data and to associated files such as raw data, transformed data, and accompanying documentation. It is at this level that the custom Vector Product extension will be used.
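As an illustration of what such an Item might look like, the sketch below builds a GeoJSON Feature with the core STAC Item fields (`type`, `stac_version`, `id`, `geometry`, `bbox`, `properties.datetime`, `links`, `assets`). Everything specific to the Vector Product extension is a placeholder: the schema URL, the `vector:format` property name, and the `make_item` helper are assumptions, as is the `stac_version` value, since the text does not define them.

```python
from typing import Optional, Sequence

# Hypothetical schema location for the custom extension (not a real URL).
VECTOR_PRODUCT_SCHEMA = "https://example.com/stac/vector-product/v0.1.0/schema.json"


def make_item(
    item_id: str,
    bbox: Sequence[float],
    when: str,
    assets: dict,
    vector_format: Optional[str] = None,
) -> dict:
    """Build a STAC-style Item whose footprint is the rectangle given by bbox."""
    west, south, east, north = bbox
    item = {
        "type": "Feature",
        "stac_version": "1.0.0",  # assumed version
        "id": item_id,
        "bbox": list(bbox),
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north], [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": when},
        "links": [],
        "assets": assets,
    }
    if vector_format is not None:
        # Extension fields are prefixed to avoid clashing with core properties.
        item["stac_extensions"] = [VECTOR_PRODUCT_SCHEMA]
        item["properties"]["vector:format"] = vector_format
    return item
```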
//TODO: Add Item Schema
Assets
An asset is the most atomic element in the catalogue: it simply holds a link to an individual item that is available for download. Each asset is represented by a key, which identifies the nature of the available data link.
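Looking up a download link by asset key can be sketched as below. The asset keys `raw` and `transformed`, the example URLs, and the helper names are assumptions for illustration. Only the `href` field of an asset is relied on.

```python
from typing import Optional, Sequence, Tuple


def asset_href(item: dict, key: str) -> str:
    """Return the download link stored under one asset key."""
    assets = item.get("assets", {})
    if key not in assets:
        raise KeyError(f"item {item.get('id')!r} has no asset {key!r}")
    return assets[key]["href"]


def first_available(item: dict, preferred_keys: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (key, href) for the first preferred key the item provides, else None."""
    assets = item.get("assets", {})
    for key in preferred_keys:
        if key in assets:
            return key, assets[key]["href"]
    return None


if __name__ == "__main__":
    example = {
        "id": "chart-1",
        "assets": {
            "raw": {"href": "https://example.com/chart-1.zip"},
            "transformed": {"href": "https://example.com/chart-1.geojson"},
        },
    }
    print(first_available(example, ["transformed", "raw"]))
```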