Backend

On this page, we explain how to deploy the backend component (software and hardware infrastructure) of the JRC CbM system. An introduction to the system is given in the INTRODUCTION TO THE JRC CbM ARCHITECTURE. Here you will find references to the technical material, instructions and code needed to set up a DIAS virtual machine, create a database to store parcel information and image metadata, access or generate Sentinel Copernicus Analysis Ready Data (CARD) for your area, process it to populate the signature table (satellite band statistics per parcel) and provide interfaces for CbM users to visualize and analyse the relevant information. The use of the information generated by the backend is documented in the FRONTEND PAGES and the DATA ANALYTICS PAGES.

Goals

The main functions of the JRC CbM backend are:

  • to set up the hardware needed to run the JRC CbM system

  • to generate Analysis Ready Data (ARD), if not already available in the DIAS archive

  • to reduce the spatio-temporal image stacks of ARD to parcel time series

  • to provide server components and APIs for data access

This component of the JRC CbM is the basis for the other tasks and a prerequisite for the development of CAP-related data analytics applications. JRC also provides direct assistance to Member States/Paying Agencies in deploying and customizing the system, which is the final expected output of this activity.

General requisites to set up the CbM system

In the framework of the Outreach project, GTCAP has created a cloud infrastructure (based on CREODIAS) to test the functionalities of the system, with the backend component developed and managed by JRC. Member States (MS) can use a dedicated API to explore and analyse Sentinel data extracted for their declared parcels. In this project, users do not have to install anything and can run the example code for data retrieval and manipulation stored in this repository, in particular using Python and Jupyter Notebooks. To set up a dedicated CbM system for a Paying Agency (PA), however, it is essential to have:

  1. Computing resources

  2. Copernicus Analysis Ready Data (CARD) (Sentinel 1 and 2 data for the study area)

  3. Agricultural parcel data (typically, declared parcels from the Land Parcel Identification System (LPIS) and the Geospatial Aid Application (GSAA))

The first two requirements can be met using one of the five Copernicus Data and Information Access Services (DIAS) available (CREODIAS, WEKEO, SOBLOO, MUNDI, ONDA).

How to create a CbM infrastructure

Set up a DIAS virtual machine

The virtual machines (VMs) of DIAS Infrastructure as a Service (IaaS) are emulations of fully functional computational instances. Users obtain VMs with full root access and can define different parameters and characteristics, including machine type (physical or virtual), RAM, CPU, storage quantity and type, operating system, middleware components and virtual networks connected to the machine. In the documentation we describe how to connect to the machine and to install the software required to run the JRC CbM.
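Once connected to a new VM, it can be useful to confirm that the resources actually allocated match what was requested. The following is a minimal sketch using only the Python standard library; the thresholds in MINIMUM are hypothetical placeholders, not JRC sizing recommendations, and the RAM query works only on POSIX systems such as the Linux VMs used for the CbM.

```python
import os
import shutil

def vm_resources(path: str = "/") -> dict:
    """Report CPU count, total RAM (GiB) and free disk space (GiB) of this machine."""
    try:
        # POSIX only: page size * number of physical pages = total RAM in bytes
        ram_gib = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3
    except (AttributeError, ValueError, OSError):
        ram_gib = None  # not available on this platform
    disk = shutil.disk_usage(path)
    return {
        "cpus": os.cpu_count(),
        "ram_gib": ram_gib,
        "disk_free_gib": disk.free / 1024**3,
    }

# Hypothetical minimum sizing: adapt to the number of parcels and images to process.
MINIMUM = {"cpus": 4, "ram_gib": 16, "disk_free_gib": 100}

def check_vm(resources: dict, minimum: dict = MINIMUM) -> list:
    """Return the names of resources that are unknown or below the minimum."""
    return [k for k, v in minimum.items()
            if resources.get(k) is None or resources[k] < v]

if __name__ == "__main__":
    res = vm_resources()
    print(res)
    print("Below minimum:", check_vm(res) or "none")
```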

CbM database

The CbM database stores the base layers (in particular, parcels and image metadata, the latter generated by the DIAS) and the signatures (i.e. number of pixels, mean, std, min, max, 25th percentile, 50th percentile, 75th percentile) that result from the intersection of satellite image bands and parcels. PostgreSQL with its spatial extension PostGIS is used as the reference database software. On the documentation page, we introduce spatial relational database management systems (SRDBMS) and describe the database that has been set up for the JRC CbM: how to create the database structure and populate it, and how to access, retrieve, export and back up the data stored in it.
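To make the relationship between parcels, image metadata and signatures concrete, the sketch below creates a simplified version of the three tables and retrieves one band's time series for one parcel. It is only an illustration under stated assumptions: the JRC CbM uses PostgreSQL/PostGIS (with geometry columns), whereas sqlite3 is used here so the example runs without a database server, and all table and column names are hypothetical.

```python
import sqlite3

# Hypothetical, simplified schema; in PostGIS the parcels table would also hold a geometry.
SCHEMA = """
CREATE TABLE parcels (
    pid      INTEGER PRIMARY KEY,   -- parcel identifier
    cropname TEXT                   -- declared crop
);
CREATE TABLE image_metadata (
    obsid     INTEGER PRIMARY KEY,  -- image identifier
    reference TEXT NOT NULL,        -- product name/key in the DIAS object storage
    obstime   TEXT NOT NULL         -- acquisition time (ISO 8601)
);
CREATE TABLE signatures (
    pid   INTEGER REFERENCES parcels(pid),
    obsid INTEGER REFERENCES image_metadata(obsid),
    band  TEXT,
    count INTEGER, mean REAL, std REAL, min REAL, max REAL,
    p25   REAL, p50 REAL, p75 REAL,
    PRIMARY KEY (pid, obsid, band)
);
"""

def parcel_time_series(conn, pid, band):
    """Return (obstime, mean) pairs for one parcel and band, ordered by acquisition time."""
    sql = """
        SELECT m.obstime, s.mean
        FROM signatures s JOIN image_metadata m ON s.obsid = m.obsid
        WHERE s.pid = ? AND s.band = ?
        ORDER BY m.obstime
    """
    return conn.execute(sql, (pid, band)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO parcels VALUES (1, 'maize')")
conn.executemany("INSERT INTO image_metadata VALUES (?, ?, ?)",
                 [(10, "S2_tile_a", "2020-05-10T10:30:00"),
                  (11, "S2_tile_b", "2020-05-05T10:30:00")])
conn.executemany("INSERT INTO signatures VALUES (?,?,?,?,?,?,?,?,?,?,?)",
                 [(1, 10, "B08", 120, 0.41, 0.05, 0.30, 0.52, 0.38, 0.41, 0.44),
                  (1, 11, "B08", 120, 0.35, 0.04, 0.25, 0.45, 0.32, 0.35, 0.38)])
print(parcel_time_series(conn, 1, "B08"))
# [('2020-05-05T10:30:00', 0.35), ('2020-05-10T10:30:00', 0.41)]
```

Keeping the signatures in a separate table keyed by parcel, image and band means one row per observation, so the time series for any parcel is a single join with the image metadata.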

CARD

Copernicus Analysis Ready Data (CARD) in CbM are Sentinel images provided in a format that is ready for analysis, so that users can work with them without the burden of complex and time-consuming image pre-processing. As a minimum, CARD consist of georeferenced, calibrated sensor data (Level 2) covering the whole area of interest. Although it is technically feasible to generate a CARD time series on demand, it is often more convenient to extract large sets of parcels, for pre-selected bands, in a batch process. In the documentation, we explain how to generate CARD (if not already available in the DIAS), how to make it available in the S3 storage and how to access it.
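Before pixel data is read from the object storage, the image metadata can be used to select only the CARD scenes that cover a parcel in the period of interest. A minimal sketch, assuming a simplified in-memory catalogue with rectangular bounding-box footprints (in the actual database, footprints would be polygons handled by PostGIS); the CardImage class, its fields and the object keys are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CardImage:
    reference: str  # hypothetical object key of the CARD product in S3 storage
    obs_date: date  # acquisition date
    bbox: tuple     # footprint as (min_x, min_y, max_x, max_y), same CRS as the parcels

def bbox_intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def select_images(catalogue, parcel_bbox, start, end):
    """Images overlapping the parcel and acquired in [start, end], sorted by date."""
    hits = [img for img in catalogue
            if start <= img.obs_date <= end and bbox_intersects(img.bbox, parcel_bbox)]
    return sorted(hits, key=lambda img: img.obs_date)

catalogue = [
    CardImage("s2/a", date(2020, 5, 1), (0, 0, 10, 10)),
    CardImage("s2/b", date(2020, 4, 1), (0, 0, 10, 10)),    # outside the date range
    CardImage("s2/c", date(2020, 5, 15), (20, 20, 30, 30)),  # does not cover the parcel
    CardImage("s2/d", date(2020, 4, 20), (5, 5, 15, 15)),
]
selected = select_images(catalogue, (8, 8, 9, 9), date(2020, 4, 15), date(2020, 5, 31))
print([img.reference for img in selected])  # ['s2/d', 's2/a']
```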

Extract signatures

The generation of band signal statistics (signature time series) for each agricultural parcel supports CbM analytics by reducing the large and complex volume of satellite data covering the entire territory to a simpler, manageable data set (in the context of JRC CbM, this process is also called reduction). Typical tasks based on signature time series include marker generation, outlier identification and heterogeneity detection. The direct link to the CARD inputs is always maintained, so the source of a particular indicator can be (automatically) retrieved, e.g. for more detailed inspection. In the documentation, we illustrate how to configure the system and run the Python procedure that extracts signature statistics and populates the database.
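The reduction step can be illustrated for a single band of a single image: the pixels that fall inside a parcel are selected with a mask and summarised into the signature statistics stored in the CbM database (count, mean, std, min, max and the 25th, 50th and 75th percentiles). A minimal, pure-Python sketch; it assumes the parcel has already been rasterised onto the image grid (in practice this would be done with geospatial tools such as GDAL), and the use of the population standard deviation and linearly interpolated percentiles is an assumption, not necessarily what the JRC extraction code does.

```python
import statistics

def parcel_signature(pixels, mask, nodata=None):
    """Reduce one band's pixels inside a parcel mask to signature statistics.

    pixels and mask are equally sized 2D lists; mask is truthy where a pixel
    belongs to the parcel. Pixels equal to nodata are ignored.
    """
    values = [v for row_v, row_m in zip(pixels, mask)
              for v, m in zip(row_v, row_m)
              if m and v != nodata]
    if not values:
        return None  # parcel not covered by valid pixels in this image
    if len(values) > 1:
        # method="inclusive" interpolates linearly between sorted values
        p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    else:
        p25 = p50 = p75 = values[0]
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
        "p25": p25, "p50": p50, "p75": p75,
    }

band = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
parcel_mask = [[0, 1, 1],
               [0, 1, 1],
               [0, 0, 0]]
print(parcel_signature(band, parcel_mask))
```

Running the same function for every band, image and parcel and writing each result as one row of the signature table yields the parcel time series.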

Create a RESTful API

To facilitate access to parcel time series and satellite images for analysts and final users who do not have a DIAS account, and in particular for those with a limited technical background and no knowledge of SQL, a RESTful API can be built with Flask. It acts as an intermediate layer that provides predefined functionalities to extract data based on a limited and controlled set of parameters. Restricting access in this way helps ensure both performance and security. In the documentation, we explain how to deploy and configure the RESTful API Docker container.
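The "limited and controlled set of parameters" can be enforced before any query reaches the database: each request argument is validated against a whitelist and the values are passed to the driver as bind parameters. A minimal sketch of such a validation layer; the band whitelist, the table and column names, and the build_timeseries_query function are hypothetical, and the %s placeholders assume a psycopg2-style PostgreSQL driver.

```python
from datetime import date

# Hypothetical whitelist: the bands actually exposed depend on the deployment.
ALLOWED_BANDS = {"B02", "B03", "B04", "B08", "VV", "VH"}

class BadRequest(ValueError):
    """Raised when a request parameter is missing, malformed or not allowed."""

def build_timeseries_query(args):
    """Validate request arguments and return a parameterised SQL query and its values."""
    try:
        pid = int(args["pid"])
        band = args["band"]
        start = date.fromisoformat(args["start"])
        end = date.fromisoformat(args["end"])
    except (KeyError, ValueError, TypeError) as exc:
        raise BadRequest(f"missing or malformed parameter: {exc}") from exc
    if pid <= 0 or band not in ALLOWED_BANDS or start > end:
        raise BadRequest("parameter outside the allowed range")
    # Values travel as bind parameters, never formatted into the SQL text.
    sql = (
        "SELECT m.obstime, s.mean FROM signatures s "
        "JOIN image_metadata m ON s.obsid = m.obsid "
        "WHERE s.pid = %s AND s.band = %s "
        "AND m.obstime::date BETWEEN %s AND %s "
        "ORDER BY m.obstime"
    )
    return sql, (pid, band, start, end)

sql, params = build_timeseries_query(
    {"pid": "42", "band": "B08", "start": "2020-03-01", "end": "2020-09-30"})
print(params)
```

In a Flask view, a function like this could be called with the request's query arguments, answering with HTTP 400 when BadRequest is raised and otherwise executing sql with params through the database driver.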

Deployment of the CbM system

There are several steps to set up the core components for CbM that require different types of technical expertise.

  1. Set up server applications

    • Docker (containerization system)

    • Postgres database with PostGIS extension

    • Jupyter (interactive analysis and visualization environment)

    • RESTful API (intermediate layer to access and use Copernicus data and the database)

  2. Add data to the database

    • Parcels data

    • CARD metadata and other configuration data

  3. Process Sentinel data to derive relevant information

    • Parcel stats extraction routines

    • Machine learning algorithms

    • Analytical routines (e.g. marker detection)

  4. Analyze and report

    • Using Jupyter Notebooks to explore and analyze data

    • Generate reports and other outputs to classify aid applications
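As an illustration of the analytical routines in step 3, a very simple marker can flag abrupt drops in a parcel's signature time series, such as a sharp fall in mean NDVI that might indicate mowing or harvest. This is a deliberately naive sketch, not a JRC marker: the drop_markers function and its 0.2 threshold are hypothetical, and a real marker would also have to deal with noise, residual clouds and crop calendars.

```python
from datetime import date

def drop_markers(series, min_drop=0.2):
    """Flag consecutive observations where the value falls by at least min_drop.

    series: iterable of (date, value) pairs, e.g. the mean NDVI of one parcel.
    Returns a list of (date_before, date_after, drop) tuples.
    """
    ordered = sorted(series)  # sort by date
    markers = []
    for (d0, v0), (d1, v1) in zip(ordered, ordered[1:]):
        if v0 - v1 >= min_drop:
            markers.append((d0, d1, round(v0 - v1, 3)))
    return markers

ndvi = [
    (date(2020, 6, 21), 0.35),
    (date(2020, 6, 1), 0.78),
    (date(2020, 7, 1), 0.40),
    (date(2020, 6, 11), 0.80),
]
print(drop_markers(ndvi))
```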

Expertise required

To complete the tasks and set up a running system, Member States/Paying Agencies need the support of experts with:

  • strong skills in Linux Virtual Machine (VM) configuration and administration

  • strong skills in the use of relevant open source components for EO image processing and geospatial data analysis (at a minimum: GDAL and Web Map Services (WMS))

  • strong skills in programming and scripting (preferably Python and relevant data processing libraries) for geospatial analysis

  • good working knowledge of PostgreSQL/PostGIS and SQL

  • working knowledge of cloud or cluster computing solutions (OpenStack, Docker, Docker Swarm) for VM orchestration and parallel computing

  • working knowledge of server interfaces for data access and analytics (JupyterHub, RESTful APIs)

  • working knowledge of Sentinel data specifications, processing toolkits, and use case requirements in agriculture

  • good English communication skills if direct support from JRC is needed

In the future, the backend development may be impacted by Copernicus programme decisions (e.g. ARD production) and adoption of novel approaches (Kubernetes, Dask, GPU).

Enhancement Proposals

The JRC CbM backend evolves quickly, with improved or new features. This section lists some improvements that we plan to integrate into the system. If you want to suggest a new enhancement or contribute new modules, the next section provides links to the instructions.

Terrain Correction

Introduce Radiometric Terrain Correction (RTC) in CARD-BS processing: RTC normalizes for the SAR viewing geometry, allowing better comparison of ascending (ASC) and descending (DESC) data and of data from multiple orbits.

S2 Multi-band

Implement multi-band and index extraction for Sentinel-2 L2A: this is significant for marker generation. Since database size may become an issue, extraction is strictly limited to cloud-free parcels.
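Index extraction restricted to cloud-free parcels could look like the sketch below, which computes the mean NDVI, (B08 − B04) / (B08 + B04), over a parcel's pixels and skips the parcel if any pixel is flagged as cloud or cloud shadow. It is a minimal sketch under assumptions: cloud screening uses the Sentinel-2 L2A Scene Classification Layer (SCL) classes 3 (cloud shadow), 8 and 9 (cloud medium/high probability) and 10 (thin cirrus), and the "no flagged pixel at all" rule and the parcel_ndvi function are hypothetical choices, not the agreed JRC criterion.

```python
# SCL classes treated as contaminated: cloud shadow, cloud medium/high probability, thin cirrus.
CLOUD_CLASSES = {3, 8, 9, 10}

def parcel_ndvi(b04, b08, scl):
    """Mean NDVI of a parcel from per-pixel red (B04), NIR (B08) and SCL values.

    Returns None if any pixel is flagged as cloud or shadow (parcel not cloud-free).
    """
    if any(c in CLOUD_CLASSES for c in scl):
        return None
    values = [(nir - red) / (nir + red)
              for red, nir in zip(b04, b08) if nir + red > 0]
    return sum(values) / len(values) if values else None

print(parcel_ndvi([0.05, 0.05], [0.45, 0.35], [4, 4]))  # clear parcel, NDVI ~0.775
print(parcel_ndvi([0.05, 0.05], [0.45, 0.35], [4, 9]))  # one cloudy pixel -> None
```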

Dask tests

Test Dask and decide whether it can replace (some of) the Docker based parallelisation: Dask allows for parallel processing of large and deep image stacks, e.g. for extraction and local image processing tasks (e.g. segmentation).

ML Revision

Revisit the 2018 machine learning code (DNN in TensorFlow) and update it with the latest best practices: the old code needs to be reworked and implemented as a “crop” marker. Documentation is still to be completed. It is best to start with S1 signatures.

RESTful meteo

Create a RESTful service that accesses open meteorological “nowcast” data (e.g. ERA5, GFS): interpretation of time series often requires temperature and precipitation data to explain spurious trends.