Software prerequisites

Docker (Ubuntu 20.04)

Note: these instructions are for Ubuntu 20.04 and may not work on other platforms. Installation instructions for other platforms can be found at docs.docker.com.

The open virtualization software Docker was used to deploy all the applications required for CbM development. Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. To run the extraction routines, it is recommended to install the latest version of Docker with the steps below:

sudo snap remove docker
# Warning: the next command deletes all existing Docker images, containers and volumes
sudo rm -R /var/lib/docker
sudo apt-get remove docker docker-engine docker.io
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

To run docker commands without sudo, add the user to the docker group. You may need to log out and back in (or restart) for the change to take effect.

sudo usermod -aG docker $USER
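
To check whether the group change is already active in the current session, a small sketch like the following may help. The helper name in_docker_group is hypothetical; id -nG and newgrp are standard Linux tools.

```shell
# Return success if the space-separated group list in $1 contains "docker".
# in_docker_group is a hypothetical helper name, not part of Docker.
in_docker_group() {
    case " $1 " in
        *" docker "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG prints the group names of the current session.
if in_docker_group "$(id -nG)"; then
    echo "docker group is active in this session"
else
    echo "docker group not active yet: log out and back in, or run 'newgrp docker'"
fi
```

Once the group is active, running docker run hello-world without sudo is a common way to confirm that the installation works.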

PostGIS

For this project we use a PostgreSQL database with the PostGIS extension. PostGIS extends the open source PostgreSQL database server with spatial data constructs (e.g. geometries) and spatial querying capacities, allowing storage and query of information about location and mapping.

To run a PostgreSQL database with the PostGIS extension, run:

docker run --name cbm_db --restart always -v pgdata:/var/lib/postgresql -p 5432:5432 -e POSTGRES_PASSWORD=MYPASSWORD -d postgis/postgis

Replace MYPASSWORD in POSTGRES_PASSWORD=MYPASSWORD with a secure password for the database.
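
One possible way to avoid typing the password on the command line is to generate it and store it in an env file that is passed to docker run with --env-file. This is only a sketch: the file name cbm_db.env and the variable CBM_ENV_FILE are assumptions, not part of the CbM setup.

```shell
# Hypothetical location of the env file; override with CBM_ENV_FILE if desired.
env_file="${CBM_ENV_FILE:-./cbm_db.env}"

# 16 random bytes printed as 32 hexadecimal characters.
password="$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"

# Create the file with permissions restricted to the current user.
( umask 077; printf 'POSTGRES_PASSWORD=%s\n' "$password" > "$env_file" )
echo "Password written to $env_file"

# Then start the database with --env-file instead of -e POSTGRES_PASSWORD=...:
# docker run --name cbm_db --restart always -v pgdata:/var/lib/postgresql -p 5432:5432 --env-file "$env_file" -d postgis/postgis
```

Keep the env file out of version control, since it contains the database password in clear text.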

This returns a long Docker container ID. Check that the container is running:

docker ps -a
CONTAINER ID        IMAGE             COMMAND                  CREATED             STATUS                   PORTS                    NAMES
85fc1f296000        postgis/postgis   "docker-entrypoint.s…"   9 seconds ago       Up 7 seconds             0.0.0.0:5432->5432/tcp   cbm_db
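
The container can be listed as "Up" a few seconds before the database accepts connections. A minimal sketch for waiting until it is ready, assuming the container is named cbm_db as above; wait_for is a hypothetical helper, while pg_isready is a standard PostgreSQL client tool that should be available inside the container.

```shell
# Retry a command until it succeeds.
# $1: maximum number of attempts; $2: delay in seconds; remaining arguments: the command.
# wait_for is a hypothetical helper name.
wait_for() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (not run here): wait up to 60 seconds for the database in cbm_db.
# wait_for 30 2 docker exec cbm_db pg_isready -U postgres && echo "database is ready"
```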

You need the PostgreSQL client tools to access the database. Make sure their major version matches that of the database server (10 in the example case). For example, for the command line interface on Ubuntu:

sudo apt-get install postgresql-client-common postgresql-client-10
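
Because the postgis/postgis image may ship a different PostgreSQL version than the one in this example, it can help to ask the server for its version first. The sketch below assumes the container name cbm_db and a server of version 10 or later (where the major version is the first number); pg_major_version is a hypothetical helper.

```shell
# Extract the major version from a PostgreSQL version string,
# e.g. "10.7 (Debian 10.7-1.pgdg90+1)" gives "10".
# Assumes PostgreSQL 10 or later; pg_major_version is a hypothetical helper name.
pg_major_version() {
    printf '%s\n' "$1" | sed -E 's/^[^0-9]*([0-9]+).*/\1/'
}

# Usage (not run here): query the running container, then install the matching client.
# server_version="$(docker exec cbm_db psql -U postgres -tAc 'SHOW server_version;')"
# sudo apt-get install postgresql-client-common "postgresql-client-$(pg_major_version "$server_version")"
```

Whether a client package for that major version is available depends on the configured apt repositories.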

You can now connect to the database:

psql -h localhost -d postgres -U postgres

psql (10.12 (Ubuntu 10.12-0ubuntu0.18.04.1), server 10.7 (Debian 10.7-1.pgdg90+1))
Type "help" for help.

postgres=#

To make sure the required PostGIS extensions are enabled, run:

CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS postgis_raster;

The PostGIS image may contain the TIGER database by default (it is often used in PostGIS training). We don’t need it, so remove it with:

postgres=# DROP schema tiger, tiger_data cascade;

Then exit and reconnect (you are now in schema public). List the default tables in that schema:

postgres=# \q
psql -h localhost -d postgres -U postgres
postgres=# \d
               List of relations
 Schema |       Name        | Type  |  Owner
--------+-------------------+-------+----------
 public | geography_columns | view  | postgres
 public | geometry_columns  | view  | postgres
 public | raster_columns    | view  | postgres
 public | raster_overviews  | view  | postgres
 public | spatial_ref_sys   | table | postgres
(5 rows)

These tables are required for the handling of spatial constructs (geometries, raster data, projection information).

Optimizing

The main configuration settings for PostgreSQL are in the text file postgresql.conf (/etc/postgresql/<version>/main/postgresql.conf). PostgreSQL ships with a basic configuration tuned for wide compatibility rather than performance.

It is strongly recommended to configure the settings of the PostgreSQL database based on your hardware configuration and application. Suggested configurations can be found at PGTune or at PGConfig.
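
As a rough illustration of hardware-based tuning, the sketch below derives a starting value for shared_buffers from the installed memory, using the 25% rule of thumb given in the PostgreSQL documentation for dedicated database servers. The helper name suggest_shared_buffers_mb is hypothetical, and the result is only a starting point, not a recommendation for any specific CbM deployment.

```shell
# Suggest shared_buffers in MB as 25% of total memory in MB.
# suggest_shared_buffers_mb is a hypothetical helper name.
suggest_shared_buffers_mb() {
    echo $(( $1 / 4 ))
}

# On Linux, read the total memory from /proc/meminfo (reported in kB).
if [ -r /proc/meminfo ]; then
    total_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
    echo "shared_buffers = $(suggest_shared_buffers_mb $(( total_kb / 1024 )))MB"
fi
```

The printed line uses the postgresql.conf syntax; other settings (for example work_mem or effective_cache_size) are better taken from the tools mentioned above.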

Essential CbM tables

We now need to create the tables that we will use in the CbM context:

CREATE TABLE public.aois (name text);
SELECT addgeometrycolumn('aois', 'wkb_geometry', 4326, 'POLYGON', 2);


CREATE TABLE public.dias_catalogue (
    id serial NOT NULL,
    obstime timestamp without time zone NOT NULL,
    reference character varying(120) NOT NULL,
    sensor character(2) NOT NULL,
    card character(2) NOT NULL,
    status character varying(24) DEFAULT 'ingested'::character varying NOT NULL,
    footprint public.geometry(Polygon,4326)
);

ALTER TABLE ONLY public.dias_catalogue
    ADD CONSTRAINT dias_catalogue_pkey PRIMARY KEY (id);

CREATE INDEX dias_catalogue_footprint_idx ON public.dias_catalogue USING gist (footprint);

CREATE UNIQUE INDEX dias_catalogue_reference_idx ON public.dias_catalogue USING btree (reference);

CREATE TABLE public.aoi_s2_signatures (
    pid integer,
    obsid integer,
    band character(3),
    count real, mean real, std real,
    min real, max real,
    p25 real, p50 real, p75 real
);

CREATE INDEX aoi_s2_signatures_bidx ON public.aoi_s2_signatures USING btree (band);
CREATE INDEX aoi_s2_signatures_obsidx ON public.aoi_s2_signatures USING btree (obsid);
CREATE INDEX aoi_s2_signatures_pidx ON public.aoi_s2_signatures USING btree (pid);

The table aois is an ancillary table in which one can define the geometries of the areas of interest. The dias_catalogue is an essential table that stores the metadata for the relevant Sentinel-1 and -2 image frames. The table aoi_s2_signatures will store the time series extracts which will be linked to the parcel ID (pid) from the to-be-uploaded parcel reference table for each observation id (obsid) in the dias_catalogue.
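
To illustrate how these tables relate, the following sketch prints two SQL statements: one that adds a rectangular area of interest to aois, and one that reads the time series of a single parcel by joining a signatures table to dias_catalogue. The function names, the AOI name and the coordinates are hypothetical, and the join on obsid = id is an assumption based on the description above. ST_MakeEnvelope is a standard PostGIS function.

```shell
# Print an INSERT for a rectangular AOI.
# Arguments: name xmin ymin xmax ymax (degrees, EPSG:4326). Hypothetical helper.
aoi_insert_sql() {
    printf "INSERT INTO public.aois (name, wkb_geometry) VALUES ('%s', ST_MakeEnvelope(%s, %s, %s, %s, 4326));\n" "$1" "$2" "$3" "$4" "$5"
}

# Print a query for the time series of one parcel.
# Arguments: signatures table name, parcel id (integer). Hypothetical helper.
# Assumes obsid refers to dias_catalogue.id.
parcel_timeseries_sql() {
    printf "SELECT d.obstime, s.band, s.mean, s.std FROM public.%s s JOIN public.dias_catalogue d ON d.id = s.obsid WHERE s.pid = %d ORDER BY d.obstime, s.band;\n" "$1" "$2"
}

# Example with made-up values; review the output before sending it to the database.
aoi_insert_sql demo 3.3 50.7 7.3 53.6
parcel_timeseries_sql aoi_s2_signatures 42
```

The printed statements can be piped into psql, e.g. aoi_insert_sql demo 3.3 50.7 7.3 53.6 | psql -h localhost -d postgres -U postgres. The helpers do no escaping, so they are only suitable for trusted input.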

Generate a new aoi_s2_signatures table for each AOI. This will typically be needed for separate years, as parcel references change. For instance, a table named nld2019_s2_signatures would store all S2 records for the NL reference for 2019.

For Sentinel-1 time series create the equivalent tables with bs (backscattering coefficients) and c6 (6-day coherence) instead of s2 in the table name.
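
Since the same table layout is repeated per AOI, year and signature type, the statements can be generated from a small template. The sketch below assumes that the bs and c6 tables use the same columns and indexes as the s2 table shown earlier; signatures_table_sql is a hypothetical helper.

```shell
# Print the DDL for one signatures table.
# $1: reference prefix (e.g. nld2019); $2: signature type (s2, bs or c6).
# signatures_table_sql is a hypothetical helper name.
signatures_table_sql() {
    case "$2" in
        s2|bs|c6) ;;
        *) echo "type must be s2, bs or c6" >&2; return 1 ;;
    esac
    t="${1}_${2}_signatures"
    cat <<EOF
CREATE TABLE public.${t} (
    pid integer,
    obsid integer,
    band character(3),
    count real, mean real, std real,
    min real, max real,
    p25 real, p50 real, p75 real
);
CREATE INDEX ${t}_bidx ON public.${t} USING btree (band);
CREATE INDEX ${t}_obsidx ON public.${t} USING btree (obsid);
CREATE INDEX ${t}_pidx ON public.${t} USING btree (pid);
EOF
}

# Example: print the statements for the three 2019 NL tables.
for kind in s2 bs c6; do
    signatures_table_sql nld2019 "$kind"
done
```

To create a table, pipe the output into psql, e.g. signatures_table_sql nld2019 s2 | psql -h localhost -d postgres -U postgres.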

Jupyter server

The Jupyter Server is an open source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more (https://jupyter.org). JupyterLab is the next-generation user interface for Project Jupyter, offering all the familiar building blocks of the classic Jupyter Notebook (notebook, terminal, text editor, file browser, rich outputs, etc.) in a flexible and powerful user interface. JupyterLab will eventually replace the classic Jupyter Notebook (https://jupyterlab.readthedocs.io).

Installing DIAS Jupyter (Jupyter Notebook Tensorflow Python Stack for CbM)

The GTCAP cbm_jupyter Docker image is based on the tensorflow-notebook image of the Jupyter Notebook Scientific Python Stack and is configured for Copernicus DIAS for CAP “checks by monitoring” with all the requirements. This is the recommended way to run a Jupyter server. Some DIAS providers may also provide preinstalled Jupyter environments.

Run GTCAP Jupyter docker image

To run a jupyter server with the default setup:

docker run --name cbm_jupyter -p 8888:8888 gtcap/cbm_jupyter

This runs the Jupyter server on port 8888; it can be accessed from a web browser at localhost:8888.

To expose the Jupyter server on port 80, change -p 8888:8888 to -p 80:8888, or use any other host port.

More options

To pull the docker image from dockerhub use:

docker pull gtcap/cbm_jupyter

To mount the current local directory and access it from within the Jupyter server, run:

docker run -it -p 8888:8888 -v "$PWD":/home/jovyan --name=cbm_jupyter gtcap/cbm_jupyter

To run the Jupyter server with a predefined token, add at the end of the command:

start-notebook.sh --ServerApp.token='abcdefghijk1234567890'
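
Put together, a complete command with a freshly generated token might look like the output of the sketch below. The helper jupyter_run_cmd is hypothetical; it only prints the command, using the image name and options from this section, so that it can be reviewed before running.

```shell
# Print a docker run command for the CbM Jupyter image.
# $1: host port; $2: access token. jupyter_run_cmd is a hypothetical helper name.
jupyter_run_cmd() {
    printf "docker run --name cbm_jupyter -p %s:8888 gtcap/cbm_jupyter start-notebook.sh --ServerApp.token='%s'\n" "$1" "$2"
}

# 24 random bytes printed as 48 hexadecimal characters.
token="$(head -c 24 /dev/urandom | od -An -tx1 | tr -d ' \n')"

jupyter_run_cmd 8888 "$token"
```

A random token is preferable to a short, memorable one when the server is reachable from outside the local machine.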

Note: JupyterLab can be accessed by using /lab in the URL instead of /tree (e.g. localhost:8888/lab).

To enable JupyterLab by default, add the -e JUPYTER_ENABLE_LAB=yes flag.

To enable JupyterLab by default and mount the current directory, run:

docker run -it -e JUPYTER_ENABLE_LAB=yes -p 8888:8888 -v "$PWD":/home/jovyan --name=cbm_jupyter gtcap/cbm_jupyter

For more options visit jupyter-docker-stacks.readthedocs.io.

To access the Jupyter server, open in a web browser the link with the token that is printed in the terminal (e.g. http://localhost:8888/tree?token=abcdefghijk1234567890).

Usage Instructions

All Jupyter Notebook files have the extension ‘.ipynb’ and are identifiable by the notebook icon next to their name. To create a new notebook in JupyterLab, go to File -> New and select ‘New Notebook’. Notebooks that are currently running have a green dot, while non-running ones do not. To run a cell with a script, click on the run icon or press Shift+Enter.

More information can be found at: https://jupyter.org/documentation

The token to access the jupyter server will be in the command line output:

[I 08:51:48.705 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 08:51:48.708 NotebookApp]

   To access the notebook, open this file in a browser:
       file:///home/jovyan/.local/share/jupyter/runtime/nbserver-8-open.html
   Or copy and paste one of these URLs:
       http://abcd12345678:8888/?token=abcd12345678
    or http://127.0.0.1:8888/?token=abcd12345678

You will be able to access the Jupyter server on port 8888 (or any other port you mapped) at the VM’s public IP address, e.g. <public-ip>:8888. Copy the token from the command line output and enter it in the web interface.
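
If the container runs in the background, the token can be read from its logs rather than from the terminal. A minimal sketch, assuming the container is named cbm_jupyter and the log contains a URL of the form shown above; extract_token is a hypothetical helper.

```shell
# Read log lines from stdin and print the first token found in a "?token=..." URL.
# extract_token is a hypothetical helper name.
extract_token() {
    awk 'match($0, /token=[A-Za-z0-9]+/) { print substr($0, RSTART + 6, RLENGTH - 6); exit }'
}

# Usage (not run here): Jupyter writes its log to stderr, hence the 2>&1.
# docker logs cbm_jupyter 2>&1 | extract_token
```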

Build Jupyter image from source

To build the cbm_jupyter Docker image from source, see the Jupyter for CbM README.md file.