# Conceptual Data Models

Many of the existing OGC standards have assumed or defined two-dimensional tilesets, such as in WMTS. It was agreed at the OGC Tech Conference in Boulder, June 2015, that the Web Coverage Tile Service SWG should define a conceptual model of an n-dimensional tileset, where n=2,3,4,5,.... The case where n=3 (x,y,z) is straightforward to envisage, and n=4 (x,y,z,t) is not too difficult either. If one is considering multi-wavelength remote sensing or imaging, n=5 (x,y,z,t,λ) becomes reasonable. Meteorology may routinely forecast not specific values but a distribution of likely values, a Probability Distribution Function, so we could then envisage n=6 (x,y,z,t,λ,π). Of course a more recent weather forecast is probably more accurate than an earlier one, so now we could have n=7 (x,y,z,t1,λ,π,t2), and we still have not considered all the possible variables or parameters that are of interest: n=8 (p,x,y,z,t1,λ,π,t2)!

Of course, a time series could be considered a 1-D tileset.

We probably need to distinguish those dimensions that are, in some sense, continuous (x,y,z,t), in that interpolation to intermediate values is reasonable, from those 'dimensions' that really are discrete layers, such as successive parameters (wind speed, wind direction, temperature, humidity, ...) that may occur in a NetCDF file, where interpolation is meaningless, or perhaps not reasonable.

We probably need a discussion here of range and domain sets in WCS, and the problem space of various ancillary services associated with WCS such as WPS, WCPS, etc., and interfaces to other standards and services, e.g. a likely processing chain to support a WCTileService.

Here is a summary of some relevant tileset/datacube/data grid concepts.

## WMTS Web Map Tile Service

2D to be done

## WMTS Simple Profile

2D but only on certain map projections, to be done

## GeoPackage

2D map images only?? to be done

## DGGS Discrete Global Grid System

?? to be done

## Meteorology and Oceanography Modelling Grids

Meteorological and oceanographic forecast models usually assume a rectilinear quasi-horizontal grid covering either the complete earth or a 'rectangular' domain of interest. The earth is usually assumed to be a sphere, or occasionally an oblate spheroid (i.e. a N-S cross section including the earth's axis is an ellipse). Remote sensing information, such as from satellites, is usually re-projected from a highly detailed, geoid-based location and time to the model grids to allow consistent data usage.

The grid is usually regularly spaced in some map projection. These are often conformal (Equal Angle) such as Mercator or Northern Polar Stereographic, rather than Equal Area or other projections, as navigation is a primary use case.

In the vertical, there is usually an irregularly spaced grid, usually to have extra resolution in the boundary layer, the bottom kilometre or so of atmosphere, or near the tropopause where there are significantly strong winds. In the oceans, the grids have higher resolution just below the surface, though some continental shelf models also have increased vertical resolution near the bottom.

The grids are usually regular over time, say every few minutes or every hour, though the data may only be stored for a more irregular pattern such as 24 times hourly for a 24 hour forecast, then 16 times every 3 hours for two more days to day three, then 14 times every 12 hours until the ten day forecast is reached.
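The irregular storage pattern described above can be written as a short list of (count, step) segments and expanded into a look-up table of forecast offsets. The following is a minimal sketch only: the names `SEGMENTS`, `forecast_offsets_hours` and the model run start time are assumptions made for illustration, not part of any standard.

```python
from datetime import datetime, timedelta, timezone

# (count, step in hours) segments, taken from the example in the text:
# 24 hourly steps, then 16 three-hourly steps, then 14 twelve-hourly steps.
SEGMENTS = [(24, 1), (16, 3), (14, 12)]


def forecast_offsets_hours(segments):
    """Return the cumulative forecast offset, in hours, of every stored step."""
    offsets = []
    current = 0
    for count, step in segments:
        for _ in range(count):
            current += step
            offsets.append(current)
    return offsets


offsets = forecast_offsets_hours(SEGMENTS)

# Hypothetical model run start time, chosen only for illustration.
run_start = datetime(2015, 6, 23, 0, 0, tzinfo=timezone.utc)
valid_times = [run_start + timedelta(hours=h) for h in offsets]
```

The three segments end at 24, 72 and 240 hours, which matches the one day, day three and ten day boundaries in the description.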

These grids define 'boxes' or cubes, and the data is usually a value representative of that volume and is considered to be at the centre of the 'box'. I.e. the temperature is an 'average temperature' over the volume. Some parameters may be accumulations over time, or integrations over the full depth of the vertical grid.

Any such grid also defines a dual, alternative, grid consisting of the centres of the boxes, or the planes/lines/points of contact between the boxes. In practice, a grid may consist of a combination of such views. E.g. Pressure and Temperature are at the centre of a 'box', whereas wind components are on the 'edges' of the 'boxes'. Fortunately, in practice, values are nearly always interpolated to consistent 'central' positions for external consumption. There is a taxonomy of such patterns of parameter 'grid reference positions' created by the Japanese scientist Arakawa, and these are determined by the efficient solutions of equations. Generally, there is no usage of reference points like 'top left' or 'bottom right'.

So, for meteorology and oceanography, a tileset is a rectangular, multi-dimensional array of values, where 3- or 4-D location can be determined by counting using some scanning pattern in x,y,z and perhaps t and solving a relatively simple functional equation of the map projection. Some tilesets are 1-D, such as 'soundings'/vertical trajectories/ascents or time series. The traditional meteorological data formats do this. Also, values are often stored as application-specific, scaled positive short integers to save space and bandwidth. For example, surface temperature is only measured to the nearest 0.1°C. More accuracy is spurious. Locations do not need to be highly accurate either: quantised to the nearest 10 m, 100 m or even a km is usually good enough.
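Both ideas in this paragraph, locating a value by counting and storing values as scaled positive integers, can be sketched in a few lines. This is an illustration under stated assumptions: the row-major scan order (last dimension varying fastest), the grid shape, the -100 °C offset and the function names `unravel`, `pack` and `unpack` are all hypothetical choices, not taken from any particular data format.

```python
def unravel(flat_index, shape):
    """Convert a flat position in the value stream into per-dimension indices.

    Assumes a row-major scan: the last dimension in `shape` varies fastest.
    """
    indices = []
    for size in reversed(shape):
        flat_index, i = divmod(flat_index, size)
        indices.append(i)
    return tuple(reversed(indices))


def pack(value, offset, scale):
    """Store a value as an integer count of `scale` steps above `offset`."""
    return round((value - offset) / scale)


def unpack(stored, offset, scale):
    """Recover the approximate original value from its stored integer."""
    return offset + stored * scale


# Hypothetical grid: 4 times, 3 levels, 181 latitudes, 360 longitudes.
shape = (4, 3, 181, 360)
t, z, j, i = unravel(200000, shape)

# Simple lat/long 'projection': one degree spacing from -90 and 0.
lat = -90.0 + j * 1.0
lon = 0.0 + i * 1.0

# Temperature to the nearest 0.1 degC, offset chosen so stored values stay positive.
stored = pack(15.3, offset=-100.0, scale=0.1)
restored = unpack(stored, offset=-100.0, scale=0.1)
```

With these assumed values the stored integer fits comfortably in an unsigned 16-bit range, which is the point of the scaling.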

### Some complications (been there, done that!)

Meteorologists often rotate the poles and the equator to more convenient locations such as the north Pacific or Paris, and also 'stretch' the lat/long spacing.

Having values at half of a grid length north of the north pole, or south of the south pole, is not unusual.

We know how to handle multiple values of one vector value at the pole, as well as scalars and tensors. E.g., consider a one degree lat-long grid from -90 to +90, 0-360, size 180x360. In the +90 row, repeat the scalar value 360 times for consistent behaviour and processing.

Oceanographers often assume that there are three poles, to get convenient projections.

Some forecast models use spherical harmonic functions to represent values, giving rise to grids in physical space with slightly irregular latitude spacing.

Some operational models have grids based on an icosahedral partitioning of the earth's surface. There have been many other experiments, such as spiral grids with Fibonacci number based spacing.

Let us just ignore all these complications.

## Bare Bones Conceptual Model

This is a brain dump of my thoughts and questions after the 2015-06-23 telco. I think that any conceptual model probably needs:

### Rectilinear grids of boxes

Let's ignore TINs and non-rectilinear.

Whole earth edge cases:

- one grid box for whole earth (WMTS/Google Level 0 tileset?)
- very small grid boxes, each containing one point (WMTS/Google Level 18/17/16/.. tileset?)

Limited area edge cases:

- one grid box for whole area. Does it cross any poles? meridians?
- very small grid boxes, each containing one point (WMTS/Google Level 18/17/16/.. tileset?)

Grid points/data values/pixels within each box.

Assume each is 'regular' with respect to the enclosing box, so that they all align. Let's not do staggered grids, or grids that are not fully aligned and in phase.

### Shape of the earth

Need to support spherical earths, oblate spheroids and more complex geoids.

Should we have a default? Yes - makes 'schema free' JSON and CSV easier, but do not forbid explicit declarations of a geoid. WGS84?

### Separable dimensions

#### Time:

Data at regular intervals are easy. Specify Origin/Epoch in ISO8601, interval duration with UoM (seconds or hours or millions of years etc.) and count.
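As a minimal sketch of the regular case, the three items (origin, interval, count) are enough to regenerate every time on the axis. The function name `regular_times` and the example origin are assumptions for illustration, and the interval is taken to be in seconds.

```python
from datetime import datetime, timedelta


def regular_times(origin_iso, interval_seconds, count):
    """Expand (origin, interval, count) into the full list of times."""
    origin = datetime.fromisoformat(origin_iso)
    step = timedelta(seconds=interval_seconds)
    return [origin + n * step for n in range(count)]


# Hypothetical axis: four hourly steps from an illustrative origin.
times = regular_times("2015-06-23T00:00:00+00:00", 3600, 4)
```

Python's `datetime` only covers calendar years 1 to 9999, so an axis in millions of years would need a plain numeric offset from the epoch instead. That is one practical argument for allowing a temporal CRS as well as ISO8601 notation.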

Irregular intervals require a Look-Up Table - a sequence of specified times. Do we specify a temporal CRS? E.g. seconds or hours or millions of years etc. Or do we just use ISO8601 calendar-oriented notation?

Do we assume times are centred? So that time T(n) is representative of the box from (T(n-1)+T(n))/2 to (T(n)+T(n+1))/2? Or do we assume T(n) is representative of T(n) to T(n+1), or of T(n-1) to T(n)? TimeseriesML calls these options the "InterpolationType", also known as the Pixel Ref Point. As a rule of thumb, imagery (aerial, satellite) uses the centred pixel-is-area convention and measurement data like elevation use pixel-is-point, usually 'top left', or in this case T(n) to T(n+1).
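The three options in the question above can be compared directly by computing the box each one assigns to every coordinate. This is a sketch only: the convention names 'centred', 'forward' and 'backward' and the function `cell_bounds` are labels invented here, and extrapolating the end boxes from the nearest interval is an assumption the text does not make.

```python
def cell_bounds(coords, convention="centred"):
    """Return (lower, upper) bounds for each coordinate under one convention.

    'centred'  : box n spans the midpoints to its two neighbours
    'forward'  : box n spans coords[n] to coords[n+1]
    'backward' : box n spans coords[n-1] to coords[n]
    End boxes are extrapolated using the nearest interval (an assumption).
    """
    n = len(coords)
    if n < 2:
        raise ValueError("need at least two coordinates to infer an interval")
    first = coords[1] - coords[0]
    last = coords[-1] - coords[-2]
    # Pad with one extrapolated coordinate at each end.
    padded = [coords[0] - first] + list(coords) + [coords[-1] + last]
    bounds = []
    for k in range(1, n + 1):
        prev, cur, nxt = padded[k - 1], padded[k], padded[k + 1]
        if convention == "centred":
            bounds.append(((prev + cur) / 2, (cur + nxt) / 2))
        elif convention == "forward":
            bounds.append((cur, nxt))
        elif convention == "backward":
            bounds.append((prev, cur))
        else:
            raise ValueError(f"unknown convention: {convention}")
    return bounds


# Hypothetical irregular axis, e.g. hours after some origin.
hours = [0, 1, 3, 6]
centred = cell_bounds(hours, "centred")
forward = cell_bounds(hours, "forward")
backward = cell_bounds(hours, "backward")
```

The same function applies unchanged to any other separable axis given as a list of numbers, such as vertical levels.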

Then we partition gridpoints/data values/pixels into equal sized groups. This is the 'tileset' for delivery. Edge cases: only one group or one pixel/group
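A minimal sketch of this partitioning along one axis follows. The function name `partition` is hypothetical, and rejecting axes that do not divide evenly is one possible reading of 'equal sized groups'.

```python
def partition(count, tile_size):
    """Split indices 0..count-1 into consecutive groups of `tile_size`.

    Raises if the axis does not divide evenly, since the groups must be equal sized.
    """
    if tile_size < 1 or count % tile_size != 0:
        raise ValueError("count must be a multiple of a positive tile_size")
    return [range(start, start + tile_size) for start in range(0, count, tile_size)]


groups = partition(24, 6)       # four groups of six time steps
one_group = partition(24, 24)   # edge case: only one group
one_each = partition(24, 1)     # edge case: one pixel per group
```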

#### Vertical

Regular intervals are easy. Specify Origin/vertical datum, vertical interval with UoM (metres, hPa, Flight levels etc.) and count.

Irregular intervals require a Look-Up Table - a sequence of specified levels. Do we specify a vertical CRS with datum? E.g. metres, hPa, Flight levels etc.

Do we assume levels are centred? So that level L(n) is representative of the box from (L(n-1)+L(n))/2 to (L(n)+L(n+1))/2? Or do we assume L(n) is representative of L(n) to L(n+1), or of L(n-1) to L(n)? What about L(0), the surface? This favours the pixel-is-point L(n) to L(n+1). Meteorology recognises both options.

Then we partition gridpoints/data values/pixels into equal sized groups. This is the 'tileset' for delivery. Edge cases: only one group or one pixel/group
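Because the dimensions are treated as separable, an n-dimensional tileset can be enumerated as the Cartesian product of the per-axis groups. The following is a sketch under that assumption; the function name `tile_grid` and the axis sizes are hypothetical.

```python
from itertools import product


def tile_grid(axis_counts, tile_sizes):
    """Enumerate n-D tiles as tuples of per-axis index ranges."""
    per_axis = []
    for count, size in zip(axis_counts, tile_sizes):
        if size < 1 or count % size != 0:
            raise ValueError("each axis count must be a multiple of its positive tile size")
        per_axis.append([range(s, s + size) for s in range(0, count, size)])
    # One tile per combination of per-axis groups.
    return list(product(*per_axis))


# Hypothetical axes: 24 time steps in groups of 6, 10 levels in groups of 5.
tiles = tile_grid((24, 10), (6, 5))
```

Each tile is identified by one group per axis, so adding a further separable dimension only multiplies the number of tiles by that axis's group count.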

#### Other dimensions

Like wavelength? No different?

#### x and y

No different? Apart from the combinatorial explosion of possible options for pixel-is-point (8 in 2D) and pixel-is-area (1 in 2D).

-- Main.clittle - 24 Jun 2015