NetCDF-U Use Cases
Introduction
This page lists use cases relevant to NetCDF-U.
Please follow these instructions to document your use case:
- assign a new number to your use case (e.g. 'Use Case #23')
- create a new section on this page, with the heading showing the use case number and preferably a short title
- provide a summary on this page
- provide detailed information (whatever you want) on a dedicated page for your use case - create such a page by simply linking to it as shown in the following
You can use the following template when editing this page (simply copy it, then edit the page, paste the text at the end of the page and modify the text appropriately):
---++ Use Case #XXX - Short Title
Summary: tbd
Detailed information is provided on a [[PubSubSwgUseCaseXXX][dedicated page for use case #XXX]].
Use Case #1 - Sea Surface Temperature measurements from space
(Provided by JonBlower, 26th Feb 2014)
Background
The ESA Climate Change Initiative programme is generating Climate Data Records (CDRs) that provide high-quality measurements of a number of "Essential Climate Variables", derived from Earth Observation satellites. These CDRs must have uncertainty measurements recorded in the data files where possible. This Use Case expresses the needs of the Sea Surface Temperature (SST) project, led by Prof. Chris Merchant at the University of Reading.
Reported uncertainties
Chris is producing estimates of SST as daily fields on a 1/20 degree (~5km) grid. “Customers” can generate products on coarser space-time grids through a toolbox. He wants to express uncertainties at the level of each individual grid cell in the field. The measures of uncertainty he wants to report are:
- Total uncertainty, i.e. standard deviation of the total estimated error distribution (non-systematic).
- Standard deviation of uncorrelated effects (instrument noise, sampling uncertainty)
- Standard deviation of synoptically-correlated effects (due to retrieval errors associated with large weather systems)
- Uncertain systematic errors (i.e. unknown biases due to the inversion process). These are usually estimated, I think.
Therefore the CCI SST product contains 5 fields (i.e. 5 NetCDF variables): the above 4 error fields, plus the mean field (best estimate).
In any derived product (e.g. a product that has been regridded onto a new grid), the total uncertainty (the first measure in the list above) is derived from the error fields reported in the original 5km/1day field. These errors scale down in different ways, so it’s important for the original field to report these errors separately and accurately. It’s not enough to report only the total uncertainty.
All of these uncertainty measurements are standard deviations (Chris is using the metrological definition of “uncertainty” which is the “standard deviation of the estimated error distribution”). The uncertainties are just standard deviation statistics with no implication about the form of the distribution: the errors are not formally Gaussian.
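To illustrate why the components must be reported separately, here is a minimal sketch of how two of them might propagate when a block of fine-grid cells is averaged into one coarse cell. It rests on assumptions that are not stated in the use case: the uncorrelated component is treated as independent between cells, the correlated component is treated as fully correlated across the whole block, and the two are combined in quadrature. The function name and the example values are hypothetical.

```python
import math


def coarsen_uncertainty(sigma_uncorrelated, sigma_correlated):
    """Propagate per-cell standard deviations to the mean of n cells.

    Assumptions (illustrative only, not taken from the CCI SST product):
    - uncorrelated errors are independent between cells;
    - correlated errors are fully correlated across the block;
    - the two components are independent of each other.
    """
    n = len(sigma_uncorrelated)
    if n == 0 or n != len(sigma_correlated):
        raise ValueError("both inputs must be non-empty and of equal length")

    # Independent errors: variances add, so the s.d. of the mean
    # shrinks roughly as 1/sqrt(n).
    u_uncorrelated = math.sqrt(sum(s * s for s in sigma_uncorrelated)) / n

    # Fully correlated errors: standard deviations add linearly,
    # so averaging gives no reduction.
    u_correlated = sum(sigma_correlated) / n

    # Combine the two independent components in quadrature.
    u_total = math.sqrt(u_uncorrelated ** 2 + u_correlated ** 2)
    return u_uncorrelated, u_correlated, u_total


# Four fine-grid cells averaged into one coarse cell (made-up values, in kelvin).
u_unc, u_cor, u_tot = coarsen_uncertainty([0.2] * 4, [0.1] * 4)
```

In this example the uncorrelated component halves while the correlated component is unchanged, which could not be reproduced from the total uncertainty of the original cells alone.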
Can NetCDF-U handle this?
The difficulty is that there are three different causes of uncertainty, each with the same form (a standard deviation). UncertML can only express the form of the dispersion. But how should these be grouped and how should we distinguish between the different measures of uncertainty?
I suggest the following solution:
- Create a single "concept without variables" that groups the mean field plus all the measures of standard deviation. Tag the "concept without variables" with http://www.uncertml.org/statistics/statistics-collection. Tag the mean field with http://www.uncertml.org/statistics/mean and all uncertainty fields with http://www.uncertml.org/statistics/standard-deviation.
- Use the CF standard name "sea_surface_temperature" for all five fields.
- Distinguish between the uncertainty fields with new CF standard name modifiers, e.g. "sea_surface_temperature standard_deviation_of_uncorrelated_effects" and "sea_surface_temperature standard_deviation_of_synoptically_correlated_effects".
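The suggested tagging can be sketched as a plain table of per-variable attributes. This is only an illustration of the proposal, not a NetCDF-U encoding: the variable names, the attribute names `ref` and `ancillary_variables`, and the modifiers marked as hypothetical are assumptions. Only the two modifiers and the three UncertML URIs quoted above come from the text.

```python
UNCERTML_STATISTICS = "http://www.uncertml.org/statistics/"
STANDARD_NAME = "sea_surface_temperature"

# Variable name -> standard name modifier. Variable names are hypothetical.
UNCERTAINTY_MODIFIERS = {
    # Hypothetical modifier for the total uncertainty.
    "sst_total_uncertainty": "standard_deviation",
    # Modifiers proposed in the text above.
    "sst_uncorrelated_uncertainty": "standard_deviation_of_uncorrelated_effects",
    "sst_synoptic_uncertainty": "standard_deviation_of_synoptically_correlated_effects",
    # Hypothetical modifier for the systematic component.
    "sst_systematic_uncertainty": "standard_deviation_of_systematic_effects",
}


def build_attributes():
    """Return {variable name: attributes} for the five fields plus the group."""
    # The mean field (best estimate) carries the bare standard name.
    attributes = {
        "sst": {
            "standard_name": STANDARD_NAME,
            "ref": UNCERTML_STATISTICS + "mean",
        }
    }
    # Every uncertainty field shares the standard name and the
    # standard-deviation tag; only the modifier tells them apart.
    for name, modifier in UNCERTAINTY_MODIFIERS.items():
        attributes[name] = {
            "standard_name": f"{STANDARD_NAME} {modifier}",
            "ref": UNCERTML_STATISTICS + "standard-deviation",
        }
    # The data-less grouping variable lists its five members.
    members = " ".join(attributes)
    attributes["sst_statistics"] = {
        "ref": UNCERTML_STATISTICS + "statistics-collection",
        "ancillary_variables": members,
    }
    return attributes


attrs = build_attributes()
```

Laid out this way, the four uncertainty fields are identical in their UncertML tag and differ only in the standard name modifier, which is the gap the proposed modifiers are meant to fill.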
Can CF handle this?
I suspect that, since CF can already express mean and standard deviation statistics, CF could handle this without the need for NetCDF-U. Each of the five variables would be indicated by standard names and standard name modifiers. The grouping mechanism is not explicit in CF, but is implied by the fact that all five fields would have the same standard name. (New standard name modifiers would need to be proposed, but this is a relatively simple procedure.) However, I'm not sure how to properly indicate the mean field in CF. More investigation from the CF community would be desirable.
Use Case #2 - atmospheric sounding measurements
(Provided by David Haffner, 27th Feb 2014)
Background
I am specifically interested in how we can assign uncertainty metadata to atmospheric sounding measurements. This data gets wide and varied use in several communities: NWP, data assimilation, climate-chemistry model evaluation, and inter-comparison of observations. Making the uncertainty easy to access and interpret will dramatically increase the useful application of the data.
In our total ozone algorithm, the largest uncertainties arise with the inverse technique itself. We use a standard Bayesian approach in our retrieval and report these errors in our latest version of the algorithm as datasets. But we have not worked out how to define the semantic links between the uncertainties and the retrieved quantity so an assimilation system would know what to do with them.
We also report operator metadata (functions) that allow users to revise the retrieved quantity if a more accurate a priori is available, and identify where the uncertainty originates as a function of vertical height. Though not uncertainties in the classic sense, this info is very useful in assimilation and inter-comparison studies. We have invested considerable effort in improving our ability to provide this information in an accurate way.
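As an illustration of what such operator metadata can enable, the sketch below applies the linear a priori adjustment commonly used with optimal-estimation retrievals. It assumes the operator is an averaging kernel matrix `A` and that the retrieval is linear about the a priori. The use case does not say that this is the form the total ozone algorithm reports, so the function and its argument names are hypothetical.

```python
def adjust_for_new_prior(x_retrieved, averaging_kernel, prior_old, prior_new):
    """Revise a retrieved profile for a different a priori profile.

    Implements x_new = x_retrieved + (I - A) (prior_new - prior_old),
    where A is the averaging kernel matrix (list of rows). This is a
    linear approximation and an assumed form of the operator metadata.
    """
    n = len(x_retrieved)
    delta = [prior_new[j] - prior_old[j] for j in range(n)]
    adjusted = []
    for i in range(n):
        # Row i of A applied to the change in the a priori.
        smoothed = sum(averaging_kernel[i][j] * delta[j] for j in range(n))
        adjusted.append(x_retrieved[i] + delta[i] - smoothed)
    return adjusted


# Two-layer example with made-up numbers.
identity = [[1.0, 0.0], [0.0, 1.0]]   # measurement fully determines the result
no_info = [[0.0, 0.0], [0.0, 0.0]]    # result comes entirely from the a priori
unchanged = adjust_for_new_prior([300.0, 280.0], identity, [290.0, 270.0], [295.0, 272.0])
shifted = adjust_for_new_prior([300.0, 280.0], no_info, [290.0, 270.0], [295.0, 272.0])
```

With an identity kernel the retrieved profile is unaffected by the new a priori, and with a zero kernel it shifts by the full change in the a priori. A convention would need to link the retrieved variable, its a priori and the operator for a consumer to perform this step automatically.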
Is anyone in the community interested in developing a convention to encode this kind of information in CF uncertainty conventions? Perhaps someone has proposed a way to do it with existing schema tags?
Use Case #3 - probability density function
(Provided by Mark Hedley, 6th June Feb 2014)
Background
Numerical models may run ensembles with a range of initial conditions to enable the uncertainty of the evolution of the model to be estimated. The results of such model runs can be summarised using many different statistics.
One way of representing the results is to provide a sampled probability density function as an output.
Reported Data Sets
This use case requires a set of diagnostics (physical phenomena) to be provided; for each phenomenon, a set of statistical aggregations is calculated across the ensemble of results.
A particular phenomenon may be reported as:
- arithmetic mean
- standard deviation
- 1st percentile
- 5th percentile
- 10th percentile
- 17th percentile
- 33rd percentile
- 40th percentile
- 50th percentile
- 60th percentile
- 66th percentile
- 83rd percentile
- 90th percentile
- 95th percentile
- 99th percentile
No distribution is assumed for the ensemble; the data sets are provided as a discretised probability density function.
Each of these fields will be extensive in 4D spacetime. The collection of these fields for a single phenomenon should be treated as parts of a single reported entity, the probability density function. The collection of collections, covering a number of phenomena, is provided as a forecast data set.
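A minimal sketch of how the listed aggregations might be computed for a single grid point follows. It uses only the Python standard library. The function name, the dictionary keys and the choice of linear interpolation between ensemble members for the percentiles are assumptions, as the use case does not specify a percentile method.

```python
import statistics

# Percentiles listed in this use case.
PERCENTILES = (1, 5, 10, 17, 33, 40, 50, 60, 66, 83, 90, 95, 99)


def summarise_ensemble(members):
    """Summarise ensemble values at one point in space and time.

    No distribution is assumed: every statistic is computed directly
    from the ensemble members.
    """
    # 99 cut points dividing the sorted data into 100 groups;
    # cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(members, n=100, method="inclusive")
    summary = {
        "mean": statistics.fmean(members),
        # Sample standard deviation (n - 1 denominator).
        "standard_deviation": statistics.stdev(members),
    }
    for p in PERCENTILES:
        summary[f"percentile_{p}"] = cuts[p - 1]
    return summary


# A made-up ensemble of 101 members with values 0, 1, ..., 100.
summary = summarise_ensemble(list(range(101)))
```

Applied at every grid point and time step, each key of the returned dictionary corresponds to one of the fields, and the full set of keys for one phenomenon is the collection that the use case asks to treat as a single entity.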
--
LorenzoBigagli - 09 Oct 2013