Standards and Bodies
This page collects standards relevant to Big Data, together with bodies performing standardization or related activities in the field. Eventually, this will be merged into the final document. For each standard added, provide a brief justification of why it is relevant for Big Data, ideally with geo relevance. -- PeterBaumann - 31 May 2014
OGC deals with Big Data in manifold ways. As the saying goes, 80% of all data has a location component; this alone justifies the existence of Big Geo Data. However, drilling deeper into the manifold domains covered by OGC member companies, agencies, and universities confirms this even more. Prominent examples include satellite imagery and climate simulation output, each filling Petabyte-scale data center archives and keeping thousands of cluster and cloud nodes busy.
Below we list the most important standards in the Big Geo Data domain, underlining OGC's contribution to addressing the data deluge.
- GML 3.2.1 Application Schema - Coverages ("GMLCOV") defines OGC's unified coverage model, based on ISO 19123 / OGC Abstract Topic 6. Coverages are at the heart of Big Data in the geo realm. They include regular and irregular grids, point clouds, and meshes. While ISO 19123 defines an abstract coverage model (where concrete implementations usually are not interoperable), GMLCOV refines this to a concrete, interoperable model which allows conformance testing down to pixel level.
- Volume: Coverages are the prime contributors in terms of volume: satellite image archives, climate simulation output, sensor data, etc. represent Petabyte holdings by NASA, ESA, etc.
- Variety: Coverages encompass 1D sensor timeseries, 2D satellite imagery, 3D x/y/t image timeseries and x/y/z geophysical voxelmodels, 4D x/y/z/t climate data, and beyond. They come with different numbers of bands ("variables", "channels"), different resolution, and different semantics (optical, radar, physical atmospheric and ocean parameters, lithology, etc.).
- Velocity: Satellite imagery, for example, is continuously pouring in at Terabyte rates per day (e.g., NASA EOSDIS and the ESA Sentinel family). For disaster mitigation, products need to be available immediately after acquisition.
- Veracity: Issues of provenance, lineage, trustworthiness play an important role. For example, the ocean color analysis program l2gen produces, for every pixel, a quality vector with over 30 different characteristics.
- Web Coverage Service (WCS). Coverages, being a special class of features, can be served by most OGC services, such as WMS, WFS, WCS, WCPS, WPS, and SOS; among these, WCS offers the most versatile and Big Data oriented functionality. In brief, this is (i) data reduction and (ii) code shipping. Not only does this save substantial bandwidth, it also significantly improves the quality of service for users. WCS is the OGC standard which fully supports multi-dimensional, spatio-temporal data.
- data reduction: The WCS Core offers, in an easy-to-use way, extraction of relevant data from a coverage already in the server. Trimming gives a subset of the same dimension (such as a 2D cutout from a 2D map) while slicing extracts data along a particular dimension, thereby reducing the number of dimensions (for example, a 2D timeslice or a 1D timeseries from an x/y/t image data cube). Range subsetting allows extracting and recombining of bands from hyperspectral imagery and multi-parameter climate data.
- "Code shipping" means sending the tasks to the server for execution close to the data, rather than transporting large amounts of data to a processing location ("data shipping"). WCS Core and its service extensions already support this paradigm extensively: simple requests, encoded in one of the several supported protocol bindings, task the server with pre-processing and filtering, so that the client receives only what it actually needs ("what you get is what you need").
- Web Coverage Processing Service (WCPS) defines a protocol-independent, high-level language for on-demand processing and filtering on multi-dimensional gridded coverages. It is particularly suited for high-performance navigation, extraction, aggregation, and general analytics of Big Data representing spatio-temporal sensor, image, model, and statistics data. Such queries can be written by experts, giving them maximum flexibility, or can be generated through visual interfaces hiding the language behind appealing point-and-click interfaces. As opposed to WPS, which offers processes pre-defined by some administrator, the WCPS language enables users to perform an unlimited ad-hoc mix-and-match, implementing a philosophy of "build your own product on the fly".
- Volume: server-side filtering and processing allows users to extract exactly the information they need from the Big Data.
- Variety: a single set of operations works coherently across all space and time dimensions. Coverages of different resolution and type can be combined ad hoc.
- Veracity: WCPS queries can extract quality information from the data offerings, thereby enabling users to assess the usefulness of data prior to accessing them.
- Web Processing Service
- OGC WAMI for video streaming
- OWS-10 Engineering Report on Cloud Performance
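The data reduction and code shipping ideas in the WCS and WCPS entries above can be made a little more concrete with a small Python sketch. It assembles a WCS 2.0 GetCoverage request in KVP encoding that combines a trim (two bounds per axis), a slice (one bound), and a range subset, and it shows one illustrative WCPS query string. The endpoint URL, the coverage name `SampleCube`, the axis names `Lat`, `Long`, and `ansi`, the band names, and the helper `get_coverage_url` are all assumptions made for illustration; on a real server these names would come from its capabilities and coverage descriptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def get_coverage_url(endpoint, coverage_id, subsets=(), bands=()):
    """Build a WCS 2.0 GetCoverage URL in KVP encoding (illustrative helper).

    subsets: iterable of (axis, bounds) pairs, where bounds is a tuple of
             strings. Two bounds request a trim (dimension is kept),
             one bound requests a slice (dimension is dropped).
    bands:   band names to extract (range subsetting).
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    for axis, bounds in subsets:
        # e.g. Lat(40,50) for a trim, ansi("2014-05-31") for a slice
        params.append(("subset", f"{axis}({','.join(bounds)})"))
    if bands:
        params.append(("rangesubset", ",".join(bands)))
    # urlencode percent-encodes parentheses, commas, and quotes
    return endpoint + "?" + urlencode(params)


# Hypothetical x/y/t cube: 2D cutout at one time step, two bands only.
url = get_coverage_url(
    "http://example.org/wcs",  # assumed endpoint, not a real service
    "SampleCube",              # assumed coverage name
    subsets=[
        ("Lat", ("40", "50")),          # trim
        ("Long", ("10", "20")),         # trim
        ("ansi", ('"2014-05-31"',)),    # slice along time
    ],
    bands=["red", "nir"],               # assumed band names
)

# Decode the query string again to inspect what would be sent.
query = parse_qs(urlsplit(url).query)

# Illustrative WCPS query on the same assumed coverage: the server slices
# and encodes the result, so only the final image travels to the client.
wcps_query = 'for $c in (SampleCube) return encode($c[ansi("2014-05-31")], "image/png")'

print(url)
print(wcps_query)
```

In both cases the client ships a short request and receives only the reduced result, which is the point of the "what you get is what you need" principle.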
Big Geo Data standards:
- ISO 19123:2005 Geographic information -- Schema for coverage geometry and functions. This standard, which is identical to OGC Abstract Topic 6, defines an abstract model of coverages, loosely speaking, as a digital representation of some space/time-varying phenomenon. This includes regular and irregular grids, point clouds, and meshes. ISO 19123 forms the basis for the concrete, interoperable coverage model of OGC, GMLCOV (see above).
- ISO 19136:2007, Geographic information – Geography Markup Language (aka GML 3.2.1)
- ISO/CD 19136-2, Geographic information - Geography Markup Language (GML) - Part 2: Extended schemas and encoding rules (aka GML 3.3)
- ISO 19139:2007 Geographic information -- Metadata -- XML schema implementation. This addresses metadata; while relevant for handling Big Data, metadata do not constitute Big Data themselves.
- ISO/TS 19138:2006 Geographic information -- Data quality measures: "ISO/TS 19138:2006 defines a set of data quality measures. These can be used when reporting data quality for the data quality subelements identified in ISO 19113. Multiple measures are defined for each data quality subelement, and the choice of which to use will depend on the type of data and its intended purpose. The data quality measures are structured so that they can be maintained in a register established in conformance with ISO 19135."
- ISO 19149:2011, Geographic information - Rights expression language for geographic information - GeoREL: "ISO 19149 defines a XML-based vocabulary or language to express rights for geographic information in order that digital licenses may be created for such information and related services… Each digital license will unambiguously express those particular rights that the owners (or their agent) of a digital geographic resource extends to the holders of that license. ... These “rights” are not always covered by copyright law, and are often the result of contracts between individuals."
- ISO 19153 Geographic information - Geospatial Digital Rights Management Reference Model (GeoDRM RM)
- ISO/IEC 13249-3:2011, Information technology -- Database languages -- SQL multimedia and application packages -- Part 3: Spatial
- ISO 80000-1:2009 Quantities and units - Part 1: General, 2009-11-17 (under work)
- ISO 80000-3:2006, Quantities and units - Part 3: Space and time (under work)
- ISO/IEC 15444-1:2004 w/Cor1:2007/2:2008 & Amd1:2006/2:2009 Information Technology -- JPEG 2000 image coding system: Core coding system, Edition 2, 2004-09-23; Cor 1:2007, Edition 1, 2007-07-10; Cor 2:2008 Clarification on determination of maximum file size, Edition 1, 2008-04-07; Amd 1:2006 Profiles for digital cinema applications, Edition 1, 2006-01-09; Amd 2:2009 Extended profiles for cinema and video production and archival applications, Edition 1, 2009-12-07
- GML Application Schema - Coverages - GeoTIFF Coverage Encoding Profile of WCS
- OGC Web Coverage Service 2.0 Interface Standard - Earth Observation Application Profile version 0.4.0
- OGC GeoSPARQL: A Geographic Query Language for RDF Data Standard
- GML in JPEG 2000 Encoding Standard version 2: "This standard applies to the encoding and decoding of JPEG 2000 images that contain GML for use with geographic imagery. This document specifies the use of the Geography Markup Language (GML) within the XML boxes of the JPEG 2000 data format and provides an application schema for JPEG 2000 that can be extended to include geometrical feature descriptions and annotations. The document also specifies the encoding and packaging rules for GML use in JPEG 2000."
- ESA HMA
To be inspected:
- ISO 16363:2012: Space data and information transfer systems -- Audit and certification of trustworthy digital repositories
- : “The objective of this Technical Specification is the coordinated development of standards that [allows] the benefits of distributed geographic image processing to be realized in an environment of heterogeneous IT resources and multiple organizational domains.”
- ISO 19113:2002 Geographic information - Quality principles: withdrawn upon release of ISO 19157:2013
- ISO 19115-2:2009, Geographic information - Metadata - Part 2: Extensions for imagery and gridded data: "ISO 19115-2:2009 extends the existing geographic metadata standard by defining the schema required for describing imagery and gridded data."
- ISO 19119 Geographic information - Services (Revision of ISO 19119:2005): "This standard provides a framework for developers to create software that enables users to access and process geographic data from a variety of sources across a generic computing interface within an open information technology environment."
- ISO/TS 19130:2010 Geographic information -- Imagery sensor models for geopositioning: "identifies the information required to determine the relationship between the position of a remotely sensed pixel in image coordinates and its geoposition. It supports exploitation of remotely sensed images. It defines the metadata to be distributed with the image to enable user determination of geographic position from the observations."
- ISO 19130-2 Geographic information - Imagery sensor models for geopositioning - Part 2: SAR, InSAR, Lidar and Sonar: "ISO/TS 19130-2:2014 supports exploitation of remotely sensed images. It specifies the sensor models and metadata for geopositioning images remotely sensed by Synthetic Aperture Radar (SAR), Interferometric Synthetic Aperture Radar (InSAR), LIght Detection And Ranging (lidar), and SOund Navigation And Ranging (sonar) sensors. The specification also defines the metadata needed for the aerial triangulation of airborne and spaceborne images."
- ISO 19131:2007, Geographic information -- Data product specifications: "ISO 19131:2007 specifies requirements for the specification of geographic data products, based upon the concepts of other ISO 19100 International Standards. It also provides help in the creation of data product specifications, so that they are easily understood and fit for their intended purpose."
- ISO 19131, Geographic information — Data product specifications AMENDMENT
- ISO 19133:2005, Geographic information - Location-based services - Tracking and navigation
- ISO 19134:2007 Geographic information -- Location-based services -- Multimodal routing and navigation
- ISO 19139-2:2012 Geographic information - Metadata - XML Schema Implementation - Part 2: Extensions for imagery and gridded data: "This Technical Specification defines geographic metadata for imagery and gridded data XML (gmi) encoding. This Technical Specification extends the ISO/TS 19139 specification to define XML Schema implementation for ISO 19115-2, Metadata for imagery and gridded data."
- ISO 19157, Geographic information – Data quality
- ISO 19163, Geographic information - Content components and encoding rules for imagery and gridded data (note: will build upon GMLCOV)
To be evaluated:
- IEEE P2413: Standard for an Architectural Framework for the Internet of Things (IoT)
- Geographic imagery and gridded thematic data are widely used in geospatial communities and related application fields. Over the past two decades, several standards of geographic images have been developed by ISO TC 211. ISO 19123:2005 defines a conceptual schema for the spatial characteristics of coverages and defines the relationship between the domain of a coverage and an associated attribute range. Multiple types of coverages are defined in ISO 19123, including raster, triangulated irregular network, point, curve, and polygon coverages.
- ISO/TS 19129:2009 defines a framework for the content of imagery, gridded and coverage data, which covers the general data structure and associated metadata. Based on ISO 19123:2005, this Technical Specification specifies the template application schema for different coverages, including continuous quadrilateral grid coverage, Riemann hyperspatial multidimensional grid coverage, Triangulated Irregular Network (TIN), discrete point and surface coverage.
- ISO/TS 19101-2:2008 provides a reference model and a common abstract architecture for processing geographic imagery in open distributed environments. The reference model includes gridded data with an emphasis on geographic imagery. ISO 19101-2 specifies images sensed directly by remote sensors as well as images derived from Geographic Imagery Scenes. Derived images can be the measures of physical properties of a remote object.
- ISO 19115-2:2009 extends ISO 19115:2003 by defining the metadata schema required for describing imagery.
- ISO/TS 19130:2010 and ISO/TS 19130-2:2014 define the sensor models for geopositioning imagery data.
- ISO/RS 19124:2000 provides a summary of the conceptual classification of gridded data based on spatial and attribute properties, and identifies five basic components of imagery and gridded data. ISO 19101-2, ISO 19123 and ISO 19129 specify the domain and range of imagery, grid and coverage data and their associated relationship. ISO 19129 breaks down the metadata into discovery, structural, acquisition, and quality metadata. However, there are no detailed descriptions of each category and no clear associations with the metadata defined in ISO 19115:2003, ISO 19115-2:2009, ISO/TS 19130:2010, and ISO/TS 19130-2:2014.
Imagery is acquired by remote sensors directly or derived from source imagery. Value-added image processing can be used to derive physical properties of a remote object from images [ISO 19101-2:2008]. Besides the derived images, imagery can also be integrated with other data sources to produce new gridded coverage data for a specific theme, so-called thematic data. Thematic data provides more information about objects and is widely used in various applications. However, the characteristics of thematic data are not covered by the existing standards and specifications noted above.
ISO/TS 19130:2010 identifies the type of remote sensors by the measurand of the sensor, e.g., optical radiation, microwave energy, sonar (acoustic) energy. Images acquired by an optical sensor have a different appearance and performance compared to those acquired by a microwave sensor, e.g. SAR data.
- An increasingly large volume of image and gridded data, both natural and synthetic, is being produced because more and more remote sensors are becoming available. These data are encoded using different formats, for example, GeoTIFF, HDF-EOS, JPEG 2000, or other formats described in ISO/TR 19121. These encoding formats follow different imagery exchange standards without a common data model, preventing them from being interoperable.
- The framework defined in ISO 19129 describes imagery, gridded and coverage data at multiple levels, including an abstract level, a content model level, and an encoding level. The first two levels combine a number of well-defined content structures in accordance with ISO 19123, and define the contents of continuous quadrilateral gridded coverage with both constant cell size grid and variable cell size grid. However, the content model level does not specify the metadata necessary for common understanding during interworking of datasets encoded in different formats. At the encoding level, ISO 19129 does not provide explicit encoding rules describing how to map the content model to a machine-independent encoding structure, which is crucial for mapping and translation among images in different formats without losing information.
- Based on the frameworks defined in ISO 19101-2:2008 and ISO 19123:2005, this Technical Specification specifies categories of imagery and gridded data and correspondingly establishes a hierarchical content model. Categories of imagery and gridded data are defined based on thematic and spatial attributes and sensor types. Then, the content model is defined to describe the required content components of each category, including spatial and attribute structures as well as critical metadata entries. These metadata entries are specified as the minimum metadata required for common understanding.
- For ease of implementation, this Technical Specification defines encoding rules to map the content models into XML-based encoding, following the general encoding rules defined in ISO 19118. It also provides examples to illustrate the binding of XML-based encoding data into selected commonly-used physical data formats, e.g. GeoTIFF, JPEG 2000, HDF-EOS.
- ISO/IEC 27018:2014, Information technology -- Security techniques -- Code of practice for protection of personally identifiable information (PII) in public clouds acting as PII processors
- ISO/IEC 27000:2014, Information technology — Security techniques — Information security management systems — Overview and vocabulary
- ISO/IEC 27001:2013, Information technology — Security techniques — Information security management systems — Requirements
- ISO/IEC 27002:2013, Information technology — Security techniques — Code of practice for information security controls
- ISO/IEC 29100:2011, Information technology — Security techniques — Privacy framework
Other relevant bodies