NovaStar Data / Data Reports and Time Series


Introduction

As indicated in other sections of this documentation, data at points can originate from multiple sources.

The following sections explain how the original data reports are processed into time series.

Data Reports

The core NovaStar system processes "data reports", which are measurements and other data values originating from various data sources, as discussed in the Data Generators documentation. The values in data reports originate from configuration data stored in the database, measurement-related data, and calculated values. Each "report" conceptually consists of the following:

  • location identifier (from configuration data)
  • data type (from configuration data)
  • timestamp for the data value (from measurement)
  • raw data value (from measurement)
  • calibration identifier (from configuration data)
  • scaled data value (from measurement or calculation)
  • NovaScore (from alarm trigger analysis)
  • data flags (from measurement and analysis)
  • rated values and associated flags (from calculations)
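The parts of a data report listed above can be pictured as a single record. The following is a minimal sketch only: the class name, field names, and sample values are hypothetical and do not reflect the NovaStar database schema or web service output.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DataReport:
    """Illustrative container; field names are hypothetical."""
    location_id: str                # from configuration data
    data_type: str                  # from configuration data
    timestamp: datetime             # from measurement (UTC)
    raw_value: Optional[float]      # from measurement
    calibration_id: Optional[int]   # from configuration data
    scaled_value: Optional[float]   # from measurement or calculation
    nova_score: Optional[int]       # from alarm trigger analysis
    flags: str = ""                 # from measurement and analysis
    rated_values: dict = field(default_factory=dict)  # from calculations


# Made-up example values for a water level report.
report = DataReport(
    location_id="1234",
    data_type="WaterLevelRiver",
    timestamp=datetime(2020, 7, 15, 0, 15, tzinfo=timezone.utc),
    raw_value=512.0,
    calibration_id=1,
    scaled_value=5.12,
    nova_score=None,
    flags="V",
)
```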

Data web services allow data reports to be queried directly using the data and dataReports services. See also the Time Series section for information about querying data as time series.

Location Identifier

The location identifier for time series returned by the tscatalog and ts services is the station numerical identifier, or station numerical identifier and point tag name if needed for uniqueness. Data reports in the database are associated with a point and each point is associated with a station (which has a location identifier).

Data Type

Data types in NovaStar have evolved over time. The current design allows a point type to be specified using either a SHEF parameter code (e.g., HG for data type WaterLevelRiver) or a non-SHEF data type (e.g., Alert2GpsLock). In the past, it was necessary to request data using the point identifier. However, the current convention used to request time series in web services is to specify the location identifier and data type.

Timestamp

NovaStar internally stores data using the UTC time zone. However, data import programs typically allow local time to be specified. Timestamps are converted to and from UTC as appropriate.
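The conversion can be sketched as follows. This is an illustration, not NovaStar code: the function names are hypothetical, and the local time zone is assumed to be a fixed UTC-6 offset to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

# Assumed local offset for illustration (UTC-6); a real import would use
# the time zone configured for the data source.
LOCAL = timezone(timedelta(hours=-6))


def to_utc(local_naive: datetime, tz: timezone = LOCAL) -> datetime:
    """Interpret a naive local timestamp in tz and convert it to UTC."""
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)


def to_local(utc_time: datetime, tz: timezone = LOCAL) -> datetime:
    """Convert a UTC timestamp back to the requested time zone."""
    return utc_time.astimezone(tz)


stored = to_utc(datetime(2020, 7, 15, 0, 15))
print(stored.isoformat())  # 2020-07-15T06:15:00+00:00
```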

Raw Data Value

Raw data values are those measured at the station. For example, precipitation is often measured as a counter of bucket tips, and water level may be measured as an analog value using a pressure transducer.

Calibration Identifier

The raw values are processed into scaled values using calibration data. Data reports store the calibration identifier that was used to perform the calculation.

Scaled Data Value

The scaled data value is the value that is typically used for operations. The data loading process may set the scaled value directly or it may be calculated from the raw value using a calibration (see the previous section).
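As an illustration of how a raw value might become a scaled value, the sketch below assumes a simple linear calibration (slope and offset). The form of NovaStar calibrations is not described here, so the class, its fields, and the 0.04 depth per tip are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCalibration:
    """Hypothetical linear calibration: scaled = slope * raw + offset."""
    calibration_id: int
    slope: float
    offset: float

    def apply(self, raw_value: float) -> float:
        return self.slope * raw_value + self.offset


# Assumed example: each bucket tip represents 0.04 units of precipitation.
tip_calibration = LinearCalibration(calibration_id=7, slope=0.04, offset=0.0)
scaled = tip_calibration.apply(25)  # 25 tips on the counter
```

Storing `calibration_id` with the report, as described above, lets a scaled value be traced back to the calibration that produced it.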

NovaScore

A NovaScore value is assigned to the data report based on the scaled value and an evaluation of alarm triggers. A default NovaScore indicating normal conditions is assigned to each report. If any alarm triggers are defined, they are evaluated and, if triggered, result in a higher-severity NovaScore being assigned to the data report.
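The assignment logic can be sketched as below. The numeric score scale, the default value of 0, and the "value at or above threshold" comparison are all assumptions made for illustration; actual alarm triggers may use other conditions.

```python
from dataclasses import dataclass

DEFAULT_SCORE = 0  # assumed value meaning "normal conditions"


@dataclass(frozen=True)
class AlarmTrigger:
    """Hypothetical trigger: fires when the scaled value reaches threshold."""
    threshold: float
    score: int


def assign_score(scaled_value: float, triggers: list) -> int:
    """Start from the default score and keep the most severe triggered score."""
    score = DEFAULT_SCORE
    for trigger in triggers:
        if scaled_value >= trigger.threshold:
            score = max(score, trigger.score)
    return score


triggers = [AlarmTrigger(threshold=5.0, score=1), AlarmTrigger(threshold=8.0, score=2)]
```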

Data Flags

The data flags are text that provides additional information about a data report.

Data Quality Flags

The data flags contain one or more single characters that indicate qualitative information, including whether the data are questionable, valid, verified, collected during maintenance, etc. NovaStar software typically only outputs data reports that are flagged as valid values unless otherwise requested.

Data Source Line Number

NovaStar systems allow data to be provided from multiple sources, including real-time services such as ALERT/ALERT2 and scheduled data import programs.

Rated Values

If defined, ratings are used to compute additional rated values. For example, accumulated precipitation is used to compute incremental precipitation, storm total, and season total. Water level can be used to compute discharge (streamflow).
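The two examples above can be sketched as follows, assuming that incremental precipitation is the difference between consecutive accumulated values and that discharge is found by linear interpolation in a stage-discharge table. The function names, the table values, and the interpolation method are illustrative assumptions.

```python
from bisect import bisect_right


def incremental(accumulated: list) -> list:
    """Differences between consecutive accumulated values."""
    return [b - a for a, b in zip(accumulated, accumulated[1:])]


def rate(stage: float, table: list):
    """Interpolate discharge from a sorted list of (stage, discharge) pairs.

    Returns None when the stage is outside the table.
    """
    stages = [s for s, _ in table]
    if stage < stages[0] or stage > stages[-1]:
        return None
    i = bisect_right(stages, stage)
    if i == len(stages):  # stage equals the last table entry
        return table[-1][1]
    (s0, q0), (s1, q1) = table[i - 1], table[i]
    return q0 + (q1 - q0) * (stage - s0) / (s1 - s0)


# Made-up rating table for illustration.
rating_table = [(1.0, 10.0), (2.0, 50.0), (3.0, 120.0)]
```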

Time Series

The term "time series" refers to a sequence of data values, each associated with a measurement time. Time series returned by web services have the following properties:

  • location identifier (for NovaStar data web services this is the station numerical identifier (stationNumId), or if necessary for uniqueness, stationNumId and point tag name)
  • data type - see Data Web Services Data Type
  • interval - for example IrregSecond, 1Hour, etc.
  • name
  • description
  • other properties

Data values associated with time series include:

  • date/time (timestamp)
  • data value, or a missing indicator such as NaN
  • data flag (e.g., Q for questionable and V for validated)
  • duration, optional, which indicates the duration over which the measurement occurred

Time series that have an irregular interval must provide the timestamp for each data value. Time series that have a regular interval can be optimized to store only the start and end date/time, because the timestamps of the intermediate values can be computed from the interval.
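For a regular interval time series, the timestamps can be regenerated from the start, end, and interval, as in this minimal sketch (the function name is hypothetical):

```python
from datetime import datetime, timedelta


def regular_timestamps(start: datetime, end: datetime, step: timedelta) -> list:
    """Return every timestamp from start through end, spaced by step."""
    count = (end - start) // step + 1
    return [start + i * step for i in range(count)]


stamps = regular_timestamps(
    datetime(2020, 7, 15, 0), datetime(2020, 7, 15, 3), timedelta(hours=1)
)
```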

Irregular Interval Time Series

Irregular interval time series for real-time data in data web services typically measure an instantaneous value from a sensor and have an interval of IrregSecond, indicating that the data have irregular interval and the precision of the timestamp is to seconds (or fractions of seconds). Such data is typical for systems that are event-driven, such as flood warning systems.

Some systems use the term "instantaneous" data. For example, the USGS provides web services for "instantaneous" data. However, the USGS web service provides 15Minute data, meaning that values are recorded every 15 minutes or are calculated from a smaller interval.

Data web services, such as the tscatalog and ts services, return irregular interval time series for data reports and associated values (rated values, calibrations, alarms, NovaScores).

Examples of irregular interval time series include:

  • data reports triggered by a change in value:
    • rain gage bucket tips
    • water level and discharge at a time, caused by a change in value
  • a timed "regular report" that is transmitted at a regular interval, regardless of whether conditions are changing:
    • typically 12 hours in a legacy ALERT system
    • typically 1 hour in an ALERT2 system
    • depends on the cost of communication in a system
  • values computed from the above, for example:
    • storm precipitation total spanning non-zero precipitation values, totaled for each measured precipitation value
    • stream discharge computed from water level

Regular Interval Time Series

The NovaStar system is designed to manage event-based data rather than to serve as an archival system for interval data. Interval time series are not typically stored in NovaStar, but it is possible to do so and may make sense in some cases.

Instead of storing interval time series in NovaStar, interval time series are typically computed as needed. Storing interval time series in the database has limitations because the many combinations of output intervals and statistics would greatly increase the size of the database and processing time needed to compute the interval time series.

Interval Calculations

A regular interval time series value is calculated from the input values over the interval, using a statistic of choice. For example, for precipitation, the 1Hour time series value is the total (sum) of any precipitation increments that have occurred during the hour. In this case, the core data type is Precip, the interval is 1Hour, and the statistic is Total. In data web services, the full data type and interval is then Precip-Total.1Hour.

In order to minimize ambiguity, data web services require that the statistic is specified and not assumed.
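The naming pattern can be sketched as below. The helper functions are hypothetical and only mirror the Precip-Total.1Hour pattern shown above; they are not part of any NovaStar API.

```python
def format_data_type_interval(data_type: str, statistic: str, interval: str) -> str:
    """Build a string such as 'Precip-Total.1Hour'; the statistic is required."""
    if not statistic:
        raise ValueError("statistic must be specified, not assumed")
    return f"{data_type}-{statistic}.{interval}"


def parse_data_type_interval(text: str) -> tuple:
    """Split 'Precip-Total.1Hour' into ('Precip', 'Total', '1Hour')."""
    type_and_statistic, interval = text.rsplit(".", 1)
    data_type, statistic = type_and_statistic.rsplit("-", 1)
    return data_type, statistic, interval


name = format_data_type_interval("Precip", "Total", "1Hour")
```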

The following table illustrates timestamps that control how interval data are determined for different intervals. Intervals must typically divide evenly into the local day.

Interval Calculation Time Span

| Interval | Timestamp Example | Time Span for Data |
| --- | --- | --- |
| 5Minute | 2020-07-15T00:15 | > 2020-07-15T00:10 and <= 2020-07-15T00:15 |
| 1Hour | 2020-07-15T03 | > 2020-07-15T02 and <= 2020-07-15T03 |
| 1Day | 2020-07-15 | for instantaneous data: > 2020-07-15T00:00:00 and <= 2020-07-16T00:00:00 |
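The sub-daily convention in the table (a value belongs to the interval whose ending timestamp is the first boundary at or after the value's time) can be sketched as follows. The function names are hypothetical, the interval is assumed to divide evenly into the day, and the 1Day case, which the table labels by date, is not handled here.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def interval_end(t: datetime, step: timedelta) -> datetime:
    """Return the interval-ending timestamp such that end - step < t <= end."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = t - midnight
    n = -(-elapsed // step)  # ceiling division of two timedeltas
    return midnight + n * step


def interval_totals(increments: list, step: timedelta) -> dict:
    """Sum (timestamp, increment) pairs into interval-ending totals."""
    totals = defaultdict(float)
    for t, value in increments:
        totals[interval_end(t, step)] += value
    return dict(totals)


five_min = timedelta(minutes=5)
totals = interval_totals(
    [
        (datetime(2020, 7, 15, 0, 10, 0), 1.0),  # exactly on a boundary
        (datetime(2020, 7, 15, 0, 10, 1), 2.0),
        (datetime(2020, 7, 15, 0, 15, 0), 4.0),
    ],
    five_min,
)
```

In this sketch the value at exactly 00:10:00 is counted in the interval ending 00:10, while the values at 00:10:01 and 00:15:00 are counted in the interval ending 00:15.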

The calculated interval values depend on the statistic being computed. Some statistics involve simple comparisons. For example, the Max statistic computes the maximum value from the sample in an interval. However, statistics that average over the interval are more complex because they must consider how multiple values are weighted, for example:

  • Mean
    • simple arithmetic mean
    • the interval value is an arithmetic mean of the sample values in the interval
  • TimeWeightedMean
    • each value in the sample is weighted by the number of seconds in the interval over which the value applies
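The difference between the two statistics can be sketched as below. The sketch assumes that each value applies from its timestamp until the next sample (or the interval end) and that time before the first sample is excluded; the ts service may handle these details differently.

```python
from datetime import datetime


def mean(samples: list) -> float:
    """Arithmetic mean of the sample values in the interval."""
    return sum(value for _, value in samples) / len(samples)


def time_weighted_mean(samples: list, end: datetime) -> float:
    """Weight each value by the seconds until the next sample or the interval end.

    samples is a list of (timestamp, value) pairs sorted by time.
    """
    next_times = [t for t, _ in samples[1:]] + [end]
    weighted_sum = 0.0
    total_seconds = 0.0
    for (t, value), t_next in zip(samples, next_times):
        seconds = (t_next - t).total_seconds()
        weighted_sum += value * seconds
        total_seconds += seconds
    if total_seconds == 0:
        return samples[-1][1]  # single sample exactly at the interval end
    return weighted_sum / total_seconds


samples = [
    (datetime(2020, 7, 15, 0, 20), 2.0),  # applies for 30 minutes
    (datetime(2020, 7, 15, 0, 50), 6.0),  # applies for 10 minutes
]
hour_end = datetime(2020, 7, 15, 1, 0)
```

With these made-up samples, `mean(samples)` is 4.0 while `time_weighted_mean(samples, hour_end)` is 3.0, because the lower value was in effect for three times as long.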

See the ts service for details about statistics calculations.

No Report Interval

NovaStar is an event-driven system. Data reports are generated when data values change, such as when a rain sensor bucket tips or the water level changes by more than a threshold amount on the data logger. In areas such as the arid west, with little rain and streams that may dry up during part of the year, this can lead to no data reports for a long period.

To ensure that the data collection system is functional, stations are also typically programmed to transmit a data report at a regular interval.

The combination of event-driven reports and periodic regular reports is appropriate because continuous (small interval) data monitoring is not necessary and would require much more database storage, communication capacity, and memory when processing the data for analysis and visualization.

NovaStar uses a "no report interval" on points to indicate the interval during which the last data report can be assumed to be in effect. The last value is then used to fill interval data for recent intervals, as follows:

  • accumulated values such as precipitation will remain the same, indicating no additional precipitation
  • values such as water level will use the same value, indicating no change in value

The following table summarizes typical conventions for defining the NovaStar no report interval. A no report alarm trigger is typically configured for each station at a multiple of the no report interval, so that a delay in transmitting a regular report does not trigger excessive alarms.
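The fill behavior described above can be sketched as follows. The function name and arguments are hypothetical; intervals ending after the no report interval has elapsed are left as missing (None).

```python
from datetime import datetime, timedelta


def fill_recent(last_time, last_value, no_report_interval, interval_ends):
    """Carry the last reported value forward while it can be assumed in effect."""
    limit = last_time + no_report_interval
    return [
        (t, last_value if last_time < t <= limit else None)
        for t in interval_ends
    ]


filled = fill_recent(
    last_time=datetime(2020, 7, 15, 0, 0),
    last_value=3.2,
    no_report_interval=timedelta(hours=2),
    interval_ends=[datetime(2020, 7, 15, h) for h in (1, 2, 3)],
)
```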

No Report Interval Conventions

| Data Collection Type | No Report Interval | Regular Report Interval | Alarm Trigger for No Report |
| --- | --- | --- | --- |
| ALERT | 12 hours | 12 hours | 24 hours |
| ALERT2 | 1 hour | 1 hour | 2 hours |
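Using the typical values from the table, a no report alarm check could look like the sketch below. The dictionary and function are hypothetical; actual alarm triggers are configured per station.

```python
from datetime import datetime, timedelta

# Typical alarm trigger durations taken from the conventions table.
NO_REPORT_ALARM = {
    "ALERT": timedelta(hours=24),
    "ALERT2": timedelta(hours=2),
}


def no_report_alarm(collection_type: str, last_report: datetime, now: datetime) -> bool:
    """True when the time since the last report exceeds the alarm trigger."""
    return now - last_report > NO_REPORT_ALARM[collection_type]


last = datetime(2020, 7, 15, 0, 0)
```

Because the trigger is twice the regular report interval in these conventions, a single late regular report does not raise an alarm.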

See the ts service documentation for more information about how no report interval is handled when computing interval time series.

The no report interval can be used by graphing tools to indicate a gap in the data. Otherwise, data points on each side of a long gap might be connected with a straight line, which is not an accurate reflection of data collection. Visualization tools for regular interval time series may need to use the no report interval because the ts service by default returns only intervals where observations occurred (the service has a query parameter to also return estimated values in the no report interval).
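A graphing tool might break a line at such gaps as in this sketch, which splits a series into segments wherever consecutive points are further apart than the no report interval. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def split_at_gaps(points: list, no_report_interval: timedelta) -> list:
    """Split sorted (timestamp, value) points into segments separated by gaps."""
    segments = []
    for point in points:
        if segments and point[0] - segments[-1][-1][0] <= no_report_interval:
            segments[-1].append(point)  # continue the current line segment
        else:
            segments.append([point])    # start a new segment after a gap
    return segments


points = [
    (datetime(2020, 7, 15, 0), 1.0),
    (datetime(2020, 7, 15, 1), 1.1),
    (datetime(2020, 7, 15, 5), 2.0),  # 4 hours after the previous point
    (datetime(2020, 7, 15, 6), 2.1),
]
segments = split_at_gaps(points, timedelta(hours=2))
```

Each segment can then be drawn as its own line so that the gap is visible.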

Time Zone

NovaStar stores data internally in the UTC time zone. The timestamps for data measurements are converted from the measurement time zone to UTC when loading into the database and are typically converted to local time when queried from the database. Data web services by default use the time zone of the server but will output in a requested time zone, such as that of the web browser application that is requesting data.

Interval time series calculations in particular will be impacted by time zone if the interval is one day or larger, because the boundaries between days affect which data values fall in each interval. This tends to be an issue only where a database is storing data from multiple time zones and serving data to multiple time zones. Therefore, time zone can be specified if necessary when retrieving daily interval data.
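The effect of the day boundary can be seen in the following sketch, which totals the same UTC reports by day in two time zones. The function name and data are hypothetical, fixed offsets are used for simplicity, and a value exactly at midnight is assigned to the day that is ending, following the 1Day time span convention shown earlier.

```python
from datetime import datetime, timedelta, timezone


def daily_totals(reports_utc: list, tz: timezone) -> dict:
    """Total (UTC timestamp, value) pairs by calendar day in the given zone."""
    totals = {}
    for t, value in reports_utc:
        local = t.astimezone(tz)
        # Shift by one microsecond so midnight counts toward the ending day.
        day = (local - timedelta(microseconds=1)).date()
        totals[day] = totals.get(day, 0.0) + value
    return totals


reports = [
    (datetime(2020, 7, 15, 3, 0, tzinfo=timezone.utc), 1.0),
    (datetime(2020, 7, 15, 12, 0, tzinfo=timezone.utc), 2.0),
]
utc_totals = daily_totals(reports, timezone.utc)
local_totals = daily_totals(reports, timezone(timedelta(hours=-6)))
```

In UTC both reports fall on 2020-07-15, while at UTC-6 the first report falls on 2020-07-14, so the daily totals differ.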

Complex Time Series

The conventions for time series discussed in the previous examples work for most cases. However, there are complex cases that require special handling and are illustrated here.

Consider the case where an irregular time series is being used to represent the moving 1Hour total precipitation for every bucket tip. In this case the time series has an interval (IrregSecond) and a duration (1Hour). The time series data type and interval should probably be similar to the following to explicitly describe the time series:

Precip-Total.IrregSecond-1Hour

If using SHEF parameter code, PPH is equivalent to the above.

This case is not currently handled in data web services but is planned.
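Although data web services do not currently provide this case, the concept can be sketched independently: for each bucket tip, total the increments in the hour ending at that tip. The function name and the choice to exclude a tip exactly one hour old are assumptions.

```python
from collections import deque
from datetime import datetime, timedelta


def moving_totals(tips: list, window: timedelta) -> list:
    """For each (timestamp, increment), total increments in (t - window, t]."""
    recent = deque()
    results = []
    for t, value in tips:
        recent.append((t, value))
        # Drop tips that are at or beyond the start of the window.
        while recent[0][0] <= t - window:
            recent.popleft()
        results.append((t, sum(v for _, v in recent)))
    return results


tips = [
    (datetime(2020, 7, 15, 0, 0), 1.0),
    (datetime(2020, 7, 15, 0, 30), 1.0),
    (datetime(2020, 7, 15, 1, 0), 1.0),
    (datetime(2020, 7, 15, 1, 45), 1.0),
]
moving = moving_totals(tips, timedelta(hours=1))
```

The result keeps the irregular timestamps of the tips (IrregSecond) while each value represents a 1Hour duration.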

Services

The following services are related to data reports and time series.

  • data - returns data reports with flags
  • tscatalog - returns a list of time series, including forecast time series
  • ts - returns a time series