NovaStar Data / Data Reports and Time Series
Introduction
As indicated in other sections of this documentation, data at points can originate from multiple sources:
- data collection system, using calibrations and ratings
- data imports
- equations
- forecasts
The following sections explain how the original data reports are processed into time series.
Data Reports
The core NovaStar system processes "data reports", which are measurements and other data values originating from various data sources, as discussed in the Data Generators documentation. The values in data reports originate from configuration data stored in the database, measurement-related data, and calculated values. Each "report" conceptually consists of the following:
- location identifier (from configuration data)
- data type (from configuration data)
- timestamp for the data value (from measurement)
- raw data value (from measurement)
- calibration identifier (from configuration data)
- scaled data value (from measurement or calculation)
- NovaScore (from alarm trigger analysis)
- data flags (from measurement and analysis)
- rated values and associated flags (from calculations)
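The conceptual parts of a data report listed above can be pictured as a simple record. The following is a minimal sketch only: the class name, field names, and example values are illustrative assumptions and do not reflect the actual NovaStar database schema or web service output.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DataReport:
    """Conceptual data report; field names are illustrative, not the NovaStar schema."""
    location_id: str                  # from configuration data
    data_type: str                    # from configuration data
    timestamp: datetime               # from measurement, stored as UTC
    raw_value: Optional[float]        # from measurement
    calibration_id: Optional[int]     # from configuration data
    scaled_value: Optional[float]     # from measurement or calculation
    nova_score: Optional[int] = None  # from alarm trigger analysis
    flags: str = ""                   # from measurement and analysis
    rated_values: dict = field(default_factory=dict)  # from calculations


# Made-up example values for a river water level report.
report = DataReport(
    location_id="1234",
    data_type="WaterLevelRiver",
    timestamp=datetime(2020, 7, 15, 6, 30, tzinfo=timezone.utc),
    raw_value=512.0,
    calibration_id=1,
    scaled_value=3.25,
    flags="V",
)
print(report)
```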
Data web services allow data reports to be queried directly using the data and dataReports services. See also the Time Series section for information about querying data as time series.
Location Identifier
The location identifier for time series returned by the
tscatalog and
ts services
is the station numerical identifier,
or station numerical identifier and point tag name if needed for uniqueness.
Data reports in the database are associated with a point and each point is associated with a station
(which has a location identifier).
Data Type
Data types in NovaStar have evolved over time.
The current design allows a point type to be specified using either a SHEF parameter code
(e.g., HG for data type WaterLevelRiver)
or a non-SHEF data type (e.g., Alert2GpsLock).
In the past, it was necessary to request data using the point identifier.
However, the current convention used to request time series in web services
is to specify the location identifier and data type.
Timestamp
NovaStar internally stores data using the UTC time zone. However, data import programs typically allow local time to be specified. Timestamps are converted to and from UTC as appropriate.
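The conversion between local time and UTC can be illustrated with the Python standard library. This is a sketch of the general technique, not NovaStar code; the time zone name and timestamp are made-up examples.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical local time zone for an import file; any IANA zone name works.
local_zone = ZoneInfo("America/Denver")

# A local timestamp as it might appear in imported data.
local_time = datetime(2020, 7, 15, 6, 30, tzinfo=local_zone)

# Convert to UTC for storage, then back to local time for display.
utc_time = local_time.astimezone(timezone.utc)
round_trip = utc_time.astimezone(local_zone)

print(utc_time.isoformat())    # 2020-07-15T12:30:00+00:00
print(round_trip.isoformat())  # 2020-07-15T06:30:00-06:00
```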
Raw Data Value
Raw data values are those measured at the station. For example, precipitation is often measured as a counter of bucket tips, and water level may be measured as an analog value using a pressure transducer.
Calibration Identifier
The raw values are processed into scaled values using calibration data. Data reports store the calibration identifier that was used to perform the calculation.
Scaled Data Value
The scaled data value is the value that is typically used for operations. The data loading process may set the scaled value directly or it may be calculated from the raw value using a calibration (see the previous section).
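As an illustration of how a calibration can turn a raw value into a scaled value, the sketch below assumes a simple linear calibration (multiplier and offset). The class, its fields, and the tip size are hypothetical; actual NovaStar calibrations may take other forms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCalibration:
    """Hypothetical calibration: scaled = raw * multiplier + offset."""
    calibration_id: int
    multiplier: float
    offset: float = 0.0

    def scale(self, raw_value: float) -> float:
        return raw_value * self.multiplier + self.offset


# Example: a tipping bucket where each tip is assumed to be 0.04 inches.
rain_calibration = LinearCalibration(calibration_id=1, multiplier=0.04)
scaled = rain_calibration.scale(25)  # 25 tips on the counter
print(round(scaled, 2))  # 1.0
```

Keeping the calibration identifier with the report, as described above, makes it possible to tell later which calibration produced a given scaled value.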
NovaScore
A NovaScore value is assigned to the data report based on the scaled value and an evaluation of alarm triggers. A default NovaScore is assigned to each report, indicating normal conditions. If any alarm triggers are defined, they are evaluated, and if triggered, result in a higher-severity NovaScore being assigned to the data report.
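The logic described above can be sketched as follows. The score values, the threshold-style trigger, and the rule that a larger number means higher severity are all assumptions made for illustration; they are not the actual NovaScore definitions.

```python
from dataclasses import dataclass

DEFAULT_SCORE = 0  # assumed value meaning "normal conditions"


@dataclass(frozen=True)
class AlarmTrigger:
    """Hypothetical trigger: fires when the scaled value reaches a threshold."""
    threshold: float
    score: int  # assumed: larger numbers mean higher severity


def assign_score(scaled_value, triggers):
    """Return the highest score among triggered alarms, else the default."""
    triggered = [t.score for t in triggers if scaled_value >= t.threshold]
    return max(triggered, default=DEFAULT_SCORE)


triggers = [AlarmTrigger(threshold=5.0, score=1), AlarmTrigger(threshold=8.0, score=2)]
print(assign_score(3.2, triggers))  # 0
print(assign_score(6.1, triggers))  # 1
print(assign_score(9.4, triggers))  # 2
```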
Data Flags
The data flags are text that provides additional information about a data report.
Data Quality Flags
The data flags contain one or more single characters that indicate qualitative information, including whether the data are questionable, valid, verified, collected during maintenance, etc. NovaStar software typically only outputs data reports that are flagged as valid values unless otherwise requested.
Data Source Line Number
NovaStar systems allow data to be provided from multiple sources, including real-time services such as ALERT/ALERT2 and scheduled data import programs.
Rated Values
If defined, ratings are used to compute additional rated values. For example, accumulated precipitation is used to compute incremental precipitation, storm total, and season total. Water level can be used to compute discharge (streamflow).
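The two examples above can be sketched in a few lines. The functions, the rating table values, linear interpolation between rating points, and clamping at the ends of the table are assumptions for illustration; the sketch also ignores counter resets in the accumulated values.

```python
from bisect import bisect_right


def incremental_from_accumulated(accumulated):
    """Differences between consecutive accumulated values (assumes no counter reset)."""
    return [round(b - a, 6) for a, b in zip(accumulated, accumulated[1:])]


def rate(stage, rating):
    """Linearly interpolate discharge from a sorted (stage, discharge) table."""
    stages = [s for s, _ in rating]
    if stage <= stages[0]:
        return rating[0][1]   # clamp below the table (assumption)
    if stage >= stages[-1]:
        return rating[-1][1]  # clamp above the table (assumption)
    i = bisect_right(stages, stage)
    (s0, q0), (s1, q1) = rating[i - 1], rating[i]
    return q0 + (q1 - q0) * (stage - s0) / (s1 - s0)


print(incremental_from_accumulated([1.00, 1.04, 1.04, 1.12]))  # [0.04, 0.0, 0.08]
rating_table = [(0.0, 0.0), (1.0, 10.0), (2.0, 40.0)]  # made-up values
print(rate(1.5, rating_table))  # 25.0
```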
Time Series
The term "time series" refers to a sequence of data values, each associated with a measurement time. Time series returned by web services have the following properties:
- location identifier (for NovaStar data web services this is the station numerical identifier (stationNumId), or if necessary for uniqueness, stationNumId and point tag name)
- data type - see Data Web Services Data Type
- interval - for example IrregSecond, 1Hour, etc.
- name
- description
- other properties
Data values associated with time series include:
- date/time (timestamp)
- data value, or a missing indicator such as NaN
- data flag (e.g., Q for questionable and V for validated)
- duration, optional, which indicates the duration over which the measurement occurred
Time series that have an irregular interval must provide the timestamp for each data value. Time series that have a regular interval can be optimized to store only the start and end date/time, because the timestamp of each value is implied by the interval.
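The storage optimization for regular interval time series can be sketched as follows: only the start, end, and values are kept, and each timestamp is derived from the interval. The function name and data are illustrative assumptions.

```python
from datetime import datetime, timedelta


def regular_timestamps(start, end, step):
    """Generate every timestamp of a regular interval series from start, end, and step."""
    current = start
    while current <= end:
        yield current
        current += step


# A regular series stores only start, end, and values; timestamps are implied.
start = datetime(2020, 7, 15, 0, 0)
end = datetime(2020, 7, 15, 3, 0)
values = [0.0, 0.04, 0.12, 0.0]  # made-up 1Hour values

series = list(zip(regular_timestamps(start, end, timedelta(hours=1)), values))
for timestamp, value in series:
    print(timestamp.isoformat(), value)
```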
Irregular Interval Time Series
Irregular interval time series for real-time data in data web services
typically measure an instantaneous value from a sensor and have an interval of IrregSecond,
indicating that the data have irregular interval and the precision of the timestamp is to seconds (or fractions of seconds).
Such data is typical for systems that are event-driven, such as flood warning systems.
Some systems use the term "instantaneous" data. For example, the USGS provides web services for "instantaneous" data. However, the USGS web service values are 15Minute data, meaning that values are recorded every 15 minutes or are calculated from a smaller interval.
Data web services, such as the tscatalog and
ts services, return irregular interval time series for
data reports and associated values (rated values, calibrations, alarms, NovaScores).
Examples of irregular interval time series include:
- data reports triggered by a change in value:
    - rain gage bucket tips
    - water level and discharge at a time, caused by a change in value
- a timed "regular report" that is transmitted at a certain time each day, regardless of whether conditions are changing:
    - typically 12 hours in a legacy ALERT system
    - typically 1 hour in an ALERT2 system
    - depends on the cost of communication in a system
- values computed from the above, for example:
    - storm precipitation total spanning non-zero precipitation values, totaled for each measured precipitation value
    - stream discharge computed from water level
Regular Interval Time Series
The NovaStar system is designed to manage event-based data rather than to serve as an archival system for interval data. Interval time series are not typically stored in NovaStar, but it is possible to do so and may make sense in some cases.
Instead of storing interval time series in NovaStar, interval time series are typically computed as needed. Storing interval time series in the database has limitations because the many combinations of output intervals and statistics would greatly increase the size of the database and processing time needed to compute the interval time series.
Interval Calculations
A regular interval time series value is calculated from the input values over the interval,
using a statistic of choice.
For example, for precipitation, the 1Hour time series value is the total (sum) of any precipitation
increments that have occurred during the hour.
In this case, the core data type is Precip, the interval is 1Hour and the
statistic is Total.
In data web services, the full data type and interval is then Precip-Total.1Hour.
In order to minimize ambiguity, data web services require that the statistic is specified and not assumed.
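The structure of an identifier such as Precip-Total.1Hour can be shown by splitting it into its three parts. This is a sketch of the naming convention described above only; the function and class names are hypothetical, and the web services may parse or validate identifiers differently.

```python
import re
from typing import NamedTuple


class TsTypeInterval(NamedTuple):
    data_type: str
    statistic: str
    interval: str


def parse_type_interval(text):
    """Split 'DataType-Statistic.Interval'; reject strings that omit the statistic."""
    match = re.fullmatch(r"([^-.]+)-([^-.]+)\.([^-.]+)", text)
    if match is None:
        raise ValueError(f"expected DataType-Statistic.Interval, got {text!r}")
    return TsTypeInterval(*match.groups())


parsed = parse_type_interval("Precip-Total.1Hour")
print(parsed)
```

Rejecting an identifier without a statistic mirrors the requirement that the statistic be specified rather than assumed.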
The following table illustrates the timestamps that control how interval data are determined for different intervals. Intervals must typically divide evenly into the local day.
Interval Calculation Time Span
| Interval | Timestamp Example | Time Span for Data |
|---|---|---|
| 5Minute | 2020-07-15T00:15 | > 2020-07-15T00:10 and <= 2020-07-15T00:15 |
| 6Hour | 2020-07-15T03 | > 2020-07-15T02 and <= 2020-07-15T03 |
| 1Day | 2020-07-15 | for instantaneous data: > 2020-07-15T00:00:00 and <= 2020-07-16T00:00:00 |
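The 5Minute row of the table can be turned into a short sketch that assigns each data report to the interval whose end timestamp labels it, and then totals the values. The function names and data are assumptions for illustration; the sketch covers sub-daily intervals only, because the 1Day row labels the interval with the date rather than the interval end.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def interval_end(timestamp, step):
    """Return the end of the interval (start, end] that contains timestamp.

    Intervals are aligned to midnight of the timestamp's day, so step is
    assumed to divide the day evenly.
    """
    midnight = timestamp.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = timestamp - midnight
    # Ceiling division: a value exactly on a boundary belongs to the interval it ends.
    count = -(-elapsed // step)
    return midnight + count * step


def total_by_interval(reports, step):
    """Sum (timestamp, increment) pairs into interval-ending totals."""
    totals = defaultdict(float)
    for timestamp, increment in reports:
        totals[interval_end(timestamp, step)] += increment
    return dict(totals)


# Made-up precipitation increments around the 00:15 interval.
reports = [
    (datetime(2020, 7, 15, 0, 10, 0), 0.04),   # belongs to 00:10 (on the boundary)
    (datetime(2020, 7, 15, 0, 10, 1), 0.04),   # belongs to 00:15
    (datetime(2020, 7, 15, 0, 15, 0), 0.04),   # belongs to 00:15
    (datetime(2020, 7, 15, 0, 15, 30), 0.04),  # belongs to 00:20
]
totals = total_by_interval(reports, timedelta(minutes=5))
for end, total in sorted(totals.items()):
    print(end.isoformat(), round(total, 2))
```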
The calculated interval values depend on the statistic being computed.
Some statistics involve simple comparisons.
For example, the Max statistic computes the maximum value from the sample in an interval.
However, statistics such as Mean that include a normalization over time are more complex
because they consider a weighting of multiple values, for example:
- Mean - simple arithmetic mean
    - the interval value is an arithmetic mean of the sample values in the interval
- TimeWeightedMean - each value in the sample is weighted by the number of seconds in the interval over which the value applies
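The difference between the two statistics can be seen in a small worked example. The sketch assumes that each value applies from its own timestamp until the next sample or the end of the interval, and that the first sample falls at the start of the interval; the ts service may handle these edge cases differently.

```python
def simple_mean(samples):
    """Arithmetic mean of (offset_seconds, value) samples."""
    return sum(value for _, value in samples) / len(samples)


def time_weighted_mean(samples, interval_seconds):
    """Weight each value by the seconds until the next sample or the interval end.

    Assumes samples are sorted and the first sample is at offset 0.
    """
    weighted_sum = 0.0
    for index, (offset, value) in enumerate(samples):
        if index + 1 < len(samples):
            next_offset = samples[index + 1][0]
        else:
            next_offset = interval_seconds
        weighted_sum += value * (next_offset - offset)
    return weighted_sum / interval_seconds


# Water level during one hour: 2.0 for 50 minutes, then 4.0 and 6.0 for 5 minutes each.
samples = [(0, 2.0), (3000, 4.0), (3300, 6.0)]
print(simple_mean(samples))               # 4.0
print(time_weighted_mean(samples, 3600))  # 2.5
```

The simple mean is pulled upward by the two short-lived high values, whereas the time-weighted mean reflects that the level was 2.0 for most of the hour.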
See the ts service for details about statistics calculations.
No Report Interval
NovaStar is an event-driven system. Data reports are generated when data values change, such as when a rain sensor bucket tips or water level values change above a threshold amount on the data logger. In areas such as the arid west, with little rain and streams that may dry up during part of the year, this can lead to no data reports for a long period.
To confirm that the data collection system is functional, stations are also typically programmed to transmit a data report at a regular interval.
The combination of event-driven reports and periodic regular reports is appropriate because continuous (small interval) data monitoring is not necessary and would require much more database storage, communication capacity, and memory when processing the data for analysis and visualization.
NovaStar uses a "no report interval" on points to indicate the interval during which the last data report can be assumed to be in effect. The last value is then used to fill interval data for recent intervals, as follows:
- accumulated values such as precipitation value will remain the same, indicating no additional precipitation
- values such as water level will use the same value, indicating no change in value
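The fill behavior described above can be sketched as carrying the last reported value into later intervals that end within the no report interval. The function name and the exact boundary handling are assumptions; the ts service defines the actual behavior.

```python
from datetime import datetime, timedelta


def fill_recent_intervals(last_time, last_value, interval_ends, no_report_interval):
    """Carry the last value into later intervals ending within the no report interval."""
    limit = last_time + no_report_interval
    return {
        end: last_value if last_time < end <= limit else None
        for end in interval_ends
    }


# Last water level report at 08:20 with a 1 hour no report interval (ALERT2 convention).
last_time = datetime(2020, 7, 15, 8, 20)
interval_ends = [datetime(2020, 7, 15, 9, 0), datetime(2020, 7, 15, 10, 0)]
filled = fill_recent_intervals(last_time, 3.2, interval_ends, timedelta(hours=1))
for end, value in filled.items():
    print(end.isoformat(), value)
```

Here the 09:00 interval is filled with 3.2 because it ends within one hour of the last report, while the 10:00 interval is left empty.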
The following table summarizes typical conventions for defining NovaStar no report interval. A no report alarm trigger is typically configured for each station at a multiple of the no report interval, so that a delay in transmitting a regular report does not trigger excessive alarms.
No Report Interval Conventions
| Data Collection Type | No Report Interval | Regular Report Interval | Alarm Trigger for No Report |
|---|---|---|---|
| ALERT | 12 hours | 12 hours | 24 hours |
| ALERT2 | 1 hour | 1 hour | 2 hours |
See the ts service documentation for more information
about how no report interval is handled when computing interval time series.
The no report interval can be used by graphing tools to indicate a gap in the data.
Otherwise, data points on each side of a long gap might be connected with a straight line,
which is not an accurate reflection of data collection.
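A graphing tool could apply this idea by splitting the data into separate line segments wherever consecutive points are farther apart than the no report interval. The following is a minimal sketch with a hypothetical function name and made-up data, not the behavior of any particular NovaStar tool.

```python
from datetime import datetime, timedelta


def split_at_gaps(points, no_report_interval):
    """Split sorted (timestamp, value) points into segments at gaps longer than the interval."""
    segments = []
    for point in points:
        if segments and point[0] - segments[-1][-1][0] <= no_report_interval:
            segments[-1].append(point)  # close enough to the previous point
        else:
            segments.append([point])    # start a new segment after a gap
    return segments


points = [
    (datetime(2020, 7, 15, 0, 0), 1.0),
    (datetime(2020, 7, 15, 0, 30), 1.1),
    (datetime(2020, 7, 15, 5, 0), 1.4),   # 4.5 hours after the previous point
    (datetime(2020, 7, 15, 5, 20), 1.5),
]
segments = split_at_gaps(points, timedelta(hours=1))
print(len(segments))  # 2
```

Each segment can then be drawn as its own line so that no straight line spans the gap.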
Visualization tools for regular interval time series may need to use
the no report interval because the ts service
by default returns only intervals where observations occurred
(the service has a query parameter to also return estimated values in the no report interval).
Time Zone
NovaStar stores data internally in the UTC time zone. The timestamps for data measurements are converted from the measurement time zone to UTC when loading into the database and are typically converted to local time when queried from the database. Data web services by default use the time zone of the server but will output in a requested time zone, such as that of the web browser application that is requesting data.
Interval time series calculations in particular will be impacted by time zone if the interval is 24 hours or larger, because the boundaries between days determine which data values fall in each interval. This tends to be an issue only where a database stores data from multiple time zones or serves data to multiple time zones. Time zone can therefore be specified if necessary when retrieving daily interval data.
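The effect of day boundaries can be demonstrated with a single stored timestamp. The sketch below uses the Python standard library and made-up zone choices to show that the same UTC report falls on different calendar days depending on the requested time zone.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One report stored in UTC, a few hours after midnight UTC.
stored = datetime(2020, 7, 16, 5, 0, tzinfo=timezone.utc)

# The local calendar day that the report falls in depends on the requested zone.
local_days = {}
for zone_name in ("UTC", "America/Denver", "America/New_York"):
    local = stored.astimezone(ZoneInfo(zone_name))
    local_days[zone_name] = local.date().isoformat()
    print(zone_name, local_days[zone_name])
```

In this example the report counts toward 2020-07-16 in UTC and New York but toward 2020-07-15 in Denver, so a daily total differs by zone.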
Complex Time Series
The conventions for time series discussed in the previous examples work for most cases. However, there are complex cases that require special handling and are illustrated here.
Consider the case where an irregular time series is being used to represent the
moving 1Hour total precipitation for every bucket tip.
In this case the time series has an interval (IrregSecond) and a duration (1Hour).
The time series data type and interval should probably be similar to the following to
explicitly describe the time series:
Precip-Total.IrregSecond-1Hour
If using SHEF parameter code, PPH is equivalent to the above.
This case is not currently handled in data web services but is planned.
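To make the idea concrete, the sketch below computes a moving 1Hour total at every bucket tip. It is an illustration only, since this case is not yet handled by data web services; the function name, the tip size, and the choice of a window that excludes its start are assumptions.

```python
from collections import deque
from datetime import datetime, timedelta


def moving_totals(tips, duration):
    """For each (timestamp, increment) tip, total the increments in (t - duration, t]."""
    window = deque()
    running = 0.0
    results = []
    for timestamp, increment in tips:
        window.append((timestamp, increment))
        running += increment
        # Drop tips that fall at or before the start of the window.
        while window[0][0] <= timestamp - duration:
            running -= window.popleft()[1]
        results.append((timestamp, round(running, 6)))
    return results


# Made-up bucket tips of 0.04 each.
tips = [
    (datetime(2020, 7, 15, 10, 0), 0.04),
    (datetime(2020, 7, 15, 10, 20), 0.04),
    (datetime(2020, 7, 15, 10, 50), 0.04),
    (datetime(2020, 7, 15, 11, 10), 0.04),
]
moving = moving_totals(tips, timedelta(hours=1))
for timestamp, total in moving:
    print(timestamp.isoformat(), total)
```

The result has one value per tip (irregular interval), and each value describes the preceding hour (1Hour duration), which is why both parts appear in the identifier.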
Services
The following services are related to data reports and time series.