Validation workflow

The core functions for generating validation metric values for a set of extracted model data against a wavebuoy data archive are in the following modules:

  • MatchUpDatabase: used to generate a searchable set of model/observation data matches.

  • Validation: used to generate validation metrics for a set of matched data records.

The steps involved are as follows:

Step 1: Match up model and observation data records

The validation process requires a timeseries of data extracted for a specific location within the model domain and a corresponding time record of wavebuoy data. The ResourceCode hindcast data are computed on an hourly basis and are stored in monthly records, sorted by year, month and data type (FREQ_NC, SPEC_NC). The InSituTAC wavebuoy data are available as either monthly files or a single continuous record file; these tools have been developed for use with the continuous wavebuoy records to simplify the process of validating the complete hindcast dataset.

The function for generating the matchup database is MatchUpDatabase.construct_rscd_mdb(). This is used to generate an SQLite3 database of matched buoy/model extract data by location.

The validation process has been generalised to take matched data record information from a CSV file, allowing manual generation of the data match records without the need for generating an SQLite database. The SQLite matchup database can be converted to a CSV file using the MatchUpDatabase.db_match_to_csv() function.
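The general idea of dumping an SQLite table to a CSV file can be sketched with the standard sqlite3 and csv modules. This is only an illustration and does not reproduce MatchUpDatabase.db_match_to_csv(): the table name, the column names and the helper dump_table_to_csv() are hypothetical.

```python
import csv
import io
import sqlite3

def dump_table_to_csv(conn, table, handle):
    """Hypothetical helper: write every row of `table` to `handle` as CSV."""
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    writer = csv.writer(handle)
    # cursor.description holds one entry per column; item 0 is the name
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)

# In-memory database with an assumed three-column layout (illustration only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matchup (platform TEXT, buoy_file TEXT, model_file TEXT)")
conn.executemany(
    "INSERT INTO matchup VALUES (?, ?, ?)",
    [("BUOY_A", "buoy_a.nc", "model_a_201001.nc"),
     ("BUOY_B", "buoy_b.nc", "model_b_201001.nc")],
)

output = io.StringIO()
dump_table_to_csv(conn, "matchup", output)
csv_lines = output.getvalue().splitlines()
conn.close()
```

A CSV file written this way can also be assembled or edited by hand, which is the flexibility the CSV route is intended to provide.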

Step 2: Extract matched data records from tabulated CSV file

The validation process takes as input a list of matched data records, each in the form of a tuple. The data in the CSV file are converted into the list of tuples using the function MatchUpDatabase.csv2tuples().
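As a rough illustration of this conversion, the sketch below reads CSV rows into a list of tuples with the standard csv module. The column layout (platform, buoy file, model file) and the helper read_match_records() are assumptions made for the example; the actual field order expected by MatchUpDatabase.csv2tuples() is not stated here.

```python
import csv
import io

def read_match_records(handle, skip_header=True):
    """Hypothetical helper: read CSV rows into a list of tuples."""
    reader = csv.reader(handle)
    if skip_header:
        next(reader, None)  # discard the header row, if present
    # Blank lines come back as empty lists and are skipped
    return [tuple(field.strip() for field in row) for row in reader if row]

# Assumed layout for illustration only: platform, buoy file, model file
example_csv = io.StringIO(
    "platform,buoy_file,model_file\n"
    "BUOY_A,buoy_a.nc,model_a_201001.nc\n"
    "\n"
    "BUOY_B,buoy_b.nc,model_b_201001.nc\n"
)
records = read_match_records(example_csv)
```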

Step 3: Identify unique platforms within the matched data records

The records are processed by location, so the records for a particular platform need to be identified. The unique platform names are extracted from the list of matched data records using the function MatchUpDatabase.getPlatformList().
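A minimal sketch of extracting unique platform names from a list of record tuples is shown below. It assumes the platform name is the first field of each tuple, which is not confirmed by the text, and unique_platforms() is a hypothetical stand-in for MatchUpDatabase.getPlatformList().

```python
def unique_platforms(records, platform_index=0):
    """Return platform names in order of first appearance, without duplicates."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(rec[platform_index] for rec in records))

# Illustrative records: (platform, buoy file, model file) is an assumed layout
records = [
    ("BUOY_A", "buoy_a.nc", "model_a_201001.nc"),
    ("BUOY_B", "buoy_b.nc", "model_b_201001.nc"),
    ("BUOY_A", "buoy_a.nc", "model_a_201002.nc"),
]
platforms = unique_platforms(records)
```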

Step 4: Generate validation statistics for a platform

The following substeps are carried out:

(4.1) Extract the records relevant to the current platform.

The records corresponding to a given platform are extracted using the function MatchUpDatabase.getPlatformRecords().
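Selecting the records for one platform amounts to a filter over the list of tuples. The sketch below assumes the platform name is the first field of each tuple; records_for_platform() is a hypothetical helper, not the signature of MatchUpDatabase.getPlatformRecords().

```python
def records_for_platform(records, platform, platform_index=0):
    """Return the records whose platform field matches `platform`."""
    return [rec for rec in records if rec[platform_index] == platform]

# Illustrative records: (platform, buoy file, model file) is an assumed layout
all_records = [
    ("BUOY_A", "buoy_a.nc", "model_a_201001.nc"),
    ("BUOY_B", "buoy_b.nc", "model_b_201001.nc"),
    ("BUOY_A", "buoy_a.nc", "model_a_201002.nc"),
]
buoy_a_records = records_for_platform(all_records, "BUOY_A")
```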

(4.2) Provide a list of the wave parameters to be tested

The wave parameter options available are: [Hm0, Tp, Tm02, Dir, Spr]

(4.3) Process the set of extracted matchup records

The records are processed via the Validation.validate_records() function. This loops over the supplied list and processes each record separately.

A single record is processed by a call to the Validation.process_record() function. The internal steps of the validation process for a given record are:

(4.3.1) Extract model record timestamp, variable data and temporal coverage structure

(4.3.2) Extract observation record timestamp and variable data

(4.3.3) Check the required buoy variable exists

If the variable exists, processing continues; otherwise processing of the record stops.

(4.3.4) Check for temporal overlap between records

If the time records overlap and there are at least 200 common samples, processing continues; otherwise processing of the record stops.
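One possible form of this check is sketched below. It interprets "common samples" as the number of model time stamps that fall inside the period covered by both records; that interpretation, and the function names, are assumptions rather than a description of the actual implementation.

```python
from datetime import datetime, timedelta

MIN_COMMON_SAMPLES = 200  # threshold quoted in the text

def count_overlap_samples(model_times, obs_times):
    """Count model time stamps inside the period covered by both records."""
    if not model_times or not obs_times:
        return 0
    start = max(min(model_times), min(obs_times))
    end = min(max(model_times), max(obs_times))
    if start > end:
        return 0  # the two records do not overlap at all
    return sum(start <= t <= end for t in model_times)

def has_sufficient_overlap(model_times, obs_times, minimum=MIN_COMMON_SAMPLES):
    return count_overlap_samples(model_times, obs_times) >= minimum

t0 = datetime(2010, 1, 1)
# One month of hourly model output (31 days = 744 samples)
model_times = [t0 + timedelta(hours=h) for h in range(744)]
# Half-hourly buoy record starting late in the month: too little overlap
late_obs = [t0 + timedelta(days=25, minutes=30 * k) for k in range(960)]
# The same buoy record starting earlier in the month: enough overlap
early_obs = [t0 + timedelta(days=10, minutes=30 * k) for k in range(960)]

late_count = count_overlap_samples(model_times, late_obs)
early_count = count_overlap_samples(model_times, early_obs)
```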

(4.3.5) Interpolate the observation variable onto the model time stamps

The observations must be mapped to the model time stamps to allow the differencing required for the metric statistics to be calculated. The interpolation method used is nearest neighbour, with the constraint that the difference between the observation time and the model time is less than 1 hour. Missing observational data are filled with the NaN value (numpy.nan).

The function used is Validation.time_interpolate_obs().
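The nearest-neighbour mapping with a 1 hour limit can be sketched as follows. This is an illustration using only the standard library (with float NaN standing in for the numpy value), not the code of Validation.time_interpolate_obs(); the function name and arguments are hypothetical, and the observation times are assumed to be sorted in ascending order.

```python
import math
from bisect import bisect_left
from datetime import datetime, timedelta

def nearest_obs_on_model_times(model_times, obs_times, obs_values,
                               max_gap=timedelta(hours=1)):
    """Map observations to model times by nearest neighbour.

    A model time receives NaN when the nearest observation is
    max_gap or more away. obs_times must be sorted ascending.
    """
    mapped = []
    for t in model_times:
        i = bisect_left(obs_times, t)
        # Candidates: the observations immediately before and after t
        candidates = [j for j in (i - 1, i) if 0 <= j < len(obs_times)]
        if not candidates:
            mapped.append(math.nan)
            continue
        best = min(candidates, key=lambda j: abs(obs_times[j] - t))
        if abs(obs_times[best] - t) < max_gap:
            mapped.append(obs_values[best])
        else:
            mapped.append(math.nan)
    return mapped

day = datetime(2010, 1, 1)
model_times = [day + timedelta(hours=h) for h in range(4)]  # 00:00 to 03:00
obs_times = [day + timedelta(minutes=m) for m in (10, 50, 220)]  # 00:10, 00:50, 03:40
obs_values = [1.0, 1.2, 2.0]

interpolated = nearest_obs_on_model_times(model_times, obs_times, obs_values)
```

In this example the 02:00 model time has no observation closer than 70 minutes, so it is filled with NaN, while the other three model times take the value of their nearest observation.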

The model and interpolated observation data are plotted as overlaid time series if requested, and the model data and interpolated observational data are saved as a binary file to allow rapid reprocessing if required.

(4.3.6) Check there are sufficient valid observational values in interpolated data

If there are at least 200 valid observations in the interpolated data, processing continues; otherwise processing of the record stops.

(4.3.7) Calculate validation metric statistics

Determine the wave parameter type (standard or circular), then calculate the set of validation metrics for the records and save the output in the results records structure.
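The distinction between standard and circular parameters matters because directional differences must be wrapped before averaging. The sketch below illustrates this with two common metrics, bias and RMSE; the choice of metrics, the treatment of only Dir as circular, and all names are assumptions, since the set of metrics computed by the Validation module is not listed here. Averaging wrapped differences is one simple approach among several for circular data.

```python
import math

CIRCULAR_PARAMETERS = {"Dir"}  # assumption: only direction is treated as circular

def wrap_degrees(diff):
    """Wrap an angular difference in degrees into the range [-180, 180)."""
    return (diff + 180.0) % 360.0 - 180.0

def bias_and_rmse(model, obs, circular=False):
    """Return (bias, rmse) of model minus obs, ignoring pairs containing NaN."""
    diffs = []
    for m, o in zip(model, obs):
        if math.isnan(m) or math.isnan(o):
            continue  # skip pairs with missing data
        d = m - o
        diffs.append(wrap_degrees(d) if circular else d)
    if not diffs:
        return math.nan, math.nan
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse

# Standard parameter (values invented for illustration)
hm0_bias, hm0_rmse = bias_and_rmse([1.0, 2.0, 3.0], [1.5, math.nan, 2.5])

# Circular parameter: 350 and 10 degrees are 20 degrees apart, not 340
dir_bias, dir_rmse = bias_and_rmse(
    [350.0, 10.0], [10.0, 350.0], circular="Dir" in CIRCULAR_PARAMETERS
)
```

Without the wrapping step the direction example would report an RMSE of 340 degrees instead of 20, which is why the parameter type has to be determined before the metrics are calculated.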

Generate and save a correlation plot if requested by calling Graphics.plot_correlation().

(4.3.8) Continue to next record

Select the next platform record and repeat steps (4.3.1) to (4.3.7).

(4.4) Return the full set of validation results for the current platform.

The results are returned in the form of a Python dictionary, where each element in a key field corresponds to a specific matched data pair record.

Step 5: Save the validation results for post-processing

The results are written out as a table-formatted ASCII file, and are also saved as a binary file for post-processing.

This is done by calling Validation.save_tabulated_results().
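For orientation, the sketch below shows one way of writing results both as a fixed-width ASCII table and as a binary file. It is not the implementation of Validation.save_tabulated_results(): the column set, the use of pickle for the binary file, the file extensions and the helper save_results() are all assumptions.

```python
import os
import pickle
import tempfile

def save_results(results, basename):
    """Hypothetical helper: write a fixed-width table and a pickle file."""
    columns = ["record", "parameter", "bias", "rmse"]
    txt_path = basename + ".txt"
    with open(txt_path, "w") as fh:
        fh.write("".join(f"{c:>12}" for c in columns) + "\n")
        for row in results:
            fh.write(
                f"{row['record']:>12}{row['parameter']:>12}"
                f"{row['bias']:>12.3f}{row['rmse']:>12.3f}\n"
            )
    bin_path = basename + ".pkl"
    with open(bin_path, "wb") as fh:
        pickle.dump(results, fh)
    return txt_path, bin_path

# Made-up values, for illustration only
example_results = [
    {"record": "BUOY_A_01", "parameter": "Hm0", "bias": 0.05, "rmse": 0.31},
    {"record": "BUOY_A_02", "parameter": "Tp", "bias": -0.4, "rmse": 1.2},
]

with tempfile.TemporaryDirectory() as tmpdir:
    txt_path, bin_path = save_results(
        example_results, os.path.join(tmpdir, "BUOY_A")
    )
    with open(txt_path) as fh:
        table_lines = fh.read().splitlines()
    with open(bin_path, "rb") as fh:
        reloaded = pickle.load(fh)
```

Keeping a binary copy alongside the human-readable table means the post-processing stage can reload the full results structure without parsing the text file.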