WaveVal internal data formats

The Validation module uses a set of predefined file and data formats for data input, passing, and output. The user-supplied matched data records are read from a comma-delimited CSV file, which is converted to a list of tuples for passing to the Validation process records function. Internally the validation data are stored in a simple dictionary, and the data are returned in this format from the Validation process. The pre-processed time series data for a matched data pair are saved as a Matlab v4 binary file; the version 7.3 MAT-file format introduced after Matlab 7.2 is based on HDF5. A Matlab format was chosen because Matlab is still widely used for data processing, and the formatted data can also be easily imported into python using the scipy.io.loadmat function.

Matched data records CSV format

The default input format for a set of model/observation data matches is a comma delimited CSV ASCII file. Each row in the CSV file represents a matched pair for a particular year and month; this year/month data fragmentation corresponds to the storage fragmentation used for archiving the ResourceCode hindcast data.

The columns of each row contain:

  1. Path to the directory containing the model extract data records

  2. Model extract data file name

  3. Path to the directory containing the observation data records

  4. Observation data file name

  5. Year to be processed

  6. Month to be processed

  7. Start date and time of observational data (space separated date and time)

  8. End date and time of observational data (space separated date and time)

  9. Platform name string

  10. Platform latitude

  11. Platform longitude

To minimise issues with the use of different path separators across operating systems, it is recommended that the separator / be used for path strings.

e.g. E:/GitRepos/waveval/examples/data/RSCD_HINDCAST/2017/01/FREQ_NC

File names should not include spaces or special characters as these are not currently accounted for in the readers. If the file names used in an archive include spaces and/or special characters the readers will need to be modified.

The format for the observational data start and end date and time is yyyy-mm-dd HH:MM:SS.

e.g. 2005-07-04 12:35:00
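As a minimal sketch of how the start and end fields can be interpreted, the standard library datetime module can parse the yyyy-mm-dd HH:MM:SS layout directly. The names OBS_TIME_FORMAT and parse_obs_time are hypothetical and are not part of the WaveVal readers.

```python
from datetime import datetime

# strptime pattern equivalent to yyyy-mm-dd HH:MM:SS
OBS_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_obs_time(text):
    """Convert a start or end field from a matched data record to a datetime."""
    return datetime.strptime(text.strip(), OBS_TIME_FORMAT)


start = parse_obs_time("2005-07-04 12:35:00")
print(start.isoformat())
```

A string that does not follow the layout raises a ValueError, which gives an early check on malformed records.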

Matched data records internal format

Internally the Validation module uses a python list of tuples, where each tuple corresponds to a record in the original CSV file supplied by the user. In essence each element in a tuple corresponds to the CSV record field at the same position, e.g. the tuple elements for a record are:

  1. tuple[0] = Path to the directory containing the model extract data records

  2. tuple[1] = Model extract data file name

  3. tuple[2] = Path to the directory containing the observation data records

  4. tuple[3] = Observation data file name

  5. tuple[4] = Year to be processed

  6. tuple[5] = Month to be processed

  7. tuple[6] = Start date and time of observational data (space separated date and time)

  8. tuple[7] = End date and time of observational data (space separated date and time)

  9. tuple[8] = Platform name string
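The conversion from CSV records to a list of tuples can be sketched with the standard library csv module. This is an illustration only, not the WaveVal reader: the function name read_matched_records is hypothetical, the sketch assumes the file has no header row, every field is kept as a string, and the file names, platform name and position in the sample record are made up.

```python
import csv
import io


def read_matched_records(stream):
    """Read comma-delimited records and return them as a list of tuples."""
    reader = csv.reader(stream, skipinitialspace=True)
    # Blank lines are skipped; surrounding whitespace is trimmed from fields
    return [tuple(field.strip() for field in row) for row in reader if row]


# One illustrative record with the eleven CSV columns
sample = io.StringIO(
    "E:/GitRepos/waveval/examples/data/RSCD_HINDCAST/2017/01/FREQ_NC,"
    "model_201701.nc,"
    "E:/data/obs,buoy_201701.nc,"
    "2017,01,2017-01-01 00:00:00,2017-01-31 23:00:00,"
    "BUOY_A,48.5,-5.8\n"
)
records = read_matched_records(sample)
first = records[0]
print(first[4], first[5], first[8])
```

For a file on disk, the same function could be called with open(path, newline="") in place of the in-memory stream.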

Internal validation results data format

Internally the validation results are collected in a python data dictionary with the following key fields:

Keys                   Description
results.MODEL          Model data file name
results.OBSERVATION    Observation data file name
results.YEAR           Year being processed
results.MONTH          Month being processed
results.PARAM          Observation parameter name
results.R              Correlation value
results.MB             Mean bias value
results.NMB            Normalised mean bias value
results.MAE            Mean absolute error value
results.NMAE           Normalised mean absolute error value
results.RMSE           Root mean square error value
results.NRMSE          Normalised root mean square error value
results.SI             Scatter index value
results.NSMPL          Number of valid time series data samples used

This structure is initialised by the Validation.initialize_results() function, and results for a given matchup record are stored in the dictionary using the Validation.store_valid_results() function.
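One way to picture this structure is a dictionary holding one list per key, with each matchup record appended across all lists. The sketch below is an assumption about the layout, not the code behind Validation.initialize_results() or Validation.store_valid_results(): the helper names are hypothetical, the keys are taken to be the bare names without the "results." prefix, and the values are made up.

```python
# Bare key names are an assumption of this sketch
RESULT_KEYS = (
    "MODEL", "OBSERVATION", "YEAR", "MONTH", "PARAM",
    "R", "MB", "NMB", "MAE", "NMAE", "RMSE", "NRMSE", "SI", "NSMPL",
)


def make_empty_results():
    """Return a dictionary with one empty list per results key."""
    return {key: [] for key in RESULT_KEYS}


def append_result(results, record):
    """Append one matchup record (a dict holding every key) to the results."""
    missing = [key for key in RESULT_KEYS if key not in record]
    if missing:
        raise KeyError("record is missing keys: " + ", ".join(missing))
    for key in RESULT_KEYS:
        results[key].append(record[key])


results = make_empty_results()
append_result(results, {
    "MODEL": "model_201701.nc", "OBSERVATION": "buoy_201701.nc",
    "YEAR": 2017, "MONTH": 1, "PARAM": "hs",
    "R": 0.95, "MB": 0.04, "NMB": 0.02, "MAE": 0.17, "NMAE": 0.08,
    "RMSE": 0.21, "NRMSE": 0.10, "SI": 0.10, "NSMPL": 720,
})
print(len(results["R"]), results["PARAM"])
```

Keeping the lists the same length means that index i refers to the same matchup record under every key.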

Saved binary data format

To support post-processing of the validation results, the results are stored in a binary format, currently the Matlab v4 binary format, generated using the scipy.io.savemat function. This produces a simple binary data structure in which each dictionary key is saved as a variable named after the key. For post-processing exercises, the Matlab-formatted binary data can be loaded back into an internal results data dictionary using the Validation.load_binary_results() function.
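The round trip through a Matlab v4 file can be sketched with scipy.io.savemat and scipy.io.loadmat. This is an illustration under stated assumptions rather than the body of Validation.load_binary_results(): only a few numeric keys are shown, the values are made up, and format="4" is passed explicitly because savemat writes version 5 files by default.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Illustrative numeric results; one array element per matchup record
results = {
    "R": np.array([0.95, 0.91]),
    "RMSE": np.array([0.21, 0.34]),
    "NSMPL": np.array([720.0, 696.0]),
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "results.mat")
    # Each dictionary key becomes a variable of the same name in the file
    savemat(path, results, format="4")
    loaded = loadmat(path)

# loadmat returns 2-D arrays, so flatten them back to one dimension
restored = {key: np.ravel(loaded[key]) for key in results}
print(restored["R"])
```

The flattening step matters because one-dimensional arrays are written as single-row matrices and come back with two dimensions.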

Saved tabulated ASCII data format

The results are stored in a human-readable ASCII format as part of the Validation.save_tabulated_results() call. The header line gives the integrated wave parameter that the tabulated statistics are for, the next line gives the table column field names, and then there is a formatted row for each valid result.

The columns and formats are as follows:

Field          Format     Description
MODEL          {:<60}     Up to 60 characters, left-aligned
OBSERVATION    {:<25}     Up to 25 characters, left-aligned
YEAR           {:^7}      Up to 7 digits, centred
MONTH          {:^7}      Up to 7 digits, centred
PARAM          {:^7}      Up to 7 characters, centred
R              {:^9.3}    Up to 9 characters, centred, with 3 significant digits
MB             {:^9.4}    Up to 9 characters, centred, with 4 significant digits
NMB            {:^9.4}    Up to 9 characters, centred, with 4 significant digits
MAE            {:^9.4}    Up to 9 characters, centred, with 4 significant digits
NMAE           {:^9.4}    Up to 9 characters, centred, with 4 significant digits
RMSE           {:^9.4}    Up to 9 characters, centred, with 4 significant digits
NRMSE          {:^9.4}    Up to 9 characters, centred, with 4 significant digits
SI             {:^9.4}    Up to 9 characters, centred, with 4 significant digits
NSMPL          {:^9}      Up to 9 digits, centred
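The effect of these format specifiers on a single data row can be sketched with str.format. This is an illustration and not the code behind Validation.save_tabulated_results(): the specifiers are assumed to be joined with no separator, and the values are made up. In Python's format specification the width is a minimum, so longer values widen the row, and a precision given without a type code sets the number of significant digits for floating point values.

```python
# Format specifiers for the 14 columns, joined with no separator
# (the absence of a separator is an assumption of this sketch)
ROW_FORMAT = (
    "{:<60}{:<25}{:^7}{:^7}{:^7}{:^9.3}"
    + "{:^9.4}" * 7
    + "{:^9}"
)

# Illustrative values only; the statistics must be floats, because a
# precision is not allowed when formatting an integer
row = ROW_FORMAT.format(
    "model_201701.nc", "buoy_201701.nc", 2017, 1, "hs",
    0.95321,                      # R
    0.04123, 0.02345, 0.17412,    # MB, NMB, MAE
    0.08231, 0.21377, 0.10288,    # NMAE, RMSE, NRMSE
    0.10115,                      # SI
    720,                          # NSMPL
)
print(repr(row))
print(len(row))
```

Applying a precision to a string truncates it, so a column name such as NRMSE would be cut short by {:^9.4}; a header line would therefore need width-only specifiers.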