WaveVal internal data formats
The Validation module uses a set of predefined file and data formats for data input, passing, and output. The user-supplied matched data records are provided as a comma-delimited CSV file, which is converted to a list of tuples for passing to the Validation process records function. Internally the validation data are stored in a simple dictionary, and the data are returned in this format from the Validation process. The pre-processed time series data for a matched data pair are saved as a Matlab v4 binary file; note that MAT-file versions beyond Matlab 7.2 use an HDF5-based format instead. The rationale for using a Matlab format is that Matlab is still widely used for data processing, while the formatted data can also be easily imported into Python using the scipy.io.loadmat function.
Matched data records CSV format
The default input format for a set of model/observation data matches is a comma delimited CSV ASCII file. Each row in the CSV file represents a matched pair for a particular year and month; this year/month data fragmentation corresponds to the storage fragmentation used for archiving the ResourceCode hindcast data.
The columns of each row contain:
Path to the directory containing the model extract data records
Model extract data file name
Path to the directory containing the observation data records
Observation data file name
Year to be processed
Month to be processed
Start date and time of observational data (space separated date and time)
End date and time of observational data (space separated date and time)
Platform name string
Platform latitude
Platform longitude
To minimise issues with the use of different path separators across operating systems, it is recommended that the separator / be used for path strings.
e.g. E:/GitRepos/waveval/examples/data/RSCD_HINDCAST/2017/01/FREQ_NC
File names should not include spaces or special characters as these are not currently accounted for in the readers. If the file names used in an archive include spaces and/or special characters the readers will need to be modified.
The format for the observational data start and end date and time is yyyy-mm-dd HH:MM:SS.
e.g. 2005-07-04 12:35:00
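As a minimal sketch of how the start and end fields could be checked before processing, the yyyy-mm-dd HH:MM:SS layout maps onto the standard library `strptime` pattern `%Y-%m-%d %H:%M:%S`. The helper name `parse_obs_time` is hypothetical and is not part of WaveVal.

```python
from datetime import datetime

# strptime pattern equivalent to yyyy-mm-dd HH:MM:SS
OBS_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_obs_time(text):
    """Parse a start/end field (hypothetical helper, not a WaveVal function).

    Raises ValueError if the string does not match the expected layout.
    """
    return datetime.strptime(text.strip(), OBS_TIME_FORMAT)


start = parse_obs_time("2005-07-04 12:35:00")
print(start.year, start.month, start.day, start.hour, start.minute)
```

Parsing the fields up front means a malformed date raises a `ValueError` immediately, before any data files are read.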
Matched data records internal format
Internally the Validation module uses a Python list of tuples, where each tuple corresponds to a record in the original CSV file supplied by the user. In essence, each element in a tuple corresponds to the CSV record field at the same position, e.g. the tuple elements for a record are:
tuple[0] = Path to the directory containing the model extract data records
tuple[1] = Model extract data file name
tuple[2] = Path to the directory containing the observation data records
tuple[3] = Observation data file name
tuple[4] = Year to be processed
tuple[5] = Month to be processed
tuple[6] = Start date and time of observational data (space separated date and time)
tuple[7] = End date and time of observational data (space separated date and time)
tuple[8] = Platform name string
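The conversion from CSV rows to a list of tuples can be sketched with the standard `csv` module. This is an illustration under stated assumptions, not WaveVal's actual reader: the function name `read_matchup_records` is hypothetical, all fields are kept as strings, and the file names, observation path, and platform values in the sample row are made up. In this sketch the platform latitude and longitude columns simply follow as two further tuple elements.

```python
import csv
import io


def read_matchup_records(stream):
    """Return one tuple of stripped string fields per non-empty CSV row.

    Hypothetical helper for illustration; not a WaveVal function.
    """
    records = []
    for row in csv.reader(stream):
        if not row:  # csv.reader yields an empty list for blank lines
            continue
        records.append(tuple(field.strip() for field in row))
    return records


# Illustrative record only: file names, observation path and platform are made up.
sample = io.StringIO(
    "E:/GitRepos/waveval/examples/data/RSCD_HINDCAST/2017/01/FREQ_NC,"
    "model_extract.nc,"
    "E:/obs,obs_buoy.nc,2017,01,"
    "2017-01-01 00:00:00,2017-01-31 23:00:00,BUOY_A,48.5,-5.0\n"
)
records = read_matchup_records(sample)
first = records[0]
print(first[4], first[5], first[8])
```

For a file on disk, the same function could be called with `open(path, newline="")` in place of the `io.StringIO` object.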
Internal validation results data format
Internally the validation results are collected in a Python dictionary with the following key fields:
| Keys | Description |
|---|---|
| results.MODEL | Model data file name |
| results.OBSERVATION | Observation data file name |
| results.YEAR | Year being processed |
| results.MONTH | Month being processed |
| results.PARAM | Observation parameter name |
| results.R | Correlation value |
| results.MB | Mean bias value |
| results.NMB | Normalised mean bias value |
| results.MAE | Mean absolute error value |
| results.NMAE | Normalised mean absolute error value |
| results.RMSE | Root mean square error value |
| results.NRMSE | Normalised root mean square error value |
| results.SI | Scatter index value |
| results.NSMPL | Number of valid time series data samples used |
This structure is initialised by the Validation.initialize_results() function, and
results for a given matchup record are stored in the dictionary using the
Validation.store_valid_results() function.
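One way to picture this structure is as a dictionary holding one list per key, with one entry appended per matchup record. That layout is an assumption made for illustration, as are the function names `init_results` and `store_result`; they are stand-ins and do not reproduce the signatures of the Validation functions. The example values are made up.

```python
RESULT_KEYS = [
    "MODEL", "OBSERVATION", "YEAR", "MONTH", "PARAM",
    "R", "MB", "NMB", "MAE", "NMAE", "RMSE", "NRMSE", "SI", "NSMPL",
]


def init_results():
    """Create an empty results dictionary with one list per key (assumed layout)."""
    return {key: [] for key in RESULT_KEYS}


def store_result(results, **values):
    """Append the values for one matchup record; every key must be supplied."""
    missing = set(RESULT_KEYS) - set(values)
    if missing:
        raise KeyError(f"missing result fields: {sorted(missing)}")
    for key in RESULT_KEYS:
        results[key].append(values[key])


results = init_results()
store_result(
    results,
    MODEL="model_extract.nc", OBSERVATION="obs_buoy.nc",
    YEAR=2017, MONTH=1, PARAM="hs",
    R=0.95, MB=0.02, NMB=0.01, MAE=0.15, NMAE=0.08,
    RMSE=0.2, NRMSE=0.1, SI=0.1, NSMPL=720,
)
print(len(results["RMSE"]))
```

Keeping every list the same length means the i-th entry of each key describes the same matchup record.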
Saved binary data format
To support post-processing of the validation results, the results are stored in a binary format. This is
currently a Matlab v4 binary format, generated using the scipy.io.savemat function. This
generates a simple binary data structure where each dictionary key is saved as a variable named after the
key. For post-processing exercises, the Matlab formatted binary data can be loaded back into an internal
results data dictionary using the Validation.load_binary_results() function.
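The round trip through a Matlab binary file can be sketched directly with `scipy.io`. This is not the Validation module's own code: the file name and statistics are made up, only numeric fields are included to keep the example small, and `format="4"` is passed explicitly because the default for `savemat` is the v5 layout.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Made-up statistics for two matchup records (illustrative values only).
results = {
    "R": np.array([0.95, 0.91]),
    "RMSE": np.array([0.20, 0.27]),
    "NSMPL": np.array([720.0, 688.0]),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    mat_path = os.path.join(tmp_dir, "results_sketch.mat")
    # Each dictionary key becomes a variable of the same name in the file.
    savemat(mat_path, results, format="4")
    # squeeze_me=True drops the singleton dimension added to 1-D arrays.
    loaded = loadmat(mat_path, squeeze_me=True)

print(sorted(key for key in loaded if not key.startswith("__")))
```

Without `squeeze_me=True`, `loadmat` returns each one-dimensional array as a two-dimensional row, which is worth bearing in mind when comparing loaded values with the originals.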
Saved tabulated ASCII data format
The results are stored in a human-readable ASCII format as part of the Validation.save_tabulated_results()
call. The header line gives the integrated wave parameter that the tabulated statistics are for, the next line
gives the table column field names, and then there is a formatted row for each valid result.
The columns and formats are as follows:
| Field | Format | Description |
|---|---|---|
| MODEL | {:<60} | Up to 60 characters, left-aligned |
| OBSERVATION | {:<25} | Up to 25 characters, left-aligned |
| YEAR | {:^7} | Up to 7 digits, centred |
| MONTH | {:^7} | Up to 7 digits, centred |
| PARAM | {:^7} | Up to 7 characters, centred |
| R | {:^9.3} | Up to 9 characters, centred, precision 3 |
| MB | {:^9.4} | Up to 9 characters, centred, precision 4 |
| NMB | {:^9.4} | Up to 9 characters, centred, precision 4 |
| MAE | {:^9.4} | Up to 9 characters, centred, precision 4 |
| NMAE | {:^9.4} | Up to 9 characters, centred, precision 4 |
| RMSE | {:^9.4} | Up to 9 characters, centred, precision 4 |
| NRMSE | {:^9.4} | Up to 9 characters, centred, precision 4 |
| SI | {:^9.4} | Up to 9 characters, centred, precision 4 |
| NSMPL | {:^9} | Up to 9 digits, centred |
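The effect of these format specifications can be seen by applying them with `str.format`. The sketch below assumes the fields are concatenated with no extra separator, which may differ from what Validation.save_tabulated_results() actually writes, and the row values are made up. Note that for Python float values a precision given without a type code (as in `{:^9.3}`) counts significant digits, and that the widths are minimum field widths, so longer values are not truncated.

```python
# Row format assembled from the specifications in the table above
# (assumption: fields are joined with no additional separator).
ROW_FORMAT = (
    "{:<60}{:<25}{:^7}{:^7}{:^7}"
    "{:^9.3}{:^9.4}{:^9.4}{:^9.4}{:^9.4}{:^9.4}{:^9.4}{:^9.4}{:^9}"
)
# Column names use the same widths but no precision, because a precision
# applied to a string would truncate it.
NAME_FORMAT = "{:<60}{:<25}" + "{:^7}" * 3 + "{:^9}" * 9

names = ["MODEL", "OBSERVATION", "YEAR", "MONTH", "PARAM", "R", "MB",
         "NMB", "MAE", "NMAE", "RMSE", "NRMSE", "SI", "NSMPL"]

# Made-up values for one result row.
row = ROW_FORMAT.format(
    "model_extract.nc", "obs_buoy.nc", 2017, 1, "hs",
    0.9876, 0.0212, 0.0105, 0.1534, 0.0812, 0.2041, 0.1003, 0.1003, 720,
)

print(NAME_FORMAT.format(*names))
print(row)
```

With all values fitting inside their fields, each line is 187 characters wide: 60 + 25 + 3 × 7 + 9 × 9.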