WaveVal internal data formats

The Validation module uses a set of predefined file and data formats for data input, passing, and output. The user supplies the matched data records as a comma-delimited CSV file, which is converted to a list of tuples and passed to the Validation process records function. Internally the validation results are stored in a simple dictionary, and they are returned in this format from the Validation process. The pre-processed time series data for a matched data pair are saved as a Matlab v4 binary file; note that from MAT-file version 7.3 onward the format is based on HDF5, which scipy.io.loadmat cannot read. A Matlab format was chosen because Matlab is still widely used for data processing, while the data can also be easily imported into Python using the scipy.io.loadmat function.

Matched data records CSV format

The default input format for a set of model/observation data matches is a comma-delimited CSV ASCII file. Each row in the CSV file represents a matched pair for a particular year and month; this year/month split corresponds to the way the ResourceCode hindcast data are partitioned in the archive.

The columns of each row contain:

Path to the directory containing the model extract data records

Model extract data file name

Path to the directory containing the observation data records

Observation data file name

Year to be processed

Month to be processed

Start date and time of observational data (space-separated date and time)

End date and time of observational data (space-separated date and time)

Platform name string

Platform latitude

Platform longitude
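As a minimal sketch, such a file can be read into a list of tuples with Python's standard csv module. The record values, the observation directory, the file names and the helper name `read_matchups` are hypothetical, and the sketch assumes the file has no header row.

```python
import csv
import io

# Hypothetical example record: 11 comma-separated fields in the column order listed above.
EXAMPLE_CSV = (
    "E:/GitRepos/waveval/examples/data/RSCD_HINDCAST/2017/01/FREQ_NC,model_extract.nc,"
    "E:/GitRepos/waveval/examples/data/OBS,obs_platform.nc,"
    "2017,1,2017-01-01 00:00:00,2017-01-31 23:00:00,PLATFORM_A,48.5,-4.9\n"
)

def read_matchups(stream):
    """Return one tuple of strings per non-empty CSV row (assumes no header row)."""
    reader = csv.reader(stream)
    # Strip stray whitespace around each field; values are kept as strings.
    return [tuple(field.strip() for field in row) for row in reader if row]

records = read_matchups(io.StringIO(EXAMPLE_CSV))
```

All values are left as strings here. This documentation does not specify whether the module converts the year, month, latitude and longitude fields to numbers.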

To minimise issues with the use of different path separators across operating systems, it is recommended that the separator / be used for path strings.

e.g. E:/GitRepos/waveval/examples/data/RSCD_HINDCAST/2017/01/FREQ_NC
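If a path was copied from Windows with backslashes, the standard pathlib module can convert it to the recommended form. The sketch below is illustrative only and is not part of the module.

```python
from pathlib import PureWindowsPath

# A Windows-style path typed with backslashes (example value only).
windows_path = PureWindowsPath(r"E:\GitRepos\waveval\examples\data\RSCD_HINDCAST\2017\01\FREQ_NC")

# as_posix() returns the same path using forward slashes, the recommended separator.
csv_path = windows_path.as_posix()
print(csv_path)
```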

File names should not include spaces or special characters as these are not currently accounted for in the readers. If the file names used in an archive include spaces and/or special characters the readers will need to be modified.
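A quick pre-check of an archive's file names can catch problem names before processing. This is a hypothetical helper, not part of the module. It assumes that "safe" means only ASCII letters, digits, `.`, `_` and `-`, because the readers' exact restrictions are not documented here.

```python
import re

# Assumption: a safe name uses only ASCII letters, digits, '.', '_' and '-'.
# The readers' actual restrictions are not specified in this documentation.
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")

def is_safe_filename(name: str) -> bool:
    """Return True if the file name contains no spaces or other special characters."""
    return _SAFE_NAME.fullmatch(name) is not None

def unsafe_names(names):
    """List the names that would need renaming (or reader changes) before processing."""
    return [n for n in names if not is_safe_filename(n)]
```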

The format for the observational data start and end date and time is yyyy-mm-dd HH:MM:SS.

e.g. 2005-07-04 12:35:00
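In Python this layout corresponds to the strptime pattern `%Y-%m-%d %H:%M:%S`. The sketch below is a hypothetical parsing helper and does not reproduce the module's own reader.

```python
from datetime import datetime

# strptime pattern equivalent to yyyy-mm-dd HH:MM:SS.
OBS_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def parse_obs_time(text: str) -> datetime:
    """Parse an observation start/end time field; raises ValueError if malformed."""
    return datetime.strptime(text.strip(), OBS_TIME_FORMAT)

start = parse_obs_time("2005-07-04 12:35:00")
end = parse_obs_time("2005-07-31 23:00:00")
```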

Matched data records internal format

Internally the Validation module uses a Python list of tuples, where each tuple corresponds to one record in the original CSV file supplied by the user. Each element of a tuple corresponds to the CSV field at the same position, i.e. the tuple elements for a record are:

tuple[0] = Path to the directory containing the model extract data records

tuple[1] = Model extract data file name

tuple[2] = Path to the directory containing the observation data records

tuple[3] = Observation data file name

tuple[4] = Year to be processed

tuple[5] = Month to be processed

tuple[6] = Start date and time of observational data (space-separated date and time)

tuple[7] = End date and time of observational data (space-separated date and time)

tuple[8] = Platform name string
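Code that consumes these tuples can name the positions instead of using bare indices. The sketch below does this with hypothetical constant names and helper functions. It uses posixpath.join so that joined paths keep the recommended `/` separator on any operating system.

```python
import posixpath

# Hypothetical names for the tuple positions listed above.
MODEL_DIR, MODEL_FILE, OBS_DIR, OBS_FILE = 0, 1, 2, 3
YEAR, MONTH, OBS_START, OBS_END, PLATFORM = 4, 5, 6, 7, 8

def model_path(record: tuple) -> str:
    """Full path to the model extract file, joined with '/' separators."""
    return posixpath.join(record[MODEL_DIR], record[MODEL_FILE])

def obs_path(record: tuple) -> str:
    """Full path to the observation data file, joined with '/' separators."""
    return posixpath.join(record[OBS_DIR], record[OBS_FILE])

# Illustrative record only; the paths and names are made up.
record = ("E:/data/model", "model_extract.nc", "E:/data/obs", "obs.nc",
          "2017", "1", "2017-01-01 00:00:00", "2017-01-31 23:00:00", "PLATFORM_A")
```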

Internal validation results data format

Internally the validation results are collected in a python data dictionary with the following key fields:

Keys	Description
results.MODEL	Model data file name
results.OBSERVATION	Observation data file name
results.YEAR	Year being processed
results.MONTH	Month being processed
results.PARAM	Observation parameter name
results.R	Correlation value
results.MB	Mean bias value
results.NMB	Normalised mean bias value
results.MAE	Mean absolute error value
results.NMAE	Normalised mean absolute error value
results.RMSE	Root mean square error value
results.NRMSE	Normalised root mean square error value
results.SI	Scatter index value
results.NSMPL	Number of valid time series data samples used

This structure is initialised by the Validation.initialize_results() function, and results for a given matchup record are stored in the dictionary using the Validation.store_valid_results() function.
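A minimal sketch of a dictionary with this shape follows. It assumes one list per key, with one entry appended per matchup record. `make_results` and `append_result` are hypothetical stand-ins and are not Validation.initialize_results() or Validation.store_valid_results(). The statistic values in the usage lines are illustrative only, not real validation results.

```python
# Keys from the results table above.
RESULT_KEYS = ("MODEL", "OBSERVATION", "YEAR", "MONTH", "PARAM",
               "R", "MB", "NMB", "MAE", "NMAE", "RMSE", "NRMSE", "SI", "NSMPL")

def make_results() -> dict:
    """Hypothetical stand-in for initialisation: one empty list per key."""
    return {key: [] for key in RESULT_KEYS}

def append_result(results: dict, **values) -> None:
    """Hypothetical stand-in for storing one matchup's statistics.

    Every key must be supplied so that all lists stay the same length.
    """
    missing = set(RESULT_KEYS) - set(values)
    extra = set(values) - set(RESULT_KEYS)
    if missing or extra:
        raise KeyError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    for key in RESULT_KEYS:
        results[key].append(values[key])

# Usage with illustrative values only.
results = make_results()
append_result(results, MODEL="model_extract.nc", OBSERVATION="obs.nc", YEAR=2017,
              MONTH=1, PARAM="hs", R=0.95, MB=0.05, NMB=0.02, MAE=0.2, NMAE=0.08,
              RMSE=0.3, NRMSE=0.12, SI=0.11, NSMPL=700)
```

Validating every key before appending anything keeps the lists aligned, so index `i` across all keys always refers to the same matchup record.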

Saved binary data format

To support post-processing of the validation results, the results are stored in a binary format, currently the Matlab v4 binary format, which is generated using the scipy.io.savemat function. This produces a simple binary data structure in which each dictionary key is saved as a variable of the same name. For post-processing, the Matlab-formatted binary data can be loaded back into an internal results dictionary using the Validation.load_binary_results() function.
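The following sketch shows a minimal save/load round trip with scipy.io.savemat and scipy.io.loadmat. It assumes the results dictionary holds one equal-length list per key. Only numeric fields are used, and the sample values are illustrative rather than real results. It does not show how the module writes the string fields or what Validation.load_binary_results() does internally.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Numeric fields only; string fields (MODEL, OBSERVATION, PARAM) are not covered here.
numeric_results = {
    "YEAR": [2017, 2017],
    "MONTH": [1, 2],
    "R": [0.95, 0.91],
    "RMSE": [0.30, 0.35],
    "NSMPL": [700, 650],
}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "results.mat")
    # format="4" selects the Matlab v4 file format; each key becomes a variable.
    savemat(path, numeric_results, format="4")
    loaded = loadmat(path)

# Drop any loader metadata entries and flatten the (1, N) row arrays back to 1-D.
restored = {k: np.ravel(v) for k, v in loaded.items() if not k.startswith("__")}
```

Matlab v4 files hold only 2-D matrices, and savemat writes 1-D inputs as rows by default. For that reason the lists come back as (1, N) arrays and are flattened here.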

Saved tabulated ASCII data format

The results are stored in a human-readable ASCII format as part of the Validation.save_tabulated_results() call. The first line gives the integrated wave parameter to which the tabulated statistics refer, the next line gives the table column field names, and then there is a formatted row for each valid result.

The columns and formats are as follows:

Field	Format	Description
MODEL	{:<60}	Left-aligned, padded to at least 60 characters
OBSERVATION	{:<25}	Left-aligned, padded to at least 25 characters
YEAR	{:^7}	Centred in a field at least 7 characters wide
MONTH	{:^7}	Centred in a field at least 7 characters wide
PARAM	{:^7}	Centred in a field at least 7 characters wide
R	{:^9.3}	Centred in a field at least 9 characters wide, 3 significant digits
MB	{:^9.4}	Centred in a field at least 9 characters wide, 4 significant digits
NMB	{:^9.4}	Centred in a field at least 9 characters wide, 4 significant digits
MAE	{:^9.4}	Centred in a field at least 9 characters wide, 4 significant digits
NMAE	{:^9.4}	Centred in a field at least 9 characters wide, 4 significant digits
RMSE	{:^9.4}	Centred in a field at least 9 characters wide, 4 significant digits
NRMSE	{:^9.4}	Centred in a field at least 9 characters wide, 4 significant digits
SI	{:^9.4}	Centred in a field at least 9 characters wide, 4 significant digits
NSMPL	{:^9}	Centred in a field at least 9 characters wide
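Python's format() and the format specs from the table above are enough to reproduce this layout. The sketch below is a hypothetical `format_table` helper and is not Validation.save_tabulated_results(). The single-space column separator and the handling of the header lines are assumptions. When the column names are written, the sketch drops the precision from each spec, because a precision applied to a string truncates it (format("NRMSE", "^9.4") keeps only "NRMS").

```python
# Column layout from the table above: (field, format spec).
COLUMNS = [
    ("MODEL", "<60"), ("OBSERVATION", "<25"), ("YEAR", "^7"), ("MONTH", "^7"),
    ("PARAM", "^7"), ("R", "^9.3"), ("MB", "^9.4"), ("NMB", "^9.4"),
    ("MAE", "^9.4"), ("NMAE", "^9.4"), ("RMSE", "^9.4"), ("NRMSE", "^9.4"),
    ("SI", "^9.4"), ("NSMPL", "^9"),
]

def _header_spec(spec: str) -> str:
    """Drop any '.precision' part: on strings it would truncate the field name."""
    return spec.split(".")[0]

def format_table(param: str, rows: list) -> str:
    """Render a parameter line, a column-name line and one line per result row.

    Assumption: the real header wording and column separator are not documented;
    a single space is used between columns here.
    """
    lines = [param]
    lines.append(" ".join(format(name, _header_spec(spec)) for name, spec in COLUMNS))
    for row in rows:
        lines.append(" ".join(format(row[name], spec) for name, spec in COLUMNS))
    return "\n".join(lines)

# Illustrative row only, not real validation results.
row = {"MODEL": "model_extract.nc", "OBSERVATION": "obs.nc", "YEAR": 2017, "MONTH": 1,
       "PARAM": "hs", "R": 0.95123, "MB": 0.05, "NMB": 0.021, "MAE": 0.2,
       "NMAE": 0.0834, "RMSE": 0.31234, "NRMSE": 0.12, "SI": 0.11, "NSMPL": 700}
table = format_table("hs", [row])
```

A precision without a presentation type, as in `^9.4`, counts significant digits rather than decimal places. For example, format(12.3456, "^9.4") yields "  12.35  ".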
