.. _formats:

WaveVal internal data formats
=============================

The :ref:`Validation` module uses a set of predefined file and data formats for data input, passing, and output.

The user-supplied matched data records take the form of a comma-delimited CSV file, which is converted to a list of tuples for passing to the :ref:`Validation` process records function.

Internally the validation data are stored in a simple dictionary, and the data are returned in this format from the :ref:`Validation` process.

The pre-processed time series data for a matched data pair are saved as a Matlab v4 binary file; beyond version 7.2 the MAT-file format is based on HDF5. The Matlab format was chosen because Matlab is still widely used for data processing, and the formatted data can also be easily imported into Python using the `scipy.io.loadmat`_ function.

.. _scipy.io.loadmat: https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html

Matched data records CSV format
###############################

The default input format for a set of model/observation data matches is a comma-delimited ASCII CSV file. Each row in the CSV file represents a matched pair for a particular year and month; this year/month fragmentation corresponds to the storage fragmentation used for archiving the ResourceCode hindcast data. The columns of each row contain:

#. Path to the directory containing the model extract data records
#. Model extract data file name
#. Path to the directory containing the observation data records
#. Observation data file name
#. Year to be processed
#. Month to be processed
#. Start date and time of observational data (space separated date and time)
#. End date and time of observational data (space separated date and time)
#. Platform name string
#. Platform latitude
#. Platform longitude

To minimise issues with the use of different path separators across operating systems, it is recommended that the separator / be used for path strings.
*e.g.* E:/GitRepos/waveval/examples/data/RSCD_HINDCAST/2017/01/FREQ_NC

File names should not include spaces or special characters, as these are not currently handled by the readers. If the file names used in an archive include spaces or special characters, the readers will need to be modified.

The format for the observational data start and end date and time is yyyy-mm-dd HH:MM:SS, *e.g.* 2005-07-04 12:35:00

Matched data records internal format
####################################

Internally the :ref:`Validation` module uses a Python list of tuples, where each tuple corresponds to a record in the original CSV file supplied by the user. Each element of a tuple corresponds to the CSV record field at the same position, *e.g.* the tuple elements for a record are:

#. tuple[0] = Path to the directory containing the model extract data records
#. tuple[1] = Model extract data file name
#. tuple[2] = Path to the directory containing the observation data records
#. tuple[3] = Observation data file name
#. tuple[4] = Year to be processed
#. tuple[5] = Month to be processed
#. tuple[6] = Start date and time of observational data (space separated date and time)
#. tuple[7] = End date and time of observational data (space separated date and time)
#. tuple[8] = Platform name string

Internal validation results data format
#######################################

Internally the validation results are collected in a Python dictionary with the following key fields:

====================== =================================================
Keys                   Description
====================== =================================================
results.MODEL          Model data file name
results.OBSERVATION    Observation data file name
results.YEAR           Year being processed
results.MONTH          Month being processed
results.PARAM          Observation parameter name
results.R              Correlation value
results.MB             Mean bias value
results.NMB            Normalised mean bias value
results.MAE            Mean absolute error value
results.NMAE           Normalised mean absolute error value
results.RMSE           Root mean square error value
results.NRMSE          Normalised root mean square error value
results.SI             Scatter index value
results.NSMPL          Number of valid time series data samples used
====================== =================================================

This structure is initialised by the :py:func:`Validation.initialize_results` function, and results for a given matchup record are stored in the dictionary using the :py:func:`Validation.store_valid_results` function.

Saved binary data format
########################

To support post-processing of the validation results, the results are stored in a binary format, currently Matlab v4. The file is generated using the `scipy.io.savemat`_ function, which produces a simple binary data structure where each dictionary key is saved as a variable named after the key. For post-processing, the Matlab-formatted binary data can be loaded back into an internal results dictionary using the :py:func:`Validation.load_binary_results` function.

.. _scipy.io.savemat: https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.savemat.html

Saved tabulated ASCII data format
#################################

The results are stored in a human-readable ASCII format as part of the :py:func:`Validation.save_tabulated_results` call. The header line gives the integrated wave parameter that the tabulated statistics are for, the next line gives the table column field names, and then there is a formatted row for each valid result. The columns and formats are as follows:

============== ======== ====================================================
Field          Format   Description
============== ======== ====================================================
MODEL          {:<60}   Up to 60 characters, left-aligned
OBSERVATION    {:<25}   Up to 25 characters, left-aligned
YEAR           {:^7}    Up to 7 digits centred
MONTH          {:^7}    Up to 7 digits centred
PARAM          {:^7}    Up to 7 characters centred
R              {:^9.3}  Up to 9 characters centred with 3 significant digits
MB             {:^9.4}  Up to 9 characters centred with 4 significant digits
NMB            {:^9.4}  Up to 9 characters centred with 4 significant digits
MAE            {:^9.4}  Up to 9 characters centred with 4 significant digits
NMAE           {:^9.4}  Up to 9 characters centred with 4 significant digits
RMSE           {:^9.4}  Up to 9 characters centred with 4 significant digits
NRMSE          {:^9.4}  Up to 9 characters centred with 4 significant digits
SI             {:^9.4}  Up to 9 characters centred with 4 significant digits
NSMPL          {:^9}    Up to 9 digits centred
============== ======== ====================================================
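
As a rough illustration of how the format strings in the table behave, the sketch below formats a single result row with Python's ``str.format``. The record dictionary, its plain key names, the sample values, and the choice to join the columns with no separator are all assumptions made for illustration; this is not the code used by :py:func:`Validation.save_tabulated_results`.

```python
# Field names and format strings copied from the table above.
FIELD_FORMATS = [
    ("MODEL", "{:<60}"),
    ("OBSERVATION", "{:<25}"),
    ("YEAR", "{:^7}"),
    ("MONTH", "{:^7}"),
    ("PARAM", "{:^7}"),
    ("R", "{:^9.3}"),
    ("MB", "{:^9.4}"),
    ("NMB", "{:^9.4}"),
    ("MAE", "{:^9.4}"),
    ("NMAE", "{:^9.4}"),
    ("RMSE", "{:^9.4}"),
    ("NRMSE", "{:^9.4}"),
    ("SI", "{:^9.4}"),
    ("NSMPL", "{:^9}"),
]


def format_row(record):
    """Format one record (a dict keyed by field name) as a fixed-width row.

    Joining the columns without a separator is an assumption of this sketch.
    """
    return "".join(spec.format(record[name]) for name, spec in FIELD_FORMATS)


# Hypothetical file names and statistics, for illustration only.
example = {
    "MODEL": "model_201701.nc",
    "OBSERVATION": "obs_201701.nc",
    "YEAR": 2017,
    "MONTH": 1,
    "PARAM": "hs",
    "R": 0.98765,
    "MB": -0.0412,
    "NMB": -0.0213,
    "MAE": 0.1876,
    "NMAE": 0.0971,
    "RMSE": 0.2431,
    "NRMSE": 0.1258,
    "SI": 0.1192,
    "NSMPL": 744,
}

row = format_row(example)
print(row)
```

The widths in the format strings are minimum field widths, so longer values are not truncated and would shift the columns that follow. For floats, a precision given without a type code (as in ``{:^9.4}``) counts significant digits, so a value of 12.3456 is rendered as 12.35.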