.. _workflow:

Validation workflow
===================

The core functions for generating validation metric values for a set of extracted model data 
against a wavebuoy data archive are in the following modules:

* :ref:`MatchUpDatabase` used to generate a searchable set of model/observation data matches.
* :ref:`Validation` used to generate validation metrics for a set of matched data records.

The steps involved are as follows:

Step 1: Match up model and observation data records
###################################################

  The validation process requires a time series of data extracted for a specific location 
  within the model domain and a corresponding time record of wavebuoy data.
  The ResourceCode hindcast data are computed on an hourly basis and the data are 
  stored in monthly records, sorted by year, month and data type (FREQ_NC, SPEC_NC).
  The InSituTAC wavebuoy data are available as either monthly files or a continuous 
  single record file; these tools have been developed for use with the continuous 
  wavebuoy records to simplify the process of validating the complete hindcast dataset.

  The function for generating the matchup database is 
  :py:func:`MatchUpDatabase.construct_rscd_mdb`. This is used to generate an SQLite3 
  database of matched buoy/model extract data by location.
  
  The validation process has been generalised to take matched data record information from a 
  CSV file, allowing manual generation of the data match records without the need for 
  generating an SQLite database. The SQLite matchup database can be converted to a CSV file 
  using the :py:func:`MatchUpDatabase.db_match_to_csv` function.
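
  As a rough illustration of the database-to-CSV conversion, the sketch below dumps one SQLite 
  table to CSV text using only the Python standard library. It is not the implementation of 
  :py:func:`MatchUpDatabase.db_match_to_csv`; the table name ``matchups``, its columns and the 
  file names are hypothetical placeholders.

  .. code-block:: python

      import csv
      import io
      import sqlite3


      def table_to_csv(conn, table, out):
          """Write every row of `table` to the file-like object `out` as CSV."""
          # Table names cannot be bound as SQL parameters, so quote the identifier.
          quoted = '"{}"'.format(table.replace('"', '""'))
          cursor = conn.execute("SELECT * FROM {}".format(quoted))
          writer = csv.writer(out)
          writer.writerow([column[0] for column in cursor.description])  # header row
          writer.writerows(cursor)


      # Hypothetical in-memory database standing in for the matchup database.
      conn = sqlite3.connect(":memory:")
      conn.execute("CREATE TABLE matchups (platform TEXT, obs_file TEXT, model_file TEXT)")
      conn.executemany(
          "INSERT INTO matchups VALUES (?, ?, ?)",
          [("BUOY_A", "obs_a.nc", "model_a.nc"), ("BUOY_B", "obs_b.nc", "model_b.nc")],
      )

      buffer = io.StringIO()
      table_to_csv(conn, "matchups", buffer)
      csv_text = buffer.getvalue()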

Step 2: Extract matched data records from tabulated CSV file
############################################################

  The validation process takes as input a list of matched data records, each in the form of a 
  tuple. The data in the CSV file are converted into a list of tuples using the function 
  :py:func:`MatchUpDatabase.csv2tuples`.
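
  A minimal sketch of this conversion, assuming the CSV file has a header row and one matched 
  record per line. The column layout and the helper name ``csv_to_tuples`` are hypothetical; 
  this is not the code of :py:func:`MatchUpDatabase.csv2tuples`.

  .. code-block:: python

      import csv
      import io


      def csv_to_tuples(handle):
          """Read CSV rows from an open text handle and return them as a list of tuples."""
          reader = csv.reader(handle)
          next(reader, None)  # skip the header row (assumed to be present)
          return [tuple(row) for row in reader if row]  # ignore blank lines


      # Hypothetical CSV content; a real file would be opened with open(path, newline="").
      sample = io.StringIO(
          "platform,obs_file,model_file\n"
          "BUOY_A,obs_a.nc,model_a.nc\n"
          "BUOY_B,obs_b.nc,model_b.nc\n"
      )
      records = csv_to_tuples(sample)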
    
Step 3: Identify unique platforms within the matched data records
#################################################################

  The records are processed by location, so the records for a particular platform need to be 
  identified. The unique platform names are extracted from the list of matched data records 
  using the function :py:func:`MatchUpDatabase.getPlatformList`. 

Step 4: Generate validation statistics for a platform
#####################################################

  The following substeps are carried out:

  **(4.1) Extract the records relevant to the current platform.**
    The records corresponding to a given platform are extracted using the function 
    :py:func:`MatchUpDatabase.getPlatformRecords`.
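
    Identifying the platforms (Step 3) and selecting the records for one of them can be 
    sketched as follows. The sketch assumes the platform name is the first field of each 
    tuple, which is an assumption about the record layout; the helper names are hypothetical 
    stand-ins and not the toolbox functions themselves.

    .. code-block:: python

        # Hypothetical matched records: (platform, observation file, model file).
        records = [
            ("BUOY_A", "obs_a.nc", "model_a_201001.nc"),
            ("BUOY_B", "obs_b.nc", "model_b_201001.nc"),
            ("BUOY_A", "obs_a.nc", "model_a_201002.nc"),
        ]


        def unique_platforms(records, platform_index=0):
            """Return the platform names in order of first appearance."""
            return list(dict.fromkeys(rec[platform_index] for rec in records))


        def records_for_platform(records, platform, platform_index=0):
            """Return only the records that belong to `platform`."""
            return [rec for rec in records if rec[platform_index] == platform]


        platforms = unique_platforms(records)
        buoy_a_records = records_for_platform(records, "BUOY_A")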
    
  **(4.2) Provide a list of the wave parameters to be tested**
    The wave parameter options available are: [Hm0, Tp, Tm02, Dir, Spr]
    
  **(4.3) Process the set of extracted matchup records**
    The records are processed via the :py:func:`Validation.validate_records` function.
    This loops over each record in the supplied list and processes them separately.
    
    A single record is processed by a call to the :py:func:`Validation.process_record` 
    function. The internal steps of the validation process for a given record are:
    
      *(4.3.1) Extract model record timestamp, variable data and temporal coverage structure*
      
        Call :py:func:`Validation.extract_model`
      
      *(4.3.2) Extract observation record timestamp and variable data*
        
        Call :py:func:`Validation.extract_obs`
      
      *(4.3.3) Check the required buoy variable exists*
        
        If the variable exists, processing continues; otherwise processing of the record stops.
      
      *(4.3.4) Check for temporal overlap between records*
      
        If the time records overlap and there are at least 200 common samples, processing 
        continues; otherwise processing of the record stops.
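
        One possible reading of this check is sketched below with NumPy. It assumes that 
        "common samples" means samples falling inside the shared time window of the two 
        records, and that both time arrays are non-empty ``datetime64`` arrays; the function 
        name is hypothetical.

        .. code-block:: python

            import numpy as np

            MIN_SAMPLES = 200  # threshold quoted in the text


            def has_sufficient_overlap(model_time, obs_time, min_samples=MIN_SAMPLES):
                """True if both records hold at least `min_samples` samples in the shared window."""
                start = max(model_time.min(), obs_time.min())
                end = min(model_time.max(), obs_time.max())
                if start > end:  # the records do not overlap at all
                    return False
                n_model = np.count_nonzero((model_time >= start) & (model_time <= end))
                n_obs = np.count_nonzero((obs_time >= start) & (obs_time <= end))
                return min(n_model, n_obs) >= min_samples


            # Hourly model record for one month, half-hourly buoy record starting later.
            model_time = np.arange(np.datetime64("2010-01-01T00:00"),
                                   np.datetime64("2010-02-01T00:00"),
                                   np.timedelta64(60, "m"))
            obs_time = np.arange(np.datetime64("2010-01-20T00:00"),
                                 np.datetime64("2010-03-01T00:00"),
                                 np.timedelta64(30, "m"))
            late_obs_time = obs_time[obs_time >= np.datetime64("2010-01-28T00:00")]

            enough = has_sufficient_overlap(model_time, obs_time)
            too_short = has_sufficient_overlap(model_time, late_obs_time)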
      
      *(4.3.5) Interpolate the observation variable onto the model time stamps*
        
        The observations must be mapped to the model time stamps to allow the differencing required 
        for metric statistics to be calculated. The interpolation method used is nearest neighbour with 
        the constraint that the difference between the observation time and the model time is less 
        than 1 hour. Missing observational data are filled with the ``numpy.nan`` value.
        
        The function used is :py:func:`Validation.time_interpolate_obs`.
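
        The nearest-neighbour mapping with a one-hour limit can be sketched as below. This is 
        an illustrative NumPy version and not the code of 
        :py:func:`Validation.time_interpolate_obs`; it assumes the observation times are 
        sorted, non-empty ``datetime64`` values, and the function name is hypothetical.

        .. code-block:: python

            import numpy as np


            def nearest_obs_on_model_times(model_time, obs_time, obs_values,
                                           max_gap=np.timedelta64(1, "h")):
                """Map observations onto model times by nearest neighbour within `max_gap`."""
                last = len(obs_time) - 1
                # Index of the first observation at or after each model time.
                right = np.searchsorted(obs_time, model_time)
                left = np.clip(right - 1, 0, last)
                right = np.clip(right, 0, last)
                gap_left = np.abs(model_time - obs_time[left])
                gap_right = np.abs(obs_time[right] - model_time)
                nearest = np.where(gap_right < gap_left, right, left)
                gap = np.minimum(gap_left, gap_right)
                # Model times with no observation closer than max_gap stay as NaN.
                mapped = np.full(model_time.shape, np.nan)
                close = gap < max_gap
                mapped[close] = obs_values[nearest[close]]
                return mapped


            # Six hourly model times and three irregular observations (placeholder values).
            model_time = np.arange(np.datetime64("2010-01-01T00:00"),
                                   np.datetime64("2010-01-01T06:00"),
                                   np.timedelta64(60, "m"))
            obs_time = np.array(["2010-01-01T00:10", "2010-01-01T01:40", "2010-01-01T05:30"],
                                dtype="datetime64[m]")
            obs_values = np.array([1.0, 2.0, 3.0])

            mapped = nearest_obs_on_model_times(model_time, obs_time, obs_values)
            # mapped is [1.0, 2.0, 2.0, nan, nan, 3.0]

        The 03:00 and 04:00 model times are left as NaN because the closest observations are 
        80 and 90 minutes away.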
        
        The model and interpolated observation data are plotted as overlaid time series if requested, 
        and the model data and interpolated observational data are saved as a binary file to allow rapid 
        reprocessing if required.
      
      *(4.3.6) Check there are sufficient valid observational values in interpolated data*
        
        If there are at least 200 valid observations in the interpolated data, processing 
        continues; otherwise processing of the record stops.
        
      *(4.3.7) Calculate validation metric statistics*
      
        Determine the wave parameter type (standard or circular), then calculate the set of validation 
        metrics for the records and save the output in the results records structure.
        
        * For standard parameters call :py:func:`Validation.calculate_metrics`
        * For circular parameters call :py:func:`Validation.calculate_circular_metrics`
        
        Generate and save a correlation plot if requested by calling :py:func:`Graphics.plot_correlation`.
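
        The distinction between standard and circular parameters matters because directional 
        differences must be wrapped before they are averaged. The sketch below is illustrative 
        only: the set of metrics computed by the toolbox is not listed here, so bias and RMSE 
        are used as assumed examples, directions are assumed to be in degrees, and the function 
        names are hypothetical.

        .. code-block:: python

            import numpy as np


            def summarise(diff):
                """Bias and RMSE of an array of model-minus-observation differences."""
                return {
                    "n": int(diff.size),
                    "bias": float(diff.mean()),
                    "rmse": float(np.sqrt(np.mean(diff ** 2))),
                }


            def standard_metrics(model, obs):
                """Metrics for a linear parameter, ignoring NaN pairs."""
                valid = np.isfinite(model) & np.isfinite(obs)
                return summarise(model[valid] - obs[valid])


            def circular_metrics(model_deg, obs_deg):
                """Metrics for a direction in degrees, with differences wrapped to [-180, 180)."""
                valid = np.isfinite(model_deg) & np.isfinite(obs_deg)
                diff = model_deg[valid] - obs_deg[valid]
                wrapped = (diff + 180.0) % 360.0 - 180.0
                return summarise(wrapped)


            # Placeholder values: directions either side of north differ by 20 degrees, not 340.
            model_dir = np.array([350.0, 10.0, 180.0])
            obs_dir = np.array([10.0, 350.0, np.nan])
            circular = circular_metrics(model_dir, obs_dir)
            naive = standard_metrics(model_dir, obs_dir)

        With these placeholder values the circular RMSE is 20 degrees, whereas treating the 
        directions as a linear quantity would give 340 degrees.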
        
      *(4.3.8) Continue to next record*
        
        Select the next platform record and repeat steps (4.3.1) to (4.3.7).

  **(4.4) Return the full set of validation results for the current platform.**
    The results are returned in the form of a Python dictionary, where each element of a key field 
    corresponds to a specific matched data pair record.
    
Step 5: Save the validation results for post-processing
#######################################################

  The results are written out as a table-formatted ASCII file and are also saved as a binary 
  file for post-processing. 
  
  This is done by calling :py:func:`Validation.save_tabulated_results`.
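
  The two output forms can be sketched as below. The on-disk formats used by 
  :py:func:`Validation.save_tabulated_results` are not specified here, so this sketch assumes a 
  fixed-width text table and a ``pickle`` byte string; the dictionary keys and values are 
  placeholders, not real results.

  .. code-block:: python

      import pickle

      # Placeholder results: one list element per matched data pair record.
      results = {
          "platform": ["BUOY_A", "BUOY_A"],
          "parameter": ["Hm0", "Tp"],
          "bias": [0.12, -0.40],
          "rmse": [0.31, 1.25],
      }


      def format_table(results, width=10):
          """Return the results as a fixed-width text table with one row per record."""
          keys = list(results)

          def cell(value):
              text = format(value, ".3f") if isinstance(value, float) else str(value)
              return text.rjust(width)

          lines = ["  ".join(key.rjust(width) for key in keys)]
          for row in zip(*(results[key] for key in keys)):
              lines.append("  ".join(cell(value) for value in row))
          return "\n".join(lines)


      table_text = format_table(results)   # could be written to a .txt file
      blob = pickle.dumps(results)         # could be written to a file opened in "wb" mode
      restored = pickle.loads(blob)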