It is always rewarding to visualise data sets, particularly when the visualisations exhibit characteristics of the systems that we are attempting to model. I had such an experience today.
Data visualisation is a key part of the scientific research process. It:
- is an important stage in the knowledge creation and discovery process
- aids the validation of data sets
The previous post described some of the MHD modelling we are attempting to undertake. The simulation generated 400 time series points from a total of 80000 iterations, and the stored configuration was an array of three 2-vectors and four scalars. For each time step there is a total of 1976x400 points. Visualisation of the raw data can be achieved using tools such as Matlab, IDL, AVS or IBM Data Explorer.
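To get a feel for the data volume, the figures above can be turned into a rough estimate. This is only a sketch: it assumes that 1976x400 is the mesh size, that three 2-vectors plus four scalars means 10 values per mesh point, and that each value is stored as an 8-byte float. None of these is confirmed by the simulation output format.

```python
# Figures quoted in the post.
ITERATIONS = 80_000
STORED_STEPS = 400
NX, NY = 1976, 400  # assumed mesh dimensions per stored time step

# Assumption: 3 two-component vectors + 4 scalars = 10 values per mesh point.
VALUES_PER_POINT = 3 * 2 + 4
# Assumption: values are stored as 8-byte double-precision floats.
BYTES_PER_VALUE = 8

output_interval = ITERATIONS // STORED_STEPS
points_per_step = NX * NY
bytes_per_step = points_per_step * VALUES_PER_POINT * BYTES_PER_VALUE
total_bytes = bytes_per_step * STORED_STEPS

print(f"one stored step every {output_interval} iterations")
print(f"{points_per_step:,} mesh points per step")
print(f"{bytes_per_step / 1e6:.1f} MB per step")
print(f"{total_bytes / 1e9:.1f} GB for the full series")
```

Under these assumptions the full series runs to tens of gigabytes, which is why some reduction is needed before interactive visualisation.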
For the case considered, visualising the raw data is quite challenging, and a recommended approach is to post-process the data, either averaging or sampling it from the computed mesh. The first stage in this process was the use of the convertdata routine provided with VAC.
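The two reduction strategies mentioned here, sampling and averaging, can be illustrated in a few lines of Python. This is a generic sketch and not what the VAC routine does; the function names and the tiny 4x4 field are hypothetical.

```python
def sample(field, stride):
    """Keep every `stride`-th point in both directions."""
    return [row[::stride] for row in field[::stride]]


def block_average(field, block):
    """Replace each full block x block tile with its mean value."""
    rows = len(field) // block
    cols = len(field[0]) // block
    reduced = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            tile = [
                field[i * block + di][j * block + dj]
                for di in range(block)
                for dj in range(block)
            ]
            out_row.append(sum(tile) / len(tile))
        reduced.append(out_row)
    return reduced


# Hypothetical 4x4 scalar field standing in for one stored variable.
field = [[float(4 * i + j) for j in range(4)] for i in range(4)]

sampled = sample(field, 2)
averaged = block_average(field, 2)
```

Sampling is cheaper and keeps original values, while averaging smooths small-scale structure; both reduce the point count by the same factor.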
The advantages of using Data Explorer include:
- Well supported with good documentation and an active user forum
- A visual programming environment enabling rapid development of applications
- A powerful data buffering capability: once a large data set has been read, it is possible to modify and re-execute the program without having to reload the data set
- Each of the modules features useful descriptions providing user guidelines
For the data set considered here we generated two applications using Data Explorer. The first application selects and reduces the data to a manageable volume. Very often there is a need to preserve the original "raw" data, which comes at a storage cost. This application can be run non-interactively and submitted to a job queue; given the size of the raw data, it can take a few hours.
The second application is used to visualise the data. It is run in interactive mode and is used by the researcher to generate images and movies that might be shared with the community.
The main IBM Data Explorer modules used for the post-processing stage are described below.
Once a data set has been imported and one of its members selected, the data can be mapped onto a new grid; for this purpose we use the Construct and Regrid modules. The Construct module is used to specify the form of the new grid: the counts specify the number of items in the new grid, and the deltas enable us to select the correct items from the input data field. The output from the Construct module feeds into the grid input of the Regrid module, and the selected, imported data feeds into the input of the Regrid module.
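The roles of the counts and deltas can be illustrated with a small Python analogue. This is not Data Explorer code and not its implementation: the function names simply echo the module names, the input field is hypothetical, and a nearest-neighbour lookup is assumed as a stand-in for whatever mapping the real module applies.

```python
def construct(origin, deltas, counts):
    """Return the positions of a regular 2-D grid as a list of (x, y)."""
    (x0, y0), (dx, dy), (nx, ny) = origin, deltas, counts
    return [(x0 + i * dx, y0 + j * dy) for i in range(nx) for j in range(ny)]


def regrid(field, grid):
    """Sample field[i][j], defined on integer positions, at each grid
    position using a nearest-neighbour lookup (an assumed stand-in)."""
    nx, ny = len(field), len(field[0])
    values = []
    for x, y in grid:
        i = min(max(round(x), 0), nx - 1)  # clamp to the input mesh
        j = min(max(round(y), 0), ny - 1)
        values.append(field[i][j])
    return values


# Hypothetical 6x4 input field whose value encodes its own indices.
field = [[10 * i + j for j in range(4)] for i in range(6)]

# counts = (3, 2) new points; deltas = (2, 2) picks every second input item.
grid = construct(origin=(0, 0), deltas=(2, 2), counts=(3, 2))
reduced = regrid(field, grid)
```

Here the counts fix how many points the reduced grid holds, and the deltas fix which input items those points land on.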
Having regridded the data, we can collect together all the selected data fields using the CollectNamed module; the resulting data object may then be exported to an output file for use in the final visualisation stage. We also mention the Connect module, which provides connection information for the data set; this information enables data interpolation.
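As a loose analogy for the collect-and-export step, the sketch below groups reduced fields under names and serialises the group as JSON text. It does not reproduce the Data Explorer file format; the function names and the field names `density` and `pressure` are hypothetical.

```python
import json


def collect_named(**fields):
    """Group regridded fields under names, loosely mirroring CollectNamed."""
    return dict(fields)


def export_text(group):
    """Serialise the named group as JSON text (a stand-in for the export)."""
    return json.dumps(group, sort_keys=True)


# Hypothetical reduced fields from the regridding stage.
group = collect_named(density=[1.0, 0.9], pressure=[0.5, 0.4])
text = export_text(group)
```

Keeping the fields under explicit names means the visualisation stage can pick out a variable without knowing the order in which it was stored.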
At each stage of the research process, from preparation of the model to the generation of the raw data and the final post-processing stages, there are quite a few data transformations. For studies where the researcher investigates a range of state points in the modelling phase space, this can lead to an overwhelming collection of data, and the problem grows as the number of compute nodes and storage resources increases. It calls for metadata capture techniques, including suitable storage, search and querying mechanisms.
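A minimal sketch of what such capture might look like is given below, using Python's built-in sqlite3 module. It is one possible arrangement rather than a recommendation; the table layout, the stage names and the file names are all hypothetical.

```python
import json
import sqlite3


def open_catalogue(path=":memory:"):
    """Open (or create) a small catalogue of data transformations."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY, stage TEXT, output TEXT, params TEXT)"
    )
    return db


def record(db, stage, output, params):
    """Store one transformation with its parameters serialised as JSON."""
    db.execute(
        "INSERT INTO runs (stage, output, params) VALUES (?, ?, ?)",
        (stage, output, json.dumps(params, sort_keys=True)),
    )
    db.commit()


def find(db, stage):
    """Return (output, params) pairs for every record of a given stage."""
    rows = db.execute(
        "SELECT output, params FROM runs WHERE stage = ? ORDER BY id", (stage,)
    )
    return [(output, json.loads(params)) for output, params in rows]


db = open_catalogue()
# Hypothetical file names; the numbers are those quoted earlier in the post.
record(db, "simulation", "run01.out", {"iterations": 80000, "stored_steps": 400})
record(db, "regrid", "run01_reduced.dx", {"source": "run01.out", "stride": 4})
reduced = find(db, "regrid")
```

Recording the source file alongside each derived file is what later makes it possible to trace a reduced data set back to the run that produced it.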
In later posts we will review metadata capture techniques and search and querying mechanisms. We will also look at data provenance capture and querying mechanisms. We add here that, given the nature of the research process and the exploration of generated data sets, provenance capture may provide an important research tool; we can envisage parallels with social bookmarking and "MyExperiment" style environments.