Near Infrared Spectra of Diesel Fuels

These data consist of NIR spectra of diesel fuels along with various properties of those fuels, including:

  • bp50 - boiling point at 50% recovery, deg C (ASTM D 86)
  • CN - cetane number (analogous to octane number, but for diesel; ASTM D 613)
  • d4052 - density, g/mL, @ 15 deg C (ASTM D 4052)
  • freeze - freezing temperature of the fuel, deg C
  • total - total aromatics, mass% (ASTM D 5186)
  • visc - viscosity, cSt, @ 40 deg C

The data are available in three formats: MATLAB DataSet objects, standard MATLAB variables, and CSV files. They were obtained at Southwest Research Institute (SWRI) on a project sponsored by the U.S. Army. Many thanks to them for letting us post the data here!

DataSet Object Format

The file "SWRI_Diesel_NIR.zip" contains a .mat file which can be loaded into MATLAB. This .mat file contains two DataSet objects: one holds all the raw, unpreprocessed spectra (diesel_spec) and the other holds all the properties (diesel_prop). Some of the properties were not measured on some of the samples, so diesel_prop contains missing values (NaNs). The wavelength axis is included as the axisscale in diesel_spec. If you don't have PLS_Toolbox or our freeware for the DataSet Object, these two variables should turn into structures when you load them into MATLAB.


Name                 Size    Kind      Last Modified

SWRI_Diesel_NIR.zip  1,443K  document  Mon, Nov 28, 2005, 11:28 AM

Standard MATLAB Variable Format

The following are .zip files of separate .mat files, each with standard MATLAB variables containing the same data as above. There are 6 workspace variables in each file: 3 for the spectra and 3 matching ones for the property values. In each case the data include 20 high-leverage samples (_hl), and the remaining samples are split into two random groups (_ll_a and _ll_b). These spectra can be used to test variable selection and calibration algorithms. For instance, you can use the high-leverage samples and one of the other sets to make a calibration model (say the _hl and _ll_a), then test it on the third set (the _ll_b). In all cases the data have been pretty thoroughly weeded: outliers have been removed, and all samples belong to the same class (all summer fuels, no winter fuels).
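
As a rough illustration of the calibrate-then-test workflow described above, the following Python sketch pools the high-leverage set with one of the random groups, fits a model, and reports the prediction error on the held-out group. Everything in it is an assumption made for illustration: the function and argument names are hypothetical (only the _hl, _ll_a and _ll_b suffixes come from the files), each sample is reduced to a single number rather than a full spectrum, and a one-variable least-squares line stands in for a real calibration method such as PLS. The numbers in the usage lines are made up.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x.

    A deliberately simple stand-in for a real calibration model.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def calibrate_and_test(x_hl, y_hl, x_ll_a, y_ll_a, x_ll_b, y_ll_b):
    """Calibrate on _hl plus _ll_a, then test on _ll_b."""
    # Pool the high-leverage samples with one of the random groups.
    x_cal = list(x_hl) + list(x_ll_a)
    y_cal = list(y_hl) + list(y_ll_a)
    intercept, slope = fit_line(x_cal, y_cal)
    # Predict the held-out group and score the predictions.
    y_pred = [intercept + slope * xi for xi in x_ll_b]
    return rmsep(y_ll_b, y_pred)


if __name__ == "__main__":
    # Made-up values, not taken from the diesel data.
    error = calibrate_and_test(
        [0.0, 10.0], [1.0, 21.0],   # hypothetical _hl set
        [2.0, 4.0], [5.0, 9.0],     # hypothetical _ll_a set
        [3.0, 5.0], [7.0, 11.0],    # hypothetical _ll_b set
    )
    print("RMSEP on held-out group:", error)
```

Swapping the roles of _ll_a and _ll_b gives a second, independent estimate of the prediction error from the same files.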

All of the file names end in "gatest" because we've used the data to test genetic algorithms for variable selection.


Name              Size  Kind      Last Modified

bp50gatest.zip    720K  document  Tue, Jan 26, 1999, 01:47 PM
cngatest.zip      718K  document  Tue, Jan 26, 1999, 01:47 PM
d4052gatest.zip   770K  document  Tue, Jan 26, 1999, 01:48 PM
freezegatest.zip  735K  document  Tue, Jan 26, 1999, 01:49 PM
totalgatest.zip   749K  document  Tue, Jan 26, 1999, 01:49 PM
viscgatest.zip    738K  document  Tue, Jan 26, 1999, 01:50 PM

CSV File Format

The file "SWRI_Diesel_NIR_CSV.zip" contains two .csv files: one holds all the raw, unpreprocessed spectra (diesel_spec) and the other holds all the properties (diesel_prop). Some of the properties were not measured on some of the samples, so diesel_prop contains missing values (NaNs). The wavelength axis is included as the axisscale in diesel_spec.
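
Because diesel_prop contains NaNs, samples usually need to be filtered one property at a time before modeling. The Python sketch below shows one way to do that with the standard library only. The file layout is an assumption, not something stated here: it supposes a header row of property names, one row per sample, and missing entries written as NaN or left empty, so check the actual .csv files before relying on it. The function names are hypothetical and the sample values are made up.

```python
import csv
import io
import math


def to_float(cell):
    """Convert a CSV cell to float; empty cells and 'NaN' both become NaN."""
    cell = cell.strip()
    return float(cell) if cell else math.nan


def read_property_table(handle):
    """Read a CSV with a header row into (column names, rows of floats)."""
    reader = csv.reader(handle)
    header = [name.strip() for name in next(reader)]
    rows = [[to_float(cell) for cell in row] for row in reader if row]
    return header, rows


def samples_with_value(header, rows, name):
    """Return the row indices where property `name` was actually measured."""
    col = header.index(name)
    return [i for i, row in enumerate(rows) if not math.isnan(row[col])]


# Made-up values in the assumed layout, not taken from the diesel data.
SAMPLE_TEXT = "bp50,CN,visc\n250.0,NaN,2.5\n260.0,48.5,\n255.0,50.1,2.8\n"

if __name__ == "__main__":
    header, rows = read_property_table(io.StringIO(SAMPLE_TEXT))
    for name in header:
        print(name, samples_with_value(header, rows, name))
```

For a real file, pass an open file handle (opened with newline="") in place of the io.StringIO object, and use the returned indices to select the matching rows of the spectra.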


Name                     Size    Kind      Last Modified

SWRI_Diesel_NIR_CSV.zip  1,005K  document  Mon, Nov 28, 2005, 11:28 AM