Getting Started with Lookup Classification

This is a tutorial for PEAXACT - Software for Quantitative Spectroscopy from S-PACT. Its main objective is to familiarize you with classification using a Database Lookup method. The tutorial is aimed at PEAXACT users and anyone interested in PEAXACT.

In this tutorial you will learn how to:

  1. Use categorical features in a Cluster Analysis
  2. Apply pretreatments to improve sample clustering
  3. Perform classification using a database lookup method
  4. Identify classes of unknown samples

If you have PEAXACT installed on your computer you may try this tutorial right away. If you don't have PEAXACT yet, get a free trial now.

Preparations

The data for this tutorial is located in %ProgramFiles%\S-PACT\PEAXACT 5\Data\Raman - Pharma. This directory is referred to as DATA in the following.

  • Start PEAXACT.
  • Choose File > New Session > Raman from the menu. This opens a new modelling session with default settings for Raman data.

Cluster Analysis

Sample clustering and classification deal with categorical features (also known as grouping variables). Categorical features contain text values: the categories, groups, species, levels, or classes. A cluster analysis aims at dividing samples into groups without knowing the actual classes in advance. If we do know the actual classes, we can perform sanity checks on the clusters found and train a Classification Model. But let's take one step at a time:

  • Choose Data > Load Table... from the menu, browse to DATA\References and select DataTableClassification.xlsx to load 90 Raman spectra of pharmaceutical ingredients with associated categorical features: substance name (10 classes), kind of packing (3 classes), and instrument type (3 classes).
  • Select all samples in the Samples Panel.
  • Choose Data > Data Inspector from the menu to start the Data Inspector. Switch from the table view to the Data Plotter.
  • From the top-right drop-down list, select Clusters to display a dendrogram.

A dendrogram is a tree that illustrates the arrangement of clusters found by a hierarchical cluster analysis. Each leaf corresponds to one sample. Leaves are connected by branches, forming clusters. Clusters are connected to other clusters, forming even bigger clusters. The height of a branch represents the distance between the two objects being connected. If we assume that our spectra can be distinguished by substance, we would like to see a tree that splits up into 10 big clusters with large distances between them. With the untreated spectra, the tree does not look like that yet.
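To make the idea of branch heights concrete, here is a minimal sketch of a hierarchical (agglomerative) cluster analysis in plain Python. It is only an illustration: the choice of Euclidean distance and average linkage, the function name `agglomerate`, and the toy data are assumptions made for this example, not a description of how PEAXACT computes its dendrogram.

```python
from itertools import combinations
from math import dist


def agglomerate(points):
    """Average-linkage agglomerative clustering (illustrative only).

    Returns the merges in the order they happen, each as
    (members_of_cluster_a, members_of_cluster_b, branch_height).
    """
    # Start with one cluster per sample; these are the leaves of the tree.
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Average pairwise Euclidean distance between the members of a and b.
        total = sum(dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Join the two closest clusters with a branch at their distance.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy "spectra" with two values each, forming two obvious groups.
samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
merges = agglomerate(samples)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

In this toy data the first two merges happen at a small height (samples within a group), while the final merge joins the two groups at a much larger height. That gap between low and high branches is what "well-separated clusters" looks like in a dendrogram.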

Improvements through Data Pretreatments

  • Select Substance from the "C" drop-down list to colorize the tree by substance.
  • Click the Colorbar icon in the toolbar to display the color legend. (Resize the window if the tree is too small now.)
  • In the Data Pretreatment Panel (bottom-right) you can apply options to manipulate spectra. See if you can find a combination of pretreatments that improves the clustering. You can judge by the coloring of the tree whether clusters correspond well to classes, and by the branch lengths whether clusters are well separated.
  • Eventually, let's use the following pretreatments:
    • Resampling: Equidistant Points
    • Number of Points: 1000
    • Global Range: 260 - 1700
    • Smoothing/Derivative: 1st order derivative
    • Filter Length: 19
    • Standardization: SNV normalization

  • Clusters are much more pronounced now. There are still two clusters that contain samples of two classes, but that's OK, because in both cases (Titanium Dioxide A/B and Lactose Monohydrate A/B) those are identical substances that just came from different manufacturers A and B.
  • Click the Export button to export pretreatments to a new model.
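For readers who want to see what such pretreatments do numerically, the following is a rough sketch of the three steps listed above: resampling to equidistant points, a first derivative, and SNV (standard normal variate) normalization. The function names and the synthetic spectrum are assumptions for illustration. In particular, the derivative here uses plain finite differences without smoothing, so it does not reproduce the "Filter Length: 19" setting, and the tutorial does not state which filter PEAXACT applies.

```python
from statistics import mean, stdev


def resample_equidistant(x, y, start, stop, n_points):
    """Linearly interpolate (x, y) onto n_points equally spaced positions."""
    step = (stop - start) / (n_points - 1)
    grid = [start + k * step for k in range(n_points)]
    out, j = [], 0
    for g in grid:
        # Advance to the segment [x[j], x[j + 1]] that contains g.
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        t = (g - x[j]) / (x[j + 1] - x[j])
        out.append(y[j] + t * (y[j + 1] - y[j]))
    return grid, out


def first_derivative(x, y):
    """Central finite differences, one-sided at both ends; no smoothing."""
    n = len(y)
    deriv = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        deriv.append((y[hi] - y[lo]) / (x[hi] - x[lo]))
    return deriv


def snv(y):
    """SNV: subtract the spectrum's mean and divide by its standard deviation."""
    m, s = mean(y), stdev(y)
    return [(v - m) / s for v in y]


# Hypothetical synthetic spectrum on a 10-unit grid from 200 to 1800.
x = [float(v) for v in range(200, 1801, 10)]
y = [(v / 1000.0) ** 2 for v in x]

# Same range and number of points as in the tutorial settings.
grid, resampled = resample_equidistant(x, y, 260.0, 1700.0, 1000)
treated = snv(first_derivative(grid, resampled))
print(len(treated), round(mean(treated), 6), round(stdev(treated), 6))
```

After SNV every spectrum has zero mean and unit standard deviation, which removes overall intensity differences between samples, while the derivative suppresses slowly varying baselines. Both effects plausibly explain why the clusters become more pronounced.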

Database Lookup Classification

  • Back in the main window, select the new model in the Model Tree Panel. Then choose Edit Model > Classification Model > New... from the menu to display the Classification Setup Dialog.
  • The classification method should be set to Database Lookup by default. Select {Substance} as the categorical feature to train the model for. Click OK to start the training.
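Conceptually, a database lookup compares a sample against stored reference spectra and reports the class of the best match. The sketch below shows that idea with a nearest-reference search using Euclidean distance. The distance measure, the function name `lookup`, and the substance labels are assumptions for illustration; the tutorial does not specify how PEAXACT scores matches.

```python
from math import dist


def lookup(database, spectrum):
    """Return (class_label, distance) of the reference closest to `spectrum`."""
    label, reference = min(database, key=lambda entry: dist(entry[1], spectrum))
    return label, dist(reference, spectrum)


# Hypothetical pretreated reference spectra (three points each) with class labels.
database = [
    ("Substance X", (0.1, 0.9, 0.2)),
    ("Substance X", (0.2, 1.0, 0.1)),
    ("Substance Y", (0.9, 0.1, 0.8)),
]

unknown = (0.15, 0.95, 0.2)
label, distance = lookup(database, unknown)
print(label, round(distance, 3))
```

Because the method only searches stored references, there are no tuning parameters to fit during training. Its quality depends mainly on the references and on the pretreatments applied before the comparison.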

Classification results are presented in a Report Window. The first plot you see is the Confusion Matrix, which shows the per-class performance of the Classification Model. An overall misclassification error for training and test samples is displayed at the bottom of the window. Also take a look at the other reports, e.g., Identified Class vs. ..., which allow you to inspect the performance in even more detail. In the end, though, database lookup is a rather simple method, and all that is left to do is to accept the performance as is.
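If you are unfamiliar with these two measures, the following small sketch shows how a confusion matrix and an overall misclassification error can be computed from actual and identified classes. The class labels and results are made up for this example and are not taken from the tutorial data.

```python
from collections import Counter


def confusion_matrix(actual, identified):
    """Rows are actual classes, columns are identified classes."""
    counts = Counter(zip(actual, identified))
    classes = sorted(set(actual) | set(identified))
    matrix = [[counts[(a, p)] for p in classes] for a in classes]
    return classes, matrix


def misclassification_error(actual, identified):
    """Fraction of samples whose identified class differs from the actual one."""
    wrong = sum(a != p for a, p in zip(actual, identified))
    return wrong / len(actual)


# Hypothetical results: one sample of class "A" was identified as "B".
actual = ["A", "A", "A", "B", "B", "C"]
identified = ["A", "A", "B", "B", "B", "C"]

classes, matrix = confusion_matrix(actual, identified)
error = misclassification_error(actual, identified)
print(classes)
for row in matrix:
    print(row)
print(round(error, 3))
```

Correct identifications appear on the diagonal of the matrix, and every off-diagonal entry shows which class was confused with which.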

  • Click OK to accept the Classification Model and close the Report Window.
  • Choose File > Save from the menu to save the model.

Identification Analysis

Now that you've trained the model it can be used to identify classes of unknown samples.

  • Choose Data > Load Samples... from the menu, browse to DATA\Analysis and load all files. For these files the classes are unknown.
  • Select the newly added samples in the Samples Panel and choose Analysis > Identification from the menu. Results are displayed in a Report Window. For instance, the Report Table shows identified classes and a corresponding class probability.

This concludes the tutorial on classification. We have more tutorials on other topics. See the PEAXACT Quick Start page for an overview!

 

S-PACT GmbH

Burtscheider Str. 1
52064 Aachen
Tel.: +49 241 - 9569 9812
Fax: +49 241 - 4354 4308
Internet: www.s-pact.de