Preprocessing pipelines

In this example, we will see how easy it is to construct, customise and reuse preprocessing protocols with RamanSPy.

Data used is from [1].

Prerequisites

import matplotlib.pyplot as plt
import random
import numpy as np

import ramanspy

Set random seed for reproducibility

random.seed(42)

Define color palette.

colors = plt.cm.get_cmap()(np.linspace(0, 1, 4))

Set up global figure size.

plt.rcParams['figure.figsize'] = [4, 2]

Data loading

Loading the data.

thp1_volumes = ramanspy.datasets.volumetric_cells(cell_type='THP-1', folder=r'../../../data/kallepitis_data')

# selecting the first volume
thp1_volume = thp1_volumes[0]

Grab 2 random spectra from the volume

random_spectra_indices = random.sample(range(thp1_volume.flat.shape[0]), 2)
random_spectra = list(thp1_volume.flat[random_spectra_indices])

Plot the raw spectra

_ = ramanspy.plot.spectra(random_spectra, color=colors[1], plot_type='separate')
  • Raman spectra
  • Raman spectra

Plot the fingerprint region

cropper = ramanspy.preprocessing.misc.Cropper(region=(700, 1800))
fingerprint_region = cropper.apply(random_spectra)
_ = ramanspy.plot.spectra(fingerprint_region, color=colors[1], plot_type='separate')
  • Raman spectra
  • Raman spectra

Pipelines

Below, we will investigate a series of preprocessing pipelines and their effect on the spectra.

Pipeline I

Applying a preprocessing protocol which consists of:

  • spectral cropping to the fingerprint region (700-1800 cm-1);

  • cosmic ray removal with Whitaker-Hayes algorithm;

  • denoising with a Gaussian filter;

  • baseline correction with Asymmetric Least Squares;

  • Area under the curve normalisation (pixelwise).

Define the pipeline

pipe = ramanspy.preprocessing.Pipeline([
    cropper,
    ramanspy.preprocessing.despike.WhitakerHayes(),
    ramanspy.preprocessing.denoise.Gaussian(),
    ramanspy.preprocessing.baseline.ASLS(),
    ramanspy.preprocessing.normalise.AUC(pixelwise=True),
])

# preprocess the spectra
preprocessed_spectra = pipe.apply(random_spectra)

# plot the results
_ = ramanspy.plot.spectra(preprocessed_spectra, color=colors[3], plot_type='separate')
  • Raman spectra
  • Raman spectra

Pipeline II

Applying a preprocessing protocol which consists of:

  • spectral cropping to the fingerprint region (700-1800 cm-1);

  • cosmic ray removal with Whitaker-Hayes algorithm;

  • denoising with Savitzky-Golay filter with window length 9 and polynomial order 3;

  • baseline correction with Adaptive Smoothness Penalized Least Squares (asPLS);

  • MinMax normalisation (pixelwise).

# preprocess the spectra
pipe = ramanspy.preprocessing.protocols.Pipeline([
    cropper,
    ramanspy.preprocessing.despike.WhitakerHayes(),
    ramanspy.preprocessing.denoise.SavGol(window_length=9, polyorder=3),
    ramanspy.preprocessing.baseline.ASPLS(),
    ramanspy.preprocessing.normalise.MinMax(pixelwise=True),
])
preprocessed_spectra = pipe.apply(random_spectra)

# plot the results
_ = ramanspy.plot.spectra(preprocessed_spectra, color=colors[0], plot_type='separate')
  • Raman spectra
  • Raman spectra

Pipeline III

Applying a preprocessing protocol inspired from [2] which consists of:

  • spectral cropping to the fingerprint region (700-1800 cm-1);

  • cosmic ray removal with Whitaker-Hayes algorithm.

  • baseline correction with polynomial fitting of order 2;

  • (Unit) Vector normalisation (pixelwise).

# preprocess the spectra
pipe = ramanspy.preprocessing.Pipeline([
    cropper,
    ramanspy.preprocessing.despike.WhitakerHayes(),
    ramanspy.preprocessing.baseline.Poly(poly_order=3),
    ramanspy.preprocessing.normalise.Vector(pixelwise=True)
])
preprocessed_spectra = pipe.apply(random_spectra)

# plot the results
_ = ramanspy.plot.spectra(preprocessed_spectra, color=colors[2], plot_type='separate')
  • Raman spectra
  • Raman spectra

References

Total running time of the script: ( 0 minutes 0.701 seconds)