
Each audio signal transformation in Sigment is represented by a class that applies one specific type of transformation to audio data.

Example transformation classes include GaussianWhiteNoise and TimeStretch. The full list of available transformations, with their parameters, is given below.

Each of these transformation classes is a subclass of the more generic Transform base class, which provides a basic interface that can also be used to write custom transformations.

Sigment offers a familiar interface for transformations, taking inspiration from popular augmentation libraries such as imgaug, nlpaug, albumentations and audiomentations.

Available transformations


Identity

class sigment.transforms.Identity[source]

Applies an identity transformation to a signal.


  • A sampling rate is not required when applying this transformation.

Additive Gaussian White Noise

class sigment.transforms.GaussianWhiteNoise(scale, p=1.0, random_state=None)[source]

Applies additive Gaussian white noise to the signal.

scale: float [scale > 0] or (float, float)
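Amount by which to scale each value sampled from the standard normal distribution.
Because the sample is multiplied by scale, this is essentially the standard deviation \(\sigma\) of the noise (its variance is \(\sigma^2\)).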


  • A sampling rate is not required when applying this transformation.
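
The description above (a standard normal sample multiplied by scale and added to the signal) can be sketched in plain NumPy. This is only an illustration under stated assumptions, not Sigment's implementation: the helper name add_gaussian_white_noise and its seed argument are hypothetical, and scale is taken as a single float rather than a (float, float) pair.

```python
import numpy as np

def add_gaussian_white_noise(X, scale, seed=None):
    """Add zero-mean Gaussian noise to X (hypothetical helper, not Sigment's code)."""
    rng = np.random.RandomState(seed)
    # Sample from the standard normal distribution and multiply by `scale`.
    noise = scale * rng.standard_normal(X.shape)
    # Clip so the result stays inside the [-1, 1] range.
    return np.clip(X + noise, -1.0, 1.0)

X = np.array([
    [0.325, 0.53, 0.393, 0.211],
    [0.21, 0.834, 0.022, 0.38],
])
X_noisy = add_gaussian_white_noise(X, scale=0.1, seed=0)
```

Passing the same seed twice gives the same noisy signal, which is the role random_state plays in the real class.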

Time Stretch

class sigment.transforms.TimeStretch(rate, p=1.0, random_state=None)[source]

Stretches the duration or speed of the signal without affecting its pitch.

rate: float [rate > 0] or (float, float)

Stretch rate.

  • If rate < 1, the signal is slowed down.

  • If rate > 1, the signal is sped up.


  • A sampling rate is not required when applying this transformation.

Pitch Shift

class sigment.transforms.PitchShift(n_steps, p=1.0, random_state=None)[source]

Shifts the pitch of the signal without changing its duration or speed.

n_steps: float [-12 ≤ n_steps ≤ 12] or (float, float)

Number of semitones to shift.


  • A sampling rate is required when applying this transformation.
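
To give a feel for the n_steps range, the snippet below converts semitones into the corresponding frequency ratio in 12-tone equal temperament. It is a small arithmetic aid, not part of Sigment; the function name semitones_to_ratio is hypothetical.

```python
def semitones_to_ratio(n_steps):
    """Frequency ratio for a shift of n_steps semitones (12-tone equal temperament)."""
    return 2.0 ** (n_steps / 12.0)

# The documented bounds of n_steps correspond to one octave down and one octave up.
octave_down = semitones_to_ratio(-12)
octave_up = semitones_to_ratio(12)
one_semitone_up = semitones_to_ratio(1)
```

A shift of 12 semitones doubles every frequency and a shift of -12 halves it, so the allowed range spans one octave in each direction.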

Edge Crop

class sigment.transforms.EdgeCrop(side, crop_size, p=1.0, random_state=None)[source]

Crops a section from the start or end of the signal.

side: {'start', 'end'}

The side of the signal to crop.

crop_size: float [0 < crop_size ≤ 0.5] or (float, float)

The fraction of the signal duration to crop from the chosen side.


  • A sampling rate is not required when applying this transformation.
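
A minimal NumPy sketch of cropping a fraction of the signal from one side is shown below. It assumes the number of cropped samples is rounded down and that crop_size is a single float; the helper name edge_crop is hypothetical and the code is not taken from Sigment.

```python
import numpy as np

def edge_crop(X, side, crop_size):
    """Remove a fraction `crop_size` of the samples from one side (hypothetical helper)."""
    if side not in ('start', 'end'):
        raise ValueError("side must be 'start' or 'end'")
    T = X.shape[-1]
    # Assumption: the number of cropped samples is rounded down.
    n_crop = int(crop_size * T)
    # Index the last axis so that mono (T,) and stereo (2xT) inputs both work.
    return X[..., n_crop:] if side == 'start' else X[..., :T - n_crop]

X = np.arange(10, dtype=float) / 10.0
X_start = edge_crop(X, side='start', crop_size=0.2)  # drops the first 2 samples
X_end = edge_crop(X, side='end', crop_size=0.2)      # drops the last 2 samples
```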

Random Crop

class sigment.transforms.RandomCrop(crop_size, n_crops, p=1.0, random_state=None)[source]

Randomly crops multiple sections from the signal.

crop_size: float [0 < crop_size < 1] or (float, float)

The fraction of the signal duration to crop.

n_crops: int [n_crops > 0] or (int, int)

The number of random crops of size crop_size to make.


  • Chunking is done according to the algorithm defined at [1].

  • crop_size \(\times\) n_crops must not exceed 1.

  • A sampling rate is not required when applying this transformation.



Linear Fade In/Out

class sigment.transforms.LinearFade(direction, fade_size, p=1.0, random_state=None)[source]

Linearly fades the signal in or out.

direction: {'in', 'out'}

The direction to fade the signal.

fade_size: float [0 < fade_size ≤ 0.5] or (float, float)

The fraction of the signal to fade in the chosen direction.


  • A sampling rate is not required when applying this transformation.
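
A linear fade can be pictured as multiplying the signal by a gain ramp. The sketch below assumes the ramp runs between 0 and 1 over the first or last fade_size fraction of the samples, rounded down; the helper name linear_fade is hypothetical and this is not Sigment's implementation.

```python
import numpy as np

def linear_fade(X, direction, fade_size):
    """Apply a linear gain ramp over a fraction of the signal (hypothetical helper)."""
    if direction not in ('in', 'out'):
        raise ValueError("direction must be 'in' or 'out'")
    T = X.shape[-1]
    n_fade = int(fade_size * T)  # assumption: the faded length is rounded down
    gain = np.ones(T)
    if direction == 'in':
        gain[:n_fade] = np.linspace(0.0, 1.0, n_fade)      # ramp up from silence
    else:
        gain[T - n_fade:] = np.linspace(1.0, 0.0, n_fade)  # ramp down to silence
    # Broadcasting applies the same ramp to every channel.
    return X * gain

X = np.ones((2, 8))
X_in = linear_fade(X, direction='in', fade_size=0.5)
X_out = linear_fade(X, direction='out', fade_size=0.5)
```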


Normalize

class sigment.transforms.Normalize(independent=True, p=1.0, random_state=None)[source]

Normalizes the signal by dividing each sample by the maximum absolute sample amplitude.

independent: bool

Whether or not to normalize each channel independently.


  • A sampling rate is not required when applying this transformation.
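
The effect of the independent flag can be illustrated with a short NumPy sketch. It assumes that an all-zero signal or channel is left unchanged to avoid division by zero; the helper name normalize is hypothetical and the code is not Sigment's own.

```python
import numpy as np

def normalize(X, independent=True):
    """Divide by the maximum absolute amplitude (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if independent and X.ndim == 2:
        # One peak per channel (row).
        peak = np.abs(X).max(axis=1, keepdims=True)
    else:
        # A single peak shared by all channels.
        peak = np.abs(X).max()
    # Assumption: silent signals or channels are returned unchanged.
    peak = np.where(peak == 0, 1.0, peak)
    return X / peak

X = np.array([
    [0.25, -0.5, 0.1],
    [0.05, 0.2, -0.1],
])
X_indep = normalize(X, independent=True)    # each channel peaks at 1
X_joint = normalize(X, independent=False)   # only the loudest channel peaks at 1
```

With independent=False the relative loudness of the two channels is preserved, whereas independent=True brings both channels to full scale.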


Pre-Emphasize

class sigment.transforms.PreEmphasize(alpha=0.95, p=1.0, random_state=None)[source]

Pre-emphasizes the signal by applying a first-order high-pass filter.

\[\begin{split}x'[t] = \begin{cases} x[t] & \text{if $t=0$} \\ x[t] - \alpha x[t-1] & \text{otherwise} \end{cases}\end{split}\]
alpha: float [0 < alpha ≤ 1] or (float, float)

Pre-emphasis coefficient.


  • A sampling rate is not required when applying this transformation.
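
The filter equation above translates directly into NumPy slicing. The helper name pre_emphasize is hypothetical and alpha is taken as a single float; the code is a sketch of the formula rather than Sigment's implementation.

```python
import numpy as np

def pre_emphasize(X, alpha=0.95):
    """First-order pre-emphasis filter along the last (time) axis (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Y = X.copy()  # x'[0] = x[0]
    # x'[t] = x[t] - alpha * x[t-1] for t >= 1
    Y[..., 1:] = X[..., 1:] - alpha * X[..., :-1]
    return Y

# A constant signal has no high-frequency content, so it is almost removed.
X = np.array([1.0, 1.0, 1.0, 1.0])
Y = pre_emphasize(X, alpha=0.95)
```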

Extract Loudest Section

class sigment.transforms.ExtractLoudestSection(duration, p=1.0, random_state=None)[source]

Extracts the loudest section from the signal using sliding window aggregation over amplitudes.

duration: float [0 < duration ≤ 1] or (float, float)

The duration of the section to extract, as a fraction of the original signal duration.


  • See [2] for more details on the implementation.

  • A sampling rate is not required when applying this transformation.
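
One way to picture sliding window aggregation over amplitudes is to sum absolute sample values over every window of the requested length and keep the window with the largest sum. The sketch below does this with a cumulative sum. It is not the implementation referenced in [2]: the helper name, the use of summed absolute amplitudes, and the rounding of the window length are all assumptions.

```python
import numpy as np

def extract_loudest_section(X, duration):
    """Return the window with the largest summed absolute amplitude (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    T = X.shape[-1]
    # Window length in samples (assumption: rounded to the nearest sample).
    win = max(1, int(round(duration * T)))
    # Absolute amplitude per sample, summed over channels for stereo input.
    amplitude = np.abs(X) if X.ndim == 1 else np.abs(X).sum(axis=0)
    # Sliding-window sums via a cumulative sum: sums[i] = amplitude[i:i + win].sum()
    csum = np.concatenate(([0.0], np.cumsum(amplitude)))
    sums = csum[win:] - csum[:-win]
    start = int(np.argmax(sums))
    return X[..., start:start + win]

X = np.array([0.0, 0.1, 0.0, 0.9, -0.8, 0.7, 0.0, 0.1, 0.0, 0.0])
loudest = extract_loudest_section(X, duration=0.3)
```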



Median Filter

class sigment.transforms.MedianFilter(window_size, p=1.0, random_state=None)[source]

Applies a median filter to the signal.

\[x'[t] = \mathrm{median} \underbrace{\Big[ \ldots, x[t-1], x[t], x[t+1], \ldots \Big]}_\text{window size}\]
window_size: int [window_size > 1] or (int, int)

The size of the window of neighbouring samples.


  • A sampling rate is not required when applying this transformation.
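
The median filter equation above can be sketched with NumPy's sliding_window_view (available from NumPy 1.20). How Sigment treats the signal edges is not stated here, so the sketch assumes the window is centred on x[t] and that the edges are padded by repeating the first and last samples. The helper name median_filter is hypothetical.

```python
import numpy as np

def median_filter(X, window_size):
    """Sliding median along the last (time) axis (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Assumption: centred window, edges padded by repeating the boundary samples.
    left = (window_size - 1) // 2
    right = window_size - 1 - left
    pad = [(0, 0)] * (X.ndim - 1) + [(left, right)]
    padded = np.pad(X, pad, mode='edge')
    # One window of `window_size` neighbouring samples per output sample.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window_size, axis=-1)
    return np.median(windows, axis=-1)

# The isolated spike at index 2 is removed by a window of size 3.
X = np.array([0.1, 0.1, 0.9, 0.1, 0.1, 0.2, 0.2])
X_filtered = median_filter(X, window_size=3)
```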


Reverb

class sigment.transforms.Reverb(delay, decay, p=1.0, random_state=None)[source]

Applies reverb to the signal.

delay: float [0 < delay ≤ 1] or (float, float)

Fraction of the signal duration to delay reverberated samples by.

decay: float [0 < decay ≤ 1] or (float, float)

Scalar to decay reverberated samples by.


  • See [3] for more details on the implementation.
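
To illustrate how the delay and decay parameters interact, the sketch below adds a single delayed, attenuated copy of the signal to itself. This is a deliberately simplified echo and is not the algorithm described in [3], which may differ substantially; the helper name simple_echo is hypothetical.

```python
import numpy as np

def simple_echo(X, delay, decay):
    """Add one delayed, attenuated copy of the signal (hypothetical helper, not [3])."""
    X = np.asarray(X, dtype=float)
    T = X.shape[-1]
    d = int(delay * T)  # delay in samples, as a fraction of the signal duration
    Y = X.copy()
    if 0 < d < T:
        # y[t] = x[t] + decay * x[t - d] for t >= d
        Y[..., d:] += decay * X[..., :T - d]
    return np.clip(Y, -1.0, 1.0)

# An impulse at t=0 reappears at t=2 with half the amplitude.
X = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
Y = simple_echo(X, delay=0.25, decay=0.5)
```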



Clipping Distortion

class sigment.transforms.ClipDistort(percentile, independent=False, p=1.0, random_state=None)[source]

Applies clipping distortion to the signal according to a percentile clipping threshold.

percentile: int [0 < percentile ≤ 100]

Percentile of sample amplitudes to use as a clipping threshold.

independent: bool

Whether or not to independently distort channels by calculating individual percentiles.
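
Percentile-based clipping can be sketched as follows. The sketch assumes the percentile is taken over absolute sample amplitudes and that the resulting threshold is applied symmetrically; the helper name clip_distort is hypothetical and the code is not Sigment's implementation.

```python
import numpy as np

def clip_distort(X, percentile, independent=False):
    """Clip samples at a percentile of the absolute amplitudes (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if independent and X.ndim == 2:
        # One clipping threshold per channel (row).
        threshold = np.percentile(np.abs(X), percentile, axis=1, keepdims=True)
    else:
        # A single threshold computed over all samples.
        threshold = np.percentile(np.abs(X), percentile)
    # Assumption: the threshold is applied symmetrically around zero.
    return np.clip(X, -threshold, threshold)

X = np.array([0.1, -0.2, 0.3, -0.4, 1.0])
X_clipped = clip_distort(X, percentile=50)  # threshold is the median amplitude, 0.3
```

Lower percentiles clip more of the waveform and therefore distort it more heavily, while percentile=100 leaves the signal unchanged.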

Using transformations

Each transformation class comes with a number of methods that can be used to apply the transformation to either a numpy.ndarray or a WAV file.

All transformations accept the p and random_state parameters, inherited from the Transform base class described below.

class sigment.transforms.Transform(p, random_state)[source]

Base class representing a single transformation or augmentation.


As this is a base class, it should not be directly instantiated.

You can, however, use it to create your own transformations, following the implementation of the pre-defined transformations in Sigment.

p: float [0 ≤ p ≤ 1]

The probability of executing the transformation.

random_state: numpy.random.RandomState or int, optional

A random state object or seed for reproducible randomness.
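
The roles of p and random_state can be illustrated with a self-contained stand-in for such a base class. Everything in this sketch is hypothetical, including the class name SketchTransform and the _transform hook; Sigment's actual extension points should be taken from its source code.

```python
import numpy as np

class SketchTransform:
    """Hypothetical stand-in for a Transform-like base class (not Sigment's code)."""

    def __init__(self, p=1.0, random_state=None):
        if not 0.0 <= p <= 1.0:
            raise ValueError('p must lie in [0, 1]')
        self.p = p
        # Accept a ready-made RandomState, an integer seed, or None.
        if isinstance(random_state, np.random.RandomState):
            self.random_state = random_state
        else:
            self.random_state = np.random.RandomState(random_state)

    def _transform(self, X, sr):
        # Hypothetical hook that subclasses override.
        raise NotImplementedError

    def __call__(self, X, sr=None):
        # Execute the transformation with probability p, otherwise pass X through.
        if self.random_state.uniform() < self.p:
            return self._transform(X, sr)
        return X

class Invert(SketchTransform):
    """Example custom transformation: flips the sign of every sample."""

    def _transform(self, X, sr):
        return -X

X = np.array([0.325, 0.53, 0.393, 0.211])
always = Invert(p=1.0, random_state=0)(X)  # always applied
never = Invert(p=0.0, random_state=0)(X)   # never applied
```

Two instances created with the same seed make the same sequence of apply-or-skip decisions, which is what makes augmentation runs reproducible.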

__call__(self, X, sr=None)[source]

Runs the transformation on a provided input signal.

X: numpy.ndarray [shape (T,) or (1xT) for mono, (2xT) for stereo]

The input signal to transform.

sr: int [sr > 0], optional

The sample rate for the input signal.


Not required if using transformations that do not require a sample rate.

transformed: numpy.ndarray [shape (T,) for mono, (2xT) for stereo]

The transformed signal, clipped so that it fits into the \([-1,1]\) range required for 32-bit floating point WAVs.


If a mono signal X of shape (1xT) was used, the output is reshaped to (T,).
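
The clipping and reshaping of the return value described above can be sketched in a few lines of NumPy. The helper name finalize is hypothetical and only illustrates the documented output contract.

```python
import numpy as np

def finalize(X):
    """Clip to [-1, 1] and flatten a (1xT) mono signal to (T,) (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 2 and X.shape[0] == 1:
        X = X[0]  # (1xT) mono -> (T,)
    return np.clip(X, -1.0, 1.0)

mono = finalize(np.array([[0.5, 1.2, -1.7, 0.0]]))      # shape (4,)
stereo = finalize(np.array([[0.5, 1.2], [-1.7, 0.0]]))  # shape (2, 2)
```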


>>> import numpy as np
>>> from sigment.transforms import PitchShift
>>> # Create an example stereo signal.
>>> X = np.array([
...     [0.325, 0.53 , 0.393, 0.211],
...     [0.21 , 0.834, 0.022, 0.38 ]
... ])
>>> # Create the pitch-shifting transformation object.
>>> shift = PitchShift(n_steps=(-1., 1.))
>>> # Run the __call__ method on the transformation object to transform X.
>>> # NOTE: Pitch shifting requires a sample rate when called.
>>> X_shift = shift(X, sr=10)
generate(self, X, n, sr=None)[source]

Runs the transformation on a provided input signal, producing multiple augmented copies of the input signal.

X: numpy.ndarray [shape (T,) or (1xT) for mono, (2xT) for stereo]

The input signal to transform.

n: int [n > 0]

Number of augmented copies of X to generate.

sr: int [sr > 0], optional

The sample rate for the input signal.


Not required if using transformations that do not require a sample rate.

augmented: List[numpy.ndarray] or numpy.ndarray

The augmented copies (or copy if n=1) of the signal X, clipped so that they fit into the \([-1,1]\) range required for 32-bit floating point WAVs.


If a mono signal X of shape (1xT) was used, the output is reshaped to (T,).


>>> import numpy as np
>>> from sigment.transforms import GaussianWhiteNoise
>>> # Create an example stereo signal.
>>> X = np.array([
...     [0.325, 0.53 , 0.393, 0.211],
...     [0.21 , 0.834, 0.022, 0.38 ]
... ])
>>> # Create the Gaussian white noise transformation object.
>>> add_noise = GaussianWhiteNoise(scale=(0.05, 0.15))
>>> # Generate 5 augmented versions of X, using the noise transformation.
>>> Xs_noisy = add_noise.generate(X, n=5)
apply_to_wav(self, source, out=None)

Applies the augmentation to the provided input WAV file and writes the resulting signal back to a WAV file.


The resulting signal is always clipped so that it fits into the \([-1,1]\) range required for 32-bit floating point WAVs.

source: str, Path or path-like

Path to the input WAV file.

out: str, Path or path-like

Output WAV path for the augmented signal.


If out is set to None (which is the default) or the same as source, the input WAV file will be overwritten!


>>> from sigment import *
>>> # Create a transformation or quantifier object.
>>> transform = ...
>>> # Apply the transformation to the input WAV file and write it to the output file
>>> transform.apply_to_wav('in.wav', 'out.wav')
generate_from_wav(self, source, n=1)

Applies the augmentation to the provided input WAV file and returns the augmented signal(s) as numpy.ndarray objects.

source: str, Path or path-like

Path to the input WAV file.

n: int [n > 0]

Number of augmented versions of the source signal to generate.

augmented: List[numpy.ndarray] or numpy.ndarray

The augmented versions (or version if n=1) of the source signal, clipped so that they fit into the \([-1,1]\) range required for 32-bit floating point WAVs.


>>> from sigment import *
>>> # Create a transformation or quantifier object.
>>> transform = ...
>>> # Generate 5 augmented versions of the signal data from 'signal.wav' as numpy.ndarrays.
>>> transformed = transform.generate_from_wav('signal.wav', n=5)