Transformations

In Sigment, each audio signal transformation is represented by a class that applies one specific type of transformation to audio data.

Example transformation classes include GaussianWhiteNoise and TimeStretch. The full list of available transformations, with their parameters, is given below.

Each of these transformation classes is a subclass of the more generic Transform base class, which provides a basic interface that can also be used to write custom transformations.

Sigment offers a familiar interface for transformations, taking inspiration from popular augmentation libraries such as imgaug, nlpaug, albumentations and audiomentations.

Section contents

Available transformations
Using transformations

Available transformations

Identity

class sigment.transforms.Identity

Applies an identity transformation to a signal.

Notes

A sampling rate is not required when applying this transformation.

Additive Gaussian White Noise

class sigment.transforms.GaussianWhiteNoise(scale, p=1.0, random_state=None)

Applies additive Gaussian white noise to the signal.

Parameters

scale: float [scale > 0] or (float, float): Amount to scale the value sampled from the standard normal distribution.

Essentially the standard deviation $\sigma$ of the added noise (so the noise variance is $\sigma^2$).

Notes

A sampling rate is not required when applying this transformation.
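As a rough illustration of what the scale parameter controls, the sketch below adds zero-mean Gaussian noise to a mono signal in plain Python. It is not Sigment's implementation: the helper name add_gaussian_noise is hypothetical, and the handling of a (low, high) tuple as a uniform range is an assumption.

```python
import random

def add_gaussian_noise(signal, scale, seed=None):
    """Add zero-mean Gaussian noise to a list of samples (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumption: a (low, high) tuple means "draw the scale uniformly from this range".
    if isinstance(scale, tuple):
        scale = rng.uniform(*scale)
    # Scaling a standard normal sample by `scale` yields noise with standard deviation `scale`.
    return [x + scale * rng.gauss(0.0, 1.0) for x in signal]

signal = [0.325, 0.53, 0.393, 0.211]
noisy = add_gaussian_noise(signal, scale=(0.05, 0.15), seed=0)
```

Fixing the seed makes the noise reproducible, which mirrors the purpose of the random_state parameter.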

Time Stretch

class sigment.transforms.TimeStretch(rate, p=1.0, random_state=None)

Stretches the duration or speed of the signal without affecting its pitch.

Parameters

rate: float [rate > 0] or (float, float): Stretch rate.

If rate < 1, the signal is slowed down.
If rate > 1, the signal is sped up.

Notes

A sampling rate is not required when applying this transformation.
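To make the effect of rate concrete, the following sketch estimates the length of the stretched signal. It assumes the common convention that the output duration is the input duration divided by rate; the helper stretched_length is hypothetical and not part of Sigment.

```python
def stretched_length(n_samples, rate):
    """Approximate number of samples after stretching by `rate` (hypothetical helper)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Assumption: duration scales as 1 / rate.
    return round(n_samples / rate)

slowed = stretched_length(16000, 0.5)  # rate < 1: longer signal, 32000 samples
sped_up = stretched_length(16000, 2.0)  # rate > 1: shorter signal, 8000 samples
```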

Pitch Shift

class sigment.transforms.PitchShift(n_steps, p=1.0, random_state=None)

Shifts the pitch of the signal without changing its duration or speed.

Parameters

n_steps: float [-12 ≤ n_steps ≤ 12] or (float, float): Number of semitones to shift.

Notes

A sampling rate is required when applying this transformation.
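For intuition about n_steps, a shift of n semitones in 12-tone equal temperament multiplies every frequency by 2^(n/12). The small sketch below computes that ratio; the helper name semitones_to_ratio is hypothetical and only illustrates the arithmetic.

```python
def semitones_to_ratio(n_steps):
    """Frequency ratio for a shift of n_steps semitones in 12-tone equal temperament."""
    return 2.0 ** (n_steps / 12.0)

ratio_up = semitones_to_ratio(12)     # one octave up: 2.0
ratio_down = semitones_to_ratio(-12)  # one octave down: 0.5
a4_up_one = 440.0 * semitones_to_ratio(1)  # A4 raised by one semitone, about 466.16 Hz
```

The documented range of -12 to 12 therefore corresponds to at most one octave in either direction.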

Edge Crop

class sigment.transforms.EdgeCrop(side, crop_size, p=1.0, random_state=None)

Crops a section from the start or end of the signal.

Parameters

side: {‘start’, ‘end’}: The side of the signal to crop.
crop_size: float [0 < crop_size ≤ 0.5] or (float, float): The fraction of the signal duration to crop from the chosen side.

Notes

A sampling rate is not required when applying this transformation.
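A minimal sketch of edge cropping on a mono signal is shown below. It assumes that "crop" means discarding the given fraction of samples from the chosen side; the helper edge_crop is hypothetical and is not taken from the Sigment source.

```python
def edge_crop(signal, side, crop_size):
    """Discard a fraction of samples from the start or end (hypothetical helper)."""
    if side not in ("start", "end"):
        raise ValueError("side must be 'start' or 'end'")
    if not 0 < crop_size <= 0.5:
        raise ValueError("crop_size must be in (0, 0.5]")
    n_crop = int(len(signal) * crop_size)  # number of samples to discard
    if side == "start":
        return signal[n_crop:]
    return signal[:len(signal) - n_crop]

signal = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cropped_start = edge_crop(signal, "start", 0.2)  # drops the first 2 samples
cropped_end = edge_crop(signal, "end", 0.5)      # drops the last 5 samples
```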

Random Crop

class sigment.transforms.RandomCrop(crop_size, n_crops, p=1.0, random_state=None)

Randomly crops multiple sections from the signal.

Parameters

crop_size: float [0 < crop_size < 1] or (float, float): The fraction of the signal duration to crop.
n_crops: int [n_crops > 0] or (int, int): The number of random crops of size crop_size to make.

Notes

Chunking is done according to the algorithm defined at [1].
crop_size $\times$ n_crops must not exceed 1.
A sampling rate is not required when applying this transformation.

References

1: https://stackoverflow.com/a/49944026

Linear Fade In/Out

class sigment.transforms.LinearFade(direction, fade_size, p=1.0, random_state=None)

Linearly fades the signal in or out.

Parameters

direction: {‘in’, ‘out’}: The direction to fade the signal.
fade_size: float [0 < fade_size ≤ 0.5] or (float, float): The fraction of the signal to fade in the chosen direction.

Notes

A sampling rate is not required when applying this transformation.
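The sketch below shows one way a linear fade could be computed on a mono signal: a gain ramp from 0 towards 1 is applied over the first fade_size fraction of samples (fade in), or mirrored over the last fraction (fade out). The helper linear_fade and the exact ramp endpoints are assumptions, not Sigment's implementation.

```python
def linear_fade(signal, direction, fade_size):
    """Apply a linear gain ramp to the start or end of a signal (hypothetical helper)."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    n_fade = int(len(signal) * fade_size)  # number of samples affected by the fade
    faded = list(signal)
    for i in range(n_fade):
        gain = i / n_fade  # ramps linearly from 0 towards 1
        if direction == "in":
            faded[i] *= gain
        else:
            faded[-1 - i] *= gain  # mirror the ramp at the end of the signal
    return faded

signal = [1.0] * 8
fade_in = linear_fade(signal, "in", 0.5)    # [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0]
fade_out = linear_fade(signal, "out", 0.5)  # [1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```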

Normalize

class sigment.transforms.Normalize(independent=True, p=1.0, random_state=None)

Normalizes the signal by dividing each sample by the maximum absolute sample amplitude.

Parameters

independent: bool: Whether or not to normalize each channel independently.

Notes

A sampling rate is not required when applying this transformation.
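The difference between the two settings of independent can be sketched as follows. This is an illustration rather than Sigment's code: the helper normalize is hypothetical, and it assumes that independent=False divides every channel by a single peak shared across all channels.

```python
def normalize(channels, independent=True):
    """Peak-normalize a list of channels, each a list of samples (hypothetical helper)."""
    def peak(samples):
        return max(abs(x) for x in samples)

    if independent:
        peaks = [peak(ch) for ch in channels]  # one peak per channel
    else:
        shared = max(peak(ch) for ch in channels)  # one peak for the whole signal
        peaks = [shared] * len(channels)
    # Leave all-zero channels untouched to avoid dividing by zero.
    return [[x / p for x in ch] if p > 0 else list(ch) for ch, p in zip(channels, peaks)]

stereo = [[0.25, -0.5], [0.1, 0.2]]
per_channel = normalize(stereo, independent=True)   # [[0.5, -1.0], [0.5, 1.0]]
joint = normalize(stereo, independent=False)        # [[0.5, -1.0], [0.2, 0.4]]
```

Under this assumption, joint normalization preserves the relative loudness of the channels, whereas independent normalization brings each channel to full scale.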

Pre-Emphasize

class sigment.transforms.PreEmphasize(alpha=0.95, p=1.0, random_state=None)

Pre-emphasizes the signal by applying a first-order high-pass filter.

\[\begin{split}x'[t] = \begin{cases} x[t] & \text{if $t=0$} \\ x[t] - \alpha x[t-1] & \text{otherwise} \end{cases}\end{split}\]

Parameters

alpha: float [0 < alpha ≤ 1] or (float, float): Pre-emphasis coefficient.

Notes

A sampling rate is not required when applying this transformation.
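The filter equation above translates directly into a few lines of Python. The sketch below is a plain-list illustration for a mono signal; the helper name pre_emphasize is hypothetical.

```python
def pre_emphasize(signal, alpha=0.95):
    """Apply x'[t] = x[t] - alpha * x[t-1], keeping the first sample unchanged."""
    if not signal:
        return []
    return [signal[0]] + [signal[t] - alpha * signal[t - 1] for t in range(1, len(signal))]

constant = pre_emphasize([1.0, 1.0, 1.0], alpha=0.5)      # [1.0, 0.5, 0.5]
alternating = pre_emphasize([1.0, -1.0, 1.0], alpha=0.5)  # [1.0, -1.5, 1.5]
```

The slowly varying (constant) input is attenuated while the rapidly alternating input is amplified, which is the high-pass behaviour described above.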

Extract Loudest Section

class sigment.transforms.ExtractLoudestSection(duration, p=1.0, random_state=None)

Extracts the loudest section from the signal using sliding window aggregation over amplitudes.

Parameters

duration: float [0 < duration ≤ 1] or (float, float): The duration of the section to extract, as a fraction of the original signal duration.

Notes

See [2] for more details on the implementation.
A sampling rate is not required when applying this transformation.

References

2: https://github.com/petewarden/extract_loudest_section
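As a sketch of sliding window aggregation, the code below slides a fixed-size window over a mono signal and keeps the window with the largest sum of absolute amplitudes. The helper extract_loudest_section and the choice of "sum of absolute amplitudes" as the loudness measure are assumptions made for illustration.

```python
def extract_loudest_section(signal, duration):
    """Return the window with the largest sum of absolute amplitudes (hypothetical helper)."""
    n = len(signal)
    window = max(1, int(n * duration))  # window length in samples
    # Sum of the first window, then update it incrementally as the window slides.
    current = sum(abs(x) for x in signal[:window])
    best_sum, best_start = current, 0
    for start in range(1, n - window + 1):
        current += abs(signal[start + window - 1]) - abs(signal[start - 1])
        if current > best_sum:
            best_sum, best_start = current, start
    return signal[best_start:best_start + window]

signal = [0.0, 0.1, 0.9, -0.8, 0.1, 0.0, 0.0, 0.0]
loudest = extract_loudest_section(signal, duration=0.25)  # [0.9, -0.8]
```

Updating the running sum instead of recomputing it keeps the scan linear in the signal length.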

Median Filter

class sigment.transforms.MedianFilter(window_size, p=1.0, random_state=None)

Applies a median filter to the signal.

\[x'[t] = \mathrm{median} \underbrace{\Big[ \ldots, x[t-1], x[t], x[t+1], \ldots \Big]}_\text{window size}\]

Parameters

window_size: int [window_size > 1] or (int, int): The size of the window of neighbouring samples.

Notes

A sampling rate is not required when applying this transformation.
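The median filter equation above can be sketched with the standard library as follows. The helper median_filter is hypothetical; it assumes an odd window_size centred on each sample and truncates the window at the signal edges, which may differ from how Sigment handles boundaries.

```python
from statistics import median

def median_filter(signal, window_size):
    """Replace each sample by the median of its neighbourhood (hypothetical helper)."""
    half = window_size // 2
    filtered = []
    for t in range(len(signal)):
        # Assumption: the window is centred on t and truncated at the signal edges.
        lo, hi = max(0, t - half), min(len(signal), t + half + 1)
        filtered.append(median(signal[lo:hi]))
    return filtered

spiky = [0.1, 0.1, 0.9, 0.1, 0.1]
smoothed = median_filter(spiky, window_size=3)  # the isolated 0.9 spike is removed
```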

Reverberate

class sigment.transforms.Reverb(delay, decay, p=1.0, random_state=None)

Applies reverb to the signal.

Parameters

delay: float [0 < delay ≤ 1] or (float, float): Fraction of signal duration to delay reverberated samples by.
decay: float [0 < decay ≤ 1] or (float, float): Scalar to decay reverberated samples by.

Notes

See [3] for more details on the implementation.

References

3: https://stackoverflow.com/a/1117249
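One simple way to realise the delay and decay parameters is a feedback echo, where each sample is added, scaled by decay, to the sample that lies delay later. The sketch below illustrates this on a mono signal; the helper simple_reverb is hypothetical and may not match Sigment's implementation in detail.

```python
def simple_reverb(signal, delay, decay):
    """Feedback echo: out[i + d] += out[i] * decay (hypothetical helper)."""
    out = list(signal)
    n_delay = int(len(out) * delay)  # delay expressed in samples
    if n_delay == 0:
        return out  # signal too short for the requested delay
    for i in range(len(out) - n_delay):
        # Earlier echoes feed into later ones, so the tail decays geometrically.
        out[i + n_delay] += out[i] * decay
    return out

impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
echoed = simple_reverb(impulse, delay=0.25, decay=0.5)
# [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.125, 0.0]
```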

Clipping Distortion

class sigment.transforms.ClipDistort(percentile, independent=False, p=1.0, random_state=None)

Applies clipping distortion to the signal according to a percentile clipping threshold.

Parameters

percentile: int [0 < percentile ≤ 100]: Percentile of sample amplitudes to use as a clipping threshold.
independent: bool: Whether or not to independently distort channels by calculating individual percentiles.
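Percentile-based clipping can be sketched as below for a mono signal. The helper clip_distort is hypothetical; it assumes the threshold is the nearest-rank percentile of the absolute amplitudes and that samples are clipped symmetrically to that threshold, which may differ from Sigment's exact percentile calculation.

```python
import math

def clip_distort(signal, percentile):
    """Clip samples to the given percentile of absolute amplitude (hypothetical helper)."""
    magnitudes = sorted(abs(x) for x in signal)
    # Assumption: nearest-rank percentile of the absolute amplitudes.
    rank = max(1, math.ceil(percentile * len(magnitudes) / 100))
    threshold = magnitudes[rank - 1]
    return [max(-threshold, min(threshold, x)) for x in signal]

signal = [0.1, -0.2, 0.3, -0.4, 1.0]
clipped = clip_distort(signal, percentile=80)  # [0.1, -0.2, 0.3, -0.4, 0.4]
```

With percentile=100 the threshold equals the peak amplitude, so the signal is returned unchanged.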

Using transformations

Each transformation class comes with a number of methods that can be used to apply the transformation to either a numpy.ndarray or a WAV file.

All transformations accept the p and random_state parameters, inherited from the Transform base class described below.

class sigment.transforms.Transform(p, random_state)

Base class representing a single transformation or augmentation.

Note

As this is a base class, it should not be directly instantiated.

You can, however, use it to create your own transformations, following the implementation of the pre-defined transformations in Sigment.

Parameters

p: float [0 ≤ p ≤ 1]: The probability of executing the transformation.
random_state: numpy.random.RandomState, int, optional: A random state object or seed for reproducible randomness.
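The role of p can be illustrated with a small stand-alone wrapper that applies a function with probability p and otherwise returns the input unchanged. The class MaybeApply below is hypothetical and uses Python's random module rather than the Transform base class or a NumPy random state.

```python
import random

class MaybeApply:
    """Hypothetical wrapper: apply `func` with probability p, else pass the input through."""

    def __init__(self, func, p=1.0, seed=None):
        if not 0 <= p <= 1:
            raise ValueError("p must be in [0, 1]")
        self.func = func
        self.p = p
        self.rng = random.Random(seed)  # seeded generator for reproducible decisions

    def __call__(self, signal):
        # random() lies in [0, 1), so p=1.0 always applies and p=0.0 never does.
        if self.rng.random() < self.p:
            return self.func(signal)
        return list(signal)

def invert(signal):
    return [-x for x in signal]

always = MaybeApply(invert, p=1.0, seed=0)
never = MaybeApply(invert, p=0.0, seed=0)
inverted = always([0.5, -0.25])   # [-0.5, 0.25]
unchanged = never([0.5, -0.25])   # [0.5, -0.25]
```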

__call__(self, X, sr=None)

Runs the transformation on a provided input signal.

Parameters

X: numpy.ndarray [shape (T,) or (1xT) for mono, (2xT) for stereo]: The input signal to transform.
sr: int [sr > 0], optional: The sample rate for the input signal.

Note

Not required if using transformations that do not require a sample rate.

Returns

transformed: numpy.ndarray [shape (T,) for mono, (2xT) for stereo]: The transformed signal, clipped so that it fits into the $[-1,1]$ range required for 32-bit floating point WAVs.

Note

If a mono signal X of shape (1xT) was used, the output is reshaped to (T,).

Examples

>>> import numpy as np
>>> from sigment.transforms import PitchShift
>>> # Create an example stereo signal.
>>> X = np.array([
...     [0.325, 0.53 , 0.393, 0.211],
...     [0.21 , 0.834, 0.022, 0.38 ]
... ])
>>> # Create the pitch-shifting transformation object.
>>> shift = PitchShift(n_steps=(-1., 1.))
>>> # Run the __call__ method on the transformation object to transform X.
>>> # NOTE: Pitch shifting requires a sample rate when called.
>>> X_shift = shift(X, sr=10)
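The output handling described under Returns (clipping to $[-1,1]$ and flattening a single-channel signal) can be sketched in plain Python as follows. The helper postprocess is hypothetical, operates on nested lists rather than numpy.ndarray objects, and is not Sigment's code.

```python
def postprocess(channels):
    """Clip every sample to [-1, 1] and flatten a single-channel signal (hypothetical helper)."""
    clipped = [[max(-1.0, min(1.0, x)) for x in ch] for ch in channels]
    # A (1 x T) mono signal is returned as a flat sequence of length T.
    return clipped[0] if len(clipped) == 1 else clipped

mono = postprocess([[1.5, -2.0, 0.3]])            # [1.0, -1.0, 0.3]
stereo = postprocess([[0.2, 1.2], [-1.1, 0.4]])   # [[0.2, 1.0], [-1.0, 0.4]]
```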

generate(self, X, n, sr=None)

Runs the transformation on a provided input signal, producing multiple augmented copies of the input signal.

Parameters

X: numpy.ndarray [shape (T,) or (1xT) for mono, (2xT) for stereo]: The input signal to transform.
n: int [n > 0]: Number of augmented copies of X to generate.
sr: int [sr > 0], optional: The sample rate for the input signal.

Note

Not required if using transformations that do not require a sample rate.

Returns

augmented: List[numpy.ndarray] or numpy.ndarray: The augmented copies (or copy if n=1) of the signal X, clipped so that they fit into the $[-1,1]$ range required for 32-bit floating point WAVs.

Note

If a mono signal X of shape (1xT) was used, the output is reshaped to (T,).

Examples

>>> import numpy as np
>>> from sigment.transforms import GaussianWhiteNoise
>>> # Create an example stereo signal.
>>> X = np.array([
...     [0.325, 0.53 , 0.393, 0.211],
...     [0.21 , 0.834, 0.022, 0.38 ]
... ])
>>> # Create the Gaussian white noise transformation object.
>>> add_noise = GaussianWhiteNoise(scale=(0.05, 0.15))
>>> # Generate 5 augmented versions of X, using the noise transformation.
>>> Xs_noisy = add_noise.generate(X, n=5)

apply_to_wav(self, source, out=None)

Applies the augmentation to the provided input WAV file and writes the resulting signal back to a WAV file.

Note

The resulting signal is always clipped so that it fits into the $[-1,1]$ range required for 32-bit floating point WAVs.

Parameters

source: str, Path or path-like: Path to the input WAV file.
out: str, Path or path-like, optional: Output WAV path for the augmented signal.

Warning

If out is set to None (which is the default) or the same as source, the input WAV file will be overwritten!

Examples

>>> from sigment import *
>>> # Create a transformation or quantifier object.
>>> transform = ...
>>> # Apply the transformation to the input WAV file and write it to the output file
>>> transform.apply_to_wav('in.wav', 'out.wav')
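Because omitting out overwrites the input file, it can be convenient to derive a distinct output path from the source path before calling apply_to_wav. The sketch below uses only pathlib; the helper augmented_path and the "_aug" suffix are hypothetical choices for illustration.

```python
from pathlib import Path

def augmented_path(source, suffix="_aug"):
    """Build an output path next to `source` so the input file is not overwritten."""
    source = Path(source)
    # Insert the suffix between the file stem and its extension.
    return source.with_name(f"{source.stem}{suffix}{source.suffix}")

out = augmented_path("recordings/in.wav")  # recordings/in_aug.wav
```

The resulting path could then be passed as the out argument.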

generate_from_wav(self, source, n=1)

Applies the augmentation to the provided input WAV file and returns the result as a numpy.ndarray, or as a list of them when n > 1.

Parameters

source: str, Path or path-like: Path to the input WAV file.
n: int [n > 0]: Number of augmented versions of the source signal to generate.

Returns

augmented: List[numpy.ndarray] or numpy.ndarray: The augmented versions (or version if n=1) of the source signal, clipped so that they fit into the $[-1,1]$ range required for 32-bit floating point WAVs.

Examples

>>> from sigment import *
>>> # Create a transformation or quantifier object.
>>> transform = ...
>>> # Generate 5 augmented versions of the signal data from 'signal.wav' as numpy.ndarrays.
>>> transformed = transform.generate_from_wav('signal.wav', n=5)