Transformations¶
Audio signal transformations in Sigment are represented by classes that can be used to apply a specific type of transformation to the audio data.
Some example transformation classes include GaussianWhiteNoise and TimeStretch. A full list of available transformations and their details and parameters can be found below.
Each of these transformation classes is a subclass of the more generic Transform base class, which provides a basic interface that can also be used to write custom transformations.
Sigment offers a familiar interface for transformations, taking inspiration from popular augmentation libraries such as imgaug, nlpaug, albumentations and audiomentations.
Available transformations¶
Additive Gaussian White Noise¶
class sigment.transforms.GaussianWhiteNoise(scale, p=1.0, random_state=None)
Applies additive Gaussian white noise to the signal.
- Parameters
- scale: float [scale > 0] or tuple
Amount by which to scale the value sampled from the standard normal distribution. Because the sample is multiplied by scale, this acts as the standard deviation \(\sigma\) of the noise (variance \(\sigma^2\)).
Notes
A sampling rate is not required when applying this transformation.
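As a rough illustration of the idea, and not Sigment's actual implementation, additive Gaussian white noise can be sketched in plain NumPy. The function name `add_gaussian_white_noise` and the explicit `rng` argument are assumptions made for this example only.

```python
import numpy as np

def add_gaussian_white_noise(X, scale, rng):
    """Add zero-mean Gaussian noise to X (illustrative sketch only).

    Each noise sample is a standard normal draw multiplied by `scale`,
    so the added noise has standard deviation `scale`.
    """
    noise = scale * rng.standard_normal(X.shape)
    return X + noise

# Hypothetical usage on a silent stereo signal of 10000 samples per channel.
rng = np.random.RandomState(0)
X = np.zeros((2, 10000))
X_noisy = add_gaussian_white_noise(X, scale=0.1, rng=rng)
```

No sample rate appears anywhere in the sketch, which matches the note above: the noise is added sample by sample.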
Time Stretch¶
class sigment.transforms.TimeStretch(rate, p=1.0, random_state=None)
Stretches the duration or speed of the signal without affecting its pitch.
- Parameters
- rate: float [rate > 0] or tuple
Stretch rate.
If rate < 1, the signal is slowed down.
If rate > 1, the signal is sped up.
Notes
A sampling rate is not required when applying this transformation.
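To make the effect of rate concrete, the sketch below estimates the length of a stretched signal, assuming rate acts as a speed factor as the two cases above imply. The helper `stretched_length` is hypothetical and is not part of Sigment; the exact output length of the real transformation may differ by rounding.

```python
def stretched_length(n_samples, rate):
    """Approximate number of samples after time-stretching by `rate`."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # A speed factor of `rate` divides the duration by `rate`.
    return int(round(n_samples / rate))

slower = stretched_length(1000, rate=0.5)  # half speed: about twice as long
faster = stretched_length(1000, rate=2.0)  # double speed: about half as long
```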
Pitch Shift¶
class sigment.transforms.PitchShift(n_steps, p=1.0, random_state=None)
Shifts the pitch of the signal without changing its duration or speed.
- Parameters
- n_steps: float [-12 < n_steps < 12] or tuple
Number of semitones to shift.
Notes
A sampling rate is required when applying this transformation.
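For intuition about what a shift of n_steps semitones means, the following sketch converts semitones to a frequency ratio under twelve-tone equal temperament. The helper `semitones_to_ratio` is an illustrative assumption, not a Sigment function.

```python
def semitones_to_ratio(n_steps):
    """Frequency ratio for a shift of `n_steps` semitones (12-tone equal temperament)."""
    return 2.0 ** (n_steps / 12.0)

up_one = semitones_to_ratio(1)     # one semitone up raises frequencies by about 5.9%
down_one = semitones_to_ratio(-1)  # one semitone down is the reciprocal ratio
tritone = semitones_to_ratio(6)    # six semitones up multiplies frequencies by sqrt(2)
```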
Edge Crop¶
class sigment.transforms.EdgeCrop(side, crop_size, p=1.0, random_state=None)
Crops a section from the start or end of the signal.
- Parameters
- side: {‘start’, ‘end’}
The side of the signal to crop.
- crop_size: float [0 < crop_size < 1] or tuple
The fraction of the signal duration to crop from the chosen side.
Notes
A sampling rate is not required when applying this transformation.
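The cropping described above can be sketched with NumPy slicing. This is an illustration under stated assumptions, not Sigment's code: the function name `edge_crop` is hypothetical, time is taken to be the last axis, and the number of samples removed is rounded down.

```python
import numpy as np

def edge_crop(X, side, crop_size):
    """Remove a fraction `crop_size` of the samples from one side of X.

    X has shape (T,) for mono or (channels, T) for multi-channel audio.
    """
    if side not in ("start", "end"):
        raise ValueError("side must be 'start' or 'end'")
    T = X.shape[-1]
    n = int(T * crop_size)  # samples to drop, rounded down (an assumption)
    return X[..., n:] if side == "start" else X[..., :T - n]

X = np.arange(20, dtype=float).reshape(2, 10)
X_start = edge_crop(X, "start", 0.2)  # drops the first 2 samples of each channel
X_end = edge_crop(X, "end", 0.5)      # drops the last 5 samples of each channel
```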
Random Crop¶
class sigment.transforms.RandomCrop(crop_size, n_crops, p=1.0, random_state=None)
Randomly crops multiple sections from the signal.
- Parameters
- crop_size: float [0 < crop_size < 1] or tuple
The fraction of the signal duration to crop.
- n_crops: int [n_crops > 0] or tuple
The number of random crops of size crop_size to make.
Notes
Chunking is done according to the algorithm defined at [1].
crop_size \(\times\) n_crops must not exceed 1.
A sampling rate is not required when applying this transformation.
References
Linear Fade In/Out¶
class sigment.transforms.LinearFade(direction, fade_size, p=1.0, random_state=None)
Linearly fades the signal in or out.
- Parameters
- direction: {‘in’, ‘out’}
The direction to fade the signal.
- fade_size: float [0 < fade_size < 0.5] or tuple
The fraction of the signal to fade in the chosen direction.
Notes
A sampling rate is not required when applying this transformation.
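A linear fade amounts to multiplying the faded portion of the signal by a gain ramp. The sketch below shows one way to do this in NumPy; the name `linear_fade`, the ramp running exactly from 0 to 1, and the rounding of the fade length are assumptions for illustration rather than Sigment's implementation.

```python
import numpy as np

def linear_fade(X, direction, fade_size):
    """Apply a linear fade-in or fade-out over a fraction `fade_size` of X."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    T = X.shape[-1]
    n = int(T * fade_size)           # number of samples covered by the fade
    ramp = np.linspace(0.0, 1.0, n)  # gain rising linearly from 0 to 1
    out = np.array(X, dtype=float)   # work on a float copy of the input
    if direction == "in":
        out[..., :n] *= ramp
    else:
        out[..., T - n:] *= ramp[::-1]
    return out

X = np.ones((2, 10))
faded_in = linear_fade(X, "in", 0.4)    # first 4 samples ramp up from silence
faded_out = linear_fade(X, "out", 0.4)  # last 4 samples ramp down to silence
```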
Normalize¶
class sigment.transforms.Normalize(independent=True, p=1.0, random_state=None)
Normalizes the signal by dividing each sample by the maximum absolute sample amplitude.
- Parameters
- independent: bool
Whether or not to normalize each channel independently.
Notes
A sampling rate is not required when applying this transformation.
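The difference between independent and joint normalization is easiest to see on a small stereo array. The following is a minimal sketch, assuming channels are stored along the first axis; `peak_normalize` and its guard against silent signals are illustrative choices, not Sigment's code.

```python
import numpy as np

def peak_normalize(X, independent=True):
    """Divide X by its maximum absolute amplitude (per channel if independent)."""
    X = np.asarray(X, dtype=float)
    if independent and X.ndim == 2:
        peak = np.abs(X).max(axis=1, keepdims=True)  # one peak per channel
    else:
        peak = np.abs(X).max()                       # one peak for the whole signal
    peak = np.where(peak == 0, 1.0, peak)            # leave silent signals unchanged
    return X / peak

X = np.array([
    [0.5, -2.0, 1.0],
    [0.25, 0.5, -0.1]
])
per_channel = peak_normalize(X, independent=True)
joint = peak_normalize(X, independent=False)
```

With independent=True each channel reaches a peak of 1, whereas the joint version preserves the relative level between channels.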
Pre-Emphasize¶
class sigment.transforms.PreEmphasize(alpha=0.95, p=1.0, random_state=None)
Pre-emphasizes the signal by applying a first-order high-pass filter.
\[x'[t] = \begin{cases} x[t] & \text{if } t = 0 \\ x[t] - \alpha x[t-1] & \text{otherwise} \end{cases}\]
- Parameters
- alpha: float [0 < alpha < 1] or tuple
Pre-emphasis coefficient.
Notes
A sampling rate is not required when applying this transformation.
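The filter equation above translates almost directly into NumPy. This is a sketch of the formula itself, with the hypothetical name `pre_emphasize`, and is not taken from Sigment's source.

```python
import numpy as np

def pre_emphasize(X, alpha=0.95):
    """Apply x'[t] = x[t] - alpha * x[t-1], keeping x'[0] = x[0]."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    # Every sample except the first has the scaled previous sample subtracted.
    out[..., 1:] = X[..., 1:] - alpha * X[..., :-1]
    return out

x = np.array([1.0, 1.0, 1.0, 2.0])
y = pre_emphasize(x, alpha=0.5)  # [1.0, 0.5, 0.5, 1.5]
```

The constant stretch of the input is attenuated while the jump at the end is kept relatively large, which is the high-pass behaviour described above.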
Extract Loudest Section¶
class sigment.transforms.ExtractLoudestSection(duration, p=1.0, random_state=None)
Extracts the loudest section from the signal using sliding window aggregation over amplitudes.
- Parameters
- duration: float [0 < duration < 1] or tuple
The duration of the section to extract, as a fraction of the original signal duration.
Notes
See [2] for more details on the implementation.
A sampling rate is not required when applying this transformation.
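One way to picture sliding window aggregation over amplitudes is sketched below: sum the absolute amplitudes in every window of the requested length and keep the window with the largest sum. This may differ from the implementation referenced in [2]; the function name, the use of a sum as the aggregate, and the summing across channels are all assumptions.

```python
import numpy as np

def extract_loudest_section(X, duration):
    """Return the window of X (a fraction `duration` long) with the largest summed amplitude."""
    X = np.asarray(X, dtype=float)
    T = X.shape[-1]
    L = max(1, int(T * duration))   # window length in samples
    amp = np.abs(X)
    if amp.ndim == 2:
        amp = amp.sum(axis=0)       # combine channels into one amplitude track
    # Cumulative sums give every window total in O(T) time.
    csum = np.concatenate(([0.0], np.cumsum(amp)))
    window_sums = csum[L:] - csum[:-L]
    start = int(np.argmax(window_sums))
    return X[..., start:start + L]

x = np.array([0.1, -0.2, 0.9, -1.0, 0.8, 0.1, 0.0, 0.1])
loudest = extract_loudest_section(x, duration=0.375)  # 3-sample window: [0.9, -1.0, 0.8]
```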
References
Median Filter¶
class sigment.transforms.MedianFilter(window_size, p=1.0, random_state=None)
Applies a median filter to the signal.
\[x'[t] = \mathrm{median} \underbrace{\Big[ \ldots, x[t-1], x[t], x[t+1], \ldots \Big]}_{\text{window size}}\]
- Parameters
- window_size: int [window_size > 1] or tuple
The size of the window of neighbouring samples.
Notes
A sampling rate is not required when applying this transformation.
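The median filter formula can be illustrated for a mono signal as follows. How Sigment treats the signal boundaries is not stated here, so the edge padding used in this sketch is an assumption, as is the function name `median_filter`.

```python
import numpy as np

def median_filter(x, window_size):
    """Replace each sample of a 1-D signal by the median of its neighbourhood."""
    x = np.asarray(x, dtype=float)
    left = (window_size - 1) // 2
    right = window_size - 1 - left
    # Repeat the edge samples so that every position has a full window (assumption).
    padded = np.pad(x, (left, right), mode="edge")
    return np.array([np.median(padded[t:t + window_size]) for t in range(len(x))])

x = np.array([1.0, 9.0, 2.0, 8.0, 3.0])
smoothed = median_filter(x, window_size=3)  # [1.0, 2.0, 8.0, 3.0, 3.0]
```

A stereo signal could be handled by applying the same function to each channel in turn.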
Using transformations¶
Each transformation class comes with a number of methods that can be used to apply the transformation to either a numpy.ndarray or a WAV file.
All transformations accept the p and random_state parameters, inherited from the Transform base class described below.
class sigment.transforms.Transform(p, random_state)
Base class representing a single transformation or augmentation.
Note
As this is a base class, it should not be directly instantiated.
You can, however, use it to create your own transformations, following the implementation of the pre-defined transformations in Sigment.
- Parameters
- p: float [0 <= p <= 1]
The probability of executing the transformation.
- random_state: numpy.random.RandomState, int, optional
A random state object or seed for reproducible randomness.
__call__(self, X, sr=None)
Runs the transformation on a provided input signal.
- Parameters
- X: numpy.ndarray [shape (T,) or (1xT) for mono, (2xT) for stereo]
The input signal to transform.
- sr: int, optional
The sample rate for the input signal.
Note
Not required if the transformation does not rely on sr.
- Returns
- transformed: numpy.ndarray [shape (T,) for mono, (2xT) for stereo]
The transformed signal.
Note
If a mono signal X of shape (1xT) was used, the output is reshaped to (T,).
Examples
>>> import numpy as np
>>> from sigment.transforms import PitchShift
>>> # Create an example stereo signal.
>>> X = np.array([
...     [0.325, 0.53 , 1.393, 1.211],
...     [1.21 , 0.834, 1.022, 0.38 ]
... ])
>>> # Create the pitch-shifting transformation object.
>>> shift = PitchShift(n_steps=(-1., 1.))
>>> # Run the __call__ method on the transformation object to transform X.
>>> # NOTE: Pitch shifting requires a sample rate when called.
>>> X_shift = shift(X, sr=10)
generate(self, X, n, sr=None)
Runs the transformation on a provided input signal, producing multiple augmented copies of the input signal.
- Parameters
- X: numpy.ndarray [shape (T,) or (1xT) for mono, (2xT) for stereo]
The input signal to transform.
- n: int [n > 0]
Number of augmented copies of X to generate.
- sr: int, optional
The sample rate for the input signal.
Note
Not required if not using transformations that require a sample rate.
- Returns
- augmented: List[numpy.ndarray] or numpy.ndarray
The augmented copies (or copy if n=1) of the signal X.
Note
If a mono signal X of shape (1xT) was used, the output is reshaped to (T,).
Examples
>>> import numpy as np
>>> from sigment.transforms import GaussianWhiteNoise
>>> # Create an example stereo signal.
>>> X = np.array([
...     [0.325, 0.53 , 1.393, 1.211],
...     [1.21 , 0.834, 1.022, 0.38 ]
... ])
>>> # Create the Gaussian white noise transformation object.
>>> add_noise = GaussianWhiteNoise(scale=(0.05, 0.15))
>>> # Generate 5 augmented versions of X, using the noise transformation.
>>> Xs_noisy = add_noise.generate(X, n=5)
apply_to_wav(self, source, out=None)
Applies the augmentation to the provided input WAV file and writes the resulting signal back to a WAV file.
- Parameters
- source: str, Path or path-like
Path to the input WAV file.
- out: str, Path or path-like
Output WAV path for the augmented signal.
Warning
If out is set to None (which is the default) or the same as source, the input WAV file will be overwritten!
Examples
>>> import numpy as np
>>> from sigment.transforms import Identity
>>> # Create the identity transformation object.
>>> identity = Identity()
>>> # Apply the transformation to the input WAV file and write it to the output file.
>>> identity.apply_to_wav('in.wav', 'out.wav')
generate_from_wav(self, source, n=1)
Applies the augmentation to the provided input WAV file and returns a numpy.ndarray.
- Parameters
- source: str, Path or path-like
Path to the input WAV file.
- n: int [n > 0]
Number of augmented versions of the source signal to generate.
- Returns
- augmented: List[numpy.ndarray] or numpy.ndarray
The augmented versions (or version if n=1) of the source signal.
Examples
>>> import numpy as np
>>> from sigment.transforms import LinearFade
>>> # Create the fade-in transformation object.
>>> fade_in = LinearFade(direction='in', fade_size=(0.025, 0.1))
>>> # Generate 5 augmented versions of the signal data from 'signal.wav'
>>> # as numpy.ndarrays, using the fade-in transformation.
>>> Xs_faded = fade_in.generate_from_wav('signal.wav', n=5)
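The roles of the p and random_state parameters of the Transform base class can be pictured with a small stand-alone sketch. It does not import or subclass Sigment's Transform, whose internal interface is not documented here; the class names `SketchTransform` and `Negate` and the `_transform` hook are hypothetical.

```python
import numpy as np

class SketchTransform:
    """Hypothetical stand-in showing how `p` and `random_state` can interact."""

    def __init__(self, p=1.0, random_state=None):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must lie in [0, 1]")
        self.p = p
        # Accept an existing RandomState, or build one from an int seed or None.
        if isinstance(random_state, np.random.RandomState):
            self.random_state = random_state
        else:
            self.random_state = np.random.RandomState(random_state)

    def _transform(self, X):
        raise NotImplementedError

    def __call__(self, X):
        # Apply the transformation with probability p, otherwise return X unchanged.
        if self.random_state.uniform() < self.p:
            return self._transform(X)
        return X

    def generate(self, X, n):
        # Produce n independently augmented copies of X.
        return [self(X) for _ in range(n)]

class Negate(SketchTransform):
    """Toy transformation that flips the sign of every sample."""

    def _transform(self, X):
        return -X

X = np.array([1.0, -2.0])
always = Negate(p=1.0, random_state=0)(X)                 # always applied
copies = Negate(p=0.5, random_state=0).generate(X, n=20)  # applied to roughly half
```

Two objects created with the same integer seed make the same sequence of random decisions, which is what makes an augmentation run reproducible.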
Quantifiers¶
Quantifiers are used to specify rules for how a sequence of transformations or quantifiers should be applied.
Each quantifier class is a subclass of the more generic Quantifier base class, which provides a basic interface that can also be used to write custom quantifiers, though there is rarely a need for this.
As with transformations, Sigment offers a familiar interface for quantifiers, taking inspiration from popular augmentation libraries such as imgaug and nlpaug.
Available quantifiers¶
In the table below, the steps argument is of type List[Transform, Quantifier], specifying a sequence of transformations or quantifiers to be applied.
Quantifier | Summary | Main parameters | Notes |
---|---|---|---|
Quantifier (Base): Quantifier(steps, **kwargs) | A base class representing a single quantifier. | None | As this is a base class, it should not be initialized. |
Pipeline(steps, **kwargs) | Sequentially executes each transformation or quantifier step. | None | None |
Sometimes(steps, p, **kwargs) | Probabilistically applies the provided transformation or quantifier steps. | p: 0 <= float <= 1. The probability of executing the transformation or quantifier steps. | None |
SomeOf(steps, n, **kwargs) | Randomly applies a number of the provided transformation or quantifier steps. | n: tuple or int > 0. The number of transformation or quantifier steps to apply. | The chosen steps will still be applied in the same order they were defined by default. |
OneOf(steps, **kwargs) | Randomly applies a single step from the provided transformation or quantifier steps. | None | This is a special case of the |
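The behaviour summarized in the table can be mimicked with plain Python functions that compose callables. This is only a sketch of the semantics, not Sigment's classes: the lowercase function names are hypothetical, the standard library `random.Random` stands in for random_state, and a tuple n is assumed to be an inclusive range of step counts.

```python
import random

def pipeline(steps):
    """Run every step in order."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run

def sometimes(steps, p, rng):
    """Run all steps with probability p, otherwise return the input unchanged."""
    run_all = pipeline(steps)
    return lambda x: run_all(x) if rng.random() < p else x

def some_of(steps, n, rng):
    """Run a random subset of the steps, keeping the order in which they were defined."""
    def run(x):
        k = rng.randint(*n) if isinstance(n, tuple) else n
        chosen = sorted(rng.sample(range(len(steps)), k))
        for i in chosen:
            x = steps[i](x)
        return x
    return run

def one_of(steps, rng):
    """Run exactly one randomly chosen step."""
    return lambda x: rng.choice(steps)(x)

# Toy steps that append a letter, so the order of application is visible.
def add_a(s): return s + "a"
def add_b(s): return s + "b"
def add_c(s): return s + "c"

rng = random.Random(0)
in_order = pipeline([add_a, add_b, add_c])("")             # "abc"
subset = some_of([add_a, add_b, add_c], n=(1, 2), rng=rng)("")
single = one_of([add_a, add_b, add_c], rng=rng)("")
```

Because each function returns a callable, the sketches nest in the same way as the quantifier objects in the examples that follow.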
Using quantifiers¶
class sigment.quantifiers.Quantifier(steps, [main params, ]random_order=False, random_state=None)
Base class representing a single quantifier.
Note
As Quantifier is a base class, it should not be directly instantiated; use one of the quantifier classes listed above.
- Parameters
steps (List[Transform, Quantifier]) – A collection of transformation or quantifier steps to apply.
random_order (bool) – Whether or not to randomize the order of execution of steps.
random_state (numpy.random.RandomState, int or None) – A random state object or seed for reproducible randomness.
__call__(self, X, sr=None)
Runs the quantifier steps on a provided input signal.
- Parameters
X (numpy.ndarray \((T,)\) or \((1\times T)\) for mono, \((2\times T)\) for stereo) – The input signal to transform.
sr (int \(> 0\) or None) – Sample rate. If the steps of the quantifier do not depend on a sample rate, this should be None (which is the default). See the transformations table to determine whether you need a sample rate or not.
- Returns
The transformed signal.
- Return type
numpy.ndarray \((T,)\) for mono, \((2\times T)\) for stereo
Example:
    import numpy as np
    from sigment.quantifiers import SomeOf
    from sigment.transforms import GaussianWhiteNoise, PitchShift, EdgeCrop

    # Create an example stereo signal.
    X = np.array([
        [0.325, 0.53 , 1.393, 1.211],
        [1.21 , 0.834, 1.022, 0.38 ]
    ])

    # Use the SomeOf quantifier to run only 1 to 2 of the transformations.
    transform = SomeOf([
        GaussianWhiteNoise(scale=(0.05, 0.15)),
        PitchShift(n_steps=(-1., 1.)),
        EdgeCrop(side='start', crop_size=(0.02, 0.05))
    ], n=(1, 2))

    # Run the __call__ method on the quantifier object to transform X.
    # NOTE: Pitch shifting requires a sample rate when called, therefore
    # we must call the quantifier with a specified sample rate parameter.
    X_transform = transform(X, sr=10)
generate(self, X, n, sr=None)
Runs the quantifier steps on a provided input signal, producing multiple augmented copies of the input signal.
- Parameters
X (numpy.ndarray \((T,)\) or \((1\times T)\) for mono, \((2\times T)\) for stereo) – The input signal to transform.
n (int \(> 0\)) – Number of augmented versions of X to generate.
sr (int \(> 0\) or None) – Sample rate. If the steps of the quantifier do not depend on a sample rate, this should be None (which is the default). See the transformations table to determine whether you need a sample rate or not.
- Returns
The augmented versions (or version if n=1) of the signal X.
- Return type
List[numpy.ndarray] or numpy.ndarray
Example:
    import numpy as np
    from sigment.quantifiers import Sometimes, OneOf
    from sigment.transforms import LinearFade, GaussianWhiteNoise

    # Create an example stereo signal.
    X = np.array([
        [0.325, 0.53 , 1.393, 1.211],
        [1.21 , 0.834, 1.022, 0.38 ]
    ])

    # Use the Sometimes and OneOf quantifiers to sometimes (with probability 0.65)
    # apply a Gaussian white noise transformation and either a fade-in or fade-out.
    transform = Sometimes([
        GaussianWhiteNoise(scale=(0.05, 0.15)),
        OneOf([
            LinearFade(direction='in', fade_size=(0.05, 0.1)),
            LinearFade(direction='out', fade_size=(0.05, 0.1))
        ])
    ], p=0.65)

    # Generate 5 augmented versions of X, using the quantifier object.
    Xs_transform = transform.generate(X, n=5)
apply_to_wav(self, source, out=None)
Runs the quantifier steps on a provided input WAV file and writes the resulting signal back to a WAV file.
Warning
If out is set to None (which is the default) or the same as source, the input WAV file will be overwritten!
- Parameters
source (str, Path or path-like) – Path to the input WAV file.
out (str, Path or path-like) – Output WAV path for the augmented signal.
Example:
    from sigment import *

    # Example seed for reproducible randomness; the original snippet left
    # `seed` undefined, so any integer can be used here.
    seed = 0

    # Create a complex augmentation pipeline.
    transform = Pipeline([
        GaussianWhiteNoise(scale=(0.001, 0.0075), p=0.65),
        ExtractLoudestSection(duration=(0.85, 0.95)),
        OneOf([
            RandomCrop(crop_size=(0.01, 0.04), n_crops=(2, 5)),
            SomeOf([
                EdgeCrop('start', crop_size=(0.05, 0.1)),
                EdgeCrop('end', crop_size=(0.05, 0.1))
            ], n=(1, 2))
        ]),
        Sometimes([
            SomeOf([
                LinearFade('in', fade_size=(0.1, 0.2)),
                LinearFade('out', fade_size=(0.1, 0.2))
            ], n=(1, 2))
        ], p=0.5),
        TimeStretch(rate=(0.8, 1.2)),
        PitchShift(n_steps=(-0.25, 0.25)),
        MedianFilter(window_size=(5, 10), p=0.5)
    ], random_state=seed)

    # Apply the pipeline steps to the input WAV file and write it to the output file.
    # The file is read by apply_to_wav itself, so no separate loading step is needed.
    transform.apply_to_wav('in.wav', 'out.wav')
generate_from_wav(self, source, n=1)
Runs the quantifier steps on a provided input WAV file and returns a numpy.ndarray.
- Parameters
source (str, Path or path-like) – Path to the input WAV file.
n (int \(> 0\)) – Number of augmented versions of the source signal to generate.
- Returns
The augmented versions (or version if n=1) of the source signal.
- Return type
List[numpy.ndarray] or numpy.ndarray
Example:
    from sigment import *

    # Create a pipeline of multiple OneOf quantifiers.
    transform = Pipeline([
        OneOf([
            EdgeCrop(side='start', crop_size=(0.04, 0.08)),
            EdgeCrop(side='end', crop_size=(0.04, 0.08))
        ]),
        OneOf([
            LinearFade(direction='in', fade_size=(0.02, 0.05)),
            LinearFade(direction='out', fade_size=(0.02, 0.05))
        ])
    ])

    # Generate 5 augmented versions of the signal data from 'signal.wav' as numpy.ndarrays.
    Xs_transform = transform.generate_from_wav('signal.wav', n=5)