calpy.dsp package

Submodules

calpy.dsp.audio_features module

calpy.dsp.audio_features.dB_profile(signal, sampling_rate, time_step=0.01, frame_window=0.025)[source]

Computes the decibel level of the signal amplitude over an entire conversation.

Args:
signal (numpy.array(float)): Padded audio signal.
sampling_rate (float): Sampling frequency in Hz.
time_step (float, optional): The time interval (in seconds) between two dB values. Defaults to 0.01.
frame_window (float, optional): The length of speech (in seconds) used to estimate dB. Defaults to 0.025.
Returns:
numpy.array(float): The decibel values.
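
The framing implied by time_step and frame_window can be sketched as follows. This is a minimal illustration and not calpy's implementation: it assumes each dB value is 20*log10 of the frame's RMS amplitude, and the names db_profile_sketch and eps are hypothetical.

```python
import numpy as np

def db_profile_sketch(signal, sampling_rate, time_step=0.01, frame_window=0.025, eps=1e-12):
    """Hypothetical helper: one dB value per hop, from the RMS of each frame."""
    x = np.asarray(signal, dtype=float)
    hop = max(1, int(round(time_step * sampling_rate)))        # samples between frame starts
    width = max(1, int(round(frame_window * sampling_rate)))   # samples per frame
    n_frames = max(0, 1 + (len(x) - width) // hop)             # only full frames are used
    out = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + width]
        rms = np.sqrt(np.mean(frame ** 2))
        out[i] = 20.0 * np.log10(rms + eps)  # eps avoids log10(0) on silent frames
    return out
```

With the default arguments and a 1000 Hz sampling rate, frames are 25 samples wide and start every 10 samples.
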
calpy.dsp.audio_features.get_pause_length(pauses)[source]

Compute the length of each pause.

Args:
pauses (numpy array, bool): True indicates occurrence of pause.
Returns:
res (numpy array): The lengths of consecutive pauses.
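
One way to measure runs of consecutive True values is to look for rising and falling edges in the boolean array. The sketch below is an assumption about the general technique, not calpy's code; run_lengths_of_true is a hypothetical name, and lengths are counted in array elements rather than seconds.

```python
import numpy as np

def run_lengths_of_true(pauses):
    """Hypothetical helper: length of each maximal run of True values."""
    flags = np.asarray(pauses, dtype=bool).astype(int)
    # Pad with zeros so every run has both a rising and a falling edge.
    edges = np.diff(np.concatenate(([0], flags, [0])))
    starts = np.flatnonzero(edges == 1)    # index where each run begins
    ends = np.flatnonzero(edges == -1)     # index one past where each run ends
    return ends - starts

example = np.array([True, True, False, True, False, False, True, True, True])
lengths = run_lengths_of_true(example)
```
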
calpy.dsp.audio_features.mfcc_profile(signal, sampling_rate, time_step=0.01, frame_window=0.025, NFFT=512, nfilt=40, ceps=12)[source]

Compute MFCC for a long (usually over an entire conversation) sound signal.

Reference: http://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html

Args:
signal (numpy.array(float)): Padded audio signal.
sampling_rate (float): Sampling frequency in Hz.
time_step (float, optional): The time interval (in seconds) between two MFCCs. Defaults to 0.01.
frame_window (float, optional): The length of speech (in seconds) used to estimate MFCC. Defaults to 0.025.
NFFT (int, optional): NFFT-point FFT. Defaults to 512.
nfilt (int, optional): Number of frequency bands in Mel-scaling. Defaults to 40.
ceps (int, optional): Number of mel frequency cepstral coefficients to be retained. Defaults to 12.
Returns:
numpy.array(): Calculated Mel-Frequency Cepstral Coefficients matrix.
calpy.dsp.audio_features.pause_length_histogram(pauses, min_silence_duration=0.01, bins=30)[source]

Compute the histogram of pause lengths.

Args:
pauses (numpy array, bool): True indicates occurrence of pause.
min_silence_duration (float, optional): The minimum duration in seconds to be considered pause. Defaults to 0.01.
bins (int, optional): Defines the number of equal-width bins in the given range. Defaults to 30.
Returns:
hist (numpy array): The values of the histogram.
bin_edges (numpy array, float): The bin edges (length(hist)+1) in seconds.
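
The relationship between hist and bin_edges matches what numpy.histogram returns. The following sketch is illustrative only: pause_histogram_sketch and seconds_per_frame are hypothetical, and it assumes pause run lengths are given in frames spaced 0.01 s apart.

```python
import numpy as np

def pause_histogram_sketch(run_lengths, seconds_per_frame=0.01,
                           min_silence_duration=0.01, bins=30):
    """Hypothetical helper: histogram of pause durations in seconds."""
    durations = np.asarray(run_lengths, dtype=float) * seconds_per_frame
    # Discard runs too short to count as a pause.
    durations = durations[durations >= min_silence_duration]
    hist, bin_edges = np.histogram(durations, bins=bins)
    return hist, bin_edges

hist, bin_edges = pause_histogram_sketch([2, 1, 3, 40, 40, 7], bins=5)
```

Here bin_edges has one more element than hist, because each bin needs a left and a right edge.
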
calpy.dsp.audio_features.pause_profile(signal, sampling_rate, min_silence_duration=0.01, time_step=0.01, frame_window=0.025)[source]

Find pauses in audio.

Args:
signal (numpy.array(float)): Audio signal.
sampling_rate (float): Sampling frequency in Hz.
min_silence_duration (float, optional): The minimum duration in seconds to be considered pause. Defaults to 0.01.
time_step (float, optional): The time interval (in seconds) between two pauses. Defaults to 0.01.
frame_window (float, optional): The length of speech (in seconds) used to estimate pause. Defaults to 0.025.
Returns:
numpy.array(float): 1D numpy integer array of 0s and 1s, with 1s marking sounding.
calpy.dsp.audio_features.pitch_profile(signal, sampling_rate, time_step=0.01, frame_window=0.025, lower_threshold=75, upper_threshold=255)[source]

Compute pitch for a long (usually over an entire conversation) sound signal.

Args:
signal (numpy.array(float)): Padded audio signal.
sampling_rate (float): Sampling frequency in Hz.
time_step (float, optional): The time interval (in seconds) between two pitches. Defaults to 0.01.
frame_window (float, optional): The length of speech (in seconds) used to estimate pitch. Defaults to 0.025.
lower_threshold (int, optional): Defaults to 75.
upper_threshold (int, optional): Defaults to 255.
Returns:
numpy.array(float): Estimated pitch in Hz.
calpy.dsp.audio_features.remove_long_pauses(inputfilename, outputfilename, long_pause=0.5, min_silence_duration=0.01)[source]

Remove long pauses/silence in a wav file.

Args:
inputfilename (string): File name of input wav.
outputfilename (string): File name of output wav.
long_pause (float, optional): Minimum duration of silence to be considered a long pause, in seconds. Defaults to 0.5.
min_silence_duration (float, optional): The minimum duration in seconds to be considered pause. Defaults to 0.01.
Returns:
None: Writes a wav file to disk.

calpy.dsp.yin module

calpy.dsp.yin.absolute_threshold(signal, threshold)[source]

Absolute threshold. Step 4 of YIN.

Args:
signal (numpy.array(float)): A small piece of normalised self-correlated audio d’(t, tau) processed by normalisation(). 1D array-like.
threshold (float): Threshold value.
Returns:
float: The index tau.
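
Step 4 of YIN picks the smallest lag whose normalised difference falls below the threshold, then follows the curve down to the local minimum. A minimal sketch of that idea is shown below; absolute_threshold_sketch is a hypothetical name, and the fallback to the global minimum when no value is below the threshold follows the YIN paper rather than anything confirmed about calpy.

```python
import numpy as np

def absolute_threshold_sketch(d_prime, threshold=0.1):
    """Hypothetical helper: first lag under the threshold, refined to its local minimum."""
    d_prime = np.asarray(d_prime, dtype=float)
    tau = 1  # lag 0 is skipped: d'(0) is defined as 1
    while tau < len(d_prime):
        if d_prime[tau] < threshold:
            # Walk downhill while the next value is still smaller.
            while tau + 1 < len(d_prime) and d_prime[tau + 1] < d_prime[tau]:
                tau += 1
            return tau
        tau += 1
    # Assumed fallback: no dip below the threshold, so use the global minimum.
    return int(np.argmin(d_prime[1:])) + 1

curve = [1.0, 0.9, 0.5, 0.08, 0.05, 0.3, 0.02]
tau_found = absolute_threshold_sketch(curve, threshold=0.1)
```

In this example the first value below 0.1 is at lag 3, and the descent stops at lag 4, even though lag 6 holds a smaller value later in the curve.
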
calpy.dsp.yin.difference_function(signal)[source]

Calculate the difference function of the signal. Steps 1 and 2 of YIN.

Args:
signal (numpy.array(float)): A short audio signal. 1D array.
Returns:
numpy.array(float): Equation (6) of YIN. The difference function d(t, tau). 1D array.
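
The difference function in equation (6) of YIN sums squared differences between the signal and a copy of itself shifted by tau. A direct, unoptimised sketch follows; difference_sketch and its max_tau parameter are hypothetical and do not mirror the calpy signature.

```python
import numpy as np

def difference_sketch(signal, max_tau):
    """Hypothetical helper: d(tau) = sum_j (x[j] - x[j + tau])**2 for tau < max_tau."""
    x = np.asarray(signal, dtype=float)
    window = len(x) - max_tau  # number of samples compared at every lag
    if window <= 0:
        raise ValueError("max_tau must be smaller than the signal length")
    d = np.zeros(max_tau)
    for tau in range(1, max_tau):
        diff = x[:window] - x[tau:tau + window]
        d[tau] = np.dot(diff, diff)
    return d

# A sine wave with a period of 20 samples: d should dip to ~0 at tau = 20.
n = np.arange(200)
tone = np.sin(2 * np.pi * n / 20.0)
d = difference_sketch(tone, max_tau=50)
```
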
calpy.dsp.yin.instantaneous_pitch(signal, sampling_frequency, threshold=0.1)[source]

Computes fundamental frequency (based on YIN) as pitch of a given (usually a very short) time interval.

Code is an adaptation of https://github.com/ashokfernandez/Yin-Pitch-Tracking.

Args:
signal (numpy.array(float)): Audio signal. 1D array.
sampling_frequency (int): Sampling frequency in Hz.
threshold (float, optional): Absolute threshold value as defined in Step 4 of YIN. Defaults to 0.1.
Returns:
float: Fundamental frequency f0 in Hz (estimated speech pitch).
calpy.dsp.yin.normalisation(signal)[source]

Normalise the difference function by the cumulative mean. Step 3 of YIN.

Args:
signal (numpy.array(float)): A small piece of self-correlated audio signal d(t, tau) processed by difference_function(). 1D array.
Returns:
numpy.array(float): Equation (8) of YIN. Normalised difference function d’(t, tau). 1D array.
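
Equation (8) of YIN sets d’(0) = 1 and, for tau > 0, divides d(tau) by the mean of d(1)..d(tau). The sketch below illustrates that formula; cmnd_sketch is a hypothetical name, and returning 1 where the running sum is zero is an assumption made here to avoid division by zero.

```python
import numpy as np

def cmnd_sketch(d):
    """Hypothetical helper: cumulative mean normalised difference function."""
    d = np.asarray(d, dtype=float)
    out = np.ones_like(d)               # d'(0) is defined as 1
    running_sum = np.cumsum(d[1:])      # sum of d(1)..d(tau) for tau = 1, 2, ...
    taus = np.arange(1, len(d))
    safe = np.where(running_sum > 0, running_sum, 1.0)
    # d'(tau) = d(tau) * tau / sum_{j=1..tau} d(j); assumed 1 when the sum is 0
    out[1:] = np.where(running_sum > 0, d[1:] * taus / safe, 1.0)
    return out

normalised = cmnd_sketch([0.0, 4.0, 2.0, 6.0])
```
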
calpy.dsp.yin.parabolic_interpolation(signal, tau)[source]

Parabolic Interpolation on tau. Step 5 in YIN.

Args:
signal (numpy.array(float)): A small piece of normalised self-correlated audio d’(t, tau) processed by normalisation(). 1D array.
tau (int): Estimated threshold.
Returns:
float: A better estimation of tau.
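
Parabolic interpolation fits a parabola through the value at tau and its two neighbours and returns the location of the vertex. The sketch below shows the standard three-point formula; parabolic_sketch is a hypothetical name, and returning tau unchanged at the array edges or for a flat neighbourhood is an assumption.

```python
def parabolic_sketch(values, tau):
    """Hypothetical helper: refine an integer lag using a three-point parabola fit."""
    if tau <= 0 or tau >= len(values) - 1:
        return float(tau)  # no neighbour on one side, nothing to fit
    y0, y1, y2 = values[tau - 1], values[tau], values[tau + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0:
        return float(tau)  # the three points are collinear
    # Vertex of the parabola through (-1, y0), (0, y1), (1, y2), shifted by tau.
    return tau + (y0 - y2) / (2.0 * denom)

# Samples of (x - 3.3)**2 at integer x: the true minimum lies between 3 and 4.
samples = [(x - 3.3) ** 2 for x in range(7)]
refined = parabolic_sketch(samples, 3)
```

Because the sampled curve is itself a parabola, the refined lag recovers the true minimum at 3.3 up to floating-point error.
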