calpy.dsp package
Submodules
calpy.dsp.audio_features module
-
calpy.dsp.audio_features.dB_profile(signal, sampling_rate, time_step=0.01, frame_window=0.025)
Compute the decibel level of the signal amplitude over an entire conversation.
- Args:
- signal (numpy.array(float)): Padded audio signal.
- sampling_rate (float): Sampling frequency in Hz.
- time_step (float, optional): The time interval (in seconds) between two dB values. Defaults to 0.01.
- frame_window (float, optional): The length of speech (in seconds) used to estimate dB. Defaults to 0.025.
- Returns:
- numpy.array(float): The decibel values.
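A minimal sketch of a frame-wise dB computation, for illustration only. It assumes the level is 20*log10 of the RMS amplitude of each frame. This page does not state the reference level or windowing that calpy uses, so the formula, the `eps` floor, and the name `db_profile_sketch` are assumptions. Plain Python lists stand in for numpy arrays to keep the sketch dependency-free.

```python
import math

def db_profile_sketch(signal, sampling_rate, time_step=0.01, frame_window=0.025):
    """Illustrative only (not calpy's code): RMS level in dB for each frame."""
    hop = int(round(time_step * sampling_rate))        # samples between frame starts
    width = int(round(frame_window * sampling_rate))   # samples per frame
    eps = 1e-12  # assumed floor that avoids log10(0) on silent frames
    levels = []
    for start in range(0, len(signal) - width + 1, hop):
        frame = signal[start:start + width]
        rms = math.sqrt(sum(x * x for x in frame) / width)
        levels.append(20.0 * math.log10(max(rms, eps)))
    return levels

# One second of a 100 Hz sine with amplitude 1, sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]
profile = db_profile_sketch(tone, 8000)
```

With the defaults, each frame covers 200 samples and frames start every 80 samples, so the one-second tone yields one value per 10 ms step until the window no longer fits.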
-
calpy.dsp.audio_features.get_pause_length(pauses)
Compute the lengths of pauses.
- Args:
- pauses (numpy array, bool): True indicates occurrence of a pause.
- Returns:
- res (numpy array): The length of consecutive pauses.
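The idea of measuring consecutive pauses can be sketched as a run-length count over the boolean flags. This is an assumption about what "length of consecutive pauses" means (one length, in frames, per run of True values). The helper name `pause_run_lengths` is hypothetical and is not part of calpy.

```python
from itertools import groupby

def pause_run_lengths(pauses):
    """Illustrative only: length (in frames) of each run of consecutive True values."""
    return [sum(1 for _ in run) for is_pause, run in groupby(pauses) if is_pause]

flags = [False, True, True, False, True, True, True, False, False, True]
lengths = pause_run_lengths(flags)  # [2, 3, 1]
```

Multiplying each length by the frame time step would convert frames to seconds.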
-
calpy.dsp.audio_features.mfcc_profile(signal, sampling_rate, time_step=0.01, frame_window=0.025, NFFT=512, nfilt=40, ceps=12)
Compute MFCCs for a long sound signal (usually an entire conversation).
Reference: http://haythamfayek.com/2016/04/21/speech-processing-for-machine-learning.html
- Args:
- signal (numpy.array(float)): Padded audio signal.
- sampling_rate (float): Sampling frequency in Hz.
- time_step (float, optional): The time interval (in seconds) between two MFCC frames. Defaults to 0.01.
- frame_window (float, optional): The length of speech (in seconds) used to estimate MFCC. Defaults to 0.025.
- NFFT (int, optional): NFFT-point FFT. Defaults to 512.
- nfilt (int, optional): Number of frequency bands in Mel-scaling. Defaults to 40.
- ceps (int, optional): Number of mel-frequency cepstral coefficients to be retained. Defaults to 12.
- Returns:
- numpy.array(): Calculated Mel-Frequency Cepstral Coefficients matrix.
-
calpy.dsp.audio_features.pause_length_histogram(pauses, min_silence_duration=0.01, bins=30)
Compute the histogram of pause lengths.
- Args:
- pauses (numpy array, bool): True indicates occurrence of a pause.
- min_silence_duration (float, optional): The minimum duration in seconds to be considered a pause. Defaults to 0.01.
- bins (int, optional): The number of equal-width bins in the given range. Defaults to 30.
- Returns:
- hist (numpy array): The values of the histogram.
- bin_edges (numpy array, float): The bin edges (length(hist)+1) in seconds.
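The return shape described above (counts plus one more edge than there are bins) can be illustrated with a small equal-width histogram. This is a dependency-free sketch, not calpy's implementation. The function name `pause_histogram_sketch` and the example durations are made up for illustration.

```python
def pause_histogram_sketch(durations, bins=30):
    """Illustrative only: equal-width histogram of pause durations in seconds."""
    lo, hi = min(durations), max(durations)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    bin_edges = [lo + i * width for i in range(bins + 1)]
    hist = [0] * bins
    for d in durations:
        # The last bin is closed on the right so the maximum value is counted.
        index = min(int((d - lo) / width), bins - 1)
        hist[index] += 1
    return hist, bin_edges

durations = [0.02, 0.03, 0.05, 0.10, 0.11, 0.50]  # hypothetical pause lengths
hist, bin_edges = pause_histogram_sketch(durations, bins=4)
```

Here five short pauses fall into the first bin and the single long pause into the last one, which is the kind of skew a pause-length histogram is meant to expose.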
-
calpy.dsp.audio_features.pause_profile(signal, sampling_rate, min_silence_duration=0.01, time_step=0.01, frame_window=0.025)
Find pauses in audio.
- Args:
- signal (numpy.array(float)): Audio signal.
- sampling_rate (float): Sampling frequency in Hz.
- min_silence_duration (float, optional): The minimum duration in seconds to be considered a pause. Defaults to 0.01.
- time_step (float, optional): The time interval (in seconds) between two consecutive pause estimates. Defaults to 0.01.
- frame_window (float, optional): The length of speech (in seconds) used to estimate a pause. Defaults to 0.025.
- Returns:
- numpy.array(int): 1D array of 0s and 1s, with 1s marking sounding.
-
calpy.dsp.audio_features.pitch_profile(signal, sampling_rate, time_step=0.01, frame_window=0.025, lower_threshold=75, upper_threshold=255)
Compute pitch for a long sound signal (usually an entire conversation).
- Args:
- signal (numpy.array(float)): Padded audio signal.
- sampling_rate (float): Sampling frequency in Hz.
- time_step (float, optional): The time interval (in seconds) between two pitch estimates. Defaults to 0.01.
- frame_window (float, optional): The length of speech (in seconds) used to estimate pitch. Defaults to 0.025.
- lower_threshold (int, optional): Defaults to 75.
- upper_threshold (int, optional): Defaults to 255.
- Returns:
- numpy.array(float): Estimated pitch in Hz.
-
calpy.dsp.audio_features.remove_long_pauses(inputfilename, outputfilename, long_pause=0.5, min_silence_duration=0.01)
Remove long pauses/silence in a wav file.
- Args:
- inputfilename (string): File name of the input wav.
- outputfilename (string): File name of the output wav.
- long_pause (float, optional): Minimum duration of silence to be considered a long pause, in seconds. Defaults to 0.5.
- min_silence_duration (float, optional): The minimum duration in seconds to be considered a pause. Defaults to 0.01.
- Returns:
- None: Writes a wav file to disk.
calpy.dsp.yin module
-
calpy.dsp.yin.absolute_threshold(signal, threshold)
Absolute threshold. Step 4 in YIN.
- Args:
- signal (numpy.array(float)): A short segment of the normalised difference function d’(t, tau), as produced by normalisation(). 1D array-like.
- threshold (float): Threshold value.
- Returns:
- float: The index tau.
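A minimal sketch of the absolute-threshold step as described in the YIN paper: take the first lag whose normalised difference drops below the threshold, then follow that dip down to its local minimum. The fallback to the global minimum when nothing crosses the threshold follows the paper. Whether calpy uses the same fallback is not stated here, so treat it as an assumption. The name `absolute_threshold_sketch` is hypothetical.

```python
def absolute_threshold_sketch(normalised, threshold=0.1):
    """Illustrative only: pick the lag tau for YIN step 4."""
    tau = 1  # d'(0) is 1 by definition, so the search starts at lag 1
    while tau < len(normalised):
        if normalised[tau] < threshold:
            # Walk down to the bottom of this dip.
            while tau + 1 < len(normalised) and normalised[tau + 1] < normalised[tau]:
                tau += 1
            return tau
        tau += 1
    # Assumed fallback (from the YIN paper): the global minimum.
    return min(range(len(normalised)), key=normalised.__getitem__)

d_prime = [1.0, 0.9, 0.5, 0.08, 0.05, 0.2, 0.03, 0.4]
tau_found = absolute_threshold_sketch(d_prime, threshold=0.1)      # 4
tau_fallback = absolute_threshold_sketch(d_prime, threshold=0.01)  # 6
```

Note that the first call returns lag 4 rather than the deeper dip at lag 6: preferring the earliest sufficiently deep dip is what guards against picking a multiple of the true period.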
-
calpy.dsp.yin.difference_function(signal)
Calculate the difference function of the signal. Steps 1 and 2 of YIN.
- Args:
- signal (numpy.array(float)): A short audio signal. 1D array.
- Returns:
- numpy.array(float): Equation (6) of YIN. The difference function d(t, tau). 1D array.
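Equation (6) of YIN sums the squared differences between the signal and a copy of itself shifted by tau. A minimal sketch follows, assuming the integration window is half the signal length (a common choice in YIN implementations, not confirmed for calpy). The name `difference_function_sketch` is hypothetical.

```python
def difference_function_sketch(signal):
    """Illustrative only: d(tau) = sum_j (x[j] - x[j + tau])**2, window = len(signal) // 2."""
    window = len(signal) // 2
    return [
        sum((signal[j] - signal[j + tau]) ** 2 for j in range(window))
        for tau in range(window)
    ]

# A signal that repeats every 4 samples.
periodic = [0, 1, 0, -1] * 4
d = difference_function_sketch(periodic)
```

For this signal the difference is zero at lags 0 and 4 (the period) and largest at lag 2 (half a period), which is the pattern the later YIN steps search for.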
-
calpy.dsp.yin.instantaneous_pitch(signal, sampling_frequency, threshold=0.1)
Compute the fundamental frequency (based on YIN) as the pitch of a given (usually very short) time interval.
Code is an adaptation of https://github.com/ashokfernandez/Yin-Pitch-Tracking.
- Args:
- signal (numpy.array(float)): Audio signal. 1D array.
- sampling_frequency (int): Sampling frequency in Hz.
- threshold (float, optional): Absolute threshold value as defined in Step 4 of YIN. Defaults to 0.1.
- Returns:
- f0 (float): Fundamental frequency in Hz (estimated speech pitch).
-
calpy.dsp.yin.normalisation(signal)
Normalise the difference function by the cumulative mean. Step 3 of YIN.
- Args:
- signal (numpy.array(float)): A short segment of the difference function d(t, tau), as produced by difference_function(). 1D array.
- Returns:
- numpy.array(float): Equation (8) of YIN. Normalised difference function d’(t, tau). 1D array.
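Equation (8) of YIN divides each d(tau) by the mean of d over lags 1 to tau and defines the value at lag 0 as 1. A minimal sketch of that cumulative-mean normalisation, with the hypothetical name `cumulative_mean_normalise` and a guard for an all-zero input that is an assumption of this sketch:

```python
def cumulative_mean_normalise(diff):
    """Illustrative only: d'(0) = 1, d'(tau) = d(tau) * tau / sum(d[1..tau])."""
    normalised = [1.0]
    running_sum = 0.0
    for tau in range(1, len(diff)):
        running_sum += diff[tau]
        # Assumed guard: return 1.0 when the running sum is zero (silent input).
        normalised.append(diff[tau] * tau / running_sum if running_sum else 1.0)
    return normalised

d_prime = cumulative_mean_normalise([0.0, 2.0, 4.0, 6.0])  # [1.0, 1.0, 1.333..., 1.5]
```

Because the result starts at 1 and only falls below 1 where d(tau) is small relative to its running average, a fixed threshold can be applied without the zero-lag dip being mistaken for a period.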
-
calpy.dsp.yin.parabolic_interpolation(signal, tau)
Parabolic interpolation on tau. Step 5 in YIN.
- Args:
- signal (numpy.array(float)): A short segment of the normalised difference function d’(t, tau), as produced by normalisation(). 1D array.
- tau (int): Estimated lag tau from the absolute threshold step.
- Returns:
- float: A better estimate of tau.
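Parabolic interpolation fits a parabola through the value at tau and its two neighbours and returns the abscissa of the vertex. A minimal sketch follows. The boundary and flat-curve handling (returning tau unchanged) is an assumption of this sketch, and the name `parabolic_interpolation_sketch` is hypothetical.

```python
def parabolic_interpolation_sketch(values, tau):
    """Illustrative only: refine an integer lag using its two neighbours."""
    if tau < 1 or tau + 1 >= len(values):
        return float(tau)  # assumed behaviour: no refinement at the array edges
    s0, s1, s2 = values[tau - 1], values[tau], values[tau + 1]
    curvature = s0 - 2.0 * s1 + s2
    if curvature == 0:
        return float(tau)  # the three points are collinear, nothing to refine
    # Vertex of the parabola through (tau-1, s0), (tau, s1), (tau+1, s2).
    return tau + (s0 - s2) / (2.0 * curvature)

# Samples of a parabola whose true minimum lies at 3.3.
samples = [(x - 3.3) ** 2 for x in range(7)]
refined = parabolic_interpolation_sketch(samples, 3)
```

In YIN, the refined lag is then converted to a pitch estimate by dividing the sampling frequency by it.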