Sound

Generating sounds

The Sound class provides methods for generating, manipulating, displaying, and analysing sound stimuli. You can generate typical experimental stimuli with this class, including tones, noises, and click trains, as well as more specialized stimuli, like equally-masking noises, Schroeder-phase harmonics, iterated ripple noise, and synthetic vowels. Slab methods assume sensible defaults where possible. You can call most methods without arguments to get an impression of what they do (for instance, slab.Sound.tone() returns a 1 s long, 1 kHz tone at 70 dB, sampled at 8 kHz) and then customise from there. As an example, let's make a 500 ms long, 500 Hz pure tone signal with a band-limited (one octave below and above the tone) pink noise background at a 10 dB signal-to-noise ratio:

tone = slab.Sound.tone(frequency=500, duration=0.5)
tone.level = 80 # setting the intensity to 80 dB
noise = slab.Sound.pinknoise(duration=0.5)
noise = noise.filter(frequency=(250, 1000), kind='bp') # bandpass .25 to 1 kHz
noise.level = 70 # 10 dB lower than the tone
stimulus = tone + noise # combine the two signals
stimulus = stimulus.ramp() # apply on- and offset ramps to avoid clicks
stimulus.play()

Sound objects have many useful methods for manipulating them (like ramp(), filter(), and pulse()) or inspecting them (like waveform(), spectrum(), and spectral_feature()). A complete list is in the Reference section of the documentation, and most methods are also discussed here. If you use IPython, you can press the tab key after typing slab.Sound., or the name of any Sound object followed by a full stop, to get an interactive list of the possibilities.

Sounds can also be created by recording them with slab.Sound.record(). For instance, recording = slab.Sound.record(duration=1.0, samplerate=44100) records a 1-second sound at 44100 Hz from the default audio input (usually the microphone). The record method uses SoundCard if it is installed, or SoX (via a temporary file) otherwise. Both are cross-platform and easy to install. If neither tool is installed, you cannot record sounds.
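
For example, a quick record-and-save round trip, using the record() call from above and the write() method described under Saving and loading sounds:

recording = slab.Sound.record(duration=1.0, samplerate=44100) # 1 s from the default input
recording.play() # listen to the recording
recording.write('recording.wav') # save it as a wav file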

Specifying durations

Sometimes it is useful to specify the duration of a stimulus in samples rather than seconds. All methods that generate sounds have a duration argument that accepts floating point numbers or integers. Floating point numbers are interpreted as durations in seconds (slab.Sound.tone(duration=1.0) results in a 1 second tone). Integers are interpreted as number of samples (slab.Sound.tone(duration=1000) gives you 1000 samples of a tone).
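
For example (assuming the default 8 kHz sample rate, see below):

sig = slab.Sound.tone(duration=1.0) # float: 1 second
sig.n_samples
out: 8000
sig = slab.Sound.tone(duration=1000) # integer: 1000 samples
sig.n_samples
out: 1000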

Setting the sample rate

We did not specify a sample rate for any of the stimuli in the examples above. When the samplerate argument of a sound-generating method is not specified, the default sample rate (8 kHz if not set otherwise) is used. It is possible to set a sample rate separately for each Sound object, but it is usually better to set a suitable default sample rate at the start of your script or Python session using slab.set_default_samplerate(). This rate is kept in the class variable _default_samplerate and is used whenever you call a sound-generating method without specifying a rate. The appropriate rate depends on the frequency content of your stimuli and should be at least double the highest frequency of interest (the Nyquist criterion). For some speech sounds or narrow-band noises you might get away with 8 kHz; for spatial sounds you may need 48 kHz or more.
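
For instance, to work at CD quality for the rest of a session:

slab.set_default_samplerate(44100)
sig = slab.Sound.tone() # generated at 44.1 kHz
sig.samplerate
out: 44100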

Specifying levels

As with the sample rate, sounds are generated at a default level (70 dB if not set otherwise). The default is kept in the class variable _default_level and you can set it to a different value using slab.set_default_level(). Levels are not specified directly when generating sounds, but afterwards, by setting the level property:

sig = slab.Sound.pinknoise()
sig.level # return the current level
sig.level = 85 # set a new level

Note that the returned level will not be the actual physical playback level, because that depends on the playback hardware (soundcard, amplifiers, headphones, speakers). Calibrate your system if you need to play stimuli at a known level (see Calibrating the output).

Calibrating the output

Each sound's level can be set individually by changing its level property, analogous to setting the default level at which sounds are generated with slab.set_default_level(). Setting the level property of a stimulus changes the root-mean-square amplitude of the waveform. Relative changes are correct (reducing the level attribute by 10 dB reduces the sound output by the same amount), but the absolute intensity is only correct if you calibrate your output. The recommended procedure is to set your system volume to maximum, connect the listening hardware (headphones or loudspeaker), and set up a sound level meter. Then call slab.calibrate(). The calibrate() function plays a 1 kHz tone for 5 seconds. Note the recorded intensity on the meter and enter it when requested. The function returns a calibration intensity, i.e. the difference between the tone's level attribute and the recorded level. Pass this value to slab.set_calibration_intensity() to correct the intensities returned by the level property of all sounds. The calibration intensity is saved in the class variable _calibration_intensity and is applied to all level calculations, so that a sound's level attribute now roughly corresponds to the actual output intensity in dB SPL. 'Roughly', because your output hardware may not have a flat frequency transfer function (some frequencies play louder than others). See Filters for methods to equalize transfer functions.
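
In code, the procedure described above boils down to two calls; run this once after connecting your hardware and the sound level meter:

intensity = slab.calibrate() # plays a 1 kHz tone for 5 s and asks for the measured level
slab.set_calibration_intensity(intensity) # corrects all subsequent level calculations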

Experiments sometimes require you to play different stimuli at comparable loudness. Loudness is the perception of sound intensity and is difficult to calculate. You can use the Sound.aweight() method to filter a sound so that its frequencies are weighted according to typical human hearing thresholds. This increases the correspondence between the rms intensity returned by the level attribute and the perceived loudness. In most cases, however, controlling relative intensities is sufficient.
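
A minimal sketch, assuming that aweight(), like the other filtering methods, returns a weighted copy of the sound:

sig = slab.Sound.pinknoise()
weighted = sig.aweight() # A-weighted copy of the noise (assumption: returns a new sound)
sig.level, weighted.level # the weighted level tracks perceived loudness more closely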

To increase the accuracy of the calibration for your experimental stimuli, pass a sound with a similar spectrum to slab.calibrate(). For instance, if your stimuli are wideband pink noises, you may want to use a pink noise for calibration. The level of the noise should be high, but not so high that it causes clipping.

If you do not have a sound level meter, you can present sounds in dB HL (hearing level). For that, measure the hearing threshold of the listener at the frequency or frequencies presented in your experiment and play your stimuli at a set level above that threshold. You can measure the hearing threshold at one frequency (or for any broadband sound) with a few lines of code (see Audiogram).

Saving and loading sounds

You can save sounds to wav files by calling the object's Sound.write() method (signal.write('signal.wav')). By default, sounds are normalized to a maximal amplitude of 1 to avoid clipping when writing the file. When loading a sound from file, set signal.level to the intended level, or disable the normalization when writing if you know what you are doing. You can load a wav file by initializing a Sound object with the filename: signal = slab.Sound('signal.wav').
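
For example, a write-and-load round trip that restores the intended level:

sig = slab.Sound.tone()
sig.level = 75
sig.write('tone.wav') # samples are normalized to a maximal amplitude of 1
loaded = slab.Sound('tone.wav')
loaded.level = 75 # restore the intended level after loading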

Combining sounds

Several functions allow you to string stimuli together. For instance, in a forward masking experiment [1], we need a masker followed by a brief silent interval and then a target sound. An example implementation of a complete experiment is discussed in the Psychoacoustics section, but here we will construct the stimulus:

masker = slab.Sound.tone(frequency=550, duration=0.5) # a 0.5s 550 Hz tone
masker.level = 80 # at 80 dB
masker = masker.ramp() # default 10 ms raised cosine ramps
silence = slab.Sound.silence(duration=0.01) # 10 ms silence
signal = slab.Sound.tone(duration=0.05) # using the default 500 Hz
signal.level = 80 # let's start at the same intensity as the masker
signal = signal.ramp(duration=0.005) # short signal, so we use 5 ms ramps
stimulus = slab.Sound.sequence(masker, silence, signal)
stimulus.play()

We can make a classic non-interactive demonstration of forward masking by playing these stimuli with decreasing signal level in a loop, once without the masker, and once with the masker. Count for how many steps you can hear the signal tone:

import time # we need the sleep function
for level in range(80, 10, -5): # down from 80 in steps of 5 dB
    signal.level = level
    signal.play()
    time.sleep(0.5)
# now with the masker
for level in range(80, 10, -5): # down from 80 in steps of 5 dB
    signal.level = level
    stimulus = slab.Sound.sequence(masker, silence, signal)
    stimulus.play()
    time.sleep(0.5)

Many listeners can hear all of the steps without the masker, but only the first 6 or 7 steps with the masker. This depends on the intensity at which you play the demo (see Calibrating the output above). The sequence() method accepts any number of sounds to be concatenated. If you have a list of sounds, unpack it into the function arguments: slab.Sound.sequence(*list_of_sound_objects).
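
For example, to concatenate a list of tones:

tones = [slab.Sound.tone(frequency=f, duration=0.2) for f in (400, 500, 625)]
melody = slab.Sound.sequence(*tones) # unpack the list into arguments
melody.play()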

Another method to put sounds together is crossfade(), which blends two sounds with a specified overlap in seconds. An interesting experimental use is in adaptation designs, in which a longer stimulus is played to adapt neuronal responses to its sound features, and then a new stimulus feature is introduced while nothing else changes. Responses (measured for instance with EEG) at that point are then mostly due to that feature. A classic example is the pitch onset response, which is evoked when the temporal fine structure of a continuous noise is regularized to produce a pitch percept without altering the sound spectrum (see Krumbholz et al. (2003)). It is easy to generate the main stimulus of that study, a noise transitioning to an iterated ripple noise after two seconds, with a 5 ms crossfade overlap, then filtered between 0.8 and 3.2 kHz:

slab.set_default_samplerate(16000) # we need a higher sample rate
slab.set_default_level(80)  # set the level for all sounds to 80 dB
adapter = slab.Sound.whitenoise(duration=2.0)
irn = slab.Sound.irn(frequency=125, n_iter=2, duration=1.0) # pitched sound
stimulus = slab.Sound.crossfade(adapter, irn, overlap=0.005) # crossfade
stimulus = stimulus.filter(frequency=[800, 3200], kind='bp') # filter
stimulus = stimulus.ramp(duration=0.005) # 5 ms on- and offset ramps
stimulus.spectrogram() # note that there is no change at the transition
stimulus.play() # but you can hear the onset of the regularity (pitch)

Plotting and analysis

You can inspect sounds by plotting the waveform(), spectrum(), or spectrogram():

import matplotlib.pyplot as plt
a = slab.Sound.vowel(vowel='a')
e = slab.Sound.vowel(vowel='e')
i = slab.Sound.vowel(vowel='i')
signal = slab.Sound.sequence(a, e, i)
# prepare a 2-by-2 figure
_, [[ax1, ax2], [ax3, ax4]] = plt.subplots(
                nrows=2, ncols=2, constrained_layout=True)
signal.waveform(axis=ax1, show=False)
signal.waveform(end=0.05, axis=ax2, show=False) # first 50ms
signal.spectrogram(upper_frequency=5000, axis=ax3, show=False)
signal.spectrum(axis=ax4)


Instead of plotting, spectrum() and spectrogram() return the time and frequency bins and the spectral power values for further analysis if you set the show argument to False. All plotting functions can draw into an existing matplotlib.pyplot axis supplied via the axis argument.
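
A minimal sketch of the non-plotting use; the order of the returned values here is an assumption, so check the Reference section:

power, freqs = signal.spectrum(show=False) # spectral power and frequency bins, no plot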

You can also extract common features from sounds, such as the crest_factor() (a measure of how 'peaky' the waveform is), or the average onset_slope() (a measure of how fast the on-ramps in the sound are, which is important for sound localization). Features of the spectral content are bundled in the spectral_feature() method. It can compute the spectral centroid, flux, flatness, and rolloff, either for an entire sound (suitable for stationary sounds) or for successive time windows (frames, suitable for time-varying sounds):

* The centroid is a measure of the center of mass of a spectrum (i.e. the 'center' frequency).
* The flux measures how quickly the power spectrum changes by comparing the power spectrum of one frame against that of the previous frame.
* The flatness measures how tone-like a sound is, as opposed to noise-like, and is calculated by dividing the geometric mean of the power spectrum by the arithmetic mean (see Dubnov (2004)).
* The rolloff measures the frequency at which the spectrum rolls off; it is typically used to find a suitable low-cutoff frequency that retains most of the sound power.

These particular features are integrated in slab because we find them useful in our daily work. Many more features are available in packages specialised in audio processing, for instance librosa. librosa interfaces easily with slab: you can hand the sample data and the sample rate of a slab object separately to most of its methods:

import librosa
sig = slab.Sound('music.wav') # load a wav file into a slab.Sound object
librosa.beat.beat_track(y=sig.data.flatten(), sr=sig.samplerate) # flatten the (n_samples, 1) array; pass a single channel for multichannel sounds
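
Back in slab, a minimal sketch of the bundled feature computation, assuming the feature argument takes the names listed above:

sig = slab.Sound.vowel(vowel='a')
sig.spectral_feature(feature='centroid') # center of mass of the spectrum, in Hz
sig.spectral_feature(feature='flatness') # near 0 for tone-like, near 1 for noise-like sounds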

When working with environmental sounds or other recorded stimuli, one often needs to compute relevant features for collections of recordings in different experimental conditions. The slab module contains a function slab.apply_to_path(), which applies a function to all sound files in a given folder and returns a dictionary of file names and computed features. In fact, you can also use that function to modify (for instance ramp and filter) all files in a folder.
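
A sketch of both uses; the argument names (path, method, out_path) are assumptions here, so check the Reference section for the exact signature:

# compute a feature for every sound file in a folder:
features = slab.apply_to_path(path='stimuli', method=slab.Sound.spectral_feature)
# modify all files, writing ramped copies to another folder:
slab.apply_to_path(path='stimuli', method=slab.Sound.ramp, out_path='stimuli_ramped')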

For other time-frequency processing, the frames() method provides an easy way to step through the signal in short windowed frames and compute values from each. For instance, you could detect on- and offsets in the signal by computing the crest factor in each frame:

import matplotlib.pyplot as plt
signal = signal.pulse() # apply a 4 Hz pulse to the 3 vowels from above
signal.waveform() # note the pulses
crest = [] # the short-term crest factor will show on- and offsets
frames = signal.frames(duration=64)
for f in frames:
    crest.append(f.crest_factor())
times = signal.frametimes(duration=64) # frame center times
plt.plot(times, crest) # peaks in the crest factor mark intensity ramps

Binaural sounds

For experiments in spatial hearing, or any other situation that requires differential manipulation of the left and right channel of a sound, you can use the Binaural class. It inherits all methods from Sound and provides additional methods for generating and manipulating binaural sounds, including advanced interaural time and intensity manipulation.

Generating binaural sounds

Binaural sounds support all sound-generating methods of the Sound class that have an n_channels argument, but automatically set n_channels to 2. Noises support an additional kind argument, which can be set to 'diotic' (identical noise in both channels) or 'dichotic' (uncorrelated noise in the two channels). Other methods simply return 2-channel versions of the stimuli. You can recast any Sound object as a Binaural sound, which duplicates the first channel if n_channels is 1 or greater than 2:

monaural = slab.Sound.tone()
monaural.n_channels
out: 1
binaural = slab.Binaural(monaural)
binaural.n_channels
out: 2
binaural.left # access to the left channel
binaural.right # access to the right channel

Loading a wav file with slab.Binaural('file.wav') returns a Binaural sound object with two channels (even if the wav file contains only one channel).

Manipulating ITD and ILD

The easiest manipulation of a binaural parameter may be to change the interaural level difference (ILD). This can be achieved by setting the level attributes of both channels:

noise = slab.Binaural.pinknoise()
noise.left.level = 75
noise.right.level = 85
noise.level
out: array([75., 85.])

The ild() method makes this easier and keeps the overall level constant: noise.ild(10) amplifies the right channel by 5 dB and attenuates the left channel by the same amount to achieve a 10 dB level difference. Positive dB values move the virtual sound source to the right, negative values move it to the left. The pink noise in the example is a broadband signal; in natural hearing, the ILD is frequency dependent and should not be the same for all frequencies. A frequency-dependent level difference can be computed and applied with interaural_level_spectrum(). The level spectrum is computed from a head-related transfer function (HRTF) and can be customised for individual listeners. See HRTFs for how to handle these functions. The default level spectrum is computed from the HRTF of the KEMAR binaural recording mannequin (as measured by Gardner and Martin (1994) at the MIT Media Lab). The level spectrum takes a while to compute, so it may be useful to save it. It is a Python dict containing the level differences in a numpy array, along with a frequency vector, an azimuth vector, and the sample rate. You can save it, for instance, with pickle:

import pickle
ils = slab.Binaural.make_interaural_level_spectrum()
pickle.dump(ils, open('ils.pickle', 'wb')) # save using pickle
ils = pickle.load(open('ils.pickle', 'rb')) # load pickle

If the limitations of pickle worry you, you can use numpy.save, with a small caveat when loading: numpy.save wraps the dict in a 0-dimensional object array, and we need to unwrap it after loading with the somewhat strange index [()]:

import numpy
numpy.save('ils.npy', ils) # save using numpy
ils = numpy.load('ils.npy', allow_pickle=True)[()] # load and unwrap the original dict
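
Once computed or loaded, the level spectrum can be applied to a binaural sound. A sketch, assuming that interaural_level_spectrum() takes the azimuth and the precomputed spectrum as arguments:

noise = slab.Binaural.pinknoise(kind='diotic', samplerate=44100)
lateralized = noise.interaural_level_spectrum(azimuth=45, ils=ils) # frequency-dependent ILD
lateralized.play()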

If you are unsure which ILD value is appropriate, azimuth_to_ild() can compute ILDs corresponding to an azimuth angle, for instance 45 degrees, and a frequency:

slab.Binaural.azimuth_to_ild(45)
out: -9.12  # the ILD corresponding to 45° azimuth, in dB
noise = noise.ild(-9.12)  # apply the ILD

A dynamic ILD, which evokes the perception of a moving sound source, can be applied with ild_ramp(). The ILD changes linearly from one given value to another.

Similar methods exist to manipulate interaural time differences (ITD): itd(), azimuth_to_itd() (which uses a given head radius), and itd_ramp(). To present a signal from a given azimuth using both cues, use the at_azimuth() method, which calculates the correct ILD and ITD and applies them.
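
For instance, a sketch of placing a sound at 45 degrees to the right using both cues (the azimuth argument name is an assumption):

noise = slab.Binaural.pinknoise(kind='diotic')
right = noise.at_azimuth(azimuth=45) # applies the matching ITD and ILD
right.play()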

ITD and ILD manipulation leads to the percept of lateralization, that is, a source somewhere between the ears inside the head. Additional spectral shaping is necessary to generate an externalized percept (outside the head). This shaping can be achieved with the externalize() method, which applies a low-resolution HRTF filter (KEMAR by default). Using both ramp functions and externalization, it is easy to generate a convincing sound source movement with pulsed pink noise:

noise = slab.Binaural.pinknoise(samplerate=44100)
from_ild = slab.Binaural.azimuth_to_ild(-90)
from_itd = slab.Binaural.azimuth_to_itd(-90)
to_ild = slab.Binaural.azimuth_to_ild(90)
to_itd = slab.Binaural.azimuth_to_itd(90)
noise_moving = noise.ild_ramp(from_ild, to_ild)
noise_moving = noise_moving.itd_ramp(from_itd, to_itd)
noise_moving = noise_moving.externalize() # apply a low-resolution HRTF filter
noise_moving.play() # best through headphones

Signals

Sounds inherit from the Signal class, which provides a generic signal object with properties for the duration, number of samples, sample times, and number of channels. The actual samples are kept as a numpy array in the data property and can be accessed directly if necessary, for instance as signal.data. Signals support slicing, arithmetic operations, and conversion between sample points and time points directly, without having to access the data property. The methods resample(), envelope(), and delay() are also implemented in Signal and passed on to the child classes Sound, Binaural, and Filter. You do not normally need to use the Signal class directly.

sig = slab.Sound.pinknoise(n_channels=3)
sig.duration
out: 1.0
sig.n_samples
out: 8000
sig.data.shape # accessing the sample array
out: (8000, 3) # which has shape (n_samples x n_channels)
sig2 = sig.resample(samplerate=4000) # resample to 4 kHz
env = sig2.envelope() # returns a new signal containing the lowpass Hilbert envelopes of all channels
sig.delay(duration=0.0006, channel=0) # delay the first channel by 0.6 ms

Footnotes