.. _Sounds:

Sound
=====

Generating sounds
-----------------

The :class:`Sound` class provides methods for generating, manipulating, displaying, and analysing sound stimuli. You can generate typical experimental stimuli with this class, including tones, noises, and click trains, and also more specialized stimuli, like equally-masking noises, Schroeder-phase harmonics, iterated ripple noise and synthetic vowels. Slab methods assume sensible defaults where possible. You can call most methods without arguments to get an impression of what they do (for instance, :meth:`slab.Sound.tone()` returns a 1 s long 500 Hz tone at 70 dB sampled at 8 kHz) and then customise from there.
For instance, let's make a 500 ms long 500 Hz pure tone signal with a band-limited (one octave below and above the tone) pink noise background with a 10 dB signal-to-noise ratio: ::

    tone = slab.Sound.tone(frequency=500, duration=0.5)
    tone.level = 80 # setting the intensity to 80 dB
    noise = slab.Sound.pinknoise(duration=0.5)
    noise.filter(frequency=(250, 1000), kind='bp') # bandpass .25 to 1 kHz
    noise.level = 70 # 10 dB lower than the tone
    stimulus = tone + noise # combine the two signals
    stimulus = stimulus.ramp() # apply on- and offset ramps to avoid clicks
    stimulus.play()

:class:`Sound` objects have many useful methods for manipulating them (like :meth:`.ramp`, :meth:`.filter`, and :meth:`.pulse`) or inspecting them (like :meth:`.waveform`, :meth:`.spectrum`, and :meth:`.spectral_feature`). A complete list is in the :ref:`Reference` section, and most of them are also discussed here. If you use IPython, you can tap the `tab` key after typing ``slab.Sound.``, or the name of any Sound object followed by a full stop, to get an interactive list of the possibilities.

Sounds can also be created by recording them with :meth:`slab.Sound.record`. For instance, ``recording = slab.Sound.record(duration=1.0, samplerate=44100)`` will record a 1-second sound at 44100 Hz from the default audio input (usually the microphone).
The ``record`` method uses SoundCard if installed, or SoX (via a temporary file) otherwise. Both are cross-platform and easy to install. If neither tool is installed, you won't be able to record sounds.

Specifying durations
--------------------

Sometimes it is useful to specify the duration of a stimulus in samples rather than seconds. All methods that generate sounds have a :attr:`duration` argument that accepts floating point numbers or integers. Floating point numbers are interpreted as durations in seconds (``slab.Sound.tone(duration=1.0)`` results in a 1 second tone). Integers are interpreted as a number of samples (``slab.Sound.tone(duration=1000)`` gives you 1000 samples of a tone).

Setting the sample rate
-----------------------

We did not specify a sample rate for any of the stimuli in the examples above. When the :attr:`samplerate` argument of a sound-generating method is not specified, the default sample rate (8 kHz if not set otherwise) is used. It is possible to set a sample rate separately for each Sound object, but it is usually better to set a suitable default sample rate at the start of your script or Python session using :func:`slab.set_default_samplerate`. This rate is kept in the class variable :data:`_default_samplerate` and is used whenever you call a sound-generating method without specifying a rate. The appropriate rate depends on the frequency content of your stimuli and should be at least double the highest frequency of interest. For some speech sounds or narrowband noises you might get away with 8 kHz; for spatial sounds you may need 48 kHz or more.

Specifying levels
-----------------

As with the sample rate, sounds are generated at a default level (70 dB if not set otherwise). The default is kept in the class variable :data:`_default_level` and you can set it to a different value using :func:`slab.set_default_level`.
Levels are not specified directly when generating sounds, but rather afterwards by setting the :attr:`level` property::

    sig = slab.Sound.pinknoise()
    sig.level # return the current level
    sig.level = 85 # set a new level

Note that the returned level will *not* be the actual physical playback level, because that depends on the playback hardware (soundcard, amplifiers, headphones, speakers). Calibrate your system if you need to play stimuli at a known level (see :ref:`calibration`).

.. _calibration:

Calibrating the output
----------------------

Analogous to setting the default level at which sounds are generated with ``slab.set_default_level()``, each sound's level can be set individually by changing its :attr:`level` property. Setting the :attr:`level` property of a stimulus changes the root-mean-square of the waveform and relative changes are correct (reducing the level attribute by 10 dB will reduce the sound output by the same amount), but the *absolute* intensity is only correct if you calibrate your output.

The recommended procedure is to set your system volume to maximum, connect the listening hardware (headphone or loudspeaker) and set up a sound level meter. Then call :func:`slab.calibrate`. The :func:`.calibrate` function will play a 1 kHz tone for 5 seconds. Note the recorded intensity on the meter and enter it when requested. The function returns a calibration intensity, i.e. the difference between the tone's level attribute and the recorded level. Pass this value to :func:`slab.set_calibration_intensity` to correct the intensities returned by the :attr:`level` property of all sounds. The calibration intensity is saved in the class variable :data:`_calibration_intensity`. It is applied to all level calculations so that a sound's level attribute now roughly corresponds to the actual output intensity in dB SPL. The correspondence is only rough because your output hardware may not have a flat frequency transfer function (some frequencies play louder than others).
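
As a rough sketch of what the level property and the calibration intensity amount to, the following plain-Python example computes an RMS-based level in dB and adds a calibration offset. The helper names (``rms``, ``level_db``), the reference value of 2e-5, the test tone, and the additive treatment of the offset are assumptions made for illustration and are not taken from slab's source.

```python
import math

def rms(samples):
    """Root-mean-square of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def level_db(samples, calibration_intensity=0.0, reference=2e-5):
    """RMS level in dB re. `reference`, shifted by a calibration offset (assumed additive)."""
    return 20 * math.log10(rms(samples) / reference) + calibration_intensity

# one second of a 500 Hz tone sampled at 8 kHz (hypothetical test signal)
samplerate = 8000
tone = [math.sin(2 * math.pi * 500 * n / samplerate) for n in range(samplerate)]

uncalibrated = level_db(tone)
# scaling the waveform by 10**(-10/20) lowers the RMS level by 10 dB
quieter = [s * 10 ** (-10 / 20) for s in tone]
relative_change = level_db(quieter) - uncalibrated
# a hypothetical calibration intensity of -12 dB shifts every reported level by -12 dB
calibrated = level_db(tone, calibration_intensity=-12.0)
```

The relative change is independent of the calibration offset, which is why relative level changes are correct even on an uncalibrated system.
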
See :ref:`Filters` for methods to equalize the transfer function of your output hardware.

Experiments sometimes require you to play different stimuli at comparable loudness. Loudness is the perception of sound intensity and it is difficult to calculate. You can use the :meth:`Sound.aweight` method of a sound to filter it so that frequencies are weighted according to the typical human hearing thresholds. This will increase the correspondence between the rms intensity measure returned by the :attr:`level` attribute and the perceived loudness. However, in most cases, controlling relative intensities is sufficient.

To increase the accuracy of the calibration for your experimental stimuli, pass a sound with a similar spectrum to :func:`slab.calibrate`. For instance, if your stimuli are wide band pink noises, then you may want to use a pink noise for calibration. The :attr:`level` of the noise should be high, but not so high that it causes clipping.

If you do not have a sound level meter, then you can present sounds in dB HL (hearing level). For that, measure the hearing threshold of the listener at the frequency or frequencies that are presented in your experiment and play your stimuli at a set level above that threshold. You can measure the hearing threshold at one frequency (or for any broadband sound) with a few lines of code (see :ref:`audiogram`).

Saving and loading sounds
-------------------------

You can save sounds to wav files by calling the object's :meth:`.Sound.write` method (``signal.write('signal.wav')``). By default, sounds are normalized to have a maximal amplitude of 1 to avoid clipping when writing the file. Because of this normalization, you should set :attr:`signal.level` to the intended level after loading a sound from file, or disable normalization when writing if you know what you are doing. You can load a wav file by initializing a Sound object with the filename: ``signal = slab.Sound('signal.wav')``.

Combining sounds
----------------

Several functions allow you to string stimuli together.
For instance, in a forward masking experiment [#f1]_ we need a masking sound followed by a target sound after a brief silent interval. An example implementation of a complete experiment is discussed in the :ref:`Psychoacoustics` section, but here, we will construct the stimulus: ::

    masker = slab.Sound.tone(frequency=550, duration=0.5) # a 0.5s 550 Hz tone
    masker.level = 80 # at 80 dB
    masker.ramp() # default 10 ms raised cosine ramps
    silence = slab.Sound.silence(duration=0.01) # 10 ms silence
    signal = slab.Sound.tone(duration=0.05) # using the default 500 Hz
    signal.level = 80 # let's start at the same intensity as the masker
    signal.ramp(duration=0.005) # short signal, we'll use 5 ms ramps
    stimulus = slab.Sound.sequence(masker, silence, signal)
    stimulus.play()

We can make a classic non-interactive demonstration of forward masking by playing these stimuli with decreasing signal level in a loop, once without the masker, and once with the masker. Count for how many steps you can hear the signal tone: ::

    import time # we need the sleep function
    for level in range(80, 10, -5): # down from 80 in steps of 5 dB
        signal.level = level
        signal.play()
        time.sleep(0.5)
    # now with the masker
    for level in range(80, 10, -5): # down from 80 in steps of 5 dB
        signal.level = level
        stimulus = slab.Sound.sequence(masker, silence, signal)
        stimulus.play()
        time.sleep(0.5)

Many listeners can hear all of the steps without the masker, but only the first 6 or 7 steps with the masker. This depends on the intensity at which you play the demo (see :ref:`Calibrating the output` above).

The :meth:`.sequence` method accepts any number of sounds to be concatenated. If you have a list of sounds, unpack it into function arguments when calling the method: ``slab.Sound.sequence(*list_of_sound_objects)``.

Another method to put sounds together is :meth:`.crossfade`, which crossfades between two sounds with a specified :attr:`overlap` in seconds.
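
To make the idea of a crossfade concrete, here is a minimal plain-Python sketch: the end of the first sound is faded out while the start of the second is faded in, and the overlapping parts are summed. The function name ``crossfade_samples``, the squared-sine ramp shape, and the constant stand-in signals are assumptions made for illustration; slab's :meth:`.crossfade` works on Sound objects and may use a different ramp.

```python
import math

def crossfade_samples(first, second, n_overlap):
    """Join two sample lists, overlapping n_overlap samples with complementary ramps."""
    if n_overlap <= 0 or n_overlap > min(len(first), len(second)):
        raise ValueError("n_overlap must be between 1 and the length of the shorter sound")
    mixed = []
    for i in range(n_overlap):
        # the fade-in weight rises from near 0 to near 1; the fade-out is its complement
        fade_in = math.sin(0.5 * math.pi * (i + 0.5) / n_overlap) ** 2
        fade_out = 1.0 - fade_in
        mixed.append(first[len(first) - n_overlap + i] * fade_out + second[i] * fade_in)
    return first[:len(first) - n_overlap] + mixed + second[n_overlap:]

samplerate = 8000
overlap = 0.005  # overlap in seconds
n_overlap = int(round(overlap * samplerate))  # convert the overlap to samples
first = [1.0] * 800   # 0.1 s constant stand-in for the first sound
second = [1.0] * 400  # 0.05 s constant stand-in for the second sound
joined = crossfade_samples(first, second, n_overlap)
```

The result is shorter than the plain concatenation by the length of the overlap, and because the two weights always sum to one, identical signals pass through the transition unchanged.
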
An interesting experimental use of crossfading is in adaptation designs, in which one longer stimulus is played to adapt neuronal responses to its sound features, and then a new stimulus feature is introduced (but nothing else changes). Responses (measured for instance with EEG) at that point will be mostly due to that feature. A classical example is the pitch onset response, which is evoked when the temporal fine structure of a continuous noise is regularized to produce a pitch percept without altering the sound spectrum (see Krumbholz et al. (2003)). It is easy to generate the main stimulus of that study, a noise transitioning to an iterated ripple noise after two seconds, with 5 ms crossfade overlap, then filtered between 0.8 and 3.2 kHz: ::

    slab.set_default_samplerate(16000) # we need a higher sample rate
    slab.set_default_level(80) # set the level for all sounds to 80 dB
    adapter = slab.Sound.whitenoise(duration=2.0)
    irn = slab.Sound.irn(frequency=125, n_iter=2, duration=1.0) # pitched sound
    stimulus = slab.Sound.crossfade(adapter, irn, overlap=0.005) # crossfade
    stimulus.filter(frequency=[800, 3200], kind='bp') # filter
    stimulus.ramp(duration=0.005) # 5 ms on- and offset ramps
    stimulus.spectrogram() # note that there is no change at the transition
    stimulus.play() # but you can hear the onset of the regularity (pitch)

Playing a sound in the background
---------------------------------

In some experiments you may want to play a continuous background signal (a noise or a multitalker babble for instance) and present stimuli in that background noise. The :meth:`play_background` method starts a non-blocking SoundDevice.OutputStream in the background that is not interrupted by playing other stimuli. The background sound can also be played in a loop. This stream is temporarily attached to the Sound object as the :attr:`stream` attribute, together with a :attr:`current_frame` attribute that holds a frame counter that is updated during play.
Don't access these attributes unless you know what you are doing. The stream has to be terminated by calling the :meth:`stop_background` method, even when the background sound has already finished playing. This closes the stream object and removes the temporary :attr:`stream` and :attr:`current_frame` attributes. ::

    import time # we need the sleep function
    sig = slab.Sound.vowel(vowel='a', duration=5., samplerate=44100) # a long background sound
    sig.play_background(looping=True) # start playing an endless background /a/
    sig2 = slab.Sound.vowel(vowel='i', duration=.5, samplerate=44100) # a short foreground sound
    for _ in range(5):
        time.sleep(.5)
        sig2.play() # each second, play a short /i/
    sig.stop_background() # necessary to close the background stream

Plotting and analysis
---------------------

You can inspect sounds by plotting the :meth:`.waveform`, :meth:`.spectrum`, or :meth:`.spectrogram`:

.. plot::
    :include-source:

    from matplotlib import pyplot as plt
    a = slab.Sound.vowel(vowel='a')
    e = slab.Sound.vowel(vowel='e')
    i = slab.Sound.vowel(vowel='i')
    signal = slab.Sound.sequence(a,e,i)
    # preparing a 2-by-2 figure
    _, [[ax1, ax2], [ax3, ax4]] = plt.subplots(
        nrows=2, ncols=2, constrained_layout=True)
    signal.waveform(axis=ax1, show=False)
    signal.waveform(end=0.05, axis=ax2, show=False) # first 50ms
    signal.spectrogram(upper_frequency=5000, axis=ax3, show=False)
    signal.spectrum(axis=ax4)

Instead of plotting, :meth:`.spectrum` and :meth:`.spectrogram` will return the time and frequency bins and spectral power values for further analysis if you set the :attr:`show` argument to False. All plotting functions can draw into an existing matplotlib.pyplot axis supplied with the :attr:`axis` argument.

.. _spectral_features:

You can also extract common features from sounds, such as the :meth:`.crest_factor` (a measure of how 'peaky' the waveform is), the average :meth:`.onset_slope` (a measure of how fast the on-ramps in the sound are, which is important for sound localization), or the :meth:`.spectral_coverage` (the fraction of the spectrogram containing energy as a measure of the masking ability of a sound).

Features of the spectral content are bundled in the :meth:`.spectral_feature` method. It can compute spectral centroid, flux, flatness, and rolloff, either for an entire sound (suitable for stationary sounds), or for successive time windows (frames, suitable for time-varying sounds).

* The centroid is a measure of the center of mass of a spectrum (i.e. the 'center' frequency).
* The flux measures how quickly the power spectrum is changing by comparing the power spectrum for one frame against the power spectrum from the previous frame.
* The flatness measures how tone-like a sound is, as opposed to being noise-like, and is calculated by dividing the geometric mean of the power spectrum by the arithmetic mean (see Dubnov (2004)).
* The rolloff measures the frequency at which the spectrum rolls off, typically used to find a suitable low-cutoff frequency that retains most of the sound power.

These particular features are integrated in slab because we find them useful in our daily work. Many more features are available in packages specialised in audio processing, for instance librosa. librosa interfaces easily with slab: you can just hand the sample data and the sample rate of a slab object separately to most of its methods::

    import librosa
    sig = slab.Sound('music.wav') # load wav file into slab.Sound object
    librosa.beat.beat_track(y=sig.data, sr=sig.samplerate)

When working with environmental sounds or other recorded stimuli, one often needs to compute relevant features for collections of recordings in different experimental conditions.
The slab module contains a function :func:`slab.apply_to_path`, which applies a function to all sound files in a given folder and returns a dictionary of file names and computed features. In fact, you can also use that function to modify (for instance ramp and filter) all files in a folder.

For other time-frequency processing, the :meth:`.frames` method provides an easy way to step through the signal in short windowed frames and compute some values from it. For instance, you could detect on- and offsets in the signal by computing the crest factor in each frame: ::

    from matplotlib import pyplot as plt
    signal.pulse() # apply a 4 Hz pulse to the 3 vowels from above
    signal.waveform() # note the pulses
    crest = [] # the short-term crest factor will show on- and offsets
    frames = signal.frames(duration=64)
    for f in frames:
        crest.append(f.crest_factor())
    times = signal.frametimes(duration=64) # frame center times
    plt.plot(times, crest) # peaks in the crest factor mark intensity ramps

Binaural sounds
---------------

For experiments in spatial hearing, or any other situation that requires differential manipulation of the left and right channel of a sound, you can use the :class:`Binaural` class. It inherits all methods from :class:`Sound` and provides additional methods for generating and manipulating binaural sounds, including advanced interaural time and intensity manipulation.

Generating binaural sounds
^^^^^^^^^^^^^^^^^^^^^^^^^^

Binaural sounds support all sound-generating functions of the :class:`Sound` class that have an :attr:`n_channels` argument, but automatically set :attr:`n_channels` to 2. Noises support an additional :attr:`kind` argument, which can be set to 'diotic' (identical noise in both channels) or 'dichotic' (uncorrelated noise). Other methods just return 2-channel versions of the stimuli.
You can recast any Sound object as a Binaural sound, which duplicates the first channel if :attr:`n_channels` is 1 or greater than 2: ::

    monaural = slab.Sound.tone()
    monaural.n_channels
    out: 1
    binaural = slab.Binaural(monaural)
    binaural.n_channels
    out: 2
    binaural.left # access to the left channel
    binaural.right # access to the right channel

Loading a wav file with ``slab.Binaural('file.wav')`` returns a Binaural sound object with two channels (even if the wav file contains only one channel).

Manipulating ITD and ILD
^^^^^^^^^^^^^^^^^^^^^^^^

The easiest manipulation of a binaural parameter may be to change the interaural level difference (ILD). This can be achieved by setting the :attr:`level` attributes of both channels: ::

    noise = slab.Binaural.pinknoise()
    noise.left.level = 75
    noise.right.level = 85
    noise.level
    out: array([75., 85.])

The :meth:`.ild` method makes this easier and keeps the overall level constant: ``noise.ild(10)`` amplifies the right channel by 5 dB and attenuates the left channel by the same amount to achieve a 10 dB level difference. Positive dB values move the virtual sound source to the right and negative values move the source to the left. The pink noise in the example is a broadband signal, and the ILD is frequency dependent and should not be the same for all frequencies. A frequency-dependent level difference can be computed and applied with :meth:`.interaural_level_spectrum`. The level spectrum is computed from a head-related transfer function (HRTF) and can be customised for individual listeners. See :ref:`hrtfs` for how to handle these functions. The default level spectrum is computed from the HRTF of the KEMAR binaural recording mannequin (as measured by Gardner and Martin (1994) at the MIT Media Lab). The level spectrum takes a while to compute and it may be useful to save it. It is a Python dict containing the level differences in a numpy array along with a frequency vector, an azimuth vector, and the sample rate.
You can save it, for instance, with pickle: ::

    import pickle
    ils = slab.Binaural.make_interaural_level_spectrum()
    with open('ils.pickle', 'wb') as f:
        pickle.dump(ils, f) # save using pickle
    with open('ils.pickle', 'rb') as f:
        ils = pickle.load(f) # load pickle

If the limitations of pickle worry you, you can use numpy.save with a small caveat when loading: numpy.save wraps the dict in an object and we need to remove that after loading with the somewhat strange index ``[()]``: ::

    import numpy
    numpy.save('ils.npy', ils) # save using numpy
    ils = numpy.load('ils.npy', allow_pickle=True)[()] # load and get the original dict from the wrapping object

If you are unsure which ILD value is appropriate, :meth:`.azimuth_to_ild` can compute ILDs corresponding to an azimuth angle, for instance 45 degrees, and a frequency: ::

    slab.Binaural.azimuth_to_ild(45) # -9.12 # correct ILD in dB
    noise.ild(-9.12) # apply the ILD

A dynamic ILD, which evokes the perception of a moving sound source, can be applied with :meth:`.ild_ramp`. The ramp is linear from and to a given ILD. Similar functions exist to manipulate interaural time differences (ITD): :meth:`.itd`, :meth:`.azimuth_to_itd` (using a given head radius), and :meth:`.itd_ramp`. To present a signal from a given azimuth using both cues, use the :meth:`.at_azimuth` method, which calculates the correct ILD and ITD for you and applies them.

ITD and ILD manipulation leads to the percept of *lateralization*, that is, a source somewhere between the ears inside the head. Additional spectral shaping is necessary to generate an externalized percept (outside the head). This shaping can be achieved with the :meth:`.externalize` method, which applies a low-resolution HRTF filter (KEMAR by default).
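
The linear ILD ramp described above can be pictured as a pair of time-varying channel gains. The sketch below is plain Python under stated assumptions: the ILD is interpolated linearly in dB, and half of it is applied with opposite sign to each channel, as described for :meth:`.ild`. The helper ``ild_ramp_gains`` is hypothetical and not part of slab, whose implementation may differ.

```python
import math

def ild_ramp_gains(from_ild, to_ild, n_samples):
    """Return per-sample (left, right) linear gains for an ILD that changes linearly in dB."""
    if n_samples < 2:
        raise ValueError("need at least two samples to define a ramp")
    gains = []
    for n in range(n_samples):
        ild = from_ild + (to_ild - from_ild) * n / (n_samples - 1)  # ILD in dB at sample n
        left = 10 ** (-ild / 2 / 20)  # left channel gets minus half the ILD (in dB)
        right = 10 ** (ild / 2 / 20)  # right channel gets plus half the ILD (in dB)
        gains.append((left, right))
    return gains

# a short ramp from -10 dB (left louder) to +10 dB (right louder)
gains = ild_ramp_gains(from_ild=-10, to_ild=10, n_samples=5)
# recover the ILD in dB from each gain pair to check the ramp
ilds = [20 * math.log10(right / left) for left, right in gains]
```

Multiplying the left and right channels sample by sample with such gains shifts the level difference gradually from one side to the other; splitting the ILD evenly between the channels keeps the product of the two gains at one throughout the ramp.
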
Using both ramp functions and externalization, it is easy to generate a convincing sound source movement with pink noise: ::

    noise = slab.Binaural.pinknoise(samplerate=44100)
    from_ild = slab.Binaural.azimuth_to_ild(-90)
    from_itd = slab.Binaural.azimuth_to_itd(-90)
    to_ild = slab.Binaural.azimuth_to_ild(90)
    to_itd = slab.Binaural.azimuth_to_itd(90)
    noise_moving = noise.ild_ramp(from_ild, to_ild)
    noise_moving = noise_moving.itd_ramp(from_itd, to_itd)
    noise_moving.externalize() # apply filter in place
    noise_moving.play() # best through headphones

Signals
-------

Sounds inherit from the :class:`Signal` class, which provides a generic signal object with the properties duration, number of samples, sample times, and number of channels. The actual samples are kept as a numpy array in the :attr:`data` property and can be accessed, if necessary, as :attr:`signal.data`. Signals support slicing, arithmetic operations, and conversion between sample points and time points directly, without having to access the :attr:`data` property. The methods :meth:`.resample`, :meth:`.envelope`, and :meth:`.delay` are also implemented in Signal and passed to the child classes :class:`Sound`, :class:`Binaural`, and :class:`Filter`. You do not normally need to use the Signal class directly. ::

    sig = slab.Sound.pinknoise(n_channels=3)
    sig.duration
    out: 1.0
    sig.n_samples
    out: 8000
    sig.data.shape # accessing the sample array
    out: (8000, 3) # which has shape (n_samples x n_channels)
    sig2 = sig.resample(samplerate=4000) # resample to 4 kHz
    env = sig2.envelope() # returns a new signal containing the lowpass Hilbert envelopes of all channels
    sig.delay(duration=0.0006, channel=0) # delay the first channel by 0.6 ms

.. rubric:: Footnotes

.. [#f1] Forward masking occurs when a signal cannot be heard due to a preceding masking sound. Typically, three intervals are presented to the listener, two contain only the masker and one contains the masker followed by the signal.
   The listener has to identify the interval with the signal. The level of the masker is fixed and the signal level is varied adaptively to obtain the masked threshold.