*One component of the Modulation Toolbox for Matlab.
1. Introduction
The graphical user interface (GUI) version of the Modulation Toolbox adds a function called
modspecgramgui to the Matlab environment. Type the function name in the Matlab console to load the GUI.
modspecgramgui starts a graphical user interface for the analysis, modification, and synthesis of
signals with respect to their modulation spectrum.
When loaded with a signal, the main window of the GUI will appear as in Figure 1.
Figure 1. Modulation spectrogram using default settings.
Try it yourself by loading 'speech_short.wav' located in the \sounds folder. In general, there are two ways to load a signal into the modulation spectrogram interface, as described next.
Loading a Signal from File
To load a signal from disk, select File->Open... from the menu. A dialog box will appear in which
you can select the wave file (*.wav) you want to open. If you load a stereo recording, the left
and right channels will be averaged together to form a single channel.
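The channel averaging described above can be sketched in a few lines. This is an illustrative Python sketch, not the toolbox's Matlab code; the function name `to_mono` and the list-of-pairs input format are assumptions made for the example.

```python
def to_mono(stereo_frames):
    """Average each (left, right) sample pair into a single-channel sample."""
    return [(left + right) / 2.0 for left, right in stereo_frames]


# Three hypothetical stereo samples, given as (left, right) pairs.
stereo = [(0.5, 0.25), (-0.25, 0.25), (1.0, 0.0)]
mono = to_mono(stereo)
```

Averaging (rather than summing) keeps the mono signal within the amplitude range of the original channels.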
Loading a Signal from Workspace
To load a signal from the Matlab workspace, go to the menu bar and select File->Import... A dialog box will appear listing all workspace variables that could represent an audio signal (i.e., all vectors of doubles), as well as all workspace variables that could be a sampling frequency (all scalar doubles). Select the signal/sampling frequency pair that you want to import, or select a signal and specify an alternate sampling frequency. Check the normalize checkbox if you want to subtract the mean and scale the signal so that its maximum amplitude is equal to 1.
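The effect of the normalize checkbox, as described above, can be illustrated as follows. This is a minimal Python sketch, not the toolbox's own code: the function name `normalize` is hypothetical, and returning an all-zero signal for constant input is an assumption made for the example.

```python
def normalize(signal):
    """Subtract the mean, then scale so the largest absolute sample is 1."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    peak = max(abs(s) for s in centered)
    if peak == 0.0:
        # A constant signal has nothing left after mean removal (assumed behavior).
        return centered
    return [s / peak for s in centered]


normalized = normalize([1.0, 2.0, 3.0, 6.0])
```

Removing the mean before scaling matters: scaling first would leave a signal whose peak is no longer 1 once the offset is removed.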
A word of warning
The modulation spectrogram GUI was not designed with memory efficiency in mind. As a consequence,
loading large signals into the interface on a computer without a large amount of installed memory
is likely to slow the computer down considerably. We recommend limiting the signals you load
to 100,000 samples.
Viewing Modulation Spectra
The central figure in the graphical interface is the joint-frequency representation, or modulation spectrum, of one data frame from the input signal. It shows the modulation frequency content (horizontal axis) versus the carrier frequency bin (vertical axis). For more explanation, refer to the second tutorial, tutorial2_modspecgram.m included with the toolbox installation. To see the modulation spectrum for a different segment in time, simply click on the signal spectrogram or on the time-domain plot of the signal itself. The dashed lines indicate the extent of the input data currently being analyzed. We are now ready to modify the modulation spectrum parameters.
2. Adjusting Demodulation Parameters
The Modulation Toolbox is designed to allow you to compare the performance of various demodulation methods. Broadly speaking, demodulation is either incoherent or coherent. The difference is that coherent demodulation detects carrier signals that are constrained by certain properties that allow more effective modulation filtering. To try out some different demodulation settings, click on Options->Demodulation Options... You will see a dialog box that looks like Figure 2.
Figure 2. Demodulation options dialog box.
Data Frame Settings
First, note the "Data frame settings" rectangle. The drop-down box allows you to set the
length of the window used to segment the time-domain input signal. A modulation spectrogram
is computed for each frame of the input signal. You can also adjust the overlap between successive
frames, either 50% or 75%.
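To make the framing concrete, the sketch below lists the sample ranges covered by successive frames for a given frame length and overlap. It is an illustrative Python sketch only: the function name `frame_bounds` is hypothetical, and dropping a trailing partial frame is an assumption (the GUI may handle the end of the signal differently).

```python
def frame_bounds(num_samples, frame_len, overlap):
    """Return (start, stop) index pairs for frames with the given fractional overlap."""
    hop = int(round(frame_len * (1.0 - overlap)))  # samples between frame starts
    bounds = []
    start = 0
    while start + frame_len <= num_samples:  # assumed: skip incomplete final frame
        bounds.append((start, start + frame_len))
        start += hop
    return bounds


half = frame_bounds(1000, 400, 0.50)     # hop of 200 samples
quarter = frame_bounds(1000, 400, 0.75)  # hop of 100 samples
```

For the same signal, 75% overlap produces roughly twice as many frames as 50% overlap, so time resolution improves at the cost of more computation.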
Demodulation Methods
Next, look at the "Demodulation methods" rectangle. The radio buttons allow you to pick from three methods: Hilbert envelope (incoherent), spectral center-of-gravity (coherent), and pitch-synchronous (coherent).
- Hilbert envelope - This is the conventional method, which always returns a real-valued, non-negative envelope estimate. It is the easiest to compute and is straightforward to interpret, but the distortion inherent in rectifying the envelope reduces the effectiveness of modulation filtering (refer to tutorial4_modfiltering.m within the toolbox file listing for examples).
- Spectral center-of-gravity - This method uses a sliding window to estimate the carrier frequency at each point in time, for each subband separately. Unlike the Hilbert envelope approach, the spectral COG will result in a smooth carrier frequency as well as bandlimited envelopes and carriers. The envelopes can also be complex-valued with asymmetric modulation spectra (hence the negative values on the modulation frequency axis). Increasing the spectral COG window length will increase the smoothness of the carrier frequency estimates, but a longer window may lack the time resolution to accurately track fast carrier changes (such as vibrato in a singer's voice).
- Pitch-synchronous or harmonic - For signals that are harmonic in nature (such as voiced speech and some musical instruments), this method detects the time-varying fundamental frequency of the pitch and then treats each harmonic as one carrier. The associated modulators will also be complex-valued with asymmetric spectra. Pitch detection may require some parameter tuning and prior knowledge about your signal to get accurate results. For example, you can specify the minimum and maximum values of the fundamental frequency (in Hz) to constrain the algorithm. Also, the harmonic range restricts the pitch estimation algorithm to look at a lowpass region of the input signal, which is useful for ignoring indistinct higher order harmonics.
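The first two methods above can be illustrated on a toy signal. The following Python sketch is not the toolbox's implementation: it uses a direct DFT from the standard library, the function names are hypothetical, and the center of gravity is computed with one common definition (the power-weighted mean frequency over the non-negative frequency bins of a single window), which is an assumption about how such an estimate might be formed.

```python
import cmath
import math


def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (adequate for short examples)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / n if inverse else 1.0
    return [
        scale * sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def hilbert_envelope(x):
    """Magnitude of the analytic signal: real-valued and non-negative."""
    n = len(x)
    spectrum = dft(x)
    gain = [0.0] * n              # negative frequencies are zeroed
    gain[0] = 1.0                 # keep DC as is
    for k in range(1, (n + 1) // 2):
        gain[k] = 2.0             # double the positive frequencies
    if n % 2 == 0:
        gain[n // 2] = 1.0        # keep the Nyquist bin as is
    analytic = dft([g * s for g, s in zip(gain, spectrum)], inverse=True)
    return [abs(a) for a in analytic]


def spectral_cog(x, fs):
    """Power-weighted mean frequency (Hz) over bins 0..N/2 of one window."""
    n = len(x)
    power = [abs(c) ** 2 for c in dft(x)[: n // 2 + 1]]
    freqs = [k * fs / n for k in range(n // 2 + 1)]
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


# Toy subband: a 16 Hz carrier with a slow 2 Hz amplitude modulation, fs = 64 Hz.
N, FS = 64, 64.0
modulator = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * t / N) for t in range(N)]
signal = [m * math.cos(2 * math.pi * 16 * t / N) for m, t in zip(modulator, range(N))]

envelope = hilbert_envelope(signal)
carrier_hz = spectral_cog(signal, FS)
```

In this example the envelope recovers the 2 Hz modulator and the center of gravity lands on the 16 Hz carrier, because the sidebands are symmetric about it.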
Figure 1 shows the default setting using spectral center-of-gravity on the first frame of 'speech_short.wav'. To see the Hilbert envelope spectrum, select "Hilbert envelope" and click "Ok." You should see the results shown in Figure 3.
Figure 3. Modulation spectra using Hilbert envelope (incoherent) demodulation.
Or, observe the pitch-synchronous modulator spectrum, as shown in Figure 4, where this time we have shifted attention to the sixth analysis frame.
Figure 4. Modulation spectra using pitch-synchronous (coherent) demodulation.
The next section discusses filterbank settings, which only apply to the spectral COG and the
Hilbert envelope demodulation methods. Before proceeding, open the Demodulation Options dialog box
again and select "Hilbert envelope (incoherent)." This method is the quickest to compute and will
allow you to easily test the effects of the filterbank settings.
Filterbank Settings
Finally, the "Filterbank settings" rectangle contains parameters that control how many carriers are detected and how they are constrained in frequency (but they only apply to Hilbert envelope and spectral center-of-gravity demodulation methods). The plot next to it shows representative subband frequency responses from the filterbank.
- Number of subbands - is the number of frequency channels spread evenly between 0 and the sampling rate. The GUI displays everything between 0 and Nyquist, so that the number of displayed modulator spectra is (number of subbands) / 2 + 1. Increasing this parameter has the effect of increasing the resolution seen along the vertical frequency axis in the modulation spectrum.
- Downsampling rate - applies to each subband signal prior to demodulation. Downsampling is permitted because modulators tend to be low-frequency signals. The downsampling rate is proportional to the number of subbands, so you can choose between 1/8, 1/4, or 1/2 of the number of subbands. The filterbank frequency response to the right shows the frequency limits after downsampling. Note that for the default filterbank settings, the downsampling rate of 1/4 brings the Nyquist rate down to the edges of the subband mainlobe. Let's try making a change: select 1/8 and then click "Ok." You should now notice that the modulation spectrogram content looks narrower. Go to the Options->Demodulation Options... menu again, and the filterbank frequency response should be updated as shown in Figure 5. With less downsampling, the Nyquist rate of the subbands is now twice the width of the subband mainlobe.
- Subband frequency overlap - there is a choice between "standard," which has 75% overlap between subbands, and "minimal," which reduces overlap by sharpening the subband frequency responses. By reducing overlap, you eliminate redundancy between neighboring modulators. Select "minimal" and click "Ok." After the modulation spectrogram has been recomputed, go to Options->Demodulation Options... again and observe the filterbank frequency response. It should look like Figure 6.
Figure 5. Updated demodulation options dialog using a smaller downsampling rate.
Figure 6. Updated demodulation options dialog showing reduced subband spectral overlap.
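The arithmetic behind the subband and downsampling settings can be worked through with a short example. This Python sketch is illustrative only: the function name `subband_layout`, the 16 kHz sampling rate, and the choice of 64 subbands are hypothetical values, not the GUI's defaults.

```python
def subband_layout(fs, num_subbands, fraction_denominator):
    """Derive filterbank quantities from the settings described above.

    fraction_denominator is 8, 4, or 2 for a downsampling rate of
    1/8, 1/4, or 1/2 of the number of subbands.
    """
    factor = num_subbands // fraction_denominator  # downsampling factor
    subband_fs = fs / factor                       # sampling rate of each subband
    return {
        "displayed_spectra": num_subbands // 2 + 1,  # channels from 0 to Nyquist
        "channel_spacing_hz": fs / num_subbands,     # spacing of subband centers
        "downsampling_factor": factor,
        "subband_fs_hz": subband_fs,
        "max_modulation_hz": subband_fs / 2.0,       # subband Nyquist frequency
    }


quarter = subband_layout(16000.0, 64, 4)  # downsampling rate = 1/4 of 64 = 16
eighth = subband_layout(16000.0, 64, 8)   # downsampling rate = 1/8 of 64 = 8
```

Halving the downsampling rate doubles the range of the modulation frequency axis, which is why the same modulation content appears narrower in the display after switching from 1/4 to 1/8.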
Choosing the Modulation Transform
The term "modulation spectrum" usually implies taking the Fourier transform of the modulators of a signal. As in the previous version of the modulation spectrogram GUI, other transforms are offered as well. When you go to the Transform menu, you will see three options:
- Fourier - takes the discrete Fourier transform (DFT) of each modulator (default).
- Wavelet (Daubechies 4) - takes the discrete wavelet transform (DWT) of each modulator, resulting in a time-scale decomposition. The DWT coefficients for each modulator are arrayed from left to right, starting with large-scale coefficients. Vertical lines indicate the boundaries between each dyadic scale in the decomposition. "Daubechies 4" refers to the mother wavelet used in the decomposition, which in this case is the 4-point minimum-phase Daubechies kernel.
- Wavelet (Least Asymmetric 8) - takes the DWT using the 8-point least-asymmetric Daubechies kernel, where least asymmetric means that the kernel is close to a linear-phase filter.
After selecting the Daubechies 4 wavelet transform, the modulation spectrogram will appear as in Figure 7.
Figure 7. Modulation spectrum using the Daubechies-4 wavelet transform.
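For readers unfamiliar with the DWT layout described above, the following Python sketch decomposes a short sequence with the 4-tap Daubechies filter and arranges the coefficients with the largest scale on the left. It is a sketch under stated assumptions, not the toolbox's code: the function names are hypothetical, the periodic boundary extension is an assumption, and the input length must be divisible by 2 raised to the number of levels.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)
# Standard 4-tap Daubechies scaling (lowpass) filter.
LOWPASS = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM, (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]
# Matching wavelet (highpass) filter: reversed taps with alternating signs.
HIGHPASS = [LOWPASS[3], -LOWPASS[2], LOWPASS[1], -LOWPASS[0]]


def dwt_step(x):
    """One analysis level: filter, then keep every second output (periodic wrap)."""
    n = len(x)
    approx = [sum(LOWPASS[k] * x[(i + k) % n] for k in range(4)) for i in range(0, n, 2)]
    detail = [sum(HIGHPASS[k] * x[(i + k) % n] for k in range(4)) for i in range(0, n, 2)]
    return approx, detail


def dwt(x, levels):
    """Multi-level DWT ordered as [approximation, coarsest detail, ..., finest detail]."""
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, detail = dwt_step(approx)
        details = detail + details  # coarser scales are placed further left
    return approx + details


ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
coeffs = dwt(ramp, levels=2)
flat = dwt([1.0] * 8, levels=2)
```

Each level halves the number of coefficients, which gives the dyadic scale boundaries mentioned above: here the 8 outputs split into 2 approximation, 2 coarse-detail, and 4 fine-detail coefficients.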
Modifying Modulation Spectra
Looking at the main interface, you can easily design masks in frequency to attenuate, amplify, or zero out selected parts of the modulation spectrogram. Simply click and drag within the joint-frequency axes to make a selection, and then choose an option from the menu on the right. While designing a masking function, click "Symmetrize" at any time to make the mask left-right symmetric. Having completed your mask design, click "Apply." To listen to the resulting signal, click "Play Masked." For instructional purposes, the following screenshot shows an arbitrary masking function, with three zeroed-out portions and a fourth selection that has not yet been modified, as applied to the coherent modulation spectra found via spectral COG.
Figure 8. Modulation spectra with filtering mask applied.
Or, we can construct a simple, 10 Hz lowpass modulation filter by applying the mask seen in Figure 9.
Figure 9. A 10-Hz lowpass filter applied to the modulation spectra.
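A lowpass mask of this kind amounts to multiplying each modulator spectrum by 1 inside the passband and 0 outside it. The Python sketch below illustrates the idea; the function names, the 8-bin axis, and the 40 Hz modulation sampling rate are hypothetical values chosen for the example, and the GUI's internal representation may differ.

```python
def modulation_axis(num_bins, modulation_fs):
    """Centered modulation frequency axis in Hz (negative to positive)."""
    return [(k - num_bins // 2) * modulation_fs / num_bins for k in range(num_bins)]


def lowpass_mask(axis_hz, cutoff_hz):
    """1.0 where |modulation frequency| <= cutoff, 0.0 elsewhere."""
    return [1.0 if abs(f) <= cutoff_hz else 0.0 for f in axis_hz]


def apply_mask(spectra, mask):
    """Multiply every carrier row of the modulation spectrum by the mask."""
    return [[value * m for value, m in zip(row, mask)] for row in spectra]


axis = modulation_axis(8, 40.0)       # -20 Hz ... +15 Hz in 5 Hz steps
mask = lowpass_mask(axis, 10.0)       # keep |f| <= 10 Hz
spectra = [[1.0] * 8, [2.0] * 8]      # two hypothetical carrier rows
masked = apply_mask(spectra, mask)
```

Because the mask depends only on the absolute modulation frequency, it is symmetric about 0 Hz and the same passband is applied to every carrier row.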
For example, we can recreate the results from application1_musicSeparation.m, in which a lowpass modulation filter of 12 Hz isolates the saxophone in a jazz recording using pitch-synchronous demodulation. To verify, load the signal "saturn1.wav" from the \sounds folder. Set the demodulation settings to "pitch synchronous" with min F0 = 179 Hz, max F0 = 550 Hz, and harmonic range = 6050 Hz (these are the settings used in application1_musicSeparation.m). Also, set the data frame size to 1 second, so that the resulting modulation spectrum appears as in Figure 10.
Figure 10. Pitch-synchronous modulation spectrogram of the saxophone/drums mix.
Then, applying a +/- 12 Hz lowpass filter in modulation frequency yields the modified spectrum seen in Figure 11. Click the "Play Masked" button to hear the result, which should not contain any of the drumming from the original signal.
Figure 11. The same modulation spectrum as in Figure 10, with a lowpass mask applied to
zero out modulation frequencies beyond +/-12 Hz.