PhD thesis: “Auditory modelling for assessing room acoustics”

Author: Jasper van Dorp Schuitman

Year: 2011

Publisher: Delft University of Technology

The acoustics of a concert hall, or of any other room, are generally assessed by measuring room impulse responses at one or more source and receiver positions. From these responses, objective parameters can be derived that are intended to relate to various perceptual attributes of room acoustics. A set of these parameters is standardized in ISO 3382. However, this way of assessing room acoustical quality has some major shortcomings.
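To illustrate how such ISO 3382 parameters are obtained from a measured impulse response, the sketch below computes a reverberation time from the Schroeder backward-integrated decay curve (fitted between -5 and -25 dB, the T20 range) and the clarity index C80. It is a minimal broadband sketch that assumes the impulse response starts at the arrival of the direct sound; the function names are illustrative, and the octave-band filtering and noise handling that ISO 3382 also prescribes are omitted.

```python
import numpy as np


def schroeder_decay_db(ir):
    """Schroeder backward-integrated energy decay curve, in dB relative to total energy."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]  # energy remaining after each sample
    return 10.0 * np.log10(edc / edc[0])


def reverberation_time(ir, fs, start_db=-5.0, end_db=-25.0):
    """Reverberation time (s): linear fit of the decay between start_db and end_db,
    extrapolated to a 60 dB decay (-5/-25 dB gives a T20 estimate)."""
    edc_db = schroeder_decay_db(ir)
    t = np.arange(len(edc_db)) / fs
    in_range = (edc_db <= start_db) & (edc_db >= end_db)
    slope, _ = np.polyfit(t[in_range], edc_db[in_range], 1)  # dB per second, negative
    return -60.0 / slope


def clarity_c80(ir, fs):
    """Clarity C80 (dB): ratio of early (0-80 ms) to late (after 80 ms) energy."""
    energy = np.asarray(ir, dtype=float) ** 2
    n80 = int(round(0.080 * fs))
    return 10.0 * np.log10(energy[:n80].sum() / energy[n80:].sum())
```

For band-wise values, the same functions would be applied to octave-band-filtered copies of the impulse response.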

First, the perception of a room's acoustics is known to depend on the type of source signal, which is not taken into account when only impulse responses are considered. Second, because of the type of test signals used, such measurements are mostly carried out in empty rooms, whereas the acoustics can change drastically when a room is fully occupied. Finally, the literature reports cases in which the parameters do not correlate well with perception. For example, some parameters have been found to fluctuate strongly between closely spaced measurement positions, whereas the perceptual attributes they are meant to predict remain constant. Apparently, some important properties of the human auditory system are not sufficiently taken into account.

This thesis proposes a new method, in which arbitrary binaural audio recordings (or simulations), made with a dummy head, are processed by a binaural, non-linear auditory model. The model simulates the most important stages of the auditory system, such as the response of the inner ear (basilar membrane and hair cells), neural adaptation and binaural interaction. A peak detection algorithm splits the output signals of the model into two separate streams: one related to the source (direct sound) and one related to the environment (reverberant sound). From these streams, together with a measure of how much the Interaural Time Difference (ITD) fluctuates over time, parameters are derived that relate to the perceptual attributes reverberance, clarity, apparent source width and listener envelopment.
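The processing chain above can be caricatured in a few lines. The sketch makes two deliberate simplifications: the direct/reverberant split is a threshold-based stand-in for the thesis's peak detection (a sample of one model output channel counts as "direct" when it rises more than `ratio_db` above its running mean), and the ITD is estimated by plain short-time cross-correlation of the two ear signals rather than by the model's binaural-interaction stage. All function names and default values are hypothetical.

```python
import numpy as np


def split_streams(envelope, fs, win_s=0.05, ratio_db=3.0):
    """Toy direct/reverberant split of one model output channel (hypothetical rule).

    A sample goes to the 'direct' stream when the envelope exceeds its centred
    running mean by more than ratio_db (a crude peak detector); all other
    samples go to the 'reverberant' stream.
    """
    env = np.asarray(envelope, dtype=float)
    win = max(1, int(round(win_s * fs)))
    # Pad with edge values so the running mean is not biased low at the borders.
    padded = np.pad(env, (win // 2, win - 1 - win // 2), mode="edge")
    running_mean = np.convolve(padded, np.ones(win) / win, mode="valid")
    is_peak = env > running_mean * 10 ** (ratio_db / 20)
    direct = np.where(is_peak, env, 0.0)
    reverberant = np.where(is_peak, 0.0, env)
    return direct, reverberant


def itd_track(left, right, fs, frame_s=0.02, max_lag_s=0.001):
    """Per-frame ITD (s) from the lag of the maximum interaural cross-correlation.

    Positive values mean the right-ear signal lags the left-ear signal.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = int(round(frame_s * fs))
    max_lag = int(round(max_lag_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    itds = []
    for start in range(0, len(left) - n + 1, n):
        l = left[start:start + n]
        r = right[start:start + n]
        # xcorr[k] = sum_t l[t] * r[t + k] over the overlapping part of the frame
        xcorr = [np.dot(l[max(0, -k):n - max(0, k)], r[max(0, k):n - max(0, -k)])
                 for k in lags]
        itds.append(lags[int(np.argmax(xcorr))] / fs)
    return np.array(itds)


def itd_fluctuation(left, right, fs, **kwargs):
    """Standard deviation of the frame-wise ITD: one simple fluctuation measure."""
    return float(np.std(itd_track(left, right, fs, **kwargs)))
```

Applied to a dummy-head recording, `itd_fluctuation(left, right, fs)` returns a single value in seconds; larger values mean the ITD varies more from frame to frame.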

The new method was validated in four listening tests, in which subjects rated these four perceptual attributes for various room/stimulus combinations. Two source stimuli were used: male speech and cello music. Two of the tests used virtual rooms, simulated binaurally with a simulator for shoebox-shaped rooms that is also presented in this thesis. The other two used real rooms whose impulse responses were measured binaurally. Statistical analyses were performed to determine which factors had a significant effect on the ratings. In all tests, significant differences were found between the rooms, and in some situations the source signal also had a significant effect.
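How the shoebox simulator works is not described here. One standard approach for rectangular rooms is the image-source method of Allen and Berkley, sketched below in a monaural, frequency-independent form; a binaural simulator would additionally filter each image's contribution with head-related transfer functions for its direction of arrival, which is omitted. The single wall reflection coefficient `beta`, the maximum image order and all names are assumptions.

```python
import itertools

import numpy as np


def shoebox_ir(room, src, rcv, fs, beta=0.8, order=3, c=343.0, length_s=0.5):
    """Monaural image-source impulse response of a shoebox room.

    room: (Lx, Ly, Lz) in metres, walls at 0 and L on each axis.
    src, rcv: source and receiver positions in metres (must not coincide).
    beta: one frequency-independent pressure reflection coefficient for all six
    walls (a simplification). order: image indices run from -order to order per axis.
    """
    room = np.asarray(room, dtype=float)
    src = np.asarray(src, dtype=float)
    rcv = np.asarray(rcv, dtype=float)
    h = np.zeros(int(round(length_s * fs)))
    indices = range(-order, order + 1)
    for n in itertools.product(indices, repeat=3):
        n = np.array(n)
        for p in itertools.product((0, 1), repeat=3):
            p = np.array(p)
            # Image position: mirrored or unmirrored source plus an even number of room lengths.
            image = (1 - 2 * p) * src + 2 * n * room
            dist = np.linalg.norm(image - rcv)
            k = int(round(dist / c * fs))  # arrival time rounded to the nearest sample
            if k >= len(h):
                continue
            # Number of wall reflections represented by this image (Allen & Berkley).
            n_refl = int(np.sum(np.abs(n - p) + np.abs(n)))
            h[k] += beta ** n_refl / (4 * np.pi * dist)  # spherical spreading loss
    return h
```

For example, `shoebox_ir((5.0, 4.0, 3.0), (1.0, 1.0, 1.5), (4.0, 3.0, 1.5), 16000)` returns a response whose first nonzero sample is the direct sound, arriving after about 10.5 ms for the 3.6 m source-receiver distance.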

Using the results of one of the four tests, the free parameters of the model, such as the upper and lower frequency limits, were optimized with a genetic algorithm. The method was then validated by calculating the correlation coefficients between the new parameters and the average ratings. The results are very promising: in most cases, the new parameters correlate better with the perceptual data than the conventional parameters, and there were far fewer situations in which the new parameters showed an insignificant correlation.
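The optimization and validation steps can be sketched with a small real-coded genetic algorithm and the Pearson correlation coefficient. The GA operators chosen here (binary tournament selection, blend crossover, Gaussian mutation, elitism) and all names are generic assumptions, not necessarily those used in the thesis.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))


def genetic_maximize(fitness, bounds, pop_size=40, generations=60,
                     mutation_scale=0.02, seed=0):
    """Minimal real-coded genetic algorithm that maximizes `fitness` within `bounds`.

    bounds: sequence of (low, high) pairs, one per free parameter.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        elite = pop[np.argmax(scores)].copy()
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.integers(0, pop_size, size=(2, pop_size))
        parents = pop[np.where(scores[a] >= scores[b], a, b)]
        # Blend crossover: a random convex combination with a shuffled partner.
        partners = parents[rng.permutation(pop_size)]
        w = rng.uniform(size=(pop_size, 1))
        children = w * parents + (1 - w) * partners
        # Gaussian mutation, scaled to the width of each parameter range.
        children += rng.normal(scale=mutation_scale * (hi - lo), size=children.shape)
        pop = np.clip(children, lo, hi)
        pop[0] = elite  # elitism: the best individual so far always survives
    scores = np.array([fitness(ind) for ind in pop])
    best = int(np.argmax(scores))
    return pop[best], float(scores[best])
```

In the setting described above, the fitness would hypothetically be something like `lambda limits: abs(pearson_r(model_parameter(recordings, *limits), mean_ratings))`, where `model_parameter` stands in for the full auditory-model pipeline. Fitting on one test and then checking correlations on the others guards against the optimized limits merely overfitting that one data set.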

Besides correlating well with perception, the method has other advantages. Since it accepts arbitrary binaural audio recordings, a measurement can be performed in an occupied room, for example during a performance, so that the effect of the audience on the acoustics is automatically taken into account. Furthermore, the parameters are content-specific: the ways in which the spectral and temporal properties of the source signal influence the perceived acoustics of a room are reflected in the resulting parameters.

Various practical aspects of the new method are also discussed, such as robustness to noise and the influence of the type of source signal. Three signal categories were tested: voice, instrument and ensemble stimuli. Even within a single category, the parameters differed between source signals; all of these differences could be explained by the temporal and spectral properties of the signals and by how those properties affect the perception of a room's acoustics. Finally, the effects of the dummy head's position and orientation on the parameters were evaluated. The parameters do not fluctuate severely as a function of offset or angle, although this cannot be judged quantitatively as long as the just noticeable differences (JNDs) of the new parameters are unknown.
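How robustness to noise was tested is not detailed here. One common approach, assumed in the sketch below, is to add noise to a recording at controlled signal-to-noise ratios and recompute the parameters at each level. `parameter_fn` is a hypothetical placeholder for the full model pipeline, and the SNR is defined on mean power over the whole signal.

```python
import numpy as np


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so that the mean-power SNR equals snr_db."""
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as signal")
    noise = noise[:len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * noise


def parameter_vs_snr(signal, noise, parameter_fn, snr_values_db):
    """Evaluate a (hypothetical) parameter function on noisy versions of a recording."""
    return {snr: parameter_fn(mix_at_snr(signal, noise, snr)) for snr in snr_values_db}
```

Plotting the returned values against SNR shows down to which noise level a parameter stays stable; whether a given change matters perceptually would again depend on the still unknown JNDs.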
