What are the correlations between "measured" sound and "heard" sound? In the first part of this article-interview, which you can find here, Mr. Parolo already offered us some juicy previews. Author of the research paper Properties of Nonlinear Distortions and Related Measures in Audio Amplifiers, published in the Journal of the Audio Engineering Society, he introduced us to the differences between linear and nonlinear distortions. So let's continue straight away with the question that naturally arises.
Question: Given what you have told us, what did you find in nonlinear distortion that makes it so relevant in audio?
Giuseppe Parolo: From the above descriptions, it would seem that distortions have an exclusively negative connotation, and that the primary goal of every designer should therefore be to minimize all deviations from ideality as much as possible. This, in fact, is what happens in most cases, and it is what technical reviews praise. But there is another point of view to consider, which I realized about five years ago when I started collaborating with a couple of skilled amplifier repairers and designers, Tino Russo and Roberto Garlaschi, whose equipment bears the Celeste and Angstrom Audiolab brands, respectively - see here. With them I learned how to measure audio equipment, integrating my theoretical knowledge with field experience. Among the things I was able to observe first-hand was that the devices most appreciated by audiophiles for sound quality are often not those with the lowest nonlinear distortion. Also, amplifiers that differ only slightly in classical measurements can sound very different. I realized that I was missing something.
I then looked for answers in some forums, from which I was able to get a better but not entirely satisfactory picture. Given my technical training and youthful passion for this fascinating field, I tried to deepen the issue with a more scientific approach, collecting texts and articles on the subject, mainly from AES and IEEE. From reading them I realized that there is a lot of material regarding the audibility levels of distortions and their classifications in relation to tolerability, but there is very little about how these can be viewed with a positive connotation in audio reproduction. The reality is in fact that these represent an element that the designer must manage appropriately to give his device a distinctive sound character, favoring listening pleasure over instrumental correctness.
Indeed, the field is very complex: there are many variables at play, and among them the one that determines the rules of the game is psychoacoustics, which studies both the physiological aspects of our complex auditory system and the perceptual aspects. But even without trespassing into other fields, I have not found much material on the analysis of the physical effects of nonlinear distortions on complex signals similar to musical signals and how these may be related to particular effects on perception.
So, driven by curiosity, I dusted off my university texts in the field of signal processing and systems modelling to try my hand at studying the typical distortions of an audio amplifier. The first step was to reproduce its dynamic behaviour through nonlinear “black-box” mathematical models – see here and here – which are also widely used in other fields. These models have the considerable advantage of not requiring knowledge of the internal structure of a device: their “identification” is based only on measurements of the device's output for particular input signals.
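To give a concrete feel for "identification from measurements only", here is a minimal Python sketch, not the author's method. It assumes, for simplicity, that the device is a static (memoryless) third-order polynomial, far simpler than the dynamic block-oriented models used in the paper, and recovers its coefficients by least squares from input/output pairs alone. The function names and the `hidden_device` stand-in are hypothetical.

```python
import random


def fit_polynomial(x, y, order):
    """Least-squares fit of y ~ a1*x + a2*x**2 + ... + a_order*x**order.

    Solves the normal equations (A^T A) a = A^T y by Gaussian elimination
    with partial pivoting; returns [a1, a2, ..., a_order].
    """
    n = order
    ata = [[0.0] * n for _ in range(n)]
    aty = [0.0] * n
    for xi, yi in zip(x, y):
        row = [xi ** (k + 1) for k in range(n)]  # basis: x, x^2, ..., x^order
        for i in range(n):
            aty[i] += row[i] * yi
            for j in range(n):
                ata[i][j] += row[i] * row[j]
    # Forward elimination.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = aty[i] - sum(ata[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / ata[i][i]
    return coeffs


def hidden_device(v):
    """Hypothetical device under test: treated as a black box below."""
    return 10.0 * v + 0.5 * v ** 2 - 2.0 * v ** 3


# "Measurement": excite the device and record input/output pairs only.
random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(2000)]
y = [hidden_device(v) for v in x]
print(fit_polynomial(x, y, order=3))  # approximately [10.0, 0.5, -2.0]
```

The fitting routine never looks inside `hidden_device`; it only uses the recorded pairs, which is the essence of the black-box approach. Identifying a real amplifier would also require capturing memory (frequency-dependent behaviour) and coping with measurement noise.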
In more detail, the specific family of models I used is of the “block-oriented” type, of which Fig. 7 shows a simplified diagram. In summary, the model connects in parallel blocks specialized in applying linear (LS macroblock) and nonlinear (NLS macroblock) distortions to the signal. The former modifies the tonal components already present in the input signal x(t); the latter adds only new tones, harmonic or non-harmonic (the “carpet” of Fig. 6). To complete the output signal y(t), a noise component N(t), uncorrelated with the input, is also added.

Fig. 7 - Outline of the nonlinear block-oriented model used.
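The parallel structure of Fig. 7 can be sketched in a few lines of Python. This is an illustrative assumption, not the model from the paper: the LS branch is taken to be an FIR filter, the NLS branch a set of FIR filters fed by powers of the input (a so-called parallel Hammerstein structure), and N(t) Gaussian noise generated independently of x(t). All names and coefficient values are hypothetical.

```python
import math
import random


def fir(h, x):
    """Causal FIR filter: y[n] = sum over m of h[m] * x[n - m]."""
    return [sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)
            for n in range(len(x))]


def block_model(x, h_lin, h_nl, noise_std=0.0, seed=None):
    """Parallel block-oriented sketch: y = LS(x) + NLS(x) + N.

    h_lin     -- impulse response of the linear (LS) branch
    h_nl      -- {k: impulse response} of the branch fed by x**k, k >= 2 (NLS)
    noise_std -- standard deviation of the noise N, generated independently of x
    """
    y = fir(h_lin, x)                          # LS: reshapes the existing tones
    for k, h in h_nl.items():                  # NLS: generates new tones
        branch = fir(h, [v ** k for v in x])
        y = [a + b for a, b in zip(y, branch)]
    if noise_std > 0.0:                        # N(t): uncorrelated with x(t)
        rng = random.Random(seed)
        y = [v + rng.gauss(0.0, noise_std) for v in y]
    return y


# Example: 1 kHz sine sampled at 48 kHz (48 samples per cycle), hypothetical coefficients.
x = [math.sin(2 * math.pi * i / 48) for i in range(480)]
y = block_model(x, h_lin=[0.9, 0.1], h_nl={2: [0.01], 3: [0.005, 0.002]},
                noise_std=1e-4, seed=1)
```

In this simplified form the cubic branch also puts a small component back onto the fundamental, so it only approximates the idealized NLS macroblock, which adds new tones exclusively.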
In the next step, I translated the models into simulators, that is, programs that can apply to any type of signal either the desired distortions or those derived from measurements of real equipment. The input and output signals can then be analysed statistically and listened to on an audio system. Listening to a simulation is not exactly equivalent to listening to the distortion of the real amplifier, but it can still provide useful insights.
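A simulator of this kind ends with something that can be played back. The sketch below assumes a simple tanh saturation in place of a measured model and uses hypothetical file names; it writes the clean and distorted versions of a two-tone signal as 16-bit WAV files using only Python's standard `wave` module.

```python
import math
import os
import struct
import tempfile
import wave


def soft_clip(v, drive=2.0):
    """Hypothetical memoryless distortion: tanh saturation scaled so +/-1 maps to +/-1."""
    return math.tanh(drive * v) / math.tanh(drive)


def write_wav(path, samples, fs=48000):
    """Write mono 16-bit PCM; float samples are clamped to [-1, 1] first."""
    data = b"".join(
        struct.pack("<h", int(round(max(-1.0, min(1.0, s)) * 32767))) for s in samples
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(data)


fs = 48000
out_dir = tempfile.mkdtemp()
clean_path = os.path.join(out_dir, "clean.wav")
distorted_path = os.path.join(out_dir, "distorted.wav")

# Two-tone test signal (hypothetical choice), 2 s long, peaking at 0.8 of full scale.
x = [0.4 * math.sin(2 * math.pi * 440 * n / fs) + 0.4 * math.sin(2 * math.pi * 660 * n / fs)
     for n in range(2 * fs)]
write_wav(clean_path, x, fs)
write_wav(distorted_path, [soft_clip(v) for v in x], fs)
```

Comparing the two files on the same system, ideally after level matching and without knowing which is which, is the kind of listening check described above.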
The results of this work were mainly on two fronts:
- a deeper insight into the structure of nonlinear distortions;
- evidence of the limitations of the classical measures, which, more correctly, should be called metrics, since they are obtained by combining the values of several measurements.
In my own small way, I also performed some experiments on the effects on perception, but both the procedures and the statistical sample size still need to be refined, so the results should be considered provisional. In any case, I consolidated the theoretical part and submitted it as a research paper to the AES. Its experts judged it original and of interest in the field of engineering, hence the publication in the Journal of the Audio Engineering Society.

Question: To indicate nonlinear distortion, designers typically have three types of measurements: total harmonic distortion (THD), total harmonic distortion plus noise (THD+N) and intermodulation distortion (IMD). I'll ask you a few questions in quick succession... Have they ever been considered all together, as in your work? Which is the most important in light of your research?
Parolo: Yes, they are the classic standardized measures (metrics) that manufacturers generally perform on their amplifiers and list in the technical specifications to demonstrate their qualities. All of them quantify distortion as the ratio of the energy contained in the distortion tones to that contained in the fundamentals of the steady-state output signal. What changes from one measure to another is the type of test signal and the set of distortion tones considered. In detail, we have:
- THD - Total Harmonic Distortion - The test signal consists of a single tone, usually at 1 kHz, with which harmonic distortions can be detected. In Fig. 4 – see also the first part here – it is given by the ratio of the level of the tones at 2 kHz and 3 kHz to the fundamental at 1 kHz.
- THD+N - This is analogous to THD and also considers noise among the components of distortion.
- IMD - Intermodulation Distortion - The test signal consists – typically – of two tones; the distortions considered are limited to some intermodulation products. There are several types, distinguished by the frequencies of the fundamentals and their relative levels: CCIF, as in the example in Fig. 5, SMPTE and DFD.
- TD+N - Total Distortion + Noise - The test signal in this case is multitonal, as in Fig. 6, and all distortion and noise are considered in the measure. In some contexts, it is also referred to as MD - Multitonal Distortion.
Note that the phases of the distortion components do not affect the value of the measures. These measures also apply to the other elements of an audio chain, such as DACs and speakers.
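The definitions above can be turned into a short computation. This is a simplified sketch, assuming coherent sampling (every tone completes an integer number of cycles in the record, so it falls exactly on one DFT bin) and defining THD and TD+N as root-sum-square amplitude ratios; standardized analysers add windowing, weighting and bandwidth limits that are omitted here. The function names are hypothetical.

```python
import cmath
import math


def bin_power(x, k):
    """Mean power of the component at DFT bin k (0 < k < len(x)/2).

    Assumes coherent sampling: every tone completes an integer number of
    cycles in the record, so it falls exactly on one bin.
    """
    n = len(x)
    X = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return 2.0 * abs(X) ** 2 / n ** 2


def thd(x, k0, n_harmonics=5):
    """THD as an amplitude ratio: sqrt(power of harmonics 2..n / fundamental power)."""
    p_fund = bin_power(x, k0)
    p_harm = sum(bin_power(x, h * k0) for h in range(2, n_harmonics + 1)
                 if h * k0 < len(x) // 2)
    return math.sqrt(p_harm / p_fund)


def tdn(x, tone_bins):
    """TD+N: all power except the test tones (and DC), relative to the tones."""
    n = len(x)
    p_total = sum(v * v for v in x) / n
    p_dc = (sum(x) / n) ** 2
    p_tones = sum(bin_power(x, k) for k in tone_bins)
    return math.sqrt(max(p_total - p_dc - p_tones, 0.0) / p_tones)


# 4800 samples at 48 kHz -> 10 Hz per bin, so 1 kHz lies exactly on bin 100.
n, k0 = 4800, 100
x = [math.sin(2 * math.pi * k0 * i / n)
     + 0.01 * math.sin(2 * math.pi * 2 * k0 * i / n)     # 2nd harmonic, -40 dB
     + 0.005 * math.sin(2 * math.pi * 3 * k0 * i / n)    # 3rd harmonic, about -46 dB
     for i in range(n)]
print(f"THD = {100 * thd(x, k0):.3f} %")  # THD = 1.118 %
```

Replacing the harmonic sines with cosines, or shifting their phases, leaves the result unchanged, in line with the remark that phases do not affect these measures.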
The measures commonly considered most important are those that include intermodulation products for multitone signals, i.e. IMD or TD+N. In any case, these measures differ from linear measures in one substantial respect. While linear measures tell us virtually everything we can expect from a device in that respect, nonlinear measures quantify the magnitude of distortion only for specific stationary signals. Real signals are much more complex and have different statistics from test signals, so the collected data are theoretically insufficient to predict the distortion for an arbitrary signal.
An analogy may give a better idea of how representative they are: it is as if we tried to assess the characteristics of a mountain trail by looking only at the distance and the elevation difference between the start and finish points. If the terrain is fairly uniform, our prediction may be close to reality. If, on the other hand, it is irregular, our expectations may be completely off. More detailed measurements can of course be made, for example by reporting measurement values as a function of frequency or signal level. This is what some of the most committed magazines in the field do when reviewing a device, to get a more complete view. In the mountain trail analogy, this detail corresponds to knowing a few intermediate waypoints: important features can still be missed.
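The trail analogy can be made concrete with a hypothetical amplifier model: a small crossover dead zone followed by gentle saturation. Quoting its THD at a single level, as a specification sheet does, hides the fact that distortion is high at low levels (crossover), lower in the middle and high again near saturation. The model and its parameters are assumptions chosen only to produce irregular "terrain"; the THD routine assumes coherent sampling.

```python
import cmath
import math


def amp(v, dead=0.005):
    """Hypothetical amplifier: small crossover dead zone, then gentle saturation."""
    u = math.copysign(max(abs(v) - dead, 0.0), v)  # crossover region around zero
    return math.tanh(1.5 * u) / 1.5                 # compression near full scale


def thd(x, k0, n_harmonics=7):
    """THD (amplitude ratio) from single DFT bins; assumes coherent sampling."""
    n = len(x)

    def power(k):
        X = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
        return abs(X) ** 2

    harmonics = sum(power(h * k0) for h in range(2, n_harmonics + 1))
    return math.sqrt(harmonics / power(k0))


def thd_at(level_db, n=1024, k0=8):
    """THD of the model for a sine at the given level (dB re full scale)."""
    a = 10 ** (level_db / 20)
    y = [amp(a * math.sin(2 * math.pi * k0 * i / n)) for i in range(n)]
    return thd(y, k0)


for level_db in (-40, -30, -20, -10, 0):
    print(f"{level_db:>4} dBFS: THD = {100 * thd_at(level_db):.2f} %")
```

Plotting such a sweep is the equivalent of adding intermediate waypoints: more informative than a single number, yet still limited to one kind of test signal.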
For a more complete view, one solution would be to obtain a mathematical model of the device, i.e. a representation similar to the scheme in Fig. 7. But even once a realistic model has been identified, which is quite complex, one must then study how the model's parameters, which are generally not very intuitive, correlate with the sound produced. This path requires well-established expertise in systems theory. Consequently, we currently rely on the classical metrics or, better still, on individual distortion measurements such as those in Figs. 4 and 5, which are relatively simple to obtain, understand and control, hoping to find “regular terrain” and trying to extract as much information from them as possible.
It should be noted that, in addition to these, other measures also matter: signal-to-noise ratio, crosstalk, interfacing parameters, etc. If these are poor, the signal will suffer other undesirable effects as well.
Question: As a long-time audiophile, I can confirm that in most cases instrumental measurements are rather far removed from listening impressions: they are of little use for getting an idea of how a device sounds, except for certain aspects of "driveability" and "interfaceability" between amplifiers and speakers. However valid they are for assessing electrical matching, which is not in question, they say little about timbral or musical qualities. What do you think?
Parolo: The measures described, which are also used in other fields, were designed more than fifty years ago to measure certain physical quantities in order to quantify specific deviations of a device from ideality. Although very small deviations indicate higher fidelity, these measures were not designed to tell us how different types of deviation affect the sound quality we perceive. First, we need to define what we mean by “sound quality”. This multidimensional and complex concept can be defined as the degree of similarity perceived between the reproduced and original sounds in terms of timbre, space, dynamics and time, as assessed subjectively by one or more listeners. Perception involves how the signal is processed by both the ear and the brain – Ref. [1].
Ref. [1] Brian C. J. Moore, An Introduction to the Psychology of Hearing, 6th ed. 2013, Brill, available for full download here
To illustrate the complexity of this subject with reference to the physiology of the ear, consider the following mechanisms, which have been studied for many decades:
- Mechanical-electrical transduction - The hair cells within the cochlea, the innermost part of the ear, convert the movement of the basilar membrane into electrical signals. These signals are encoded as trains of electrical discharges in the auditory nerve. Sound intensity is converted into discharge rate (rate coding), while frequency can be coded either by place along the cochlea or by the timing of the discharges.
- Frequency filtering - The basilar membrane of the cochlea responds selectively to different frequencies: high frequencies stimulate the base of the cochlea, while low frequencies stimulate the apex. This “mapping” acts as a bank of bandpass filters, determining the spectral decomposition of sound. The bandwidth of each filter is called the “critical band”.
- Masking - Sounds that are close in frequency, especially if they fall within the same critical band, can interfere with each other: a loud tone may render a weaker tone inaudible, even though the weaker tone can be heard on its own. This effect is a consequence of the overlap between the cochlea’s mechanical responses. Masking spreads more toward frequencies above the masker than toward those below it, and its extent and amount increase as the level of the masking tone increases. It occurs in both simultaneous (overlapping sounds) and temporal (forward/backward masking) forms.
- Nonlinear compression - Cochlear amplification, performed by the outer hair cells, makes the ear's response nonlinear: as the sound level increases, the response, and with it the perceived intensity, grows less than proportionally. This improves sensitivity to faint signals while maintaining tolerance to loud ones.
- Temporal encoding - Nerve fibers synchronize their discharges to the phase of an acoustic signal for frequencies below about 4-5 kHz. This “phase locking” enables precise temporal encoding of frequency, which is crucial for the perception of sound pitch and for spatial localization based on phase differences between the two ears.
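Critical bands are often quantified with the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore, ERB = 24.7 (4.37 f/1000 + 1) Hz, discussed in Ref. [1]. The sketch below uses it, as an assumption, for a rough check of whether two tones, for example a fundamental and a distortion product, fall within roughly the same auditory filter. Since masking extends well beyond one ERB, especially upward, this is only a first indication, not a masking model; the function names are hypothetical.

```python
import math


def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the auditory filter at f_hz,
    using the Glasberg & Moore formula ERB = 24.7 * (4.37 * f / 1000 + 1)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def erb_number(f_hz):
    """Position of f_hz on the ERB-number scale (in ERB units)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def within_one_erb(f_a, f_b):
    """Rough check: are the two frequencies less than one ERB apart?"""
    return abs(erb_number(f_a) - erb_number(f_b)) < 1.0


print(round(erb_hz(1000.0), 1))        # 132.6
print(within_one_erb(1000.0, 1050.0))  # True: a nearby product shares the filter
print(within_one_erb(1000.0, 2000.0))  # False: the 2nd harmonic lies in another band
```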
The processes that occur in the brain have also been studied extensively. Here, however, describing them with models is much more complex. The main ones are:
- Spatial localization - In the brainstem, signals from the two ears are compared to estimate the spatial location of a sound. Interaural time differences (ITDs) and interaural level differences (ILDs) are used to locate sound sources in the horizontal plane.
- Sound source segregation - The brain can separate multiple overlapping sound sources and identify distinct auditory streams. It uses cues such as timbre, direction, temporal coherence and spectral structure. This process, known as “auditory scene analysis”, is crucial in complex environments such as a crowded room.
- Pattern recognition - The central auditory system compares incoming signals with known sound patterns, such as words, melodies and familiar noises. This recognition can occur even in the presence of distortion or noise, thanks to auditory memory and context, and it is shaped by experience.
- Compensation for distortion - Even when a sound is degraded (e.g., speech over a band-limited telephone line), the brain is able to “reconstruct” its meaning. It uses internal models and top-down inferences to fill in the gaps, exploiting the redundancy of language or music. This ability is fundamental to communication.
- Selective attention - The listener can focus attention on one sound source while ignoring others, for example by following a conversation in a noisy environment.
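To give an idea of the size of the interaural time differences involved in localization, a classic approximation is Woodworth's spherical-head formula, ITD = (a/c)(θ + sin θ), with θ the azimuth from straight ahead. The sketch assumes a head radius of 8.75 cm and a speed of sound of 343 m/s; both are typical textbook values, not figures from the interview.

```python
import math


def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Interaural time difference (s) from Woodworth's spherical-head approximation,
    ITD = (a / c) * (theta + sin(theta)), valid for azimuths from 0 to 90 degrees."""
    theta = math.radians(azimuth_deg)
    return head_radius_m / c * (theta + math.sin(theta))


for az in (0, 15, 30, 60, 90):
    print(f"{az:>3} deg: ITD = {1e6 * itd_woodworth(az):.0f} us")
```

Even for a source at 90 degrees the difference is only about 0.66 ms, and much smaller near the front: time differences of this order can be exploited thanks to the phase locking described earlier.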
All these aspects contribute to classical perceptual attributes, such as the loudness, timbre, localization and pitch of a sound, as well as the perception of “melody”. Clearly, classical measurements do not delve into these aspects, hence their modest correlation with perceived sound quality. Some studies, for example Ref. [2], which I have confirmed in my own small way with some tests, have even shown a negative correlation, that is, a preference for the presence of some forms of distortion over their complete absence.
Ref. [2] E. Geddes, L. Lee, Premium Home Theater - Design & Construction, 2003, available for full download here
Another approach being pursued is to design new, more reliable indicators of sound quality based on perceptual models that take into account the aspects described. As far as I know, there is still no agreement on a sufficiently general model that can provide reliable indicators of sound quality for music content. One of the most important attempts is PEAQ (Perceptual Evaluation of Audio Quality), standardized as ITU-R BS.1387, and its evolutions, but its use is not widespread due to some limitations. As a result, as reported in Ref. [3], we still have to live with the fact that “there are some aspects of sound quality that we perceive but cannot be measured; conversely, other aspects can be measured but are not perceived.”
Ref. [3] F. Rumsey, T. McCormick, Sound and Recording - Application and Theory, 8th ed., Routledge, 2021
Unfortunately, this situation encourages the proliferation on the market of products that promise positive effects on sound but in reality often benefit only those who sell them. Perhaps there is not yet a measure able to detect the physical changes that affect perception. But to obtain objective indications of any effect on perception, one must rely on statistical analysis: conduct listening tests on significant groups of people in controlled and repeatable conditions, establish indicators, and so on. It is therefore neither simple nor quick to establish the effectiveness of solutions or practices in this field.
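As an example of the statistics involved, assume a forced-choice test such as ABX (an assumption; the interview does not specify a protocol), in which a listener who is only guessing is right half the time. The probability of reaching a given score by chance then follows the binomial distribution; the function name is hypothetical and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def abx_p_value(correct, trials):
    """One-sided probability of getting at least `correct` answers right
    out of `trials` by pure guessing (probability 0.5 per trial)."""
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials


print(abx_p_value(12, 16))  # 2517 / 65536, about 0.038
print(abx_p_value(8, 16))   # about 0.60: compatible with guessing
```

With 12 correct answers out of 16, the chance probability is about 3.8%, below the common 5% threshold, while 8 out of 16 says nothing beyond guessing. More listeners and more trials make such conclusions more robust.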
Question: One of the stereotypical expressions of the Hi-Fi reviewer is to call a device or speaker “musical”. This effectively sums up the impression of hearing no electronic “annoyances”, of the absence of artifacts, of finally perceiving a natural, unamplified instrument playing in our listening environment. According to your research, which measurable characteristics lead to this impression?
Parolo: In general, listening fatigue is related to the “distance” of the reproduced sound from the “natural” sound, which puts greater stress on our brain in the task of interpreting what our ears pick up. In an audio amplifier, this feeling turns into annoyance when coloration or inharmonic components become excessive. In my experience, these two effects are mainly caused by:
- Linear distortion, i.e., “coloration” - Poor ability of the power amp to drive the speakers. Speakers present the power amp with a load that varies with frequency and has inductive and capacitive characteristics. If the power amplifier cannot supply the necessary current, the result is a deviation from the flat frequency response of Fig. 2.
- Nonlinear distortion - Caused by both the preamplifier and the power amplifier. Measuring the entire chain reveals a proliferation of higher-order distortion components, because the distortions of the individual stages cascade. These components are more audible and annoying, and therefore less desirable.
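Why cascading raises the order of distortion can be seen with two hypothetical memoryless stages modeled as polynomials: a preamplifier with a small second-order term and a power amplifier with a small third-order term. Composing them, with the first stage's output feeding the second, yields a polynomial of degree 2 × 3 = 6. Real stages have memory, so this is only a sketch of the mechanism; the coefficients and function names are assumptions.

```python
def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists (index = power of x)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_compose(outer, inner):
    """Coefficients of outer(inner(x)), using Horner's scheme on polynomials."""
    result = [outer[-1]]
    for c in reversed(outer[:-1]):
        result = poly_mul(result, inner)
        result[0] += c
    return result


pre = [0.0, 1.0, 0.02]          # hypothetical preamp: x + 0.02 x^2
power = [0.0, 1.0, 0.0, -0.05]  # hypothetical power amp: x - 0.05 x^3
chain = poly_compose(power, pre)
print(len(chain) - 1)  # 6
```

Driven by two or more tones, a sixth-order term produces harmonics and intermodulation products up to sixth order, even though neither stage alone goes beyond third order.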
Regarding the feeling of naturalness, a couple of aspects should be specified.
The first is that the sound field produced by one or more “natural” sound sources can never be reproduced by a system with only two channels. The sound such a system reproduces is an illusion, artfully created by the sound engineer, that relies solely on the characteristics of our auditory system to recreate a believable soundstage.
The second concerns the intrinsic quality of the source material. Several factors influence its production: on the one hand, there are the technical limitations of the equipment used for recording and signal processing; on the other, there are human factors, such as the “sound archetypes” of the sound engineer and the tastes of the artist and producer, which naturally follow current trends. The latter play an important role. It is in fact a common observation that certain genres of music, or albums from certain periods, sound more satisfying on lower-quality systems. This is because high-end, balanced systems with greater clarity reveal artifacts already contained in the recordings, including those introduced to optimize the sound for a particular distribution channel, such as radio, streaming, the web or CD. A well-known example is the “loudness war”: the trend in recent decades to raise the level of recorded music at the expense of dynamics, resulting in reduced fidelity.
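The loss of dynamics is often quantified with the crest factor, the ratio of peak to RMS level. The sketch below uses a deliberately crude stand-in for loudness maximization (gain followed by hard clipping, whereas real mastering uses compressors and look-ahead limiters) to show how raising the level under a fixed ceiling lowers the crest factor. The signal, parameters and function names are hypothetical.

```python
import math


def crest_factor_db(x):
    """Peak-to-RMS ratio in dB."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20 * math.log10(peak / rms)


def maximize_loudness(x, gain_db=12.0, ceiling=1.0):
    """Crude stand-in for level maximization: boost, then hard-clip at the ceiling."""
    g = 10 ** (gain_db / 20)
    return [max(-ceiling, min(ceiling, g * v)) for v in x]


fs = 48000
# Hypothetical "programme": a decaying 440 Hz note restarted every 0.5 s, peaking near 0.5.
x = [0.5 * math.exp(-6.0 * (n % (fs // 2)) / fs) * math.sin(2 * math.pi * 440 * n / fs)
     for n in range(2 * fs)]
y = maximize_loudness(x)
print(f"original:  {crest_factor_db(x):.1f} dB")
print(f"maximized: {crest_factor_db(y):.1f} dB")
```

For reference, a pure sine has a crest factor of about 3 dB; the more a track is pushed against the ceiling, the closer its crest factor gets to that of a square wave (0 dB).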
Ref. [1] Brian C. J. Moore, An Introduction to the Psychology of Hearing, 6th ed. 2013, Brill, available for full download here
Ref. [2] E. Geddes, L. Lee, Premium Home Theater - Design & Construction, 2003, available for full download here
Ref. [3] F. Rumsey, T. McCormick, Sound and Recording - Application and Theory, 8th ed., Routledge, 2021
End part 2 of 4 - To the third part
For further info:
write to Eng. Parolo
to JAES website
to AES website