The picture above shows the "Equal Loudness Contour". It describes how a person perceives sound (its tonal balance) differently when listening at different volumes.


Think of it as human hearing applying a different frequency response (filtering) at each volume.
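As a rough illustration, here is a swapped-in example rather than the contour itself. The standard A- and C-weighting curves used in sound level meters are commonly described as loosely derived from a low-level (about 40 phon) contour and a much higher-level contour respectively. The sketch below evaluates the analytic A- and C-weighting formulas from IEC 61672 at a few frequencies. The function names are hypothetical. At 100 Hz, A-weighting attenuates by roughly 19 dB while C-weighting is nearly flat, which mirrors how bass seems weaker at low listening volume.

```python
import math

# Assumption for illustration: A-weighting stands in for "low volume" hearing and
# C-weighting for "high volume" hearing. They are fixed filters, not the contours.

def a_weighting_db(f: float) -> float:
    """A-weighting gain in dB at frequency f (Hz), analytic form from IEC 61672."""
    f2 = f * f
    ra = (12194.0**2 * f2**2) / (
        (f2 + 20.6**2)
        * math.sqrt((f2 + 107.7**2) * (f2 + 737.9**2))
        * (f2 + 12194.0**2)
    )
    return 20.0 * math.log10(ra) + 2.00  # +2.00 dB normalises the gain to ~0 dB at 1 kHz


def c_weighting_db(f: float) -> float:
    """C-weighting gain in dB at frequency f (Hz), analytic form from IEC 61672."""
    f2 = f * f
    rc = (12194.0**2 * f2) / ((f2 + 20.6**2) * (f2 + 12194.0**2))
    return 20.0 * math.log10(rc) + 0.06  # +0.06 dB normalises the gain to ~0 dB at 1 kHz


for f in (31.5, 63, 100, 250, 500, 1000, 4000, 8000):
    print(f"{f:>7} Hz   A: {a_weighting_db(f):6.1f} dB   C: {c_weighting_db(f):6.1f} dB")
```

The gap between the two columns is largest at low frequencies, which is where the equal loudness contours also spread apart the most.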


The optimum listening volume should be around 75 to 85 dB SPL, combined across all frequencies.
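Levels in dB cannot simply be added. A minimal sketch of how per-band levels combine into one overall dB SPL figure, assuming the bands do not overlap so that their energies (not their dB values) add: L_total = 10 * log10(sum of 10^(L_i / 10)). The band levels below are made-up example values, and `combine_spl` is a hypothetical helper name.

```python
import math


def combine_spl(levels_db):
    """Combine per-band levels (dB SPL) into one overall level by summing energies.

    Assumes the bands are non-overlapping (uncorrelated), so energies add.
    """
    total_energy = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_energy)


# Hypothetical octave-band levels for a piece of music, in dB SPL.
bands = [70.0, 74.0, 76.0, 75.0, 72.0, 68.0]
print(f"Overall level: {combine_spl(bands):.1f} dB SPL")  # prints 81.1 dB SPL
```

For example, two equal bands at 80 dB combine to about 83 dB, not 160 dB, because doubling the energy adds only about 3 dB.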


Please also think about sound waves in the "Time Domain". The peak-to-peak spike sets the highest instantaneous level, but the perceived volume also depends on the rest of the sound or speech, i.e. on the waveform.
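To illustrate why peak-to-peak alone does not tell the whole story, this sketch compares a sine wave and a square wave with the same peak-to-peak amplitude of 1.0. RMS (root mean square) is used here as a simple stand-in for average level. That is an assumption for illustration, not a loudness model. The sample rate, test frequency, and helper names are arbitrary choices.

```python
import math

SAMPLE_RATE = 48_000  # samples per second (assumed)


def sine(freq, seconds, p2p):
    """Sine wave with the given peak-to-peak amplitude."""
    n = int(SAMPLE_RATE * seconds)
    amp = p2p / 2.0
    return [amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def square(freq, seconds, p2p):
    """Square wave with the given peak-to-peak amplitude (sign of a sine)."""
    amp = p2p / 2.0
    return [amp if s >= 0 else -amp for s in sine(freq, seconds, 2.0)]


def peak_to_peak(x):
    return max(x) - min(x)


def rms(x):
    """Root mean square: a simple measure of average signal level."""
    return math.sqrt(sum(s * s for s in x) / len(x))


sn = sine(1000, 0.1, p2p=1.0)    # 100 full cycles at 1 kHz
sq = square(1000, 0.1, p2p=1.0)
for name, x in (("sine", sn), ("square", sq)):
    print(f"{name:>6}: peak-to-peak={peak_to_peak(x):.3f}  rms={rms(x):.3f}")
```

The square wave's RMS is 0.5 versus about 0.354 for the sine, so the same peak-to-peak swing carries about 3 dB more average power in the square wave.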


For example, suppose an instantaneous shout has a peak-to-peak amplitude of 1 and is perceived as loud. Continuous speech, talking, or music played at an even level for a long time (long relative to the shout), with a peak-to-peak amplitude of only about 0.5, may also be perceived as loud.


The reason is the total energy accumulated over a certain PERIOD of time. It is therefore necessary to study sound in the "Frequency" domain as well as in the "Time" domain.
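The shout-versus-sustained-sound example can be sketched numerically. Pure 1 kHz tones stand in for the shout and for continuous speech or music (an assumption for illustration), and the energy is the sum of squared samples over a 1-second analysis window. Raw energy is only a crude proxy for perceived loudness, and the window length, burst length, and helper names are all hypothetical.

```python
import math

SAMPLE_RATE = 48_000  # samples per second (assumed)
WINDOW_S = 1.0        # analysis period in seconds (assumed)


def tone(freq, seconds, p2p):
    """Pure tone standing in for a real sound, with the given peak-to-peak amplitude."""
    n = int(SAMPLE_RATE * seconds)
    amp = p2p / 2.0
    return [amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def pad_to_window(x):
    """Append silence so the signal fills the whole analysis window."""
    n = int(SAMPLE_RATE * WINDOW_S)
    return x + [0.0] * (n - len(x))


def window_energy(x):
    """Approximate integral of x(t)^2 over the window (arbitrary units)."""
    return sum(s * s for s in x) / SAMPLE_RATE


def peak_to_peak(x):
    return max(x) - min(x)


shout = pad_to_window(tone(1000, 0.05, p2p=1.0))          # 50 ms burst, peak-to-peak 1.0
sustained = pad_to_window(tone(1000, WINDOW_S, p2p=0.5))  # whole window, peak-to-peak 0.5

for name, x in (("shout", shout), ("sustained", sustained)):
    print(f"{name:>9}: peak-to-peak={peak_to_peak(x):.2f}  energy={window_energy(x):.5f}")
```

The 50 ms burst has twice the peak-to-peak amplitude, yet the sustained tone delivers five times as much energy over the 1 s window.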


There is more to the "Equal Loudness Contour"; one is advised to consult a few textbooks to find out how it was created, why it was created, and other related issues.

