
Audio phenomenon: Schroeder-phase complexes

Here's an audio phenomenon you should know about: Schroeder-phase complexes.

These are harmonic series which are designed so that their amplitude envelope is maximally flat. When you synthesise a harmonic series of partials, you know what frequencies you should use for the component sinewaves: F, 2F, 3F, 4F, etc. But what phase should you use?

Often we stick with a simple default, such as starting every partial at zero phase. That has a drawback, though, which can cause problems in perceptual tests: the amplitude envelope, within one pitch period, is quite bumpy, because there are moments when the component phases all line up to produce a strong peak. Sometimes this bumpiness leads to experimental confounds.

One thing you could do to work around this is use random phases, but adding this extra randomness into an experiment is usually not that desirable.

In 1970 Schroeder published a formula for choosing the phases so that the resulting waveform has a low crest factor, i.e. no big amplitude peaks. The formula is pretty simple but my blog doesn't render equations yet so see e.g. this paper.
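To give a feel for it, here is a minimal Python sketch (separate from the SuperCollider implementation mentioned further down). It assumes one commonly quoted form of the formula, phase_n = sign * pi * n * (n + 1) / N for equal-amplitude harmonics n = 1..N; check the paper for the exact variant you need. The function names and the choice N = 30 are purely illustrative. It compares the crest factor (peak divided by RMS) of a cosine-phase complex with a Schroeder-phase one.

```python
import math

def schroeder_phases(n_harmonics, sign=+1):
    """Assumed form of the formula: sign * pi * n * (n + 1) / N, for n = 1..N."""
    return [sign * math.pi * n * (n + 1) / n_harmonics
            for n in range(1, n_harmonics + 1)]

def one_period(phases, samples_per_period=2048):
    """Sum equal-amplitude cosines at harmonics 1..N over one pitch period."""
    wave = []
    for i in range(samples_per_period):
        t = i / samples_per_period  # time measured in pitch periods
        wave.append(sum(math.cos(2 * math.pi * n * t + ph)
                        for n, ph in enumerate(phases, start=1)))
    return wave

def rms(wave):
    return math.sqrt(sum(x * x for x in wave) / len(wave))

def crest_factor(wave):
    """Peak absolute value divided by RMS."""
    return max(abs(x) for x in wave) / rms(wave)

N = 30  # illustrative number of harmonics
cosine = one_period([0.0] * N)
schroeder = one_period(schroeder_phases(N))

# Both waveforms have the same RMS; only the peakiness differs.
print("cosine-phase crest factor:   ", crest_factor(cosine))
print("Schroeder-phase crest factor:", crest_factor(schroeder))
```

The cosine-phase crest factor comes out at sqrt(2N), because all N partials peak together at the start of the period, while the Schroeder-phase value is much lower.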

Let me show you directly: here I've synthesised the same harmonic sound with five different choices of phase. The top row, "sine-phase" and "cosine-phase", corresponds to two versions of the default phase-aligned choice, and look how spiky they are:


In the middle is random phase, and at the bottom are two Schroeder-phase plots. Please note that the y-axis has a different scale in each plot - the waveforms each have the same energy, and the same Fourier-transform magnitudes, despite looking very different!

The reason there are TWO Schroeder plots is that we have the option to flip the sign of the phases (which time-reverses the waveform) while preserving the waveform characteristics. The shorthand label that people sometimes use is that one of these is "Schroeder-plus" and one is "Schroeder-minus".
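The link between flipping the sign and time reversal can be checked numerically. The sketch below is an illustration rather than the original implementation: it again assumes the phase formula sign * pi * n * (n + 1) / N with equal-amplitude cosine partials, and the names and sizes are made up for the example. It builds one period of the "plus" and "minus" versions and confirms that one is the other played backwards.

```python
import math

def schroeder_period(n_harmonics, sign, samples_per_period=512):
    """One period of an equal-amplitude cosine complex with Schroeder phases.

    Assumed phase formula: sign * pi * n * (n + 1) / N for n = 1..N.
    """
    phases = [sign * math.pi * n * (n + 1) / n_harmonics
              for n in range(1, n_harmonics + 1)]
    return [sum(math.cos(2 * math.pi * n * i / samples_per_period + ph)
                for n, ph in enumerate(phases, start=1))
            for i in range(samples_per_period)]

M = 512  # samples per period (illustrative)
plus = schroeder_period(16, +1, M)
minus = schroeder_period(16, -1, M)

# Time reversal of a periodic signal: sample i maps to sample (-i) mod M.
reversed_plus = [plus[(-i) % M] for i in range(M)]
max_diff = max(abs(a - b) for a, b in zip(minus, reversed_plus))
print("largest difference between minus and reversed plus:", max_diff)
```

The difference is at the level of floating-point rounding, because cos(x - phase) equals cos(-x + phase) for every partial.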

BUT WAIT there's one weird thing I haven't shown you yet, and it pops out when you listen to the examples. These stimuli can be used to find frequency thresholds - at low frequency we can tell the difference, but at high frequency they sound identical. And the weirdest thing is when you listen to them at very low frequencies, they don't sound like static harmonic complexes at all (even though that's definitely how we generated them), they sound like otherworldly down- or up-chirps.

Listen to this audio file where I play a plus and a minus, at different frequencies. First at 300 Hz, then at 65 Hz, then 16 Hz, then 2 Hz. At first you'll hear two essentially identical tones, but then the differences become noticeable, and then overwhelming:

Download the wave file
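If you'd like to generate something similar yourself, here is a rough Python sketch using only the standard library. It is not the code that produced the file above: the phase formula (sign * pi * n * (n + 1) / N), the number of harmonics, the sample rate and the function names are all assumptions made for illustration. Keep n_harmonics * f0 below half the sample rate to avoid aliasing, and note that pure-Python synthesis gets slow when you use many harmonics.

```python
import math
import struct
import wave

def schroeder_tone(f0, n_harmonics, sign=+1, seconds=1.0, sample_rate=44100):
    """Equal-amplitude harmonic complex with (assumed) Schroeder phases,
    normalised so that the largest sample has magnitude 1."""
    phases = [sign * math.pi * n * (n + 1) / n_harmonics
              for n in range(1, n_harmonics + 1)]
    n_samples = int(seconds * sample_rate)
    out = []
    for i in range(n_samples):
        t = i / sample_rate
        out.append(sum(math.cos(2 * math.pi * n * f0 * t + ph)
                       for n, ph in enumerate(phases, start=1)))
    peak = max(abs(x) for x in out) or 1.0
    return [x / peak for x in out]

def write_wav(path, samples, sample_rate=44100):
    """Write mono float samples in [-1, 1] as a 16-bit PCM WAV file."""
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in samples)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(frames)

# Small illustrative example: a short "plus" tone followed by a "minus" tone.
demo = (schroeder_tone(300.0, 10, +1, seconds=0.01, sample_rate=8000)
        + schroeder_tone(300.0, 10, -1, seconds=0.01, sample_rate=8000))
```

Calling write_wav("schroeder_demo.wav", demo, 8000) would save the result; the filename is just a placeholder. To hear the chirps, lower f0 and raise n_harmonics so that the partials still reach well into the audible range.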

It's a nice demonstration of the fact that any periodic signal can be conceived as a sum of stationary sinusoids - as in Fourier analysis. Here we synthesised a chirpy nonstationary-sounding (but periodic) signal, starting from scratch from the sinusoids.

My implementation is here as SuperCollider code, inspired by this paper: Phase effects on the perceived elevation of complex tones.

Tuesday 2nd February 2016 | sound | Permalink

You're not testing a bird recognition app if you're not testing it with birds

So, our Warblr bird sound recognition app has been out for almost a month, and we've had many thousands of people using it and submitting bird audio recordings (thanks!). We've also had lots of great reviews in the consumer press. (Listen to this evocative piece on BBC Radio Scotland, fast-forward to 1hr 43.)

One thing which we knew was going to happen was that some people would demo it by playing back sound recordings into the mic, rather than recording actual birds. After all, sound recordings are easier to grab... What I didn't realise, from my own perspective, is that people would think this was a good way to test the app.

Playing back recordings is usually a really bad way to test the app, or any sound recognition app really, because played-back recordings differ from the real thing in many, many ways:

  • Often people test it with low-quality audio recordings (encoded badly or squished as MP3s or Youtube videos). There are lots of recordings out there on the web which are noticeably distorted or over-filtered.
  • Usually people use low-quality speakers to play back (laptops, phones) which miss out some of the audio content, or again distort it.
  • Usually the audio environment around the playback is inappropriate (e.g. a chiffchaff in the kitchen!) which means the sound contains misleading information.

All of these things make the audio drastically different from a genuine direct recording, even though our human ears are clever enough to understand the correspondence. Yes, ideally a system would be as clever as our human ears, but that's for the future. (Note the difference from a product like Shazam, which recognises recordings but does not recognise the real live musician... interesting eh!)

Plus there's yet another aspect to consider: we make use of your location to help determine what kind of bird is likely. This is thanks to the BTO whose amazing crowdsourced bird data helps us know which birds to expect where and when. So, if you're playing a recording of a bird that isn't native to where you are, our system is doubtful that the bird is there... and quite rightly doubtful, perhaps.

I can't emphasise enough that playing back recorded sounds is not the best way to test. We can't prevent people from doing this, of course! That's fine, but always bear in mind that you didn't test it in proper field conditions, only at your desk. You're not testing a bird recognition app if you're not testing it against real wild birds...

Sunday 6th September 2015 | sound | Permalink

How to analyse pan position per frequency of your sound files

Someone on the Linux Audio Users list asked how they could analyse a load of FLAC files to work out whether it was true, for their music collection, that bass frequencies below about 150 Hz (say) tended to be centre-panned. Here's my answer.

First of all, coincidentally I know that Pedro Pestana published a nice analysis of exactly this phenomenon, at the AES 53rd conference recently. He actually looked at hundreds of number-one singles to determine the relationship between panning and frequency in the habits of producers/engineers for popular tracks. The paper isn't open access unfortunately but there you go.

So anyway here's a Python script I just wrote, to analyse your audio files and plot the distribution of panning per frequency. And here's how it looks when I analyse the excellent Rumour Cubes album:

(Just to stress, this is a simple analysis. It simply looks at the spectral representation of the complete mix, it doesn't infer anything clever about the component parts of the mix.)

See any patterns? The pattern I was looking for is a bit subtle, but it's right down at the bottom below 100 Hz (i.e. 0.1 kHz on the scale): the bass tends to "pinch in" and not get panned around so much as the other stuff.

This analysis of Lotus Flower by Radiohead (by Daniel Jones) shows the effect more clearly.
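To make the idea concrete, here is a small self-contained Python sketch of this kind of analysis. It is not the script linked above: the pan index (R - L) / (R + L) computed on spectral magnitudes, the function names, and the synthetic test signal are all assumptions chosen for illustration, and it uses a naive DFT on one short frame so that it runs with the standard library alone. A real script would read audio files and use a proper FFT over many frames.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 of a real frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def pan_per_bin(left, right, floor=1e-9):
    """Pan index per frequency bin: -1 = hard left, 0 = centre, +1 = hard right.

    Bins with negligible energy in both channels are reported as 0.
    """
    mags_l = dft_magnitudes(left)
    mags_r = dft_magnitudes(right)
    return [(r - l) / (r + l) if (r + l) > floor else 0.0
            for l, r in zip(mags_l, mags_r)]

# Synthetic stereo frame: a low tone (bin 4) panned centre,
# and a higher tone (bin 40) that is much louder on the right.
n = 256
left = [math.cos(2 * math.pi * 4 * i / n) + 0.2 * math.cos(2 * math.pi * 40 * i / n)
        for i in range(n)]
right = [math.cos(2 * math.pi * 4 * i / n) + 1.0 * math.cos(2 * math.pi * 40 * i / n)
         for i in range(n)]

pan = pan_per_bin(left, right)
print("pan at low bin: ", round(pan[4], 3))
print("pan at high bin:", round(pan[40], 3))
```

The low bin comes out at essentially zero (centre) and the high bin at about 0.67 (towards the right), which is the "pinched in" bass pattern in miniature. Collecting these values over many frames and plotting a histogram per frequency would give a picture like the ones above.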

This is what's generally observed, and widely known in mixing engineer "folklore": pan your bass to the centre, do what you like with the rest. Not everyone agrees on the reasons: some people say it's because the bass can cause the needle to skip out of vinyl records if it's off-centre, some people say it's because we can't really perceive the spatialisation very well at low frequencies, some people say it's just to maximise the energy in the mix. I have no comment on what the reasons might be, but it's certainly folk wisdom for various audio people, and empirically you can test it for yourself by analysing some of your music collection.

NOTE: Code and image updated 2014-02-08, thanks to Daniel Jones (see comments below) for spotting an issue.

Friday 7th February 2014 | sound | Permalink

Embedded acoustic environments (Barry Truax)

This weekend I was at the Symposium on Acoustic Ecology. Interesting event, but here I just want to note one specific thing from Barry Truax, who gave a keynote as well as a new composition.

Truax has a pretty nice way of talking about acoustic structure at different scales. As a composer he's been an important proponent of granular synthesis, and as a teacher his way of talking about sound meshes rather neatly with the granular approach.

One issue he brought out in his keynote is how, over the past 100 years, our ways of listening have changed, and so has our sophistication as listeners. He's not just talking about professional or arty listeners, but all of us. In the past, our "acoustic environment" was pretty much synonymous with our immediate environment more generally. This (Truax argues) is one of the reasons that people in the 1910s seemed to be fooled by the sound of an opera singer on a phonograph record, a sound which to us comes across as a feeble imitation. But recording technologies have allowed us to abstract the acoustic environment from our immediate environment: we now have a facility with embedded acoustic environments that is so sophisticated as to be casual. We know how to relate to the person sitting next to us on the tube listening to headphones; we understand the voices in the radio, why they have different reverb from the room we're sitting in, and why they can't hear us; we understand what is being hinted at when the narrator in a radio play doesn't seem to be in the same room as the characters.

Later that night, at the concert, there was a great example of embedded acoustic environments. We were listening to a multi-channel electronic concert, in a huge ex-ship-building shed ("No. 3 slip") in a dockyard. This hangar allowed plenty of sound in from outdoors, and so as the music played, it was... ahem... "augmented" by various other sounds: the dockyard's big clock chiming the hours; the firework-like sounds of artillery fire in a naval training ground; and also a heavily-echoed "Call Me" by Blondie!

I don't believe any of this was deliberate ;) but it's a great example of an embedded acoustic environment - and furthermore, of the challenge that it presents to electronic composers. Composers need to be aware that the environment they're constructing will usually be played back over some speakers which don't form the entirety of the acoustic environment, but a sub-system of it, for the listeners. (Is this challenge equivalent to a demand to always be site-specific? Not quite, but related.) I think some of the composers last night did not rise to this challenge, and it showed. But some of them did. Barry Truax was premiering a new piece called "Earth and Steel", written specifically for this place, and it worked great, it was very affecting.

Sunday 10th November 2013 | sound | Permalink
Creative Commons License
Dan's blog articles may be re-used under the Creative Commons Attribution-Noncommercial-Share Alike 2.5 License. Click the link to see what that means...