I've been awarded a 5-year research fellowship! It's funded by the EPSRC, and it gives me five years to research "structured machine listening for soundscapes with multiple birds". What does that mean? It means I'm going to be developing computerised processes to analyse large amounts of sound recordings - automatically detecting the bird sounds in there, how they vary, how they relate to each other, and how the birds' behaviour relates to the sounds they make.
What's the point of analysing bird sounds? Well...
One surprising fact about birdsong is that it has a lot in common with human language, even though it evolved separately. Many songbirds go through similar stages of vocal learning as we do, as they grow up. And each species is slightly different, which is useful for comparing and contrasting. So, biologists are keen to study songbird learning processes - not only to understand more about how human language evolved, but also to help understand more about social organisation in animal groups, and so on. I'm not a biologist but I'm going to be collaborating with some great people to help improve the automatic sound analysis in their toolkit - for example, by analysing much larger audio collections than they can possibly analyse by hand.
Bird population/migration monitoring is also important. UK farmland bird populations have declined by 50% since the 1970s, and woodland birds by 20% (source). We have great organisations such as the BTO and the RSPB, who organise professionals and amateurs to help monitor bird populations each year. If we can add improved automatic sound recognition to that, we can help add some more detail to this monitoring. For example, many birds are changing location year-on-year in response to climate change (source) - that's the kind of pattern you can detect better when you have more data and better analysis.
Sound is fascinating, and still surprisingly difficult to analyse. What is it that makes one sound similar to another sound? Why can't we search for sounds as easily as we can for words? There's still a lot that we haven't sorted out in our scientific and engineering understanding of audio. Shazam works well for music recordings, but don't be lulled into a false sense of security by that! There's still a long way to go in this research topic before computers can answer all of our questions about sounds.
I'll be developing automatic analysis techniques (signal processing and machine learning techniques), building on starting points such as my recent work on tracking multiple birds in an audio recording and on analysing frequency-modulation in bird sounds. I'll be based at Queen Mary University of London.
I'll also be collaborating with experts in machine learning, animal behaviour, and bioacoustics. One of the things on the schedule for this year is to record some zebra finches with the Clayton Lab. I've met the zebra finches already - they're jolly little things, and talkative too! :)
Here's an update to my own personal data about how long it takes to get academic articles published. I've also augmented it with funding applications too, to compare how long all these decisions take in academia.
It's important because, especially as an early-career researcher, if it takes a year for a journal article to come out (even after the reviewers have said yes), that's a year of not having it on your CV.
So how long do the different bits take? Here's a bar-chart summarising the mean durations in my data:
The data is divided into three phases: first, writing, up until first submission; then reviewing (including any back-and-forth with reviewers, resubmission etc); and finally, the time from final decision through to publication.
Firstly note that there are not many data points here - for example, I have one journal article that took an extremely long time after acceptance to actually appear, and this skews the average. But it's certainly notable that the time spent writing is generally dwarfed by the time spent waiting. And in particular, it's not necessarily the reviewing process itself that forces us all to wait - various admin things such as typesetting seem to take at least as long. Whether or not things should take that long, well, it's up to you to decide.
Also - I was awarded a fellowship recently, which is great - but you can see in the diagram that I spent about two years repeatedly getting negative funding decisions. It's tough!
This is just my own data - I make no claims to generality.
Agh, I just got caught out by a "silent" change in the behaviour of scipy for Python. By "silent" I mean it doesn't seem to be in the scipy 0.12 changelog even though it should be. I'm documenting it here in case anyone else needs to know:
Here's the simple code example - using scoreatpercentile to find a percentile for some 2D array:
import numpy as np
from scipy.stats import scoreatpercentile

scoreatpercentile(np.eye(5), 50)
On my laptop with scipy 0.11.0 (and numpy 1.7.1) the answer is:
array([ 0., 0., 0., 0., 0.])
On our lab machine with scipy 0.13.3 (and numpy 1.7.0) the answer is:

0.0
In the first case, it calculates the percentile along one axis. In the second, it calculates the percentile of the flattened array, because in scipy 0.12 someone added a new "axis" argument to the function, whose default value "None" means to analyse the flattened array. Bah! Nice feature, but a shame about the compatibility. (P.S. I've logged it with the scipy team.)
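If you need the old per-column behaviour on both old and new versions, one defensive workaround (just a sketch) is to request the axis explicitly and fall back when running on an older scipy that doesn't accept the argument:

import numpy as np
from scipy.stats import scoreatpercentile

def percentile_per_column(arr, per):
    # Newer scipy (0.12 onwards) accepts an "axis" keyword; older versions
    # always work column-wise on 2D input, so fall back to the bare call.
    try:
        return scoreatpercentile(arr, per, axis=0)
    except TypeError:
        return scoreatpercentile(arr, per)

print(percentile_per_column(np.eye(5), 50))  # five zeros on either version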
Someone on the Linux Audio Users list asked how they could analyse a load of FLAC files to work out whether it was true for their music collection that bass frequencies (below about 150 Hz, say) tend to be centre-panned. Here's my answer.
First of all, coincidentally I know that Pedro Pestana published a nice analysis of exactly this phenomenon at the recent AES 53rd conference. He looked at hundreds of number-one singles to determine the relationship between panning and frequency in the habits of producers/engineers for popular tracks. The paper isn't open access, unfortunately, but there you go.
So anyway here's a Python script I just wrote: script to analyse your audio files and plot the distribution of panning per frequency. And here's how it looks when I analyse the excellent Rumour Cubes album:
(Just to stress, this is a simple analysis. It simply looks at the spectral representation of the complete mix, it doesn't infer anything clever about the component parts of the mix.)
See any patterns? The pattern I was looking for is a bit subtle, but it's right down at the bottom below 100 Hz (i.e. 0.1 kHz on the scale): the bass tends to "pinch in" and not get panned around so much as the other stuff.
This analysis of Lotus Flower by Radiohead (by Daniel Jones) shows the effect more clearly.
This is what's generally observed, and widely known in mixing-engineer "folklore": pan your bass to the centre, do what you like with the rest. Not everyone agrees on the reasons: some people say it's because off-centre bass can cause the needle to skip out of the groove on vinyl records, some say it's because we can't really perceive spatialisation very well at low frequencies, and some say it's just to maximise the energy in the mix. I have no comment on what the reasons might be, but it's certainly folk wisdom for various audio people, and empirically you can test it for yourself by analysing some of your music collection.
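If you don't fancy reading the full script, here's a stripped-down sketch of the kind of measurement it makes (my simplified reconstruction, not the exact code - the filename is just a placeholder):

import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

sr, audio = wavfile.read("stereo_mix.wav")  # placeholder stereo file
left = audio[:, 0].astype(float)
right = audio[:, 1].astype(float)

# Magnitude spectrogram for each channel
_, _, specl = stft(left, fs=sr, nperseg=2048)
_, _, specr = stft(right, fs=sr, nperseg=2048)
magl, magr = np.abs(specl), np.abs(specr)

# Panning index per time-frequency bin: -1 fully left, 0 centre, +1 fully right
pan = (magr - magl) / (magl + magr + 1e-12)

# Energy-weighted spread of panning in each frequency bin: values near zero
# mean that frequency band stays centre-panned throughout the track
spread = np.sqrt(np.average(pan ** 2, weights=magl + magr + 1e-12, axis=1))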
NOTE: Code and image updated 2014-02-08, thanks to Daniel Jones (see comments below) for spotting an issue.
This week I was learning about Gaussian Processes, at the very nice Gaussian Processes Winter School in Sheffield. The term "Gaussian Processes" refers to a family of techniques for inferring a smooth surface (1D, 2D, 3D or more) from a set of sampled noisy data points. Essentially, it's an advanced and mathematically very sound type of regression.
Don't get confused by the name, by the way: your data doesn't have to be Gaussian, and Gaussian Process regression doesn't always produce smooth Gaussian-looking results. It's very flexible.
As an example, here's a first pass I did of analysing the frequency trajectories in a single recording of birdsong.
I used the "GPy" Python package to do all this. Here's their GPy regression tutorial.
I do want to emphasise that this is just a first pass - I don't claim it's a meaningful analysis yet. But there are a couple of neat things about the analysis:
So now here's my second example, in a completely different domain. I'm not a geostatistician, but I decided to have a go at reconstructing the hills and valleys of Britain using point data from OpenStreetMap. This is a fairly classic example of the technique, and OpenStreetMap data is almost perfect for the job: it doesn't hold any smooth data about the surface terrain of the Earth, but it does hold quite a lot of point data where elevations have been measured (e.g. the heights of mountain peaks).
If you want to run this one yourself, here's my Python code and OpenStreetMap data for you.
This is what the input data look like - I've got "ele" datapoints, and separately I've got coastline location points (for which we can assume ele=0):
Those scatter plots don't show the heights, but they show where we have data. The elevation data is densest where we have mountain ranges and the like, such as in central Scotland and in Derbyshire.
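In case it's useful, here's roughly how you might pull the "ele"-tagged nodes out of the raw OSM XML (a sketch - the filename is a placeholder, and the coastline points need similar handling with the elevation set to zero):

import xml.etree.ElementTree as ET
import numpy as np

points = []
for _, node in ET.iterparse("britain_points.osm"):  # placeholder filename
    if node.tag != "node":
        continue
    tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
    if "ele" in tags:
        try:
            points.append((float(node.get("lon")),
                           float(node.get("lat")),
                           float(tags["ele"])))
        except (TypeError, ValueError):
            pass  # skip malformed entries such as "123 m"
    node.clear()  # keep memory usage down on big files

X = np.array([(lon, lat) for lon, lat, _ in points])
Y = np.array([(ele,) for _, _, ele in points])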
And here are two different fits, one with an "exponential" kernel and one with a "Matern" kernel:
Again, the nice thing about Gaussian Process regression is that it seamlessly handles smooth generalisations as well as occasional patches of fine detail where needed. How good are the results? Well, it's hard to tell by eye, and I'd need some official relief-map data to validate it. But from looking at these two, I like the exponential-kernel fit a bit better - it certainly gives an intuitively appealing relief map in central Scotland, and visually it's a bit less blobby than the other plot. However, it's a bit more wrong in some places, e.g. an overestimated elevation in Derbyshire (near the centre of the picture). If you ask an actual geostatistics expert, they will probably tell you which kernel is a good choice for regressing terrain shapes.
The other thing you can see in the images is that it isn't doing a very good job of predicting the sea. Often the prediction dips down to an altitude of zero at the coast and then pops back upwards just offshore. No surprises about this, for two reasons: firstly, I didn't give it any data points about the sea, and secondly, I'm using "stationary" kernels, meaning there's no reason for the algorithm to believe the sea behaves any differently from the land. This would be easy to fix by masking out the sea, but I haven't bothered.
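For reference, swapping kernels in GPy is a one-liner. Continuing from the X and Y arrays built in the parsing sketch above, the two fits were set up something like this (a sketch, with a grid covering roughly Britain's bounding box):

import numpy as np
import GPy

for kernel in (GPy.kern.Exponential(input_dim=2),
               GPy.kern.Matern32(input_dim=2)):
    model = GPy.models.GPRegression(X, Y, kernel)
    model.optimize()
    # Predict elevation over a regular lon/lat grid
    lons, lats = np.meshgrid(np.linspace(-8., 2., 100),
                             np.linspace(50., 59., 100))
    grid = np.column_stack([lons.ravel(), lats.ravel()])
    mean, var = model.predict(grid)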
So altogether, these examples show some of the nice features of Gaussian Process regression, and, along with the code, that the GPy module makes it pretty easy to put together this kind of analysis in Python.
As a contributor to OpenStreetMap, one thing I've been wondering recently is what sort of map data we should collect for the UK, now that the basic coverage is already good. OpenStreetMap's coverage of the UK is generally great: when you're out and about with a printed-out map and a pen, it's very rare to find anything significant that isn't mapped already - sometimes a new street or a missing church. You could pour your time into mapping increasingly obscure things, whatever you're interested in. But what would be the most useful things to map in the UK over the coming year - things that are not just interesting to map but could be practically useful to people? Some thoughts:
Some notes on other things, where I'm not sure how vital they are:
I'd be grateful for any feedback on the thoughts above, including other things that could be priorities. Just one UK mapper's perspective.
SuperCollider is an audio environment that gets a lot of things right in terms of hacking around with multichannel sound, live coding and composing the different structures you need for music.
I think there's at least one I've forgotten. Please let me know if you spot others, I'd be interested to keep tabs.
So there are obvious questions: is this a duplication of effort? should these people get together and hack on one system? is any one of them better than the others? I don't know if any of them is better, but one thing I know: it's still very early days in the world of Web Audio. (The underlying APIs aren't even implemented fully by all major browsers yet.) I'm sure some cool live coding web systems will emerge, and they may or may not be based on the older generation. But there's still plenty of room for experimentation.
There was an inspection of GP surgeries that came out last week, widely reported/headlined as "one third of GP surgeries" failing basic health standards. So is it true that one third of GP surgeries fail basic standards? No, and for a very simple reason.
The Care Quality Commission surveyed 910 GP surgeries (out of 8000 total) and found failings in one-third of them. But how did they pick the surgeries to inspect? Did they do it at random? No.
"80% were targeted because of known concerns. The remainder were chosen at random."
In other words, this survey was not a survey of all our surgeries, but of the ones that people were already suspicious about. In a sense, it was a survey of the worst of the bunch. When you pick your targets like this, it makes no sense to generalise the result to the rest of the GP surgeries.
What's the true number? Well, we don't know. If we assume that all the dodgy surgeries were included in the batch of 910, the percentage would be about 3.8% (one third of 910 is roughly 300 failing surgeries, out of 8,000 in total). It would have been lucky to capture all the dodgy surgeries in that batch, though, so the true figure is probably a bit higher than that. Still something to be concerned about, of course - but no crisis. The UK is still internationally leading in high-quality, cost-effective healthcare, so there's no need to panic...