
I have been awarded a 5-year fellowship to research bird sounds

I've been awarded a 5-year research fellowship! It's funded by the EPSRC and gives me five years to research "structured machine listening for soundscapes with multiple birds". What does that mean? It means I'm going to be developing computerised processes to analyse large amounts of sound recordings - automatically detecting the bird sounds in there and how they vary, how they relate to each other, how the birds' behaviour relates to the sounds they make.

zebra finches

Why it matters:

What's the point of analysing bird sounds? Well...

One surprising fact about birdsong is that it has a lot in common with human language, even though it evolved separately. Many songbirds go through similar stages of vocal learning as we do, as they grow up. And each species is slightly different, which is useful for comparing and contrasting. So, biologists are keen to study songbird learning processes - not only to understand more about how human language evolved, but also to help understand more about social organisation in animal groups, and so on. I'm not a biologist but I'm going to be collaborating with some great people to help improve the automatic sound analysis in their toolkit - for example, by analysing much larger audio collections than they can possibly analyse by hand.

Bird population/migration monitoring is also important. UK farmland bird populations have declined by 50% since the 1970s, and woodland birds by 20% (source). We have great organisations such as the BTO and the RSPB, who organise professionals and amateurs to help monitor bird populations each year. If we can add improved automatic sound recognition to that, we can help add some more detail to this monitoring. For example, many birds are changing location year-on-year in response to climate change (source) - that's the kind of pattern you can detect better when you have more data and better analysis.

Sound is fascinating, and still surprisingly difficult to analyse. What is it that makes one sound similar to another sound? Why can't we search for sounds as easily as we can for words? There's still a lot that we haven't sorted out in our scientific and engineering understanding of audio. Shazam works well for music recordings, but don't be lulled into a false sense of security by that! There's still a long way to go in this research topic before computers can answer all of our questions about sounds.

What I am going to do:

I'll be developing automatic analysis techniques (signal processing and machine learning techniques), building on starting points such as my recent work on tracking multiple birds in an audio recording and on analysing frequency-modulation in bird sounds. I'll be based at Queen Mary University of London.

I'll also be collaborating with experts in machine learning, animal behaviour and bioacoustics. One of the things on the schedule for this year is to record some zebra finches with the Clayton Lab. I've met the zebra finches already - they're jolly little things, and talkative too! :)

Tuesday 18th March 2014 | science | Permalink

How long it takes to get my articles published - update

Here's an update to my own personal data about how long it takes to get academic articles published. I've also augmented it with data on funding applications, to compare how long all these decisions take in academia.

It matters because, especially as an early-career researcher, if it takes one year for a journal article to come out (even after the reviewers have said yes), that's one year of not having it on your CV.

So how long do the different bits take? Here's a bar-chart summarising the mean durations in my data:

The data is divided into 3 sections: first, writing up until first submission; then, reviewing (including any back-and-forth with reviewers, resubmission etc); then finally, the time from final decision through to publication.

Firstly note that there are not many data points here, so for example I have one journal article that took an extremely long time after acceptance to actually appear, and this skews the average. But it's certainly notable that the time spent writing generally is dwarfed by the time spent waiting. And particularly that it's not necessarily the reviewing process itself that forces us all to wait - various admin things such as typesetting seem to take at least as long. Whether or not things should take that long, well, it's up to you to decide.

Also - I was awarded a fellowship recently, which is great - but you can see in the diagram that I spent about two years repeatedly getting negative funding decisions. It's tough!

This is just my own data - I make no claims to generality.

Monday 17th March 2014 | science | Permalink

Gaussian Processes: advanced regression with sounds, and with geographic data

This week I was learning about Gaussian Processes, at the very nice Gaussian Processes Winter School in Sheffield. The term "Gaussian Processes" refers to a family of techniques for inferring a smooth surface (1D, 2D, 3D or more) from a set of sampled noisy data points. Essentially, it's an advanced and mathematically very sound type of regression.

Don't get confused by the name, by the way: your data doesn't have to be Gaussian, and Gaussian Process regression doesn't always produce smooth Gaussian-looking results. It's very flexible.

As an example, here's a first pass I did of analysing the frequency trajectories in a single recording of birdsong.

I used the "GPy" Python package to do all this. Here's their GPy regression tutorial.

I do want to emphasise that this is just a first pass - I don't claim this is a meaningful analysis yet. But there are a couple of neat things about the analysis:

  1. It can combine periodic and nonperiodic variation (by combining periodic and nonperiodic covariance kernels). Here I used a standard RBF kernel plus a periodic kernel which repeats every syllable, and another periodic kernel which repeats every 3 syllables, which reflects well the patterning of this song bout. (See the short GPy sketch after this list.)
  2. It can represent variation across multiple levels of detail. Unlike many other regressions/interpolations, the fitted curve can have fast wiggles in some places and broad smooth curves in others.
  3. It gives you error bars, which are derived from a proper Bayesian posterior.
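
Since GPy makes this kind of kernel combination quite compact, here's a minimal sketch of roughly how it can be set up. This is illustrative only - the data, the period values and the variable names are stand-ins, not the actual analysis behind the plot:

import numpy as np
import GPy

# Illustrative stand-ins: times (in syllable units) and peak frequencies (Hz).
# These are NOT the real data - just synthetic values so the snippet runs.
times = np.linspace(0, 12, 200)[:, None]
freqs = 4000 + 300 * np.sin(2 * np.pi * times) + 80 * np.random.randn(*times.shape)

# Nonperiodic smooth variation plus two periodic components:
# one repeating every syllable, one repeating every three syllables.
kernel = (GPy.kern.RBF(input_dim=1)
          + GPy.kern.StdPeriodic(input_dim=1, period=1.0)
          + GPy.kern.StdPeriodic(input_dim=1, period=3.0))

model = GPy.models.GPRegression(times, freqs, kernel)
model.optimize()                        # fit hyperparameters by maximising the marginal likelihood
mean, variance = model.predict(times)   # posterior mean and variance, i.e. the error bars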

So now here's my second example, in a completely different domain. I'm not a geostatistician but I decided to have a go at reconstructing the hills and valleys of Britain using point data from OpenStreetMap. This is a fairly classic example of the technique, and OpenStreetMap data is almost perfect for the job: it doesn't hold any smooth data about the surface terrain of the Earth, but it does hold quite a lot of point data where elevations have been measured (e.g. the heights of mountain peaks).

If you want to run this one yourself, here's my Python code and OpenStreetMap data for you.

This is what the input data look like - I've got "ele" datapoints, and separately I've got coastline location points (for which we can assume ele=0):

Those scatter plots don't show the heights, but they show where we have data. The elevation data is densest where we have mountain ranges etc, such as central Scotland and in Derbyshire.

And here are two different fits, one with an "exponential" kernel and one with a "Matern" kernel:

Again, the nice thing about Gaussian Process regression is that it seamlessly handles smooth generalisations as well as occasional patches of fine detail where needed. How good are the results? Well it's hard to tell by eye, and I'd need some official relief-map data to validate it. But from looking at these two, I like the exponential-kernel fit a bit better - it certainly gives an intuitively appealing relief map in central Scotland, and it's visually a bit less blobby than the other plot. However it's a bit more wrong in some places, e.g. an overestimated elevation in Derbyshire there (near the centre of the picture). If you ask an actual geostatistics expert, they will probably tell you which kernel is a good choice for regressing terrain shapes.
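
For the record, here's a minimal sketch of how the two fits can be set up in GPy. Again this is illustrative: the arrays below are random stand-ins, whereas the real input is parsed out of the OpenStreetMap data linked above, and Matern32 is just one member of the Matern family you could pick:

import numpy as np
import GPy

# Random stand-ins for the parsed OSM data: (lon, lat) points and elevations in metres.
lons = np.random.uniform(-6, 2, 500)
lats = np.random.uniform(50, 59, 500)
locations = np.column_stack([lons, lats])
elevations = np.random.uniform(0, 1000, (500, 1))

for kern in (GPy.kern.Exponential(input_dim=2), GPy.kern.Matern32(input_dim=2)):
    model = GPy.models.GPRegression(locations, elevations, kern)
    model.optimize()
    # Predict elevation over a regular grid to draw the relief map
    gx, gy = np.meshgrid(np.linspace(-6, 2, 100), np.linspace(50, 59, 100))
    mean, variance = model.predict(np.column_stack([gx.ravel(), gy.ravel()]))
    relief = mean.reshape(gx.shape)   # one reconstructed surface per kernel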

The other thing you can see in the images is that it isn't doing a very good job of predicting the sea. Often, the predicted surface dips down to an altitude of zero at the coast and then pops back upwards out at sea. No surprises about this, for two reasons: firstly I didn't give it any data points about the sea, and secondly I'm using "stationary" kernels, meaning there's no reason for the algorithm to believe the sea behaves any differently from the land. This is easy to fix by masking out the sea but I haven't bothered.

So altogether, these examples show some of the nice features of Gaussian Process regression, and, along with the code, that the GPy module makes it pretty easy to put together this kind of analysis in Python.

Friday 17th January 2014 | science | Permalink

The UK Government Response to the BIS Open Access Review

The UK Government's Department for Business, Innovation and Skills recently published a review of Open Access research publication. It made a number of really good recommendations, including de-emphasising the "gold" (pay-to-publish) route, and stepping back from the over-extended embargo periods that the publishers seem to have got RCUK to agree to.

The Government has published its response to this review. What is their response? Basically, "Nah, no thanks."

  • The review said "RCUK should build on its original world leading policy by reinstating and strengthening the immediate deposit mandate in its original policy". The Government said "... timely OA ... through mutually acceptable embargo period". There's nothing "mutual" about the choice of embargo period, given that many academics have been asking for the position that the government has just explicitly rejected.
  • The review said "We recommend that the Government and RCUK revise their policies to place an upper limit of 6 month embargoes on STEM subject research and up to 12 month embargoes for HASS subject research, in line with RCUK’s original policy published in July 2012". The Government said "A re-engineering of the research publications market entails a journey not an event" or in other words "No". Note the vacuousness of their statement. It could easily have been "an event", and the committee wasn't even recommending the total removal of embargoes.
  • The review said "We recommend that the Government and RCUK reconsider their preference for Gold open access during the five year transition period, and give due regard to the evidence of the vital role that Green open access and repositories have to play as the UK moves towards full open access." The government said "Government and RCUK policy with an expressed preference for Gold OA [sets the direction of travel]". This is fair enough as a sentiment, but unfortunately the government response also included the publisher's favourite "open access flowchart" which clearly tells researchers that gold open access must be chosen if available. Note that this is not a consensus or objective reading of current RCUK rules, let alone the future. The government is showing no signs of backing away from this weird new competitive problem they're creating right now, where researchers in universities have to compete with their own colleagues (studying completely different disciplines) for the tiny and certainly insufficient institutional pay-to-publish funding pots.
  • The review in fact agrees with the position I just stated: "RCUK’s current guidance provides that the choice of Green or Gold open access lies with the author and the author’s institution, even if the Gold option is available from the publisher. This is incompatible with the Publishers Association decision tree, and RCUK should therefore withdraw its endorsement of the decision tree as soon as possible, to avoid further confusion within the academic and publishing communities." The government says "As discussed above the UK OA Decision Tree sets out clearly the direction of travel." Arrrrrrrrrrrrrrrgh, are you not even listening?

I could go on. Suffice to say that I was so encouraged by the sane voice of the BIS review; yet the government's response appears to be a solid and completely shameless "not for turning".

Thursday 28th November 2013 | science | Permalink

Improved multiple birdsong tracking - video

The "Faculti" website did a video interview with me about automatic birdsong tracking. A little tongue-tied occasionally but here it is (5:36):

Image from video - click to watch

The research papers related to this are:

Monday 28th October 2013 | science | Permalink

Open access: green does NOT mean CC-BY-NC

There's been a fair amount of confusion around the new UK guidelines that mean we have to publish our research articles as open access. One of the urban myths that has sprung up is rather curious, and it's the idea that if you choose to publish under the green route, you're supposed to publish under a Creative Commons NonCommercial licence. This is not true. (It's just one of the many licences that would work.) But I have heard it from heads of research groups, I've heard it from library staff. We need to be clear!

(BACKGROUND: "Green" and "gold" are terms often used to describe two different sorts of open access, and they're also the two terms used by Research Councils UK [RCUK] to tell us what to do. "Gold" means that the publisher has to provide the article freely to everyone, rather than charging people for access; in lieu of that, most publishers will charge us researchers in order to publish under gold. "Green" means the publisher doesn't have to do anything, except to agree that the author can put a copy of the paper on their website or in an online repository. So, both enable free access to research, but in different ways, and with different costs and benefits.)

Now, in RCUK official guidance we have the option of green or gold publication. If we go the gold route, RCUK requires a specific licence: Creative Commons Attribution, aka CC-BY. If we go the green route, the RCUK policy doesn't exactly specify the licence, but it does say that it has to be published "without restriction on non‐commercial re‐use". Pause for a second to unpick the triple-negative in that turn of phrase...

The reason for that wording is that RCUK didn't want the publishers to "lock down" green OA by saying things like "you can self-archive the paper, but only under these strict terms and conditions which don't actually let people get the benefits of OA". For whatever reasons, they decided that it was OK for publishers to forbid commercial reuse (perhaps to prevent other publishers profiting from simply re-publishing?), but they would draw the line and say they weren't allowed to forbid non-commercial reuse. However, the policy doesn't require any particular licence.

But we might be tempted to ask, well, fine, but what is an example of a licence that would satisfy these RCUK rules? Well, Mark Thorley of RCUK gave an example of this: the Creative Commons Attribution-NonCommercial or CC-BY-NC would be fine. It's an appropriate example because it forbids commercial reuse but allows non-commercial reuse. OK so far?

Unfortunately, when you look at Mark Thorley's slides on the RCUK website, that's not exactly what is conveyed. If you go to slide 10 it says:

"Green (at least post print) with a maximum embargo period of 6(12) months, and CC-BY-NC"

OK that's pretty clear isn't it? It doesn't say that CC-BY-NC is just an example, it basically says CC-BY-NC is required. This is not what Thorley meant. I raised this issue on a mailing list, and he clarified the position:

"The policy does not define a specific licence for green deposit, provided non-commercial re-use such as text and data mining is supported. In presentations I say that this 'equates to CC-BY-NC', however, we do not specifically require CC-BY-NC. This is because some publishers, such as Nature, offer specific deposit licences which meet the requirements of the policy. However [...] this is the minimum requirement. So if authors are able and willing to use more open licences, such as CC-BY, we would encourage this. The more open the licence, the less ambiguities and barriers there are to re-use of repository content."

This clarification is welcome. But unfortunately it was provided in a reply on a mailing list discussion, and the RCUK website itself doesn't provide this clarification, so the misunderstanding is bound to run and run. This week I heard it repeated in an Open Access forum, and I hope that if you've read this far you'll help stop this misconception getting out of hand!

Wednesday 17th July 2013 | science | Permalink

Bird sound analysis with MPTK: chirp vs gabor

I'm just back from a great two-week research visit to INRIA in Rennes. The fruit of our labour will be a new release of the Matching Pursuit ToolKit with some whizzy extra features and polish. In my previous blog entry I showed how we can use Matching Pursuit to detect patterns in spectrograms - now I want to show you a quick example of how these techniques can give you a clearer, more meaningful representation of sounds such as birdsong.

On my way home one day I got a nice recording of a chiff-chaff, so we'll use that as our example. (I also put the longer audio on Xeno Canto as XC125867.)

My particular concern is how to analyse this sound so we can capture some of the fine detail of the very fast pitch variation in birdsong - the chiffchaff is a clear example of this because it sings individual "notes", each with a very fast downward chirp onto the note.

So, using MPTK, I have a few choices of how to analyse. A classic option is using Gabor atoms, which you can think of as simple time-frequency blobs a little bit like the pixels in a spectrogram. MPTK can find a sparse representation of the signal using Gabor atoms - in the picture below, the first plot is a simple spectrogram, and the second one is the result of Matching Pursuit with Gabor atoms:

Spectrogram of the chiffchaff recording, with the Gabor-atom decomposition and the chirplet decomposition below it

(BTW, the vertical axes aren't quite the same - oops.) As you can see, it's worked out how to build the energetic parts of the signal using a small number of Gabor atoms.

But another choice is to analyse using chirplets. These are a lot like Gabor atoms except they don't just have a fixed frequency, they can slope downwards or upwards in frequency. MPTK has a nice feature for efficient chirplet analysis (it uses Rémi Gribonval's fast matching pursuit technique for chirplets).

You can see the chirplet-based analysis in the bottom of the plot above. Notice how each syllable from the bird seems to begin with a big downward slash, showing a very fast downward chirp. That reflects what is actually happening and what you can hear in the recording.

The important thing, for me, is that chirplets here seem to be getting a much more meaningful representation of the signal than the Gabor atoms. This should be more useful for downstream analysis (whether by human or machine).
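
To make the distinction concrete, here's a small numpy sketch of the two atom types - just the textbook constructions, not MPTK's internal implementation, and the parameter values are made up:

import numpy as np

def gabor_atom(n, centre, width, freq, fs):
    # Gaussian-windowed sinusoid at a fixed frequency
    t = (np.arange(n) - centre) / fs
    atom = np.exp(-0.5 * (t / width) ** 2) * np.cos(2 * np.pi * freq * t)
    return atom / np.linalg.norm(atom)

def chirplet_atom(n, centre, width, freq, chirprate, fs):
    # Like a Gabor atom, but the instantaneous frequency slopes at `chirprate` Hz per second
    t = (np.arange(n) - centre) / fs
    atom = np.exp(-0.5 * (t / width) ** 2) * np.cos(2 * np.pi * (freq * t + 0.5 * chirprate * t ** 2))
    return atom / np.linalg.norm(atom)

fs = 44100
gabor = gabor_atom(1024, 512, 0.002, 5000, fs)             # a fixed-frequency blob at 5 kHz
chirp = chirplet_atom(1024, 512, 0.002, 7000, -1e6, fs)    # a fast downward chirp starting near 7 kHz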

We can even sonify the difference using timestretching. Once I've analysed the sound using MPTK, I can reconstruct it... or I can manipulate the data and reconstruct modified versions of it. In the following MP3 player you'll hear 5 tracks. First the 7-second original recording. Then there's a reconstruction using chirplets; then a four-times-slower timestretch version made from the chirplets. Then the same but with Gabor atoms (a reconstruction, then a timestretch version).

In particular, compare the timestretched versions. With the Gabor version, you hear a lot of very robotised / quantised / MP3ish artifacts to the sound, whereas the chirplet version sounds much more natural. Still some artifacts in both, of course.

The Python code for these examples is available here - note that it relies on the pyMPTK wrapper, which is going to be in the soon-to-be-released MPTK version 0.7.

Monday 25th March 2013 | science | Permalink

Pulling bird sounds out of the fog

I'm on a research visit to Rémi Gribonval's research group at INRIA (Rennes, France). So far it's been great, and maybe I'll tell you more later, but first I just want to blog a little signal-processing achievement for today.

Together with Bob Sturm and Boris Mailhé I've been working on improvements to MPTK, the C++ toolkit for sparse decomposition of signals. I've spent quite a lot of time on making a nice Python wrapper (yay) and also on code refactoring (hmm), but today I have actually done some signal processing:

Below you can see a little spectrogram of a chiffchaff singing:

Spectrogram of a chiffchaff singing

Time is along the x-axis, frequency the y-axis. (The audio is from Xeno Canto: #XC25760.)

Now, in my previous research work I developed a way of tracking those chirrups, but it relied on a rather simplistic first step of detecting individual sounds. What I've been able to do, finally today, is use Matching Pursuit instead, thanks to MPTK (with the "anywave" feature which I think I have just fixed). Potentially, this has some advantages in detecting birdsong syllables cleanly.

So my first test today is to take a simple "template" for a syllable:

Spectrogram template for a single chiffchaff syllable

And use this as a one-atom dictionary in matching pursuit applied to the above spectrogram. The result is a set of "detections" which I can use for various purposes, such as, for example, reconstructing a cleaned-up spectrogram:

Cleaned-up spectrogram reconstructed from the template detections

Looks pretty good eh? There's one false-positive in the above, and one or two false-negatives, but the basic principle is looking good. This should be useful.
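
If you want the gist without installing MPTK, here's a rough sketch of the underlying idea - greedy detection of a single 2D template in a magnitude spectrogram by cross-correlation. It's my own simplified illustration, not the "anywave" code itself:

import numpy as np
from scipy.signal import fftconvolve

def template_matching_pursuit(spec, template, n_iters=20, threshold=0.1):
    # Greedily detect copies of `template` (a small 2D patch) in magnitude spectrogram `spec`
    template = template / np.linalg.norm(template)
    residual = spec.astype(float).copy()
    detections = []
    for _ in range(n_iters):
        # Cross-correlate the residual with the template at every offset
        corr = fftconvolve(residual, template[::-1, ::-1], mode='valid')
        f, t = np.unravel_index(np.argmax(corr), corr.shape)
        amp = corr[f, t]
        if amp < threshold:
            break
        # Subtract the scaled template from the residual at the best-matching position
        residual[f:f + template.shape[0], t:t + template.shape[1]] -= amp * template
        detections.append((f, t, amp))
    return detections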

Monday 18th March 2013 | science | Permalink

Update on GM-PHD filter (with Python code)

Note: I drafted this a while back but didn't get round to putting it on the blog. Now I have published code and a published paper about the GM-PHD filter, I thought these practical insights might be useful:

I've been tweaking the GM-PHD filter which I blogged about recently. (Gaussian mixture PHD is a GM implementation of the Probability Hypothesis Density filter, for tracking multiple objects in a set of noisy observations.)

I think there are some subtleties to it which are not immediately obvious from the research articles.

Also, I've published my open source GM-PHD Python code so if anyone finds it useful (or has patches to contribute) I'd be happy. There's also a short research paper about using the GM-PHD filter for multi-pitch tracking.

In that original blog post I said the results were noisier than I was hoping. I think there are a couple of reasons for this:

  • The filter benefits from a high-entropy representation and a good model of the target's movement. I started off with a simple 1D collection of particles with fixed velocity, and in my modelling I didn't tell the GM-PHD about the velocity - I just said there was position with some process noise and observation noise. Well, if I update this so the model knows about velocity too, and I specify the correct linear model (i.e. position is updated by adding the velocity term on to it - see the small sketch after this list) the results improve a little. I was hoping that I could be a bit more generic than that. It may also be that my 1D example is too low-complexity, and a 2D example would give it more to focus on. Whatever happened to "keep it simple"?!

  • The filter really benefits from knowing where targets are likely to come from. In the original paper, the simulation examples are of objects coming from a fixed small number of "air bases" and so they can be tracked as soon as they "take off". If I'm looking to model audio, then I don't know what frequency things will start from, there's no strong model for that. So, I can give it a general "things can come from anywhere" prior, but that leads to the burn-in problem that I mentioned in my first blog post - targets will not accumulate much evidence for themselves, until many frames have elapsed. (It also adds algorithmic complexity, see below.)

  • Cold-start problem: the model doesn't include anything about pre-existing targets that might already be in the space, before the first frame (i.e. when the thing is "turned on"). It's possible to account for this slightly hackily by using a boosted "birth" distribution when processing the first frame, but this can't answer the question of how many objects to expect in the first frame - so you'd have to add a user parameter. It would be nice to come up with a neat closed-form way to decide what the steady-state expectation should be. (You can't just burn it in by running the thing with no observations for a while before you start - "no observations" is expressed as "empty set", which the model takes to mean definitely nothing there rather than ignorance. Ignorance would be expressed as an equal distribution over all possible observation sets, which is not something you can just drop in to the existing machinery.)
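
For concreteness, here's roughly what that "knows about velocity" linear model looks like in standard state-space form. The noise values are made up for illustration, not the settings in the published code:

import numpy as np

# State is [position, velocity]; position is updated by adding the velocity term on to it.
F = np.array([[1.0, 1.0],     # pos' = pos + vel
              [0.0, 1.0]])    # vel' = vel
H = np.array([[1.0, 0.0]])    # we observe position only
Q = np.eye(2) * 0.01          # process noise covariance (illustrative value)
R = np.array([[0.1]])         # observation noise covariance (illustrative value)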

One mild flaw I spotted is in the pruning algorithm. It's needed because without it the number of Gaussians would diverge exponentially, so to keep it manageable you want to reduce this to some maximum limit at each step. However, the pruning algorithm given in the paper is a bit arbitrary, and in particular it fails to maintain the total sum of weights. It chops off low-weight components, and doesn't assign their lost weight to any of the survivors. This is important because the sum of weights for a GMPHD filter is essentially the estimated number of tracked objects. If you have a strong clean signal then it'll get over this flaw, but if not, you'll be leaking away density from your model at every step. So in my own code I renormalise the total mass after simplification - a simple change, hopefully a good one.
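
Here's a minimal sketch of that prune-and-renormalise step for a weighted Gaussian mixture. It's simplified (no merging of nearby components) and the function name is mine, but it shows the change I mean:

import numpy as np

def prune_and_renormalise(weights, means, covs, trunc_thresh=1e-6, max_components=100):
    # Drop tiny components, keep the heaviest, then rescale so the total mass is preserved
    weights = np.asarray(weights, dtype=float)
    total_before = weights.sum()        # roughly the estimated number of tracked objects
    order = np.argsort(weights)[::-1]   # components sorted by descending weight
    keep = [i for i in order[:max_components] if weights[i] > trunc_thresh]
    new_weights = weights[keep]
    new_weights *= total_before / new_weights.sum()   # renormalise: don't leak density away
    return new_weights, [means[i] for i in keep], [covs[i] for i in keep]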

And a note about runtime: the size of the birth GMM strongly affects the running speed of the model. If you read through the description of how it works, this might not be obvious because the "pruning" is supposed to keep the number of components within a fixed limit so you might think it allows it to scale fine. However, if the birth GMM has many components, then they all must be cross-fertilised with each observation point at every step, and then pruned afterwards, so even if they don't persist they are still in action for the CPU-heavy part of the process. (The complexity has a kind of dependence on number-of-observations * number-of-birth-Gaussians.) If like me you have a model where you don't know where tracks will be born from, then you need many components to represent a flat distribution. (In my tests, using a single very wide Gaussian led to unpleasant bias towards the Gaussian's centre, no matter how wide I spread it.)

Tuesday 5th February 2013 | science | Permalink

Comment on 'High heels as supernormal stimuli: How wearing high heels affects judgements of female attractiveness'

There's a research paper just out which has gained itself some press: "High heels as supernormal stimuli: How wearing high heels affects judgements of female attractiveness". It's described in the popular press as "proving" that high heels make women attractive, and that's fair enough but it's obviously not very surprising news given that high heels are widely known in current Western society to have that association. The research paper is slightly more specific than that: it finds that whatever "information" is transmitted to the viewer by high heels is even transmitted when we can see nothing but a handful of moving dots, hiding everything about the viewee except their gait.

That's interesting. But unfortunately, the authors go on to make one further step, which strikes me as a step too far - namely they infer that this reflects some evolutionary explanation for the popularity of high heels. The word "supernormal" in the title refers to the idea that high heels might cause women to walk in a way which exaggerates female aspects of gait, i.e. makes them walk even more unlike males than otherwise. There is indeed evidence for this in their paper. But the authors explicitly test for whether the "female" aspects of gait correlate with attractiveness judgments, and they find insignificant or barely significant correlations.

(Technical note: two of the correlations attain p<0.05, but they didn't control for multiple comparisons, so the true significance is probably lower. And the correlations I'm talking about now are in their Table 2, which is looking at differences within the high-heel category and within the flat-shoe category. The main effect demonstrated by the authors is indeed significant: viewers rated the high-heel videos as more attractive.)

So what does this suggest? To me it seems they've demonstrated that
(a) high heels affect gait (as you can tell on most Friday nights in town), and
(b) people recognise the change in gait as being associated with attractiveness and femininity.
But this second finding can just as easily be explained by cultural learning as by something evolutionary, despite the fact that the paper was published in "Evolution and Human Behavior".

In fact, (b) could conceivably be caused by a conjunction of:
(b1) people recognise the change as being caused by high heels (whether consciously or not); and
(b2) people recognise that high heels are associated with attractiveness and femininity.
(This b1-and-b2 scenario is also a potential explanation for their second set of findings, in which the gaits of high-heeled walkers are less often mistaken for men.)

All of which means that I don't think these experiments manage to discern any difference between effects caused by evolved factors and effects caused by cultural learning. Given that, the obvious way to test that difference would be to show the dot videos to viewers who grew up in a non-Western society which doesn't have a tradition of high heels. (Not a convenient test to do - but I'd definitely be interested in the results!)

Here's one quote from their results, about a minor aspect, whether male or female onlookers have different opinions:

"note that there was no shoetype-gender interaction, showing that both males and females judged high heels to be more attractive than flat shoes. [...] furthermore, there were high correlations between male and female attractiveness ratings of the walkers in both the flat and heels condition demonstrating that males and females agreed which were the attractive and unattractive walkers."

So, in this study, the male and female onlookers showed the same pattern of response to the presence of high heels. Does this perhaps hint that the difference might be learned, rather than from some presumed phwoar-factor inbuilt in men?

This study is an example of what I see as a frustrating tendency for people in biological disciplines to do interesting quantitative studies, but then to plunge into the discussion section and make unwarranted generalisations about the evolutionary reasons for something's existence. As well as invoking evolution, in this case they also discuss women's motivation for how they dress:

"Therefore we suggest that one, conscious or unconscious, motivation for women to wear high heels is to increase their attractiveness."

Firstly, this study explicitly does not explore women's motivations, in any sense. It only studies judgments made by outside observers. Secondly, as the authors have already acknowledged,

"High heels have become a part of the uniform of female attire in a number of different contexts and as such are part of a much more complex set of display rules."

I don't dispute that attractiveness might be a more important motivation for some than other motivations (fashion, identity, confidence, social norms, availability, symbolism), but let's not imply that this hunch is an empirical finding, please. The association of high heels with attractiveness is already a common trope, so the idea that women might be motivated to buy into that trope is perfectly plausible, but this study throws no light on it.

Still, as I said, the main finding is interesting: the differences in gait induced by high heels, and the rating of such gaits as attractive, are demonstrated to be easily perceivable even in a display reduced to a handful of green dots.

Saturday 5th January 2013 | science | Permalink

Academia and flying

When I started in academia I had no idea how much travel was involved. I started a PhD because I was fascinated by computational possibilities in digital sound, and almost by chance I ended up at one of the world-leading research groups in that area, which just happened to be nearby in London. Throughout my PhD I was lucky enough to get funded travel to conferences in interesting cities like Copenhagen and Helsinki, which was an unexpected plus - not just for the research benefits you get from meeting other researchers worldwide, but just for being able to visit and see those places.

Now, two things we know are that international travel tends to involve flying, and that flying is bad for the environment. Having finished my PhD and now working as an academic researcher, there are still plenty of research conferences around the world that are good to go to: they're specifically good for my project right now, and also for my professional development more generally. On average, research conferences are in other countries. So, is it possible to be an academic researcher and avoid flying? Does that harm your career? Could academia be "rearranged" to make it involve less flying?

Here's an example: I was invited to give a seminar in a European country. A nice invitation, and the organisers agreed to try and arrange for me to travel by train rather than plane. From the UK, this is tricky, because as an island the options are a little restricted: we have some nice (but slow) ferry services, and we have the Eurostar. It's hard for me to request non-plane transport because it tends to be more expensive for the organisers, and it can be really hard to schedule (since there are fewer schedule options and they take longer). So in the end, this time round we had to go for a compromise: I'm taking a plane one way and a train the other way. We couldn't do better than that.

In environmental terms, we can do better - I could decline the invitation. But academic research is international: the experts who are "next door" in terms of the subject are almost never "next door" geographically. If you want to develop your research you have to have meaningful personal interactions with these experts. Email, phone, videoconferencing are all fine, but if that's all you do then you lose out on the meaningful, full-bandwidth interaction that actually leads to new ideas, future collaborations, real understandings.

(For some research that confirms and discusses the importance of face-to-face collaboration, try this interesting story about lasers: Collins, H.M. (2001). "Tacit Knowledge, Trust and the Q of Sapphire", Social Studies of Science 31(1), 71-85.)

As a whole, is there much that the academic world can do to mitigate the amount of travel needed? Well, I'd still say it's worth encouraging teleconferencing and the like, though as I've noted I don't think it completely scratches the itch. Should we try to focus on local-ish conferencing rather than one global summit? That doesn't strike me as a very fruitful idea, since it would reduce the amount of international mixing if it worked (and thus the amount of productive international collaboration), and I don't think it would work since one "local" conference would probably tend to emerge as the stronger.

And if you're a researcher, aware of the issues involved in heavy use of air travel, you have a balance to strike. How much can/should you turn down interesting opportunities for presenting, networking, collaboration, based on geographic distance? Will it harm your own opportunities, while others jet off to take advantage of them? Personally I know there are specific opportunities I've turned down in the past year or so, because it didn't feel right to jet off to certain places just for a couple of days' meeting. In other cases, I've taken up opportunities only after making sure I make the most of the visit by adding other meetings or holidays into the visit.

Your thoughts would be welcome...

Tuesday 20th November 2012 | science | Permalink

How long it takes to get my articles published

NB Updated version here

Today I remembered about an article I submitted ages ago to a journal, accepted but not out yet. I also realised that since I store all my work in version-control, I can pull out all the exact dates when I started writing things, submitted them, rewrote them, etc.

So here's a visualisation of that data. In the following, a solid straight line is a period where the paper is "in my hands" (writing, rewriting, or whatever), and a dashed arc is where the paper is with someone else (a reviewer or a typesetter):

Each GREEN blob is a moment of official acceptance; a RED blob the moment of official rejection; a YELLOW blob the moment of a journal article actually being publicly available. (I also included some conference papers - the yellow blob is the date of presentation for those.) This only covers things since I got my PhD.

One thing you can see straight away is that often the initial review and/or the final typesetting periods are massive compared against the writing periods. I hadn't realised, but for my journal articles it's pretty much at least 1 year between submission and availability.

People often complain about the peer-review process and how slow it can be, but the thing that's puzzling me right now is why there are these massive post-acceptance delays, which have nothing to do with reviewing. For someone like me who normally submits LaTeX documents, I can't even guess what work is left to do... yet it seems to take a minimum of 4 months!

This is just my own data - I make no claims to generality.

Monday 17th September 2012 | science | Permalink

Notes from EUSIPCO 2012

Just back from the EUSIPCO 2012 conference in Bucharest. (The conference was held in the opulent Palace of the Parliament - see previous article for some thoughts on the palace and the town.) Here some notes about interesting talks/posters I saw:

Talks/posters

Lots of stuff relevant to recognition in audio scenes, which is handy because that's related to my current work.

  • David Damm's "System for audio summarisation in acoustic monitoring scenarios". Nice approach and demo (with sounds localised around the Fraunhofer campus), though the self-admitted drawback is that it isn't yet particularly scaleable, using full DTW search etc.
  • Sebastien Fenet's "fingerprint-based detection of repeating objects in multimedia streams" - here a very scaleable approach, using fingerprints (as is done in other large-scale systems such as Shazam). In this paper he compared two fingerprint types: a Shazam-like spectral-peaks method (but using constant-Q spectrum); and a shallow Matching Pursuit applied to multiscale STFT. His results seem to favour the former.
  • Xavier Valero's "Gammatone wavelet features for sound classification in surveillance applications" - this multiscale version of gammatone is apparently better for detecting bangs and bumps (which fits with folk knowledge about wavelets...).
  • M. A. Sehili's "Daily sound recognition using a combination of GMM and SVM for home automation" - they used something called a Sequence Classification Kernel which apparently can be used in an SVM to classify sequential data, even different-length sequential data. Have to check that out.
  • Two separate papers - Anansie Zlatintsi's "AM-FM Modulation Features" and Xavier Valero's "Narrow-band Autocorrelation features" - used features which are complementary to the standard Mel energies, by analysing the fine variation within each band. They each found improved results (for different classification tasks). (In my own thesis I looked at band-wise spectral crest features, hoping to achieve something similar. I found that they did provide complementary information [Sec 3.4] but unfortunately were not robust enough to noise/degradation for my purposes [Sec 3.3]. It'll be interesting to see how these different features hold up - they are more interesting than my spectral crests I think.)

Plenty of informed audio source separation was in evidence too. Not my specialism, more that of others in our group who came along... but I caught a couple of them, including Derry Fitzgerald's "User assisted separation using tensor factorisations" and Juan-Jose Bosch's "Score-informed and timbre-independent lead instrument separation in real-world scenarios".

Other papers that were interesting:

  • T Adali, "Use of diversity in independent decompositions" - for independence-based decompositions, you can use either of two assumptions about the components: non-Gaussianity or time-dependence. The speaker noted that measuring mutual information rate covers both of these properties, so it seems like a neat thing to use. She used it for some tensor decompositions which were a bit beyond me.
  • C Areliano's poster on "Shape model fitting algorithm without point correspondence": simple idea for matching a hand image against a template which has marked points on it (but the query image doesn't): convert both representations into GMMs then find a good registration between the two GMMs. Could be useful, though the registration search is basically brute-force in this paper I think.
  • Y Panagakis presented "Music structure analysis by subspace modeling" - it makes a lot of sense, intuitively, that music structure such as verse-chorus-verse should be suited to this idea of fitting different feature subspaces to them. The way music is produced and mixed should make it appropriate for this, I imagine (whereas for audio scenes we probably don't hop from subspace to subspace... unless the mic is moving from indoors to outdoors for example...)
  • Y Bar-Yosef's "Discriminative Algorithm for compacting mixture models with application to language recognition". Taking a GMM and approximating it by a smaller one is a general useful technique - here they were using Hershey and Olsen's 2007 "variational approximation" to the KLD between two GMMs. In this paper, their optimisation tries to preserve the discriminative power between two GMMs, rather than simply keeping the best fit independently.
  • I Ari's "Large scale polyphonic music transcription using randomized matrix decompositions" - some elegant tweaks which mean they can handle a very large matrix of data, using a weighted-random atom selection technique which reminds me a little of a kind of randomised Matching Pursuit (though MP is not what they're doing). They reduce the formal complexity of matrix factorisation, both in time and in space, so that it's much more tractable.
  • H Hu's "Sparsity level in a non-negative matrix factorisation based speech strategy in cochlear implants" - I know they do some good work with cochlear implants at Southampton Uni. This was a nice example: not only did they use Sparse NMF for noise reduction, and test it with human subjects in simulated conditions, but they also implemented it on a hardware device as used in cochlear implants. This latter point is important because at first I was dubious whether this fancy processing was efficient enough to run on a cochlear implant - good to see a paper that answers those kind of questions immediately.

Plenaries/tutorials

Christian Jutten gave a plenary talk on source-separation in nonlinear mixtures. Apparently there's a proof from the 1980s by Darmois that if you have multiple sources nonlinearly mixed, then ICA cannot guarantee to separate them, for the following simple reason: ICA works by maximising independence, but Darmois proved that for any set of perfectly independent sources you can always construct a nonlinear mixture that preserves this independence. (Jutten gave an example procedure to do this; I think you could use the inverse-copula of the joint distribution as another way.)

Therefore to do source-separation on nonlinear mixtures you need to add some assumptions, either as constraints or regularisation. Constraining just to "smooth mappings" doesn't work. One set of mixture types which does work is "post-nonlinear mixtures", which means mixtures in which nonlinearities are applied separately to the outputs after linear mixing. (This is a reasonable model, for example, if your mics have nonlinearities but you can assume the sounds linearly mixed in the air before they reached the mics.) You have to use nonlinearities which satisfy a particular additivity constraint (f(u+v) = (f(u)+f(v))/(1+f(u)f(v)) ... tanh satisfies this). Or at least, you have to use those kind of nonlinearities in order to use Jutten's method.
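
A quick numerical sanity check of that additivity property for tanh (my own check, not from the talk):

import numpy as np

u, v = 0.7, -1.3
f = np.tanh
assert np.isclose(f(u + v), (f(u) + f(v)) / (1 + f(u) * f(v)))   # tanh satisfies the constraint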

Eric Moulines talked about prediction in sparse additive models. There's a lot of sparsity around at the moment (and there were plenty of sparsity papers here); Moulines' different approach is that when you want to predict new values, rather than to reverse-engineer the input values, you don't want to select a single sparsity pattern but aggregate over the predictions made by all sparsity patterns. He uses a particular weighted aggregation scheme which he calls "exponential aggregation" involving the risk calculated for each "expert" (each function in the dictionary).

Now, we don't want to calculate the result for an exponentially large number of sparsity patterns and merge them all, since that would take forever. Moulines uses an inequality to convert the combinatorial problem to a continuous problem; unfortunately, at the end of it all it's still too much to calculate easily (2^m estimators) so he uses MCMC estimation to get his actual results.

I also went to the tutorial on Population Monte Carlo methods (which apparently were introduced by Cappe in 2004). I know about Particle Filters so my learnings are relative to that:

  • Each particle or iteration can have its OWN instrumental distribution, there's no need for it to be common across all particles. In fact the teacher (Petar Djuric) had worked on methods where you have a collection of instrumental distributions, and weighted-sample from all of them, adapting the weights as the iterations progress. This allows it to automatically do the kind of things we might heuristically want: start with broad, heavy-tailed distributions, then focus more on narrow distributions in the final refinement stages.
  • For static MC (i.e. not sequential), you can use the samples from ALL iterations to make your final estimate (though you need to take care to normalise appropriately).
  • Rao-Blackwellisation lets you solve a lower-dimensional problem (approximating a lower-dimensional target distribution) if you can analytically integrate to solve for a subset of the parameters given the other ones. For example, if some parameters are gaussian-distributed when conditioned on the others. This can make your approximation much simpler and faster.
  • It's generally held a good idea to use heavy-tailed distributions, e.g. people use Student's t distribution since heavier-tailed than Gaussian.
Sunday 2nd September 2012 | science | Permalink

Comment on 'Seeing women as objects: The sexual body part recognition bias'

PREFACE: There's a risk that I might come across here as dismissing the research, and doing so for an odd reason. I'd like to be clear that I think this is an interesting study, and I'm not an expert in cognitive psychology but I'm writing because I'm interested in seeing these issues teased apart in more detail. See also the comments section.

Interesting article someone pointed out in European Journal of Social Psychology: Seeing women as objects: The sexual body part recognition bias. The basic idea is to use a psychophysics-type perceptual experiment to explore whether people looking at men and at women process them differently. If perceiving people "as objects" makes a difference to the cognitive processes involved, then that should be detectable.

There's plenty of evidence about our society's exaggerated emphasis on female body image, and the consequences of such objectification. What the researchers do here is use an experiment in which participants are shown images of men and women (either complete or partial images), and ask them to do a kind of spot-the-difference task. They find people get different percentage-correct scores depending on whether it's an image of a man or a woman one is looking at.

The researchers discuss this result as relating to objectification of women, and I think that's broadly OK, but there's an extra hop that I think is glossed over. A tweet summarised the research as "People perceive men using global processing, but women with local processing" but it would be more correct to say "People perceive images of men using global processing, but images of women with local processing". (It's not just the 140-character limit at fault here, the research paper itself makes the leap.)

The point is that the participants were reacting to 2D images, rather than real physical presences of men or women. Now, you might think, is that an important difference, or just quibbling? I'm not claiming that the results are wrong, and I'm not even claiming that the results don't tell us something about objectification of women. But the difference between looking-at-people and looking-at-images is important here since it relates closely to the claims being made - and this highlights the complexity of making measurements of socially-embedded cognitive processing.

Here's why I think it's a difference: In our everyday lives we see "3D" men and women. We also see "2D" images of men and women. So there are four pertinent categories here: 3D men, 3D women, 2D men-images and 2D women-images. We have absorbed general impressions about these four categories from our experiences so far (whether those "categories" are categories we use ourselves is beside the point). It's well known that there are more and different images of women than men, used in advertising and other media. As a person develops they see examples of all four categories around them, and they might learn similarities and differences, things that the categories have in common or not.

[Edit: Maybe a better way of putting it is inanimate-vs-animate, not 2D-vs-3D - see comments]

So, it's reasonable to expect that an average person in Western society is more familiar with objectified images of women than of men. (Note that I do not claim this state of affairs is OK! I just claim that it's the average person's developmental environment.) It's easier to deal with familiar categories than unfamiliar ones. So we'd expect people to have better processing when presented with 2D body-part-images of women - and it probably correlates with their visual processing of real-life people, but that's not certain and it needs to be tested.

Am I claiming that the research should not be trusted? No. It looks like a decent and interesting experimental result. But the authors make a slight leap, which we should treat with caution: they imply that their statistically significant result on how people visually process 2D-images-of-men and 2D-images-of-women transfers directly to how people visually process men and women in the flesh. Personally I would expect that people's perception of "3D" men and women probably partly generalises from the image perception and partly doesn't. (There might be existing research on that; comments welcome.)

And obviously it's much harder to conduct large experiments by showing people "glimpses of real live men/women" rather than images, so there's a good reason why such research hasn't yet been done.

But that's good news right? - more research needed ;)

Friday 17th August 2012 | science | Permalink

A very simple toy problem for matching pursuits

To help me think about how and why matching pursuits fail, here's a very simple toy problem which defeats matching pursuit (MP) and orthogonal matching pursuit (OMP). [[NOTE: It doesn't defeat OMP actually - see comments.]]

We have a signal which is a sequence of eight numbers. It's very simple, it's four "on" and then four "off". The "on" elements are of value 0.5 and the "off" are of value 0; this means the L2 norm is 1, which is convenient.

signal = array([0.5, 0.5, 0.5, 0.5, 0, 0, 0, 0])
Diagram of signal

Now, we have a dictionary of 8 different atoms, each of which is again a sequence of eight numbers, again having unit L2 norm. I'm deliberately constructing this dictionary to "outwit" the algorithms - not to show that there's anything wrong with the algorithms (because we know the problem in general is NP-hard), but just to think about what happens. Our dictionary consists of four up-then-down atoms wrapped round in the first half of the support, and four double-spikes:

dict = array([
   [0.8, -0.6, 0, 0, 0, 0, 0, 0],
   [0, 0.8, -0.6, 0, 0, 0, 0, 0],
   [0, 0, 0.8, -0.6, 0, 0, 0, 0],
   [-0.6, 0, 0, 0.8, 0, 0, 0, 0],
   [sqrt(0.8), 0, 0, 0, sqrt(0.2), 0, 0, 0],
   [0, sqrt(0.8), 0, 0, 0, sqrt(0.2), 0, 0],
   [0, 0, sqrt(0.8), 0, 0, 0, sqrt(0.2), 0],
   [0, 0, 0, sqrt(0.8), 0, 0, 0, sqrt(0.2)],
]).transpose()
Diagram of dictionary atoms

BTW, I'm writing my examples as very simple Python code with Numpy (assuming you've run "from numpy import *"). We can check that the atoms are unit norm, by getting a list of "1"s when we run:

sum(dict ** 2, 0)

So, now if you wanted to reconstruct the signal as a weighted sum of these eight atoms, it's a bit obvious that the second lot of atoms are unappealing because the sqrt(0.2) elements are sitting in a space that we want to be zero. The first lot of atoms, on the other hand, look quite handy. In fact an equal portion of each of those first four can be used to reconstruct the signal exactly:

sum(dict * [2.5, 2.5, 2.5, 2.5, 0, 0, 0, 0], 1)

That's the unique exact solution for the present problem. There's no other way to reconstruct the signal exactly.

So now let's look at "greedy" matching pursuits, where a single atom is selected one at a time. The idea is that we select the most promising atom at each step, and the way of doing that is by taking the inner product between the signal (or the residual) and each of the atoms in turn. The one with the highest inner product is the one for which you can reduce the residual energy by the highest amount on this step, and therefore the hope is that it typically helps us toward the best solution.

What's the result on my toy data?

  • For the first lot of atoms the inner product is (0.8 * 0.5) + (-0.6 * 0.5) which is of course 0.1.
  • For the second lot of atoms the inner product is (sqrt(0.8) * 0.5) which is about 0.4472.

To continue with my Python notation you could run "sum(dict.T * signal, 1)". The result looks like this:

array([ 0.1,  0.1,  0.1,  0.1,  0.4472136,  0.4472136,  0.4472136,  0.4472136])

So the first atom chosen by MP or OMP is definitely going to be one of the evil atoms - more than four times better in terms of the dot-product. (The algorithm would resolve this tie-break situation by picking one of the winners at random or just using the first one in the list.)

What happens next depends on the algorithm. In MP you subtract (winningatom * winningdotproduct) from the signal, and this residual is what you work with on the next iteration. For my purposes here it's irrelevant: both MP and OMP are unable to throw away this evil atom once they've selected it, which is all I needed to show. There exist variants which are allowed to throw away dodgy candidates even after they've picked them (such as "cyclic OMP").
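
Here's a small plain-MP loop over the toy problem, so you can watch it happen (my own illustrative snippet, carrying on from the arrays defined above):

# Continuing with "signal" and "dict" as defined above (and "from numpy import *"):
residual = signal.copy()
for iteration in range(4):
    products = sum(dict.T * residual, 1)      # inner product of the residual with each atom
    winner = argmax(abs(products))
    residual = residual - products[winner] * dict[:, winner]
    print(iteration, winner, sum(residual ** 2))
# The first atom selected is one of the "evil" ones (index 4 to 7), and plain MP
# has no way to throw it away again in later iterations.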

NOTE: see the comments for an important proviso re MP.

Friday 13th July 2012 | science | Permalink

A* orthogonal matching pursuit - or is it

The delightful thing about the A* routing algorithm is that it is provably the optimal algorithm for the purpose, in the sense that it's the algorithm that visits the fewest possible path nodes given the information made available. See the original paper for proof. Despite its simplicity, it is apparently still used in a lot of industrial routing algorithms today, and it can be adapted to help solve other sorts of problem.

A colleague pointed out a paper about "A*OMP" - an algorithm that performs a kind of OMP (Orthogonal Matching Pursuit) with a tree search added to try out different paths towards fitting a good sparse representation. "Aha," I thought, "if they can use A* then they can get some nice recovery properties inherited from the A* search."

However, in reading the paper I find two issues with the A*OMP algorithm which make me reluctant to use the name "A*" for it:

  1. The heuristics used are not "consistent" - in fact they're not even guaranteed to be "admissible" (i.e. always less-than-or-equal to the true distance remaining). This means the proof of A*'s optimality doesn't apply. (Remember, A*'s "optimality" is about the number of nodes inspected before finding the best path.) (EDIT: a colleague pointed out that it's actually worse than this - with a heuristic that can overestimate, it's not just sub-optimal search, it may fail to find the best path at all.)
  2. Since A*OMP restricts the number of paths it adds (to the top "B" atoms having the largest inner product with the residual), there are no guarantees that it will even inspect the true basis.

These issues are independent of each other. If you leave out the pragmatic restriction on the number of search paths (to get round the second issue), the first issue still applies. OMP itself is greedy rather than exact, so this doesn't make A*OMP worse than OMP, but to my mind it's "not as good as A*".

In practice, the authors' A*OMP algorithm might indeed get good results, and the experiments in the paper suggest it does. So my quibbles may be mere quibbles. But the name "A*" led me to expect guarantees that just aren't there (e.g. guarantees of being better than OMP). It's quite easy to construct a toy problem for which A*OMP will not get you any nearer the true solution than OMP will.

It's not obvious how to come up with a consistent heuristic. For a given problem, if we knew there was an exact solution (i.e. zero residual was possible within the sparsity constraints) then we could use the residual energy, but since we can't know that in general, the residual energy may overestimate the distance still to be "travelled" to the goal.

One minor thing: their "equivalent path pruning" in section 4.2.3 is a bit overkill - I know a simpler way to avoid visiting duplicate paths. I'll leave that as an exercise for the reader :)

Friday 13th July 2012 | science | Permalink

My research about time and sound

I've got three papers accepted in conferences this summer, and they're all different angles on the technical side of how we analyse audio with respect to time.

In our field, time is often treated in a fairly basic way, for reasons of tractability. A common assumption is the "Markov assumption" that the current state only depends on the immediate past - this is really handy because it means our calculations don't explode with all the considerations of what happened the day before yesterday. It's not a particularly hobbling assumption - for example, most speech recognition systems in use have the assumption in there, and they do OK.

It's not obvious whether we "need" to build systems with complicated representations of time. There is some good research in the literature already which does, with promising results. And conversely, there are some good arguments that simple representations capture most of what's important.

Anyway, I've been trying to think about how our signal-processing systems can make intelligent use of the different timescales in sound, from the short-term to the long-term. Some of my first work on this is in these three conference talks I'm doing this summer, each on a different aspect:

  1. At EUSIPCO I have a paper about a simple chirplet representation that can do better than standard spectral techniques at representing birdsong. Birdsong has lots of fast frequency modulation, yet the standard spectral approaches assume "local stationarity" - i.e. they assume that within a small-enough window, we can treat the signal as unchanging. My argument is that we're throwing away information at this point in the analysis chain, information that for birdsong at least is potentially very useful.

  2. At MML I have a paper about tracking multiple pitched vibrato sounds, using a technique called the PHD filter which has already been used quite a bit in radar and video tracking. The main point is that when we're trying to track multiple objects over time (and we don't know how many objects), it's suboptimal to just take a model that deals with one object and apply the model multiple times. You benefit from using a technique that "knows" there may be multiple things. The PHD filter is one such technique, and it lets you model things with a linear-gaussian evolution over time. So I applied it to vibrato sources, which don't have a fixed pitch but oscillate around. It seems (in a synthetic experiment) the PHD filter handles them quite nicely, and is able to pull out the "base" pitches as well as the vibrato extent automatically. The theoretical elegance of the filter is very nice, although there are some practical limitations which I'll mention in my talk.

  3. At CMMR I have a paper about estimating the arcs of expression in pianists' tempo modulations. The paper is with Elaine Chew, a new lecturer in our group who works a lot with classical piano performance. She has had students before working on the technical question of automatically identifying the "arcs" that we can see visually in expressive tempo modulation. I wanted to apply a Bayesian formulation to the problem, and I think it gives pretty nice results and a more intuitive way to specify the prior assumptions about scale.

So all of these are about machine learning applied to temporal evolution of sound, at different timescales. Hope to chat to some other researchers in this area over the summer!

Saturday 9th June 2012 | science | Permalink

Implementing the GM-PHD filter

I'm implementing the GM-PHD filter. (The what? The Gaussian mixture Probability Hypothesis Density filter, which is for tracking multiple objects.) I'm implementing it in Python, which is nice, but I'm not completely sure it's working as intended yet.

Here's a screenshot of progress so far. Look at the first four plots in this picture, which are:

  1. The true trajectory of two simulated objects moving in 1D over time.
  2. Observations received, with "clutter" and occasional missed detections.
  3. The "intensity" calculated by the GM-PHD filter. This is the core state variable of the filter's model.
  4. Filtered trajectories output from the PHD filter.

So what do you think? Good results?

Not sure. It's clearly got rid of lots of the clutter - good. In fact yes it's got rid of the majority of the noise, hooray hooray. But the clutter right close to the targets is still there, and it seems a bit mucky, in a kind of way that suggests it's not going to be easy to clear that up.

And there's also a significant "cold start" problem - it takes up to about 20 frames for the filter to be convinced that there's anything there at all. That's no real surprise, since there's an underlying "birth" model which says that a track could spring into being at any point, but there's no model for "pre-existing" targets. There's nothing in the PHD and GM-PHD papers I've read which even mentions this, let alone accounts for it - I'm pretty sure that we'll either need to initialise the state to account for this, or always do some kind of "warmup" before getting any results out of the filter. That's not great, especially when we might be tracking things that only have a short lifespan themselves.
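For anyone who wants a feel for the structure I'm describing, here's a heavily stripped-down sketch of one predict/update cycle in the spirit of the Vo & Ma GM-PHD formulation - scalar position-only state, made-up parameter values, and no pruning/merging, so it's an illustration of the shape of the algorithm rather than my actual implementation. The "births" argument is where that birth model comes in, and seeding the initial component list is where a "pre-existing targets" fix would have to go:

from numpy import exp, sqrt, pi

p_survive, p_detect = 0.99, 0.9
F, Q = 1.0, 0.1           # random-walk state transition and process noise
H, R = 1.0, 0.2           # observation model and measurement noise
clutter_intensity = 0.05  # assumed uniform clutter rate over the measurement space

def gauss(x, m, var):
    return exp(-0.5 * (x - m) ** 2 / var) / sqrt(2 * pi * var)

def gmphd_step(components, measurements, births):
    # components and births are lists of (weight, mean, variance) Gaussians
    predicted = [(p_survive * w, F * m, F * P * F + Q) for (w, m, P) in components]
    predicted += births
    # missed-detection terms: every predicted component is kept, down-weighted
    updated = [((1 - p_detect) * w, m, P) for (w, m, P) in predicted]
    # detection terms: one new component per (measurement, predicted component) pair
    for z in measurements:
        new = []
        for (w, m, P) in predicted:
            S = H * P * H + R               # innovation variance
            K = P * H / S                   # Kalman gain
            w_new = p_detect * w * gauss(z, H * m, S)
            new.append((w_new, m + K * (z - H * m), (1 - K * H) * P))
        norm = clutter_intensity + sum(wn for (wn, _, _) in new)
        updated += [(wn / norm, mn, Pn) for (wn, mn, Pn) in new]
    return updated  # a real implementation prunes and merges components here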

One thing: this is a one-dimensional problem I'm testing it on. PHD filters are usually used for 2D or 3D problems - and maybe there needs to be enough entropy in the representation for clutter to be distinguished more clearly from signals. That would be a shame, since I'd like to use it on 1D things like spectrogram frames.

More tests needed. Any thoughts gratefully received too.

Thursday 29th March 2012 | science | Permalink

Perceptually-modelled audio analysis

This week I went to a research workshop in Plymouth called Making Sense of Sounds. It was all based around an EU project which aims to improve the state of the art in auditory models (i.e. models of what happens in between our ear and our consciousness, to turn a physical sound into an auditory perception) and also to use them to help computers and machines understand sound.

I won't blog the whole thing but just a few notes here. There was a lot of research on the streaming paradigm, and it's quite amazing how it's still possible to discover new facts about human hearing using such a simple sound. Basically, the sound is usually something like "bip boop bip, bip boop bip, bip boop bip", and the clever bit is that we can either hear this as a single stream or as two segregated streams (a bip stream and a boop stream), depending on the relative pitches and durations. It's an example of "bistable perception", just like famous optical illusions such as the Necker cube or the faces/vase thing. With modern EEG and fMRI brain scanning, this streaming paradigm reveals some interesting facts about how we hear sounds - for example, it seems that our auditory system does entertain both "versions" at some point, but this resolves to just one choice at some point below conscious perception.


I was interested by Maria Chait's talk on change detection, and in conversation afterwards she pointed us to some recent research - see this 2010 paper by Scholl et al - which shows that humans have neurons which are able to detect note offsets, even though it's very well established that in behaviour we're very bad at noticing them - i.e. we often can't tell what happened when a sound stops, but it's usually pretty noticeable when a sound starts!

Those findings aren't completely incompatible, of course. It's plausible that in human evolution, sudden sounds were more important than sudden silences, even though both are informative.


Maneesh Sahani talked about two of his students' work. The one that was new to me was Phillip Herrmann's thesis on pitch perception, which took a really interesting approach - rather than using a spectral or autocorrelation method, they started from a generative model in which we assume there is some pitch rate generating an impulse train, some impulse response convolved with it, and also some Gaussian noise etc; this then goes into an auditory model before arriving at a representation which we have to make inferences about. They then did inference, applying this model to audio signals. The point is not whether this is an appropriate model for most sounds, just whether this assumption gets you far enough to do pitch perception in similar ways as humans do (with some of the attendant peculiarities).

One particularly nice experiment they came up with is another kind of "bistable perception" experiment where you have a train of impulses separated by 2ms, and every second impulse is optionally attenuated by some amount. So if there's no attenuation, you have a 2ms impulse train; if there's full attenuation, you have a 4ms impulse train; somewhere in between, you're somewhere in between. If you play these sounds to humans, they can report ambiguous pitch perception, sometimes detecting the higher octave, sometimes the lower, and this Herrmann/Sahani model apparently replicates the human data in a pretty good way that is not reflected in autocorrelation models.
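The stimulus itself is trivial to synthesise if you want to hear the ambiguity - a quick sketch (the sample rate, length and attenuation value here are my own arbitrary choices):

import numpy as np
fs = 44100
period = int(0.002 * fs)                     # an impulse every 2 ms
attenuation = 0.5                            # 0 gives a pure 2 ms train, 1 gives a pure 4 ms train
x = np.zeros(period * 500)
x[::2 * period] = 1.0                        # the full-strength impulses
x[period::2 * period] = 1.0 - attenuation    # every second impulse, attenuated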

Oh, also, over a diverse dataset, they apparently found a really clear square-root correlation between fundamental frequency and spectral centroid. (In other parts of the literature, it's not clear whether or not the two are correlated.) I'd like to see the data for this one - as I mentioned to Maneesh, there might be reasons to expect some data to do this by design (e.g. professional singers' voices). The point for Herrmann/Sahani is to see if the correlation exists in the data that might have "trained" our perception, so I'm not sure if things like professional singers should be included or not.

Maneesh Sahani also said at the start of his talk that Helmholtz (in the 19th century) came up with this idea of "perception as inference" - but then the electrical/computational signal-processing paradigm came along and everyone treated perception as processing. The modern Bayesian tendency, and its use to model perception, is a return to this "perception as inference". Is there anything that wasn't originally invented by Helmholtz?


Also Tom Walters' demo of his AIMC real-time perceptual model in C++ was nice, and it's code I'd like to make use of some time.

My own contribution, a poster about using chirplets to analyse birdsong, led to some interesting conversations. At least one person was sure I should be using auditory models instead of chirplets - which, given the context, I should have expected :)

Thursday 23rd February 2012 | science | Permalink

The Impact agenda, and public engagement

I was at a meeting recently, going through research proposal documents, and I realised that the previous government's "impact agenda" might be having an unintended effect on public engagement:

One of the things that has happened in research in the past few years is that the government now demands that we now have to state what kind of "impact" our research will have. Now, the problem is that impact is notoriously and demonstrably unpredictable - we don't know if we're going to discover anything world-changing, until we actually try it, and even then we might not realise the impact for decades - but the previous government wanted to try and pin it down somehow.

So every proposal now (in the UK) has to have a two-page "Pathways to Impact" summary. If you're doing applied research it's pretty easy - you say things like "We're going to study the resilience of welded grommets under pressure, which means the grommet industry will produce more reliable grommets and there will be fewer grommet-related fatalities." If you're doing theoretical or basic research, in principle you still have a story to tell: you say something like "Our research will lead to a greater understanding of the number five, which is widely used in the natural sciences, industry and the financial sector. Future researchers will be able to build on these theoretical advances to develop new techniques for counting grommets or whatever."

So, in theory every research project has something they can say about this. (And they don't have to fill up the two pages, if they don't have much to say.) But that's not what happens.

Here's a very rough transcript of a conversation that went on in the meeting:

P: "Your proposal is good, Q, but there's not really anything about impact. The reviewers will have to rate you on impact so you need to say something here."

Q: "Oh blooming heck, but it's basic research, you can't really say what the impact is. I suppose I'll have to stick a schools talk in or something?"

R: "I know a couple of schools, I can arrange for you to do a talk, put that in."

Q: "Yeah OK."

Now I want to emphasise, this was not the end of the conversation. But I'm in favour of public engagement - perhaps a little more imagination is needed than just some generic schools talk, but it's interesting to see that this criterion is pushing people towards that little bit more public engagement.

Also: this is not a particularly unusual approach to filling in those impact pages. Impact is not supposed to be the tail that wags the dog, research excellence is supposed to be the number one criterion. But there are two whole pages which we have to use to say something about impact. And we know that the reviewers have got to read those pages, and rate us in terms of how strong or weak our pathways to impact are.

As I've said, impact is unpredictable. So what can you write, to make a reviewer say, "Yep, that's credible"? Your biggest impact might be to invent a whole new type of science, or to change the way we all think about the universe, but that won't happen for decades and it depends on a whole vague network of people taking your research and running with it. Can you talk about that? You could do, and that might be the truth about the likely impact of the research. But we know we'll get a bigger tick if we have something demonstrable that we can actually propose to do - even if it's not really connected with the research's biggest likely impact on society. A schools talk is a good thing to do, but is it the biggest impact your research will have on society in general? I hope not!

So, it happens quite often that people conflate public engagement with impact. A schools talk is not impact. An article in a newspaper is not impact. They might be tools that help spread research out of the university into the wider world, and they might facilitate impact, but they're not really the point of the hurdle that the government set for us.

Unfortunately, in science - unlike in politics - we formally review each other's work, and we can't hide behind woolly generalities. The strange thing is that regarding impact, the woolly generalities are the truth.

Tuesday 22nd November 2011 | science | Permalink

ISMIR 2011: the year of bigness

I'm blogging from the ISMIR 2011 conference, about music information retrieval. One of the interesting trends is how a lot of people are focusing on how to scale things up, to handle millions of audio files (or users, or scores) rather than just hundreds or thousands. Why? Well, in real-world applications it's often important: big music services like Spotify and iTunes have about 15 million tracks, Facebook has millions of users, etc. In ISMIR one of the stars of the show is the Million Song Dataset, just released, which should help many many researchers to develop and test on a big scale. Here I'm going to note some of the talks/posters I've seen with interesting approaches to scalability:

Brian McFee described a simple tweak to the kd-tree data structure called "spill tree" which improves approximate search. Basically, when you split the data in two you allow some of the data points to spill over and fall on both sides. Simple but apparently effective.

Dominik Schnitzer introduced a nice way to smooth out a search space and reduce the problem of hub-ness. One way to do it could be to use a minimum spanning tree, for example, but that involves a whole-dataset analysis so it might not scale well. In Dominik's approach, for each data point X you find an estimate of what he calls "mutual proximity": randomly sample 100 data points from your dataset and measure their distance to X, then fit a gaussian to those distances. Then to find the "mutual proximity" between two data points X and Y, you just evaluate X's gaussian at the distance to Y, to get a kind of "probability of being a near neighbour". He also makes this a symmetric measure by combining the X->Y measure with the Y->X measure, but I'd imagine you don't always need to do that, depending on your purpose. The end result is a distance measure that pretty much eliminates hubs.
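As I understood it from the talk, the recipe looks roughly like this - treat it as a hedged sketch with my own naming (dist() is whatever base distance you're using), not Dominik's actual code:

import numpy as np
from scipy.stats import norm

def distance_stats(x, dataset, dist, n_samples=100):
    # fit a gaussian to the distances from x to a random sample of the dataset
    idx = np.random.choice(len(dataset), n_samples, replace=False)
    d = np.array([dist(x, dataset[i]) for i in idx])
    return d.mean(), d.std()

def mutual_proximity(x, y, dataset, dist):
    d_xy = dist(x, y)
    mx, sx = distance_stats(x, dataset, dist)
    my, sy = distance_stats(y, dataset, dist)
    # probability that a random point lies at least as far from x as y does (and vice versa)
    px = 1.0 - norm.cdf(d_xy, mx, sx)
    py = 1.0 - norm.cdf(d_xy, my, sy)
    return px * py   # one way of symmetrising; large = probably near neighbours, so (1 - this) acts like a distance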

Shazam's music recognition algorithm, described in this 2006 paper, is one of the commercial success stories of scalable audio MIR. Sebastien Fenet tweaked it to be robust to pitch-shifting, essentially by using a log-frequency spectrogram and using delta-log-frequency rather than frequency in the fingerprints.

A small note from the presentation of the Million Song Dataset: apparently if you want a good online linear predictor that is fast for large data, try out Vowpal Wabbit.

Also, Thierry mentioned that he was a fan of using Amazon's cloud storage/processing - if you store data with Amazon you can run MapReduce jobs over it easily, apparently. Mark Levy of last.fm is also a fan of MapReduce, having done a lot of work using Hadoop (the open-source implementation of MapReduce, developed largely at Yahoo) for big data-crunching jobs.

Mikael Henaff presented a technique for learning a sparse spectrum-derived feature set, similar in spirit to KSVD. The thing I found interesting was how he then made a fast way of decomposing a new signal (once you've derived your feature basis from some training data). Ordinarily you'd have to do an optimisation - the dictionary is overcomplete so it can't be done as easily as an orthogonal transform. But you don't want to do that on a lot of data. Instead, he first trains a nonlinear projection which approximates that decomposition (it's a matrix rotation followed by a shrinkage nonlinearity, really simple mathematically). So you have to train that, but then when you want to analyse new data there's no optimisation needed, you just apply the simple transform.
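If I've understood the trick correctly, the fast approximation has roughly this shape (the names and the exact shrinkage function are my assumptions - the point is just that it's a single learned linear map plus an elementwise nonlinearity, with no optimisation at test time):

import numpy as np

def fast_sparse_code(x, W, theta):
    # W and theta are learned offline so that this mimics the full sparse decomposition
    z = W.dot(x)
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)   # soft shrinkage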

There's been plenty of interesting stuff here at ISMIR that isn't about bigness, and it was good of Douglas Eck (of Google) to emphasise that there are still lots of interesting and important problems in MIR that don't need scalability and don't even benefit from it. But there are interesting developments in this area, hence this note.

Thursday 27th October 2011 | science | Permalink

Separating the repeating part out of a piece of music

Via the SuperCollider users list I heard about a nice trick for extracting the repeating part out of a recorded piece of music. Source-separation, vocal-extraction etc are massive topics which I won't go into right now, but suffice to say it's not easy. So I was interested to read this nice simple technique (scroll down to "REpeating Pattern Extraction Technique (REPET)") described in an ICASSP paper this year.

Basically it uses spectral subtraction and binary masking - two of the simplest "source separation" tricks you can do to a signal. In general they produce kinda rough results - they don't adapt the phase information at all, for a start, so they can give some smeary MP3ish artefacts. But in this case the authors have applied them to a task where they can produce decent results: here you don't have to try and separate all the instruments out, you just want to divide the recording into two, the repeaty bit and the non-repeaty bit.

If you read the ICASSP paper you'll find they describe it well - it's a nice readable paper. (However, they do make the task a bit more complex than it needs to be: they do a load of calculations and then take the log-spectrum near the end, whereas if they took the log-spectrum at the start the calculations would be a little simpler.) The basic idea is as follows (there's a rough code sketch after the list):

  • Find the tempo of your piece of music. Then you know how long one bar is going to be.
  • Chop the music into bar-long sections, and average their spectrograms. This averaged-spectrogram should in theory represent the repeated bit, with the varying bits getting mostly washed out.
  • Use spectral subtraction to subtract this average from each bar-long segment.
  • Then, for each spectral bin, if there is a significant amount of energy left over, you mark this as being a bin belonging to some non-repeating audio. Otherwise you mark it as belonging to repeating audio.
  • Then you go back to the spectrogram of the original song, and silence the bins you want to get rid of (either the repeating or nonrepeating ones).
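Here's a rough, non-realtime sketch of those masking steps (my own variable names and a hand-chosen threshold; "spec" is a magnitude spectrogram and "frames_per_bar" is a whole number of spectrogram frames per bar):

import numpy as np

def repeating_mask(spec, frames_per_bar, threshold):
    nbins, nframes = spec.shape
    nbars = nframes // frames_per_bar
    bars = spec[:, :nbars * frames_per_bar].reshape(nbins, nbars, frames_per_bar)
    avg_bar = bars.mean(axis=1)                            # the averaged, "repeating" bar spectrogram
    leftover = np.maximum(bars - avg_bar[:, None, :], 0)   # spectral subtraction
    mask = (leftover < threshold).reshape(nbins, -1)       # True where the energy looks repeating
    return mask   # apply mask (or its inverse) to the original complex spectrogram, then resynthesise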

From a theoretical point of view there are all sorts of quibbles you could come up with, for example that it might fall apart if a song has varying tempo. But for a fairly large range of tracks, this looks like it could give useful results.

So I decided to implement a real-time version in SuperCollider. I like real-time stuff (meaning you can work with audio as-it-happens rather than just a fixed recording), but the above approach is non-realtime: it takes the average spectrogram over the whole track, for example, so you can't calculate the first ten seconds until you've analysed the whole thing.

What to do? I replaced the usual averaging process with what I call recursive average (can't find a nice online explanation of that right now, hm). You still need to know the tempo, but given the tempo you then have a real-time estimate of the average spectrum caused by the repeating bit.
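Something like an exponentially-weighted running mean does the job - a minimal sketch, assuming "bar_spectrograms" yields the magnitude spectrogram of each bar as it arrives (the smoothing factor is my own choice):

alpha = 0.1                    # larger values adapt faster but forget the past sooner
running_avg = None
for bar_spec in bar_spectrograms:
    if running_avg is None:
        running_avg = bar_spec.copy()
    else:
        running_avg = running_avg + alpha * (bar_spec - running_avg)   # recursive average update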

One interesting thing is that when a new beat kicks in, it's not immediately detected as a loop - so usually, it plays through once and then gets suppressed. You might think of this as a system to separate "novelty" from "boring loops"...?

I've published this for SuperCollider as a UGen called PV_ExtractRepeat (available in the sc3-plugins collection).

Here's an example of it in action, applied to "Rehab" by Amy Winehouse. As you listen, notice a couple of things: (1) during the first bar there is poor separation, then it gets better; (2) the repeating-only bit (the rhythm section) sounds pretty good, could easily be used as a karaoke-version, while the non-repeating bit (mainly the vocals) sounds pretty messy...

Rehab - just the house by danmisc

Rehab - just the wine by danmisc

So, not perfect, but potentially useful, maybe for karaoke or maybe for further audio analysis. Thanks to Zafar Rafii and Bryan Pardo for publishing the method - note that their examples sound better than my real-time example here (real-time often means compromises in analysis).

Wednesday 16th March 2011 | science | Permalink

My PhD thesis now online

I'm glad to say the thesis corrections have been approved so my PhD thesis is now in its finished form - available here:

The title is "Making music through real-time voice timbre analysis: machine learning and timbral control". (Tip for future PhDs, try to choose a title that you can say in one breath...)

I'm really grateful to all the fab people in C4DM - I've got so much from being in a research environment with so many people knowledgeable about such a variety of cool things - and, well, I don't want to rewrite the whole acknowledgments here (they're on page 3) but all the people who took part in experiments or just chatted about research. (Including the folks at humanbeatbox.com)

The thesis is available under creative commons. And, because I uploaded it to archive.org they also seem to have converted it into some crazy ebook formats, so you can presumably read a garbled version of it on your kindle if you like ;) probably best to use the original PDF if possible, though (the TeX source is also included).

Saturday 7th August 2010 | science | Permalink

SMC 2010 conference notes

I've just been at SMC2010, the Sound and Music Computing conference. It's the first time I've been, so one question I had was: what differentiates it from other conferences in this research area like NIME, DAFX, ISMIR, ICMC? What's its specialist subject? The answer is that it deliberately tries not to over-specialise - they keep the topic broad to encourage cross-disciplinary thinking, and there's a good strong representation of young researchers, so it's a good place for fresh ideas and making new connections. My paper about timbre remapping came across pretty well, I think.

One reason I was keen to go to this conference was that it was hosted by UPF's Music Technology Group in Barcelona, because that group is the main place where people have done research along very similar lines to my PhD topic of beatbox-based control. It was great to meet Jordi Janer, whose PhD was about singing-based control, and Marco Marchini and Hendrik Purwins, who presented a poster about a kind of rhythmic beatboxing equivalent of the continuator - give it a piece of rhythmic audio and it will try to continue it by chopping up the sound and outputting patterns in (hopefully) the same style. The most interesting part of their work is the automatic approach to clustering, where they hierarchically cluster all the sound events, and then let the system choose the appropriate clustering level (i.e. how many clusters to lump the events into) at playback time, by judging how 'informative' the markov-model resynthesis is at each level of clumpiness.

Also interesting was Ho-Hsiang Wu and Juan Bello's poster about representing the musical structure of a song. We all know that many songs have repetition in them, whether it's verse-chorus-verse-chorus or something else - and we can analyse this automatically from the audio, for example by detecting repeated sub-sequences of chord patterns or timbre. Their contribution is to visualise this detected repetition using 'arc plots', pretty little monochrome rainbows that reminded me of the kind of information aesthetics practised by Information Is Beautiful. The end result is that pop songs create little plots which generally all look quite similar but with little shape differences that you could spot by eye, whereas I imagine classical music pieces would probably each have their own visual signature that could be quite different. Could be a nice way to get an instant visual impression of the musical structure of a piece of recorded music.

The keynote talk by Ricard Sole was thought-provoking, discussing the theory of complex networks, with some results of his created by applying this theory to languages, software, and other things. Sound and music weren't mentioned, but I know it's useful stuff that was food for thought for many people. (In our group we have some researchers who have looked at this kind of thing already - when you consider the network of MySpace bands & friends, for example, that's a complex network where issues of small-world-ness, hubs, etc. come into play. Which reminds me, I wonder how Kurt is getting on with his thesis... :)

In fact some of the research presented at SMC was grappling with these issues too, such as the work by Martin Gasser et al showing that the problem of hubs in music similarity (i.e. songs that keep getting returned as good similarity matches to various input songs, even if they don't sound that similar) may be affected by the "homogeneity" of the audio in the music database.

The concert programme was packed full of things: lots of soundscape-based work, and more generally electroacoustic stuff. My favourites out of those were Impulsus I by Lina Bativa (an audio-visual piece which had a great narrative energy despite being really abstract), and Juan Parra Cancino's reacTable performance which I mentioned in my post about the reacTable.

But one of the things I was most grateful for was the deliberate non-art-music session. Electroacoustic stuff is all very well, but I can't generally cope with so much of it packed into a week and after all, this is a broad conference where many of the researchers are working on pop music, techno, breakbeats, and stuff like that. As the conference chair (Xavier Serra) said, it's actually quite difficult to get the non-art-music in the conference, since research conferences aren't usually their scene and most of the good examples of techno-enhanced popular music are quite happily making music in front of normal crowds... So, many of us were glad to spend an hour listening to Japanese pop made using Vocaloid, and a dance set made using Loopmash. (Sergi Jorda also told me he had hoped to get a dance music set in the reacTable concert, but the performer wasn't available.)

This is something that we need to work on as a research community - the SMC hosts did well, assisted by the fact that some of their own technology has gone directly and quite notably into music tech used by producers - but it's one of those things that's going to need a constant bit of extra effort to try and encourage that kind of thing into these conferences.

Monday 26th July 2010 | science | Permalink

Automatic birdsong analysis

I've started my first project after my PhD, a small feasibility study into automatic birdsong analysis.

The picture visualises a few seconds of a skylark recording by Dr Elodie Briefer (in QMUL's School of Biological and Chemical Sciences), from her PhD research into the structure of skylark song.

What we're doing is looking at the potential for automatically analysing birdsong signals, which could mean picking them out of recordings, identifying species, identifying individual "syllables" in the song... who knows.

There are already a fair few published research papers about automatic birdsong analysis. I'm looking at the state of the art to determine the scope for future work, such as applying machine learning techniques we've developed in our group, or particular forms of signal analysis such as adaptive transforms.

In my PhD I was looking a lot at voice and music. Birdsong has interesting similarities to both music and spoken language - plus differences of course. So watch this space. And of course get in touch if you're interested.

Monday 5th July 2010 | science | Permalink

Unpredictable impact

There's a big change happening in UK science+engineering at the moment, and it goes by the name of Impact. What does it mean? When we do science we often do it just to find new things out, yet whether we intend it or not, one of the great things about science is that it actually makes important changes to the world outside our research group. Impact is formally defined as the effect that we have - on business and the economy, on health, on public policy, on culture and the arts. There are billions of ways that impact spreads.

This has always been a very unpredictable thing and pretty hard to measure, so the government has now created a formal process for trying to account for the types of impact that we get out of research - and even further, to think hard about impact when deciding what research to fund. In a lot of cases the predicted impact will now account for up to 25% of the considerations in rating academic departments or allocating funding.

Sounds reasonable? Well many scientists are against it - and it's not because they don't like having to justify themselves (they already have to do that when they write grant applications etc), but because the real impact of science often happens in surprising ways, sometimes many years down the line. Take DNA fingerprinting for example. The scientists who came up with it were working with DNA, trying to measure various things, but they had no idea that the best thing they could do was make an unruly collection of DNA form patterns on a sheet of film - they discovered it by accident. And now it's an important part of many of the most serious court cases we have. Think of all the people who were convicted or freed based on DNA evidence - that's some serious impact there.

There are lots more examples of unpredictable impact - such as:

  • Email, when it was invented, was only able to send messages to people using the same mainframe. No-one predicted that tweaking it to send messages around the world would make it one of the most important communication tools we have.
  • Gregor Mendel - a lone priest planting peas in a garden, trying out different cross-breeds and making careful notes. It wasn't until years after his death that biologists realised how Mendel's laws of inheritance fit with Darwinian evolution, and formed the foundation of modern biology, with massive impact throughout society.
  • Texting. A phone is for phoning, right? Text messages were never planned to be the mainstay of what mobile phones were about, just a way to get a message through when you couldn't talk. But now many people text more than they call.
  • Liquid crystal displays eventually arose from the basically curiosity-driven research of Friedrich Reinitzer looking at the chemical cholesteryl benzoate. Now it's used in TVs, phones, watches...
  • Fibre optics was demonstrated as a curiosity and a demonstration of physical principles in the 19th century; but it wasn't until way into the 20th century that it became important for data transmission, for example in phone networks.

And the opposite is also true - history is littered with examples of discoveries/inventions that were widely expected to change the world, but didn't:

  • Video messaging: the phone companies seem to have thought that if we liked text messaging we were going to love video messaging. No.
  • Artificial intelligence: In the 1960s the artificial intelligence research community was an incredibly optimistic one, with leading lights such as Marvin Minsky basically thinking they would be able to recreate the intelligence of a whole human brain within a few years, and then we'd all be having conversations with robot pals. That optimism came crashing down. Sure, you can now buy robot pals, and sure, we're still researching artificial intelligence and indeed using it in various applications, but it hasn't yet had the revolutionary impact it was expected to have.
  • Hovercraft and maglev: these have become the clichés of misplaced futurology. After their invention they seemed poised to take over the world - but no, we're still mostly using the good old wheel to get around.

So with all this evidence, it's not surprising that scientists are worried about this new approach of trying to plan your impact - much of the curiosity-driven stuff that has real impact could well get sidelined in favour of things which might be a bit less imaginative but which seem like they'll definitely make some public or business connection.

OK fine - seems like there's some misguided bureaucracy coming down from government, and we have to try and make sure it doesn't end up stifling what it's supposed to be helping. But there's a bigger question that maybe we can think about. As I've said, "impact" is very hard to pin down or predict, and we don't really know how predictable it could or should be. But in many grant applications and suchlike, scientists are now writing down their predictions about the impact they'll have. Are those predictions useful data? Could we use "impact plans" as a great big study about whether impact can be predictable?

We could for example wait for five years, then look back at the pile of impact plans and ask, how many of those predictions (the ones which got funded, at least) came true? What percentage? What proportion of the observable scientific+engineering impact made over the next five years will have been predicted, in writing, in advance?

It would still leave a million questions unanswered, especially about unidentifiable impact (subtle things which are hard to count), long-term impact, and really it would still be a very reductive way to think about how science affects our society. But I wonder... would that make all these "impact statements" worth their while?


POSTSCRIPT: Some further examples of unexpected impact, collected after this article was written:

  • Nice unpredictable-impact example from Martin Rees in his Reith lecture (Tuesday 1st June 2010), of the laser: its possibility was there in Einstein's ideas at the start of the 20th century; then there was a 40yr gap before it was actually conceived and made to happen; and the inventors of the laser would never have predicted DVD players and laser eye surgery 40yrs after that...
  • The bizarre and useful technique of optogenetics was enabled after researchers studied light-sensitivity in "pond scum" microbes.
  • "DNA restriction enzymes, once the province of obscure microbiological investigation, ultimately enabled the entire recombinant DNA revolution." (Quoted from this Science editorial 2013)
  • "Measurement of the ratios of heavy and light isotopes of oxygen, once a limited area of geochemistry, eventually allowed the interpretation of prior climate change." (Quoted from this Science editorial 2013)
Friday 4th December 2009 | science | Permalink

Tree recursion, python/octave/matlab/sc3, informal benchmark

I'm writing a tree data structure as part of my research. I'm not going to describe the algorithm in detail, but it takes a set of data points and repeatedly chops them into two groups so that you can divide a dataset up into spatial subgroups.
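(The actual algorithm isn't the point of this post, but for a feel of the kind of branching recursion being benchmarked, a generic median-split tree looks something like the sketch below - this is not the research code, just the general shape of such a computation.)

import numpy as np

def split_tree(points, min_size=8):
    # recursively chop a 2D array of points (rows = points) into two spatial subgroups
    if len(points) <= min_size:
        return points                                               # leaf node: just hold the points
    dim = np.argmax(points.max(axis=0) - points.min(axis=0))        # split along the widest dimension
    order = np.argsort(points[:, dim])
    half = len(points) // 2
    return (split_tree(points[order[:half]], min_size),
            split_tree(points[order[half:]], min_size))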

Anyway, my first implementation (in SuperCollider 3) was running fairly slowly so I tried it in three other languages, to see which would be most practical for my situation.

It's an informal kind of benchmark - informal cos I'm not going to show you the code, and I haven't run the tests dozens of times, etc. (Some of the tests I ran just once, since they took so long.) The datasets consisted of artificially-generated 3D points sampled from a mixture of a cubic and a toroidal distribution. In the following graph, lower results (shorter times) are better:

The results show a couple of interesting things. SuperCollider was my starting point and it was never developed for large data-crunching tasks so I'm not surprised that it becomes the worst performer once we get to large datasets, although it actually doesn't do too badly. To be ten times as slow as Python or Matlab on big datasets is not embarrassing when both of those have had so many more person-hours of development effort specifically for big data crunching.

The comparison against Octave is illuminating. Octave was originally my open-source Matlab alternative of choice, but I've come to feel like it has all the drawbacks of Matlab (mainly the godawful design of the Matlab language) and none of the advantages (under-the-hood optimisation tricks, great plotting). Here I was running exactly the same code in Matlab (7.4) and Octave (3.0.5). I expected Octave to be roughly competitive, since this branching recursive code is quite difficult to auto-optimise, but Matlab generally handles it something like ten times as fast. So here I find another sign that Octave isn't quite there.

I now know, of course, that Python + numpy is the open-source Matlab alternative of choice. The language design is much better, and numpy (the module that provides all the matrix-crunching tools) has undergone lots of development effort and become better and better. And this (informal!) benchmark shows python (2.5.4, with numpy 1.3.0) performing just as well as Matlab on the large data.

(There is one thing that Python definitely lacks compared to Matlab: decent well-integrated 3D plotting. matplotlib doesn't have it except in old deprecated versions; python's gnuplot interface is poorly developed; other python plotting libs have drawbacks such as non-interactivity. I've mentioned this before.)

So I'll probably be using my Python implementation of the tree data structure. It's right up there in terms of speed, plus the code is conceptually cleaner than the Matlab version, so it'll be easier to maintain, and easier for others to grok, so it's better for reproducible research. Remember, this benchmark was only informal so do your own tests if you care about this kind of thing...

Tuesday 10th November 2009 | science | Permalink

Are probiotics real, or meaningless?

Today Danone was forced to withdraw an advert for probiotic yoghurt because the scientific evidence didn't support it. The company claimed it boosted children's "defences" and cited various research studies to support it. The Advertising Standards Authority read the studies and found that although the studies were good, most of them weren't about the children in question, some of them used the wrong dosage of yoghurt or an inappropriate test group, and overall the results were inconsistent and didn't particularly support the claim.

I'm interested in this because probiotics is one of those weird new turns in commercialism in which you can't quite tell if there's real science there, or if there is nothing but an actor on screen grinning and rubbing her belly, saying "I trust good bacteria" over and over again.

I've heard some scientists saying that probiotics have been shown to be good for ill people recovering in hospital (whose natural gut flora might need "topping up") but that the evidence isn't there yet for any point at all in healthy people gulping down these yoghurts once a day as if they were your daily medication.

There are moves afoot in the EU which sound to me like a good idea. In 2006 a new EU law came in, stipulating that all medical-sounding marketing claims must be verified, and they now have a committee which looks at the evidence and pronounces yes or no on them. The claims for various yoghurt drinks, as well as for all kinds of other products, have been submitted to this committee. They made the judgment that general probiotic claims aren't supported by evidence, although they'll be looking at more specific manufacturers' claims later.

The change hasn't actually come into force yet, but when it does, hopefully it won't be down to us to peer at the TV advert and think to ourselves, "Is that science or is that bullshit?" - it's only reasonable that we shouldn't have to do that, and companies should have to prove their stuff works before they parade it around in scientific clothing.

Wednesday 14th October 2009 | science | Permalink

InterSpeech09 conference: emotional speech

The InterSpeech conference was in Brighton this year - now, my research is all about "non-speech" voice (e.g. beatboxing) but I took the opportunity to go down and see what the speech folks were up to.

Automatic speech recognition is the "traditional" problem for computers+speech, but there's been a tendency recently to try and automatically recognise the emotional content too. This year was the first year of the InterSpeech "emotion challenge", in which researchers were challenged to automatically detect a range of emotions in a dataset of audio - recorded from schoolchildren who were trying to guide an Aibo round a track, apparently with emotive consequences...

I was surprised that many of the approaches to emotion recognition were so similar to the standard speech-recognition model: take MFCCs plus maybe some other measurements, model them with GMMs, classify the results (maybe with a HMM) - so far, so 1960s. The spectral measures (MFCCs) were typically augmented with prosodic measures such as the amount of pausing in a sentence, or measures of how the speaking pitch varied, and in quite a few of the papers it seemed that these prosodic features actually performed pretty strongly, often beating the spectral features. But I was surprised they were still relatively simple measures - no intricate prosody-specific models of temporal variation, for example; most seemed to use the average+minimum+maximum pitch. Combining the two types of data (spectral plus prosodic) was often the best but didn't seem to give a dramatic uplift over using just one type. I suspect that more specific models could push the prosodic side a long way in the next few years. The winner of the "emotion challenge" was a kind of hand-designed decision-tree approach - pretty nice, because they'd designed the classifier from theoretical motivations.

One thing about "emotion" is the same problem as for "timbre" (the musical attribute which I deal with in my research): it's still very hard to pin down exactly what you mean by it, specifically whether it's a continuous attribute or a set of categories. It seems that many datasets are labelled categorically - people mark a given word or sentence as being neutral/scared/happy/anxious/etc. But increasingly people are focusing on the continuous approach where emotion is treated as a 3D space, where one dimension is "arousal" (varying from calm to excited), one is "valence" (bad to good), and one is "potency" (dominated to dominant). If you combine those 3 dimensions variously you can cover the standard emotions pretty well (excitement, depression, boredom, anger, etc etc). This 3D approach gets around various cultural issues in the exact meaning of the labels, allows for some more refined analysis, and I believe it comes from a pretty well-validated area in psychology, although I don't know the literature on that.

Oh and there was a nice talk about automatically analysing and detecting laughter. Laughter is characterised by the bouts of vocal effort we push in, via the lungs and the tension in the vocal folds. That distinguishes it quite well from ordinary speech. So what these people did was a nice simple technique to estimate the glottal pulses (the moments of energy that come from our vocal folds), and to spot when these became more effortful and more frequent. You can't use an ordinary pitch tracker because each laugh is far too brief for a standard tracker to latch on to the quick pitch changes, but their custom analysis (plus a very basic classifier) seemed able to detect moments of laughter in TV talk shows etc. The analysis method (the zero-frequency filter) is technically very simple and potentially a useful trick...

Saturday 12th September 2009 | science | Permalink

Does processed meat cause cancer?

It's been on the radio news this morning, so it's timely that David Colquhoun has written this excellent article about diet and health. He goes through what the scientific evidence can and can't say about questions such as "Does eating processed food cause cancer?" - it's a long article but really clears things up.

Monday 17th August 2009 | science | Permalink

Vitamin supplements: avoid them?

This caught my eye in the paper this weekend: someone wrote in to the doctor's column asking if they should take vitamin A and E supplements to prevent cancer and heart disease, and the doctor's response was:

Several long-term and large trials have shown that taking extra vitamins A (such as betacarotene) and E does not reduce heart attack risk. In fact, some of the trials were stopped because there were more deaths in the vitamin groups than in those given placebos. As long ago as 14 June 2003 the Lancet reviewed the evidence and strongly discouraged any more research into the long-term use of such vitamin supplements. We get enough for our needs from a normal diet.

Blimey! I already knew that vitamin supplements were pointless (for healthy people) as long as you eat right. But do they actually do harm?

The doctor was referring to this 2004 review in the Lancet, which is a pretty good source. A web search also finds a 2008 Cochrane review of the evidence (another good source, but it's essentially an update of the earlier paper), which concludes:

We found no evidence to support antioxidant supplements for primary or secondary prevention. Vitamin A, beta-carotene, and vitamin E may increase mortality. Future randomised trials could evaluate the potential effects of vitamin C and selenium for primary and secondary prevention. Such trials should be closely monitored for potential harmful effects. Antioxidant supplements need to be considered medicinal products and should undergo sufficient evaluation before marketing.

This is pretty scary. According to these authors, there's no evidence that these supplements prevent cancer but there are hints that they might increase mortality? Such meta-analyses, when done properly, are very good ways to summarise the current state of research, but they're not set in stone - for example, when that review was published in the Lancet, the next issue featured some responses from some of the studies involved, who took issue with the general conclusion. But then, if the possibility of a negative effect looms strongly enough out of a systematic review like this, then it certainly needs to be considered.

Even this year more evidence arrives: this 2009 study finds that supplements of vitamins C or E or beta-carotene have no statistically significant effect on mortality (they don't increase or decrease the risk of death).

A couple of things to note:

  • This isn't about all vitamins, just about the vitamins mentioned above. As one correspondent notes, most people don't get enough Vitamin D, so maybe it's still worth taking Vitamin D supplements? (I haven't looked up any evidence about that yet.)
  • This is about vitamin supplements, not about vitamins in general. Fresh fruit and veg is a much better source of these vitamins in my opinion, and the evidence would seem to bear it out: here's a 2003 review which says, "A great deal of epidemiologic evidence has indicated that fruits and vegetables are protective against numerous forms of cancer." And here's a 2005 review which says a similar thing, and considers reasons why fruit and veg might be better than supplements.
Tuesday 28th July 2009 | science | Permalink

How does a PhD affect your salary?

In the lab we're chatting about what effect a PhD has on your career and your earning potential. This article is slightly old (2001) but it has some solid figures which are interesting:

Seems that a PhD in an electrical-engineering discipline (the closest match to ours) could raise your salary by around 8 or 9 percent.

Of course the economic car-crash puts a lot of things in question. But I'm glad at least that a PhD doesn't on average push your salary down, which some people say (and maybe it's true for some disciplines).

Wednesday 22nd April 2009 | science | Permalink

Distance analysis methods: Multidimensional Scaling and SplitsTree try to unravel the Tube map

In scientific research, one of the things you sometimes need to do is take a set of distance measurements (e.g. "it's 5 metres from A to B, 4 metres from A to C, and 3 metres from B to C") and try to reconstruct the actual spatial layout underlying that data.

So how to do it? Well one approach is Multidimensional Scaling (MDS) and it's been known for a few decades in timbre research. It assumes that the data exist in a Euclidean space (a pretty straightforward space like ordinary 3D space we're used to) and arranges the points in a layout that gives the least disagreement with the distance measurements. So if we have a set of musical timbre judgments (e.g. "a bassoon sounds quite like an oboe, but not much like a violin") we can try and force those objects into a spatial arrangement that suits their relationships, and then view the resulting map.
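For the record, the classical flavour of MDS is only a few lines of linear algebra - a minimal sketch (my own code, assuming D is an NxN matrix of pairwise distances):

import numpy as np

def classical_mds(D, ndims=2):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J.dot(D ** 2).dot(J)              # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:ndims]         # keep the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))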

But there's a problem. Who in the world said that audio timbre behaved like a standard Euclidean space? Does it depend on context? (Yes.) Is the difference between A and B always the same as the difference between B and A? Does timbre behave more like categories (e.g. woody vs metallic vs watery) than like a space?

That's a big problem and there's no clear solution. I saw a talk by Ashley Burgoyne at ICMC 2007 which suggested some modifications to MDS to help account for the weirdness of timbre-space. Some of it makes intuitive sense: e.g. the use of "specificities" builds in the idea that one data-point may be more "unique" than a plain spatial layout can express, giving it its own special distance to cover the fact that a trumpet sounds uniquely different from everything else. And he argued that the nonlinear versions coped better with the evidence about timbre judgments.

Then I heard about another completely different approach. Geneticists have developed rather clever ways of analysing the genes of different creatures, to produce "genetic distance" measures and then use those to reconstruct what the evolutionary tree could have been. The maths can be applied to any set of distance measurements (aha!) and creates a tree that best represents them - the "tree" is actually a kind of space, not the same as Euclidean space.

For an introduction to the maths involved, see Metric Spaces in Pure and Applied Mathematics.

I needed to get my head around how this approach might work, and whether it might be useful. So I decided to apply it to a weird space in which distance measurements might not correspond to actual spatial distances... the Tube map.

If you've been on the Tube you'll know that some journeys take longer than they should, and the durations don't actually match up with the geographic distances they cover. You'll also know that the Tube map itself is highly nonlinear: the geographical layout is warped to make it neat and easy to read.

So I took this section of the Tube map:

and from the web I found two different sorts of data:

  1. how long it takes (in minutes) to walk from one station to another, overground;
  2. how long it takes to get from one station to another by tube.

Now the first set of data should be "more Euclidean" since walking is basically going in a straight line except for the buildings in the way; while the tube timings should be weirder because you're strongly constrained, there's only a few pipes you can go down and they don't always connect up in all the obvious ways.

So when you feed the walking-times into MDS you get this (I've painted the tube-lines back onto the map to make things more obvious):

Not bad eh? The arrangement is actually quite a lot like the real-world layout of the tube stations.

And here's what happened when the same walking-times were fed into SplitsTree:

Yes, it kind of works, except that Russell Square pokes out a bit weirdly, I think due to the algorithm's requirement that the data points sit at the edge of the graph. The SplitsTree representation is almost-but-not-quite happy to represent the data in 2D, shown by the patchwork of almost-rectangles.

Here's where the differences really show up though: the tube-timing data. The walking-time data was "easy"...

Tube-timing data after MDS:

Tube-timing data after SplitsTree:

Note that both algorithms push the circle line (the yellow line) away from the others, out towards the top-right of the space. That's because the circle line, although it crosses over the others, doesn't have as many intersections as it might do (it doesn't have a stop at Euston or Warren Street, for example). Both algorithms spot that Kings Cross is a hub in this network (meaning it's easy to get to most of these stops from Kings Cross), placing it right at the heart of the layout. More generally, neither algorithm reconstructs the geographical layout of the stations, simply because the time it takes to get from A to B isn't so much defined by geography but by the peculiarities of London Underground.

The SplitsTree representation seems here to use a lot of 3D boxes, and there are some convoluted goings-on inside the way it tries to rationalise all the distances.

Notice also that on the SplitsTree diagram, most stations have their own little spike to live on. These are similar to the "specificities" I mentioned earlier - each tube station takes that little bit of extra time because of the time needed to get up and down the escalators (or whatever). For the Piccadilly (dark blue) line, SplitsTree seems to suggest that the majority of the time taken is in getting up and down and the actual journey between stations is pretty quick, which I think pretty much reflects reality.

I did all this in order to try and grok the tree reconstruction algorithms. Not sure if I've got there yet, but this was definitely helpful...

Wednesday 11th February 2009 | science | Permalink

10 new PhD places in Media and Arts Technology

Our research group has 10 new fully-funded PhD places in Media and Arts Technology thanks to a big grant we've been awarded. The places include working with an industrial partner such as last.fm, the BBC, or Sony. If you know anyone who might be into that, let them know...

Tuesday 10th February 2009 | science | Permalink

Chaos theory is like biology used to be

Looking through the International Journal of Bifurcation and Chaos, the thing that strikes me is that chaos theory seems to be at the same kind of point that biology was at, before Darwin's work gave it a structure and an explanation. In the 19th century biologists would publish articles describing new species they'd found, saying it's a bit like this one, a bit like that one, but without evolution and genetics you can't really say much more than that - and you get the same feeling from modern chaos papers: look, I've found a new chaotic attractor, it's a double-scroll, it makes patterns like this.

There are all sorts of ways of categorising chaotic systems, characterising their general surface behaviour, even controlling them, but it looks like nothing really gets to the heart of what's going on. Is the study of chaotic systems waiting for some big explanation?

Wednesday 4th February 2009 | science | Permalink

My work in a BBC radio programme

The BBC reported on the "Augmented Instruments" concert that Jean-Baptiste Thiebaut organised a couple of weeks ago. As part of the feature, I gave a quick demo of my beatboxing synthesiser interface... Check out the podcast of the radio programme:

Tuesday 23rd September 2008 | science | Permalink

Some notes from DAFx08

Just returning from DAFX 2008 in Espoo, Finland, which was a good do. My first visit to DAFX - it's a smaller and friendlier conference than some others I've been to, a nice size (about 120 people). Met up with lots of good digital audio people, some new, some old. Some notes about a few topics that came up:

  • Vesa Valimaki's digital sound synthesis tutorial was good, including some tips about low-cost synth techniques ("Differentiated Parabolic Wave") coming from his lab, new to me. Similarly Ville Pulkki's spatial sound tutorial and demo, featuring the DirAC technique which seemed to give some nice sonic results.
  • Our lab was well-represented, and it was nice that Anssi Klapuri picked up on Becky Stewart's spatial music navigation ideas in his keynote. My talk on voice timbre went fine too, despite the interruption of an automatic blackboard...
  • The keynote by Jyri Huopaniemi (of Nokia) didn't have as much news as I was hoping, but it was nice to see a bit about how the Princeton group's mobile-phone synth system is put together, a python interface onto a C++ synthesis core.
  • Naofumi Aoki's poster on bandwidth extension of mobile phone audio was interesting, although not specifically for the bandwidth extension but for the steganography trick used to embed metadata into audio. This means you can do fancy things with mobile phone audio without having to change the way the worldwide phone system works...
  • There were quite a few good papers about guitar synthesis and guitar amp emulation, etc. Worth mentioning is Fredrik Eckerholm's guitar synth, just because to my ears it sounded very nice and had a lot of features (e.g. pickup placement, pick parameters).
  • Jari Kleimola's sound synthesis trick - essentially XOR on audio - caught a few people's attention, making some quite nice sounds despite its simplicity. (There's a rough sketch of the general idea just after this list.)
  • Damian Murphy's results on the quality of different DWM reverb techniques were interesting, although it's not my field so I can't judge it in detail.
  • Was nice to see spectutils, a handy set of spectrogram plotting tools for GNU Octave. Should be useful.

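As I understand it - and this is my guess at the general idea rather than a transcription of Kleimola's actual method - the trick is roughly: take two ordinary oscillator signals, quantise them to integers, and combine them sample-by-sample with a bitwise operation such as XOR. A rough Python sketch, with made-up frequencies:

    import numpy as np
    from scipy.io import wavfile

    sr = 44100
    t = np.arange(sr * 2) / sr  # two seconds

    # Two plain sine oscillators at different frequencies.
    osc1 = np.sin(2 * np.pi * 220.0 * t)
    osc2 = np.sin(2 * np.pi * 331.0 * t)

    # Quantise each to 16-bit integers, XOR them sample-by-sample,
    # then treat the result as an audio signal again.
    i1 = (osc1 * 32767).astype(np.int16)
    i2 = (osc2 * 32767).astype(np.int16)
    out = np.bitwise_xor(i1, i2)

    wavfile.write("xor_demo.wav", sr, out)
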
The conference banquet was very good too, good food and in a really nicely-architected building called Dipoli. Also had a good time in and around Helsinki but I've documented that elsewhere.

Saturday 6th September 2008 | science | Permalink

Beatboxing with a very different voice

Someone has written a very nice popular-science-type article... about me :)

Friday 27th June 2008 | science | Permalink

My reading list: the past 18 months

I decided to make a public archive of my Bibtex file - i.e. almost everything I've read, or not read, in my PhD so far.

This bibliography might be useful to people interested in sound/music technology, vocal timbre, real-time audio processing, etc.

The general angle of my research topic is summarised on my QMUL homepage

Wednesday 11th June 2008 | science | Permalink

Laryngographs of my beatboxing

So after I beatboxed for the scientists they've sent me some of the output from the laryngograph tests. Here it is!

First of all here's me doing a kick-drum-plus-bass sound:

  • laryng-kick-drum-plus-bass.mp3 WARNING: this is NOT a normal recording. On the LEFT channel you get the normal recording from a microphone, and on the RIGHT channel you get the direct output from the laryngograph - essentially, you get to listen to what my larynx is doing itself, without any of the complicated stuff that happens afterwards (in the throat, lips, tongue). Use your computer's left-right balance controls to choose what to listen to.

Here's a picture of that same clip:

Laryngogram of kick-plus-bass

In that picture the audio recording is the blue "Sp" line in the middle, and the larynx trace is the green "Lx" just below it - the signal goes up when the vocal cords close and down when they open.
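
If you'd rather split the two channels in software than fiddle with the balance controls, here's a minimal sketch (assuming you've first converted the mp3 to a wav file; the filename below is just a placeholder):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.io import wavfile

    # Placeholder filename - convert the mp3 to wav first.
    sr, stereo = wavfile.read("laryng-kick-drum-plus-bass.wav")
    mic = stereo[:, 0]      # left channel: ordinary microphone recording ("Sp")
    larynx = stereo[:, 1]   # right channel: raw laryngograph output ("Lx")

    t = np.arange(len(mic)) / sr
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
    ax1.plot(t, mic)
    ax1.set_ylabel("Sp (mic)")
    ax2.plot(t, larynx)
    ax2.set_ylabel("Lx (larynx)")
    ax2.set_xlabel("time (s)")
    plt.show()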

Towards the end of the clip my larynx is opening and closing normally, a regular opening-and-closing just like in normal speech. But towards the beginning it's a bit more chaotic than that, and it almost looks like there are two different frequencies competing to take over. I'm not entirely sure what this implies, but the researchers pointed that feature out, and maybe it's connected to the sound that's produced somehow.

OK, now here's a bit of "vocal scratching":

  • laryng-vocal-scratching.mp3 WARNING: this is NOT a normal recording either. On the LEFT channel you get the normal recording from a microphone, and on the RIGHT channel you get the direct output from the laryngograph.

Here's a picture of that same clip:

Laryngogram of vocal scratching

The main thing they were looking at on the scratching was the very fast pitch changes - look at the lowest panel and the green "Fx" line, which is the fundamental frequency. It changes by up to one-and-a-half octaves in 150 milliseconds, which apparently is ridiculously fast. Now I'm not the best vocal-scratcher in the world, so I bet that it goes even faster than that for others...
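
To put that in perspective, a quick back-of-the-envelope calculation: 1.5 octaves in 150 milliseconds is a sweep rate of about 10 octaves per second, i.e. the fundamental frequency nearly triples in under a sixth of a second.

    octaves = 1.5
    seconds = 0.150

    ratio = 2 ** octaves        # frequency multiplier over the sweep
    rate = octaves / seconds    # octaves per second

    print(f"frequency ratio: {ratio:.2f}x")       # ~2.83x
    print(f"sweep rate: {rate:.1f} octaves/sec")  # 10.0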

Thursday 13th March 2008 | science | Permalink

They put a CAMERA up my NOSE

And it was all in the name of science. I volunteered for an experiment which wanted to look at beatboxers' voice-boxes while they were beatboxing, so I went and let someone put a camera up my nose (a nasal endoscopy). This was also being filmed for a Science Museum beatboxing project, so as well as the actual scientists there was a one-woman film crew plus a Science Museum person co-ordinating the thing and handing me the SM58 so I could bust some beats in the little clinic room.

I couldn't see the screen so I wasn't sure what my larynx was looking like but I dropped some of the usual beatbox stuff (some old-school hip-hop ones, a slightly poor DnB one, a quick rendition of If Your Mother Only Knew) and they seemed interested in what was happening. They'll take a while to do a proper analysis of the results but apparently there's a lot of muscular activity happening around and above the larynx while I'm doing kicks and snares and suchlike.

Some voice specialists are worried that beatboxing is bad for your voice, so it was good to know that, after 7 years of beatboxing, I don't seem to have anything weird or wrong with my vocal folds - I'm not doing myself any damage.

One of the sounds that worries specialists is vocal scratching, so I gave them a bit of that. They confirmed that it involves a lot of constriction to produce that sound, and they also confirmed that there are lots of really fast pitch changes (one-and-a-half octaves in 150 milliseconds!). Whether that means it is bad for you I'm not sure. I don't actually do much vocal scratching myself.

There'll be more sessions, and at some point there'll be a video online, but that's all for now. I have a printed-out photo of my larynx but you don't want to see that ;)

There were also some tests with a laryngograph, which showed some of the controlled-weirdness involved in beatboxing, and some interesting discussion about whether super-deep bass tones were bad for you or not. The "received wisdom" is that they're dangerous since they involve your "false vocal folds" pushing down on your real vocal folds, but some researchers have evidence that if you do it right, that's not what's happening, instead your false vocal folds are basically flapping on their own. Watch this YouTube video on "Extreme vocal effects" to see what's happening when singers make deep growly sounds...

Tuesday 11th March 2008 | science | Permalink

Echinacea: Science says

There was an advert on the tube claiming echinacea could reduce my chance of developing a cold by 65%. Blimey, a big claim. So I went and found the source of the claim, and a couple of other review papers. My summary of the research is this:

  • Although it's hard to be certain (partly because there are so many different sorts of echinacea plant and different ways to prepare it), it does look like echinacea helps to shorten the duration of a cold and make it less severe. It might also prevent a cold happening in the first place, but that's less clear. The most likely useful type of echinacea is echinacea purpurea.

There are all sorts of caveats on this summary. Firstly it's not recommended for children, or for people with immune problems such as arthritis or HIV, or people who might have an allergic reaction. Secondly we basically don't know how it might work (it contains a few chemicals that probably interact with the immune system... but in what way?). Thirdly we need more big studies before we can be sure about the effect on outcomes - so the picture might change, might even change dramatically, as more science gets done.

But I want to emphasise: in terms of real-life evidence, echinacea has much better evidence than homeopathy, or than other herbs or other such stuff you might find in that same section at the chemist's.

My main sources for all this are two recent research summaries, a meta-analysis published in a Lancet journal and a Cochrane systematic review.

I noticed that some science bloggers tried a little bit to pooh-pooh the meta-analysis, and to be blunt I suspect that's because it finds quite decisively in favour of echinacea. (None of my favourite science bloggers had this prejudice: David Colquhoun and Ben Goldacre are my favourites by the way.)

I do personally have an instinctive scepticism of complementary medicines because of the way things often try to side-step proper evaluation while at the same time giving themselves a white-coated pseudo-medical image. But in this case I'm happy to say that both the reviews find generally that there is a positive effect on cold from (some) echinacea preparations.

Friday 30th November 2007 | science | Permalink

A beatbox experiment

After a good few months of working on my PhD I'm finally ready to get some people to use my stuff and see what they make of it.

If you're near London and you're a beatboxer, check this out: I'm recruiting for a beatbox experiment

Thursday 29th November 2007 | science | Permalink

Smoking ban definitely improved health

Good news from Scotland, where the smoking ban came in before ours in England. A large study has found measurable health improvements due to the ban, such as a large decrease in heart attack admissions (including a noticeable effect on non-smokers due to less passive smoking). Woo.

Monday 10th September 2007 | science | Permalink

Onscreen violence really is bad for us

Given the shootings in the USA this week, the main feature in this week's New Scientist is eerily apt. As summarised in their editorial, the research on the effect of TV / video game violence seems to be persuasive, that it has generally bad effects including aggression/desensitisation/etc.

While the report does concede that you can get useful skills from modern media (such as the dexterity and quick thinking which can be demonstrated to come from computer games), it makes the point quite clearly that the bad outweighs the good. I'm not sure what the picture would be like for people who see only "non-violent" media... I've never read any research papers on the subject so I can only be vague.

The strange prevalence of violence in films and computer games puzzles me quite a bit. I'm not one of those people that automatically tuts about violent media but it's weird how much violence there is. It must be what people want, but why? One answer might be "escapism", escaping from humdrum life into exciting scenarios, and maybe violence is one of the easiest ways to make things exciting. But there are loads of imaginative ways to escape from the world... just look at some of the weird imaginative stuff that the Japanese come up with. The Japanese come up with lots of really sick and violent stuff too of course ;) and maybe the grass looks a little greener on the other side, but our media's imaginative range seems a bit stifled in comparison. Is poverty of imagination really anything to do with it? Or am I making it up?

Thursday 19th April 2007 | science | Permalink

Gillian McKeith stops calling herself a doctor!

The assumptions you make, eh? Not that I ever paid much attention to Gillian McKeith's TV programmes, but when someone called "Dr Gillian McKeith" appears regularly on Channel 4 telling people what they should be eating, who publishes books and so on, you tend to assume they've got medical qualifications in the straightforward sense just like my GP does.

This interesting article on Gillian McKeith throws a different light on the matter. Someone complained to the Advertising Standards Authority that calling herself "Dr Gillian McKeith" in advertising was misleading (since she's only a "Dr" by virtue of a correspondence course with a non-accredited American college). In order to avoid falling foul of a pending Advertising Standards Authority ruling (apparently a draft ruling seemed to be inclined in favour of the complaint) she's agreed not to use the term in future advertising.

The article has some really choice words to say about the woman, including quoting some of the very bizarre medical claims she's made, and the "Wild Pink Yam and Horny Goat Weed products" her company briefly marketed before the Medicines and Healthcare Regulatory Agency ordered her to stop selling them and said they "were never legal for sale in the UK". The article's written by a doctor and it makes quite a lot of good points in general about the difference between science and nonscience, and real doctors and sort-of-doctors...

Tuesday 13th February 2007 | science | Permalink

How many eggs should I eat?

OK, here's yet another food dilemma: should you eat plenty of eggs, because they contain various healthy vitamins and minerals? Or should you not eat many eggs, because of the cholesterol they contain? As usual I'm determined to find an evidence-based answer.

The first things I find in a web search come from the egg marketing boards. So, bearing in mind that they're obviously quite biased, I check out "Healthy eggs" from britegg.co.uk and "Eggs and cholesterol" from nutritionandeggs.co.uk. So, as expected, they confirm that eggs are full of lots and lots of nutritious things, but they also argue that recent evidence shows that eggs aren't bad for health. They have two scientific studies to support this argument: one which looked at a large number of people in the USA and found eggs didn't increase the risk of heart disease; and one which reviewed the current state of scientific knowledge and found that saturated fat (rather than dietary cholesterol) was the main cause of people having high blood cholesterol levels.

So far, so good, although the source is not what you'd call 100% neutral. And even if saturated fat is the main cause of high blood cholesterol, could dietary cholesterol be a lesser but still important cause?

So, I found the cholesterol review article and had a look. It's a very tricky subject to unpick, actually. For example, the study finds that people who eat more dietary fat tend to eat more dietary cholesterol too, so it could be tricky to separate out the effect of these two. There are methods for doing this, of course, and in the multiple regression analysis used by the researchers, it seems that there were three significant influences on a person's blood cholesterol levels: their intakes of saturated fat, polyunsaturated fat (the more people eat, the lower their cholesterol, for polyunsaturates), and cholesterol. However, although all three influences played a part, the cholesterol influence is strongly outweighed by the influence of saturated fat vs unsaturated fat - if I gloss over some of the details to come up with a very approximate rule of thumb, the study finds that reducing saturated fat is something on the scale of ten times more influential than reducing cholesterol.
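
To illustrate what multiple regression is doing here: it fits one equation with all three intakes as predictors at once, so each coefficient estimates the effect of one intake while the others are held fixed. Here's a minimal sketch with made-up data - the numbers and effect sizes are invented purely to show the mechanics, not the study's actual figures:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 500

    # Made-up daily intakes; saturated fat and dietary cholesterol are
    # correlated, which is exactly why a joint regression is needed.
    sat_fat = rng.normal(30, 8, n)
    poly_fat = rng.normal(15, 5, n)
    cholesterol = 8 * sat_fat + rng.normal(0, 40, n)

    # Invented "true" effects: saturated fat raises blood cholesterol,
    # polyunsaturated fat lowers it, dietary cholesterol has a small effect.
    blood_chol = (3.0 + 0.05 * sat_fat - 0.03 * poly_fat
                  + 0.005 * cholesterol + rng.normal(0, 0.3, n))

    # Fit all three predictors together and inspect the coefficients.
    X = sm.add_constant(np.column_stack([sat_fat, poly_fat, cholesterol]))
    model = sm.OLS(blood_chol, X).fit()
    print(model.summary())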

OK, so what about some other sources of information? The BBC often has a lot of health information, but searching their site didn't actually find very much. The story Eggs 'protect against breast cancer' reports on a USA study of women, finding that eating eggs in teenage years seems to help lessen the likelihood of breast cancer; the study involved a large number of people and was published in a reputable journal so it seems trustworthy. The only other article I found was An egg a day 'is good for you' which seems to be based on the same studies as the ones I mentioned above. They did however confirm with a British Nutrition Foundation scientist, who agreed that there was unlikely to be a health risk from eating an egg a day (they recommend 2 or 3 a week apparently). There is opposition from the Vegan Society, but once again, they're hardly an unbiased source of information about whether people should eat eggs or not!

What about UK government advice? The UK government seems to be quite keen on inventing websites for public information these days, and one of their sites I searched is eatwell.gov.uk. They have two useful pages here: a page about eggs (including the section: "How many eggs?" - aha!) and a Q&A about eggs and cholesterol. The message from them is: eggs are good for you, and you don't need to cut down on them (unless your doctor tells you to for a specific reason). Just eat a balanced diet, as they always say.

And that's pretty much my conclusion. It seems that people used to (reasonably) assume that eating food with cholesterol in it would raise your blood cholesterol, and that was a reason not to eat too many eggs. But that assumption is too simple, and dietary cholesterol isn't that worrying after all. As long as you eat a balanced diet you can enjoy your eggs.

Sunday 10th December 2006 | science | Permalink

Does burnt food cause cancer?

Does burnt food cause cancer? Someone said to me that burnt food was "as dangerous as a cigarette", which is a pretty big claim, so I've been searching the web and some research databases, looking for evidence.

There's very little on the web about it, besides a lot of idle speculation on messageboards. This ScienceNews article from 2005 says that the US government now lists certain chemicals found in "meats when they're cooked too long at high temperature" as carcinogenic. It also says:

Finally, the report notes that while inconclusive, published studies in people "provide some indication" of human risks from eating broiled [grilled] or fried foods "that may contain IQ and/or other heterocyclic amines." The National Cancer Institute conducted one of those suggestive studies. It compared the diets of 176 stomach cancer patients and another 503 cancer-free individuals. Overall, people who regularly ate their beef medium-well or well-done faced more than three times the stomach cancer risk of those who ate their meat rare or medium-rare, according to a 1997 report of the research.

More information about this is in a very helpful summary by the USA National Cancer Institute. Note that one of the studies quoted looked at cooking at 200°C or 250°C, which is much hotter than ordinary baking/roasting. However, that is the kind of temperature you use to cook a pizza...

Statistics like "three times the cancer risk" always sound scary, but you need to ask, three times what? We need to know how the risk compares against other things. More on that later.

I found a messageboard thread on which someone said "You can put tomato sauce on it. I heard it helps lessen the production of carcinogen which causes the cancer." This is a big mistake. Fruits like tomatoes or cherries do contain antioxidants which counteract the formation of the carcinogens, but only during the cooking process, mixed in with the meat (e.g. in a burger mixture). Putting ketchup on afterwards will make zero difference.

I also found a journal article discussing the increased cancer risk from barbecued food especially (Lijinsky W (1991), Mutation Research 259(3-4): 251-261). It suggested that the reason for the risk was that fat will drip off the meat, then burn at high temperatures when it hits the coals, forming the cancer-causing substances that then mix in with the barbecue smoke and may then coat the outside of the meat being cooked. This was proposed to explain their finding that the chemicals were mainly found in fattier foods cooked over burning logs.

Other relevant journal articles:

  1. One found a similar connection: the highest concentrations found in the Italian diet were in pizzas cooked in wood-burning ovens, and in barbecued beef and pork (Ludovici M et al (1995), Food Additives and Contaminants 12(5): 703-713).
  2. One found that the Indian tradition of cooking with homemade clay stoves, called "Chulha", created a lot of smoke containing the problematic chemicals (Bhargava A et al (2004), Atmospheric Environment 38(28): 4761-4767). This was said to increase the risk for people who cook with them - remember that inhaling carcinogens is typically much more dodgy than swallowing them, because the route into the body is more direct.

The relative risk? Is a barbecued steak as dangerous as a cigarette, as certain internet message boards might lead you to believe? Clearly not: many people eat well-cooked meat, yet around nine out of ten lung cancer deaths can be attributed to smoking. (The nine-out-of-ten figure comes from a study of USA deaths in 1995: source.) Scientists can calculate guideline statistics such as the "incremental cancer risk", an averaged-out measure of the risk from something. For cigarettes it's 0.079 (source); for burnt meat it's somewhere between 0.00001 and 0.00038 (source). So the risk is somewhere between 200 and 8000 times lower - there's no comparison between one cigarette and one burnt steak.
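
The "200 to 8000 times lower" figure is just the ratio of those incremental risks; here's the arithmetic spelled out:

    cigarette_risk = 0.079
    burnt_meat_low, burnt_meat_high = 0.00001, 0.00038

    print(cigarette_risk / burnt_meat_high)  # ~208: at least ~200 times lower
    print(cigarette_risk / burnt_meat_low)   # 7900: up to ~8000 times lower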

My conclusions:

  1. Regularly eating burnt or barbecued meat, especially meat that's been cooked at high temperatures for a long time, is relatively risky behaviour. But don't panic: it's not comparable to smoking.
  2. For non-meat food the research is less clear-cut: it's not obvious whether all smoke-cooked or overcooked food carries risks. Certainly if you don't eat it regularly there's nothing to worry about.
Monday 2nd October 2006 | science | Permalink

Digital vs analogue clocks

Both Philippa and I insist that it's easier to read an analogue clock-face (i.e. one with hands) than a digital clock-face. So I wondered: is there any research on the subject?

Of course there is! There's research about everything. But it doesn't seem to agree with us.

In Processing of visually presented clock times (Goolkasian, P and Park, D.C., 1980) the experimenters looked at the differences in speed for judging the time difference between two clocks, and found that "same/different reactions to digitally presented times were faster than to times presented on a clock face, and this format effect was found to be a result of differences in processing that occurred after encoding."

Minding the clock (Kathryn Bock, David E. Irwin, Douglas J. Davidson and W. J. M. Levelt, 2003) looked at explicitly linguistic effects (e.g. difference between Dutch and American English speakers). It also found that "responses to analog clocks were faster with relative expressions and responses to digital clocks were faster with absolute expressions," although overall it found again that digital clock-reading was faster than analogue in all cases. Note that the experimental method was explicitly linguistic - the speed measurements were measurements of how quickly the participants began to speak when they correctly named the time.

This is one of the most interesting (and most recent) results I found, partly because the experimental design included displaying the clocks for a short amount of time (as low as 0.1 seconds). 0.1 seconds is too quick for the eye to rove around the clock-face and fixate directly on the different parts of the display, and "the results from the 100 ms exposure conditions indicated that sufficient information for fairly accurate production can be extracted from the display without fixating the crucial information directly."

The effects of response format and other variables on comparisons of digital and dial displays (Miller R.J. and Penningroth S., 1997) "compared dial and digital clock displays to determine which could be read faster by 25 young adults" and found that "in general, digital displays led to faster responses than did dial displays. However, several combinations of the other variables, particularly those using the before-the-hour response format, effectively eliminated the superiority of digital displays. We suggest that in designing displays requiring such a response format, designers should not assume that a digital display is necessarily the best choice, especially if other factors encourage the selection of a dial display."

I haven't read the full paper (not available electronically; will have to visit my uni library) so I'm not sure if the experimental design was again based on participants reading the time out loud - and if so, I have an issue with that which I'll come to later. But this effect of before-the-hour responses is tantalising. For example: Philippa is a radio producer, and one of the things they need to do is glance at the clock to know how much time they've got before the programme ends at 6 o'clock precisely, so they can judge when to end interviews, when to bring in the next piece of music, etc. Philippa finds it much quicker to glance at an analogue clock in order to do this, and intuitively I can see why. You can literally see how much time is left (i.e. the size of the gap between the minute hand and the 12 o'clock mark), whereas with a digital clock you have to take in all the numbers and then do a quick arithmetic operation - not difficult, of course, but probably much slower, cognitively.

Judging a duration like this is very different from speaking the time. Reading out numbers is a one-to-one transformation which we do in so many contexts that it's very very easy; when reading out from a dial clock, we need to translate the hands' position into numbers before we can speak it. When using a dial clock to determine actions, however, we don't necessarily need to put the numerical step in the middle.

I'd like to run an experiment different to the ones I've found so far, one which tests the ability to comprehend clock-faces from a short glance - e.g. starting at 0.1 seconds and getting shorter. Rather than measuring the speed of vocalising, the measurement would be the minimum "glance time" for which the time could be correctly identified. My hunch is that the threshold will be a much shorter glance for analogue clocks.
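
Just to make the proposed experiment concrete, here's a rough sketch of the measurement loop. It's only pseudocode-with-Python-syntax: show_clock_for() and collect_response() are hypothetical stubs standing in for whatever display and response-collection code you'd actually use (PsychoPy or similar), and the simple descending-duration rule is only one of many ways to estimate a threshold.

    import random

    def show_clock_for(duration_s, clock_type, true_time):
        """Hypothetical stub: flash an analogue or digital clock showing
        true_time for duration_s seconds, then mask it."""
        raise NotImplementedError

    def collect_response():
        """Hypothetical stub: ask the participant what time they saw."""
        raise NotImplementedError

    def glance_threshold(clock_type, start=0.1, step=0.01, trials_per_level=10):
        """Shorten the exposure until the participant drops below 75% correct;
        return the shortest duration at which they were still accurate."""
        duration = start
        last_good = None
        while duration > 0:
            # Random times in minutes-past-midnight, at 5-minute resolution.
            times = [random.choice(range(0, 12 * 60, 5))
                     for _ in range(trials_per_level)]
            correct = 0
            for true_time in times:
                show_clock_for(duration, clock_type, true_time)
                if collect_response() == true_time:
                    correct += 1
            if correct / trials_per_level >= 0.75:
                last_good = duration
                duration -= step
            else:
                break
        return last_good

    # e.g. compare glance_threshold("analogue") with glance_threshold("digital")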

Friday 23rd June 2006 | science | Permalink

A load of Boswelox

Here's an interesting article about the science, or lack of science, behind skincare products and their arbitrary pseudo-scientific claims. It even uses the word "Boswelox", possibly the funniest word ever.
Wednesday 22nd February 2006 | science | Permalink
Creative Commons License
Dan's blog articles may be re-used under the Creative Commons Attribution-Noncommercial-Share Alike 2.5 License. Click the link to see what that means...