We're pleased to announce "Few-shot Bioacoustic Event Detection", a new task within the "DCASE 2021" data challenge event.
We challenge YOU to create a system to detect the calls of birds, hyenas, meerkats and more.
This is a "few shot" task, meaning we only ever have a small number of examples of the sound to be detected. This is a great challenge for machine-learning students and researchers: it is not yet solved, and it is great practical utility for scientists and conservationists monitoring animals in the wild.
We are able to launch this task thanks to a great collaboration of people who contributed data from their own projects. These newly-curated datasets come from projects recorded in Germany, the USA, Kenya and Poland.
The training and validation datasets are available now to download. You can use them to develop new recognition systems. In June, the test sets will be made available, and participants will submit the results from their systems for official scoring.
Much more information is available on the Few-shot Bioacoustic Event Detection DCASE 2021 page.
Within TDWG Audubon Core, we are considering what is a good standard to label information in sub-regions of sound recordings, images, etc. For example, I can draw a rectangular box in an image or a spectrogram, and give it a species label. This happens a lot! How can we exchange these "boxes" between software and databases reliably?
The question is: should we use the w3c’s "Media Fragments" syntax? In particular, I’m looking at section 4.2 about selecting temporal and spatial sub-regions.
Temporal region examples:
t=10,20   # => results in the time interval [10,20)
t=,20     # => results in the time interval [0,20)
t=10      # => results in the time interval [10,end)
Spatial region examples:
xywh=160,120,320,240         # => results in a 320x240 box at x=160 and y=120
xywh=pixel:160,120,320,240   # => results in a 320x240 box at x=160 and y=120
xywh=percent:25,25,50,50     # => results in a 50%x50% box at x=25% and y=25%
The definitions for the content of the values are good, and we should directly follow their example. (For time, the values are Normal Play Time (npt) RFC 2326, which can be purely in seconds or in hh:mm:ss.*, and other formats such as ISO 8601 datetime can be used for "advanced" use cases. For space, values are in pixels or percentages, with pixels as the default, and x=y=0 the top-left of the image.)
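To make the selector semantics concrete, here is a minimal sketch of parsing the temporal selector (my own illustrative function, not an official parser; it handles only plain NPT seconds values, not hh:mm:ss or ISO 8601 forms):

```python
def parse_temporal_fragment(fragment, end=float("inf")):
    """Parse a Media Fragments temporal selector such as "t=10,20".

    Returns (start, stop) in seconds. A missing start means 0;
    a missing stop means the end of the media.
    """
    assert fragment.startswith("t=")
    start_s, sep, stop_s = fragment[2:].partition(",")
    start = float(start_s) if start_s else 0.0
    stop = float(stop_s) if (sep and stop_s) else end
    return (start, stop)

print(parse_temporal_fragment("t=10,20"))  # (10.0, 20.0)
print(parse_temporal_fragment("t=,20"))    # (0.0, 20.0)
print(parse_temporal_fragment("t=10"))     # (10.0, inf)
```

Note how "t=10" silently becomes an open-ended interval, which is exactly the ambiguity discussed below.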
However, I think the structure of the selectors could lead to problems for annotating biodiversity multimedia:
- Comma-separated formats for fields are likely to lead to errors when used in CSV data.
- There are existing use-cases which refer to single points in time/space rather than regions. (This could however be handled as regions of zero extent: t=10,10 or xywh=160,120,0,0.)
- The format "t=10" for a time interval [10,end) risks user error since it could be interpreted as, or used as, a representation of temporal points. (In retrospect it would have been better to define the format as "t=10,")
- We wish to provide for a frequency axis, with similar region-selection characteristics as the temporal and spatial. (See freqLow and freqHigh recently added to Audubon Core.)
- We would like to allow for 3D spatial extents (xyzwhd?).
So, as one possibility: we could adopt the w3c’s approach to defining the values, by explicitly referring across to their use of RFC 2326 etc.; but instead of simply recommending Media Fragments, we do NOT recommend the xywh selectors, and instead recommend separate fields for freqLow, freqHigh, and so forth.
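The CSV hazard mentioned above is easy to demonstrate; a small sketch (the column names are hypothetical, except freqLow/freqHigh which are the Audubon Core terms discussed here):

```python
import csv, io

# Comma-packed selectors must be quoted inside CSV, inviting errors
# when data passes through tools that mishandle quoting:
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["file", "region"])
writer.writerow(["rec1.wav", "xywh=160,120,320,240"])
print(buf.getvalue())
# file,region
# rec1.wav,"xywh=160,120,320,240"

# Separate fields avoid the quoting issue entirely:
buf2 = io.StringIO()
writer2 = csv.writer(buf2)
writer2.writerow(["file", "startTime", "endTime", "freqLow", "freqHigh"])
writer2.writerow(["rec1.wav", 10, 20, 2000, 8000])
print(buf2.getvalue())
```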
I should say that my background is with audio data, and so for selecting image regions there may be existing good practice/recommendations that I haven't spotted.
My blog doesn't have a "comments" function, but I'd like to read your comments! You can reach me on Twitter or by email: dstowell (attt) tilburguniversity.edu
Ever since the immersive experience of the fantastic Biodiversity_Next conference 2019, I've been getting to grips with biodiversity data frameworks such as GBIF and TDWG. So I'm very pleased to tell you that I've been contributing to the Audubon Core standard, which is an open standard of vocabularies to be used when describing biodiversity multimedia. These standards help the world's collections and databases to talk to each other. That can enable some amazing things to happen, when the entire planet's evidence about animal and plant species can be treated almost as if it were one big dataset.
After community consultation we've just released an update to the Audubon Core Terms List which adds some terms that are very important for describing audio data. The terms added are:
- sampleRate for the sample-rate of the digitised recording.
- freqLow and freqHigh to specify the frequency range of the sound events captured in the recording.
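To illustrate how these terms might be used in practice, here is a hypothetical sketch of a metadata record (the identifier and the exact serialisation are my own invention, not a normative Audubon Core example):

```python
# Hypothetical sketch of audio metadata using the new Audubon Core terms.
# sampleRate is in Hz; freqLow/freqHigh bound the frequency range of the
# sound events captured in the recording.
recording = {
    "identifier": "example-recording-001",  # hypothetical identifier
    "sampleRate": 44100,                    # samples per second
    "freqLow": 1800,                        # Hz
    "freqHigh": 9500,                       # Hz
}

# A query such as "all recordings whose events fall within a band" becomes simple:
def within_band(rec, lo, hi):
    return rec["freqLow"] >= lo and rec["freqHigh"] <= hi

print(within_band(recording, 2000, 10000))  # False: freqLow 1800 < 2000
print(within_band(recording, 1000, 10000))  # True
```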
Any audio experts reading this might find these rather unimpressive and basic metadata. But that's precisely the point - to add these basic and uncontroversial terms to the standard. I'm very happy with what seems to be the TDWG approach, which is to move forward by gradual consensus rather than over-engineering a standard in advance.
What can we do with this? Well, in the not-too-distant future I can imagine querying GBIF for all animal calls within a particular frequency band, or analysing frequency ranges globally to explore acoustic environmental correlations. We can't do this yet, but as these new terms get taken up it should happen.
These term additions came about primarily through joint efforts of me, Ed Baker, and Steve Baskauf - a TDWG expert who has guided us through the process very attentively. Plus, many people in the TDWG community who checked our work and gave input. Thank you!
I'd also like to acknowledge the Heidelberg Bioacoustics Symposium (Dec 2019) at which we had discussions with many different taxon experts on animal sound and how we can share it.
There are some presentation slides from the TDWG annual meeting, introducing the changes, and also looking forward to more detailed metadata that might be added in future (e.g. for sound-event annotations). We also proposed to import a term "dwc:individualCount", but we ran into some definitional issues so that will take a bit more time to resolve.
You can get involved in what happens next. Use TDWG standards such as Audubon Core to share your data. Get involved in the discussion about what else might be needed to share and aggregate bioacoustic datasets.
I'm extremely pleased to announce this publication, edited by Jérôme Sueur and myself: Ecoacoustics and Biodiversity Monitoring - a special issue in the journal "Remote Sensing in Ecology & Conservation".
It features 2 reviews and 6 original research articles, from research studies around the globe.
You can also read a brief introduction to the issue here on the RSEC blog.
I had a great time at the Biodiversity_next conference, meeting a lot of people involved in Biodiversity informatics. (I was part of an "AI" session discussing the state of the art in deep learning and bioacoustic audio.)
I was glad to get more familiar with the biodiversity information frameworks. GBIF is one worth knowing, an aggregator for worldwide observations of species. It's full of millions and millions of observations. Plants, animals, microbes... expert, amateur, automatic observations - lots of different types of "things spotted in the wild". They use the cutely-named "Darwin core" as a data formatting standard (informatics folks will get the joke!).
Here's my first play at downloading some GBIF data. I downloaded all the data they've got about rose-ringed parakeets in Britain - the bright green parrots that are quite a new arrival in Britain, an invasive species which we can see in many city parks now. I plotted the observations per year. I also plotted a second species on the same chart, just to have a baseline comparison. So the parakeets are plotted in green, and the other species (common sandpiper) in yellow:
Many caveats with this data. For a start, each dot represents an "observation" not an "individual" - some of the observations are of a whole flock. I chose to keep it simple, not least because some of the observations list "5000" birds at a time, which may well be true but might swamp the visualisation! Also, some of the co-ordinates are scrambled, for data-privacy reasons - you can see it in the slight grid-like layout of the dots - and some are exact.
Further, I don't think I have any way of normalising for the amount of survey effort, at least for most of the data points. There seems to be a strange spike of parakeet density in 2009 - probably due to some surveying initiative, not to some massive short-term surge in the bird numbers! I think if the numbers really had increased eight-fold and then fallen back again, someone would have said something...
Regarding "survey effort": GBIF does offer ways of indicating survey effort, and also "absences" as well as "presences", but most of the data submissions don't make use of those fields.
The sandpiper data fluctuates too. There's definitely an increase as time goes by, primarily due to the increasing amount of surveys adding to GBIF. That's why I added a comparison species. Even with that, you can clearly see the difference in distribution between the two.
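The per-year counting itself is straightforward; a minimal pandas sketch, using a tiny synthetic table in place of a real GBIF occurrence download (real downloads carry "species" and "year" columns among many others):

```python
import pandas as pd

# Synthetic stand-in for a GBIF occurrence table. Each row is one
# observation record, not one individual bird.
occ = pd.DataFrame({
    "species": ["Psittacula krameri", "Psittacula krameri",
                "Actitis hypoleucos", "Psittacula krameri",
                "Actitis hypoleucos"],
    "year":    [2008, 2009, 2009, 2009, 2010],
})

# Observations (not individuals!) per species per year:
counts = occ.groupby(["species", "year"]).size().unstack(fill_value=0)
print(counts)
```

On the real data, the same groupby produces the yearly series plotted above, caveats and all.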
The 2019 International Bioacoustics Congress (IBAC) marked its fiftieth year! And it was a very stimulating conference, held in Brighton and organised by David Reby and a lovely team of co-organisers. Shout out to Eilean Reby who designed a really neat visual identity for the conference, with T-shirts and bags designed for everyone - we had a choice of designs featuring bat, bird, insect, fox...
It was my third IBAC (after IBAC 2015 in Germany and IBAC 2017 in India) and I was very happy to chair a session on advanced computational methods, and to see some great work in there. In this post I'll make notes of other things I saw.
Our QMUL bird group presented their recent work. Lies Zandberg spoke about her great work with Rob Lachlan, as part of our project to understand sound similarity from a bird's point of view. Mahalia Frank presented her studies (her PhD with David Clayton) on the neuroanatomy that might explain what zebra finches can hear while still in the egg.
The most shocking bird news, for me, came from the superb lyrebird. Now, the superb lyrebird is a famously showy Australian songbird - the male does this bizarre-looking dance on top of a little mound he's built. The shocking revelation was in Anastasia Dalziell's talk about her studies on the lyrebird's mimicking song. She showed us evidence that the song incorporates sounds from other bird species' mobbing calls (i.e. what groups of birds do when there are predators about), and further, that this mimicry might have evolved to con the female lyrebird into staying around for some copulation when she may be on the verge of wandering away. It's one possible way that complex mimicry might arise through evolution (antagonistic co-evolution) ...and it's very odd behaviour.
Also making a splash was ... RoboFinch! A very eyecatching piece of work presented by members of Katharina Riebel's lab, as part of a project called "Seeing Voices". Ralph Simon presented the technical development, and Judith Varkevisser her vocal learning studies. The idea is to explore how the visual modality affects vocal learning. They tried various things: young birds housed with the tutor, or watching the tutor from the next chamber, or hearing him only; or showing videos of the tutor to the young birds; then a "hologram" (a Pepper's ghost) bird... Then they went as far as building life-size animatronic zebra finches, with robotic heads/beaks and colours painted to match the exact colour profile of real birds. It seems the young birds really do like to get along with RoboFinch, much more than with the other methods, or the traditional audio-only "tape" tutoring.
We all like a clever and informative data visualisation technique. A few years ago it was Michael Towsey's "false colour spectrograms" I was excited about - and we saw some use of those at IBAC 2019. A new visualisation was Julia Hyland Bruno's work (part of her PhD with Ofer Tchernikovsky) - surprisingly simple: she renders many different renditions of a zebra finch song as a colourful image in which each row is a colour-mapped time series for one rendition. With a bit of judicious alignment, the effect is to give an immediate visual impression of the repeatability and the structure in the song. A paper showing the technique is here.
I continue to be interested in rhythm and timing in animal and human sounds, and IBAC is a good place to catch some research on this.
- Michelle Spierings presented a very intriguing new method she has been developing with Tecumseh Fitch: a "palimpsest" scramble of words on-screen, while the sound of a repetitive kick drum plays. The question: does the kick drum help you to entrain and spot the words appearing on the beat? The answer: yes. There are some other neat things you can think of trying with this method.
- Isabelle Charrier showed that in acoustic recognition between northern elephant seal mates, pulse rate (along with spectral shape and spectral centroid) acts as a super-stable carrier of individual ID. Intriguingly, Juliette Linossier (separately) presented her work on individual ID in the sounds of northern elephant seal pups: the mothers can recognise their pups' sounds, but those sounds are weird creaky grunts, and it's very hard for us listeners to guess how the individual identity might be represented in there.
- Florencia Noriega showed her quantitative method for comparing rhythm patterns in animal sound sequences. It's based on the well-known inter-onset intervals (IOIs) and she has applied it to frogs, parrots and zebra finches, showing that it makes an apparently discriminative compact representation.
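The basic ingredient here, inter-onset intervals, is simple to compute. A minimal numpy sketch (the normalisation step is just one common choice for tempo-invariance, not necessarily Noriega's exact method):

```python
import numpy as np

def inter_onset_intervals(onset_times):
    """Inter-onset intervals (IOIs): gaps between consecutive event onsets."""
    onsets = np.sort(np.asarray(onset_times, dtype=float))
    return np.diff(onsets)

# Example: a call sequence with onsets in seconds
onsets = [0.0, 0.5, 1.0, 1.6, 2.1]
iois = inter_onset_intervals(onsets)
print(iois)                 # [0.5 0.5 0.6 0.5]

# Tempo-invariant version: normalise by the mean interval,
# so the same rhythm played faster or slower maps to the same pattern.
print(iois / iois.mean())
```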
- Nora Carlson showed the very interesting setup she has at the Max Planck Institute of Animal Behaviour - a group of free-flying birds in a large barn, 3D motion-tracked using a Vicon camera system. She's running a study on vocal communication networks - which is of interest to me given that I've analysed smaller vocal networks of birds before. In fact in the paper I just linked to, there's a reanalysis of some of the work Manfred Gahr presented later in the same session - they have some great work studying bird vocal communication via tiny little backpacks that birds carry around. From the same group, Susanne Hoffmann presented their study of duetting weaver birds in the wild - amazing that they're live-recording the audio as well as neural activity of pairs of birds out in the wild.
Good to catch up on large-scale and ecoacoustic monitoring too.
- Sarab Sethi and Becky Heath presented their work on large amounts of acoustic data gathered in Borneo. They used 3G-connected Raspberry Pi based devices. I think Sarab's method of using the off-the-shelf Google "AudioSet features" as an audio representation has many appealing qualities: unsupervised, reproducible, and clearly representing ecoacoustic data in at least a partly-semantically-disentangled fashion. There are some definite residual questions: someone pointed out that since AudioSet is based on human-oriented Youtube audio, it's unlikely to represent high-frequency events usefully, and clearly doesn't represent ultrasonics at all. It would be a good thing if, as a community, we created a large "EcoAudioSet" and a fixed feature representation derived from it.
- Plenty of projects using AudioMoth as their audio gathering tool. One nice example is Tomas Honaiser Rostirolla's project to document Brazilian ecoacoustics - they made 20274 manual annotations (3988 of which at species level) from 6048 recordings, and published them openly, I believe. Great.
- Wesley Webb got a lot of attention, not only for his prize-winning presentation of his study on bellbird song in New Zealand, but also for introducing "Koe", a web-based, open-source tool intended to speed up the manual process of labelling large amounts of bioacoustic audio data.
- Paul Roe gave us an overview of their "acoustic observatory" in Australia, and in particular the institutional and collaborative efforts needed to create and maintain such a large monitoring project. For example, piggy-backing on other researchers' projects ("heading out into the bush? could you check this recorder for us while you're there?")
As usual for IBAC, I learnt some amazing things about bioacoustics outside the realm of birds. Some great bat work represented, e.g. from Sonja Vernes' group (I remember IBAC 2015 when I think she might have been literally the only person talking about bats).
One little observation: both Julie Oswald and Jack Fearey had used a neural net called "ARTwarp" to cluster the vocalisations in their whale data, to try and understand their repertoires. I don't know this method, but it seems to be an unsupervised clustering method incorporating time-warping - might be of interest.
And even more, I continue to learn from the weird world out there beyond the vertebrate animals. Amazing little insects that nibble holes in leaves, for example, so that they can create little acoustic cavities to broadcast well. The most out-there bit of acoustic communication was as follows... ants drilling holes in acacia tree thorns, which then create a whistling signal in the wind, to scare away cattle. Even better, the researchers are currently testing this hypothesis by drilling thousands of little holes in trees in Africa, to see if the reduced whistling affects the cattle behaviour. (This was presented by Kathrin Krausa on behalf of Felix Hager.)
Oh one more thing -- the vocal imitation competition had some fantastic results. As the final event of the entire conference, we listened to a deluge, a menagerie of inexplicable animal sounds that had been imitated by IBAC attendees themselves. This could become a fixture in the calendar...
The H-index is one of my favourite publication statistics. It's really simple to define: a person's H-index is the biggest number H of publications which have been cited H times each. It's robust to outliers: if you've a million publications with no citations, or one publication with a million citations …
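The definition translates directly into a few lines of code (a minimal sketch):

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    # Walk down the sorted citation counts; the rank i is a valid h
    # as long as the i-th most-cited paper has at least i citations.
    for i, c in enumerate(cites, start=1):
        if c >= i:
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([0, 0, 0]))         # 0: many uncited papers don't help
print(h_index([1000000]))         # 1: one mega-cited paper barely helps
```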
New journal article from us!
"Automatic acoustic identification of individuals in multiple species: improving identification across recording conditions" - a collaboration published in the Journal of the Royal Society Interface.
For machine learning, the main takeaway is that data augmentation is not just a way to create bigger training sets: used …
Based on a conversation we had in the Machine Listening Lab last week, here are some blogs and other things you can read when you're - say - a new PhD student who wants to get started with applying/understanding deep learning. We can recommend plenty of textbooks too, but here it's …
I've been struggling with the tension between academia and flying for a long time. The vast majority of my holidays I've done by train and the occasional boat - for example the train from London to southern Germany is a lovely ride, as is London to Edinburgh or Glasgow. But in …