This week we've been at the LVA-ICA 2018 conference, at the University of Surrey. A lot of papers presented on source separation. Here are some notes:
- Evrim Acar gave a great tutorial on tensor factorisation. Slides here
- Hiroshi Sawada described a nice extension of "joint diagonalisation", applying it in synchronised fashion across all frequency bands at once. He also illustrated well how this method reduces to some existing well-known methods, in certain limiting cases.
- Ryan Corey showed his work on helping smart-speaker devices (such as Alexa or whatever) to estimate the relative transfer function which helps with multi-microphone sound processing. He made use of the wake-up keywords that are used for such devices ("Hi Marvin" etc), taking advantage of the known content to estimate the RTF for "free" i.e. with no extra interaction. He DTW-aligned the spoken keyword against a dictionary, then used that to mask the recorded sound and estimate the RTF.
- Stefan Uhlich presented their (Sony group's) strongly-performing SiSEC sound separation method. Interestingly, they use a variant of DenseNet, as well as a BLSTM, to estimate a time-frequency (TF) mask. Stefan also said that once the estimates have been made, a crucial improvement was to re-estimate them by putting the estimated masks together through a multichannel Wiener filtering stage.
- Ama Marina Kreme presented her new task of "phase inpainting" and methods to solve it - estimating a missing portion of phases in a spectrogram, when all of the magnitudes and some of the phases are known. I can see this being useful in post-processing of source separation outputs, though her application was in engine noise analysis with an industrial collaborator.
- Lucas Rencker presented some very nice ideas in "consistent dictionary learning" for signal declipping. Here, "consistent" means that the reconstructed signal should be painting the missing regions in a way that matches the clipping - if some part of the signal was clipped at a maximum of X, then its reconstruction should take values greater than or equal to X. Here's his Python code of the declipping method. Apparently also the state-of-the-art in this task is a method called "A-SPADE" by Kitic (2015). Pavel Zaviska presented an analysis of A-SPADE and S-SPADE, improving the latter but not beating A-SPADE.
An interesting feature of the week was the "SiSEC" Signal Separation Evaluation Challenge. We saw posters of some of the methods used to separate musical recordings into their component stems, but even better, we were used as guinea-pigs, doing a quick listening test to see which methods we thought were giving the best results. In most SiSEC work this is evaluated using computational measures such as signal-to-distortion ratio (SDR), but there's quite a lot of dissatisfaction with these "objective" measures since there's plenty that they get wrong. At the end of LVA-ICA the organisers announced the results of the listening test: surprisingly or not, they broadly correlated strongly with the SDR measures, though there were some tracks for which this didn't hold. More analysis of the data to come, apparently.
From our gang, my students Will and Delia presented their posters and both went really well. Here's the photographic evidence:
- Delia Fano Yela's poster about source separation using graph theory and Kernel Additive Modelling (read the preprint here)
- Will Wilkinson's poster "A Generative Model for Natural Sounds Based on Latent Force Modelling" (read the preprint here)
Also from our research group (though not working with me), Daniel Stoller presented a poster as well as a talk, getting plenty of interest for his deep learning methods for source separation (preprint here).
I've always thought fake meat was a bit silly. When I recently started eating more veggy food I promised myself I wouldn't have to eat Quorn pieces, those fake chicken pieces that taste bland and (unlike chicken) don't respond to cooking. They don't caramelise, they don't get melty tender, they just warm up. If you like cooking, you're much better off cooking some actual veg.
So it's a shock to be saying that some of the best meals I've had in 2017 have been fake meat. It seems the veggie world is just stepping up and stepping up. I've been lucky enough to travel for work and here are some amazing things I ate:
In Beijing, there was this braised fish dish, an extravagant centrepiece to a meal. A big pot of braised Chinese vegetables, and at the centre a mock fish steak. I don't know what it was made of but it had been slashed across the upper surface (like you would do with meat to get flavours in) and that upper surface was grilled and caramelised, while the lower part in the braising sauce was meltingly tender.
In Sweden, I got off the train in Lund and within a few minutes my eyes lighted on a kebab shop (Lunda Kitchen) with a massive list of things labelled "vegan": burgers, kebabs, pepperoni pizzas... My host actually said that he thought "vegan" probably didn't mean the same thing as it did in English. Anyway it does. Their vegan doner kebab was just ace: just meaty and spicy enough, all the trimmings as usual.
In Germany, I had this literally unbelievable vegan schnitzel (at Max Pett, Munich). It wasn't just that it had the taste of a breaded steak "Wiener art", but also the structure, the resistance and texture you expect when you cut into an actual schnitzel. The only reason I didn't grab the serving staff and double-check whether it was veggie or not was that I was in a very definitely vegan restaurant.
In France the seitan bourguignon was a great idea but the execution wasn't ideal. However we had excellent seaweed "tartare" and artichoke "rillettes", both of which captured specific je-ne-sais-quoi tastes of the traditional dishes they were paying tribute to. These were in various Paris vegan bistros.
In India... I didn't have any fake meat at all. I had some amazing dishes, since they've a massive history of veggie cuisine of their own, but it doesn't centre around fake meat.
Back in London? Yes there's plenty of good food around, such as vegan doner kebab or cheezburger from "Vx". But... the veggie version of a roast beef Sunday lunch? I haven't seen it yet...
Excited today to get a delivery of the new mail-order vegan cheese from my friend's new London cheezmakery, Black Arts Vegan! It came beautifully packed, see:
Their first cheese is a vegan mozzarella. We unpacked the cheese and had a taste - yes, a good clear taste like standard mozzarella. But they've worked on getting it right so it goes melty and gooey, and browns nicely in the oven. So let's try it on a pizza!
It really does come into its own on the pizza - the lovely warm melted mozzarella consistency is great, and it's easy to forget that it's plant-based and not dairy. Magic :)
Tamarind is ace. It imparts a deep, rich and sweet flavour to curries. Buy a block and put it in your fridge, it keeps for months, and you can hack a piece off and chuck it in your curry just like that. That's what I did in this lovely chana (chickpea) curry.
Note that the block sort-of dissolves as it cooks, and leaves behind inedible pips. If you prefer not to spit out pips then you could put the tamarind in a paper teabag perhaps, so you can fish it out afterwards.
You can change the veg choices in here - the red pepper is a nice bright contrasting flavour - but in particular the baby aubergines do this great thing of going gooey and helping to create the sauce. Full-sized aubergines don't seem to do that, in my experience. It's the tamarind and the aubergine that go to add body to the sauce, I think - I don't add any tomato or anything like that, and yet the sauce is flavoursome and thickened.
- 1 tbsp veg oil
- 1 tsp mustard seeds
- 1 tsp cumin seeds
- 2 cloves
- 1 onion, chopped fine-ish
- 1 red chilli, sliced (reduce amount if you want less heat)
- 1 tsp cumin powder
- 1 tsp coriander powder
- 1 tsp turmeric powder
- 1 red pepper, chopped into slices/dices
- 4 baby aubergines, chopped into 2cm chunks
- 1 400g tin chickpeas, drained and rinsed
- 1 packet of cooked beetroot, drained and quartered (you can add the drained beetroot juices to the pot later)
- About 2cm cubed of tamarind block
- Black pepper
- 1 bunch coriander leaves, rinsed and roughly chopped
Heat the oil in a largeish deep pan which has a lid, on quite a hot frying heat. Add the spice seeds and the cloves - you might like to put the lid half-on at this point because as the seeds fry and pop they'll jump around and may jump out at you.
After 30 secs or so with the seeds, add the onion, then the chilli and the powdered spices. Give it a good stir round. Let the onion fry for a minute or two before adding the red pepper and the aubergines. Fry this all for another couple of minutes, stirring occasionally.
Add the chickpeas, the beetroot with its juices, the tamarind block, and maybe 1 cup of boiling water (don't add too much water - not enough to cover the mixture). Give this a good stir, then put the lid on, turn the heat down to its lowest, and let it bubble for 30 minutes or so. It can be longer or shorter, I'd say 20 minutes is an absolute minimum. No need to stir now, you can go and do something else, as long as you're sure it's not going to bubble over!
When the curry is nearly ready, take the lid off, turn the heat up to thicken the liquid if needed, and give it all a stir.
Give it a good twist of black pepper, then serve it up in bowls, with coriander leaf sprinkled on top. Serve it with bread (eg naan or roti).
I'm very happy to publish a video of this installation piece that Sarah Angliss and I collaborated on a couple of years ago. We used computational methods to transcribe a dawn chorus birdsong recording into music for Sarah's robot carillon:
We presented this at Soundcamp in 2016. We'd also done a preview of it at an indoor event, but in this lush Spring morning with the very active birds all around in the park, it slotted in just perfectly.
If you listen you find that obviously the bells don't directly sound like birds singing. How could they! Ever since I started my research on birdsong, I've been fascinated by the rhythms of birdsong and how strongly they differ from human rhythms, and what I love about this piece is the way the bells take on that non-human patterning and re-present it in a way that makes it completely unfamiliar (yet still pleasing). We humans are too used to birdsong as background sound, we fail to notice what's so otherworldly about it. The piece has a lovely ebb and flow, and is full of little gestures and structures. None of that was composed by us - it all comes directly from an automatic transcription of a dawn chorus. (We did of course make creative decisions about how the automatic transcription was mapped. For example, we transposed the pitch range to get the best alignment between the birds' and the bells' singing ranges.) And in context with the ongoing atmosphere of the park, the birdsong and the children, it works really well.
The paper "Wasserstein Learning of Deep Generative Point Process Models" published at the NIPS 2017 conference has some interesting ideas in it, connecting generative deep learning - which is mostly used for dense data such as pixels - together with point processes, which are useful for "spiky" timestamp events.
They use the Wasserstein distance (aka the "earth-mover's distance") to compare sequences of spikes, and they do acknowledge that this has advantages and disadvantages. It's all about pushing things around until they match up - e.g. move a spike a few seconds earlier in one sequence, so that it lines up with a spike in the other sequence. It doesn't nicely account for insertions or deletions, which is tricky because it's quite common to have "missing" spikes or added "clutter" in data coming from detectors, for example. It'd be better if this method could incorporate more general "edit distances", though that's non-trivial.
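To make the "pushing things around" idea concrete, here's a minimal sketch for the easy case where both sequences have the same number of spikes. In one dimension the cheapest way to match them up is in sorted order, so the cost is just the total time shift. The function name `wasserstein_1d` is made up for illustration, and this is not necessarily the exact formulation used in the paper.

```python
def wasserstein_1d(seq_a, seq_b):
    """Earth-mover's cost between two equal-length lists of event times.

    In one dimension the cheapest matching pairs events in sorted order,
    so the cost is the summed absolute time shift.
    Assumption: both sequences hold the same number of events, so this
    sketch cannot account for an inserted or deleted spike.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal event counts")
    return sum(abs(a - b) for a, b in zip(sorted(seq_a), sorted(seq_b)))


# Two spikes need nudging (by 0.5 and by 1.0), one already lines up.
cost = wasserstein_1d([1.0, 4.0, 9.0], [1.5, 4.0, 8.0])
```

The `ValueError` branch is exactly where the trouble with insertions and deletions shows up: once the counts differ, there's no longer an obvious one-to-one matching.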
So I was thinking about distances between point processes. More reading to be done. But a classic idea, and a good way to think about insertions/deletions, is called "thinning". It's where you take some data from a point process and randomly delete some of the events, to create a new event sequence. If you're using Poisson processes then thinning can be used for example to sample from a non-stationary Poisson process, essentially by "rejection sampling" from a stationary one.
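As a rough illustration of that last point, here's a minimal sketch of sampling a non-stationary Poisson process by thinning a stationary one. The function names, the sinusoidal `example_intensity` and all the numbers are hypothetical choices for the example; the one real requirement is that the intensity never exceeds `rate_max`.

```python
import math
import random


def sample_homogeneous(rate, t_end, rng):
    """Event times of a stationary Poisson process on [0, t_end)."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)  # exponential gaps between events
        if t >= t_end:
            return times
        times.append(t)


def sample_by_thinning(intensity, rate_max, t_end, rng):
    """Sample a non-stationary process by randomly deleting candidates.

    Each candidate event at time t is kept with probability
    intensity(t) / rate_max, which assumes intensity(t) <= rate_max.
    """
    candidates = sample_homogeneous(rate_max, t_end, rng)
    return [t for t in candidates if rng.random() < intensity(t) / rate_max]


def example_intensity(t):
    """Hypothetical time-varying rate, always between 0 and 10."""
    return 5.0 * (1.0 + math.sin(t))


rng = random.Random(0)
events = sample_by_thinning(example_intensity, 10.0, 20.0, rng)
```

The coin flip here is biased by the time-varying intensity, so events survive more often where the target rate is high.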
Thinning is a probabilistic procedure: in the simplest case, take each event, flip a coin, and keep the event only if the coin says heads. So if we are given one event sequence, and a specification of the thinning procedure, we can define the likelihood that this would have produced any given "thinned" subset of events. Thus, if we take two arbitrary event sequences, we can imagine their union was the "parent" from which they were both derived, and calculate a likelihood that the two were generated from it. (Does it matter if the parent process actually generated this union list, or if there were unseen "extra" parent events that were actually deleted from both? In simple models where the thinning is independent for each event, no: the deletion process can happen in any order, and so we can assume those common deletions happened first to take us to some "common ancestor". However, this does make it tricky to compare distances across different datasets, because the unseen deletions are constant multiplicative factors on the true likelihood.)
We can thus define a "thinning distance" between two point process realisations as the negative log-likelihood under this thinning model. Clearly, the distance depends entirely on the number of events the two sequences have in common, and the numbers of events that are unique to them - the actual time positions of the events have no effect, in this simple model, it's just whether they line up or not. It's one of the simplest comparisons we can make. It's complementary to the Wasserstein distance which is all about time-position and not about insertions/deletions.
This distance boils down to:
NLL = -( n1 * log(n1/nu) + n2 * log(n2/nu) + (nu-n1) * log(1 - n1/nu) + (nu-n2) * log(1 - n2/nu) )
where "n1" is the number of events in seq 1, "n2" in seq 2, and "nu" in their union.
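The formula above is easy to turn into code. This is a minimal sketch with made-up function names: it treats two events as "the same" only if their timestamps are exactly equal, and it uses the convention 0 * log(0) = 0 so that identical sequences get distance zero.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def thinning_nll(n1, n2, nu):
    """Negative log-likelihood from the counts n1, n2 and the union size nu."""
    if nu <= 0 or not (0 <= n1 <= nu and 0 <= n2 <= nu):
        raise ValueError("need nu > 0 and 0 <= n1, n2 <= nu")
    nll = 0.0
    for n in (n1, n2):
        p = n / nu  # estimated probability of keeping a parent event
        nll -= _xlogy(n, p) + _xlogy(nu - n, 1.0 - p)
    return nll


def thinning_distance(seq1, seq2):
    """Thinning distance between two collections of event times.

    Assumption: events coincide only when their times are exactly equal.
    """
    s1, s2 = set(seq1), set(seq2)
    return thinning_nll(len(s1), len(s2), len(s1 | s2))


d_same = thinning_distance([1, 2, 3], [1, 2, 3])
d_disjoint = thinning_distance([1, 2], [3, 4])
```

With exact matching, real-valued timestamps will almost never coincide, so in practice the event times would need to be on a shared grid first.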
Does this distance measure work? Yes, at least in limited toy cases. I generated two "parent" sequences (using the same rate for each) and separately thinned each one ten times. I then measured the thinning distance between all pairs of the child sequences, and there's a clear separation between related and unrelated sequences:
Distances between distinct children of same process: Min 75.2, Mean 93.3, Median 93.2, Max 106.4
Distances between children of different processes: Min 117.3, Mean 137.7, Median 138.0, Max 167.3
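Here's a sketch of how a toy test along these lines could be set up. It is not the code behind the numbers above: the integer time grid, the parent rate, the keep probability of 0.5 and the seed are all assumptions picked for the example, so the distances it prints will differ from those figures.

```python
import math
import random
from itertools import combinations


def thinning_distance(a, b):
    """Thinning distance between two sets of event times (exact matches)."""
    nu = len(a | b)
    nll = 0.0
    for n in (len(a), len(b)):
        for count, prob in ((n, n / nu), (nu - n, 1 - n / nu)):
            if count:  # skip zero counts, i.e. 0 * log(0) = 0
                nll -= count * math.log(prob)
    return nll


def make_children(rng, n_slots=1000, parent_rate=0.2, keep=0.5, n_children=10):
    """Draw one parent on an integer time grid, then thin it independently."""
    parent = {t for t in range(n_slots) if rng.random() < parent_rate}
    return [{t for t in parent if rng.random() < keep}
            for _ in range(n_children)]


rng = random.Random(1)
family_a = make_children(rng)
family_b = make_children(rng)

# Distances between distinct children of the same parent...
same = [thinning_distance(x, y)
        for family in (family_a, family_b)
        for x, y in combinations(family, 2)]
# ...and between children of different parents.
different = [thinning_distance(x, y) for x in family_a for y in family_b]

for label, dists in (("same parent", same), ("different parents", different)):
    print(f"{label}: min {min(dists):.1f}, "
          f"mean {sum(dists) / len(dists):.1f}, max {max(dists):.1f}")
```

Children of the same parent share roughly half of their events under these settings, while children of different parents only coincide by chance, so the second group should come out with the larger distances on average.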
This is nice because it's easy to calculate, etc. To be able to do work like in the paper I cited above, we'd need to be able to optimise against something like this, and even better, to be able to combine it into a full edit distance, one which we can parameterise according to situation (e.g. to balance the relative cost of moves vs. deletions).
This idea of distance based on how often the spikes coincide relates to "co-occurrence metrics" previously described in the literature. So far, I haven't found a co-occurrence metric that takes this form. To relax the strict requirement of events hitting at the exact same time, there's often some sort of quantisation or binning involved in practice, and I'm sure that'd help for direct application to data. Ideally we'd generalise over the possible quantisations, or use a jitter model to allow for the fact that spikes might move.
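The binning step mentioned above could look something like this minimal sketch. The function names and the bin width are hypothetical; the counts it returns are the "n1", "n2" and "nu" that the thinning distance needs, but counted in occupied bins rather than in raw events.

```python
import math


def quantise(times, bin_width):
    """Map event times to the set of bin indices they fall into."""
    return {math.floor(t / bin_width) for t in times}


def binned_counts(seq1, seq2, bin_width):
    """Return (n1, n2, nu) after binning both sequences on the same grid."""
    bins1 = quantise(seq1, bin_width)
    bins2 = quantise(seq2, bin_width)
    return len(bins1), len(bins2), len(bins1 | bins2)


# 1.02 and 1.10 share a bin, but 2.51 and 2.49 straddle a bin edge.
n1, n2, nu = binned_counts([1.02, 2.51, 7.30], [1.10, 2.49, 5.00], 0.5)
```

The example shows both what binning buys and what it costs: nearby spikes can now coincide, but two spikes only 0.02 apart still count as different if they sit either side of a bin edge, which is one reason to prefer a jitter model or to generalise over quantisations. Two events landing in the same bin are also merged into one.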
I'm lucky to be working with a great set of PhD students on a whole range of exciting topics about sound and computation. (We're based in C4DM and the Machine Listening Lab.) Let me give you a quick snapshot of what my students are up to!
I'm primary supervisor for …
I was shocked - and frankly, really sceptical - to realise that eating meat was one of the biggest climate impacts I was having. On the flip side, that's a good thing, because it's one of the easiest things to change on our own, without upending society. Easier than rerouting the air …
SuperCollider works on Linux just great. I've been responsible for one specific part of that in recent years, which is that when a new release of SuperCollider is available, I put it into the Debian official package repository - which involves a few obscure admin processes - and then this means that …
I'm just flying from the International Bioacoustics Congress 2017, held in Haridwar in the north of India. It was a really interesting time. I'm glad that IBAC was successfully brought to India, i.e. to a developing country with a more fragmented bioacoustics community (I think!) than in the west …