
Non-negative matrix factorisation in Stan

Non-negative matrix factorisation (NMF) is a technique we use for various purposes in audio analysis, to decompose a sound spectrogram.

I've been dabbling with the Stan programming environment here and there. It's an elegant design for specifying and solving arbitrary probabilistic models.

(One thing I want to point out is that it's really for solving for continuous-valued parameters only - this means you can't explicitly do things like clustering etc (unless your approach makes sense with fuzzy cluster assignments). So it's not a panacea. In my experience it's not always obvious which problems it's going to be most useful for.)

So let's try putting NMF and Stan together.

First off, NMF is not always a probabilistic approach - at its core, NMF simply assumes you have a matrix V, which happens to be the product of two "narrower" matrices W and H, and all these matrices have non-negative values. And since Stan is a probabilistic environment we need to choose a generative model for that matrix. Here are two alternatives I tried:

  1. We can assume that our data was generated by an independent random complex Gaussian for each "bin" in the spectrogram, each one scaled by some weight value specified by a "pixel" of WH. If we're working with the power spectrogram, this set of assumptions matches the model of Itakura-Saito NMF, as described in Fevotte et al 2009. (See also Turner and Sahani 2014, section 4A.)
  2. We can assume that our spectrogram data itself, if we normalise it, actually just represents one big multinomial probability distribution. Imagine that a "quantum" of energy is going to appear at some randomly-selected bin on your spectrogram (a random location in time AND frequency). There's a multinomial distribution which represents the probabilities, and we assume that our spectrogram represents it. This is a bit weird but if you assume we got our spectrogram by sampling lots of independent quanta and piling them up in a histogram, it would converge to that multinomial in the limit. This is the model used in PLCA.
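To make the two sets of assumptions a bit more concrete, here is a minimal Python sketch (standard library only, and not the Stan code itself). For model 1 it draws a circular complex Gaussian whose variance is one "pixel" of WH and checks that the average power comes out near that pixel value. For model 2 it piles up lots of independent quanta and normalises the histogram. The function names, the toy numbers and the sample sizes are all made up for illustration.

```python
import random

def sample_power(variance, rng):
    """Draw one power value |x|^2 for a circular complex Gaussian x with the given variance."""
    sd = (variance / 2.0) ** 0.5  # real and imaginary parts each carry half the variance
    re = rng.gauss(0.0, sd)
    im = rng.gauss(0.0, sd)
    return re * re + im * im

def pile_up_quanta(probs, n_quanta, rng):
    """Drop n_quanta units of energy into randomly chosen bins; return the normalised histogram."""
    counts = [0] * len(probs)
    for b in rng.choices(range(len(probs)), weights=probs, k=n_quanta):
        counts[b] += 1
    return [c / n_quanta for c in counts]

rng = random.Random(0)

# Model 1: one bin of the power spectrogram, scaled by a hypothetical WH value of 3.0
wh_pixel = 3.0
powers = [sample_power(wh_pixel, rng) for _ in range(20000)]
mean_power = sum(powers) / len(powers)  # should sit close to wh_pixel

# Model 2: a tiny 4-bin "spectrogram" treated as one multinomial distribution
bin_probs = [0.5, 0.3, 0.15, 0.05]
histogram = pile_up_quanta(bin_probs, 20000, rng)  # approaches bin_probs as n_quanta grows

print(mean_power)
print(histogram)
```

With more quanta the histogram gets closer to bin_probs, which is the "in the limit" argument from model 2.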

So here is the Stan source code for my implementations of these models, plus a simple toy dataset as an example. They both converge pretty quickly and give decent results.

I designed these implementations with audio transcription in mind. When we're transcribing music or everyday sound, we often have some pre-specified categories that we want to identify. So rather than leaving the templates W completely free to choose, in these implementations I specify pre-defined spectral templates "Winit".

(Specifying these also breaks a permutation symmetry in the model, which probably helps the model to converge since it shouldn't keep flipping around through different permutations of the solution. Another thing I do is fix the templates W to sum up to 1 each [i.e. I force them to be simplexes] because otherwise there's a scaling indeterminacy: you could double W and halve H and have the same solution.)
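The scaling indeterminacy is easy to check numerically. The following is a small sketch in plain Python, assuming (this layout is an assumption, not something stated above) that each template is a column of W and each row of H holds the matching activations. The toy matrices and helper names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def normalise_templates(w, h):
    """Rescale each column of w to sum to 1, moving the scale into the matching row of h."""
    n_templates = len(h)
    col_sums = [sum(row[k] for row in w) for k in range(n_templates)]
    w_norm = [[row[k] / col_sums[k] for k in range(n_templates)] for row in w]
    h_scaled = [[col_sums[k] * x for x in h[k]] for k in range(n_templates)]
    return w_norm, h_scaled

W = [[1.0, 0.0], [2.0, 1.0], [1.0, 3.0]]  # 3 frequency bins x 2 templates
H = [[1.0, 0.0, 2.0], [0.5, 4.0, 0.0]]    # 2 templates x 3 time frames
V = matmul(W, H)

# Double W and halve H: the product is unchanged
W2 = [[2 * x for x in row] for row in W]
H2 = [[x / 2 for x in row] for row in H]
V2 = matmul(W2, H2)

# Pin the scale by making every template sum to 1
Wn, Hn = normalise_templates(W, H)
Vn = matmul(Wn, Hn)

print(V == V2)
print([sum(row[k] for row in Wn) for k in range(2)])
```

Once the templates are forced onto the simplex, all of the overall level has to live in H, so there is only one way to split the scale between the two matrices.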

I use a concentration parameter "Wconc" to tell the model how closely to stick to the Winit values, i.e. how tight to make the prior around them. I also use an exponential prior on the activations H, to encourage sparsity.
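To give a feel for what these two priors do, here is a rough Python sketch. It assumes the prior on each template is a Dirichlet with parameters Wconc * Winit, which is one common way to use a concentration parameter but is only a guess at how the Stan code is written. The helper names and toy values are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def mean_abs_deviation(winit, wconc, rng, n_draws=2000):
    """Average distance between sampled templates and winit (assumed Dirichlet(wconc * winit) prior)."""
    total = 0.0
    for _ in range(n_draws):
        w = sample_dirichlet([wconc * p for p in winit], rng)
        total += sum(abs(a - b) for a, b in zip(w, winit)) / len(winit)
    return total / n_draws

rng = random.Random(1)
winit = [0.6, 0.3, 0.1]  # a toy 3-bin spectral template

loose = mean_abs_deviation(winit, 10.0, rng)    # small Wconc: templates wander
tight = mean_abs_deviation(winit, 1000.0, rng)  # large Wconc: templates stay near Winit

# Exponential prior on activations: most of the mass sits near zero
h_draws = [rng.expovariate(1.0) for _ in range(10000)]
frac_small = sum(1 for h in h_draws if h < 0.5) / len(h_draws)

print(loose, tight, frac_small)
```

Under that assumption, a larger Wconc shrinks the spread of the sampled templates around Winit, and the exponential prior puts a large share of the activations close to zero, which is the sparsity-encouraging effect.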

My implementation of the PLCA assumptions isn't quite traditional, because I think in PLCA the spectrogram is assumed to be a sample from a multinomial (which implies it's quantised). I felt it would be a bit nicer to assume the spectrogram is itself a multinomial, sampled from a Dirichlet. There's little difference in practice.

Monday 2nd February 2015 | science

Islam and imagery

I'm quite naive about Islam, so it's hard to get a clear idea of "normal" Islam underneath the headlines about the tiny proportion of violent extremists. Part of the Charlie Hebdo thing was the question about whether it's OK to depict the prophet. So just for reference I found this quote from Tariq Ali in the LRB helpful:

"On the question of images there has always been a debate within Islam. The Quran itself contains warnings against the worship of idols and graven images, but this is taken straight from the Abrahamic tradition and the Old Testament. It’s a stricture on forms of worship. After all, images of the prophet were embossed on early Muslim coins to replace Byzantine and Persian potentates. A number of paintings by Muslim artists in the late medieval period depict the prophet with loving care. The Shia tradition has always ignored the supposed ban on images and portraits of Shia imams have never been forbidden. All the different schools of Sunni jurisprudence don’t agree on the question. It has only become a big issue since Saudi money pushed Wahhabi clerics onto the world stage to fight communism during the Cold War (with the total backing of Washington). Wahhabi literalism misinterprets the Quran and its hostility to images led the Saudi government to destroy the graves in Mecca of the prophet, his companions and his wives. There were no protests except by architects and historians who denounced the vandalism. One can only imagine the response in the world of Islam had the destruction of the graves been carried out, deliberately or accidentally, by a Western power."

Saturday 31st January 2015

Thomas Piketty: Capital in the 21st Century

Just finished reading Thomas Piketty's now-famous book Capital in the Twenty-First Century. It's a big book, takes a while, but it's thoroughly worth it. It's admirably data-driven, yet clear and readable.

One thing it gives you is a lot of tips about whether the future will be the same as the past. Will inflation, growth, inequality be similar in the next 50 years as in the past 50 years? There are some perennial factors, and some things that are actually blips (caused e.g. by the World Wars) when considered on a timescale of centuries - and many of these are not quite as I had expected. Plenty of interesting titbits about, for example, how the amount of wealth we inherit is or isn't changing as our average lifespan gets longer, and how much "gift-giving" has grown in recent years as an "alternative" to inheritance.

And the readability is enhanced with details from history and literature, such as the way authors such as Jane Austen reflect the financial certainties of their time in their prose.

(It's worth mentioning that the Financial Times briefly smudged the book's reputation by alleging some of the data was wrong. (See here for example.) However, as The Economist noted, "the analysis does not seem to support many of the allegations made by the FT, or the conclusion that the book's argument is wrong." The FT later awarded Piketty's book their business book of the year, perhaps sensing which way the wind was blowing.)

I agree with the LRB's review of the book that "nothing about the book is more impressive than the range and richness of its statistical information. [The book] both insists on the importance of data and, at least where modern societies are concerned, highlights the uncertainties involved in its collection." (Note: the LRB also takes issue with some of Piketty's interpretations - interesting review.)

The LRB review also points out that Piketty's proposed mechanism for fixing some of the problems of modern markets and global capitalism - namely, a global progressive tax on capital - may be a nice theory to consider but it's so unrealistic as to be unhelpful. Piketty claims (p515) that even though it's utopian, it's worth considering, maybe even working towards. But to me it seems obviously to neglect so many extremely powerful problems. Of course the richest people have the most political power, and will fight to prevent such taxes being adopted (or if they are adopted, to build a new tax haven state on an oil-rig somewhere). Maybe this is just a practical problem, to be solved by politics. But the tax would require all forms of wealth, owned by every person on and off the planet, to be enumerated. If not, the rich and their financial engineers would transfer their assets into un-enumerated assets. This reminds me of the "information asymmetry" critique of neo-classical economics: many market theories assume that all market participants have perfect information, and this unrealistic assumption is required to prove those markets' effectiveness. Similarly, the effectiveness of the global capital tax in balancing out inequality rests really quite heavily on the idea of some tax agencies somewhere having essentially perfect knowledge. Unfortunately (and as Piketty notes) modern financial engineering means that many taxes, even progressive taxes, become a bit regressive when everyone is subject to them except for the super-rich.

Piketty also says (p519) that the main benefit of the wealth tax (charged at low annual rates, nothing too scary) is "more in the nature of a compulsory reporting law than a true tax" - in a sense, he's less interested in taxing wealth than encouraging transparency in wealth ownership. This again is a bit utopian, since of course people will be motivated to avoid some things being reported, but not so problematic, since the benefits of transparency don't require 100% transparency. And it chimes well with Piketty's insistence that it's not financial rules that can fix the world, but publicly-available information and democratic debate combined with the rule of law. Piketty points out (p570) that most businesses don't publish enough financial details for us to work out how the wealth is divided between profits and wages. If they were required to, that would empower workers. As well as economists ;)

Wednesday 28th January 2015 | economics

Merge operation: in Chomsky, and in recursive neural networks for NLP

This is either a spooky coincidence, or a really neat connection I hadn't known:

For decades, Noam Chomsky and colleagues have famously been developing and advocating a "minimalist" idea about the machinery our brain uses to process language. There's a nice statement of it here in this 2014 paper. They propose that not much machinery is needed, and one of the key components is a "merge" operation that the brain uses in composing and decomposing grammatical structures. (Figure 1 shows it in action.)

Then yesterday I was reading this introduction to embeddings in deep neural networks and NLP, and I read the following:

"Models like [...] are powerful, but they have an unfortunate limitation: they can only have a fixed number of inputs. We can overcome this by adding an association module, A, which will take two word or phrase representations and merge them.

(From Bottou (2011))

"By merging sequences of words, A takes us from representing words to representing phrases or even representing whole sentences! And because we can merge together different numbers of words, we don’t have to have a fixed number of inputs."

This is a description of something called a "recursive neural network" (NOT a "recurrent neural network"). But look: the module "A" seems to do what the minimalists' "merge" operation does. The blogger quoted above even called it a "merge" operation...
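For readers who want to see the shape of the idea, here is a toy Python sketch of a merge module applied recursively over a binary tree. It is only an illustration: the weights are random and untrained, the embedding size, the tanh nonlinearity and the particular tree for "the cat sat on the mat" are arbitrary assumptions, and none of the names come from the quoted article or from Bottou (2011).

```python
import math
import random

DIM = 4  # size of every word and phrase representation (arbitrary choice)

def make_merge_module(dim, rng):
    """Build a merge function A(left, right) -> vector of the same size as its inputs."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(2 * dim)] for _ in range(dim)]
    bias = [0.0] * dim

    def merge(left, right):
        x = left + right  # concatenate the two child representations
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(weights, bias)]

    return merge

def encode(tree, embeddings, merge):
    """Recursively encode a tree: a string is a word, a 2-tuple is a pair of subtrees to merge."""
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    return merge(encode(left, embeddings, merge), encode(right, embeddings, merge))

rng = random.Random(0)
merge = make_merge_module(DIM, rng)
words = ["the", "cat", "sat", "on", "mat"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in words}

phrase = encode(("the", "cat"), embeddings, merge)
sentence = encode((("the", "cat"), ("sat", ("on", ("the", "mat")))), embeddings, merge)

print(len(phrase), len(sentence))
```

The point to notice is that the same merge function is reused at every node, so words, phrases and whole sentences all end up as vectors of the same size, whatever the length of the input.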

As far as I can tell, the inventors of recursive neural networks were motivated by technical considerations - e.g. how to handle sentences of varying lengths - and not by the minimalist linguists. But it looks a little bit like they've created an artificial neural network embodiment of the minimalist programme! I'm not an NLP person, nor a linguist, however: surely I'm not the first to notice this connection? It would be a really neat convergence if it was indeed unconscious. Does this mean we can now test some Chomskian ideas (such as their explanation of word displacement) by implementing them in software?

UPDATE: After chatting with my QMUL colleague Matt Purver - he actually is a computational linguistics expert, unlike me - I should add that there's a little bit less to this analogy than I initially thought. The most obvious disjunction is that the ReNN model performs language analysis in a left-to-right (or right-to-left) fashion, whereas Chomskyan minimalists do not: one thing they preserve from "traditional" grammar is the varying nested constructions of linguistic trees, nothing like as neat in general as the "sat on the mat" example above.

The ReNN model also doesn't really give you anything about long-range dependencies such as the way questions are often constructed with a kind of implicit "move" of a word from one part of the tree to another.

Matt and many other linguists have also told me it's problematic to consider a model where words and sentences are both represented in the same conceptual space. For example, a complete utterance usually implies some practical consequence in the real world, whereas its individual components do not. I recognise that there are differences, but personally I haven't heard any killer argument that they shouldn't exist in the same underlying space-like representation. (After all, many utterances consist of single words; many utterances are partial fragments; many utterances lead to consequences before the speaker has finished speaking.)

I do still believe there's an interesting analogy here. I definitely can't claim that any current ReNN model is an implementation of the Strong Minimalist Programme, but it'd be interesting to see the analogy pushed further, see where it breaks and how it can be improved.

Wednesday 21st January 2015 | science

Beetroot and tomato soup

It's always good to have recipes for those packs of cooked beetroot. So let's have a nice simple soup. Serves 4 to 5 people, takes about half an hour:

  • 1 pack cooked beetroot (4 beetroots), drained
  • 3 big tomatoes (or 5 small ones)
  • 1 small onion
  • 1 clove garlic
  • 1 tsp cumin seed (or less if you prefer)
  • Marge / butter / oil for frying
  • 1/2 cup of milk

On a medium heat, heat up a blob of marge in a deep saucepan with a lid. Roughly slice the onion and add it. Then add the cumin seed and stir. Slice the garlic and add that. Add salt and pepper. Let that all fry gently to soften, for about 5 minutes.

Roughly chop the tomatoes and pile them on top. Boil the kettle and add enough hot water so that it only-just-about covers the things in the pan (maybe 1/2 a cup). Stir, then put the pan lid on, turn the heat down to low, and let it bubble gently for 15 minutes.

Cut the beetroot roughly into chunks, add it to the pan and stir. Take the pan off the heat. Let everything cool for a minute, then carefully ladle it all into a big blender. (I say carefully because you still want to be careful about beetroot stains!) Whizz it all up, briefly so that there are no pieces left but it's still kinda thick. Return it to the pan.

Warm it up again, adding the milk at the end - not too much, just enough to slacken it and give it a touch of creaminess.

Serve with bread and butter.

Wednesday 7th January 2015 | recipes

Tea-smoked turkish delight

At our local International Supermarket they do some great turkish delight. However, the batch I bought recently tasted funny - I think they must have stored the turkish delight alongside a big mound of parsley, because it had obviously absorbed some flavours which didn't really suit it!

So what can you do if your turkish delight has absorbed some flavours? Make it absorb some more!

So I experimented with tea-smoking the turkish delight. I was nervous that something weird would happen in the wok (I've never tried warming up turkish delight before...) but it turned out fine and the smoky flavour works well.

Actually I'd like them a bit more smoky than they are, so you might want to increase some of the proportions here:

  • One pack of turkish delight (mine had pistachios in)
  • Three teabags
  • Dark sugar
  • Rice

Get a wok (or similar) and put a layer of tinfoil in.

Rip open the teabags and pour their contents onto the foil. Then add roughly equal quantities of rice and sugar. Mix it up a bit with your fingers. Put the wok on a medium heat. It'll take a few minutes until it starts smoking.

Meanwhile, you'll need something to go into the wok which will hold the turkish delight well away from the heat, but will allow the smoke to circulate. Maybe some sort of steaming pan. I used a thing for stopping oil from spitting at you.

Don't put the "thing" into the wok yet, keep it to one side. On top of the thing, put some tinfoil and put the pieces of turkish delight onto that. Space them out, and make sure the foil won't stop smoke from circulating.

Turkish delight, ready to get smoked

When the tea has started smoking, put the thing-and-foil-and-delight into it, and put a tight lid on top. (You could use foil, if you don't have a good lid.)

Turn the heat down a bit and let the thing smoke gently for about 15 minutes. Don't peek inside! You don't want the smoke to escape! After 15 minutes, turn off the heat and just leave the pan there for a couple of hours to let the process continue.

When it's all finally done, open the pan - in a well-ventilated area. Take the delight out, and sprinkle with a bit more icing sugar to serve.

Friday 21st November 2014 | recipes
Creative Commons License
Dan's blog articles may be re-used under the Creative Commons Attribution-Noncommercial-Share Alike 2.5 License. Click the link to see what that means...