How to use a different key for Return on Ubuntu

My Return key is getting a bit flaky, so I've just worked out how to reuse my "AltGr" key as a backup in case of emergency. Noting it down for my own reference.

In Linux you can configure anything, including custom keyboard layouts, but it can be tricky to find the advice online that actually matches your system. (Standard Ubuntu has some helper apps that an LXDE desktop doesn't, etc etc.) I think this method will work with any Debian-based system, and I took it from Debian's multimedia-keys guide.

Basically, follow the instructions in that wiki page:

  • I used "xev" to discover that my AltGr key emits keycode 108. Pressing the Return key in xev also confirmed that the official name (keysym) for the Return key event is "Return".
  • I made a file ".xmodmaprc" containing just the text inside these quotes: "keycode 108 = Return"
  • I ran "xmodmap .xmodmaprc"

That's enough to enable it for this session. Then you just have to add that last command as one of your autostart programs so it'll be available in future sessions. I did it by running the "gnome-session-properties" tool, and also, belt-and-braces, by adding it to my .zshrc because of some other config I have in there.
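
For reference, the same steps can be scripted. This is only a sketch in Python, not part of the Debian guide: the helper names ("xmodmap_line", "write_xmodmaprc", "apply_xmodmaprc") are made up for illustration, and keycode 108 is simply what xev reported on my keyboard, so check yours first.

```python
import os
import shutil
import subprocess


def xmodmap_line(keycode, keysym):
    """Return one xmodmap expression mapping a keycode to a keysym."""
    return "keycode %d = %s" % (keycode, keysym)


def write_xmodmaprc(path, mappings):
    """Write one 'keycode N = Keysym' line per (keycode, keysym) pair."""
    with open(path, "w") as f:
        for keycode, keysym in mappings:
            f.write(xmodmap_line(keycode, keysym) + "\n")


def apply_xmodmaprc(path):
    """Run 'xmodmap <path>' if possible; return True if it was run."""
    # Skip quietly when xmodmap is missing or there is no X display.
    if shutil.which("xmodmap") is None or not os.environ.get("DISPLAY"):
        return False
    subprocess.run(["xmodmap", path], check=True)
    return True


# Example use (commented out so that nothing is touched on import):
# rc = os.path.expanduser("~/.xmodmaprc")
# write_xmodmaprc(rc, [(108, "Return")])
# apply_xmodmaprc(rc)
```

Keeping the mapping in a file rather than in a one-off command means the autostart entry and the .zshrc line can both point at the same place.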

Thursday 26th March 2015

Björk and Kim Gordon and me splitting up

I hadn't noticed that both Björk and Kim Gordon had split up from their long-term partners. Thing is, last summer I split up from my long-term partner. I think back then I would have gained some solace from the work of fabulous artists who were going through analogous pain - even if the analogies were only superficial. But then, I was deep in it all, too busy with our own difficulties and the horrible pile of practicalities that accompany a long-term breakup, to notice everything around me.

That changed this year, when Kim Gordon published a book, and Björk published an album, both of which foreground their breakups. But the timing has changed - it's not last summer any more, and I'm not in the heat of it, so I don't need to take solace from these things like I did back then. I can feel some empathy, remotely of course. I really wonder what I would have made of it all last summer.

Björk's album Vulnicura is a heartfelt and very raw heartbreak album. It does some amazing things, some really intense sonic moments, which work on these emotional issues universally, and Björk's lyrics are along similar lines but they often dive really deep into uncomfortably intimate detail. A bit like hearing someone listing all the bad things their ex did, and not being sure quite how literally to take it all:

"Family was always our sacred mutual mission/ Which you abandoned."

It's so direct that it makes me want to hear the other side's rejoinder. (I should say, by the way, that that quote is nothing like what was going on in my case.)

It's really quite striking though that it's Björk and Kim Gordon. Björk has for decades expressed a very open and deep emotional literacy in her music, and in recent years it's increasingly been less about torrid excitement and more about family bonds. Kim Gordon and Thurston Moore were the alt-rock equivalent of the ideal family: together for decades, raising kids while pushing Sonic Youth and their other projects ever onward. And if these paragons are not immune, well on the one hand that's really sad, while on the other hand maybe I can take some solace from it after all. Their lives aren't exactly average lives, but then whose is? We're still all tumbling through the same kaleidoscope.

Saturday 21st March 2015

A static site generator for a research group website using Poole and git

The whole idea of static site generators is interesting, especially for someone who has had to deal with the pain of content management systems for website making. I've been dabbling with a static site generator for our research group website and I think it's a good plan.

What's a static site generator?

Firstly here's a quick historical introduction to get you up to speed on what I'm talking about:

  1. When the web was invented it was largely based on "static" HTML webpages with no interaction, no user-customised content etc.
  2. If you wanted websites to remember user details, or generate content based on the weather, or based on some database, the server-side system had to run software to do the clever stuff. Eventually these evolved into widely-used "content management systems" (CMSes) - such as Drupal, WordPress, Plone, MediaWiki.
  3. However, CMSes can be a major pain in the arse to look after. For example:
    • They're very heavily targeted by spammers these days. You can't just leave the site and forget about it, especially if you actually want to use CMS features such as user-edited content, or comments - you need to moderate the site.
    • You often have to keep updating the CMS software, for security patches, new versions of programming languages, etc.
    • They can be hard to move from one web host to another - they'll often have not-the-right version of PHP or whatever.
  4. More recently, HTML+CSS+JavaScript have developed to the point where they can do a lot of whizzy stuff themselves (such as loading and formatting data from some data source), so there's not quite as much need for the server-side cleverness. This led to the idea of a static site generator - why not do all the clever stuff at the point of authoring the site, rather than at the point of serving the site? The big big benefit there is that the server can be completely dumb, just serving files over the network as if it was 1994 again.
    • That gets rid of many security issues and compatibility issues.
    • It also frees you up a bit: you can use whatever software you like to generate the content, it doesn't have to be software that's designed for responding to HTTP requests.
    • It does prevent you from doing certain things - you can't really have a comments system (as in many blogs) if it's purely client-side, for example. There are workarounds but it's still a limitation.

It's not as if SSGs are poised to wipe out CMSes, not at all. But an SSG can be a really neat alternative for managing a website, if it suits your needs. There are lots of nice static site generators out there.

Static site generators for academic websites

So here in academia, we have loads of old websites everywhere. Some of them are plain HTML, some of them are CMSes set up by PhD students who left years ago, some of them are big whizzy CMSes that the central university admin paid millions for and that don't quite do everything you want.

If you're setting up a new research group website, questions that come to mind are:

  • How much pain would it take to convince the IT department to install this specific version of Python/PHP/Ruby, plus all the weird little plugins that this software demands?
  • Who's going to maintain the website for years, applying security patches, dealing with hacks, etc?
  • If I go through this hassle of setting up a CMS, which of its whizzy features do I actually want to use? Often you don't really care about many core CMS features, and the features you do want (such as publications lists) are handled by some half-baked plugin that a half-distracted academic cobbled together years ago and now doesn't work properly.

So using a static site generator (SSG) might be a really handy idea, and that's what I've done. I used one called Poole, which is written in Python and which appealed to me because of how minimal it is.


It has one HTML template which you can make yourself, and then it takes content written in Markdown syntax and puts the two together to produce your HTML website. It lets you embed bits of Python code in the Markdown too, if there's any whizzy stuff needed during page generation. And that's it, it doesn't do anything else. Fab!

But there's more: how do people in our research group edit the site? Do they need to understand this crazy little coding system? No! I plugged Poole together with GitHub for editing the Markdown pages. The Markdown files are in a GitHub project. As with any GitHub project, anyone can propose a change to one of the text files. If they're not pre-authorised then it becomes a "Pull Request" which someone like me checks before approving. Then, I have a little script that regularly checks the GitHub project and regenerates the site if the content has changed.
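
To give a flavour of what such a polling script involves, here is a minimal sketch in Python. It is not the exact script: the branch name, the repository path and the build command are placeholders (Poole's build step is normally "poole.py --build", but check your own setup), and "update_site" is a name invented for this example.

```python
import subprocess


def run_command(cmd, cwd):
    """Run a command in directory cwd and return its stdout, raising on failure."""
    result = subprocess.run(cmd, cwd=cwd, check=True,
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.stdout.strip()


def update_site(repo_dir, build_cmd, branch="master", run=run_command):
    """Fetch from origin and rebuild only if the remote branch has moved.

    Returns True if the site was rebuilt, False if nothing had changed.
    """
    run(["git", "fetch", "origin"], repo_dir)
    local = run(["git", "rev-parse", "HEAD"], repo_dir)
    remote = run(["git", "rev-parse", "origin/%s" % branch], repo_dir)
    if local == remote:
        return False
    # Fast-forward only, so a surprising history never gets merged silently.
    run(["git", "merge", "--ff-only", "origin/%s" % branch], repo_dir)
    run(build_cmd, repo_dir)
    return True


# Example use, e.g. from a cron job (paths are hypothetical):
# update_site("/srv/groupsite", ["python", "poole.py", "--build"])
```

The "run" argument is only there so the logic can be exercised without a real repository; in normal use you leave it alone.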

(This is edging a little bit more towards the CMS side of things, with the server actually having to do stuff. But the neat thing is firstly that this auto-update is optional - this paradigm would work even if the server couldn't regularly poll GitHub, for example - and secondly, because Poole is minimal the server requirements are minimal. It just needs Python plus the python-markdown module.)

We did need a couple of whizzy things for the research site: a publications list, and a listing of research group members. We wanted these to come from data such as a spreadsheet, so that the same data could be reused in multiple pages and easily updated. This is achieved via the embedded bits of Python code I mentioned: we have publications stored in BibTeX files, and people stored in a CSV file, and the Python loads the data and transforms it into HTML.
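
As an illustration of the CSV-to-HTML step, here is a small self-contained sketch using only the Python standard library. The column names ("name", "role"), the example people and the function name "people_to_html" are all invented for the example, and how the function gets called from a page depends on Poole's embedding syntax, which isn't shown here.

```python
import csv
import html
import io


def people_to_html(csv_file):
    """Turn CSV rows with 'name' and 'role' columns into an HTML list."""
    reader = csv.DictReader(csv_file)
    items = []
    for row in reader:
        # Escape the text so that characters like '&' can't break the page.
        name = html.escape(row["name"].strip())
        role = html.escape(row["role"].strip())
        items.append("  <li>%s (%s)</li>" % (name, role))
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Made-up data standing in for the real CSV file.
example = io.StringIO(
    "name,role\n"
    "Ada Example,PhD student\n"
    "Bob Placeholder,R&D lead\n"
)
print(people_to_html(example))
```

The publications list would follow the same pattern, with a BibTeX parser in place of the csv module.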

It's really neat that the SSG means we have all our content stored in a really portable format: a single git repository containing some of the most widely-handled file formats: Markdown, BibTeX and CSV.

So where is this website? Here: http://c4dm.eecs.qmul.ac.uk/

Saturday 21st March 2015

Non-negative matrix factorisation in Stan

Non-negative matrix factorisation (NMF) is a technique we use for various purposes in audio analysis, to decompose a sound spectrogram.

I've been dabbling with the Stan programming environment here and there. It's an elegant design for specifying and solving arbitrary probabilistic models.

(One thing I want to point out is that it's really for solving for continuous-valued parameters only - this means you can't explicitly do things like clustering etc (unless your approach makes sense with fuzzy cluster assignments). So it's not a panacea. In my experience it's not always obvious which problems it's going to be most useful for.)

So let's try putting NMF and Stan together.

First off, NMF is not always a probabilistic approach - at its core, NMF simply assumes you have a matrix V, which happens to be the product of two "narrower" matrices W and H, and all these matrices have non-negative values. And since Stan is a probabilistic environment we need to choose a generative model for that matrix. Here are two alternatives I tried:

  1. We can assume that our data was generated by an independent random complex Gaussian for each "bin" in the spectrogram, each one scaled by some weight value specified by a "pixel" of WH. If we're working with the power spectrogram, this set of assumptions matches the model of Itakura-Saito NMF, as described in Fevotte et al 2009. (See also Turner and Sahani 2014, section 4A.)
  2. We can assume that our spectrogram data itself, if we normalise it, actually just represents one big multinomial probability distribution. Imagine that a "quantum" of energy is going to appear at some randomly-selected bin on your spectrogram (a random location in time AND frequency). There's a multinomial distribution which represents the probabilities, and we assume that our spectrogram represents it. This is a bit weird but if you assume we got our spectrogram by sampling lots of independent quanta and piling them up in a histogram, it would converge to that multinomial in the limit. This is the model used in PLCA.
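
Written out as equations, the two sets of assumptions look roughly like this. The notation is introduced here for illustration and is not taken from the Stan code: f indexes frequency bins, t indexes time frames, x is the complex spectrogram, V the power spectrogram, and N the total number of "quanta".

```latex
% Model 1 (Itakura-Saito NMF): each complex bin is a zero-mean circular
% Gaussian whose variance is the corresponding entry of WH.
x_{ft} \sim \mathcal{N}_c\big(0,\ (WH)_{ft}\big)
\quad\Longrightarrow\quad
V_{ft} = |x_{ft}|^2 \sim \mathrm{Exponential}\big(\text{mean} = (WH)_{ft}\big)

-\log p(V \mid W, H)
  = \sum_{f,t} \left[ \frac{V_{ft}}{(WH)_{ft}} + \log (WH)_{ft} \right]
  = D_{\mathrm{IS}}(V \,\|\, WH) + \text{const}

% Model 2 (PLCA-style): the normalised WH is one big multinomial over all
% (f, t) bins, and the spectrogram is a histogram of N draws from it.
P_{ft} = \frac{(WH)_{ft}}{\sum_{f',t'} (WH)_{f't'}},
\qquad
\{N_{ft}\} \sim \mathrm{Multinomial}\big(N,\ \{P_{ft}\}\big),
\qquad
\frac{N_{ft}}{N} \xrightarrow{\,N \to \infty\,} P_{ft}
```

In the first model the constant collects the terms that depend only on V, which is why maximising the likelihood is the same as minimising the Itakura-Saito divergence.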

So here is the Stan source code for my implementations of these models, plus a simple toy dataset as an example. They both converge pretty quickly and give decent results.

I designed these implementations with audio transcription in mind. When we're transcribing music or everyday sound, we often have some pre-specified categories that we want to identify. So rather than leaving the templates W completely free to choose, in these implementations I specify pre-defined spectral templates "Winit".

(Specifying these also breaks a permutation symmetry in the model, which probably helps the model to converge since it shouldn't keep flipping around through different permutations of the solution. Another thing I do is fix the templates W to sum up to 1 each [i.e. I force each template to lie on the unit simplex] because otherwise there's a scaling indeterminacy: you could double W and halve H and have the same solution.)
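
The scaling indeterminacy is easy to check numerically. The following is a toy sketch in plain Python with made-up numbers, assuming the templates are the columns of W; the helper names are invented for the example.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def normalise_templates(W, H):
    """Make each column of W sum to 1, moving the scale into the rows of H."""
    sums = [sum(col) for col in zip(*W)]
    W_norm = [[w / s for w, s in zip(row, sums)] for row in W]
    H_scaled = [[h * s for h in row] for row, s in zip(H, sums)]
    return W_norm, H_scaled


# Toy example: 2 frequency bins, 2 templates, 3 time frames.
W = [[1.0, 2.0],
     [3.0, 2.0]]
H = [[4.0, 0.0, 2.0],
     [0.0, 8.0, 2.0]]

# Doubling W and halving H leaves the product unchanged.
W_doubled = [[2 * w for w in row] for row in W]
H_halved = [[h / 2 for h in row] for row in H]
print(matmul(W_doubled, H_halved) == matmul(W, H))

# Normalising the templates pins the scale down without changing WH.
W_norm, H_scaled = normalise_templates(W, H)
print([sum(col) for col in zip(*W_norm)])
print(matmul(W_norm, H_scaled) == matmul(W, H))
```

Once each template is constrained to sum to 1, all of the overall level has to live in H, so there is only one way to split the scale between the two matrices.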

I use a concentration parameter "Wconc" to tell the model how closely to stick to the Winit values, i.e. how tight to make the prior around them. I also use an exponential prior on the activations H, to encourage sparsity.
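
One common way to write priors of this kind is given below. Treat it as an assumption rather than a transcription of the Stan code: the Dirichlet form for the templates, the column index k, and the rate parameter lambda are all introduced here for illustration.

```latex
% Template k (a column of W) is drawn from a Dirichlet centred on the
% corresponding column of Winit; larger Wconc means a tighter prior.
W_{:,k} \sim \mathrm{Dirichlet}\big(W_{\mathrm{conc}} \cdot W^{\mathrm{init}}_{:,k}\big),
\qquad
\mathbb{E}\big[W_{:,k}\big] = W^{\mathrm{init}}_{:,k}
\quad \text{when } \textstyle\sum_f W^{\mathrm{init}}_{fk} = 1

% Activations get an exponential prior, which puts most of its mass near
% zero and so favours sparse H.
H_{kt} \sim \mathrm{Exponential}(\lambda),
\qquad
p(H_{kt}) = \lambda\, e^{-\lambda H_{kt}}, \quad H_{kt} \ge 0
```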

My implementation of the PLCA assumptions isn't quite traditional, because I think in PLCA the spectrogram is assumed to be a sample from a multinomial (which implies it's quantised). I felt it would be a bit nicer to assume the spectrogram is itself a multinomial, sampled from a Dirichlet. There's little difference in practice.

Monday 2nd February 2015

Islam and imagery

I'm quite naive about Islam, so it's hard to get a clear idea of "normal" Islam underneath the headlines about the tiny proportion of violent extremists. Part of the Charlie Hebdo thing was the question about whether it's OK to depict the prophet. So just for reference I found this quote from Tariq Ali in the LRB helpful:

"On the question of images there has always been a debate within Islam. The Quran itself contains warnings against the worship of idols and graven images, but this is taken straight from the Abrahamic tradition and the Old Testament. It’s a stricture on forms of worship. After all, images of the prophet were embossed on early Muslim coins to replace Byzantine and Persian potentates. A number of paintings by Muslim artists in the late medieval period depict the prophet with loving care. The Shia tradition has always ignored the supposed ban on images and portraits of Shia imams have never been forbidden. All the different schools of Sunni jurisprudence don’t agree on the question. It has only become a big issue since Saudi money pushed Wahhabi clerics onto the world stage to fight communism during the Cold War (with the total backing of Washington). Wahhabi literalism misinterprets the Quran and its hostility to images led the Saudi government to destroy the graves in Mecca of the prophet, his companions and his wives. There were no protests except by architects and historians who denounced the vandalism. One can only imagine the response in the world of Islam had the destruction of the graves been carried out, deliberately or accidentally, by a Western power."

Saturday 31st January 2015

Thomas Piketty: Capital in the 21st Century

Just finished reading Thomas Piketty's now-famous book Capital in the Twenty-First Century. It's a big book, takes a while, but it's thoroughly worth it. It's admirably data-driven, yet clear and readable.

One thing it gives you is a lot of tips about whether the future will be the same as the past. Will inflation, growth, inequality be similar in the next 50 years as in the past 50 years? There are some perennial factors, and some things that are actually blips (caused e.g. by the World Wars) when considered on a timescale of centuries - and many of these are not quite as I had expected. Plenty of interesting titbits about, for example, how the amount of wealth we inherit is or isn't changing as our average lifespan gets longer, and how much "gift-giving" has grown in recent years as an "alternative" to inheritance.

And the readability is enhanced with details from history and literature, such as the way authors such as Jane Austen reflect the financial certainties of their time in their prose.

(It's worth mentioning that the Financial Times briefly smudged the book's reputation by alleging some of the data was wrong. (See here for example.) However, as The Economist noted, "the analysis does not seem to support many of the allegations made by the FT, or the conclusion that the book's argument is wrong." The FT later awarded Piketty's book their business book of the year, perhaps sensing which way the wind was blowing.)

I agree with the LRB's review of the book that "nothing about the book is more impressive than the range and richness of its statistical information. [The book] both insists on the importance of data and, at least where modern societies are concerned, highlights the uncertainties involved in its collection." (Note: the LRB also takes issue with some of Piketty's interpretations - interesting review.)

The LRB review also points out that Piketty's proposed mechanism for fixing some of the problems of modern markets and global capitalism - namely, a global progressive tax on capital - may be a nice theory to consider but it's so unrealistic as to be unhelpful. Piketty claims (p515) that even though it's utopian, it's worth considering, maybe even working towards. But to me it seems to obviously neglect so many extremely powerful problems. Of course the richest people have the most political power, and will fight to prevent such taxes being adopted (or if they are adopted, to build a new tax haven state on an oil-rig somewhere). Maybe this is just a practical problem, to be solved by politics. But the tax would require all forms of wealth, owned by every person on and off the planet, to be enumerated. If not, the rich and their financial engineers would transfer their assets into un-enumerated assets. This reminds me of the "information asymmetry" critique of neo-classical economics: many market theories assume that all market participants have perfect information, and this unrealistic assumption is required to prove those markets' effectiveness. Similarly, the effectiveness of the global capital tax in balancing out inequality rests really quite heavily on the idea of some tax agencies somewhere having essentially perfect knowledge. Unfortunately (and as Piketty notes) modern financial engineering means that many taxes, even progressive taxes, become a bit regressive when everyone is subject to them except for the super-rich.

Piketty also says (p519) that the main benefit of the wealth tax (charged at low annual rates, nothing too scary) is "more in the nature of a compulsory reporting law than a true tax" - in a sense, he's less interested in taxing wealth than encouraging transparency in wealth ownership. This again is a bit utopian, since of course people will be motivated to avoid some things being reported, but not so problematic, since the benefits of transparency don't require 100% transparency. And it chimes well with Piketty's insistence that it's not financial rules that can fix the world, but publicly-available information and democratic debate combined with the rule of law. Piketty points out (p570) that most businesses don't publish enough financial details for us to work out how the wealth is divided between profits and wages. If they were required to, that would empower workers. As well as economists ;)

Wednesday 28th January 2015
