On Wednesday we had a "flagship seminar" from Prof Andy Hopper on Computing for the future of the planet. How can computing help in the quest for sustainability of the planet and humanity?
Lots of food for thought in the talk. I was surprised to come out with a completely different take-home message than I'd expected - and a different take-home message than I think the speaker had in mind too. I'll come back to that in a second.
Some of the themes he discussed:
- Green computing. This is pretty familiar: how can computing be less wasteful? Low-energy systems, improving the efficiency of computer chips, that kind of thing. A good recent example is how DeepMind used machine learning to reduce the cooling requirements of a Google data centre by 40%. 40% reductions are really rare. Hopper also have a nice example of "free lunch" computing - the idea is that energy is going unused somewhere out there in the world (a remote patch of the sea, for example) so if you stick a renewable energy generator and a server farm there, you essentially get your computation done at no resource cost.
- Computing for green, i.e. using computation to help us do things in a more sustainable way. Hopper gave a slightly odd example of high-tech monitoring that improved efficiency of manufacturing in a car factory; not very clear to me that this is a particularly generalisable example. How about this much better example? Open source geospatial maps and cheap new tools improve farming in Africa. "Aerial drones, crowds of folks gathering soil samples and new analysis techniques combine as pieces in digital maps that improve crop yields on African farms. The Africa Soil Information Service is a mapping effort halfway through its 15-year timeline. Its goal is to publish dynamic digital maps of all of Sub-Saharan Africa at a resolution high enough to serve farmers with small plots. The maps will be dynamic because AfSIS is training people now to continue the work and update the maps." - based on crowdsourced and other data, machine-learning techniques are used to create a complete picture of soil characteristics, and can be used to predict where's good to farm what, what irrigation is needed, etc.
- While we're on the subject of computing-for-green - have a look at Jack Kelly's excellent list of tips Climate change mitigation for hackers
Then Hopper also talked about replacing physical activities by digital activities (e.g. shopping), and this led him on to the topic of the Internet, worldwide sharing of information and so on. He argued (correctly) that a lot of these developments will benefit the low-income countries even though they were essentially made by-and-for richer countries - and also that there's nothing patronising in this: we're not "developing" other countries to be like us, we're just sharing things, and whatever innovations come out of African countries (for example) might have been enabled by (e.g.) the Internet without anyone losing their own self-determination.
Hopper called this "wealth by proxy"... but it doesn't have to be as mystifying as that. It's a well-known idea called the commons.
The name "commons" originates from having a patch of land which was shared by all villagers, and that makes it a perfect term for what we're considering now. In the digital world the idea was taken up by the free software movement and open culture such as Creative Commons licensing. But it's wider than that. In computing, the commons consists of the physical fabric of the Internet, of the public standards that make the World Wide Web and other Internet actually work (http, smtp, tcp/ip), of public domain data generated by governments, of the Linux and Android operating systems, of open web browsers such as Firefox, of open collaborative knowledge-bases like Wikipedia and OpenStreetMap. It consists of projects like the Internet Archive, selflessly preserving digital content and acting as the web's long-term memory. It consists of the GPS global positioning system, invented and maintained (as are many things) by the US military, but now being complemented by Russia's GloNass and the EU's Galileo.
All of those are things you can use at no cost, and which anyone can use as bedrock for some piece of data analysis, some business idea, some piece of art, including a million opportunities for making a positive contribution to sustainability. It's an unbelievable wealth, when you stop to think about it, an amazing collection of achievements.
The surprising take-home lesson for me was: for sustainable computing for the future of the planet, we must protect and extend the digital commons. This is particularly surprising to me because the challenges here are really societal, at least as much as they are computational.
There's more we can add to the commons; and worse, the commons is often under threat of encroachment. Take the Internet and World Wide Web: it's increasingly becoming centralised into the control of a few companies (Facebook, Amazon) which is bad news generally, but also presents a practical systemic risk. This was seen recently when Amazon's AWS service suffered an outage. AWS powers so many of the commercial and non-commercial websites online that this one outage took down a massive chunk of the digital world. As another example, I recently had problems when Google's "ReCAPTCHA" system locked me out for a while - so many websites use ReCAPTCHA to confirm that there's a real human filling in a form, that if ReCAPTCHA decides to give you the cold shoulder then you instantly lose access to a weird random sample of services, some of those which may be important to you.
Another big issue is net neutrality. "Net neutrality is like free speech" and it repeatedly comes under threat.
Those examples are not green-related in themselves, but they illustrate that out of the components of the commons I've listed, the basic connectivity offered by the Internet/WWW is the thing that is, surprisingly, perhaps the flakiest and most in need of defence. Without a thriving and open internet, how do we join the dots of all the other things?
But onto the positive. What more can we add to this commons? Take the African soil-sensing example. Shouldn't the world have a free, public stream of such land use data, for the whole planet? The question, of course, is who would pay for it. That's a social and political question. Here in the UK I can bring the question even further down to the everyday. The UK's official database of addresses (the Postcode Address File) was... ahem... was sold off privately in 2013. This is a core piece of our information infrastructure, and the government - against a lot of advice - decided to sell it as part of privatisation, rather than make it open. Related is the UK Land Registry data (i.e. who owns what parcel of land) which is not published as open data but is stuck behind a pay-wall, all very inconvenient for data analysis, investigative journalism etc.
We need to add this kind of data to the commons so that society can benefit. In green terms, geospatial data is quite clearly raw material for clever green computing of the future, to do good things like land management, intelligent routing, resource allocation, and all kinds of things I can't yet imagine.
As citizens and computerists, what can we do?
- We can defend the free and open internet. Defend net neutrality. Support groups like the Mozilla Foundation.
- Support open initiatives such as Wikipedia (and the Wikimedia Foundation), OpenStreetMap, and the Internet Archive. Join a local Missing Maps party!
- Demand that your government does open data, and properly. It's a win-win - forget the old mindset of "why should we give away data that we've paid for" - open data leads to widespread economic benefits, as is well documented.
- Work towards filling the digital commons with ace opportunities for people and for computing. For example satellite sensing, as I've mentioned. And there's going to be lots of drones buzzing around us collecting data in the coming years; let's pool that intelligence and put it to good use.
If we get this right, 20 years from now our society's computing will be green as anything, not just because it's powered by innocent renewable energy but because it can potentially be a massive net benefit - data-mining and inference to help us live well on a light footprint. To do that we need a healthy digital commons which will underpin many of the great innovations that will spring up everywhere.
My blog has been running for more than a decade, using the same cute-but-creaky old software made by my chum Sam. It was a lo-fi PHP and MySQL blog, and it did everything I needed. (Oh and it suited my stupid lo-fi blog aesthetics too, the clunky visuals are entirely my fault.)
Now, if you were starting such a project today you wouldn't use PHP and you wouldn't use MySQL (just search the web for all the rants about those technologies). But if it isn't broken, don't fix it. So it ran for 10 years. Then my annoying web provider TalkTalk messed up and lost all the databases. They lost all ten years of my articles. SO. What to do?
Well, one thing you can do is simply drop it and move on. Make a fresh start. Forget all those silly old articles. Sure. But I have archivistic tendencies. And the web's supposed to be a repository for all this crap anyway! The web's not just a medium for serving you with Facebook memes, it's meant to be a stable network of stuff. So, ideal would be to preserve the articles, and also to prevent link rot, i.e. make sure that the URLs people have been using for years will still work...
So, job number one, find your backups. Oh dear. I have a MySQL database dump from 2013. Four years out of date. And anyway, I'm not going back to MySQL and PHP, I'm going to go to something clean and modern and ideally Python-based... in other words Pelican. So even if I use that database I'm going to have to translate it. So in the end I found three different sources for all my articles:
- The old MySQL backup from 2013. I had to install MySQL software on my laptop (meh), load the database, and then write a script to iterate through the database entries and output them as nice markdown files.
- archive.org's beautiful Wayback Machine. If you haven't already given money to archive.org then please do. They're the ones making sure that all the old crap from the web 5 years ago is still preserved in some form. They're also doing all kinds of neat things like preserving old video games, masses and masses of live music recordings, and more. ... Anyway I can find LOTS of old archived copies of my blog items. There are two problems with this though: firstly they don't capture everything and they didn't capture the very latest items; and secondly the material is not stored in "source" format but in its processed HTML form, i.e. the form you actually see. So to address the latter, I had to write a little regular expression based script to snip the right pieces out and put them into separate files.
- For the very latest stuff, much of it was still in Google's web cache. If I'd thought of this earlier, I could have rescued all the latest items, since Google is I think the only service that crawls fast enough and widely enough to have captured all the little pages on my little site. So, just like with archive.org, I can grab the HTML files from Google, and scrape the content out using regular expressions.
That got me almost everything. I think the only thing missing is one blog article from a month ago.
Next step: once you've rescued your data, build a new blog. This was easy because Pelican is really nice and well-documented too. I even recreated my silly old theme in their templating system. I thought I'd have problems configuring Pelican to reproduce my old site, but it's basically all done, even the weird stuff like my separate "recipes" page which steals one category from my blog and reformats it.
Now how to prevent linkrot? The Pelican pages have URLs like "/blog/category/science.html" instead of the old "/blog/blog.php?category=science", and if I'm moving away from PHP then I don't really want those PHP-based links to be the ones used in future. I need to catch people who are going to one of those old links, and point them straight at the new URLs. The really neat thing is that I could use Pelican's templating system to output a little lookup table, a CSV file listing all the URL rewrites needed. Then I write a tiny little PHP script which uses that files and emits HTTP Redirect messages. ........... and relax. a URL like http://www.mcld.co.uk/blog/blog.php?category=science is back online.
OK now here we've got lots of lovely good news. Not only have my colleague Andrew McPherson and his team created an ultra-low-latency linux audio board called Bela. Not only can it do audio I/O latencies measured in microseconds (as opposed to the usual milliseconds). Not only did it just finish its kickstarter launch and received eleven times more funding than they asked for.
The extra good news is that we've got SuperCollider running on Bela. So you can run your favourite crazy audio synthesis/processing ideas on a tiny little low-latency box, almost as easily as running it on a laptop.
Can everyone use it? Well not just yet - the code to use Bela's audio driver isn't yet merged into the main SuperCollider codebase, and you need to compile my forked version of SC. So this blog is just to preview it. But we've got the code, as well as instructions for compiling, in this fork over here, and two of the Bela crew (Andrew and Giulio) have helped get it to the point where now I can run it in low-latency mode with no audio glitching.
Where do we go from here? It'd be nice if other people can test it out. (All those Kickstarter backers who are receiving their boards sometime soon...) There are a couple of performance improvements that can hopefully be done. Then eventually I hope we can propose it gets merged in to the SC codebase, perhaps for SC 3.8 or suchlike.
The whole idea of static site generators is interesting, especially for someone who has had to deal with the pain of content management systems for website making. I've been dabbling with a static site generator for our research group website and I think it's a good plan.
What's a static site generator?
Firstly here's a quick historical introduction to get you up to speed on what I'm talking about:
- When the web was invented it was largely based on "static" HTML webpages with no interaction, no user-customised content etc.
- If you wanted websites to remember user details, or generate content based on the weather, or based on some database, the server-side system had to run software to do the clever stuff. Eventually these evolved into widely-used "content management systems" (CMSes) - such as drupal, wordpress, plone, mediawiki.
- However, CMSes can be a major pain in the arse to look after. For example:
- They're very heavily targeted by spammers these days. You can't just leave the site and forget about it, especially if you actually want to use CMS features such as user-edited content, or comments - you need to moderate the site.
- You often have to keep updating the CMS software, for security patches, new versions of programming languages, etc.
- They can be hard to move from one web host to another - they'll often have not-the-right version of PHP or whatever.
- That gets rid of many security issues and compatibility issues.
- It also frees you up a bit: you can use whatever software you like to generate the content, it doesn't have to be software that's designed for responding to HTTP requests.
- It does prevent you from doing certain things - you can't really have a comments system (as in many blogs) if it's purely client-side, for example. There are workarounds but it's still a limitation.
It's not as if SSGs are poised to wipe out CMSes, not at all. But an SSG can be a really neat alternative for managing a website, if it suits your needs. There are lots of nice static site generators out there.
Static site generators for academic websites
So here in academia, we have loads of old websites everywhere. Some of them are plain HTML, some of them are CMSes set up by PhD students who left years ago, some of them are big whizzy CMSes that the central university admin paid millions for and doesn't quite do everything you want.
If you're setting up a new research group website, questions that come to mind are:
- How much pain it would take to convince the IT department to install this specific version of python/PHP/ruby, plus all the weird little plugins that this software demands?
- Who's going to maintain the website for years, applying security patches, dealing with hacks, etc?
- If I go through this hassle of setting up a CMS, which of its whizzy features do I actually want to use? Often you don't really care about many core CMS features, and the features you do want (such as publications lists) are handled by some half-baked plugin that a half-distracted academic cobbled together years ago and now doesn't work properly.
So using a static site generator (SSG) might be a really handy idea. So that's what I've done. I used a static site generator called Poole which is written in Python and it appealed to me because of how minimal it is.
Poole does ALMOST NOTHING.
It has one HTML template which you can make yourself, and then it takes content written in markdown syntax and puts the two together to produce your HTML website. It lets you embed bits of python code in the markdown too, if there's any whizzy stuff needed during page generation. And that's it, it doesn't do anything else. Fab!
But there's more: how do people in our research group edit the site? Do they need to understand this crazy little coding system? No! I plugged Poole together with github for editing the markdown pages. The markdown files are in a github project. As with any github project, anyone can propose a change to one of the textfiles. If they're not pre-authorised then it becomes a "Pull Request" which someone like me checks before approving. Then, I have a little script that regularly checks the github project and regenerates the site if the content has changed.
(This is edging a little bit more towards the CMS side of things, with the server actually having to do stuff. But the neat thing is firstly that this auto-update is optional - this paradigm would work even if the server couldn't regularly poll github, for example - and secondly, because Poole is minimal the server requirements are minimal. It just needs Python plus the python-markdown module.)
We did need a couple of whizzy things for the research site: a publications list, and a listing of research group members. We wanted these to come from data such as a spreadsheet so it could be used in multiple pages and easily updated. This is achieved via the embedded bits of python code I mentioned: we have publications stored in bibtex files, and people stored in a CSV file, and the python loads the data and transforms it into HTML.
It's really neat that the SSG means we have all our content stored in a really portable format: a single git repository containing some of the most widely-handled file formats: markdown, bibtex and CSV.
So where is this website? Here: http://c4dm.eecs.qmul.ac.uk/
Agh, I just got caught out by a "silent" change in the behaviour of scipy for Python. By "silent" I mean it doesn't seem to be in the scipy 0.12 changelog even though it should be. I'm documenting it here in case anyone else needs to know:
Here's the simple code example - using scoreatpercentile to find a percentile for some 2D array:
import numpy as np from scipy.stats import scoreatpercentile scoreatpercentile(np.eye(5), 50)
On my laptop with scipy 11.0 (and numpy 1.7.1) the answer is:
array([ 0., 0., 0., 0., 0.])
On our lab machine with scipy 13.3 (and numpy 1.7.0) the answer is:
In the first case, it calculates the percentile along one axis. In the second, it calculates the percentile of the flattened array, because in scipy 12 someone added a new "axis" argument to the function, whose default value "None" means to analyse the flattened array. Bah! Nice feature, but a shame about the compatibility. (P.S. I've logged it with the scipy team.)
I'm going to a conference next week, and the conference invites me to "Download the app!" Well, OK, you think, maybe a bit of overkill, but it would be useful to have an app with schedules etc. Here is the app listed on google play.
Oh and here's a list (abbreviated) of permissions that the app requires:
"""This application has access to the following:
- Your precise location (GPS and network-based)
- Full network access
- Connect and disconnect from Wi-Fi - Allows the app to connect to and disconnect from Wi-Fi access points and to make changes to device configuration for Wi-Fi networks.
- Read calendar events plus confidential information
- Add or modify calendar events and send email to guests without owners' knowledge
- Read phone status and identity
- Camera - take pictures and videos. This permission allows the app to use the camera at any time without your confirmation.
- Modify your contacts - Allows the app to modify the data about your contacts stored on your device, including the frequency with which you've called, emailed, or communicated in other ways with specific contacts. This permission allows apps to delete contact data.
- Read your contacts - Allows the app to read data about your contacts stored on your device, including the frequency with which you've called, emailed, or communicated in other ways with specific individuals. This permission allows apps to save your contact data, and malicious apps may share contact data without your knowledge.
- Read call log - Allows the app to read your device's call log, including data about incoming and outgoing calls. This permission allows apps to save your call log data, and malicious apps may share call log data without your knowledge.
- Write call log - Allows the app to modify your device's call log, including data about incoming and outgoing calls. Malicious apps may use this to erase or modify your call log.
- Run at start-up.
Now tell me, what fraction of those permissions should a conference-information app legitimately use? (I've edited out some of the mundane ones.) Should ANYONE install this on their phone/tablet?
I saw Brandon Mechtley's splmap which is for plotting sound-pressure measurements on a map. He mentioned a problem: the default "heatmap" rendering you get in google maps is really a density estimate which combines the density of the points with their values. "I need to find a way to ...
Over Christmas I helped someone set up their brand new Kindle Fire HD. I hadn't realised quite how coercive Amazon have been: they're using Android as the basis for the system (for which there is a whole world of handy free stuff), but they've thrown various obstacles ...
I've been storing a lot of my files in a private git repository, for a long time now. Back when I started my PhD, I threw all kinds of things into it, including PDFs of handy slides, TIFF images I generated from data, journal-article PDFs... ugh. Mainly a lot ...
Just notes for posterity: I am installing Cyanogenmod on my HTC Tattoo Android phone.
There are instructions here which are perfectly understandable if you're comfortable with a command-line. But the instructions are incomplete. As has been discussed here I needed to install "tattoo-hack.ko" in order to get the ...