OKFN Edinburgh March 2014

The MSP Tag Cloud was demoed at the OKFN's Edinburgh meetup in March.

The Open Knowledge Foundation’s Edinburgh meetup group gathered at Napier University’s Merchiston Campus on Thursday to discuss Scotland’s open census, hack days, data protection, integrating datasets, data journalism, and interactive accountability.

After hearing about these guys at FOSS4G, I’m very glad to finally meet the community. The group was friendly and the talks were fascinating. Eagerly awaiting the next one!

Here are my highlights from an enlightening evening.

Lamine Lachhab, Ed Turnbull, and Sandy Taylor from National Records Scotland discussed the products and processes of the 2011 Scottish census.

About 25% of responses came through the new online form. The estimated response rate was about 94%. They imputed the data to make that up to 100% for analysis. I learned that ‘imputation’ is a fancy word statisticians use to mean ‘making up missing data’.

All census products are available under the Open Government Licence. Some tables are still to be published, so watch for updates throughout 2014.

If you want to compare regions visually, check out the area profiles map. The speaker demoed a neat lasso tool for selecting multiple regions, but I can’t find it yet.

Visit the data warehouse if you want to grab the data in bulk for your own analyses.

Sally Kerr from Edinburgh Council talked about the council’s continuing efforts to find and free our tax-funded digital assets.

Lots of useful data is still buried in an Excel spreadsheet somewhere!

Edinburgh Apps, launched in Leith late last year, was the council’s first ‘public data hackathon’, an attempt to rally the Edinburgh tech community to solving civic problems.

Volunteers exchanged ideas and prototypes for the modest prize of local fame and business advice. The event was so enthusiastically received that the council plans at least one repeat this year, with even greater incentives.

Glasgow Council is hosting four of these events. We’re not at all bitter!

Tim Musson, now self-employed after a long lecturing career at Napier, talked about his role in advising companies and lawmakers on computers, data, and privacy.

There are many social and legal challenges in implementing effective rules at government and company level. ‘Trusted’ third parties might pose a threat to vulnerable people when handling their valuable data.

For example, Ed Davey, the Energy Secretary, proposes that energy companies encode energy consumption on bills as a QR code. How many people even know what a QR code is, never mind how to read it? What stops a third party from capturing this?

Wilbert Kraan of CETIS talked about linking open educational data sets. His slides are available under a CC-BY license.

Wilbert wanted to join disparate data sets on educational institutions and courses to create a richer set of data for analysis.

He discussed the challenges of dirty data and resolving different keying conventions, and practical ways to concord each data set with all the others.

His most practical option is to key everything against Freebase, the most stable and comprehensive set of reference IDs for his topic.

Unusually for a Google product, the data in Freebase is under an open CC-BY license.

Wilbert prefers Freebase to Wikipedia-derived DBpedia because Freebase uses stable numbers as keys, whereas DBpedia uses names which can change over time.

Without getting bogged down in technical jargon, Wilbert was basically singing the praises of surrogate keys, an important part of an effective data warehouse.

Hey, Wilbert, have you ever considered a career in Business Intelligence? 🙂

Ally Tibbitt works at STV and is a budding data journalist.

He calls for contributors to help him build Placemakr, a website to make available the results of FOI requests in Scotland and analyses using such data.

Sounds like his long-term plan is to build a more socially-minded ScraperWiki, to help those who can’t afford such services. Watch this space for more on that 🙂

What’s the gayest neighborhood in Edinburgh? How long will you have to wait for an allotment? What is the most car-clogged section of the city? What’s the noisiest town in Midlothian? It’s all on Placemakr.

Bruce Ryan demoed an in-progress interactive map of Scottish community council information sourced from various government APIs.

The map was implemented using GeoJSON for data interchange and leaflet.js for presentation, with the markercluster plugin to neatly split and group nearby councils at different zoom levels. Looking forward to the first public release!

Finally we heard from Daniel Duma, a PhD student at Edinburgh Uni, after his team’s sleepless three-day toil to win Smart Data Hack 2014. I would call the result an experiment in “interactive accountability”.

The MSP Involvement Map is the result of a small team’s three-day sprint to win a hack day competition. The app periodically parses the Holyrood transcripts to generate a tag cloud for what each MSP discusses in parliament.
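The core of any tag cloud is a word-frequency count. Here is a minimal sketch in shell, assuming a plain-text transcript on standard input. It is not the team's code, and the function name word_counts is made up for illustration; a real app would also need to drop stop words and split the text by speaker.

```shell
# Hypothetical helper: print the N most frequent words on stdin (default 10).
word_counts() {
  tr -cs 'A-Za-z' '\n' |        # one word per line, punctuation dropped
    tr 'A-Z' 'a-z' |            # fold case so "Housing" and "housing" match
    grep -v '^$' |              # discard any empty line left at the start
    sort | uniq -c |            # count each distinct word
    sort -k1,1nr -k2 |          # most frequent first, ties alphabetical
    awk '{print $2, $1}' |      # print "word count"
    head -n "${1:-10}"
}
```

For example, printf 'Housing housing NHS\nhousing nhs budget\n' | word_counts 2 prints "housing 3" and then "nhs 2". The counts could then drive the font size of each tag.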

Does your electee represent your interests? Are average word count and intervention count really effective measures? Could the “MSP tag cloud” become another metric for politicians to game?

Questions like these generate a lot of excitement among parliamentarians, with Duma’s team meeting members this week. Where will it go?

The code’s on GitHub, so you can take it anywhere you want! 🙂

Starting up with Riak 1.0.2

Warning: installing Riak 1.0.2 was painful.

Basho no longer provides a package or any support for such an old version. The docs go back no further than version 1.1.0.

So why bother?

Seven Databases in Seven Weeks uses version 1.0.2 for its examples. I wanted to play along at home.

In the end, here’s what worked for me.

Install Git

This bit just worked.

sudo apt-get install git

Install Erlang

This was the hardest part to get right.

Apparently, Erlang has “slight nuances” between even minor versions.

Don’t install the latest version; it’s too new. Don’t install the original R14B release either; it’s too old.

In the end, the exact version R14B02 worked for me.

The Basho docs recommend something called kerl for installing specific Erlang versions.

sudo apt-get install curl
sudo apt-get install libncurses5-dev
curl -O https://raw.github.com/spawngrid/kerl/master/kerl; chmod a+x kerl
./kerl build R14B02 r14b02
./kerl install r14b02 ~/erlang/r14b02
. ~/erlang/r14b02/activate

You have to run that last line every time you want to use Erlang in a new shell.
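If that gets tedious, one option is to add the activate line to a shell startup file. A minimal sketch, assuming the install path used above; the function name add_activate_line is made up for illustration, and the line is only appended if it is not already there.

```shell
# Hypothetical helper: add the kerl activate line to a startup file, once.
add_activate_line() {
  rc_file=$1
  line='. ~/erlang/r14b02/activate'
  # -x matches whole lines, -F treats the line as a fixed string, not a regex.
  grep -qxF "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
}

# Example (assumes bash is the login shell):
# add_activate_line ~/.bashrc
```

The trade-off is that every new shell is then pinned to R14B02, which may not suit other Erlang projects on the same machine.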

Getting Erlang right was made harder by a little quirk of the Riak compilation process. Each component does its own version check. It can take a while for compilation to fail. Couldn’t that be checked up-front?

Before I read the manual more carefully, I saw lots of variants of this message.

ERROR: OTP release R14B does not match required regex R14B0[23]

So what’s the simplest way to check your Erlang version?

$ erl -eval 'erlang:display(erlang:system_info(otp_release)), halt().' -noshell
"R14B02"

That’s just obtuse. So obtuse that at some point I’ll have to read Seven Languages in Seven Weeks to find out why.
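An up-front check of the kind wished for above could be sketched as a small shell function run before the build. This is a hypothetical helper, not part of Riak or its build tools; the pattern is copied from the error message above and anchored at both ends, which is an assumption about what the real check intends.

```shell
# Hypothetical pre-flight check; the pattern comes from the error message above.
REQUIRED_OTP='^R14B0[23]$'

# Print the OTP release without the surrounding quotes.
otp_release() {
  erl -noshell -eval 'io:format("~s~n", [erlang:system_info(otp_release)]), halt().'
}

# Succeed only if the given release string matches the required pattern.
check_otp_release() {
  release=$1
  if printf '%s\n' "$release" | grep -Eq "$REQUIRED_OTP"; then
    echo "OK: OTP release $release"
  else
    echo "ERROR: OTP release $release does not match $REQUIRED_OTP" >&2
    return 1
  fi
}

# Example: only start the long compile if the version is right.
# check_otp_release "$(otp_release)" && make rel
```

Taking the release string as an argument keeps the check separate from the erl call, so it can be tried out on a machine with no Erlang installed.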

Install Riak

To be fair, this part was mostly difficult because other components were either buggy or unfamiliar.

Once I sorted out my environment, the Basho stuff did exactly what it said on the tin.

Use these commands to fetch version 1.0.2 from GitHub:

git clone https://github.com/basho/riak.git
cd riak
git checkout tags/riak-1.0.2

Cloning the whole repo to ‘checkout’ one tag doesn’t really make sense to me, but I don’t know Git very well. I was just following the winning answer on Stack Overflow.

Git warned me that I was in ‘detached HEAD’ state. Amusing and baffling in equal measure.
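For what it’s worth, Git can fetch a single tag without the rest of the history by combining --depth 1 with --branch, which accepts tag names in reasonably recent Git versions. A minimal sketch follows; the function name clone_tag is made up for illustration, and this route is untested against the Riak build, which may expect a full clone when it fetches its own dependencies.

```shell
# Hypothetical helper: shallow-clone one tag instead of the full history.
clone_tag() {
  url=$1
  tag=$2
  dest=$3
  # --branch accepts a tag name; --depth 1 fetches only that one commit.
  git clone --depth 1 --branch "$tag" "$url" "$dest"
}

# Example:
# clone_tag https://github.com/basho/riak.git riak-1.0.2 riak
```

The result is still in ‘detached HEAD’ state, because a tag is a fixed point in history rather than a branch you can commit to.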

One more thing: there’s a crap bug in the default compiler tools on 64-bit Xubuntu 13.04. It might affect other distributions too.

To avoid an error like this:

ERROR: cc c_src/skein.o c_src/skein_api.o c_src/skein_block.o c_src/skein_debug.o c_src/skerl_nifs.o -lstdc++ -shared -L/home/sandport/erlang/r14b02/lib/erl_interface-3.7.3/lib -lerl_interface -lei -o priv/skerl_nifs.so failed with error: 1 and output:
/usr/bin/ld: cannot find -lstdc++

Do this:

sudo apt-get install g++-multilib

Now, finally, the good stuff.

make rel

Sit back, sip a beer, and wait about 15 minutes for Riak to compile.

Next, fire it up…

cd rel/riak
bin/riak start

…and kick the tyres.

$ bin/riak-admin test
Successfully completed 1 read/write cycle to 'riak@127.0.0.1'

It works!

Now for Day 1: CRUD, Links, and MIMEs.

Lessons

Data technology is changing fast. The literature can hardly keep up. It’s an exciting time to be a database engineer!

Avoid installing from source if possible. Packages exist to let you get on with your life.

Be skeptical when someone says “building my project from source is quite easy.” They don’t know your environment.