Convert JSON to CSV with jq and RecordStream

JSON is all the rage these days. XML and CSV are deeply uncool.

Jazzy or Jobsworth?

There are good reasons for JSON’s popularity as an interchange format, but right now you don’t care. You just want the data in a form understood by your tried-n-trusted tools.

Some trendy API is spraying out jazzy JSON at you, but you need something flat and tabular so you can populate your boring, business-critical spreadsheets and relational databases.

Sometimes you just want good old fashioned CSV. How are you gonna get it?

Say you want to import the list of metropolitan areas that Last.fm produces charts for. (I’m actually doing this for a project.)

Fetching the data from Last.fm’s API is easy enough. Just make a web request with cURL and save the response to a file.

curl "http://ws.audioscrobbler.com/2.0/?method=geo.getmetros&api_key=7570768adbf999b04fadb54aa0548a96&format=json" \
> lastfm_metros.json

What did you get?

cat lastfm_metros.json

Yuck! How can you even read this, never mind work with it in a spreadsheet!?

{"metros":{"metro":[{"name":"Melbourne","country":"Australia"},{"name":"Adelaide","country":"Australia"},{"name":"Sydney","country":"Australia"},{"name":"Linz","country":"Austria"},{"name":"Graz","country":"Austria"},{"name":"Vienna","country":"Austria"},{"name":"Salzburg","country":"Austria"},{"name":"Innsbruck","country":"Austria"},{"name":"Minsk","country":"Belarus"},{"name":"Li\u00e8ge","country":"Belgium"},...

At this point you might be tempted to throw Perl or Python at the problem.

If you’re like me, you don’t like to write code when there’s already a tool out there that solves the problem.

Are there any tools out there that solve this problem? Yes!

The first is jq, a command-line JSON processor. It’s great for exploring and reshaping JSON.

If you use Ubuntu, you can install it as an apt package.

sudo apt-get install jq
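
For a quick sanity check that jq is installed and behaving, feed it a scrap of JSON by hand. (This one-liner is only an illustration; it’s not part of the Last.fm workflow.)

echo '{"name":"Melbourne"}' | jq .name

It should print "Melbourne", quotes included, because jq’s output is still JSON.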

The second tool is recs-tocsv, part of the RecordStream suite. It simply converts a certain shape of JSON to CSV.

RecordStream is written in Perl, so you can install it as a Perl module.

sudo cpanm -i App::RecordStream
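
recs-tocsv deserves the same kind of smoke test; again, the record here is just an illustration. Hand it a single flat JSON object on standard input:

echo '{"city":"Melbourne","country":"Australia"}' | recs-tocsv

You should get back a two-line CSV: a header naming the keys, then one row with the values.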

With these tools you can solve the problem in three steps.

Step 1: Use jq to reformat the messy response so you can actually read it and figure out its structure.

Step 2: Use jq to transform the complicated response into a simpler intermediate form.

Step 3: Use recs-tocsv to turn the intermediate form into CSV, optionally specifying a column order.
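
If you’re impatient, here’s roughly where we’re heading. Steps 2 and 3 end up as one short pipeline (step 1 is just reconnaissance), and the rest of this post builds it up piece by piece.

cat lastfm_metros.json |
jq ".metros | .metro | .[]" --compact-output |
recs-tocsv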

Let’s get started with jq to complete step 1.

The simplest useful thing that jq does is pretty-printing. Just pipe some JSON into it and use the dot filter.

cat lastfm_metros.json |
jq .

The dot filter is jq’s identity transformation. Logically the output is identical to the input, but jq writes it out neatly with whitespace and colors so you can actually read it.

{
  "metros": {
    "metro": [
      {
        "country": "Australia",
        "name": "Melbourne"
      },
      {
        "country": "Australia",
        "name": "Adelaide"
      },
      {
        "country": "Australia",
        "name": "Sydney"
      },
...

Is it clearer now? The response is basically a list of country-metro pairs, but each pair is wrapped in an object, in a list, in an object, in an object.

It’s now clear that the final output should be a CSV file with two columns: one for country, one for name.

Let’s help recs-tocsv help us by creating the form it works best with. This is step 2.

recs-tocsv prefers a stream of flat linear JSON objects as input.

Stream means the output should be one JSON object per record, not one big object. In this example, one metro is one record.

Flat means the keys of each object should refer only to scalar values: no lists or other objects. The metro objects are already flat.

Linear means each object should be written on one line of text, so line breaks delimit records. The original response happens to be linear.

You can create a record stream with a sequence of jq filters. Just as bash lets you pipe cat into jq, jq lets you pipe one filter into the next.

cat lastfm_metros.json |
jq ".metros | .metro | .[]"

The filter .metros outputs the value of the metros key, which is another object. .metro outputs the value of the metro key, which is a list. .[] outputs all the list elements as separate values.

With this sequence of filters, jq produces a stream of flat records.

{
  "country": "Australia",
  "name": "Melbourne"
}
{
  "country": "Australia",
  "name": "Adelaide"
}
{
  "country": "Australia",
  "name": "Sydney"
}
...
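
As an aside, the three filters can also be collapsed into a single path expression. It produces exactly the same stream; the piped form just makes each step explicit.

cat lastfm_metros.json |
jq ".metros.metro[]"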

To make the stream linear, switch on “compact” output.

cat lastfm_metros.json |
jq ".metros | .metro | .[]" --compact-output

Compact mode removes all the unnecessary whitespace, leaving just a newline at the end of each record.

{"country":"Australia","name":"Melbourne"}
{"country":"Australia","name":"Adelaide"}
{"country":"Australia","name":"Sydney"}
{"country":"Austria","name":"Linz"}
{"country":"Austria","name":"Graz"}
...

Now that you have the correct intermediate form, step 3 is gonna be simple.

Step 3: pipe the output into recs-tocsv. (Redirect to a file if you want to keep the result.)

cat lastfm_metros.json |
jq ".metros | .metro | .[]" --compact-output |
recs-tocsv

Ta-da! CSV at last.

country,name
Australia,Melbourne
Australia,Adelaide
Australia,Sydney
Austria,Linz
Austria,Graz
Austria,Vienna
Austria,Salzburg
Austria,Innsbruck
Belarus,Minsk
Belgium,"Liège"
...

If the order of columns doesn’t matter to you, then you’re done!

Bulk insert operations into a relational database, however, often expect a particular column order in the flat file source.

If you need to swap the two columns around, use the --key option.

cat lastfm_metros.json |
jq ".metros | .metro | .[]" --compact-output |
recs-tocsv --key name,country

The --key option takes a comma-separated list of key names from the JSON source; that list also ends up, verbatim, as the header of the CSV file.

name,country
Melbourne,Australia
Adelaide,Australia
Sydney,Australia
Linz,Austria
Graz,Austria
Vienna,Austria
Salzburg,Austria
Innsbruck,Austria
Minsk,Belarus
"Liège",Belgium
...
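
To make the bulk-load use case concrete, here’s a rough sketch of feeding the reordered CSV into SQLite. The music.db database, the metros table definition, and the metros.csv filename are all hypothetical, and the --skip option to .import needs sqlite3 version 3.32 or newer; adapt the idea to whatever database you actually use.

cat lastfm_metros.json |
jq ".metros | .metro | .[]" --compact-output |
recs-tocsv --key name,country > metros.csv

# hypothetical table matching the name,country column order
sqlite3 music.db "CREATE TABLE IF NOT EXISTS metros (name TEXT, country TEXT);"

# --csv sets the input format; --skip 1 drops the CSV header row
sqlite3 music.db ".import --csv --skip 1 metros.csv metros"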

All done!