London’s Rail System as a Network

As many of you know, for quite some time I’ve been wrestling with a massive data set from TfL: by my count I have 120 million trip segments across 88 million identifiable journeys in the course of just 2 weeks. Those are astronomical numbers, and the next time there’s a problem with your Bus or Tube journey it’s worth reflecting on the fact that any of this runs smoothly on a day-to-day basis. The Bus system accounts for more than half of this total, but even after removing it from the analysis we’re left with some 40 million journeys (an average of roughly 3 million journeys each day) across Tube, Overground, and National Rail.

The difference between segments (or ‘legs’ as TfL call them) and journeys is important because it affects how we think about commuter flows through the city. A segment (or leg) is the part of someone’s trip between two Oyster taps. A journey might contain just one segment, or it might contain several segments across multiple modes (e.g. the commuter starts their trip on a Bus before switching to the Tube and then back to the Bus). In the long run, journeys are more interesting to me since they are the basis of commuting analysis: where (roughly) do people live, and where (roughly) do they work? This helps planners to understand whether and how work/life patterns are changing, and what might need to be done to respond to those changes.
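
To make the distinction concrete, here is a minimal sketch of how legs might be rolled up into journeys. It is not how the TfL data is actually structured: the field names (journey_id, seq, mode, origin, destination) are my own assumptions for illustration, and the real data will certainly differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One leg of a trip, i.e. the part between two Oyster taps."""
    journey_id: int   # hypothetical key tying the legs of one journey together
    seq: int          # hypothetical position of this leg within the journey
    mode: str         # e.g. "Bus", "Tube", "Overground"
    origin: str
    destination: str


def journeys_from_segments(segments):
    """Group legs by journey_id and summarise each journey as
    (first origin, final destination, list of modes in travel order)."""
    legs = defaultdict(list)
    for segment in segments:
        legs[segment.journey_id].append(segment)

    journeys = {}
    for journey_id, parts in legs.items():
        parts.sort(key=lambda s: s.seq)  # put the legs in travel order
        journeys[journey_id] = (
            parts[0].origin,
            parts[-1].destination,
            [p.mode for p in parts],
        )
    return journeys
```

A Bus, Tube, Bus commute therefore counts as three segments but only one journey, and the journey keeps just the first origin and the last destination.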

Segments, on the other hand, give us some insight into how people go about their journeys: what, for instance, are the most heavily-travelled routes into Central London? In the maps below I’ve not yet connected the network segments to actual transportation infrastructure, which is something that I’ll need to tackle in order to make the map more legible and also to highlight the real ‘choke points’ in the system. So these maps give us some unusual results, but ones which are nonetheless quite interesting in terms of understanding how Londoners (and visitors) use TfL’s network.

For instance, Oxford Circus appears not to be an especially important end-point for travellers (at least, not when compared with London Bridge, Waterloo, Victoria, and Kings Cross), but anyone travelling through the station at rush hour will know that it is absolutely packed with commuters transferring between lines. Equally interesting, Fenchurch has comparatively few connections, but the ones that it does have are very large (the dominant ones being with National Rail at Upminster and Barking).

These images are all ‘early days’, and certainly not robust enough yet for any use beyond a mere “Ooooh, I didn’t know ‘x’”, but I hope you find them as interesting as I do. I’ve learned a great deal about Londoners’ travel patterns as a result and realised just how far beyond Zone 3 the London ‘commuter belt’ stretches. It’s also interesting to see the impact of the Overground in this data: the strength of the connection between Dalston and Camden is particularly surprising to me, and you can also see strong links emerging between the Clapham area and Shepherd’s Bush.

In addition, I should point out a few caveats:

  1. I have removed links between stations where there were fewer than 25k segments or 15k journeys over the course of two weeks to make the maps more legible. In the next iteration I’ll try binding these to the actual infrastructure so as to give a more nuanced understanding of loadings.
  2. The size of the station marker is proportional to the ‘degree’ of the node (how many other stations have connections to this one given the 25k or 15k filter?). This is quite interesting where it shows up stations that might not have large flows but do serve as start and end points for a lot of travel (e.g. Stratford), as well as ones where the reverse is true (e.g. Fenchurch).
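
For the curious, the two caveats above boil down to something like the sketch below. This is an illustration rather than the code behind the maps (which came out of Gephi): I am assuming the flows are held as a dictionary keyed by station pair, and that links are treated as undirected when counting degree.

```python
from collections import defaultdict


def filter_links(flows, threshold):
    """Keep only the links whose flow is at least `threshold`.

    `flows` maps a (station_a, station_b) pair to a count of segments
    or journeys; this layout is an assumption made for the sketch.
    """
    return {pair: count for pair, count in flows.items() if count >= threshold}


def node_degree(links):
    """Count, for each station, how many distinct stations it is linked to.

    Links are treated as undirected here (an assumption), so A-B adds
    one to the degree of both A and B.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {station: len(others) for station, others in neighbours.items()}
```

Calling filter_links(flows, 25_000) for segments (or 15_000 for journeys) and then node_degree on the result makes the knock-on effect plain: a station’s degree only counts the links that survived the filter.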

I hope to make a more interactive version of this in the not-too-distant future, and to explore in more detail how some of the planned improvements to the Tube and Overground system will measurably affect commuter travel.

Addendum: thank you to Anil for pointing out that I had somehow neglected to provide a legend that would help readers to interpret the network maps… Ooops. The only thing that I can say in my defense is that Gephi doesn’t seem to make it easy to add one! So for those of you struggling to interpret the images: links shade from yellow to red to purple with increasing total flows; to reinforce this connection and make it easier for the really large links to stand out I’ve also made the links increase in width with total flows; node size is related to degree, which is to say the number of places to which the node is connected; note, however, that in suppressing the ‘smaller’ flows I will have affected the degree of each node since it only shows the number of other places to which a node is connected by large flows. Hope this clarifies things.

