Exploring the Historical Determinants of Urban Growth Patterns through Cellular Automata

Kiril Stanilov has adapted RIKS METRONAMICA, an established cellular automata (CA) modelling system, to simulate the historical growth of a section of a large world city. The focus is on simulating change from the late 19th century until the modern day for a slice of London from west of Paddington to the M25 orbital road.

The model is tuned to reflect the morphology of land use patterns more accurately than traditional CA models, which abstract those patterns to more aggregate spatial scales. We explore the spatial determinants of land use patterns with detailed empirical data, documenting the historical growth of West London at an unusually high level of spatial and temporal resolution. The results of the study provide support for our considered speculations:

  1. that the spatial relationships between land uses and the physical environment are remarkably consistent through time, showing little variation relative to changes in historical context; and
  2. that these relationships constitute a basic code for urban growth which determines the spatial signature of land development in a given metropolitan area.

Read the article on this project, published in the current issue of Transactions in GIS.

Here is the slice of west London which has been modelled

and here is a sample of the simulations that replicate the actual development over the last 130 years or so

Kiril works on the GeNESIS project and was a Marie Curie fellow at CASA from 2008 to 2010.


London’s Rail System as a Network

As many of you know, for quite some time I’ve been wrestling with a massive data set from TfL: by my count, it holds 120 million trip segments across 88 million identifiable journeys in the course of just two weeks. Those are astronomical numbers, and the next time there’s a problem with your Bus or Tube journey it is worth remembering that, on most days, all of this runs smoothly. The Bus system accounts for more than half of this total, but even after removing bus journeys from the analysis we’re left with some 40 million journeys (an average of about 3 million journeys each day) across Tube, Overground, and National Rail.

The difference between segments (or ‘legs’, as TfL calls them) and journeys is important because it affects how we think about commuter flows through the city. A segment (or leg) is the part of someone’s trip between two Oyster taps. A journey might contain just one segment, or it might contain several segments across multiple modes (e.g. the commuter starts their trip on a Bus before switching to the Tube and then back to the Bus). In the long run, journeys are more interesting to me since they are the basis of commuting analysis: where (roughly) do people live, and where (roughly) do they work? This helps planners to understand whether and how work/life patterns are changing, and what might need to be done to respond to those changes.

Segments, on the other hand, give us some insight into how people go about their journeys: what, for instance, are the most heavily travelled routes into Central London? In the maps below I’ve not yet connected the network segments to the actual transport infrastructure. I’ll need to tackle that to make the maps more legible and to highlight the real ‘choke points’ in the system. So the maps give some unusual results, but ones that are nonetheless quite interesting in terms of understanding how Londoners (and visitors) use TfL’s network.

For instance, Oxford Circus appears not to be an especially important end-point for travellers (at least, not when compared with London Bridge, Waterloo, Victoria, and Kings Cross). Yet anyone travelling through the station at rush hour will know that it is absolutely packed with commuters transferring between lines. Interchanges inside the ticket barriers don’t generate Oyster taps, so those transfers are invisible in this data. Equally interestingly, Fenchurch has comparatively few connections, but the ones it does have are very large (the dominant ones being with National Rail at Upminster and Barking).

These images are all ‘early days’ and certainly not yet robust enough for use beyond the merely “Ooooh, I didn’t know ‘x’”, but I hope you find them as interesting as I do. I’ve learned a great deal about Londoners’ travel patterns as a result, and realised just how far beyond Zone 3 the London ‘commuter belt’ stretches. It’s also interesting to see the impact of the Overground in this data: the strength of the connection between Dalston and Camden is particularly surprising to me, and you can also see strong links emerging between the Clapham area and Shepherd’s Bush.

In addition, I should point out a few caveats:

  1. I have removed links between stations with fewer than 25,000 segments or 15,000 journeys over the course of the two weeks, to make the maps more legible. In the next iteration I’ll try binding these to the actual infrastructure so as to give a more nuanced understanding of loadings.
  2. The size of the station marker is proportional to the ‘degree’ of the node (i.e. how many other stations it is connected to once the 25,000-segment or 15,000-journey filter has been applied). This is quite interesting where it shows up stations that might not have large flows but do serve as start and end points for a lot of travel (e.g. Stratford), as well as ones where the reverse is true (e.g. Fenchurch).
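The filtering and degree calculation described in these two caveats can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the maps (which were drawn in Gephi). It assumes the two-week totals are already held in a dictionary keyed by station pair, and the station names and counts in the example are invented. Only the 25,000-segment threshold comes from the text.

```python
from collections import defaultdict

def filter_links(links, threshold):
    """Keep only links whose total flow is at least `threshold`.

    `links` maps an (origin, destination) station pair to a two-week total.
    """
    return {pair: flow for pair, flow in links.items() if flow >= threshold}

def degree(links):
    """Count the distinct stations each station is linked to (undirected)."""
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {station: len(others) for station, others in neighbours.items()}

# Hypothetical two-week segment totals (illustrative numbers only).
segments = {
    ("Stratford", "Liverpool Street"): 40_000,
    ("Stratford", "Canary Wharf"): 30_000,
    ("Stratford", "Bank"): 26_000,
    ("Fenchurch Street", "Upminster"): 90_000,
    ("Fenchurch Street", "Tower Hill"): 5_000,
}

kept = filter_links(segments, threshold=25_000)
print(degree(kept))  # Tower Hill drops out; Stratford keeps three links
```

Degree is computed after filtering. As a result, a station with one very large link and several small ones ends up with a degree of 1. The addendum below warns about exactly this distortion.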

I hope to make a more interactive version of this in the not-too-distant future, and to explore in more detail how some of the planned improvements to the Tube and Overground system will measurably affect commuter travel.

Addendum: thank you to Anil for pointing out that I had somehow neglected to provide a legend to help readers interpret the network maps… Ooops. The only thing I can say in my defence is that Gephi doesn’t seem to make it easy to add one! So, for those of you struggling to interpret the images:

- Links shade from yellow to red to purple as total flows increase. To reinforce this and make the really large links stand out, links also increase in width with total flows.
- Node size is related to degree, which is to say the number of places to which the node is connected. Note, however, that in suppressing the ‘smaller’ flows I will have affected the degree of each node, since it now shows only the number of other places to which a node is connected by large flows.

I hope this clarifies things.
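The styling in this addendum amounts to a three-stop colour ramp plus a width scale. Here is a minimal Python sketch of that mapping, assuming linear interpolation between stops. The RGB values and the 1-10 width range are placeholders, not the settings actually used in Gephi.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

# Placeholder colour stops, low to high flow: yellow, red, purple (RGB 0-255).
STOPS = [(255, 255, 0), (255, 0, 0), (128, 0, 128)]

def link_style(flow, min_flow, max_flow, min_width=1.0, max_width=10.0):
    """Map a link's total flow to an (r, g, b) colour and a line width."""
    if max_flow > min_flow:
        t = (flow - min_flow) / (max_flow - min_flow)
    else:
        t = 0.0
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    # Locate which pair of adjacent stops t falls between.
    scaled = t * (len(STOPS) - 1)
    segment = min(int(scaled), len(STOPS) - 2)
    local_t = scaled - segment
    start, end = STOPS[segment], STOPS[segment + 1]
    colour = tuple(round(lerp(s, e, local_t)) for s, e in zip(start, end))
    return colour, lerp(min_width, max_width, t)

print(link_style(25_000, 25_000, 100_000))   # lowest flow: yellow, thinnest
print(link_style(62_500, 25_000, 100_000))   # midpoint: red
print(link_style(100_000, 25_000, 100_000))  # highest flow: purple, widest
```

Encoding flow in both colour and width is redundant by design. Width alone is what lets the very largest links stand out at a glance.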


Large View Central London Detail



Specialisation & Internationalisation

Historically, the most highly skilled and highly remunerated work has been done in the downtowns of major cities (think: Wall Street, the City, etc.). But as space has run out in the core, secondary centres such as Canary Wharf and La Défense have sprung up to meet the needs of increasingly globalised firms operating in everything from financial services to consultancy. These seemingly subsidiary centres are often built on reclaimed or redeveloped land that was considered undesirable (except, typically, by the people who were already living there). Even so, they are still very much a part of the ‘world cities’, and the relocations are relatively modest in both geographical and transport terms.

What is less well known outside planning circles (at least on a conscious level) is that this process has now extended much, much further into the hinterlands of the very largest cities. It has, for instance, become increasingly obvious that the ‘London economy’ does not end at the M25. In fact, various types of specialised, highly paid employment have been springing up all across the Greater Southeast of England.

Collectively, this behaviour is termed ‘polycentricity’ (i.e. many centres). For individual firms, though, it’s simply the relocation of some (or all) functions from the main city centre (e.g. the City) to somewhere suburban or to a subsidiary city elsewhere in the region. Below is a figure drawn from my doctoral research showing the dispersion of six key sectors. It also shows the degree to which firms in each of those sub-regions are ‘internationalised’ (as measured by telecommunications usage).

Distribution of economic activity in the GSE by main sectors.

While the figure is necessarily a simplification of the ‘true’ employment picture, it does make clear that there are some quite distinct behaviours on display:

  • As you might expect, Logistics sites are often found near major transport infrastructure and most especially infrastructure with a strong international component. This sector is also a heavy consumer of real estate and so you tend to see the largest developments in areas where land is cheap (East Anglia and so forth).
  • Rather more surprising is the distribution of ICT (Information and Communications Technology) activity, or, in other words, high-tech work. Internationalised firms (many of them global firms with head offices in America) are strongly concentrated to the west of Greater London, around the M25 and out as far as Reading. Beyond that there is still a great deal of ICT activity, but it is relatively less internationalised, as is the concentration around Cambridge. There is a small ICT grouping in Central London that, I suspect, reflects the distinction between software as a service (central) and software as a product (west of London). Unfortunately, the level of detail in the NOMIS employment data doesn’t really allow me to get at this finer distinction.
  • Even more striking is the distribution of R&D work. Whereas the majority of sectors show significant ‘clumping’ together in space, R&D activity is widely dispersed, and there is little obvious connection between the level of internationalisation and the location of the office. In ICT, proximity to Heathrow seems like a good proxy for predicting international calling activity. In R&D, by contrast, it’s very much down to the individual site; you can, for instance, see the impact of a major pharmaceutical research site (now closing) in Sandwich. My reading and research in this area suggest that a lot of major R&D employers actively try to avoid being too close to their competitors: it’s much easier to avoid sharing confidential knowledge if you’re not meeting your friends from a competing firm down at the pub. This is a very different model from the one that politicians are usually trying to copy: Silicon Valley.
  • Living within the M25, I tend to equate financial services activity with the City and Canary Wharf. In the back of my head I knew that there was ‘back office’ activity in other areas, but it wasn’t top-of-mind for me. Looking at this map, it became clear that there is a lot of activity happening elsewhere in the Greater Southeast. Croydon is a major back-office site, but so are Redhill and parts south towards Gatwick. The presence of major insurers in places like Norwich and Ipswich is also picked up rather strongly here. I’ve not yet examined the interaction between these areas, but I suspect that we’ll see very strong links binding together some of these highly specialised sites.
  • Advanced Producer Services (APS) is a catch-all term for a range of activities that help producers to produce, but which are often too highly skilled to qualify as simple outsourcing or temping. Think: management consultancy, HR services, etc. This group is particularly interesting because it depends on the other groups for business, and so has two locational options open to it. A firm can pick somewhere from which it’s easy to reach many possible clients, or it can pick somewhere close to its primary clients. Here, specialisation should be a good predictor: the more specialised your client base, the more likely you are to opt for somewhere close to your clients. Three areas show high levels of internationalisation: central London (global consultancies and accountancies for the most part), the M25, and the areas to the north and east. Elsewhere, international calling activity is much lower, but the orientation towards transport infrastructure persists. The lack of finely detailed employment data keeps me from really digging into the patterns of location right now. Still, I think that the importance of access is very clear, as is the tendency for specialised subcentres of APS activity to form near larger agglomerations of particular industrial groups.
  • Cultural activity is what most planners mean when they talk about the ‘Creative City’ (the reasons I feel this is an oversimplification can wait for another post), and it’s always associated with our ‘coolest’ cities, such as London and New York. Interestingly, although the centre of, for instance, the arts scene in London has shifted towards places like Hackney, you won’t see this in the data. That is partly because the data are out of date and partly, I would guess, because young artists tend to be doing something else in order to pay the bills. What does come out really strongly here is that internationalisation is associated with large employers (the Beeb, Soho editing firms, etc.), and only when they are based within Zones 1-3. Beyond this, cultural employment seems to exist primarily as more locally and regionally oriented services.

So where does this leave us? I think that it’s especially important to note that a ‘good’ location means different things to different sectors. For some sectors, an attractive location is one that is easily accessible to car-based commuters and offers employees a good quality of life (usually measured as good schools and cheap housing). For others, the hustle and bustle is absolutely integral to doing business. I think it’s equally worth noting that businesses are not randomly distributed around the GSE. So it won’t be easy to induce companies to make radical moves, or to form Silicon Valley-type socio-technical hubs, without a lot of work.

By way of example, a lot has recently been made of East London Tech City. I’m not going to suggest that these offices will never open but I am profoundly sceptical of whether they’ll have anything like the impact that Number 10 hopes:

  • First, these are all large tech firms with headquarters in America. Getting from East London to a flight from Heathrow is profoundly difficult and will only get somewhat easier when Crossrail finally comes on stream.
  • Second, the majority of work will be done from specialised offices so the type of exchange that happened in informal venues across Palo Alto during the early days of SV (and which set up the dynamic that persists today) is unlikely. Or, at least, unlikely to have anything like the resonance that it does on the other side of the Atlantic.
  • Third, what’s in it for the little guy? The real vibrancy of SV is in the ferment of small firms and breakaways, but I can’t see how Intel, Google, Cisco, and Facebook are going to enable this. They’ll want to keep innovation in-house, not spread it around.
  • Fourth, what’s in it for East London? The Shoreditch phenomenon is based on small, indigenous businesses (principally advertising and design) clustering together and occupying space previously used by a furniture manufacturing cluster. I don’t see many senior engineers at Qualcomm wanting to live in Upper Clapton. So I expect the net effect of all this development on the local community (in terms of employment, integration, etc.) to be modest.

I would be very happy to be proved wrong on East London Tech City, but the history of what the French call ‘grands projets’ is, at best, deeply mixed. Often, the impact is no more than skin-deep.

Early Views of Public Transit Usage in London

In fits and starts over the past month, I’ve been getting to grips with an exciting new Oyster Card data set from TfL, with help from the wonderfully supportive Andrew Gaitskell, their resident Oyster Card data expert. For those few of you who live under a rock or have been unable to visit London: the Oyster Card is a contactless electronic ticketing system for entering and exiting the Tube, Docklands Light Railway, Overground, and National Rail. I’ll leave the Bus for another post, since that turns out to be a much more difficult proposition analytically. Every time a user ‘taps in’ or ‘taps out’ of a given mode of transit, this leaves a trace in TfL’s usage database. This is therefore a very different type of data set from the scheduled activities visualised by my colleague Joan Serras here.

Anyway, thanks to a research partnership established a few months ago, we’ve got limited access to this data. I can’t see any information about the user of the card and, as an additional layer of protection, the card number has been encrypted. What makes this data particularly exciting to us urban planners/researchers is that we have both entries and exits. I never thought I’d find having to fish my Oyster Card out of my pocket on the way out of the Tube to be a ‘plus’. As a researcher, though, it means that I can assemble a series of discrete trip segments into a single journey. For example, using this data we can tell that user X started their day on a train from Watford to Euston, jumped on the Victoria Line when they reached London, and then exited at Green Park. Suddenly, we’ve gone from only being able to say that y commuters in Watford work in London to being able to talk (and think) about the distribution of such commuters across Central London.
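The segment-to-journey assembly described here can be sketched as grouping taps by (encrypted) card and chaining segments that follow one another closely in time. This is a minimal Python sketch, not the rule set actually used for this analysis. The `Segment` record layout and the 20-minute interchange gap are hypothetical, and real interchange rules are more involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import groupby

@dataclass
class Segment:
    card: str            # encrypted card identifier
    entry_station: str
    entry_time: datetime
    exit_station: str
    exit_time: datetime

def assemble_journeys(segments, max_gap=timedelta(minutes=20)):
    """Chain each card's segments into journeys.

    A segment continues the current journey if it starts within `max_gap`
    of the previous segment's exit; otherwise it begins a new journey.
    """
    journeys = []
    ordered = sorted(segments, key=lambda s: (s.card, s.entry_time))
    for _, card_segments in groupby(ordered, key=lambda s: s.card):
        current = []
        for seg in card_segments:
            if current and seg.entry_time - current[-1].exit_time > max_gap:
                journeys.append(current)
                current = []
            current.append(seg)
        if current:
            journeys.append(current)
    return journeys

# Hypothetical taps for one card on one day.
day = datetime(2010, 11, 8)
trips = [
    Segment("X", "Watford Junction", day.replace(hour=7, minute=30),
            "Euston", day.replace(hour=7, minute=55)),
    Segment("X", "Euston", day.replace(hour=8, minute=2),
            "Green Park", day.replace(hour=8, minute=12)),
    Segment("X", "Green Park", day.replace(hour=17, minute=40),
            "Euston", day.replace(hour=17, minute=52)),
]
for journey in assemble_journeys(trips):
    print(journey[0].entry_station, "->", journey[-1].exit_station)
```

In this example the interchange at Euston takes seven minutes, so the first two segments become a single Watford Junction to Green Park journey. The evening trip starts a separate journey.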

[Note: I’ve reordered this post to bring the visualisations nearer to the start of the post, if you’d like to get more background detail then that’s now near the bottom.]


The visualisations below represent flows in and out of every Oyster-enabled station in Central London at several points in time over the course of a weekday (a Monday in November) and a weekend day (a Sunday in November). Each figure shows the totals over a 10-minute interval starting at (respectively) 8am, 1pm, and 6pm, and for comparative purposes, I’ve placed the Sunday and Monday visualisations next to each other.
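The 10-minute totals can be produced by rounding each tap’s timestamp down to the start of its interval and counting entries and exits per station. Here is a minimal Python sketch, assuming taps arrive as (station, timestamp, kind) tuples. That record format, and the sample taps, are hypothetical.

```python
from collections import Counter
from datetime import datetime

def bin_start(ts, minutes=10):
    """Round a timestamp down to the start of its interval.

    Assumes `minutes` divides 60 evenly (e.g. 5, 10, 15).
    """
    return ts.replace(minute=ts.minute - ts.minute % minutes, second=0, microsecond=0)

def station_flows(taps, minutes=10):
    """Count taps per (station, interval start, kind).

    `taps` yields (station, timestamp, kind) with kind "entry" or "exit".
    """
    counts = Counter()
    for station, ts, kind in taps:
        counts[(station, bin_start(ts, minutes), kind)] += 1
    return counts

# Hypothetical taps at one station.
taps = [
    ("Waterloo", datetime(2010, 11, 8, 8, 3), "exit"),
    ("Waterloo", datetime(2010, 11, 8, 8, 9), "exit"),
    ("Waterloo", datetime(2010, 11, 8, 8, 11), "entry"),
]
flows = station_flows(taps)
print(flows[("Waterloo", datetime(2010, 11, 8, 8, 0), "exit")])
```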

The size of the pie chart corresponds to the total volume of people passing through the station. I’ve grouped National Rail and Tube stations of the same name, such as Waterloo and Kings Cross, into a single total. Sizes are scaled against the maximum flow across the two-day sample, which was roughly 17,000. And yes, that total is in the space of just 10 minutes! I couldn’t think of any ‘natural’ way to differentiate between entries and exits, so I decided to follow Ollie’s lead and use red for entry (think: red is ‘stop walking’) and green for exit (think: green is ‘go about your business’). Ollie is actually working on a similar (but not the same!) project right now that you’ll be seeing shortly at the London Transport Museum. I don’t think I’m stealing any of his thunder here. He’s taken on the rather more difficult challenge of doing this type of mapping entirely within a web browser, using a dynamic, aggregated feed that could even be done in real time.
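Grouping same-named stations and scaling each pie against the 17,000 maximum might look like this. The post does not say whether pie radius or pie area tracks volume. This sketch assumes area (so radius grows with the square root of volume), which avoids visually exaggerating the busiest stations. The alias table and the 40-pixel maximum radius are hypothetical.

```python
import math
from collections import defaultdict

MAX_FLOW = 17_000  # approximate largest 10-minute station total in the sample

def group_by_name(flows, aliases):
    """Sum volumes for station labels that share a display name.

    `aliases` maps a raw label to the name it should be grouped under;
    labels not in `aliases` are kept as they are.
    """
    totals = defaultdict(int)
    for station, volume in flows.items():
        totals[aliases.get(station, station)] += volume
    return dict(totals)

def pie_radius(volume, max_flow=MAX_FLOW, max_radius=40.0):
    """Radius giving a pie whose area is proportional to volume."""
    return max_radius * math.sqrt(volume / max_flow)

aliases = {"Waterloo NR": "Waterloo", "Waterloo LU": "Waterloo"}  # hypothetical labels
flows = {"Waterloo NR": 9_000, "Waterloo LU": 3_000, "Oxford Circus": 4_250}
grouped = group_by_name(flows, aliases)
for name, volume in grouped.items():
    print(name, volume, round(pie_radius(volume), 1))
```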

Time of day | Sunday | Monday
8 a.m. | Station flows on Sunday at 8am | Station volumes on Monday at 8am
1 p.m. | Station flows on Sunday at 1pm | Station volumes on Monday at 1pm
6 p.m. | Station flows on Sunday at 6pm | Station volumes on Monday at 6pm

So do these visualisations pass the common sense test? My feeling so far is ‘yes’ in that they capture broad-brush dynamics at work in London:

  1. Life on Sunday clearly gets off to a much later start than it does on the following Monday.
  2. Weekday peaks between 8am and 9am, and between 5pm and 6pm are self-evident and differ from Sunday’s usage.

What’s more interesting to me is the radical difference in entry/exit patterns over the course of the day on Monday. At 8am there are far more entries than exits on a station-by-station basis, except at the core stations in Central London. At 6pm the balance has reversed, with more entries than exits in Central London and many more exits than entries elsewhere around the region. Interestingly, the interactive version of this project (done in Processing) makes it obvious that the ‘peak’ evening load lasts much longer than the equivalent a.m. rush hour. Load levels peak at 6pm, but remain quite high until well after 10pm at the most-used stations in the data set. Note too that at 1pm the system is broadly in balance, something that it superficially appears to have in common with Sunday usage across much of the day. This general behaviour also seems to be largely in line with Ollie’s findings from the Barclays Cycle Hire scheme.
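One simple way to summarise the entry/exit balance at a station in a given interval is the share of its flow that is entries. This Python sketch is an illustration only. The 0.1 tolerance used to call a station ‘balanced’ is an arbitrary, hypothetical choice.

```python
def entry_share(entries, exits):
    """Fraction of a station's flow that is entries (0.5 if no flow)."""
    total = entries + exits
    return entries / total if total else 0.5

def classify(entries, exits, tolerance=0.1):
    """Label a station-interval as mostly entries, mostly exits or balanced."""
    share = entry_share(entries, exits)
    if share > 0.5 + tolerance:
        return "mostly entries"
    if share < 0.5 - tolerance:
        return "mostly exits"
    return "balanced"

print(classify(900, 100))  # e.g. a suburban station at 8am
print(classify(100, 900))  # e.g. a Central London station at 8am
print(classify(520, 480))
```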

I’m hoping to put together a video of this activity so that you can watch the entire system in action. There are interesting deviations from expected behaviour that merit further attention. One example is an early-morning fillip at Heathrow Airport of passengers departing the country; we might predict that they will be London residents, since they are more likely to have Oyster Cards. For the time being, though, this should give some sense of what the system looks like at the coarsest level. In a future post I hope to start reporting results from assembling the tap-ins and tap-outs into end-to-end journeys. That will give us a sense of the principal axes along which commuters and visitors travel.

To my knowledge, the only work that has ever really tried to tackle this type of aggregate urban movement analysis head-on is Goddard’s Functional Regions within the City Centre: a Study by Factor Analysis of Taxi Flows in Central London (1970, Transactions of the Institute of British Geographers, pp.161-182). Goddard used logs collected from taxi drivers to derive ‘functional regions’ within Central London, showing the principal Origins/Destinations (O/D) for taxis (especially those picked up or dropped off at mainline stations) and highlighting the ‘neighbourhoods’ that these links implied.

Data Processing

As you might expect, there are a lot of tap-ins and tap-outs in the course of a week: I’m looking at 14 million records per weekday and 6 million records per weekend day, and that’s after cleaning out the irrelevant or corrupted records. So the numbers add up pretty quickly, especially since, ideally, I want to examine at least a full month’s usage. We’ve found from previous work with telecommunications data that several weeks’ worth of data should give us a pretty good baseline for average system behaviour. We can then test against that baseline for ‘unusual’ behaviour on a node or link in the network.
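The post doesn’t specify which test is used for ‘unusual’ behaviour. One simple option, sketched here as an assumption, is a z-score: compare today’s count for a node or link against the mean and standard deviation of the same time slot on comparable past days. The threshold of 3 standard deviations and the sample counts are hypothetical.

```python
from statistics import mean, stdev

def is_unusual(history, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` standard deviations
    from the mean of `history`.

    `history` holds counts for the same node or link in the same time slot
    on comparable past days (e.g. previous Mondays, 8:00-8:10).
    """
    if len(history) < 2:
        raise ValueError("need at least two past observations")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return observed != mu  # any change from a perfectly flat baseline
    return abs(observed - mu) / sigma > threshold

past_mondays = [100, 110, 90, 105, 95]  # hypothetical 10-minute counts
print(is_unusual(past_mondays, 101))  # False
print(is_unusual(past_mondays, 200))  # True
```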

For now, however, I’m just looking at the first couple of days in order to see whether the results I’m getting feel remotely right, and the visualisations above are the first fruits of that process. The common sense test for big data sets (i.e. do the simplest summary results come at all close to what a rational human being would expect to see?) is surprisingly useful. Most of us have some sense of what the overall system should look like when visualised. A collision between our expectations and what the data are telling us is often the first sign that something has gone wrong in our processing. I point this out because every data feed has its quirks, and very ‘exciting’ early results are almost inevitably the product of a processing mistake.
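A ‘common sense’ check of this kind can be automated. The sketch below flags a day whose tap total strays from an expected figure, or whose entries and exits are badly out of balance. It assumes that, over a whole day, most tap-ins are matched by a tap-out, and it treats the expected total as a count of individual taps. Both assumptions, and the 5% tolerance, are hypothetical.

```python
def sanity_check(entries, exits, expected_total, tolerance=0.05):
    """Return warnings for simple 'common sense' failures in a day's taps."""
    warnings = []
    total = entries + exits
    if expected_total and abs(total - expected_total) / expected_total > tolerance:
        warnings.append(
            f"total {total:,} differs from expected {expected_total:,} "
            f"by more than {tolerance:.0%}"
        )
    if total and abs(entries - exits) / total > tolerance:
        warnings.append(f"entries ({entries:,}) and exits ({exits:,}) are out of balance")
    return warnings

print(sanity_check(7_000_000, 6_900_000, expected_total=14_000_000))
print(sanity_check(7_000_000, 3_000_000, expected_total=14_000_000))
```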

For that same reason, it’s often useful to capture debugging output as you go (and to output as much of it from your processing scripts as you dare), so that it is easy to look back through the log to see what might have happened. I’m a big fan of Perl for this type of work. Its text-handling abilities are unmatched, and it’s the quintessential ‘glue’ language: if you can’t do something directly in Perl, you can always use someone else’s module from CPAN to interface with the target language or application. The much bigger issue is storing the data, since ultimately we’ll need the ability to query and aggregate something on the order of 330 million records. For that, I’m able to get away with using partitioned tables in MySQL, and the performance (even on my iMac) is acceptable well into the billions of records. The only hitch is indexing: once your indexes no longer fit in your computer’s RAM, lookups have to go to disk and performance will tank.
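The post’s processing was done in Perl. The same debugging-log habit is sketched here with Python’s standard `logging` and `csv` modules instead, and the five-field record layout is hypothetical. The second function is a back-of-the-envelope check. Four weeks at 14 million records per weekday and 6 million per weekend day gives 4 × (5 × 14 + 2 × 6) million = 328 million, consistent with the figure of roughly 330 million records quoted above.

```python
import csv
import io
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("oyster")

def clean_records(lines, expected_fields=5):
    """Yield well-formed CSV rows, logging every row that is rejected."""
    kept = rejected = 0
    for line_no, row in enumerate(csv.reader(lines), start=1):
        if len(row) != expected_fields or not all(field.strip() for field in row):
            rejected += 1
            log.debug("line %d rejected: %r", line_no, row)
            continue
        kept += 1
        yield row
    log.info("kept %d rows, rejected %d", kept, rejected)

def estimate_records(weeks, weekday=14_000_000, weekend_day=6_000_000):
    """Rough record count for a number of full weeks."""
    return weeks * (5 * weekday + 2 * weekend_day)

# Hypothetical layout: card, entry station, entry time, exit station, exit time.
sample = io.StringIO(
    "card123,Euston,2010-11-08 08:02,Green Park,2010-11-08 08:12\n"
    "broken,row\n"
)
rows = list(clean_records(sample))
print(len(rows), estimate_records(4))
```

Because `clean_records` is a generator, the summary line is only logged once the whole input has been consumed. That suits streaming very large files without holding them in memory.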


These visualisations would not have been possible without the support of TfL’s Andrew Gaitskell. I should also acknowledge the assistance of our own Dr. Martin Austwick and Ollie O’Brien.

Visualising Public Transport Networks

With the increasingly widespread availability of transport data, we can now visualise and explore new dynamic geographies of urban transport flows and networks. In this post, I show detailed animations of UK multi-modal public transport networks using timetable data. This data will form the basis of the public transport modelling in the Simulacra projects, and be used for analyses of accessibility, network structure and resilience.

The first movie maps train, coach, metro (tram and tube), ferry and air trips for England, Scotland and Wales over a typical weekday in 2009. Different modes of transport are assigned different colours, and time is represented by the clock at the top left. The animation clearly highlights the complexity of the networks, the distinct transport geographies of the UK’s cities and regions, and the daily peaks of activity.

When the day starts at midnight, London is pretty much the only city ‘awake’, with few tubes, trains, coaches and ferries running. These services thin out quickly, until mainly a few coaches are left travelling around London (particularly to airports). From about 4:30am onwards, coaches start their trips from the south-west of England and Wales into London, and then trains start to crowd the whole of the UK. Metro services around the UK also start at around 5:30am, and by 6:30am they can be identified in London, Manchester and Newcastle.

It is also around 6:30am that planes start their trips between Scotland and the south of the UK. By 8:30am, much of the national rail network is clearly defined, and ferries can be spotted in Scotland and Cornwall. From then onwards, service frequency falls, although this is barely noticeable in the visualisation. The PM peak starts around 4pm and reaches its height at about 6pm. The level of service then decreases slowly, with multimodal corridors gradually disappearing from the visualisation until the day ends.

The second movie displays bus trips for the same region. The cities with noticeable night services are London, Manchester and Edinburgh. From around 4:30am, bus services in other major cities progressively start to operate, and at about 6:30am bus services in smaller cities also start. The AM peak is at 8am, when around 30,000 bus trips are operating in the region. The service then declines, though not noticeably, until about 9am. It is then sustained until the PM peak starts around 3pm. The PM peak occurs at about 4pm, and from then onwards the service level decreases.
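Figures such as ‘around 30,000 bus trips operating’ at 8am come from counting the trips in progress at a given moment. How the count was actually made isn’t described. A natural approach with timetable data, sketched here under that assumption, is to treat each trip as a (start, end) interval in minutes after midnight and count the intervals that contain the time of interest. The second function does the same for many times at once using sorted start and end lists. The sample trips are hypothetical.

```python
from bisect import bisect_right

def active_trips(trips, t):
    """Count trips in progress at time t (start <= t < end).

    `trips` is a list of (start, end) pairs in minutes after midnight.
    """
    return sum(1 for start, end in trips if start <= t < end)

def active_profile(trips, times):
    """Active-trip counts for many times at once.

    Trips started by t minus trips finished by t, using binary search on
    sorted start and end times; assumes every trip has start < end.
    """
    starts = sorted(start for start, _ in trips)
    ends = sorted(end for _, end in trips)
    return [bisect_right(starts, t) - bisect_right(ends, t) for t in times]

# Hypothetical trips: 8:00-9:00, 7:50-8:20 and 10:00-11:00.
trips = [(480, 540), (470, 500), (600, 660)]
print(active_trips(trips, 480))                # 2
print(active_profile(trips, [480, 500, 610]))  # [2, 1, 1]
```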

In the final movie we zoom in on Greater London, which has the UK’s densest and most complex multimodal networks. Individual vehicles are easier to pick out in this animation because it covers a much smaller area than the previous ones. It is interesting to see the bus network’s transition from night to day, the steady ‘pulse’ of the tube network throughout its service, and the Stansted-Heathrow-Gatwick connection defined by the coach network.

Previous efforts on public transport visualisations which pushed me to do something similar can be seen here (from another CASA member, Anil Bawa-Cavia) and here.

These clips have been generated using OpenGL. The data used to produce them is from the UK Department for Transport and is available here and here. Vehicle trajectories in the animations use straight lines between stops. Waiting time at a public transport stop is defined in the dataset, so the animation also takes it into account. Air travel data in the dataset is only available for Scotland.
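The straight-line trajectories with dwell times described here reduce to piecewise linear interpolation. The animations themselves were rendered with OpenGL. This Python sketch only illustrates the interpolation logic, with a hypothetical (x, y, arrival, departure) stop format and times in minutes.

```python
def position_at(stops, t):
    """Interpolate a vehicle's (x, y) position at time t.

    `stops` is a time-ordered list of (x, y, arrival, departure). The vehicle
    waits at a stop between arrival and departure and moves in a straight
    line at constant speed between consecutive stops. Returns None when t
    falls outside the trip.
    """
    if not stops or t < stops[0][2] or t > stops[-1][3]:
        return None
    for i, (x, y, arrival, departure) in enumerate(stops):
        if arrival <= t <= departure:
            return (x, y)  # dwelling at this stop
        if i + 1 < len(stops):
            nx, ny, next_arrival, _ = stops[i + 1]
            if departure < t < next_arrival:
                f = (t - departure) / (next_arrival - departure)
                return (x + (nx - x) * f, y + (ny - y) * f)
    return None

# Hypothetical trip: three stops, times in minutes.
stops = [(0, 0, 0, 2), (10, 0, 12, 13), (10, 10, 23, 23)]
for t in (1, 7, 12.5, 18, 30):
    print(t, position_at(stops, t))
```

Treating the dwell interval as its own case is what keeps vehicles visibly stationary at stops rather than creeping forward during the waiting time recorded in the timetable.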

The Simulacra team and particularly Duncan Smith have given very valuable advice to improve all visualisations. More animations and the application of these techniques to land use transport modelling will be posted here in the coming weeks.

Defining the Region Seminar

On the 9th September CASA hosted a seminar discussing relationships between London and the South East region with a view to developing the most appropriate framework for land use transport modelling.

Mike Batty introduced the SCALE and ARCADIA projects; Duncan Smith discussed journey-to-work and business relationships in the South East; Basak Ozkul presented on changes in travel patterns over the last thirty years; Jon Reades showed how communication flows can be used to analyse business specialisations; and finally Joan Serras presented a network analysis methodology for the SCALE project. These presentations can be downloaded in PDF form below:

Attendees with particular experience in this field included Paul Cheshire from LSE; Kathy Pain from the University of Reading; Michael Edwards from the Bartlett School of Planning; and Bridget Rosewell and Margarethe Theseira from GLA Economics. Many thanks to those who attended and contributed to the discussion.

