Subway Network Study

This study is a network analysis of the New York City subway, carried out as part of a project that preceded the Subway Impact Study. The analysis was done in the R statistical programming language, and the visualizations were made with Gephi. The repo is linked at the bottom of the page.

Circular Network Graph of the New York City Subway

This circular graph places subway stations as nodes around the outside of the ring, sized and colored by the number of connections to other stations. Most stations connect to at most two other stations. About a third of the stations have more than two connections and are therefore transfer stations.
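The connection count used to size and color the nodes is each station's degree. The study itself was done in R; the snippet below is only a minimal Python sketch of the idea, run on a hypothetical six-station toy network (the station names and edge list are made up for illustration and are not taken from the repo).

```python
from collections import defaultdict

# Hypothetical toy network: each pair is a direct connection between two stations.
EDGES = [
    ("A", "B"), ("B", "C"), ("C", "D"),  # one line running A-B-C-D
    ("E", "C"), ("C", "F"),              # a second line crossing it at C
]

def degree_by_station(edges):
    """Count the distinct neighbouring stations of every station."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {station: len(adjacent) for station, adjacent in neighbours.items()}

degrees = degree_by_station(EDGES)

# Following the text: more than two connections marks a transfer station.
transfer_stations = sorted(s for s, d in degrees.items() if d > 2)

print(degrees)
print(transfer_stations)
```

In this toy network only station C, where the two lines cross, has more than two connections.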

Force-Directed Graph

Using a force-directed graph, we can more easily see the confluence of connections between the big transfer stations. Stations with more connections have more ‘gravity’ and pull other stations toward them. This explains the collection of nodes in the upper left portion of the graph.

Results

The overall density of the network is 0.006, meaning that only about 0.6% of all possible station-to-station connections actually exist. Such a low density is to be expected for a transit system, as not every station can be connected directly to every other station.
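For an undirected network, density is the number of connections divided by the number of possible station pairs, n(n-1)/2. As a rough Python sketch (not the study's R code; the function name and the example numbers are illustrative assumptions), even a single unbranched line has a density that shrinks quickly as stations are added:

```python
def density(num_stations, num_connections):
    """Share of all possible station pairs that are directly connected (undirected graph)."""
    possible_pairs = num_stations * (num_stations - 1) / 2
    return num_connections / possible_pairs

# Hypothetical example: one line of 100 stations has 99 connections between neighbours.
line_density = density(100, 99)
print(line_density)  # 99 / 4950, i.e. about 0.02
```

A chain of n stations always has a density of 2/n, which is why large rail networks built mostly from such chains end up with very small values.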

The mean distance across the network is 12.8, indicating that the shortest route between two stations spans about 13 station-to-station connections on average.
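Mean distance is the average length of the shortest path over all pairs of stations, counted in station-to-station hops. The following is a small Python sketch of that calculation using breadth-first search on a hypothetical four-station line; it illustrates the measure only and is not the R code used in the study.

```python
from collections import deque

# Hypothetical toy line A-B-C-D, stored as an adjacency mapping.
ADJACENCY = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}

def hop_counts_from(adjacency, source):
    """Breadth-first search: fewest hops from source to every reachable station."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        station = queue.popleft()
        for neighbour in adjacency[station]:
            if neighbour not in hops:
                hops[neighbour] = hops[station] + 1
                queue.append(neighbour)
    return hops

def mean_distance(adjacency):
    """Average shortest-path length over all connected pairs of distinct stations."""
    total, pairs = 0, 0
    for source in adjacency:
        for target, hops in hop_counts_from(adjacency, source).items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs

print(mean_distance(ADJACENCY))                        # 10/6, about 1.67
print(max(hop_counts_from(ADJACENCY, "A").values()))   # longest trip from A: 3 hops
```

The mean (about 1.67 hops) is shorter than the longest trip (3 hops), so a mean distance describes a typical journey rather than the farthest one.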

Station Results

At the station level, the results of the network analysis are more interesting. Depending on which measure of centrality one uses, either the Times Square station or the Union Square station is the most central part of the network.

By degree, Times Square is the most central, with a degree score of 11, indicating that 11 stations are directly connected to it. It is to be expected that the most connected stations would be located in Manhattan, as every line in the study except the G train passes through Manhattan. Of the 16 most connected stations, 11 are in Manhattan.

However, when centrality is measured by betweenness (how important a node is to the network, based on how often it appears on the shortest paths through the network), Times Square becomes less important and Union Square rises to the top. Union Square has a betweenness score of 37,619, indicating that by this measure it is far more important to the overall network than Times Square, which has a betweenness score of 20,085 and places 9th in the list of stations ordered by betweenness. This is likely due to Union Square's position as a collector for lines that don't frequently connect to other lines, and to its spatial separation from other large transfer stations.
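To see how a station can lead on degree but not on betweenness, consider the Python sketch below. It is an illustration only, not the study's R code: the 12-station network is hypothetical, with "T" standing in for a well-connected hub and "U" for a collector that joins two long branches to the rest of the network. Betweenness is computed by brute force from shortest-path counts, which is fine for a toy graph but would be slow on a large one.

```python
from collections import defaultdict, deque
from itertools import combinations

# Hypothetical toy network: hub T has four short spurs; collector U joins two long branches.
EDGES = [
    ("T", "t1"), ("T", "t2"), ("T", "t3"), ("T", "t4"), ("T", "U"),
    ("U", "p1"), ("p1", "p2"), ("p2", "p3"),
    ("U", "q1"), ("q1", "q2"), ("q2", "q3"),
]

def build_adjacency(edges):
    """Turn an undirected edge list into a station -> neighbours mapping."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)

def shortest_path_counts(adjacency, source):
    """BFS returning hop counts and the number of shortest paths from source."""
    hops, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        station = queue.popleft()
        for neighbour in adjacency[station]:
            if neighbour not in hops:
                hops[neighbour] = hops[station] + 1
                paths[neighbour] = 0
                queue.append(neighbour)
            if hops[neighbour] == hops[station] + 1:
                paths[neighbour] += paths[station]
    return hops, paths

def betweenness(adjacency):
    """For each station, sum the share of shortest paths between other pairs that pass through it."""
    info = {s: shortest_path_counts(adjacency, s) for s in adjacency}
    score = {v: 0.0 for v in adjacency}
    for s, t in combinations(sorted(adjacency), 2):
        hops_s, paths_s = info[s]
        hops_t, paths_t = info[t]
        if t not in hops_s:
            continue  # s and t are not connected
        for v in adjacency:
            if v in (s, t) or v not in hops_s or v not in hops_t:
                continue
            # v lies on a shortest s-t path exactly when the two legs add up to the full distance.
            if hops_s[v] + hops_t[v] == hops_s[t]:
                score[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return score

adjacency = build_adjacency(EDGES)
degree = {station: len(neighbours) for station, neighbours in adjacency.items()}
score = betweenness(adjacency)

print(degree["T"], degree["U"])  # 5 3
print(score["T"], score["U"])    # 34.0 39.0
```

Here T has the higher degree (5 against 3), but U has the higher betweenness (39 against 34), because every trip between the two long branches, or between either branch and the rest of the network, has to pass through U.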

Click the button below to access the GitHub repo.

R Code