Weekly Viz 2018-06-25
London Cycle Hire Usage
About Makeover Monday
MakeoverMonday is a social data project: “Each week we post a link to a chart, and its data, and then you rework the chart. Maybe you retell the story more effectively, or find a new story in the data. We’re curious to see the different approaches you all take. Whether it’s a simple bar chart or an elaborate infographic, we encourage everyone of all skills to partake. Together we can have broader conversations about and with data.”
Starting on Jan 08, 2018, I decided to set aside one hour every Monday to create a visualization and find some insights in the data.
The datasets are published each week at: MakeoverMonday Datasets.
Makeover Monday 0625
This week’s topic is Cycle Hire Usage in London in 2017. This is the largest dataset I have ever seen in Makeover Monday… It has over 10 million rows, with start location coordinates, end location coordinates, and time info. Basically, it tells you when a user rented a bike, where the ride started, and where it ended.
My Visualization
Visualizing this type of data is exciting - it has all the longitudes and latitudes, plus a time dimension. So I decided to visualize all the routes.
However, the first problem I encountered was that the dataset is so large that loading it into Tableau takes a long time. As a result, I decided to use R to pre-process the data.
The second challenge was how to plot the routes. These two posts (post 1 and post 2) helped a lot.
The idea is that you put the start point and the end point of each route into two separate rows, with a unique identifier for the route and an order column to indicate which row is the start and which is the end.
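To make the two-rows-per-route layout concrete, here is a minimal sketch on toy data. The station labels, coordinates, and short column names (StartLat, EndLat, and so on) are made up for illustration and are not the columns of the real dataset; the full pre-processing is in the appendix.

```r
library(dplyr)

# Toy data: two hypothetical routes with made-up coordinates.
routes <- tibble(
  route    = c("A -> B", "C -> D"),
  StartLat = c(51.50, 51.52),
  StartLon = c(-0.11, -0.12),
  EndLat   = c(51.53, 51.51),
  EndLon   = c(-0.12, -0.17)
) %>%
  mutate(index = row_number())  # unique identifier for each route

# One row per endpoint: path = 0 marks the start, path = 1 marks the end.
starts <- routes %>%
  transmute(index, route, Lat = StartLat, Lon = StartLon, path = 0)
ends <- routes %>%
  transmute(index, route, Lat = EndLat, Lon = EndLon, path = 1)

# Stack both halves so every route appears as two consecutive rows.
paths <- bind_rows(starts, ends) %>%
  arrange(index, path)

print(paths)
```

In Tableau, a layout like this is typically drawn with the Line mark type, the route identifier on Detail, and the order column on Path, though the exact setup depends on the workbook.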
I segmented the cycling activity into weekdays and weekends, trying to find the typical routes during and outside the workweek.
It took me more than two hours today to make all of this happen. But I learned a lot :)
–
Please note that all the visualizations are designed for desktop view, so it is recommended to view them on a desktop device.
–
Insights
- The median ride is 14 minutes - perfect for commuting.
- Most rides happened during the peak hours (8 AM and 5 PM). We also saw more rides in the summer - probably just the best weather for biking.
- On weekdays, the most popular routes are those within the city, especially in the area from Waterloo Station 3 to King’s Cross.
- On weekends, the most popular routes are those around Hyde Park and the Olympic Park - makes a lot of sense :).
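For readers who want to reproduce numbers like the median ride time and the peak hours, here is a rough sketch of one way to compute them. It runs on a few made-up rides, not on the real data; only the StartDate and EndDate column names are taken from the appendix code, and duration_min and start_hour are names introduced for this illustration.

```r
library(dplyr)
library(lubridate)

# Toy rides with made-up timestamps (not taken from the real dataset).
rides <- tibble(
  StartDate = ymd_hms(c("2017-06-05 08:05:00", "2017-06-05 08:40:00",
                        "2017-06-05 17:10:00", "2017-06-10 13:00:00")),
  EndDate   = ymd_hms(c("2017-06-05 08:17:00", "2017-06-05 08:54:00",
                        "2017-06-05 17:30:00", "2017-06-10 13:45:00"))
)

# Ride duration in minutes and the hour of day in which each ride started.
rides <- rides %>%
  mutate(duration_min = as.numeric(difftime(EndDate, StartDate, units = "mins")),
         start_hour   = hour(StartDate))

# Median ride duration across all rides.
median_duration <- median(rides$duration_min)

# Number of rides per start hour, busiest hour first.
rides_per_hour <- rides %>%
  count(start_hour, sort = TRUE)

print(median_duration)
print(rides_per_hour)
```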
–
Appendix: R Code
library(dplyr)
library(lubridate)

# Load the raw data and parse the timestamps
df = read.csv('TFL Cycle Hire 2017.csv')
df$StartDate = ymd_hms(df$StartDate)
df$EndDate = ymd_hms(df$EndDate)
df$weekday = wday(df$StartDate, label = TRUE)
summary(df)

# Count rides per route, separately for workdays (1) and weekends (0).
# Round trips that start and end at the same station are dropped.
routes = df %>%
  filter(StartStation.Name != EndStation.Name) %>%
  mutate(route = paste(StartStation.Name, '->', EndStation.Name),
         workday = ifelse(weekday == 'Sun' | weekday == 'Sat', 0, 1)) %>%
  group_by(route, StartStation.Lat, StartStation.Lon, EndStation.Lat, EndStation.Lon, workday) %>%
  summarise(count = n()) %>%
  arrange(workday, -count)

# Give each route an index (1 = most popular) within its segment
routes_workday = routes %>% filter(workday == 1)
routes_workday$index = 1:nrow(routes_workday)
routes_weekend = routes %>% filter(workday == 0)
routes_weekend$index = 1:nrow(routes_weekend)

# Workday routes: one row for the start point (path = 0) and one for the end point (path = 1)
routes_workday_start = routes_workday %>%
  ungroup() %>%
  select(route, StartStation.Lat, StartStation.Lon, workday, count, index) %>%
  rename(Lat = StartStation.Lat, Lon = StartStation.Lon) %>%
  mutate(path = 0)
routes_workday_end = routes_workday %>%
  ungroup() %>%
  select(route, EndStation.Lat, EndStation.Lon, workday, count, index) %>%
  rename(Lat = EndStation.Lat, Lon = EndStation.Lon) %>%
  mutate(path = 1)
routes_workday_fin = bind_rows(routes_workday_start, routes_workday_end)

# Weekend routes: same reshaping
routes_weekend_start = routes_weekend %>%
  ungroup() %>%
  select(route, StartStation.Lat, StartStation.Lon, workday, count, index) %>%
  rename(Lat = StartStation.Lat, Lon = StartStation.Lon) %>%
  mutate(path = 0)
routes_weekend_end = routes_weekend %>%
  ungroup() %>%
  select(route, EndStation.Lat, EndStation.Lon, workday, count, index) %>%
  rename(Lat = EndStation.Lat, Lon = EndStation.Lon) %>%
  mutate(path = 1)
routes_weekend_fin = bind_rows(routes_weekend_start, routes_weekend_end)

# Combine both segments and export for Tableau
routes_fin = bind_rows(routes_workday_fin, routes_weekend_fin)
write.csv(routes_fin, 'routes.csv')
Follow this link to find more weekly vizzes :)