Using Python to analyze New York City bike sharing data
The Dataset
Millions of New Yorkers rely on the city’s bike-sharing program, Citi Bike. The city has thousands of bikes that can be rented from any of several hundred docking stations, pictured above.
Because the bikes are unlocked electronically, every ride is logged and stored in a database that is shared with the public monthly. Since the data is public, we can access every Citi Bike ride since 2013 here.
Tens of millions of rides are completed annually, so a programming language like Python is well suited to analyzing data at this scale. Let’s take a look at some cool insights we can find by diving into the data using Python.
Note: if you want to copy and paste parts of the code seen here, visit this link which contains all of the code snippets in a format that can be copied.
Data Analysis
How many rides were taken in January 2019?
Each row of the dataframe represents one ride, so the number of rides is simply the length of the dataframe. Passing the dataframe to Python’s built-in len() function shows there were 967,287 rides in January 2019.
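As a minimal sketch of the row count, the snippet below builds a tiny stand-in dataframe so that it runs on its own. The sample values are made up, and the file name in the comment and the column names are assumptions about the layout of the monthly trip file.

```python
import pandas as pd

# Tiny made-up stand-in for the real trip data (one row per ride).
# With the real file you would load it instead, for example:
#   rides = pd.read_csv("201901-citibike-tripdata.csv")  # file name is an assumption
rides = pd.DataFrame({
    "start station id": [3437, 3437, 3432, 72],  # assumed column name
    "birth year": [1965, 1977, 1992, 1985],      # assumed column name
})

# len() on a dataframe returns its number of rows, i.e. the number of rides.
ride_count = len(rides)
print(f"{ride_count:,} rides")
```

On the full January 2019 file, the same len() call is what produces the ride total quoted above.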
Which station had the oldest riders on average?
To answer this question, we need to combine several pandas functions: groupby(), mean(), and sort_values(). We start by grouping the rides by station ID using groupby(). Next, we take the average of each column for each station with mean(). Lastly, we sort the stations by average birth year with sort_values(), so the station with the earliest average birth year comes first.
When we run this code, we see that station 3437 had the oldest riders. Among the riders who rented a bike from this station, the average birth year was 1971, which makes the average rider about 48 years old in 2019, the year this data is from.
Similarly, the station with the youngest riders was station 3432, where the average rider was born in 1992 (about age 27).
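The three steps can be sketched as follows. This is an illustration, not the article's original snippet. The sample rows are made up (chosen so the toy result mirrors the ordering described here), and the column names "start station id" and "birth year" are assumptions about the trip file.

```python
import pandas as pd

# Made-up sample rides; the real dataframe would come from the monthly trip file.
rides = pd.DataFrame({
    "start station id": [3437, 3437, 3432, 3432, 72, 72],  # assumed column name
    "birth year": [1965, 1977, 1991, 1993, 1980, 1986],    # assumed column name
    "tripduration": [600, 900, 300, 420, 700, 800],
})

station_means = (
    rides.groupby("start station id")  # 1. group the rides by station
    .mean(numeric_only=True)           # 2. average every numeric column per station
    .sort_values("birth year")         # 3. earliest average birth year first
)

# Convert the average birth year into an approximate age in 2019.
station_means["approx age in 2019"] = 2019 - station_means["birth year"]

oldest_station = station_means.index[0]     # smallest average birth year
youngest_station = station_means.index[-1]  # largest average birth year
print(station_means)
```

Because sort_values() sorts in ascending order by default, the first row is the station with the oldest riders and the last row is the station with the youngest.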
Data Investigation
Where is station 3437 located?
Looking at the data, we see that station 3437 is located at the following coordinates: 40.793135, -73.977004.
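One way to pull a station's coordinates out of the ride data is to filter on the station ID and keep the distinct latitude and longitude pair. The sketch below assumes the columns are named "start station latitude" and "start station longitude". The coordinates for station 3437 are the ones quoted above, and the second station's values are placeholders.

```python
import pandas as pd

# Made-up sample rides; the coordinates for station 3432 are placeholders.
rides = pd.DataFrame({
    "start station id": [3437, 3437, 3432],
    "start station latitude": [40.793135, 40.793135, 40.7],      # assumed column name
    "start station longitude": [-73.977004, -73.977004, -73.9],  # assumed column name
})

# Keep only the rides that started at station 3437, then drop repeated rows
# so that a single latitude/longitude pair remains.
station = rides.loc[
    rides["start station id"] == 3437,
    ["start station latitude", "start station longitude"],
].drop_duplicates()

lat, lon = station.iloc[0]
print(f"Station 3437: {lat}, {lon}")
```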
We can enter the coordinates into Google Maps, which shows that this station is located in Manhattan’s Upper West Side. The neighborhood is known for being wealthier and older, which may help explain why riders who rent from this station have the oldest average age of any bike station in NYC.
Lastly, if we use the street view feature on Google Maps, we see our beautiful Citi Bike station!
Recap
In this article, we explored different ways to use Python to analyze large real-world datasets. We hope you learned something new!