
Using Google maps’ Location History to calculate and visualize my own costs of traffic congestion

During a year of employment at the Vrije Universiteit in Amsterdam, I worked on modelling a scheme of tradable road permits to combat peak-hour traffic congestion. Now I work in consulting and have become part of the problem. I drive four times a week to a client located almost 60 km from my apartment and experience the “welfare loss”, as economists call the nuisance of heavy traffic, first hand. But when I was recently looking for a free data set to experiment with, my daily commute turned out to come in handy. Since I use Google maps as my GPS, my location history contains all my trips (and more!) and is just waiting to be used for my own purposes!

Can I use it to calculate the welfare loss I’ve experienced in the last 18 months? Can I use it to calculate the total time I’ve spent in traffic congestion? Can I use it to investigate what impact factors like school holidays or my departure time have on my commuting time? The answer to all this is ‘yes’… or at least ‘to some degree’.

Google maps data basically consists of coordinates, a timestamp, an accuracy measure and an activity variable, so it is easy to find out where you were at a specific point in time and, to some degree, what you were doing. The activity variable is highly unreliable though, especially for my purposes, as slow driving often gets interpreted as a bike ride or even walking. So what I need to do is combine location snapshots into trips, and isolate my car commutes from other trips.

I worked with Python and pandas dataframes and will only display the more critical code here, but feel free to contact me for the entire notebook.

My JSON file contained almost 800,000 rows of raw data. First, I brought the timestamps and coordinates into a format that can be interpreted by the datetime and geodesic modules.

```python
import json
import datetime as dt
import pandas as pd

with open('Location History.json') as f:
    data = json.load(f)

df = pd.DataFrame(data['locations'])
df = df[['accuracy', 'timestampMs', 'latitudeE7', 'longitudeE7']].copy()
df['timestampMs'] = df['timestampMs'].astype(float) / 1000
df['datetime'] = df['timestampMs'].apply(lambda t: dt.datetime.fromtimestamp(t))
df['latitudeE7'] = df['latitudeE7'] / 10**7
df['longitudeE7'] = df['longitudeE7'] / 10**7
```

Now I could already delete everything before 2017, as I did not have a car back then. By splitting the timestamp into a time and a date and applying the datetime.weekday functionality, I could identify weekends and delete those from the data set as well. I then entered the coordinates of my apartment and the client I commute to as reference points. With the geodesic module I could calculate my distance to the client's location for every point in time. Grouping by day, I selected only days on which my minimum distance to the client's office was less than 500 meters. I am 100% certain that I have never passed by there in my spare time, so this attribute immediately identifies a day on which I commuted to work.
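For illustration, here is a minimal sketch of the distance calculation. I use a plain haversine (great-circle) formula as a stand-in for geopy's geodesic, and the coordinates are made up, not my actual reference points:

```python
import math

# Stand-in for geopy's geodesic: great-circle (haversine) distance in km.
# The coordinates below are illustrative, not the actual reference points.
def haversine_km(p1, p2):
    lat1, lon1, lat2, lon2 = map(math.radians, (*p1, *p2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))

amsterdam = (52.37, 4.90)
utrecht = (52.09, 5.12)
dist = haversine_km(amsterdam, utrecht)  # roughly 35 km
```

Applied row-wise to the dataframe, this yields a `distance_client` column that can then be aggregated per day.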

```python
distance_client_by_date = df.groupby('date', as_index=False)['distance_client'].min()
trip_days = distance_client_by_date.loc[distance_client_by_date['distance_client'] < 0.5].copy()
df_trips = df.loc[df['date'].isin(trip_days['date'])].copy()
```

Next, I define two different peaks: a morning peak from 7:00 to 9:30 a.m. and an evening peak from 4:00 to 8:30 p.m. Everything outside those time periods is dropped. In the morning, identifying a trip is easy, as I am either at home or in the car; there are no other possibilities. As soon as I gain distance from my home (>100 m), I am in the car and on my way. As soon as I am reasonably close (<200 m, due to bad reception and therefore lower accuracy) to my client's location, I have arrived. I set a binary indicator for a trip as:
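Tagging each record with its peak period could look like the following sketch; the helper name is hypothetical, but the cut-off times follow the definitions above:

```python
import datetime as dt

# Hypothetical helper: map a time of day to a peak period, or None
# for records outside both peaks (those get dropped later).
def peak_period(t):
    if dt.time(7, 0) <= t <= dt.time(9, 30):
        return 'Morning'
    if dt.time(16, 0) <= t <= dt.time(20, 30):
        return 'Evening'
    return None

print(peak_period(dt.time(8, 15)))   # Morning
print(peak_period(dt.time(17, 45)))  # Evening
print(peak_period(dt.time(12, 0)))   # None
```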

```python
df_trips['on_trip'] = np.where(
    (df_trips['distance_home'] > 0.1)
    & (df_trips['distance_client'] > 0.2)
    & (df_trips['period'] == 'Morning'),
    1, 0)
```

All data points during a morning peak on a given day with these attributes constitute a trip. In the evening, things are a bit more complex, because after I get home, I do not necessarily stay home. I need to identify the earliest point in time after leaving my client's office at which my distance to my apartment is below 100 m. Otherwise, going out again to the shop for oude kaasblokjes could wrongly be included in my commute. I therefore created a flag for when I arrived at home and then selected the earliest (‘minimum’) point in time this happened per evening peak:

```python
df_trips['arrived'] = np.where(
    (df_trips['smth_dist_home'] < 0.1) & (df_trips['period'] == 'Evening'), 1, 0)
df_trips['arrival_time'] = (df_trips.loc[df_trips['arrived'] == 1]
                            .groupby('peak')['time'].transform('min'))
```

With the timedelta functionality I calculated the travel time as the delta between the earliest and the latest point in time within a peak trip. After cleaning out a few odd outliers, I can start with the fun part and investigate about 300 trips. (It is easy to check outliers by comparing them with what Google maps recorded on the same day; it seems that I teleported on some days without even noticing.) I could run some regression models, but pictures speak louder than words.
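The travel-time calculation boils down to a groupby over the peak trips; a minimal sketch with made-up sample data:

```python
import pandas as pd

# Sketch: travel time per (date, period) as the delta between the latest
# and earliest timestamp within a peak trip. The sample data is made up.
records = pd.DataFrame({
    'date': ['2018-05-07'] * 3,
    'period': ['Morning'] * 3,
    'datetime': pd.to_datetime(['2018-05-07 07:35', '2018-05-07 08:02',
                                '2018-05-07 08:31']),
})
trips = records.groupby(['date', 'period'])['datetime'].agg(['min', 'max'])
trips['travel_time_min'] = (trips['max'] - trips['min']).dt.total_seconds() / 60
# this sample trip takes 56 minutes (07:35 to 08:31)
```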

One of the things I was interested in was whether my commuting time is significantly shorter during school holidays. Subjectively, the roads feel a lot emptier, but would the data support my gut feeling? I added a flag for days during official school holidays and plotted the travel times on vacation days vs. non-vacation days in a density plot.
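The vacation flag could be built from a hand-maintained list of holiday ranges; the dates below are illustrative, not the official Dutch school-holiday calendar:

```python
import datetime as dt

# Assumed school-holiday ranges (illustrative dates only)
holidays = [(dt.date(2018, 7, 7), dt.date(2018, 8, 19)),
            (dt.date(2018, 10, 20), dt.date(2018, 10, 28))]

def on_vacation(d):
    # True if the date falls inside any holiday range
    return any(start <= d <= end for start, end in holidays)

print(on_vacation(dt.date(2018, 7, 15)))  # True
print(on_vacation(dt.date(2018, 9, 3)))   # False
```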

The graph supports my gut feeling. Vacation days show a narrower distribution of travel times, with its peak further to the left. I seem to spend longer on the road outside of school holidays. But that does not necessarily mean there is less traffic. Maybe I leave work earlier during the warm days of summer, which happen to coincide with school holidays, thus avoiding the peak congestion by departing during the early fringe of the peak. (If the reader happens to be my employer: this is a strictly hypothetical scenario; of course I work every day until I collapse.)

A density plot shows that my departure time is pretty independent of school holidays.

Does the departure time generally influence my travel time? I would expect a nonlinear effect, with travel times increasing at first, but falling again after the bulk of the peak has passed. I chose regplots, using the departure time as the explanatory variable of the line fit for the dependent variable, the travel time. To check my suspicion of nonlinear effects, we can compare several plots with polynomials of various degrees.
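Under the hood, seaborn's regplot with `order=k` fits a degree-k polynomial, which can be reproduced with numpy.polyfit. A sketch with made-up departure times and travel times:

```python
import numpy as np

# What sns.regplot(order=k) fits under the hood: a degree-k polynomial.
# Departure hours and travel times below are made up for illustration.
departure = np.array([7.0, 7.5, 8.0, 8.5, 9.0, 9.5])
travel = np.array([48.0, 58.0, 65.0, 63.0, 54.0, 46.0])

for k in (1, 2, 3):
    coeffs = np.polyfit(departure, travel, k)  # highest-degree term first
    print(k, coeffs)

# For this hump-shaped sample the quadratic's leading coefficient is
# negative, i.e. the fitted curve is concave.
```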

The concavity of the curve during the bulk of the peak displays the suspected effect in all scenarios, but given the few data points observed for later departure times, this should not be taken too seriously, in case you were planning to. Another question I had: are there visible differences during the week? Wednesday is a typical “Daddy day” here in the Netherlands, where parents take a parental-leave day, and traffic feels more relaxed.

The Wednesday swarm certainly has a heavy bottom (is it okay to use “thicc” in such a context?), indicating more observations with shorter travel times, but more observations would of course help to establish whether the differences are truly significant. Lastly, did I adjust my travel behavior to avoid congestion? After all, economists often expect equilibria to establish over longer time periods. Did I learn from my everyday experience and see shorter travel times now than when I had just started and knew nothing about the behavior of other commuters?

Nope.

No learning effect visible. My experience from driving the same route over and over again does not manifest in any ability to avoid congestion.

So much for the visualization; the transport economist in me is of course also interested in the welfare losses my poor soul had to suffer due to traffic congestion. Based on the minimum travel time of 41 min for the trip and the average trip time of 59 min, I derive that on average 18 min of my trip are extra time due to traffic congestion. Applying a standard value of 15 euros for the so-called “value of time”, a measure used in economics to express time in financial terms for cost-benefit analyses, my welfare loss adds up to roughly 9 euros per day. From the beginning of the project up to now, I made about 300 trips, so my total welfare loss is around 1,350 euros and I spent roughly 5,400 minutes, or 90 hours, or almost 4 days in congestion. Not on the road. But purely on extra travel time due to congestion.

I listen to audio books in the car to make some use of my time, and in “Sapiens” the author explains that Buddhists think all suffering is created by craving. Craving for more pleasant experiences, or craving to escape unpleasant experiences. The hidden Buddhist in me should therefore refrain from thinking about whether 4 days lost in traffic congestion is too much, as it might lead to craving for less congestion and ultimately my own suffering. Buddhists are rarely good transport economists.
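The arithmetic above can be checked in a few lines. The figures come from the text; the assumption that one working day contains two trips (out and back) is mine:

```python
# Back-of-the-envelope check of the welfare-loss figures (values from the text)
FREE_FLOW_MIN = 41    # fastest observed trip, minutes
AVG_TRIP_MIN = 59     # average trip, minutes
VALUE_OF_TIME = 15    # euro per hour
TRIPS = 300           # total one-way commuting trips

delay_per_trip = AVG_TRIP_MIN - FREE_FLOW_MIN             # 18 minutes of congestion
loss_per_day = 2 * delay_per_trip / 60 * VALUE_OF_TIME    # two trips a day: 9 euro
total_loss = TRIPS * delay_per_trip / 60 * VALUE_OF_TIME  # 1350 euro overall
total_delay_h = TRIPS * delay_per_trip / 60               # 90 hours in congestion
```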