import pandas
import numpy
import geopandas
import matplotlib.pyplot as plt
import geopy
%matplotlib inline

Read in the Austin 311 data

It is in the data folder, in a file called austin_311.csv.gz.

reports = pandas.read_csv('../data/austin_311.csv.gz')

reports.head()

	request_number	type_code	description	department	method_received	status	location	street_number	street_name	zipcode	latitude	longitude
0	16-00108244	TRASIGMA	Traffic Signal - Maintenance	Transportation	Phone	Duplicate (closed)	6001 MANCHACA RD, AUSTIN, TX 78745	6001	MANCHACA	78745.0	30.212695	-97.801522
1	16-00108269	TRASIGMA	Traffic Signal - Maintenance	Transportation	Phone	Duplicate (closed)	6001 MANCHACA RD, AUSTIN, TX 78745	6001	MANCHACA	78745.0	30.212695	-97.801522
2	16-00324071	SWSDEADA	ARR Dead Animal Collection	Austin Resource Recovery	Phone	Closed	2200 E OLTORF ST, AUSTIN, TX 78741	2200	OLTORF	78741.0	30.230164	-97.731776
3	16-00108062	TRASIGMA	Traffic Signal - Maintenance	Transportation	Phone	Duplicate (closed)	8401 N CAPITAL OF TEXAS HWY NB, AUSTIN, TX 78759	8401	CAPITAL OF TEXAS	78759.0	30.384989	-97.766471
4	16-00107654	STREETL2	Street Light Issue- Address	Austin Energy Department	Phone	Closed	300 WEST AVE, AUSTIN, TX 78703	300	WEST	78703.0	30.268090	-97.751739

What are the 10 most common types of events in the data?

reports.groupby('type_code').type_code.count().sort_values(ascending=False).head(10)

type_code
CODECOMP    121953
ACLONAG      35776
SWSRECYC     32610
ACINFORM     30093
SWSDEADA     28561
STREETL2     24070
SWSYARDT     22930
COAACINJ     19681
HHSGRAFF     19406
WWREPORT     15192
Name: type_code, dtype: int64
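
The same ranking can also be obtained with `value_counts`, which counts each distinct value and sorts the result in descending order. A minimal sketch on a toy column (the `toy` frame is made up for illustration and is not the 311 data):

```python
import pandas

# Toy stand-in for the `type_code` column of the reports table.
toy = pandas.DataFrame({'type_code': ['CODECOMP', 'ACLONAG', 'CODECOMP',
                                      'SWSRECYC', 'CODECOMP', 'ACLONAG']})

# value_counts() tallies each distinct code, most frequent first.
top_types = toy.type_code.value_counts().head(2)
print(top_types)
```

On the real data, `reports.type_code.value_counts().head(10)` should give the same ten codes as the groupby version.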

Fixing missing latitude/longitude values

How many records have a missing latitude/longitude pair?
Using the location field and the geocoding tools we discussed before, create latitude/longitude pairs for the records with missing latitude and longitude values.
Update your dataframe with the new geocoded longitude and latitude values.
Show that there are no missing latitude/longitude values in the updated data.

missing_coordinates = reports.latitude.isnull() | reports.longitude.isnull()

missing_coordinates.sum()

import geopy

coder = geopy.Nominatim(user_agent='scipy2019-intermediate-gds')

def latlng(address):
    coded = coder.geocode(address)
    return coded.latitude, coded.longitude
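
A geocoder returns `None` when it cannot match an address, in which case `latlng` above would raise an `AttributeError`. The sketch below is one hedged way to guard against that and to pause between requests. The names `safe_latlng`, `pause`, and `fake_geocode` are hypothetical helpers introduced here; `fake_geocode` is an offline stand-in so the example runs without network access.

```python
import math
import time
from types import SimpleNamespace

def safe_latlng(address, geocode, pause=1.0):
    """Return (latitude, longitude), or (nan, nan) if nothing matched.

    `geocode` is any callable that behaves like `coder.geocode`.
    """
    coded = geocode(address)
    time.sleep(pause)  # wait between requests to stay within usage limits
    if coded is None:
        return math.nan, math.nan
    return coded.latitude, coded.longitude

# Hypothetical offline stand-in for `coder.geocode`, for demonstration only.
def fake_geocode(address):
    known = {'300 WEST AVE, AUSTIN, TX 78703':
             SimpleNamespace(latitude=30.268090, longitude=-97.751739)}
    return known.get(address)

found = safe_latlng('300 WEST AVE, AUSTIN, TX 78703', fake_geocode, pause=0)
missing = safe_latlng('NOT A REAL ADDRESS', fake_geocode, pause=0)
print(found, missing)
```

With the real geocoder, the call would look like `safe_latlng(address, coder.geocode)`.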

reports[missing_coordinates].location

224035           10008 DORSET DR, AUSTIN, TX
259989    2400 E OLTORF ST, AUSTIN, TX 78741
340148       506 ZENNIA ST, AUSTIN, TX 78751
378492       2006 S 6TH ST, AUSTIN, TX 78704
486128    1414 WESTOVER RD, AUSTIN, TX 78703
Name: location, dtype: object

locations = reports[missing_coordinates].location.apply(latlng).apply(pandas.Series)

reports.loc[missing_coordinates, 'latitude'] = locations[0]
reports.loc[missing_coordinates, 'longitude'] = locations[1]

reports.latitude.isnull().any() | reports.longitude.isnull().any()

False

Fixing missing addresses

How many records are missing location field entries?
Using the latitude and longitude fields, find the street locations for the records with missing location values.
Update your dataframe with the new location values. (BONUS: Update the street_number, street_name, and zipcode if you can, too.)
Show that there are no more missing location values in your data.

missing_locations = reports.location.isnull()

missing_locations.sum()

def reverse(coordinate):
    return coder.reverse(coordinate).address

locations = reports.loc[missing_locations,['latitude','longitude']].apply(reverse, axis=1)
locations

23659     West Gate Boulevard, Pheasant Run, Austin, Tra...
73195     11509, January Drive, Walnut Ridge, Austin, Tr...
142636    800, Gullett Street, Govalle, Austin, Travis C...
180796    6205, Shoal Creek Boulevard, Allandale, Austin...
214871    3300, Burleson Road, Parker Lane, Austin, Trav...
267054    10400, Charette Cove, Prominent Point, Jollyvi...
366275    9725, Spanish Wells Drive, Jollyville, Austin,...
421322    3902, Carmel Drive, MLK, Austin, Travis County...
441041    1528, Payton Falls Drive, Four Seasons, Austin...
476115    1615, Rutherford Lane, Berkley Square - Headwa...
491802    3930, Bee Caves Road, Ledgeway, West Lake Hill...
536128    6913, Wentworth Drive, Loma Vista, Austin, Tra...
562261    600, North Marly Way, Austin Lake Hills, Travi...
dtype: object

reports.loc[missing_locations, ['location']] = locations

def maybe_get_first_number(splitstring):
    try:
        return int(splitstring[0])
    except ValueError:
        return numpy.nan

street_numbers = locations.str.split(',').apply(maybe_get_first_number)

def maybe_get_streetname(splitstring):
    try:
        int(splitstring[0])
        return splitstring[1]
    except ValueError:
        return splitstring[0]

street_names = locations.str.split(',').apply(maybe_get_streetname)

street_names

23659        West Gate Boulevard
73195              January Drive
142636            Gullett Street
180796     Shoal Creek Boulevard
214871             Burleson Road
267054             Charette Cove
366275       Spanish Wells Drive
421322              Carmel Drive
441041        Payton Falls Drive
476115           Rutherford Lane
491802            Bee Caves Road
536128           Wentworth Drive
562261           North Marly Way
dtype: object

zipcodes = locations.str.split(',').apply(lambda split: split[-2])

zipcodes

23659            78748
73195            78753
142636           78702
180796           78757
214871           78741
267054           78759
366275           78717
421322           78721
441041           78754
476115     78753:78754
491802           78746
536128           78724
562261           78733
dtype: object

reports.loc[missing_locations, ['street_number']] = street_numbers
reports.loc[missing_locations, ['street_name']] = street_names
reports.loc[missing_locations, ['zipcode']] = zipcodes
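
The extracted zipcodes are strings, they keep the space that followed the comma, and one of them holds two codes (`78753:78754`), while the original `zipcode` column is numeric. A minimal sketch of one way to normalize them, assuming it is acceptable to keep only the first five-digit code (the helper name `first_zipcode` is hypothetical):

```python
import math
import re

def first_zipcode(raw):
    """Return the first five-digit run in `raw` as a float, or nan if none."""
    match = re.search(r'\d{5}', raw)
    return float(match.group()) if match else math.nan

print(first_zipcode(' 78748'))        # plain zipcode with a leading space
print(first_zipcode(' 78753:78754'))  # two codes; the first one is kept
```

Applied to the series above, this would be `zipcodes.apply(first_zipcode)` before writing the values back into `reports`.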

reports.location.isnull().any()

False

Check for miscoded latitude/longitude values.

Use the total bounds of the Austin neighborhoods data to identify observations that may be miscoded as outside of Austin.

neighborhoods = geopandas.read_file('../data/neighborhoods.gpkg')

known_bounds = neighborhoods.total_bounds
known_bounds

array([-98.071453,  30.068439, -97.541566,  30.521356])

too_far_ns = (reports.latitude < known_bounds[1]) | (reports.latitude > known_bounds[3])
too_far_we = (reports.longitude < known_bounds[0]) | (reports.longitude > known_bounds[2])
outside = too_far_ns | too_far_we

outside.sum()

reports = reports[~outside]
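
The same bounding-box test can be written from the inside out with `Series.between`, which is inclusive at both ends by default. A minimal sketch on toy coordinates, using the bounds printed above (the `toy` frame is made up for illustration):

```python
import pandas

# total_bounds is ordered (min x, min y, max x, max y), i.e. west, south, east, north.
west, south, east, north = -98.071453, 30.068439, -97.541566, 30.521356

toy = pandas.DataFrame({'latitude': [30.212695, 0.0, 30.268090],
                        'longitude': [-97.801522, 0.0, -120.0]})

# A point is inside only if both coordinates fall within the box.
inside = (toy.latitude.between(south, north)
          & toy.longitude.between(west, east))
kept = toy[inside]
print(kept)
```

One difference worth knowing: `between` is `False` for missing values, so rows with missing coordinates would be dropped here, whereas the `outside` mask above keeps them.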

Remove duplicate reports

311 data is very dirty. Let’s keep only tickets whose status suggests they’re reports with full information that are not duplicated.

reports.status.unique()

array(['Duplicate (closed)', 'Closed', 'Open', 'New', 'Resolved',
       'Closed -Incomplete Information', 'Duplicate (open)',
       'Work In Progress', 'Transferred', 'TO BE DELETED', 'Pending',
       'CancelledTesting', 'Closed -Incomplete', 'Incomplete'],
      dtype=object)

to_drop = ('Duplicate (closed)', 'Closed -Incomplete Information', 'Duplicate (open)', 
           'TO BE DELETED', 'CancelledTesting', 'Closed -Incomplete', 'Incomplete')
reports = reports.query('status not in @to_drop')

reports.status.unique()

array(['Closed', 'Open', 'New', 'Resolved', 'Work In Progress',
       'Transferred', 'Pending'], dtype=object)
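
An equivalent filter can be written as a boolean mask with `isin`, which some readers find easier to combine with other conditions. A minimal sketch on a toy status column (the `toy` frame is made up for illustration):

```python
import pandas

to_drop = ('Duplicate (closed)', 'Closed -Incomplete Information', 'Duplicate (open)',
           'TO BE DELETED', 'CancelledTesting', 'Closed -Incomplete', 'Incomplete')

toy = pandas.DataFrame({'status': ['Closed', 'Duplicate (open)', 'Open', 'Incomplete']})

# Keep the rows whose status is not in the drop list.
kept = toy[~toy.status.isin(to_drop)]
print(kept.status.tolist())
```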

Build a GeoDataFrame from the locations

reports = geopandas.GeoDataFrame(reports, geometry=geopandas.points_from_xy(reports.longitude, 
                                                                            reports.latitude))

Make a map of the report instances with a basemap

import contextily

reports.crs = {'init':'epsg:4326'}
reports = reports.to_crs(epsg=3857)
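
EPSG:3857 is the spherical Web Mercator projection used by most web tile basemaps, with coordinates in meters rather than degrees. As a rough sketch of what the reprojection does to a single point, the standard spherical Mercator formulas can be written out by hand (the function name `to_web_mercator` is hypothetical; in practice `to_crs` does this for you):

```python
import math

EARTH_RADIUS = 6378137.0  # sphere radius in meters used by Web Mercator

def to_web_mercator(longitude, latitude):
    """Project a longitude/latitude pair in degrees to Web Mercator meters."""
    x = EARTH_RADIUS * math.radians(longitude)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(latitude) / 2))
    return x, y

# One of the report locations from the table at the top of the notebook.
x, y = to_web_mercator(-97.751739, 30.268090)
print(x, y)
```

Distances measured in this projection are stretched away from the equator, so a buffer of 1000 units corresponds to somewhat less than 1 km on the ground at Austin's latitude.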

basemap, basemap_extent = contextily.bounds2img(*reports.total_bounds, zoom=10, ll=False)

plt.figure(figsize=(15,15))
plt.imshow(basemap, extent=basemap_extent)
reports.plot(ax=plt.gca(), marker='.', markersize=1, alpha=.25)

<matplotlib.axes._subplots.AxesSubplot at 0x7f7f6dc6dd30>

[Figure: 311 report locations plotted over the basemap]

reports.shape

(555497, 13)

How many incidents with the Public Health department are within each neighborhood?

reports.department.unique()

array(['Austin Resource Recovery', 'Austin Energy Department',
       'Transportation', 'Animal Services Office',
       'Austin Code Department', 'Parks & Recreation Department',
       'Economic Development Department', 'Austin Water Utility',
       'Public Works', 'Health & Human Services', 'Watershed Protection',
       'Austin Water', 'Public Health',
       'Neighborhood Housing & Community Development',
       'Austin Fire Department', 'Neighborhood Housing & Community',
       'Office of Emergency Management'], dtype=object)

health = reports.query('department == "Public Health" '
              'or department == "Health & Human Services"')

plt.figure(figsize=(15,15))
plt.imshow(basemap, extent=basemap_extent)
health.plot(ax=plt.gca(), marker='.', markersize=1, alpha=.25)
plt.axis(health.total_bounds[[0,2,1,3]])

array([-10916755.97016249, -10858605.25412386,   3512792.57783966,
         3570235.76893318])

[Figure: health department reports plotted over the basemap]

neighborhoods = neighborhoods.to_crs(epsg=3857)

hood_counts = geopandas.sjoin(neighborhoods, health, op='contains')\
                       .groupby('hood_id').index_right.count()
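# Align counts on hood_id; neighborhoods with no matching report get 0.
neighborhoods['health_incidents'] = neighborhoods['hood_id'].map(hood_counts).fillna(0).astype(int)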
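
Writing grouped counts back by position only works if every group appears in the join and the rows are in the same order as the group keys. A minimal sketch, on toy data, of aligning the counts on the key instead (the frames `hoods` and `matches` and the column `n_matches` are made up for illustration):

```python
import pandas

# Three areas, deliberately not sorted by their id.
hoods = pandas.DataFrame({'hood_id': [3, 1, 2]})
# Join results: area 1 matched twice, area 3 once, area 2 never.
matches = pandas.DataFrame({'hood_id': [1, 1, 3]})

counts = matches.groupby('hood_id').size()

# map() looks each id up in `counts`; ids with no match become 0.
hoods['n_matches'] = hoods.hood_id.map(counts).fillna(0).astype(int)
print(hoods)
```

Here `counts.values` has only two entries for three rows, so positional assignment would fail, while the keyed lookup gives 1, 2, and 0.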

plt.figure(figsize=(15,15))
plt.imshow(basemap, extent=basemap_extent)
neighborhoods.plot('health_incidents', ax=plt.gca(), 
                   cmap='plasma', alpha=.5)
plt.axis(health.total_bounds[[0,2,1,3]])

array([-10916755.97016249, -10858605.25412386,   3512792.57783966,
         3570235.76893318])

[Figure: neighborhoods shaded by health_incidents over the basemap]

How many public health events are within 1 km of each Airbnb downtown?

listings = geopandas.read_file('../data/listings.gpkg')
listings = listings.to_crs(epsg=3857)
downtown_hoods = ('Downtown', 'East Downtown')
downtown_listings = listings.query('hood in @downtown_hoods').sort_values('id')
downtown_listings['buffer'] = downtown_listings.buffer(1000)
within_each_buffer = geopandas.sjoin(downtown_listings.set_geometry('buffer'), 
                                     health, op='contains')

event_counts = within_each_buffer.groupby('id').request_number.count()

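# Align counts on listing id; listings with no event inside their buffer get 0.
downtown_listings['event_counts'] = downtown_listings['id'].map(event_counts).fillna(0).astype(int)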
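
Counting points within a fixed radius can also be done without buffers by querying a KD-tree. This is an alternative technique, not the buffer-and-join approach used above. A minimal sketch on toy coordinates in meters (the arrays `events` and `homes` are made up for illustration):

```python
import numpy
from scipy.spatial import cKDTree

# Toy projected coordinates in meters.
events = numpy.array([[0., 0.], [500., 0.], [0., 900.], [5000., 5000.]])
homes = numpy.array([[0., 0.], [4800., 5000.]])

event_tree = cKDTree(events)

# For each home, collect the indices of all events within 1000 m.
neighbors = event_tree.query_ball_point(homes, r=1000)
counts = numpy.array([len(found) for found in neighbors])
print(counts)
```

For point data this avoids constructing polygons, which can matter when there are many listings.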

downtown_listings.sort_values('event_counts', ascending=False).head(5)[['id', 'name', 'event_counts']]

	id	name	event_counts
9950	30853001	Hip, Trendy Eastside Suite - 5min from Downtown	1159
6294	20971212	Awesome Eastside Rental	1158
9084	28463789	Inn Cahoots: 3 combined units, 39 beds on 6th St	1135
10991	32833881	Airy 1BR in East Austin by Sonder	1130
10494	32203710	East Austin Loft	1127

downtownmap, downtownmap_extent = contextily.bounds2img(*downtown_listings.buffer(1000).total_bounds, 
                                                        zoom=13, ll=False)

plt.figure(figsize=(10,10))
plt.imshow(downtownmap, extent=downtownmap_extent)
listings.plot(color='k', marker='.', markersize=5, ax=plt.gca())
downtown_listings.plot('event_counts', ax=plt.gca())
plt.axis(downtown_listings.buffer(1000).total_bounds[[0,2,1,3]])
plt.title('311 Public Health incidents', fontsize=20)

Text(0.5, 1.0, '311 Public Health incidents')

[Figure: downtown Airbnb listings colored by event_counts, titled "311 Public Health incidents"]

Which event type is closest to each Airbnb in Austin?

from pysal.lib.weights.distance import get_points_array
from scipy.spatial import cKDTree

report_coordinates = get_points_array(reports.geometry)
airbnb_coordinates = get_points_array(listings.geometry)

report_kdt = cKDTree(report_coordinates)

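# The listings are not points in the report tree, so the first neighbor
# returned for each listing is already its nearest report.
distances, indices = report_kdt.query(airbnb_coordinates, k=1)

listings['nearest_type'] = reports.iloc[indices]['description'].values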
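
A toy sketch of how a KD-tree lookup behaves when the query points come from a different dataset than the tree: the first neighbor returned is the nearest tree point, and asking for `k=2` would add the second-nearest in a second column. Skipping the first column is only needed when querying a tree with its own points, where each point's nearest neighbor is itself. The arrays below are made up for illustration.

```python
import numpy
from scipy.spatial import cKDTree

report_xy = numpy.array([[0., 0.], [10., 0.], [0., 10.]])
report_types = numpy.array(['Pothole Repair', 'Loose Dog', 'Graffiti Abatement'])
listing_xy = numpy.array([[1., 1.], [9., 2.]])

tree = cKDTree(report_xy)

# With k=1 the result is one distance and one index per query point.
distances, indices = tree.query(listing_xy, k=1)
nearest = report_types[indices]
print(nearest)
```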

listings.groupby('nearest_type').id.count().sort_values(ascending=False).head(20)

nearest_type
Austin Code - Request Code Officer              2165
ARR Missed Recycling                             758
Street Light Issue- Address                      584
Animal Control - Assistance Request              562
ARR Dead Animal Collection                       523
Loose Dog                                        517
Injured / Sick Animal                            480
Water Waste Report                               336
Pothole Repair                                   282
Graffiti Abatement                               276
ARR Missed Yard Trimmings /Organics              256
ARR Brush and Bulk                               221
Traffic Signal - Dig Tess Request                218
Austin Code - Short Term Rental Complaint SR     217
Found Animal Report - Keep                       212
ARR Missed Yard Trimmings/Compost                212
Loud Commercial Music                            178
Wildlife Exposure                                173
Public Health - Graffiti Abatement               156
Animal Bite                                      150
Name: id, dtype: int64