{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploratory Analysis of Spatial Data: Spatial Autocorrelation #\n", "\n", "\n", "In this notebook we introduce methods of _exploratory spatial data analysis_\n", "that are intended to complement geovisualization through formal univariate and\n", "multivariate statistical tests for spatial clustering.\n", "\n", "\n", "## Imports" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2025-06-29T19:45:05.685918Z", "iopub.status.busy": "2025-06-29T19:45:05.685755Z", "iopub.status.idle": "2025-06-29T19:45:06.978825Z", "shell.execute_reply": "2025-06-29T19:45:06.978572Z", "shell.execute_reply.started": "2025-06-29T19:45:05.685896Z" } }, "outputs": [], "source": [ "import warnings\n", "\n", "import geopandas as gpd\n", "import libpysal as lps\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "from geopandas import GeoDataFrame\n", "from shapely.geometry import Point\n", "\n", "import esda" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our data set comes from the Berlin Airbnb scrape taken in April 2018. This dataframe was constructed as part of the [GeoPython 2018 workshop](https://github.com/ljwolf/geopython) by [Levi Wolf](https://ljwolf.org) and [Serge Rey](https://sergerey.org). 
As part of the workshop, a GeoPandas data frame was constructed in which one column reports the median listing price of units in each Berlin neighborhood:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2025-06-29T19:45:06.979309Z", "iopub.status.busy": "2025-06-29T19:45:06.979177Z", "iopub.status.idle": "2025-06-29T19:45:07.034037Z", "shell.execute_reply": "2025-06-29T19:45:07.033763Z", "shell.execute_reply.started": "2025-06-29T19:45:06.979300Z" } }, "outputs": [], "source": [ "gdf = gpd.read_file(\"data/berlin-neighbourhoods.geojson\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2025-06-29T19:45:07.035156Z", "iopub.status.busy": "2025-06-29T19:45:07.035051Z", "iopub.status.idle": "2025-06-29T19:45:07.555111Z", "shell.execute_reply": "2025-06-29T19:45:07.554867Z", "shell.execute_reply.started": "2025-06-29T19:45:07.035149Z" } }, "outputs": [], "source": [ "bl_df = pd.read_csv(\"data/berlin-listings.csv\")\n", "# Build a point geometry for each listing from its longitude/latitude (EPSG:4326)\n", "geometry = [Point(xy) for xy in zip(bl_df.longitude, bl_df.latitude, strict=True)]\n", "crs = 4326\n", "bl_gdf = GeoDataFrame(bl_df, crs=crs, geometry=geometry)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2025-06-29T19:45:07.555486Z", "iopub.status.busy": "2025-06-29T19:45:07.555413Z", "iopub.status.idle": "2025-06-29T19:45:07.604152Z", "shell.execute_reply": "2025-06-29T19:45:07.603946Z", "shell.execute_reply.started": "2025-06-29T19:45:07.555477Z" } }, "outputs": [ { "data": { "text/plain": [ "neighbourhood_group\n", "Charlottenburg-Wilm. 
58.556408\n", "Friedrichshain-Kreuzberg 55.492809\n", "Lichtenberg 44.584270\n", "Marzahn - Hellersdorf 54.246754\n", "Mitte 60.387890\n", "Neukölln 45.135948\n", "Pankow 60.282516\n", "Reinickendorf 43.682465\n", "Spandau 48.236561\n", "Steglitz - Zehlendorf 54.445683\n", "Tempelhof - Schöneberg 53.704407\n", "Treptow - Köpenick 51.222004\n", "Name: price, dtype: float32" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bl_gdf[\"price\"] = bl_gdf[\"price\"].astype(\"float32\")\n", "# Attach each listing to the neighbourhood polygon it intersects\n", "sj_gdf = gpd.sjoin(\n", "    gdf, bl_gdf, how=\"inner\", predicate=\"intersects\", lsuffix=\"left\", rsuffix=\"right\"\n", ")\n", "# Note: despite the variable name, .mean() yields the mean price per neighbourhood group\n", "median_price_gb = sj_gdf[\"price\"].groupby([sj_gdf[\"neighbourhood_group\"]]).mean()\n", "median_price_gb" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2025-06-29T19:45:07.604627Z", "iopub.status.busy": "2025-06-29T19:45:07.604538Z", "iopub.status.idle": "2025-06-29T19:45:07.611243Z", "shell.execute_reply": "2025-06-29T19:45:07.611016Z", "shell.execute_reply.started": "2025-06-29T19:45:07.604618Z" } }, "outputs": [ { "data": { "text/html": [ "
\n", "| | neighbourhood | neighbourhood_group | geometry | median_pri |\n",
"|---|---|---|---|---|\n",
"| 0 | Blankenfelde/Niederschönhausen | Pankow | MULTIPOLYGON (((13.41191 52.61487, 13.41183 52... | 60.282516 |\n",
"| 1 | Helmholtzplatz | Pankow | MULTIPOLYGON (((13.41405 52.54929, 13.41422 52... | 60.282516 |\n",
"| 2 | Wiesbadener Straße | Charlottenburg-Wilm. | MULTIPOLYGON (((13.30748 52.46788, 13.30743 52... | 58.556408 |\n",
"| 3 | Schmöckwitz/Karolinenhof/Rauchfangswerder | Treptow - Köpenick | MULTIPOLYGON (((13.70973 52.3963, 13.70926 52.... | 51.222004 |\n",
"| 4 | Müggelheim | Treptow - Köpenick | MULTIPOLYGON (((13.73762 52.4085, 13.73773 52.... | 51.222004 |\n",
"| 5 | Biesdorf | Marzahn - Hellersdorf | MULTIPOLYGON (((13.56643 52.5351, 13.56697 52.... | 54.246754 |\n",
"| 6 | Nord 1 | Reinickendorf | MULTIPOLYGON (((13.33669 52.62265, 13.33663 52... | 43.682465 |\n",
"| 7 | West 5 | Reinickendorf | MULTIPOLYGON (((13.28138 52.59958, 13.28158 52... | 43.682465 |\n",
"| 8 | Frankfurter Allee Nord | Friedrichshain-Kreuzberg | MULTIPOLYGON (((13.4532 52.51682, 13.45321 52.... | 55.492809 |\n",
"| 9 | Buch | Pankow | MULTIPOLYGON (((13.4645 52.65055, 13.46457 52.... | 60.282516 |\n",
"| 10 | Kaulsdorf | Marzahn - Hellersdorf | MULTIPOLYGON (((13.62135 52.52704, 13.62196 52... | 54.246754 |\n",
"| 11 | None | None | MULTIPOLYGON (((13.61659 52.58154, 13.61458 52... | NaN |\n",
"| 12 | None | None | MULTIPOLYGON (((13.61668 52.57868, 13.60703 52... | NaN |\n",
"| 13 | nördliche Luisenstadt | Friedrichshain-Kreuzberg | MULTIPOLYGON (((13.4443 52.50066, 13.44266 52.... | 55.492809 |\n",
"| 14 | Nord 2 | Reinickendorf | MULTIPOLYGON (((13.3068 52.58606, 13.30667 52.... | 43.682465 |\n", "