{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Cluster points and explore boundary *blurriness* with A-DBSCAN\n", "\n", "In this example, we illustrate how to use A-DBSCAN ([Arribas-Bel et al., 2019](https://www.sciencedirect.com/science/article/abs/pii/S0094119019300944)) with a sample of Airbnb properties in Berlin. A-DBSCAN will allow us to do two things:\n", "\n", "- Identify high-density clusters of Airbnb properties and delineate their boundaries\n", "- Explore the stability of those boundaries\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2025-06-29T18:55:42.014235Z", "iopub.status.busy": "2025-06-29T18:55:42.013979Z", "iopub.status.idle": "2025-06-29T18:55:43.618254Z", "shell.execute_reply": "2025-06-29T18:55:43.618002Z", "shell.execute_reply.started": "2025-06-29T18:55:42.014207Z" } }, "outputs": [], "source": [ "import contextily as cx\n", "import geopandas\n", "import matplotlib.pyplot as plt\n", "import numba\n", "import numpy as np\n", "import pandas\n", "from libpysal.cg.alpha_shapes import alpha_shape_auto\n", "from shapely import Polygon\n", "\n", "from esda.adbscan import ADBSCAN, get_cluster_boundary, remap_lbls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data\n", "\n", "We will be using the Berlin extract from [Inside Airbnb](http://insideairbnb.com/). This is the same dataset used in the [SciPy 2018 tutorial on Geospatial data analysis with Python](https://github.com/geopandas/scipy2018-geospatial-data)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2025-06-29T18:55:43.618841Z", "iopub.status.busy": "2025-06-29T18:55:43.618664Z", "iopub.status.idle": "2025-06-29T18:55:44.091014Z", "shell.execute_reply": "2025-06-29T18:55:44.090801Z", "shell.execute_reply.started": "2025-06-29T18:55:43.618832Z" } }, "outputs": [ { "data": { "text/html": [ "
|  | Unnamed: 0 | id | listing_url | scrape_id | last_scraped | name | summary | space | description | experiences_offered | ... | review_scores_value | requires_license | license | jurisdiction_names | instant_bookable | cancellation_policy | require_guest_profile_picture | require_guest_phone_verification | calculated_host_listings_count | reviews_per_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 17260587 | https://www.airbnb.com/rooms/17260587 | 20170507222235 | 2017-05-08 | Kunterbuntes Zimmer mit eigenem Bad für jedermann | Meine Unterkunft ist gut für paare, alleinreis... | NaN | Meine Unterkunft ist gut für paare, alleinreis... | none | ... | 10.0 | f | NaN | NaN | t | flexible | f | f | 3 | 2.00 |
| 1 | 1 | 17227881 | https://www.airbnb.com/rooms/17227881 | 20170507222235 | 2017-05-08 | Modernes Zimmer in Berlin Pankow | Es ist ein schönes gepflegtes und modernes Zim... | Das Haus befindet sich direkt vor eine Tram Ha... | Es ist ein schönes gepflegtes und modernes Zim... | none | ... | 10.0 | f | NaN | NaN | t | flexible | f | f | 1 | 1.29 |
2 rows × 96 columns
\n", "ADBSCAN(eps=500, keep_solus=True, min_samples=20, pct_exact=0.5, reps=10)
| Parameter | Value |
|---|---|
| eps | 500 |
| min_samples | 20 |
| algorithm | 'auto' |
| n_jobs | 1 |
| pct_exact | 0.5 |
| reps | 10 |
| keep_solus | True |
| pct_thr | 0.9 |
|  | cluster_id | geometry | rep |
|---|---|---|---|
| 0 | 0 | LINESTRING (200621.527 168784.817, 200463.72 1... | rep-00 |
| 1 | 1 | LINESTRING (202254.25 171209.23, 202196.004 17... | rep-00 |
| 2 | 2 | LINESTRING (197037.991 172632.547, 196828.409 ... | rep-00 |
| 3 | 3 | LINESTRING (194118.515 169319.718, 193986.09 1... | rep-00 |
| 4 | 4 | LINESTRING (196753.952 171244.854, 196482.217 ... | rep-00 |
| ... | ... | ... | ... |
| 2 | 2 | LINESTRING (196610.634 176487.827, 196591.6 17... | rep-09 |
| 3 | 4 | LINESTRING (195517.485 168999.87, 195258.086 1... | rep-09 |
| 4 | 5 | LINESTRING (194609.924 176051.122, 194202.994 ... | rep-09 |
| 5 | 6 | LINESTRING (193340.736 174061.951, 193257.01 1... | rep-09 |
| 6 | 8 | LINESTRING (193714.183 168555.181, 193550.932 ... | rep-09 |
72 rows × 3 columns
\n", "