{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Datasets for use with libpysal\n", "As of version 4.2, libpysal has refactored the `examples` package to:\n", "\n", "- reduce the size of the source installation\n", "- allow the use of remote datasets from the [Center for Spatial Data Science at the University of Chicago](https://spatial.uchicago.edu/), and other remotes\n", "\n", "This notebook highlights the new functionality." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Backwards compatibility is maintained" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are familiar with previous versions of libpysal, note that the newest version maintains backwards compatibility, so any code that relied on the previous API should still work.\n", "\n", "For example:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from libpysal.examples import get_path" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'/home/serge/para/1_projects/code-pysal-libpysal/libpysal/libpysal/examples/mexico/mexicojoin.dbf'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_path(\"mexicojoin.dbf\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the path to this file lies within the installed source distribution. Such an example dataset is now referred to as a `builtin` dataset." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import libpysal\n", "\n", "dbf = libpysal.io.open(get_path(\"mexicojoin.dbf\"))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['POLY_ID',\n", " 'AREA',\n", " 'CODE',\n", " 'NAME',\n", " 'PERIMETER',\n", " 'ACRES',\n", " 'HECTARES',\n", " 'PCGDP1940',\n", " 'PCGDP1950',\n", " 'PCGDP1960',\n", " 'PCGDP1970',\n", " 'PCGDP1980',\n", " 'PCGDP1990',\n", " 'PCGDP2000',\n", " 'HANSON03',\n", " 'HANSON98',\n", " 'ESQUIVEL99',\n", " 'INEGI',\n", " 'INEGI2',\n", " 'MAXP',\n", " 'GR4000',\n", " 'GR5000',\n", " 'GR6000',\n", " 'GR7000',\n", " 'GR8000',\n", " 'GR9000',\n", " 'LPCGDP40',\n", " 'LPCGDP50',\n", " 'LPCGDP60',\n", " 'LPCGDP70',\n", " 'LPCGDP80',\n", " 'LPCGDP90',\n", " 'LPCGDP00',\n", " 'TEST']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dbf.header" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `available` function is still provided, but it now returns a pandas DataFrame. It reports every dataset that can be used, whether builtin or remote." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from libpysal.examples import available" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "df = available()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(99, 3)" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "99 datasets available, 29 installed, 70 remote.\n" ] } ], "source": [ "libpysal.examples.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that there are 99 total datasets available for use with PySAL. In the session above, 29 of these are installed and 70 are remote. On an initial install (i.e., `examples` has not been used yet), only the builtin datasets are installed; the remote datasets can be downloaded and installed as needed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Downloading Remote Datasets" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Name</th><th>Description</th><th>Installed</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>10740</td><td>Albuquerque, New Mexico, Census 2000 Tract Dat...</td><td>True</td></tr>\n",
"    <tr><th>1</th><td>AirBnB</td><td>Airbnb rentals, socioeconomics, and crime in C...</td><td>False</td></tr>\n",
"    <tr><th>2</th><td>Atlanta</td><td>Atlanta, GA region homicide counts and rates</td><td>False</td></tr>\n",
"    <tr><th>3</th><td>Baltimore</td><td>Baltimore house sales prices and hedonics</td><td>False</td></tr>\n",
"    <tr><th>4</th><td>Bostonhsg</td><td>Boston housing and neighborhood data</td><td>False</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"
\n",
"<table>\n",
"  <thead>\n",
"    <tr><th></th><th>Name</th><th>Description</th><th>Installed</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>10740</td><td>Albuquerque, New Mexico, Census 2000 Tract Dat...</td><td>True</td></tr>\n",
"    <tr><th>1</th><td>AirBnB</td><td>Airbnb rentals, socioeconomics, and crime in C...</td><td>True</td></tr>\n",
"    <tr><th>2</th><td>Atlanta</td><td>Atlanta, GA region homicide counts and rates</td><td>False</td></tr>\n",
"    <tr><th>3</th><td>Baltimore</td><td>Baltimore house sales prices and hedonics</td><td>True</td></tr>\n",
"    <tr><th>4</th><td>Bostonhsg</td><td>Boston housing and neighborhood data</td><td>False</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>94</th><td>taz</td><td>Traffic Analysis Zones in So. California</td><td>True</td></tr>\n",
"    <tr><th>95</th><td>tokyo</td><td>Tokyo Mortality data</td><td>True</td></tr>\n",
"    <tr><th>96</th><td>us_income</td><td>Per-capita income for the lower 48 US states 1...</td><td>True</td></tr>\n",
"    <tr><th>97</th><td>virginia</td><td>Virginia counties shapefile</td><td>True</td></tr>\n",
"    <tr><th>98</th><td>wmat</td><td>Datasets used for spatial weights testing</td><td>True</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>99 rows × 3 columns</p>
\n", "