Datasets for use with libpysal

As of version 4.2, the libpysal examples package has been refactored. This notebook highlights the new functionality.

Backwards compatibility is maintained

The newest version of libpysal maintains backwards compatibility, so code that relied on the previous examples API should still work.

For example:

from libpysal.examples import get_path
get_path("mexicojoin.dbf")
'/home/runner/micromamba/envs/test/lib/python3.14/site-packages/libpysal/examples/mexico/mexicojoin.dbf'

Note that the path to this file lies within the installed source distribution. Such an example dataset is now referred to as a builtin dataset.
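One way to tell a builtin dataset from a downloaded one is to test whether its path sits under the installed package directory. The sketch below is an illustration only: is_builtin is a hypothetical helper, not part of libpysal, and the paths are copied from outputs in this notebook, so they will differ on other machines.

```python
from pathlib import PurePosixPath


def is_builtin(path, package_root):
    """Return True if `path` lies inside `package_root` (hypothetical helper)."""
    return PurePosixPath(path).is_relative_to(PurePosixPath(package_root))


# Paths copied from the outputs in this notebook; yours will differ.
package_root = "/home/runner/micromamba/envs/test/lib/python3.14/site-packages/libpysal"
builtin_path = package_root + "/examples/mexico/mexicojoin.dbf"
remote_path = "/home/runner/.local/share/pysal/taz/taz-master/taz.shp"

print(is_builtin(builtin_path, package_root))
print(is_builtin(remote_path, package_root))
```

In practice the package root could be taken from os.path.dirname(libpysal.__file__) rather than hard-coded.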

import libpysal

dbf = libpysal.io.open(get_path("mexicojoin.dbf"))
dbf.header
['POLY_ID',
 'AREA',
 'CODE',
 'NAME',
 'PERIMETER',
 'ACRES',
 'HECTARES',
 'PCGDP1940',
 'PCGDP1950',
 'PCGDP1960',
 'PCGDP1970',
 'PCGDP1980',
 'PCGDP1990',
 'PCGDP2000',
 'HANSON03',
 'HANSON98',
 'ESQUIVEL99',
 'INEGI',
 'INEGI2',
 'MAXP',
 'GR4000',
 'GR5000',
 'GR6000',
 'GR7000',
 'GR8000',
 'GR9000',
 'LPCGDP40',
 'LPCGDP50',
 'LPCGDP60',
 'LPCGDP70',
 'LPCGDP80',
 'LPCGDP90',
 'LPCGDP00',
 'TEST']

The available function is still provided, but it has been updated to return a pandas DataFrame. It reports on every dataset that can be used, whether builtin or remote.

from libpysal.examples import available
df = available()
df.shape
(99, 3)
libpysal.examples.summary()
99 datasets available, 27 installed, 72 remote.

We see that there are 99 total datasets available for use with PySAL. On an initial install (i.e., examples has not been used yet), 27 of these are builtin datasets and 72 are remote. The latter can be downloaded and installed.
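The relationship between the table returned by available and the line printed by summary can be sketched in a few lines. This is an illustrative reimplementation, not libpysal's own code: summarize is a hypothetical helper that accepts any iterable of booleans, such as the Installed column of the table.

```python
def summarize(installed_flags):
    """Build a summary line from an iterable of booleans (hypothetical helper)."""
    flags = [bool(flag) for flag in installed_flags]
    total = len(flags)
    installed = sum(flags)
    remote = total - installed
    return f"{total} datasets available, {installed} installed, {remote} remote."


# Stand-in for the Installed column on a fresh install: 27 True, 72 False.
flags = [True] * 27 + [False] * 72
print(summarize(flags))
```

With libpysal installed, passing available()["Installed"] to the helper should give the same counts, assuming the column name shown in the table.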

Downloading Remote Datasets

df.head()
Name Description Installed
0 10740 Albuquerque, New Mexico, Census 2000 Tract Dat... True
1 AirBnB Airbnb rentals, socioeconomics, and crime in C... False
2 Atlanta Atlanta, GA region homicide counts and rates False
3 Baltimore Baltimore house sales prices and hedonics False
4 Bostonhsg Boston housing and neighborhood data False

The remote AirBnB dataset can be installed by calling load_example:

airbnb = libpysal.examples.load_example("AirBnB")
Downloading AirBnB to /home/runner/.local/share/pysal/AirBnB
libpysal.examples.summary()
99 datasets available, 28 installed, 71 remote.

We see that the number of remotes has declined by one and the number of installed datasets has increased by one.

Trying to load an example that doesn’t exist will return None and alert the user:

libpysal.examples.load_example("dataset42")
Example not available: dataset42
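Because a missing example yields None rather than an exception, a script may want to fail early instead of carrying None forward. A minimal sketch follows, assuming only the behavior described above; require_example is a hypothetical wrapper, and the loader is passed in as an argument so the snippet runs without libpysal.

```python
def require_example(name, loader):
    """Call `loader(name)` and raise if it returns None (hypothetical wrapper)."""
    example = loader(name)
    if example is None:
        raise KeyError(f"Example not available: {name}")
    return example


# Stand-in loader that knows a single dataset; in real use, pass
# libpysal.examples.load_example instead.
def fake_loader(name):
    return {"AirBnB": "airbnb-example"}.get(name)


print(require_example("AirBnB", fake_loader))
try:
    require_example("dataset42", fake_loader)
except KeyError as err:
    print("caught:", err)
```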

Getting remote URLs

If the URL of a remote dataset is needed rather than the dataset itself, it can be obtained with get_url. As the Baltimore dataset has not yet been downloaded in this example, we can grab its URL:

balt_url = libpysal.examples.get_url("Baltimore")
balt_url
'https://geodacenter.github.io/data-and-lab//data/baltimore.zip'
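If the archive is to be fetched by hand, the URL can be split into a file name and handed to a standard-library downloader. The sketch below is one possible approach, not what load_example does internally; archive_name and download_archive are hypothetical helpers, and only the first is called here because the second needs network access.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def archive_name(url):
    """Return the final path component of `url`, e.g. the zip file name."""
    return PurePosixPath(urlparse(url).path).name


def download_archive(url, dest_dir="."):
    """Download `url` into `dest_dir` and return the local path (needs network)."""
    target = Path(dest_dir) / archive_name(url)
    urlretrieve(url, target)
    return target


# URL copied from the get_url output in this notebook.
balt_url = "https://geodacenter.github.io/data-and-lab//data/baltimore.zip"
print(archive_name(balt_url))
```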

Explaining a dataset

libpysal.examples.explain("taz")
taz
===

Dataset used for regionalization
--------------------------------

* taz.dbf: attribute data. (k=14)
* taz.shp: Polygon shapefile. (n=4109)
* taz.shx: spatial index.
taz = libpysal.examples.load_example("taz")
Downloading taz to /home/runner/.local/share/pysal/taz
taz.get_file_list()
['/home/runner/.local/share/pysal/taz/taz-master/taz.shp',
 '/home/runner/.local/share/pysal/taz/taz-master/README.md',
 '/home/runner/.local/share/pysal/taz/taz-master/taz.dbf',
 '/home/runner/.local/share/pysal/taz/taz-master/taz.shx']
libpysal.examples.explain("Baltimore")
balt = libpysal.examples.load_example("Baltimore")
Downloading Baltimore to /home/runner/.local/share/pysal/Baltimore
libpysal.examples.available()
Name Description Installed
0 10740 Albuquerque, New Mexico, Census 2000 Tract Dat... True
1 AirBnB Airbnb rentals, socioeconomics, and crime in C... True
2 Atlanta Atlanta, GA region homicide counts and rates False
3 Baltimore Baltimore house sales prices and hedonics True
4 Bostonhsg Boston housing and neighborhood data False
... ... ... ...
94 taz Traffic Analysis Zones in So. California True
95 tokyo Tokyo Mortality data True
96 us_income Per-capita income for the lower 48 US states 1... True
97 virginia Virginia counties shapefile True
98 wmat Datasets used for spatial weights testing True

99 rows × 3 columns

Working with an example dataset

explain will render a map for an example if one is available:

from libpysal.examples import explain

explain("Tampa1")
from libpysal.examples import load_example

tampa1 = load_example("Tampa1")
Downloading Tampa1 to /home/runner/.local/share/pysal/Tampa1
tampa1.installed
True
tampa1.get_file_list()
['/home/runner/.local/share/pysal/Tampa1/__MACOSX/._TampaMSA',
 '/home/runner/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_final_census2.sbx',
 '/home/runner/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_counties.sbx',
 '/home/runner/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_final_census2.sbn',
 '/home/runner/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_counties.sbn',
 '/home/runner/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._2000 Census Data Variables_Documentation.pdf',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/2000 Census Data Variables_Documentation.pdf',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.geojson',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.sbn',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.sbn',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.mid',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.gpkg',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.mid',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.kml',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.mif',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.sqlite',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByName.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.gdbtable',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByBackwardLabel.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.gdbtablx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.gdbtable',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000003.gdbindexes',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.CatItemTypesByUUID.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.gdbtablx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000003.gdbtablx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.gdbtable',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.CatItemTypesByParentTypeID.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/gdb',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.spx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.gdbindexes',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.gdbindexes',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.gdbindexes',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000002.gdbtablx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.gdbindexes',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.gdbtablx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.CatRelsByOriginID.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.gdbtablx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.CatRelsByType.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByOriginItemTypeID.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.spx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.CatItemTypesByName.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.CatItemsByType.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.spx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByForwardLabel.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.gdbtablx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/timestamps',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000003.gdbtable',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.gdbtable',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByUUID.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.CatItemsByPhysicalName.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.gdbindexes',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000002.gdbtable',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.gdbindexes',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.gdbtable',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.TablesByName.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.gdbtablx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.gdbtable',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByDestItemTypeID.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.gdbtable',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.gdbtablx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.FDO_UUID.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.FDO_UUID.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.CatRelsByDestinationID.atx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.gdbindexes',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.shx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.shp',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.prj',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.prj',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.xlsx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.sqlite',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.dbf',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.sbx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.dbf',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.gpkg',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.shx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.mif',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.shp',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.sbx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.xlsx',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.kml',
 '/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.geojson']
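Long listings like this one are easier to work with once filtered. The following sketch keeps only files with a given extension and skips the __MACOSX entries, which are macOS archive metadata rather than data. select_files is a hypothetical helper, and the sample paths are a few entries copied from the listing.

```python
from pathlib import PurePosixPath


def select_files(paths, suffix):
    """Return paths ending in `suffix`, ignoring anything under __MACOSX."""
    selected = []
    for path in paths:
        parsed = PurePosixPath(path)
        if "__MACOSX" in parsed.parts:
            continue  # skip macOS resource-fork metadata
        if parsed.suffix.lower() == suffix.lower():
            selected.append(path)
    return selected


# A few entries copied from the file listing in this notebook.
sample = [
    "/home/runner/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_counties.sbx",
    "/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.geojson",
    "/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.shp",
    "/home/runner/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.shp",
]
for path in select_files(sample, ".shp"):
    print(path)
```

With a loaded example, passing tampa1.get_file_list() in place of the sample list would apply the same filter to the full listing.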
tampa_counties_shp = tampa1.load("tampa_counties.shp")
tampa_counties_shp
<libpysal.io.iohandlers.pyShpIO.PurePyShpWrapper at 0x7f19a51b2120>
import geopandas
tampa_df = geopandas.read_file(tampa1.get_path("tampa_counties.shp"))
%matplotlib inline
tampa_df.plot()
<Axes: >
[Figure: output of tampa_df.plot(), a map of the Tampa counties polygons]

Other Remotes

In addition to the remote datasets from the GeoDa Center, there are several large remotes hosted in GitHub repositories.

libpysal.examples.explain("Rio Grande do Sul")
Rio_Grande_do_Sul
======================

Cities of the Brazilian State of Rio Grande do Sul
-------------------------------------------------------

* 43MUE250GC_SIR.dbf: attribute data (k=2)
* 43MUE250GC_SIR.shp: Polygon shapefile (n=499)
* 43MUE250GC_SIR.shx: spatial index
* 43MUE250GC_SIR.cpg: encoding file 
* 43MUE250GC_SIR.prj: projection information 
* map_RS_BR.dbf: attribute data (k=3)
* map_RS_BR.shp: Polygon shapefile (no lakes) (n=497)
* map_RS_BR.prj: projection information
* map_RS_BR.shx: spatial index



Source: Renan Xavier Cortes 
Reference: https://github.com/pysal/pysal/issues/889#issuecomment-396693495

Note that here the explain function generates only a textual description of the example dataset. No map is rendered, because the source repository does not include that functionality.

rio = libpysal.examples.load_example("Rio Grande do Sul")
Downloading Rio Grande do Sul to /home/runner/.local/share/pysal/Rio_Grande_do_Sul
libpysal.examples.remote_datasets.datasets  # a listing of all remotes
{'AirBnB': <libpysal.examples.base.Example at 0x7f19e79b57f0>,
 'Atlanta': <libpysal.examples.base.Example at 0x7f19e79a6710>,
 'Baltimore': <libpysal.examples.base.Example at 0x7f19e79a6990>,
 'Bostonhsg': <libpysal.examples.base.Example at 0x7f19e78a4050>,
 'Buenosaires': <libpysal.examples.base.Example at 0x7f19e78a4180>,
 'Charleston1': <libpysal.examples.base.Example at 0x7f19e7a0f570>,
 'Charleston2': <libpysal.examples.base.Example at 0x7f19e7a0f790>,
 'Chicago Health': <libpysal.examples.base.Example at 0x7f19e7da2750>,
 'Chicago commpop': <libpysal.examples.base.Example at 0x7f19e7a37d50>,
 'Chicago parcels': <libpysal.examples.base.Example at 0x7f19e7a333e0>,
 'Chile Labor': <libpysal.examples.base.Example at 0x7f19e7a334d0>,
 'Chile Migration': <libpysal.examples.base.Example at 0x7f19e7a3e5f0>,
 'Cincinnati': <libpysal.examples.base.Example at 0x7f19e7a3e7b0>,
 'Cleveland': <libpysal.examples.base.Example at 0x7f19e7d886d0>,
 'Columbus': <libpysal.examples.base.Example at 0x7f19e79dd610>,
 'Elections': <libpysal.examples.base.Example at 0x7f19e79dd550>,
 'Grid100': <libpysal.examples.base.Example at 0x7f1a48b62e60>,
 'Groceries': <libpysal.examples.base.Example at 0x7f19e8fac1b0>,
 'Guerry': <libpysal.examples.base.Example at 0x7f19e789c0f0>,
 'Health+': <libpysal.examples.base.Example at 0x7f19e789c2d0>,
 'Health Indicators': <libpysal.examples.base.Example at 0x7f19e789c690>,
 'Hickory1': <libpysal.examples.base.Example at 0x7f19e789c410>,
 'Hickory2': <libpysal.examples.base.Example at 0x7f19e789c4b0>,
 'Home Sales': <libpysal.examples.base.Example at 0x7f19e789c550>,
 'Houston': <libpysal.examples.base.Example at 0x7f19e789c5f0>,
 'Juvenile': <libpysal.examples.base.Example at 0x7f19e789c730>,
 'Lansing1': <libpysal.examples.base.Example at 0x7f19e789c7d0>,
 'Lansing2': <libpysal.examples.base.Example at 0x7f19e789c870>,
 'Laozone': <libpysal.examples.base.Example at 0x7f19e789ca50>,
 'LasRosas': <libpysal.examples.base.Example at 0x7f19e789d630>,
 'Liquor Stores': <libpysal.examples.base.Example at 0x7f19e789cb90>,
 'Malaria': <libpysal.examples.base.Example at 0x7f19e789cc30>,
 'Milwaukee1': <libpysal.examples.base.Example at 0x7f19e789ccd0>,
 'Milwaukee2': <libpysal.examples.base.Example at 0x7f19e789cd70>,
 'NCOVR': <libpysal.examples.base.Example at 0x7f19e789ce10>,
 'Natregimes': <libpysal.examples.base.Example at 0x7f19e789ceb0>,
 'NDVI': <libpysal.examples.base.Example at 0x7f19e789cf50>,
 'Nepal': <libpysal.examples.base.Example at 0x7f19e789cff0>,
 'NYC': <libpysal.examples.base.Example at 0x7f19e789d090>,
 'NYC Earnings': <libpysal.examples.base.Example at 0x7f19e789d130>,
 'NYC Education': <libpysal.examples.base.Example at 0x7f19e789d1d0>,
 'NYC Neighborhoods': <libpysal.examples.base.Example at 0x7f19e789d270>,
 'NYC Socio-Demographics': <libpysal.examples.base.Example at 0x7f19e789d310>,
 'Ohiolung': <libpysal.examples.base.Example at 0x7f19e789d3b0>,
 'Orlando1': <libpysal.examples.base.Example at 0x7f19e789d450>,
 'Orlando2': <libpysal.examples.base.Example at 0x7f19e789d4f0>,
 'Oz9799': <libpysal.examples.base.Example at 0x7f19e789d590>,
 'Phoenix ACS': <libpysal.examples.base.Example at 0x7f19e789d6d0>,
 'Pittsburgh': <libpysal.examples.base.Example at 0x7f19e789d770>,
 'Police': <libpysal.examples.base.Example at 0x7f19e789d810>,
 'Sacramento1': <libpysal.examples.base.Example at 0x7f19e789d8b0>,
 'Sacramento2': <libpysal.examples.base.Example at 0x7f19e789d950>,
 'SanFran Crime': <libpysal.examples.base.Example at 0x7f19e789d9f0>,
 'Savannah1': <libpysal.examples.base.Example at 0x7f19e789da90>,
 'Savannah2': <libpysal.examples.base.Example at 0x7f19e789db30>,
 'Scotlip': <libpysal.examples.base.Example at 0x7f19e789dbd0>,
 'Seattle1': <libpysal.examples.base.Example at 0x7f19e789dc70>,
 'Seattle2': <libpysal.examples.base.Example at 0x7f19e789dd10>,
 'SIDS': <libpysal.examples.base.Example at 0x7f19e789ddb0>,
 'SIDS2': <libpysal.examples.base.Example at 0x7f19e789de50>,
 'Snow': <libpysal.examples.base.Example at 0x7f19e789def0>,
 'South': <libpysal.examples.base.Example at 0x7f19e789df90>,
 'Spirals': <libpysal.examples.base.Example at 0x7f19e789e030>,
 'StLouis': <libpysal.examples.base.Example at 0x7f19e789e0d0>,
 'Tampa1': <libpysal.examples.base.Example at 0x7f19e789e170>,
 'US SDOH': <libpysal.examples.base.Example at 0x7f19e789e210>,
 'Rio Grande do Sul': <libpysal.examples.base.Example at 0x7f19e789e2b0>,
 'nyc_bikes': <libpysal.examples.base.Example at 0x7f19e789e350>,
 'taz': <libpysal.examples.base.Example at 0x7f19e789e3f0>,
 'clearwater': <libpysal.examples.base.Example at 0x7f19e789e490>,
 'newHaven': <libpysal.examples.base.Example at 0x7f19e789e530>,
 'chicagoSDOH': <libpysal.examples.base.Example at 0x7f19e789e5d0>}