This page was generated from notebooks/examples.ipynb.

Datasets for use with libpysal

As of version 4.2, libpysal has refactored the examples package. This notebook highlights the new functionality.

Backwards compatibility is maintained

If you are familiar with previous versions of libpysal, the newest version maintains backwards compatibility, so any code that relied on the previous API should still work.

For example:

[1]:
from libpysal.examples import get_path

[2]:
get_path("mexicojoin.dbf")
[2]:
'/home/serge/Documents/p/pysal/src/subpackages/libpysal/libpysal/examples/mexico/mexicojoin.dbf'

Note that the path to the file in this example lies within the source distribution that was installed. Such an example dataset is now referred to as a builtin dataset.

[3]:
import libpysal
dbf = libpysal.io.open(get_path("mexicojoin.dbf"))
[4]:
dbf.header
[4]:
['POLY_ID',
 'AREA',
 'CODE',
 'NAME',
 'PERIMETER',
 'ACRES',
 'HECTARES',
 'PCGDP1940',
 'PCGDP1950',
 'PCGDP1960',
 'PCGDP1970',
 'PCGDP1980',
 'PCGDP1990',
 'PCGDP2000',
 'HANSON03',
 'HANSON98',
 'ESQUIVEL99',
 'INEGI',
 'INEGI2',
 'MAXP',
 'GR4000',
 'GR5000',
 'GR6000',
 'GR7000',
 'GR8000',
 'GR9000',
 'LPCGDP40',
 'LPCGDP50',
 'LPCGDP60',
 'LPCGDP70',
 'LPCGDP80',
 'LPCGDP90',
 'LPCGDP00',
 'TEST']
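
With the header in hand, selecting a family of columns is a plain list filter. The following is a minimal sketch that assumes only that the header is a list of strings, as printed above. The helper names columns_with_prefix and read_example_header are hypothetical (they are not part of libpysal), and the libpysal import is deferred so the filter itself can be tried without the package installed.

```python
def columns_with_prefix(header, prefix):
    """Return the column names in `header` that start with `prefix`."""
    return [name for name in header if name.startswith(prefix)]


def read_example_header(filename):
    """Open a builtin example file and return its header (requires libpysal)."""
    import libpysal  # imported lazily; only needed when this helper is called
    from libpysal.examples import get_path

    return libpysal.io.open(get_path(filename)).header


# A shortened copy of the header printed above, used here as sample input.
sample_header = ["POLY_ID", "NAME", "PCGDP1940", "PCGDP2000", "LPCGDP40", "TEST"]
gdp_columns = columns_with_prefix(sample_header, "PCGDP")
print(gdp_columns)
```

With libpysal installed, columns_with_prefix(read_example_header("mexicojoin.dbf"), "PCGDP") would apply the same filter to the full header.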

The available function is still provided, but it now returns a pandas DataFrame. It reports every dataset that can be used, whether builtin or remote, and whether each one is installed.

[5]:
from libpysal.examples import available
[6]:
df = available()
[7]:
df.shape
[7]:
(98, 3)
[8]:
libpysal.examples.summary()
98 datasets available, 27 installed, 71 remote.

We see that there are 98 total datasets available for use with PySAL. On an initial install (i.e., examples has not been used yet), 27 of these are builtin datasets and 71 are remote. The latter can be downloaded and installed.

Downloading Remote Datasets

[9]:
df.head()
[9]:
Name Description Installed
0 10740 Albuquerque, New Mexico, Census 2000 Tract Dat... True
1 AirBnB Airbnb rentals, socioeconomics, and crime in C... False
2 Atlanta Atlanta, GA region homicide counts and rates False
3 Baltimore Baltimore house sales prices and hedonics False
4 Bostonhsg Boston housing and neighborhood data False
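
The Installed column makes it straightforward to list the datasets that still need to be downloaded. The sketch below assumes the Name and Installed columns shown in the table above. The helper names remote_names and list_remote_examples are hypothetical, and the sample rows are copied from the first three rows of that table.

```python
def remote_names(names, installed_flags):
    """Return the names whose installed flag is false."""
    return [name for name, flag in zip(names, installed_flags) if not flag]


def list_remote_examples():
    """Names of datasets not yet installed (requires libpysal)."""
    from libpysal.examples import available  # imported lazily

    df = available()
    return remote_names(df["Name"], df["Installed"])


# First three rows of the table above, as plain lists.
names = ["10740", "AirBnB", "Atlanta"]
flags = [True, False, False]
not_installed = remote_names(names, flags)
print(not_installed)
```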

The remote AirBnB can be installed by calling load_example:

[10]:
airbnb = libpysal.examples.load_example("AirBnB")
Downloading AirBnB to /home/serge/.local/share/pysal/AirBnB
[11]:
libpysal.examples.summary()
98 datasets available, 28 installed, 70 remote.

We see that the number of remotes has declined by one and the number of installed datasets has increased by one.

Trying to load an example that doesn’t exist will return None and alert the user:

[12]:
libpysal.examples.load_example('dataset42')
Example not available: dataset42
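
Because a missing example yields None rather than an exception, a script may prefer to fail early. The wrapper below is one possible sketch of that idea. require_example is a hypothetical name, and the loader parameter exists only so the behavior can be demonstrated with a stand-in function; by default it falls back to libpysal's load_example.

```python
def require_example(name, loader=None):
    """Load an example, raising LookupError instead of returning None."""
    if loader is None:
        # Imported lazily so the wrapper can be exercised with a stand-in loader.
        from libpysal.examples import load_example as loader
    example = loader(name)
    if example is None:
        raise LookupError(f"Example not available: {name}")
    return example


# Stand-in loader that mimics the "not available" case shown above.
def always_missing(name):
    return None


try:
    require_example("dataset42", loader=always_missing)
except LookupError as err:
    message = str(err)
    print(message)
```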

Getting remote URLs

If the URL, rather than the dataset, is needed, it can be obtained for a remote with get_url. As the Baltimore dataset has not yet been downloaded in this example, we can grab its URL:

[13]:
balt_url = libpysal.examples.get_url('Baltimore')
balt_url
[13]:
'https://geodacenter.github.io/data-and-lab//data/baltimore.zip'
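
Once the URL is known, the name of the archive it points to can be recovered with the standard library alone. This is a small sketch; archive_name is a hypothetical helper, and the URL is the one returned above.

```python
import posixpath
from urllib.parse import urlsplit


def archive_name(url):
    """Return the final path component of a URL, e.g. the zip file name."""
    return posixpath.basename(urlsplit(url).path)


balt_url = "https://geodacenter.github.io/data-and-lab//data/baltimore.zip"
balt_archive = archive_name(balt_url)
print(balt_archive)
```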

Explaining a dataset

[14]:
libpysal.examples.explain('taz')
taz
===

Dataset used for regionalization
--------------------------------

* taz.dbf: attribute data. (k=14)
* taz.shp: Polygon shapefile. (n=4109)
* taz.shx: spatial index.

[15]:
taz = libpysal.examples.load_example('taz')
Downloading taz to /home/serge/.local/share/pysal/taz
[16]:
taz.get_file_list()
[16]:
['/home/serge/.local/share/pysal/taz/taz-master/taz.dbf',
 '/home/serge/.local/share/pysal/taz/taz-master/taz.shp',
 '/home/serge/.local/share/pysal/taz/taz-master/README.md',
 '/home/serge/.local/share/pysal/taz/taz-master/taz.shx']
[17]:
libpysal.examples.explain('Baltimore')
[17]:
[18]:
balt = libpysal.examples.load_example('Baltimore')
Downloading Baltimore to /home/serge/.local/share/pysal/Baltimore
[19]:
libpysal.examples.available()
[19]:
Name Description Installed
0 10740 Albuquerque, New Mexico, Census 2000 Tract Dat... True
1 AirBnB Airbnb rentals, socioeconomics, and crime in C... True
2 Atlanta Atlanta, GA region homicide counts and rates False
3 Baltimore Baltimore house sales prices and hedonics True
4 Bostonhsg Boston housing and neighborhood data False
... ... ... ...
93 taz Traffic Analysis Zones in So. California True
94 tokyo Tokyo Mortality data True
95 us_income Per-capita income for the lower 48 US states 1... True
96 virginia Virginia counties shapefile True
97 wmat Datasets used for spatial weights testing True

98 rows × 3 columns

Working with an example dataset

explain will render maps for an example if available.

[20]:
from libpysal.examples import explain
explain('Tampa1')
[20]:
[21]:
from libpysal.examples import load_example
tampa1 = load_example('Tampa1')
Downloading Tampa1 to /home/serge/.local/share/pysal/Tampa1
[22]:
tampa1.installed
[22]:
True
[23]:
tampa1.get_file_list()
[23]:
['/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.shp',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.prj',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/2000 Census Data Variables_Documentation.pdf',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.kml',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.dbf',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.kml',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.sbn',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.mif',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.prj',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.sqlite',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.shx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.sbn',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.sbx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.FDO_UUID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000002.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByName.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000003.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000002.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.CatRelsByType.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.spx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByDestItemTypeID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/timestamps',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.CatItemTypesByName.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000003.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.CatItemTypesByUUID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.CatItemTypesByParentTypeID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.CatItemsByType.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByForwardLabel.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.FDO_UUID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByUUID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.TablesByName.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.CatRelsByOriginID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000003.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.spx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.CatItemsByPhysicalName.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByOriginItemTypeID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/gdb',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.spx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByBackwardLabel.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.CatRelsByDestinationID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.mid',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.sbx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.geojson',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.mid',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.xlsx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.mif',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.dbf',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.shp',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.gpkg',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.gpkg',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.xlsx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.shx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.sqlite',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.geojson',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._2000 Census Data Variables_Documentation.pdf',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_counties.sbn',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_final_census2.sbn',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_counties.sbx',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_final_census2.sbx',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/._TampaMSA']
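
A listing this long is easier to work with once it is filtered. The sketch below keeps only paths with a given suffix and skips the __MACOSX entries visible at the end of the list. files_with_suffix is a hypothetical helper, and PurePosixPath is used on the assumption that the paths are POSIX-style, as in the output above.

```python
from pathlib import PurePosixPath


def files_with_suffix(paths, suffix):
    """Return paths ending in `suffix`, skipping anything under '__MACOSX'."""
    selected = []
    for path in paths:
        parsed = PurePosixPath(path)
        if "__MACOSX" in parsed.parts:
            continue  # macOS archive metadata, not real data files
        if parsed.suffix.lower() == suffix.lower():
            selected.append(path)
    return selected


# A few entries copied from the listing above.
sample_paths = [
    "/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.shp",
    "/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.dbf",
    "/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.shp",
    "/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_counties.sbn",
]
shapefiles = files_with_suffix(sample_paths, ".shp")
print(shapefiles)
```

With the dataset installed, files_with_suffix(tampa1.get_file_list(), ".shp") would apply the same filter to the full listing.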
[24]:
tampa_counties_shp = tampa1.load('tampa_counties.shp')
[25]:
tampa_counties_shp
[25]:
<libpysal.io.iohandlers.pyShpIO.PurePyShpWrapper at 0x7febc3b1bd00>
[26]:
import geopandas
[27]:
tampa_df = geopandas.read_file(tampa1.get_path('tampa_counties.shp'))
[28]:
%matplotlib inline
tampa_df.plot()
[28]:
<AxesSubplot:>
[Figure: map of the Tampa counties produced by tampa_df.plot()]

Other Remotes

In addition to the remote datasets from the GeoData Data Science Center, there are several large remotes available in GitHub repositories.

[29]:
libpysal.examples.explain('Rio Grande do Sul')
Rio_Grande_do_Sul
======================

Cities of the Brazilian State of Rio Grande do Sul
-------------------------------------------------------

* 43MUE250GC_SIR.dbf: attribute data (k=2)
* 43MUE250GC_SIR.shp: Polygon shapefile (n=499)
* 43MUE250GC_SIR.shx: spatial index
* 43MUE250GC_SIR.cpg: encoding file
* 43MUE250GC_SIR.prj: projection information
* map_RS_BR.dbf: attribute data (k=3)
* map_RS_BR.shp: Polygon shapefile (no lakes) (n=497)
* map_RS_BR.prj: projection information
* map_RS_BR.shx: spatial index



Source: Renan Xavier Cortes
Reference: https://github.com/pysal/pysal/issues/889#issuecomment-396693495


Note that the explain function generates only a textual description of this example dataset; no map is rendered because the source repository does not include that functionality.

[30]:
rio = libpysal.examples.load_example('Rio Grande do Sul')
Downloading Rio Grande do Sul to /home/serge/.local/share/pysal/Rio_Grande_do_Sul
[31]:
libpysal.examples.remote_datasets.datasets # a listing of all remotes
[31]:
{'AirBnB': <libpysal.examples.base.Example at 0x7febc78c00d0>,
 'Atlanta': <libpysal.examples.base.Example at 0x7febc4a5efb0>,
 'Baltimore': <libpysal.examples.base.Example at 0x7febc4a5ed40>,
 'Bostonhsg': <libpysal.examples.base.Example at 0x7febc4a5ef50>,
 'Buenosaires': <libpysal.examples.base.Example at 0x7febc4a5efe0>,
 'Charleston1': <libpysal.examples.base.Example at 0x7febc4a5ef80>,
 'Charleston2': <libpysal.examples.base.Example at 0x7febc4a5ee90>,
 'Chicago Health': <libpysal.examples.base.Example at 0x7febc4a5ef20>,
 'Chicago commpop': <libpysal.examples.base.Example at 0x7febc4a5f070>,
 'Chicago parcels': <libpysal.examples.base.Example at 0x7febc4a5f0a0>,
 'Chile Labor': <libpysal.examples.base.Example at 0x7febc4a5f100>,
 'Chile Migration': <libpysal.examples.base.Example at 0x7febc4a5f130>,
 'Cincinnati': <libpysal.examples.base.Example at 0x7febc4a5f1f0>,
 'Cleveland': <libpysal.examples.base.Example at 0x7febc4a5f190>,
 'Columbus': <libpysal.examples.base.Example at 0x7febc4a5f250>,
 'Elections': <libpysal.examples.base.Example at 0x7febc4a5f0d0>,
 'Grid100': <libpysal.examples.base.Example at 0x7febc4a5f220>,
 'Groceries': <libpysal.examples.base.Example at 0x7febc4a5f160>,
 'Guerry': <libpysal.examples.base.Example at 0x7febc4a5f280>,
 'Health+': <libpysal.examples.base.Example at 0x7febc4a5f040>,
 'Health Indicators': <libpysal.examples.base.Example at 0x7febc4a5f2b0>,
 'Hickory1': <libpysal.examples.base.Example at 0x7febc4a5f2e0>,
 'Hickory2': <libpysal.examples.base.Example at 0x7febc4a5f310>,
 'Home Sales': <libpysal.examples.base.Example at 0x7febc4a5f340>,
 'Houston': <libpysal.examples.base.Example at 0x7febc4a5f370>,
 'Juvenile': <libpysal.examples.base.Example at 0x7febc4a5f3a0>,
 'Lansing1': <libpysal.examples.base.Example at 0x7febc4a5f3d0>,
 'Lansing2': <libpysal.examples.base.Example at 0x7febc4a5f400>,
 'Laozone': <libpysal.examples.base.Example at 0x7febc4a5f430>,
 'LasRosas': <libpysal.examples.base.Example at 0x7febc4a5f460>,
 'Liquor Stores': <libpysal.examples.base.Example at 0x7febc4a5f490>,
 'Malaria': <libpysal.examples.base.Example at 0x7febc4a5f4c0>,
 'Milwaukee1': <libpysal.examples.base.Example at 0x7febc4a5f4f0>,
 'Milwaukee2': <libpysal.examples.base.Example at 0x7febc4a5f520>,
 'NCOVR': <libpysal.examples.base.Example at 0x7febc4a5f550>,
 'Natregimes': <libpysal.examples.base.Example at 0x7febc4a5f580>,
 'NDVI': <libpysal.examples.base.Example at 0x7febc4a5f5b0>,
 'Nepal': <libpysal.examples.base.Example at 0x7febc4a5f5e0>,
 'NYC': <libpysal.examples.base.Example at 0x7febc4a5f610>,
 'NYC Earnings': <libpysal.examples.base.Example at 0x7febc4a5f640>,
 'NYC Education': <libpysal.examples.base.Example at 0x7febc4a5f670>,
 'NYC Neighborhoods': <libpysal.examples.base.Example at 0x7febc4a5f6a0>,
 'NYC Socio-Demographics': <libpysal.examples.base.Example at 0x7febc4a5f6d0>,
 'Ohiolung': <libpysal.examples.base.Example at 0x7febc4a5f700>,
 'Orlando1': <libpysal.examples.base.Example at 0x7febc4a5f730>,
 'Orlando2': <libpysal.examples.base.Example at 0x7febc4a5f760>,
 'Oz9799': <libpysal.examples.base.Example at 0x7febc4a5f790>,
 'Phoenix ACS': <libpysal.examples.base.Example at 0x7febc4a5f7c0>,
 'Pittsburgh': <libpysal.examples.base.Example at 0x7febc4a5f7f0>,
 'Police': <libpysal.examples.base.Example at 0x7febc4a5f820>,
 'Sacramento1': <libpysal.examples.base.Example at 0x7febc4a5f850>,
 'Sacramento2': <libpysal.examples.base.Example at 0x7febc4a5f880>,
 'SanFran Crime': <libpysal.examples.base.Example at 0x7febc4a5f8b0>,
 'Savannah1': <libpysal.examples.base.Example at 0x7febc4a5f8e0>,
 'Savannah2': <libpysal.examples.base.Example at 0x7febc4a5f910>,
 'Scotlip': <libpysal.examples.base.Example at 0x7febc4a5f940>,
 'Seattle1': <libpysal.examples.base.Example at 0x7febc4a5f970>,
 'Seattle2': <libpysal.examples.base.Example at 0x7febc4a5f9a0>,
 'SIDS': <libpysal.examples.base.Example at 0x7febc4a5f9d0>,
 'SIDS2': <libpysal.examples.base.Example at 0x7febc4a5fa00>,
 'Snow': <libpysal.examples.base.Example at 0x7febc4a5fa30>,
 'South': <libpysal.examples.base.Example at 0x7febc4a5fa60>,
 'Spirals': <libpysal.examples.base.Example at 0x7febc4a5fa90>,
 'StLouis': <libpysal.examples.base.Example at 0x7febc4a5fac0>,
 'Tampa1': <libpysal.examples.base.Example at 0x7febc4a5faf0>,
 'US SDOH': <libpysal.examples.base.Example at 0x7febc4a5fb20>,
 'Rio Grande do Sul': <libpysal.examples.base.Example at 0x7febc78c0130>,
 'nyc_bikes': <libpysal.examples.base.Example at 0x7febc4a5fb80>,
 'taz': <libpysal.examples.base.Example at 0x7febc4a5fc70>,
 'clearwater': <libpysal.examples.base.Example at 0x7febc4a5fbe0>,
 'newHaven': <libpysal.examples.base.Example at 0x7febc4a5eec0>}
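
Since this listing is an ordinary dictionary of Example objects, it can be combined with the installed attribute used earlier to see which remotes are already on disk. The following is a sketch under that assumption. installed_names and installed_remote_names are hypothetical helpers, and the stand-in objects below only mimic the installed attribute.

```python
from types import SimpleNamespace


def installed_names(datasets):
    """Return the sorted names of entries whose `installed` attribute is true."""
    return sorted(name for name, example in datasets.items() if example.installed)


def installed_remote_names():
    """Installed remote datasets (requires libpysal)."""
    from libpysal.examples import remote_datasets  # imported lazily

    return installed_names(remote_datasets.datasets)


# Stand-ins that expose only the `installed` attribute.
fake_datasets = {
    "taz": SimpleNamespace(installed=True),
    "Atlanta": SimpleNamespace(installed=False),
    "AirBnB": SimpleNamespace(installed=True),
}
on_disk = installed_names(fake_datasets)
print(on_disk)
```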