This page was generated from user-guide/data/examples.ipynb.

Datasets for use with libpysal

As of version 4.2, libpysal has refactored the examples package. This notebook highlights the new functionality.

Backwards compatibility is maintained

If you are familiar with previous versions of libpysal, the newest version maintains backwards compatibility, so any code that relied on the previous API should still work.

For example:

[1]:
from libpysal.examples import get_path

[2]:
get_path("mexicojoin.dbf")
[2]:
'/home/serge/para/1_projects/code-pysal-libpysal/libpysal/libpysal/examples/mexico/mexicojoin.dbf'

An important thing to note here is that the path to the file for this particular example is within the source distribution that was installed. Such an example data set is now referred to as a builtin dataset.
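As a minimal sketch of what "builtin" means in practice, the check below tests whether a path lies inside a given root directory. The helper `is_under` is hypothetical (not part of libpysal), and the root directory is an assumption read off the path printed above; on a real install it could be derived from `os.path.dirname(libpysal.examples.__file__)`.

```python
from pathlib import PurePosixPath


def is_under(path: str, root: str) -> bool:
    """Return True if `path` is `root` itself or lies somewhere below it."""
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


# Assumption: the examples directory of the installed package, taken
# from the get_path output shown above.
builtin_root = (
    "/home/serge/para/1_projects/code-pysal-libpysal/libpysal/libpysal/examples"
)
mexico = builtin_root + "/mexico/mexicojoin.dbf"

print(is_under(mexico, builtin_root))      # inside the package directory
print(is_under(mexico, "/home/serge/tmp")) # unrelated directory
```

A path that passes this check for the package's examples directory belongs to a builtin dataset rather than a downloaded one.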

[3]:
import libpysal
dbf = libpysal.io.open(get_path("mexicojoin.dbf"))
[4]:
dbf.header
[4]:
['POLY_ID',
 'AREA',
 'CODE',
 'NAME',
 'PERIMETER',
 'ACRES',
 'HECTARES',
 'PCGDP1940',
 'PCGDP1950',
 'PCGDP1960',
 'PCGDP1970',
 'PCGDP1980',
 'PCGDP1990',
 'PCGDP2000',
 'HANSON03',
 'HANSON98',
 'ESQUIVEL99',
 'INEGI',
 'INEGI2',
 'MAXP',
 'GR4000',
 'GR5000',
 'GR6000',
 'GR7000',
 'GR8000',
 'GR9000',
 'LPCGDP40',
 'LPCGDP50',
 'LPCGDP60',
 'LPCGDP70',
 'LPCGDP80',
 'LPCGDP90',
 'LPCGDP00',
 'TEST']

The function available is still provided, but it has been updated to return a pandas DataFrame. It reports every dataset that can be used, whether builtin or remote, together with whether that dataset is currently installed.

[5]:
from libpysal.examples import available
[6]:
df = available()
[7]:
df.shape
[7]:
(99, 3)
[8]:
libpysal.examples.summary()
99 datasets available, 29 installed, 70 remote.

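We see that there are 99 total datasets available for use with PySAL, of which 29 are installed and 70 are remote in this session. The installed count covers the builtin datasets together with any remotes that have already been downloaded; on an initial install (i.e., examples has not been used yet), only the builtin datasets are installed. The remote datasets can be downloaded and installed.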
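The counts in the summary line should always add up, which is easy to verify. The following is a small sketch, assuming the line has exactly the wording printed above; `parse_summary` is a hypothetical helper, not a libpysal function.

```python
import re

# Pattern for the one-line report printed by summary(), as shown above.
SUMMARY_RE = re.compile(
    r"(\d+) datasets available, (\d+) installed, (\d+) remote\."
)


def parse_summary(line: str) -> dict:
    """Extract the three counts from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    total, installed, remote = (int(g) for g in match.groups())
    return {"total": total, "installed": installed, "remote": remote}


counts = parse_summary("99 datasets available, 29 installed, 70 remote.")
# Installed and remote datasets partition the full catalogue.
print(counts["installed"] + counts["remote"] == counts["total"])
```

Because summary() prints its report, the line would need to be captured (for example with `contextlib.redirect_stdout`) before being passed to a parser like this.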

Downloading Remote Datasets

[9]:
df.head()
[9]:
    Name       Description                                        Installed
 0  10740      Albuquerque, New Mexico, Census 2000 Tract Dat...  True
 1  AirBnB     Airbnb rentals, socioeconomics, and crime in C...  False
 2  Atlanta    Atlanta, GA region homicide counts and rates       False
 3  Baltimore  Baltimore house sales prices and hedonics          False
 4  Bostonhsg  Boston housing and neighborhood data               False
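The Installed column makes it straightforward to list the datasets that still need downloading. The sketch below works on plain (name, installed) pairs so that it runs without pandas; `not_installed` is a hypothetical helper, and the sample rows are copied from the table above.

```python
def not_installed(rows):
    """Return the names whose installed flag is False.

    `rows` is any iterable of (name, installed) pairs.
    """
    return [name for name, installed in rows if not installed]


# First five rows of the table returned by available(), as shown above.
rows = [
    ("10740", True),
    ("AirBnB", False),
    ("Atlanta", False),
    ("Baltimore", False),
    ("Bostonhsg", False),
]
print(not_installed(rows))
```

With the DataFrame itself, the same pairs could be produced by `zip(df["Name"], df["Installed"])`.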

The remote AirBnB dataset can be installed by calling load_example:

[10]:
airbnb = libpysal.examples.load_example("AirBnB")
Downloading AirBnB to /home/serge/.local/share/pysal/AirBnB
[11]:
libpysal.examples.summary()
99 datasets available, 30 installed, 69 remote.

We see that the number of remotes has declined by one and the number of installed datasets has increased by one.

Trying to load an example that doesn’t exist will return None and alert the user:

[12]:
libpysal.examples.load_example('dataset42')
Example not available: dataset42
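Since a missing example yields None rather than an exception, scripts may want to fail early instead of hitting an AttributeError later. A minimal sketch follows; `require_example` and `fake_loader` are hypothetical names, and the stand-in loader only mimics the None-on-miss behaviour described above.

```python
def require_example(name, loader):
    """Call `loader(name)` and raise if the result is None."""
    example = loader(name)
    if example is None:
        raise LookupError(f"example dataset not available: {name}")
    return example


# Stand-in for libpysal.examples.load_example so the sketch runs offline.
def fake_loader(name):
    known = {"taz": "taz-example"}
    return known.get(name)  # None when the name is unknown


print(require_example("taz", fake_loader))
```

In real use the second argument would be `libpysal.examples.load_example`.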

Getting remote URLs

If the URL, rather than the dataset, is needed, it can be obtained for a remote with get_url. As the Baltimore dataset has not yet been downloaded in this example, we can grab its URL:

[13]:
balt_url = libpysal.examples.get_url('Baltimore')
balt_url
[13]:
'https://geodacenter.github.io/data-and-lab//data/baltimore.zip'
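Once the URL is in hand, the archive file name can be recovered with the standard library alone. This is a small sketch; `archive_name` is a hypothetical helper, and the URL is the one returned above.

```python
import posixpath
from urllib.parse import urlsplit


def archive_name(url: str) -> str:
    """Return the final path component of a URL."""
    return posixpath.basename(urlsplit(url).path)


balt_url = "https://geodacenter.github.io/data-and-lab//data/baltimore.zip"
print(archive_name(balt_url))
```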

Explaining a dataset

[14]:
libpysal.examples.explain('taz')
taz
===

Dataset used for regionalization
--------------------------------

* taz.dbf: attribute data. (k=14)
* taz.shp: Polygon shapefile. (n=4109)
* taz.shx: spatial index.

[15]:
taz = libpysal.examples.load_example('taz')
[16]:
taz.get_file_list()
[16]:
['/home/serge/.local/share/pysal/taz/taz-master/taz.dbf',
 '/home/serge/.local/share/pysal/taz/taz-master/README.md',
 '/home/serge/.local/share/pysal/taz/taz-master/taz.shx',
 '/home/serge/.local/share/pysal/taz/taz-master/taz.shp']
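A shapefile is only usable when its companion files are present, so the file list can be checked against the three components named by explain. The sketch below assumes POSIX-style paths as printed above; `missing_components` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# The three files listed by explain('taz'): geometry, index, attributes.
REQUIRED = (".shp", ".shx", ".dbf")


def missing_components(files, stem):
    """Return the required extensions not found for the given file stem."""
    present = {
        PurePosixPath(f).suffix.lower()
        for f in files
        if PurePosixPath(f).stem == stem
    }
    return [ext for ext in REQUIRED if ext not in present]


taz_files = [
    "/home/serge/.local/share/pysal/taz/taz-master/taz.dbf",
    "/home/serge/.local/share/pysal/taz/taz-master/README.md",
    "/home/serge/.local/share/pysal/taz/taz-master/taz.shx",
    "/home/serge/.local/share/pysal/taz/taz-master/taz.shp",
]
print(missing_components(taz_files, "taz"))
```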
[17]:
libpysal.examples.explain('Baltimore')
[17]:
[18]:
balt = libpysal.examples.load_example('Baltimore')
Downloading Baltimore to /home/serge/.local/share/pysal/Baltimore
[19]:
libpysal.examples.available()
[19]:
    Name       Description                                        Installed
 0  10740      Albuquerque, New Mexico, Census 2000 Tract Dat...  True
 1  AirBnB     Airbnb rentals, socioeconomics, and crime in C...  True
 2  Atlanta    Atlanta, GA region homicide counts and rates       False
 3  Baltimore  Baltimore house sales prices and hedonics          True
 4  Bostonhsg  Boston housing and neighborhood data               False
..  ...        ...                                                ...
94  taz        Traffic Analysis Zones in So. California           True
95  tokyo      Tokyo Mortality data                               True
96  us_income  Per-capita income for the lower 48 US states 1...  True
97  virginia   Virginia counties shapefile                        True
98  wmat       Datasets used for spatial weights testing          True

99 rows × 3 columns

Working with an example dataset

explain will render a map for an example when one is available.

[20]:
from libpysal.examples import explain
explain('Tampa1')
[20]:
[21]:
from libpysal.examples import load_example
tampa1 = load_example('Tampa1')
Downloading Tampa1 to /home/serge/.local/share/pysal/Tampa1
[22]:
tampa1.installed
[22]:
True
[23]:
tampa1.get_file_list()
[23]:
['/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.mif',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.sbn',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.mif',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.dbf',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.sqlite',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.xlsx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.kml',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.shp',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.xlsx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.shx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.gpkg',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.geojson',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.gpkg',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.shp',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.geojson',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.kml',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.prj',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/2000 Census Data Variables_Documentation.pdf',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.mid',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.sbx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.sbn',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.sbx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.shx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.mid',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.dbf',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_final_census2.sqlite',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.CatRelsByType.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000002.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.CatItemTypesByName.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.spx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000003.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.spx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.CatItemTypesByUUID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.TablesByName.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000001.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByBackwardLabel.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.FDO_UUID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.CatItemTypesByParentTypeID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/gdb',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByOriginItemTypeID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000005.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.CatItemsByPhysicalName.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByForwardLabel.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.CatRelsByOriginID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.gdbindexes',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByName.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000009.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000003.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.CatItemsByType.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000004.FDO_UUID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByUUID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000007.CatRelTypesByDestItemTypeID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a0000000a.spx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000006.CatRelsByDestinationID.atx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/timestamps',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000003.gdbtable',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/TampaMSA.gdb/a00000002.gdbtablx',
 '/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.prj',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_counties.sbx',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._2000 Census Data Variables_Documentation.pdf',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_final_census2.sbx',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_counties.sbn',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_final_census2.sbn',
 '/home/serge/.local/share/pysal/Tampa1/__MACOSX/._TampaMSA']
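The listing above ends with `__MACOSX` entries and `._` files, which are archive metadata rather than data. A minimal sketch for filtering them out follows; `drop_macos_metadata` is a hypothetical helper, and the sample paths are copied from the output above.

```python
from pathlib import PurePosixPath


def drop_macos_metadata(files):
    """Keep only paths outside __MACOSX whose name does not start with '._'."""
    kept = []
    for f in files:
        p = PurePosixPath(f)
        if "__MACOSX" in p.parts or p.name.startswith("._"):
            continue
        kept.append(f)
    return kept


sample = [
    "/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.shp",
    "/home/serge/.local/share/pysal/Tampa1/__MACOSX/TampaMSA/._tampa_counties.sbx",
    "/home/serge/.local/share/pysal/Tampa1/TampaMSA/tampa_counties.prj",
    "/home/serge/.local/share/pysal/Tampa1/__MACOSX/._TampaMSA",
]
for path in drop_macos_metadata(sample):
    print(path)
```

The same function could be applied directly to the result of `tampa1.get_file_list()`.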
[24]:
tampa_counties_shp = tampa1.load('tampa_counties.shp')
[25]:
tampa_counties_shp
[25]:
<libpysal.io.iohandlers.pyShpIO.PurePyShpWrapper at 0x7f22604b2810>
[26]:
import geopandas
[27]:
tampa_df = geopandas.read_file(tampa1.get_path('tampa_counties.shp'))
[28]:
%matplotlib inline
tampa_df.plot()
[28]:
<Axes: >
[Figure: map of the Tampa counties produced by tampa_df.plot()]

Other Remotes

In addition to the remote datasets from the GeoDa Center, several large remote datasets are hosted in GitHub repositories.

[29]:
libpysal.examples.explain('Rio Grande do Sul')
Rio_Grande_do_Sul
======================

Cities of the Brazilian State of Rio Grande do Sul
-------------------------------------------------------

* 43MUE250GC_SIR.dbf: attribute data (k=2)
* 43MUE250GC_SIR.shp: Polygon shapefile (n=499)
* 43MUE250GC_SIR.shx: spatial index
* 43MUE250GC_SIR.cpg: encoding file
* 43MUE250GC_SIR.prj: projection information
* map_RS_BR.dbf: attribute data (k=3)
* map_RS_BR.shp: Polygon shapefile (no lakes) (n=497)
* map_RS_BR.prj: projection information
* map_RS_BR.shx: spatial index



Source: Renan Xavier Cortes
Reference: https://github.com/pysal/pysal/issues/889#issuecomment-396693495


Note that the explain function generates only a textual description of this example dataset. No map is rendered, because the source repository does not include that functionality.

[30]:
rio = libpysal.examples.load_example('Rio Grande do Sul')
Downloading Rio Grande do Sul to /home/serge/.local/share/pysal/Rio_Grande_do_Sul
[31]:
libpysal.examples.remote_datasets.datasets # a listing of all remotes
[31]:
{'AirBnB': <libpysal.examples.base.Example at 0x7f2265395550>,
 'Atlanta': <libpysal.examples.base.Example at 0x7f22653954f0>,
 'Baltimore': <libpysal.examples.base.Example at 0x7f2265395580>,
 'Bostonhsg': <libpysal.examples.base.Example at 0x7f2264367800>,
 'Buenosaires': <libpysal.examples.base.Example at 0x7f22643677d0>,
 'Charleston1': <libpysal.examples.base.Example at 0x7f2264367860>,
 'Charleston2': <libpysal.examples.base.Example at 0x7f2264367890>,
 'Chicago Health': <libpysal.examples.base.Example at 0x7f22643678c0>,
 'Chicago commpop': <libpysal.examples.base.Example at 0x7f22643678f0>,
 'Chicago parcels': <libpysal.examples.base.Example at 0x7f2264367920>,
 'Chile Labor': <libpysal.examples.base.Example at 0x7f2264367950>,
 'Chile Migration': <libpysal.examples.base.Example at 0x7f2264367980>,
 'Cincinnati': <libpysal.examples.base.Example at 0x7f22643679b0>,
 'Cleveland': <libpysal.examples.base.Example at 0x7f22643679e0>,
 'Columbus': <libpysal.examples.base.Example at 0x7f2264367a10>,
 'Elections': <libpysal.examples.base.Example at 0x7f2264367a40>,
 'Grid100': <libpysal.examples.base.Example at 0x7f2264367a70>,
 'Groceries': <libpysal.examples.base.Example at 0x7f2264367aa0>,
 'Guerry': <libpysal.examples.base.Example at 0x7f2264367ad0>,
 'Health+': <libpysal.examples.base.Example at 0x7f2264367b60>,
 'Health Indicators': <libpysal.examples.base.Example at 0x7f2264367b90>,
 'Hickory1': <libpysal.examples.base.Example at 0x7f2264367b00>,
 'Hickory2': <libpysal.examples.base.Example at 0x7f2264367b30>,
 'Home Sales': <libpysal.examples.base.Example at 0x7f2264367bc0>,
 'Houston': <libpysal.examples.base.Example at 0x7f2264367bf0>,
 'Juvenile': <libpysal.examples.base.Example at 0x7f2264367c20>,
 'Lansing1': <libpysal.examples.base.Example at 0x7f2264367c50>,
 'Lansing2': <libpysal.examples.base.Example at 0x7f2264367c80>,
 'Laozone': <libpysal.examples.base.Example at 0x7f2264367cb0>,
 'LasRosas': <libpysal.examples.base.Example at 0x7f2264367ce0>,
 'Liquor Stores': <libpysal.examples.base.Example at 0x7f2264367d10>,
 'Malaria': <libpysal.examples.base.Example at 0x7f2264367d40>,
 'Milwaukee1': <libpysal.examples.base.Example at 0x7f2264367d70>,
 'Milwaukee2': <libpysal.examples.base.Example at 0x7f2264367da0>,
 'NCOVR': <libpysal.examples.base.Example at 0x7f2264367dd0>,
 'Natregimes': <libpysal.examples.base.Example at 0x7f2264367e00>,
 'NDVI': <libpysal.examples.base.Example at 0x7f2264367e30>,
 'Nepal': <libpysal.examples.base.Example at 0x7f2264367e60>,
 'NYC': <libpysal.examples.base.Example at 0x7f2264367e90>,
 'NYC Earnings': <libpysal.examples.base.Example at 0x7f2264367ec0>,
 'NYC Education': <libpysal.examples.base.Example at 0x7f2264367ef0>,
 'NYC Neighborhoods': <libpysal.examples.base.Example at 0x7f2264367f20>,
 'NYC Socio-Demographics': <libpysal.examples.base.Example at 0x7f2264367f50>,
 'Ohiolung': <libpysal.examples.base.Example at 0x7f2264367fb0>,
 'Orlando1': <libpysal.examples.base.Example at 0x7f2264367f80>,
 'Orlando2': <libpysal.examples.base.Example at 0x7f2264367fe0>,
 'Oz9799': <libpysal.examples.base.Example at 0x7f2264390080>,
 'Phoenix ACS': <libpysal.examples.base.Example at 0x7f2264390050>,
 'Pittsburgh': <libpysal.examples.base.Example at 0x7f22643900b0>,
 'Police': <libpysal.examples.base.Example at 0x7f22643900e0>,
 'Sacramento1': <libpysal.examples.base.Example at 0x7f2264390110>,
 'Sacramento2': <libpysal.examples.base.Example at 0x7f2264390140>,
 'SanFran Crime': <libpysal.examples.base.Example at 0x7f2264390170>,
 'Savannah1': <libpysal.examples.base.Example at 0x7f22643901a0>,
 'Savannah2': <libpysal.examples.base.Example at 0x7f22643901d0>,
 'Scotlip': <libpysal.examples.base.Example at 0x7f2264390200>,
 'Seattle1': <libpysal.examples.base.Example at 0x7f2264390230>,
 'Seattle2': <libpysal.examples.base.Example at 0x7f2264390260>,
 'SIDS': <libpysal.examples.base.Example at 0x7f2264390290>,
 'SIDS2': <libpysal.examples.base.Example at 0x7f22643902c0>,
 'Snow': <libpysal.examples.base.Example at 0x7f22643902f0>,
 'South': <libpysal.examples.base.Example at 0x7f2264390320>,
 'Spirals': <libpysal.examples.base.Example at 0x7f2264390350>,
 'StLouis': <libpysal.examples.base.Example at 0x7f2264390380>,
 'Tampa1': <libpysal.examples.base.Example at 0x7f22643903b0>,
 'US SDOH': <libpysal.examples.base.Example at 0x7f22643903e0>,
 'Rio Grande do Sul': <libpysal.examples.base.Example at 0x7f2264390440>,
 'nyc_bikes': <libpysal.examples.base.Example at 0x7f22643904a0>,
 'taz': <libpysal.examples.base.Example at 0x7f2264390410>,
 'clearwater': <libpysal.examples.base.Example at 0x7f2264390530>,
 'newHaven': <libpysal.examples.base.Example at 0x7f2264390560>,
 'chicagoSDOH': <libpysal.examples.base.Example at 0x7f22643904d0>}
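The dictionary above maps dataset names to Example objects, so it can be split by installation status. The sketch below assumes each value exposes a boolean `installed` attribute, as `tampa1.installed` did earlier; `split_by_installed` is a hypothetical helper, and the stand-in objects exist only so the code runs without libpysal.

```python
from types import SimpleNamespace


def split_by_installed(datasets):
    """Return (installed_names, remote_names), each sorted alphabetically."""
    installed = sorted(n for n, ex in datasets.items() if ex.installed)
    remote = sorted(n for n, ex in datasets.items() if not ex.installed)
    return installed, remote


# Stand-ins for Example objects; only the `installed` flag is modelled.
fake_datasets = {
    "Tampa1": SimpleNamespace(installed=True),
    "Snow": SimpleNamespace(installed=False),
    "taz": SimpleNamespace(installed=True),
}
installed, remote = split_by_installed(fake_datasets)
print(installed)
print(remote)
```

In real use the argument would be `libpysal.examples.remote_datasets.datasets`.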