{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7b8975c4",
   "metadata": {},
   "source": [
    "# Spatial Weights\n",
    "\n",
    "### Luc Anselin\n",
    "\n",
    "### 09/06/2024\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4cfd0985",
   "metadata": {},
   "source": [
    "## Preliminaries\n",
    "\n",
    "In this notebook, basic operations pertaining to spatial weights are reviewed. Two major cases are considered: reading weights files constructed by other software, such as *GeoDa*, and creating weights from GeoDataFrames or spatial layers using the functionality in *libpysal.weights*. In addition, some special operations are covered, such as creating spatial weights for regular grids and turning a *PySAL* weights object into a full matrix. The computation of a spatially lagged variable is illustrated as well.\n",
    "\n",
    "A video recording is available from the GeoDa Center YouTube channel playlist *Applied Spatial Regression - Notebooks*, at https://www.youtube.com/watch?v=IbmTItot0q8&list=PLzREt6r1NenmhNy-FCUwiXL17Vyty5VL6&index=4."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6494b68c",
   "metadata": {},
   "source": [
    "### Modules Needed\n",
    "\n",
    "The main functionality is provided by the utilities in *libpysal* for spatial weights, and the functionality in *geopandas* for data input and output. All of these rely on *numpy* as a dependency.\n",
    "\n",
    "To simplify notation, the `libpysal.weights` module is imported as `weights`, and `get_path` and `open` are imported from `libpysal.examples` and `libpysal.io`, respectively.\n",
    "\n",
    "The `warnings` module is used to suppress some warnings about future changes. In addition, *numpy* 2.0 changed the way scalars are displayed (e.g., `np.float64(1.0)` instead of `1.0`). To retain the more compact earlier format, it is necessary to include the `set_printoptions` command with `legacy=\"1.25\"` if you are using a Python 3.12 environment with numpy 2.0 or greater.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "e398e42f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "import numpy as np\n",
    "import os\n",
    "os.environ['USE_PYGEOS'] = '0'\n",
    "import geopandas as gpd\n",
    "from libpysal.examples import get_path\n",
    "from libpysal.io import open\n",
    "import libpysal.weights as weights\n",
    "np.set_printoptions(legacy=\"1.25\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1ac85fb3",
   "metadata": {},
   "source": [
    "### Functions Used\n",
    "\n",
    "- from numpy:\n",
    "  - array\n",
    "  - mean\n",
    "  - std\n",
    "  - flatten\n",
    "  - @\n",
    "\n",
    "- from geopandas:\n",
    "  - read_file\n",
    "  - astype\n",
    "  \n",
    "- from libpysal.examples:\n",
    "  - get_path\n",
    "\n",
    "- from libpysal.io:\n",
    "  - open\n",
    "\n",
    "- from libpysal.weights:\n",
    "  - neighbors\n",
    "  - weights\n",
    "  - n\n",
    "  - min_neighbors, max_neighbors, mean_neighbors\n",
    "  - pct_nonzero\n",
    "  - asymmetry, asymmetries\n",
    "  - Kernel.from_file\n",
    "  - Queen.from_dataframe\n",
    "  - transform\n",
    "  - Queen.from_file\n",
    "  - KNN.from_dataframe\n",
    "  - symmetrize\n",
    "  - Kernel\n",
    "  - Kernel.from_shapefile\n",
    "  - lat2W\n",
    "  - full\n",
    "  - lag_spatial"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67da216d",
   "metadata": {},
   "source": [
    "### Files and Variables\n",
    "\n",
    "This notebook uses data on socio-economic correlates of health outcomes contained in the **chicagoSDOH** sample shape files and associated spatial weights. It is assumed that all sample files have been installed.\n",
    "\n",
    "- **Chi-SDOH.shp,shx,dbf,prj**: socio-economic indicators of health for 2014 in 791 Chicago tracts\n",
    "- **Chi-SDOH_q.gal**: queen contiguity spatial weights from `GeoDa`\n",
    "- **Chi-SDOH_k6s.gal**: k-nearest neighbor weights for k=6, made symmetric in `GeoDa`\n",
    "- **Chi-SDOH_k10tri.kwt**: triangular kernel weights based on a variable bandwidth with 10 nearest neighbors from `GeoDa`\n",
    "\n",
    "As before, file names and variable names are specified at the top of the notebook so that this is the only part that needs to be changed for other data sets and variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "12a910c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "infileshp = \"Chi-SDOH.shp\"         # input shape file\n",
    "infileq = \"Chi-SDOH_q.gal\"         # queen contiguity from GeoDa\n",
    "infileknn = \"Chi-SDOH_k6s.gal\"     # symmetric k-nearest neighbor weights from GeoDa\n",
    "infilekwt = \"Chi-SDOH_k10tri.kwt\"  # triangular kernel weights for a variable knn bandwidth from GeoDa\n",
    "outfileq = \"test_q.gal\"            # output file for queen weights computed with libpysal\n",
    "outfilek = \"test_k.kwt\"            # output file for kernel weights computed with libpysal\n",
    "y_name = [\"YPLL_rate\"]             # variable to compute spatial lag"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6051ceb7",
   "metadata": {},
   "source": [
    "## Spatial Weights from a File (GeoDa)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51160fd3",
   "metadata": {},
   "source": [
    "Spatial weights are an essential part of any spatial autocorrelation analysis and spatial regression. Functionality to create and analyze spatial weights is contained in the `libpysal.weights` library.\n",
    "The full range of functions is well beyond the current scope and can be found at https://pysal.org/libpysal/api.html.\n",
    "\n",
    "Only the essentials are covered here, sufficient to proceed\n",
    "with the spatial regression analysis. Also, only the original `Weights` class is considered. A newer alternative is provided by the `Graph` class, but it is not further discussed here. Full details can be found at https://pysal.org/libpysal/user-guide/graph/w_g_migration.html.\n",
    "\n",
    "Arguably the easiest way to create spatial weights is to use the *GeoDa* software (https://geodacenter.github.io/download.html), which\n",
    "provides functionality to construct a wide range of contiguity as well as\n",
    "distance-based weights through a graphical user interface. The weights information is stored as **gal**, **gwt** or **kwt** files. Importing these weights into *PySAL* is considered first.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85f6ef20",
   "metadata": {},
   "source": [
    "### Queen Contiguity Weights\n",
    "\n",
    "Contiguity weights can be read into PySAL spatial weights objects using the `read` function, after opening the file with `libpysal.io.open` (here, just `open`). This is applied to the queen contiguity weights created by `GeoDa`, contained in the file **infileq**, after obtaining its path using `get_path`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ce630240",
   "metadata": {},
   "outputs": [],
   "source": [
    "inpath = get_path(infileq)\n",
    "wq = open(inpath).read()\n",
    "wq"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8519f624",
   "metadata": {},
   "source": [
    "The result is a PySAL spatial weights object of the class `libpysal.weights.weights.W`. This object contains the `neighbors` and `weights` dictionaries as well as many other attributes and methods.\n",
    "\n",
    "It is useful to remember that the `neighbors` and `weights` are dictionaries that use an ID variable or simple sequence number as the key. A quick view of the relevant keys is obtained by converting them to a `list` and printing out the first few elements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "72590f7d",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(list(wq.neighbors.keys())[0:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e69b779d",
   "metadata": {},
   "source": [
    "This reveals that the keys are simple strings, starting at **'1'** and not at **0** as in the usual Python indexing. The IDs of the neighbors for a given observation can be listed by specifying the key. For example, for observation with ID='1', this yields:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ca74d6fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "wq.neighbors['1']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8772b742",
   "metadata": {},
   "source": [
    "When an inappropriate key is used, an error is generated (recall that dictionary entries are accessed by key, not by position, so sequence numbers cannot be used). For example, here `1` is entered as an integer, but it should have been a string, as above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b8f34cf",
   "metadata": {},
   "outputs": [],
   "source": [
    "wq.neighbors[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d40d3b5c",
   "metadata": {},
   "source": [
    "The weights associated with each observation key are found using `weights`. For example, for observation with ID='1' this yields:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1342d4e9",
   "metadata": {},
   "outputs": [],
   "source": [
    "wq.weights['1']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f645036",
   "metadata": {},
   "source": [
    "At this point, all the weights are simply binary. Row-standardization is considered below."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f90c1984",
   "metadata": {},
   "source": [
    "#### Weights Characteristics"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50ee4c2d",
   "metadata": {},
   "source": [
    "A quick check on the number of observations, i.e., the number of rows in the weights matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a60409f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "wq.n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1751a0a5",
   "metadata": {},
   "source": [
    "Minimum, maximum and average number of neighbors, and percent non-zero (an indication of sparsity):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5de46f3",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "wq.min_neighbors,wq.max_neighbors,wq.mean_neighbors,wq.pct_nonzero"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c406929f",
   "metadata": {},
   "source": [
    "There is no explicit check for symmetry as such. Instead, the lack of symmetry can be assessed by means of the `asymmetry` method, and the list of id pairs with asymmetric weights is available through the `asymmetries` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "430a2f59",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(wq.asymmetry())\n",
    "print(wq.asymmetries)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "545d5999",
   "metadata": {},
   "source": [
    "Since contiguity weights are symmetric by construction, the presence of an asymmetry would indicate some kind of error. This is not the case here."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa2ee8a3",
   "metadata": {},
   "source": [
    "### K-Nearest Neighbors Weights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e856073b",
   "metadata": {},
   "source": [
    "Similarly, the symmetric knn weights (k=6) created by `GeoDa` can be read from the file **infileknn**:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a86f85ce",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "inpath = get_path(infileknn)\n",
    "wk6s = open(inpath).read()\n",
    "wk6s"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4ea8c6c",
   "metadata": {},
   "source": [
    "Some characteristics:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "74c91c22",
   "metadata": {},
   "outputs": [],
   "source": [
    "wk6s.n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f55fc305",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(wk6s.min_neighbors,wk6s.max_neighbors,wk6s.mean_neighbors,wk6s.pct_nonzero)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "212cd898",
   "metadata": {},
   "source": [
    "Note how the operation to make the initially asymmetric k-nearest neighbor weights symmetric has resulted in many observations having more than 6 neighbors (`max_neighbors` is larger than 6). That is the price to pay to end up with symmetric weights, which are required for some of the estimation methods. We can list neighbors and weights in the usual way. As it turns out, the observation with key `'1'` is not adjusted, but the observation with key `'3'` now has eight neighbors (up from the original six).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8fb35473",
   "metadata": {},
   "outputs": [],
   "source": [
    "wk6s.neighbors['1']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3bf5d207",
   "metadata": {},
   "outputs": [],
   "source": [
    "wk6s.neighbors['3']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69dc85a3",
   "metadata": {},
   "source": [
    "### Kernel Weights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f523e10a",
   "metadata": {},
   "source": [
    "Triangular kernel weights based on a variable bandwidth with 10 nearest neighbors created by `GeoDa` are contained in the file **infilekwt**. The properties of kernel weights are considered in more detail in a later notebook.\n",
    "\n",
    "The weights can be read in the usual fashion, by means of `libpysal.io.open`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "cc957469",
   "metadata": {},
   "outputs": [],
   "source": [
    "inpath = get_path(infilekwt)\n",
    "kwtri = open(inpath).read()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7f9a159",
   "metadata": {},
   "source": [
    "However, this does not give the desired result. The object is not recognized as kernel weights, but\n",
    "as a standard spatial weights object, revealed by checking the `type`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b8aa4de",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(type(kwtri))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b69738a",
   "metadata": {},
   "source": [
    "The kernel weights can be checked with the usual `weights` attribute. However, the values for the keys in this example are not characters, but simple integers. This is revealed by a quick check of the keys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b738d26",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(list(kwtri.neighbors.keys())[0:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "70e0fafb",
   "metadata": {},
   "source": [
    "Now, with the integer 1 as the key, the contents of the weights can be listed. Note the presence of the weight 1.0 (for the diagonal). All is fine, except that *PySAL* does not recognize the weights as kernel weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "996a5c2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(kwtri.weights[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c4584ce6",
   "metadata": {},
   "source": [
    "The alternative, using the `weights.Kernel.from_file` method from `libpysal`, has the same problem."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dbf96029",
   "metadata": {},
   "outputs": [],
   "source": [
    "kwtri10f = weights.Kernel.from_file(inpath)\n",
    "print(type(kwtri10f))\n",
    "print(kwtri10f.weights[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f9b8309",
   "metadata": {},
   "source": [
    "#### Changing the class of weights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d28c0700",
   "metadata": {},
   "source": [
    "In this particular case, a hack is to force the class of the weights object to be a kernel weight. This is generally not recommended, but since the object in question has all the characteristics of kernel weights, it is safe to do so.\n",
    "\n",
    "It is accomplished by setting the attribute `__class__` of the weights object to `libpysal.weights.distance.Kernel`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a323c47",
   "metadata": {},
   "outputs": [],
   "source": [
    "kwtri10f.__class__ = weights.distance.Kernel\n",
    "print(type(kwtri10f))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2efc8b5b",
   "metadata": {},
   "source": [
    "## Creating Weights from a GeoDataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c193a5dd",
   "metadata": {},
   "source": [
    "### Queen Contiguity Weights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "713be4b9",
   "metadata": {},
   "source": [
    "In *PySAL*, the spatial weights construction is handled by `libpysal.weights`. The generic function is `weights.<weights_type>.from_dataframe`, with as arguments the geodataframe and optionally the `ids` (recommended). For the Chicago data, the ID variable is **OBJECTID**. To make sure the latter is an integer (it is not in the original data frame), its type is changed by means of the `astype` method.\n",
    "\n",
    "The same operation can also create contiguity weights from a shape file, using `weights.<weights_type>.from_shapefile`, but this is left as an exercise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "015c694f",
   "metadata": {},
   "outputs": [],
   "source": [
    "inpath = get_path(infileshp)\n",
    "dfs = gpd.read_file(inpath)\n",
    "dfs = dfs.astype({'OBJECTID':'int'})\n",
    "wq1 = weights.Queen.from_dataframe(dfs,ids='OBJECTID')\n",
    "wq1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9029a443",
   "metadata": {},
   "source": [
    "A quick check on the keys reveals these are integers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e404c2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(list(wq1.neighbors.keys())[0:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9263cc3",
   "metadata": {},
   "source": [
    "Again, some characteristics:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d0f697e",
   "metadata": {},
   "outputs": [],
   "source": [
    "wq1.n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7cbad937",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(wq1.min_neighbors,wq1.max_neighbors,wq1.mean_neighbors,wq1.pct_nonzero)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "baf427f5",
   "metadata": {},
   "source": [
    "The structure of the weights is identical to that from the file read from `GeoDa`. For example, the first set of neighbors and weights are:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7321592",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(wq1.neighbors[1])\n",
    "print(wq1.weights[1])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7dfc46a3",
   "metadata": {},
   "source": [
    "### Row-standardization"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0756cb5e",
   "metadata": {},
   "source": [
    "As created, the weights are simply 1.0 for binary weights. To turn the weights into row-standardized form, a *transformation* is needed, `wq1.transform = 'r'`:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "931dda95",
   "metadata": {},
   "outputs": [],
   "source": [
    "wq1.transform = 'r'\n",
    "wq1.weights[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b659247c",
   "metadata": {},
   "source": [
    "### Writing a Weights File"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05e387f1",
   "metadata": {},
   "source": [
    "To write out the weights object to a GAL file, `libpysal.io.open` is used with the `write` method. The argument to the `open` command is the filename and `mode='w'` (for writing a file). The weights object itself is the argument to the `write` method.\n",
    "\n",
    "Note that even though the weights are row-standardized, this information is lost in the output file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "d0547d6d",
   "metadata": {},
   "outputs": [],
   "source": [
    "open(outfileq,mode='w').write(wq1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c928df1",
   "metadata": {},
   "source": [
    "A quick check using the `weights.Queen.from_file` operation on the just created weights file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "681ef494",
   "metadata": {},
   "outputs": [],
   "source": [
    "wq1a = weights.Queen.from_file(outfileq)\n",
    "print(wq1a.n)\n",
    "print(list(wq1a.neighbors.keys())[0:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f26a5b6",
   "metadata": {},
   "source": [
    "Note how the type of the key has changed from integer above to character after reading from the outside file. This again stresses the importance of checking the keys before any further operations.\n",
    "\n",
    "The weights are back to their original binary form, so the row-standardization is lost after writing the output file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc1e60e9",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "wq1a.weights['1']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b4e53bca",
   "metadata": {},
   "source": [
    "### KNN Weights\n",
    "\n",
    "The corresponding functionality for k-nearest neighbor weights is `weights.KNN.from_dataframe`. An important argument is `k`, the number of neighbors, with the default set to `2`, which is typically not that useful. Again, it is useful to include OBJECTID as the ID variable. Initially the weights are in binary form. As before, they can be row-standardized by means of a transformation, which is carried out further below.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2128e659",
   "metadata": {},
   "outputs": [],
   "source": [
    "wk6 = weights.KNN.from_dataframe(dfs,k=6,ids='OBJECTID')\n",
    "print(wk6.n)\n",
    "print(list(wk6.neighbors.keys())[0:5])\n",
    "wk6"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e5173eb",
   "metadata": {},
   "source": [
    "To compare the just created weights to the symmetric form read into **wk6s**, the list of neighbors for observation 3 is informative. It consists of a subset of six from the list of eight from the above symmetric knn weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "599541a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(wk6.neighbors[3])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d7b4f435",
   "metadata": {},
   "source": [
    "The k-nearest neighbor weights are intrinsically asymmetric. Rather than listing all the pairs that contain such asymmetries, the length of this list can be checked using the `asymmetry` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b112430",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(len(wk6.asymmetry()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02aec84c",
   "metadata": {},
   "source": [
    "KNN weights have a built-in method to make them symmetric, `symmetrize`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "239b43ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "wk6s2 = wk6.symmetrize()\n",
    "print(len(wk6.asymmetry()))\n",
    "print(len(wk6s2.asymmetry()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd55eca5",
   "metadata": {},
   "source": [
    "The entries are now the same as for the symmetric knn GAL file that was read in from `GeoDa`. For example, the neighbors of observation with key `3` are:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ea5eb99d",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(wk6s2.neighbors[3])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c828bbe",
   "metadata": {},
   "source": [
    "Finally, to make them row-standardized, the same transformation is used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e98d4fde",
   "metadata": {},
   "outputs": [],
   "source": [
    "wk6s2.transform = 'r'\n",
    "wk6s2.weights[3]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a380252",
   "metadata": {},
   "source": [
    "## Kernel Weights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d805bada",
   "metadata": {},
   "source": [
    "There are several ways to create the kernel weights that are used later in the course, for example to compute HAC standard errors in ordinary least squares regression. One is to create the weights in `GeoDa` and save them as a weights file with a **kwt** extension. However, as shown above, *libpysal* currently does not recognize such a file as kernel weights, so the proper class needs to be set explicitly.\n",
    "\n",
    "The alternative is to compute the weights directly with `PySAL`. This can be implemented in a number of ways. One is to create the weights using the `libpysal.weights.Kernel` class, passing an array of x-y coordinates. Another is to compute the weights directly from the information in a shape file, using `libpysal.weights.Kernel.from_shapefile`.\n",
    "\n",
    "Each is considered in turn."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8b452302",
   "metadata": {},
   "source": [
    "### Kernel Weights Computation"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3734b6f",
   "metadata": {},
   "source": [
    "Direct computation of kernel weights takes as input an array of coordinates. Typically these are the coordinates of the locations, but it is a perfectly general approach and can take any number of variables to compute *general* distances (or economic distances). In the example, the X and Y coordinates contained in the geodataframe **dfs** as `COORD_X` and `COORD_Y` are used.\n",
    "\n",
    "First, the respective columns from the data frame are turned into a numpy array.\n",
    "\n",
    "The command to create the kernel weights is `libpysal.weights.Kernel`. It takes the array as the first argument, followed by a number of options. To have a variable bandwidth that follows the 10 nearest neighbors,\n",
    "the arguments are set to `fixed=False` (the default is a fixed bandwidth) and `k=10`. The kernel function is selected as `function=\"triangular\"` (this is also the default, but it is included here for clarity). Finally, the use of kernel weights in the HAC calculations requires the diagonals to be set to the value of one, achieved by means\n",
    "of `diagonal=True`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3a98748",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "coords = np.array(dfs[['COORD_X','COORD_Y']])\n",
    "kwtri10 = weights.Kernel(coords, fixed=False, k=10,\n",
    "                         function=\"triangular\", diagonal=True)\n",
    "print(type(kwtri10))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80d86e62",
   "metadata": {},
   "source": [
    "The result is an object of class `libpysal.weights.distance.Kernel`. This contains several attributes, such as the kernel function used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "82abf5af",
   "metadata": {},
   "outputs": [],
   "source": [
    "kwtri10.function"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3a6d3c3",
   "metadata": {},
   "source": [
    "A check on the keys. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ff74853e",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(list(kwtri10.neighbors.keys())[0:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19215410",
   "metadata": {},
   "source": [
    "Note that the index starts at 0 and the keys are integers. The neighbors for the first observation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f47c9c44",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "kwtri10.neighbors[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e6ea4dc",
   "metadata": {},
   "source": [
    "The kernel weights for the first observation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "380606c6",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "kwtri10.weights[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75e81a3c",
   "metadata": {},
   "source": [
    "These are the same values as we obtained above from reading the kwt file, but now they are recognized as a proper kernel weights object."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc1b40f5",
   "metadata": {},
   "source": [
    "### Kernel Weights from a Shape File"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df575721",
   "metadata": {},
   "source": [
    "Contiguity weights, distance weights and kernel weights can also be constructed directly from a shape file, using the relevant `from_shapefile` methods. For kernel weights, the distances needed can be computed from either point coordinates or the coordinates of polygon centroids. The relevant function is `libpysal.weights.Kernel.from_shapefile`, with as its main argument the file (path) name of the\n",
    "shape file involved. The other arguments are the same options as before. The shape file in **infileshp** is used as the input file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7cc1d8ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "inpath = get_path(infileshp)\n",
    "kwtri10s = weights.Kernel.from_shapefile(inpath, fixed=False, k=10,\n",
    "                                         function=\"triangular\", diagonal=True)\n",
    "print(type(kwtri10s))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "380bf1a0",
   "metadata": {},
   "source": [
    "The result is of the proper type and has the same structure as before, with matching function, neighbors and weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d776d966",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(kwtri10s.function)\n",
    "print(list(kwtri10s.neighbors.keys())[0:5])\n",
    "print(kwtri10s.neighbors[0])\n",
    "print(kwtri10s.weights[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "948f1fb0",
   "metadata": {},
   "source": [
    "### Writing the Kernel Weights"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03f68cfb",
   "metadata": {},
   "source": [
    "We use the same method as for the queen weights to write the just constructed kernel weights to an outside kwt file. The output file is `outfilek`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "9ec284cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "open(outfilek,mode='w').write(kwtri10s)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d05c15d0",
   "metadata": {},
   "source": [
    "Quick check:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7ab1227",
   "metadata": {},
   "outputs": [],
   "source": [
    "kk = weights.Kernel.from_file(outfilek)\n",
    "print(type(kk))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "021da548",
   "metadata": {},
   "source": [
    "So, the same problem as mentioned above persists for weights files written by *PySAL* and the proper class needs to be set explicitly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "183515cd",
   "metadata": {},
   "outputs": [],
   "source": [
    "kk.__class__ = weights.distance.Kernel\n",
    "print(type(kk))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e0eeb182",
   "metadata": {},
   "source": [
    "## Special Weights Operations"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c7a1d94",
   "metadata": {},
   "source": [
    "A few special weights operations will come in handy later on. One is to create spatial weights for a regular grid setup, which is very useful for simulation designs. The other is to turn a spatial weights object into a standard numpy array, which can be used in all kinds of matrix operations."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3507af4d",
   "metadata": {},
   "source": [
    "### Weights for Regular Grids"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1686fa14",
   "metadata": {},
   "source": [
    "The `weights.lat2W` operation creates contiguity spatial weights for a regular rectangular grid, with the number of rows and the number of columns as the arguments. Rook contiguity is the default; queen contiguity is obtained with `rook=False`. The result is a simple binary weights object, so row-standardization is typically needed as well.\n",
    "\n",
    "For a square grid, with **gridside=20** as the number of rows/columns, the resulting weights object has 400 observations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "32e033b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "gridside = 20\n",
    "wgrid = weights.lat2W(gridside,gridside,rook=True)\n",
    "wgrid.n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80e3a3ae",
   "metadata": {},
   "source": [
    "Quick check on the neighbor keys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d19a637f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(list(wgrid.neighbors.keys())[0:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6979a434",
   "metadata": {},
   "source": [
    "Since the first observation is in the upper left corner of the grid, it has only two neighbors under rook contiguity, one\n",
    "to the right (1) and one below (20, since the first row goes from 0 to 19)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d97e2ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "wgrid.neighbors[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bfd62652",
   "metadata": {},
   "source": [
    "Row-standardization yields the actual weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "75cc9d20",
   "metadata": {},
   "outputs": [],
   "source": [
    "wgrid.transform = 'r'\n",
    "wgrid.weights[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b930a20a",
   "metadata": {},
   "source": [
    "Any non-border cell has four neighbors, one each to the left, to the right, above and below, so each row-standardized weight equals 0.25."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4fe7013",
   "metadata": {},
   "outputs": [],
   "source": [
    "wgrid.weights[21]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "634ddd29",
   "metadata": {},
   "source": [
    "### Weights as Matrices"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f79a8b0",
   "metadata": {},
   "source": [
    "The `weights.full` operation turns a spatial weights object into a standard numpy array. The function returns a tuple, of which the first element is the actual matrix and the second consists of a list of keys. For actual matrix operations, the latter is not that useful.\n",
    "\n",
    "It is important to remember to always extract the first element of the tuple as the matrix of interest. Otherwise, one quickly runs into trouble with array operations.\n",
    "\n",
    "This is illustrated for the row-standardized queen weights **wq1** created earlier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8a19df1",
   "metadata": {},
   "outputs": [],
   "source": [
    "wq1full, wqfkeys = weights.full(wq1)\n",
    "print(type(wq1full),type(wqfkeys))\n",
    "wq1full.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b47feef",
   "metadata": {},
   "source": [
    "## Spatially Lagged Variables"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "355c9d91",
   "metadata": {},
   "source": [
    "Spatially lagged variables are essential in the specification of spatial regression models. They are the product of a spatial weight matrix with a vector of observations and yield new values as (weighted) averages of the values observed at neighboring locations (with the neighbors defined by the spatial weights).\n",
    "\n",
    "This is illustrated for the variable **y_name** extracted from the data frame. Its mean and standard deviation are listed using the standard `numpy` methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f357ab21",
   "metadata": {},
   "outputs": [],
   "source": [
    "y = np.array(dfs[y_name])\n",
    "print(y.shape)\n",
    "print(y.mean())\n",
    "print(y.std())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f369e77",
   "metadata": {},
   "source": [
    "The new spatially lagged variable is created with the `weights.lag_spatial` command, passing the weights object **wq1** and the vector of interest, **y**. It is important to make sure that the dimensions match. In particular, if the vector in question is not an actual column vector but a one-dimensional array, the result will also be a one-dimensional array rather than a column vector. This may cause trouble in some applications."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc5f2f0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "wy = weights.lag_spatial(wq1,y)\n",
    "print(wy.shape)\n",
    "print(wy.mean())\n",
    "print(wy.std())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d706e58",
   "metadata": {},
   "source": [
    "The result is a column vector. The mean roughly corresponds to that of the original variable, but the spatially lagged variable has a smaller standard deviation. This illustrates the *smoothing* implied by the spatial lag operation.\n",
    "\n",
    "To illustrate the problem with one-dimensional numpy arrays rather than column vectors, the original vector is flattened and then the `lag_spatial` operation is applied to it. Everything works fine, except that the result is a one-dimensional array, and not a column vector."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b45d600a",
   "metadata": {},
   "outputs": [],
   "source": [
    "yy = y.flatten()\n",
    "print(yy.shape)\n",
    "wyy = weights.lag_spatial(wq1,yy)\n",
    "print(wyy.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "682469c5",
   "metadata": {},
   "source": [
    "The same result can also be obtained using an explicit matrix-vector multiplication with the full matrix **wq1full** just created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "622eb958",
   "metadata": {},
   "outputs": [],
   "source": [
    "wy1 = wq1full @ y\n",
    "print(wy1.shape)\n",
    "print(wy1.mean())\n",
    "print(wy1.std())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65aadbff",
   "metadata": {},
   "source": [
    "## Practice\n",
    "\n",
    "Experiment with various spatial weights for your own data set or for one of the PySAL sample data sets. Create a spatially lagged variable for each of the weights and compare their properties, such as the mean, standard deviation, correlation between the original variable and the spatial lag, etc.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}