This page was generated from notebooks/facloc-disperse-real-world.ipynb. Interactive online version: Binder badge

The P-Dispersion Problem: An Empirical Example

Authors: Erin Olson, Germano Barcelos, James Gaboardi, Levi J. Wolf, Qunshan Zhao

This tutorial extends the Empirical examples notebook, specifically for the \(p\)-dispersion problem. A deeper dive into the \(p\)-dispersion problem can be found here. Also, this tutorial demonstrates the use of different solvers that PULP supports.

[1]:
%config InlineBackend.figure_format = "retina"
%load_ext watermark
%watermark
Last updated: 2023-12-10T13:43:20.123733-05:00

Python implementation: CPython
Python version       : 3.11.6
IPython version      : 8.18.0

Compiler    : Clang 15.0.7
OS          : Darwin
Release     : 23.1.0
Machine     : x86_64
Processor   : i386
CPU cores   : 8
Architecture: 64bit

[2]:
import geopandas
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import matplotlib.lines as mlines
import matplotlib_scalebar
from matplotlib_scalebar.scalebar import ScaleBar
import numpy
import pandas
import pulp
import shapely
from shapely.geometry import Point
import spopt
from spopt.locate import PDispersion
import time
import warnings

%watermark -w
%watermark -iv
Watermark: 2.4.3

matplotlib         : 3.8.2
matplotlib_scalebar: 0.8.1
numpy              : 1.26.2
geopandas          : 0.14.1
pandas             : 2.1.3
spopt              : 0.5.1.dev59+g343ef27
shapely            : 2.0.2
pulp               : 2.7.0


We use 2 data files as input:

  • facility_points represents the stores that are candidate facility sites

  • tract is the polygon of census tract 205.

Note that all other ‘Real World Facility Location’ demonstration notebooks utilize this file which contains facility-client network distances that were calculated using the ArcGIS Network Analyst Extension. This notebook, solving for \(P\)-Dispersion, does not use this file or any network distances and instead relies solely on euclidean distance for solving the model.

All datasets are available online in this repository.

[3]:
DIRPATH = "../spopt/tests/data/"

facility_points dataframe

[4]:
facility_points = pandas.read_csv(DIRPATH + "SF_store_site_16_longlat.csv", index_col=0)
facility_points = facility_points.reset_index(drop=True)
facility_points
[4]:
OBJECTID NAME long lat
0 1 Store_1 -122.510018 37.772364
1 2 Store_2 -122.488873 37.753764
2 3 Store_3 -122.464927 37.774727
3 4 Store_4 -122.473945 37.743164
4 5 Store_5 -122.449291 37.731545
5 6 Store_6 -122.491745 37.649309
6 7 Store_7 -122.483182 37.701109
7 8 Store_11 -122.433782 37.655364
8 9 Store_12 -122.438982 37.719236
9 10 Store_13 -122.440218 37.745382
10 11 Store_14 -122.421636 37.742964
11 12 Store_15 -122.430982 37.782964
12 13 Store_16 -122.426873 37.769291
13 14 Store_17 -122.432345 37.805218
14 15 Store_18 -122.412818 37.805745
15 16 Store_19 -122.398909 37.797073

study_area dataframe

[5]:
study_area = geopandas.read_file(DIRPATH + "ServiceAreas_4.shp").dissolve()
study_area
[5]:
geometry
0 POLYGON ((-122.45299 37.63898, -122.45415 37.6...

Plot study_area

[6]:
base = study_area.plot()
base.axis("off");
../_images/notebooks_facloc-disperse-real-world_10_0.png

Create a geodataframe of the candidate facility sites.

[7]:
process = lambda df: as_gdf(df).sort_values(by=["NAME"]).reset_index(drop=True)
as_gdf = lambda df: geopandas.GeoDataFrame(df, geometry=pnts(df))
pnts = lambda df: geopandas.points_from_xy(df.long, df.lat)
[8]:
facility_points = process(facility_points)

Reproject the input spatial data.

[9]:
for _df in [facility_points, study_area]:
    _df.set_crs("EPSG:4326", inplace=True)
    _df.to_crs("EPSG:7131", inplace=True)

Set parameter P_FACILITIES for the number for candidate facilities to include in the optimal selection set.

[10]:
# number of candidate facilities in optimal solution
P_FACILITIES = 3
[11]:
solver = pulp.PULP_CBC_CMD(msg=False)

P-Dispersion

[12]:
pdispersion = PDispersion.from_geodataframe(
    facility_points,
    "geometry",
    P_FACILITIES,
    distance_metric="euclidean",
    name=f"p-dispersion"
)
pdispersion = pdispersion.solve(solver)
pdispersion
[12]:
<spopt.locate.p_dispersion.PDispersion at 0x161b4e750>
[13]:
n_fac_pnts = facility_points.shape[0]
pdispersion_obj = round(pdispersion.problem.objective.value(), 3)
print(
    "A maximized minimum inter-facility distance between any two sited candiate "
    f"facilities of {pdispersion_obj} meters is observed while siting "
    f"facilities at {P_FACILITIES} of the available {n_fac_pnts} locations."
)
A maximized minimum inter-facility distance between any two sited candiate facilities of 10164.495 meters is observed while siting facilities at 3 of the available 16 locations.

Define the decision variable names used for mapping later.

[14]:
facility_points["dv"] = pdispersion.fac_vars
facility_points["dv"] = facility_points["dv"].map(
    lambda x: x.name.replace("_", "").replace("x", "y")
)
facility_points
[14]:
OBJECTID NAME long lat geometry dv
0 1 Store_1 -122.510018 37.772364 POINT (42712.165 26483.898) y0
1 8 Store_11 -122.433782 37.655364 POINT (49431.133 13496.279) y1
2 9 Store_12 -122.438982 37.719236 POINT (48971.439 20585.532) y2
3 10 Store_13 -122.440218 37.745382 POINT (48862.129 23487.462) y3
4 11 Store_14 -122.421636 37.742964 POINT (50499.936 23219.396) y4
5 12 Store_15 -122.430982 37.782964 POINT (49675.336 27658.898) y5
6 13 Store_16 -122.426873 37.769291 POINT (50037.687 26141.402) y6
7 14 Store_17 -122.432345 37.805218 POINT (49554.745 30128.981) y7
8 15 Store_18 -122.412818 37.805745 POINT (51274.389 30188.010) y8
9 16 Store_19 -122.398909 37.797073 POINT (52499.809 29225.972) y9
10 2 Store_2 -122.488873 37.753764 POINT (44574.304 24418.447) y10
11 3 Store_3 -122.464927 37.774727 POINT (46684.891 26744.653) y11
12 4 Store_4 -122.473945 37.743164 POINT (45889.483 23241.485) y12
13 5 Store_5 -122.449291 37.731545 POINT (48062.508 21951.687) y13
14 6 Store_6 -122.491745 37.649309 POINT (44315.976 12824.977) y14
15 7 Store_7 -122.483182 37.701109 POINT (45073.750 18574.015) y15

P-Dispersion with selection of predefined candidate facilities

However, in many real world applications there may already be existing facility locations with the goal being to add one or more new facilities. Here we will define facilites :math:`y_{11}` and :math:`y_{15}` as already existing (they must be present in the model solution). This will lead to a sub-optimal solution.

Important: The facilities in "predefined_loc" are a binary array where 1 means the associated location must appear in the solution.

[15]:
facility_points["predefined_loc"] = 0
facility_points.loc[(5, 10), "predefined_loc"] = 1
facility_points
[15]:
OBJECTID NAME long lat geometry dv predefined_loc
0 1 Store_1 -122.510018 37.772364 POINT (42712.165 26483.898) y0 0
1 8 Store_11 -122.433782 37.655364 POINT (49431.133 13496.279) y1 0
2 9 Store_12 -122.438982 37.719236 POINT (48971.439 20585.532) y2 0
3 10 Store_13 -122.440218 37.745382 POINT (48862.129 23487.462) y3 0
4 11 Store_14 -122.421636 37.742964 POINT (50499.936 23219.396) y4 0
5 12 Store_15 -122.430982 37.782964 POINT (49675.336 27658.898) y5 1
6 13 Store_16 -122.426873 37.769291 POINT (50037.687 26141.402) y6 0
7 14 Store_17 -122.432345 37.805218 POINT (49554.745 30128.981) y7 0
8 15 Store_18 -122.412818 37.805745 POINT (51274.389 30188.010) y8 0
9 16 Store_19 -122.398909 37.797073 POINT (52499.809 29225.972) y9 0
10 2 Store_2 -122.488873 37.753764 POINT (44574.304 24418.447) y10 1
11 3 Store_3 -122.464927 37.774727 POINT (46684.891 26744.653) y11 0
12 4 Store_4 -122.473945 37.743164 POINT (45889.483 23241.485) y12 0
13 5 Store_5 -122.449291 37.731545 POINT (48062.508 21951.687) y13 0
14 6 Store_6 -122.491745 37.649309 POINT (44315.976 12824.977) y14 0
15 7 Store_7 -122.483182 37.701109 POINT (45073.750 18574.015) y15 0
[16]:
pdispersion_pre = PDispersion.from_geodataframe(
    facility_points,
    "geometry",
    P_FACILITIES,
    distance_metric="euclidean",
    predefined_facility_col="predefined_loc",
    name=f"p-dispersion-predefined"
)
pdispersion_pre = pdispersion_pre.solve(solver)
pdispersion_pre
[16]:
<spopt.locate.p_dispersion.PDispersion at 0x161b80790>
[17]:
pdispersion_obj = round(pdispersion_pre.problem.objective.value(), 3)
print(
    "A maximized minimum inter-facility distance between any two sited candiate "
    f"facilities of {pdispersion_obj} meters is observed while siting "
    f"facilities at {P_FACILITIES} of the available {n_fac_pnts} locations."
)
A maximized minimum inter-facility distance between any two sited candiate facilities of 6043.265 meters is observed while siting facilities at 3 of the available 16 locations.

Plotting the results

[18]:
dv_colors_arr = [
    "darkcyan",
    "mediumseagreen",
    "saddlebrown",
    "darkslategray",
    "lightskyblue",
    "thistle",
    "lavender",
    "darkgoldenrod",
    "peachpuff",
    "coral",
    "mediumvioletred",
    "blueviolet",
    "fuchsia",
    "cyan",
    "limegreen",
    "mediumorchid",
]
dv_colors = {f"y{i}": dv_colors_arr[i] for i in range(len(dv_colors_arr))}
dv_colors
[18]:
{'y0': 'darkcyan',
 'y1': 'mediumseagreen',
 'y2': 'saddlebrown',
 'y3': 'darkslategray',
 'y4': 'lightskyblue',
 'y5': 'thistle',
 'y6': 'lavender',
 'y7': 'darkgoldenrod',
 'y8': 'peachpuff',
 'y9': 'coral',
 'y10': 'mediumvioletred',
 'y11': 'blueviolet',
 'y12': 'fuchsia',
 'y13': 'cyan',
 'y14': 'limegreen',
 'y15': 'mediumorchid'}
[19]:
def plot_results(model, p, facs, clis=None, ax=None):
    """Visualize optimal solution sets and context."""
    if not ax:
        multi_plot = False
        fig, ax = plt.subplots(figsize=(6, 9))
        markersize, markersize_factor = 4, 4
    else:
        ax.axis("off")
        multi_plot = True
        markersize, markersize_factor = 2, 2
    ax.set_title(model.name, fontsize=15)

    # extract facility-client relationships for plotting (except for p-dispersion)
    plot_clis = isinstance(clis, geopandas.GeoDataFrame)
    if plot_clis:
        cli_points = {}
    fac_sites = {}
    for i, dv in enumerate(model.fac_vars):
        if dv.varValue:
            dv, predef = facs.loc[i, ["dv", "predefined_loc"]]
            fac_sites[dv] = [i, predef]
            if plot_clis:
                geom = clis.iloc[model.fac2cli[i]]["geometry"]
                cli_points[dv] = geom

    # study area and legend entries initialization
    study_area.plot(ax=ax, alpha=0.5, fc="tan", ec="k", zorder=1)
    _patch = Patch(alpha=0.5, fc="tan", ec="k", label="Dissolved Service Areas")
    legend_elements = [_patch]

    if plot_clis:
        # any clients that not asscociated with a facility
        if model.name.startswith("mclp"):
            c = "k"
            if model.n_cli_uncov:
                idx = [i for i, v in enumerate(model.cli2fac) if len(v) == 0]
                pnt_kws = dict(ax=ax, fc=c, ec=c, marker="s", markersize=7, zorder=2)
                clis.iloc[idx].plot(**pnt_kws)
            _label = f"Demand sites not covered ($n$={model.n_cli_uncov})"
            _mkws = dict(marker="s", markerfacecolor=c, markeredgecolor=c, linewidth=0)
            legend_elements.append(mlines.Line2D([], [], ms=3, label=_label, **_mkws))

    # all candidate facilities
    facs.plot(ax=ax, fc="brown", marker="*", markersize=80, zorder=8)
    _label = f"Facility sites ($n$={len(model.fac_vars)})"
    _mkws = dict(marker="*", markerfacecolor="brown", markeredgecolor="brown")
    legend_elements.append(mlines.Line2D([], [], ms=7, lw=0, label=_label, **_mkws))

    # facility-(client) symbology and legend entries
    zorder = 4
    for fname, (fac, predef) in fac_sites.items():
        cset = dv_colors[fname]
        if plot_clis:
            # clients
            geoms = cli_points[fname]
            gdf = geopandas.GeoDataFrame(geoms)
            gdf.plot(ax=ax, zorder=zorder, ec="k", fc=cset, markersize=100 * markersize)
            _label = f"Demand sites covered by {fname}"
            _mkws = dict(markerfacecolor=cset, markeredgecolor="k", ms=markersize + 7)
            legend_elements.append(
                mlines.Line2D([], [], marker="o", lw=0, label=_label, **_mkws)
            )
        # facilities
        ec = "k"
        lw = 2
        predef_label = "predefined"
        if model.name.endswith(predef_label) and predef:
            ec = "r"
            lw = 3
            fname += f" ({predef_label})"
        facs.iloc[[fac]].plot(
            ax=ax, marker="*", markersize=1000, zorder=9, fc=cset, ec=ec, lw=lw
        )
        _mkws = dict(markerfacecolor=cset, markeredgecolor=ec, markeredgewidth=lw)
        legend_elements.append(
            mlines.Line2D([], [], marker="*", ms=20, lw=0, label=fname, **_mkws)
        )
        # increment zorder up and markersize down for stacked client symbology
        zorder += 1
        if plot_clis:
            markersize -= markersize_factor / p

    if not multi_plot:
        # legend
        kws = dict(loc="upper left", bbox_to_anchor=(1.05, 0.7))
        plt.legend(handles=legend_elements, **kws)

P-Dispersion considering all candidate facilities

[20]:
plot_results(pdispersion, P_FACILITIES, facility_points)
../_images/notebooks_facloc-disperse-real-world_32_0.png

P-Dispersion with selection of predefined candidate facilities

[21]:
plot_results(pdispersion_pre, P_FACILITIES, facility_points)
../_images/notebooks_facloc-disperse-real-world_34_0.png

Comparing all models

[22]:
fig, axarr = plt.subplots(1, 2, figsize=(12, 6))
fig.subplots_adjust(wspace=-0.25)
for i, m in enumerate([pdispersion, pdispersion_pre]):
    plot_results(m, P_FACILITIES, facility_points, ax=axarr[i])
../_images/notebooks_facloc-disperse-real-world_36_0.png

When stipulating that \(y_{5}\) and \(y_{10}\) must be included in the solution a similar triangulated spatial arrangement is observed to satisfy the \(p\)-dispersion objective: maximizing the interfacilty distance.


Evaluating available solvers

First we’ll determine which solvers are installed locally.

[23]:
with warnings.catch_warnings(record=True) as w:
    solvers = pulp.listSolvers(onlyAvailable=True)
for _w in w:
    print(_w.message)
solvers
GUROBI error:
Failed to set up a license

Error 10009: License expired 2022-08-07


.
[23]:
['GLPK_CMD', 'CPLEX_CMD', 'PULP_CBC_CMD', 'COIN_CMD', 'SCIP_CMD', 'HiGHS_CMD']

Above we can see that it returns a list with different solvers that are available. So, it’s up to the user to choose the best solver that fits the model. Let’s get the percentage as a metric to evaluate which solver is the best or improves the model.

[24]:
pdispersion = PDispersion.from_geodataframe(
    facility_points, "geometry", P_FACILITIES, distance_metric="euclidean",
)
[25]:
results = pandas.DataFrame(
    columns=["MaxMinMin", "Solve Time (sec.)"], index=solvers
)
for solver in solvers:
    _solver = pulp.get_solver(solver, msg=False)
    if _solver.name == "HiGHS_CMD":
        # HiGHS doesn't directly support maximization
        with warnings.catch_warnings(record=True) as w:
            _pdispersion = pdispersion.solve(_solver)
        print(w[0].message)
    else:
        _pdispersion = pdispersion.solve(_solver)
    results.loc[solver] = [
        _pdispersion.problem.objective.value(), _pdispersion.problem.solutionTime
    ]
results
HiGHS_CMD does not currently allow maximization, we will minimize the inverse of the objective function.
[25]:
MaxMinMin Solve Time (sec.)
GLPK_CMD 10164.5 0.031998
CPLEX_CMD 10164.494501 0.051266
PULP_CBC_CMD 10164.495 0.077388
COIN_CMD 10164.495 0.138758
SCIP_CMD 10164.494501 0.215406
HiGHS_CMD 10164.494501 0.068443