This page was generated from notebooks/facloc-disperse-real-world.ipynb. Interactive online version:
The P-Dispersion Problem: An Empirical Example¶
Authors: Erin Olson, Germano Barcelos, James Gaboardi, Levi J. Wolf, Qunshan Zhao
This tutorial extends the Empirical examples notebook, specifically for the \(p\)-dispersion problem. A deeper dive into the \(p\)-dispersion problem can be found here. Also, this tutorial demonstrates the use of different solvers that PULP supports.
[1]:
%config InlineBackend.figure_format = "retina"
%load_ext watermark
%watermark
Last updated: 2023-12-10T13:43:20.123733-05:00
Python implementation: CPython
Python version : 3.11.6
IPython version : 8.18.0
Compiler : Clang 15.0.7
OS : Darwin
Release : 23.1.0
Machine : x86_64
Processor : i386
CPU cores : 8
Architecture: 64bit
[2]:
import geopandas
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import matplotlib.lines as mlines
import matplotlib_scalebar
from matplotlib_scalebar.scalebar import ScaleBar
import numpy
import pandas
import pulp
import shapely
from shapely.geometry import Point
import spopt
from spopt.locate import PDispersion
import time
import warnings
%watermark -w
%watermark -iv
Watermark: 2.4.3
matplotlib : 3.8.2
matplotlib_scalebar: 0.8.1
numpy : 1.26.2
geopandas : 0.14.1
pandas : 2.1.3
spopt : 0.5.1.dev59+g343ef27
shapely : 2.0.2
pulp : 2.7.0
We use 2 data files as input:
facility_points
represents the stores that are candidate facility sitestract
is the polygon of census tract 205.
Note that all other ‘Real World Facility Location’ demonstration notebooks utilize this file which contains facility-client network distances that were calculated using the ArcGIS Network Analyst Extension. This notebook, solving for \(P\)-Dispersion, does not use this file or any network distances and instead relies solely on euclidean distance for solving the model.
All datasets are available online in this repository.
[3]:
DIRPATH = "../spopt/tests/data/"
facility_points
dataframe
[4]:
facility_points = pandas.read_csv(DIRPATH + "SF_store_site_16_longlat.csv", index_col=0)
facility_points = facility_points.reset_index(drop=True)
facility_points
[4]:
OBJECTID | NAME | long | lat | |
---|---|---|---|---|
0 | 1 | Store_1 | -122.510018 | 37.772364 |
1 | 2 | Store_2 | -122.488873 | 37.753764 |
2 | 3 | Store_3 | -122.464927 | 37.774727 |
3 | 4 | Store_4 | -122.473945 | 37.743164 |
4 | 5 | Store_5 | -122.449291 | 37.731545 |
5 | 6 | Store_6 | -122.491745 | 37.649309 |
6 | 7 | Store_7 | -122.483182 | 37.701109 |
7 | 8 | Store_11 | -122.433782 | 37.655364 |
8 | 9 | Store_12 | -122.438982 | 37.719236 |
9 | 10 | Store_13 | -122.440218 | 37.745382 |
10 | 11 | Store_14 | -122.421636 | 37.742964 |
11 | 12 | Store_15 | -122.430982 | 37.782964 |
12 | 13 | Store_16 | -122.426873 | 37.769291 |
13 | 14 | Store_17 | -122.432345 | 37.805218 |
14 | 15 | Store_18 | -122.412818 | 37.805745 |
15 | 16 | Store_19 | -122.398909 | 37.797073 |
study_area
dataframe
[5]:
study_area = geopandas.read_file(DIRPATH + "ServiceAreas_4.shp").dissolve()
study_area
[5]:
geometry | |
---|---|
0 | POLYGON ((-122.45299 37.63898, -122.45415 37.6... |
Plot study_area
[6]:
base = study_area.plot()
base.axis("off");
Create a geodataframe of the candidate facility sites.
[7]:
process = lambda df: as_gdf(df).sort_values(by=["NAME"]).reset_index(drop=True)
as_gdf = lambda df: geopandas.GeoDataFrame(df, geometry=pnts(df))
pnts = lambda df: geopandas.points_from_xy(df.long, df.lat)
[8]:
facility_points = process(facility_points)
Reproject the input spatial data.
[9]:
for _df in [facility_points, study_area]:
_df.set_crs("EPSG:4326", inplace=True)
_df.to_crs("EPSG:7131", inplace=True)
Set parameter P_FACILITIES
for the number for candidate facilities to include in the optimal selection set.
[10]:
# number of candidate facilities in optimal solution
P_FACILITIES = 3
[11]:
solver = pulp.PULP_CBC_CMD(msg=False)
P-Dispersion¶
[12]:
pdispersion = PDispersion.from_geodataframe(
facility_points,
"geometry",
P_FACILITIES,
distance_metric="euclidean",
name=f"p-dispersion"
)
pdispersion = pdispersion.solve(solver)
pdispersion
[12]:
<spopt.locate.p_dispersion.PDispersion at 0x161b4e750>
[13]:
n_fac_pnts = facility_points.shape[0]
pdispersion_obj = round(pdispersion.problem.objective.value(), 3)
print(
"A maximized minimum inter-facility distance between any two sited candiate "
f"facilities of {pdispersion_obj} meters is observed while siting "
f"facilities at {P_FACILITIES} of the available {n_fac_pnts} locations."
)
A maximized minimum inter-facility distance between any two sited candiate facilities of 10164.495 meters is observed while siting facilities at 3 of the available 16 locations.
Define the decision variable names used for mapping later.
[14]:
facility_points["dv"] = pdispersion.fac_vars
facility_points["dv"] = facility_points["dv"].map(
lambda x: x.name.replace("_", "").replace("x", "y")
)
facility_points
[14]:
OBJECTID | NAME | long | lat | geometry | dv | |
---|---|---|---|---|---|---|
0 | 1 | Store_1 | -122.510018 | 37.772364 | POINT (42712.165 26483.898) | y0 |
1 | 8 | Store_11 | -122.433782 | 37.655364 | POINT (49431.133 13496.279) | y1 |
2 | 9 | Store_12 | -122.438982 | 37.719236 | POINT (48971.439 20585.532) | y2 |
3 | 10 | Store_13 | -122.440218 | 37.745382 | POINT (48862.129 23487.462) | y3 |
4 | 11 | Store_14 | -122.421636 | 37.742964 | POINT (50499.936 23219.396) | y4 |
5 | 12 | Store_15 | -122.430982 | 37.782964 | POINT (49675.336 27658.898) | y5 |
6 | 13 | Store_16 | -122.426873 | 37.769291 | POINT (50037.687 26141.402) | y6 |
7 | 14 | Store_17 | -122.432345 | 37.805218 | POINT (49554.745 30128.981) | y7 |
8 | 15 | Store_18 | -122.412818 | 37.805745 | POINT (51274.389 30188.010) | y8 |
9 | 16 | Store_19 | -122.398909 | 37.797073 | POINT (52499.809 29225.972) | y9 |
10 | 2 | Store_2 | -122.488873 | 37.753764 | POINT (44574.304 24418.447) | y10 |
11 | 3 | Store_3 | -122.464927 | 37.774727 | POINT (46684.891 26744.653) | y11 |
12 | 4 | Store_4 | -122.473945 | 37.743164 | POINT (45889.483 23241.485) | y12 |
13 | 5 | Store_5 | -122.449291 | 37.731545 | POINT (48062.508 21951.687) | y13 |
14 | 6 | Store_6 | -122.491745 | 37.649309 | POINT (44315.976 12824.977) | y14 |
15 | 7 | Store_7 | -122.483182 | 37.701109 | POINT (45073.750 18574.015) | y15 |
P-Dispersion with selection of predefined candidate facilities¶
However, in many real world applications there may already be existing facility locations with the goal being to add one or more new facilities. Here we will define facilites :math:`y_{11}` and :math:`y_{15}` as already existing (they must be present in the model solution). This will lead to a sub-optimal solution.
Important: The facilities in "predefined_loc"
are a binary array where 1
means the associated location must appear in the solution.
[15]:
facility_points["predefined_loc"] = 0
facility_points.loc[(5, 10), "predefined_loc"] = 1
facility_points
[15]:
OBJECTID | NAME | long | lat | geometry | dv | predefined_loc | |
---|---|---|---|---|---|---|---|
0 | 1 | Store_1 | -122.510018 | 37.772364 | POINT (42712.165 26483.898) | y0 | 0 |
1 | 8 | Store_11 | -122.433782 | 37.655364 | POINT (49431.133 13496.279) | y1 | 0 |
2 | 9 | Store_12 | -122.438982 | 37.719236 | POINT (48971.439 20585.532) | y2 | 0 |
3 | 10 | Store_13 | -122.440218 | 37.745382 | POINT (48862.129 23487.462) | y3 | 0 |
4 | 11 | Store_14 | -122.421636 | 37.742964 | POINT (50499.936 23219.396) | y4 | 0 |
5 | 12 | Store_15 | -122.430982 | 37.782964 | POINT (49675.336 27658.898) | y5 | 1 |
6 | 13 | Store_16 | -122.426873 | 37.769291 | POINT (50037.687 26141.402) | y6 | 0 |
7 | 14 | Store_17 | -122.432345 | 37.805218 | POINT (49554.745 30128.981) | y7 | 0 |
8 | 15 | Store_18 | -122.412818 | 37.805745 | POINT (51274.389 30188.010) | y8 | 0 |
9 | 16 | Store_19 | -122.398909 | 37.797073 | POINT (52499.809 29225.972) | y9 | 0 |
10 | 2 | Store_2 | -122.488873 | 37.753764 | POINT (44574.304 24418.447) | y10 | 1 |
11 | 3 | Store_3 | -122.464927 | 37.774727 | POINT (46684.891 26744.653) | y11 | 0 |
12 | 4 | Store_4 | -122.473945 | 37.743164 | POINT (45889.483 23241.485) | y12 | 0 |
13 | 5 | Store_5 | -122.449291 | 37.731545 | POINT (48062.508 21951.687) | y13 | 0 |
14 | 6 | Store_6 | -122.491745 | 37.649309 | POINT (44315.976 12824.977) | y14 | 0 |
15 | 7 | Store_7 | -122.483182 | 37.701109 | POINT (45073.750 18574.015) | y15 | 0 |
[16]:
pdispersion_pre = PDispersion.from_geodataframe(
facility_points,
"geometry",
P_FACILITIES,
distance_metric="euclidean",
predefined_facility_col="predefined_loc",
name=f"p-dispersion-predefined"
)
pdispersion_pre = pdispersion_pre.solve(solver)
pdispersion_pre
[16]:
<spopt.locate.p_dispersion.PDispersion at 0x161b80790>
[17]:
pdispersion_obj = round(pdispersion_pre.problem.objective.value(), 3)
print(
"A maximized minimum inter-facility distance between any two sited candiate "
f"facilities of {pdispersion_obj} meters is observed while siting "
f"facilities at {P_FACILITIES} of the available {n_fac_pnts} locations."
)
A maximized minimum inter-facility distance between any two sited candiate facilities of 6043.265 meters is observed while siting facilities at 3 of the available 16 locations.
Plotting the results¶
[18]:
dv_colors_arr = [
"darkcyan",
"mediumseagreen",
"saddlebrown",
"darkslategray",
"lightskyblue",
"thistle",
"lavender",
"darkgoldenrod",
"peachpuff",
"coral",
"mediumvioletred",
"blueviolet",
"fuchsia",
"cyan",
"limegreen",
"mediumorchid",
]
dv_colors = {f"y{i}": dv_colors_arr[i] for i in range(len(dv_colors_arr))}
dv_colors
[18]:
{'y0': 'darkcyan',
'y1': 'mediumseagreen',
'y2': 'saddlebrown',
'y3': 'darkslategray',
'y4': 'lightskyblue',
'y5': 'thistle',
'y6': 'lavender',
'y7': 'darkgoldenrod',
'y8': 'peachpuff',
'y9': 'coral',
'y10': 'mediumvioletred',
'y11': 'blueviolet',
'y12': 'fuchsia',
'y13': 'cyan',
'y14': 'limegreen',
'y15': 'mediumorchid'}
[19]:
def plot_results(model, p, facs, clis=None, ax=None):
"""Visualize optimal solution sets and context."""
if not ax:
multi_plot = False
fig, ax = plt.subplots(figsize=(6, 9))
markersize, markersize_factor = 4, 4
else:
ax.axis("off")
multi_plot = True
markersize, markersize_factor = 2, 2
ax.set_title(model.name, fontsize=15)
# extract facility-client relationships for plotting (except for p-dispersion)
plot_clis = isinstance(clis, geopandas.GeoDataFrame)
if plot_clis:
cli_points = {}
fac_sites = {}
for i, dv in enumerate(model.fac_vars):
if dv.varValue:
dv, predef = facs.loc[i, ["dv", "predefined_loc"]]
fac_sites[dv] = [i, predef]
if plot_clis:
geom = clis.iloc[model.fac2cli[i]]["geometry"]
cli_points[dv] = geom
# study area and legend entries initialization
study_area.plot(ax=ax, alpha=0.5, fc="tan", ec="k", zorder=1)
_patch = Patch(alpha=0.5, fc="tan", ec="k", label="Dissolved Service Areas")
legend_elements = [_patch]
if plot_clis:
# any clients that not asscociated with a facility
if model.name.startswith("mclp"):
c = "k"
if model.n_cli_uncov:
idx = [i for i, v in enumerate(model.cli2fac) if len(v) == 0]
pnt_kws = dict(ax=ax, fc=c, ec=c, marker="s", markersize=7, zorder=2)
clis.iloc[idx].plot(**pnt_kws)
_label = f"Demand sites not covered ($n$={model.n_cli_uncov})"
_mkws = dict(marker="s", markerfacecolor=c, markeredgecolor=c, linewidth=0)
legend_elements.append(mlines.Line2D([], [], ms=3, label=_label, **_mkws))
# all candidate facilities
facs.plot(ax=ax, fc="brown", marker="*", markersize=80, zorder=8)
_label = f"Facility sites ($n$={len(model.fac_vars)})"
_mkws = dict(marker="*", markerfacecolor="brown", markeredgecolor="brown")
legend_elements.append(mlines.Line2D([], [], ms=7, lw=0, label=_label, **_mkws))
# facility-(client) symbology and legend entries
zorder = 4
for fname, (fac, predef) in fac_sites.items():
cset = dv_colors[fname]
if plot_clis:
# clients
geoms = cli_points[fname]
gdf = geopandas.GeoDataFrame(geoms)
gdf.plot(ax=ax, zorder=zorder, ec="k", fc=cset, markersize=100 * markersize)
_label = f"Demand sites covered by {fname}"
_mkws = dict(markerfacecolor=cset, markeredgecolor="k", ms=markersize + 7)
legend_elements.append(
mlines.Line2D([], [], marker="o", lw=0, label=_label, **_mkws)
)
# facilities
ec = "k"
lw = 2
predef_label = "predefined"
if model.name.endswith(predef_label) and predef:
ec = "r"
lw = 3
fname += f" ({predef_label})"
facs.iloc[[fac]].plot(
ax=ax, marker="*", markersize=1000, zorder=9, fc=cset, ec=ec, lw=lw
)
_mkws = dict(markerfacecolor=cset, markeredgecolor=ec, markeredgewidth=lw)
legend_elements.append(
mlines.Line2D([], [], marker="*", ms=20, lw=0, label=fname, **_mkws)
)
# increment zorder up and markersize down for stacked client symbology
zorder += 1
if plot_clis:
markersize -= markersize_factor / p
if not multi_plot:
# legend
kws = dict(loc="upper left", bbox_to_anchor=(1.05, 0.7))
plt.legend(handles=legend_elements, **kws)
P-Dispersion considering all candidate facilities¶
[20]:
plot_results(pdispersion, P_FACILITIES, facility_points)
P-Dispersion with selection of predefined candidate facilities¶
[21]:
plot_results(pdispersion_pre, P_FACILITIES, facility_points)
Comparing all models¶
[22]:
fig, axarr = plt.subplots(1, 2, figsize=(12, 6))
fig.subplots_adjust(wspace=-0.25)
for i, m in enumerate([pdispersion, pdispersion_pre]):
plot_results(m, P_FACILITIES, facility_points, ax=axarr[i])
When stipulating that \(y_{5}\) and \(y_{10}\) must be included in the solution a similar triangulated spatial arrangement is observed to satisfy the \(p\)-dispersion objective: maximizing the interfacilty distance.
Evaluating available solvers¶
First we’ll determine which solvers are installed locally.
[23]:
with warnings.catch_warnings(record=True) as w:
solvers = pulp.listSolvers(onlyAvailable=True)
for _w in w:
print(_w.message)
solvers
GUROBI error:
Failed to set up a license
Error 10009: License expired 2022-08-07
.
[23]:
['GLPK_CMD', 'CPLEX_CMD', 'PULP_CBC_CMD', 'COIN_CMD', 'SCIP_CMD', 'HiGHS_CMD']
Above we can see that it returns a list with different solvers that are available. So, it’s up to the user to choose the best solver that fits the model. Let’s get the percentage as a metric to evaluate which solver is the best or improves the model.
[24]:
pdispersion = PDispersion.from_geodataframe(
facility_points, "geometry", P_FACILITIES, distance_metric="euclidean",
)
[25]:
results = pandas.DataFrame(
columns=["MaxMinMin", "Solve Time (sec.)"], index=solvers
)
for solver in solvers:
_solver = pulp.get_solver(solver, msg=False)
if _solver.name == "HiGHS_CMD":
# HiGHS doesn't directly support maximization
with warnings.catch_warnings(record=True) as w:
_pdispersion = pdispersion.solve(_solver)
print(w[0].message)
else:
_pdispersion = pdispersion.solve(_solver)
results.loc[solver] = [
_pdispersion.problem.objective.value(), _pdispersion.problem.solutionTime
]
results
HiGHS_CMD does not currently allow maximization, we will minimize the inverse of the objective function.
[25]:
MaxMinMin | Solve Time (sec.) | |
---|---|---|
GLPK_CMD | 10164.5 | 0.031998 |
CPLEX_CMD | 10164.494501 | 0.051266 |
PULP_CBC_CMD | 10164.495 | 0.077388 |
COIN_CMD | 10164.495 | 0.138758 |
SCIP_CMD | 10164.494501 | 0.215406 |
HiGHS_CMD | 10164.494501 | 0.068443 |