This page was generated from notebooks/facloc-disperse-real-world.ipynb. Interactive online version:

The P-Dispersion Problem: An Empirical Example¶

Authors: Erin Olson, Germano Barcelos, James Gaboardi, Levi J. Wolf, Qunshan Zhao

This tutorial extends the Empirical examples notebook, specifically for the \(p\)-dispersion problem. A deeper dive into the \(p\)-dispersion problem can be found here. Also, this tutorial demonstrates the use of different solvers that PULP supports.

[1]:

%config InlineBackend.figure_format = "retina"
%load_ext watermark
%watermark

Last updated: 2025-04-07T13:52:41.571790-04:00

Python implementation: CPython
Python version       : 3.12.9
IPython version      : 9.0.2

Compiler    : Clang 18.1.8
OS          : Darwin
Release     : 24.4.0
Machine     : arm64
Processor   : arm
CPU cores   : 8
Architecture: 64bit

[2]:

import time
import warnings

import geopandas
import matplotlib.lines as mlines
import matplotlib.pyplot as plt
import matplotlib_scalebar
import numpy
import pandas
import pulp
import shapely
from matplotlib.patches import Patch
from matplotlib_scalebar.scalebar import ScaleBar
from shapely.geometry import Point

import spopt
from spopt.locate import PDispersion

%watermark -w
%watermark -iv

Watermark: 2.5.0

matplotlib         : 3.10.1
geopandas          : 1.0.1
shapely            : 2.1.0
pulp               : 2.8.0
spopt              : 0.6.2.dev3+g13ca45e
numpy              : 2.2.4
pandas             : 2.2.3
matplotlib_scalebar: 0.9.0

We use 2 data files as input:

facility_points represents the stores that are candidate facility sites
tract is the polygon of census tract 205.

Note that all other ‘Real World Facility Location’ demonstration notebooks utilize this file which contains facility-client network distances that were calculated using the ArcGIS Network Analyst Extension. This notebook, solving for \(P\)-Dispersion, does not use this file or any network distances and instead relies solely on euclidean distance for solving the model.

All datasets are available online in this repository.

[3]:

DIRPATH = "../spopt/tests/data/"

facility_points dataframe

[4]:

facility_points = pandas.read_csv(DIRPATH + "SF_store_site_16_longlat.csv", index_col=0)
facility_points = facility_points.reset_index(drop=True)
facility_points

[4]:

	OBJECTID	NAME	long	lat
0	1	Store_1	-122.510018	37.772364
1	2	Store_2	-122.488873	37.753764
2	3	Store_3	-122.464927	37.774727
3	4	Store_4	-122.473945	37.743164
4	5	Store_5	-122.449291	37.731545
5	6	Store_6	-122.491745	37.649309
6	7	Store_7	-122.483182	37.701109
7	8	Store_11	-122.433782	37.655364
8	9	Store_12	-122.438982	37.719236
9	10	Store_13	-122.440218	37.745382
10	11	Store_14	-122.421636	37.742964
11	12	Store_15	-122.430982	37.782964
12	13	Store_16	-122.426873	37.769291
13	14	Store_17	-122.432345	37.805218
14	15	Store_18	-122.412818	37.805745
15	16	Store_19	-122.398909	37.797073

study_area dataframe

[5]:

study_area = geopandas.read_file(DIRPATH + "ServiceAreas_4.shp").dissolve()
study_area

[5]:

	geometry
0	POLYGON ((-122.45299 37.63898, -122.45415 37.6...

Plot study_area

[6]:

base = study_area.plot()
base.axis("off");

../_images/notebooks_facloc-disperse-real-world_10_0.png

Create a geodataframe of the candidate facility sites.

[7]:

process = lambda df: as_gdf(df).sort_values(by=["NAME"]).reset_index(drop=True)  # noqa: E731
as_gdf = lambda df: geopandas.GeoDataFrame(df, geometry=pnts(df))  # noqa: E731
pnts = lambda df: geopandas.points_from_xy(df.long, df.lat)  # noqa: E731

[8]:

facility_points = process(facility_points)

Reproject the input spatial data.

[9]:

for _df in [facility_points, study_area]:
    _df.set_crs("EPSG:4326", inplace=True)
    _df.to_crs("EPSG:7131", inplace=True)

Set parameter P_FACILITIES for the number for candidate facilities to include in the optimal selection set.

[10]:

# number of candidate facilities in optimal solution
P_FACILITIES = 3

[11]:

solver = pulp.PULP_CBC_CMD(msg=False)

P-Dispersion¶

[12]:

pdispersion = PDispersion.from_geodataframe(
    facility_points,
    "geometry",
    P_FACILITIES,
    distance_metric="euclidean",
    name="p-dispersion",
)
pdispersion = pdispersion.solve(solver)
pdispersion

[12]:

<spopt.locate.p_dispersion.PDispersion at 0x15288cef0>

[13]:

n_fac_pnts = facility_points.shape[0]
pdispersion_obj = round(pdispersion.problem.objective.value(), 3)
print(
    "A maximized minimum inter-facility distance between any two sited candiate "
    f"facilities of {pdispersion_obj} meters is observed while siting "
    f"facilities at {P_FACILITIES} of the available {n_fac_pnts} locations."
)

A maximized minimum inter-facility distance between any two sited candiate facilities of 10164.495 meters is observed while siting facilities at 3 of the available 16 locations.

Define the decision variable names used for mapping later.

[14]:

facility_points["dv"] = pdispersion.fac_vars
facility_points["dv"] = facility_points["dv"].map(
    lambda x: x.name.replace("_", "").replace("x", "y")
)
facility_points

[14]:

	OBJECTID	NAME	long	lat	geometry	dv
0	1	Store_1	-122.510018	37.772364	POINT (42712.165 26483.898)	y0
1	8	Store_11	-122.433782	37.655364	POINT (49431.133 13496.279)	y1
2	9	Store_12	-122.438982	37.719236	POINT (48971.439 20585.532)	y2
3	10	Store_13	-122.440218	37.745382	POINT (48862.129 23487.462)	y3
4	11	Store_14	-122.421636	37.742964	POINT (50499.936 23219.396)	y4
5	12	Store_15	-122.430982	37.782964	POINT (49675.336 27658.898)	y5
6	13	Store_16	-122.426873	37.769291	POINT (50037.687 26141.402)	y6
7	14	Store_17	-122.432345	37.805218	POINT (49554.745 30128.981)	y7
8	15	Store_18	-122.412818	37.805745	POINT (51274.389 30188.01)	y8
9	16	Store_19	-122.398909	37.797073	POINT (52499.809 29225.972)	y9
10	2	Store_2	-122.488873	37.753764	POINT (44574.304 24418.447)	y10
11	3	Store_3	-122.464927	37.774727	POINT (46684.891 26744.653)	y11
12	4	Store_4	-122.473945	37.743164	POINT (45889.483 23241.485)	y12
13	5	Store_5	-122.449291	37.731545	POINT (48062.508 21951.687)	y13
14	6	Store_6	-122.491745	37.649309	POINT (44315.976 12824.977)	y14
15	7	Store_7	-122.483182	37.701109	POINT (45073.75 18574.015)	y15

P-Dispersion with selection of predefined candidate facilities¶

However, in many real world applications there may already be existing facility locations with the goal being to add one or more new facilities. Here we will define facilites :math:`y_{11}` and :math:`y_{15}` as already existing (they must be present in the model solution). This will lead to a sub-optimal solution.

Important: The facilities in "predefined_loc" are a binary array where 1 means the associated location must appear in the solution.

[15]:

facility_points["predefined_loc"] = 0
facility_points.loc[(5, 10), "predefined_loc"] = 1
facility_points

[15]:

	OBJECTID	NAME	long	lat	geometry	dv	predefined_loc
0	1	Store_1	-122.510018	37.772364	POINT (42712.165 26483.898)	y0	0
1	8	Store_11	-122.433782	37.655364	POINT (49431.133 13496.279)	y1	0
2	9	Store_12	-122.438982	37.719236	POINT (48971.439 20585.532)	y2	0
3	10	Store_13	-122.440218	37.745382	POINT (48862.129 23487.462)	y3	0
4	11	Store_14	-122.421636	37.742964	POINT (50499.936 23219.396)	y4	0
5	12	Store_15	-122.430982	37.782964	POINT (49675.336 27658.898)	y5	1
6	13	Store_16	-122.426873	37.769291	POINT (50037.687 26141.402)	y6	0
7	14	Store_17	-122.432345	37.805218	POINT (49554.745 30128.981)	y7	0
8	15	Store_18	-122.412818	37.805745	POINT (51274.389 30188.01)	y8	0
9	16	Store_19	-122.398909	37.797073	POINT (52499.809 29225.972)	y9	0
10	2	Store_2	-122.488873	37.753764	POINT (44574.304 24418.447)	y10	1
11	3	Store_3	-122.464927	37.774727	POINT (46684.891 26744.653)	y11	0
12	4	Store_4	-122.473945	37.743164	POINT (45889.483 23241.485)	y12	0
13	5	Store_5	-122.449291	37.731545	POINT (48062.508 21951.687)	y13	0
14	6	Store_6	-122.491745	37.649309	POINT (44315.976 12824.977)	y14	0
15	7	Store_7	-122.483182	37.701109	POINT (45073.75 18574.015)	y15	0

[16]:

pdispersion_pre = PDispersion.from_geodataframe(
    facility_points,
    "geometry",
    P_FACILITIES,
    distance_metric="euclidean",
    predefined_facility_col="predefined_loc",
    name="p-dispersion-predefined",
)
pdispersion_pre = pdispersion_pre.solve(solver)
pdispersion_pre

[16]:

<spopt.locate.p_dispersion.PDispersion at 0x153bf2b70>

[17]:

pdispersion_obj = round(pdispersion_pre.problem.objective.value(), 3)
print(
    "A maximized minimum inter-facility distance between any two sited candiate "
    f"facilities of {pdispersion_obj} meters is observed while siting "
    f"facilities at {P_FACILITIES} of the available {n_fac_pnts} locations."
)

A maximized minimum inter-facility distance between any two sited candiate facilities of 6043.265 meters is observed while siting facilities at 3 of the available 16 locations.

Plotting the results¶

[18]:

dv_colors_arr = [
    "darkcyan",
    "mediumseagreen",
    "saddlebrown",
    "darkslategray",
    "lightskyblue",
    "thistle",
    "lavender",
    "darkgoldenrod",
    "peachpuff",
    "coral",
    "mediumvioletred",
    "blueviolet",
    "fuchsia",
    "cyan",
    "limegreen",
    "mediumorchid",
]
dv_colors = {f"y{i}": dv_colors_arr[i] for i in range(len(dv_colors_arr))}
dv_colors

[18]:

{'y0': 'darkcyan',
 'y1': 'mediumseagreen',
 'y2': 'saddlebrown',
 'y3': 'darkslategray',
 'y4': 'lightskyblue',
 'y5': 'thistle',
 'y6': 'lavender',
 'y7': 'darkgoldenrod',
 'y8': 'peachpuff',
 'y9': 'coral',
 'y10': 'mediumvioletred',
 'y11': 'blueviolet',
 'y12': 'fuchsia',
 'y13': 'cyan',
 'y14': 'limegreen',
 'y15': 'mediumorchid'}

[19]:

def plot_results(model, p, facs, clis=None, ax=None):
    """Visualize optimal solution sets and context."""
    if not ax:
        multi_plot = False
        fig, ax = plt.subplots(figsize=(6, 9))
        markersize, markersize_factor = 4, 4
    else:
        ax.axis("off")
        multi_plot = True
        markersize, markersize_factor = 2, 2
    ax.set_title(model.name, fontsize=15)

    # extract facility-client relationships for plotting (except for p-dispersion)
    plot_clis = isinstance(clis, geopandas.GeoDataFrame)
    if plot_clis:
        cli_points = {}
    fac_sites = {}
    for i, dv in enumerate(model.fac_vars):
        if dv.varValue:
            dv, predef = facs.loc[i, ["dv", "predefined_loc"]]
            fac_sites[dv] = [i, predef]
            if plot_clis:
                geom = clis.iloc[model.fac2cli[i]]["geometry"]
                cli_points[dv] = geom

    # study area and legend entries initialization
    study_area.plot(ax=ax, alpha=0.5, fc="tan", ec="k", zorder=1)
    _patch = Patch(alpha=0.5, fc="tan", ec="k", label="Dissolved Service Areas")
    legend_elements = [_patch]

    if plot_clis and model.name.startswith("mclp"):
        # any clients that not asscociated with a facility
        c = "k"
        if model.n_cli_uncov:
            idx = [i for i, v in enumerate(model.cli2fac) if len(v) == 0]
            pnt_kws = {
                "ax": ax,
                "fc": c,
                "ec": c,
                "marker": "s",
                "markersize": 7,
                "zorder": 2,
            }
            clis.iloc[idx].plot(**pnt_kws)
        _label = f"Demand sites not covered ($n$={model.n_cli_uncov})"
        _mkws = {
            "marker": "s",
            "markerfacecolor": c,
            "markeredgecolor": c,
            "linewidth": 0,
        }
        legend_elements.append(mlines.Line2D([], [], ms=3, label=_label, **_mkws))

    # all candidate facilities
    facs.plot(ax=ax, fc="brown", marker="*", markersize=80, zorder=8)
    _label = f"Facility sites ($n$={len(model.fac_vars)})"
    _mkws = {"marker": "*", "markerfacecolor": "brown", "markeredgecolor": "brown"}
    legend_elements.append(mlines.Line2D([], [], ms=7, lw=0, label=_label, **_mkws))

    # facility-(client) symbology and legend entries
    zorder = 4
    for fname, (fac, predef) in fac_sites.items():
        cset = dv_colors[fname]
        if plot_clis:
            # clients
            geoms = cli_points[fname]
            gdf = geopandas.GeoDataFrame(geoms)
            gdf.plot(ax=ax, zorder=zorder, ec="k", fc=cset, markersize=100 * markersize)
            _label = f"Demand sites covered by {fname}"
            _mkws = {
                "markerfacecolor": cset,
                "markeredgecolor": "k",
                "ms": markersize + 7,
            }
            legend_elements.append(
                mlines.Line2D([], [], marker="o", lw=0, label=_label, **_mkws)
            )
        # facilities
        ec = "k"
        lw = 2
        predef_label = "predefined"
        if model.name.endswith(predef_label) and predef:
            ec = "r"
            lw = 3
            fname += f" ({predef_label})"
        facs.iloc[[fac]].plot(
            ax=ax, marker="*", markersize=1000, zorder=9, fc=cset, ec=ec, lw=lw
        )
        _mkws = {"markerfacecolor": cset, "markeredgecolor": ec, "markeredgewidth": lw}
        legend_elements.append(
            mlines.Line2D([], [], marker="*", ms=20, lw=0, label=fname, **_mkws)
        )
        # increment zorder up and markersize down for stacked client symbology
        zorder += 1
        if plot_clis:
            markersize -= markersize_factor / p

    if not multi_plot:
        # legend
        kws = {"loc": "upper left", "bbox_to_anchor": (1.05, 0.7)}
        plt.legend(handles=legend_elements, **kws)

P-Dispersion considering all candidate facilities¶

[20]:

plot_results(pdispersion, P_FACILITIES, facility_points)

../_images/notebooks_facloc-disperse-real-world_32_0.png

P-Dispersion with selection of predefined candidate facilities¶

[21]:

plot_results(pdispersion_pre, P_FACILITIES, facility_points)

../_images/notebooks_facloc-disperse-real-world_34_0.png

Comparing all models¶

[22]:

fig, axarr = plt.subplots(1, 2, figsize=(12, 6))
fig.subplots_adjust(wspace=-0.25)
for i, m in enumerate([pdispersion, pdispersion_pre]):
    plot_results(m, P_FACILITIES, facility_points, ax=axarr[i])

../_images/notebooks_facloc-disperse-real-world_36_0.png

When stipulating that \(y_{5}\) and \(y_{10}\) must be included in the solution a similar triangulated spatial arrangement is observed to satisfy the \(p\)-dispersion objective: maximizing the interfacilty distance.

Evaluating available solvers¶

First we’ll determine which solvers are installed locally.

[23]:

with warnings.catch_warnings(record=True) as w:
    solvers = pulp.listSolvers(onlyAvailable=True)
for _w in w:
    print(_w.message)
solvers

[23]:

['GLPK_CMD', 'PULP_CBC_CMD', 'COIN_CMD', 'SCIP_CMD', 'HiGHS', 'HiGHS_CMD']

Above we can see that it returns a list with different solvers that are available. So, it’s up to the user to choose the best solver that fits the model. Let’s get the percentage as a metric to evaluate which solver is the best or improves the model.

[24]:

pdispersion = PDispersion.from_geodataframe(
    facility_points,
    "geometry",
    P_FACILITIES,
    distance_metric="euclidean",
)

[25]:

results = pandas.DataFrame(columns=["MaxMinMin", "Solve Time (sec.)"], index=solvers)
for solver in solvers:
    _solver = pulp.getSolver(solver, msg=False)
    _pdispersion = pdispersion.solve(_solver)
    results.loc[solver] = [
        _pdispersion.problem.objective.value(),
        _pdispersion.problem.solutionTime,
    ]
results

Running HiGHS 1.10.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
MIP  has 121 rows; 17 cols; 376 nonzeros; 16 integer variables (16 binary)
Coefficient ranges:
  Matrix [1e+00, 2e+04]
  Cost   [1e+00, 1e+00]
  Bound  [1e+00, 1e+00]
  RHS    [3e+00, 6e+04]
Presolving model
121 rows, 17 cols, 376 nonzeros  0s
121 rows, 17 cols, 376 nonzeros  0s

Solving MIP model with:
   121 rows
   17 cols (16 binary, 0 integer, 0 implied int., 1 continuous)
   376 nonzeros

Src: B => Branching; C => Central rounding; F => Feasibility pump; H => Heuristic; L => Sub-MIP;
     P => Empty MIP; R => Randomized rounding; S => Solve LP; T => Evaluate node; U => Unbounded;
     z => Trivial zero; l => Trivial lower; u => Trivial upper; p => Trivial point; X => User solution

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap |   Cuts   InLp Confl. | LpIters     Time

         0       0         0   0.00%   -38968.867659   inf                  inf        0      0      0         0     0.0s
 R       0       0         0   0.00%   -33218.010966   -2958.343576    1022.86%        0      0      0        30     0.0s
 L       0       0         0   0.00%   -29397.190655   -8600.536216     241.81%      123     20      0        85     0.0s
 L       0       0         0   0.00%   -29397.190655   -9049.837011     224.84%      123      6      0       124     0.0s
 B       3       0         1  25.00%   -29397.190655   -9269.623677     217.13%      123      6      5       419     0.0s
 T      22       0        10  98.44%   -29397.190655   -9329.101184     215.11%      124      6     55      1102     0.0s
 B      25       0        11  99.22%   -29397.190655   -10164.494501    189.21%      125      6     55      1132     0.0s
        26       0        12 100.00%   -10164.494501   -10164.494501      0.00%      125      6     55      1139     0.0s

Solving report
  Status            Optimal
  Primal bound      -10164.4945011
  Dual bound        -10164.4945011
  Gap               0% (tolerance: 0.01%)
  P-D integral      0.0702164079302
  Solution status   feasible
                    -10164.4945011 (objective)
                    0 (bound viol.)
                    5.83462456625e-16 (int. viol.)
                    0 (row viol.)
  Timing            0.02 (total)
                    0.00 (presolve)
                    0.00 (solve)
                    0.00 (postsolve)
  Max sub-MIP depth 2
  Nodes             26
  Repair LPs        0 (0 feasible; 0 iterations)
  LP iterations     1139 (total)
                    859 (strong br.)
                    55 (separation)
                    64 (heuristics)

[25]:

	MaxMinMin	Solve Time (sec.)
GLPK_CMD	10164.5	0.016731
PULP_CBC_CMD	10164.495	0.048067
COIN_CMD	10164.495	0.055785
SCIP_CMD	10164.494501	0.178769
HiGHS	10164.494501	0.019386
HiGHS_CMD	10164.494501	0.038827