This page was generated from notebooks/facloc-real-world.ipynb.

Real World Facility Location Problem

Authors: Germano Barcelos, James Gaboardi, Levi J. Wolf, Qunshan Zhao

This tutorial aims to show a facility location application. To achieve this goal, the tutorial solves a real-world facility location problem using a dataset describing 205 census tracts in San Francisco. The problem can be stated as follows: store sites should supply the demand across these census tracts, taking into account the distance between each demand point and each candidate supply site.

This tutorial models the problem using four models already developed in the literature: LSCP (Location Set Covering Problem), MCLP (Maximal Covering Location Problem), P-Median, and P-Center. If you are not familiar with any of these models, it is recommended to read the notebooks that explain each one in more depth: LSCP, MCLP, P-Median, P-Center

In addition, the tutorial shows how to use the different solvers that PuLP supports.

[1]:
import numpy
import geopandas
import pandas
import pulp
from shapely.geometry import Point
import matplotlib.pyplot as plt

The problem is composed of 4 data files:

- network_distance — the distances between candidate facility sites and demand points, calculated with the ArcGIS Network Analyst extension
- demand_points — the demand points, with features important for the facility location problem, such as population
- facility_points — the stores that are candidate facility sites
- study_area — the polygons of the service areas covering the census tracts, dissolved into a single study area

All datasets are available online in this repository

[2]:
DIRPATH = "../spopt/tests/data/"

network_distance = pandas.read_csv(
    DIRPATH + "SF_network_distance_candidateStore_16_censusTract_205_new.csv"
)
demand_points = pandas.read_csv(
    DIRPATH + "SF_demand_205_centroid_uniform_weight.csv", index_col=0
)
facility_points = pandas.read_csv(DIRPATH + "SF_store_site_16_longlat.csv", index_col=0)
study_area = geopandas.read_file(DIRPATH + "ServiceAreas_4.shp").dissolve()

Display the network_distance dataframe

[3]:
display(network_distance)
distance name DestinationName demand
0 671.573346 Store_1 60750479.01 6540
1 1333.708063 Store_1 60750479.02 3539
2 1656.188884 Store_1 60750352.02 4436
3 1783.006047 Store_1 60750602.00 231
4 1790.950612 Store_1 60750478.00 7787
... ... ... ... ...
3275 19643.307257 Store_19 60816023.00 3204
3276 20245.369594 Store_19 60816029.00 4135
3277 20290.986235 Store_19 60816026.00 7887
3278 20875.680521 Store_19 60816025.00 5146
3279 20958.530406 Store_19 60816024.00 6511

3280 rows × 4 columns

Display the demand_points dataframe

[4]:
display(demand_points)
OBJECTID ID NAME STATE_NAME AREA POP2000 HOUSEHOLDS HSE_UNITS BUS_COUNT long lat
1 1 6081602900 60816029.00 California 0.48627 4135 1679 1715 112 -122.488653 37.650807
2 2 6081602800 60816028.00 California 0.47478 4831 1484 1506 59 -122.483550 37.659998
3 3 6081601700 60816017.00 California 0.46393 4155 1294 1313 55 -122.456484 37.663272
4 4 6081601900 60816019.00 California 0.81907 9041 3273 3330 118 -122.434247 37.662385
5 5 6081602500 60816025.00 California 0.46603 5146 1459 1467 44 -122.451187 37.640219
... ... ... ... ... ... ... ... ... ... ... ...
201 204 6075011100 60750111.00 California 0.09466 5559 2930 3037 362 -122.418479 37.791082
202 205 6075012200 60750122.00 California 0.07211 7035 3862 4074 272 -122.417237 37.785728
203 206 6075017601 60750176.01 California 0.24306 5756 2437 2556 943 -122.410115 37.779459
204 207 6075017800 60750178.00 California 0.27882 5829 3115 3231 807 -122.405411 37.778934
205 208 6075012400 60750124.00 California 0.17496 8188 4328 4609 549 -122.416455 37.782289

205 rows × 11 columns

Display the facility_points dataframe

[5]:
display(facility_points)
OBJECTID NAME long lat
1 1 Store_1 -122.510018 37.772364
2 2 Store_2 -122.488873 37.753764
3 3 Store_3 -122.464927 37.774727
4 4 Store_4 -122.473945 37.743164
5 5 Store_5 -122.449291 37.731545
6 6 Store_6 -122.491745 37.649309
7 7 Store_7 -122.483182 37.701109
8 8 Store_11 -122.433782 37.655364
9 9 Store_12 -122.438982 37.719236
10 10 Store_13 -122.440218 37.745382
11 11 Store_14 -122.421636 37.742964
12 12 Store_15 -122.430982 37.782964
13 13 Store_16 -122.426873 37.769291
14 14 Store_17 -122.432345 37.805218
15 15 Store_18 -122.412818 37.805745
16 16 Store_19 -122.398909 37.797073

Plot the study area

[9]:
study_area.plot()
[9]:
<AxesSubplot:>
../_images/notebooks_facloc-real-world_11_1.png

To start modeling the problem with the arguments expected by spopt.locate, we need to pass the cost matrix as a 2D numpy array. So, first we pivot the network_distance dataframe.

Note that the columns and rows are in alphabetical order.

[3]:
ntw_dist_piv = network_distance.pivot_table(values="distance", index="DestinationName", columns="name")
ntw_dist_piv
[3]:
name Store_1 Store_11 Store_12 Store_13 Store_14 Store_15 Store_16 Store_17 Store_18 Store_19 Store_2 Store_3 Store_4 Store_5 Store_6 Store_7
DestinationName
60750101.0 11495.190454 20022.666503 10654.593733 8232.543149 7561.399789 4139.772198 4805.805279 2055.530234 225.609240 1757.623456 11522.519829 7529.985950 10847.234951 10604.729605 20970.277793 15242.989416
60750102.0 10436.169910 19392.094770 10024.022001 7601.971416 6930.828057 3093.851654 4175.233547 1257.809690 1041.911304 2333.244000 10509.099285 6470.965406 10216.663219 9974.157873 20339.706061 14612.417684
60750103.0 10746.296811 19404.672860 10036.600090 7614.549505 6943.406146 3381.778555 4187.811636 2046.436590 744.584403 1685.517099 10800.926186 6778.892307 10229.241308 9986.735962 20352.284150 14624.995773
60750104.0 11420.492134 19808.368182 10440.295413 8018.244828 7347.101469 4044.473877 4591.506959 2463.736278 795.715285 1282.217412 11308.221508 7447.187630 10632.936630 10390.431285 20755.979472 15028.691095
60750105.0 11379.443952 19583.920000 10215.847231 7793.796646 7122.653287 4103.725695 4367.058776 3320.283731 1731.462738 249.669959 11083.773326 7379.539448 10408.488448 10165.983103 20531.531290 14804.242913
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
60816025.0 17324.066610 2722.031291 10884.063331 14178.007937 13891.857275 18418.384867 16726.951785 20834.395022 21441.247824 20875.680521 14662.484617 16569.371114 12483.322114 11926.727459 4968.842581 8648.054204
60816026.0 15981.172325 3647.137006 10299.369046 13593.313651 13307.162990 17833.690581 16142.257500 20249.700736 20856.553539 20290.986235 14050.290332 15963.776829 11871.527828 11342.033174 3625.948296 7919.659919
60816027.0 14835.342712 4581.333336 9637.139433 12931.084039 12644.933377 17171.460969 15480.027887 19587.471124 20194.323926 19628.756623 13341.313338 15301.547216 11209.298215 10679.803561 2290.818683 7242.830306
60816028.0 13339.491691 6392.856372 8577.488412 11871.433018 11585.282356 16111.809948 14420.376867 18527.820103 19134.672905 18569.105602 11845.462317 14241.196195 10155.147195 9620.152540 1846.795647 5746.979285
60816029.0 15257.855684 6394.920365 10253.752405 13547.697010 13261.546349 17788.073940 16096.640859 20204.084095 20810.936898 20245.369594 13763.826309 15918.160188 11825.911187 11296.416533 508.931655 7665.343278

205 rows × 16 columns
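As a quick optional check (not part of the original notebook), we can confirm that both axes of the pivot table are sorted, since the geodataframes will be sorted to match below:

# Optional sanity check: both axes of the pivot table should be sorted,
# because spopt.locate relies on positional alignment with the geodataframes.
assert list(ntw_dist_piv.columns) == sorted(ntw_dist_piv.columns)
assert list(ntw_dist_piv.index) == sorted(ntw_dist_piv.index)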

Here the pivot table is converted to the 2D numpy array that spopt.locate expects. The matrix has a shape of 205x16 (205 demand points by 16 candidate facilities).

[4]:
cost_matrix = ntw_dist_piv.to_numpy()
cost_matrix
[4]:
array([[11495.19045438, 20022.66650296, 10654.59373325, ...,
        10604.72960533, 20970.27779306, 15242.98941606],
       [10436.16991032, 19392.09477041, 10024.0220007 , ...,
         9974.15787278, 20339.70606051, 14612.41768351],
       [10746.29681106, 19404.67285964, 10036.60008993, ...,
         9986.73596201, 20352.28414974, 14624.99577275],
       ...,
       [14835.34271218,  4581.3333364 ,  9637.13943331, ...,
        10679.80356124,  2290.81868301,  7242.83030602],
       [13339.49169134,  6392.85637207,  8577.48841247, ...,
         9620.15254039,  1846.79564734,  5746.97928517],
       [15257.85568393,  6394.92036466, 10253.75240505, ...,
        11296.41653298,   508.93165475,  7665.34327776]])

Now, since the rows and columns of our cost_matrix are sorted by name, we have to sort our facility points and demand points geodataframes in the same way.

[5]:
facility_points_gdf = geopandas.GeoDataFrame(
    facility_points,
    geometry=geopandas.points_from_xy(
        facility_points.long, facility_points.lat
    ),
).sort_values(by=['NAME']).reset_index()

demand_points_gdf = geopandas.GeoDataFrame(
    demand_points,
    geometry=geopandas.points_from_xy(demand_points.long, demand_points.lat),
).sort_values(by=['NAME']).reset_index()
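To make sure the alignment worked, we can compare the sorted names against the pivot table's axes. This is a minimal sketch, assuming the NAME columns and the pivot labels were read with the same dtype:

# Optional: each geodataframe's order should now match the cost matrix axes.
assert (facility_points_gdf["NAME"].to_numpy() == ntw_dist_piv.columns.to_numpy()).all()
assert (demand_points_gdf["NAME"].to_numpy() == ntw_dist_piv.index.to_numpy()).all()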

Here the remaining parameters are set. ai is the demand weight, which in this case we model as each tract's population in the year 2000.

[6]:
ai = demand_points_gdf["POP2000"].to_numpy()  # demand weights: 2000 population

service_dist = 5000  # maximum service distance, in the cost matrix's units
p_facility = 4       # number of facilities to site

Below, a helper function is defined to plot the results of the four models we will use to solve the problem.

[34]:
from matplotlib.patches import Patch
import matplotlib.lines as mlines
from matplotlib_scalebar.scalebar import ScaleBar

dv_colors = [
    "saddlebrown",
    "darkgoldenrod",
    "mediumseagreen",
    "lightskyblue",
    "lavender",
    "darkslategray",
    "coral",
    "mediumvioletred",
    "darkcyan",
    "cyan",
    "limegreen",
    "peachpuff",
    "blueviolet",
    "fuchsia",
    "thistle",
]

def plot_results(model, facility_points_gdf, demand_points_gdf, facility_count, title, p):
    """Visualize a solved location model: facility sites and their covered demand."""

    # Gather the demand geometries assigned to each facility that serves clients.
    arr_points = []
    fac_sites = []

    for i in range(facility_count):
        if model.fac2cli[i]:
            geom = demand_points_gdf.iloc[model.fac2cli[i]]["geometry"]
            arr_points.append(geom)
            fac_sites.append(i)

    # Count the demand points left uncovered by the solution.
    model.uncovered_clients()
    xcov = model.n_cli_uncov

    fig, ax = plt.subplots(figsize=(10, 15))
    legend_elements = []

    study_area.plot(ax=ax, alpha=.5, fc="tan", ec="k", zorder=1)
    _patch = Patch(alpha=.5, fc="tan", ec="k", label="Dissolved Service Areas")
    legend_elements.append(_patch)

    demand_points_gdf.plot(
        ax=ax, fc="k", ec="k", marker="s", markersize=7, zorder=2
    )
    legend_elements.append(
        mlines.Line2D(
            [],
            [],
            marker="s",
            markerfacecolor="k",
            markeredgecolor="k",
            ms=3,
            linewidth=0,
            label=f"Demand sites not covered ($n$={xcov})"
        )
    )

    facility_points_gdf.plot(
        ax=ax, fc="brown", marker="*", markersize=80, zorder=8
    )
    legend_elements.append(
        mlines.Line2D(
            [],
            [],
            marker="*",
            markerfacecolor="brown",
            markeredgecolor="brown",
            ms=7,
            lw=0,
            label=f"Store sites ($n$={facility_count})"
        )
    )

    # zorder and marker-size counters used to layer coverage groups legibly
    _zo, _ms = 4, 4
    for i in range(len(arr_points)):

        cset = dv_colors[i]
        fac = fac_sites[i]
        fname = facility_points_gdf.iloc[[fac]]["NAME"]
        fname = f"{fname.squeeze().replace('_', ' ')}"

        gdf = geopandas.GeoDataFrame(arr_points[i])

        label = f"Demand sites covered by {fname}"
        gdf.plot(ax=ax, zorder=_zo, ec="k", fc=cset, markersize=100*_ms)
        legend_elements.append(
            mlines.Line2D(
                [],
                [],
                marker="o",
                markerfacecolor=cset,
                markeredgecolor="k",
                ms=_ms + 7,
                lw=0,
                label=label
            )
        )

        facility_points_gdf.iloc[[fac]].plot(
            ax=ax, marker="*", markersize=1000, zorder=9, fc=cset, ec="k", lw=2
        )
        legend_elements.append(
            mlines.Line2D(
                [],
                [],
                marker="*",
                markerfacecolor=cset,
                markeredgecolor="k",
                markeredgewidth=2,
                ms=20,
                lw=0,
                label=fname,
            )
        )

        _zo += 1
        _ms -= 4 / p  # shrink covered-point markers for each successive facility

    plt.title(title, fontsize=20)
    kws = dict(loc="upper left", bbox_to_anchor=(1.05, .7), fontsize=15)
    plt.legend(handles=legend_elements, **kws)

    x, y, xyc, arrow_length, c = 0.925, 0.15, "axes fraction", 0.1 , "center"
    xy, xyt = (x, y), (x, y-arrow_length)
    ap = dict(facecolor="black", width=5, headwidth=10)
    kws = dict(arrowprops=ap, ha=c, va=c, fontsize=20)
    plt.annotate("N", xy=xy, xycoords=xyc, xytext=xyt, **kws)

    plt.gca().add_artist(ScaleBar(1))
[8]:
for _df in [facility_points_gdf, demand_points_gdf, study_area]:
    _df.set_crs("EPSG:4326", inplace=True)  # coordinates are WGS84 long/lat
    _df.to_crs("EPSG:7131", inplace=True)   # reproject to a metric CRS for mapping

LSCP

[35]:
from spopt.locate.coverage import LSCP

lscp = LSCP.from_cost_matrix(cost_matrix, service_dist)

lscp = lscp.solve(pulp.GLPK(msg=False))

lscp.facility_client_array()
plot_results(lscp, facility_points_gdf, demand_points_gdf, facility_points_gdf.shape[0], "LSCP", lscp.problem.objective.value())
../_images/notebooks_facloc-real-world_24_0.png
[13]:
lscp.problem.objective.value()
[13]:
8
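The LSCP objective is the number of facilities sited, so 8 stores are needed to cover every demand point within the service distance. To see which candidate sites were selected, we can inspect fac2cli (populated by facility_client_array() above) — a small sketch, assuming every selected facility serves at least one client:

# Selected sites are the facilities with at least one assigned client.
selected = [
    facility_points_gdf.iloc[i]["NAME"]
    for i, clients in enumerate(lscp.fac2cli)
    if clients
]
print(selected)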

MCLP

[36]:
from spopt.locate.coverage import MCLP

mclp = MCLP.from_cost_matrix(
    cost_matrix,
    ai,
    max_coverage=service_dist,
    p_facilities=4,
)

mclp = mclp.solve(pulp.GLPK(msg=False))

mclp.facility_client_array()
plot_results(mclp, facility_points_gdf, demand_points_gdf, facility_points_gdf.shape[0], "MCLP", 4)
../_images/notebooks_facloc-real-world_27_0.png
[15]:
mclp.problem.objective.value()
[15]:
875247

MCLP Percentage Covered

[16]:
mclp.uncovered_clients()
mclp.get_percentage()
mclp.percentage
[16]:
0.8975609756097561
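Note that mclp.percentage is the share of demand points covered, unweighted by population. Since the MCLP objective is the total covered demand, a population-weighted coverage rate can also be derived — a one-line sketch under that assumption:

# Population-weighted coverage: covered demand over total demand.
mclp.problem.objective.value() / ai.sum()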

P-Center

[38]:
from spopt.locate import PCenter
pcenter = PCenter.from_cost_matrix(
    cost_matrix, p_facilities=p_facility
)

pcenter = pcenter.solve(pulp.GLPK(msg=False))
pcenter.facility_client_array()
plot_results(pcenter, facility_points_gdf, demand_points_gdf, facility_points_gdf.shape[0], "P-Center", p_facility)
../_images/notebooks_facloc-real-world_32_0.png
[18]:
pcenter.problem.objective.value()
[18]:
7403.06
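The P-Center objective is the minimized maximum demand-to-facility distance. As an optional cross-check (a sketch, assuming fac2cli reflects the final assignments), the same value can be recovered from the cost matrix:

# The largest assigned distance should match the objective, up to solver tolerance.
worst = max(
    cost_matrix[j, i]
    for i, clients in enumerate(pcenter.fac2cli)
    for j in clients
)
print(round(worst, 2))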

P-Median

[39]:
from spopt.locate import PMedian
pmedian = PMedian.from_cost_matrix(
    cost_matrix, ai, p_facilities=p_facility
)

pmedian = pmedian.solve(pulp.GLPK(msg=False))
pmedian.facility_client_array()
plot_results(pmedian, facility_points_gdf, demand_points_gdf, facility_points_gdf.shape[0], "P-Median", p_facility)
../_images/notebooks_facloc-real-world_35_0.png
[40]:
pmedian.problem.objective.value()
[40]:
2848268129.7145104
[41]:
pmedian.get_mean_distance(weight=ai)
pmedian.mean_dist
[41]:
2982.1268579890657
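The raw P-Median objective is the population-weighted total distance, so the weighted mean distance reported above is simply that total divided by the total demand — a quick check, not in the original notebook:

# Weighted mean distance = total weighted distance / total demand weight.
pmedian.problem.objective.value() / ai.sum()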

Using different solvers for one model

For this task we use the MCLP model as an example.

[22]:
from spopt.locate.coverage import MCLP

mclp = MCLP.from_cost_matrix(
    cost_matrix,
    ai,
    max_coverage=service_dist,
    p_facilities=4,
)

First, it’s useful to check which solvers are installed on your machine.

[23]:
solvers = pulp.listSolvers(onlyAvailable=True)
solvers
[23]:
['GLPK_CMD', 'CPLEX_CMD', 'GUROBI_CMD', 'PULP_CBC_CMD', 'COIN_CMD']

Above we can see that it returns a list of the solvers available on this machine. It is up to the user to choose the solver that best fits the model. Below we use the coverage percentage as a consistency check (every exact solver should reach the same optimal coverage) and the runtime to compare the solvers.

[24]:
import time

for solver_name in solvers:
    solver = pulp.get_solver(solver_name, msg=False)

    start = time.time()
    mclp = mclp.solve(solver)
    end = time.time()

    mclp.facility_client_array()
    mclp.uncovered_clients()
    mclp.get_percentage()

    percentage = mclp.percentage
    time_spent = end - start

    print(f"Solver: {solver_name} - Percentage: {percentage} - Time Spent: {time_spent}")
Solver: GLPK_CMD - Percentage: 0.8975609756097561 - Time Spent: 0.06669068336486816
Solver: CPLEX_CMD - Percentage: 0.8975609756097561 - Time Spent: 0.08563423156738281
Solver: GUROBI_CMD - Percentage: 0.8975609756097561 - Time Spent: 0.07414436340332031
Solver: PULP_CBC_CMD - Percentage: 0.8975609756097561 - Time Spent: 0.06678557395935059
Solver: COIN_CMD - Percentage: 0.8975609756097561 - Time Spent: 0.05470633506774902
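
As expected, all five solvers find the same optimal coverage; only the runtimes differ, and single-run timings like these fluctuate. For a more careful comparison you could tabulate the results instead of printing them — a sketch building on the loop above:

# Hypothetical benchmarking helper: collect objective and runtime per solver.
records = []
for solver_name in solvers:
    solver = pulp.get_solver(solver_name, msg=False)
    start = time.time()
    mclp = mclp.solve(solver)
    records.append({
        "solver": solver_name,
        "objective": mclp.problem.objective.value(),
        "seconds": time.time() - start,
    })

pandas.DataFrame(records)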