Talking to a friend, a question came up about finding the point in LA's road network that's most inconvenient to get to, with inconvenient being a vague notion describing a closed residential neighborhood full of dead ends; the furthest of these dead ends would be most inconvenient indeed. I decided to answer that question. All the code lives in a new repo: https://github.com/dkogan/culdesacs
I want inconvenient to mean
Furthest to reach via the road network, but nearest as-the-crow-flies.
Note that this type of metric is not a universal one, but is relative to a particular starting point. This makes sense, however: a location that's inconvenient from one location could be very convenient from another.
This metric could be expressed in many ways. I keep it simple, and compute a relative inefficiency coefficient:
Thus the goal is to find a location within a given radius of the starting point that maximizes this relative inefficiency.
Approach
I use OpenStreetMap for the road data. This is all aimed at bicycling, so I'm looking at all roads except freeways and ones marked private. I am looking at footpaths, trails, etc.
Once I have the road network, I run Dijkstra's Algorithm to compute the shortest path from my starting point to every other point on the map. Then I can easily compute the inefficiency for each such point, and pick the point with the highest inefficiency. I use OSM nodes as the "points". It is possible that the location I'm looking for is inbetween a pair of nodes, but the nodes will really be close enough. Also, the "distance" between adjacent nodes can take into account terrain type, elevation, road type and so on. I ignore all that, and simply look at the distance.
Implementation
Each step in the process lives in its own program. This simplifies implementation and makes it easy to work on each piece separately.
Data import
First I query OSM. This is done with the query.pl
script. It takes in the
center point and the query radius. The query uses the OSM Overpass query
language. I use this simple query, filling in the center point and radius:
[out:json]; way ["highway"] ["highway" !~ "motorway|motorway_link" ] ["access" !~ "private" ] ["access" !~ "no" ] (around:$rad,$lat,$lon); (._;>;); out;
Sample invocation:
$ ./query.pl --center 34.0690448,-118.292924 --rad 20miles % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 ..... $ ls -lhrt *(om[1]) -rw-r--r-- 1 dima dima 81M Aug 14 00:44 query_34.0690448_-118.292924_20miles.json
Data massaging
Now I need to take the OSM query results, and manipulate them into a form
readable by the Dijkstra's algorithm solver. This is done by the
massage_input.pl
script. This script does nothing interesting, but it doesn it
inefficiently, so it's CPU and RAM-hungry and takes a few minutes. Sample
invocation:
$ ./massage_input.pl query_34.0690448_-118.292924_20miles.json > query_34.0690448_-118.292924_20miles.net
Neighbor list representation
An implementation choice here was how to represent the neighbor list for a node.
I want the main computation (next section) to be able to query this very
quickly, and I don't want the list to take much space, and I don't want to
fragment my memory with many small allocations. Thus I have a single contiguous
array of integers neighbor_pool
. Each node has a single integer index into
this pool. At this index the neighbor_pool
contains a list of node indices
that are neighbors of the node in question. A special node index of -1 signifies
the end of the neighbor list for that node.
Inefficiency coefficient computation
I now feed the massaged data to Dijkstra's algorithm implemented in compute.c
.
I need a priority queue where elements can be inserted, removed and updated.
Apparently most heap implementations don't have an 'update' mechanism, so it
took a little while to find a working one. I ended up using phk's b-heap
implementation from the varnish source tree. It stores arbitrary pointers
(64-bit on my box); 32-bit indices into a pool would be more efficient, but this
is fast enough.
Sample invocation:
$ ./compute < query_34.0690448_-118.292924_20miles.net > query_34.0690448_-118.292924_20miles.out $ head -n 2 query_34.0690448_-118.292924_20miles.out 34.069046 -118.292923 0.000000 0.000000 34.070034 -118.292931 109.863564 109.863564
The output is all nodes, sorted by the road distance to the node. The columns are lat,lon,droad,ddirect.
Distance from latitude/longitude pairs
One implementation note here is how to compute the distance between two latitude/longitude pairs. The most direct way is to convert each latitude/longitude pair into a unit vector, compute the dot product, take the arccos and multiply by the radius of the Earth. This requires 9 trigonometric operations and relies on the arccos of a number close to 1, which is inaccurate. One could instead compute the arcsin of the magnitude of the cross-product, but this requires even more computation. I want something simpler:
This is nice and simple. Is it sufficiently accurate? This python script tests it:
import numpy as np lat0,lon0 = 34.0690448,-118.292924 # 3rd/New Hampshire lat1,lon1 = 33.93,-118.4314 # LAX lat0,lon0,lat1,lon1 = [x * np.pi/180.0 for x in lat0,lon0,lat1,lon1] Rearth = 6371000 v0 = np.array((np.cos(lat0)*np.cos(lon0), np.cos(lat0)*np.sin(lon0),np.sin(lat0))) v1 = np.array((np.cos(lat1)*np.cos(lon1), np.cos(lat1)*np.sin(lon1),np.sin(lat1))) dist_accurate = np.sqrt( (lat0-lat1)**2 + (lon0-lon1)**2 * np.cos(lat0)*np.cos(lat1) ) * Rearth dist_approx = np.arccos(np.inner(v0,v1)) * Rearth print dist_accurate print dist_approx print dist_accurate - dist_approx
Between Koreatown and LAX there's quite a bit of difference in both latitude and longitude. Both methods say the distance is about 20km, with a disagreement of 3mm. This is plenty good enough.
Results
I want to find the least convenient location from the intersection of New Hampshire and 3rd street in Los Angeles within 20 miles or so.
All the inconvenience coefficients computed from this data look like this (source, data):
As expected, straight roads away from the starting point are highly efficient, and we see this in the Vermont corridor, 3rd street, and to a lesser extent 6th street going East. Also as expected, routes that do not align with the street grid are inefficient: we see inefficiency bumps going NW and SW from the starting point. Eastward from the start the street grid tilts, so things are more complicated in that direction.
We also see that hilly neighborhoods stand out: winding streets are not efficient. The Santa Monica mountains, Verdugos, Mount Washington and the San Gabriel Mountains clearly stand out. One can also make out the river paths in the bottom-right as ridges of inconvenience: their limited access nature makes it necessary to travel longer distance to get to them.
Onwards. The output of compute
is sorted by road distance from the start. I
prepend the coefficient of inconvenience, re-sort the list and take 50 most
inconvenient locations by invoking
$ <query_34.0690448_-118.292924_20miles.out awk '$4 {printf "%f %f %f %f %f\n",($3-$4)/$4,$1,$2,$3,$4}' | sort -n -k1 -r | head -n 50
The output is this:
Inconvenience | Latitude | Longitude | Road distance (m) | Direct distance (m) |
---|---|---|---|---|
1.142052 | 34.068104 | -118.290382 | 549.216980 | 256.397583 |
1.139839 | 34.071629 | -118.288956 | 994.499390 | 464.754242 |
1.139147 | 34.068066 | -118.290436 | 542.721497 | 253.709305 |
1.136799 | 34.068130 | -118.290329 | 554.962891 | 259.716919 |
1.127631 | 34.068031 | -118.290466 | 537.980652 | 252.854279 |
1.120537 | 34.068153 | -118.290253 | 562.437012 | 265.233337 |
1.106771 | 34.067982 | -118.290504 | 531.442017 | 252.254257 |
1.103518 | 34.068169 | -118.290184 | 568.985352 | 270.492218 |
1.083344 | 34.067940 | -118.290527 | 526.321899 | 252.633179 |
1.079027 | 34.068176 | -118.290100 | 576.762024 | 277.419189 |
1.041816 | 34.067883 | -118.290543 | 519.805908 | 254.580200 |
1.034252 | 34.070259 | -118.291237 | 418.454498 | 205.704315 |
1.019096 | 34.071594 | -118.287888 | 1097.392212 | 543.506653 |
0.974731 | 34.068214 | -118.289680 | 617.407532 | 312.654022 |
0.970095 | 34.068176 | -118.289719 | 611.899475 | 310.593842 |
0.917598 | 34.068111 | -118.289383 | 656.267517 | 342.234131 |
0.910048 | 34.068165 | -118.289383 | 650.329041 | 340.477783 |
0.902491 | 34.068214 | -118.289383 | 644.814758 | 338.931915 |
0.770809 | 34.067570 | -118.290543 | 485.023560 | 273.899414 |
0.760711 | 34.068214 | -118.288643 | 712.981384 | 404.939484 |
0.753344 | 34.068214 | -118.288597 | 717.197876 | 409.045654 |
0.750541 | 34.033188 | -118.279716 | 7297.569824 | 4168.751465 |
0.747349 | 34.031826 | -118.279968 | 7526.415039 | 4307.333008 |
0.743357 | 34.067772 | -118.289474 | 606.347107 | 347.804382 |
0.741902 | 34.067787 | -118.289436 | 610.249084 | 350.334900 |
0.740024 | 34.067749 | -118.289505 | 602.555115 | 346.291290 |
0.739944 | 34.031769 | -118.279823 | 7511.619141 | 4317.161621 |
0.738388 | 34.031582 | -118.280746 | 7499.802734 | 4314.228516 |
0.737889 | 34.067795 | -118.289398 | 613.863831 | 353.223755 |
0.737716 | 34.031742 | -118.279800 | 7507.977051 | 4320.601562 |
0.736297 | 34.031372 | -118.280258 | 7550.486816 | 4348.613770 |
0.735083 | 34.068108 | -118.288734 | 693.459473 | 399.669403 |
0.734607 | 34.067730 | -118.289520 | 600.010803 | 345.905945 |
0.732851 | 34.031685 | -118.279747 | 7499.933105 | 4328.088379 |
0.730817 | 34.067795 | -118.289352 | 618.080322 | 357.103241 |
0.730543 | 34.031654 | -118.279732 | 7496.259766 | 4331.739746 |
0.728622 | 34.031628 | -118.279724 | 7493.208496 | 4334.787109 |
0.727123 | 34.067707 | -118.289536 | 597.103455 | 345.721344 |
0.726802 | 34.031601 | -118.279724 | 7490.239258 | 4337.637207 |
0.724309 | 34.031563 | -118.279739 | 7485.770508 | 4341.315430 |
0.723138 | 34.067791 | -118.289307 | 622.318115 | 361.153992 |
0.722826 | 34.031540 | -118.279755 | 7482.862793 | 4343.366211 |
0.722384 | 34.094849 | -118.236145 | 10273.032227 | 5964.425293 |
0.721979 | 34.094719 | -118.235779 | 10309.708008 | 5987.128906 |
0.721011 | 34.094639 | -118.235474 | 10339.187500 | 6007.625977 |
0.720812 | 34.094620 | -118.235405 | 10345.856445 | 6012.193359 |
0.720105 | 34.031498 | -118.279778 | 7477.742188 | 4347.258789 |
0.720078 | 34.094543 | -118.235138 | 10371.867188 | 6029.880859 |
0.719789 | 34.031509 | -118.279755 | 7475.278809 | 4346.624512 |
0.719616 | 34.095020 | -118.236320 | 10248.023438 | 5959.484863 |
There are 3 clusters of data. All the stuff < 500m away from the start is mostly degenerate and uninteresting (source):
Most of the points are in walkways in Shatto Recreation Center. They're all so close to the start that any inefficiency is exaggerated by the small direct distance. I make the rules, so I claim these aren't the least convenient point.
Next we have the points about 4.2km away as the crow flies (source):
These all appear in an improperly-mapped group of sidewalks around Saint James park: http://www.openstreetmap.org/#map=18/34.03173/-118.27892.
Here the sidewalks appear as separate ways that don't connect with the roads they abut. So according to the data, connecting to the network of sidewalks can only happen in one location, making these appear less convenient than they actually are. (I think these should be removed from OSM, but it looks like the OSM committee people think that mapping sidewalks as separate ways is acceptable. OK; it'll be fixed eventually in some way).
The next cluster of data is about 6km away as the crow flies (source):
These are all at the road connecting to the Metrolink maintenance facility at Taylor Yard: http://www.openstreetmap.org/#map=17/34.09371/-118.23463. This makes sense! This location is on the other side of the LA river from Koreatown, so getting here requires a lengthy detour to the nearest bikeable bridge. The nearest one (Riverside Drive) is 2.5km by road away, but this is in the opposite direction from Koreatown. The nearest one in the other direction is Fletcher Drive, 3.8km by road.
So the least convenient point from New Hampshire / 3rd is at lat/lon 34.094849,-118.236145. This location is 10.3km away by road, but only 6.0km as the crow flies, for an inconvenience coefficient of 0.72.