Speeding up pgRouting

pgRouting and accessibility

I have been using pgRouting for some analysis of accessibility to various facilities on the network and experimenting with different ways of making the process faster.

My initial network had 28,000 edges, and solving a catchment area problem from one location on the network to all other nodes on the network was taking 40 minutes on a 2.93GHz quad-core processor with 4GB RAM (Windows 7, PostgreSQL 9.2, PostGIS 2.0.3 and pgRouting 1.0.7).  I put the query into a looping function that processed the facilities in order, but with any more than 4 the machine would run out of memory, as the complete solution is stored in RAM until the loop finishes.

First step: reduce the number of edges in the network to 23,000 and the number of nodes to 17,000 by removing pedestrian walkways, alleys, and private and restricted roads.  Now the query is solved in about 12-14 minutes using about 200MB of RAM per facility.
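The exact filtering depends on your network schema. As a minimal sketch, assuming the network table is angus_nw (as used later) and that it carries a hypothetical road_class attribute describing each edge:

-- Sketch only: road_class and its values are assumptions; substitute
-- whatever attribute your network uses to classify edges.
DELETE FROM angus_nw
WHERE road_class IN ('pedestrian walkway', 'alley', 'private', 'restricted');

Remember to regenerate the vertices table afterwards so it only contains nodes that remain on the network.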

Second, duplicate the function four times, with each copy set to process a different set of facilities (function1 does 1 and 2, function2 does 3 and 4, etc.). I created four functions because when each is run it spawns a postgres.exe process that utilises one of the four cores.  So four functions will use four cores and the same amount of memory.  If I stagger the starts by a minute, the first process will complete, close and release its memory while the other three are still going.  By doing this I can process 8 facilities in 25 minutes instead of about 2 hours.

The output table is then grouped by minimum cost to create a final table of nodes, which is used to generate isochrones.

By reducing the complexity of the network, the output surfaces and contours are much smoother and present a clearer picture of network accessibility.  The restricted edges introduced a lot of noise that obscured an otherwise understandable result.

Here is the process methodology, which borrows heavily from work done by Anita Graser, Jo Cook and others:

Run this for each table of facilities. Change the table name “facilities” to match your table name.

===============================================
// 1. Adds a nearest_node field of type integer to the source table //
===============================================

ALTER TABLE facilities ADD COLUMN nearest_node INTEGER;

===============================================
// 2. Creates a temporary table to hold nodes and distances //
===============================================

CREATE TABLE temp_fac AS
 SELECT a.gid, b.id, min(a.dist) AS dist
 FROM
 (SELECT facilities.gid,
   min(st_distance(facilities.wkb_geometry, vertices_tmp_nw.the_geom)) AS dist
  FROM facilities, vertices_tmp_nw
  GROUP BY facilities.gid) AS a,
 (SELECT facilities.gid, vertices_tmp_nw.id,
   st_distance(facilities.wkb_geometry, vertices_tmp_nw.the_geom) AS dist
  FROM facilities, vertices_tmp_nw) AS b
 WHERE a.dist = b.dist
 AND a.gid = b.gid
 GROUP BY a.gid, b.id;
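
If you prefer, PostgreSQL's DISTINCT ON gives a more compact way to pick one nearest node per facility. This is only a sketch, assuming the same tables and columns as above:

-- One row per facility: the nearest vertex by straight-line distance.
CREATE TABLE temp_fac AS
SELECT DISTINCT ON (f.gid)
 f.gid,
 v.id,
 st_distance(f.wkb_geometry, v.the_geom) AS dist
FROM facilities f, vertices_tmp_nw v
ORDER BY f.gid, st_distance(f.wkb_geometry, v.the_geom);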

===============================================
// 3. Updates the source table with the ID of the node nearest to the facility //
===============================================

UPDATE facilities
SET nearest_node = (SELECT id
  FROM temp_fac
  WHERE temp_fac.gid = facilities.gid
  LIMIT 1); -- LIMIT 1 guards against two nodes tying for the minimum distance
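
A quick sanity check before moving on: every facility should now have a nearest node, so this count should come back as zero.

SELECT count(*) FROM facilities
WHERE nearest_node IS NULL;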

===============================================
// 4. Run function to calculate cost from each node to each facility //
===============================================
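
The function below inserts into a facilities_catchment table, which must exist before the first run. A minimal sketch of its structure, with column types assumed to match the vertices table and the summed traveltime cost:

-- Sketch only: adjust the geometry type to match vertices_tmp_nw.
CREATE TABLE facilities_catchment (
 id integer,
 the_geom geometry,
 cost double precision
);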

-- Function: insertfacilitiescosts()
-- DROP FUNCTION insertfacilitiescosts();
CREATE OR REPLACE FUNCTION insertfacilitiescosts()
  RETURNS integer AS
$BODY$
DECLARE
  nn INTEGER;
BEGIN
  RAISE NOTICE 'Selecting nodes...';
  -- The gid list below is the only line changed in each copy of the function.
  FOR nn IN SELECT nearest_node FROM facilities WHERE gid IN (1,2)
  LOOP
    RAISE NOTICE 'Processing node % ...', nn;
    INSERT INTO facilities_catchment (
      SELECT
        id,
        the_geom,
        (SELECT sum(cost) FROM (
          SELECT * FROM shortest_path('
            SELECT gid AS id,
              source::int4 AS source,
              target::int4 AS target,
              traveltime::float8 AS cost
            FROM angus_nw',
            nn,
            id,
            false,
            false)) AS foo) AS cost
      FROM vertices_tmp_nw);
  END LOOP;
  RAISE NOTICE 'Completed node processing';
  RETURN 1;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION insertfacilitiescosts()
  OWNER TO postgres;

Run the functions in pgAdmin III

SELECT * FROM insertfacilitiescosts1();
SELECT * FROM insertfacilitiescosts2();
SELECT * FROM insertfacilitiescosts3();

To optimise processing I create three or four copies of the function, get each one to process two or three facilities, and run all the functions at the same time. I create duplicate functions because I have four cores in my machine and each function's process is assigned to a core. This reduces the processing time by a factor of three (at least). Beware! This uses all CPU processing power and, depending on the size of the network, most of the system RAM. More RAM and faster processors are always better 🙂  Also, the PostgreSQL docs say that:

Currently, the point at which data begins being written to disk is controlled by the work_mem configuration variable. Administrators who have sufficient memory to store larger result sets in memory should consider increasing this parameter.
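
In line with that advice, work_mem can be raised for the session before running each function. The value below is purely illustrative; note that work_mem applies per sort or hash operation per connection, so four parallel functions can each claim this much.

SET work_mem = '256MB'; -- illustrative value, tune to your available RAM
SELECT * FROM insertfacilitiescosts1();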

===============================================
// 5. Creates an aggregated cost table of minimum cost from each point //
===============================================

CREATE TABLE facilities_catchment_final AS
SELECT id, the_geom, min(cost) AS cost
FROM facilities_catchment
GROUP BY id, the_geom;

Before the table is used in QGIS you need to add a unique id field (which creates a sequence), create a primary key constraint and add a spatial index. This makes everything faster.

ALTER TABLE facilities_catchment_final ADD COLUMN "objectid" SERIAL;
ALTER TABLE facilities_catchment_final ADD CONSTRAINT facilities_catchment_final_pkey PRIMARY KEY(objectid);
CREATE INDEX facilities_catchment_final_geom_idx
 ON facilities_catchment_final
 USING gist
 (the_geom);
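
With the final table in place you can style it in QGIS, or query it directly. As a rough sketch, the nodes within a travel-time threshold can be collected into a convex hull to approximate an isochrone; the threshold of 600 assumes traveltime was measured in seconds, so adjust it to your cost units.

-- Rough isochrone outline: convex hull of all nodes within the threshold.
SELECT ST_ConvexHull(ST_Collect(the_geom)) AS isochrone
FROM facilities_catchment_final
WHERE cost <= 600;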
