So you’ve got a table of addresses that need to be matched up with their latitude/longitude coordinates… i.e. you need to geocode some addresses. Most of the time, an online tool is a better choice than running your own geocoder if all you need is a handful of addresses geolocated. In the case that you can’t share your addresses or you have too many to send through an online tool, batch geocoding on your own computer is a good option. In this post I’ll share how I’ve been doing this.
First, you’ll need a geocoder. If you’ve got a PostGIS Geocoder up an running, congratulations! That’s no easy feat. If you don’t, take a detour over to my post about how to make a PostGIS geocoder in less than 5 days.
Once you’ve made your geocoder, you’re probably fairly familiar with how to run queries in pgAdmin 4, but just to be on the safe side, I’m gonna give you all the details I think you’ll need.
Start pgAdmin 4. (If you get an error, you may need to start the postgresql service – see link above for details). Navigate in the browser panel on the left side of the window to your geocoder database and click on it to highlight it. Open the Query Tool – Tools menu at the top, then select Query Tool.
This is the code I’ve been running (with copious code comments to help you… and me later… understand what I did). You’ll need to make adjustments for your specific table of addresses, but after that, you can paste the whole thing into the Query Tool and hit the run button (why is it lightning bolt icon?).
--- remove any existing attempts DROP TABLE IF EXISTS address_table; --- Create an empty table --- Make the columns match those of the CSV that has the data --- you want to geocode CREATE TABLE address_table( per_id_fk varchar(255), acity varchar(255), astate varchar(255), azip varchar(255), acounty varchar(255), adate varchar(255), street varchar(255), input_address varchar(255)); --- Look at your new empty table: --- (I like to have visual confirmation that things worked the first time --- I run them... you don't need to do this when you run the whole script --- to batch process the address) --- Uncomment the SELECT query below if you're doing this one piece at a time: --- SELECT * FROM address_table; --- import data from a CSV - the columns in the CSV should be in the list above COPY address_table from 'C:\gisdata\TableOfAddresses.csv' WITH DELIMITER ',' NULL 'NA' CSV HEADER; --- Look at the data you added: --- SELECT * FROM address_table; --- Add the columns we'll need for geocoding ALTER TABLE address_table ADD lon numeric, ADD lat numeric, ADD geomout geometry, -- a point geometry in NAD 83 long lat. ADD new_address varchar(255), ADD rating integer; --- Look at your new empty columns: --- SELECT * FROM address_table; --- This function loops through the table, one row at a time: CREATE OR REPLACE FUNCTION mygeocoder() RETURNS void AS $$ BEGIN FOR i IN 1..(SELECT(count(per_id_fk)) from address_table) LOOP UPDATE address_table SET (rating, new_address, lon, lat, geomout) = (COALESCE(g.rating,-1),pprint_addy(g.addy), ST_X(g.geomout)::numeric(8,5), ST_Y(g.geomout)::numeric(8,5), (g.geomout) ) FROM (SELECT per_id_fk, input_address FROM address_table WHERE rating IS NULL ORDER BY per_id_fk LIMIT 1) As a LEFT JOIN LATERAL geocode(a.input_address,1) As g ON true WHERE a.per_id_fk = address_table.per_id_fk; END LOOP; RETURN; END; $$ LANGUAGE plpgsql; --- Run your geocoder to geocode your address table: SELECT mygeocoder(); --- Look at your table with geocoded addresses: SELECT * FROM address_table;
As you can see in the code comments, this function geocodes the table one row at a time. According to posts I’ve read, you don’t want to feed your whole table into the geocoder at once. It makes the computer hold a lot of data in memory and will slow things down. You can do a few lines at a time, but it makes the code for the loop easier if I didn’t have to figure out how many times to loop (… how many rows do I have… divide by 3… how to know when to stop…).
This is going to take a long time if you have many addresses. For testing purposes, I highly recommend making a subset of your data – maybe 100 or so addresses – to work on while you adapt your code. You don’t want to wait hours just to find out something was a little off.
In formulating this post and code, I found these resources to be very helpful and have adapted most of my code from them:
- gravitystorm’s geocoder code on GitHub –
has a more complex example with more functions than I needed here
- dracodoc’s blog post about geocoding – has examples of how to make a loop