Category Archives: Science

Batch Geocoding with PostGIS

So you’ve got a table of addresses that need to be matched up with their latitude/longitude coordinates… i.e. you need to geocode some addresses.  Most of the time, an online tool is a better choice than running your own geocoder if all you need is a handful of addresses geolocated.  In the case that you can’t share your addresses or you have too many to send through an online tool, batch geocoding on your own computer is a good option.  In this post I’ll share how I’ve been doing this.

First, you’ll need a geocoder.  If you’ve got a PostGIS Geocoder up an running, congratulations!  That’s no easy feat.  If you don’t, take a detour over to my post about how to make a PostGIS geocoder in less than 5 days.

Once you’ve made your geocoder, you’re probably fairly familiar with how to run queries in pgAdmin 4, but just to be on the safe side, I’m gonna give you all the details I think you’ll need.

Start pgAdmin 4.  (If you get an error, you may need to start the postgresql service – see link above for details).  Navigate in the browser panel on the left side of the window to your geocoder database and click on it to highlight it.  Open the Query Tool – Tools menu at the top, then select Query Tool.

This is the code I’ve been running (with copious code comments to help you… and me later… understand what I did).  You’ll need to make adjustments for your specific table of addresses, but after that, you can paste the whole thing into the Query Tool and hit the run button (why is it lightning bolt icon?).

--- remove any existing attempts
DROP TABLE IF EXISTS address_table;

--- Create an empty table
--- Make the columns match those of the CSV that has the data 
--- you want to geocode

CREATE TABLE address_table(
per_id_fk varchar(255), 
acity varchar(255), 
astate varchar(255),
azip varchar(255),
acounty varchar(255),
adate varchar(255),
street varchar(255),
input_address varchar(255));

--- Look at your new empty table:   
--- (I like to have visual confirmation that things worked the first time 
--- I run them... you don't need to do this when you run the whole script 
--- to batch process the address)
--- Uncomment the SELECT query below if you're doing this one piece at a time:
--- SELECT * FROM address_table;

--- import data from a CSV - the columns in the CSV should be in the list above
COPY address_table from 'C:\gisdata\TableOfAddresses.csv' WITH DELIMITER ',' NULL 'NA' CSV HEADER;

--- Look at the data you added:
--- SELECT * FROM address_table;

--- Add the columns we'll need for geocoding
ALTER TABLE address_table 
ADD lon numeric,  
ADD lat numeric,  
ADD geomout geometry, -- a point geometry in NAD 83 long lat. 
ADD new_address varchar(255),  
ADD rating integer;

--- Look at your new empty columns:
--- SELECT * FROM address_table;

--- This function loops through the table, one row at a time:
CREATE OR REPLACE FUNCTION mygeocoder()
RETURNS void
AS $$
BEGIN 
   FOR i IN 1..(SELECT(count(per_id_fk)) from address_table) LOOP 
      UPDATE address_table 
         SET  (rating, new_address, lon, lat, geomout) = (COALESCE(g.rating,-1),pprint_addy(g.addy), ST_X(g.geomout)::numeric(8,5), ST_Y(g.geomout)::numeric(8,5), (g.geomout) ) 
         FROM (SELECT per_id_fk, input_address FROM address_table WHERE rating IS NULL ORDER BY per_id_fk LIMIT 1) 
         As a 
      LEFT JOIN LATERAL geocode(a.input_address,1) As g ON true 
      WHERE a.per_id_fk = address_table.per_id_fk; 
   END LOOP; 
   RETURN;
END;
$$ LANGUAGE plpgsql;

--- Run your geocoder to geocode your address table:
SELECT mygeocoder();

--- Look at your table with geocoded addresses:
SELECT * FROM address_table;

As you can see in the code comments, this function geocodes the table one row at a time.  According to posts I’ve read, you don’t want to feed your whole table into the geocoder at once.  It makes the computer hold a lot of data in memory and will slow things down.  You can do a few lines at a time, but it makes the code for the loop easier if I didn’t have to figure out how many times to loop (… how many rows do I have… divide by 3… how to know when to stop…).

This is going to take a long time if you have many addresses.  For testing purposes, I highly recommend making a subset of your data – maybe 100 or so addresses – to work on while you adapt your code.  You don’t want to wait hours just to find out something was a little off.

In formulating this post and code, I found these resources to be very helpful and have adapted most of my code from them:

 

Advertisements

Change SP Object Polygon Rendering Order in R

The Problem

I have a geoJSON file that was made by combining many (as in ~250) geoJSON files each containing a single polygon in R.  The polygons are of varying size and can overlap in space.  When I open the file in a GIS, often the smaller polygons are hidden under the larger ones, making displaying the data challenging, if not impossible.

Map showing few polygons when there should be many

I know there are more polygons than are visible in this area, so they must be hiding under the larger polygons.

Map showing larger polygons obscuring smaller polygons

With transparency set to 60% for each polygon (see the Draw Effects dialog in QGIS for the fill symbol layer), you can see that smaller polygons are hiding under larger ones.

The Goal

I would prefer that the polygons stack up so that the largest is on the bottom and the smallest is on the top.  This requires that I change the rendering order based on the area of each polygon.

The Quick Fix

QGIS has an option to control the rendering order.  Open the Layer Properties; go to the Style tab; check the box for “Control feature rendering order”; click the button on the right with the A Z and an arrow to enter the expression you want (I would order by area for example).

Why isn’t this the right solution for me?  I’m sharing a user-contributed dataset.  One of the goals is that anyone can use it.  When polygons are obscured, it makes the dataset just a little harder to use and understand, which means people won’t like using it.  Another goal is that anyone with a reasonable understanding of GIS can contribute.  If I have to write a bunch of instructions on how to visualize the data before they can add to the dataset, people are less likely to contribute.

Map showing all the polygons expected.

Now I can see all of the polygons because the larger ones are on the bottom and the smaller ones are on top.

My Solution

Hat tip to Alex Mandel and Ani Ghosh for spending a couple of hours with me hammering out a solution.  We used R because I already had a script that takes all of the polygons made by contributors and combines them into one file.  It made sense in this case to add a few lines to this post-processing code to re-order the polygons before sending the results to the GitHub repository.

What you need to know about rendering order & SP Objects

The order in which items in an SP object are rendered is controlled by the object’s ID value.  The ID value is hidden in the ID slot nested inside the polygons slot.  If you change these values, you change the order items are rendered.  ID = 1 goes first, ID =2 goes on top of 1, 3 goes on top of 2, and so on.  So for my case, assigning the IDs based on the area will get me the solution.

How

This Stack Exchange Post on re-ording spatial data was a big help in the process.  Note that every SP object should have the slots and general structure I used here.  There’s nothing special about this dataset.  If you’d like the dataset and another copy of the R code, however, it is in the UC Davis Library’s AVA repository.

#load the libraries you'll need
library(raster)
library(geojsonio)
library(rgdal)

### FIRST: Some context about how I made my dataset in the first place

# search in my working directory for the files inside the folders 
# called "avas" and "tbd"
avas <- list.files(path="./avas", pattern = "*json$", full.names = "TRUE")
tbd <- list.files(path="./tbd", pattern = "*json$", full.names = "TRUE")

#put the two lists into one list
gj <- c(avas, tbd)

#read all the geojson files & create an SP object
vects <- lapply(gj, geojson_read, what="sp")

#combine all the vectors together. bind() is from the raster package.
#probably could just rbind geojson lists too, but thats harder to plot
all <- do.call(bind, vects)

#Change any "N/A" data to nulls
all@data[all@data=="N/A"]<- NA


### SECOND: How I did the sorting

#Calculate area of polygons - needed for sorting purposes
# the function returns the value in the area slot of each row
all@data$area<-sapply(slot(all, "polygons"), function(i){slot(i, "area")})

#add the row names in a column - needed for sorting purposes
all$rows<-row.names(all)

#Order by area - row names & area are needed here
# decreasing = TRUE means we list the large polygons first
all<-all[match(all[order(all$area, decreasing = TRUE),]$rows, row.names(all@data)),]

#add new ID column - essentially you are numbering the rows 
# from 1 to the number of rows you have but in the order of 
# largest to smallest area
all$newid<-1:nrow(all)

#assign the new id to the ID field of each polygon
for (i in 1:nrow(all@data)){
 all@polygons[[i]]@ID<-as.character(all$newid[i])}

#drop the 3 columns we added for sorting purposes (optional)
all@data<-all@data[,1:(ncol(all@data)-3)]

#write the data to a file in my working directory
geojson_write(all, file="avas.geojson", overwrite=TRUE, convert_wgs84 = TRUE)

Growing Cyclamen from Seed

The Backstory

A couple of years ago I bought a cyclamen.  I was having a bunch of people over for Thanksgiving and the patio off the dining room looked like death.  The large volume of red flowers on the Cyclamens at the hardware store was an easy fix.  I’ve heard they are hard to grow and that they usually die after the winter, but I was willing to give it a try.  My new plant went into a large pot visible from the back door in a fairly shady spot.  In the spring, as the flowers died back, I noticed that a couple of them had made seed pods.  Being a plant person, I couldn’t just leave them.  This was an opportunity! So I did a bunch of research online on how to start them from seed.

How NOT to Do It

It’s very difficult, according to the internet, to grow a Cyclamen from seed.  You need cold, and darkness, and it takes a very long time.  I did my best to provide the prescribed conditions, and hoping that since Cyclamens come from a Mediterranean climate like my own, any mistakes would be mitigated.  I was delighted to get 5 sprouts!  I was so careful with them, but only one survived.  I thought perhaps the internet was right and this WAS hard.  It didn’t help that the clamshell plastic container that I used (because the internet said I needed a lid) was shallow and prone to drying out.

IMG_4125

My one surviving plant from my first try. It’s grown a second leaf which makes me very happy.

The Epiphany

I consoled myself with the fact that I could buy more of these festive winter flowerers and the next winter bought myself a cheerful pink and white one.  Like the one before it (which is still going strong, unlike what the internet predicted), it did very well. Both plants had masses of flowers and ended up making heaps of seeds.  I collected them and decided to give it another try.  The seeds I collected were waiting inside for the weather to get cold.  After all, the internet says 104 degree days are not the time to grow a Cyclamen.  Around September, I was examining plant #2 for new leaves.  This summer’s heat hit it hard but it was coming back.  I noticed some small leaves that didn’t look like the big curled new growth on the plant I bought.  They looked like the sprouts I got last year!

IMG_4121

Three of the 5 sprouts that grew themselves in the pot with their mother plant.

You’re kidding me.  All they need is the conditions around their mother plant.  I transplanted the sprouts to their own pot.  A windstorm killed one, but the rest are going strong.

IMG_4130

My cyclamen collection… and a mass of oxallis in the big pot.

Success

I grabbed a glazed pot, filled it with quality potting soil, and sprinkled on some seeds.  The established plants had a layer of leaf litter from the last year’s spent leaves that seemed to provide good protection from the sun and hold in moisture, so I covered the pot with some spent Cyclamen and Oxallis leaves because that’s what I had on hand.

IMG_4129

Leaf litter covering freshly planted seeds.

It took several weeks but I started to see the seeds swell and form small bulbs.

IMG_4123

There! Right in the middle of the photo! It’s a tiny cyclamen bulb.

I knew I had the formula right when I saw that I had a leaf climbing out from the leaf mulch.

IMG_4115

Yes, that’s a leaf, although it kind of looks more like a mung bean sprout…

The key really seems to be keeping them moist.  I let my pots sit in a shallow pool of water so they can wick up what they need.

IMG_4119

Sprouts in progress. In the square pots are attempt #1 and the volunteers. The larger yellow round pot has attempt #2. The small round pots have the last of the seeds planted today.

I’m crossing my fingers that they keep growing.  The second leaf appearing on my plant from attempt #1 is encouraging.  I’m really curious to find out what color flowers they have.  All their leaves are different from their parent plants, so I really can’t say what color flowers I’ll get, if they decide to flower at all.  I imagine I have at least another year, maybe two, to wait.

So the Internet is Wrong…

I guess you can’t trust all the garden know-how posted on the internet.  Here’s a recap of things I’ve learned:

  • Cyclamens aren’t hard to grow, but they do need the right conditions.  Direct summer sun will nuke their leaves, but they can recover. Bright shade seems to be best.
  • Don’t keep them indoors either.  They aren’t house plants.
  • They are not difficult to grow from seed, but again, you need the right conditions.
  • Temperature isn’t really an issue.
  • Moisture seems to be key.  Covering the seeds with leaf litter really helps with this and keeps the light off (if that even matters).
  • A deep, glazed or plastic pot will prevent the pot from drying out.
  • Keep some water in the saucer to provide constant moisture to the seedlings.
  • You don’t need to grow them in a dark cold closet.  It sounded insane when I read it but enough sites said it, so I thought it might be true.

Dealing with Factors in R

What is the deal with the data type “Factor” in R?  It has a purpose and I know that a number of packages use this format, however, I often find that (1) my data somehow ends up in the format and (2) it’s not what I want.

My goal for this post: to write down what I’ve learned (this time, again!) before I forget and have to learn it all over again next time (just like all the other times).  If you found this, I hope it’s helpful and that you came here before you started tearing your hair out, yelling at the computer, or banging your head on the desk.

So here we go.  Add your ways to deal with factors in the comments and I’ll update the page as needed.

Avoid Creating Factors

Number 1 best way to deal with factors (when you don’t need them) is to not create them in the first place!  When you import a csv or other similar data, use the option stringsAsFactors = FALSE (or similar… read the docs for find the options for the command you’re using) to make sure your string data isn’t converted automatically to a factor.  R will sometimes also convert what seems to clearly be numerical data to a factor as well, so even if you only have numbers, you may still need this option.

MyData<-read.csv(file="SomeData.csv", header=TRUE, stringsAsFactors = FALSE)

Convert Data

Ok, but what if creating a factor is unavoidable?  You can convert it.  It’s not intuitive so I keep forgetting.  Wrap your factor in an as.character() to just get the data.  It’s now in string format, so if you need numbers, wrap all of that in as.numeric().

#Convert from a factor to a list
CharacterData<-as.character(MyFactor)

#Convert from a factor to numerical data
NumericalData<-as.numeric(as.character(MyFactor))

 

What’s Missing?

Do you have any other tricks to working with data that ends up as a Factor?  Let me know in the comments!


Making of a Moon Tree Map

29_CleanUp.png

I’m presenting a workflow for finishing maps in Inkscape at FOSS4G North America this year (2016). To really show the process effectively, I made a map and took screenshots along the way.

The Data

I decided to work with Moon Tree location data.  It’s quirky and interesting… and given that this is a geek conference I figured the space reference would be appreciated.  A few months ago I learned about Moon Trees watching an episode of Huell Howser on KVIE Public Television and then visited the one on the California State Capitol grounds.  I later learned from my aunt that my grandfather was a part of the telemetry crew that retrieved the Apollo 14 mission that carried the seeds that would become the Moon Trees, so there’s something of a connection to this idea.  Followers of my research also know that I’m a plant person, particularly plant geography.  So this seemed like the perfect dataset.  I was fortunate to find that Heather Archuletta had already digitized the locations of public trees and made them available in KML format.

Data Processing

The KML format is great for some applications (particularly Google maps, for which it was designed) but it poses some challenges.  I spent several hours… maybe more than I want to admit… formatting the .dbf to make the shapefile more useful for my purposes.  I created columns and standardized the content.  The map does not present all the data available (um… duh.).  It was challenge enough getting all this onto one page.

Yes, Inkscape is Necessary

You can’t make this map in QGIS completely.  I mean, normally you can make some fantastic maps in QGIS, but this one is actually not possible.  Right now, QGIS can’t handle having map frames with different projections.  I tried, but I found that even when the map composer looked right, the export in all three export options changed the projection and center of each frame to match that of the last active frame.  So I ended up with a layout with three zoom levels centered on Brazil… interesting, but not what I had in mind.  So I exported an .svg file three times from the map composer – one for each map frame – and put them together in Inkscape.

Sneaky Cartography

One of the methods I often use in my maps is to create subtle blurred halos behind text or icons that might otherwise get lost on a busy background.  I don’t like when the viewer sees the halos (maybe it’s from teaching ArcMap far too many years at universities).  It’s not quite a pet peeve, but I think there’s often better ways to handle busy backgrounds and readability.  My blog, my soapbox.  Can you spot them?  There are a couple in the map and in the final slide of the pitch video.  It doesn’t look like much, but I promise the text is easier to read.

The texture on the continents is the moon.  I clipped a photo of the moon using the continent outlines.  I liked the idea of trees on the moon.

Icons

The icons are special to me.  I’ve been really wanting to make a map using images from Phylopic and I thought this was the perfect opportunity… but… but… no one had uploaded outlines for any of the species I needed.  So I made them and uploaded them.  So if you want an .svg of these, help yourself.  If, however, you need dinosaurs, they’ve got you covered.

Watch it happen:

My pitch video captures the process from start to finish:

 Want more open source cartography?

Come to FOSS4G North America and see my and several other talks focused on cartography.  I’ll cover methods and tools in Inkscape common for cartography.


Accidental Ferns

I have this moss terrarium that is not doing that well.  It’s never done well.  I guess moss doesn’t really want to live in a jar.  Probably more than a year ago, I dropped a maidenhair fern frond in there that was loaded with spores.  I thought it might sprout, but after the leaf decayed, nothing happened so I forgot about it… not that I really knew what fern sprouts looked like.  Months ago, I started seeing this strange structure growing out of my dying moss clumps.  It kind of looks like tiny kelp.  I thought maybe it was a liverwort or some moss structure that grows from moss in some last attempt to live.  My “moss kelp” eventually grew some branches, so I thought it was making spores.

IMG_3354

From left to right: some scraggly moss, young maidenhair ferns, and the fern prothallia

Fast forward to today.  My maidenhair fern in my office (a division of the one mentioned earlier) is dropping spores all over the window sill, so I did some internet research on how to grow ferns from spores.  That’s when I discovered I’ve actually already done it.  The “moss kelp” is the prothallium or the gametophyte of the fern (the structure where fertilization happens).  From the prothallium, the fern that we recognize grows.  Now I wonder if I can do it again on purpose.

A note on the photograph: photographing prothallia is really difficult.  I was frustrated at the lack of good photos online, but now I understand, so please excuse my lack of detail in my photo.  I may try to get some better macro photos later.


ArcGIS Tabulate Area Error

Several times I’ve run into an error trying to run the Tabulate Area (Spatial Analyst) tool in ArcGIS.  [I know, I know… I’m more of an open source person, but you gotta use what they’ll let you have at work.]  The error code it gives is “Error 999999 : Error executing function.”  Great.  Cuz that’s helpful.  Here are some things to check.

  1. Did you put the right layers into each input field or should the be switched?  The order matters.  The one with “zone” in the description needs to have the shapefile or raster that defines the zones you want to use.
  2. Does the attribute information you are trying to use have spaces (or special characters like commas)?  Yeah, that doesn’t work.

Now, I don’t promise that one of these is going to solve your problem, but it’s at least something to look into, which Arc doesn’t give you.  I hope someone finds this helpful.  It was intended more as a note for me, since I’ve run into and troubleshot this problem more than once this year not remembering the solution.  That’s what blogs are for!