Dealing with Factors in R

What is the deal with the data type “Factor” in R?  It has a purpose, and I know that a number of packages use this format.  However, I often find that (1) my data somehow ends up in this format and (2) it’s not what I want.

My goal for this post: to write down what I’ve learned (this time, again!) before I forget and have to learn it all over again next time (just like all the other times).  If you found this, I hope it’s helpful and that you came here before you started tearing your hair out, yelling at the computer, or banging your head on the desk.

So here we go.  Add your ways to deal with factors in the comments and I’ll update the page as needed.

Avoid Creating Factors

The number 1 best way to deal with factors (when you don’t need them) is to not create them in the first place!  When you import a csv or other similar data, use the option stringsAsFactors = FALSE (or similar… read the docs to find the options for the command you’re using) to make sure your string data isn’t converted automatically to a factor.  R will sometimes also convert what seems to clearly be numerical data to a factor, usually because at least one entry in the column can’t be read as a number, so even if you think you only have numbers, you may still need this option.

MyData<-read.csv(file="SomeData.csv", header=TRUE, stringsAsFactors = FALSE)
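
Here’s a quick sketch of the difference, using a made-up CSV written to a temporary file (the column names and values are invented for illustration).  In R 4.0.0 and later the default for stringsAsFactors is already FALSE, so the first read sets it to TRUE on purpose to show the old behavior.

```r
# Write a tiny CSV to a temporary file (made-up data for illustration)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("site,count",
             "A,10",
             "B,n/a",
             "C,12"), csv_path)

# Strings become factors, and the stray "n/a" drags the count column along
as_factors <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)

# Strings stay as plain character data
as_strings <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

class(as_factors$count)  # "factor"
class(as_strings$count)  # "character"

# If you know how missing values are coded, say so and the column stays numeric
clean <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE,
                  na.strings = "n/a")
class(clean$count)       # "integer"
```

Note that stringsAsFactors = FALSE alone only gets you strings; telling read.csv() about the odd missing-value code (here an assumed "n/a") is what keeps the column numeric.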

Convert Data

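Ok, but what if creating a factor is unavoidable?  You can convert it.  It’s not intuitive so I keep forgetting.  Wrap your factor in as.character() to just get the data.  It’s now in string format, so if you need numbers, wrap all of that in as.numeric().  Don’t skip the as.character() step: calling as.numeric() directly on a factor returns the internal level codes, not your original values.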

#Convert from a factor to a character vector
CharacterData<-as.character(MyFactor)

#Convert from a factor to numerical data
NumericalData<-as.numeric(as.character(MyFactor))
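
To see why the extra as.character() matters, here is a small sketch with a made-up factor whose labels look like numbers.  The values are invented; only the function calls matter.

```r
# A factor whose labels look like numbers (made-up values)
MyFactor <- factor(c("10", "2.5", "10", "300"))

# Direct conversion returns the internal level codes (small integers),
# not 10, 2.5, 10, 300
WrongData <- as.numeric(MyFactor)
WrongData

# Going through character first recovers the actual values
NumericalData <- as.numeric(as.character(MyFactor))
NumericalData

# An equivalent alternative: convert the levels once, then index by the factor
AlsoNumerical <- as.numeric(levels(MyFactor))[MyFactor]
AlsoNumerical
```

The last form does the string-to-number conversion once per level instead of once per element, which can be a little faster on long vectors, but the result is the same.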

What’s Missing?

Do you have any other tricks to working with data that ends up as a Factor?  Let me know in the comments!


Making of a Moon Tree Map

I’m presenting a workflow for finishing maps in Inkscape at FOSS4G North America this year (2016). To really show the process effectively, I made a map and took screenshots along the way.

The Data

I decided to work with Moon Tree location data.  It’s quirky and interesting… and given that this is a geek conference I figured the space reference would be appreciated.  A few months ago I learned about Moon Trees while watching an episode of Huell Howser on KVIE Public Television and then visited the one on the California State Capitol grounds.  I later learned from my aunt that my grandfather was a part of the telemetry crew that retrieved the Apollo 14 mission that carried the seeds that would become the Moon Trees, so there’s something of a connection to this idea.  Followers of my research also know that I’m a plant person with a particular interest in plant geography.  So this seemed like the perfect dataset.  I was fortunate to find that Heather Archuletta had already digitized the locations of public trees and made them available in KML format.

Data Processing

The KML format is great for some applications (particularly Google Earth, for which it was designed) but it poses some challenges.  I spent several hours… maybe more than I want to admit… formatting the .dbf to make the shapefile more useful for my purposes.  I created columns and standardized the content.  The map does not present all the data available (um… duh.).  It was challenge enough getting all this onto one page.

Yes, Inkscape is Necessary

You can’t make this map in QGIS completely.  I mean, normally you can make some fantastic maps in QGIS, but this one is actually not possible.  Right now, QGIS can’t handle having map frames with different projections.  I tried, but I found that even when the map composer looked right, the export in all three export options changed the projection and center of each frame to match that of the last active frame.  So I ended up with a layout with three zoom levels centered on Brazil… interesting, but not what I had in mind.  So I exported an .svg file three times from the map composer – one for each map frame – and put them together in Inkscape.

Sneaky Cartography

One of the methods I often use in my maps is to create subtle blurred halos behind text or icons that might otherwise get lost on a busy background.  I don’t like when the viewer sees the halos (maybe it’s from teaching ArcMap far too many years at universities).  It’s not quite a pet peeve, but I think there are often better ways to handle busy backgrounds and readability.  My blog, my soapbox.  Can you spot them?  There are a couple in the map and in the final slide of the pitch video.  It doesn’t look like much, but I promise the text is easier to read.

The texture on the continents is the moon.  I clipped a photo of the moon using the continent outlines.  I liked the idea of trees on the moon.

Icons

The icons are special to me.  I’ve been really wanting to make a map using images from Phylopic and I thought this was the perfect opportunity… but… but… no one had uploaded outlines for any of the species I needed.  So I made them and uploaded them.  So if you want an .svg of these, help yourself.  If, however, you need dinosaurs, they’ve got you covered.

Watch it happen:

My pitch video captures the process from start to finish:

Want more open source cartography?

Come to FOSS4G North America and see my talk and several others focused on cartography.  I’ll cover the Inkscape methods and tools most commonly used for cartography.


Accidental Ferns

I have this moss terrarium that is not doing that well.  It’s never done well.  I guess moss doesn’t really want to live in a jar.  Probably more than a year ago, I dropped a maidenhair fern frond in there that was loaded with spores.  I thought it might sprout, but after the leaf decayed, nothing happened so I forgot about it… not that I really knew what fern sprouts looked like.  Months ago, I started seeing this strange structure growing out of my dying moss clumps.  It kind of looks like tiny kelp.  I thought maybe it was a liverwort or some moss structure that grows from moss in some last attempt to live.  My “moss kelp” eventually grew some branches, so I thought it was making spores.

IMG_3354

From left to right: some scraggly moss, young maidenhair ferns, and the fern prothallia

Fast forward to today.  My maidenhair fern in my office (a division of the one mentioned earlier) is dropping spores all over the window sill, so I did some internet research on how to grow ferns from spores.  That’s when I discovered I’ve actually already done it.  The “moss kelp” is the prothallium or the gametophyte of the fern (the structure where fertilization happens).  From the prothallium, the fern that we recognize grows.  Now I wonder if I can do it again on purpose.

A note on the photograph: photographing prothallia is really difficult.  I was frustrated at the lack of good photos online, but now I understand, so please excuse my lack of detail in my photo.  I may try to get some better macro photos later.


ArcGIS Tabulate Area Error

Several times I’ve run into an error trying to run the Tabulate Area (Spatial Analyst) tool in ArcGIS.  [I know, I know… I’m more of an open source person, but you gotta use what they’ll let you have at work.]  The error code it gives is “Error 999999 : Error executing function.”  Great.  Cuz that’s helpful.  Here are some things to check.

  1. Did you put the right layers into each input field or should they be switched?  The order matters.  The one with “zone” in the description needs to have the shapefile or raster that defines the zones you want to use.
  2. Does the attribute information you are trying to use have spaces (or special characters like commas)?  Yeah, that doesn’t work.
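
For the second check, a quick way to spot troublesome values is to scan them before running the tool.  This is only a sketch in R with hypothetical zone names, and the “safe” character set (letters, digits, underscores) is an assumption rather than anything documented for Tabulate Area.

```r
# Hypothetical zone names from the attribute field you plan to use
zone_names <- c("Dune_Scrub", "Salt Marsh", "Coastal Prairie, North", "Forest1")

# TRUE where a value contains anything other than letters, digits, or underscores
has_problem <- grepl("[^A-Za-z0-9_]", zone_names)
zone_names[has_problem]

# One possible clean-up: replace each run of offending characters with an underscore
clean_names <- gsub("[^A-Za-z0-9_]+", "_", zone_names)
clean_names
```

You would still need to write the cleaned values back to the attribute table yourself; this just tells you which ones to look at.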

Now, I don’t promise that one of these is going to solve your problem, but it’s at least something to look into, which Arc doesn’t give you.  I hope someone finds this helpful.  It was intended more as a note for me, since I’ve run into and troubleshot this problem more than once this year not remembering the solution.  That’s what blogs are for!


Spatially Enabled Zotero Database

As a geographer, I’m a visual person.  I like to see distributions on a map and where things are matters to me.  A few years ago, while I was writing a paper I became overwhelmed with trying to remember the locations for the studies I had read (for coastal plants, latitude matters), so I started marking the locations of studies on a map and eventually turned it into a printed map.

CGS_Smaller

But adding new studies and sharing the results is cumbersome, and the spatial data is largely separate from the citation information.  So I set out to find a way to store spatial information in my citation database and access the spatial information for mapping purposes.  The end result (which is still a work in progress at press time) is a web map of coastal vegetation literature that updates when new citations are added to my Zotero database online.

Thumb_LiteratureMap

How I Did It:

Key ingredients: Zotero, QGIS, Spatialite, Zotero Online Account

I started working with the Zotero database I already have populated with literature relevant to my research on coastal vegetation.  I moved citations that I wanted to map into a separate folder just to make the API queries easier later.  I made a point in a shapefile for the location of each study using QGIS.  I gave the attribute table fields for the in-text citation and a text description of the location for human-readability, but the most important field is the ZoteroKey.  This is the item key that uniquely identifies each record in the Zotero database.  To find the key for each citation, in your local version of Zotero, right click on the record and pick “generate report”.  The text for the key is after the underscore in the URL for the report.  In the online version, click the citation in your list.  The key is at the end of the URL in the page that opens.

QGIS_Screenshot

My map only has point geometries right now, but that will change in the coming weeks.

Next, I added the spatial information to the Zotero database using Spatialite (the specific queries can be found on GitHub).  The Zotero schema is quite large but not impossible to navigate.  Currently, there is no option to add your own fields to Zotero (I tried… I failed… they tell me the option is coming soon) so I put my geometries into the “Extra” field.  Using Spatialite, I opened the Zotero database and imported my shapefile of citation locations (having new tables doesn’t break the database, thank goodness).  Then I removed any existing information in the “Extra” field and filled it in with geometry information in the style of GeoJSON.  The string looks like this:

{"type": "Point", "coordinates": [-123.069403678033, 38.3159528822055]}
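
If you want to generate strings like this outside of SQL, here is a minimal sketch in R.  The helper make_point_json() is hypothetical (it is not part of Zotero, Spatialite, or my GitHub queries); it just formats a coordinate pair the same way as the example above.  Note the order: GeoJSON puts longitude first, then latitude.

```r
# Build a GeoJSON-style Point string from a longitude/latitude pair.
# make_point_json is a hypothetical helper written for this example.
make_point_json <- function(lon, lat) {
  # %.15g keeps up to 15 significant digits without padding zeros
  sprintf('{"type": "Point", "coordinates": [%.15g, %.15g]}', lon, lat)
}

point_json <- make_point_json(-123.069403678033, 38.3159528822055)
point_json
```

Because sprintf() is vectorized, passing whole columns of longitudes and latitudes returns one string per study location.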

After updating the citation records to house the geometries, I synced the changes to my online Zotero repository from my desktop program.  Now it’s ready to go into a web map using the Zotero API.  My webmap code can be found in my GitHub Repository.

What’s Next?

I would like to develop a plug-in for QGIS that makes adding the geometries to the Zotero database easier because not everyone wants to run SQL queries on their active citation database that has been years in the making (I backed mine up first!).  The interface would show the citations you want to map, then users would pick a citation, then click the location on their QGIS project where the citations should be located.  The plug-in would insert the corresponding geometry for them.


Getting Started with LaTeX

I’ve been thinking for a while that I would like to learn how to use LaTeX.  Aside from being something that geeky types seem to love, it makes documents that look beautiful.  It actually looks easier than getting Word or LibreOffice to behave in predictable ways beyond simple text.  So why am I finally learning how to use this?  I want to submit an article to a journal and they require all submissions be in LaTeX format.  (As an aside, why did they have to make the capitalization of LaTeX so odd?  It’s hard to type!)  I thought I would post some notes on tools I found useful for learning.

Install

You need both a LaTeX engine and an editor.  I installed MiKTeX as the engine and Texmaker for the editor.

Tutorials

Michelle Krummel has a multi-part video tutorial on YouTube that moves at a good pace (not too fast or slow).  She teaches all the basics you need to understand how to set up a document and how formatting works.  Even though it’s specifically geared towards mathematics, the concepts all apply to what you would need for other sciences as well.

Cheat Sheets

Winston Chang wrote an excellent cheat sheet to remind you of the basic formatting you’ll need.


Batch Editing Text in Inkscape

(Sarcasm!) Thanks, R & Inkscape!  I totally wanted to mark my outliers with the letter q!

Have you ever opened a PDF of a graph made in R in Inkscape?  For some reason, it appears that my graphs are made with characters from the Dingbats font, which I guess Inkscape doesn’t like, so when I open the file in Inkscape, it changes all of my nice circles to the letter q.  That’s awesome.  How do you fix that?  Inkscape doesn’t let you batch change the text inside a textbox, as far as I can tell.  So, here’s one way to fix it:

  1. Open the PDF file in Inkscape and save it as an SVG file.  Yes, you’ve now got qs instead of os.  It will be ok.
  2. Close the file.
  3. Open the SVG in a text editor (I like Notepad++).
  4. Do a Find & Replace, finding “q” and replacing with “o”.  I recommend reviewing each instance it finds rather than using “replace all”, because one of those qs might be something else.  Here’s the general kind of text you’re looking for: id="tspan4049">q</tspan></text> See that q?  Change it to an o.
  5. Save the file and close it.
  6. Open it up in Inkscape and see what you’ve got.
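
The find-and-replace step can also be sketched in R itself.  This is a rough example under stated assumptions: the SVG lines below are made up to mimic the pattern above, and the file names in the comments are hypothetical.  It only touches tspans whose entire content is a single q, so a word like “Frequency” is left alone.

```r
# A stand-in for the lines of an SVG saved from Inkscape (made-up content)
svg_lines <- c(
  '<text id="text4047"><tspan id="tspan4049">q</tspan></text>',
  '<text id="text4051"><tspan id="tspan4053">Frequency</tspan></text>'
)
# With a real file: svg_lines <- readLines("MyGraph.svg")  # hypothetical file name

# Only touch tspans whose entire content is a single q
fixed_lines <- gsub(">q</tspan>", ">o</tspan>", svg_lines, fixed = TRUE)

# List the lines that changed so they can be reviewed before saving
changed_lines <- fixed_lines[fixed_lines != svg_lines]
changed_lines

# writeLines(fixed_lines, "MyGraph_fixed.svg")  # hypothetical output name
```

It is still worth looking over the changed lines before saving, for the same reason as in step 4.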

If anyone has a more elegant way to do this, let me know!  I’m sure there’s a way to change the R output from the start so you don’t have this problem, but this solution was quicker than messing with R for now.
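
One candidate for fixing it at the source: R’s pdf() device has a useDingbats argument, and setting it to FALSE should keep the point symbols from being drawn with the Dingbats font.  This is a sketch with made-up data and a temporary file name, and depending on your R version FALSE may already be the default.

```r
# Draw a plot to PDF without using the Dingbats font for point symbols
pdf_path <- tempfile(fileext = ".pdf")  # swap in your own file name
pdf(pdf_path, useDingbats = FALSE)
boxplot(c(1, 2, 2, 3, 3, 3, 4, 4, 5, 20), main = "Made-up data with an outlier")
invisible(dev.off())
```

Open the resulting PDF in Inkscape and check that the outlier is a circle rather than a q.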

All fixed!
