Tag Archives: Science

Dealing with Factors in R

What is the deal with the data type “Factor” in R?  It has a purpose, and I know that a number of packages use this format; however, I often find that (1) my data somehow ends up in this format and (2) it’s not what I want.

My goal for this post: to write down what I’ve learned (this time, again!) before I forget and have to learn it all over again next time (just like all the other times).  If you found this, I hope it’s helpful and that you came here before you started tearing your hair out, yelling at the computer, or banging your head on the desk.

So here we go.  Add your ways to deal with factors in the comments and I’ll update the page as needed.

Avoid Creating Factors

The number one best way to deal with factors (when you don’t need them) is to not create them in the first place!  When you import a csv or other similar data, use the option stringsAsFactors = FALSE (or similar… read the docs to find the options for the command you’re using) to make sure your string data isn’t converted automatically to a factor.  R will sometimes convert what seems to clearly be numerical data to a factor, too: if even one entry in a column can’t be read as a number (a stray “n/a” or a typo, say), the whole column is read as text and then turned into a factor.  So even if you only have numbers, you may still need this option.

MyData<-read.csv(file="SomeData.csv", header=TRUE, stringsAsFactors = FALSE)
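To see what stringsAsFactors actually changes, here is a minimal sketch that reads a tiny made-up table from a string instead of SomeData.csv (the column names and values are hypothetical) and reports the class of each column. R 4.0.0 and later already default to stringsAsFactors = FALSE, so the explicit option matters most on older R versions or when you want the intent to be obvious.

```r
# Made-up stand-in for SomeData.csv; note the stray "n/a" in Count
csv_text <- "Site,Count,Species
A,12,Quercus
B,n/a,Pinus
C,7,Quercus"

# Read the same text twice, with and without factor conversion
with_factors <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
without_factors <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# "n/a" is not a recognized missing-value code by default, so Count is read
# as text; with stringsAsFactors = TRUE every column ends up a factor
print(sapply(with_factors, class))
print(sapply(without_factors, class))
```

Without factors, as.numeric(without_factors$Count) gives 12, NA and 7, with a warning about the “n/a”. Passing na.strings = c("NA", "n/a") to read.csv() would let Count be read as numbers from the start.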

Convert Data

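Ok, but what if creating a factor is unavoidable?  You can convert it.  It’s not intuitive, so I keep forgetting.  Wrap your factor in as.character() to get the values back as text.  If you need numbers, wrap all of that in as.numeric().  Don’t skip the as.character() step: as.numeric() applied directly to a factor returns the factor’s internal integer codes (1, 2, 3…), not the values you see.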

#Convert from a factor to a character vector
CharacterData<-as.character(MyFactor)

#Convert from a factor to numerical data
NumericalData<-as.numeric(as.character(MyFactor))
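Here is a small, self-contained illustration of why the as.character() step matters, using a made-up factor of numbers stored as text. The last line is the alternative given in the R FAQ (7.10): convert the levels once, then index them by the factor, which can be faster for long vectors with few levels.

```r
# A made-up factor whose labels are numbers
MyFactor <- factor(c("10", "5", "20", "5"))

# Levels are sorted as text, so "5" comes after "20"
print(levels(MyFactor))                        # "10" "20" "5"

# Wrong: returns the internal level codes, not the values
print(as.numeric(MyFactor))                    # 1 3 2 3

# Right: convert the labels to text, then to numbers
print(as.numeric(as.character(MyFactor)))      # 10 5 20 5

# Also right: convert each level once, then look it up by code
print(as.numeric(levels(MyFactor))[MyFactor])  # 10 5 20 5
```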

 

What’s Missing?

Do you have any other tricks to working with data that ends up as a Factor?  Let me know in the comments!


Making of a Moon Tree Map


I’m presenting a workflow for finishing maps in Inkscape at FOSS4G North America this year (2016). To really show the process effectively, I made a map and took screenshots along the way.

The Data

I decided to work with Moon Tree location data.  It’s quirky and interesting… and given that this is a geek conference I figured the space reference would be appreciated.  A few months ago I learned about Moon Trees watching an episode of Huell Howser on KVIE Public Television and then visited the one on the California State Capitol grounds.  I later learned from my aunt that my grandfather was a part of the telemetry crew that retrieved the Apollo 14 mission that carried the seeds that would become the Moon Trees, so there’s something of a connection to this idea.  Followers of my research also know that I’m a plant person, particularly plant geography.  So this seemed like the perfect dataset.  I was fortunate to find that Heather Archuletta had already digitized the locations of public trees and made them available in KML format.

Data Processing

The KML format is great for some applications (particularly Google maps, for which it was designed) but it poses some challenges.  I spent several hours… maybe more than I want to admit… formatting the .dbf to make the shapefile more useful for my purposes.  I created columns and standardized the content.  The map does not present all the data available (um… duh.).  It was challenge enough getting all this onto one page.

Yes, Inkscape is Necessary

You can’t make this map entirely in QGIS.  I mean, normally you can make some fantastic maps in QGIS, but this one is actually not possible.  Right now, QGIS can’t handle having map frames with different projections.  I tried, but I found that even when the map composer looked right, all three export options changed the projection and center of each frame to match that of the last active frame.  So I ended up with a layout with three zoom levels centered on Brazil… interesting, but not what I had in mind.  So I exported an .svg file three times from the map composer – one for each map frame – and put them together in Inkscape.

Sneaky Cartography

One of the methods I often use in my maps is to create subtle blurred halos behind text or icons that might otherwise get lost on a busy background.  I don’t like it when the viewer notices the halos (maybe that’s from teaching ArcMap far too many years at universities).  It’s not quite a pet peeve, but I think there are often better ways to handle busy backgrounds and readability.  My blog, my soapbox.  Can you spot them?  There are a couple in the map and in the final slide of the pitch video.  It doesn’t look like much, but I promise the text is easier to read.

The texture on the continents is the moon.  I clipped a photo of the moon using the continent outlines.  I liked the idea of trees on the moon.

Icons

The icons are special to me.  I’ve been really wanting to make a map using images from Phylopic and I thought this was the perfect opportunity… but… but… no one had uploaded outlines for any of the species I needed.  So I made them and uploaded them.  So if you want an .svg of these, help yourself.  If, however, you need dinosaurs, they’ve got you covered.

Watch it happen:

My pitch video captures the process from start to finish:

Want more open source cartography?

Come to FOSS4G North America and see my talk and several others focused on cartography.  I’ll cover Inkscape methods and tools commonly used in cartography.


Accidental Ferns

I have this moss terrarium that is not doing that well.  It’s never done well.  I guess moss doesn’t really want to live in a jar.  Probably more than a year ago, I dropped a maidenhair fern frond in there that was loaded with spores.  I thought it might sprout, but after the leaf decayed, nothing happened so I forgot about it… not that I really knew what fern sprouts looked like.  Months ago, I started seeing this strange structure growing out of my dying moss clumps.  It kind of looks like tiny kelp.  I thought maybe it was a liverwort or some moss structure that grows from moss in some last attempt to live.  My “moss kelp” eventually grew some branches, so I thought it was making spores.


From left to right: some scraggly moss, young maidenhair ferns, and the fern prothallia

Fast forward to today.  My maidenhair fern in my office (a division of the one mentioned earlier) is dropping spores all over the window sill, so I did some internet research on how to grow ferns from spores.  That’s when I discovered I’ve actually already done it.  The “moss kelp” is the prothallium, or gametophyte, of the fern (the structure where fertilization happens).  The fern that we recognize (the sporophyte) grows from the prothallium.  Now I wonder if I can do it again on purpose.

A note on the photograph: photographing prothallia is really difficult.  I was frustrated at the lack of good photos online, but now I understand, so please excuse my lack of detail in my photo.  I may try to get some better macro photos later.


Spatially Enabled Zotero Database

As a geographer, I’m a visual person.  I like to see distributions on a map, and where things are matters to me.  A few years ago, while I was writing a paper, I became overwhelmed trying to remember the locations of the studies I had read (for coastal plants, latitude matters), so I started marking the locations of studies on a map and eventually turned it into a printed map.


But adding new studies and sharing the results is cumbersome, and the spatial data are largely separate from the citation information.  So I set out to find a way to store spatial information in my citation database and access the spatial information for mapping purposes.  The end result (which is still a work in progress at press time) is a web map of coastal vegetation literature that updates when new citations are added to my Zotero database online.


How I Did It:

Key ingredients: Zotero, QGIS, Spatialite, Zotero Online Account

I started with the Zotero database I had already populated with literature relevant to my research on coastal vegetation.  I moved the citations I wanted to map into a separate folder (a Zotero collection) just to make the API queries easier later.  Using QGIS, I created a point shapefile with one point marking the location of each study.  I gave the attribute table fields for the in-text citation and a text description of the location for human readability, but the most important field is the ZoteroKey.  This is the item key that uniquely identifies each record in the Zotero database.  To find the key for each citation in your local version of Zotero, right-click on the record and pick “generate report”.  The key is the text after the underscore in the URL for the report.  In the online version, click the citation in your list; the key is at the end of the URL of the page that opens.
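With a lot of citations, pulling the key out of each pasted URL by hand gets tedious. The function below is a minimal sketch under stated assumptions: that an item key is 8 uppercase letters or digits, that in a report URL it directly follows an underscore, and that in a web-library URL it is the last path segment. The function name and the example URLs are hypothetical stand-ins, not copied from Zotero, so check the pattern against the URLs you actually see.

```r
# Hypothetical helper: extract a Zotero item key from a single URL.
# Assumes keys are 8 characters of A-Z / 0-9 (check against your own URLs).
zotero_key_from_url <- function(url) {
  # Case 1: key right after an underscore, as in a report URL
  hit <- regmatches(url, regexpr("(?<=_)[A-Z0-9]{8}(?![A-Za-z0-9])", url, perl = TRUE))
  if (length(hit) == 1) return(hit)

  # Case 2: key at the end of the URL, as in the online library;
  # drop any query string or trailing slash, then keep the last segment
  url <- sub("[?#].*$", "", url)
  url <- sub("/+$", "", url)
  sub("^.*/", "", url)
}

# Made-up URLs for illustration only
zotero_key_from_url("zotero://report/items/0_ABCD2345/html/report.html")
zotero_key_from_url("https://www.zotero.org/someuser/items/ABCD2345")
```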


My map only has point geometries right now, but that will change in the coming weeks.

Next, I added the spatial information to the Zotero database using Spatialite (the specific queries are on GitHub).  The Zotero schema is quite large but not impossible to navigate.  Currently, there is no option to add your own fields to Zotero (I tried… I failed… they tell me the option is coming soon), so I put my geometries into the “Extra” field.  Using Spatialite, I opened the Zotero database and imported my shapefile of citation locations (having new tables doesn’t break the database, thank goodness).  Then I removed any existing information in the “Extra” field and filled it in with geometry information in the style of GeoJSON.  The string looks like this:

{"type": "Point", "coordinates": [-123.069403678033, 38.3159528822055]}
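Typing that string by hand for every study is error-prone, so here is a minimal sketch of generating it in R from longitude and latitude columns (for example, the shapefile’s attribute table read into a data frame). The function name point_geojson and its digits argument are my own hypothetical choices, not part of Zotero or Spatialite. Note that GeoJSON puts longitude first, then latitude.

```r
# Hypothetical helper: format coordinates as a GeoJSON-style Point string
# for the Zotero "Extra" field. Works on vectors, one string per study.
point_geojson <- function(lon, lat, digits = 12) {
  # formatC with format = "f" keeps a fixed number of decimal places
  lon_txt <- formatC(lon, digits = digits, format = "f")
  lat_txt <- formatC(lat, digits = digits, format = "f")
  # GeoJSON order is [longitude, latitude]
  sprintf('{"type": "Point", "coordinates": [%s, %s]}', lon_txt, lat_txt)
}

point_geojson(-123.5, 38.25, digits = 2)
```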

After updating the citation records to house the geometries, I synced the changes to my online Zotero repository from my desktop program.  Now it’s ready to go into a web map using the Zotero API.  My webmap code can be found in my GitHub Repository.
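On the web map side, the two pieces are building the API request for the mapped collection and reading the geometry back out of each item’s Extra field. Below is a minimal base-R sketch of both; the function names, user ID, and collection key are hypothetical placeholders, it only handles the Point strings shown above, and a real client would use a proper JSON parser (such as the jsonlite package) rather than a regular expression.

```r
# Build a Zotero Web API (v3) request for the items in one collection.
# user_id and collection_key are placeholders; the API returns at most
# 100 items per request, so larger collections need paging via "start".
zotero_items_url <- function(user_id, collection_key, limit = 100, start = 0) {
  sprintf("https://api.zotero.org/users/%s/collections/%s/items?format=json&limit=%d&start=%d",
          user_id, collection_key, as.integer(limit), as.integer(start))
}

# Pull [longitude, latitude] out of a GeoJSON-style Point string
extra_to_lonlat <- function(extra) {
  if (!grepl('"coordinates"', extra, fixed = TRUE)) return(c(NA_real_, NA_real_))
  # Keep only the text between the square brackets
  inside <- sub('.*"coordinates"\\s*:\\s*\\[([^\\]]*)\\].*', "\\1", extra, perl = TRUE)
  as.numeric(strsplit(inside, ",", fixed = TRUE)[[1]])
}

zotero_items_url("123456", "ABCD2345")
extra_to_lonlat('{"type": "Point", "coordinates": [-123.069403678033, 38.3159528822055]}')
```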

What’s Next?

I would like to develop a plug-in for QGIS that makes adding the geometries to the Zotero database easier, because not everyone wants to run SQL queries on an active citation database that has been years in the making (I backed mine up first!).  The interface would show the citations you want to map; you would pick a citation and then click the location in your QGIS project where it should be placed.  The plug-in would insert the corresponding geometry for you.


Batch Editing Text in Inkscape

(Sarcasm!) Thanks, R & Inkscape!  I totally wanted to mark my outliers with the letter q!


Have you ever opened a PDF of a graph made in R in Inkscape?  For some reason, it appears that my graphs are made with characters from the Dingbats font, which I guess Inkscape doesn’t like, so when I open the file in Inkscape, it changes all of my nice circles to the letter q.  That’s awesome.  How do you fix that?  Inkscape doesn’t let you batch change the text inside a textbox, as far as I can tell.  So, here’s one way to fix it:

  1. Open the PDF file in Inkscape and save it as an SVG file.  Yes, you’ve now got qs instead of os.  It will be ok.
  2. Close the file.
  3. Open the SVG in a text editor (I like Notepad++).
  4. Do a Find & Replace, finding “q” and replacing with “o”.  I recommend reviewing each instance it finds rather than using “replace all”, because one of those qs might be something else.  Here’s the general kind of text you’re looking for: id="tspan4049">q</tspan></text>  See that q?  Change it to an o.
  5. Save the file and close it.
  6. Open it up in Inkscape and see what you’ve got.
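With dozens of points, the same targeted replacement can be scripted. This R sketch assumes, as in the step 4 example, that each plotted circle shows up in the SVG as a lone q right before </tspan>; it replaces only that exact pattern, so other q’s in the file (in axis labels, for instance) are left alone. The function name is hypothetical, and it’s still worth opening the result in Inkscape to check.

```r
# Hypothetical helper: swap lone "q" glyphs for "o" in an SVG saved from Inkscape.
# Writes to a separate output file so the original stays untouched.
fix_dingbat_q <- function(svg_in, svg_out) {
  svg <- readLines(svg_in, warn = FALSE)
  # Only the exact pattern >q</tspan> is touched (fixed string, not a regex)
  fixed <- gsub(">q</tspan>", ">o</tspan>", svg, fixed = TRUE)
  message(sum(svg != fixed), " line(s) changed")
  writeLines(fixed, svg_out)
  invisible(fixed)
}

# Example: fix_dingbat_q("boxplot.svg", "boxplot_fixed.svg")
```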

If anyone has a more elegant way to do this, let me know!  I’m sure there’s a way to change the R output from the start so you don’t have this problem, but this solution was quicker than messing with R for now.
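On the R side, the pdf() device has a useDingbats argument that controls whether small circles are drawn with the Dingbats font. Setting it to FALSE should make R draw the circles as ordinary vector shapes, which Inkscape can import without the q substitution. A minimal sketch using the built-in ToothGrowth data (the output file name is arbitrary):

```r
# Write a plot to PDF without using the Dingbats font for circles
out_file <- file.path(tempdir(), "toothgrowth_boxplot.pdf")
pdf(out_file, width = 6, height = 4, useDingbats = FALSE)
# Any base plot works; a boxplot draws its outliers as open circles
boxplot(len ~ supp, data = ToothGrowth)
invisible(dev.off())
```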

All fixed!



Marxan Table Relationships

[Diagram: Marxan table relationships]

Marxan is confusing.  There are lots of pages of documentation and tutorials, but as a visual learner, all that text makes my head spin.  I find myself drawing pictures once I understand what’s going on so I can refer to them later.  The diagram above is a cleaned up, attractive, and cheerful rendition of the diagram I drew for myself on the whiteboard in my office (thank you, Inkscape, for having way more colors than I have whiteboard markers).  The idea is taken from database diagrams.  The bold text is the table name, and I put the recommended file name under it for easy reference.  The list beneath that gives the column names for each file.  The lines connect columns with matching data (primary keys and whatnot).  So there you have it.  Maybe later I’ll post a new version with more notes about each file.  Let me know if that would be helpful.
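The same relationships can be expressed as joins. Below is a minimal R sketch with tiny, made-up tables; the column names (id, cost, status in the planning unit file; id, prop, spf, name in the species file; species, pu, amount in the planning unit versus species file) follow my reading of the Marxan manual, so check them against the version you are using. The boundary file (id1, id2, boundary) links to the planning unit ids the same way, twice.

```r
# Made-up toy versions of three Marxan input tables
pu     <- data.frame(id = c(1, 2, 3), cost = c(10, 20, 15), status = c(0, 0, 2))
spec   <- data.frame(id = c(1, 2), prop = c(0.3, 0.5), spf = c(10, 10),
                     name = c("dune_mat", "salt_marsh"))
puvspr <- data.frame(species = c(1, 1, 2), pu = c(1, 2, 3), amount = c(5, 2, 8))

# puvspr$species matches spec$id; puvspr$pu matches pu$id
joined <- merge(puvspr, spec, by.x = "species", by.y = "id")
joined <- merge(joined, pu, by.x = "pu", by.y = "id")

# Total amount of each conservation feature across planning units
totals <- aggregate(amount ~ name, data = joined, FUN = sum)
print(totals)
```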


Call for Guest Bloggers

Home-made kite tail for kite aerial photography


One of the things I find fascinating about biological and geographical research is the tools that scientists make for themselves to help them in their research.  For example, I designed a set of quadrat frames, and my dad constructed them for me.  I’ve also constructed two low-altitude remote sensing platforms and the accessories that go with them, like a fuzzy tail and a camera housing for my air photo kite.

Now it’s your turn!  What tools have you made or re-purposed to help you collect or process data?  Do you have a creative use for straws?  Have you constructed a tool you couldn’t buy anywhere?  I’m looking for researchers of all kinds who have made their own tools to write a blog post about their creation.  The text can be as short as a paragraph or as long as a page and should describe what the tool is and how you use it.  It can also include instructions for how to make it, if you would like.  All submissions should include at least one photograph of the tool.  Submissions should be emailed to micheletobias [at] yahoo [dot] com.