Dealing with Factors in R

What is the deal with the data type “Factor” in R?  It has a purpose, and I know that a number of packages use this format.  However, I often find that (1) my data somehow ends up in that format and (2) it’s not what I want.

My goal for this post: to write down what I’ve learned (this time, again!) before I forget and have to learn it all over again next time (just like all the other times).  If you found this, I hope it’s helpful and that you came here before you started tearing your hair out, yelling at the computer, or banging your head on the desk.

So here we go.  Add your ways to deal with factors in the comments and I’ll update the page as needed.

Avoid Creating Factors

The number 1 best way to deal with factors (when you don’t need them) is to not create them in the first place!  When you import a CSV or other similar data, use the option stringsAsFactors = FALSE (or similar… read the docs to find the options for the command you’re using) to make sure your string data isn’t converted automatically to a factor.  (Since R 4.0.0 this is the default for read.csv, but older versions default to TRUE.)  R will sometimes also convert what clearly seems to be numerical data to a factor, typically because a stray non-numeric entry makes R read the whole column as text, so even if you think you only have numbers, you may still need this option.

MyData<-read.csv(file="SomeData.csv", header=TRUE, stringsAsFactors = FALSE)
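To see how a column of numbers can still end up as a factor, here is a small sketch.  The inline data, the column names, and the stray “n/a” entry are all made up for illustration, and the text argument of read.csv stands in for a real file.

```r
# Hypothetical data: one stray "n/a" in an otherwise numeric column
csv_text <- "id,value\n1,3.5\n2,n/a\n3,4.2"

# Forcing the old default: the stray text turns the whole column into a factor
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_factor$value)   # "factor"

# With stringsAsFactors = FALSE the column stays as plain character strings
as_text <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_text$value)     # "character"

# Declaring "n/a" as a missing-value marker lets R read the column as numeric
as_number <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                      na.strings = "n/a")
class(as_number$value)   # "numeric"
```

If your file marks missing values some other way, put that marker in na.strings instead.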

Convert Data

OK, but what if creating a factor is unavoidable?  You can convert it.  It’s not intuitive, so I keep forgetting.  Wrap your factor in as.character() to get the data back as strings.  If you need numbers, wrap all of that in as.numeric().  Don’t call as.numeric() directly on the factor: that returns the internal integer level codes, not your original values.

#Convert from a factor to a character vector

#Convert from a factor to numerical data
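Here is a minimal sketch of both conversions.  MyFactor and the other variable names are hypothetical, chosen just for this example.

```r
# Hypothetical factor whose labels happen to be numbers
MyFactor <- factor(c("10", "20", "10", "35"))

# Convert from a factor to a character vector
MyStrings <- as.character(MyFactor)
MyStrings   # "10" "20" "10" "35"

# Convert from a factor to numerical data
MyNumbers <- as.numeric(as.character(MyFactor))
MyNumbers   # 10 20 10 35

# Calling as.numeric() directly returns the internal level codes instead
MyCodes <- as.numeric(MyFactor)
MyCodes     # 1 2 1 3
```

The last line is the trap: the codes look like real numbers, so nothing warns you that they aren’t your data.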


What’s Missing?

Do you have any other tricks for working with data that ends up as a Factor?  Let me know in the comments!


Improving R Figures with Inkscape

Here's an example of a figure made of three separate graphs, combined and cleaned up in Inkscape.

If you’re going to publish a graph or a model, you want something that looks clean and professional.  But R output, like graphs or various plots, often doesn’t come out the way you want.  Even if you’re a whiz at manipulating the plot command, you can probably still see room for improvement.  How many times have you spent 30 minutes or more trying to figure out what parameter moves your label over 3 pixels, all the while dreaming that you could just reach into the computer and move the little bugger yourself?  I have the solution!  Here’s how you do it:

  1. Write your script in R to generate whatever plot or image you need.  Make sure all the labels are in there that you need R to generate.  Today I’m making ordination plots with the vegan package.

    Here's an example of a typical output from R. It's generally ok, but wouldn't you want to move some stuff around?

  2. Save the plot as a PDF file.  To save it, activate the plot window -> File -> Save As -> PDF  (I mean it.  Not an image file, but a PDF.  And I know it looks ugly, but we’ll fix it later.)  NOTE: Make sure you don’t resize the plot window before you save the file.  In the past I’ve had some funny things happen (like letters rotating 180°) when I resized it before saving.  We’ll do any resizing later.
  3. Open up your favorite vector illustration software.  I like Inkscape because it works and it’s free/open source.  I highly recommend it.
  4. In Inkscape, File -> Open -> Find your PDF -> click OK -> pick the settings you like in the dialog that pops up -> click OK.
  5. Save it as an SVG file.  You’ll be saving now in Inkscape’s native format and have the original PDF in case you need to go back to the start.
  6. Now is the fun part… you get to make the image look better.  There are some tricks…
    1. If you can’t select something, like a text box, it’s probably a part of a group.  The bottom of the screen will tell you if you’ve selected a group.  To fix this, just click the Ungroup button.  For example, everything that falls inside the plot area gets grouped – so all the points, lines, and even a white box that you probably don’t even notice are in a group.
    2. Layers are your friend.  Put each type of item in its own layer.  For example, I like to put my axes in one layer, my graphed points in another, all the labels in another, etc.
    3. Ctrl + Alt + V (paste in place) is your other friend.  Use it to move items from one layer into another without messing up the alignment.
    4. Did the import turn all of your points into something strange like q’s?  Mine did.  Use the “Replace Text” tool in the Extensions menu under Text.  If you change all your “q” points into “o”, you can then convert the o’s to paths and fill them with black so they look like points.
  7. Finally, export your image or save it as a new PDF.
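If you’d rather not save through the menu, steps 1 and 2 above can also be scripted with R’s built-in pdf() device.  This is only a sketch and not the menu route described above: the file name, the 7 by 7 inch size, and the stand-in plot of the built-in mtcars data are all assumptions, so put your own plotting code (for example your vegan ordination plot) in its place.

```r
# Hypothetical output path; swap in your own folder and file name
pdf_path <- file.path(tempdir(), "my_plot.pdf")

# Open a PDF graphics device with a fixed size in inches
pdf(pdf_path, width = 7, height = 7)

# Stand-in plot using a built-in data set; put your own plotting code here
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight", ylab = "Miles per gallon",
     main = "Example plot")

# Close the device so the file is actually written to disk
dev.off()
```

Because the size is fixed when the device is opened, there is no plot window to resize before saving.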

Open Source Tools for Art + Science

I was thinking about how many software tools I use that are somewhat off the beaten trail and thought I would make a list of the tools I use often.  I thought it might be a good reference so in case I forget to link something later, the information is here.

I like using open source tools.  Not only are they free, but I find in many cases that the development and bug fixes go much more quickly in open source projects than in their proprietary cousins.  I’m not a programmer (well, I program in R, but that’s not the same kind of programming as big applications), so I appreciate that others are and that they put their time into these tools and that they do it free of cost.  The following tools come highly recommended by me for use in graphic art, photography, geospatial science, and their intersection – cartography.

  • Quantum GIS – a powerful geographic information system program for spatial analysis and data visualization
  • Inkscape – vector illustration software similar to Adobe Illustrator (so I’ve heard… never worked with Illustrator myself).
  • GIMP – a raster editor similar to Adobe Photoshop (again, I haven’t worked with Photoshop)
  • XnView – a photo organizer and editor, good for quick fixes and batch processing
  • Hugin – a photo panorama stitching software that I use to stitch my air photos into one scene; works much better on standard panoramas than what I try to get it to do
  • R – Stats software; you can get graphical interfaces for it, but writing scripts isn’t too complicated (and this is coming from a person with a loathing for command line).