Tag Archives: r statistics

Dealing with Factors in R

What is the deal with the data type “Factor” in R?  It has a purpose and I know that a number of packages use this format, however, I often find that (1) my data somehow ends up in the format and (2) it’s not what I want.

My goal for this post: to write down what I’ve learned (this time, again!) before I forget and have to learn it all over again next time (just like all the other times).  If you found this, I hope it’s helpful and that you came here before you started tearing your hair out, yelling at the computer, or banging your head on the desk.

So here we go.  Add your ways to deal with factors in the comments and I’ll update the page as needed.

Avoid Creating Factors

Number 1 best way to deal with factors (when you don’t need them) is to not create them in the first place!  When you import a csv or other similar data, use the option stringsAsFactors = FALSE (or similar… read the docs for find the options for the command you’re using) to make sure your string data isn’t converted automatically to a factor.  R will sometimes also convert what seems to clearly be numerical data to a factor as well, so even if you only have numbers, you may still need this option.

MyData<-read.csv(file="SomeData.csv", header=TRUE, stringsAsFactors = FALSE)

Convert Data

Ok, but what if creating a factor is unavoidable?  You can convert it.  It’s not intuitive so I keep forgetting.  Wrap your factor in an as.character() to just get the data.  It’s now in string format, so if you need numbers, wrap all of that in as.numeric().

#Convert from a factor to a list
CharacterData<-as.character(MyFactor)

#Convert from a factor to numerical data
NumericalData<-as.numeric(as.character(MyFactor))

 

What’s Missing?

Do you have any other tricks to working with data that ends up as a Factor?  Let me know in the comments!