Category Archives: Research

Dealing with Factors in R

What is the deal with the data type “Factor” in R?  It has a purpose, and I know that a number of packages use this format.  However, I often find that (1) my data somehow ends up in this format and (2) it’s not what I want.

My goal for this post: to write down what I’ve learned (this time, again!) before I forget and have to learn it all over again next time (just like all the other times).  If you found this, I hope it’s helpful and that you came here before you started tearing your hair out, yelling at the computer, or banging your head on the desk.

So here we go.  Add your ways to deal with factors in the comments and I’ll update the page as needed.

Avoid Creating Factors

The number one way to deal with factors (when you don’t need them) is to not create them in the first place!  When you import a CSV or similar data, use the option stringsAsFactors = FALSE (or similar… read the docs to find the options for the command you’re using) to make sure your string data isn’t converted automatically to a factor.  (Since R 4.0.0 this is the default for read.csv(), but older versions default to TRUE.)  R will sometimes also convert what seems to clearly be numerical data to a factor, typically when a column contains a stray non-numeric entry, so even if you only have numbers, you may still need this option.

#Import a CSV file without converting strings to factors
MyData <- read.csv(file = "SomeData.csv", header = TRUE, stringsAsFactors = FALSE)
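
To see why a column of numbers can still end up as a factor, here is a minimal sketch that reads made-up inline text instead of a file.  The column names, the values, and the "n/a" marker are assumptions for illustration only, and stringsAsFactors = TRUE is passed explicitly to reproduce the old default behavior.

```r
# Made-up CSV content; the "n/a" entry keeps Height from being read as numeric.
csv_text <- "Site,Height\nA,1.5\nB,n/a\nC,2.25"

# With stringsAsFactors = TRUE (the default before R 4.0.0), Height becomes a factor.
AsFactors <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
print(class(AsFactors$Height))

# With stringsAsFactors = FALSE, Height stays a character vector.
AsStrings <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
print(class(AsStrings$Height))

# Declaring the missing-value marker lets R read the column as numeric.
AsNumbers <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE,
                      na.strings = "n/a")
print(class(AsNumbers$Height))
```

If your file uses a different marker for missing values, change na.strings to match it.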

Convert Data

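OK, but what if creating a factor is unavoidable?  You can convert it.  It’s not intuitive so I keep forgetting.  Wrap your factor in as.character() to just get the data.  It’s now in string format, so if you need numbers, wrap all of that in as.numeric().  Don’t skip the as.character() step: calling as.numeric() directly on a factor returns the internal level codes, not the original values.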

#Convert from a factor to a character vector
CharacterData <- as.character(MyFactor)

#Convert from a factor to numerical data
NumericalData <- as.numeric(as.character(MyFactor))
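
Here is a small sketch of what goes wrong if you call as.numeric() on the factor directly.  The factor is made up for illustration, and Codes and NumericalData2 are names introduced only for this example.

```r
# Made-up factor whose labels look like numbers.
MyFactor <- factor(c("10", "20", "10", "5"))

# as.numeric() alone returns the internal level codes, not the labels.
Codes <- as.numeric(MyFactor)

# Going through as.character() first recovers the values themselves.
NumericalData <- as.numeric(as.character(MyFactor))

# Alternative described in ?factor: convert the levels once, then index by the factor.
NumericalData2 <- as.numeric(levels(MyFactor))[MyFactor]

print(Codes)
print(NumericalData)
```

The second form converts each distinct level only once, which can matter for very long factors with few levels.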

What’s Missing?

Do you have any other tricks to working with data that ends up as a Factor?  Let me know in the comments!


Spatially Enabled Zotero Database

As a geographer, I’m a visual person.  I like to see distributions on a map and where things are matters to me.  A few years ago, while I was writing a paper I became overwhelmed with trying to remember the locations for the studies I had read (for coastal plants, latitude matters), so I started marking the locations of studies on a map and eventually turned it into a printed map.

But adding new studies and sharing the results is cumbersome, and the spatial data is largely separate from the citation information.  So I set out to find a way to store spatial information in my citation database and access the spatial information for mapping purposes.  The end result (which is still a work in progress at press time) is a web map of coastal vegetation literature that updates when new citations are added to my Zotero database online.

How I Did It:

Key ingredients: Zotero, QGIS, Spatialite, Zotero Online Account

I started working with the Zotero database I already have populated with literature relevant to my research on coastal vegetation.  I moved citations that I wanted to map into a separate folder just to make the API queries easier later.  I made a point in a shapefile for the location of each study using QGIS.  I gave the attribute table fields for the in-text citation and a text description of the location for human-readability, but the most important field is the ZoteroKey.  This is the item key that uniquely identifies each record in the Zotero database.  To find the key for each citation, in your local version of Zotero, right click on the record and pick “generate report”.  The text for the key is after the underscore in the URL for the report.  In the online version, click the citation in your list.  The key is at the end of the URL in the page that opens.

My map only has point geometries right now, but that will change in the coming weeks.

Next, I used Spatialite to add the spatial information to the Zotero database (the specific queries can be found on GitHub).  The Zotero schema is quite large but not impossible to navigate.  Currently, there is no option to add your own fields to Zotero (I tried… I failed… they tell me the option is coming soon) so I put my geometries into the “Extra” field.  Using Spatialite, I opened the Zotero database and imported my shapefile of citation locations (having new tables doesn’t break the database, thank goodness).  Then I removed any existing information in the “Extra” field and filled it in with geometry information in the style of GeoJSON.  The string looks like this:

{"type": "Point", "coordinates": [-123.069403678033, 38.3159528822055]}
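
If you want to build that kind of string outside of SQL, the following is a rough sketch in R, not the Spatialite query used for this project.  MakePointString is a hypothetical helper name.  Note that longitude comes first and latitude second, as in the string above.

```r
# Build a GeoJSON-style Point string from a longitude/latitude pair.
# Longitude is listed first, then latitude.
MakePointString <- function(lon, lat) {
  sprintf('{"type": "Point", "coordinates": [%s, %s]}',
          format(lon, digits = 15), format(lat, digits = 15))
}

# Coordinates taken from the example string in this post.
ExtraText <- MakePointString(-123.069403678033, 38.3159528822055)
print(ExtraText)
```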

After updating the citation records to house the geometries, I synced the changes to my online Zotero repository from my desktop program.  Now it’s ready to go into a web map using the Zotero API.  My webmap code can be found in my GitHub Repository.
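
For reference, the items in a collection can be requested from the Zotero Web API with a URL along these lines.  This is only a sketch, not the web map code in the repository: the user ID and collection key are placeholders, both function names are hypothetical, and a private library would also need an API key.

```r
# Build the Zotero Web API (version 3) URL that lists the items in one collection.
# The user ID and collection key passed in below are placeholders, not real identifiers.
ZoteroItemsUrl <- function(user_id, collection_key) {
  sprintf("https://api.zotero.org/users/%s/collections/%s/items?format=json&v=3",
          user_id, collection_key)
}

# Pull the {...} block out of an item's Extra text, or return NA if there is none.
ExtractGeometry <- function(extra) {
  found <- regmatches(extra, regexpr("\\{.*\\}", extra))
  if (length(found) == 0) NA_character_ else found
}

print(ZoteroItemsUrl("123456", "ABCD2345"))
print(ExtractGeometry('{"type": "Point", "coordinates": [-123.07, 38.32]}'))
```

The geometry string would still need a JSON parser (for example, in the web map’s JavaScript) before it can be drawn.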

What’s Next?

I would like to develop a plug-in for QGIS that makes adding the geometries to the Zotero database easier because not everyone wants to run SQL queries on their active citation database that has been years in the making (I backed mine up first!).  The interface would show the citations you want to map, then users would pick a citation, then click the location in their QGIS project where the citation should be placed.  The plug-in would insert the corresponding geometry for them.


Getting Started with LaTeX

I’ve been thinking for a while that I would like to learn how to use LaTeX.  Aside from being something that geeky types seem to love, it makes documents that look beautiful.  It actually looks easier than getting Word or LibreOffice to behave in predictable ways beyond simple text.  So why am I finally learning how to use this?  I want to submit an article to a journal and they require all submissions be in LaTeX format.  (As an aside, why did they have to make the capitalization of LaTeX so odd?  It’s hard to type!)  I thought I would post some notes on tools I found useful for learning.

Install

You need both a LaTeX engine and an editor.  I installed MiKTeX as the engine and Texmaker for the editor.

Tutorials

Michelle Krummel has a multi-part video tutorial on YouTube that moves at a good pace (not too fast or slow).  She teaches all the basics you need to understand how to set up a document and how formatting works.  Even though it’s specifically geared towards mathematics, the concepts all apply to what you would need for other sciences as well.

Cheat Sheets

Winston Chang wrote an excellent cheat sheet to remind you of the basic formatting you’ll need.


When ImpactStory Won’t Update

I just started using ImpactStory to track my impact in the world of Academia.  The site is beautifully simple.  And when it works, it’s fantastic.  Occasionally though (I’ve been doing this less than 24 hours, but still) something goes wrong and finding help is almost impossible, given the simple nature of the site (just add a help page with commonly asked help questions, please!).

For example, what do you do when you upload something to a linked account (FigShare, SlideShare, etc.) but ImpactStory doesn’t update?  Now is when you make use of that “Import individual products” link.  Pay careful attention.  On the right side, the graphic tells you what you can put in the box and it’s not the same for all the sites.

Some platforms work with URLs (the link you use to get to the page) for the item.  That’s easy.

But some – CrossRef, Dryad, FigShare – work with the DOI assigned to the item.  PubMed uses an article ID called a PMID.  For the platforms that use DOIs or PMIDs, leave out the URL and just input the DOI or PMID.  They can be found on the page that contains details for the item you want to add to ImpactStory.  If you try to give it the whole URL, it doesn’t work the way you want it to… trust me.
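
If all you have is the full link, the bare DOI is just the part after the resolver address.  Here is a minimal sketch in R; BareDoi is a hypothetical helper name and the DOI shown is made up.

```r
# Strip a doi.org resolver prefix so only the bare DOI remains.
BareDoi <- function(x) {
  sub("^https?://(dx\\.)?doi\\.org/", "", x)
}

# Made-up DOI for illustration.
print(BareDoi("https://doi.org/10.1234/example.5678"))
```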


Batch Editing Text in Inkscape

(Sarcasm!) Thanks, R & Inkscape!  I totally wanted to mark my outliers with the letter q!

Have you ever used Inkscape to open a PDF of a graph made in R?  For some reason, it appears that my graphs are made with characters from the Dingbats font, which I guess Inkscape doesn’t like, so when I open the file in Inkscape, it changes all of my nice circles to the letter q.  That’s awesome.  How do you fix that?  Inkscape doesn’t let you batch change the text inside a textbox, as far as I can tell.  So, here’s one way to fix it:

  1. Open the PDF file in Inkscape and save it as an SVG file.  Yes, you’ve now got qs instead of os.  It will be ok.
  2. Close the file.
  3. Open the SVG in a text editor (I like Notepad++).
  4. Do a Find & Replace, finding “q” and replacing with “o”.  I recommend reviewing each instance it finds rather than using “replace all”, because one of those qs might be something else.  Here’s the general kind of text you’re looking for: id="tspan4049">q</tspan></text> See that q?  Change it to an o.
  5. Save the file and close it.
  6. Open it up in Inkscape and see what you’ve got.
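
Steps 3 to 5 could also be scripted.  The following is a rough sketch in R that only touches a q sitting alone inside a tspan element, so other qs in the file are left untouched.  FixDingbats and the file names are hypothetical, and it assumes each tspan sits on a single line, as in the snippet above.  A genuine one-letter label “q” would be changed too, so check the result.

```r
# Replace a lone "q" inside <tspan> elements with "o" in the lines of an SVG file.
FixDingbats <- function(svg_lines) {
  gsub("(<tspan[^>]*>)q(</tspan>)", "\\1o\\2", svg_lines)
}

# Usage with hypothetical file names:
# writeLines(FixDingbats(readLines("MyGraph.svg")), "MyGraph_fixed.svg")

# Small check on a line shaped like the one shown in step 4.
Example <- '<text><tspan id="tspan4049">q</tspan></text>'
Fixed <- FixDingbats(Example)
print(Fixed)
```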

If anyone has a more elegant way to do this, let me know!  I’m sure there’s a way to change the R output from the start so you don’t have this problem, but this solution was quicker than messing with R for now.
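
One candidate for changing the R output from the start is the useDingbats argument of R’s pdf() device, which controls whether small circles are drawn with the Dingbats font.  This is a minimal sketch with made-up data and a temporary file name; I have not checked it against the original graph, and newer versions of R may already have this turned off by default.

```r
# Write a scatter plot to a PDF with Dingbats disabled, so point symbols are
# drawn as shapes rather than as characters from the Dingbats font.
OutFile <- file.path(tempdir(), "MyGraph.pdf")
pdf(OutFile, useDingbats = FALSE)
plot(c(1, 2, 3, 10), c(2, 4, 6, 30), pch = 1)  # made-up data with one outlier
dev.off()
```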

All fixed!


See Spot

Spotting. It’s that seemingly magical movement of the dancer’s head that is supposed to keep the dancer from getting dizzy and stabilize a turn. It also helps a dancer know how many revolutions they have finished.

How does it work? The dancer focuses their eyes on one place in the room or theater, keeping their head still as long as possible, even though their body is rotating in the turn. When the dancer can’t keep their head still any longer, they turn their head in the same direction as their body is turning but faster, and focus their eyes on that same place in the room again. The visual result is only a brief moment of spinning, rather than the sensation of spinning for the whole turn.

But is it just an illusion? Sometimes when I practice turning over and over, I start to wonder if the act of spotting is just something we do to trick ourselves. Maybe it’s just a myth.

To test if spotting is an observable phenomenon, I strapped a GoPro point of view camera to my forehead and did some turns. The footage was rather informative. First, you CAN see the effect of spotting on the footage. I suspected that you might be able to see it, but I was surprised how clear the spot was. Second, the video lets you see just how quick a turn is. A double pirouette takes about 2 seconds! In the moment, a turn feels much longer since you’re constantly making adjustments.

I used the footage to make the video you’ll find embedded here.

My conclusion is that spotting is a real, observable phenomenon. Thank goodness!


Call for Guest Bloggers

Home-made kite tail for kite aerial photography

One of the things I find fascinating about doing biological and geographical research is the tools that scientists make for themselves to help with their work.  For example, I designed and my dad constructed a set of quadrat frames for me.  I’ve also constructed two low altitude remote sensing platforms and the accessories that go with them, like a fuzzy tail and camera housing for my air photo kite.

Now it’s your turn!  What tools have you made or re-purposed to help you collect or process data?  Do you have a creative use for straws?  Have you constructed a tool you couldn’t buy anywhere?  I’m looking for researchers of all kinds who have made their own tools to write a blog post about their creation.  The text can be as short as a paragraph or as long as a page and should describe what the tool is and how you use it.  It can also include instructions for how to make it, if you would like.  All submissions should include at least one photograph of the tool.  Submissions should be emailed to micheletobias [at] yahoo [dot] com.