6  The file system

Important

You are, I’m sorry to inform you, reading the work-in-progress revision of “Learning Statistics with R”. This chapter is currently a mess, and I don’t recommend reading it.

In this section I talk a little about how R interacts with the file system on your computer. It’s not a terribly interesting topic, but it’s useful. As background to this discussion, I’ll talk a bit about how file system locations work in Section @ref(filesystem). Once upon a time everyone who used computers could safely be assumed to understand how the file system worked, because it was impossible to successfully use a computer if you didn’t! However, modern operating systems are much more “user friendly”, and as a consequence of this they go to great lengths to hide the file system from users. So these days it’s not at all uncommon for people to have used computers most of their life and not be familiar with the way that computers organise files. If you already know this stuff, skip straight to Section @ref(navigationR). Otherwise, read on. I’ll try to give a brief introduction that will be useful for those of you who have never been forced to learn how to navigate around a computer using a DOS or UNIX shell.

6.1 The file system itself

In this section I describe the basic idea behind file locations and file paths. Regardless of whether you’re using Window, Mac OS or Linux, every file on the computer is assigned a (fairly) human readable address, and every address has the same basic structure: it describes a path that starts from a root location , through as series of folders (or if you’re an old-school computer user, directories), and finally ends up at the file.

On a Windows computer the root is the physical drive on which the file is stored, and for most home computers the name of the hard drive that stores all your files is C: and therefore most file names on Windows begin with C:. After that comes the folders, and on Windows the folder names are separated by a \ symbol. So, the complete path to this book on my Windows computer might be something like this:

C:\Users\danRbook\LSR.pdf

and what that means is that the book is called LSR.pdf, and it’s in a folder called book which itself is in a folder called dan which itself is … well, you get the idea. On Linux, Unix and Mac OS systems, the addresses look a little different, but they’re more or less identical in spirit. Instead of using the backslash, folders are separated using a forward slash, and unlike Windows, they don’t treat the physical drive as being the root of the file system. So, the path to this book on my Mac might be something like this:

/Users/dan/Rbook/LSR.pdf

So that’s what we mean by the “path” to a file. The next concept to grasp is the idea of a working directory and how to change it. For those of you who have used command line interfaces previously, this should be obvious already. But if not, here’s what I mean. The working directory is just “whatever folder I’m currently looking at”. Suppose that I’m currently looking for files in Explorer (if you’re using Windows) or using Finder (on a Mac). The folder I currently have open is my user directory (i.e., C:\Users\dan or /Users/dan). That’s my current working directory.

The fact that we can imagine that the program is “in” a particular directory means that we can talk about moving from our current location to a new one. What that means is that we might want to specify a new location in relation to our current location. To do so, we need to introduce two new conventions. Regardless of what operating system you’re using, we use . to refer to the current working directory, and .. to refer to the directory above it. This allows us to specify a path to a new location in relation to our current location, as the following examples illustrate. Let’s assume that I’m using my Windows computer, and my working directory is C:\Users\danRbook). The table below shows several addresses in relation to my current one:

Basic arithmetic operations in R. These five operators are used very frequently throughout the text, so it’s important to be familiar with them at the outset.
absolute path (i.e., from root) relative path (i.e. from C:)
C:\Users\dan ..
C:\Users ..\.. \
C:\Users\danRbook\source .\source
C:\Users\dan\nerdstuff ..\nerdstuff

There’s one last thing I want to call attention to: the ~ directory. I normally wouldn’t bother, but R makes reference to this concept sometimes. It’s quite common on computers that have multiple users to define ~ to be the user’s home directory. On my Mac, for instance, the home directory ~ for the “dan” user is \Users\dan\. And so, not surprisingly, it is possible to define other directories in terms of their relationship to the home directory. For example, an alternative way to describe the location of the LSR.pdf file on my Mac would be

~Rbook\LSR.pdf

That’s about all you really need to know about file paths. And since this section already feels too long, it’s time to look at how to navigate the file system in R.

6.3 Why do the Windows paths use the wrong slash?

Let’s suppose I’m on Windows. As before, I can find out what my current working directory is like this:

getwd()
## [1] "C:/Users/dan/

This seems about right, but you might be wondering why R is displaying a Windows path using the wrong type of slash. The answer is slightly complicated, and has to do with the fact that R treats the \ character as “special” (see Section @ref(escapechars)). If you’re deeply wedded to the idea of specifying a path using the Windows style slashes, then what you need to do is to type / whenever you mean \. In other words, if you want to specify the working directory on a Windows computer, you need to use one of the following commands:

setwd( "C:/Users/dan" )
setwd( "C:\\Users\\dan" )

It’s kind of annoying to have to do it this way, but as you’ll see later on in Section @ref(escapechars) it’s a necessary evil. Fortunately, as we’ll see in the next section, RStudio provides a much simpler way of changing directories…

6.5 Summary


  1. Well, the partition, technically.↩︎

  2. One additional thing worth calling your attention to is the file.choose() function. Suppose you want to load a file and you don’t quite remember where it is, but would like to browse for it. Typing file.choose() at the command line will open a window in which you can browse to find the file; when you click on the file you want, R will print out the full path to that file. This is kind of handy.↩︎