A Tutorial on NFS vs. Native I/O: how to find the best place to do something faster

by Paul Tatarsky

 

Let’s go over the concept of "native I/O" with a quick explanation, since we've got lots of servers out there. This is oversimplified, but hey:

It would be good for everyone to take a second and understand how to use the command "df". Type it. Read the manpage for it. Introduce yourself. Buy it a soda.

Say I want to gzip a massive file, and I want it to go as fast as possible without impacting other folks too much, or worse, having Patrick yell at me (shudder). If possible, I want to work on the file with native disk I/O, not network I/O. It’s faster. I can go home at 5:00PM if I use native I/O. I'm watching the sun come up with network I/O.

If you run df on a filename or path, it reports the filesystem that provides that file. For example, I have a giant file in my cluster homedir called MONGO_FILE:

hgwdev [~]> df MONGO_FILE
Filesystem                     1k-blocks      Used Available Use% Mounted on
eieio-10:/export/cluster/home  538437056  11836224 499249776   3% /cluster/home
^^^^^^^^

The hostname in front of the path tells me the file is NFS mounted from eieio-10, which is the gigabit interface of the machine eieio. All our fileservers are gigabit connected. Eieio is our new homedir NFS server, not kkstore anymore.

So I go over to eieio.

eieio [~]> df MONGO_FILE
Filesystem           1k-blocks      Used Available Use% Mounted on
/export/cluster/home 538437056  11837032 499248964   3% /cluster/home
^^^^^^^^^^^^^^^^^^^^

Ah, native disk. (There is NO hostname listed at the front.) The homedirs for cluster accounts live on disks directly connected to SCSI controllers on eieio. Nice fast ones.
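If you want to script that check, here is a minimal sketch. It assumes an NFS mount shows up in the first column of df as host:/path and treats anything else as native, which is a rough heuristic rather than a guarantee. The names fs_kind and path_fs_kind are made up for this example; they are not commands installed on our machines.

```shell
# Hypothetical helpers, not existing commands.

# Classify the "Filesystem" column of a df line.
# Assumption: NFS mounts look like host:/path; anything else counts as native.
fs_kind() {
    case "$1" in
        *:/*) echo "nfs" ;;
        *)    echo "native" ;;
    esac
}

# Ask df about a path and classify the answer.
# -P keeps each filesystem on a single line, even when its name is long.
path_fs_kind() {
    fs_kind "$(df -P "$1" | awk 'NR == 2 { print $1 }')"
}

fs_kind eieio-10:/export/cluster/home   # prints: nfs
fs_kind /export/cluster/home            # prints: native
```

On a real machine you would call it as path_fs_kind MONGO_FILE from the directory holding the file.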

So if I want to gzip this file in that same directory, the machine "eieio" is the fastest place to do it. From any other machine (hgwdev, kkstore, the cluster, etc.) the directory is NFS mounted, which means all reads and writes have to travel the network before reaching the actual platters. Cut out the middleman of the network and you get done faster and produce less network congestion.
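Finding the right machine to log in to can be sketched the same way: the part of the Filesystem column before the colon names the server. This assumes the host:/path layout again. The function names are hypothetical, and stripping a trailing -10 only reflects the eieio-10 / kkstore-10 naming seen in this tutorial, so treat it as a site-specific guess.

```shell
# Hypothetical helpers, not existing commands.

# Print the server part of an NFS "Filesystem" field (host:/path).
# Returns non-zero and prints nothing if the field has no host prefix.
nfs_server() {
    case "$1" in
        *:/*) echo "${1%%:*}" ;;
        *)    return 1 ;;
    esac
}

# Assumption: interface names like eieio-10 map to the machine eieio.
strip_net_suffix() {
    echo "${1%-10}"
}

server=$(nfs_server eieio-10:/export/cluster/home)
echo "$server"                # prints: eieio-10
strip_net_suffix "$server"    # prints: eieio
```

From there, logging in to that machine and running gzip in the same directory keeps the reads and writes off the network.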

OK. Now let’s say I've gzip'd MONGO_FILE and I want to copy it to /cluster/store2.

I'm still on eieio, but I want to copy it to another directory where I store my mongo files for Patrick.

eieio [~]> df /cluster/store2/mymongofiles
Filesystem                         1k-blocks      Used Available Use% Mounted on
kkstore-10:/export/cluster/store2  493915584 464489772   4336300 100% /cluster/store2

I see that from eieio, /cluster/store2 is NFS mounted from the 10-net side of kkstore, which is another gigabit card.

Is there any magical way to do all native I/O here? No. If you go over to kkstore, you can see that MONGO_FILE.gz, from kkstore's point of view, is now NFS mounted from eieio.

kkstore [~]> df MONGO_FILE.gz
Filesystem                     1k-blocks      Used Available Use% Mounted on
eieio-10:/export/cluster/home  538437056  11845636 499240360   3% /cluster/home
^^^^^^^^

So either way, your copy will travel over the network. It will be slower. Not that slow, but slower.

But say I decide I want to start putting mongo files in /cluster/store4.

eieio [~]> df /cluster/store4
Filesystem             1k-blocks      Used Available Use% Mounted on
/export/cluster/store4 538437056  83551280 427534716  17% /cluster/store4
^^^^^^^^^^^^^^^^^^^^^^

OOOOH. Native disk. I can copy from my homedir native disk to store4 native disk. I'm going home early!
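The store2 vs. store4 comparison boils down to one rule: a copy stays on native disk only if both the source and the destination are native on the machine running the copy. A rough sketch of that rule, with hypothetical function names and the same host:/path assumption about the df Filesystem column:

```shell
# Hypothetical helpers, not existing commands.

# Succeeds if a df "Filesystem" field looks like an NFS mount (host:/path).
is_nfs() {
    case "$1" in
        *:/*) return 0 ;;
        *)    return 1 ;;
    esac
}

# $1 = Filesystem field for the source, $2 = for the destination,
# both as reported by df on the machine where the copy will run.
copy_kind() {
    if is_nfs "$1" || is_nfs "$2"; then
        echo "network"
    else
        echo "native"
    fi
}

# Values as seen from eieio in the df output shown earlier:
copy_kind /export/cluster/home kkstore-10:/export/cluster/store2   # prints: network
copy_kind /export/cluster/home /export/cluster/store4              # prints: native
```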

Now, for most files, given our network infrastructure, this is all drop-in-the-bucket stuff. But if, for example, you are moving terabytes around over the network, you might want to float that fact to genecats and seek cluster-admin advice.

Now, where is perhaps the worst place to do anything involving native I/O? Hgwdev! Why? Hgwdev gets almost all of its data from NFS. Running df without any filename lists all currently mounted filesystems:

hgwdev [~]> df
Filesystem                         1k-blocks      Used Available Use% Mounted on
/dev/sda3                          549276872 224556052 296819128  44% /
/dev/sda1                             202220     17390    174390  10% /boot
none                                 1029512         0   1029512   0% /dev/shm
eieio-10:/export/cluster/home      538437056  11846184 499239808   3% /cluster/home
eieio:/export/projects/compbio     538437056 106576880 404509112  21% /projects/compbio
services:/export/cse/grads          41943040  32389984   9479872  78% /cse/grads
kks00:/export/projects/hg          115630512  54516488  55240264  50% /projects/hg
services:/export/cse/others/guests  10485760   4959264   5360384  49% /cse/guests
(and so on)
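To pick the NFS mounts out of a long listing like that one, you can filter the df output. A sketch, assuming the host:/path convention and mount points without spaces in their names; nfs_mounts is a made-up name, not an installed command:

```shell
# Hypothetical helper, not an existing command.
# Reads df output on stdin, skips the header line, and prints
# "mountpoint filesystem" for every entry that looks like host:/path.
nfs_mounts() {
    awk 'NR > 1 && $1 ~ /:\// { print $NF, $1 }'
}

# Sample input taken from the hgwdev listing.
# On a real machine you would run: df -P | nfs_mounts
printf '%s\n' \
    'Filesystem 1k-blocks Used Available Use% Mounted on' \
    '/dev/sda3 549276872 224556052 296819128 44% /' \
    'eieio-10:/export/cluster/home 538437056 11846184 499239808 3% /cluster/home' \
    'kks00:/export/projects/hg 115630512 54516488 55240264 50% /projects/hg' |
    nfs_mounts
# prints:
# /cluster/home eieio-10:/export/cluster/home
# /projects/hg kks00:/export/projects/hg
```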

Hgwdev was designed as a MySQL server, so its native disks are used for MySQL, which makes browser testing faster.

hgwdev [~]> df /var/lib/mysql/
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda3            549276872 224556052 296819128  44% /
^^^^^^^^^

How about this one? I want to ftp 4GB of data from some place off campus. We know the download will use the network to get the data, but if I run it from a directory that is natively attached, it will go faster, since I don't also have to wait for the downloaded data to be written over the network to a remote disk. Make sense? The bottleneck will still be the speed to the remote site, but why add to it?

So, in short, taking a second to spot native I/O vs. network I/O can save you and your associates time. Always feel free to ask cluster-admin if there is a "better way" to do something. Sometimes you've got to use the network, though, and that’s perfectly fine. We built a gigabit network just for that purpose. But if you find yourself in a situation where you can take advantage of native disk I/O, I highly recommend it.