A Tutorial on NFS vs. Native I/O: how to find the best place to do something faster

by Paul Tatarsky

 

Let's address the concept of "native I/O" with a quick explanation, since we've got lots of servers out there. This is oversimplified, but hey:

It would be good for everyone to take a second and understand how to use the command "df". Type it. Read the manpage for it. Introduce yourself. Buy it a soda.

Say I want to gzip a massive file, and I want it to go as fast as possible without impacting other folks quite as much, or worse, having Patrick yell at me (shudder). I want to do the work with - if possible - native disk I/O, not network I/O. It's faster. I can go home at 5:00PM if I use native I/O. I'm watching the sun come up with network I/O.

If you run df on a filename or path, it reports which filesystem provides that file. For example, I have a giant file in my cluster homedir called MONGO_FILE:

hgwdev [~]> df MONGO_FILE

Filesystem                     1k-blocks     Used Available Use% Mounted on
eieio-10:/export/cluster/home  538437056 11836224 499249776   3% /cluster/home
^^^^^^^^

This tells me the file is NFS-mounted from another machine called eieio-10, which is the gigabit interface of eieio. All our fileservers are gigabit-connected. Eieio is our new homedir NFS server, not kkstore anymore.

So I go over to eieio.

eieio [~]> df MONGO_FILE

Filesystem           1k-blocks     Used Available Use% Mounted on
/export/cluster/home 538437056 11837032 499248964   3% /cluster/home
^^^^^^^^^^^^^^^^^^^^

Ah, native disk (there is NO hostname listed at the front). The homedirs for cluster accounts are directly connected to SCSI controllers on eieio. Nice fast ones.

So if I wanted to gzip this file in that same directory, this machine "eieio" would be the fastest place to do it, because from any other machine (hgwdev, kkstore, the cluster, etc.) the directory is NFS-mounted, which means all reads and writes have to travel the network before reaching the actual platters. Cut out the middleman of the network and you get done faster, and you produce less network congestion.
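To make that concrete: once df has confirmed I'm on the machine with the native disk, the job itself is nothing special:

eieio [~]> gzip MONGO_FILE

Every block gzip reads and writes goes straight through eieio's SCSI controllers instead of across the wire.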

OK. Now let's say I've gzip'd MONGO_FILE and I want to copy it to /cluster/store2.

I'm still on eieio, but I want to copy it to another directory where I store my mongo files for Patrick.

eieio [~]> df /cluster/store2/mymongofiles
Filesystem                        1k-blocks      Used Available Use% Mounted on
kkstore-10:/export/cluster/store2 493915584 464489772   4336300 100% /cluster/store2

I see that from eieio, /cluster/store2 is NFS-mounted from the 10-net side of kkstore, which is another gigabit interface.

Is there any magical way to do all-native I/O here? No. If you go over to kkstore, you can see that, from kkstore's point of view, MONGO_FILE.gz is now the one NFS-mounted from eieio.

kkstore [~]> df MONGO_FILE.gz
Filesystem                    1k-blocks     Used Available Use% Mounted on
eieio-10:/export/cluster/home 538437056 11845636 499240360   3% /cluster/home
^^^^^^^^

So either way, your copy will travel over the network. It will be slower. Not that slow, but slower.
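For the record, here's what that copy looks like from eieio (mymongofiles being my example directory from above):

eieio [~]> cp MONGO_FILE.gz /cluster/store2/mymongofiles/

The reads are native on eieio, but every write travels over NFS to kkstore. Run the same cp from kkstore instead and it simply flips: native writes, NFS reads.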

But say I decide I want to start putting mongo files in /cluster/store4.

eieio [~]> df /cluster/store4

Filesystem             1k-blocks     Used Available Use% Mounted on
/export/cluster/store4 538437056 83551280 427534716  17% /cluster/store4
^^^^^^^^^^^^^^^^^^^^^^

OOOOH. Native disk. I can copy from my homedir native disk to store4 native disk. I'm going home early!
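A sketch of that happy case (assuming, hypothetically, I've made a mymongofiles directory on store4 as well):

eieio [~]> cp MONGO_FILE.gz /cluster/store4/mymongofiles/

Both ends of the copy are local SCSI on eieio; the network never gets involved.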

Now, for most files, given our network infrastructure, this is all drop-in-the-bucket stuff. But if, for example, you are moving terabytes around over the network, you might want to float that fact to genecats and seek cluster-admin advice.

Now, where is perhaps the worst place to count on native I/O? hgwdev! Why? hgwdev gets almost all of its data over NFS. Running df without any filename lists everything currently mounted:

hgwdev [~]> df
Filesystem                         1k-blocks      Used Available Use% Mounted on
/dev/sda3                          549276872 224556052 296819128  44% /
/dev/sda1                             202220     17390    174390  10% /boot
none                                 1029512         0   1029512   0% /dev/shm
eieio-10:/export/cluster/home      538437056  11846184 499239808   3% /cluster/home
eieio:/export/projects/compbio     538437056 106576880 404509112  21% /projects/compbio
services:/export/cse/grads          41943040  32389984   9479872  78% /cse/grads
kks00:/export/projects/hg          115630512  54516488  55240264  50% /projects/hg
services:/export/cse/others/guests  10485760   4959264   5360384  49% /cse/guests

(and so on)

Hgwdev was designed as a MySQL server, so its native disks were dedicated to MySQL, which makes browser testing faster.

hgwdev [~]> df /var/lib/mysql/

Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda3            549276872 224556052 296819128  44% /
^^^^^^^^^

How about this one? I want to ftp 4GB of data from some place off campus. We know it will use the network to fetch the data, but if I do it from a directory that is natively attached, the download will go much, much faster, since the downloaded data doesn't have to make a second trip over the network to reach the disk. Make sense? The bottleneck will still be the speed to the remote site, but why add to it?
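Here's a sketch of that, with a made-up off-campus host and file (the URL is purely a placeholder):

eieio [~]> cd /cluster/store4/mymongofiles
eieio [~]> wget ftp://ftp.example.org/pub/4GB_of_data.tar.gz

The download still arrives at off-campus speed, but each chunk lands on store4's native disk the moment it shows up, with no second hop over NFS.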

So, in short, taking a second to spot native I/O vs. network I/O can save you and your associates time. Always feel free to ask cluster-admin if there is a "better way" to do something. Sometimes you've got to use the network, though, and that's perfectly fine. We built a gigabit network just for that purpose. But if you find yourself in a situation where you can take advantage of native disk I/O, I highly recommend it.
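If you'd rather not eyeball df output every time, here is a minimal sketch of a shell helper (is_native is my own name for it, not an existing tool) that applies the rule from this tutorial: a hostname and colon at the front of the Filesystem column means NFS, anything else means native disk:

is_native() {
    # df -P keeps each filesystem entry on one line; grab the Filesystem column
    # from the second line (the first line is the header).
    fs=$(df -P "$1" | awk 'NR==2 {print $1}')
    case "$fs" in
        *:*) echo "$1 -> NFS from ${fs%%:*} (network I/O)" ;;
        *)   echo "$1 -> $fs (native I/O)" ;;
    esac
}

Used on the MONGO_FILE example from earlier, it would report:

eieio [~]> is_native MONGO_FILE
MONGO_FILE -> /export/cluster/home (native I/O)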