[TriLUG] SAN file locking

bak bak at picklefactory.org
Mon Dec 19 22:17:26 EST 2011


On 12/19/11 8:53 PM, Matt Pusateri wrote:
> Oracle still sells the Sunstore devices.  One nice thing about them is 
> that they're built on ZFS, so you get replication and dedupe for free, as 
> opposed to paying EMC/NetApp licensing fees for the same features.  

AFAIK NetApp has never charged for dedupe and has no plans to start.

> This is my original problem/question: I saw it from the viewpoint of 
> file locking, but the bigger picture is who owns the blocks? E.g., where 
> are the partitioning and FS? 

"Who owns the blocks?" is a complicated question. But "where is the
partitioning?" is easier -- the server is responsible for the partition
table, just as if the disk the SAN presented were physically attached.
The server owns the partition table and formats the blocks the SAN
presents simply because that's how it has worked since days of yore.

It's a "here's a simple, underlying technology that we all understand
really well, and there's no good reason to change it, so we're sticking
with it" kind of situation. Just as video drivers still pretend there's
an electron beam shooting from a gun in the back of a CRT and sweeping
across a screen, left-to-right and top-to-bottom, hosts still pretend
that SANs give them a bunch of contiguous blocks on physical disks
instead of a metadisk abstraction of maybe-allocated-or-deduplicated
blocks which is too complicated to explain just now. :)
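
To make the "server owns the partition table" point a bit more concrete,
here is a rough Python sketch of what a host does with sector 0 of
whatever disk it is handed, SAN LUN or local. It assumes a classic MBR
layout (not GPT), and fake_lun_sector() is a made-up helper that builds
an in-memory sector so the example runs without a real device.

```python
import struct

SECTOR_SIZE = 512

def parse_mbr(sector: bytes):
    """Return (type, first_lba, sector_count) for each used MBR slot."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    partitions = []
    for slot in range(4):
        start = 446 + 16 * slot
        entry = sector[start:start + 16]
        # Entry layout: status(1) CHS-start(3) type(1) CHS-end(3)
        #               first LBA(4, little-endian) sector count(4)
        ptype = entry[4]
        first_lba, count = struct.unpack_from("<II", entry, 8)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append((ptype, first_lba, count))
    return partitions

def fake_lun_sector():
    """Hypothetical stand-in for sector 0 of a LUN: one Linux (0x83) partition."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack("<B3sB3sII", 0x00, b"\x00" * 3, 0x83, b"\x00" * 3,
                        2048, 204800)
    sector[446:462] = entry
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)

if __name__ == "__main__":
    for ptype, first_lba, count in parse_mbr(fake_lun_sector()):
        print(f"type=0x{ptype:02x} first_lba={first_lba} sectors={count}")
```

On a real host you would read the first 512 bytes from the block device
instead. The point is that nothing in this code knows or cares whether
the blocks come from a SAN.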

So as far as I know, nobody has taken the step of putting metadata on
another disk, because it would require rethinking the way the underlying
stuff works and it's not entirely clear what problem would be solved.

What might be more likely is that a SAN will say "this set of blocks is
getting asked for an awful lot, I'm going to keep it in cache / on an
SSD until further notice."
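
A toy sketch of that "this block is hot, keep it somewhere fast" idea
follows, in Python. It is only an illustration: the class name, the
threshold and the slot count are all invented, and real arrays use far
more elaborate heuristics than a read counter.

```python
from collections import Counter

class TieringSketch:
    """Promote frequently read blocks from a slow backend to a small fast tier."""

    def __init__(self, backend, hot_threshold=3, cache_slots=2):
        self.backend = backend            # dict: block number -> bytes (slow tier)
        self.hot_threshold = hot_threshold
        self.cache_slots = cache_slots
        self.reads = Counter()            # read count per block
        self.cache = {}                   # fast tier (cache / SSD stand-in)
        self.slow_reads = 0               # how often we had to hit the backend

    def read(self, block):
        self.reads[block] += 1
        if block in self.cache:
            return self.cache[block]
        data = self.backend[block]
        self.slow_reads += 1
        if self.reads[block] >= self.hot_threshold:
            self._promote(block, data)
        return data

    def _promote(self, block, data):
        if len(self.cache) >= self.cache_slots:
            # Evict the least-read cached block, but only for a hotter one.
            coldest = min(self.cache, key=lambda b: self.reads[b])
            if self.reads[coldest] >= self.reads[block]:
                return
            del self.cache[coldest]
        self.cache[block] = data

if __name__ == "__main__":
    tier = TieringSketch({n: bytes([n]) * 4 for n in range(4)})
    for _ in range(5):
        tier.read(1)
    print("blocks in fast tier:", sorted(tier.cache))
    print("backend reads:", tier.slow_reads)
```

The host never sees any of this. It keeps asking for block numbers, and
the array decides on its own where those blocks actually live.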

>> Some operating systems are OK with having a read-only filesystem attached. 
>> But solutions like this
> 
> you mean (ro) solutions?

Yup. As opposed to clustered filesystems I suppose :)

>> for the SAN space are not there, because the problem to be solved would 
>> have to be
>>
>> -- Useful even with a read-only filesystem
>> -- Requiring the sort of low-latency performance SAN provides
>> -- Not more cheaply and easily deployed with a r/o NFS export 
>
> I'm sorry. I don't know what you're trying to say here. I don't even get 
> enough to ask you a question about it. Can you try again.

Sure. Let me go back a step. Nobody buys SAN equipment as general
storage. It's just too expensive to be a hammer for every nail. But it
works well for the things mentioned earlier -- VMware, OLTP, etc.

So a SAN presenting blocks read-only to a bunch of hosts, in order to
solve a problem like 'let me share homedirs or /usr for a big stack of
servers', would only be likely under two conditions. First, the
homedirs would need to be extremely fast and low latency, so fast that
just deploying a (much cheaper) NAS exporting via NFS wouldn't do the
job. Second, not being able to write to homedirs or /usr would have to
be acceptable to the servers' OS.

The SAN way to solve the 'only pay once for /usr, get it for all the
servers' problem would be to run the servers as VMware guests, put all
of their root disks on the same storage container on the SAN, then use
dedupe to squeeze the data down to just one on-disk instance of those
blocks in /usr.
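
For the curious, block-level dedupe boils down to something like the
Python sketch below. This is not how any particular vendor implements
it. The 4 KiB block size, the use of SHA-256 and the function names are
all assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size

def dedupe(images):
    """Store each distinct block once; keep a per-image list of block digests."""
    store = {}   # digest -> block data (the single on-disk instance)
    maps = {}    # image name -> ordered list of digests
    for name, data in images.items():
        refs = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)
            refs.append(digest)
        maps[name] = refs
    return store, maps

def rebuild(store, refs):
    """Reassemble an image from its list of block digests."""
    return b"".join(store[d] for d in refs)

if __name__ == "__main__":
    # Three identical "/usr" blocks per guest plus one guest-specific block.
    shared_usr = b"".join(bytes([i]) * BLOCK_SIZE for i in range(3))
    images = {
        "vm1": shared_usr + b"a" * BLOCK_SIZE,
        "vm2": shared_usr + b"b" * BLOCK_SIZE,
    }
    store, maps = dedupe(images)
    logical = sum(len(refs) for refs in maps.values())
    print(f"{logical} logical blocks stored as {len(store)} physical blocks")
```

Each guest still sees its own full root disk. Only the array knows that
the identical /usr blocks are stored once.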

--bak


