[TriLUG] copying files

Sean Korb spkorb at gmail.com
Tue Jun 19 23:41:11 EDT 2012


On Tue, Jun 19, 2012 at 10:15 PM, Joseph Mack NA3T <jmack at wm7d.net> wrote:
> On Tue, 19 Jun 2012, Jeff Schornick wrote:
>
>> On Tue, Jun 19, 2012 at 9:39 PM, Joseph Mack NA3T <jmack at wm7d.net> wrote:
>>>
>>> I haven't used rsync.  So after the initial phase, both ends know the
>>> files at each end, and when I add a new file at one end, rsync will
>>> notice and just handle it?
>>
>>
>> Not quite.
>>
>> On each synchronization run, the local rsync builds a file list from
>> the source directory while the remote rsync simultaneously builds the
>> analogous list on its end.  This means if you have 1000 files, you may be looking
>> at 1000 fstats on each end.  However, these checks are both done
>> locally on the corresponding machines.  As long as the target system's
>> local file I/O isn't significantly slower than the source machine's,
>> you shouldn't be introducing any additional delay.
>>
>> After both lists have been generated, rsync uses a minimal amount of
>> network traffic to compare the lists and generate a final list of
>> which files need to be updated.  As expected, only those files are
>> sent over the network.
>>
>> After the synchronization is complete, the generated lists get tossed
>> out like dirty laundry.  There is no long-running daemon that tries to
>> keep them up to date in real time.  However, I imagine someone has
>> created a slick piece of code using inotify to do just that.
>
>
> OK, so I'd have to invoke rsync every 5 mins.  Assembling the list of files
> at each end has to be done anyhow (e.g. with find).  Presumably 1000 fstats take the
> same time no matter whether find or rsync then processes the list. The
> problem then is comparing the lists at each end.
>
> cp -auv is really slow
>
> rsync you say is fast (and I believe you).
>
> but I already have my list from `find`, so there's no extra cost if I use
> find.
>
> The copy of the files takes the same time no matter which way I assembled
> the list of files to be copied.
>
> So `find` followed by `cp --parents` or `cpio` seems to be it.
>
> Alan points out the resilience of rsync.  This is a good feature, but as it
> turns out (and I didn't say this), I don't mind losing an occasional file,
> but throughput is a high priority.  The backup machine is writing files from
> many sources and it only has a few seconds to service a source machine, or
> it will fall over with the load.
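
For the record, the find + cpio route usually looks something like this
with GNU cpio (a sketch, not tested here; adjust the -mmin window to
whatever polling interval you settle on):

  find . -mmin -5 -type f -print | cpio -pdm /nfsdir

-p is pass-through mode, -d creates any missing directories, and -m
preserves modification times, so the relative paths survive the same
way they would with cp --parents.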

Use both?  rsync is pretty darned efficient even when used atomically
(one file at a time).

find . -mtime -5 -type f -exec rsync -at {} /nfsdir/ \; or some crazy
mess with find -print0 piped through xargs -0 would be the proper way
to do it.
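
If forking one rsync per file turns out to be the slow part, rsync can
also read the file list itself and recreate the relative paths on the
far side (--files-from implies --relative).  Untested, from memory,
paths made up:

  find . -mtime -5 -type f > /tmp/changed.lst
  rsync -at --files-from=/tmp/changed.lst . /nfsdir/

(Use find -print0 and rsync's --from0 if the filenames get weird.)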

I think I have something buried somewhere that kind of does this...
It uses rsync to ship all the *differences* between two volumes to a
third volume, cutting down on the space used when shipping hard drives
of data back and forth on a FedEx truck.  I haven't used it in years,
so I'll have to do some digging.
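
If it turns up, the heart of it was, as far as I remember, rsync's
--compare-dest: anything that already matches the reference volume gets
skipped, so the destination ends up holding only the differences.
Roughly (paths made up):

  rsync -av --compare-dest=/mnt/reference/ /mnt/current/ /mnt/diffs/

Ship /mnt/diffs, then lay it down over a copy of the reference volume
at the far end.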

sean

-- 
Sean Korb spkorb at spkorb.org http://www.spkorb.org
'65,'68 Mustangs,'68 Cougar,'78 R100/7,'60 Metro,'59 A35,'71 Pantera #1382
"The more you drive, the less intelligent you get" --Miller
"Computers are useless.  They can only give you answers." -P. Picasso


