[TriLUG] copying files

Sean Korb spkorb at gmail.com
Wed Jun 20 10:41:02 EDT 2012


On Tue, Jun 19, 2012 at 11:41 PM, Sean Korb <spkorb at gmail.com> wrote:
> On Tue, Jun 19, 2012 at 10:15 PM, Joseph Mack NA3T <jmack at wm7d.net> wrote:
>> On Tue, 19 Jun 2012, Jeff Schornick wrote:
>>
>>> On Tue, Jun 19, 2012 at 9:39 PM, Joseph Mack NA3T <jmack at wm7d.net> wrote:
>>>>
>>>> I haven't used rsync. So after the initial phase, both ends know the
>>>> files
>>>> at each end and when I add a new file at one end, rsync will notice and
>>>> just
>>>> handle it?
>>>
>>>
>>> Not quite.
>>>
>>> On each synchronization run, rsync creates a local list from the
>>> source directory, while simultaneously creating the analogous list on
>>> the remote end.  This means if you have 1000 files, you may be looking
>>> at 1000 fstats on each end.  However, these checks are both done
>>> locally on the corresponding machines.  As long as the target system's
>>> local file I/O isn't significantly slower than the source machine's,
>>> you shouldn't be introducing any additional delay.
>>>
>>> After both lists have been generated, rsync uses a minimal amount of
>>> network traffic to compare the lists and generate a final list of
>>> which files need to be updated.  As expected, only those files are
>>> sent over the network.
>>>
>>> After the synchronization is complete, the generated lists get tossed
>>> out as dirty laundry.  There is no long running daemon which attempts
>>> to keep them up-to-date in realtime.  However, I imagine someone has
>>> created a slick piece of code using inotify to do just that.
>>
>>
>> OK, so I'd have to invoke rsync every 5 mins. Assembling the list of files
>> at each end has to be done anyhow (eg find). Presumably 1000 fstats take the
>> same time no matter whether find or rsync then processes the list. The
>> problem then is comparing the lists at each end.
>>
>> cp -auv is really slow
>>
>> rsync you say is fast (and I believe you).
>>
>> but I already have my list from `find`, so there's no extra cost if I use
>> find.
>>
>> The copy of the files takes the same time no matter which way I assembled
>> the list of files to be copied.
>>
>> So `find` followed by `cp --parents` or `cpio` seems to be it.
>>
>> Alan points out the resilience of rsync. This is a good feature, but as it
>> turns out (and I didn't say this), I don't mind losing an occasional file;
>> throughput is the high priority. The backup machine is writing files from
>> many sources and it only has a few seconds to service a source machine, or
>> it will fall over with the load.
>
> Use both?  rsync is pretty darned efficient even used atomically.
>
> find . -mtime -5 -type f -print0 -exec rsync  -at {} /nfsdir/ \; or
> some crazy mess with xargs would be the proper way to do it.
>
> I think I have something buried somewhere that kind of does this...
> uses rsync to ship all the *differences* between two volumes to a
> third volume cutting down on space used for shipping hard drives of
> data back and forth using a FedEx truck.  I haven't used it in years
> so I'll have to do some digging.
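
The `find` followed by `cp --parents` idea from the thread can be
sketched like this (all paths here are made-up examples): copy every
file modified in the last 5 minutes into a destination tree while
preserving the directory structure.

```shell
# Make a toy source tree and an empty destination (hypothetical paths).
mkdir -p /tmp/cpdemo/src/sub /tmp/cpdemo/dst
echo data > /tmp/cpdemo/src/sub/new.txt

# Run find from inside the source so the copied paths are relative;
# cp --parents (GNU coreutils) recreates each file's directory chain
# under the destination.
cd /tmp/cpdemo/src
find . -type f -mmin -5 -exec cp --parents {} /tmp/cpdemo/dst \;
# /tmp/cpdemo/dst/sub/new.txt now exists.
```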

Kind of off topic now, but I wanted to finish my thoughts on it.  The
code is not all mine, but I forgot where I mined the bits from.  If
you have a big pipe, this operation looks grotesque.  With luck, you
will never be obliged to manage data this way.

Build the differences between the source volume and the replica volume
onto a transportable volume.  Then integrate the changes into the
destination volume after the transportable volume arrives at the
replica site.

Collecting the files:

#!/bin/bash
SRCDIR="/home/username/lotsoffiles/"   # trailing slash: rsync's list is relative to it
RSYNCDEST="someplace::somebody"
COPYDEST="/media/bigoldencryptedharddrive/"

# Dry-run rsync, then strip the header line, the trailing stats lines,
# blank lines, and directory entries, leaving only file paths.
rsync -rav --dry-run "$SRCDIR" "$RSYNCDEST" |\
sed -e '1d' -e '$d' \
    -e '/^$/d' -e '/sent.*\/sec$/d' \
    -e '/\/$/d' > filelist

# Read one path per line so names containing spaces survive.
while IFS= read -r file
do
    d="$(dirname "$file")"
    [ -d "$COPYDEST$d" ] || mkdir -p "$COPYDEST$d"
    echo "Copying $file"
    /bin/cp --preserve "$SRCDIR$file" "$COPYDEST$d"
done < filelist
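
To illustrate the filtering step, here is that same sed expression run
over some hypothetical `rsync -rav --dry-run` output (the file names
are made up):

```shell
# Feed fake rsync output through the filter: header, a directory entry,
# one file, a blank line, and the two summary lines.
printf '%s\n' \
  'sending incremental file list' \
  'sub/' \
  'sub/new.txt' \
  '' \
  'sent 123 bytes  received 45 bytes  336.00 bytes/sec' \
  'total size is 678  speedup is 4.04' |
sed -e '1d' -e '$d' \
    -e '/^$/d' -e '/sent.*\/sec$/d' \
    -e '/\/$/d'
# Prints only "sub/new.txt": the header, directory, blank line, and
# both summary lines are all stripped.
```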


At the other end, depositing the files to make a whole:

#!/bin/bash
UPDATES_SRC="/media/bigoldencryptedharddrive/"
DEST_DIR="/replica/space/where/the/diffs/are/reconstituted/"   # trailing slash matters

cd "$UPDATES_SRC" || exit 1
rm -f updatedfiles filelist
# List every shipped file, excluding our own work files.
find . -type f ! -name filelist ! -name updatedfiles > filelist

while IFS= read -r file
do
        # Log any replica file we are about to overwrite.
        [ -f "$DEST_DIR$file" ] && ls -l "$DEST_DIR$file" >> updatedfiles
        d="$(dirname "$file")"
        [ -d "$DEST_DIR$d" ] || mkdir -p "$DEST_DIR$d"
        echo "Copying $file ..."
        cp -f "$UPDATES_SRC$file" "$DEST_DIR$file"
done < filelist
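
Once the diffs are reconstituted, a quick sanity check is to
recursively compare the shipped update set against the replica;
`diff -r` exits non-zero on any mismatch.  A minimal sketch, with
made-up paths:

```shell
# Build a toy "updates" tree and a replica that matches it.
mkdir -p /tmp/chkdemo/updates/sub /tmp/chkdemo/replica/sub
echo same > /tmp/chkdemo/updates/sub/a.txt
cp /tmp/chkdemo/updates/sub/a.txt /tmp/chkdemo/replica/sub/a.txt

# Recursive compare: succeeds (and prints) only if the trees agree.
diff -r /tmp/chkdemo/updates /tmp/chkdemo/replica && echo "replica matches"
```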

Enjoy!

-- 
Sean Korb spkorb at spkorb.org http://www.spkorb.org
'65,'68 Mustangs,'68 Cougar,'78 R100/7,'60 Metro,'59 A35,'71 Pantera #1382
"The more you drive, the less intelligent you get" --Miller
"Computers are useless.  They can only give you answers." -P. Picasso
