[TriLUG] SUMMARY: KZPCC and Hardware Component Management (fwd)

Daniel Monjar Daniel.Monjar at na.biomerieux.com
Wed Aug 7 18:45:21 EDT 2002


I know you are playing with a DEC/Compaq Mylex RAID controller.  I've used
a bunch of those things in Alpha 2100s and 4100s under NT, Unix and VMS.
This summary message from a Tru64 list might be of some benefit to you.  It
speaks to a different rev of the board but parts are still applicable.


---------- Forwarded Message ----------
Date: Wednesday, August 07, 2002 3:19 PM -0400
From: Richard Jackson <rjackson at portal.gmu.edu>
To: tru64-unix-managers at ornl.gov
Subject: SUMMARY: KZPCC and Hardware Component Management

Hello,

I received some feedback from friendly tru64-unix-managers list readers and
HP/Compaq/DEC Customer Support Center staff.  Thank you.

Selden E Ball Jr <SEB at LNS62.LNS.CORNELL.EDU>
Raul Sossa S. <RSossa at datadec.co.cr>
HP CSC Staff


I have a few questions related to the KZPCC-CE (3-port PCI RAID)
controller and the 'new' Tru64 UNIX 5.x hardware component management.
The KZPCC-CE is installed in an ES45 running Tru64 UNIX 5.1A patch kit #2.

---------------------------------------------------------------------------
---- QUESTION:
1. Is it possible to print the KZPCC-CE configuration (e.g., for disaster
recovery)?  The KZPAC-CB/KZESC-BA/KZPSC-BA RAID Configuration Utility (RCU)
allowed me to 'print' the configuration to a text file on a floppy to be
printed later.  The Compaq StorageWorks KZPCC-CE and KZPCC-AC User Guide,
dated Aug 2001, on page 3-1 states the use of SMOR is restricted to
Compaq AlphaServers with graphics console setting and it is not supported
under a serial console.

ANSWER:
Try using a laptop computer attached to the ES45 serial port while
using KEATERM.  Another suggestion is to install the SWCC KZPCC Agent
for Tru64 UNIX on the ES45 and then install the corresponding client in
a Microsoft Windows NT 4.0 or Windows 2000 machine. You'll be able to
print this info and manage the controller.  Use the SWCC software at
http://www.compaq.com/alphaserver/products/storage/kzpcc.html or on the
CD included with the controller kit, Compaq Ultra2 Backplane RAID Controller
DS-KZPCC.

I have not tried either, yet.

---------------------------------------------------------------------------
---- QUESTION:
2. Is it possible to save the KZPCC-CE configuration (e.g., for disaster
recovery)?  The KZPAC-CB/KZESC-BA/KZPSC-BA RAID Configuration Utility (RCU)
allowed me to save the configuration to a floppy.

ANSWER:
I am told it is not possible.

---------------------------------------------------------------------------
---- QUESTION:
3. If the KZPCC-CE fails and must be replaced or the configuration is lost,
how do I quickly restore the configuration?  If the configuration must be
re-applied via SMOR, doesn't the SMOR 'Set System Config' initialize the
devices (i.e., the user data is lost)?

ANSWER:
Unfortunately, the replacement KZPCC-CE forces an initialize and the user
data must be restored from backup.  Ouch!

---------------------------------------------------------------------------
---- QUESTION:
4. How in the world is the KZPCC logical device defined in the SRM
console?  That is, SMOR may report the RAID devices as ID 0, 1, and 2.
However, SRM may report dza526.0.0.2004.1, dza528.0.0.2004.1, and
dza532.0.0.2004.1.  I understand if I have DZXabc, X is the controller
ID.  How is the abc defined?  For example, I had a RAID 1 (ID 0), JBOD
(ID 1), and RAID 0+1 (ID 2) devices.  I purchased another disk drive
and converted the JBOD into RAID 1.  SMOR reported the new RAID 1
device as ID 1 (HBA:0 Channel:0 Id:1 LUN:0) (this is good and what I
expected) but the SRM console and operating system treated the new RAID
device as ID/LUN 3.  The SMOR ID appears to be ignored by the SRM and
Tru64 UNIX 5.1A.

ANSWER:
Why the SMOR ID appears to be ignored and how the SRM defines the device
name are a mystery.
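
When comparing what SMOR reports against what SRM reports, it can help to
pull the SRM names apart first.  The following is only a rough sketch: the
helper name srm_split is made up for illustration, and it labels only the
parts described in the question (the controller letter X and the number abc
in DZXabc).  The remaining dot-separated fields are printed unlabeled because
nothing here explains them, and the sketch does not explain how SRM picks
the number either.

```shell
#!/bin/sh
# Sketch only: split an SRM device name such as dza526.0.0.2004.1.
# srm_split is a hypothetical helper name, not an SRM or Tru64 command.
srm_split() {
    name=$1
    case "$name" in
        *.*) first=${name%%.*}; rest=${name#*.} ;;   # dza526 / 0.0.2004.1
        *)   first=$name; rest= ;;                   # no dot fields present
    esac
    # Drop the leading letters to get the number ("abc" in DZXabc).
    unit=$(printf '%s' "$first" | sed 's/^[A-Za-z]*//')
    # What is left of the first field is the letter prefix (e.g. dza).
    prefix=${first%"$unit"}
    # The last letter of the prefix is the controller ID ("X" in DZXabc).
    ctrl=$(printf '%s' "$prefix" | sed 's/^.*\(.\)$/\1/')
    echo "controller=$ctrl unit=$unit rest=$rest"
}

# Device names as reported by SRM in the question above.
for dev in dza526.0.0.2004.1 dza528.0.0.2004.1 dza532.0.0.2004.1; do
    srm_split "$dev"
done
```

Listing the unit numbers side by side with the SMOR IDs at least makes it
obvious when the two views disagree after a reconfiguration.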

---------------------------------------------------------------------------
---- QUESTION:
5. Tru64 UNIX 5.1A pk #2 dsfmgr man page example 5 has
        /sbin/dsfmgr -R delete hwid 25
Shouldn't this be
        /sbin/dsfmgr -R hwid 25

ANSWER:
Yes, the dsfmgr man page is wrong.  I have reported this issue to HP.
As a side note, http://www.tru64unix.compaq.com/docs/updates/V51A/TITLE.HTM,
Compaq Technical Update for Tru64 UNIX Version 5.1A, March 26, 2002:
Replacing SCSI Devices, has incorrect syntax, too.

---------------------------------------------------------------------------
---- QUESTION:
6. Under Tru64 UNIX 4.0G or earlier the DEC/Compaq/HP Field Service Engineers
would replace failed external SCSI tape drives (DLT) while the system is
running (no reboot, no system change).  Under Tru64 UNIX 5.1A, must we now
do the following to retain the same device special file:
-------------
replace the tape drive
reboot
hwmgr -delete component -id XX
hwmgr -refresh component
dn_setup -init
dsfmgr -K
reboot
-------------
That is, if the same SCSI target is used, are the hardware component
gymnastics and reboots still necessary?

ANSWER:
The good news is a reboot may not be necessary.  However, Hardware Component
Management gymnastics are necessary.  The preferred method is probably what
is described in the Tru64 UNIX Version 5.1A System Administrator book,
section 5.4.4.11 Replacing a Failed SCSI Device.  However, the instructions
are geared for a failed disk drive.  I was given these instructions:

. remove old broken device (e.g., tape0)
. install replacement device
. hwmgr -scan scsi                      (find the new device)
. hwmgr -show scsi                      (list the devices)
. dsfmgr -K                             (create device special files, e.g., tape1)
. dsfmgr -e tape1 tape0                 (exchange device special files)
. hwmgr -delete scsi -did 54            (delete old tape0 did from hwmgr show)

NOTE: the 'hwmgr -delete' step, burning incense, praying to your deity, and
sacrificing farm animals are all optional.
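
The software steps above could be collected into a small script along these
lines.  This is a minimal sketch, not a tested procedure: the function name
replace_tape_plan and its arguments are made up for illustration, and the
values tape1, tape0, and 54 are simply the examples from the instructions.
The real names and did have to be read from the 'hwmgr -show scsi' output on
the system in question.  The sketch only prints the commands so they can be
reviewed and then run by hand.

```shell
#!/bin/sh
# Sketch only: print the hwmgr/dsfmgr sequence quoted above for review.
# replace_tape_plan is a hypothetical helper name, not a Tru64 utility.
# Usage: replace_tape_plan NEW_DSF OLD_DSF OLD_DID
replace_tape_plan() {
    if [ "$#" -ne 3 ]; then
        echo "usage: replace_tape_plan NEW_DSF OLD_DSF OLD_DID" >&2
        return 1
    fi
    new_dsf=$1   # device special file created for the replacement drive
    old_dsf=$2   # device special file the failed drive used
    old_did=$3   # did of the failed drive as listed by hwmgr

    echo "hwmgr -scan scsi"                  # find the new device
    echo "hwmgr -show scsi"                  # list the devices
    echo "dsfmgr -K"                         # create device special files
    echo "dsfmgr -e $new_dsf $old_dsf"       # exchange device special files
    echo "hwmgr -delete scsi -did $old_did"  # optional: drop the old entry
}

# Example values taken from the instructions above.
replace_tape_plan tape1 tape0 54
```

Printing rather than executing is deliberate: the did and the device special
file names are only known after the scan and show steps have been run.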

---------------------------------------------------------------------------
---- QUESTION:
7. What value does hardware component management add that justifies the
added aggravation for the system administrator?  Is the value the
ability to move a device from one target ID to another target ID and
continue to use the same device special file?  If so, do system
administrators perform that task more often than replacing hardware?

ANSWER:
The benefit to non-cluster systems (i.e., standalone) is debatable.
Hardware Component Management benefits the cluster environment.  Both
Context Dependent Symbolic Links (CDSLs) (e.g., /usr/sbin/cdslinvchk)
and hardware component management are forced upon non-cluster systems
(some may find /cluster/members/member0/tmp annoying, for example).  On
the bright side it is a job security enhancement.

---------------------------------------------------------------------------
----

-- 
Regards,						   /~\ The ASCII
Richard Jackson						   \ / Ribbon Campaign
Computer Systems Engineer,				    X  Against HTML
Information Technology Unit, Technology Systems Division   / \ Email!
Enterprise Servers and Operations Department
George Mason University, Fairfax, Virginia

---------- End Forwarded Message ----------



--
Daniel Monjar
IS Manager, Technical Services
bioMérieux, Inc.
Durham, NC US



