T3 FAQ
Sun StorEdge T3 Frequently Asked Questions
Q. Can I use the 181GB or 15000-RPM drives in a regular T3A?
A. No. 15000-RPM and 181GB drives are supported in the T3+ only.
Q. What is the difference between the StorEdge T3 for the Enterprise (T3ES) and the StorEdge T3 for the Workgroup (T3WG)?
A. The quick answer is that the Workgroup is just one tray and the Enterprise is two trays, or a "partner group." See the table below for a more complete summary.
Q. Does the T3 require a Volume Manager license?
A. Yes. Many mistakenly think the T3 has a built-in license like the A5x00 and the SSA; it does NOT. A new T3ES will come bundled with VxVM and the required license (which is entered using vxlicense). The T3WG does NOT come with VxVM or the license bundled. They must be ordered separately if VxVM is to be used.
Q. What hosts can I hook up a T3 to?
A. U60, U80,
E220R, E420R
E250, E450,
Netra t 1405,
Sun Blade 1000,
E3x00-E6x00,
E10000,
E280R,
F3800,
F4800, F4810
F6800

NOTE: The listing above is kept up to date as much as possible. For the official and most up to date word on which hosts are supported, please see the following links, as they are externally available and are considered to be accurate.
http://www.sun.com/storage/t3wg/details.html
http://www.sun.com/storage/t3es/details.html
or
http://www.sun.com/storage/t3/sun_support.html
Q. What is the maximum number of T3's I can hook up to each host?
A. Please see the tables listed here:
http://www.sun.com/storage/t3wg/details.html
http://www.sun.com/storage/t3es/details.html
or
http://www.sun.com/storage/t3/sun_support.html

I used to try to keep a local copy of the table (still below), but inconsistent and inaccurate sources have made it impossible to keep up. The links above are viewable by customers and are supposed to be considered "THE" source for this information.
Q. What version of Solaris do I need for the T3?
A. Solaris 2.6, 7, or 8. Plus all appropriate patches of course.
Q. What do I do if I have forgotten the password on my T3?
A. If you have forgotten the password on the T300 storage system, you can do the following to gain access:
- Stop all I/O going to the T3(s).
- Connect console cable to a terminal and power off all units (both trays in a partner group).
- Power master unit back on (leave alternate master off) and interrupt the boot process by hitting RETURN when you see this message:
"hit the RETURN key within X seconds to cancel..."
NOTE: If you leave the alternate master on, it will continue to boot. Eventually it will assume there is a problem with the master (that we interrupted), disable it, and make itself the master. Not good...
- At the "T300-EP" prompt, use the 'set passwd' command to view (and memorize...) the password.
- Type 'reset' to reboot, and immediately power on the alternate master (if present).
- If you choose, you can change the password after it boots with the 'passwd' command.
NOTE: The T3 allows you to set the passwd at the "T300-EP" prompt using the 'set passwd {newpasswd}' string, and it will appear to change it. But you may find after booting up that your new password did not take. Again, not good...
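Pulling those steps together, the console interaction looks roughly like this (the password shown is invented, and the exact prompt text and output format may differ slightly by firmware revision):

hit the RETURN key within 5 seconds to cancel...
(RETURN pressed)
T300-EP> set passwd
passwd : t3secret          (view and memorize this)
T300-EP> reset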
Q. What output can be used to troubleshoot the T3?
A. Here is a list of commands.
ver
fru stat
fru list
vol stat
vol list
vol mode
set
sys list
port list
port listmap
/syslog file
/syslog.OLD file
/var/adm/messages{.t300} from the data host

For battery-related issues, get these also:
id read u{1|2}pcu{1|2}
refresh -s

On the really nasty problems, run T3 Extractor to capture all of this and more.
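If you can't get Extractor onto the box, one low-tech alternative is to wrap your telnet session in script(1) from the data host so everything above lands in a single file (a rough sketch; 't3-name' is a placeholder):

# script /var/tmp/t3-name.out
# telnet t3-name
...login, then run each command in the list above...
# exit          (ends telnet)
# exit          (ends the script capture)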
Q. Does a regular Explorer get all of that?
A. No (not yet). You'll need to run this one in addition to a regular Explorer.
UPDATE: When run with the '-w t3extended' option, the latest Explorer can get just about everything that Extractor gets. However, there are a few shortcomings still:
- This method is MUCH SLOWER than running Extractor. It will take a VERY long time, particularly when running against multiple T3's because it executes a telnet and login for every single command it runs.
- The Explorer output of EVERY T3 command includes the login session.
For example:
# more ver.out
telnet> Trying 129.153.49.210...
Connected to purple21.
Escape character is '^]'.
pSOSystem (129.153.49.210)
Login: root
Password:
T300 Release 1.18.00 2001/11/16 13:35:09 (129.153.49.210)
Copyright (C) 1997-2001 Sun Microsystems, Inc.
All Rights Reserved.
purple21:/:<1>ver
T300 Release 1.18.00 2001/11/16 13:35:09 (129.153.49.210)
Copyright (C) 1997-2001 Sun Microsystems, Inc.
All Rights Reserved.
purple21:/:<2>
exit
Kind of a lot to read just to see the FW version, isn't it?
For these reasons, it is still recommended to get an Extractor if possible.
Q. Can 18-GB, 36-GB, and 73-GB drives be mixed in a single array or partner group?
A. No, there are currently no plans to support mixed drive capacities. However, you CAN mix different drive models of the same capacity, e.g., 36GHH with 36GLP drives. Just remember that this can complicate your disk firmware updates for arrays with multiple drive models present. This is because drive firmware upgrades are an "offline" procedure. The more model types present, the greater the chance a given drive firmware update will require downtime. You should attempt to minimize mixing where possible.
Q. Can Sun StorEdge A5X00/A3500/A1000/D1000 arrays and Sun StorEdge T3 arrays be connected to the same host system?
A. Yes, the Sun StorEdge T3 array has been tested with various combinations of the above listed Sun storage systems.
Q. Can FC-AL hubs be used with the Sun StorEdge T3 arrays? How many arrays can be connected to a single hub?
A. Yes, up to four Sun StorEdge T3 arrays can be attached to the Sun 7-port hub. BUT...there was a problem where DMP was not working properly with T3's and hubs. It stemmed from both a controller firmware problem and a Veritas problem, which were to be fixed by FW 1.16 and Veritas patches.
As of 11/03/00, this issue is resolved. Patch 109115 now includes T3 firmware 1.17. See the matrix for the necessary Veritas patch(es).
For more details on the setup of T3, hubs, and multi-initiation, please see information in the T3 Configuration Guide.
Q. Can I use multi-hosting with the T3 and hubs?
A. Yes and No. Multi-initiation is allowed with the T3WG config (maximum of TWO initiators). Multi-initiation is NOT currently supported with T3ES (partner groups). See the T3 Configuration Guide for more details.
UPDATE: Multi-initiation of T3WG AND T3ES is now fully supported starting with FW 1.17b. See the Switch documentation page for the Installation & Configuration Guide showing support for these configs.
If applicable, the appropriate cluster documentation is also recommended.
Q. Do I have to plug the T3 into certain ports on the hub?
A. No, the T3 can use any available port on the hub. See p.34 of the T3 Configuration Guide for more details.
Q. Can both Sun StorEdge A5X00 arrays and Sun StorEdge T3 arrays be connected on the same Fibre Channel loop/hub?
A. No, mixing of Sun StorEdge A5X00 and T3 arrays on the same Fibre Channel loop/hub is not supported.
Q. Can I hook the T3 to a switch?
A. Yes. Switch support with T3 went to GA on 1/23/01.
Q. How many Fibre Channel host connections are on each Sun StorEdge T3 array?
A. There is a single 100 MB/sec. host connection on each array. Every array MUST be connected to a host bus adapter (HBA), an FC-AL hub or switch, or an on-board FC-AL socket of the SBus or graphics I/O boards.
Q. Does the Sun StorEdge T3 array have multi-host support? If so, how many hosts/initiators?
A. There is currently a limitation of TWO initiators per loop for the Sun StorEdge T3 for the workgroup (single tray) and a SINGLE initiator limitation for Sun StorEdge T3 for enterprise (paired arrays) configurations. This limitation applies to both FC-AL hub and switch configurations.
UPDATE: Multi-initiation of T3WG AND T3ES is now fully supported starting with FW 1.17b. See the Switch documentation page for the Installation & Configuration Guide showing support for these configs. If applicable, the appropriate cluster documentation is also recommended.
Q. Can I convert two T3 Workgroup trays into one T3 Enterprise partner group or split a partner group into two single bricks?
A. The short answer is yes. The longer answer is it really isn't supported to turn two WG bricks into an ES partner group. If it was, people would buy two "cheap" bricks and then just buy the Interconnect Cables (which aren't sold individually!). We would have to sell the cables for about $15,000 each to make up the cost. Having said all that, both procedures are documented.
So you can convert a single ES unit into two WG bricks. That would leave you with some unused UIC cables that you could potentially convert two WG trays into an ES with. Check out Chapter 10 of the T3 Field Service Manual for details on both procedures.
There is apparently also an upgrade path available via trade-in (I pulled this from the email archives). Look for Part Number ALW-30-D-TWT3E-014.
UPDATE: Some versions of the FSM have an error in them. It might say this:
"The former alternate master unit contains the same volume(s) as it did when connected in the partner group. Those volumes are available when the unit is powered on."
This is not necessarily true. You should record the volume info (blocksize, raid level, drives, etc) because you will have to rebuild them on the alternate master. You will have to rebuild it EXACTLY the same and use the '.vol init fast' command to initialize the volume and preserve the data that was on the drives.
Finally, you may hit BugID 4473104 (for T3 AND T3+) when trying to remove the old "phantom" u2 volume(s) from u1's configuration (see p10-18 of the FSM). The bug prevents you from removing the volume(s) from u1 if the PG has already been split. To work around it, remove the u2 volume(s) BEFORE splitting the pair (but after recording the config), or downgrade (or tftpboot) to 1.17 and remove it, or rejoin the PG and remove it. You can also do a 'boot -w' to work around this, but you will have to restore your 'set' and 'sys' settings as well.
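As a rough sketch, the record-and-rebuild flow looks like this (the volume name, layout, and hidden-command syntax are illustrative only; follow the FSM procedure for the real thing):

Before the split, on the partner group:
t3:/:<1>vol list           (record blocksize, raid level, and drives for each volume)
t3:/:<2>sys list           (record system settings)

After the split, on the former alternate master:
t3:/:<1>vol add v1 data u1d1-8 raid 5 standby u1d9    (EXACTLY as recorded)
t3:/:<2>.vol init v1 fast                             (preserves the data on the drives)
t3:/:<3>vol mount v1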
Q. Are Long-Wave GBICs supported on the Sun StorEdge T3 array?
A. No, there are currently no plans to support the Long-Wave GBIC on the Sun StorEdge T3 array. Extended distance functionality will be moved to the Fibre Channel Switch when it becomes available.
Q. What is the maximum distance supported between the host and the array?
A. The Sun StorEdge T3 array supports a 500-meter distance between host and array using 50/125-micron multi-mode Fibre Channel cables.
Q. What are Sun StorEdge T3 expansion units? Are they supported?
A. Expansion units are disk arrays that do not have a controller card. They allow increased storage density without adding I/O capacity. Sun is not shipping expansion units at this time. However, they are under evaluation at select customer sites and are a supported configuration only for those customers. In other words, you may see some calls slip in with this config...
Q. When will support for the Sun StorEdge T3 expansion units be available?
A. Sun is currently evaluating configurations using the Sun StorEdge T3 expansion unit. No date has been set for these configurations.
Q. How many Sun StorEdge T3 trays will fit in the Sun StorEdge 72-inch cabinet?
A. Eight Sun StorEdge T3 for the workgroup OR eight Sun StorEdge T3 for the enterprise trays will fit in the Sun StorEdge 72-inch cabinet.
The T3 Rackmount Placement Matrix may be a helpful resource in this area.
- Eight Sun StorEdge T3 arrays for the workgroup consume 32 RUs (rack units) using eight 4U rackmount rails.
- Eight Sun StorEdge T3 arrays for the enterprise consume 28 RUs (rack units) using four 7U rackmount rails.
Q. Can I mix Sun StorEdge A5000/A3500/A1000/D1000 arrays and Sun StorEdge T3 arrays in a Sun StorEdge 72-inch expansion cabinet?
A. No, only the Sun StorEdge T3 arrays can be configured together in a Sun StorEdge expansion cabinet at this time.
UPDATE: That is the "official" answer. The unofficial answer is "yes," you can mix them. The marketing docs just have not yet been updated to reflect this. There is a publicly available T3 FAQ that says you can do it, though. It also contains some rules for how they must be configured.
There should be an official statement for this soon. For now, see #12 here:
- External T3ES FAQ
- External T3WG FAQ
TARGET/ADDRESS QUESTIONS
Q. How can I see which target ID the T3 is set to?
A. There are two different commands on the T3 that will give that info: 'port list' and 'port listmap'. See example output below.
purple11:/:<1>port list
port targetid addr_type status host wwn
u1p1 11 hard online sun 50020f2300002b92
u2p1 12 hard online sun 50020f2300002b66

So in this case, you should have some devices in Solaris with "t" numbers of 11 and 12.
NOTICE this gives you the WWN of the bricks as well...
purple11:/:<2>port listmap
port targetid addr_type lun volume owner access
u1p1 11 hard 0 u1v1 u1 primary
u1p1 11 hard 1 u2v1 u2 failover
u2p1 12 hard 0 u1v1 u1 failover
u2p1 12 hard 1 u2v1 u2 primary

Here we see that "mp_support" is set to rw and we are presenting two paths to each lun. The "targetid" column shows what the "t" number will be, while the "lun" column shows the "d" numbers. "c" numbers are arbitrary, but with the info above (including the WWN), you should be able to map your luns to their Solaris device names.
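As a hypothetical illustration (the controller numbers depend entirely on your host's HBAs), the listmap above would produce Solaris devices along these lines:

u1p1 (target 11) -> c1t11d0 (u1v1, primary) and c1t11d1 (u2v1, failover)
u2p1 (target 12) -> c2t12d0 (u1v1, failover) and c2t12d1 (u2v1, primary)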
Q. Uh, oh. I've got a target conflict. How can I change the target ID on the T3 so that I can see all of my devices?
A. The T3 'port' command is used for this as well. Here is the syntax along with an example of setting the target to 75 on unit 1.
port set {port} targetid {value}
purple11:/:<5> port set u1p1 targetid 75

A 'reset' of the T3 will be required for the change to take effect.
You can take a look at p.10-23 of the T3 Field Service Manual for a documented example.
Q. Are there restrictions on what targets I can use for the T3?
A. Valid target IDs are 0 through 125.
GENERAL LUN QUESTIONS
Q. How many luns can a T3 have?
A. Each tray can have a maximum of 2 luns. So a partner group may have 4 luns.
Q. Can I change the layout of a lun on the fly? Can I:
- grow it?
- shrink it?
- change the RAID level?
- change the lun blocksize?
- add/remove a hot spare?
A. No, No, No, No, and...No. Any of these changes will require backing up data, deleting the lun, recreating it with the desired layout, and restoring data.
Q. Can I use any disk in the tray as a standby (hot spare)?
A. No. Drive 9 (far right) must be used as the hot spare.
Q. Ok, I'm ready to make some luns. Are there any special rules or considerations I need to worry about?
A. Here are the lun creation rules:
- luns must consist of a contiguous sequence of disks
- a disk may not be partitioned into different luns
- luns may NOT span disk trays
- 1-2 luns per tray can be configured
- RAID 1 luns can consist of 2-9 disks
- RAID 5 luns can consist of 3-9 disks
- If a hot spare is assigned, it MUST be disk 9
- If a hot spare is assigned, it MUST be assigned when the FIRST lun is created and used for ALL luns in the disk tray (see NOTE below)

NOTE: This rule is enforced by the Component Manager GUI, but is not enforced by the T3 CLI. Even though the CLI does not enforce it, users are STRONGLY ENCOURAGED to adhere to this rule to avoid experiencing unpredictable behavior that may result in the need to rebuild the luns.
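To make the rules concrete, here is a sketch of carving one tray into two luns with disk 9 as the shared hot spare (the volume names are arbitrary and the syntax should be verified against the T3 Administrator's Guide before use):

t3:/:<1>vol add v0 data u1d1-4 raid 5 standby u1d9
t3:/:<2>vol add v1 data u1d5-8 raid 1 standby u1d9
t3:/:<3>vol init v0 data
t3:/:<4>vol init v1 data
t3:/:<5>vol mount v0
t3:/:<6>vol mount v1

Note that both luns use contiguous disks, stay within the tray, and were created with u1d9 as the standby, per the rules above.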
Q. Do the luns within a tray have to be the same RAID level?
A. No. RAID levels can be mixed within the tray. But...there is a catch.
Check out the rules for creating luns in the question above. If you create a RAID 0 lun first, and later create a RAID 1 or 5, you won't be able to have a hot spare with it. If you create the RAID 1 or 5 first, you will not be able to create a RAID 0 later. Again, please see the lun creation rules above.
Q. I heard that you can make a mirrored lun out of an odd number of disks. What's up with that?
A. Yes, you can. Infodoc #22450 gives a nice explanation on it. You (and your customers) can read all about it. The thing to remember is that this is still RAID 1+0 (sort of) and can withstand multiple drive failures. But those drives cannot be adjacent. If you lose two disks next to each other (that goes for d9 and d1 too), then the data is toast.
Q. What does the lun's data blocksize parameter do? Are there any guidelines for what it should be set to?
A. This answer is lifted right from p.15 of the T3 Configuration Guide. Check the guide for any updates or changes to this information. The data block size (A.K.A. stripe unit size) is the amount of data written to each drive when striping data across drives. The block size can be changed only when there are no volumes defined. The block size can be configured as 16KB, 32KB, or 64KB. The default block size is 64KB.
A cache segment is the amount of data being read into cache. A cache segment is 1/8 of a data block. Therefore, cache segments can be 2KB, 4KB, or 8KB. Because the default block size is 64KB, the default cache segment size is 8KB.
*NOTE: The disk tray data block size is independent of I/O block size. Alignment of the two is not required.
Selecting a Data Block Size
If the I/O initiated from the host is 4KB, a data block size of 64KB would force 8KB of internal disk I/O, wasting 4KB of the cache segment. Therefore, it would be best to configure 32KB block sizes, causing 4KB physical I/O from the disk. If sequential activity occurs, full block writes (32KB) will take place. For 8KB I/O or greater from the host, use 64KB blocks.
Applications benefit from the following data block or stripe unit sizes:
*16KB data block size
- Online Transaction Processing (OLTP)
- Internet service provider (ISP)
- Enterprise Resource Planning (ERP)
*32KB data block size
- NFS file system, version 2
- Attribute-intensive NFS file system, version 3
*64KB data block size
- Data-intensive NFS file system, version 3
- Decision Support Systems (DSS)
- Data Warehouse (DW)
- High Performance Computing (HPC)
NOTE: The data block size must be configured before any logical volumes are created on the units. Remember, this block size is used for every logical volume created on the unit. Therefore it is important to have similar application data configured per unit.
NOTE: For more detailed information about configuring data block or stripe unit size, refer to the Configuration Rules for Mission Critical Storage document (A.K.A. "Big Rules" document).
Data block size is universal throughout a partner group and cannot be changed after you have created a volume. To change the data block size, you must first delete the volume(s), change the data block size, and then create new volume(s).
CAUTION: Unless you back up and restore the data on these volumes, it will be lost.
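Putting that together, here is a hedged sketch of the full blocksize change (after backing up; 'v0' and the layout shown are illustrative):

t3:/:<1>vol unmount v0
t3:/:<2>vol remove v0
t3:/:<3>sys blocksize 32k
t3:/:<4>vol add v0 data u1d1-8 raid 5 standby u1d9
t3:/:<5>vol init v0 data
t3:/:<6>vol mount v0

...then restore your data from backup.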
Q. What are these numbers in the 'vol stat' output? Is the drive okay?
A. See the table below that includes sample output from 'vol stat' along with the possible drive status numbers.
See any of the following for further details:
- p.2-27 T3 Installation, Operation, and Service Manual
- p.4-3 T3 Administrator's Guide
- p.5-2 T3 Field Service Manual
Q. I've got DMP enabled and Solaris sees 2 paths to each of the T3 luns. How do I know which path is the primary and which is the secondary?
A. There are 3 main ways to view this information.
- FROM SOLARIS:
# format -e
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t2d0 <drive type unknown>
/sbus@1f,0/SUNW,fas@e,8800000/sd@2,0
1. c0t3d0 <SUN2.1G cyl 2733 alt 2 hd 19 sec 80>
/sbus@1f,0/SUNW,fas@e,8800000/sd@3,0
2. c1t1d0 <SUN-T3-0100 cyl 34145 alt 2 hd 32 sec 128>
/sbus@1f,0/SUNW,socal@1,0/sf@0,0/ssd@w50020f2300000172,0
3. c1t1d1 <SUN-T3-0100 cyl 34145 alt 2 hd 32 sec 128>
/sbus@1f,0/SUNW,socal@1,0/sf@0,0/ssd@w50020f230000015a,0
4. c2t2d0 <SUN-T3-0100 cyl 34145 alt 2 hd 64 sec 128>
/sbus@e,0/SUNW,socal@d,0/sf@0,0/ssd@w50020f2300000172,1
5. c2t2d1 <SUN-T3-0100 cyl 34145 alt 2 hd 32 sec 128>
/sbus@e,0/SUNW,socal@d,0/sf@0,0/ssd@w50020f230000015a,1
Specify disk (enter its number): 2
selecting c1t1d0
[disk formatted]

In this example, four T3 disk devices are shown as disk numbers 2 through 5. They can be identified by the SUN-T3-0100 label.
NOTE: To view multiple device paths using the 'format' command, the mp_support setting on the disk tray must be set to "rw".
- FROM VOLUME MANAGER:
Run a 'vxdisk list {devicename}' and view the bottom of the output.
EXAMPLE:
root[ksh]@cube01# vxdisk list c1t1d1
Device: c1t1d1s2
.
.
{** OUTPUT SHORTENED FOR EXAMPLE **}
.
.
Multipathing information:
numpaths: 2
c1t1d1s2 state=enabled type=secondary
c2t2d1s2 state=enabled type=primary
- FROM THE T3:
Run a 'port listmap' to view primary and secondary paths for the luns.
EXAMPLE:
cube02:/:<1>port listmap
port targetid addr_type lun volume owner access
u1p1 1 hard 0 u1v1 u1 primary
u1p1 1 hard 1 u2v1 u2 failover
u2p1 2 hard 0 u1v1 u1 failover
u2p1 2 hard 1 u2v1 u2 primary
Q. The lun size is smaller in 'format' than it is with 'vol list' on the T3. What happened to the extra space?
A. Nothing. It's a difference in the way they calculate the size. The T3 is using 1000 bytes per KB. Solaris is using 1024 bytes per KB (the true measurement). You're not losing any actual capacity. The T3 is just making it look a little bigger than it is.
See p.2-34 of the T3 Installation, Operation, and Service Manual for an example and complete explanation.
For the same reason, you should be aware that there will be a difference in the advertised capacity of the drive and what Solaris will see as useable capacity. See the question below regarding the T3's drive capacity for further info.
UPDATE: T3+ FW 2.01 has changed this behavior. With 2.01, the T3 volume capacity will actually match 'format'. This can lead to some confusion after upgrading to 2.01 because 'vol list' will list "smaller" LUNs after the upgrade. See BugID 4521882 for details.
FIRMWARE QUESTIONS
Q. How do I check firmware on this {fru}?
A. There are 4 main firmwares that you should be aware of on the T3.
- Controller (a.k.a "boot code")
- loop card
- disk
- eprom
Check controller firmware with the 'ver' command. Check loop card, disk, and eprom revisions with the 'fru list' command. You can check out some examples below.
Q. How do I upgrade firmware on this {fru}?
A. For T3, get Patch 109115. For T3+, get Patch 112276. The README files will have all you need to know about upgrading all types of firmware.
Q. Is 1.18 the latest firmware for the T3A? Will there be any more upgrades?
A. No, 1.18.02 is the latest. There will not likely be any further upgrades to T3A firmware.
PATCH QUESTIONS
Q. What patches do I need for the T3?
A. That information would be on the matrix.
Q. What is the minimum version of StorTools I can run on the T3?
A. StorTools 3.3 (4.0 needed for Switch/SAN configs).
Q. Do I need any patches for that version of StorTools?
A. See the matrix!
Q. What version of Component Manager is supported on the T3?
A. From the matrix, I see 2.0 or later.
Q. Any patches for Component Manager needed?
A. Guess what? See the matrix!
Q. What is so great about this matrix?
A. It only has compatibility, patch, and firmware information for the T3, A5x00, and SSA, along with what versions of Solaris, Volume Manager, DiskSuite, StorTools, and Component Manager they're compatible with.
Learn it, Know it, Live it.
Customers can get it directly from SunSolve as Early Notifier #14838.
FRU QUESTIONS
Q. What is that flashing light on my {fru} telling me? How do I read these LEDs?
A. Here are the tables for the fru LEDs.
Chapter 4 of the T3 Installation, Operation, and Service Manual has further details.
Q. What is the actual capacity of the T3's disk drive(s)?
A. Here are the published byte counts for each T3-supported drive (from the specs).
Marketing GB Bytes True GB
------------ --------------- --------
18.2GB = 18,113,808,384 -> 16.87GB
36.4GB = 36,420,074,496 -> 33.92GB
73.4GB = 73,407,865,344 -> 68.37GB
181.5GB = 181,555,200,512 -> 169.09GB

These discrepancies can lead to a difference in the usable space reported by the T3 and the OS. See the question above for details.
NOTE: My definition of a "True" GB may differ from yours. I am using 1024MB/GB. Disk drive manufacturers use 1000MB/GB and market their drives accordingly. If you use 1000 as the divisor, the capacity is indeed as advertised. Solaris uses 1024MB/GB.
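For a worked example, take the 36.4GB drive above:

36,420,074,496 bytes / 1,000,000,000 = 36.42 GB (marketing, 1000-based)
36,420,074,496 bytes / 1,073,741,824 = 33.92 GB (true, 1024-based)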
Q. How many functioning fans are required to cool the T3?
A. 3 out of the 4 fans must be operating to keep the T3 up and running.
Q. So if I lose a power supply in a PCU, aren't I going to lose 2 out of the 4 fans? You just said I need 3!
A. The fans are powered through the T3's midplane, so even if one power supply fails, its fans will draw power from the other PCU.
Q. I hit the "reset" button on one of the T3 controllers in the partner group and it disabled the controller. Why? What does the "reset" button do?
A. Believe it or not, this is expected behavior in a partner group. With two active controllers, issuing a reset disables that controller (it actually does reset it, but because the other controller senses a problem, it sets a signal that prevents the controller from rebooting).
Once you have done this, you must re-enable it using 'enable u[1|2]'. If you issue a 'reset -y' or press the "reset" button on the remaining controller, the active master will reboot, but the controller that was disabled first remains disabled.
The same thing would happen if you issued a 'disable u[1|2]' and then a 'reset -y'. The disabled controller would remain disabled after the reset because you never cleared the signal.
For these reasons, the "reset" button should only be used on a partner group if the software image is hung and the ethernet as well as both console connections are unresponsive.
On the other hand, in a single brick, pressing the reset button should simply reboot the controller.
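For example, if u2 is the controller that was disabled:

t3:/:<1>enable u2
t3:/:<2>fru stat           (confirm both controllers are healthy again)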
Q. I powered off one brick of my partner group, and the other brick shut down too! Why do BOTH of them go down?
A. Remember that a partner group is designed to operate as two bricks working together as a SINGLE UNIT. If communication between the bricks is lost, then the PG will flush cache data to disk and shut down to preserve data integrity. The bottom line is that the two bricks have to be able to talk to each other.
With that in mind, there are several scenarios where you would see this or similar behavior. Here are a few that will cause an entire partner group to shut down:
- Removing power from 1 brick (pulling PCUs, power cords, shutting off switches).
- Pulling both loop cards from 1 brick.
- Disconnecting both loop interconnect cables from a PG.
- Removing any FRU for longer than 30 minutes will shut down the entire PG.
- Lack of communication at boot time (say, from the first two scenarios above) will cause the PG to hang during bootup until communication is established.
CACHE QUESTIONS
Q. What are the possible modes that cache can be in?
A. Cache mode can be set to the following values:
- AUTO (Default): The cache mode is determined as either write-behind or write-through, based on the I/O profile. If the disk tray has full redundancy available, then caching operates in write-behind mode. If any disk tray component is non-redundant, the mode is set to write-through. Read caching is always performed. Auto caching mode provides the best performance while retaining full redundancy protection.
- WRITE-BEHIND: All read and write operations are written to cache. An algorithm determines when the data is destaged or moved from cache to disk. Write-behind cache improves performance, because a write to a high-speed cache is faster than a write to a normal disk. Use write-behind mode with a standalone disk tray configuration when you want to force write-behind caching to be used.
NOTE: Write-behind cache adds a degree of risk, because the data stays in memory longer. Although it is generally no more than a few seconds until the data is written to disk, if the system crashes or is powered down before then, the data is lost. For this reason, there is a possibility of data loss in this cache mode if the units are not configured as a fully redundant partner group.
- WRITE-THROUGH: This mode forces write-through caching to be used. In write-through mode, data is written through cache in a serial manner and is then written to the disk. Write-through caching does not improve write performance. However, if a subsequent read operation needs the same data, the read performance is improved, because the data is already in cache.
- NONE: No reads or writes are cached.
NOTE: For full redundancy, set the cache mode and the mirror variables to "auto" for a partner group. This ensures that the cache is mirrored between controllers and that write-behind cache mode is in effect. If a failure occurs, the data is synchronized to disk, and then write-through mode takes effect.
Once the problem has been corrected, cache again operates in write-behind mode.
CAUTION: If one of the redundant components fails, the cache mode is set to write-through. If, however, you view the cache mode at this time using the 'vol mode' command, the setting for cache will be displayed as "writethrough" and the setting for mirror will be displayed as "on". While mirroring is enabled, it is NOT being used at this time.
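For reference, those variables are set with the 'sys' command; a minimal sketch for a partner group:

t3:/:<1>sys cache auto
t3:/:<2>sys mirror auto
t3:/:<3>vol mode           (verify the resulting cache and mirror state)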
The following sources provide more detail about cache modes and operation:
- p.28 T3 Technical Whitepaper (See the Glossary too)
- p.3-6 T3 Installation, Operation, and Service Manual
- p.13 T3 Configuration Guide
Q. How do I know what mode the cache is in?
A. Run a 'vol mode' to find out.
Q. What events can trigger write-behind caching to be disabled?
A. The following table highlights the system states that can cause write-behind cache to be disabled. In cases where write-behind cache has been disabled, the disk tray is designed to protect the data in cache. If any hardware component fails that might compromise the safety of this data, the disk tray will disable write-behind cache, causing outstanding cached data to be flushed to disk. Write-behind caching is re-enabled when the failed condition has been corrected.
This answer was lifted right from p.3-7 of the T3 Installation, Operation, and Service Manual. Please refer to it for any updates.
Q. How much data can the T3's /syslog file hold?
A. The /syslog file on the T3 has a file size limit of 1MB. After 1MB, it gets moved to /syslog.OLD (which also has a 1MB limit). A nasty problem that goes unchecked can easily fill up both files with the same repeating error. For this reason, it is highly recommended that you set up message logging to a remote host. For example, log messages to a /var/adm/messages.t300 file on your data host.
Q. What are the different categories of messages that the T3 sends out?
A. The T3 sends out messages (determined by the "loglevel" setting) in four categories. You designate in the T3's /etc/syslog.conf which of these levels to RECEIVE and where to receive them. Generally, they'll go to the T3's /syslog file and, if so configured, to a remote host (recommended).
See below for the table and description of the four message categories.
See p.4-13 of the T3 Administrator's Guide or p.C-2 of the T3 Field Service Manual for further details.
Q. What do the "logto" and "loglevel" settings do? How do they work with syslog?
A. The "logto" setting simply tells the T3 where to log the messages. It can be set to 1, *, or some "filename". It is "*" by default.
1 = Forces logging to serial console. The messages will not go to /syslog if it is set to 1.
* = Directs logging daemon to log as specified in /etc/syslog.conf.

The "loglevel" parameter specifies what level of messages the T3 will SEND out. There are five possible levels (default is 3):
0 = No logging at all
1 = Error messages only
2 = Warning and higher messages
3 = Notice and higher messages
4 = All message levels including info
So "loglevel" says what type of messages will be generated and SENT. The T3's /etc/syslog.conf says which of those will be RECEIVED and captured in a file.
NOTE: If changes are made to the /etc/syslog.conf file (on the T3), you must type 'set logto *' for those changes to take effect.
See p.A-17 in the T3 Administrator's Guide for further details.
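Here is a hedged example of a T3 /etc/syslog.conf that keeps notices in /syslog and also ships warnings-and-up to a remote loghost (standard facility.level syslog.conf syntax is assumed, and 'loghost' must be resolvable through the T3's /etc/hosts):

# illustrative T3 /etc/syslog.conf
*.notice    /syslog
*.warn      @loghost

Afterward, run 'set logto *' on the T3 so the changes take effect, and configure syslogd on the remote host to write the incoming messages somewhere useful (e.g., /var/adm/messages.t300).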
LUN RECONSTRUCTION
Q. How do I know what percentage of the volume has reconstructed?
A. Simply run a 'proc list' to get the percentage done.
Q. How long should volume reconstruction take?
A. Depending on the T3 settings & host I/Os, the reconstruction time can vary dramatically. See the table below for some good estimates.
If the host I/O is given priority (see 'sys recon_rate'), and there is a large amount of host I/O taking place, reconstruction times can increase significantly. In the case of the T3, if maximum priority is given to host I/O, and I/O activity is continuous, it could take hours or even days for a reconstruction to complete.
Q. Can I increase the reconstruction rate during reconstruction to speed up volume recovery?
A. No. The rate must be set before reconstruction takes place. If you wish, you may interrupt reconstruction, increase the rate, and then initiate reconstruction again.
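For reference, a quick sketch of checking progress and bumping the rate (the 'high' value is illustrative; check 'sys list' for the recon_rate values your firmware accepts):

t3:/:<1>proc list           (shows the percent done of the running reconstruction)
t3:/:<2>sys recon_rate high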
Q. If I lose power in the middle of volume reconstruction, will it pick up where it left off after power is restored?
A. No. It will restart reconstructing from the beginning.
Q. Will a volume reconstruct if it isn't mounted?
A. No, it needs to be mounted (on the T3, not the host) for reconstruction to take place.
BATTERY QUESTIONS
Q. How can I check the status of the batteries? How do I know the status of a refresh operation or when they are going to refresh next?
A. Run a 'refresh -s'.
purple17:/etc:<40>refresh -s
PCU1 PCU2
-----------------------------------------------------------------
U1 Recharging Pending
U2 Recharging Pending
Current Time Fri Sep 15 17:09:01 2000
Start Time Fri Sep 15 16:52:40 2000
Next Refresh Fri Sep 29 16:52:40 2000
Total time elapsed : 0 hours, 16 minutes, 21 seconds.
And while we're at it, check out what else you can do with the 'refresh' command:
refresh -c ----> start refresh
refresh -s ----> get status
refresh -i ----> re-init command file
refresh -k ----> kill current refreshing task
Q. How do I replace the battery?
A. The battery is not a FRU. If it has been determined that the battery needs to be replaced, then do so by replacing the entire Power and Cooling Unit (PCU).
UPDATE: Ever since FCO A0183 came out, the battery has been an available FRU in the SSHB. There is also a documented procedure now within the T3 Field Service Manual for replacing the battery. This procedure includes the use of hidden ".dot" commands.
So if your customer needs a battery replacement and someone from SUN will be going onsite to do it, they can order the battery FRU and replace it. If the customer does not have onsite support or wishes to do it themselves, then they should replace the entire PCU as the customer docs state.
This goes for ALL T3 batteries regardless of whether they are part of FCO A0183 or not.
See this PSIM Bulletin for details on the above.
Q. Does the cache change to writethrough mode during the battery refresh?
A. Yes, but only during the discharge period. For approximately 12 minutes (per PCU), the cache will switch to writethrough. After the discharge finishes and the battery starts to recharge, the cache will go back to writebehind mode for the remainder of the refresh cycle.
So for an entire refresh, the T3 can be in writethrough cache mode for approximately 20-25 minutes (~12 min. per PCU) of the 20 or so hours the refresh takes.
You can also find this documented in the following manuals:
- p.5-10 of the T3 Installation, Operation, and Service Manual
- p.7-8 of the T3 Field Service Manual
UPDATE: The behavior described above applies only to firmware versions prior to 2.01.00. Starting with 2.01.00, the cache will remain in writebehind during the entire scheduled battery refresh.
This applies only to the T3+; the T3A still goes to writethrough during the discharge phase. However, starting with 1.18.00 FW, that discharge phase no longer goes beyond 6 minutes. At 6 minutes, the recharge phase begins and cache returns to writebehind mode.
NOTE: When batteries recharge following a power outage, the cache will remain in writethrough for the entire time it takes to recharge (up to 12 hours).
Q. How can I tell how old the battery is?
A. Run an 'id read' on the unit and pcu in question. Here is an example:
purple17:/etc:<37>id read u1pcu1
Revision : 0000
Manufacture Week : 00032000
Battery Install Week : 00032000
Battery Life Used : 31 days, 8 hours
Battery Life Span : 730 days, 12 hours
Serial Number : 010293
Battery Warranty Date: 20000815151902
Battery Internal Flag: 0x00000000
Vendor ID : TECTROL-CAN
Model ID : 300-1454-01(50)
Q. I've got a lot of messages saying "Battery not OK." Do I have a problem?
A. Not necessarily. This will be seen during the refresh cycle that occurs every 28 days. Check for other activity around the same time to find out if it was doing a refresh. Here is an example of normal refresh activity and output:
Oct 30 15:04:41 BATD[1]: N: Battery Refreshing cycle starts from this point.
Oct 30 15:04:44 LPCT[1]: N: u1pcu1: Refreshing battery
Oct 30 15:04:48 LPCT[1]: N: u2pcu1: Refreshing battery
Oct 30 15:18:03 LPCT[1]: N: u1pcu1: Battery not OK
Oct 30 15:18:05 BATD[1]: N: u1pcu1: hold time was 803 seconds.
Oct 30 15:18:21 LPCT[1]: N: u2pcu1: Battery not OK
Oct 30 15:18:24 BATD[1]: N: u2pcu1: hold time was 818 seconds.
Oct 31 02:46:50 LPCT[1]: N: u1pcu2: Refreshing battery
Oct 31 02:46:54 LPCT[1]: N: u2pcu2: Refreshing battery
Oct 31 02:59:57 LPCT[1]: N: u2pcu2: Battery not OK
Oct 31 03:00:00 BATD[1]: N: u2pcu2: hold time was 788 seconds.
Oct 31 03:03:57 LPCT[1]: N: u1pcu2: Battery not OK
Oct 31 03:03:59 BATD[1]: N: u1pcu2: hold time was 1031 seconds.
Oct 31 12:54:54 BATD[1]: N: Battery Refreshing cycle ends at this point.
You will also see "Battery not OK" when power is restored to the T3 following an outage or plugging power cords in after relocating T3(s).
Q. How long does a refresh cycle take?
A. A normal refresh cycle takes around 20-24 hours to complete.
For redundancy, only 1 battery per brick is refreshed at a time. The cycle begins with a discharge of the battery that takes 6-20 minutes, followed by the recharge which takes 10-12 hours. When pcu1 finishes, pcu2 goes through the same process.
A partner group finishes a refresh cycle just as fast as a single brick because the u1-2pcu1's refresh simultaneously, followed by the u1-2pcu2's.
UPDATE: Starting with FW 1.18.00, the discharge automatically ends at 6 minutes. If the discharge lasts the full 6 minutes, the battery is considered good and immediately begins the recharge. If the battery discharges in under 6 minutes, it is flagged with a "Hold Time Low" and should be replaced.
Q. How long does the T3 battery last?
A. From the 'id read' output in the question above, we see 730 days...or 2 years. After that, it must be replaced by replacing the PCU (or the battery itself; see the battery replacement question above).
You can find this info and more battery related stuff here:
- p.7-6 of T3 Field Service Manual
- p.5-10 of T3 Installation, Operation, and Service Manual