Oracle® Clusterware Administration and Deployment Guide 11g Release 1 (11.1) Part Number B28255-01
This appendix introduces monitoring the Oracle Clusterware environment and explains how you can enable dynamic debugging to troubleshoot Oracle Clusterware processing, and enable debugging and tracing for specific components and specific Oracle Clusterware resources to focus your troubleshooting efforts.
This appendix contains the following topics:
Clusterware Log Files and the Unified Log Directory Structure
Enabling Additional Tracing for Oracle Clusterware High Availability
You can use Oracle Enterprise Manager to monitor the Oracle Clusterware environment. When you log in to Oracle Enterprise Manager using a client browser, the Cluster Database Home page appears, where you can monitor the status of both the cluster database and the Oracle Clusterware environment. Monitoring can include:
Notification if there are any VIP relocations
Status of the Oracle Clusterware on each node of the cluster using information obtained through the Cluster Verification Utility (cluvfy)
Notification if node applications (nodeapps) start or stop
Notification of issues in the Oracle Clusterware alert log for the OCR, voting disk issues (if any), and node evictions
The Cluster Database Home page is similar to a single-instance Database Home page. However, on the Cluster Database Home page, Oracle Enterprise Manager displays the system state and availability. This includes a summary about alert messages and job activity, as well as links to all the database and Automatic Storage Management (ASM) instances. For example, you can track problems with services on the cluster including when a service is not running on all of the preferred instances or when a service response time threshold is not being met.
You can use the Oracle Enterprise Manager Interconnects page to monitor the Oracle Clusterware environment. The Interconnects page shows the public and private interfaces on the cluster, the overall throughput on the private interconnect, individual throughput on each of the network interfaces, error rates (if any) and the load contributed by database instances on the interconnect, including:
Overall throughput across the private interconnect
Notification if a database instance is using the public interface due to misconfiguration
Throughput and errors (if any) on the interconnect
Throughput contributed by individual instances on the interconnect
All of this information is also available as collections that have a historical view. This is useful in conjunction with cluster cache coherency, such as when diagnosing problems related to cluster wait events. You can access the Interconnects page by clicking the Interconnect tab on the Cluster Database home page.
Also, the Oracle Enterprise Manager Cluster Database Performance page provides a quick glimpse of the performance statistics for a database. Statistics are rolled up across all the instances in the cluster database in charts. Using the links next to the charts, you can get more specific information and perform any of the following tasks:
Identify the causes of performance issues.
Decide whether resources need to be added or redistributed.
Tune your SQL plan and schema for better optimization.
Resolve performance issues.
The charts on the Cluster Database Performance page include the following:
Chart for Cluster Host Load Average—The Cluster Host Load Average chart in the Cluster Database Performance page shows potential problems that are outside the database. The chart shows maximum, average, and minimum load values for available nodes in the cluster for the previous hour.
Chart for Global Cache Block Access Latency—Each cluster database instance has its own buffer cache in its System Global Area (SGA). Using Cache Fusion, Oracle RAC environments logically combine each instance's buffer cache to enable the database instances to process data as if the data resided on a logically combined, single cache.
Chart for Average Active Sessions—The Average Active Sessions chart in the Cluster Database Performance page shows potential problems inside the database. Categories, called wait classes, show how much of the database is using a resource, such as CPU or disk I/O. Comparing CPU time to wait time helps to determine how much of the response time is consumed with useful work rather than waiting for resources that are potentially held by other processes.
Chart for Database Throughput—The Database Throughput charts summarize any resource contention that appears in the Average Active Sessions chart, and also show how much work the database is performing on behalf of the users or applications. The Per Second view shows the number of transactions compared to the number of logons, and the amount of physical reads compared to the redo size for each second. The Per Transaction view shows the amount of physical reads compared to the redo size for each transaction. Logons is the number of users that are logged on to the database.
In addition, the Top Activity drill-down menu on the Cluster Database Performance page enables you to see the activity by wait events, services, and instances. You can also see details about SQL statements and sessions at an earlier point in time by moving the slider on the chart.
You can use crsctl commands as the root user to enable dynamic debugging for Oracle Clusterware, the Event Manager (EVM), and the clusterware subcomponents. You can dynamically change debugging levels using crsctl commands. Debugging information remains in the Oracle Cluster Registry (OCR) for use during the next startup. You can also enable debugging for resources.
The crsctl syntax to enable debugging for Oracle Clusterware is:
crsctl debug log crs "CRSRTI:1,CRSCOMM:2"
The crsctl syntax to enable debugging for EVM is:
crsctl debug log evm "EVMCOMM:1"
The crsctl syntax to enable debugging for resources is:
crsctl debug log res "resname:1"
You can use crsctl commands as the root user to enable dynamic debugging for the Oracle Clusterware Cluster Ready Services (CRS), Oracle Cluster Registry (OCR), Cluster Synchronization Services (CSS), and the Event Manager (EVM).
You can enable debugging for the CRS, OCR, CSS, and EVM modules and their components by setting environment variables or by issuing crsctl debug commands using the following syntax:
crsctl debug log module_name component:debugging_level
You must issue the crsctl debug command as the root user, and supply the following information:
module_name: The name of the module, such as crs, evm, or res, as used in the examples that follow.
component: The name of a component for the CRS, OCR, EVM, or CSS module. See Table F-1 for a list of all of the components.
debugging_level: A number from 1 to 5 that indicates the level of detail you want the debug command to return, where 1 is the least amount of debugging output and 5 provides the most detailed debugging output.
You can dynamically change the debugging level in the crsctl command, or you can configure an initialization file for changing the debugging level as described in "Creating an Initialization File to Contain the Debugging Level".
The following commands show examples of how to enable debugging for the various modules:
To enable debugging for Oracle Clusterware:
crsctl debug log crs "CRSRTI:1,CRSCOMM:2"
To enable debugging for OCR:
crsctl debug log crs "CRSRTI:1,CRSCOMM:2,OCRSRV:4"
To enable debugging for EVM:
crsctl debug log evm "EVMCOMM:1"
To enable debugging for resources:
crsctl debug log res "resname:1"
To list the components that can be used for debugging, issue the crsctl lsmodules command using the following syntax and supply crs, evm, or css for the module_name parameter:
crsctl lsmodules module_name
Note:
You do not have to be the root user to run the crsctl command with the lsmodules option.
Table F-1 shows the components for the CRS, OCR, EVM, and CSS modules, respectively. Note that some of the component names are common between the CRS, EVM, and CSS daemons and may be enabled on that specific daemon. For example, COMMNS is the NS layer, and because each daemon uses the NS layer, you can enable this specific module component on any of the daemons to get specific debugging information.
Table F-1 Components for the CRS, OCR, EVM, and CSS Modules
Module | Components |
---|---|
CRS ModulesFoot 1 | CRSUI, CRSCOMM, CRSRTI, CRSMAIN, CRSPLACE, CRSAPP, CRSRES, CRSCOMM, CRSOCR, CRSTIMER, CRSEVT, CRSD, CLUCLS, CSSCLNT, COMMCRS, COMMNS |
OCR ModulesFoot 2 | OCRAPI, OCRCLI, OCRSRV, OCRMAS, OCRMSG, OCRCAC, OCRRAW, OCRUTL, OCROSD; OCR tools modules: OCRCONF, OCRDUMP, OCRCHECK |
EVM ModulesFoot 3 | EVMD, EVMDMAIN, EVMCOMM, EVMEVT, EVMAPP, EVMAGENT, CRSOCR, CLUCLS, CSSCLNT, COMMCRS, COMMNS |
CSS ModulesFoot 4 | CSSD, COMMCRS, COMMNS |
Footnote 1 List the CRS component modules using the crsctl lsmodules crs command.
Footnote 2 You cannot list the OCR modules using the crsctl lsmodules command.
Footnote 3 List the EVM component modules using the crsctl lsmodules evm command.
Footnote 4 List the CSS component modules using the crsctl lsmodules css command.
This section describes how to specify the debugging level in an initialization file. This debugging information is stored for use during the next startup.
For each process that you want to debug, you can create an initialization file that contains the debugging level.
The initialization file name includes the name of the process that you are debugging (process_name.ini). The file is located in the Oracle_home/log/hostname/admin/ directory.
For example, ORACLE_HOME/log/hostA/admin/clscfg.ini is the name for the CLSCFG debugging initialization file on hostA.
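A small, hypothetical shell helper can compose the initialization file path described above from an Oracle home, a process name, and a host name. It only builds the path and does not write a debugging level into the file; the Oracle home used in the example call is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: prints <oracle_home>/log/<host>/admin/<process>.ini
# Arguments: oracle_home process_name [host_name]
# When host_name is omitted, the output of `hostname` is used.
debug_ini_path() {
    oracle_home=$1
    process=$2
    host=${3:-$(hostname)}
    printf '%s/log/%s/admin/%s.ini\n' "$oracle_home" "$host" "$process"
}

# Example: the CLSCFG file on hostA (placeholder Oracle home)
debug_ini_path /u01/app/oracle/product/11.1.0/db_1 clscfg hostA
```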
See Also:
"Enabling Debugging for CRS, OCR, CSS, and EVM Modules" for information about dynamically changing debugging levels by specifying the level number (from 1 to 5) on the crsctl command.
You can start or stop Oracle Clusterware by issuing the crsctl start crs and crsctl stop crs commands.
Example 1 Stopping Oracle Clusterware
To stop Oracle Clusterware and its related resources on a specific node, issue the following command:
crsctl stop crs
Example 2 Starting Oracle Clusterware
To start Oracle Clusterware and its related resources on a specific node, issue the following command:
crsctl start crs
Note:
You must run these crsctl commands as the root user.
When the Oracle Clusterware daemons are enabled, they start automatically at the time the node is started. To prevent the daemons from starting, you can disable them using crsctl commands. You can use crsctl commands as follows to enable and disable the startup of the Oracle Clusterware daemons.
Issue the following command to enable startup for all of the Oracle Clusterware daemons:
crsctl enable crs
Issue the following command to disable the startup of all of the Oracle Clusterware daemons:
crsctl disable crs
Note:
You must run these crsctl commands as the root user.
You can determine the active version of the cluster or the software version on the local node by issuing the crsctl query crs activeversion and crsctl query crs softwareversion commands.
The software version is the binary version of the software on a particular cluster node.
The active version is the lowest software version running in a cluster.
These versions are used while upgrading a cluster.
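Because the active version is the lowest software version in the cluster, you can think of it as a minimum over the per-node software versions. The sketch below assumes you have already collected each node's version string (for example, by running crsctl query crs softwareversion on each node) and relies on GNU sort -V for version-aware ordering. lowest_version is a hypothetical helper and the version strings are example values; it illustrates the rule and does not replace crsctl query crs activeversion.

```shell
#!/bin/sh
# Illustration of the "active version = lowest software version" rule.
# Takes version strings as arguments; requires GNU coreutils sort (-V).
lowest_version() {
    printf '%s\n' "$@" | sort -V | head -n 1
}

# A plain lexical sort would wrongly place 10.2.0.10.0 before 10.2.0.9.0.
lowest_version 11.1.0.6.0 10.2.0.10.0 10.2.0.9.0
```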
Example 1 Determining the Active Version
To determine the active version on the local node, issue the following command:
crsctl query crs activeversion
Example 2 Determining the Software Version
To determine the software version on the local node, issue the following command:
crsctl query crs softwareversion
Every time an Oracle Clusterware error occurs, you should run the diagcollection.pl script to collect diagnostic information from Oracle Clusterware in trace files. The diagnostics provide additional information so Oracle Support can resolve problems. Run this script from the following location:
CRS_home/bin/diagcollection.pl
Note:
You must run this script as the root user.
Oracle Clusterware posts alert messages when important events occur. The following are examples of alert messages from the CRSD process:
[NORMAL] CLSD-1201: CRSD started on host %s
[ERROR] CLSD-1202: CRSD aborted on host %s. Error [%s]. Details in %s.
[ERROR] CLSD-1203: Failover failed for the CRS resource %s. Details in %s.
[NORMAL] CLSD-1204: Recovering CRS resources for host %s
[ERROR] CLSD-1205: Auto-start failed for the CRS resource %s. Details in %s.
The location of this alert log on Linux, UNIX, and Windows systems is in the following directory path, where CRS_home is the name of the location of Oracle Clusterware: CRS_home/log/hostname/alerthostname.log.
The following example shows an EVMD alert:
[NORMAL] CLSD-1401: EVMD started on node %s
[ERROR] CLSD-1402: EVMD aborted on node %s. Error [%s]. Details in %s.
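A minimal sketch for reviewing the alert log builds the CRS_home/log/hostname/alerthostname.log path described above and prints only the lines tagged [ERROR]. It assumes the log lines carry the [NORMAL] and [ERROR] tags shown in the examples; adjust the pattern if your alert log uses a different layout. Both function names, and the /u01/crs path in the usage comment, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for scanning the Oracle Clusterware alert log.

# Prints CRS_home/log/<host>/alert<host>.log
alert_log_path() {
    crs_home=$1
    host=$2
    printf '%s/log/%s/alert%s.log\n' "$crs_home" "$host" "$host"
}

# Prints lines containing the literal tag "[ERROR]". -F makes grep match a
# fixed string, so the brackets are not treated as a character class.
alert_errors() {
    grep -F '[ERROR]' "$1"
}

# Usage (as a user that can read the log):
#   alert_errors "$(alert_log_path /u01/crs node1)"
```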
You can use the crsctl command to enable resource debugging using the following syntax:
crsctl debug log res "ora.node1.vip:1"
This has the effect of setting the environment variable USR_ORA_DEBUG to 1 before running the start, stop, or check action scripts for the ora.node1.vip resource.
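The following skeleton sketches how a custom action script might honor USR_ORA_DEBUG. It is a hypothetical example, not an Oracle-supplied script: it treats an unset or non-numeric value as 0, and at level 1 or higher it turns on shell tracing (set -x) so that each subsequent command is echoed to standard error. The start, stop, and check branches are placeholders for real resource logic.

```shell
#!/bin/sh
# Hypothetical action script skeleton, written as a function whose body runs
# in a subshell ( ... ) so that "set -x" does not leak into the caller.
action_script() (
    level=${USR_ORA_DEBUG:-0}
    case $level in
        ''|*[!0-9]*) level=0 ;;    # non-numeric value: debugging off
    esac
    if [ "$level" -ge 1 ]; then
        echo "debug level $level enabled for action '$1'" >&2
        set -x                     # trace every following command to stderr
    fi
    case $1 in
        start|stop|check)
            echo "$1: ok" ;;       # placeholder for the real resource logic
        *)
            echo "usage: action_script start|stop|check" >&2
            exit 1 ;;
    esac
)
```

Running USR_ORA_DEBUG=1 action_script check prints the same result as without the variable, plus a debug notice and trace lines on standard error.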
Note:
You must run this crsctl command as the root user.
Use the crsctl check command to determine the health of your clusterware as in the following example:
crsctl check crs
Issue the following command to determine the health of individual daemons, where daemon is crsd, cssd, or evmd:
crsctl check daemon
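To check all three daemons in one pass, a short loop is enough. In this sketch the CRSCTL variable (a hypothetical name introduced here) lets you point at a different crsctl binary or substitute echo for a dry run. It assumes that crsctl check returns a nonzero exit status for an unhealthy daemon; if your release reports problems only in its text output, inspect that output instead.

```shell
#!/bin/sh
# Runs "crsctl check <daemon>" for crsd, cssd, and evmd in turn and returns
# nonzero if any check fails. CRSCTL (hypothetical) overrides the command.
check_daemons() {
    crsctl_cmd=${CRSCTL:-crsctl}
    status=0
    for daemon in crsd cssd evmd; do
        if ! "$crsctl_cmd" check "$daemon"; then
            echo "health check failed for $daemon" >&2
            status=1
        fi
    done
    return "$status"
}

# Dry run: prints the three check arguments instead of calling crsctl.
#   CRSCTL=echo check_daemons
```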
Note:
You do not have to be the root user to perform health checks.
Oracle uses a unified log directory structure to consolidate the Oracle Clusterware component log files. This consolidated structure simplifies diagnostic information collection and assists during data retrieval and problem analysis.
Oracle retains five files that are 20MB in size for the CSSD process and one file that is 10MB in size for each of the CRSD and EVMD processes. In addition, Oracle deletes the oldest log file in any log file group when the group's files exceed the 10MB storage limit. Log files are stored in the directory structures shown in Table F-2.
Table F-2 Locations of Oracle Clusterware Component Log Files
Component | Log File LocationFoot 1 |
---|---|
Oracle Cluster Registry (OCR) | The OCR tools (OCRDUMP, OCRCHECK, OCRCONFIG) record log information in the following location:Foot 2 CRS_home/log/hostname/client. The OCR server records log information in the following location:Foot 3 CRS_home/log/hostname/crsd |
Cluster Synchronization Services (CSS) | The following path is specific to LinuxFoot 4: CRS_home/log/hostname/cssd |
Oracle RAC RACG | The Oracle RAC high availability trace files are located in the following two locations: CRS_home/log/hostname/racg and $ORACLE_HOME/log/hostname/racg. Core files are in subdirectories of the log directory. Each RACG executable has a subdirectory assigned exclusively for that executable. The name of the RACG executable subdirectory is the same as the name of the executable. |
Footnote 1 The directory structure is the same for Linux, UNIX, and Windows systems.
Footnote 2 To change the amount of logging, edit the CRS_home/srvm/admin/ocrlog.ini file.
Footnote 3 To change the amount of logging, edit the CRS_home/log/hostname/crsd/crsd.ini file.
Footnote 4 This path is dependent upon the installed Linux or UNIX platform.
This section explains how to use the OCRDUMP utility to view OCR content for troubleshooting. The OCRDUMP utility enables you to view the OCR contents by writing OCR content to a file or stdout in a readable format.
You can use a number of options for OCRDUMP. For example, you can limit the output to a key and its descendants. You can also write the contents to an XML file that you can view using a browser. OCRDUMP writes the OCR keys as ASCII strings and values in a datatype format. OCRDUMP retrieves header information on a best-effort basis.
OCRDUMP also creates a log file in CRS_home/log/hostname/client. To change the amount of logging, edit the file CRS_home/srvm/admin/ocrlog.ini.
To change the logging level for a component, edit the comploglvl= entry. For example, to change the logging of the OCRAPI component to 3 and to change the logging of the OCRRAW component to 5, make the following entry in the ocrlog.ini file:
comploglvl="OCRAPI:3;OCRRAW:5"
Note:
Make sure that you have file creation privileges in the CRS_home directory before using the OCRDUMP utility.
This section describes the OCRDUMP utility command syntax and usage. Run the ocrdump command with the following syntax, where file_name is the name of a target file to which you want Oracle to write the OCR output and where keyname is the name of a key from which you want Oracle to write OCR subtree content:
ocrdump [file_name|-stdout] [-backupfile backup_file_name] [-keyname keyname] [-xml] [-noheader]
Table F-3 describes the OCRDUMP utility options and option descriptions.
Table F-3 OCRDUMP Options and Option Descriptions
Options | Description |
---|---|
file_name | The name of a file to which you want OCRDUMP to write output. By default, output from the OCRDUMP utility is written to the predefined output file named OCRDUMPFILE. |
-stdout | Use this option to redirect the OCRDUMP output to the text terminal that initiated the program. If you do not redirect the output, output from the OCRDUMP utility is written to the predefined output file named OCRDUMPFILE. |
-keyname keyname | The name of an OCR key whose subtree is to be dumped. |
-xml | Writes the output in XML format. |
-noheader | Does not print the time at which you ran the command and when the OCR configuration occurred. |
-backupfile | Option to identify a backup file. |
backup_file_name | The name of the backup file with the content you want to view. You can query the backups using the |
The following ocrdump utility examples extract various types of OCR information and write it to various targets:
ocrdump
Writes the OCR content to a file called OCRDUMPFILE in the current directory.
ocrdump MYFILE
Writes the OCR content to a file called MYFILE in the current directory.
ocrdump -stdout -keyname SYSTEM
Writes the OCR content from the subtree of the key SYSTEM to stdout.
ocrdump -stdout -xml
Writes the OCR content to stdout in XML format.
The following OCRDUMP examples show the KEYNAME, VALUE TYPE, VALUE, permission set (user, group, world) and access rights for two sample runs of the ocrdump command. The following shows the output for the SYSTEM.language key that has a text value of AMERICAN_AMERICA.WE8ASCII37.
[SYSTEM.language]
ORATEXT : AMERICAN_AMERICA.WE8ASCII37
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : user, GROUP_NAME : group }
The following shows the output for the SYSTEM.version key that has an integer value of 3:
[SYSTEM.version]
UB4 (10) : 3
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : user, GROUP_NAME : group }
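When you capture ocrdump -stdout output in a script, you may want the value of a single key. The awk sketch below assumes the three-line layout of the samples above: a [KEY] line, then a VALUE TYPE : VALUE line, then a SECURITY line. ocrdump_value is a hypothetical helper; it reads the dump on standard input and prints the text after the first " : " on the line that follows the matching key.

```shell
#!/bin/sh
# Hypothetical helper: ocrdump_value <key> < ocrdump_text_output
# Prints nothing if the key is not found.
ocrdump_value() {
    awk -v key="[$1]" '
        $0 == key {                    # found the "[KEY]" header line
            if ((getline line) > 0) {  # next line: "TYPE : value"
                sub(/^[^:]*: /, "", line)
                print line
            }
            exit
        }
    '
}

# Usage:
#   ocrdump -stdout -keyname SYSTEM | ocrdump_value SYSTEM.version
```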
The OCRCHECK utility displays the version of the OCR's block format, total space available and used space, OCRID, and the OCR locations that you have configured. OCRCHECK performs a block-by-block checksum operation for all of the blocks in all of the OCRs that you have configured. It also returns an individual status for each file as well as a result for the overall OCR integrity check.
The following example shows a sample of the OCRCHECK utility output:
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262144
Used space (kbytes) : 16256
Available space (kbytes) : 245888
ID : 1918913332
Device/File Name : /dev/raw/raw1
Device/File integrity check succeeded
Device/File Name : /dev/raw/raw2
Device/File integrity check succeeded
Cluster registry integrity check succeeded
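If you run OCRCHECK from a monitoring script, you can reduce its output to a pass or fail result and a usage figure. The sketch assumes the label text shown in the sample output (for example "Total space (kbytes)" and "Cluster registry integrity check succeeded"); both helper names are hypothetical. For the sample above, 16256 of 262144 KB used works out to about 6.2%.

```shell
#!/bin/sh
# Hypothetical helpers that read OCRCHECK output on standard input.

# Prints used space as a percentage of total space, to one decimal place.
ocr_used_pct() {
    awk -F: '
        /Total space \(kbytes\)/ { total = $2 + 0 }
        /Used space \(kbytes\)/  { used  = $2 + 0 }
        END { if (total > 0) printf "%.1f\n", used * 100 / total }
    '
}

# Succeeds only if the overall integrity check line is present.
ocr_integrity_ok() {
    grep -q 'Cluster registry integrity check succeeded'
}

# Usage:
#   out=$(ocrcheck)
#   printf '%s\n' "$out" | ocr_integrity_ok || echo "OCR integrity check failed" >&2
#   printf '%s\n' "$out" | ocr_used_pct
```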
OCRCHECK creates a log file in the directory CRS_home/log/hostname/client. To change the amount of logging, edit the file CRS_home/srvm/admin/ocrlog.ini.
Table F-4 describes common OCR problems with corresponding resolution suggestions.
Table F-4 Common OCR Problems and Solutions
Problem | Solution |
---|---|
Not currently using OCR mirroring and would like to enable it. | Run the |
An OCR failed and you need to replace it. Error messages in Enterprise Manager or OCR log file. | Run the |
An OCR has a misconfiguration. | Run the |
You are experiencing a severe performance effect from OCR processing or you want to remove an OCR for other reasons. | Run the |
An OCR has failed and before you can fix it, the node needs to be rebooted with only one OCR. | Run the |
Oracle Support may ask you to enable tracing to capture additional information. Because the procedures described in this section may affect performance, perform these activities only with the assistance of Oracle Support.
To generate additional trace information for a running resource, Oracle recommends that you use CRSCTL commands. For example, issue the following command to turn on debugging for resources:
$ crsctl debug log res "resource_name:level"
For example, to set the value of the USR_ORA_DEBUG initialization parameter to 1 for the VIP resource, issue the following command:
$ crsctl debug log res ora.cwclu011.vip:1
The event manager daemons (evmd) running on separate nodes communicate through specific ports. To determine whether the evmd for a node can send and receive messages, perform the following test, running session 1 in the background.
On node 1, in session 1, enter:
$ evmwatch -A -t "@timestamp @@"
On node 2, in session 2, enter:
$ evmpost -u "hello" [-h nodename]
Session 1 should show output similar to the following:
$ 21-Jul-2007 08:04:26 hello
Ensure that each node can both send and receive messages by executing this test in several permutations.
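To cover every direction, you can enumerate each ordered pair of nodes, where one node runs evmwatch and another runs evmpost. The sketch below only prints a test plan and does not run either command; evm_test_plan is a hypothetical helper and the node names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: prints every ordered (watcher, poster) node pair so
# that each node is tested as both a sender and a receiver.
evm_test_plan() {
    for watcher in "$@"; do
        for poster in "$@"; do
            [ "$watcher" = "$poster" ] && continue
            printf 'evmwatch on %s, evmpost on %s\n' "$watcher" "$poster"
        done
    done
}

evm_test_plan node1 node2 node3    # prints 6 pairs
```

With n nodes the plan has n × (n - 1) entries, so a three-node cluster needs six runs of the test.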