Saturday, 13 September, 2014 11:33 Written by Brian B
by Brian Bream and Suzanne Zorn
Published September 2014
Proactive monitoring of the components in Oracle Exadata Database Machine (also called Oracle Exadata) can help ensure the highest levels of system availability and performance. This article provides a high-level overview of using Oracle Enterprise Manager Cloud Control 12c and command-line utilities to monitor Oracle Exadata Storage Servers.
More detailed coverage of monitoring Oracle Exadata, including hands-on exercises, is included in the Oracle University class Exadata Database Machine Administration Workshop.
Oracle Exadata Database Machine—an engineered system with preconfigured, pretuned, and pretested hardware and software components—is designed to be the highest performing and most available platform for running Oracle Database. Components include database servers (also called compute nodes), Oracle Exadata Storage Servers (also called storage cells), Oracle's Sun Datacenter InfiniBand Switch 36 switches, and Exadata Smart Flash Cache.
Oracle Exadata uses several technologies to enable the monitoring of its components. These technologies include Oracle Integrated Lights Out Manager (Oracle ILOM), Simple Network Management Protocol (SNMP), and Intelligent Platform Management Interface (IPMI).
Devices run SNMP agents; these agents send status and alerts to an SNMP management console (such as Cloud Control) on the network.
There are two approaches for monitoring Oracle Exadata Storage Servers: using a command-line interface (CLI) or using the graphical interface provided by the Oracle Enterprise Manager Cloud Control 12c console.
The cellcli command is used for management and monitoring of individual Oracle Exadata storage cells. In addition, the dcli (distributed CLI) utility can be used to execute scripts and commands across multiple storage cells from a single interface.
Figure 1. Oracle Enterprise Manager Cloud Control 12c screenshot showing the status of various hardware components in Oracle Exadata Database Machine.
No third-party software—including third-party monitoring agents—should be installed on Oracle Exadata Storage Servers. However, Oracle Exadata can be configured to send SNMP alerts to other SNMP managers on the network.
Before using Oracle Enterprise Manager Cloud Control 12c with Oracle Exadata, an Oracle Management Agent and Oracle Exadata plug-in must be installed on every Oracle Exadata database server (see Figure 2). This agent monitors software targets, such as the database instances and Oracle Clusterware resources, on the database servers. The plug-in enables monitoring of other hardware components in Oracle Exadata, including the storage servers, switches, and power distribution units.
On the storage servers, the CELLSRV process provides the majority of Oracle Exadata storage services and is the primary storage software component. One of its functions is to process, collect, and store metrics. The Management Server (MS) process receives the metrics data from CELLSRV, keeps a subset of metrics in memory, and writes to an internal disk-based repository hourly. In addition, the MS process can generate alerts for important storage cell hardware or software events.
The Restart Server (RS) process is used to start up and shut down the CELLSRV and MS processes. It also monitors these services to check whether they need to be restarted.
The primary components of Oracle Enterprise Manager Cloud Control 12c are the Oracle Management Service, the Oracle Management Repository, and the Cloud Control Console. The Oracle Management Service communicates with the agents on the managed targets and stores information in the Oracle Management Repository. The Cloud Control Console provides a web-based interface for monitoring and management.
Figure 2. Oracle Enterprise Manager Cloud Control 12c monitoring architecture.
For more information on configuring Oracle Enterprise Manager Cloud Control 12c to monitor Oracle Exadata, please see the Oracle Enterprise Manager Exadata Management Getting Started Guide and the "Managing Oracle Exadata with Oracle Enterprise Manager 12c" white paper.
Note: This article focuses on using Oracle Enterprise Manager Cloud Control 12c to monitor the storage servers in Oracle Exadata. Oracle Enterprise Manager Cloud Control can also be used to monitor other Oracle Exadata hardware and software components.
Metrics, thresholds, and alerts are key monitoring concepts. Metrics are runtime properties, such as I/O requests, throughput, or the current server temperature. Alerts are important events, such as hardware failures, software errors, or configuration issues. Thresholds are defined metric levels that, if exceeded, cause an alert to be automatically triggered.
When using Oracle Enterprise Manager Cloud Control 12c, quarantine objects are created when prescribed faults are detected, so that similar faults can be avoided in the future. This capability provides increased availability of the monitored system.
The cellcli command is run on the storage cells (not on the compute nodes) to display monitoring information. The general format of the command is:
<verb> <object> <modifier> <filter>
verb specifies an action (such as list or describe).
object specifies which object the action should be performed on (for example, a cell disk).
modifier (optional) specifies how the action should be modified (for example, to apply to all disks or to a specific disk).
filter (optional) is similar to a SQL WHERE predicate, and is used to filter the command output.
The following are some basic examples:
list physicaldisk (verb and object)
list cell detail (verb, object, and modifier)
list physicaldisk where diskType='Flashdisk' (verb, object, and filter)
By default, the user cellmonitor can execute read-only queries using the cellcli command. The user celladmin can execute cellcli commands that modify the configuration.
Metrics are recorded measurements; for the storage cells, this includes measurements such as the number of I/O requests or the throughput.
The cellcli command refers to each metric using a composite of abbreviations, for example:
CD_IO_RQ_R_SM is the number of I/O requests (IO_RQ) to read (R) small blocks (SM) on a cell disk (CD)
GD_IO_BY_W_LG_SEC is the number of MB (IO_BY) of large block (LG) I/O writes (W) per second (SEC) on a grid disk (GD)
In addition, metrics have attributes, including:
metricObjectName, which is the object being measured (for example, a specific cell disk)
objectType, which is the type of object being measured (CELL, CELL_FILESYSTEM, and so on)
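As a rough illustration of how these composite names can be read, the following Python sketch splits a metric name into its abbreviations. The lookup tables contain only the abbreviations explained in this article, and the function name describe_metric is made up for this example; it is not part of cellcli.

```python
# Abbreviations taken only from the two examples in this article.
# The complete list is in the Oracle Exadata Storage Server Software User's Guide.
OBJECT_PREFIXES = {"CD": "cell disk", "GD": "grid disk"}
MEASURES = {"IO_RQ": "number of I/O requests", "IO_BY": "MB of I/O"}
QUALIFIERS = {
    "R": "read",
    "W": "write",
    "SM": "small blocks",
    "LG": "large blocks",
    "SEC": "per second",
}


def describe_metric(name):
    """Return a list of plain-language parts for a metric name (hypothetical helper)."""
    prefix, _, rest = name.partition("_")
    parts = [OBJECT_PREFIXES.get(prefix, f"unknown object '{prefix}'")]
    # The measure itself contains an underscore (IO_RQ, IO_BY), so match it first.
    for measure, text in MEASURES.items():
        if rest.startswith(measure):
            parts.append(text)
            rest = rest[len(measure):].lstrip("_")
            break
    # Whatever remains is a sequence of single qualifiers.
    for token in filter(None, rest.split("_")):
        parts.append(QUALIFIERS.get(token, f"unknown '{token}'"))
    return parts


if __name__ == "__main__":
    for metric in ("CD_IO_RQ_R_SM", "GD_IO_BY_W_LG_SEC"):
        print(metric, "->", ", ".join(describe_metric(metric)))
```

Abbreviations that are not in the tables are reported as unknown rather than guessed.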
For more details on Oracle Exadata cell metric attributes, see the Oracle Exadata Storage Server Software User's Guide.
The following examples illustrate basic usage of the cellcli command to display metrics information for Oracle Exadata storage cells.
The first example lists the metric definitions for the CELL object type. The output includes the metric CL_CPUT: its metricType is Instantaneous, its objectType is CELL, and its measurement unit is percentage.
CellCLI> LIST METRICDEFINITION WHERE objectType ='CELL' DETAIL
         name:        CL_CPUT
         description: "Cell CPU Utilization is the percentage of time over the previous minute that the system CPUs were not idle (from /proc/stat). "
         metricType:  Instantaneous
         objectType:  CELL
         unit:        %
         ...

CellCLI> LIST METRICCURRENT WHERE objectType = 'CELLDISK'
         CD_IO_TM_W_SM_RQ   CD_1_cell03   205.5 us/request
         CD_IO_TM_W_SM_RQ   CD_2_cell03   93.3 us/request
         CD_IO_TM_W_SM_RQ   CD_3_cell03   0.0 us/request
         ...

CellCLI> LIST METRICHISTORY WHERE name like 'CL_.*' -
         AND collectionTime > '2009-10-11T15:28:36-07:00'
         CL_RUNQ   cell03_2   6.0      2009-10-11T15:28:37-07:00
         CL_CPUT   cell03_2   47.6 %   2009-10-11T15:29:36-07:00
         CL_FANS   cell03_2   1        2009-10-11T15:29:36-07:00
         CL_TEMP   cell03_2   0.0 C    2009-10-11T15:29:36-07:00
         CL_RUNQ   cell03_2   5.2      2009-10-11T15:29:37-07:00
         ...
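Because metric history output is plain text, it is straightforward to post-process in a script. The following Python sketch parses lines of the shape shown in the LIST METRICHISTORY output above (name, object, value, optional unit, timestamp). The column layout is an assumption inferred from that sample output, and MetricSample and parse_history_line are names invented for this illustration.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class MetricSample(NamedTuple):
    name: str
    obj: str
    value: float
    unit: Optional[str]
    collected: datetime


def parse_history_line(line):
    """Parse one metric history line: name, object, value, [unit], timestamp."""
    parts = line.split()
    if len(parts) < 4:
        raise ValueError(f"unexpected metric line: {line!r}")
    # Everything between the value and the timestamp is treated as the unit.
    unit = " ".join(parts[3:-1]) or None
    value = float(parts[2].replace(",", ""))
    return MetricSample(parts[0], parts[1], value, unit,
                        datetime.fromisoformat(parts[-1]))


if __name__ == "__main__":
    sample = parse_history_line(
        "CL_CPUT cell03_2 47.6 % 2009-10-11T15:29:36-07:00")
    print(sample.name, sample.value, sample.unit, sample.collected.isoformat())
```

Lines without a unit, such as the CL_RUNQ samples, yield a unit of None.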
Oracle Enterprise Manager Cloud Control provides an intuitive view of Oracle Exadata status, including the status of all hardware and software components. Each storage server is a separate target in Oracle Enterprise Manager Cloud Control, and the Oracle Exadata storage servers are grouped together for collective monitoring of all storage.
The Oracle Enterprise Manager Cloud Control console makes it easy to see the status at a glance, and provides an easy way to drill down to get more detailed information. Figure 3 shows a screenshot of the console.
Figure 3. Oracle Enterprise Manager Cloud Control 12c console.
Alerts for important events that occur within Oracle Exadata storage cells should be monitored and investigated to help ensure the continued uninterrupted operation of storage. Alerts are assigned a severity of warning, critical, clear, or info. Metrics can be used to signal warning alerts or critical alerts when defined threshold values are exceeded.
Similar to metrics monitoring, the Oracle Exadata CLI or Oracle Enterprise Manager Cloud Control 12c can be used to monitor alerts. The following examples illustrate using the cellcli command to monitor storage cell alerts and create thresholds.
CellCLI> LIST ALERTDEFINITION ATTRIBUTES name, metricName, description
         ADRAlert                                          "CELL Incident Error"
         HardwareAlert                                     "Hardware Alert"
         StatefulAlert_CG_IO_RQ_LG       CG_IO_RQ_LG       "Threshold Based Stateful Alert"
         StatefulAlert_CG_IO_RQ_LG_SEC   CG_IO_RQ_LG_SEC   "Threshold Based ...Alert"
         StatefulAlert_CG_IO_RQ_SM       CG_IO_RQ_SM       "Threshold Based Stateful Alert"
         ...

CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' -
         AND examinedBy = '' DETAIL
CellCLI>
Note: This command produces output only if there are critical alerts that an administrator has not yet reviewed. No output means that there are no unreviewed critical alerts.
The following command creates a threshold for the CT_IO_WT_LG_RQ metric, which specifies the average number of milliseconds that large I/O requests have waited to be scheduled. The alert is triggered by two consecutive measurements (occurrences=2) over the threshold values. Values above one second (warning=1000) trigger a warning alert; values above two seconds (critical=2000) trigger a critical alert.
CellCLI> CREATE THRESHOLD ct_io_wt_lg_rq.interactive -
         warning=1000, critical=2000, comparison='>', -
         occurrences=2, observation=5
CellCLI>
The CREATE THRESHOLD command creates a threshold that specifies the conditions for the generation of a metric alert. The absence of output indicates that the threshold was created successfully.
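To make the "consecutive measurements" rule concrete, the following Python sketch models how a threshold with warning, critical, and occurrences values might classify a stream of measurements. It is a simplified model written for this explanation, not the Management Server's actual algorithm. It assumes a '>' comparison, ignores the observation attribute, and the function name evaluate is hypothetical.

```python
def evaluate(samples, warning, critical, occurrences):
    """Return the alert state after each sample: 'normal', 'warning', or 'critical'.

    Simplified model: a level is raised only after `occurrences` consecutive
    samples exceed that level's threshold (comparison '>').
    """
    states = []
    warn_run = 0  # consecutive samples above the warning threshold
    crit_run = 0  # consecutive samples above the critical threshold
    for value in samples:
        crit_run = crit_run + 1 if value > critical else 0
        warn_run = warn_run + 1 if value > warning else 0
        if crit_run >= occurrences:
            states.append("critical")
        elif warn_run >= occurrences:
            states.append("warning")
        else:
            states.append("normal")
    return states


if __name__ == "__main__":
    waits_ms = [500, 1200, 1500, 2500, 2600, 900]
    for value, state in zip(waits_ms, evaluate(waits_ms, 1000, 2000, 2)):
        print(f"{value:>5} ms -> {state}")
```

In this model a single spike above 1000 ms does not raise an alert; the second consecutive high measurement does.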
When alerts are triggered, they automatically appear in the Oracle Enterprise Manager Cloud Control console. Administrators can select any Oracle Exadata target, view alerts on that target, and drill down to display more details about each alert. In addition, the Cloud Control console can be used to set up rules for metric alerts. See the chapter on "Using Incident Management" in the Oracle Enterprise Manager Cloud Control Administrator's Guide for more information.
Both the CLI and Oracle Enterprise Manager Cloud Control 12c can be used to monitor storage server availability. To use the command-line approach, administrators must explicitly execute the following cellcli command on an Oracle Exadata storage server, and then check the status in the command output:
CellCLI> list cell detail
         ...
         cellsrvStatus: running
         msStatus: running
         rsStatus: running
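Checking the three status attributes by eye can be replaced by a small script. The following Python sketch scans captured "list cell detail" output for the cellsrvStatus, msStatus, and rsStatus attributes shown above and reports whether all three are running. The function names are invented for this example, and how the output is captured (for example, over SSH) is left out.

```python
SERVICES = ("cellsrvStatus", "msStatus", "rsStatus")


def service_status(list_cell_detail_output):
    """Extract the three service status attributes from 'list cell detail' text."""
    status = {}
    for line in list_cell_detail_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in SERVICES:
            status[key.strip()] = value.strip()
    return status


def all_running(status):
    """True only if every expected service is present and reported as running."""
    return all(status.get(name) == "running" for name in SERVICES)


if __name__ == "__main__":
    captured = """
         cellsrvStatus: running
         msStatus: running
         rsStatus: running
    """
    status = service_status(captured)
    print(status, "OK" if all_running(status) else "ATTENTION")
```

A missing attribute counts as a failure, so truncated output is not mistaken for a healthy cell.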
Oracle Enterprise Manager Cloud Control 12c provides a visual overview of the availability of the storage cells, with color-coded green and red status symbols to indicate available and unavailable, respectively (see Figure 4). With Oracle Enterprise Manager Cloud Control, administrators can determine the status at a glance, and then drill down to the affected components for more information.
Figure 4. Oracle Enterprise Manager Cloud Control 12c provides status information at a glance.
Oracle Enterprise Manager Cloud Control 12c makes it easy to compare metrics across multiple storage servers. Figure 5 contains a screenshot showing a comparison of the average read response times of Oracle Exadata cell disks. The built-in graphing capability easily shows the relative performance of multiple cell disks.
Figure 5. Using Oracle Enterprise Manager Cloud Control 12c to compare metrics across multiple servers.
The distributed CLI utility, dcli, can be used to execute commands across multiple servers on Oracle Exadata. However, it is much more complex to manually aggregate statistics reported in its command output and make comparisons across multiple storage servers.
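To give a sense of the manual aggregation involved, the following Python sketch averages one metric per storage cell from captured dcli output. It assumes that each output line is prefixed with the cell host name followed by a colon and a space, and that the remainder of the line has the metric name, object, and value in that order. Both the prefix format and the function name average_by_cell are assumptions made for this illustration.

```python
from collections import defaultdict
from statistics import mean


def average_by_cell(dcli_output):
    """Average the metric values reported by each cell (assumed 'host: ...' lines)."""
    values = defaultdict(list)
    for line in dcli_output.splitlines():
        host, sep, rest = line.partition(": ")
        fields = rest.split()
        if not sep or len(fields) < 3:
            continue  # skip blank or unexpected lines
        try:
            values[host.strip()].append(float(fields[2].replace(",", "")))
        except ValueError:
            continue  # third field was not numeric
    return {host: mean(samples) for host, samples in values.items()}


if __name__ == "__main__":
    captured = (
        "cell01: CD_IO_TM_W_SM_RQ CD_1_cell01 200.0 us/request\n"
        "cell01: CD_IO_TM_W_SM_RQ CD_2_cell01 100.0 us/request\n"
        "cell02: CD_IO_TM_W_SM_RQ CD_1_cell02 50.0 us/request\n"
    )
    for host, avg in sorted(average_by_cell(captured).items()):
        print(f"{host}: {avg:.1f} us/request")
```

Even this small example needs parsing, grouping, and error handling, which is the work that the graphical comparison in Figure 5 avoids.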
Oracle Enterprise Manager Cloud Control 12c provides easy-to-use, intuitive monitoring of Oracle Exadata Storage Servers. Status information is visually displayed, making it easy to pinpoint problems and then drill down for more detailed information. In addition, Oracle Enterprise Manager Cloud Control provides capabilities for easily comparing metrics across multiple storage servers.
The CLI (the cellcli command and the dcli utility) can be useful for scripting and for creating processes that need to be repeated.
Brian Bream has been involved in information technology since 1981. He currently serves as the Chief Technology Officer at Collier IT. Brian also functions as an Oracle University instructor delivering courses that focus on Oracle's engineered systems, Oracle Solaris, Oracle Linux, and Oracle's virtualization and storage solutions.
Collier IT is a full-service Platinum-level Oracle partner that provides Oracle solutions, including Oracle engineered systems, software, services, and Oracle University training. Collier IT provides its customers with complete, open, and integrated solutions, from business concept to complete implementation. Since 1991, Collier IT has specialized in creating and implementing robust infrastructure solutions for organizations of all sizes. Collier IT was a go-to partner for Sun Microsystems for ten years prior to the acquisition of Sun by Oracle in 2009. As a former Sun Executive Partner and now as a Platinum-level Oracle partner, Collier IT is aligned to provide customers with complete solutions that address their business needs.
Suzanne Zorn has over twenty years of experience as a writer and editor of technical documentation for the computer industry. Previously, Suzanne worked as a consultant for Sun Microsystems' Professional Services division specializing in system configuration and planning. Suzanne has a BS in computer science and engineering from Bucknell University and an MS in computer science from Rensselaer Polytechnic Institute.
Revision 1.0, 09/08/2014
Monday, 09 December, 2013 23:48 Written by Brian B
Taking a few minutes to cover some of the virtualization solutions from Oracle in one of our Oracle University classes
Thursday, 21 November, 2013 22:28 Written by Brian B
Something I hope that Rick and I can do with some regularity.
Tuesday, 29 October, 2013 22:09 Written by Brian B
The last blog post gave us some brief descriptions of the various scheduling classes in Solaris. I focused on the Time Sharing (TS) class since it is the default. Hopefully we can see that the TS (and the IA class for that matter) makes its decisions based on how the threads are using the CPU. Are we CPU intensive or are we I/O intensive? It works well, but it doesn’t provide the administrator fine-grain control as it relates to resource management.
To address this, the Fair Share Scheduler (FSS) was added to Solaris in the Solaris 9 release.
The primary benefit of FSS is to allow the admin an ability to identify and dispatch processes and their threads based upon their importance as determined by the business and implemented by the administrator.
We saw the complexity of the TS dispatch table in the earlier post. Here we see the FSS table has no such complexity.
In FSS we use the concept of CPU shares. These shares allow the admin a fine level of granularity to carve up CPU resources. We are no longer limited to allocating an entire CPU. The admin designates the importance of the workload by assigning to it a number of shares. You dictate importance by assigning a larger number of shares to those workloads that carry a higher importance. Shares ARE NOT the same as CPU caps or CPU resource usage. Shares simply define the relative importance of workloads in comparison to other workloads, whereas CPU resource usage is an actual measurement of consumption. A workload may be given 50% of the shares yet at a point in time may be only consuming 5% of the CPU. I look at a CPU share as a minimum guarantee of CPU allocation, not as a cap on CPU consumption.
When we assign shares to a workload, we need to be aware of the shares that are already assigned. What matters is the ratio of shares assigned to one workload compared to all of the other workloads.
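The ratio idea is easy to see with a little arithmetic. Here is a minimal Python sketch that turns share counts into each workload's fraction of the total. The workload names and share numbers are made up for the example, and the function name entitlements is hypothetical.

```python
def entitlements(shares):
    """Map each workload to its fraction of all assigned shares."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one workload needs a positive share count")
    return {name: count / total for name, count in shares.items()}


if __name__ == "__main__":
    workloads = {"db": 50, "web": 30, "batch": 20}  # made-up workloads and shares
    for name, fraction in entitlements(workloads).items():
        print(f"{name}: {fraction:.0%}")

    # Assigning 100 shares to a new workload changes everyone else's ratio,
    # even though their own share counts did not change.
    workloads["reporting"] = 100
    for name, fraction in entitlements(workloads).items():
        print(f"{name}: {fraction:.0%}")
```

In the second pass the db workload still holds 50 shares but drops from half of the total to a quarter, which is why the shares already assigned matter.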
I speak of FSS in a “Horizontal” and a “Vertical” aspect when I’m delivering for Oracle University. In Solaris 9 we were able to define projects in the /etc/project file. This is the vertical aspect. In Solaris 10 Non-Global Zones were introduced and brought with them the Horizontal aspect. I assign shares horizontally across the various zones and then vertically within each zone in the /etc/project file if needed.
By default the Non-Global zones use the default scheduling class. If the system is updated with a new default class, they will obtain the new setting when booted or rebooted. The recommended scheduler to use with Non-Global Zones is the FSS. The preferred way is to set the system default scheduler to FSS and all zones then inherit it.
To display information about the loaded scheduling classes, run priocntl -l
SYS (System Class)
TS (Time Sharing)
        Configured TS User Priority Range: -60 through 60
SDC (System Duty-Cycle Class)
FX (Fixed priority)
        Configured FX User Priority Range: 0 through 60
IA (Interactive)
        Configured IA User Priority Range: -60 through 60
priocntl can be used to view or set scheduling parameters for a specified process.
To determine the global priority of a process run ps -ecl
root@solaris:~# ps -ecl    # The c displays properties of the scheduler, we see the class (CLS) and the priority (PRI)
 F S UID  PID PPID CLS PRI ADDR   SZ WCHAN TTY  TIME CMD
 1 T   0    0    0 SYS  96    ?    0       ?    0:01 sched
 1 S   0    5    0 SDC  99    ?    0     ? ?    0:02 zpool-rp
 1 S   0    6    0 SDC  99    ?    0     ? ?    0:00 kmem_tas
 0 S   0    1    0  TS  59    ?  720     ? ?    0:00 init
 1 S   0    2    0 SYS  98    ?    0     ? ?    0:00 pageout
 1 S   0    3    0 SYS  60    ?    0     ? ?    0:01 fsflush
 1 S   0    7    0 SYS  60    ?    0     ? ?    0:00 intrd
 1 S   0    8    0 SYS  60    ?    0     ? ?    0:00 vmtasks
 0 S   0  869    1  TS  59    ? 1461     ? ?    0:05 nscd
 0 S   0   11    1  TS  59    ? 3949     ? ?    0:11 svc.star
 0 S   0   13    1  TS  59    ? 5007     ? ?    0:32 svc.conf
 0 S   0  164    1  TS  59    ?  822     ? ?    0:00 vbiosd
 0 S  16  460    1  TS  59    ? 1323     ? ?    0:00 nwamd
To set the default scheduling class use dispadmin -d FSS and then dispadmin -d to ensure it changed. Then run dispadmin -l to see that it loaded.
root@solaris:~# dispadmin -d
dispadmin: Default scheduling class is not set
root@solaris:~# dispadmin -d FSS
root@solaris:~# dispadmin -d
FSS (Fair Share)
root@solaris:~# dispadmin -l
CONFIGURED CLASSES
==================
SYS (System Class)
TS (Time Sharing)
SDC (System Duty-Cycle Class)
FX (Fixed Priority)
IA (Interactive)
FSS (Fair Share)
Manually move all of the running processes into the FSS class and then verify with the ps command.
root@solaris:~# priocntl -s -c FSS -i all
root@solaris:~# ps -ef -o class,zone,fname | grep -v CLS | sort -k2 | more
FSS global auditd
FSS global automoun
FSS global automoun
FSS global bash
FSS global bash
FSS global bonobo-a
FSS global clock-ap
FSS global console-
FSS global cron
FSS global cupsd
FSS global dbus-dae
FSS global dbus-dae
FSS global dbus-lau
FSS global dbus-lau
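Paging through that listing works, but a quick tally makes stragglers easier to spot. Here is a small Python sketch that counts processes per zone and scheduling class from captured ps -ef -o class,zone,fname output. It assumes the three-column layout shown above, and the function name classes_by_zone is made up for this example.

```python
from collections import Counter


def classes_by_zone(ps_output):
    """Count processes per (zone, class) from 'ps -ef -o class,zone,fname' text."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row (its first column is CLS).
        if len(fields) < 3 or fields[0] == "CLS":
            continue
        sched_class, zone = fields[0], fields[1]
        counts[(zone, sched_class)] += 1
    return counts


if __name__ == "__main__":
    captured = """CLS ZONE COMMAND
FSS global auditd
FSS global bash
TS global init
"""
    for (zone, sched_class), n in sorted(classes_by_zone(captured).items()):
        print(f"{zone:10} {sched_class:4} {n}")
```

Any row that is not FSS after the priocntl -s -c FSS -i all step is worth a second look.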
Finally move init over to the FSS class so all children will inherit.
root@solaris:~# ps -ecf | grep init
    root     1     0   TS  59 16:33:44 ?           0:00 /usr/sbin/init
root@solaris:~# priocntl -s -c FSS -i pid 1
root@solaris:~# ps -ecf | grep init
    root     1     0  FSS  29 16:33:44 ?           0:00 /usr/sbin/init
With the FSS all set, we now assign shares to our Non-Global Zones. From within zonecfg for each zone:
set cpu-shares=<number of shares>
To display CPU consumption run prstat -Z
Tuesday, 22 October, 2013 20:24 Written by Brian B
The Oracle Solaris kernel has a number of process scheduling classes available.
A brief review.
Timesharing (TS) This is the default class for processes and their associated kernel threads. Priorities in the class are dynamically adjusted based upon CPU utilization in an attempt to allocate processor resources evenly.
Interactive (IA) This is an enhanced version of TS. Some texts reference this in conjunction with TS, i.e. TS/IA. This class applies to the in-focus window in the GUI. It provides extra resources to processes associated with that specific window.
Fair Share Scheduler (FSS) This class is “share based” rather than priority based. The threads associated with this class are scheduled based on the associated shares assigned to them and the processor’s utilization.
Fixed-Priority (FX) Priorities for these threads are fixed regardless of how they interact with the CPU. They do not vary dynamically over the life of the thread.
System (SYS) Used to schedule kernel threads. These threads are bound, meaning that, unlike the userland threads listed above, they do not context switch off the CPU if their time quantum is consumed. They run until they are blocked or they complete.
Real-Time (RT) These threads are fixed-priority with a fixed time duration. They are one of the highest priority classes, with only interrupts carrying a higher priority.
As it relates to the priority ranges for the scheduling classes the userland classes (TS/IA/FX/FSS) carry the lowest priorities, 0-59. The SYS class is next ranging from 60-99. At the top (ignoring INT) is the RT class at 100-159.
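As a quick way to keep those ranges straight, here is a tiny Python sketch that maps a global priority (the PRI column from ps -ecl) to the band it falls in. The ranges are the ones stated above, interrupts are ignored just as in the text, and the function name priority_band is made up for this example.

```python
def priority_band(global_priority):
    """Return the band a global priority falls in, using the ranges from this post."""
    if 0 <= global_priority <= 59:
        return "userland (TS/IA/FX/FSS)"
    if 60 <= global_priority <= 99:
        return "SYS"
    if 100 <= global_priority <= 159:
        return "RT"
    raise ValueError(f"priority {global_priority} is outside 0-159")


if __name__ == "__main__":
    for pri in (29, 59, 60, 98, 100, 159):
        print(pri, "->", priority_band(pri))
```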
We can mix scheduling classes on the same system but there are some considerations to keep in mind.
TS and IA as well as FSS and RT can be in the same processor set.
We can look at how the TS class (default) makes its decisions by looking at the dispatch table itself
This table is indexed by the priority level of the thread. To understand an entry let's use priority 30 as an example.
The left most column is marked as ts_quantum -> Timesharing quantum. This specifies the time in milliseconds (identified by the RES=1000) that the thread will be allocated before it will be involuntarily context-switched off the CPU.
A context switch is the process of storing and restoring the state of a process so that execution can be resumed at the same point at a later time. We store this state in the Light Weight Process (LWP) that the thread was bound to. Basically a thread binds to an LWP. An LWP binds to a kernel thread (kthr) and the kernel thread is presented to the kernel dispatcher. When the thread is placed on the CPU (hardware strand/thread) the contents of the LWP are loaded onto the CPU and the CPU starts execution at that point. When the thread is removed from the CPU (it is preempted, the time quantum is consumed, it sleeps) the contents of the CPU registers are loaded into the LWP and then it is removed from the CPU and returns to the dispatch queue to compete again based on priority with the other threads that request access to the CPU.
So, at a priority of 30 the thread has 80 milliseconds to complete its work or it will be forced off the CPU. In the event that it does not complete its work, the system will context switch the thread off the CPU AND change its priority. We see the next column, ts_tqexp -> Timesharing time quantum expired. This column identifies the new priority of the thread. In this case ts_tqexp is 20. So, we consumed our time quantum, we were involuntarily context switched off the CPU, and we had our priority lowered when we returned to the dispatch queue. At a priority of 20 our time quantum is now 120 milliseconds. The priority is lowered to keep the thread from “hogging” the CPU, but the time quantum is increased in hopes that when we do get back on the CPU we have more time to complete our work.
The next column identifies our new priority when the thread returns from a sleep state. There is no reason to keep a thread on a CPU if there is no work to be done. When the thread enters a sleep state we leave the CPU; this is a VOLUNTARY context switch and we are placed on the sleep queue. When we leave the sleep queue we are not placed back on the CPU. We are placed back on the dispatch queue to compete with the other threads to gain access to the CPU. Since we have been off the CPU for a period of time we advance the priority. In this case we were initially dispatched at 30, we voluntarily context switched off the CPU, and when we woke we were given the priority of 53. Notice that at a priority of 53, our new time quantum is 40. The priority increased from 30 to 53 but the time quantum decreased from 80 to 40. We get you back on the CPU faster but limit the amount of time you get on the CPU.
The last two columns deal with the attempt to prevent CPU starvation. ts_maxwait is a measurement (in seconds) that, if exceeded without access to the CPU, causes the value of ts_lwait to be assigned as the new priority. Notice that for all but priority 59 this value is set to 0. So when we exceed 0 (meaning we have been off the CPU for 1 second) we are assigned the value of ts_lwait. Again using 30 as our example we would go from a priority of 30 to a priority of 53 if we were prevented access to the CPU for 1 second.
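To tie the columns together, here is a small Python sketch that models just the rows discussed in this walk-through. It contains only the values quoted in this post (priority 30 in full, plus the time quantum for priorities 20 and 53) and is not a copy of the real dispatch table. The dictionary keys, event names, and function name next_priority are made up for the illustration.

```python
# Only values quoted in this post; fields the post does not give are left out.
TS_TABLE = {
    30: {"quantum_ms": 80, "tqexp": 20, "slpret": 53, "maxwait_s": 0, "lwait": 53},
    20: {"quantum_ms": 120},
    53: {"quantum_ms": 40},
}

# Which column supplies the new priority for each event.
EVENT_COLUMN = {
    "quantum_expired": "tqexp",   # involuntary context switch
    "woke_from_sleep": "slpret",  # return from a voluntary context switch
    "starved": "lwait",           # waited longer than ts_maxwait
}


def next_priority(priority, event):
    """Look up the thread's new priority after an event (sketch, partial table)."""
    column = EVENT_COLUMN[event]
    row = TS_TABLE[priority]
    if column not in row:
        raise KeyError(f"this sketch has no {column} value for priority {priority}")
    return row[column]


if __name__ == "__main__":
    for event in EVENT_COLUMN:
        new = next_priority(30, event)
        print(f"30 --{event}--> {new} (quantum {TS_TABLE[new]['quantum_ms']} ms)")
```

Running it shows the trade-off described above: a lower priority comes with a longer quantum, and a higher priority comes with a shorter one.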
In the middle of all of this we have preemption. The Solaris Kernel is fully preemptible. All threads, even SYS threads, will be preempted if a higher priority thread hits the kernel dispatcher while a lower priority thread is running. The thread isn’t allowed to complete its time quantum, it is context switched off the CPU.
And don’t forget, there are IA, FSS, FX, SYS, RT, and INT threads that add to the chaos if allowed, which is why I provided some of the guidance I listed earlier.
We see some use of FX and quite a bit more of the FSS with Solaris zones. I’ll talk about FSS in another post.
I’ll spend a better part of a day whiteboarding all of this in the Oracle University Solaris Performance Management Class.