Monitoring Oracle Exadata Storage Servers

by Brian Bream and Suzanne Zorn

This article describes how to use Oracle Enterprise Manager Cloud Control and command-line utilities to monitor Oracle Exadata Storage Servers.

Published September 2014



Proactive monitoring of the components in Oracle Exadata Database Machine (also called Oracle Exadata) can help ensure the highest levels of system availability and performance. This article provides a high-level overview of using Oracle Enterprise Manager Cloud Control 12c and command-line utilities to monitor Oracle Exadata Storage Servers.


More detailed coverage of monitoring Oracle Exadata, including hands-on exercises, is included in the Oracle University class Exadata Database Machine Administration Workshop.

Oracle Exadata Database Machine

Oracle Exadata Database Machine—an engineered system with preconfigured, pretuned, and pretested hardware and software components—is designed to be the highest performing and most available platform for running Oracle Database. Components include database servers (also called compute nodes), Oracle Exadata Storage Servers (also called storage cells), Oracle's Sun Datacenter InfiniBand Switch 36 switches, and Exadata Smart Flash Cache.

Monitoring Technologies

Oracle Exadata uses several technologies to enable the monitoring of its components. These technologies include Oracle Integrated Lights Out Manager (Oracle ILOM), Simple Network Management Protocol (SNMP), and Intelligent Platform Management Interface (IPMI).

  • Oracle ILOM. Oracle ILOM is integrated service processor hardware and software that is preinstalled on Oracle servers, including the storage and database servers in Oracle Exadata. The service processor runs its own embedded operating system and has a dedicated Ethernet port to provide out-of-band server monitoring and management capabilities. Oracle ILOM can be accessed via a browser-based web interface or a command-line interface, and it also provides an SNMP interface and IPMI support.
  • SNMP. SNMP is an open, industry-standard protocol used to monitor and manage devices on an IP network. Oracle Exadata components—including database and storage servers, switches, and power distribution units (PDUs)—use SNMP to raise alerts and report monitoring information. SNMP also enables active management of devices, such as modifying the device configuration remotely.

    Devices run SNMP agents; these agents send status and alerts to an SNMP management console (such as Cloud Control) on the network.

  • IPMI. IPMI is an open, industry-standard protocol used primarily for remote server configuration and management across a network. In Oracle Exadata, the database and storage servers contain built-in IPMI support in Oracle ILOM.

Monitoring Tools

There are two approaches for monitoring Oracle Exadata Storage Servers: using a command-line interface (CLI) or using the graphical interface provided by the Oracle Enterprise Manager Cloud Control 12c console.

  • Command-line interface. The cellcli command is used for management and monitoring of individual Oracle Exadata storage cells. In addition, the dcli (distributed CLI) utility can be used to execute scripts and commands, such as those for shutting down compute nodes, across multiple storage cells from a single interface.
  • Oracle Enterprise Manager Cloud Control 12c. This system management platform provides integrated hardware and software management (see Figure 1). Its hardware view includes a schematic of storage cells, compute nodes, and switches, as well as hardware component alerts. Its software view includes software alerts as well as information about performance, availability, and usage organized by databases, services, and clusters.
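As a sketch of the CLI approach (the group file and host names here are hypothetical), dcli can fan a single cellcli query out to every storage cell listed in a file:

```shell
# cell_group is a hypothetical text file listing one storage cell hostname per
# line (e.g. cell01, cell02, cell03); -l selects the login user on each cell.
dcli -g cell_group -l cellmonitor "cellcli -e list physicaldisk"
```

Each line of dcli output is prefixed with the cell name, so the results of the same query can be reviewed cell by cell.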

Figure 1. Oracle Enterprise Manager Cloud Control 12c screenshot showing the status of various hardware components in Oracle Exadata Database Machine.

No third-party software—including third-party monitoring agents—should be installed on Oracle Exadata Storage Servers. However, Oracle Exadata can be configured to send SNMP alerts to other SNMP managers on the network.

Monitoring Architecture of Oracle Enterprise Manager Cloud Control

Before using Oracle Enterprise Manager Cloud Control 12c with Oracle Exadata, an Oracle Management Agent and Oracle Exadata plug-in must be installed on every Oracle Exadata database server (see Figure 2). This agent monitors software targets, such as the database instances and Oracle Clusterware resources, on the database servers. The plug-in enables monitoring of other hardware components in Oracle Exadata, including the storage servers, switches, and power distribution units.

On the storage servers, the CELLSRV process provides the majority of Oracle Exadata storage services and is the primary storage software component. One of its functions is to process, collect, and store metrics. The Management Server (MS) process receives the metrics data from CELLSRV, keeps a subset of metrics in memory, and writes to an internal disk-based repository hourly. In addition, the MS process can generate alerts for important storage cell hardware or software events.

The Restart Server (RS) process is used to start up and shut down the CELLSRV and MS processes. It also monitors these services to check whether they need to be restarted.

The primary components of Oracle Enterprise Manager Cloud Control 12c are the Oracle Management Service, the Oracle Management Repository, and the Cloud Control Console. The Oracle Management Service communicates with the agents on the managed targets and stores information in the Oracle Management Repository. The Cloud Control Console provides a web-based interface for monitoring and management.

Figure 2. Oracle Enterprise Manager Cloud Control 12c monitoring architecture.

For more information on configuring Oracle Enterprise Manager Cloud Control 12c to monitor Oracle Exadata, please see the Oracle Enterprise Manager Exadata Management Getting Started Guide and the "Managing Oracle Exadata with Oracle Enterprise Manager 12c" white paper.

Note: This article focuses on using Oracle Enterprise Manager Cloud Control 12c to monitor the storage servers in Oracle Exadata. Oracle Enterprise Manager Cloud Control can also be used to monitor other Oracle Exadata hardware and software components.

Metrics, Thresholds, and Alerts

Metrics, thresholds, and alerts are key monitoring concepts. Metrics are runtime properties, such as I/O requests, throughput, or the current server temperature. Alerts are important events, such as hardware failures, software errors, or configuration issues. Thresholds are defined metric levels that, if exceeded, cause an alert to be automatically triggered.

When using Oracle Enterprise Manager Cloud Control 12c, quarantine objects are created when prescribed faults are detected, so that similar faults can be avoided in the future. This capability provides increased availability of the monitored system.

Monitoring Metrics Using the CLI

The cellcli command is run on the storage cells (not on the compute nodes) to display monitoring information. The general format of the command is:

<verb> <object> <modifier> <filter>

Where:

  • verb specifies an action (such as list or describe).
  • object specifies which object the action should be performed on (for example, a cell disk).
  • modifier (optional) specifies how the action should be modified (for example, to apply to all disks or to a specific disk).
  • filter (optional) is similar to a SQL WHERE predicate, and is used to filter the command output.

The following are some basic examples:

list physicaldisk (verb and object)

list cell detail (verb, object, and modifier)

list physicaldisk where diskType='Flashdisk' (verb, object, and filter)

By default, the user cellmonitor can execute read-only queries using the cellcli command. The user celladmin can execute cellcli commands that modify the configuration.
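For example, a read-only monitoring session might look like the following sketch (the cell hostname is hypothetical):

```shell
# Connect to a storage cell as the read-only monitoring user
ssh cellmonitor@cell01

# Read-only queries succeed for cellmonitor...
cellcli -e "list cell detail"

# ...but commands that modify the configuration (such as creating a
# threshold) must be run as celladmin instead.
```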

Metrics Terminology

Metrics are recorded measurements; for the storage cells, this includes measurements such as the number of I/O requests or the throughput.

The cellcli command refers to each metric using a composite of abbreviations, for example:

  • CD_IO_RQ_R_SM is the number of I/O requests (IO_RQ) to read (R) small blocks (SM) on a cell disk (CD).
  • GD_IO_BY_W_LG_SEC is the number of MB (IO_BY) of large block (LG) I/O writes (W) per second (SEC) on a grid disk (GD).

In addition, metrics

  • Are associated with a metricObjectName, which is the object being measured (for example, a specific cell disk)
  • Belong to an objectType group (IORM_DATABASE, CELLDISK, CELL_FILESYSTEM, and so on)
  • Have a metricType (Cumulative, Instantaneous, Rate, Transition)
  • Have a measurement unit (for example, milliseconds, microseconds, %, °F, °C)

For more details on Oracle Exadata cell metric attributes, see the Oracle Exadata Storage Server Software User's Guide.

Example Commands

The following examples illustrate basic usage of the cellcli command to display metrics information for Oracle Exadata storage cells.

  • Example 1: Display the metric definitions for a cell. This command can be used to display detailed information about the metrics that are available for a storage cell. As this example shows, one such metric is named CL_CPUT. It is of metricType Instantaneous, it is associated with objectType CELL, and it has a measurement unit of percentage.

    CellCLI> LIST METRICDEFINITION WHERE objectType = 'CELL' DETAIL
    name: CL_CPUT
    description: "Cell CPU Utilization is the percentage of time over
    the previous minute that the system CPUs were not
    idle (from /proc/stat)."
    metricType: Instantaneous
    objectType: CELL
    unit: %
    ...
    
  • Example 2: Display the current metric values for a cell.

    CellCLI> LIST METRICCURRENT WHERE objectType = 'CELLDISK'
    CD_IO_TM_W_SM_RQ CD_1_cell03    205.5 us/request
    CD_IO_TM_W_SM_RQ CD_2_cell03    93.3  us/request
    CD_IO_TM_W_SM_RQ CD_3_cell03    0.0   us/request
    ...
    
  • Example 3: Display the metric history for a cell. This command can provide insights about the trends for the values of a metric.

    CellCLI> LIST METRICHISTORY WHERE name like 'CL_.*' -
    AND collectionTime > '2009-10-11T15:28:36-07:00'
    CL_RUNQ cell03_2 	6.0       2009-10-11T15:28:37-07:00
    CL_CPUT cell03_2 	47.6 %    2009-10-11T15:29:36-07:00
    CL_FANS cell03_2 	1         2009-10-11T15:29:36-07:00
    CL_TEMP cell03_2 	0.0 C     2009-10-11T15:29:36-07:00
    CL_RUNQ cell03_2 	5.2       2009-10-11T15:29:37-07:00
    ...
    

Monitoring Metrics Using the Oracle Enterprise Manager Cloud Control Console

Oracle Enterprise Manager Cloud Control provides an intuitive view of Oracle Exadata status, including the status of all hardware and software components. Each storage server is a separate target in Oracle Enterprise Manager Cloud Control, and the Oracle Exadata storage servers are grouped together for collective monitoring of all storage.

The Oracle Enterprise Manager Cloud Control console makes it easy to see the status at a glance, and provides an easy way to drill down to get more detailed information. Figure 3 shows a screenshot of the console.

Figure 3. Oracle Enterprise Manager Cloud Control 12c console.

Monitoring Alerts

Alerts for important events that occur within Oracle Exadata storage cells should be monitored and investigated to help ensure the continued uninterrupted operation of storage. Alerts are assigned a severity of warning, critical, clear, or info. Metrics can be used to signal warning alerts or critical alerts when defined threshold values are exceeded.

Similar to metrics monitoring, the Oracle Exadata CLI or Oracle Enterprise Manager Cloud Control 12c can be used to monitor alerts. The following examples illustrate using the cellcli command to monitor storage cell alerts and create thresholds.

  • Example 1: Display the definitions for all alerts that can be generated on the storage cell.

    CellCLI> LIST ALERTDEFINITION ATTRIBUTES name, metricName, description
    ADRAlert "CELL Incident Error"
    HardwareAlert "Hardware Alert"
    StatefulAlert_CG_IO_RQ_LG CG_IO_RQ_LG "Threshold Based Stateful Alert"
    StatefulAlert_CG_IO_RQ_LG_SEC CG_IO_RQ_LG_SEC "Threshold Based ...Alert"
    StatefulAlert_CG_IO_RQ_SM CG_IO_RQ_SM "Threshold Based Stateful Alert"
    ...
    
  • Example 2: Display the alert history for a storage cell.

    CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' -
    AND examinedBy = '' DETAIL
    CellCLI>
    

    Note: This command produces output only if there are critical alerts that have not yet been reviewed by an administrator; no output means every alert has already been examined.

  • Example 3: Create a threshold to trigger an alert. This example uses the CT_IO_WT_LG_RQ metric, which specifies the average number of milliseconds that large I/O requests have waited to be scheduled. The alert is triggered by two consecutive measurements (occurrences=2) over the threshold values. Values of one second over the threshold trigger a warning alert; values of two seconds over the threshold trigger a critical alert.

    CellCLI> CREATE THRESHOLD ct_io_wt_lg_rq.interactive -
             warning=1000, critical=2000, comparison='>', -
             occurrences=2, observation=5
    CellCLI>
    

    Note: The CREATE THRESHOLD command creates a threshold that specifies the conditions for the generation of a metric alert. The absence of an output indicates that the threshold was created successfully.

When alerts are triggered, they automatically appear in the Oracle Enterprise Manager Cloud Control console. Administrators can select any Oracle Exadata target, view alerts on that target, and drill down to display more details about each alert. In addition, the Cloud Control console can be used to set up rules for metric alerts. See the chapter on "Using Incident Management" in the Oracle Enterprise Manager Cloud Control Administrator's Guide for more information.
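Related to Example 2 above, the cellcli ALTER ALERTHISTORY command can be used to mark alerts as reviewed so that they drop out of the unexamined list. A sketch, with a hypothetical administrator name:

```shell
# Record that administrator "jdoe" has reviewed every alert in the history;
# subsequent LIST ALERTHISTORY ... examinedBy = '' queries will return nothing.
cellcli -e "ALTER ALERTHISTORY ALL examinedBy='jdoe'"
```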

Comparison: Monitoring Storage Server Availability

Both the CLI and Oracle Enterprise Manager Cloud Control 12c can be used to monitor storage server availability. To use the command-line approach, administrators must explicitly execute the following cellcli command on an Oracle Exadata storage server, and then check the status in the command output:

CellCLI> list cell detail
...
    cellsrvStatus:      running
    msStatus:           running
    rsStatus:           running

Oracle Enterprise Manager Cloud Control 12c provides a visual overview of the availability of the storage cells, with color-coded green and red status symbols to indicate available and unavailable, respectively (see Figure 4). With Oracle Enterprise Manager Cloud Control, administrators can determine the status at a glance, and then drill down to the affected components for more information.

Figure 4. Oracle Enterprise Manager Cloud Control 12c provides status information at a glance.

Comparing Metrics Across Multiple Storage Servers

Oracle Enterprise Manager Cloud Control 12c makes it easy to compare metrics across multiple storage servers. Figure 5 contains a screenshot showing a comparison of the average read response times of Oracle Exadata cell disks. The built-in graphing capability easily shows the relative performance of multiple cell disks.

Figure 5. Using Oracle Enterprise Manager Cloud Control 12c to compare metrics across multiple servers.

The distributed CLI utility, dcli, can be used to execute commands across multiple servers on Oracle Exadata. However, it is much more complex to manually aggregate statistics reported in its command output and make comparisons across multiple storage servers.
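As a sketch of what that manual work involves (the group file and the metric are chosen for illustration), an administrator might pull one metric from every cell and then compare the values by eye:

```shell
# Collect the current small-read service time per request from each cell in
# cell_group (hypothetical file of cell hostnames), then sort for comparison.
dcli -g cell_group -l cellmonitor \
  "cellcli -e list metriccurrent CD_IO_TM_R_SM_RQ" | sort
```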

Final Thoughts

Oracle Enterprise Manager Cloud Control 12c provides easy-to-use, intuitive monitoring of Oracle Exadata Storage Servers. Status information is visually displayed, making it easy to pinpoint problems and then drill down for more detailed information. In addition, Oracle Enterprise Manager Cloud Control provides capabilities for easily comparing metrics across multiple storage servers.

The CLI (cellcli command and dcli utility) can be useful for scripts and creating processes that need to be repeated.

See Also

The following resources are available for Oracle Exadata Database Machine and Oracle Enterprise Manager Cloud Control:

About the Authors

Brian Bream has been involved in information technology since 1981. He currently serves as the Chief Technology Officer at Collier IT. Brian also functions as an Oracle University instructor delivering courses that focus on Oracle's engineered systems, Oracle Solaris, Oracle Linux, and Oracle's virtualization and storage solutions.

Collier IT is a full-service Platinum-level Oracle partner that provides Oracle solutions, including Oracle engineered systems, software, services, and Oracle University training. Collier IT provides its customers with complete, open, and integrated solutions, from business concept to complete implementation. Since 1991, Collier IT has specialized in creating and implementing robust infrastructure solutions for organizations of all sizes. Collier IT was a go-to partner for Sun Microsystems for ten years prior to the acquisition of Sun by Oracle in 2009. As a former Sun Executive Partner and now as a Platinum-level Oracle partner, Collier IT is aligned to provide customers with complete solutions that address their business needs.

Suzanne Zorn has over twenty years of experience as a writer and editor of technical documentation for the computer industry. Previously, Suzanne worked as a consultant for Sun Microsystems' Professional Services division specializing in system configuration and planning. Suzanne has a BS in computer science and engineering from Bucknell University and an MS in computer science from Rensselaer Polytechnic Institute.

Revision 1.0, 09/08/2014


Oracle Virtualization

Taking a few minutes to cover some of the virtualization solutions from Oracle in one of our Oracle University classes


FSS – More Process Scheduling

The last blog post gave us some brief descriptions of the various scheduling classes in Solaris. I focused on the Time Sharing (TS) class since it is the default. Hopefully we can see that the TS class (and the IA class, for that matter) makes its decisions based on how the threads are using the CPU: are we CPU intensive or are we I/O intensive? It works well, but it doesn't give the administrator fine-grained control over resource management.

To address this, the Fair Share Scheduler (FSS) was added to Solaris in the Solaris 9 release.

The primary benefit of FSS is that it gives the administrator the ability to identify and dispatch processes and their threads based upon their importance, as determined by the business and implemented by the administrator.

We saw the complexity of the TS dispatch table in the earlier post. Here we see the FSS table has no such complexity.

FSS Dispatch Table
#
# Fair Share Scheduler Configuration
#
RES=1000
#
# Time Quantum
#
QUANTUM=110

In FSS we use the concept of CPU shares. Shares give the administrator a fine level of granularity when carving up CPU resources; we are no longer limited to allocating an entire CPU. The administrator designates the importance of a workload by assigning it a number of shares: you dictate importance by assigning a larger number of shares to those workloads that carry a higher importance. Shares ARE NOT the same as CPU caps or CPU resource usage. Shares simply define the relative importance of workloads in comparison to other workloads, whereas CPU resource usage is an actual measurement of consumption. A workload may be given 50% of the shares yet at a point in time may be consuming only 5% of the CPU. I look at a CPU share as a minimum guarantee of CPU allocation, not as a cap on CPU consumption.

When we assign shares to a workload, we need to be aware of the shares that are already assigned: what matters is the ratio of the shares assigned to one workload compared to those of all of the other workloads.
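As a quick worked example (the zone names and share counts are invented), shell arithmetic shows the entitlement ratio: if zoneA holds 20 shares and zoneB holds 10, then under full contention zoneA is entitled to 20/(20+10), roughly 66% of the CPU:

```shell
# Entitlement of each workload = its shares divided by the total shares in use.
a=20   # shares assigned to zoneA (hypothetical)
b=10   # shares assigned to zoneB (hypothetical)
echo "zoneA: $(( a * 100 / (a + b) ))%"   # prints "zoneA: 66%"
echo "zoneB: $(( b * 100 / (a + b) ))%"   # prints "zoneB: 33%"
```

Remember this is an entitlement under contention, not a cap: if zoneB is idle, zoneA may consume all of the CPU.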

I speak of FSS in a “Horizontal” and a “Vertical” aspect when I’m delivering for Oracle University. In Solaris 9 we were able to define projects in the /etc/project file. This is the vertical aspect. In Solaris 10 Non-Global Zones were introduced and brought with it the Horizontal aspect. I assign shares horizontally across the various zones and then vertically within each zone in the /etc/project file if needed.

By default the Non-Global Zones use the system's default scheduling class. If the system is updated with a new default class, the zones obtain the new setting when booted or rebooted. The recommended scheduler to use with Non-Global Zones is FSS; the preferred approach is to set the system default scheduler to FSS so that all zones then inherit it.

To display information about the loaded scheduling classes, run priocntl -l


root@solaris:~# priocntl -l
CONFIGURED CLASSES
==================

SYS (System Class)

TS (Time Sharing)
Configured TS User Priority Range: -60 through 60

SDC (System Duty-Cycle Class)

FX (Fixed priority)
Configured FX User Priority Range: 0 through 60

IA (Interactive)
Configured IA User Priority Range: -60 through 60

priocntl can be used to view or set scheduling parameters for a specified process.

To determine the global priority of a process run ps -ecl

root@solaris:~# ps -ecl    # The -c option displays scheduler properties: the class (CLS) and the priority (PRI)
 F S    UID   PID  PPID  CLS PRI     ADDR     SZ    WCHAN TTY         TIME CMD
 1 T      0     0     0  SYS  96        ?      0          ?           0:01 sched
 1 S      0     5     0  SDC  99        ?      0        ? ?           0:02 zpool-rp
 1 S      0     6     0  SDC  99        ?      0        ? ?           0:00 kmem_tas
 0 S      0     1     0   TS  59        ?    720        ? ?           0:00 init
 1 S      0     2     0  SYS  98        ?      0        ? ?           0:00 pageout
 1 S      0     3     0  SYS  60        ?      0        ? ?           0:01 fsflush
 1 S      0     7     0  SYS  60        ?      0        ? ?           0:00 intrd
 1 S      0     8     0  SYS  60        ?      0        ? ?           0:00 vmtasks
 0 S      0   869     1   TS  59        ?   1461        ? ?           0:05 nscd
 0 S      0    11     1   TS  59        ?   3949        ? ?           0:11 svc.star
 0 S      0    13     1   TS  59        ?   5007        ? ?           0:32 svc.conf
 0 S      0   164     1   TS  59        ?    822        ? ?           0:00 vbiosd
 0 S     16   460     1   TS  59        ?   1323        ? ?           0:00 nwamd

To set the default scheduling class use dispadmin -d FSS and then dispadmin -d to ensure it changed. Then run dispadmin -l to see that it loaded.

root@solaris:~# dispadmin -d
dispadmin: Default scheduling class is not set
root@solaris:~# dispadmin -d FSS
root@solaris:~# dispadmin -d
FSS	(Fair Share)
root@solaris:~# dispadmin -l
CONFIGURED CLASSES
==================

SYS	(System Class)
TS	(Time Sharing)
SDC	(System Duty-Cycle Class)
FX	(Fixed Priority)
IA	(Interactive)
FSS	(Fair Share)

Manually move all of the running processes into the FSS class and then verify with the ps command.

root@solaris:~# priocntl -s -c FSS -i all
root@solaris:~# ps -ef -o class,zone,fname | grep -v CLS | sort -k2 | more
 FSS   global auditd
 FSS   global automoun
 FSS   global automoun
 FSS   global bash
 FSS   global bash
 FSS   global bonobo-a
 FSS   global clock-ap
 FSS   global console-
 FSS   global cron
 FSS   global cupsd
 FSS   global dbus-dae
 FSS   global dbus-dae
 FSS   global dbus-lau
 FSS   global dbus-lau

Finally move init over to the FSS class so all children will inherit.

root@solaris:~# ps -ecf | grep init
    root     1     0   TS  59 16:33:44 ?           0:00 /usr/sbin/init
root@solaris:~# priocntl -s -c FSS -i pid 1
root@solaris:~# ps -ecf | grep init
    root     1     0  FSS  29 16:33:44 ?           0:00 /usr/sbin/init

With FSS all set, we now assign shares to our Non-Global Zones:

zonecfg -z <zone>
set cpu-shares=<number-of-shares>
exit

To display CPU consumption run prstat -Z

Solaris Process Scheduling

The Oracle Solaris kernel has a number of process scheduling classes available.

A brief review.

Timesharing (TS) This is the default class for processes and their associated kernel threads. Priorities in the class are dynamically adjusted based upon CPU utilization in an attempt to allocate processor resources evenly.

Interactive (IA) This is an enhanced version of TS. Some texts reference this in conjunction with TS, i.e. TS/IA. This class applies to the in-focus window in the GUI. It provides extra resources to processes associated with that specific window.

Fair Share Scheduler (FSS) This class is “share based” rather than priority based. The threads associated with this class are scheduled based on the associated shares assigned to them and the processor’s utilization.

Fixed-Priority (FX) Priorities for these threads are fixed regardless of how they interact with the CPU. They do not vary dynamically over the life of the thread.

System (SYS) Used to schedule kernel threads. These threads are bound, meaning that unlike the userland threads listed above they do not get context-switched off the CPU when their time quantum is consumed; they run until they block or complete.

Real-Time (RT) These threads are fixed-priority with a fixed time quantum. They are among the highest-priority classes, with only interrupts carrying a higher priority.

As it relates to the priority ranges for the scheduling classes the userland classes (TS/IA/FX/FSS) carry the lowest priorities, 0-59. The SYS class is next ranging from 60-99. At the top (ignoring INT) is the RT class at 100-159.

We can mix scheduling classes on the same system but there are some considerations to keep in mind.

  • Avoid having the FSS, TS, IA, and FX classes share the same processor set (pset)
  • All processes that run on a processor set must be in the same scheduling class so they do not compete for the same CPUs
  • To avoid starving applications, use processor sets for FSS and FX class applications

TS and IA as well as FSS and RT can be in the same processor set.

We can look at how the TS class (the default) makes its decisions by examining the dispatch table itself. The table is indexed by the priority level of the thread. To understand an entry, let's use priority 30 as an example.
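On a live system the current TS dispatch table can be printed with dispadmin, using -c to select the class and -g to get the table:

```shell
# Display the Time Sharing class dispatch table (ts_quantum, ts_tqexp,
# ts_slpret, ts_maxwait, ts_lwait), one row per priority level 0-59.
dispadmin -c TS -g
```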
The leftmost column is marked ts_quantum (the timesharing quantum). This specifies the time in milliseconds (the resolution is identified by RES=1000) that the thread will be allocated before it is involuntarily context-switched off the CPU.

A context switch is the process of storing and restoring the state of a process so that execution can be resumed at the same point at a later time. We store this state in the Light Weight Process (LWP) that the thread was bound to. Basically, a thread binds to an LWP, the LWP binds to a kernel thread (kthr), and the kernel thread is presented to the kernel dispatcher. When the thread is placed on the CPU (a hardware strand/thread), the contents of the LWP are loaded onto the CPU and the CPU starts execution at that point. When the thread is removed from the CPU (it is preempted, its time quantum is consumed, or it sleeps), the contents of the CPU registers are loaded into the LWP, and the thread returns to the dispatch queue to compete again, based on priority, with the other threads requesting access to the CPU.

So, at a priority of 30 the thread has 80 milliseconds to complete its work or it will be forced off the CPU. In the event that it does not complete its work, the system will context-switch the thread off the CPU AND change its priority. The next column, ts_tqexp (timesharing time quantum expired), identifies the new priority of the thread; in this case ts_tqexp is 20. So, we consumed our time quantum, we were involuntarily context-switched off the CPU, and we had our priority lowered when we returned to the dispatch queue. At a priority of 20 our time quantum is now 120 milliseconds. The priority is lowered to keep the thread from "hogging" the CPU, but the time quantum is increased in the hope that when the thread does get back on the CPU it has more time to complete its work.

The next column identifies the thread's new priority when it returns from a sleep state. There is no reason to keep a thread on a CPU if there is no work to be done. When the thread enters a sleep state it leaves the CPU; this is a VOLUNTARY context switch, and the thread is placed on the sleep queue. When it leaves the sleep queue it is not placed back on the CPU; it is placed back on the dispatch queue to compete with the other threads for access to the CPU. Since the thread has been off the CPU for a period of time, we advance its priority: in this case it was initially dispatched at 30, it voluntarily context-switched off the CPU, and when it woke it was given a priority of 53. Notice that at a priority of 53 our new time quantum is 40. The priority increased from 30 to 53, but the time quantum decreased from 80 to 40. We get you back on the CPU faster but limit the amount of time you get on the CPU.

The voluntary and involuntary context switches can be seen in the csw and icsw columns of the mpstat command.
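A sketch of watching those counters (the five-second interval is arbitrary):

```shell
# Report per-CPU statistics every 5 seconds; csw counts voluntary context
# switches and icsw counts involuntary ones.
mpstat 5
```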

The last two columns deal with the attempt to prevent CPU starvation. ts_maxwait is a measurement (in seconds) that, if exceeded without access to the CPU, causes the value of ts_lwait to be assigned as the new priority. Notice that for all but priority 59 this value is set to 0. So when we exceed 0 (meaning we have been off the CPU for 1 second) we are assigned the value of ts_lwait. Again using 30 as our example, we would go from a priority of 30 to a priority of 53 if we were denied access to the CPU for 1 second.

In the middle of all of this we have preemption. The Solaris Kernel is fully preemptible. All threads, even SYS threads, will be preempted if a higher priority thread hits the kernel dispatcher while a lower priority thread is running. The thread isn’t allowed to complete its time quantum, it is context switched off the CPU.

And don’t forget, there are IA, FSS, FX, SYS, RT, and INT threads that add to the chaos if allowed, which is why I provided some of the guidance listed earlier.

We see some use of FX and quite a bit more of the FSS with Solaris zones. I’ll talk about FSS in another post.

I’ll spend a better part of a day whiteboarding all of this in the Oracle University Solaris Performance Management Class.
