FRIDAY, APRIL 19, 2024

Posts Tagged ‘Oracle’

Oracle ACE – for Systems Technologies

I’ve been honored with the designation of Oracle ACE for Systems Technologies.

Rick Ramsey (@OTN_Garage, OTN Garage Blog) and I have been working to add a USA-based Solaris/Systems Technologies ACE for close to two years. My most sincere thanks and appreciation to him for working with me and to my sponsors within Oracle for their recommendation.

My profile can be viewed here and Rick’s blog post can be viewed here.

The entry to this prestigious group is challenging. My application was declined twice. This is about technologists that freely give back to the community. Blogs, Social Media (I’m @snatchbrain), Authoring Technical Articles, Public Presentations, Industry Certifications, and networking are but some of the entry points that are evaluated by the program to gain admittance.

I have two personal challenges. The first is to assist anyone who would like to see what this program is about, and those interested in attaining the designation. The second is to continue to evangelize the technologies and work toward the designation of ACE Director.

If you are interested, please reach out to me and let me know. I want to help those that have an interest.

So, Collier IT has two ACE-designated employees: Seth Miller and me. We look forward to growing that number, both internally and externally.

Twitter:
@snatchbrain
@Seth_M_Miller
@oracleace
@otn_garage


Monitoring Exadata Storage Servers


Monitoring Oracle Exadata Storage Servers

by Brian Bream and Suzanne Zorn

This article describes how to use Oracle Enterprise Manager Cloud Control and command-line utilities to monitor Oracle Exadata Storage Servers.

Published September 2014



Proactive monitoring of the components in Oracle Exadata Database Machine (also called Oracle Exadata) can help ensure the highest levels of system availability and performance. This article provides a high-level overview of using Oracle Enterprise Manager Cloud Control 12c and command-line utilities to monitor Oracle Exadata Storage Servers.

Want to comment on this article? Post the link on Facebook's OTN Garage page.  Have a similar article to share? Bring it up on Facebook or Twitter and let's discuss.

More detailed coverage of monitoring Oracle Exadata, including hands-on exercises, is included in the Oracle University class Exadata Database Machine Administration Workshop.

Oracle Exadata Database Machine

Oracle Exadata Database Machine—an engineered system with preconfigured, pretuned, and pretested hardware and software components—is designed to be the highest performing and most available platform for running Oracle Database. Components include database servers (also called compute nodes), Oracle Exadata Storage Servers (also called storage cells), Oracle's Sun Datacenter InfiniBand Switch 36 switches, and Exadata Smart Flash Cache.

Monitoring Technologies

Oracle Exadata uses several technologies to enable the monitoring of its components. These technologies include Oracle Integrated Lights Out Manager (Oracle ILOM), Simple Network Management Protocol (SNMP), and Intelligent Platform Management Interface (IPMI).

  • Oracle ILOM. Oracle ILOM is integrated service processor hardware and software that is preinstalled on Oracle servers, including the storage and database servers in Oracle Exadata. The service processor runs its own embedded operating system and has a dedicated Ethernet port to provide out-of-band server monitoring and management capabilities. Oracle ILOM can be accessed via a browser-based web interface or a command-line interface, and it also provides an SNMP interface and IPMI support.
  • SNMP. SNMP is an open, industry-standard protocol used to monitor and manage devices on an IP network. Oracle Exadata components—including database and storage servers, switches, and power distribution units (PDUs)—use SNMP to raise alerts and report monitoring information. SNMP also enables active management of devices, such as modifying the device configuration remotely.

    Devices run SNMP agents; these agents send status and alerts to an SNMP management console (such as Cloud Control) on the network.

  • IPMI. IPMI is an open, industry-standard protocol used primarily for remote server configuration and management across a network. In Oracle Exadata, the database and storage servers contain built-in IPMI support in Oracle ILOM.

Monitoring Tools

There are two approaches for monitoring Oracle Exadata Storage Servers: using a command-line interface (CLI) or using the graphical interface provided by the Oracle Enterprise Manager Cloud Control 12c console.

  • Command-line interface. The cellcli command is used for management and monitoring of individual Oracle Exadata storage cells. In addition, the dcli (distributed CLI) utility can be used to execute scripts and commands, such as cellcli commands, across multiple storage cells from a single interface.
  • Oracle Enterprise Manager Cloud Control 12c. This system management platform provides integrated hardware and software management (see Figure 1). Its hardware view includes a schematic of storage cells, compute nodes, and switches, as well as hardware component alerts. Its software view includes software alerts as well as information about performance, availability, and usage organized by databases, services, and clusters.

Figure 1. Oracle Enterprise Manager Cloud Control 12c screenshot showing the status of various hardware components in Oracle Exadata Database Machine.


No third-party software—including third-party monitoring agents—should be installed on Oracle Exadata Storage Servers. However, Oracle Exadata can be configured to send SNMP alerts to other SNMP managers on the network.

Monitoring Architecture of Oracle Enterprise Manager Cloud Control

Before using Oracle Enterprise Manager Cloud Control 12c with Oracle Exadata, an Oracle Management Agent and Oracle Exadata plug-in must be installed on every Oracle Exadata database server (see Figure 2). This agent monitors software targets, such as the database instances and Oracle Clusterware resources, on the database servers. The plug-in enables monitoring of other hardware components in Oracle Exadata, including the storage servers, switches, and power distribution units.

On the storage servers, the CELLSRV process provides the majority of Oracle Exadata storage services and is the primary storage software component. One of its functions is to process, collect, and store metrics. The Management Server (MS) process receives the metrics data from CELLSRV, keeps a subset of metrics in memory, and writes to an internal disk-based repository hourly. In addition, the MS process can generate alerts for important storage cell hardware or software events.

The Restart Server (RS) process is used to start up and shut down the CELLSRV and MS processes. It also monitors these services to check whether they need to be restarted.

The primary components of Oracle Enterprise Manager Cloud Control 12c are the Oracle Management Service, the Oracle Management Repository, and the Cloud Control Console. The Oracle Management Service communicates with the agents on the managed targets and stores information in the Oracle Management Repository. The Cloud Control Console provides a web-based interface for monitoring and management.

Figure 2. Oracle Enterprise Manager Cloud Control 12c monitoring architecture.


For more information on configuring Oracle Enterprise Manager Cloud Control 12c to monitor Oracle Exadata, please see the Oracle Enterprise Manager Exadata Management Getting Started Guide and the "Managing Oracle Exadata with Oracle Enterprise Manager 12c" white paper.

Note: This article focuses on using Oracle Enterprise Manager Cloud Control 12c to monitor the storage servers in Oracle Exadata. Oracle Enterprise Manager Cloud Control can also be used to monitor other Oracle Exadata hardware and software components.

Metrics, Thresholds, and Alerts

Metrics, thresholds, and alerts are key monitoring concepts. Metrics are runtime properties, such as I/O requests, throughput, or the current server temperature. Alerts are important events, such as hardware failures, software errors, or configuration issues. Thresholds are defined metric levels that, if exceeded, cause an alert to be automatically triggered.

When using Oracle Enterprise Manager Cloud Control 12c, quarantine objects are created when prescribed faults are detected, so that similar faults can be avoided in the future. This capability provides increased availability of the monitored system.

Monitoring Metrics Using the CLI

The cellcli command is run on the storage cells (not on the compute nodes) to display monitoring information. The general format of the command is:

<verb> <object> <modifier> <filter>

Where:

  • verb specifies an action (such as list or describe).
  • object specifies which object the action should be performed on (for example, a cell disk).
  • modifier (optional) specifies how the action should be modified (for example, to apply to all disks or to a specific disk).
  • filter (optional) is similar to a SQL WHERE predicate, and is used to filter the command output.

The following are some basic examples:

list physicaldisk (verb and object)

list cell detail (verb, object, and modifier)

list physicaldisk where diskType='Flashdisk' (verb, object, and filter)

By default, the user cellmonitor can execute read-only queries using the cellcli command. The user celladmin can execute cellcli commands that modify the configuration.
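
The command format above can be sketched as a small helper that assembles list commands from their parts. This is only an illustration: the function name build_list_command and its parameters are hypothetical, and the ordering (filter first, then the detail modifier) simply follows the examples in this article.

```python
from typing import Dict, Optional


def build_list_command(obj: str,
                       filters: Optional[Dict[str, str]] = None,
                       detail: bool = False) -> str:
    """Assemble a CellCLI-style 'list' command string.

    Hypothetical helper for illustration only. It does not escape
    quotes inside filter values and does not run anything.
    """
    parts = ["list", obj]
    if filters:
        # Each filter becomes attribute='value', joined like a SQL WHERE clause.
        clauses = [f"{attr}='{value}'" for attr, value in filters.items()]
        parts.append("where " + " and ".join(clauses))
    if detail:
        parts.append("detail")
    return " ".join(parts)
```

For instance, build_list_command("physicaldisk", {"diskType": "Flashdisk"}) reproduces the third example above, and build_list_command("cell", detail=True) reproduces the second.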

Metrics Terminology

Metrics are recorded measurements; for the storage cells, this includes measurements such as the number of I/O requests or the throughput.

The cellcli command refers to each metric using a composite of abbreviations, for example:

  • CD_IO_RQ_R_SM is the number of I/O requests (IO_RQ) to read (R) small blocks (SM) on a cell disk (CD).
  • GD_IO_BY_W_LG_SEC is the number of MB (IO_BY) of large block (LG) I/O writes (W) per second (SEC) on a grid disk (GD).

In addition, metrics

  • Are associated with a metricObjectName, which is the object being measured (for example, a specific cell disk)
  • Belong to an objectType group (IORM_DATABASE, CELLDISK, CELL_FILESYSTEM, and so on)
  • Have a metricType (Cumulative, Instantaneous, Rate, Transition)
  • Have a measurement unit (for example, milliseconds, microseconds, %, °F, °C)

For more details on Oracle Exadata cell metric attributes, see the Oracle Exadata Storage Server Software User's Guide.
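
To make the naming scheme concrete, the following sketch expands a metric name into its parts. It is an assumption-laden illustration: the lookup tables contain only the abbreviations explained in the two examples above, the function name decode_metric_name is hypothetical, and any abbreviation not in the tables is returned unchanged.

```python
from typing import List

# Abbreviations taken from the two examples in this article; deliberately incomplete.
OBJECT_PREFIXES = {"CD": "cell disk", "GD": "grid disk"}
MEASURES = {"IO_RQ": "I/O requests", "IO_BY": "MB of I/O"}
QUALIFIERS = {
    "R": "read",
    "W": "write",
    "SM": "small blocks",
    "LG": "large blocks",
    "SEC": "per second",
}


def decode_metric_name(name: str) -> List[str]:
    """Expand a metric name such as CD_IO_RQ_R_SM into readable parts."""
    tokens = name.split("_")
    # The first token names the kind of object being measured.
    decoded = [OBJECT_PREFIXES.get(tokens[0], tokens[0])]
    i = 1
    while i < len(tokens):
        pair = "_".join(tokens[i:i + 2])
        if pair in MEASURES:
            # Two-token measures (IO_RQ, IO_BY) are consumed together.
            decoded.append(MEASURES[pair])
            i += 2
        else:
            decoded.append(QUALIFIERS.get(tokens[i], tokens[i]))
            i += 1
    return decoded
```

Unknown names pass through untouched, so decode_metric_name("CL_CPUT") simply returns the raw tokens rather than guessing at their meaning.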

Example Commands

The following examples illustrate basic usage of the cellcli command to display metrics information for Oracle Exadata storage cells.

  • Example 1: Display the metric definitions for a cell. This command can be used to display detailed information about the metrics that are available for a storage cell. As this example shows, one such metric is named CL_CPUT. It is of metricType Instantaneous, it is associated with objectType CELL, and it has a measurement unit of percentage.

    # CellCLI> LIST METRICDEFINITION WHERE objectType ='CELL' DETAIL
    name: CL_CPUT
    description: "Cell CPU Utilization is the percentage of time over
    the previous minute that the system CPUs were not
    idle (from /proc/stat). "
    metricType: Instantaneous
    objectType: CELL
    unit: %
    ...
    
  • Example 2: Display the current metric values for the cell disks on a cell.

    # CellCLI> LIST METRICCURRENT WHERE objectType = 'CELLDISK'
    CD_IO_TM_W_SM_RQ CD_1_cell03    205.5 us/request
    CD_IO_TM_W_SM_RQ CD_2_cell03    93.3  us/request
    CD_IO_TM_W_SM_RQ CD_3_cell03    0.0   us/request
    ...
    
  • Example 3: Display the metric history for a cell. This command can provide insights about the trends for the values of a metric.

    # CellCLI> LIST METRICHISTORY WHERE name like 'CL_.*' -
    AND collectionTime > '2009-10-11T15:28:36-07:00'
    CL_RUNQ cell03_2 	6.0       2009-10-11T15:28:37-07:00
    CL_CPUT cell03_2 	47.6 %    2009-10-11T15:29:36-07:00
    CL_FANS cell03_2 	1         2009-10-11T15:29:36-07:00
    CL_TEMP cell03_2 	0.0 C     2009-10-11T15:29:36-07:00
    CL_RUNQ cell03_2 	5.2       2009-10-11T15:29:37-07:00
    ...
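
Output like that of Example 2 can be post-processed in a script. The sketch below assumes each data line has the whitespace-separated layout shown there (metric name, object name, value, unit); the names MetricSample and parse_metric_current are hypothetical, and real output may contain variations that this simple parser does not handle.

```python
from typing import List, NamedTuple


class MetricSample(NamedTuple):
    name: str    # metric name, for example CD_IO_TM_W_SM_RQ
    obj: str     # object being measured, for example CD_1_cell03
    value: float
    unit: str    # may be empty when no unit is printed


def parse_metric_current(text: str) -> List[MetricSample]:
    """Parse lines of the form '<metric> <object> <value> [<unit>]'."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skips blank lines, prompts, and "..." elisions
        name, obj, raw_value = fields[:3]
        try:
            # Assumption: large values may be printed with thousands separators.
            value = float(raw_value.replace(",", ""))
        except ValueError:
            continue  # not a data line
        samples.append(MetricSample(name, obj, value, " ".join(fields[3:])))
    return samples
```

With the parsed samples in hand, an expression such as max(samples, key=lambda s: s.value) picks out the cell disk with the highest reported value.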
    

Monitoring Metrics Using the Oracle Enterprise Manager Cloud Control Console

Oracle Enterprise Manager Cloud Control provides an intuitive view of Oracle Exadata status, including the status of all hardware and software components. Each storage server is a separate target in Oracle Enterprise Manager Cloud Control, and the Oracle Exadata storage servers are grouped together for collective monitoring of all storage.

The Oracle Enterprise Manager Cloud Control console makes it easy to see the status at a glance, and provides an easy way to drill down to get more detailed information. Figure 3 shows a screenshot of the console.

Figure 3. Oracle Enterprise Manager Cloud Control 12c console.


Monitoring Alerts

Alerts for important events that occur within Oracle Exadata storage cells should be monitored and investigated to help ensure the continued uninterrupted operation of storage. Alerts are assigned a severity of warning, critical, clear, or info. Metrics can be used to signal warning alerts or critical alerts when defined threshold values are exceeded.

Similar to metrics monitoring, the Oracle Exadata CLI or Oracle Enterprise Manager Cloud Control 12c can be used to monitor alerts. The following examples illustrate using the cellcli command to monitor storage cell alerts and create thresholds.

  • Example 1: Display the definitions for all alerts that can be generated on the storage cell.

    CellCLI> LIST ALERTDEFINITION ATTRIBUTES name, metricName, description
    ADRAlert "CELL Incident Error"
    HardwareAlert "Hardware Alert"
    StatefulAlert_CG_IO_RQ_LG CG_IO_RQ_LG "Threshold Based Stateful Alert"
    StatefulAlert_CG_IO_RQ_LG_SEC CG_IO_RQ_LG_SEC "Threshold Based ...Alert"
    StatefulAlert_CG_IO_RQ_SM CG_IO_RQ_SM "Threshold Based Stateful Alert"
    ...
    
  • Example 2: Display the alert history for a storage cell.

    CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' -
    AND examinedBy = '' DETAIL
    CellCLI>
    

    Note: This command produces output only if there are critical alerts that have not yet been reviewed by an administrator (that is, alerts whose examinedBy attribute is empty). No output means that there are no unreviewed critical alerts.

  • Example 3: Create a threshold to trigger an alert. This example uses the CT_IO_WT_LG_RQ metric, which specifies the average number of milliseconds that large I/O requests have waited to be scheduled. The alert is triggered by two consecutive measurements (occurrences=2) over a threshold value. Wait times above one second (warning=1000) trigger a warning alert; wait times above two seconds (critical=2000) trigger a critical alert.

    CellCLI> CREATE THRESHOLD ct_io_wt_lg_rq.interactive -
             warning=1000, critical=2000, comparison='>', -
             occurrences=2, observation=5
    CellCLI>
    

    Note: The CREATE THRESHOLD command creates a threshold that specifies the conditions for the generation of a metric alert. The absence of an output indicates that the threshold was created successfully.
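
The interaction between the threshold values and the occurrences setting in Example 3 can be sketched in a few lines. This is a simplified model of the behavior described above, not a description of how the storage cell software evaluates thresholds internally: the function name evaluate_threshold is hypothetical, only the '>' comparison is modeled, and the observation parameter is ignored.

```python
from typing import Iterable, List


def evaluate_threshold(values: Iterable[float],
                       warning: float,
                       critical: float,
                       occurrences: int) -> List[str]:
    """Return the modeled severity after each measurement.

    A severity is reported only once `occurrences` consecutive
    measurements have exceeded the corresponding threshold.
    """
    states = []
    warn_run = crit_run = 0
    for value in values:
        # Count consecutive measurements above each threshold; reset on a miss.
        warn_run = warn_run + 1 if value > warning else 0
        crit_run = crit_run + 1 if value > critical else 0
        if crit_run >= occurrences:
            states.append("critical")
        elif warn_run >= occurrences:
            states.append("warning")
        else:
            states.append("clear")
    return states
```

With warning=1000, critical=2000, and occurrences=2, a single measurement of 1200 ms does not raise anything in this model; a second consecutive measurement above 1000 ms is what moves the state to warning.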

When alerts are triggered, they automatically appear in the Oracle Enterprise Manager Cloud Control console. Administrators can select any Oracle Exadata target, view alerts on that target, and drill down to display more details about each alert. In addition, the Cloud Control console can be used to set up rules for metric alerts. See the chapter on "Using Incident Management" in the Oracle Enterprise Manager Cloud Control Administrator's Guide for more information.

Comparison: Monitoring Storage Server Availability

Both the CLI and Oracle Enterprise Manager Cloud Control 12c can be used to monitor storage server availability. To use the command-line approach, administrators must explicitly execute the following cellcli command on an Oracle Exadata storage server, and then check the status in the command output:

# CellCLI> list cell detail
...
    cellsrvStatus:      running
    msStatus:           running
    rsStatus:           running
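
A script can perform the same check on the command output. The sketch below assumes the output contains "attribute: value" lines like the three shown above; the function names parse_service_status and cell_is_available are hypothetical.

```python
from typing import Dict

SERVICE_ATTRIBUTES = ("cellsrvStatus", "msStatus", "rsStatus")


def parse_service_status(detail_output: str) -> Dict[str, str]:
    """Extract the three service status attributes from 'list cell detail' output."""
    statuses = {}
    for line in detail_output.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in SERVICE_ATTRIBUTES:
            statuses[key] = value.strip()
    return statuses


def cell_is_available(detail_output: str) -> bool:
    """True only if all three services are present and reported as running."""
    statuses = parse_service_status(detail_output)
    return all(statuses.get(name) == "running" for name in SERVICE_ATTRIBUTES)
```

A missing attribute is treated the same as a stopped service, which errs on the side of flagging a cell for a closer look.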

Oracle Enterprise Manager Cloud Control 12c provides a visual overview of the availability of the storage cells, with color-coded green and red status symbols to indicate available and unavailable, respectively (see Figure 4). With Oracle Enterprise Manager Cloud Control, administrators can determine the status at a glance, and then drill down to the affected components for more information.

Figure 4. Oracle Enterprise Manager Cloud Control 12c provides status information at a glance.


Comparing Metrics Across Multiple Storage Servers

Oracle Enterprise Manager Cloud Control 12c makes it easy to compare metrics across multiple storage servers. Figure 5 contains a screenshot showing a comparison of the average read response times of Oracle Exadata cell disks. The built-in graphing capability easily shows the relative performance of multiple cell disks.

Figure 5. Using Oracle Enterprise Manager Cloud Control 12c to compare metrics across multiple servers.


The distributed CLI utility, dcli, can be used to execute commands across multiple servers on Oracle Exadata. However, it is much more complex to manually aggregate statistics reported in its command output and make comparisons across multiple storage servers.
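
As a rough illustration of that manual aggregation, the sketch below averages a metric per storage server from combined dcli output. It rests on two assumptions: that each output line is prefixed with the name of the cell that produced it, followed by a colon, and that the rest of the line has the metric layout shown in the earlier cellcli examples. The function name average_by_cell is hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict


def average_by_cell(dcli_output: str) -> Dict[str, float]:
    """Average the metric values reported by each cell.

    Assumes lines of the form '<cell>: <metric> <object> <value> [<unit>]'.
    """
    values = defaultdict(list)
    for line in dcli_output.splitlines():
        host, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 3:
            continue  # not a prefixed data line
        try:
            # Assumption: values may be printed with thousands separators.
            values[host.strip()].append(float(fields[2].replace(",", "")))
        except ValueError:
            continue
    return {host: mean(vals) for host, vals in values.items()}
```

Even in this small form, the script has to deal with parsing, missing lines, and number formatting, which is the bookkeeping that the Cloud Control comparison view handles automatically.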

Final Thoughts

Oracle Enterprise Manager Cloud Control 12c provides easy-to-use, intuitive monitoring of Oracle Exadata Storage Servers. Status information is visually displayed, making it easy to pinpoint problems and then drill down for more detailed information. In addition, Oracle Enterprise Manager Cloud Control provides capabilities for easily comparing metrics across multiple storage servers.

The CLI (cellcli command and dcli utility) can be useful for scripts and creating processes that need to be repeated.

See Also

The following resources are available for Oracle Exadata Database Machine and Oracle Enterprise Manager Cloud Control:

About the Authors

Brian Bream has been involved in information technology since 1981. He currently serves as the Chief Technology Officer at Collier IT. Brian also functions as an Oracle University instructor delivering courses that focus on Oracle's engineered systems, Oracle Solaris, Oracle Linux, and Oracle's virtualization and storage solutions.

Collier IT is a full-service Platinum-level Oracle partner that provides Oracle solutions, including Oracle engineered systems, software, services, and Oracle University training. Collier IT provides its customers with complete, open, and integrated solutions, from business concept to complete implementation. Since 1991, Collier IT has specialized in creating and implementing robust infrastructure solutions for organizations of all sizes. Collier IT was a go-to partner for Sun Microsystems for ten years prior to the acquisition of Sun by Oracle in 2009. As a former Sun Executive Partner and now as a Platinum-level Oracle partner, Collier IT is aligned to provide customers with complete solutions that address their business needs.

Suzanne Zorn has over twenty years of experience as a writer and editor of technical documentation for the computer industry. Previously, Suzanne worked as a consultant for Sun Microsystems' Professional Services division specializing in system configuration and planning. Suzanne has a BS in computer science and engineering from Bucknell University and an MS in computer science from Rensselaer Polytechnic Institute.

Revision 1.0, 09/08/2014


Oracle Virtualization

Taking a few minutes to cover some of the virtualization solutions from Oracle in one of our Oracle University classes

Old School System Admin – A dying breed?

I’ve been involved in some fashion of IT for over thirty years now. Running a FidoNet BBS (The Twilight Zone) in 1986 was my first interaction with a human element and where I first experienced the concept of a System Administrator. Prior to that I was flipping 16 toggle switches to load stb’s, rbr’s and the like and reading the results on 16 LEDs, keeping Navy frigates moving through the water. What fun!

I’ve been in the trenches, racking and stacking, installing the OS and Applications, backing up and restoring, and fixing broken systems and applications. And at a point in time, that was my definition of a System Administrator. It isn’t any longer.

I’m asked, “What is the real underlying problem for SysAdmins now that everything is virtual?” As I mentioned in my interview with Rick Ramsey at OOW13, elasticity is the biggest challenge for SysAdmins today. Business process demands are more complex and need to be provisioned faster than ever before. These demands span a large number of technologies and the SysAdmin needs to know them all.

  • A typical real world multi-tiered application may include Oracle Databases, Oracle Weblogic Server, Oracle Fusion Middleware, web servers, security infrastructures, and messaging
  • They also have specific infrastructure requirements like servers, OS, storage, network, and load balancers
  • They may be running on Engineered Systems like Exadata, Exalogic, Big Data Appliance, and ZFS Storage Appliances
  • And they need to be deployed in multiple environments like development, test, user acceptance, and production

SysAdmins must be able to leverage technologies such as Virtualization, Infrastructure as a Service, Database as a Service, Middleware as a Service, Storage/Network provisioning, and the pooling and consolidation of hardware resources. They need to understand the technologies and how they interact with each other to ensure they can successfully deploy them and, once deployed, manage them.

New and improved management tools need to be mastered to be successful. The SysAdmin role has been far too dependent on performing repetitive tasks and working in a reactionary mode, attempting to locate and repair faults manually. As the complexity of our data centers continues to grow, this model becomes a significant limiting factor. We need to understand tools like Enterprise Manager 12c, which allow applications to be rapidly deployed by end users such as developers and testers themselves through self service, with metering and chargeback.

The SysAdmins need to accept the automation that these new tools provide. To shun them will lead to their undoing.

And the knowledge level needed has never been greater. As an example, I expect a SysAdmin to know DTrace if they are running Solaris or Oracle Linux. I expect them to have some basic understanding of the kernel, system calls, and the like so they understand what DTrace tells them. I expect a SysAdmin to be comfortable working in a database and a middleware environment. They need to understand the flow between the various tiers and how to provision those tiers rapidly when there is a business demand.

Basically the System Administrator must grow a much larger skill set to be successful. Don’t grow vertically in one technology, grow horizontally amongst many technologies. Engineer solutions with the specialist teams and know enough of the solutions to have an intelligent conversation. Know enough to assist in the architecture of the solution. Be proactive, not reactive.

So to answer the question “now that all is virtual, what’s the REAL underlying problem for sysadmins? Provisioning strategy?”

I think the complexity of a provisioning strategy is the REAL underlying challenge. Understanding which of the available technologies makes sense, where each solution fits into the stack, how to provision and re-provision the solution in the stack, and how to manage it will be the new measure of success or failure in the SysAdmin realm. The tools are there, and those who embrace the technology and the tools should have a very bright future.

And for those that don’t, a warning. It is coming from the other direction. I interact with DBAs frequently that are managing the entire Exadata appliance. They’ve been to the Solaris or Linux Admin classes, they’ve attended the Exadata class. The “SysAdmin” team isn’t a user, root or otherwise, on the system. The Database group has become system administrators on the majority of those systems. I’ve made similar observations in the Exalogic engineered system as well.

Embrace the technologies and the tools. Reach out and extend yourself. Throw away the old “rules”. Soon no one will really care what is under the hood. It won’t matter if it is Solaris or Oracle Linux, or if it is SPARC or x86. What will matter is the IT staff’s ability to deploy the business demands on schedule.

OpenWorld 2013 Interview with Rick Ramsey.

I was honored to be asked by Rick Ramsey (Twitter: @OldManRamsey) of Oracle Technology Network to be interviewed on the question “What’s the biggest change data centers are facing today, and what does Collier IT recommend?” We were pressed for time: we had only 10 to 12 minutes and didn’t get much Solaris talk in. We are talking about starting up a recurring Google Places environment so we can continue these short snippets of Oracle technology with a focus on operating environments.

Here is the interview. Personal thanks to Oracle Corporation, Oracle Partner Network, Oracle Technology Network, Oracle University, and my employer Collier IT for allowing me this opportunity.