Friday, 06 February, 2015 12:33 Written by Brian B
I’ve been honored with the designation of Oracle ACE for Systems Technologies.
Rick Ramsey (@OTN_Garage, OTN Garage Blog) and I have been working for close to two years to add a USA-based Solaris/Systems Technologies ACE. My most sincere thanks and appreciation to him for working with me and to my sponsors within Oracle for their recommendation.
The entry to this prestigious group is challenging. My application was declined twice. This is about technologists that freely give back to the community. Blogs, Social Media (I’m @snatchbrain), Authoring Technical Articles, Public Presentations, Industry Certifications, and networking are but some of the entry points that are evaluated by the program to gain admittance.
I have two personal challenges. The first is to assist anyone who would like to see what this program is about or who is interested in attaining the designation. The second is to continue to evangelize the technologies and work towards the designation of ACE Director.
If you are interested, please reach out to me and let me know. I want to help those that have an interest.
Saturday, 13 September, 2014 11:33 Written by Brian B
by Brian Bream and Suzanne Zorn
Published September 2014
Proactive monitoring of the components in Oracle Exadata Database Machine (also called Oracle Exadata) can help ensure the highest levels of system availability and performance. This article provides a high-level overview of using Oracle Enterprise Manager Cloud Control 12c and command-line utilities to monitor Oracle Exadata Storage Servers.
More detailed coverage of monitoring Oracle Exadata, including hands-on exercises, is included in the Oracle University class Exadata Database Machine Administration Workshop.
Oracle Exadata Database Machine—an engineered system with preconfigured, pretuned, and pretested hardware and software components—is designed to be the highest performing and most available platform for running Oracle Database. Components include database servers (also called compute nodes), Oracle Exadata Storage Servers (also called storage cells), Oracle's Sun Datacenter InfiniBand Switch 36 switches, and Exadata Smart Flash Cache.
Oracle Exadata uses several technologies to enable the monitoring of its components. These technologies include Oracle Integrated Lights Out Manager (Oracle ILOM), Simple Network Management Protocol (SNMP), and Intelligent Platform Management Interface (IPMI).
Devices run SNMP agents; these agents send status and alerts to an SNMP management console (such as Cloud Control) on the network.
There are two approaches for monitoring Oracle Exadata Storage Servers: using a command-line interface (CLI) or using the graphical interface provided by the Oracle Enterprise Manager Cloud Control 12c console.
The cellcli command is used for management and monitoring of individual Oracle Exadata storage cells. In addition, the dcli (distributed CLI) utility can be used to execute scripts and commands, such as those for shutting down compute nodes, across multiple storage cells from a single interface.
Figure 1. Oracle Enterprise Manager Cloud Control 12c screenshot showing the status of various hardware components in Oracle Exadata Database Machine.
No third-party software—including third-party monitoring agents—should be installed on Oracle Exadata Storage Servers. However, Oracle Exadata can be configured to send SNMP alerts to other SNMP managers on the network.
Before using Oracle Enterprise Manager Cloud Control 12c with Oracle Exadata, an Oracle Management Agent and Oracle Exadata plug-in must be installed on every Oracle Exadata database server (see Figure 2). This agent monitors software targets, such as the database instances and Oracle Clusterware resources, on the database servers. The plug-in enables monitoring of other hardware components in Oracle Exadata, including the storage servers, switches, and power distribution units.
On the storage servers, the CELLSRV process provides the majority of Oracle Exadata storage services and is the primary storage software component. One of its functions is to process, collect, and store metrics. The Management Server (MS) process receives the metrics data from CELLSRV, keeps a subset of metrics in memory, and writes to an internal disk-based repository hourly. In addition, the MS process can generate alerts for important storage cell hardware or software events.
The Restart Server (RS) process is used to start up and shut down the CELLSRV and MS processes. It also monitors these services to check whether they need to be restarted.
The primary components of Oracle Enterprise Manager Cloud Control 12c are the Oracle Management Service, the Oracle Management Repository, and the Cloud Control Console. The Oracle Management Service communicates with the agents on the managed targets and stores information in the Oracle Management Repository. The Cloud Control Console provides a web-based interface for monitoring and management.
Figure 2. Oracle Enterprise Manager Cloud Control 12c monitoring architecture.
For more information on configuring Oracle Enterprise Manager Cloud Control 12c to monitor Oracle Exadata, please see the Oracle Enterprise Manager Exadata Management Getting Started Guide and the "Managing Oracle Exadata with Oracle Enterprise Manager 12c" white paper.
Note: This article focuses on using Oracle Enterprise Manager Cloud Control 12c to monitor the storage servers in Oracle Exadata. Oracle Enterprise Manager Cloud Control can also be used to monitor other Oracle Exadata hardware and software components.
Metrics, thresholds, and alerts are key monitoring concepts. Metrics are runtime properties, such as I/O requests, throughput, or the current server temperature. Alerts are important events, such as hardware failures, software errors, or configuration issues. Thresholds are defined metric levels that, if exceeded, cause an alert to be automatically triggered.
When using Oracle Enterprise Manager Cloud Control 12c, quarantine objects are created when prescribed faults are detected, so that similar faults can be avoided in the future. This capability provides increased availability of the monitored system.
The cellcli command is run on the storage cells (not on the compute nodes) to display monitoring information. The general format of the command is:
<verb> <object> <modifier> <filter>
verb specifies an action (such as list or describe).
object specifies which object the action should be performed on (for example, a cell disk).
modifier (optional) specifies how the action should be modified (for example, to apply to all disks or to a specific disk).
filter (optional) is similar to a SQL WHERE predicate, and is used to filter the command output.
The following are some basic examples:
list physicaldisk (verb and object)
list cell detail (verb, object, and modifier)
list physicaldisk where diskType='Flashdisk' (verb, object, and filter)
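The command format above can be sketched as a small helper that assembles the parts into a command string. This is only an illustration: the function name and parameters are hypothetical, and placing the filter before a modifier such as detail is an assumption based on the longer examples later in this article (for example, LIST ... WHERE ... DETAIL).

```python
def build_cellcli_command(verb, obj, modifier=None, where=None):
    """Assemble a CellCLI command string from verb, object, filter, and modifier.

    Hypothetical helper for illustration only. The filter is placed before
    the modifier, following the ordering seen in the article's own examples.
    """
    parts = [verb, obj]
    if where:
        parts.append(f"where {where}")
    if modifier:
        parts.append(modifier)
    return " ".join(parts)


# The three basic examples from the article.
print(build_cellcli_command("list", "physicaldisk"))
print(build_cellcli_command("list", "cell", modifier="detail"))
print(build_cellcli_command("list", "physicaldisk", where="diskType='Flashdisk'"))
```

Building the string in one place makes it easier to reuse the same query in scripts without retyping the filter each time.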
By default, the user cellmonitor can execute read-only queries using the cellcli command. The user celladmin can execute cellcli commands that modify the configuration.
Metrics are recorded measurements; for the storage cells, this includes measurements such as the number of I/O requests or the throughput.
The cellcli command refers to each metric using a composite of abbreviations, for example:
CD_IO_RQ_R_SM is the number of I/O requests (IO_RQ) to read (R) small blocks (SM) on a cell disk (CD).
GD_IO_BY_W_LG_SEC is the number of MB (IO_BY) of large block (LG) I/O writes (W) per second (SEC) on a grid disk (GD).
In addition, metrics have attributes, including:
metricObjectName, which is the object being measured (for example, a specific cell disk)
objectType, which is the type of object being measured (CELL, CELLDISK, CELL_FILESYSTEM, and so on)
For more details on Oracle Exadata cell metric attributes, see the Oracle Exadata Storage Server Software User's Guide.
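Because metric names are built from abbreviations, they can be decoded mechanically. The sketch below is a rough illustration covering only the abbreviations explained in this article; the lookup tables and the function name are assumptions, and unknown tokens are passed through unchanged rather than guessed.

```python
# Abbreviation tables limited to the tokens explained in the article.
OBJECT_PREFIXES = {"CD": "cell disk", "GD": "grid disk"}
TOKENS = {
    "IO_RQ": "I/O requests",
    "IO_BY": "MB of I/O",
    "R": "read",
    "W": "write",
    "SM": "small blocks",
    "LG": "large blocks",
    "SEC": "per second",
}


def decode_metric(name):
    """Return a list of readable fragments for a composite metric name."""
    parts = name.split("_")
    prefix, rest = parts[0], parts[1:]
    described = [OBJECT_PREFIXES.get(prefix, prefix)]
    i = 0
    while i < len(rest):
        # Try a two-token abbreviation such as IO_RQ before a single token.
        pair = "_".join(rest[i:i + 2])
        if pair in TOKENS:
            described.append(TOKENS[pair])
            i += 2
        else:
            described.append(TOKENS.get(rest[i], rest[i]))
            i += 1
    return described


print(decode_metric("CD_IO_RQ_R_SM"))
print(decode_metric("GD_IO_BY_W_LG_SEC"))
```

A table-driven decoder like this is easy to extend as more abbreviations are looked up in the user's guide.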
The following examples illustrate basic usage of the cellcli command to display metrics information for Oracle Exadata storage cells.
The first example lists the metric definitions for objects of type CELL. One of them is CL_CPUT: it is of metricType Instantaneous, it is associated with objectType CELL, and it has a measurement unit of percentage.
CellCLI> LIST METRICDEFINITION WHERE objectType ='CELL' DETAIL
   name: CL_CPUT
   description: "Cell CPU Utilization is the percentage of time over the previous minute that the system CPUs were not idle (from /proc/stat). "
   metricType: Instantaneous
   objectType: CELL
   unit: %
   ...
CellCLI> LIST METRICCURRENT WHERE objectType = 'CELLDISK'
   CD_IO_TM_W_SM_RQ CD_1_cell03 205.5 us/request
   CD_IO_TM_W_SM_RQ CD_2_cell03 93.3 us/request
   CD_IO_TM_W_SM_RQ CD_3_cell03 0.0 us/request
   ...
CellCLI> LIST METRICHISTORY WHERE name like 'CL_.*' -
   AND collectionTime > '2009-10-11T15:28:36-07:00'
   CL_RUNQ cell03_2 6.0 2009-10-11T15:28:37-07:00
   CL_CPUT cell03_2 47.6 % 2009-10-11T15:29:36-07:00
   CL_FANS cell03_2 1 2009-10-11T15:29:36-07:00
   CL_TEMP cell03_2 0.0 C 2009-10-11T15:29:36-07:00
   CL_RUNQ cell03_2 5.2 2009-10-11T15:29:37-07:00
   ...
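When metric output like the LIST METRICCURRENT example is captured in a script, it can be turned into structured records for further checks. The following is a minimal sketch that assumes whitespace-separated columns in the order name, object, value, and optional unit, as in the sample above; the class and function names are hypothetical, and real output may need extra handling.

```python
from typing import NamedTuple


class MetricSample(NamedTuple):
    name: str
    object_name: str
    value: float
    unit: str


def parse_metric_current(output):
    """Parse captured LIST METRICCURRENT output into MetricSample records."""
    samples = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the echoed command, and the "..." elision marker.
        if len(fields) < 3 or fields[0] == "CellCLI>":
            continue
        try:
            # Assumption: values may contain thousands separators.
            value = float(fields[2].replace(",", ""))
        except ValueError:
            continue
        unit = fields[3] if len(fields) > 3 else ""
        samples.append(MetricSample(fields[0], fields[1], value, unit))
    return samples


sample_output = """\
CellCLI> LIST METRICCURRENT WHERE objectType = 'CELLDISK'
CD_IO_TM_W_SM_RQ CD_1_cell03 205.5 us/request
CD_IO_TM_W_SM_RQ CD_2_cell03 93.3 us/request
CD_IO_TM_W_SM_RQ CD_3_cell03 0.0 us/request
...
"""

samples = parse_metric_current(sample_output)
slowest = max(samples, key=lambda s: s.value)
print(slowest.object_name, slowest.value, slowest.unit)
```

Once the values are in records, finding the slowest cell disk or comparing against a limit is a one-line operation.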
Oracle Enterprise Manager Cloud Control provides an intuitive view of Oracle Exadata status, including the status of all hardware and software components. Each storage server is a separate target in Oracle Enterprise Manager Cloud Control, and the Oracle Exadata storage servers are grouped together for collective monitoring of all storage.
The Oracle Enterprise Manager Cloud Control console makes it easy to see the status at a glance, and provides an easy way to drill down to get more detailed information. Figure 3 shows a screenshot of the console.
Figure 3. Oracle Enterprise Manager Cloud Control 12c console.
Alerts for important events that occur within Oracle Exadata storage cells should be monitored and investigated to help ensure the continued uninterrupted operation of storage. Alerts are assigned a severity of warning, critical, clear, or info. Metrics can be used to signal warning alerts or critical alerts when defined threshold values are exceeded.
Similar to metrics monitoring, the Oracle Exadata CLI or Oracle Enterprise Manager Cloud Control 12c can be used to monitor alerts. The following examples illustrate using the cellcli command to monitor storage cell alerts and create thresholds.
CellCLI> LIST ALERTDEFINITION ATTRIBUTES name, metricName, description
   ADRAlert "CELL Incident Error"
   HardwareAlert "Hardware Alert"
   StatefulAlert_CG_IO_RQ_LG CG_IO_RQ_LG "Threshold Based Stateful Alert"
   StatefulAlert_CG_IO_RQ_LG_SEC CG_IO_RQ_LG_SEC "Threshold Based ...Alert"
   StatefulAlert_CG_IO_RQ_SM CG_IO_RQ_SM "Threshold Based Stateful Alert"
   ...
CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' -
   AND examinedBy = '' DETAIL
CellCLI>
Note: This command produces output only if there are alerts that have not been reviewed by another administrator. No output signifies no missing (that is, not yet reviewed) alerts.
The following example creates a threshold for the CT_IO_WT_LG_RQ metric, which specifies the average number of milliseconds that large I/O requests have waited to be scheduled. The alert is triggered by two consecutive measurements (occurrences=2) over the threshold values. Waits of more than one second (warning=1000) trigger a warning alert; waits of more than two seconds (critical=2000) trigger a critical alert.
CellCLI> CREATE THRESHOLD ct_io_wt_lg_rq.interactive -
   warning=1000, critical=2000, comparison='>', -
   occurrences=2, observation=5
CellCLI>
The CREATE THRESHOLD command creates a threshold that specifies the conditions for the generation of a metric alert. The absence of an output indicates that the threshold was created successfully.
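To make the interaction between the warning, critical, and occurrences settings concrete, here is a simplified model in Python. It is a sketch under stated assumptions, not the algorithm the storage cell actually uses: the function name is hypothetical, the comparison is fixed to '>', and the observation setting is ignored entirely.

```python
def evaluate_threshold(measurements, warning=1000, critical=2000, occurrences=2):
    """Return the alert state after each measurement.

    Simplified model: a severity is reported only once `occurrences`
    consecutive measurements have exceeded that severity's limit.
    """
    states = []
    warn_run = 0   # consecutive measurements above the warning limit
    crit_run = 0   # consecutive measurements above the critical limit
    for value in measurements:
        warn_run = warn_run + 1 if value > warning else 0
        crit_run = crit_run + 1 if value > critical else 0
        if crit_run >= occurrences:
            states.append("critical")
        elif warn_run >= occurrences:
            states.append("warning")
        else:
            states.append(None)
    return states


# Hypothetical wait times in milliseconds, one per measurement interval.
waits_ms = [500, 1200, 1500, 2500, 2600, 800]
print(evaluate_threshold(waits_ms))
```

In this model a single spike does not raise an alert, which is the practical reason for requiring more than one occurrence.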
When alerts are triggered, they automatically appear in the Oracle Enterprise Manager Cloud Control console. Administrators can select any Oracle Exadata target, view alerts on that target, and drill down to display more details about each alert. In addition, the Cloud Control console can be used to set up rules for metric alerts. See the chapter on "Using Incident Management" in the Oracle Enterprise Manager Cloud Control Administrator's Guide for more information.
Both the CLI and Oracle Enterprise Manager Cloud Control 12c can be used to monitor storage server availability. To use the command-line approach, administrators must explicitly execute the following cellcli command on an Oracle Exadata storage server, and then check the status in the command output:
CellCLI> list cell detail
   ...
   cellsrvStatus: running
   msStatus: running
   rsStatus: running
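Checking the three status attributes by eye can be replaced with a small script once the command output has been captured. The sketch below assumes the "attribute: value" layout shown above; the function names are hypothetical, and how the output is captured is left out.

```python
SERVICE_ATTRIBUTES = ("cellsrvStatus", "msStatus", "rsStatus")


def parse_detail(output):
    """Parse 'attribute: value' lines from captured LIST ... DETAIL output."""
    attrs = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep only lines that look like a single-word attribute name.
        if sep and key and " " not in key:
            attrs[key] = value.strip()
    return attrs


def services_not_running(attrs):
    """Return the service attributes whose value is anything but 'running'."""
    return [name for name in SERVICE_ATTRIBUTES if attrs.get(name) != "running"]


sample_output = """\
CellCLI> list cell detail
   ...
   cellsrvStatus: running
   msStatus: running
   rsStatus: running
"""

problems = services_not_running(parse_detail(sample_output))
print("all services running" if not problems else f"check: {problems}")
```

Treating a missing attribute the same as a stopped service is a deliberately cautious choice for this sketch.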
Oracle Enterprise Manager Cloud Control 12c provides a visual overview of the availability of the storage cells, with color-coded green and red status symbols to indicate available and unavailable, respectively (see Figure 4). With Oracle Enterprise Manager Cloud Control, administrators can determine the status at a glance, and then drill down to the affected components for more information.
Figure 4. Oracle Enterprise Manager Cloud Control 12c provides status information at a glance.
Oracle Enterprise Manager Cloud Control 12c makes it easy to compare metrics across multiple storage servers. Figure 5 contains a screenshot showing a comparison of the average read response times of Oracle Exadata cell disks. The built-in graphing capability easily shows the relative performance of multiple cell disks.
Figure 5. Using Oracle Enterprise Manager Cloud Control 12c to compare metrics across multiple servers.
The distributed CLI utility, dcli, can be used to execute commands across multiple servers on Oracle Exadata. However, it is much more complex to manually aggregate statistics reported in its command output and make comparisons across multiple storage servers.
Oracle Enterprise Manager Cloud Control 12c provides easy-to-use, intuitive monitoring of Oracle Exadata Storage Servers. Status information is visually displayed, making it easy to pinpoint problems and then drill down for more detailed information. In addition, Oracle Enterprise Manager Cloud Control provides capabilities for easily comparing metrics across multiple storage servers.
The CLI (cellcli command and dcli utility) can be useful for scripts and creating processes that need to be repeated.
Brian Bream has been involved in information technology since 1981. He currently serves as the Chief Technology Officer at Collier IT. Brian also functions as an Oracle University instructor delivering courses that focus on Oracle's engineered systems, Oracle Solaris, Oracle Linux, and Oracle's virtualization and storage solutions.
Collier IT is a full-service Platinum-level Oracle partner that provides Oracle solutions, including Oracle engineered systems, software, services, and Oracle University training. Collier IT provides its customers with complete, open, and integrated solutions, from business concept to complete implementation. Since 1991, Collier IT has specialized in creating and implementing robust infrastructure solutions for organizations of all sizes. Collier IT was a go-to partner for Sun Microsystems for ten years prior to the acquisition of Sun by Oracle in 2009. As a former Sun Executive Partner and now as a Platinum-level Oracle partner, Collier IT is aligned to provide customers with complete solutions that address their business needs.
Suzanne Zorn has over twenty years of experience as a writer and editor of technical documentation for the computer industry. Previously, Suzanne worked as a consultant for Sun Microsystems' Professional Services division specializing in system configuration and planning. Suzanne has a BS in computer science and engineering from Bucknell University and an MS in computer science from Rensselaer Polytechnic Institute.
Revision 1.0, 09/08/2014
Monday, 09 December, 2013 23:48 Written by Brian B
Taking a few minutes to cover some of the virtualization solutions from Oracle in one of our Oracle University classes
Sunday, 10 November, 2013 20:57 Written by Brian B
I’ve been involved in some fashion of IT for over thirty years now. Running a FidoNet BBS (The Twilight Zone) in 1986 was my first interaction with a human element and where I first experienced the concept of a System Administrator. Prior to that I was flipping 16 toggle switches to load stb’s, rbr’s and the like and reading the results on 16 LEDs, keeping Navy frigates moving through the water. What fun!
I’ve been in the trenches, racking and stacking, installing the OS and Applications, backing up and restoring, and fixing broken systems and applications. And at a point in time, that was my definition of a System Administrator. It isn’t any longer.
I’m asked, “What is the real underlying problem for SysAdmins now that everything is virtual?” As I mentioned in my interview with Rick Ramsey at OOW13, elasticity is the biggest challenge for SysAdmins today. Business process demands are more complex and need to be provisioned faster than ever before. These demands span a large number of technologies and the SysAdmin needs to know them all.
SysAdmins must be able to leverage technologies such as Virtualization, Infrastructure as a Service, Database as a Service, Middleware as a Service, Storage/Network provisioning, and the pooling and consolidation of hardware resources. They need to understand the technologies and how they interact with each other to ensure they can successfully deploy them and, once deployed, manage them.
New and improved management tools need to be mastered to be successful. The SysAdmin role has been far too dependent on performing repetitive tasks and working in a reactionary mode, attempting to locate and repair faults manually. As the complexity of our data centers continues to grow, this model becomes a significant limiting factor. We need to understand tools like Enterprise Manager 12c, which allow applications to be rapidly deployed by end users such as developers and testers themselves through self service, with metering and chargeback.
The SysAdmins need to accept the automation that these new tools provide. To shun them will lead to their undoing.
And the knowledge level needed has never been greater. As an example, I expect a SysAdmin to know DTrace if they are running Solaris or Oracle Linux. I expect them to have some basic understanding of the kernel, system calls, and the like so they understand what DTrace tells them. I expect a SysAdmin to be comfortable working in a database and a middleware environment. They need to understand the flow between the various tiers and how to provision those tiers rapidly when there is a business demand.
Basically the System Administrator must grow a much larger skill set to be successful. Don’t grow vertically in one technology, grow horizontally amongst many technologies. Engineer solutions with the specialist teams and know enough of the solutions to have an intelligent conversation. Know enough to assist in the architecture of the solution. Be proactive, not reactive.
So to answer the question “now that all is virtual, what’s the REAL underlying problem for sysadmins? Provisioning strategy?”
I think the complexity of a provisioning strategy is the REAL underlying challenge. Understanding which of the available technologies make sense, where each solution fits into the stack, how to provision and re-provision the solution in the stack, and how to manage it will be the new measure of success or failure in the SysAdmin realm. The tools are there, and those who embrace the technology and the tools should have a very bright future.
And for those that don’t, a warning. It is coming from the other direction. I interact with DBAs frequently that are managing the entire Exadata appliance. They’ve been to the Solaris or Linux Admin classes, they’ve attended the Exadata class. The “SysAdmin” team isn’t a user, root or otherwise, on the system. The Database group has become system administrators on the majority of those systems. I’ve made similar observations in the Exalogic engineered system as well.
Embrace the technologies and the tools. Reach out and extend yourself. Throw away the old “rules”. Soon no one will really care what is under the hood. It won’t matter if it is Solaris or Oracle Linux, if it is SPARC or x86, what will matter is the IT staff’s ability to deploy the business demands on schedule.
Monday, 30 September, 2013 18:37 Written by Brian B
I was honored to be asked by Rick Ramsey (Twitter: @OldManRamsey) of Oracle Technology Network to be interviewed to discuss “What’s the biggest change data centers are facing today, and what does Collier IT recommend?” We were pressed for time: we had only 10 to 12 minutes and didn’t get much Solaris talk in. We are talking about starting up a recurring Google Places environment so we can continue these short snippets of Oracle technology with a focus on operating environments.
Here is the interview. Personal thanks to Oracle Corporation, Oracle Partner Network, Oracle Technology Network, Oracle University, and my employer Collier IT for allowing me this opportunity.