SUNDAY, AUGUST 20, 2017


FSS – More Process Scheduling

The last blog post gave brief descriptions of the various scheduling classes in Solaris. I focused on the Time Sharing (TS) class since it is the default. Hopefully we can see that the TS class (and the IA class for that matter) makes its decisions based on how the threads are using the CPU. Are we CPU intensive or are we I/O intensive? It works well, but it doesn’t provide the administrator fine-grained control when it comes to resource management.

To address this, the Fair Share Scheduler (FSS) was added in the Solaris 9 release.

The primary benefit of FSS is that it allows the admin to identify and dispatch processes and their threads based upon their importance, as determined by the business and implemented by the administrator.

We saw the complexity of the TS dispatch table in the earlier post. Here we see the FSS table has no such complexity.

FSS Dispatch Table
#
# Fair Share Scheduler Configuration
#
RES=1000
#
# Time Quantum
#
QUANTUM=110

In FSS we use the concept of CPU shares. Shares give the admin a fine level of granularity for carving up CPU resources; we are no longer limited to allocating an entire CPU. The admin designates the importance of a workload by assigning it a number of shares: you dictate importance by assigning a larger number of shares to those workloads that carry a higher importance. Shares ARE NOT the same as CPU caps or CPU resource usage. Shares simply define the relative importance of workloads in comparison to other workloads, whereas CPU resource usage is an actual measurement of consumption. A workload may be given 50% of the shares yet at a point in time may be consuming only 5% of the CPU. I look at a CPU share as a minimum guarantee of CPU allocation, not as a cap on CPU consumption.

When we assign shares to a workload, we need to be aware of the shares that are already assigned. What matters is the ratio of shares assigned to one workload compared to the total shares assigned to all of the other workloads.
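
As a quick worked example (the zone names and share counts here are hypothetical), a workload's entitlement under full CPU contention is its shares divided by the total shares of all competing workloads:

zoneA  30 shares  ->  30/50 = 60% of the CPU under full contention
zoneB  10 shares  ->  10/50 = 20%
zoneC  10 shares  ->  10/50 = 20%

If zoneC goes idle, only 40 shares are competing and zoneA's entitlement rises to 30/40 = 75%. Shares only come into play when there is contention for the CPU.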

I speak of FSS in a “Horizontal” and a “Vertical” aspect when I’m delivering for Oracle University. In Solaris 9 we were able to define projects in the /etc/project file. This is the vertical aspect. In Solaris 10 Non-Global Zones were introduced and brought with them the horizontal aspect. I assign shares horizontally across the various zones and then vertically within each zone in the /etc/project file if needed.
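
Here is a hypothetical /etc/project entry illustrating the vertical aspect (the project name, ID, and share count are made up; the fields are projname:projid:comment:user-list:group-list:attributes, and project.cpu-shares is the resource control that carries the shares):

user.oracle:100:database workload:oracle::project.cpu-shares=(privileged,20,none)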

By default, Non-Global Zones use the system's default scheduling class. If the system is updated with a new default class, zones obtain the new setting when they are booted or rebooted. The recommended scheduler to use with Non-Global Zones is FSS, and the preferred way is to set the system default scheduler to FSS so that all zones inherit it.
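
If a single zone needs a different class than the system default, recent releases also expose a per-zone scheduling-class property in zonecfg; a minimal sketch (the zone name is a placeholder):

root@solaris:~# zonecfg -z <zonename>
zonecfg:<zonename>> set scheduling-class=FSS
zonecfg:<zonename>> commit
zonecfg:<zonename>> exit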

To display information about the loaded scheduling classes, run priocntl -l


root@solaris:~# priocntl -l
CONFIGURED CLASSES
==================

SYS (System Class)

TS (Time Sharing)
Configured TS User Priority Range: -60 through 60

SDC (System Duty-Cycle Class)

FX (Fixed priority)
Configured FX User Priority Range: 0 through 60

IA (Interactive)
Configured IA User Priority Range: -60 through 60

priocntl can be used to view or set scheduling parameters for a specified process.
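
For example (a sketch; the second PID is hypothetical, and the exact parameters shown vary by class and release), -d displays the scheduling parameters of existing processes and -s changes them:

root@solaris:~# priocntl -d -i pid 1                     # display class-specific parameters for PID 1
root@solaris:~# priocntl -s -c TS -m 10 -p 10 -i pid 1234   # raise the TS user priority limit and priority of a hypothetical PID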

To determine the global priority of a process run ps -ecl

root@solaris:~# ps -ecl # The -c option displays scheduler properties; we see the class (CLS) and the priority (PRI)
 F S    UID   PID  PPID  CLS PRI     ADDR     SZ    WCHAN TTY         TIME CMD
 1 T      0     0     0  SYS  96        ?      0          ?           0:01 sched
 1 S      0     5     0  SDC  99        ?      0        ? ?           0:02 zpool-rp
 1 S      0     6     0  SDC  99        ?      0        ? ?           0:00 kmem_tas
 0 S      0     1     0   TS  59        ?    720        ? ?           0:00 init
 1 S      0     2     0  SYS  98        ?      0        ? ?           0:00 pageout
 1 S      0     3     0  SYS  60        ?      0        ? ?           0:01 fsflush
 1 S      0     7     0  SYS  60        ?      0        ? ?           0:00 intrd
 1 S      0     8     0  SYS  60        ?      0        ? ?           0:00 vmtasks
 0 S      0   869     1   TS  59        ?   1461        ? ?           0:05 nscd
 0 S      0    11     1   TS  59        ?   3949        ? ?           0:11 svc.star
 0 S      0    13     1   TS  59        ?   5007        ? ?           0:32 svc.conf
 0 S      0   164     1   TS  59        ?    822        ? ?           0:00 vbiosd
 0 S     16   460     1   TS  59        ?   1323        ? ?           0:00 nwamd

To set the default scheduling class use dispadmin -d FSS and then dispadmin -d to ensure it changed. Then run dispadmin -l to see that it loaded.

root@solaris:~# dispadmin -d
dispadmin: Default scheduling class is not set
root@solaris:~# dispadmin -d FSS
root@solaris:~# dispadmin -d
FSS	(Fair Share)
root@solaris:~# dispadmin -l
CONFIGURED CLASSES
==================

SYS	(System Class)
TS	(Time Sharing)
SDC	(System Duty-Cycle Class)
FX	(Fixed Priority)
IA	(Interactive)
FSS	(Fair Share)

Manually move all of the running processes into the FSS class and then verify with the ps command.

root@solaris:~# priocntl -s -c FSS -i all
root@solaris:~# ps -ef -o class,zone,fname | grep -v CLS | sort -k2 | more
 FSS   global auditd
 FSS   global automoun
 FSS   global automoun
 FSS   global bash
 FSS   global bash
 FSS   global bonobo-a
 FSS   global clock-ap
 FSS   global console-
 FSS   global cron
 FSS   global cupsd
 FSS   global dbus-dae
 FSS   global dbus-dae
 FSS   global dbus-lau
 FSS   global dbus-lau

Finally move init over to the FSS class so all children will inherit.

root@solaris:~# ps -ecf | grep init
    root     1     0   TS  59 16:33:44 ?           0:00 /usr/sbin/init
root@solaris:~# priocntl -s -c FSS -i pid 1
root@solaris:~# ps -ecf | grep init
    root     1     0  FSS  29 16:33:44 ?           0:00 /usr/sbin/init

With the FSS all set, we now assign shares to our Non-Global Zones (the zone name and share count are placeholders):

root@solaris:~# zonecfg -z <zonename>
zonecfg:<zonename>> set cpu-shares=<number-of-shares>
zonecfg:<zonename>> commit
zonecfg:<zonename>> exit
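
The zonecfg setting takes effect the next time the zone boots. To adjust shares on a running zone, prctl can modify the zone.cpu-shares resource control on the fly; a sketch (again, the zone name and value are placeholders):

root@solaris:~# prctl -n zone.cpu-shares -i zone <zonename>           # view the current shares
root@solaris:~# prctl -n zone.cpu-shares -r -v 20 -i zone <zonename>  # replace the value with 20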

To display CPU consumption per zone, run prstat -Z

Solaris Process Scheduling

The Oracle Solaris kernel has a number of process scheduling classes available.

A brief review.

Timesharing (TS) This is the default class for processes and their associated kernel threads. Priorities in the class are dynamically adjusted based upon CPU utilization in an attempt to allocate processor resources evenly.

Interactive (IA) This is an enhanced version of TS. Some texts reference this in conjunction with TS, i.e. TS/IA. This class applies to the in-focus window in the GUI. It provides extra resources to processes associated with that specific window.

Fair Share Scheduler (FSS) This class is “share based” rather than priority based. The threads associated with this class are scheduled based on the associated shares assigned to them and the processor’s utilization.

Fixed-Priority (FX) Priorities for these threads are fixed regardless of how they interact with the CPU. They do not vary dynamically over the life of the thread.

System (SYS) Used to schedule kernel threads. These threads are bound, meaning that unlike the userland threads listed above they do not context switch off the CPU when a time quantum is consumed. They run until they block or they complete.

Real-Time (RT) These threads are fixed-priority with a fixed time duration. RT is one of the highest priority classes, with only interrupts carrying a higher priority.

As it relates to the priority ranges for the scheduling classes the userland classes (TS/IA/FX/FSS) carry the lowest priorities, 0-59. The SYS class is next ranging from 60-99. At the top (ignoring INT) is the RT class at 100-159.

We can mix scheduling classes on the same system but there are some considerations to keep in mind.

  • Avoid having the FSS, TS, IA, and FX classes share the same processor set (pset)
  • All processes that run on a processor set must be in the same scheduling class so they do not compete for the same CPUs
  • To avoid starving applications, use processor sets for FSS and FX class applications

TS and IA as well as FSS and RT can be in the same processor set.
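
If you do separate classes with processor sets, they are created and populated with psrset; a minimal sketch (the CPU IDs and PID are illustrative):

root@solaris:~# psrset -c 1 2 3        # create a processor set containing CPUs 1-3 (the new pset ID is printed)
root@solaris:~# psrset -b 1 1234       # bind PID 1234 to processor set 1
root@solaris:~# psrset -i              # display processor set membership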

We can look at how the TS class (the default) makes its decisions by looking at the dispatch table itself.
Dispatch Table
This table is indexed by the priority level of the thread. To understand an entry, let’s use priority 30 as an example.
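
You can dump the live table with dispadmin; below is a sketch of the relevant row, using the priority-30 values discussed in this post (run the command yourself for the authoritative numbers on your release):

root@solaris:~# dispadmin -c TS -g -r 1000
# Time Sharing Dispatcher Configuration
RES=1000

# ts_quantum  ts_tqexp  ts_slpret  ts_maxwait  ts_lwait  PRIORITY LEVEL
...
        80        20        53          0        53      #    30
...
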
The leftmost column is marked as ts_quantum -> Timesharing quantum. This specifies the time in milliseconds (identified by the RES=1000) that the thread will be allocated before it is involuntarily context-switched off the CPU.

A context switch is the process of storing and restoring the state of a process so that execution can be resumed at the same point at a later time. We store this state in the Light Weight Process (LWP) that the thread was bound to. Basically a thread binds to an LWP, the LWP binds to a kernel thread (kthr), and the kernel thread is presented to the kernel dispatcher. When the thread is placed on the CPU (hardware strand/thread) the contents of the LWP are loaded onto the CPU and the CPU starts execution at that point. When the thread is removed from the CPU (it is preempted, its time quantum is consumed, or it sleeps) the contents of the CPU registers are saved into the LWP, the thread is removed from the CPU, and it returns to the dispatch queue to compete again, based on priority, with the other threads that request access to the CPU.

So, at a priority of 30 the thread has 80 milliseconds to complete its work or it will be forced off the CPU. In the event that it does not complete its work, the system will context switch the thread off the CPU AND change its priority. We see this in the next column, ts_tqexp -> Timesharing time quantum expired. This column identifies the new priority of the thread; in this case ts_tqexp is 20. So, we consumed our time quantum, we were involuntarily context-switched off the CPU, and we had our priority lowered when we returned to the dispatch queue. At a priority of 20 our time quantum is now 120 milliseconds. The priority is lowered to keep the thread from “hogging” the CPU, but the time quantum is increased in the hope that when we do get back on the CPU we have more time to complete our work.

The next column, ts_slpret, identifies our new priority when the thread returns from a sleep state. There is no reason to keep a thread on a CPU if there is no work to be done. When the thread enters a sleep state we leave the CPU; this is a VOLUNTARY context switch and we are placed on the sleep queue. When we leave the sleep queue we are not placed back on the CPU. We are placed back on the dispatch queue to compete with the other threads to gain access to the CPU. Since we have been off the CPU for a period of time we advance the priority: in this case we were initially dispatched at 30, we voluntarily context-switched off the CPU, and when we woke we were given a priority of 53. Notice that at a priority of 53 our new time quantum is 40. The priority increased from 30 to 53 but the time quantum decreased from 80 to 40. The thread gets back on the CPU faster but is limited in the amount of time it gets on the CPU.

Voluntary and involuntary context switches can be seen in the mpstat command as the csw and icsw columns.
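
A sketch of what to look for (the per-CPU data lines are omitted); csw counts voluntary context switches and icsw involuntary ones, per CPU per interval:

root@solaris:~# mpstat 5
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
...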

The last two columns deal with the attempt to prevent CPU starvation. ts_maxwait is the number of seconds a thread can go without access to the CPU before it is assigned the value of ts_lwait. Notice that for all but priority 59 this value is set to 0. So when we exceed 0 (meaning we have been off the CPU for 1 second) we are assigned the value of ts_lwait. Again using 30 as our example, we would go from a priority of 30 to a priority of 53 if we were prevented access to the CPU for 1 second.

In the middle of all of this we have preemption. The Solaris Kernel is fully preemptible. All threads, even SYS threads, will be preempted if a higher priority thread hits the kernel dispatcher while a lower priority thread is running. The thread isn’t allowed to complete its time quantum, it is context switched off the CPU.

And don’t forget, there are IA, FSS, FX, SYS, RT, and INT threads that add to the chaos if allowed, which is why I provided some of the guidance listed earlier.

We see some use of FX and quite a bit more of the FSS with Solaris zones. I’ll talk about FSS in another post.

I’ll spend a better part of a day whiteboarding all of this in the Oracle University Solaris Performance Management Class.


Translation Lookaside Buffer

Teaching Solaris Performance Management this week and we got into a large discussion about T-Series CPUs, Multi-threaded -vs- Multi-Process applications, Multiple Page Size Support (MPSS), and the Translation Lookaside Buffer (TLB).

Solaris processes run in a virtual memory address space. When we attempt to utilize that memory address space something needs to map that virtual address to an actual physical address. On the SPARC platform the Hardware Address Translation layer (named SFMMU – Spitfire Memory Management Unit) performs this function. The MMU divides the virtual address space into pages. Solaris supports MPSS so we can change the size of these pages on both SPARC as well as x86.

The pagesize -a command will display the available page sizes on the system.

$ uname -a
SunOS chicago 5.10 Generic_127112-11 i86pc i386 i86pc Solaris
$ pagesize -a
4096
2097152
$ uname -a
SunOS niagara 5.10 Generic_138888-03 sun4v sparc SUNW Solaris
$ pagesize -a
8192
65536
4194304
268435456

Virtual Memory would not be very effective if every memory address had to be translated by looking up the associated physical page in memory. The solution is to cache the recent translations in a Translation Lookaside Buffer (TLB). A TLB has a fixed number of slots that contain Translation Table Entries (TTE), which map virtual addresses to physical addresses.

Modern servers today have multiple cores with multiple hardware strands, allowing the system to dispatch a large number of threads to a CPU. Each of the processes associated with these threads needs to gain access to physical memory locations, placing a burden on the TLB and the HAT. Simply put, there may not be enough space in the TLB to hold the Translation Table Entries (TTEs) for all of the translations required by the large number of running processes.

To speed up handling of TLB miss traps, the processor provides a hardware-assisted lookup mechanism called the Translation Storage Buffer (TSB). The TSB is a virtually indexed, direct-mapped, physically contiguous, and size-aligned region of physical memory which is used to cache recently used Translation Table Entries (TTEs) after retrieval from the page tables. When a TLB miss occurs, the hardware uses the virtual address of the miss combined with the contents of a TSB base address register (which is pre-programmed on context switch) to calculate the pointer into the TSB of the entry corresponding to the virtual address. If the TSB entry tag matches the virtual address of the miss, the TTE is loaded into the TLB by the TLB miss handler, and the trapped instruction is retried. If no match is found, the trap handler branches to a slow path routine called the TSB miss handler. Quite a bit of complex work to handle these “misses”.

Starting with Solaris 10 Update 1, the Out-Of-The-Box (OOB) Large Page Support turns on MPSS automatically for the application's heap and text (libraries). The advantage is that it improves the performance of your userland applications by limiting/reducing the CPU cycles required to service dTLB and iTLB misses. Theoretically we are mapping a larger amount of memory in the TLB if we choose to map larger pages.

For example, if the heap size of a process is 256M, on a Niagara (UltraSPARC-T1) box it will be mapped onto a single 256M page. On a system that doesn’t support large pages, it will be mapped onto 32,768 8K pages.

The pmap command displays the page sizes of memory mappings within the address space of a process. The -sx option directs pmap to show the page size for each mapping.

sol10# pmap -sx `pgrep testprog`
2909:  ./testprog
 Address Kbytes   RSS  Anon Locked Pgsz Mode  Mapped File
00010000    8    8    -    -  8K r-x-- dev:277,83 ino:114875
00020000    8    8    8    -  8K rwx-- dev:277,83 ino:114875
00022000 131088 131088 131088    -  8K rwx--  [ heap ]
FF280000   120   120    -    -  8K r-x-- libc.so.1
FF29E000   136   128    -    -  - r-x-- libc.so.1
FF2C0000   72   72    -    -  8K r-x-- libc.so.1
FF2D2000   192   192    -    -  - r-x-- libc.so.1
FF302000   112   112    -    -  8K r-x-- libc.so.1
FF31E000   48   32    -    -  - r-x-- libc.so.1
FF33A000   24   24   24    -  8K rwx-- libc.so.1
FF340000    8    8    8    -  8K rwx-- libc.so.1
FF390000    8    8    -    -  8K r-x-- libc_psr.so.1
FF3A0000    8    8    -    -  8K r-x-- libdl.so.1
FF3B0000    8    8    8    -  8K rwx--  [ anon ]
FF3C0000   152   152    -    -  8K r-x-- ld.so.1
FF3F6000    8    8    8    -  8K rwx-- ld.so.1
FFBFA000   24   24   24    -  8K rwx--  [ stack ]
-------- ------- ------- ------- -------
total Kb 132024 132000 131168    -

There may be some instances where OOB causes poor performance of some of your applications, including application crashes if the application makes an improper assumption regarding page sizes. If you run into this scenario, there are adjustments that can be made in /etc/system to enable or disable OOB support.
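
As an illustration only (the tunable names below are my recollection of the Solaris 10 heap and stack large-page switches; check the Solaris Tunable Parameters Reference Manual for your release before changing anything), the adjustments go in /etc/system and take effect after a reboot:

* Disable out-of-the-box large pages for process heap and stack
set use_brk_lpg=0
set use_stk_lpg=0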

It can also introduce challenges with TLB coherency. On multi-threaded applications running on CMP and SMP systems, threads from a common PID can be dispatched to different CPUs, each holding its own TTEs in its TLB. When a thread unmaps virtual memory we have to perform a cleanup: a stale mapping left in another CPU's TLB would point to a now-invalid physical memory location and, if allowed to remain, could lead to corruption. During a munmap, only the CPUs that have actually run the process are cross-called for cleanup rather than broadcasting to all of the running CPUs. However, as we add more processors, this cleanup can take more time. If you think this may be occurring after migrating to a larger system, consider using processor pools or CPU binding of the process to see if that allows some relief.
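
A sketch of the CPU binding approach (the PID and processor ID are illustrative):

example# pbind -b 2 1234        # bind PID 1234 to processor 2
example# pbind -q 1234          # query the current binding
example# pbind -u 1234          # remove the binding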

We can use the trapstat command to gain some reference into the performance of our dTLB and iTLB hit rates. By specifying the -T option, trapstat shows TLB misses broken down by page size. In this example, CPU 0 is spending 7.9 percent of its time handling user-mode TLB misses on 8K pages, and another 2.3 percent of its time handling user-mode TLB misses on 64K pages.

example# trapstat -T -c 0
cpu m size| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
----------+-------------------------------+-------------------------------+----
  0 u   8k|      1300  0.1        15  0.0 |    104897  7.9        90  0.0 | 8.0
  0 u  64k|         0  0.0         0  0.0 |     29935  2.3         7  0.0 | 2.3
  0 u 512k|         0  0.0         0  0.0 |      3569  0.2         2  0.0 | 0.2
  0 u   4m|         0  0.0         0  0.0 |       233  0.0         2  0.0 | 0.0
- - - - - + - - - - - - - - - - - - - - - + - - - - - - - - - - - - - - - + - -
  0 k   8k|        13  0.0         0  0.0 |     71733  6.5       110  0.0 | 6.5
  0 k  64k|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
  0 k 512k|         0  0.0         0  0.0 |         0  0.0       206  0.1 | 0.1
  0 k   4m|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
==========+===============================+===============================+====
      ttl |      1313  0.1        15  0.0 |    210367 17.1       417  0.2 |17.5

By specifying the -e option, trapstat displays statistics for only specific trap types. Using this option minimizes the probe effect when seeking specific data. This example yields statistics for only the dtlb-prot and syscall-32 traps on CPUs 12 through 15:

example# trapstat -e dtlb-prot,syscall-32 -c 12-15
vct  name               |    cpu12    cpu13    cpu14    cpu15
------------------------+------------------------------------
 6c dtlb-prot           |      817      754     1018      560
108 syscall-32          |     1426     1647     2186     1142

vct  name               |    cpu12    cpu13    cpu14    cpu15
------------------------+------------------------------------
 6c dtlb-prot           |     1085      996      800      707
108 syscall-32          |     2578     2167     1638     1452

cpustat allows another point of entry into monitoring events on the CPU, including the workings of the TLB. The following command displays the three CPUs with the highest DTLB_miss rate.

example% cpustat -c DTLB_miss -k DTLB_miss -n 3 1 1

 time cpu event DTLB_miss
1.040 115  tick       107
1.006  18  tick        98
1.045 126  tick        31
1.046  96 total       236

event DTLB_miss
total       236

There is quite a bit more to think about relating to MPSS and the TLBs; I hope this post serves as a starting point for those that are running CMP/SMP systems with multi-threaded applications to perform a deeper dive.

Take a look at pmap -sx, ppgsz, pagesize, and mpss.so.1 for additional direction.
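
For example (a sketch; the page sizes are placeholders and must be among those reported by pagesize -a, and ./testprog is the test program from the pmap example above), ppgsz launches a program with preferred page sizes, and mpss.so.1 can be preloaded to do the same for an existing binary:

$ ppgsz -o heap=4M,stack=64K ./testprog
$ MPSSHEAP=4M MPSSSTACK=64K LD_PRELOAD=mpss.so.1 ./testprog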

Links – Week ending 8/9

If you are on 11i and are planning to upgrade to R12 then make sure you review the below links on the Consolidated Upgrade Patch 2 (CUP2). http://ow.ly/2yTvWl

Virtualization and Cloud Made Simple and Easy with Oracle’s Latest Engineered Systems – Webcast http://t.co/HFu9lzsbD8

Linux Container (LXC) Part 2: Working With Containers http://t.co/pDkVzHyYwk

e-book Engineered for Extreme Performance http://t.co/Yht6oLOQUA

Oracle Launches New Oracle Linux 6 Certifications; Oracle Linux 5 Exams To Retire http://t.co/rQNHGGrBG7

Oracle is Unveiling the Latest Engineered System for Enterprise Virtualization http://t.co/I46E2oi3dy

Ready for detailed info on Oracle Multitenant ? Read this technical white paper http://t.co/VZso6WMRdH

The Case for Running Oracle Database 12c on Oracle Solaris http://t.co/0KEMnSocix

10 Things CIOs Should Know About The World’s First Cloud Database http://t.co/sm0KrQbMkj

Oracle VM Templates for Oracle Database http://t.co/nrO4OavkMi

Basic mdb walkthrough.

The Solaris Crash Analysis Tool is a fantastic solution that is available in “My Oracle Support” (MOS) that can assist those that don’t have a strong background in Solaris internals in looking at potential issues with a system that is in a panic condition.

The built-in modular debugger (mdb) can also augment SCAT, or at times work faster than it.

Here is a very basic walkthrough that I provide to our Collier IT engineers to assist them in initial diagnostics.

There’s much more, and I’ll add some additional walk-throughs later.

1. Useful information can be found in the stack backtrace; search its keywords against MOS. Sometimes you get lucky here.

> $c
vpanic(127def0, 2a100ed40c0, 0, 0, 3effffff8000000, 1869c00)
cpu_deferred_error+0x568(ecc1ecc100000000, 2, 1000060000003a, 600000000, 0, 30001622360)
ktl0+0x48(29fff982000, 2a100ed4d78, 30000, 16, 60, 30)
pp_load_tlb+0x1e4(29fff980000, 29fff9822c0, 1d00, 29fff980300, 1822f00, 2)
ppcopy_common+0x12c(70001d32500, 700030b2500, 1, 1, 29fff982000, 29fff980000)
ppcopy+0xc(70001d32500, 700030b2500, 0, 0, 1822348, 70001d32500)
do_page_relocate+0x228(2a100ed5120, 2a100ed5128, 700030b2500, 2a100ed53e0, 0, 2a100ed4fb0)
page_relocate+0x14(2a100ed5120, 2a100ed5128, 1, 1, 2a100ed53e0, 0)
page_lookup_create+0x244(60017811400, 6007c570000, 70001d32500, 0, 2a100ed53e0, 0)
swap_getconpage+0xb4(60017811400, 6007c570000, 2000, 0, 2a100ed53c8, 2000)
anon_map_getpages+0x474(60010c02008, 0, 200, 109a420, 2a100ed53e0, 1)
segvn_fault_anonpages+0x32c(0, 800000, 0, 1, 6001753c2a8, 3)
segvn_fault+0x530(300034bc3c0, 300012abc20, 1, 1, 892000, ffffffffff76e000)
as_fault+0x4c8(300012abc20, 6001766b9d0, 890000, 60016881390, 186c0b0, 0)
pagefault+0xac(890000, 0, 1, 0, 60016881318, 1)
trap+0xd50(2a100ed5b90, 8903bb, 0, 1, fea0ad6c, 0)
utl0+0x4c(1e, fe8f8104, 9e58, fe8fee34, 7aebd8, fe8fa524)
>

2. ::status can also give you things like the hostname and the kernel revision they’re running:

> ::status
debugging crash dump vmcore.0 (64-bit) from sunbkpsrv5
operating system: 5.10 Generic_142900-13 (sun4u)
panic message: UE CE Error(s)
dump content: kernel pages only
>

3. ::cpuinfo -v also shows some good info on what was running when the system panicked:

> ::cpuinfo -v
 ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD      PROC
  0 0000183a620  1b    7    0  60   no    no t-0    3000371fb20 java
                  |    |
       RUNNING <--+    +-->  PRI THREAD      PROC
         READY                60 2a1000c7ca0 sched
        EXISTS                59 30001e121e0 java
        ENABLE                59 30001d293e0 in.mpathd
                              59 3000371d480 java
                              59 3000371ce00 java
                              59 3000371c440 java
                              59 3000371f4a0 java

 ID ADDR        FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD      PROC
  1 0000180c000  1d    6    0  59  yes    no t-0    30001dc01c0 syslogd
                  |    |
       RUNNING <--+    +-->  PRI THREAD      PROC
      QUIESCED                99 2a100237ca0 sched
        EXISTS                60 2a100a83ca0 sched
        ENABLE                53 3000371c100 java
                              53 3000371c780 java
                              51 3000371aaa0 java
                              50 300032a9940 savecore

>

4. ::ps gives good info on everything running at the time of the crash

> ::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R      0      0      0      0      0 0x00000001 0000000001838150 sched
R      3      0      0      0      0 0x00020001 0000060012dab848 fsflush
R      2      0      0      0      0 0x00020001 0000060012dac468 pageout
R      1      0      0      0      0 0x4a004000 0000060012dad088 init
R    808      1    807    807      0 0x42000000 0000060016acf890 nbevtmgr
R    805      1      7      7  60002 0x4a304102 0000060016746038 java
R    764      1    764    764      0 0x42000000 0000060016acec70 dbsrv11
R    712      1    711    711      0 0x42000000 0000060016ad04b0 bpcd
R    709      1    708    708      0 0x42000000 00000600167fa040 vnetd
R    386      1    385    385      0 0x42000000 0000060016ad10d0 snmpd
R    382      1    382    382     25 0x52010000 00000600169a2048 sendmail
R    381      1    381    381      0 0x52010000 00000600169a2c68 sendmail
R    334      1    334    334      0 0x42000000 0000060016747878 syslogd
R    327      1    327    327      0 0x42000000 00000600161c0490 sshd
R    324      1    323    323      0 0x42010000 00000600167fb880 smcboot
R    326    324    323    323      0 0x42010000 0000060013fba018 smcboot
R    325    324    323    323      0 0x42010000 00000600167fac60 smcboot
R    275      1    275    275      0 0x42000000 0000060016748498 utmpd
R    267      1    266    266      0 0x42000000 00000600159bb860 pbx_exchange
R    263      1    263    263      0 0x42000000 00000600159bac40 inetd
R    257      1    257    257      0 0x42000000 0000060013e26c30 automountd
R    259    257    257    257      0 0x42000000 0000060015d02488 automountd
R    251      1    251    251      1 0x42000000 0000060013fbc478 rpcbind
R    234      1    234    234      0 0x42010000 00000600161c10b0 cron
R    208      1    208    208      0 0x42000000 0000060015d00c48 xntpd
R    185      1      7      7      0 0x42000000 0000060013fbd098 iscsid
R    155      1    154    154      0 0x42000000 0000060013e28470 in.mpathd
R    144      1    144    144      0 0x42000000 00000600159ba020 picld
R    139      1    139    139      1 0x42000000 00000600159bd0a0 kcfd
R    136      1    136    136      0 0x42000000 0000060012daac28 nscd
R    120      1    120    120      0 0x42000000 0000060015d030a8 syseventd
R     80      1     79     79      0 0x42020000 0000060013e26010 dhcpagent
R     61      1     61     61      0 0x42000000 0000060013fbb858 devfsadm
R      9      1      9      9      0 0x42000000 0000060013e29090 svc.configd
R      7      1      7      7      0 0x42000000 0000060012daa008 svc.startd
R    357      7      7      7      0 0x4a004000 0000060016746c58 rc2
R    702    357      7      7      0 0x4a004000 00000600167490b8 lsvcrun
R    703    702      7      7      0 0x4a004000 0000060013e27850 sh
R    809    703      7      7      0 0x4a004000 00000600169a3888 pdde
R    812    809      7      7      0 0x4a004000 0000060016ace050 pdde
R    813    812      7      7      0 0x4a004000 00000600169a44a8 sleep
R    342      7      7      7      0 0x4a004000 0000060015d00028 svc-webconsole
R    717    342      7      7      0 0x4a004000 00000600169a50c8 sjwcx
R    720    717      7      7      0 0x4a004000 00000600167fc4a0 java
R    304      7    304    304      0 0x4a004000 0000060013fbac38 ttymon
R    290      7      7      7      0 0x4a004000 00000600167fd0c0 svc-dumpadm
R    293    290      7      7      0 0x4a004000 00000600161bf870 savecore
R    269      7    269    269      0 0x4a014000 00000600161be030 sac
R    278    269    269    269      0 0x4a014000 0000060015d01868 ttymon

5. ::panicinfo shows more info on the panic itself

> ::panicinfo
             cpu                0
          thread      3000371fb20
         message UE CE Error(s)
          tstate         80001606
              g1          1270ce4
              g2          127dc00
              g3  3effffff8000000
              g4         fbfffffe
              g5                1
              g6                0
              g7      3000371fb20
              o0          127def0
              o1      2a100ed4098
              o2                0
              o3                0
              o4 fc30ffffffffffff
              o5  3cf000000000000
              o6      2a100ed3761
              o7          11020dc
              pc          104982c
             npc          1049830
               y                0
>

6. Find the address of the thread that was executing when the system panicked.

> panic_thread/K
panic_thread:
panic_thread:   3003acf7020     
>

7. Run the thread macro against the pointer value from above. Search for the t_procp structure.

> 3003acf7020$<thread
    t_link = 0
    t_stk = 0x2a108333ae0
    t_startpc = 0
    t_bound_cpu = 0x30004b42000
    t_affinitycnt = 0
    t_bind_cpu = 0xffff
    t_flag = 0x1800
    t_proc_flag = 0x104
...
    t_procp = 0x3005a6713e0    <== use the value here ...
 >

8. Run the proc2u macro against the pointer from the t_procp structure. Look for the value stored in p_user.u_psargs. This is the full command line of the process that was running on the CPU at the time of the system panic.

> 0x3005a6713e0$<proc2u
    p_user.u_execsw = execsw+0x28
    p_user.u_auxv = [
        {
            a_type = 0x7d8
            a_un = {
                a_val = 0xffffffff7fffff90
                a_ptr = 0xffffffff7fffff90
                a_fcn = 0xffffffff7fffff90
            }
...
    p_user.u_start = {
        tv_sec = 2007 Jun 11 00:00:00
        tv_nsec = 0xcf77e0
    }
    p_user.u_ticks = 0x191b148
    p_user.u_comm = [ "bgscollect" ]
    p_user.u_psargs = [ "bgscollect -I noInstance -B /usr/adm/best1_7.3.00" ]    <== use the value here     
    p_user.u_argc = 0x5     
    p_user.u_argv = 0xffffffff7ffffc98 ... 
    >