How to debug daemons under the Solaris SMF control [ID 1007672.1]
Applies to:
Solaris SPARC Operating System - Version: 10 3/05 and later [Release: 10.0 and later]
All Platforms
Goal
The Solaris 10 Operating System introduced the Service Management Facility (SMF), which manages system services such as daemons. Traditionally, debugging a daemon involves killing it and restarting it with debugging options. Under SMF a killed daemon is restarted immediately, so additional steps are needed.
Solution
There are three options:
1. Disable the service and start the daemon(s) manually
2. Modify the service method
3. Modify the service manifest
Before proceeding, the reader should be familiar with SMF; the smf(5) man page is the recommended starting point.
---------------------------------------------------------------------------
1. Disable the service and start the daemon(s) manually
This is the simplest option and is generally recommended. The only complication is debugging services that other services depend upon or services that themselves start multiple daemons.
Determining the dependencies on a service is outside the scope of this document. For further information consult the svcs(1) man page, in particular the -D option.
Disabling a service can have undesirable consequences because SMF service state persists across reboots: a service disabled with a plain svcadm disable stays disabled after a reboot. Using svcadm(1M) disable -t is therefore recommended. The -t option disables the service only temporarily, so it is enabled again at the next reboot.
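One way to confirm that a disable really is temporary is to look at the "enabled" line of the svcs -l output. The following is a minimal sketch only: is_temporary is a hypothetical helper, not part of SMF, and it assumes the "enabled false (temporary)" layout shown in the keyserv example in this document.

```shell
# is_temporary: hypothetical helper, not part of SMF.
# Reads "svcs -l" output on standard input and succeeds only if the
# "enabled" line carries the "(temporary)" marker.
is_temporary() {
    grep '^enabled.*(temporary)' >/dev/null
}

# Intended use on a Solaris 10 host (assumption: run as root):
#   svcadm disable -t keyserv
#   svcs -l keyserv | is_temporary && echo "setting reverts at reboot"
```

The helper only parses text, so it can be tried against saved svcs -l output before being used on a live system.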
The steps involved are:
a. The service is temporarily disabled with svcadm(1M)
b. The relevant daemons are restarted with the required options
c. Debug the problem
d. If still running, kill the relevant daemons
e. The service is enabled with svcadm(1M)
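The steps above can be sketched as a small script. This is an illustration under stated assumptions rather than a supported tool: run and debug_cycle are hypothetical names, the daemon is assumed to be found via PATH and to take a single debug flag, and DRY_RUN defaults to 1 so that the commands are only printed.

```shell
# Sketch only. DRY_RUN defaults to 1, which prints each command
# instead of running it; set DRY_RUN=0 on a Solaris 10 host (as
# root) to run them for real.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# debug_cycle SERVICE DAEMON FLAG LOGFILE (hypothetical helper)
debug_cycle() {
    svc=$1; daemon=$2; flag=$3; log=$4

    run svcs -D "$svc"               # list dependents before touching it
    run svcadm disable -t "$svc"     # a. temporary disable

    if [ "$DRY_RUN" = "1" ]; then    # b. start the daemon by hand
        echo "+ $daemon $flag > $log 2>&1 &"
    else
        "$daemon" "$flag" > "$log" 2>&1 &
        # c. debug (for example, tail -f "$log" in another terminal)
        printf 'Press Return when debugging is finished: '
        read answer
    fi

    run pkill -x "$daemon"           # d. kill the debug daemon if still running
    run svcadm enable "$svc"         # e. hand the service back to SMF
}

debug_cycle keyserv keyserv -D /var/tmp/keyserv.debug
```

With the default DRY_RUN=1 the call on the last line prints the five commands, each prefixed with "+ ", without changing anything on the system.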
Example - keyserv
a. The service is temporarily disabled with svcadm(1M)
# svcs keyserv
STATE STIME FMRI
online 16:21:12 svc:/network/rpc/keyserv:default
# svcadm disable -t keyserv
# svcs keyserv
STATE STIME FMRI
disabled 16:26:55 svc:/network/rpc/keyserv:default
# svcs -l keyserv
fmri svc:/network/rpc/keyserv:default
name RPC encryption key storage
enabled false (temporary)
state disabled
next_state none
state_time Thu May 17 16:26:55 2007
logfile /var/svc/log/network-rpc-keyserv:default.log
restarter svc:/system/svc/restarter:default
contract_id
dependency require_all/restart svc:/network/rpc/bind (online)
dependency require_all/restart svc:/system/identity:domain (online)
#
b. The relevant daemons are restarted with the required options
# pgrep -lf keyserv
# keyserv -D 2>/var/tmp/keyserv.debug 1>&2 &
[1] 3072
# pgrep -lf keyserv
3072 keyserv -D
#
c. Debug the problem
# tail -f /var/tmp/keyserv.debug
default disk cache size: 1MB
supported mechanisms:
alias disk cache size
===== ===============
dh192-0 0MB
d. If still running, kill the relevant daemons
# pgrep -lf keyserv
3072 keyserv -D
# pkill -x keyserv
# pgrep -lf keyserv
[1] + Terminated keyserv -D 2>/var/tmp/keyserv.debug 1>&2 &
#
e. The service is enabled with svcadm(1M)
# svcs keyserv
STATE STIME FMRI
disabled 16:26:55 svc:/network/rpc/keyserv:default
# svcadm enable keyserv
# svcs keyserv
STATE STIME FMRI
online 16:30:33 svc:/network/rpc/keyserv:default
#
---------------------------------------------------------------------------
2. Modify the service method
This is more complex, but it resolves the problems related to dependencies and keeps the service, with its debugging options, available after a reboot.
Most (but not all) services are started from scripts that are found in /lib/svc/method. Editing these scripts is not supported outside of explicit instructions to do so by Support in the course of an investigation.
The steps involved are:
a. The service is disabled with svcadm(1M)
b. The relevant method is edited
c. The service is enabled with svcadm(1M)
d. Debug the problem
e. The service is disabled with svcadm(1M)
f. The relevant method is restored
g. The service is enabled with svcadm(1M)
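Steps b and f depend on being able to put the original method back exactly as it was. A minimal sketch of one way to do that follows; save_method and restore_method are hypothetical helpers, not part of SMF, and the ".orig" suffix is an arbitrary choice for this illustration.

```shell
# save_method / restore_method: hypothetical helpers, not part of SMF.
# The ".orig" suffix is an arbitrary choice for this sketch.

# Keep a pristine copy of the method before step b; refuse to
# overwrite an existing copy so a second run cannot clobber it.
save_method() {
    method=$1
    if [ -f "$method.orig" ]; then
        echo "backup already exists: $method.orig" >&2
        return 1
    fi
    cp -p "$method" "$method.orig"
}

# Put the pristine copy back in step f and remove the backup.
restore_method() {
    method=$1
    cp -p "$method.orig" "$method" && rm -f "$method.orig"
}

# Intended use (assumption: run as root, service already disabled):
#   save_method /lib/svc/method/nisplus
#   vi /lib/svc/method/nisplus        # add the debug option
#   ...
#   restore_method /lib/svc/method/nisplus
```

Restoring from a saved copy, rather than editing the option out by hand, avoids leaving a stray change in a method script that is not supposed to be modified.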
Example - the NIS+ cache manager
a. The service is disabled with svcadm(1M)
# svcs nisplus
STATE STIME FMRI
online 16:37:12 svc:/network/rpc/nisplus:default
# svcadm disable nisplus
# svcs nisplus
STATE STIME FMRI
disabled 16:47:16 svc:/network/rpc/nisplus:default
#
b. The relevant method is edited
# grep nis_cachemgr /lib/svc/method/nisplus
/usr/sbin/nis_cachemgr $cachemgr_flags || exit $?
# vi /lib/svc/method/nisplus
... the start options are changed, eg adding '-v'
c. The service is enabled with svcadm(1M)
# svcs nisplus
STATE STIME FMRI
disabled 16:47:16 svc:/network/rpc/nisplus:default
# svcadm enable nisplus
# svcs nisplus
STATE STIME FMRI
online 16:49:57 svc:/network/rpc/nisplus:default
#
d. Debug the problem
# pgrep -lf nis_cachemgr
647 /usr/sbin/nis_cachemgr -v
#
e. The service is disabled with svcadm(1M)
# svcs nisplus
STATE STIME FMRI
online 16:49:57 svc:/network/rpc/nisplus:default
# svcadm disable nisplus
# svcs nisplus
STATE STIME FMRI
disabled 16:50:50 svc:/network/rpc/nisplus:default
#
f. The relevant method is restored
# grep nis_cachemgr /lib/svc/method/nisplus
/usr/sbin/nis_cachemgr -v $cachemgr_flags || exit $?
# vi /lib/svc/method/nisplus
... the start options are restored, eg removing '-v'
g. The service is enabled with svcadm(1M)
# svcs nisplus
STATE STIME FMRI
disabled 16:50:50 svc:/network/rpc/nisplus:default
# svcadm enable nisplus
# svcs nisplus
STATE STIME FMRI
online 16:52:13 svc:/network/rpc/nisplus:default
#
---------------------------------------------------------------------------
3. Modify the service manifest
This is more complex again but, like option 2, it resolves the problems related to dependencies and keeps the service available after a reboot.
This approach could be used to define an alternative service (eg for service foo, a service foo-debug): the original service is disabled and the debug service is enabled. The problem is that dependencies on the original service do not take the new service name into account. For this reason, defining alternative services is not advised.
The manifests define the start methods and a manifest can be changed to use a different method. The method can be changed either directly with editprop inside svccfg(1M) or, again using svccfg(1M), the manifest can be exported, edited and imported again. Modification of the standard manifests is not supported outside of explicit instructions to do so by Support in the course of an investigation.
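As an illustration of the direct-edit route, the start method can also be changed non-interactively with the svccfg(1M) setprop subcommand, followed by svcadm refresh so that the running configuration picks up the change. The sketch below only prints the commands for review; start_exec_cmds is a hypothetical helper, and the assumption is that the service defines start/exec at the service level, as the keyserv listprop output in this document suggests.

```shell
# start_exec_cmds SERVICE METHOD: hypothetical helper that prints,
# but does not run, the commands to point start/exec at METHOD.
start_exec_cmds() {
    svc=$1; method=$2
    echo "svccfg -s $svc setprop start/exec = astring: \"$method\""
    echo "svcadm refresh $svc"
}

start_exec_cmds network/rpc/keyserv /var/tmp/keyserv
```

Printing the commands first allows them to be checked, and agreed with Support where required, before they are run as root on the affected host.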
The steps involved are:
a. The service is disabled with svcadm(1M)
b. The manifest is edited (eg exported, edited, re-imported)
c. A new method is written
d. The service is enabled with svcadm(1M)
e. Debug the problem
f. The service is disabled with svcadm(1M)
g. The original manifest is restored
h. The service is enabled with svcadm(1M)
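The edit in step b can be made with sed instead of an interactive editor, which makes the change repeatable. This is a sketch under assumptions: swap_exec is a hypothetical helper, not part of SMF, and it assumes the exported manifest carries the method path in an exec attribute, in either quote style.

```shell
# swap_exec OLD NEW FILE: hypothetical helper, not part of SMF.
# Prints FILE with exec attributes equal to OLD rewritten to NEW,
# accepting either quote style around the attribute value.
swap_exec() {
    old=$1; new=$2; file=$3
    sed "s|exec=\\(['\"]\\)$old\\1|exec=\\1$new\\1|" "$file"
}

# Intended use after "svccfg export keyserv > /var/tmp/keyserv.xml":
#   cp /var/tmp/keyserv.xml /var/tmp/keyserv.xml.orig
#   swap_exec /usr/sbin/keyserv /var/tmp/keyserv \
#       /var/tmp/keyserv.xml.orig > /var/tmp/keyserv.xml
```

Only attributes whose value is exactly OLD are rewritten, so other methods such as a ':kill' stop method are left untouched. The result should still be inspected before it is imported.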
Example - keyserv
a. The service is disabled with svcadm(1M)
# svcs keyserv
STATE STIME FMRI
online 16:43:44 svc:/network/rpc/keyserv:default
# svcadm disable keyserv
# svcs keyserv
STATE STIME FMRI
disabled 16:58:34 svc:/network/rpc/keyserv:default
#
b. The manifest is edited (eg exported, edited, re-imported)
# svccfg
svc:> export keyserv > /var/tmp/keyserv.xml
svc:> exit
# cp /var/tmp/keyserv.xml /var/tmp/keyserv.xml.orig
# grep sbin/keyserv /var/tmp/keyserv.xml
<... exec='/usr/sbin/keyserv' ...>
# vi /var/tmp/keyserv.xml
... change the start method, eg /var/tmp/keyserv
# svccfg
svc:> delete keyserv
svc:> select *keyserv*
Pattern '*keyserv*' doesn't match any instances or services
svc:> import /var/tmp/keyserv.xml
svc:> select *keyserv*
svc:/network/rpc/keyserv> listprop
...
start/exec astring /var/tmp/keyserv
...
svc:/network/rpc/keyserv> exit
#
c. A new method is written
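We need to create the method, in this case /var/tmp/keyserv, and make it executable:
#!/sbin/sh
#
# Start keyserv in debug mode in the background, sending stdout and
# stderr to /var/tmp/keyserv.debug, then report success to SMF.
/usr/sbin/keyserv -D 2>/var/tmp/keyserv.debug 1>&2 &
exit 0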
d. The service is enabled with svcadm(1M)
# svcs keyserv
STATE STIME FMRI
disabled 17:04:26 svc:/network/rpc/keyserv:default
# svcadm enable keyserv
# svcs keyserv
STATE STIME FMRI
online 17:05:51 svc:/network/rpc/keyserv:default
# pgrep -lf keyserv
688 /usr/sbin/keyserv -D
#
e. Debug the problem
# tail -f /var/tmp/keyserv.debug
default disk cache size: 1MB
supported mechanisms:
alias disk cache size
===== ===============
dh192-0 0MB
f. The service is disabled with svcadm(1M)
# svcs keyserv
STATE STIME FMRI
online 17:05:51 svc:/network/rpc/keyserv:default
# svcadm disable keyserv
# svcs keyserv
STATE STIME FMRI
disabled 17:07:25 svc:/network/rpc/keyserv:default
#
g. The original manifest is restored
# svccfg
svc:> delete *keyserv*
svc:> select *keyserv*
Pattern '*keyserv*' doesn't match any instances or services
svc:> import /var/tmp/keyserv.xml.orig
svc:> select *keyserv*
svc:/network/rpc/keyserv> listprop
...
start/exec astring /usr/sbin/keyserv
...
svc:/network/rpc/keyserv> exit
#
NOTE: The original manifests in /var/svc/manifest can also be used to restore the service.
h. The service is enabled with svcadm(1M)
# svcs keyserv
STATE STIME FMRI
disabled 17:08:17 svc:/network/rpc/keyserv:default
# svcadm enable keyserv
# svcs keyserv
STATE STIME FMRI
online 17:09:22 svc:/network/rpc/keyserv:default
# pgrep -lf keyserv
705 /usr/sbin/keyserv
#
---------------------------------------------------------------------------
Product
Solaris 10 Operating System