Writing a Solaris Service Management Facility (SMF) service manifest

SMF services are basically daemons that stay in the background and wait for requests to serve. When a request arrives, the daemon wakes up, serves the request and then waits for the next one.

The services are built using different software development platforms and languages, but they have one aspect in common, which we are going to discuss here: the service manifest, which describes the service to SMF and lets SMF understand and manage the service life cycle.

To write a service manifest we should be familiar with XML and with the service manifest schema, a DTD located at /usr/share/lib/xml/dtd/service_bundle.dtd.1. This file specifies which elements can be used to describe a service to SMF.

The next thing we need is a text editor, preferably an XML-aware one. GNOME gedit can do the task for us.

The service manifest file is composed of six important elements, listed below:

  • The service declaration: specifies the service name, type and instantiation model.
  • Zero or more dependency declaration elements: specify the service dependencies.
  • Lifecycle methods: specify the start, stop and refresh methods.
  • Property groups: specify which property groups the service description has.
  • Stability element: specifies how stable the service interface is across version changes.
  • Template element: provides more human-readable information about the service.

To describe a service, the first thing we need to do is identify the service itself. The following snippet shows how we can declare a service named jws.

<service name='network/jws' type='service' version='1'>
<single_instance/>

The first line specifies the service name, version and type. The service name attribute forms the FMRI of the service, which in this case will be svc:/network/jws. In the second line we are telling SMF that only one instance of this service should exist. The default instance, svc:/network/jws:default, is created through the create_default_instance element, which we can use to control whether the default instance is created automatically.

All of the elements that we mention in the following sections of this article are immediate child elements of the service element, which itself is a child element of the service_bundle element.

The next important element is the dependency declaration. We can have zero or more of these elements in our service manifest.

<dependency name='net-physical' grouping='require_all' restart_on='none' type='service'>
<service_fmri value='svc:/network/physical'/>
</dependency>

Here we are saying that our service depends on the svc:/network/physical service, which needs to be online before our service can start. Some of the values for the grouping attribute are as follows:

  • require_all: all services marked with this grouping must be online before our service comes online.
  • require_any: any one of the services in this grouping suffices, and our service can come online as soon as one of them is online.
  • optional_all: the presence of the services marked with this grouping is optional for our service. Our service works with or without them.
  • exclude_all: specifies the services that may conflict with our service; we cannot come online in their presence.

The next important elements specify how SMF should start, stop and refresh the service. For these tasks we use three exec_method elements, as follows:

<exec_method name='start' type='method' exec='/opt/jws/runner start' timeout_seconds='60'>
</exec_method>

This is the start method. SMF will invoke whatever the exec attribute specifies when it wants to start the service.

<exec_method name='stop' type='method' exec=':kill' timeout_seconds='60'>
</exec_method>

SMF will terminate the processes it started for the service by sending them a signal. By default it uses SIGTERM, but we can specify our own signal. For example, we can use ':kill -9' or ':kill -HUP' as the exec value, or any other signal we find appropriate for terminating our service.

<exec_method name='refresh' type='method' exec='/opt/jws/runner reload_cof' timeout_seconds='60'>
</exec_method>

The final method we should describe is the refresh method, in which we reload the service configuration without disturbing its function.

The start and stop method descriptions must be present in the manifest in order to import it into the SMF repository, but the refresh method description and implementation are optional.

The timeout_seconds='60' attribute specifies that SMF should wait up to 60 seconds for the method to complete before aborting its execution and retrying it. We can use a longer timeout when we know that the method takes longer to execute, or a shorter one when we know that our service starts sooner.

The property group elements specify which property groups SMF should associate with the service. We can use as many of the SMF built-in property groups as we need, or define our own property groups.

<property_group name='startd' type='framework'>
<propval name='ignore_error' type='astring' value='core,signal'/>
</property_group>

The above snippet shows how we can use a framework built-in property group. The built-in property groups and their properties have meaning for the SMF framework and change the way it deals with our service. For example, the above snippet says that SMF should not treat it as a service failure when a process of this service dumps core or is terminated by a signal sent from another application.

The next important element is the stability element, which specifies whether the property names, the service name, its dependencies and other service attributes may change in the next release. For example, the following snippet specifies that the service attributes may change in the next release.

<stability value='Unstable'/>

Finally we have the template element, which contains descriptive information about the service. For example, we can have something like the following element, which describes our service for human users.

<template>
<common_name>
<loctext xml:lang='C'>Java Network Server</loctext>
</common_name>
<documentation>
<manpage title='JWS' section='1M' manpath='/usr/share/man'/>
</documentation>
</template>

The entire element is self-describing except for the manpage element, which specifies where the man command should look for the man pages of the jws service.

All of these elements should be saved into an XML file with the following structure:
<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle ...>
<service ...>
<single_instance .../>
<dependency ... />
<exec_method... />
<property_group .../>
<stability .../>
<template ...>
</service>
</service_bundle>

This is a rough representation of what the bare bones of a service manifest can look like. The syntax is not accurate, and the attributes and child elements are omitted to make it easier to scan.

The service_bundle and service elements and the start and stop method elements are mandatory, while the other elements are optional.
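
Putting the snippets from this article together, a complete manifest might look like the sketch below. It is only an illustration under a few assumptions that are not part of the text above: the file location (MANIFEST), the type and name attributes of service_bundle, the create_default_instance element with enabled='false', and xml:lang='C' are my own choices. The script writes the manifest to a file and runs svccfg validate on it when svccfg is available.

```shell
#!/bin/sh
# Sketch: write a complete jws manifest and, where possible, validate it.
# MANIFEST is a hypothetical location; adjust it for your system.
MANIFEST="${TMPDIR:-/tmp}/jws.xml"

# The quoted 'EOF' keeps the shell from expanding anything inside the XML.
cat > "$MANIFEST" <<'EOF'
<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<!-- service_bundle attributes are assumptions made for this sketch -->
<service_bundle type='manifest' name='jws'>
  <service name='network/jws' type='service' version='1'>
    <!-- assumption: create the default instance, but leave it disabled -->
    <create_default_instance enabled='false'/>
    <single_instance/>
    <dependency name='net-physical' grouping='require_all' restart_on='none' type='service'>
      <service_fmri value='svc:/network/physical'/>
    </dependency>
    <exec_method name='start' type='method' exec='/opt/jws/runner start' timeout_seconds='60'/>
    <exec_method name='stop' type='method' exec=':kill' timeout_seconds='60'/>
    <exec_method name='refresh' type='method' exec='/opt/jws/runner reload_cof' timeout_seconds='60'/>
    <property_group name='startd' type='framework'>
      <propval name='ignore_error' type='astring' value='core,signal'/>
    </property_group>
    <stability value='Unstable'/>
    <template>
      <common_name>
        <loctext xml:lang='C'>Java Network Server</loctext>
      </common_name>
      <documentation>
        <manpage title='JWS' section='1M' manpath='/usr/share/man'/>
      </documentation>
    </template>
  </service>
</service_bundle>
EOF

# svccfg only exists on Solaris/OpenSolaris; skip validation elsewhere.
if command -v svccfg >/dev/null 2>&1; then
    svccfg validate "$MANIFEST"
else
    echo "svccfg not found; wrote $MANIFEST without validating it"
fi
```

Validating first is a cheap way to catch typos such as a stray quote or a misspelled attribute value before the manifest reaches the repository.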

Now we should install the service into the SMF repository and then administer it. To install the service we can use the following command.

# svccfg import /path/to/manifest.xml

When we import the service manifest into the service repository, SMF will scan the service profiles and check whether this service needs to be enabled, either directly or because another service depends on it. If so, SMF will use the start method specified in the manifest to start the service. Before starting it, however, SMF will check its dependencies and recursively start any required services that are not already online.
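
Once the manifest is imported, the service can be enabled and inspected with the standard SMF tools. The following is a minimal sketch, assuming that the svc:/network/jws:default instance described in this article exists. The helper name enable_and_check is hypothetical, and the commands are guarded so that the script does nothing harmful on a machine without SMF.

```shell
#!/bin/sh
# Sketch: enable the imported service and look at its state.
# FMRI assumes the default instance of the jws service from this article.
FMRI="svc:/network/jws:default"

# enable_and_check is a hypothetical helper name used only in this sketch.
enable_and_check() {
    svcadm enable "$1" || return 1        # ask SMF to enable the instance
    svcs -l "$1"                          # long listing: state, dependencies
    svcs -x "$1"                          # explanation if it is not running
    svcprop -p startd/ignore_error "$1"   # read back a property from the manifest
}

if command -v svcadm >/dev/null 2>&1; then
    enable_and_check "$FMRI"
else
    echo "SMF tools not found; run this on a Solaris or OpenSolaris host"
fi
```

Because svcadm enable returns before the start method has finished, the state shown by svcs -l right afterwards may still be a transitional one.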

Reviewing the XML schema located at /usr/share/lib/xml/dtd/service_bundle.dtd.1 and staying tuned for my next few entries is recommended for a better understanding of SMF as a whole and of Solaris service administration and management.

Configuring Solaris Link Aggregation (Ethernet bonding)

Link aggregation, commonly known as Ethernet bonding, allows us to enhance network availability and performance by combining multiple network interfaces into an aggregation that acts as a single network interface with greatly enhanced availability and performance.

When we aggregate two or more network interfaces, we are forming a new network interface on top of those physical interfaces, combined at the link layer.
We need to have at least two network interfaces in our machine to create a link aggregation. The interfaces must be unplumbed before we can use them in a link aggregation. The following command shows how to unplumb a network interface.
# ifconfig e1000g1 down unplumb
We should also disable NWAM, because link aggregation and NWAM cannot coexist.
# svcadm disable network/physical:nwam
The interfaces we want to use in a link aggregation must not be part of a virtual interface; otherwise it will not be possible to create the aggregation. To ensure that an interface is not part of a virtual interface, check the output of the following command.
# dladm show-link
The following figure shows that my e1000g0 has a virtual interface on top of it, so it cannot be used in an aggregation.
To delete the virtual interface we can use the dladm command as follows.
# dladm delete-vlan vlan0
Link aggregation, as the name suggests, works at the link layer, and therefore we use the dladm command to make the necessary configuration. We use the create-aggr subcommand of dladm with the following syntax to create aggregation links.
dladm create-aggr [-l interface_name]* aggregation_name
In this syntax we should have at least two occurrences of the -l interface_name option, followed by the aggregation name.
Assuming that we have e1000g0 and e1000g1 at our disposal, the following command configures an aggregation link on top of them.
# dladm create-aggr -l e1000g0 -l e1000g1 aggr0
Now that the aggregation is created, we can configure its IP allocation in the same way that we configure a physical or virtual network interface. The following command plumbs the aggr0 interface, assigns a static IP address to it and brings the interface up.
# ifconfig aggr0 plumb 10.0.2.25/24 up
Now we can use the ifconfig command to see the status of our new aggregated interface.
# ifconfig aggr0
The result of the above command should be similar to the following figure.
To get a list of all available network interfaces, either virtual or physical, we can use the dladm command as follows.
# dladm show-link
And to get a list of aggregated interfaces we can use another subcommand of dladm, as follows.
# dladm show-aggr
The output of the previous dladm commands is shown in the following figure.
We can change the underlying interfaces of an aggregation link by adding an interface to the aggregation or removing one from it, using the add-aggr and remove-aggr subcommands of dladm. For example:
# dladm add-aggr -l e1000g2 aggr0
# dladm remove-aggr -l e1000g1 aggr0
The aggregation we created will survive a reboot, but our ifconfig configuration will not unless we persist it using the interface configuration files.
To make the aggregation IP configuration persistent, we just need to create the /etc/hostname.aggr0 file with the following content:
10.0.2.25/24
The interface configuration files are discussed in great detail in recipes 2 and 3 of this chapter. Reviewing them is recommended.
To delete an aggregation we can use the delete-aggr subcommand of dladm. For example, to delete aggr0 we can use the following commands.
# ifconfig aggr0 down unplumb
# dladm delete-aggr aggr0
As you can see, before we can delete the aggregation we should bring its interface down and unplumb it.
In recipe 11 we discussed IPMP, which gives us high availability by grouping network interfaces and, when required, automatically failing over the IP address of any failed interface to a healthy one. In this recipe we saw how we can join a group of interfaces together for better performance. By placing a set of aggregations in an IPMP group, we can have the high availability that IPMP offers along with the performance boost that link aggregation offers.
Link aggregation works at layer 2, meaning that the interfaces are grouped at the data link layer of the network, where frames are dealt with. At this layer, the network layer's packets are either extracted and delivered to the higher layer, or created with the designated IP address of the aggregation and delivered to the lower layer.

The OpenSolaris Governing Board's effort to shed some light on the future of OpenSolaris

Well, after some time during which Oracle kept complete silence, whether on analog radio or via digital media, the OGB (OpenSolaris Governing Board), a body of experienced community members that governs the community, took action and publicly called on Oracle to decide what it is going to do about the project.

Long story short, because of Oracle's silence about what is going on behind the curtains, community members grew restless, not knowing what will happen next with the software they have deployed in some of their customers' server rooms. Some want to know when they can install the promised upgrade on customer machines to reduce disk space usage, while others are going ballistic, not knowing whether they should go on with OpenSolaris installations or turn to *BSD or Linux variants and give up on OpenSolaris.

These concerns were raised on the mailing list so many times that “OpenSolaris is dead/is dying” messages surpassed all technical discussions, and some of these messages seemed inappropriate enough for the Oracle staff to warn that they are prepared to take action up to “moderation and/or deactivation” of the list.

Jim Grisanzio: warning about mail on this list – [Tue Jun 29 16:43:47 UTC 2010]@ http://mail.opensolaris.org/pipermail/opensolaris-discuss/2010-June/057703.html

Recently there has been mail on this list that violates the website Terms of Use. Individuals are being warned. However, if this trend continues the Website Team is prepared to take action up to and including moderation and/or deactivation of the list. Please do not respond to hostile posts because that only escalates the situation.

Many more messages were posted to the list with different comments on and analyses of the situation of the Solaris/OpenSolaris product and project. These comments led to another warning from the Oracle staff:

Alan Burlison: Yet another warning about behaviour on this list – [Thu Jul 15 14:41:16 UTC 2010]@http://mail.opensolaris.org/pipermail/opensolaris-discuss/2010-July/058383.html

Despite repeated warnings, some people are continuing to badmouth each other on this list.  As explained previously, this is not acceptable. We’ve been warning people who have overstepped the mark, from this point on we won’t be doing that, we’ll just be immediately closing accounts and unsubscribing people from all of the opensolaris.org  lists.

If that proves ineffective we will consider other measures such as putting the list into moderation or shutting it down entirely.
Be quite clear, this unacceptable behaviour must stop, and now.

Apologies to the vast majority of the list members who clearly aren’t the cause of the problem.

Oracle's silence lasted long enough, and the number of “OpenSolaris is dead/going to die…” threads and messages grew large enough, that the OGB decided to weigh in and take action. But the OGB has almost no control over anything these days, so the only thing it decided to do was to call on Oracle publicly and let the Oracle staff know that its members will step down and resign if they have no power over, and no official knowledge of, the fate of the software they are governing. @ http://wiki.genunix.org:8080/wiki/index.php/2010_07_12_OGB_Agenda

The OGB is keen to promote the uptake and open development of OpenSolaris and to work on behalf of the community with Oracle, as such the OGB needs Oracle to appoint a liaison by August 16, 2010, who has the authority to talk about the future of OpenSolaris and its interaction with the OpenSolaris community. Otherwise the OGB will take action at the August 23 meeting to trigger the clause in the OGB charter that will return control of the community to Oracle.

Now that the OGB, as a body of members elected by the community, has decided to call on Oracle publicly, Oracle needs to send some response back. But what can the response from Oracle be? I think one of the following will happen:

  • Oracle will appoint a liaison and take part in the governing board, restructure it and reassure the community about what is going on.
  • Oracle will act as if nothing happened and no such message was ever published, but come forth with some resolutions at a later time, after the ultimatum has expired.
  • Oracle will come forth with some tanks and carrier class ships filled with Stark Industries Iron Men and close down the project, axing everyone who posted harsh comments 😀

I believe the first alternative is what will happen, even though some community members believe in some version of the last alternative and the demolition of OpenSolaris.

Oracle is NOT taking back OpenSolaris; ZDNet's Dana Blankenhorn got it wrong.

Once again the FUD around the fate of Solaris and OpenSolaris started to spread after Dana Blankenhorn misunderstood the licensing terms and used an eye-catching, visitor-increasing title, Oracle taking back OpenSolaris, for his blog entry. Well, from this article we can see that even veteran writers can get things wrong and spread incorrect news 🙂

Folks, Solaris is one of the biggest Sun assets that Oracle now owns by taking over Sun. Solaris and OpenSolaris are going to be around in much better shape than before, because Oracle is betting its fight for market share on this operating system to form a complete stack including storage, hardware, OS, middleware, support and so on.

Oracle may change the licensing terms for the Solaris OS, which is the commercial distribution of OpenSolaris (with some added/removed components) that Sun supported in the old era, but closing the OpenSolaris code base? No way. A change in licensing terms can be the result of Oracle seeking a higher revenue stream from the product, and I bet Oracle will be able to get more out of Solaris than Sun did because of its powerful marketing department 😛

Looking at this FUD from any angle tells you that it is not correct, for at least the following reasons:

  • OpenSolaris has a large community around it, which Oracle would not like to send away.
  • Solaris/OpenSolaris adoption increased greatly after Sun pushed the source code into the OpenSolaris project. Take the whole Solaris on Z architecture effort: as the adoption of OpenSolaris increased, so did the adoption of Solaris itself. Long story short, just take a look at http://www.genunix.org/ and http://hub.opensolaris.org/bin/view/Main/downloads to see how many active distributions are based on the OpenSolaris core.
  • Solaris/OpenSolaris is too important for Oracle to let it fall apart, because it has a lot to offer to Oracle's strategy of offering an end-to-end stack of its own.

People are asking why the 2010.3 release is not out when it is already the first days of April. The answer is “A few more weeks of development and testing will give us a more stable OS.” If you want to check the latest features that will be included in 2010.3, grab the latest build (which is build 134 right now) from http://www.genunix.org/dist/indiana/ and play with it, but keep in mind that the build is not production ready yet. If you want the source code of OpenSolaris, take a look at http://hub.opensolaris.org/bin/view/Main/get to get the source code and build the OS yourself.

I am wondering what these people gain from spreading wrong words and incorrect news about things they have no clue about. Folks, the Solaris OS is not OpenSolaris. OpenSolaris is CDDL licensed (except for some parts which are not CDDLed: http://hub.opensolaris.org/bin/view/Main/no_source), while the Solaris distribution contains some of the OpenSolaris components and features, some value-added components and, well, some license/distribution fees and first-class support from Oracle.

Well, these were my personal feelings about the whole issue of the OpenSolaris/Solaris FUD flying around.

Win your Copy of Wiley’s OpenSolaris Bible book

OpenSolaris can be considered both a desktop operating system and a very stable and capable server operating system, especially for enterprise-scale deployments where virtualization is required at different levels along with SA and high availability, which are inseparable parts of an enterprise.

There are a few books available for OpenSolaris, including Pro OpenSolaris and OpenSolaris Bible, to choose between. I have been reading OpenSolaris Bible and posted a review on DZone, which may help others get a better understanding of the book's content and quality.

The OpenSolaris Bible book review is located at the DZone IT Books Zone. This is the first part of the review, where you can get familiar with the book and what it covers in its first 11 chapters. Some free chapters are included in the review, which you can download to get familiar with the following topics:

  • Chapter 1: What Is OpenSolaris?
  • Chapter 3: OpenSolaris Crash Course.
  • Chapter 8: ZFS.

If you have questions about the book, just post them on the book review page and, in addition to getting your answer, try your chance to win a free copy.