Making Nagios even more awesome

by Andreas Ericsson

It's been quite a while since I blogged anything now, and the reason is that I, along with my colleagues here at op5, have been hard at work producing a new GUI for Nagios. Naturally it will be GPL'd, and equally naturally it will be blazing fast, awesomely pretty and contain lots and lots of cool stuff, such as our reporting tool (pretty graphs for the suits), a new flash-based network map (based on RaVis by Google), and the Merlin module.

What with me being the company's die-hard C programmer, I'm naturally taking care of finishing off the Merlin module.

As some of you know, the Merlin module was originally designed to be an event transport for effortless redundant and load-balanced network monitoring. Since modules running inside Nagios have certain restrictions put upon them, we decided to give the Merlin module the capability to insert events into a database (a rather straightforward patch). The really cool part about it is that Merlin still retains its multiplexing networking capabilities, which means that you can now use Merlin as a (very, very fast) way of communicating Nagios events to other servers.

Since Merlin is designed to work with a plethora of different topologies, this means that Nagios will easily be the most scalable network monitoring system of them all. If you want to monitor Google's server-park from a single tool, you'll have to use Nagios. If you want to monitor Second Life's vast and widespread server network, Nagios is the only choice. If you want to monitor the entire internet, Nagios can do that (provided you spend "some" money on hardware ;-))

If you're a handy guy when it comes to doing certificate authentication in C, I might have a job for you though. Currently, every node has to be configured upstream in its chain of responsibility. The capability to add random servers without modifying the configuration of running servers would be even more awesome :)

Cross Site Request Forgery vulnerability in Nagios pre-3.0.6

by Andreas Ericsson

Tim Starling of the Wikimedia Foundation reported a cross-site request forgery vulnerability in cmd.cgi that affects Nagios versions up to and including 3.0.5.

A cross-site request forgery means that one site includes a <form> tag with an "action" value pointing to a different site. The idea is to utilize a user's already valid session with a site requiring authentication to submit forms to that site that the user didn't intend to submit.

For Nagios, the scenario looks like this:
1. Random Nagios Admin (RNA) logs in to Nagios, supplying valid credentials.
2. RNA goes to evilsite.com, where some lurid JavaScript checks his/her browser's history and notices from the previously browsed pages that RNA has a Nagios installation.
3. evilsite.com creates a form which, using hidden variables, submits a command to the Nagios site where RNA is an admin.
4. Since RNA is authenticated with valid credentials, the command is accepted and Nagios processes it as if RNA had submitted it himself/herself (which, for all that cmd.cgi on the Nagios server knows, he/she has).

With Nagios 2, the worst that could happen is that evilsite.com disables monitoring of the network, or submits any of the other commands that Nagios accepts (invalid commands are simply discarded by the Nagios core).

The remedy to this is a patch that I wrote, which I hope will go into Nagios 3.0.6, to be released Any Day Now(tm).

The fix I wrote works like this:
1. When RNA wants to submit a command, he/she is sent to the command submission page (the one with the 'commit' button).
2. The command submission page generates a random token that gets included as a hidden variable in the form. The session data (apart from the random numbers) is also written to disk.
3. When the 'commit' button is pressed, the session token is looked up and cmd.cgi makes sure the session is valid (i.e., belongs to the right user and is less than 15 minutes old). If there is no valid session token, command submission fails and the user is told so.
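
The three steps above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the actual patch (which lives in cmd.cgi and is written in C). The function names, the one-file-per-token layout and the file format are all assumptions made for the example.

```python
import hashlib
import os
import secrets
import time

SESSION_MAX_AGE = 15 * 60  # seconds; the 15-minute limit from step 3


def create_session(session_dir, username, now=None):
    """Step 2: generate a random token and record it on disk.

    Hypothetical layout: one file per token, holding owner and timestamp.
    """
    now = time.time() if now is None else now
    # SHA-1 over random bytes gives a 160-bit token (40 hex characters).
    token = hashlib.sha1(secrets.token_bytes(32)).hexdigest()
    with open(os.path.join(session_dir, token), "w") as fh:
        fh.write(f"{username}\n{now}\n")
    return token


def validate_session(session_dir, token, username, now=None):
    """Step 3: accept the token only if it exists on disk, belongs to
    `username`, and is younger than SESSION_MAX_AGE."""
    now = time.time() if now is None else now
    # Only well-formed tokens are looked up, so a crafted value can
    # never point outside the session directory.
    if len(token) != 40 or any(c not in "0123456789abcdef" for c in token):
        return False
    try:
        with open(os.path.join(session_dir, token)) as fh:
            owner, created = fh.read().split("\n")[:2]
        age = now - float(created)
    except (OSError, ValueError):
        return False
    return owner == username and age < SESSION_MAX_AGE
```

The form handler would call `create_session()` when rendering the commit page and `validate_session()` when the form comes back. A token that was never written to the server's disk fails the lookup no matter what the off-site form sends.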

What really kills the ability for off-site forms to circumvent this is the fact that the session token gets written to disk. Even if someone manages to guess the pseudo-random SHA1 session token (which is 2^160 to 1 against) they still can't make that session valid by writing it to the nagios-server's disk.

The CSRF issue is still in Nagios 3.0.5, but can no longer trigger execution of arbitrary programs by the Nagios process due to the changes made to prevent malicious exploitation of CVE-2008-5027. Its impact is thereby reduced to disabling monitoring of the network and similar actions that can validly be requested from the Nagios process through the GUI. Bad enough, but no longer a vulnerability that allows a remote attacker to run arbitrary programs on the one server in your network that can bypass every firewall one way or another.

I'm withholding the CVE details until Steven has had time to update the information with that contained in the above paragraph. In case I forget to update this blog-post, the CVE candidate id is CVE-2008-5028.

A fixed version of Nagios is available at http://www.op5.org/src/nagios-3.0.5p1.tar.gz. This fixed version is the base of op5 Monitor 4.0.1 which no longer suffers from the vulnerability discussed here.

cmd.cgi authorization bypass vulnerability in Nagios pre-3.0.5

by Andreas Ericsson

Recently, Tim Starling of the Wikimedia foundation reported an issue that could allow authenticated users to bypass the authorization in cmd.cgi and submit arbitrary commands to Nagios' command pipe.

The vulnerability can be proven like this:
A user without full privileges creates an off-site form to submit a comment to Nagios. In the custom webform, the comment_data field is altered to be a "textarea" rather than "text", so the user can put newlines in there (note that this can easily be done with browser addons too).

The evil user then creates the comment so that the textarea contains a newline, and lets the second line contain a completely different command. cmd.cgi only verifies that the user is allowed to submit the first command but sends the entire input to Nagios without checking it for newlines. Nagios reads its command-pipe line-by-line and has no way of picking up the username of the person that submitted the command, so it happily runs all the commands fed to it.
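
To make the problem concrete, here is a small Python sketch of the kind of check that closes the hole: refuse any field containing a line break before the command is written to the pipe. It is only an illustration of the principle, not the patch itself (the real fix is in C and may work differently). The function name `format_external_command` is made up for this example. The `[timestamp] COMMAND;arg;arg` line layout is the usual external command format.

```python
import time


def format_external_command(name, fields, now=None):
    """Build exactly one line for the Nagios command pipe.

    Any field containing a line break is rejected, since Nagios would
    read whatever follows the break as a second, unchecked command.
    """
    now = int(time.time()) if now is None else int(now)
    for field in fields:
        if "\n" in field or "\r" in field:
            raise ValueError("line break in command field")
    return "[%d] %s;%s\n" % (now, name, ";".join(fields))
```

With this in place, a comment such as `"hi\n[0] SHUTDOWN_PROGRAM"` raises an error instead of reaching the pipe as two separate commands.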

For Nagios 2, this wouldn't have been such a big deal. The evil user could stop Nagios entirely, which is of course (very!) bad, but that's where it ends.

However, in Nagios 3, the ability to change check commands and their arguments was added. Authenticated users can exploit this vulnerability to cause the Nagios process to run arbitrary commands, such as emailing the Nagios configuration (with its accurate map of the network and whatever passwords are stored there) to themselves, or opening up remote shell sessions originating from inside the firewall. Bad stuff indeed.

I wrote a couple of patches that completely fix this. Those patches were included in Nagios 3.0.5 and op5 Monitor 4.0.1. All users are urged to upgrade as soon as possible.

This vulnerability has been assigned the candidate name CVE-2008-5027 by Steven M. Christey of Mitre. The CVE details are below.

======================================================
Name: CVE-2008-5027
Status: Candidate
URL: http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-5027
Reference: MLIST:[nagios-devel] 20081107 Security fixes completed
Reference: URL:http://sourceforge.net/mailarchive/forum.php?thread_name=4914396D.5010009%40op5.se&forum_name=nagios-devel
Reference: MLIST:[oss-security] 20081106 CVE request: Nagios (two issues)
Reference: URL:http://www.openwall.com/lists/oss-security/2008/11/06/2
Reference: MISC:http://www.nagios.org/development/history/nagios-3x.php
Reference: CONFIRM:http://www.op5.com/support/news/389-important-security-fix-available-for-op5-monitor
Reference: BID:32156
Reference: URL:http://www.securityfocus.com/bid/32156

The Nagios process in (1) Nagios before 3.0.5 and (2) op5 Monitor
before 4.0.1 allows remote authenticated users to bypass authorization
checks, and trigger execution of arbitrary programs by this process,
via an (a) custom form or a (b) browser addon.

Adventures in C#

by Andreas Ericsson

Yesterday I bravely went forth into the weird world of windows programming. Recognizing the need to familiarize myself with at least *one* language and toolset that works on both Linux and Windows I decided to check out the Mono project. Since I immediately understood that the initial threshold would be far too high and tedious for me to get past, I enlisted the help of a comrade-in-arms, Martin Almström, to teach me the basics.

For our first C# project (well, mine anyways, he's a C# guru, working on a 5MLoC project for the Swedish emergency dispatch center), we decided to write a file transfer tool (toy really), where I wrote a client that connected to his server. His server-program would feed my client a file and my client would write that file to disk.

At first, it was very confusing. The idiots at Microsoft just had to re-use perfectly valid C keywords but give them totally different semantic meaning. "hey! 'static', I know... err.. no, I don't". It's also quite funny to write in a language where "int" can actually be null (but only if you really ask for it). Add to that the fact that I've never written anything in an OOO (object-oriented only) language before, and the chaos is complete.

After a while though, I started seeing the nifty things about it. It's not as fast as a proper C program (and the closed-source compiler has no clue about optimizations). The source-code has the overhead of all the OOO languages, so "Hello world" requires about 10 lines of code (steep learning curve). It does have some nice features though, such as builtin-accessor-method-enforcement and "interfaces", which seem to work similarly to how the Linux kernel driver API works, but on a higher level.

To make a long story short, we managed to get the program to transfer the file properly, although not very efficiently. The best speed we reached was 2Mbit/sec (on a local gigabit lan). I blame myself, tbh. I have no clue how to optimally use the Stream objects in C# and was probably doing it wrong, but getting something *real* to work the first day has to be seen as quite a huge success ;-)

The upsides:
* monodevelop is a really nice IDE.
* The executable files load and run nearly as fast as C code.
* The executables are "portable" (well, you need the mono runtime to run them under Linux, but it's the easiest way I've found so far of making windows programs from Linux).
* It's fun hacking side-to-side with a friend.
* It's fun learning new languages.

The downsides:
* It's hard to find info on how to use the compiler from the command-line.
* The mono runtime is quite large.
* monodevelop sets the "runtime options" for new projects to "MONO/.NET 1.1", which means there's a lot of things you don't have (apparently, I'm no expert).
* It seems hard to link C programs to mono apps, making many of my libs useless (well, in need of rewrite, anyways).

Next time we'll probably write a GUI program, which I've never done. My secret plan is to learn this stuff well enough to write a git GUI for windows.

Object oriented configuration

by Andreas Ericsson

Ok, so I've been having this idea for quite some time (close to 3 years), but for some reason it has never really interested me all that much; partially because it's such a radical change, and partially because, at least for a transition period, one would have to support both the new and the old configuration syntax (perhaps indefinitely).

Since objects in Nagios inherit a lot of stuff implicitly, it's quite hard to follow how that works.
Since the only *real* object in Nagios is the "host", it doesn't make sense to treat any other object as a *true* object.
I strongly feel that groupings are something that should belong to the user interface only. They should not be so overloaded as they are today ("Oh, you want to view all hosts in areaX together? Sure, make a hostgroup", "Oh, you want to add a service to all hosts of typeX? Sure, make a hostgroup", "Oh, you want to view all hosts of typeX? Sure, make a hostgroup").

Because of this, I hereby propose to create a new configuration file format for Nagios that will
a) Make it easy to assign several services to similar types of hosts.
b) Make it easy to find all hosts of a certain type and group them in (insert-ui-of-choice-here).
c) Save quite a lot on typing.
d) Be a lot clearer regarding its inheritance than the current template system.
e) Make it possible for the community to share host type configuration profiles.
f) Make it possible to add hosts without restarting Nagios.

"Ooh, this bugger sets his goals high", you might think, but the answer to all of the above is actually extremely simple: Object oriented configuration, with object inheritance.

So how would that work? Well, since I can't explain what a nose looks like without getting caught up in all sorts of weird stuff, I'll just show an example here instead.

define host {
  host_name    foo_host
  alias        This is the Foo Host
  address      foo_host.example.com
  type         win2k3-fileserver
}

define host_type {
  host_type_name   win2k3-fileserver
  extends = windows-server
  # this applies to both the host and the services
  contact_groups = fileserver-admins

  service_template {
    check_interval     15
    retry_interval      3
    max_check_attempts  5
  }

  services {
    define service {
      service_description  Disk E
      check_command        check_nt_disk!E
    }

    define service {
      service_description  Disk F
      check_command        check_nt_disk!F
    }
  }
}

# Oops, this host has one disk that no other fileserver has,
# so we define it separately, but we use the service_template
# from that profile anyway
define service {
  use                 win2k3-fileserver::service_template
  host_name           foo_host
  service_description Disk G
  check_command       check_nt_disk!G
}
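
For the curious, here is a rough Python sketch of how such a config could be expanded into plain service definitions once parsed. It is only one possible reading of the proposal. The dictionary layout, the function names and everything inside the "windows-server" type (including the `check_nt_cpu` command) are invented for the example. It assumes each service picks up the settings and service_template of the type that defines it.

```python
# Hypothetical parsed form of the host_type definitions.
HOST_TYPES = {
    "windows-server": {  # contents invented for illustration
        "extends": None,
        "settings": {"contact_groups": "windows-admins"},
        "service_template": {"check_interval": 5, "max_check_attempts": 3},
        "services": [
            {"service_description": "CPU Load", "check_command": "check_nt_cpu"},
        ],
    },
    "win2k3-fileserver": {
        "extends": "windows-server",
        "settings": {"contact_groups": "fileserver-admins"},
        "service_template": {
            "check_interval": 15,
            "retry_interval": 3,
            "max_check_attempts": 5,
        },
        "services": [
            {"service_description": "Disk E", "check_command": "check_nt_disk!E"},
            {"service_description": "Disk F", "check_command": "check_nt_disk!F"},
        ],
    },
}


def type_chain(type_name, host_types):
    """Return the ancestry of a host type, base type first."""
    chain, seen = [], set()
    while type_name is not None:
        if type_name in seen:
            raise ValueError("inheritance loop at " + type_name)
        seen.add(type_name)
        chain.append(type_name)
        type_name = host_types[type_name]["extends"]
    return list(reversed(chain))


def services_for_host(host, host_types):
    """Expand a host's type into concrete service definitions."""
    resolved = {}
    for type_name in type_chain(host["type"], host_types):
        host_type = host_types[type_name]
        for service in host_type["services"]:
            merged = dict(host_type["settings"])
            merged.update(host_type["service_template"])
            merged.update(service)
            merged["host_name"] = host["host_name"]
            # A derived type replaces an inherited service of the same name.
            resolved[service["service_description"]] = merged
    return list(resolved.values())
```

Calling `services_for_host({"host_name": "foo_host", "type": "win2k3-fileserver"}, HOST_TYPES)` yields the inherited service plus Disk E and Disk F, each carrying the host name and the template values of its defining type.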

The best thing is that this could (potentially) be written as a module, although that would (currently) be exceedingly difficult, since the module would then have to take care of preventing core Nagios from detecting errors in the configuration (unless one forces some objects to pre-exist in the regular config files, I dunno). I'm sure it's doable, but it won't be pretty :-/
