Merlin contributions

by Andreas Ericsson

Merlin development has really kicked off. Single-server database support is in production at all our customers (all 350 or so of them), and people in the community have started running it in production in distributed environments.

Three people in particular have contributed awesomely to making Merlin better for distributed environments. The first to pick it up for this purpose is a guy named Russel Jennings. He's written several concise bug reports and done tireless testing with various versions to find where a given bug was introduced and which versions of the Merlin daemon work well together. A great big thanks for that, Russel!

The second is a guy named Sean Millichamp. He's gone the extra mile and has started sending in patches for bugs he's found. So far, he's contributed 10 patches, making him the second most prominent developer for the merlin module-daemon pair. His patches hold a very high standard indeed, and I have great hopes that we'll see more of his excellent contributions making it into future releases.

Jean-Marc Le Fevre has also contributed some minor patches that he deserves recognition for.

Numerous other people have reported issues with Merlin and contributed to the Wiki and HOWTOs. Thank you all :)

Merlin progress report

by Andreas Ericsson

I'm clearly a workaholic when I'm fiddling with stuff I really like, and all the community interest in Merlin and Ninja lately has just made me a pure-bred hacker fanatic.

So I've implemented the state retention stuff in Merlin. Turns out that all that was really required was to make sure the status and object import works OK and is up to date (so I implemented an automagic way of making that happen). Then I can just read the current status from the database. I use an array sorted by object name ("host_name" for hosts and "host_name;service_description" for services) so I can use a binary search. 3,000,000 lookups of randomly chosen nodes in a config with 15k hosts complete in just under 2.2 seconds. Quite impressive, especially when about 0.8 seconds is spent loading all the states and sorting the array in the first place.
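A minimal sketch of the lookup scheme described above, written in Python for illustration and not taken from Merlin's source. The function names (`service_key`, `build_index`, `lookup`) and the integer state values are assumptions; only the key format and the sort-once, binary-search-many idea come from the post.

```python
from bisect import bisect_left


def service_key(host_name, service_description):
    """Build the "host_name;service_description" key used for services."""
    return f"{host_name};{service_description}"


def build_index(states):
    """Sort (object_name, state) pairs once, by object name.

    Returns two parallel lists so the names can be binary-searched.
    """
    ordered = sorted(states, key=lambda pair: pair[0])
    names = [name for name, _ in ordered]
    values = [state for _, state in ordered]
    return names, values


def lookup(names, values, object_name):
    """Binary search for object_name; return its state, or None if absent."""
    i = bisect_left(names, object_name)
    if i < len(names) and names[i] == object_name:
        return values[i]
    return None


# Hypothetical saved states: hosts keyed by host_name, services by the
# combined key. The numbers stand in for whatever state is retained.
saved = [
    ("web01", 0),
    ("db01", 2),
    (service_key("web01", "HTTP"), 1),
]
names, values = build_index(saved)
print(lookup(names, values, "web01;HTTP"))
```

The sort is paid once at load time, after which each lookup costs O(log n) string comparisons, which is why millions of lookups stay cheap even with 15k hosts.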

I've also had a chance to look over the cross-host event transport stuff, which was subtly broken due to a brainfart of mine in a ternary operation. I've tested it and it works just fine again now.

With the changes mentioned above, Merlin is rapidly approaching production quality in terms of its planned feature set, so I've just released v0.5 in the hope of attracting some more testers.

Let it rip, people, and make sure to let me know how it's going. Merlin can be downloaded from our git repositories using the following command:


  git clone git://git.op5.org/nagios/merlin.git

Cheerios for now.

Nagios for huge networks

by Andreas Ericsson

With the recent changes to the Nagios core development team, patches have been flooding in to the nagios-devel list. There's been such a flurry of improvements that I've actually had to stop working on Ninja and Merlin entirely over the past two weeks and just work on testing, adding and commenting on patches sent to the list. In view of that, I must say I'm convinced Ethan did the right thing when he extended the core dev team a bit.

However, this post is mostly about one particular patch from a guy named Jean Gabès. The patch speeds up Nagios' circular parent-child dependency checks a *lot*. In a network 300 levels deep (root-host -> lvl1-child -> lvl2-child -> ... -> lvl300-child) where each level has 500 hosts, vanilla Nagios had to be interrupted with Ctrl-C after 53 minutes, while Nagios with Jean's patch completed in less than seven seconds (a speedup of more than 51000%!).

For a more modest network of 15000 nodes (30 levels deep, 500 hosts in each level), vanilla Nagios completed a configuration check in 3 minutes and 33 seconds, while patched Nagios did the exact same job in less than one second.
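To show why such a check does not have to get slower with depth, here is a sketch of a generic linear-time cycle check over parent links. It is a standard three-colour depth-first search written in Python, not Jean's patch and not Nagios code; `find_parent_cycle` and the `parents` mapping are hypothetical names chosen for the example.

```python
def find_parent_cycle(parents):
    """Return a list of hosts forming a parent loop, or None if there is none.

    `parents` maps each host name to a list of its parent host names.
    Every host is fully explored at most once, so the cost grows with
    hosts + parent links rather than with the number of distinct paths.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {host: WHITE for host in parents}

    for start in parents:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        path = [start]                     # hosts on the current path
        stack = [iter(parents[start])]     # explicit stack, no recursion
        while stack:
            for parent in stack[-1]:
                state = colour.get(parent, WHITE)
                if state == GREY:
                    # Reached a host that is still on the path: a loop.
                    return path[path.index(parent):] + [parent]
                if state == WHITE:
                    colour[parent] = GREY
                    path.append(parent)
                    stack.append(iter(parents.get(parent, ())))
                    break
            else:
                # All parents of the top host are done; mark it finished.
                colour[path.pop()] = BLACK
                stack.pop()
    return None


# A small three-level chain has no loop; adding root -> lvl2 creates one.
ok = {"root": [], "lvl1": ["root"], "lvl2": ["lvl1"]}
bad = {"root": ["lvl2"], "lvl1": ["root"], "lvl2": ["lvl1"]}
print(find_parent_cycle(ok), find_parent_cycle(bad))
```

Marking finished hosts means a subtree shared by many children is walked only once, which is the usual way to keep checks like this linear in the size of the configuration.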

Awesome, Jean. Thanks a lot indeed :-)

libgit2 moving forward

by Andreas Ericsson

Just a short post to announce myself as co-maintainer of libgit2. I don't know how long this will last, but Shawn O. Pearce (one of the truly heavy names in git development) will be very busy with the Google Summer of Code project, where core git has two student slots.

Shawn has been busy for quite a long time (understandable, as he still makes significant contributions both to git.git and jgit.git), and only Ramsay Jones and I have made any significant contributions to libgit2 since it was announced back in November 2008.

Hopefully, this will mean whatever patches there are will get applied faster and that development will move forward at a quicker pace.

Personally, I've got a lot of un-committed work lying around that pertains to index reading and writing. I'll be working on finishing that up and sending it out to git@vger for review. The sooner we can get *some* part of the library working properly the better, imo, as git.git can't start using it until it does.

Oh, and libgit2 is currently looking for additional contributors. It doesn't really matter what you want to work on. Just clone the repo from git://repo.or.cz/libgit2.git and start hacking. Patches could go either to me (ae@op5.com) or to the git mailing list (git@vger.kernel.org).

The future of Nagios

by Andreas Ericsson

Some of you might know that a fork of Nagios has appeared recently. If you don't, go read about it in the nagios-devel mailing list archives. They're available on SourceForge somewhere, but I can't be bothered to look for them right now.

Working for a company that makes a living out of supporting and writing addons for Nagios, I must say I'm a bit sad. Being an enthusiastic and optimistic guy, I must say I'm thrilled.

A couple of facts before we set off:

  • The fork was instigated largely by German members of the community. It appears to have been spearheaded by a German company that makes its living selling customized Nagios solutions and/or support. I don't know this for sure, but it sure looks as if that's what's happened.
  • The German company has unlawfully used the Nagios trademark after being asked not to do so. It has also registered Nagios as a trademark in Germany, which is a huge slap in the face of an open-source project. They are naturally not on the best of terms with Nagios' founding father, Ethan, at the moment.
  • Ethan has been absent, dealing with the resulting lawsuit (or whatever it is a trademark violation results in when friendly talk is no longer enough) and also trying to put together a new web-based user interface for Nagios.
  • Patches from all levels of the community have been erratically ignored during Ethan's absence. Some were picked up, but as many or more slipped through the cracks.
  • Ethan has always been the single person with commit access to the Nagios CVS (yuk) repository.
  • The fork uses git to track their patches.



The community developers have voiced a complaint that they cite as the primary reason for the fork:

Nagios is not being developed fast and openly enough.

I agree with this, and I'm currently discussing expanding the developer base with Ethan. Unfortunately, the scarce resource "trust" is even scarcer for those developers who joined the fork, which leaves the available candidates rather few. Happily, I count myself among them, and apparently so does Ethan. He emailed me away from public channels asking if I'd be willing to become a core developer, and op5 has graciously given a tentative promise to devote one to two days per week to Nagios development / patch management. Nothing's settled yet, but development has to continue even if the core maintainer takes a leave of absence, so one way or another, we'll make sure this happens.


In a perfect world (i.e., one where I get to decide everything ;)), here's what will happen:

  • Nagios incorporates the good changes that the fork produces.
  • The benevolent but previously frustrated developers from the fork hop back to working on Nagios when they see it's once again moving forward. They could actually do that by keeping on working on their fork, although that would set them apart from the Nagios community a bit rather than make them members of it.
  • Nagios development picks up its pace and a new GUI is added to it which fulfills everyone's wildest dreams.
  • Nagios development moves to using git instead of CVS. Since git actually invites people to fork the code but makes it incredibly simple to merge those changes back into the pre-fork project again, there could be any number of forks and Nagios would be the grand total of the best of all of them. Who would win from that? Well, the Nagios users for a start, and Nagios itself, and Ethan, and every company making a living off of Nagios one way or another. So that'd be a win-win-win-win-win situation? I like it.




For those who wonder where I stand in all this, I'll be working with Ethan to make the community developers happy while at the same time trying to spare the community users the confusion that a long-lived fork brings. In the end, I hope Nagios becomes a better product with a stronger and better community backing it, which seems rather inevitable now that more people than ever are working frantically at making it so. Hopefully it results in a happy community where The Right People(tm) are part of a Nagios steering committee or some such.


Time will tell. It always does ;-)
