Zabbix vs Nagios comparison

Author

Kristof Kovacs

Software Architect & DevOps Consultant

Hello, I'm Kristof, a human being like you, and an easy-to-work-with, friendly guy.

I've been a programmer, a consultant, CIO in startups, head of software development in government, and built two software companies.

Some days I'm coding Golang in the guts of a system, and other days I'm wearing a suit to help clients with their DevOps practices.

For years I used Nagios for server monitoring, but I'm now in the process of switching to Zabbix. I also run a third, much simpler system to monitor the main monitoring system.

Here is a practical comparison of Nagios vs Zabbix:

Zabbix

Pros:

Zabbix monitors all main protocols (HTTP, FTP, SSH, POP3, SMTP, SNMP, MySQL, etc.)
Alerts in e-mail and/or SMS
Very good web interface
Native agent available on Windows, OS X, Linux, FreeBSD, etc.
Multi-step web application monitoring (content, latency, speed)
Can visualize and compare any value it monitors
Reusable host "templates" (bundles of items, triggers, and graphs)
Monitoring of log files and reboots *
Local monitoring proxies **
Customizable dashboard screens
Real-time SLA reporting
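Multi-step web monitoring is the item on this list that is hardest to picture from a single line. The following is a minimal Python sketch of the idea, not of Zabbix's own implementation (in Zabbix these checks are configured as "web scenarios" in the web interface, not written as code). Each step fetches a URL and checks the status code, a required piece of content, and the response time, and the scenario stops at the first failing step. The names `Step`, `run_scenario`, and `shop.example` are hypothetical, and the fetcher is passed in so the example runs without a network.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A fetcher takes a URL and returns (HTTP status, body text). It is injected
# so the scenario logic can be exercised without network access.
Fetcher = Callable[[str], Tuple[int, str]]


def urllib_fetch(url: str) -> Tuple[int, str]:
    """Fetch one URL with the standard library (no cookies, no retries)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""  # 4xx/5xx responses are raised, not returned


@dataclass
class Step:
    name: str
    url: str
    required_text: str          # content check: must appear in the page body
    expected_status: int = 200
    max_seconds: float = 2.0    # latency threshold for this step


@dataclass
class StepResult:
    name: str
    ok: bool
    seconds: float
    reason: str = ""


def run_scenario(steps: List[Step], fetch: Fetcher = urllib_fetch) -> List[StepResult]:
    """Run the steps in order and stop at the first failing one."""
    results: List[StepResult] = []
    for step in steps:
        start = time.monotonic()
        status, body = fetch(step.url)
        elapsed = time.monotonic() - start
        if status != step.expected_status:
            reason = f"status {status}"
        elif step.required_text not in body:
            reason = f"missing text {step.required_text!r}"
        elif elapsed > step.max_seconds:
            reason = f"too slow ({elapsed:.2f}s)"
        else:
            reason = ""
        results.append(StepResult(step.name, not reason, elapsed, reason))
        if reason:
            break  # later steps usually depend on earlier ones (e.g. a login)
    return results


if __name__ == "__main__":
    # Fake pages instead of a real site, so the example runs offline.
    pages = {
        "https://shop.example/": (200, "<h1>Welcome</h1>"),
        "https://shop.example/login": (200, "Please sign in"),
    }
    scenario = [
        Step("home", "https://shop.example/", "Welcome"),
        Step("login", "https://shop.example/login", "Dashboard"),
        Step("cart", "https://shop.example/cart", "Your cart"),
    ]
    for r in run_scenario(scenario, fetch=lambda url: pages.get(url, (404, ""))):
        print(r.name, "OK" if r.ok else "FAILED", r.reason)
```

Stopping at the first failure mirrors how a real user journey works: if the login page is broken, checking the cart tells you nothing.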

Cons:

Zabbix is more complex to set up
Escalation is a bit strange ***
No flapping detection
Documentation is sometimes spotty
Requires a database (such as MySQL)
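Flapping detection means recognizing a service that keeps switching between states, so that it does not produce a storm of alerts. The sketch below assumes an approach along the lines of what Nagios describes: remember the last 21 check results, compute a weighted percentage of state changes in which recent changes count more than old ones, and use two thresholds so the service does not bounce in and out of the "flapping" state itself. The weights (0.75 to 1.25), the thresholds (5% and 20%), and the `FlapDetector` name are assumptions for illustration, not values taken from either product.

```python
from collections import deque
from typing import Deque

HISTORY = 21                          # check results remembered per service
LOW_WEIGHT, HIGH_WEIGHT = 0.75, 1.25  # assumed weights for oldest/newest change


class FlapDetector:
    """Sketch of weighted percent-state-change flap detection with hysteresis."""

    def __init__(self, low_threshold: float = 5.0, high_threshold: float = 20.0):
        self.low = low_threshold
        self.high = high_threshold
        self.states: Deque[str] = deque(maxlen=HISTORY)
        self.flapping = False

    def percent_state_change(self) -> float:
        states = list(self.states)
        slots = HISTORY - 1                # transitions in a full window
        offset = HISTORY - len(states)     # align a partial window to the newest end
        changes = 0.0
        for i in range(1, len(states)):
            if states[i] != states[i - 1]:
                position = offset + i - 1  # 0 = oldest transition, slots - 1 = newest
                changes += LOW_WEIGHT + (HIGH_WEIGHT - LOW_WEIGHT) * position / (slots - 1)
        return changes * 100.0 / slots

    def record(self, state: str) -> bool:
        """Add one check result and return whether the service is flapping."""
        self.states.append(state)
        pct = self.percent_state_change()
        # Hysteresis: start above the high threshold, stop only below the low one.
        if not self.flapping and pct > self.high:
            self.flapping = True
        elif self.flapping and pct < self.low:
            self.flapping = False
        return self.flapping
```

A service alternating between OK and CRITICAL on every check reaches 100% and is marked as flapping; once it has been stable for a full window, the percentage drops back to 0 and the flag clears.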

Nagios

Pros:

Nagios monitors all main protocols (HTTP, FTP, SSH, POP3, SMTP, SNMP, MySQL, etc.)
Alerts in e-mail and/or SMS
Multiple alert levels: OK, WARNING, CRITICAL
"Flapping" detection
Automatic topology display
Completely stand-alone, no database or other software needed
Web content monitoring
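Nagios's alert levels come from its plugin interface: a check is just a program that prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL, the "error" level), or 3 (UNKNOWN). Here is a minimal Python sketch of such a plugin, assuming a disk-usage check with illustrative 80% and 90% thresholds; the function names are hypothetical.

```python
import shutil
import sys

# Exit codes defined by the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(used_pct: float, warn: float, crit: float) -> int:
    """Map a measured value to a Nagios state using two thresholds."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0) -> int:
    """Print one status line and return the matching exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as err:
        print(f"DISK UNKNOWN - {err}")
        return UNKNOWN
    used_pct = usage.used * 100.0 / usage.total
    state = classify(used_pct, warn, crit)
    # Human-readable text first; optional performance data after the "|".
    print(f"DISK {LABELS[state]} - {path} is {used_pct:.1f}% full"
          f" | used_pct={used_pct:.1f}%;{warn};{crit}")
    return state


if __name__ == "__main__":
    # Optional first argument: the path to check.
    sys.exit(check_disk(*sys.argv[1:2]))
```

Nagios only looks at the exit code and the output line, which is why almost any script or binary can serve as a check.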

Cons:

Nagios needs SSH access or an add-on (NRPE) to monitor remote system internals (open files, running processes, memory, etc.)
Web interface is mostly read-only ****
No charting of monitored values (separate tools such as "Cacti" or "Nagiosgraph" can be bolted on)

* However, log and reboot monitoring means that one gets an "ERROR" message followed by a "RECOVERY" message, instead of a single "CHANGED" or "REBOOTED" message. One gets used to it.
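The two messages appear because a trigger describes a state ("uptime is low"), not an event. A minimal Python sketch, assuming the host's uptime is polled periodically and that "just rebooted" means an uptime below 10 minutes (both the threshold and the function name are hypothetical):

```python
from typing import List, Tuple

RECENT_BOOT_SECONDS = 600  # assumed: "rebooted" means uptime under 10 minutes


def reboot_messages(samples: List[Tuple[int, int]]) -> List[Tuple[int, str]]:
    """Turn (timestamp, uptime in seconds) samples into alert messages.

    "Uptime is low" is a state, not an event, so a single reboot yields
    two messages: ERROR when the state begins and RECOVERY when it ends.
    """
    messages: List[Tuple[int, str]] = []
    in_problem = False
    for timestamp, uptime in samples:
        problem = uptime < RECENT_BOOT_SECONDS
        if problem and not in_problem:
            messages.append((timestamp, "ERROR"))
        elif in_problem and not problem:
            messages.append((timestamp, "RECOVERY"))
        in_problem = problem
    return messages


if __name__ == "__main__":
    # Uptime polled every 5 minutes; the host reboots shortly before t=600.
    polls = [(0, 86400), (300, 86700), (600, 120), (900, 420), (1200, 720)]
    print(reboot_messages(polls))  # [(600, 'ERROR'), (1200, 'RECOVERY')]
```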

** For example, when there are multiple sites, each site can have its own "proxy" (a local Zabbix collector) that takes load off the main Zabbix server and keeps collecting data even if the connection to the main server is severed.
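Much of a local proxy's value comes from store-and-forward: values are collected locally and only forwarded when the central server is reachable. The following Python sketch illustrates that idea only and is not how the Zabbix proxy is built; `SiteCollector`, its bounded buffer, and the `send` callback are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Value = Tuple[float, str, float]  # (timestamp, item key, measured value)


class SiteCollector:
    """Hypothetical per-site collector that stores values and forwards them later."""

    def __init__(self, send: Callable[[List[Value]], bool], max_buffer: int = 10_000):
        self.send = send  # returns True if the central server accepted the batch
        # A bounded buffer: when full, the oldest values are dropped first.
        self.buffer: Deque[Value] = deque(maxlen=max_buffer)

    def collect(self, value: Value) -> None:
        """Record a locally polled value, whether or not the uplink is up."""
        self.buffer.append(value)

    def flush(self) -> int:
        """Forward buffered values; keep them if the central server is unreachable."""
        if not self.buffer:
            return 0
        batch = list(self.buffer)
        if not self.send(batch):
            return 0
        self.buffer.clear()
        return len(batch)
```

The buffer bound is the main design decision here: an outage longer than the buffer can hold loses the oldest data rather than exhausting memory on the site.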

*** It's great that higher escalation levels receive "ERROR" alerts only after some delay, but in Zabbix their "RECOVERY" messages are delayed too. I don't see the point.
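To make the complaint concrete, here is a Python sketch of the alternative implied above: ERROR alerts escalate over time, while the RECOVERY message goes out immediately to every level that was actually alerted, and to nobody else. This is not Zabbix's escalation configuration; the `Level` and `plan_notifications` names, the recipients, and the delays are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Level:
    name: str
    delay_minutes: int  # how long a problem must last before this level is alerted


def plan_notifications(levels: List[Level], resolved_after: int) -> List[Tuple[int, str, str]]:
    """Return (minute, recipient, message) tuples for one problem.

    ERROR alerts escalate according to each level's delay. RECOVERY goes out
    at the moment the problem is resolved, but only to the levels that
    actually received the ERROR.
    """
    plan: List[Tuple[int, str, str]] = []
    notified: List[str] = []
    for level in sorted(levels, key=lambda lvl: lvl.delay_minutes):
        if level.delay_minutes < resolved_after:
            plan.append((level.delay_minutes, level.name, "ERROR"))
            notified.append(level.name)
    plan.extend((resolved_after, name, "RECOVERY") for name in notified)
    return plan


if __name__ == "__main__":
    chain = [Level("on-call admin", 0), Level("team lead", 15), Level("CIO", 60)]
    for minute, who, message in plan_notifications(chain, resolved_after=30):
        print(f"t+{minute:>2} min  {who:<13} {message}")
```

With a problem fixed after 30 minutes, the on-call admin and the team lead both get their RECOVERY at minute 30, and the CIO hears nothing at all.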

**** In the Nagios web interface, one can acknowledge problems, disable alerts, and reschedule checks, but one cannot add a new host or service.

Of course, both systems have many more features than those listed here; I only wanted to list the points I based my decision on.