QOS is a scalable, cross-platform, object-oriented monitoring framework capable of measuring and reporting on system and network parameters throughout a computer installation. QOS is currently deployed at large World Wide Web sites, internal computation farms, and developer networks.
QOS, which stands for Quality Of Service, consists of a central Server and Agents installed on systems throughout the network. Each Agent records system parameters, such as the running processes, available memory, and network connectivity, and passes this information on to the Server. The Server records this information, compares it against the configured Triggers to decide whether it merits notification, and sends an email notification when a value exceeds its Trigger.
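The sketch below makes the Server's decision step concrete. It compares one Agent report against a list of Triggers and builds the notification email. The `Trigger` class, the parameter names, the flat `{name: value}` report format, and the email addresses are hypothetical illustrations, not QOS's actual data model. The message is built with the standard-library `email.message.EmailMessage` class, but this sketch does not send it.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from typing import Dict, List


@dataclass
class Trigger:
    """Hypothetical Trigger: fires when a parameter rises above a threshold."""
    parameter: str      # illustrative name, e.g. "load_average"
    threshold: float


def check_report(host: str, report: Dict[str, float],
                 triggers: List[Trigger]) -> List[str]:
    """Compare one Agent report against the configured Triggers and return
    one alert line per Trigger whose threshold the reported value exceeds."""
    alerts = []
    for trigger in triggers:
        value = report.get(trigger.parameter)
        if value is not None and value > trigger.threshold:
            alerts.append(f"{host}: {trigger.parameter}={value} "
                          f"exceeds {trigger.threshold}")
    return alerts


def build_notification(host: str, alerts: List[str],
                       sender: str, recipient: str) -> EmailMessage:
    """Build the email the Server would send; delivery (for example with
    smtplib) is left out of this sketch."""
    msg = EmailMessage()
    msg["Subject"] = f"QOS alert: {host}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(alerts))
    return msg


if __name__ == "__main__":
    triggers = [Trigger("load_average", 4.0), Trigger("memory_used_pct", 90.0)]
    report = {"load_average": 6.5, "memory_used_pct": 42.0}
    alerts = check_report("web01", report, triggers)
    print(alerts)  # ['web01: load_average=6.5 exceeds 4.0']
    if alerts:
        print(build_notification("web01", alerts, "qos@example.com",
                                 "ops@example.com")["Subject"])
```

Only the parameter that crosses its threshold produces an alert. A report that stays under every Trigger yields an empty list, so no email is built.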
QOS is written in the Python programming language. Python is an object-oriented scripting language that runs on multiple platforms, including UNIX, MacOS, and Windows. Information on Python is available at Python's Web site.
QOS is designed with specific goals in mind:
Scalability: QOS is based on the Python asyncore framework, which can handle thousands of TCP connections with little overhead. Because the Agent does the difficult and time-consuming work of actually gathering information, the Server is free to manage Agent reports and alerting. QOS was originally developed to monitor a very large Web site made up of over 300 computers separated into more than 20 functional areas. The main server was an Intel Celeron 300MHz machine with 128MB of RAM and one IDE disk, and it handled the load easily while also running a busy Agent and serving as an SSH bastion host!
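The event-driven model behind this scalability can be sketched as follows. Note that `asyncore` was deprecated and then removed from the standard library in Python 3.12, so this sketch swaps in the `selectors` module. It follows the same single-threaded approach: one loop multiplexes the listening socket and every Agent connection, and no thread or process is spawned per Agent. The `ReportServer` class and the newline-terminated text report format are assumptions for illustration, not QOS's wire protocol.

```python
import selectors
import socket
from typing import Dict, List, Tuple


class ReportServer:
    """Single-threaded, event-driven report collector (illustrative).

    One selector watches the listening socket and every Agent connection,
    so each idle connection costs a file descriptor rather than a thread."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0) -> None:
        self.selector = selectors.DefaultSelector()
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.listener.bind((host, port))  # port 0 lets the OS pick one
        self.listener.listen(128)
        self.listener.setblocking(False)
        self.selector.register(self.listener, selectors.EVENT_READ, data=None)
        self.buffers: Dict[socket.socket, bytes] = {}  # partial reports
        self.reports: List[str] = []                   # complete reports

    @property
    def address(self) -> Tuple[str, int]:
        return self.listener.getsockname()

    def poll(self, timeout: float = 0.1) -> None:
        """Run one pass of the event loop; call it repeatedly."""
        for key, _events in self.selector.select(timeout):
            if key.data is None:          # the listening socket is ready
                self._accept()
            else:                         # an Agent connection is ready
                self._read(key.fileobj)

    def _accept(self) -> None:
        conn, _addr = self.listener.accept()
        conn.setblocking(False)
        self.buffers[conn] = b""
        self.selector.register(conn, selectors.EVENT_READ, data="agent")

    def _read(self, conn: socket.socket) -> None:
        try:
            chunk = conn.recv(4096)
        except ConnectionResetError:
            chunk = b""
        if not chunk:  # the Agent closed (or reset) its connection
            self._drop(conn)
            return
        *lines, rest = (self.buffers[conn] + chunk).split(b"\n")
        self.reports.extend(line.decode() for line in lines if line)
        self.buffers[conn] = rest  # keep any incomplete trailing report

    def _drop(self, conn: socket.socket) -> None:
        self.selector.unregister(conn)
        del self.buffers[conn]
        conn.close()

    def close(self) -> None:
        for conn in list(self.buffers):
            self._drop(conn)
        self.selector.unregister(self.listener)
        self.listener.close()
        self.selector.close()
```

A long-running Server would call `poll()` inside a `while True:` loop and hand each completed report to its alerting logic. Because no single Agent can block the loop, one modest machine can keep up with many Agents.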
Manageability: QOS is a centrally managed system. The Agents have no local configuration beyond the name of the QOS Server. When an Agent starts up, it contacts the Server for its configuration, then immediately begins collecting data based on that configuration. The configuration can be changed at run time without restarting the Server: after one push of a button, the Agents update themselves at their next check-in. The configuration itself is hierarchical and is simple to generate from a script for larger installations.
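A hierarchical configuration of this kind might resolve as shown below. Global defaults are overridden per functional area, then overridden again per host. The result is what the Server would hand an Agent at check-in. The three-level layout, the key names (`interval`, `checks`), and the helper functions are hypothetical, and QOS's real configuration format may differ.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical three-level hierarchy: global defaults, then per functional
# area, then per host. Each lower level overrides the one above it.
CONFIG: Dict[str, Any] = {
    "defaults": {"interval": 300,
                 "checks": {"memory": True, "processes": ["sshd"]}},
    "areas": {
        "web": {"checks": {"processes": ["sshd", "httpd"]}},
        "db": {"interval": 60},
    },
    "hosts": {"web01": {"interval": 30}},
}

# Larger installations can generate the host level from a script.
for n in range(1, 4):
    CONFIG["hosts"][f"db{n:02d}"] = {"checks": {"disk": True}}


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override laid on top, recursing into
    nested dictionaries so that sibling keys are not lost."""
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def config_for_agent(config: Dict[str, Any], area: str, host: str) -> Dict[str, Any]:
    """Resolve the configuration the Server would hand an Agent at check-in."""
    resolved = merge(config["defaults"], config["areas"].get(area, {}))
    return merge(resolved, config["hosts"].get(host, {}))


if __name__ == "__main__":
    print(config_for_agent(CONFIG, "web", "web01"))
    # {'interval': 30, 'checks': {'memory': True, 'processes': ['sshd', 'httpd']}}
```

Since `merge` copies rather than mutates, changing the central tree and letting each Agent re-resolve at its next check-in is enough to roll out a new configuration.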
Stability: The QOS Server is written as a simple, robust connection manager. The Server is largely insulated from large changes in the operating environment because it does not frequently call external programs or collect changing data. While Agents may malfunction or crash, the Server simply reports that the Agent is no longer collecting data and continues on its merry way.
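The Server's "no longer collecting data" report can be as simple as tracking each Agent's last check-in time. In this sketch, `CheckInTracker` and the 900-second silence limit are hypothetical choices. Passing `now` explicitly keeps the example deterministic; in normal use the methods fall back to `time.time()`.

```python
import time
from typing import Dict, List, Optional


class CheckInTracker:
    """Record when each Agent last reported and list the ones gone quiet."""

    def __init__(self, max_silence: float = 900.0) -> None:
        self.max_silence = max_silence          # seconds; hypothetical default
        self.last_seen: Dict[str, float] = {}   # agent name -> timestamp

    def check_in(self, agent: str, now: Optional[float] = None) -> None:
        """Call whenever a report arrives from the named Agent."""
        self.last_seen[agent] = time.time() if now is None else now

    def stale_agents(self, now: Optional[float] = None) -> List[str]:
        """Agents silent for longer than max_silence, in sorted order."""
        now = time.time() if now is None else now
        return sorted(agent for agent, seen in self.last_seen.items()
                      if now - seen > self.max_silence)


if __name__ == "__main__":
    tracker = CheckInTracker(max_silence=900)
    tracker.check_in("web01", now=1000.0)
    tracker.check_in("db01", now=1800.0)
    print(tracker.stale_agents(now=2000.0))  # ['web01']
```

A crashed Agent simply stops calling `check_in`, so the Server detects it without needing to contact the Agent's host.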
Security: QOS's client-server relationship encourages strong network design, because data flows from higher-security Agents toward the Server, which can be located on an administrative network or at the outer layers of a layered system architecture. In other server-based systems, the server must contact each monitored system directly, which means opening holes in firewalls to give that server access. Attackers could exploit those holes to reach machines that would otherwise be protected. With QOS, the Agent does not accept inbound connections and therefore provides no inbound network vector for attack. The Server (and the Agent, on some platforms) can also run as an unprivileged user so that a successful attack against it does not yield administrative access, and the asyncore framework helps keep Denial-of-Service (DoS) attacks from draining significant amounts of Server resources.
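The outbound-only Agent and the unprivileged-user idea can be sketched together. The JSON-lines report format, the `qos` account name, and both function names are assumptions for illustration. `drop_privileges` relies on the POSIX-only `pwd` module and on `os.setgroups`, `os.setgid`, and `os.setuid`, so it applies only on UNIX-like platforms.

```python
import json
import os
import socket
from typing import Dict, Tuple


def send_report(server: Tuple[str, int], host: str,
                metrics: Dict[str, float], timeout: float = 10.0) -> None:
    """Push one report to the Server over an outbound TCP connection.

    The Agent only ever connects out; it never calls bind()/listen(), so the
    monitored host exposes no listening port for this service."""
    payload = json.dumps({"host": host, "metrics": metrics}).encode() + b"\n"
    with socket.create_connection(server, timeout=timeout) as conn:
        conn.sendall(payload)


def drop_privileges(username: str = "qos") -> None:
    """If started as root, switch to an unprivileged account (POSIX only)."""
    if os.getuid() != 0:
        return  # already running unprivileged; nothing to do
    import pwd  # POSIX-only module, imported here so the rest stays portable
    entry = pwd.getpwnam(username)  # raises KeyError if the account is missing
    os.setgroups([])          # drop supplementary groups first
    os.setgid(entry.pw_gid)   # then switch the primary group
    os.setuid(entry.pw_uid)   # finally the user; root cannot be regained
```

The order in `drop_privileges` matters: the group settings must be changed while the process still has root privileges, before `os.setuid` gives them up.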