April 21, 2003
Benchmarking log performance?
Has anyone done any comprehensive benchmarks of the various logging solutions for Apache? Something that compares the relative CPU, RAM, and other resource consumption of the different logging techniques. The options include rotatelogs, cronolog, mod_sql, syslog, and I'm sure others. I'm curious how they stack up.
I'm most interested in quick access to per-URL usage. That is, show me quickly which URLs are being requested, how often, and in what volume.
I'm sort of surprised to not see anyone having done a thorough head-to-head comparison of them. Or maybe I'm just not searching for the right keywords...
Suggestions?
Can't point you to any specific benchmarks, but I like the logging solution provided by mod_log_spread [1].
I haven't actually used it yet, but it feels "right", especially after my experience working for a company that needed to collect logs from a few hundred servers distributed globally. The servers could see heavy loads at times (sometimes hundreds of thousands of simultaneous persistent connections feeding MS/Real video streams). We had a hierarchy of machines dedicated simply to collecting logs (with FTP) off the servers, parsing them, and getting the data into Oracle. The process was so costly that we usually couldn't process 24 hours' worth of logs in 24 hours. This meant we also had to develop parallel solutions using something like SNMP to provide a real-time view of the network.
Using mod_log_spread, the log info gets passed from httpd to the Spread daemon over a Unix domain socket (no disk access), from which it is broadcast using UDP to the log collectors. Because it's broadcast, you can have as many collectors as you want without adding any additional load on your network.
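To give a feel for what a collector could do with the lines once it has them, here is a minimal Python sketch that tallies hits and bytes per URL. It does not touch Spread at all: it assumes the collector already holds plain Common Log Format lines, and the names `tally`, `REQUEST_RE`, and `SAMPLE` (along with the sample lines themselves) are made up for illustration.

```python
import re
from collections import Counter

# Pulls the URL, status, and response size out of a Common Log Format line.
# Assumption: the stock CLF layout, e.g. "GET /index.html HTTP/1.0" 200 1043
REQUEST_RE = re.compile(r'"(?:[A-Z]+) (\S+)[^"]*" (\d{3}) (\d+|-)')


def tally(lines):
    """Return (hits, bytes_sent) Counters keyed by requested URL."""
    hits = Counter()
    bytes_sent = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m is None:
            continue  # skip lines that don't look like CLF
        url, _status, size = m.groups()
        hits[url] += 1
        # CLF logs "-" when no body was sent (e.g. a 304 response)
        bytes_sent[url] += 0 if size == "-" else int(size)
    return hits, bytes_sent


# Hypothetical sample input, invented for this example.
SAMPLE = [
    '10.0.0.1 - - [21/Apr/2003:10:00:00 -0700] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [21/Apr/2003:10:00:01 -0700] "GET /video/a.rm HTTP/1.0" 200 500000',
    '10.0.0.1 - - [21/Apr/2003:10:00:02 -0700] "GET /index.html HTTP/1.0" 304 -',
]

if __name__ == "__main__":
    hits, bytes_sent = tally(SAMPLE)
    # Most-requested URLs first, with the total bytes served for each
    for url, count in hits.most_common():
        print(url, count, bytes_sent[url])
```

Since the counters live in memory, a collector built along these lines could answer "which URLs are hot right now" without waiting on a batch load into a database, though a real one would also need to handle custom log formats and expire old counts.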
Nice side benefit: you can also use the Spread daemon as a software load-balancing solution for a farm of Apache servers using Wackamole and mod_backhand [2].
[1] http://www.lethargy.org/mod_log_spread/
[2] http://www.backhand.org/







