...feed that through a shell script. The input file, startup-index.txt, is simply a list of the startups' home pages, one URL per line:
http://www.zillow.com
http://www.Picnik.com
http://www.buddytv.com
http://www.Smilebox.com
http://failblog.org/
http://icanhascheezburger.com/
http://www.wetpaint.com
http://www.feedjit.com
http://www.questionpro.com/
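...
and the script itself looks like this:
#!/bin/bash
# Fetch Netcraft's "What's That Site Running?" graph page for each URL in the list
while IFS= read -r line
do
    wget "http://uptime.netcraft.com/up/graph?site=$line"
done < startup-index.txt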
Running it drops a few hundred files from Netcraft.com's "What's That Site Running?" page into the local directory. Now the real fun begins: the filtering! We want to see what each company uses to put its best face forward to the public. It all starts at the server. No server, no nothing. Every one of the 'graph' files is an HTML-formatted page with lots of extraneous information. This filter collects the list of ALL servers ever used, past or present:
grep -h "<span style=\"white-space: nowrap\">" graph* | sort | uniq
and the result of that is:
F5 Big-IP
FreeBSD
Linux
NetBSD/OpenBSD
Solaris 9/10
unknown
Windows Server 2003
Windows Server 2008
Save that list as servers.txt, then feed it through this script, which narrows the search and gives us our server numbers:
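#!/bin/bash
# For each OS in servers.txt, count the lines in the graph files where the name is followed by a <span
while IFS= read -r line
do
    echo "$line, $(grep -h "$line.*<span" graph* | wc -l)"
done < servers.txt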
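One thing to keep in mind: wc -l counts matching lines, so a page that mentions the same OS on more than one line adds more than one to that OS's total. If you would rather count sites, a hypothetical variant is to count matching files with grep -l instead. The function name tally_sites and its argument order are illustrative assumptions, not part of the original scripts; the pattern is still the name followed by a suffix, just as in the loop above.

```shell
#!/bin/bash
# Hypothetical variant of the tally: count matching FILES (one per site)
# instead of matching lines, so a page that mentions the same name on
# several lines is counted only once.
#   $1     pattern suffix appended to each name (e.g. '.*<span' or ' on')
#   $2     file of names, one per line (e.g. servers.txt)
#   $3...  the downloaded pages to search (e.g. graph*)
tally_sites() {
    local suffix=$1 names=$2 name count
    shift 2
    while IFS= read -r name || [ -n "$name" ]; do
        # skip blank lines in the names file
        [ -z "$name" ] && continue
        # grep -l prints each matching file name once; wc -l counts them;
        # tr strips the padding some wc implementations add
        count=$(grep -l -- "$name$suffix" "$@" | wc -l | tr -d ' ')
        echo "$name, $count"
    done < "$names"
}
```

For the OS tally this would be called as tally_sites '.*<span' servers.txt graph*. Like the original loop, it treats each name as a regular expression, so characters such as "." in a name match any character. The December 2009 numbers that follow come from the line-counting script above, not from this variant.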
In this case (December 2009), we get the following:
F5 Big-IP, 21
FreeBSD, 8
Linux, 289
NetBSD/OpenBSD, 1
Solaris 9/10, 3
unknown, 11
Windows Server 2003, 71
Windows Server 2008, 22
On to web servers. Using the same process, distill the list of web servers from the "graph" files with a filter like this:
grep -h ' was running ' graph* | sort | uniq > webservers.txt
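That output needs some editing (trimming each line down to the server name) and a final refinement with grep -h '.*' webservers.txt | sort | uniq > webservers. Then we are ready to apply the script that tallies the numbers:
#!/bin/bash
# For each web server in webservers, count the lines in the graph files that read "<name> on"
while IFS= read -r line
do
    echo "$line, $(grep -h "$line on" graph* | wc -l)"
done < webservers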
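The editing step that turns webservers.txt into webservers can be partly automated. The sketch below assumes each matching line reads like "HOST was running SERVER on OS ..." and that a version, if present, is attached with a slash (as in Apache/2.2.3). Both are assumptions about Netcraft's page text, and the function name extract_servers is hypothetical. Lines that do not fit the pattern are silently skipped, so it is worth eyeballing the result.

```shell
#!/bin/bash
# Hypothetical helper for the "editing and final refinement" step.
# ASSUMES lines shaped like "HOST was running SERVER on OS ...";
# lines that do not match are dropped.
extract_servers() {
    # 1) keep only the SERVER word between " was running " and " on "
    # 2) strip a "/version" suffix, e.g. Apache/2.2.3 -> Apache
    # 3) sort and de-duplicate the names
    sed -n 's/.* was running \([^ ]*\) on .*/\1/p' |
        sed 's#/.*##' |
        sort -u
}
```

Usage would be grep -h ' was running ' graph* | extract_servers > webservers. If the pages do carry version suffixes, note that the tally's "$line on" pattern would miss them and would need loosening too. The final score below comes from the hand-edited list and the tally script above.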
And the final score is:
Apache, 223
Apache-Coyote, 13
Google, 5
Jetty(6.1.5), 1
KWS, 1
lighttpd, 1
Microsoft-IIS, 103
Mongrel, 13
nginx, 51
Ning, 2
Resin, 1
Server, 2
SSWS, 3
Sun-ONE-Web-Server, 1
thin, 2
Thrivesmart, 1
unknown, 2
WWW, 2
Since the whole point of this exercise is to expose what really happens behind the curtain, the rivalry is obviously Microsoft against Open Source and everybody else. The following charts are mashups from the results above.
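To get the numbers behind a Microsoft-versus-everybody-else comparison, either tally can be rolled up into two camps. The hypothetical camp_split function below is only a sketch. It reads the "name, count" lines and prints each camp's total and percentage share. It assumes that any name beginning with Microsoft or Windows belongs to Microsoft, and everything else, "unknown" included, goes to the other side.

```shell
#!/bin/bash
# Hypothetical roll-up of a "name, count" tally into two camps.
# ASSUMPTION: names starting with "Microsoft" or "Windows" are Microsoft's;
# every other name (including "unknown") counts as "Everybody else".
# Exits with status 1 if the counts add up to zero.
camp_split() {
    awk -F', ' '
        $1 ~ /^(Microsoft|Windows)/ { ms += $2; next }
                                    { other += $2 }
        END {
            total = ms + other
            if (total == 0) exit 1
            printf "Microsoft, %d, %.1f%%\n", ms, 100 * ms / total
            printf "Everybody else, %d, %.1f%%\n", other, 100 * other / total
        }'
}
```

For example, camp_split < web-tally.txt, where web-tally.txt is a hypothetical file holding the output of the web server tally. Keep in mind that the tallies count matching lines rather than sites, so the shares describe those counts, not the number of companies.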
Any questions?