In computer science, particularly in work on optimizing compilers, ideas are often judged at least partly by their performance on certain benchmarks. The accuracy of these benchmarks is therefore an important concern, yet little attention has been given specifically to this issue.
We provide a partial remedy by measuring, under a variety of conditions, the accuracy of a few benchmarks that measure execution times of programs written in a high-level, garbage-collected language. Standard deviations of large sets of execution times are used as a metric.
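The measurement approach described above can be sketched as follows. This is a hypothetical illustration, not the paper's actual harness: the workload is a placeholder, and the run count is an arbitrary choice.

```python
import statistics
import time

def benchmark():
    # Placeholder workload; the paper measures real programs written
    # in a high-level, garbage-collected language.
    sum(i * i for i in range(100_000))

def measure(runs=30):
    """Time the benchmark repeatedly and summarize the variation
    in execution times with the sample standard deviation."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        benchmark()
        times.append(time.perf_counter() - start)
    return statistics.mean(times), statistics.stdev(times)

mean, sd = measure()
print(f"mean={mean:.6f}s  stdev={sd:.6f}s")
```

A small standard deviation across a large set of runs suggests the benchmark yields stable timings under the conditions tested; a large one indicates interference from factors such as other processes or garbage collection.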
From such experiments we conclude, among other things, that both concurrently running processes and garbage collection can introduce some variation into execution times, but that an active network connection need not be greatly detrimental. Another result shows that redirecting the output of the benchmark to a file interacted with the garbage collector in a way that caused large fluctuations in running times.