Collecting I/O performance metrics
We have started collecting fairly precise disk performance stats, and the first week of data is here!
These stats are collected across the whole range, from the lowest service tier to the highest, and a significant portion of our servers is already being monitored this meticulously.
This includes the low-end Entry20 plans and Free seedboxes, as well as the high-end Dragon, SSD and Dediseedbox plans!
What is really new is that we are feeding this data straight into our automation system, so that the provisioning algorithms can choose servers with lower utilization. However, these stats reveal that there is not much need for that; the main benefit is spotting overloaded servers more easily. We are using the Linux iostat utility to collect this data.
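As a rough illustration of how iostat output could be turned into numbers for an automation system, here is a minimal sketch. It is not our actual collector: the function name `parse_iostat_extended` and the sample text are made up for this example, and the column names differ between sysstat versions, which is why the parser reads them from the header row instead of hard-coding positions.

```python
def parse_iostat_extended(text):
    """Parse one extended iostat device report into {device: {column: value}}.

    Column names are taken from the header row, so the same code copes with
    layouts from different sysstat versions.
    """
    devices = {}
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            columns = None  # a blank line ends the current device table
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields[1:]  # remember the column names of this table
            continue
        if columns is not None and len(fields) == len(columns) + 1:
            try:
                values = [float(v) for v in fields[1:]]
            except ValueError:
                continue  # not a data row, skip it
            devices[fields[0]] = dict(zip(columns, values))
    return devices


# Made-up sample in an older sysstat-style layout (illustrative only).
SAMPLE = """\
Device:  r/s  w/s  rMB/s  wMB/s  svctm  %util
sda  120.50  14.20  35.10  1.05  4.10  18.40
"""

if __name__ == "__main__":
    for device, stats in parse_iostat_extended(SAMPLE).items():
        print(device, stats["%util"])
```

In practice the text would come from running iostat on each server at a fixed interval, and the parsed values would be stored together with a timestamp and a server identifier.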
The weekly averages across all servers are:
Average disk utilization: 21.18%
Average read IOPS: 302.88
Average write IOPS: 37.02
Average read throughput: 90.09 MB/s
Average write throughput: 2.69 MB/s
Average service time: 4.88 ms
Disk utilization % here does not refer to used storage capacity; it measures how busy the disks are from a performance standpoint.
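To make the distinction concrete: utilization in this sense is the share of a measurement interval during which the device was busy servicing I/O. The sketch below shows that calculation under the assumption that you have a cumulative "busy milliseconds" counter for the device (on Linux, /proc/diskstats exposes one). The function name and the example numbers are made up for illustration.

```python
def utilization_percent(busy_ms_before, busy_ms_after, interval_ms):
    """Percentage of the interval during which the device was busy with I/O."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    busy_ms = busy_ms_after - busy_ms_before  # busy time accumulated in the interval
    return 100.0 * busy_ms / interval_ms


if __name__ == "__main__":
    # Hypothetical counter readings taken one second apart.
    print(utilization_percent(1000, 1212, 1000))
```

A disk can therefore be nearly full in terms of storage and still show a low utilization figure, or the other way around.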
What do these performance stats reveal?
All of the servers are performing very well and are well utilized, yet they remain far from their maximum performance capability, leaving plenty of room for peak periods!
There really is very little writing on these devices. This speaks in favor of SSDs, which have finite write cycles.
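How little writing there is can be read directly from the weekly averages above. The small sketch below compares reads to writes using those published figures; keep in mind that these are ratios of fleet-wide averages, so individual servers may look quite different.

```python
# Weekly fleet averages as published in this post.
READ_IOPS, WRITE_IOPS = 302.88, 37.02
READ_MBPS, WRITE_MBPS = 90.09, 2.69

# How many times more reading than writing, by operation count and by volume.
iops_ratio = READ_IOPS / WRITE_IOPS
throughput_ratio = READ_MBPS / WRITE_MBPS

# Share of all I/O operations that are writes.
write_share = WRITE_IOPS / (READ_IOPS + WRITE_IOPS)

if __name__ == "__main__":
    print(f"reads per write (IOPS): {iops_ratio:.1f}")
    print(f"reads per write (MB/s): {throughput_ratio:.1f}")
    print(f"write share of operations: {write_share:.1%}")
```

Writes make up only about a tenth of the operations and an even smaller part of the data volume.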
Service time is the latency of I/O requests. 4.88 ms is AMAZING, considering that the average disk seek time is supposed to be around 7.9 ms. This means all the various optimizations and caching methods are working really well! 🙂 What this tells us is that SSDs might not be as advantageous as previously assumed; their strongest point is their extremely low service times, which allow for very high IOPS figures.
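The link between service time and IOPS can be sketched with simple arithmetic. If a device handles one request at a time, it cannot complete more than one request per service time. The function below is a simplified model under that assumption, not a description of our servers: real systems keep several requests in flight, merge requests and often have more than one drive, so measured IOPS can be higher than this figure.

```python
def serial_iops_ceiling(service_time_ms):
    """Upper bound on IOPS when requests are served strictly one at a time."""
    if service_time_ms <= 0:
        raise ValueError("service_time_ms must be positive")
    return 1000.0 / service_time_ms  # requests that fit into one second


if __name__ == "__main__":
    # 4.88 ms is the measured average, 7.9 ms the nominal seek time quoted above.
    for label, ms in (("measured average", 4.88), ("nominal seek time", 7.9)):
        print(f"{label}: {ms} ms -> {serial_iops_ceiling(ms):.0f} IOPS")
```

Because the ceiling scales with the inverse of the service time, a device with a far lower service time gets a proportionally higher ceiling, which is where SSDs have their edge.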
The disk utilization percentage is very low, which means there is plenty of room for temporary heavy activity (archiving, chunk checksumming etc.). This again speaks against SSDs: if magnetic drives are really performing this well, the gain from SSDs might be marginal in the big picture.
We will keep monitoring these stats and will be coding tools to better analyze the data. These are just the first averages, and we will gain a better picture in the longer run as we start looking at medians, peaks and lows, categorizing by server type, etc.
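As an idea of what that later analysis might look like, here is a minimal sketch that groups utilization samples by server type and reports mean, median, peak and low for each group. The function name and the sample values are invented for this example and are not measurements.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_type(samples):
    """Summarize (server_type, utilization_percent) pairs per server type."""
    grouped = defaultdict(list)
    for server_type, utilization in samples:
        grouped[server_type].append(utilization)
    return {
        server_type: {
            "mean": mean(values),
            "median": median(values),
            "peak": max(values),
            "low": min(values),
        }
        for server_type, values in grouped.items()
    }


# Invented sample values, for illustration only.
SAMPLES = [
    ("Entry20", 10.0), ("Entry20", 20.0), ("Entry20", 90.0),
    ("Dragon", 5.0), ("Dragon", 15.0),
]

if __name__ == "__main__":
    for server_type, stats in summarize_by_type(SAMPLES).items():
        print(server_type, stats)
```

In the invented Entry20 group a single busy server pulls the mean up to 40 while the median stays at 20, which is exactly why plain averages alone do not give the full picture.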