Reflecting on 2016 and What Is Coming for 2017
It has been quite a year over here at Pulsed Media: new datacenter, new network, new servers. In March we moved to the new datacenter, relocating the hardware and bringing it back up. To our surprise it went very smoothly and faster than expected; in only a few short hours we managed to move every single production server from the old site to the new one and bring them back online with almost zero hardware failures.
We knew upfront the site would have a very nice PUE, and we saw glimpses of that during the spring, but summer was coming. Even during the summer the PUE was much better than at the old site.
Lots of preparation and a few new machines during the spring and summer, and come fall and early winter a new batch of servers arrived. We are very happy to say that this early winter we reached a boring, routine, mass-production level in our new server setups. The process has been honed and is now quite fast; it is relaxing, almost therapeutic work in its dull, routine way. In fact, the initial disk checks and RAID syncs take up almost all of the initial server production time. For a batch of 12 drives across 3 servers, it takes less than 2 hours for the full hardware build, QA and OS installation from the moment you start unpacking the drives, including resolution of potential hardware issues. This is quite fast and efficient for a single person to do, and can be significantly under 2 hours from the point where the drives are unpacked to the point where the server is racked, networked and online, processing its initial RAID array sync.
Last holiday season we were planning to get more SSD servers online, which we did (roughly quadrupling their numbers while heavily slashing prices), and to bump disk storage on the magnetic drive series, which we also did. We were looking to add higher-capacity drives by using 8TB drives, which we did, but not in the numbers we were hoping for (performance comes first).
Bonus disk quotas have increased by more than 30% as well, despite the cuts we had to make to this offer due to hard drive prices having stagnated for years now.
We were looking to put better and more extensive quality controls in place, and we did far more than we ever thought possible a year back: continuous performance monitoring, a plethora of tests, and large-scale real-time environment monitoring; not just HDD and CPU temperatures, but the whole datacenter down to individual spots, intake and exhaust temperatures, and power draw per phase measured in multiple places. We also enhanced the basic server rollout with memory and CPU testing processes, all of which is stress tested during the quick two-hour turnaround of building the servers. We have even done a great deal of work analyzing all of this performance data from multiple sources.
HDD and SSD reliability
A year ago we were reflecting on the hardware issues from 2014, which are a distant memory now and were largely due to the infamous Seagate ST3000DM001 HDDs. We are happy to say that kind of situation is unlikely to happen again, as we now run such a variety of drives: HGST 2TB and 3TB; Seagate 2TB, 3TB, 5TB and 8TB (two different models); Toshiba 3TB, 4TB and 5TB; Western Digital 2TB, with 8TB coming soon. Most of our drive purchases are Toshiba, as planned a year ago, and most of the drive failures still happen on the remaining 3TB Seagates, which are becoming a minority; we may start recycling those servers during the upcoming year to remove the remaining 3TB Seagates from production.
Toshiba 3TB failure rates are a bit higher than expected at around 5% annualized, but roughly two thirds of the failures came from the same purchase batch, so we assume those drives were mishandled somewhere along the logistics chain. So far we have not had a single HGST failure (though only around 60 drives are operational), nor a single 4TB or 5TB Toshiba failure.
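For reference, an annualized failure rate is simply failures divided by accumulated drive-years. A minimal sketch of that calculation (the drive count and service time below are illustrative, not our actual fleet figures):

```python
def annualized_failure_rate(failures, drive_count, months_in_service):
    """Failures divided by accumulated drive-years, expressed as a percentage."""
    drive_years = drive_count * (months_in_service / 12.0)
    return 100.0 * failures / drive_years

# Illustrative numbers only: 4 failures across 100 drives over 9 months
# works out to roughly 5.3% annualized.
print(annualized_failure_rate(failures=4, drive_count=100, months_in_service=9))
```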
The new generation of SSDs has had exactly zero failures so far, much to our surprise after the 2013-2014 and earlier generations, which were failing left and right. The SSD market is maturing and manufacturers have finally got their act together, starting with the Samsung 850 series. All the 3D V-NAND models seem highly reliable so far, and write endurance is where it needs to be.
Using RAID5 on all new servers helps tremendously, as does RAID10 for the Dragon series. Daily backup routines were the most economical means to provide a degree of redundancy for the SSD lineup, although here we still have some work to do: scaling it up, and potentially adding more safeguards against accidental deletion of the backups in certain edge cases where rsync believes the data is no longer there (on the SSD server) and hence deletes the backup copy as well. To avoid that we would need a partial versioning system, and we are considering adding a secondary offsite backup location to achieve this goal.
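As a rough illustration of the kind of safeguard we mean (a simplified sketch, not our production backup script, with hypothetical paths): only sync when the source actually looks populated, and turn rsync deletions into dated moves rather than outright removals.

```python
import datetime
import os
import subprocess

def safe_backup(source, destination, min_entries=1):
    """Mirror `source` into `destination`, but never let an empty or missing
    source wipe the backup, and keep anything rsync would delete in a dated
    side directory instead of removing it outright."""
    # Safeguard 1: if the source looks empty (e.g. an unmounted or wiped
    # path on the SSD server), refuse to sync rather than propagate the
    # deletions to the backup.
    if not os.path.isdir(source) or len(os.listdir(source)) < min_entries:
        raise RuntimeError("refusing to sync: " + source + " looks empty or missing")

    # Safeguard 2: files deleted on the source are moved into a per-day
    # directory next to the backup, giving a crude form of versioning.
    deleted_dir = destination.rstrip("/") + "-deleted/" + datetime.date.today().isoformat()
    subprocess.run(
        ["rsync", "-a", "--delete",
         "--backup", "--backup-dir=" + deleted_dir,
         source.rstrip("/") + "/", destination.rstrip("/") + "/"],
        check=True,
    )

# Hypothetical paths, for illustration only:
# safe_backup("/home/user1", "/backups/ssd01/user1")
```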
Networking and 10Gbit
For 2017 we are hoping to make some significant network upgrades, potentially bringing 10Gbit speeds all the way to each server, as 10Gbit is becoming economically viable without having to build humongous servers hosting 200+ users each. We are planning for our usual target of 12 users per server, which is 4 hard drives and thus 3 users per hard drive. With SSDs this target can be 40 users on 4 SSDs.
We will also be looking at how to better leverage SSDs, as using the full performance potential of an SSD has proven problematic, to say the least: when there are no peers to accept data, there are no peers to accept data. It is starting to look like the only solution is to increase capacities, so a user can have more torrents running and thus drive higher utilization of the SSDs, along with the 10Gbit-to-server upgrade for the rare burst moments.
The thing with 10Gbit is that it is not really required, which is why we have not emphasized it. It is only needed for the rare occasions where there is actually demand for more than 1Gbit for more than a few minutes; hence 10Gbit has largely been about marketing in the seedbox market. With SSD seedboxes we have seen a true need for more than 1Gbit, but even there it is not a common situation. Rolling out 10Gbit to servers will be about the few edge cases where bandwidth demand actually is high, which is much rarer than generally expected. It is therefore most likely that we will have 10Gbit to the server but torrent instances still restricted to 1/1Gbps, or even the unexpected option of higher upload than download, such as 1Gbps downstream and 2Gbps upstream. Restricting download is about drive I/O performance: writing takes much more effort than reading, and is therefore much more limited than upload.
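To illustrate the write-side bottleneck with a back-of-the-envelope calculation (the throughput and derating figures below are assumptions for illustration, not measurements of our arrays):

```python
def downstream_cap_mbps(array_write_mb_s, write_efficiency):
    """Rough aggregate download cap for one server: the array's sustained
    write throughput, derated for torrent-style scattered writes, converted
    from MB/s to Mbps."""
    return array_write_mb_s * write_efficiency * 8

# Illustrative assumptions: a 4-drive RAID5 array sustaining ~400 MB/s of
# sequential writes, but only ~40% of that under scattered torrent writes.
print(downstream_cap_mbps(array_write_mb_s=400, write_efficiency=0.4))
# -> 1280 Mbps of aggregate download before writes become the bottleneck,
# while reads (upload) are far less constrained, which is why an asymmetric
# cap such as 1Gbps down / 2Gbps up can make sense.
```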
Managed Dedicated… Or Virtual Seedbox?
During the upcoming year we also expect to work on virtualization quite a bit for the next generation of the Managed Dedicated Seedbox. We are looking into the viability of virtualization to combine the benefits of a larger server with dedicated resources. The current plan is to test dedicated RAM and a dedicated slice of storage I/O with shared CPU and networking, and, once 10Gbit reaches the server, a dedicated slice of network capacity. RAM would be divided by the number of instances on the server, so with 48G of RAM there would be 3x16G instances (with a small portion reclaimed via memory ballooning for the host system) on a 4-drive RAID5 server, with storage I/O limits so that no single instance can take the full performance of the array but each can burst beyond its slice for short periods. Networking would be either fully shared or restricted to a slice of the capacity.
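The burstable-slice idea can be sketched as a simple token bucket; the actual mechanism will depend on the virtualization stack we settle on, so treat this only as an illustration of the concept, with made-up figures:

```python
import time

class BurstableSlice:
    """Token-bucket style limiter: an instance earns I/O budget at its baseline
    share of the array, and can spend saved-up budget to burst above it briefly."""

    def __init__(self, baseline_mb_s, burst_mb):
        self.baseline = baseline_mb_s      # steady-state slice of array throughput
        self.capacity = burst_mb           # how much burst headroom can accumulate
        self.tokens = burst_mb
        self.last = time.monotonic()

    def try_consume(self, mb):
        now = time.monotonic()
        # Refill at the baseline rate, capped at the burst allowance.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.baseline)
        self.last = now
        if self.tokens >= mb:
            self.tokens -= mb
            return True    # request fits within the slice or its burst budget
        return False       # over budget: the instance would have to wait

# Illustrative figures: a 100 MB/s baseline slice with 500 MB of burst headroom.
limiter = BurstableSlice(baseline_mb_s=100, burst_mb=500)
print(limiter.try_consume(300))  # True: the burst allowance absorbs the spike
```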
A concrete product example of this plan would be the MDS-4T: we would take a 6-core Opteron, 48GB ECC RAM, 4x5TB RAID5 server with 1Gbps (or 2x1Gbps) networking and divide it into three slices with partial sharing of CPU, giving each user 4 cores, 15.5G of dedicated RAM and ~5TB of RAID5-redundant storage. The MDS-12T equivalent could be a 2x6-core Opteron, 96G RAM, 4x8TB server split two ways, giving each user 8 cores, 47.5G of dedicated RAM and ~12TB of semi-dedicated RAID5 disk space. Since the drives are more modern and our setup can achieve many times the storage I/O performance of the current MDS series, these would be a significant bump in performance and redundancy for similar pricing.
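The per-slice figures fall straight out of the RAID5 math (usable capacity is the total minus one drive's worth of parity). A quick sanity check, where the host RAM reserve is our assumption chosen to match the quoted numbers:

```python
def slice_specs(drives, drive_tb, ram_gb, instances, host_reserve_gb):
    """Per-instance storage and RAM when an N-drive RAID5 server is split
    into equal slices, with a little RAM held back for the host (ballooning)."""
    usable_tb = (drives - 1) * drive_tb            # RAID5 keeps one drive's worth of parity
    ram_per_instance = (ram_gb - host_reserve_gb) / instances
    return usable_tb / instances, ram_per_instance

# MDS-4T style server: 4x5TB RAID5, 48G RAM, three slices
print(slice_specs(drives=4, drive_tb=5, ram_gb=48, instances=3, host_reserve_gb=1.5))  # (5.0, 15.5)
# MDS-12T style server: 4x8TB RAID5, 96G RAM, two slices
print(slice_specs(drives=4, drive_tb=8, ram_gb=96, instances=2, host_reserve_gb=1.0))  # (12.0, 47.5)
```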
We would appreciate feedback on this plan, as once the details are finished we would move quickly to migrating users over to these new servers.
Consolidation of servers to Finland Datacenter
This has been a long time coming, and we have made progress on it. With the volume of servers and the high capital investment required it has certainly taken its time, but we are now looking at the possibility that everything will be consolidated onto our own hardware during 2017, hopefully by the summer.
This means a lot of new servers, a lot of racking and networking to be done and, as mentioned above, new R&D and a software platform for the virtualization.
And most importantly …
We will strive to increase the quality, performance and value of our offerings in every way we can, small or big. We look forward to building a number of new servers during 2017, the first pallet of which is scheduled to arrive in late February or early March! We are also looking forward to testing out new server models for 2017 and onwards, with more test units scheduled to arrive on that very same pallet.