Traffic data collection

Many high-traffic users wonder how their data traffic is calculated and how the stats are collected.

Comparing what our metric shows with ruTorrent will always be at least slightly misleading, because ruTorrent uses a different period and is not as reliable for traffic consumption as the kernel-level metering we use. As far as we are aware, the ruTorrent metric is at the very least missing protocol traffic. On top of that, ruTorrent cannot meter HTTP, FTP, SFTP, BTSYNC and similar traffic at all.

This has led to many questions about inaccuracy, in part because of the rolling counting.

We like to compare this to a phone bill, where many users are in disbelief at how many minutes they have used. But data does not lie, and there is no free lunch. The data caps we offer are extremely high to begin with, yet many users still come close to or exceed them. We have looked in the past at why data caps are a necessary evil.

So here are the methods we use to calculate traffic. If you want, you can also request the full data collection logs for your account from support. We hope this answers any questions you may have had about traffic data collection and processing.

Gathering the data at kernel level

The basic iptables command is:

/sbin/iptables -A OUTPUT -m owner --uid-owner {$thisUid} -j ACCEPT

{$thisUid} is the UID of the user, the numeric ID of that particular user. -m owner selects the iptables “owner” module, which matches locally generated packets by the user that created them and so lets us track traffic per user.

The data is gathered with the following commands, run at 5-minute intervals:

/sbin/iptables -nvx -L OUTPUT; /sbin/iptables -Z

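This reads the counters accumulated since the previous run (-n for numeric addresses, -v for packet and byte counters, -x for exact values instead of rounded ones) and then resets the counters to zero. The output looks something like this: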

Chain OUTPUT (policy ACCEPT 2856 packets, 5357236 bytes)
    pkts      bytes target     prot opt in     out     source               destination
    4743   11286991 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            owner UID match 1001
  255570  568550930 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            owner UID match 1005
  109538  287862296 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            owner UID match 1004

The packet and byte counts in the header (policy ACCEPT) are actually the unmatched traffic; the lines that follow are the per-user matched traffic. On each line the first column is packets and the second is bytes.

This output is then parsed: we take the line whose owner UID match section has the correct UID, and from that line we read the raw bytes uploaded.
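The parsing step can be sketched roughly as follows. This is an illustrative Python sketch, not our actual collection script: the function name parse_owner_bytes and the regular expression are assumptions made for the example, and the sample text is the output shown above.

```python
import re
from typing import Dict

# Matches one per-user rule line: packet count, byte count, then the owner UID.
RULE_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+ACCEPT\b.*\bowner UID match (\d+)\s*$")

def parse_owner_bytes(output: str) -> Dict[int, int]:
    """Return {uid: bytes} for every 'owner UID match' rule line (hypothetical helper)."""
    usage: Dict[int, int] = {}
    for line in output.splitlines():
        m = RULE_LINE.match(line)
        if m:
            uid = int(m.group(3))
            # Sum in case the same UID appears in more than one rule.
            usage[uid] = usage.get(uid, 0) + int(m.group(2))
    return usage

SAMPLE = """Chain OUTPUT (policy ACCEPT 2856 packets, 5357236 bytes)
pkts bytes target prot opt in out source destination
4743 11286991 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 owner UID match 1001
255570 568550930 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 owner UID match 1005
109538 287862296 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 owner UID match 1004
"""

usage = parse_owner_bytes(SAMPLE)
print(usage[1005])  # bytes uploaded by UID 1005 in this interval
```

The header and column-title lines do not start with two numbers followed by ACCEPT, so they are skipped and the unmatched traffic is never attributed to a user.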

Saving logs and processing them

This data is then saved to a per-user log file together with an exact timestamp in the format Y-m-d H:i:s, which yields output like 2017-01-03 06:25:03.

The collection cron does nothing further; log processing and stats gathering are done in a separate cron. This layer separation makes the code easier to manage and keeps each cron sensibly sized.

The stats script then reads these log files. It takes the number of lines expected to cover the past 35 days and loops over them, parsing each one. It then bins the data into “buckets” for month, week, day, hour and 15 minutes based on the timestamp. The exact code for the month check is:

if ($thisData['timestamp'] >= $compareTimeMonth) $data['month'] += $thisData['data'];

This means that if the timestamp of the parsed log line is equal to or later than one month ago, the bytes from that line are added to the “bucket” for one month.
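The same comparison applied to every bucket might look like the sketch below. It is written in Python for illustration only (the real stats script is PHP), and the window lengths are assumptions: in particular "month" is taken as 30 days here, which may differ from how the real script computes one month ago.

```python
from datetime import datetime, timedelta

# Assumed window lengths; "month" as 30 days is an illustration, not the real value.
WINDOWS = {
    "month": timedelta(days=30),
    "week": timedelta(days=7),
    "day": timedelta(days=1),
    "hour": timedelta(hours=1),
    "15min": timedelta(minutes=15),
}

def bin_usage(entries, now):
    """entries: iterable of (timestamp, bytes). Returns {bucket: total_bytes}."""
    totals = {name: 0 for name in WINDOWS}
    for ts, nbytes in entries:
        for name, length in WINDOWS.items():
            # Same test as the month check: timestamp >= cutoff for that bucket.
            if ts >= now - length:
                totals[name] += nbytes
    return totals

now = datetime(2017, 1, 3, 6, 30, 0)
entries = [
    (datetime(2017, 1, 3, 6, 25, 3), 100),    # 5 minutes ago
    (datetime(2017, 1, 3, 5, 45, 0), 200),    # 45 minutes ago
    (datetime(2017, 1, 2, 12, 0, 0), 300),    # earlier within the last day
    (datetime(2016, 12, 29, 6, 30, 0), 400),  # 5 days ago
    (datetime(2016, 12, 10, 6, 30, 0), 500),  # 24 days ago
    (datetime(2016, 11, 30, 6, 30, 0), 600),  # 34 days ago, outside every bucket
]
totals = bin_usage(entries, now)
print(totals)
```

Because every bucket is measured backwards from the current time, the totals roll continuously rather than resetting on a calendar boundary, and a recent line counts towards all the buckets it falls inside.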

We also check the data for validity: parsing a line yields false if the number of bytes exceeds 7500MB, and the processing cron checks that the data consumption does not exceed the link maximum plus a safe margin. To turn the date into a timestamp, the code uses the standard PHP function strtotime(), which does an excellent job of parsing a multitude of formats, including the very widely used Y-m-d H:i:s.
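A per-line validity check along these lines could be sketched as below, again in Python purely for illustration. Two things here are assumptions and not taken from the real scripts: the log line layout (timestamp followed by a space and the byte count) and the reading of 7500MB as 7500 * 1024 * 1024 bytes. Returning None plays the role of the PHP false.

```python
from datetime import datetime

# Assumption: "7500MB" is read as 7500 * 1024 * 1024 bytes.
MAX_BYTES_PER_LINE = 7500 * 1024 * 1024

def parse_log_line(line):
    """Return (timestamp, bytes), or None when the line is malformed or implausible.

    Assumed (hypothetical) line layout: "Y-m-d H:i:s <bytes>".
    """
    parts = line.strip().rsplit(" ", 1)
    if len(parts) != 2:
        return None
    stamp, raw_bytes = parts
    try:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        nbytes = int(raw_bytes)
    except ValueError:
        return None
    # Reject negative values and anything above the per-line ceiling.
    if nbytes < 0 or nbytes > MAX_BYTES_PER_LINE:
        return None
    return ts, nbytes

good = parse_log_line("2017-01-03 06:25:03 11286991")
too_big = parse_log_line("2017-01-03 06:30:03 9000000000")
garbled = parse_log_line("not a log line")
```

Unlike strtotime(), strptime() accepts only the one format it is given, so this sketch is stricter about timestamps than the PHP code described above.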

Traffic limits handling

This is another cron, which takes the processed stats and reads the bytes uploaded from the “raw” (unformatted number) column of the month bucket.

If the month's data consumption exceeds the traffic limit, the cron saves runtime information that the traffic limit is enabled and sends the limit to rTorrent via SCGI. Consumption is rechecked periodically and the data traffic limit is enforced for as long as it applies. Once enabled, the limit stays in place for a minimum of 3 days before it is lifted.
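The decision logic, leaving out the SCGI call to rTorrent, might be sketched like this. It is a Python illustration under assumptions: the function name, its arguments and the example cap value are hypothetical, and only the "exceeds the limit" test and the 3-day minimum come from the description above.

```python
from datetime import datetime, timedelta

MIN_HOLD = timedelta(days=3)  # minimum time a limit stays in place, per the text

def limit_should_be_active(month_bytes, cap_bytes, enabled_at, now):
    """Decide whether the traffic limit should be active at `now` (hypothetical helper).

    enabled_at is when the limit was switched on, or None if it is currently off.
    """
    if month_bytes > cap_bytes:
        return True   # over the cap: enable the limit or keep it enabled
    if enabled_at is not None and now - enabled_at < MIN_HOLD:
        return True   # back under the cap, but the 3-day minimum has not passed
    return False

cap = 10 * 1000**4  # example cap only, not a real plan value
now = datetime(2017, 1, 3, 6, 30, 0)

over_cap = limit_should_be_active(cap + 1, cap, None, now)
held = limit_should_be_active(cap - 1, cap, now - timedelta(days=1), now)
lifted = limit_should_be_active(cap - 1, cap, now - timedelta(days=4), now)
never_on = limit_should_be_active(cap - 1, cap, None, now)
```

Since the month bucket is rolling, consumption drops back under the cap as older log lines age out of the window, at which point only the 3-day minimum keeps the limit in place.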
