[opensuse-factory] Announcing download.o.o access metrics

The full announcement, including images, can be viewed at release-tools.opensuse.org [1].

Adding to the variety of metrics already captured at metrics.o.o [2], I have added download.o.o access metrics [3]. These metrics are sourced from the Apache access logs produced by the download.o.o machine. The goal of parsing the logs was to provide some insight into product adoption and long-term usage, in addition to overall project health.

http://release-tools.opensuse.org/image/metrics.opensuse.org-access-stacked....

The logs cover data from 2010-01-03 to 2018-06-20 (with new logs ingested daily going forward) and amount to roughly 24TB of raw data. I explored a few existing tools, like telegraf [4] (since it is commonly paired with influxdb [5]), but found them lacking in the speed department. For example, telegraf could not even handle 1000 entries per second [6], which would require well over three years to parse the data (reduced to over 6 months using concurrency, if it supported that). Influxdb also could not handle the raw data (even a single day), as I had hoped to use it to perform the aggregations. As such, short of finding a magic tool, which would still require customization for the custom log fields and their meaning, I opted to write a tool [7].

Given the speed-sensitive nature of the problem, I tested the primary scripting language of the openSUSE release tools, Python, and compared it to PHP, which I knew to be generally faster. A simple test running a "starts with" check on each log file line was an order of magnitude faster in PHP, and the difference widened as more processing was added. As such, I opted for PHP, which was fast enough for the job while providing scripting-language convenience. The end result was ~500,000 entries per second per core with full concurrency supported. Using this solution, the last 8 years of data were processed and summarized in ~23 hours using 7 cores of an office machine.
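As a rough sanity check on the throughput figures, the arithmetic can be sketched as below. The total number of log entries is not stated in the announcement, so this sketch assumes (hypothetically) perfect scaling across 7 cores at the quoted ~500,000 entries per second per core for the full ~23 hours. That gives an upper bound only, since sustained throughput was presumably below the peak figure.

```python
# Back-of-the-envelope estimate using the rounded figures quoted in the announcement.
ENTRIES_PER_SEC_PER_CORE = 500_000   # quoted throughput of the PHP tool
CORES = 7                            # quoted core count
HOURS = 23                           # quoted wall-clock time
TELEGRAF_ENTRIES_PER_SEC = 1_000     # quoted ceiling for telegraf

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def estimate_total_entries() -> int:
    """Upper bound on entries processed, assuming perfect scaling (an assumption)."""
    return ENTRIES_PER_SEC_PER_CORE * CORES * HOURS * 3600


def years_at_rate(entries: int, entries_per_sec: float) -> float:
    """Time in years needed to process `entries` at a fixed single-stream rate."""
    return entries / entries_per_sec / SECONDS_PER_YEAR


total = estimate_total_entries()
print(f"upper-bound entries: {total:.3e}")
print(f"years at 1000 entries/s: {years_at_rate(total, TELEGRAF_ENTRIES_PER_SEC):.1f}")
```

Under these assumptions the upper bound works out to roughly 2.9e11 entries, or about nine years at 1000 entries per second, which is at least consistent with the "well over three years" estimate.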

Going forward, only the last day needs to be summarized, which takes a minute or so. For those interested, the 24TB was summarized to roughly 12GB of data, which is then aggregated to roughly 8MB in influxdb. The 12GB lives on metrics.o.o in order to aggregate new days against previous data. The tool could be changed to drop data past the largest aggregation interval (i.e. a month), but if the aggregation algorithm were changed, the summary data would be needed again. For further details about the tool, or to review it, see the metrics/access directory [8] and README.

One of the areas of interest was the number of beta systems Leap receives. The release schedule for the last three releases of Leap may be used to annotate the graphs by enabling the corresponding annotation at the top of the dashboard. An individual product series may also be isolated by clicking the product in the legend (ctrl+click to select more than one). The time range may also be changed using the tool in the top right (next to the refresh button) or by selecting an area on the graph (left click, hold, and drag to the end of the desired area).

After focusing on the 42.2 and 42.3 Beta phases we can see several thousand systems for both, but fewer for 42.3. It would be interesting to know if that reduction is a result of the rolling release model or something else.

http://release-tools.opensuse.org/image/metrics.opensuse.org-access-beta.png

One item to note is that SUSE IPs (such as openQA) are not currently filtered out of the data and, depending on usage, may bump up the beta numbers. This is something I have not yet explored, but it should not be too difficult to filter given an IP list or user-agent.

The extreme long tail of systems on old products is interesting and would seemingly indicate either neglected installs, laziness, or fear of updating, but given that around a quarter of openSUSE systems are on releases beyond end-of-life [9], it is a bit concerning.
:/ It may make sense to add an annotation containing product end-of-life dates.

When compared to the last two versions of Leap, Tumbleweed usage amounts to nearly half of one Leap release or a fifth of systems on supported releases.

http://release-tools.opensuse.org/image/metrics.opensuse.org-access-stacked....

For those interested in more details, there are three collapsed sections at the bottom of the dashboard which contain additional breakdowns of the data and output from the tool. For example, you can see the request counts by unique system by product. Although the averages are reasonable, the maximums are extremely high. Such maximums seemingly indicate either spam or heavy UUID reuse. Changing the aggregation frequency to day shows a very flat series that seemingly indicates automation.

http://release-tools.opensuse.org/image/metrics.opensuse.org-access-average-...

Another area of interest is the steady increase in IPv6 traffic to roughly 10% of current unique systems.

http://release-tools.opensuse.org/image/metrics.opensuse.org-access-unique-p...

The tool output includes the raw log size that the metrics represent for the current time interval, in addition to the number of invalid entries encountered. From reviewing a large number of the entries marked invalid, they are indeed generally bogus, attack attempts, or incomplete requests. Going forward, a large decline in system counts together with a huge spike in invalid counts should make it clear that there is a problem with the logs or the tool. The most recent numbers from before the log format was broken show the lowest invalid counts, and the invalid log entry counts line up nicely with the big hole in the data.

http://release-tools.opensuse.org/image/metrics.opensuse.org-access-invalid....

If the time range is changed to a year and the aggregation frequency (top left) is changed to a day, we can very clearly see the correlation.
It is even clear that the day before the big hole is the day the error was made, as half the entries are invalid and the log size is in between that of the day before and the day after.

http://release-tools.opensuse.org/image/metrics.opensuse.org-access-log-size...
http://release-tools.opensuse.org/image/metrics.opensuse.org-access-invalid-...

Similarly, if the unique by product (stacked) graph is reviewed by day, another pattern exposes itself: a consistent drop in unique counts of nearly 20%. In other words, 20% of systems have weekends. :)

http://release-tools.opensuse.org/image/metrics.opensuse.org-access-stacked-...

Also note that one can export the data as CSV, in addition to viewing a graph full screen, by clicking on the graph title.

I look forward to receiving feedback and insight after people explore the data.

While reviewing some of the raw log data I discovered a fair number of interesting and odd entries. I will summarize some of the highlights below (excluded from mailing list announcement). Enjoy!

[1] http://release-tools.opensuse.org/2018/06/22/download.o.o-access-metrics.htm...
[2] https://metrics.opensuse.org
[3] https://metrics.opensuse.org/d/osrt_access/osrt-access
[4] https://github.com/influxdata/telegraf
[5] https://github.com/influxdata/influxdb
[6] https://github.com/influxdata/telegraf/issues/3539
[7] https://github.com/openSUSE/openSUSE-release-tools/pull/1578
[8] https://github.com/openSUSE/openSUSE-release-tools/tree/master/metrics/access
[9] https://en.opensuse.org/Lifetime

--
Jimmy
--
To unsubscribe, e-mail: opensuse-factory+unsubscribe@opensuse.org
To contact the owner, e-mail: opensuse-factory+owner@opensuse.org

On Friday, June 22, 2018 12:29:42 PM CDT Jan Engelhardt wrote:
So what caused the hole in 2018-01?
At the top of the dashboard on metrics.o.o, in the "Explanation" in bold, I described the reason for the hole:

The logs are entirely invalid, due to a config mistake, from 2017-12-07 to 2018-03-08, and the custom log format, which includes the UUID from zypper, was not present from 2018-03-08 until 2018-05-29, which is the cause of the large "hole" in the graphs.

--
Jimmy

2018.06.22 19:39, Jimmy Berry wrote:
Nice to see statistics. But it seems that there is something wrong with the colors for openSUSE 13.2 and openSUSE Leap 15.0 in metrics.opensuse.org-access-stacked.png. According to the image, we have had a lot of openSUSE Leap 15.0 users since the end of 2014! ...and there were never any openSUSE 13.2 users...

On Friday, June 22, 2018 12:54:39 PM CDT opensuse.lietuviu.kalba wrote:
The automatically assigned colors may be hard to differentiate with this many series. If you isolate the ones you want (click, and then ctrl+click additional ones in the legend), the data looks correct. I see 13.2 users, and 15.0 only starting recently. You can also set the colors by clicking on the color in the legend.

--
Jimmy

2018.06.22 21:49, Jimmy Berry wrote:
I am talking about the static PNG image http://release-tools.opensuse.org/image/metrics.opensuse.org-access-stacked.... . I even opened GIMP and used its tool to select lines by the colors in the legend. I think there is some error in assigning colors to the data, or an error in the tool.

On Friday, June 22, 2018 2:20:51 PM CDT opensuse.lietuviu.kalba wrote:
If one hovers over the line in the live tool on metrics.o.o, 13.2 [1] is highlighted, but the 15.0 line is still rendered on top based on order and, since it is so small, the two are blended together. Obviously the area is still the color of 13.2, but keep in mind it is also blended with the background due to transparency. The transparency could be disabled on the stacked graph since there is no need to see other series in the same space.

[1] https://i.imgur.com/XnZow0j.png

--
Jimmy

2018.06.22 22:56, Jimmy Berry wrote:
OK. Perhaps 13.2 is hidden (in favor of 15.0) because of sorting (13.2 < 15.0 < 42.1 numerically, but in openSUSE history 13.2 < 42 < 15). I hope we can set NaN values instead of zeros, or use some other trick, to avoid displaying lines for unreleased versions.
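The ordering mismatch described here can be illustrated with a small sort key. This is only a sketch: `RELEASE_ORDER` and `release_key` are hypothetical names for illustration, not part of the metrics tool, and the list covers only the versions named in this thread.

```python
# Chronological release order for the versions mentioned in this thread.
# This list is maintained by hand; it is an illustrative assumption, not tool data.
RELEASE_ORDER = ["13.2", "42.1", "42.2", "42.3", "15.0"]


def release_key(version: str) -> int:
    """Sort key that orders versions by release history rather than by number."""
    return RELEASE_ORDER.index(version)


versions = ["15.0", "42.3", "13.2", "42.1", "42.2"]

# Numeric sorting places 15.0 directly after 13.2.
numeric = sorted(versions, key=float)

# Sorting by release history places 15.0 last.
chronological = sorted(versions, key=release_key)

print(numeric)
print(chronological)
```

With a key like this, series could be drawn in release order so that a newer product is not layered directly over an older one just because its version number is smaller.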

On Friday, June 22, 2018 3:07:04 PM CDT opensuse.lietuviu.kalba wrote:
It's possible. This data does not use influxdb aggregation; the aggregation is done outside influxdb by the tool. Normally influxdb provides fill options, like null, for intervals where no data is present. I'll have a look on Monday.

--
Jimmy
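Since the aggregation happens outside influxdb, one way the idea could be approached is to emit no value for intervals before a product first appears, instead of zero. The following is a minimal sketch under that assumption; `series_with_nulls` is a hypothetical helper, and the interval labels and counts are made up for illustration rather than taken from the tool.

```python
from typing import Dict, List, Optional


def series_with_nulls(intervals: List[str], counts: Dict[str, int]) -> List[Optional[int]]:
    """Build one value per interval for a single product.

    Intervals before the first observed data point become None, so a graphing
    tool can skip them; gaps after the first data point become 0.
    """
    values: List[Optional[int]] = []
    seen_data = False
    for interval in intervals:
        if interval in counts:
            seen_data = True
            values.append(counts[interval])
        else:
            values.append(0 if seen_data else None)
    return values


# Made-up example: a product that only has data in the last two intervals.
intervals = ["2018-03", "2018-04", "2018-05", "2018-06"]
counts = {"2018-05": 120, "2018-06": 450}
print(series_with_nulls(intervals, counts))
```

Whether a null is then left unplotted or drawn as zero depends on how the graph panel is configured to treat missing values.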

On 2018-06-22 19:54, opensuse.lietuviu.kalba wrote:
Looking at <https://metrics.opensuse.org/d/osrt_access/osrt-access?orgId=1>, unique by product (stacked), I find it curious that the graph for the stable release of the time has the same shape as the graph for Tumbleweed, only a bit below it. I would expect the graphs to be different.

--
Cheers / Saludos,
Carlos E. R. (from 42.3 x86_64 "Malachite" at Telcontar)

On Friday, June 22, 2018 4:06:28 PM CDT Carlos E. R. wrote:
"Stacked" as in each product series is stacked on top of the others. The graph directly below shows the individual series, or you can click on a single product in the legend of either graph to see the same thing.

--
Jimmy

On Saturday, June 23, 2018 2:29:16 AM CDT Carlos E. R. wrote:
You mean they add. So the graphs get the same shape... I see.
Yes, which is commonly referred to as stacked since they are stacked on top of each other. The tool, in this case Grafana, also refers to it as stacking. The top line of the stacked graph is thus the sum of unique systems across all products.

--
Jimmy
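The stacking described above amounts to a running sum across series at each point in time. A minimal sketch with made-up numbers (the `stack` helper and all values are purely illustrative, not Grafana's implementation):

```python
from typing import List


def stack(series: List[List[int]]) -> List[List[int]]:
    """Return the lines drawn on a stacked graph.

    Each output line is the element-wise sum of its own series and every
    series below it, so the last line is the total across all series.
    """
    stacked: List[List[int]] = []
    running = [0] * len(series[0])
    for values in series:
        running = [r + v for r, v in zip(running, values)]
        stacked.append(running)
    return stacked


# Made-up counts for three points in time.
leap = [100, 120, 90]
tumbleweed = [30, 40, 20]

lines = stack([leap, tumbleweed])
print(lines[0])  # bottom line: Leap alone
print(lines[1])  # top line: Leap + Tumbleweed
```

Because each line carries everything beneath it, a series drawn higher in the stack inherits the shape of the series below, which is why the lines appear to follow one another.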

Interesting that, according to metrics.opensuse.org, there has been a decrease in unique running openSUSE systems over the last 7 years (from 2011), but according to Aplanas' presentation in 2016 <https://youtu.be/40emCNzs6so?t=712> there was a tendency to increase (at least between 2009 and 2016). Both present statistics based on UUID, so how can the different directions of the trends be explained?

Regards
