Counting LOC in the yast and libyui GitHub organizations
Hi!

Today the question "does anyone know how many LOC we maintain?" arose in the #yast channel on the freenode.net IRC server. And although most of us know about `wc -l` (in fact, my teammate Stefan suggested running `wc -l **/*.{rb,cc,h}` in a full checkout), I quickly wondered whether someone had already built a tool to perform such a task against all repos of a GitHub organization. And the answer is yes: Ben Balter [1] did it 5 years ago and called it count-org-loc [2]. Under the hood it uses cloc [3] (a tool written in Perl for counting blank, comment, and physical lines of code in many programming languages), in a similar way to what is explained in this StackOverflow answer [4].

Although I was aware that the results do not answer the original question, since not every repository in the yast GitHub organization is maintained by the core YaST team, I ran it against both the yast and libyui GitHub organizations. You can see the results I got at [5] and [6], just in case you are as curious as I am :P

BTW, I installed cloc via npm [7] because the available 1-click install [8] for Tumbleweed didn't work for me. Hope you can forgive me :) :P

[1] https://github.com/benbalter
[2] https://github.com/benbalter/count-org-loc
[3] https://github.com/AlDanial/cloc#Docker-
[4] https://stackoverflow.com/a/29012789
[5] https://gist.github.com/dgdavid/086010eb369d895f121e76f0dc6154f4
[6] https://gist.github.com/dgdavid/90022844a5891a9aa893e961d1549b55
[7] https://github.com/kentcdodds/cloc#readme
[8] https://software.opensuse.org/download/package?package=cloc&project=Documentation%3ATools

--
David Díaz
YaST Team at SUSE LINUX GmbH
IRC: dgdavid
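For readers who want to reproduce Stefan's one-liner without a shell, here is a minimal Python sketch of what `wc -l **/*.{rb,cc,h}` computes: the number of newline characters in every `.rb`, `.cc` and `.h` file below a directory, plus a total. Note that in bash the `**` pattern only recurses when the `globstar` option is enabled (`shopt -s globstar`); this sketch always recurses and, like a default shell glob, skips dot-files and dot-directories such as `.git`. The function name `count_physical_lines` is made up for illustration.

```python
from pathlib import Path

# Extensions taken from the `wc -l **/*.{rb,cc,h}` suggestion in the thread.
EXTENSIONS = {".rb", ".cc", ".h"}


def count_physical_lines(root, extensions=EXTENSIONS):
    """Mimic `wc -l **/*.{rb,cc,h}`: per-file newline counts plus a total."""
    base = Path(root)
    counts = {}
    for path in sorted(base.rglob("*")):
        rel = path.relative_to(base)
        # Shell globs skip dot-files and dot-directories (e.g. .git) by default.
        if any(part.startswith(".") for part in rel.parts):
            continue
        if path.is_file() and path.suffix in extensions:
            # Like wc -l, count newline bytes: a final line without a
            # trailing newline is not counted.
            counts[str(rel)] = path.read_bytes().count(b"\n")
    return counts, sum(counts.values())
```

Calling `count_physical_lines(".")` in a full checkout returns the per-file counts and a grand total. These are raw physical lines, so blank lines, comments and license headers are all included.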
On Tue, 2021-03-16 at 17:47 +0000, David Díaz wrote:
[...] I ran it against both the yast and libyui GitHub organizations. [...]
Here is almost the same thing, but using scc [1] with some flags; see [2] and [3]. Sadly, that tool does not have a "--by-dir" flag yet [4].

On a final note, I have learned that the simple "wc -l" command becomes a completely different thing when talking about counting LOC ;)

[1] https://github.com/boyter/scc - Sloc, Cloc and Code
[2] https://gist.github.com/dgdavid/3a92c3ac4829441858c6327b568268cc
[3] https://gist.github.com/dgdavid/d9e743ad7d306ad8dbdce10450230416
[4] https://github.com/boyter/scc/issues/225

--
David Díaz
YaST Team at SUSE LINUX GmbH
IRC: dgdavid
On 2021-03-16 23:01, David Díaz wrote:
On a final note, I have learned that the simple "wc -l" command becomes a completely different thing when talking about counting LOC ;)
It's a matter of definition, of course, what exactly a "line of code" actually is. Like all metrics, it needs to be used wisely, and at the same time we need to keep in mind that it's just a number that by itself doesn't tell anybody very much.

During my career (spanning from when I began programming in the early 1980s, through studying computer science from 1986 to 1991 and starting my first paid job in 1991, to today) I have seen many people producing tons of lines of code that can hardly be compared.

I have seen copy&paste programmers who produce enormous numbers of LOC of questionable quality that others would have condensed into a fraction of that with smart use of reusable, parameterized functions (and later objects).

I have seen people programming highly optimized hand-written Assembler for use in mobile phones on the GSM layer, where a hard timer interrupt after n CPU cycles makes it an absolute requirement to be finished with your timer loop within that time, or the system will crash; so those people were sitting for days on end to save a line of (Assembler) code here and a line there. Their achievement was clearly not boasting lots of LOC, but minimizing it.

I have seen people writing tricky, short code that was not self-explanatory at all, where a few more lines would have clarified a lot.

I myself tend to add a lot of comments explaining things in the code and, more importantly, in function or class headers. And I use a lot of empty lines to make clear separations between functionally related and unrelated things.

And I keep seeing that legalese garbage that some people claim corporate lawyers require us to add to each and every source file (as comment lines, but those are comments that are not useful to anybody in the world; it's just useless gibberish), while at the same time there is not a single word in the file header about WTF that code is all about.

How does all that stack up with the various definitions of LOC? Does the definition really make sense?
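One way to see how much the definition matters is to classify each line of a file as blank, comment, or code and compare the totals. The following Python sketch does that for Ruby-like source. It is deliberately simplified (only whole-line `#` comments are recognized, not `=begin`/`=end` blocks, trailing comments, or `#` inside strings, which dedicated counters like cloc try to handle), and the sample file, including its stand-in license header, is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineCounts:
    blank: int = 0
    comment: int = 0
    code: int = 0

    @property
    def physical(self):
        # Every line falls into exactly one category, so this matches wc -l
        # for text that ends with a newline.
        return self.blank + self.comment + self.code


def classify_ruby_lines(text):
    """Roughly classify Ruby-like source (simplified: whole-line `#` comments only)."""
    counts = LineCounts()
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            counts.blank += 1
        elif stripped.startswith("#"):
            counts.comment += 1
        else:
            counts.code += 1
    return counts


# Hypothetical sample file: a 3-line stand-in for a license header,
# a blank line, a doc comment and a 3-line method.
SAMPLE = """\
# Hypothetical license header (three comment lines),
# standing in for the boilerplate mentioned above.
# (Not a real license text.)

# Adds two numbers.
def add(a, b)
  a + b
end
"""

counts = classify_ruby_lines(SAMPLE)
print(counts.physical, counts.blank, counts.comment, counts.code)  # prints "8 1 4 3"
```

For this sample, `wc -l` would report 8 lines, yet only 3 of them are code; the other 5 are the license header, a comment and a blank line.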
So, any measure of LOC is completely arbitrary and really cannot be compared. It's just a pointless number. It may be an impressive number, but it's still just a number. Don't get over-excited by such numbers.

Add to this the fact that YaST is a project with a 20+ year history; we started back in late 1999. During that time, lots of code has been added, edited, and removed. You only ever see the latest state: a snapshot of what happened over time. Just adding up the LOC does not tell the whole story. In fact, it's downright deceiving.

As all of you surely know, sometimes you need to do many hours (or even days) of research to find out what a particular block of code is all about: what it was originally meant for, what of that is still relevant today, what is obsolete, and what it means to change it in any way. And then you have a patch of 5 lines added and 5 lines removed to show for what you did over 3 days, and the overall LOC count is the same as before. That doesn't sound impressive at all.

In that respect, more often than not we are in the role of archaeologists and/or restorers; you can't simply compare what we do with some newbie fresh from university writing some Ruby-on-Rails application. It would be like comparing a house painter whitewashing a wall with a restorer bringing a medieval fresco on a church ceiling back to its old glory. Sure, we could simply whitewash it, but that would probably not make anybody happy. ;-)

Just my 2 Cents^W^W Ok, that was more like 10.- EUR ;-)

Kind regards
--
Stefan Hundhammer <shundhammer@suse.de>
YaST Developer
SUSE Linux GmbH
GF: Felix Imendörffer, Jane Smithard, Graham Norton; HRB 21284 (AG Nürnberg)
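The "5 lines added, 5 lines removed" arithmetic can be made explicit: the net LOC change is additions minus removals, while the churn (additions plus removals) reflects how much was actually touched. Here is a small Python sketch using the standard library's `difflib.SequenceMatcher`; the function name `diff_stats` and the 10-line before/after file are hypothetical.

```python
from difflib import SequenceMatcher


def diff_stats(old_lines, new_lines):
    """Return (added, removed) line counts between two versions of a file."""
    added = removed = 0
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, old_lines, new_lines).get_opcodes():
        # "replace" counts as both removing old lines and adding new ones.
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


# Hypothetical 10-line file in which a 5-line block is rewritten.
old = [f"line {i}" for i in range(10)]
new = old[:3] + [f"fixed line {i}" for i in range(5)] + old[8:]

added, removed = diff_stats(old, new)
print(f"+{added} -{removed}, net {added - removed}, churn {added + removed}")
# prints "+5 -5, net 0, churn 10"
```

The net change is 0, so a LOC total would not move at all, even though 10 lines were touched.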
On Wed, 2021-03-17 at 11:57 +0100, Stefan Hundhammer wrote:
On 2021-03-16 23:01, David Díaz wrote:
On a final note, I have learned that the simple "wc -l" command becomes a completely different thing when talking about counting LOC ;)
It's a matter of definition, of course, what exactly a "line of code" actually is.
Like all metrics, it needs to be used wisely, and at the same time we need to keep in mind that it's just a number that by itself doesn't tell anybody very much.
Agreed.
During my career (spanning from when I began programming in the early 1980s, through studying computer science from 1986 to 1991 and starting my first paid job in 1991, to today) I have seen many people producing tons of lines of code that can hardly be compared.
I have seen copy&paste programmers who produce enormous numbers of LOC of questionable quality that others would have condensed into a fraction of that with smart use of reusable, parameterized functions (and later objects).
I have seen people programming highly optimized hand-written Assembler for use in mobile phones on the GSM layer, where a hard timer interrupt after n CPU cycles makes it an absolute requirement to be finished with your timer loop within that time, or the system will crash; so those people were sitting for days on end to save a line of (Assembler) code here and a line there. Their achievement was clearly not boasting lots of LOC, but minimizing it.
I have seen people writing tricky, short code that was not self-explanatory at all, where a few more lines would have clarified a lot.
I myself tend to add a lot of comments explaining things in the code and, more importantly, in function or class headers. And I use a lot of empty lines to make clear separations between functionally related and unrelated things.
And I keep seeing that legalese garbage that some people claim corporate lawyers require us to add to each and every source file (as comment lines, but those are comments that are not useful to anybody in the world; it's just useless gibberish), while at the same time there is not a single word in the file header about WTF that code is all about.
How does all that stack up with the various definitions of LOC? Does the definition really make sense?
So, any measure of LOC is completely arbitrary and really cannot be compared. It's just a pointless number. It may be an impressive number, but it's still just a number. Don't get over-excited by such numbers.
I don't ;)
Add to this the fact that YaST is a project with a 20+ year history; we started back in late 1999. During that time, lots of code has been added, edited, and removed. You only ever see the latest state: a snapshot of what happened over time. Just adding up the LOC does not tell the whole story.
You're right.
In fact, it's downright deceiving.
As all of you surely know, sometimes you need to do many hours (or even days) of research to find out what a particular block of code is all about: what it was originally meant for, what of that is still relevant today, what is obsolete, and what it means to change it in any way. And then you have a patch of 5 lines added and 5 lines removed to show for what you did over 3 days, and the overall LOC count is the same as before.
That's true.
That doesn't sound impressive at all.
In that respect, more often than not we are in the role of archaeologists and/or restorers; you can't simply compare what we do with some newbie fresh from university writing some Ruby-on-Rails application.
It would be like comparing a house painter whitewashing a wall with a restorer bringing a medieval fresco on a church ceiling back to its old glory. Sure, we could simply whitewash it, but that would probably not make anybody happy. ;-)
Just my 2 Cents^W^W Ok, that was more like 10.- EUR ;-)
Thanks!
Kind regards
--
David Díaz
YaST Team at SUSE LINUX GmbH
IRC: dgdavid
On 3/17/21 10:57 AM, Stefan Hundhammer wrote:
It would be like comparing a house painter whitewashing a wall with a restorer bringing a medieval fresco on a church ceiling back to its old glory. Sure, we could simply whitewash it, but that would probably not make anybody happy. ;-)
I love this comparison xD

I agree, in general. LOC counts do not say much, but they say something. For example, the amount of code we are writing in yast-storage-ng is amazing. Although I bet that most of it corresponds to unit tests.

CU
Iván
--
José Iván López González
YaST Team at SUSE LINUX GmbH
IRC: jilopez
On 2021-03-17 12:55, José Iván López González wrote:
For example, the amount of code we are writing in yast-storage-ng is amazing. Although I bet that most of it corresponds to unit tests.
Yes, but that's also code. It's not the code that we ship to end users, but it's code that verifies what we are doing against regressions. It takes work and time to write and maintain it, and it's not just some luxury; it is an integral part of the project.

Kind regards
--
Stefan Hundhammer <shundhammer@suse.de>
YaST Developer
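To check a bet like Iván's, one could count test code and shipped code separately. Here is a minimal Python sketch, assuming (hypothetically) that test code lives in directories named `test` or `spec` and that only non-blank `.rb` lines are of interest; real repositories may use other layouts, and the function name is made up for illustration.

```python
from pathlib import Path

# Assumption: test code lives in directories named "test" or "spec",
# as in many Ruby projects; adjust TEST_DIRS for other layouts.
TEST_DIRS = {"test", "spec"}


def split_test_and_production(root, suffix=".rb"):
    """Return non-blank line counts as {"test": n, "production": m}."""
    totals = {"test": 0, "production": 0}
    base = Path(root)
    for path in base.rglob(f"*{suffix}"):
        if not path.is_file():
            continue
        # Only look at directories below root, not at root's own path.
        rel_dirs = path.relative_to(base).parts[:-1]
        kind = "test" if TEST_DIRS.intersection(rel_dirs) else "production"
        text = path.read_text(encoding="utf-8", errors="replace")
        totals[kind] += sum(1 for line in text.splitlines() if line.strip())
    return totals
```

Either way, as Stefan points out, both numbers represent code that has to be written and maintained; the split only shows where the effort goes.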
Hello,

On 2021-03-17 11:57, Stefan Hundhammer wrote (excerpt):
I myself tend to add a lot of comments explaining things in the code
Same for me; cf. "Make yourself understood" in https://github.com/rear/rear/wiki/Coding-Style

I am totally selfish about that, because I have experienced too often that, some time later, I no longer understand the mess that I had created so absolutely ingeniously in former times ;-)

BUT I have met others who prefer to keep explanatory stuff outside of the actual code, in some external documentation (because longer explanations distract from the actual code). There, however, the explanations basically always RIP ("rot in peace"), because in my experience even comments placed directly in the code were too often not updated when the actual code was changed.

AND I have met others who seem to think source code is primarily there to tell the dumb machine what to do. I think, in particular, free software source code is primarily there to tell other humans what the code is intended to do, so that others can fix and enhance the code properly as needed.

FINALLY, I have met others who seem to think nobody else exists ;-)

Kind Regards
Johannes Meixner
--
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5 - 90409 Nuernberg - Germany
(HRB 36809, AG Nuernberg)
GF: Felix Imendoerffer
Participants (4):
- David Díaz
- Johannes Meixner
- José Iván López González
- Stefan Hundhammer