ArchWiki talk:Statistics

Latest comment: 21 December 2023 by Nl6720 in topic Blocked users

Improvements

MediaWiki 1.23 changed the way Special:ActiveUsers is updated, making it "more efficient" for wikis like Wikipedia, but practically unusable for smaller wikis like ours. See the forum thread and ML thread.

This has encouraged me to finally implement an idea that I'd had for a long time: publishing more detailed statistics about our contributors in order to reward them with a form of official acknowledgement of their efforts, and hopefully to encourage more people to participate regularly in this project.

I've written a script for wiki-scripts that can update ArchWiki:Statistics#User statistics automatically: here you can provide feedback and especially propose improvements, like more statistics that you'd like to see, or changes to the way the information is presented... we have complete control now, so the sky is the limit ;)

Some ideas I thought of are:

  • split the table in two, one only with the recent edits and the other only with the total edit counts for each user
  • show the average of edits per day as was originally done in list-allusers.py, now broken by the aforementioned change in MediaWiki
  • split the page in subpages, and maybe only show smaller "preview" tables here in the main one
  • show the bot accounts next to their associated normal users, instead of using the same column
  • also ArchWiki:Statistics#Global statistics must be updated automatically
  • language-based statistics
  • display some charts/graphs?
  • only show users with at least 2 (or 3) edits in the time interval
  • reward edits to talk pages, by e.g. showing that specific number in a dedicated field
  • ignore edits to the User namespace
  • Add a legend explaining the meaning of the fields (especially "Recent", which may be replaced with another word)

Kynikos (talk) 14:33, 4 February 2015 (UTC)Reply

Great to see an alternative to the broken upstream page - it might get fixed some day, but considering their release cycle, it will probably take several years...
-- Lahwaacz (talk) 16:28, 4 February 2015 (UTC)Reply
Considering the creativity they put into developing new algorithms like the new way to refresh that page, I'm really curious to see what the new "fix" could be... (limiting the refresh of the page to not less than e.g. 1 hour or 1 day was too obvious and boring I guess, including exposing a configuration parameter to set the minimum interval) — Kynikos (talk) 10:59, 5 February 2015 (UTC)Reply
Perhaps it will just take me a year to file a bug report that is persuasive enough... :P Lahwaacz (talk) 08:03, 8 February 2015 (UTC)Reply
Since I started writing statistics-allrevisions.py I was wondering how to properly handle deleted revisions. I think that deleted revisions are not counted amongst the total edits, which also means that deleting an obsolete page will decrease the total edit count of its editors. We could read the deleted revisions, but this would also include possible vandalism that had better be left out. (Also, reading deleted revisions currently requires admin/maintainer rights and I don't know how hard it would be to access it through the API.)
-- Lahwaacz (talk) 16:28, 4 February 2015 (UTC)Reply
I think we should finally implement the redirect solution in ArchWiki:Requests#Should_we_remove_or_archive_obsolete_articles.3F, then restore all the deleted revisions (carefully with the help of a bot script), and reserve deleting pages only for cases like spam, illegal content etc. (i.e. ~never). -- Kynikos (talk) 10:59, 5 February 2015 (UTC)
As for displaying charts, the files would probably be uploaded to ArchWiki itself, and reuploading is currently buggy.
-- Lahwaacz (talk) 16:28, 4 February 2015 (UTC)Reply
Yes, this should be kept on hold, even though I haven't tested uploading files after the recent upgrade, have you? — Kynikos (talk) 10:59, 5 February 2015 (UTC)Reply
No, I haven't, but I've noticed the other Caching Bugs™ on 1.24 so the caching configuration is probably still the same. -- Lahwaacz (talk) 08:06, 8 February 2015 (UTC)Reply
The other ideas sound nice and useful. I will only note that splitting the table and page should probably be done as the last step, because there may be other demands when more information is added. Unfortunately wiki tables can only be sortable and not filterable...
I'll also add some ideas:
  • the "User" column could contain a link to the user's contributions
  • include last edit date (at least for users with 0 recent edits)?
-- Lahwaacz (talk) 16:28, 4 February 2015 (UTC)Reply
Two more:
  • show a (total_user_edits * 100 / total_wiki_edits)% for each user, or for those with more than N edits
  • parse the old revisions of the page for long-term statistics (e.g. average edits per 30 days in the last 360 days) (since the users in the table are variable, the data wouldn't be available for each of them)
Kynikos (talk) 10:59, 5 February 2015 (UTC) (last edit: 02:26, 8 February 2015 (UTC))Reply
Regarding your last idea, there is a way to avoid parsing the table and instead be able to provide the information for all users in the table. The answer is caching (which might be a dirty word on this wiki by now :P)
For example the statistics-allrevisions.py script queries info about all revisions and stores the data locally in a JSON file, which is reused next time. The first query is pretty slow, but the next one is super fast. The file is about 50MB large for all revisions, which is not that much. When storing only revisions from the last year and stripping any unnecessary information such as edit summaries, it would be much smaller.
-- Lahwaacz (talk) 08:29, 8 February 2015 (UTC)Reply
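The caching approach described above can be illustrated with a minimal sketch. It is not the code of statistics-allrevisions.py: the function names, the set of kept fields and the `fake_fetch` stand-in for the API query are all assumptions made for illustration. The idea is only that the JSON file is loaded if it exists, that just the revisions newer than the last cached one are requested, and that unneeded fields such as edit summaries are stripped before saving.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical choice of fields to keep; edit summaries ("comment") are dropped.
KEPT_FIELDS = ("revid", "user", "timestamp")


def load_cache(path):
    """Return the cached list of revisions, or an empty list on the first run."""
    if path.exists():
        with path.open(encoding="utf-8") as f:
            return json.load(f)
    return []


def update_cache(path, fetch_newer):
    """Extend the cache with revisions newer than the last cached revid.

    `fetch_newer(after_revid)` is a stand-in for the real (slow) API query.
    """
    revisions = load_cache(path)
    last_revid = max((r["revid"] for r in revisions), default=0)
    for rev in fetch_newer(last_revid):
        revisions.append({key: rev[key] for key in KEPT_FIELDS})
    with path.open("w", encoding="utf-8") as f:
        json.dump(revisions, f)
    return revisions


def fake_fetch(after_revid):
    """Made-up data standing in for the wiki API."""
    all_revs = [
        {"revid": 1, "user": "A", "timestamp": "2015-02-01T10:00:00Z", "comment": "typo"},
        {"revid": 2, "user": "B", "timestamp": "2015-02-02T11:00:00Z", "comment": "style"},
        {"revid": 3, "user": "A", "timestamp": "2015-02-03T12:00:00Z", "comment": "link"},
    ]
    return [r for r in all_revs if r["revid"] > after_revid]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "revisions.json"
    first_run = update_cache(cache_file, fake_fetch)   # fetches everything
    second_run = update_cache(cache_file, fake_fetch)  # fetches nothing new
```

On the second run only the local file is read and the fetch returns no new revisions, which is why repeated runs are fast.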
One suggestion for the "Registration" column: It should be abbreviated to "year-month". The day and time don't add much but clutter the view. --Indigo (talk) 09:06, 8 February 2015 (UTC)Reply
Expanding on the user-to-all percentage idea, the ArchWiki admins, maintainers and bots together have made 117781 of the total 384325 edits, which is slightly above 30%. Also note that deleted revisions are excluded from the 117781 count, but included in the total 384325, so the precise percentage would be even higher. -- Lahwaacz (talk) 21:28, 7 July 2015 (UTC)Reply
So, are you against showing it? It was only a random idea, not that I mind too much :) — Kynikos (talk) 11:35, 8 July 2015 (UTC)Reply
Likewise, I think we can decide later... -- Lahwaacz (talk) 17:41, 8 July 2015 (UTC)Reply
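For reference, the percentage proposed above, (total_user_edits * 100 / total_wiki_edits), can be checked against the figures Lahwaacz quoted. This is only a sketch of the arithmetic; the function name `edit_share` is made up for illustration and is not part of wiki-scripts.

```python
def edit_share(user_edits, total_edits):
    """Return user_edits as a percentage of total_edits."""
    if total_edits <= 0:
        raise ValueError("total_edits must be positive")
    return user_edits * 100 / total_edits


# Figures quoted above: edits by admins, maintainers and bots vs. all edits.
staff_share = edit_share(117781, 384325)
print(f"{staff_share:.2f}%")
```

The result is a little under 30.65%, consistent with "slightly above 30%". As noted above, the two counts treat deleted revisions differently, so the figure is a lower bound.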
There is an interesting statistic that could be included in the table: we could take inspiration from GitHub and implement the counting of current and longest streaks. This would have to rely on caching to maintain performance at reasonable levels. -- Lahwaacz (talk) 11:56, 5 April 2015 (UTC)
Edit: possible draft implementation: [1] -- Lahwaacz (talk) 21:22, 5 April 2015 (UTC)Reply
Cool, just tested it :) — Kynikos (talk) 04:24, 6 April 2015 (UTC)Reply
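To make the streak idea concrete, here is a small sketch that is independent of the draft linked above and of the actual wiki-scripts code. It assumes a streak is a run of consecutive calendar days with at least one edit, and that the current streak is the run ending on the given `today` (0 if there was no edit that day); the real implementation may define these differently.

```python
from datetime import date, timedelta


def streaks(edit_days, today):
    """Return (current, longest) streak lengths in days.

    edit_days: iterable of datetime.date, one entry per edit (repeats allowed).
    """
    longest = run = 0
    previous = None
    for day in sorted(set(edit_days)):
        if previous is not None and day - previous == timedelta(days=1):
            run += 1  # the day directly follows the previous one
        else:
            run = 1   # a gap (or the first day) starts a new run
        longest = max(longest, run)
        previous = day
    # The last run counts as "current" only if it reaches today.
    current = run if previous == today else 0
    return current, longest


example_days = [
    date(2015, 4, 1), date(2015, 4, 2), date(2015, 4, 3),  # 3-day run
    date(2015, 4, 5), date(2015, 4, 5),                    # two edits on one day
]
current, longest = streaks(example_days, today=date(2015, 4, 5))
```

With the example data the longest streak is 3 days and the current one is 1 day. Since only the set of edit days is needed, this fits well with a local revision cache.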
The current/longest streak columns are now implemented. I've also reordered the other columns to keep the statistics grouped at the right, but with the latest changes to the script, it should be trivial to reorder them differently. -- Lahwaacz (talk) 21:15, 7 July 2015 (UTC)Reply
Great, the script works perfectly! I don't mind the column order, but as we add more fields, it could make sense to use shorter titles and implement the "legend" idea above. — Kynikos (talk) 03:47, 9 July 2015 (UTC)Reply
Streak details are now provided as tooltips. Should this be noted in the legend, or should we just let the readers find it by themselves? -- Lahwaacz (talk) 22:20, 10 July 2015 (UTC)
I'd be for documenting it, and if you like the legend idea I think that's indeed the most natural place where to describe it. — Kynikos (talk) 09:10, 11 July 2015 (UTC)Reply
I've pushed the first version of the legend, of course pull requests with improvements are welcome :) Lahwaacz (talk) 21:42, 15 July 2015 (UTC)Reply
I'm thinking about including deleted revisions in the cache database, which would give more accurate results because deleted revisions are not necessarily unwanted revisions. It would also give us a chance to recalculate some global statistics, e.g. taking a proper average of edits per day instead of taking the last interval. -- Lahwaacz (talk) 22:00, 15 July 2015 (UTC)
Until we find the time to finalize ArchWiki:Requests#Should_we_remove_or_archive_obsolete_articles.3F, as I said above, I agree that deleted revisions should be taken into account. — Kynikos (talk) 07:55, 18 July 2015 (UTC)Reply
Very well, deleted revisions are now included for the calculations. The existing cache will need to be reinitialized, the script should point it out. -- Lahwaacz (talk) 08:43, 24 July 2015 (UTC)Reply
Another interesting histogram could be the evolution of new account registrations per month. — Kynikos (talk) 12:19, 26 July 2015 (UTC)Reply
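A rough sketch of how the data for such a histogram could be grouped, assuming registration timestamps are available as ISO 8601 strings such as "2015-02-04T14:33:00Z" (the sample values and the function name are made up for illustration). The "year-month" key is the same abbreviation suggested above for the "Registration" column.

```python
from collections import Counter
from datetime import datetime


def registrations_per_month(timestamps):
    """Count registrations per "YYYY-MM" month, sorted chronologically."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y-%m")
        for ts in timestamps
    )
    return dict(sorted(counts.items()))


# Made-up sample data.
sample = [
    "2015-02-04T14:33:00Z",
    "2015-02-08T08:03:00Z",
    "2015-07-07T21:28:00Z",
]
per_month = registrations_per_month(sample)

# Plain-text histogram, one '#' per registration.
for month, count in per_month.items():
    print(f"{month} {'#' * count}")
```

A text histogram like this avoids uploading image files, which was the concern raised earlier about charts.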

Preferences vs. Statistics

For some reason I have more edits in ArchWiki:Statistics than in my user preferences... [2] Is this one of those infamous caching bugs? -- Alad (talk) 12:44, 23 October 2015 (UTC)Reply

We're counting also edits on deleted pages, (AFAIK) MediaWiki doesn't. -- Lahwaacz (talk) 14:32, 23 October 2015 (UTC)Reply

Number of active users

The number of active users reported by Special:Statistics has been "440" every time I have visited that page in the last 4 months. While this is certainly possible, it also seems a bit unlikely. Has anyone else seen it display a different number recently? -- Kynikos (talk) 04:26, 1 July 2017 (UTC)

Maybe it somehow correlates with the problem that notification emails are sent only hourly? As far as I remember, the "new" implementation of Special:Statistics is cached and something is there to determine how big the site is and how often (relative to the amount of edits and background jobs?) the page should be renewed. By the way, since it's holidays now, maybe it's time to ping Florian again. -- Lahwaacz (talk) 08:37, 8 July 2017 (UTC)Reply
For the time being I'm getting the value from ArchWiki:Statistics#User statistics (the wiki-scripts database); the format is Special:Statistics value/ArchWiki:Statistics value, in practice 440/ArchWiki:Statistics value. -- Kynikos (talk) 11:06, 1 September 2017 (UTC)
A potential solution to this issue would be to generate the current number of active users on the client side, based on the number of entries in the user table. Here's a quick JavaScript snippet with a CSS selector that counts the entries in that specific table: document.querySelectorAll("table.wikitable:nth-child(8) > tbody:nth-child(2) > tr").length. This would at least make the count stay in sync with the table. -- Aeros167 (talk) 02:32, 7 June 2019 (UTC)
Thanks Aeros, but that's a workaround, not a solution to the problem, which lies in our wiki setup. Perhaps I could remove the 440s from the list and only keep the links to this discussion as a reminder that the values come from a source other than the Statistics page. -- Kynikos (talk) 14:40, 8 June 2019 (UTC)
Is there a viable method of contributing to the wiki-scripts database, or any other back-ends which the wiki utilizes? So far, most of my contributions have revolved around improving individual pages or fixing dead links, but if possible I'd like to contribute in other ways as well. I have some experience working with relational databases and further experience with web development. I am an intern software developer currently pursuing a CS-related bachelor's degree, so I'd be more than happy to contribute. It'd be a great additional way for me to build more relevant experience while contributing to one of the best sources of modern Linux documentation. -- Aeros167 (talk) 20:28, 10 June 2019 (UTC)
You could work on the MediaWiki backend itself, however its development is driven primarily by the needs of the Wikimedia Foundation and their release cycle is rather slow, so it's hard to pursue ArchWiki-specific improvements. I could give you links for some of our attempts if you'd like. Over the years, we have developed our own tools to make our lives easier, to implement missing features, or even work around some bugs (like the one discussed in this section). You can see the lists of issues for Wiki Monkey and wiki-scripts on GitHub to see if you'd like to contribute. -- Lahwaacz (talk) 21:00, 10 June 2019 (UTC)
Of course any help with our bots is very welcome, however if you feel adventurous you could also try to reproduce the problem after setting up your local ArchWiki, find a resolution and submit a patch to MediaWiki ;) -- Kynikos (talk) 15:00, 11 June 2019 (UTC)Reply
Thanks for the info, I'll gladly look through the bots and see if there's anything I could potentially contribute; I should have some time to dig through the source code this weekend. I do have a decent amount of experience with Python in particular, so I'll probably check out wiki-scripts first. It might also be interesting to set up a local copy of the ArchWiki to perform some testing on, which might also be useful for testing potential bot script changes. -- Aeros167 (talk) 02:55, 15 June 2019 (UTC)

Blocked users

I find it unfair that PolarianDev disappeared from the table. Despite the ban, he made many good edits to ArchWiki. andreymal (talk) 12:10, 15 October 2023 (UTC)Reply

[3] -- Alad (talk) 16:33, 16 October 2023 (UTC)Reply
@Lahwaacz, is it possible to include blocked users with temporary bans? -- nl6720 (talk) 06:23, 20 October 2023 (UTC)Reply
I created an issue: https://github.com/lahwaacz/wiki-scripts/issues/98 ❄️❄️ nl6720 (talk) 12:37, 21 December 2023 (UTC)Reply