ArchWiki talk:Statistics

Improvements

MediaWiki 1.23 changed the way Special:ActiveUsers is updated, making it "more efficient" for wikis like Wikipedia, but practically unusable for smaller wikis like ours. See the forum thread and ML thread.

This has encouraged me to finally implement an idea that I'd had for a long time: publishing more detailed statistics about our contributors in order to reward them with a form of official acknowledgement of their efforts, and hopefully to encourage more people to participate regularly in this project.

I've written a script for wiki-scripts that can update ArchWiki:Statistics#User statistics automatically: here you can provide feedback and especially propose improvements, like more statistics that you'd like to see, or changes to the way the information is presented... we have complete control now, so the sky is the limit ;)

Some ideas I thought of are:

  • split the table in two, one only with the recent edits and the other only with the total edit counts for each user
  • show the average of edits per day as was originally done in list-allusers.py, now broken by the aforementioned change in MediaWiki
  • split the page in subpages, and maybe only show smaller "preview" tables here in the main one
  • show the bot accounts next to their associated normal users, instead of using the same column
  • also ArchWiki:Statistics#Global statistics must be updated automatically
  • language-based statistics
  • display some charts/graphs?
  • only show users with at least 2 (or 3) edits in the time interval
  • reward edits to talk pages, by e.g. showing that specific number in a dedicated field
  • ignore edits to the User namespace
  • Add a legend explaining the meaning of the fields (especially "Recent", which may be replaced with another word)

Kynikos (talk) 14:33, 4 February 2015 (UTC)

Great to see an alternative to the broken upstream page - it might get fixed some day, but considering their release cycle, it will probably take several years...
-- Lahwaacz (talk) 16:28, 4 February 2015 (UTC)
Considering the creativity they put into developing new algorithms like the new way to refresh that page, I'm really curious to see what the new "fix" could be... (limiting the refresh of the page to not less than e.g. 1 hour or 1 day was too obvious and boring I guess, including exposing a configuration parameter to set the minimum interval) — Kynikos (talk) 10:59, 5 February 2015 (UTC)
Perhaps it will just take me a year to file a bug report that is persuasive enough... :P Lahwaacz (talk) 08:03, 8 February 2015 (UTC)
Since I started writing statistics-allrevisions.py I was wondering how to properly handle deleted revisions. I think that deleted revisions are not counted amongst the total edits, which also means that deleting an obsolete page will decrease the total edit count of its editors. We could read the deleted revisions, but this would also include possible vandalism that had better be left out. (Also, reading deleted revisions currently requires admin/maintainer rights and I don't know how hard it would be to access it through the API.)
-- Lahwaacz (talk) 16:28, 4 February 2015 (UTC)
I think we should finally implement the redirect solution in ArchWiki:Requests#Should_we_remove_or_archive_obsolete_articles.3F, then restore all the deleted revisions (carefully with the help of a bot script), and reserve deleting pages only for cases like spam, illegal content etc. (i.e. ~never). — Kynikos (talk) 10:59, 5 February 2015 (UTC)
As for displaying charts, the files would probably be uploaded on ArchWiki itself and reuploading is currently buggy.
-- Lahwaacz (talk) 16:28, 4 February 2015 (UTC)
Yes, this should be kept on hold, even though I haven't tested uploading files after the recent upgrade, have you? — Kynikos (talk) 10:59, 5 February 2015 (UTC)
No, I haven't, but I've noticed the other Caching Bugs™ on 1.24 so the caching configuration is probably still the same. -- Lahwaacz (talk) 08:06, 8 February 2015 (UTC)
Other ideas sound nice and useful, I will only note that splitting the table and page should probably be done as the last step because there may be other demands when more information is added. Unfortunately wiki tables can only be sortable and not filterable...
I'll also add some ideas:
  • the "User" column could contain a link to the user's contributions
  • include last edit date (at least for users with 0 recent edits)?
-- Lahwaacz (talk) 16:28, 4 February 2015 (UTC)
Two more:
  • show a (total_user_edits * 100 / total_wiki_edits)% for each user, or for those with more than N edits
  • parse the old revisions of the page for long-term statistics (e.g. average edits per 30 days in the last 360 days) (since the users in the table are variable, the data wouldn't be available for each of them)
Kynikos (talk) 10:59, 5 February 2015 (UTC) (last edit: 02:26, 8 February 2015 (UTC))
Regarding your last idea, there is a way to avoid parsing the table and instead be able to provide the information for all users in the table. The answer is caching (which might be a dirty word on this wiki by now :P)
For example the statistics-allrevisions.py script queries info about all revisions and stores the data locally in a JSON file, which is reused next time. The first query is pretty slow, but the next one is super fast. The file is about 50MB large for all revisions, which is not that much. When storing only revisions from the last year and stripping any unnecessary information such as edit summaries, it would be much smaller.
-- Lahwaacz (talk) 08:29, 8 February 2015 (UTC)
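The caching approach described above can be illustrated with a minimal sketch. This is not the code of statistics-allrevisions.py: the function names, the `keep_fields` tuple and the `fetch_newer` callable (a stand-in for the real API query) are all assumptions made for the example.

```python
import json
import os


def load_cache(path):
    """Return the cached list of revisions, or an empty list if no cache exists."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def update_cache(path, fetch_newer, keep_fields=("revid", "user", "timestamp")):
    """Fetch only the revisions newer than the cached ones, then save everything.

    fetch_newer(last_revid) is a hypothetical callable standing in for the
    real API query; it must return revision dicts with revid > last_revid.
    """
    revisions = load_cache(path)
    last_revid = max((r["revid"] for r in revisions), default=0)
    for rev in fetch_newer(last_revid):
        # Drop fields that are not needed (e.g. edit summaries) to keep the file small.
        revisions.append({key: rev[key] for key in keep_fields})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(revisions, f)
    return revisions
```

With this layout the first run has to fetch everything, while later runs only ask for revisions after the highest cached `revid`, which is where the speed-up would come from.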
One suggestion for the "Registration" column: It should be abbreviated to "year-month". The day and time don't add much but clutter the view. --Indigo (talk) 09:06, 8 February 2015 (UTC)
Expanding on the user-to-all percentage idea, the ArchWiki admins, maintainers and bots together have made 117781 of the total 384325 edits, which is slightly above 30%. Also note that deleted revisions are excluded from the 117781 count, but included in the total 384325, so the precise percentage would be even higher. -- Lahwaacz (talk) 21:28, 7 July 2015 (UTC)
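The arithmetic behind that figure, written with the formula proposed earlier in this thread (total_user_edits * 100 / total_wiki_edits), can be checked with a short sketch. The function name `edit_share` is made up for the example; the two numbers are the ones quoted above.

```python
def edit_share(user_edits, total_edits):
    """Return the percentage of total_edits made by a user (or group of users)."""
    if total_edits <= 0:
        raise ValueError("total_edits must be positive")
    return user_edits * 100 / total_edits


# Edits by admins, maintainers and bots versus all edits, as quoted above.
share = edit_share(117781, 384325)
print(f"{share:.1f}%")  # prints "30.6%"
```

Since the 117781 count excludes deleted revisions while the 384325 total includes them, this value is a lower bound on the real share.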
So, are you against showing it? It was only a random idea, not that I mind too much :) — Kynikos (talk) 11:35, 8 July 2015 (UTC)
Likewise, I think we can decide later... -- Lahwaacz (talk) 17:41, 8 July 2015 (UTC)
There is an interesting statistic that could be included in the table: We could take inspiration from GitHub and implement the counting of current and longest streaks. This would have to rely on caching to maintain performance at reasonable levels. -- Lahwaacz (talk) 11:56, 5 April 2015 (UTC)
Edit: possible draft implementation: [1] -- Lahwaacz (talk) 21:22, 5 April 2015 (UTC)
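For readers unfamiliar with the idea, here is a minimal sketch of how current and longest streaks could be counted from a list of edit dates. It is not the linked draft: the function name, the input format (one `datetime.date` per edit, e.g. derived from UTC timestamps) and the rule that a streak is still "current" when the last edit was today or yesterday are all assumptions.

```python
from datetime import date, timedelta


def streaks(edit_days, today):
    """Return (current, longest) streak lengths in days.

    edit_days: iterable of datetime.date objects, one per edit (duplicates allowed).
    today: the date the current streak is measured against.
    """
    days = sorted(set(edit_days))
    if not days:
        return 0, 0
    longest = run = 1
    for prev, cur in zip(days, days[1:]):
        # Extend the run on consecutive days, otherwise start a new one.
        run = run + 1 if cur - prev == timedelta(days=1) else 1
        longest = max(longest, run)
    # After the loop, run is the length of the streak ending on the last edit day.
    # Assumption: it only counts as current if that day is today or yesterday.
    current = run if today - days[-1] <= timedelta(days=1) else 0
    return current, longest


days = [date(2015, 4, 1), date(2015, 4, 2), date(2015, 4, 3), date(2015, 4, 5)]
print(streaks(days, today=date(2015, 4, 5)))  # prints "(1, 3)"
```

Collapsing the edits into a set of distinct days first means the per-user cost depends on the number of active days rather than the number of edits, which fits the caching concern mentioned above.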
Cool, just tested it :) — Kynikos (talk) 04:24, 6 April 2015 (UTC)
The current/longest streak columns are now implemented. I've also reordered the other columns to keep the statistics grouped at the right, but with the latest changes to the script, it should be trivial to reorder them differently. -- Lahwaacz (talk) 21:15, 7 July 2015 (UTC)
Great, the script works perfectly! I don't mind the column order, but as we add more fields, it could make sense to use shorter titles and implement the "legend" idea above. — Kynikos (talk) 03:47, 9 July 2015 (UTC)
Streak details are now provided as tooltips. Should this be noted in the legend, or should we just let the readers find it by themselves? -- Lahwaacz (talk) 22:20, 10 July 2015 (UTC)
I'd be for documenting it, and if you like the legend idea I think that's indeed the most natural place where to describe it. — Kynikos (talk) 09:10, 11 July 2015 (UTC)
I've pushed the first version of the legend, of course pull requests with improvements are welcome :) Lahwaacz (talk) 21:42, 15 July 2015 (UTC)
I'm thinking about including deleted revisions in the cache database, which would give more accurate results because deleted revisions are not necessarily unwanted revisions. It would also give us a chance to recalculate some global statistics, e.g. taking a proper average of edits per day instead of taking the last interval. -- Lahwaacz (talk) 22:00, 15 July 2015 (UTC)
Until we find the time to finalize ArchWiki:Requests#Should_we_remove_or_archive_obsolete_articles.3F, as I said above, I agree that deleted revisions should be taken into account. — Kynikos (talk) 07:55, 18 July 2015 (UTC)
Very well, deleted revisions are now included for the calculations. The existing cache will need to be reinitialized, the script should point it out. -- Lahwaacz (talk) 08:43, 24 July 2015 (UTC)
Another interesting histogram could be the evolution of new account registrations per month. — Kynikos (talk) 12:19, 26 July 2015 (UTC)

Preferences vs. Statistics

For some reason I have more edits in ArchWiki:Statistics than in my user preferences... [2] Is this one of those infamous caching bugs? -- Alad (talk) 12:44, 23 October 2015 (UTC)

We're also counting edits on deleted pages; (AFAIK) MediaWiki doesn't. -- Lahwaacz (talk) 14:32, 23 October 2015 (UTC)

Number of active users

The number of active users reported by Special:Statistics has been "440" every time I have visited that page in the last 4 months. While this is certainly possible, it also seems a bit unlikely. Has anyone else seen it display a different number recently? -- Kynikos (talk) 04:26, 1 July 2017 (UTC)

Maybe it somehow correlates with the problem that notification emails are sent only hourly? As far as I remember, the "new" implementation of Special:Statistics is cached and something is there to determine how big the site is and how often (relative to the amount of edits and background jobs?) the page should be renewed. By the way, since it's holidays now, maybe it's time to ping Florian again. -- Lahwaacz (talk) 08:37, 8 July 2017 (UTC)