ArchWiki:Bots

Bots are an important tool for the maintenance team, making it easy to perform repetitive tasks ranging from daily routines to complicated one-shot updates. Bot edits constitute over 8% of all contributions to the wiki; making all of these edits manually would have been very tedious.

There are currently two active bot accounts:

This article or section needs expansion.

Reason: Maybe add something like a policy regarding new bots? (Discuss in ArchWiki talk:Bots)

Software

Bots use the MediaWiki API to communicate with the wiki server. There are many bots developed by the Wikimedia Foundation using this API, but they are usually not general enough to work on other wikis, or they even conflict with our style guidelines. Therefore we have been working on our own ArchWiki-specific bot tools, which have the same flaws when evaluated by external parties.
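As a rough illustration of what talking to the API involves, the sketch below builds the URL of a read-only action=query request. The endpoint constant and the helper name build_query_url are assumptions made for this example; they are not taken from Wiki Monkey or wiki-scripts.

```python
from urllib.parse import urlencode

# Assumed endpoint, used here for illustration only.
API_URL = "https://wiki.archlinux.org/api.php"


def build_query_url(titles, props=("info",)):
    """Return the URL of a read-only action=query request for the given titles."""
    params = {
        "action": "query",
        "format": "json",
        # The API separates multiple values of one parameter with "|".
        "prop": "|".join(props),
        "titles": "|".join(titles),
    }
    return f"{API_URL}?{urlencode(params)}"
```

A real bot would additionally need to log in, obtain an edit token and respect rate limits before making any changes; none of that is shown here.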

Wiki Monkey

The Wiki Monkey project aims to facilitate efficient editing by directly enhancing wiki pages in the web browser. It runs as a user script, which allows repetitive tasks to be executed semi-automatically in article editor pages, or fully automatically from article-list pages such as Categories or WhatLinksHere. Wiki Monkey also adds some helpers such as filters for Special:RecentChanges and Special:NewPages. See the documentation for details.

Edits made by Wiki Monkey's bot interface are marked with the wiki-monkey tag, which can be filtered in the list of recent changes.

wiki-scripts

The wiki-scripts project contains many Python scripts built around a small library-like abstraction for the MediaWiki API. The purpose of the included scripts ranges from collecting information without editing the wiki to performing complex automated edits, which are described in #Tasks.

Edits made by wiki-scripts, either automatically or interactively, are marked with the wiki-scripts tag, which can be filtered in the list of recent changes.

Tasks

This section describes the tasks that are repeatedly performed by ArchWiki bots. It serves as an overview and documentation of the features of the bot scripts in operation. Note that bot edits are hidden from Special:RecentChanges by default, since including them would make it far more difficult to follow and participate in regular contributions.

Double redirects

Fixing double redirects is the oldest automated task. It can be done, for example, with a Python script or with Wiki Monkey's dedicated plugin.
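The core of such a fix is following each redirect chain to its end. The following is a minimal sketch of that idea, assuming the redirects have already been fetched into a dictionary that maps each redirect title to its target; the function name resolve_double_redirects is hypothetical and this is not the code of either tool mentioned above.

```python
def resolve_double_redirects(redirects):
    """Return {redirect title: final target} for redirects that point at another redirect.

    `redirects` maps a redirect page title to the title it points to.
    Redirect loops are skipped so that they can be reviewed by a human.
    """
    fixes = {}
    for source, target in redirects.items():
        seen = {source}
        final = target
        # Follow the chain until a page that is not a redirect is reached.
        while final in redirects:
            if final in seen:
                final = None  # loop detected
                break
            seen.add(final)
            final = redirects[final]
        if final is not None and final != target:
            fixes[source] = final
    return fixes
```

For instance, given the chain A → B → C, only A needs to be updated to point directly at C, since B is already a single redirect.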

Table of contents

The Table of contents page and its "translations" are maintained using the toc.py script. The script can be run daily; its execution takes a couple of seconds.

The page needs to be initialized manually with the following entry-point table:

{| id="wiki-scripts-toc-table"
|}

The content of this table is replaced with an updated version generated by the script; the rest of the page is left intact. The script recognizes the following optional attributes for configuration:

  • data-toc-languages specifies the languages to be shown on the page. It is a comma-separated list of language tags; at most 2 can be specified. Defaults to the language of the current page, e.g. ru for Table of contents (Русский).
  • data-toc-alsoin specifies the translation of the "also in" phrase. The format is tag1:text, tag2:text, ....

For example (from Table of contents (Русский)):

{| id="wiki-scripts-toc-table" data-toc-languages="ru,en" data-toc-alsoin="ru:Также в"
...
|}

Users can also translate the category names in the table by editing the links on the wiki page; the script preserves these translations on updates.
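To make the format of the two configuration attributes concrete, here is a small sketch of how they could be interpreted once the table attributes have been extracted into a dictionary. The function name parse_toc_attributes and the dictionary input are assumptions made for this example; toc.py may handle the attributes differently.

```python
def parse_toc_attributes(attrs, page_language):
    """Interpret the optional data-toc-* attributes of the entry-point table.

    `attrs` maps attribute names to their raw string values.
    Returns (list of language tags, {language tag: "also in" text}).
    """
    # Fall back to the language of the current page when nothing is specified.
    raw_languages = attrs.get("data-toc-languages", page_language)
    languages = [tag.strip() for tag in raw_languages.split(",") if tag.strip()]
    if len(languages) > 2:
        raise ValueError("at most 2 languages can be specified")

    also_in = {}
    for item in attrs.get("data-toc-alsoin", "").split(","):
        if ":" in item:
            tag, text = item.split(":", 1)
            also_in[tag.strip()] = text.strip()
    return languages, also_in
```

Note that this simple split would mishandle an "also in" text that itself contains a comma.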

Statistics

The ArchWiki:Statistics page is maintained by the statistics.py script. Currently only the User statistics section is autogenerated; the rest is updated manually. The update takes about 15 seconds and should be run daily.

The script works by obtaining metadata of all revisions and user accounts from the API and caching it locally for better performance. The edit counts are determined by counting user contributions directly from the revision metadata, without relying on MediaWiki's counters.
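The counting step can be pictured as follows. This is a minimal sketch and not the code of statistics.py: it assumes that each cached revision is a dictionary with a user key, and the function name count_edits is made up for the example.

```python
from collections import Counter


def count_edits(revisions):
    """Count edits per user from an iterable of revision metadata dictionaries."""
    # Assumption: every revision record carries the author's name under "user".
    return Counter(revision["user"] for revision in revisions)
```

The resulting Counter can be sorted with most_common() to produce a ranking of contributors.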

Note: Some improvements are discussed in ArchWiki talk:Statistics#Improvements.

Package templates

The update-package-templates.py script parses the content of all pages and updates the Pkg, Grp and AUR templates. The script does not change the package name itself, but it updates the template where needed: for example, for packages that have recently been moved from the AUR to the official repositories, the link is updated from Template:AUR to Template:Pkg. Invalid package links are marked with Template:Broken package link along with a sometimes useful hint showing the package status.
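The decision described above can be sketched as a simple lookup. The function name choose_template, the set-based inputs and the order in which the sources are checked are all assumptions made for illustration; the actual script may work differently.

```python
def choose_template(name, current, official_packages, package_groups, aur_packages):
    """Return (template, broken) for the package or group called `name`.

    `current` is the template used on the page right now. When the name is not
    found anywhere, the current template is kept and the link is reported as broken.
    """
    if name in official_packages:
        return "Pkg", False
    if name in package_groups:
        return "Grp", False
    if name in aur_packages:
        return "AUR", False
    # Unknown name: keep the template, let the caller mark the link as broken.
    return current, True
```

In this sketch the package name is never modified, which matches the behaviour described above; only the template and the broken flag change.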

The script uses localized versions of Template:Broken package link when they exist and falls back to the English version otherwise. Other than that, there is no server-side configuration.

After each run, but at most once per 7 days, the script creates a detailed report of broken links at User:Lahwaacz.bot/Reports/archpkgs.

Interlanguage

The interlanguage.py script does the following:

  • Checks if the language of categories assigned to each page matches the language of the page itself.
  • Creates missing localized categories, mirroring the English category tree.
  • Updates the interlanguage links on all content pages using this algorithm.

The execution time depends on the number of updates: it is usually less than a minute, and about 30 seconds when there are no updates.

Page language

The update-page-language.py script determines the language of each page based on its title (see Help:i18n#Page titles) and sets the language code in the wiki's database. This is possible with MediaWiki's $wgPageLanguageUseDB setting [1].
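Detecting the language from a title can be sketched as below, assuming that localized pages carry the language name in parentheses at the end of the title, as in Table of contents (Русский). The mapping LANGUAGE_TAGS is a small illustrative subset, and the function name detect_language is hypothetical; subpages and other special cases are not handled.

```python
import re

# Illustrative subset only; the full list of languages is defined in Help:i18n.
LANGUAGE_TAGS = {"Русский": "ru", "Español": "es", "Português": "pt"}

_SUFFIX = re.compile(r"^(?P<base>.+?) \((?P<lang>[^()]+)\)$")


def detect_language(title, default="en"):
    """Return (base title, language tag) for a page title."""
    match = _SUFFIX.match(title)
    if match and match.group("lang") in LANGUAGE_TAGS:
        return match.group("base"), LANGUAGE_TAGS[match.group("lang")]
    # No recognized language suffix: treat the page as written in the default language.
    return title, default
```

A parenthesized suffix that is not a known language name, such as a disambiguation term, is left as part of the title.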

Link checking

  • extlink-checker.py checks the status of external links and marks those that are definitely broken with Template:Dead link. Many links still pass unchecked by this tool, mainly because of sites requiring JavaScript and servers returning an inconclusive HTTP status code.
  • url-replace.py performs various replacements on external links, such as updating URLs from HTTP to HTTPS or replacing external links to wiki.archlinux.org with internal links.
  • link-checker.py performs various checks and replacements on internal links, links to manual pages and external links (using the same code as url-replace.py).
  • mark-archived-links.py marks internal links that lead to ArchWiki:Archive with Template:Archived page.
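One of the replacements listed above, turning external links to wiki.archlinux.org into internal links, can be sketched with a regular expression. This is a simplified illustration and not the code of url-replace.py: the function name internalize_links and the URL patterns it accepts are assumptions, and links with query strings, section anchors or titles in the Category or File namespaces would need extra handling.

```python
import re
from urllib.parse import unquote

# Matches [URL] and [URL label] where the URL points to a wiki page.
_WIKI_LINK = re.compile(
    r"\[https?://wiki\.archlinux\.org/(?:title|index\.php)/([^\s\]]+)(?: ([^\]]+))?\]"
)


def internalize_links(wikitext):
    """Replace external links to wiki.archlinux.org with internal wiki links."""

    def replace(match):
        # Decode percent-escapes and turn underscores back into spaces.
        title = unquote(match.group(1)).replace("_", " ")
        label = match.group(2)
        return f"[[{title}|{label}]]" if label else f"[[{title}]]"

    return _WIKI_LINK.sub(replace, wikitext)
```

Links to other sites are left untouched, so the function can be applied to the whole text of a page.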