User talk:Lahwaacz

From ArchWiki

Latest revision as of 13:30, 26 March 2016

Regex for replacing = codes

Hi, regarding User:Lahwaacz#Regex_for_replacing_.3D_codes, do you intend to use a similar expression with the editor assistant or directly with the bot? In the latter case I think it would be pretty dangerous: for example, it would break templates that already use a named parameter, e.g. {{Template|parameter=value}} would be turned into {{Template|1=parameter=value}}. -- Kynikos (talk) 16:57, 21 March 2014 (UTC)

I used it only once (https://wiki.archlinux.org/index.php?title=Systemd-networkd&diff=prev&oldid=306182) and don't have any specific plans, but I'm quite certain I will need to use it again sometime... Thanks for the warning, I will be cautious. -- Lahwaacz (talk) 17:50, 21 March 2014 (UTC)
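The breakage Kynikos describes comes from how MediaWiki parses template arguments: text before the first = is taken as a parameter name, which is why a positional value containing = needs an explicit 1=. Below is a minimal sketch of a more cautious rewrite, assuming a hypothetical whitelist of templates that take a single positional parameter (ic and bc here). It is not the expression from Lahwaacz's user page, and it ignores nested templates and templates with several parameters.

```python
import re

# Hypothetical whitelist: templates assumed to take exactly one positional parameter.
POSITIONAL_ONLY = {"ic", "bc"}

# {{name|argument}} with no nested braces and no further "|" in the argument.
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*\|([^{}|]*)\}\}")


def add_explicit_index(wikitext):
    def fix(match):
        name, arg = match.group(1), match.group(2)
        if name.lower() not in POSITIONAL_ONLY:
            # e.g. {{Template|parameter=value}} is a named parameter: leave it alone.
            return match.group(0)
        if "=" in arg and not arg.startswith("1="):
            # "=" would otherwise turn the value into a parameter name.
            return "{{%s|1=%s}}" % (name, arg)
        return match.group(0)

    return TEMPLATE_RE.sub(fix, wikitext)
```

Templates outside the whitelist pass through unchanged, which trades coverage for never touching a named parameter.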

PodCastXDL

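About this edit (https://wiki.archlinux.org/index.php?title=List_of_applications/Internet&diff=prev&oldid=323048) and the following one (https://wiki.archlinux.org/index.php?title=List_of_applications/Internet&diff=next&oldid=323048): User:Levi0x0x, who should indeed have provided an edit summary, appears to be the developer of the application and the maintainer of the PKGBUILD. I would keep his edit. -- Kynikos (talk) 00:45, 5 July 2014 (UTC)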

I know. I've also seen bash-player (https://wiki.archlinux.org/index.php?title=MPlayer&diff=next&oldid=322278) removed, both from the wiki and from GitHub (it seems the repo has been recreated from scratch). PodCastXDL has always been available upstream. -- Lahwaacz (talk) 08:20, 5 July 2014 (UTC)

Didn't he add it to the list one week ago (https://wiki.archlinux.org/index.php?title=List_of_applications/Internet&diff=prev&oldid=322258)? Maybe he has found some bug and doesn't want people to use it until he fixes it? Anyway, I'm not that interested; we may as well wait and see if/how Levi0x0x reacts. -- Kynikos (talk) 04:32, 6 July 2014 (UTC)

bot AUR to Official Repository edit

A recent bot edit (update Pkg/AUR templates, https://github.com/lahwaacz/wiki-scripts/blob/master/update-package-templates.py) by User:Lahwaacz.bot on the Gitolite page correctly changed the AUR template to Pkg, but left the Arch User Repository link unchanged.

I fixed this, but would it be possible to modify the bot to take this into consideration?

I can imagine that blanket-changing AUR links to official repository links in any given page could be dangerous, but for common phrasings (or possibly using word distance) it would seem to be relatively safe.

Or is there some sort of post-run manual inspection that I am unaware of that handles this situation?

Specifically, this edit (https://wiki.archlinux.org/index.php?title=Gitolite&diff=next&oldid=366859):

From:

{{AUR|gitolite}} is available in the [[Arch User Repository]]

To:

{{Pkg|gitolite}} is available in the [[Arch User Repository]]


Tido.com (talk) 01:50, 1 April 2015 (UTC)

By "word distance" above, what I _meant_ was edit distance ;)

I was initially thinking of the Hamming distance, but that is only defined for strings of equal length.
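To illustrate the equal-length restriction: the Hamming distance only counts positions where two sequences differ, so it needs them to line up one-to-one and cannot model an inserted or deleted word. A minimal sketch that refuses sequences of different lengths:

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for sequences of equal length")
    return sum(x != y for x, y in zip(a, b))
```

Dropping a single word from a sentence already makes it inapplicable, which is why an edit distance that allows insertions and deletions fits this use case better.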

What looks more promising is the Levenshtein distance, specifically "Comparing a list of strings" from the Python Distance package (https://pypi.python.org/pypi/Distance/0.1.3).

Example shamelessly ripped from that page (mainly because I couldn't link directly to the relevant section):

>>> import distance
>>> sent1 = ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
>>> sent2 = ['the', 'lazy', 'fox', 'jumps', 'over', 'the', 'crazy', 'dog']
>>> distance.levenshtein(sent1, sent2)
3
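For readers without the Distance package installed, the same word-level distance can be computed with the standard dynamic-programming recurrence. A minimal sketch (the function name is ours, not the package's API):

```python
def levenshtein(a, b):
    """Edit distance between two sequences, e.g. strings or lists of words."""
    # prev[j] = distance between the items of a seen so far (minus the current one) and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if they are equal)
            ))
        prev = curr
    return prev[-1]


sent1 = ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
sent2 = ['the', 'lazy', 'fox', 'jumps', 'over', 'the', 'crazy', 'dog']
print(levenshtein(sent1, sent2))  # 3
```

The three operations are: substitute "quick" by "lazy", delete "brown", substitute "lazy" by "crazy".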

Tido.com (talk) 04:07, 1 April 2015 (UTC)

Hi,
The bot currently does not touch the surrounding text at all; it only modifies the package templates, or appends Template:Broken package link when the package is not found. This is obviously not perfect and may lead to incorrect combinations like the one you noticed, but blindly fixing the package links without touching the surrounding text is still considered an improvement. Checking the surrounding text manually would require a lot of manpower, which we don't have, so it is currently not done systematically. Feel free to ask for further details, or see the most recent discussion: ArchWiki:Requests#Strategy_for_updating_package_templates.
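A rough sketch of the behaviour described above, where only the templates change and the surrounding text is never touched. The package sets are hypothetical inputs standing in for whatever lookup the real script performs, and {{Broken package link}} is appended without arguments for simplicity; this is an illustration, not the code of update-package-templates.py.

```python
import re

# {{AUR|name}} or {{Pkg|name}}, unless it is already followed by the broken-link marker.
PKG_TEMPLATE_RE = re.compile(r"\{\{(AUR|Pkg)\|([^{}|]+)\}\}(?!\{\{Broken package link)")


def update_package_templates(wikitext, official, aur):
    """Rewrite package templates only; the surrounding text is never touched."""
    def fix(match):
        name = match.group(2).strip()
        if name in official:
            return "{{Pkg|%s}}" % name
        if name in aur:
            return "{{AUR|%s}}" % name
        # Not found anywhere: keep the template and flag it.
        return match.group(0) + "{{Broken package link}}"

    return PKG_TEMPLATE_RE.sub(fix, wikitext)


line = "{{AUR|gitolite}} is available in the [[Arch User Repository]]"
print(update_package_templates(line, official={"gitolite"}, aur=set()))
```

Run on the Gitolite line, it reproduces exactly the half-fixed sentence reported above: the template is correct, the link next to it is not.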
Regarding automatic updates of the surrounding text, the edit distance gives a clue about whether a given edit should be performed or not, but it does not define how the edit should be performed. It can be useful in cases where there are multiple feasible substitutions in the text and the strategy for selecting the optimal one is, e.g., to minimize the Levenshtein distance. But we don't have any algorithm to generate feasible substitutions yet, so this technique cannot be applied. The surrounding-text substitution is also very context-sensitive, and wiki bots must be designed to minimize (ideally avoid completely) the error of the first kind, which in this case means modifying correct text to be incorrect. This makes defining general rules for the text substitution really hard; on the other hand, many specific rules would be necessary to cover even the basic forms of standard wording, so in the end both approaches may be comparably hard. Anyway, if you have some ideas, I'm all ears :)
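As an illustration of the specific-rules approach, and of why it scales poorly, here is a sketch with a single hypothetical rule that rewrites exactly one known phrasing and nothing else. Matching a fixed phrase keeps errors of the first kind unlikely, at the cost of needing one rule per wording. The target link [[official repositories]] is an assumption.

```python
import re

# Hypothetical whitelist of exact phrasings; each rule fires only on its own wording.
PHRASE_RULES = [
    (re.compile(r"(\{\{Pkg\|[^{}|]+\}\}) is available in the \[\[Arch User Repository\]\]"),
     r"\1 is available in the [[official repositories]]"),
]


def fix_surrounding_text(wikitext):
    for pattern, replacement in PHRASE_RULES:
        wikitext = pattern.sub(replacement, wikitext)
    return wikitext
```

Any sentence that deviates even slightly from the listed phrasing is left as it is, which is the intended failure mode.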
-- Lahwaacz (talk) 17:51, 1 April 2015 (UTC)

bot checking links after move

Hi, re Talk:Touchpad Synaptics#adding libinput alternative. Touchpad Synaptics has 100+ backlinks, and updating even the more important ones is a bit of a tedious task. I was just glancing over your clever GitHub bot scripts. It would be handy to have a script to run after such moves: walk over the backlinks of Touchpad Synaptics and just replace "[[Touchpad Synaptics" with "[[Synaptics" in the links. That would leave all links to subsections intact. Leaving the translations to be handled manually, there would not be much that could go wrong, right? --Indigo (talk) 07:36, 26 September 2015 (UTC)
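A minimal sketch of the proposed replacement on a single page's wikitext, assuming the backlinks have already been fetched (retarget_links is a hypothetical name). The lookahead only matches the old title when it is directly followed by #, | or ]], so section links keep their anchors and longer titles that merely start with the same words, such as translations with a language suffix, are skipped. It does not handle underscores or a lowercase first letter, which MediaWiki also accepts in links.

```python
import re


def retarget_links(wikitext, old, new):
    # "[[old" counts only if the link target ends right there (anchor, pipe or closing brackets).
    pattern = re.compile(r"\[\[" + re.escape(old) + r"(?=[#|\]])")
    return pattern.sub(lambda match: "[[" + new, wikitext)


text = "See [[Touchpad Synaptics#Configuration|here]] and [[Touchpad Synaptics]]."
print(retarget_links(text, "Touchpad Synaptics", "Synaptics"))
```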

Hi, thanks for the suggestion. It would indeed be handy in this case, but most likely not in general. Imagine that there was a UUID page, which was later generalized and renamed to Persistent block device naming, so that the content about UUIDs is now only a section on that page. In this case the naive replacement would likely change the meaning of many sentences, and using shorter redirects for convenience is actually encouraged. There would have to be a whitelist of "harmless" replacements, which could even help to replace [[pacman|Install]] with [[Install]] etc. -- Lahwaacz (talk) 08:01, 26 September 2015 (UTC)
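A sketch of such a whitelist applied to piped links. The single entry mirrors the [[pacman|Install]] example above and assumes that Install is a redirect to pacman; any link whose (target, label) pair is not listed is left untouched.

```python
import re

# Hypothetical whitelist of (target, label) pairs where the label is itself a
# redirect to the target, so [[target|label]] can be shortened to [[label]].
HARMLESS = {("pacman", "Install")}

PIPED_LINK_RE = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def collapse_piped_links(wikitext):
    def fix(match):
        target, label = match.group(1), match.group(2)
        if (target, label) in HARMLESS:
            return "[[%s]]" % label
        return match.group(0)

    return PIPED_LINK_RE.sub(fix, wikitext)
```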
Yes, good examples, but you are already thinking universally :) I did not mean it to be that general. Take, for example, the time when the bulk of the title-case moves were done: with such a script one could avoid a lot of internal redirects as well, e.g. https://wiki.archlinux.org/index.php/Special:WhatLinksHere/Beginners'_Guide. But it's OK, it was just an idea. Please close this if you think the cases with a replacement simple enough for it to apply are too rare. --Indigo (talk) 10:02, 26 September 2015 (UTC)

link checker bot

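Hi, curiously the external link used in this edit (https://wiki.archlinux.org/index.php?title=Official_repositories&curid=2716&diff=427897&oldid=425879) works. I tried the following in Firefox and GNOME Web:

* [//kernel.org]
* [//kernel.org kernel.org]
* [//sourceforge.net sourceforge.net]
* [//bugs.freedesktop.org freedesktop]
* [//nytimes.com nytimes_nossl]
* [//libvirt.org this link destination has ssl and nonssl too]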

All of them work fine, i.e. they default to SSL when possible. Neat, huh? Maybe this is something to add to the style guidelines and to use your bot for? What do you think? --Indigo (talk) 12:10, 26 March 2016 (UTC)

It's a protocol-relative link (see e.g. https://www.mediawiki.org/wiki/Help:Links#External_links), so the scheme is taken from the current URL and there is no fallback if that scheme does not work. For example, this link (//mmg.fjfi.cvut.cz/mmg/index.php) does not work if you read this page over HTTPS, because the target page is only available via HTTP.
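The resolution rule can be checked with Python's urllib.parse.urljoin, which resolves a network-path reference (//host/path) against the URL of the page containing it: the host comes from the link, the scheme from the page. The page URLs below are only examples.

```python
from urllib.parse import urljoin


def resolve(page_url, href):
    # A protocol-relative href inherits only the scheme of the page it appears on.
    return urljoin(page_url, href)


print(resolve("https://wiki.archlinux.org/index.php/User_talk:Lahwaacz", "//mmg.fjfi.cvut.cz/mmg/index.php"))
# https://mmg.fjfi.cvut.cz/mmg/index.php  (fails if the host only serves HTTP)
```

Nothing in this resolution step probes whether the other scheme would work, which is why there is no fallback.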
As for the style guidelines, I would actually prefer explicit schemes (using HTTPS wherever possible). Is the Arch wiki even available via HTTP?
-- Lahwaacz (talk) 12:54, 26 March 2016 (UTC)

Yeah, OK. Thanks for the links. At first I thought some wm magic figured out SSL availability in the background. Since that is not the case, I agree that any style rule to drop the https:// prefix is not useful (or makes things too complicated).
As for the wiki, no: nginx redirects every time, so a simple curl http://wiki.archlinux.org falls flat. But I see they moved luna's certificate to Let's Encrypt. One neat thing :) Closing. --Indigo (talk) 13:30, 26 March 2016 (UTC)