Talk:Locate

From ArchWiki
Latest comment: 3 February by Arash in topic pacman hook

pacman hook

How about mentioning a pacman hook for it? pacman-updatedb-hook (AUR) --Ratijas (talk) 16:50, 21 September 2021 (UTC)

What does it do? — Lahwaacz (talk) 08:01, 25 September 2021 (UTC)
I do think it's certainly worth mentioning in the article. The pacman hook you're talking about just runs updatedb after each upgrade, install and removal.
Even though I don't find a use for it personally, maybe someone does.
Arash (talk) 00:54, 28 January 2024 (UTC)
How long does it take to run? Do you really want to have the entire filesystem, including e.g. /home, indexed after every package operation? — Lahwaacz (talk) 14:42, 28 January 2024 (UTC)
Sorry for the delay.
It takes 15–20 seconds on my laptop with a Skylake Intel processor and 16 GB of RAM. It will probably be less on my desktop, but I haven't tried.
It's worth mentioning that this is with plocate; I didn't try it with mlocate. Also, this is not the initial run; that was probably done automatically by its systemd timer long before.
I guess it does not re-index what is already indexed, though I don't know how any of this works internally.
Arash (talk) 04:11, 31 January 2024 (UTC)
It takes about 4 minutes on my desktop. — andreymal (talk) 10:37, 31 January 2024 (UTC)
Please specify how much used space (filled storage) that is on, what hardware, and how frequently you run it.
According to updatedb(8), "If the database already exists, its data is reused to avoid rereading directories that have not changed."
So it really depends on how frequently you run updatedb by any means, including cron jobs, systemd timers, manually, or a pacman hook.
On my laptop, with about 120 GiB of used SSD storage and the hardware mentioned above, running it daily takes no more than 20 seconds. Running it again after a few hours takes about 3 seconds!
Nevertheless, I think these measurements are beside the point: mentioning that a pacman hook for the tool documented in this article (locate and its variants) exists in the AUR doesn't require any of them. It's up to users to decide whether they want to use an available tool or not!
If we think it's not a good idea, we can just add a Note about it, including detailed technical reasons why using this tool would not be a good idea on certain systems!
We should first ask why such a pacman hook even exists. Most probably because there are people who find it useful!
One of the two main tradeoffs of locate programs is that they can't be as accurate as find. Their accuracy can be improved by updating the database more frequently, and the overall efficiency cost is the same, because unchanged directories are not re-indexed!
Arash (talk) 15:33, 31 January 2024 (UTC)
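The reuse described in updatedb(8) can be illustrated with a simplified model. This is a hypothetical sketch, not the actual mlocate/plocate code: it assumes a directory's mtime is the change marker and keeps a cache mapping each directory to its last mtime and entry list. An unchanged directory is not re-read, but every directory is still visited and stat'ed, so the traversal itself is never skipped.

```python
import os
import tempfile


def scan(root, cache):
    """Walk `root`, reusing cached entries for directories whose mtime is unchanged.

    `cache` maps a directory path to (mtime_ns, entries), where entries is a
    sorted list of (name, is_dir) pairs. Returns (new_cache, paths, stats).
    """
    new_cache, paths = {}, []
    stats = {"visited": 0, "reread": 0}
    stack = [root]
    while stack:
        d = stack.pop()
        stats["visited"] += 1
        mtime = os.stat(d).st_mtime_ns  # every directory is still stat'ed
        cached = cache.get(d)
        if cached is not None and cached[0] == mtime:
            entries = cached[1]  # unchanged: reuse the old entries, no readdir
        else:
            stats["reread"] += 1  # new or changed: read the directory again
            with os.scandir(d) as it:
                entries = sorted((e.name, e.is_dir(follow_symlinks=False)) for e in it)
        new_cache[d] = (mtime, entries)
        for name, is_dir in entries:
            path = os.path.join(d, name)
            paths.append(path)
            if is_dir:
                stack.append(path)  # descend even into unchanged subtrees
    return new_cache, sorted(paths), stats


def demo():
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "a", "b"))
        open(os.path.join(root, "a", "b", "old.txt"), "w").close()
        # Pin every directory's mtime to a fixed old value so later changes are
        # detectable even on filesystems with coarse timestamps.
        for d in (root, os.path.join(root, "a"), os.path.join(root, "a", "b")):
            os.utime(d, ns=(0, 0))
        cache, _, first = scan(root, {})
        _, _, second = scan(root, cache)  # nothing changed since the first scan
        open(os.path.join(root, "a", "b", "new.txt"), "w").close()
        _, paths, third = scan(root, cache)
        return first, second, third, [os.path.relpath(p, root) for p in paths]


if __name__ == "__main__":
    first, second, third, paths = demo()
    print(first)   # {'visited': 3, 'reread': 3}
    print(second)  # {'visited': 3, 'reread': 0}
    print(third)   # {'visited': 3, 'reread': 1}
```

Creating one file in `a/b` causes only that directory to be re-read, yet all three directories are visited again. In this model the per-directory visits are the part of the cost that does not shrink when updatedb runs more often.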
120 GiB is nothing. I have 4.5 million files on two SSDs and two HDDs totaling ~4.5 TiB, and even find -type f takes minutes to complete (but just to clarify, I'm not against mentioning it) — andreymal (talk) 16:00, 31 January 2024 (UTC)
Even if updatedb won't update its database for files that have not changed since the last run, it still has to traverse the whole directory tree, and this may generate a lot of I/O. Note that "not-changed directory" does not make sense: directory metadata does not include any info related to changes in its subdirectories and files.
If you want to mention this on the page, you need to make sure that readers are aware of all the consequences. IMO it would be more useful to just hint that increasing the frequency of the systemd timer, or reindexing more often in general, may help; I don't see a benefit in connecting the reindexing to pacman operations.
Lahwaacz (talk) 11:37, 3 February 2024 (UTC)
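What a directory's own metadata does record can be checked directly. On Linux a directory's mtime is updated when an entry is created, removed or renamed directly inside it, but not when an existing file's contents are rewritten and not when something changes deeper in the tree. The sketch below uses made-up helper names for illustration and pins mtimes to 0 with `os.utime` before each step, so the result does not depend on timestamp granularity.

```python
import os
import tempfile


def mtime(path):
    return os.stat(path).st_mtime_ns


def reset(*dirs):
    # Pin mtimes to 0 so any later change is visible regardless of granularity.
    for d in dirs:
        os.utime(d, ns=(0, 0))


def demo():
    """Return {step: (top_mtime_changed, sub_mtime_changed)} for three operations."""
    results = {}
    with tempfile.TemporaryDirectory() as top:
        sub = os.path.join(top, "sub")
        os.mkdir(sub)
        f = os.path.join(sub, "file.txt")
        with open(f, "w") as fh:
            fh.write("v1")

        reset(top, sub)
        with open(f, "w") as fh:  # rewrite an existing file's contents
            fh.write("v2")
        results["edit file"] = (mtime(top) != 0, mtime(sub) != 0)

        reset(top, sub)
        open(os.path.join(sub, "new.txt"), "w").close()  # create an entry in sub
        results["create in sub"] = (mtime(top) != 0, mtime(sub) != 0)

        reset(top, sub)
        os.rename(f, os.path.join(sub, "renamed.txt"))  # rename inside sub
        results["rename in sub"] = (mtime(top) != 0, mtime(sub) != 0)
    return results


if __name__ == "__main__":
    print(demo())
    # {'edit file': (False, False), 'create in sub': (False, True), 'rename in sub': (False, True)}
```

The parent directory's mtime never moves in these steps, which is why a tool relying on it can skip re-reading a directory but still has to descend into every subdirectory to find changes further down.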
How it ignores unchanged directories is a question for its developers. Nevertheless, your statement is most probably true.
Personally, as a human, I can tell whether a directory has possibly changed internally by comparing its current and previous atime.
But when talking about efficiency in consuming system resources, there are many aspects.
The best practice for generating less I/O with updatedb is to run it only when a file or directory has been created, removed or renamed. Time is only an indirect factor here: you can create or remove millions of files in an hour and then not touch anything for the rest of the season!
A system can run for months or a year without any change to its persistent storage, or at least to the files one would want to search for, while the systemd timer for updatedb runs every day without any actual benefit.
pacman operations create, remove and change files in many directories, particularly in the /etc and /usr trees.
So should we just increase the frequency of running updatedb based on time, while refusing to run it when the filesystem has definitely changed?
When using locate programs, the worst case is searching for a file that exists but is not indexed! The user then wrongly concludes that it is absent!
Anyway, I believe in freedom: the existing ways to use a program are worth documenting, even if we personally think a method is bad. We can warn the reader about the possible drawbacks, caveats and consequences of the documented technique.
Arash (talk) 21:58, 3 February 2024 (UTC)

[plocate] `PRUNE_BIND_MOUNTS = "no"` required for btrfs

Is it worth mentioning this? More info here. Leuko (talk) 10:58, 6 December 2023 (UTC)
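A likely reason for the btrfs issue, stated here as an assumption not verified against the plocate source: updatedb's bind-mount detection reads /proc/self/mountinfo and treats a mount whose root field (the path inside its filesystem) is not / as a bind mount. Btrfs subvolume mounts such as /@home on /home have exactly such a root. The sketch below applies that assumed heuristic; the sample lines are made up for illustration, and the function names are hypothetical.

```python
import os
import re

# Illustrative mountinfo lines (made up, not captured from a real system).
# Format: ID parent major:minor root mount-point options [optional...] - fstype source super-options
SAMPLE = [
    "29 1 0:26 /@ / rw,relatime shared:1 - btrfs /dev/sda2 rw,subvolid=256,subvol=/@",
    "45 29 0:26 /@home /home rw,relatime shared:30 - btrfs /dev/sda2 rw,subvolid=257,subvol=/@home",
    "30 29 8:1 / /boot rw,relatime shared:2 - ext4 /dev/sda1 rw",
    "50 29 8:3 /srv/data /mnt/data\\040copy rw,relatime shared:40 - ext4 /dev/sda3 rw",
]


def unescape(field):
    # mountinfo writes space, tab, newline and backslash as 3-digit octal escapes (\040 etc.)
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def parse_mountinfo_line(line):
    """Return (root, mount_point, fstype) for one mountinfo line."""
    before, after = line.rstrip("\n").split(" - ", 1)
    fields = before.split()
    return unescape(fields[3]), unescape(fields[4]), after.split()[0]


def looks_like_bind_mount(line):
    # Assumed heuristic: a mount whose root inside its filesystem is not "/"
    # is treated as a bind mount. Btrfs subvolume mounts like /@home match it.
    root, _, _ = parse_mountinfo_line(line)
    return root != "/"


if __name__ == "__main__":
    lines = SAMPLE
    if os.path.exists("/proc/self/mountinfo"):
        with open("/proc/self/mountinfo") as fh:
            lines = fh.readlines()
    for line in lines:
        root, mount_point, fstype = parse_mountinfo_line(line)
        if looks_like_bind_mount(line):
            print(f"{mount_point} ({fstype}, root {root}) flagged as a bind mount")
```

If the assumption holds, every subvolume mounted this way would be skipped while PRUNE_BIND_MOUNTS is "yes", which fits the suggestion in the heading to set it to "no" on btrfs.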