should we mention par2cmdline?
I had used parchive before and was looking for its package name here. Maybe it should be mentioned; it feels backup-related enough to me.
Guidelines for adding programs?
Should a particular program (assuming it is open source) already have a package for Arch Linux before being added to this list? Also, if it is relatively new and doesn't have many users yet, should it go through a minimum amount of testing by other users before being added? In the interest of full disclosure, I have recently released a backup system as open source (currently hosted on GitHub) that I've personally used for a while. It is similar in concept to the rsync-based backups, but it includes full file-level deduplication (even across multiple clients) and an SQLite-based catalog, and the client side uses standard GNU tar, find, and a wrapper shell script (no binaries to install on the client side). So you end up with the simplicity of rsync, with some of the features of the heavyweight backup programs.
- You are free to add your tool. But since this is the Arch Wiki, the tool should be easily accessible to Arch users. I think at least an AUR package is needed. Creating an AUR package is very easy. See the AUR and PKGBUILD articles for how to do it. -- Fengchao (talk) 07:42, 17 February 2013 (UTC)
Can I add these three other options, or are they unsuitable for some reason?
- Sure, considering that MEGA is listed in Backup_Programs#Cloud_backups, these three would fit there much better ;) But please keep the structure the same as for other software. -- Lahwaacz (talk) 22:01, 13 December 2013 (UTC)
Wuala is shutting down
- Yes, please remove Wuala. I don't know Tresorit, you may add it if you want, or leave it to somebody else. — Kynikos (talk) 01:38, 30 September 2015 (UTC)
I am one of the authors of this paper, where we compare the performance and system resource usage of five backup tools. Should it be included in this wiki?
The Console / Graphical categorization doesn't really make sense any more
I just added Bacula, which appears to be the most downloaded open source backup solution. Bacula can be used in console and graphical mode, and includes a web interface as well. For the time being I just created a new category: Console & Graphical. -- Pgoetz (talk) 13:23, 20 December 2014 (UTC)
- Hi, I don't get it, are you just reporting the fact that you've added Bacula, or are you proposing something about the Console/Graphical categorization? -- Kynikos (talk) 02:22, 21 December 2014 (UTC)
I'm working on a possible restructuring of the page, especially using a table to compare the numerous incremental backup applications, see the draft at User:Kynikos/Backup programs. Here I wanted to see if there's somebody who doesn't like the idea in general, possibly bringing reasoned objections, so that I avoid wasting time on a project that eventually wouldn't be merged.
Positive feedback, ideas and direct contributions to the draft are also welcome, of course.
Distributed file systems
Samba and NFS are not distributed in the same sense as the other entries. They may be seen as "distributed" from the client's point of view, but the storage is (usually) not distributed, as it is with e.g. GlusterFS. Samba and NFS can be used to export the final mount of a distributed file system over the network to the clients, but they are not responsible for the "distribution".
Even Wikipedia is very confused about this, e.g. w:File_system#Network_file_systems shows NFS and Samba as examples and links to w:Distributed_file_system as the main article, which is also very clumsy with explaining all the differences.
- Right, thanks for clarifying. A secondary reason why I added NFS and Samba to the list is that the only "overview" article that links to them is General recommendations: for the moment I've moved them in the Related box of File systems, but if we move the whole "Distributed file systems" section there from here, maybe also those two links can get their own section? — Kynikos (talk) 03:57, 1 February 2016 (UTC)
Anyway, are Archers supposed to run their own clusters for personal backups? :P
- Eheh initially the article was only mentioning Tahoe-LAFS among the Cloud storage applications. That's what encouraged me to add Ceph, for which I remembered we have an article, and then I decided to mention also GlusterFS and Sheepdog, eventually splitting them in the current list. Although the article doesn't explicitly set its scope to personal backups, I agree that those projects may not fit well here, and that's why I proposed to move them to File systems, instead of just deleting the section, so we don't reorphan Ceph, what do you think? — Kynikos (talk) 03:57, 1 February 2016 (UTC)
- Update: File systems#Clustered file systems has been expanded. I'm in favor of merging List of applications/Internet#Distributed file systems there. — Kynikos (talk) 10:46, 27 February 2017 (UTC)
Adding new information and reorganising the table of chunk-based increments archive/backup tools
Here's what I propose for this table:
- Added: Column "Increment Basis" to differentiate between full+incremental design and tools that base new increments off all prior data
- Added: Column "Historical archives can be removed" to differentiate between tools that are designed so prior archives can be removed easily or not
- Also moved GUI frontends of CLI tools to the "other interfaces" column
Here's the modified table: User:Level323
- About "Increment Basis", doesn't that simply distinguish between applications that allow creating subsequent full backups and those that create only one at the beginning? I think it's misleading to color the "Prior full backup" case in red, because an additional full backup is done intentionally, and it's obvious that the subsequent increments are then based on it. If really one type should be considered "better" than the other (green vs red), then the ability to create subsequent full backups would be an additional feature, therefore that should be colored in green. If adding a column like this, however, I wouldn't color it at all.
- About "Historical archives can be removed" I agree it can be useful; perhaps the heading can be simplified with "Old archives removable".
- About merging GUI frontends into Other interfaces, I tend to disagree because a frontend may add or hide features when compared to the backend application. Also, at least kup has ambiguous documentation about the backend, search "Needed backup programs" in https://www.linux-apps.com/p/1127689/
- — Kynikos (talk) 11:50, 10 August 2016 (UTC)
- Re your comments about "increment basis": I think my perhaps poor choice of column title led you down the wrong track. The feature I intended to distinguish was the deduplication abilities of the tool. The "traditional" approach of full backups followed by incrementals (typically) only avoids duplication of entire files that have not changed since the last full backup. This means that the destination "archive" will still retain considerable duplication (the unchanged parts of larger files as well as files in the next full backup that have not changed since the last full backup). In contrast, tools like borg/attic/obnam/bup commit all backup data to a single monolithic store and employ a rolling hash. This eliminates the concept of full+incremental backups (and avoids the associated duplication) and it also avoids the duplication of parts of larger files that have not changed. So I've changed the column title to "deduplication method". I've also removed the colouring of that column, but would note that most common use cases (personal and SME backup roles) would see the old full+incremental paradigm as inferior to the monolithic store + rolling-hash method. About the only beneficial use case I can think of for the full+incremental approach is when the archive is being written to write-once media.
- Re column "Historical archives can be removed" I agree the name is clunky. Have changed it to your suggestion.
- Re merging GUI frontends: I don't mind either way.
- Level323 (talk) 01:09, 19 August 2016 (UTC)
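The difference described above between whole-file increments and a single chunk store with a rolling hash can be illustrated with a toy sketch. This is not how borg, attic, obnam, or bup are actually implemented; the window size, cut mask, hash parameters, and the `ChunkStore` class are all hypothetical choices made only for illustration.

```python
import hashlib
import random

# All constants below are illustrative assumptions, not values from any real tool.
WINDOW = 16                       # bytes covered by the rolling hash
MASK = (1 << 6) - 1               # cut when the low 6 bits of the hash are zero
BASE = 257
MOD = (1 << 31) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window


def chunk(data: bytes) -> list:
    """Split data at content-defined boundaries using a polynomial rolling hash."""
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Drop the oldest byte so the hash always covers the last WINDOW bytes.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        if i - start + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Hypothetical single store: every chunk is kept once, keyed by its SHA-256."""

    def __init__(self):
        self.blobs = {}

    def backup(self, data: bytes):
        """Store data; return (recipe, number of chunks that were new)."""
        recipe = []
        added = 0
        for piece in chunk(data):
            key = hashlib.sha256(piece).hexdigest()
            if key not in self.blobs:
                self.blobs[key] = piece
                added += 1
            recipe.append(key)
        return recipe, added

    def restore(self, recipe) -> bytes:
        """Rebuild the original bytes from a recipe of chunk keys."""
        return b"".join(self.blobs[key] for key in recipe)


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
modified = original[:2000] + b"X" + original[2000:]  # one byte inserted mid-file

store = ChunkStore()
recipe1, added1 = store.backup(original)
recipe2, added2 = store.backup(modified)
print(f"first backup:  {len(recipe1)} chunks, {added1} new")
print(f"second backup: {len(recipe2)} chunks, {added2} new")
```

A file-based incremental backup would store the whole modified file again, while in this sketch only the chunks around the inserted byte are new. Because every archive is just a recipe of chunk keys, there is no distinction between "full" and "incremental" archives in such a store.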
- Please pardon my stubbornness, but the meaning of the deduplication column isn't very clear yet to me :) Are we talking about the difference between w:Differential backup and w:Incremental backup (well explained in w:Differential backup#Illustration)? Then I'd call the column "Diff basis" or similar, and I would directly and simply use "differential" or "incremental" in the cells, leaving the explanation to a "Specific legend" under Synchronization and backup programs#Chunk-based increments, as it's already done for Synchronization and backup programs#File-based increments, with links to the Wikipedia articles. Also, I'd move the column between "Implementation" and "Compressed storage", and I'd indeed leave it uncolored.
- Green light for the "Old archives removable" column then.
- About merging the GUI frontends, let's leave them separate until somebody else shares their opinion here.
- — Kynikos (talk) 11:55, 19 August 2016 (UTC)
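For reference, the differential/incremental distinction mentioned above can be sketched as follows. This is a minimal model that assumes a snapshot is a mapping from path to content and ignores deletions; the function names `changed_since` and `plan_archives` are made up for this example.

```python
def changed_since(reference: dict, current: dict) -> dict:
    """Return the files in `current` that are new or differ from `reference`."""
    return {path: data for path, data in current.items()
            if reference.get(path) != data}


def plan_archives(snapshots: list, mode: str) -> list:
    """Return what each backup run would store; the first run is always full.

    mode == "differential": each run is diffed against the first (full) backup.
    mode == "incremental":  each run is diffed against the previous run.
    """
    if mode not in ("differential", "incremental"):
        raise ValueError(f"unknown mode: {mode}")
    archives = [dict(snapshots[0])]
    for i in range(1, len(snapshots)):
        base = snapshots[0] if mode == "differential" else snapshots[i - 1]
        archives.append(changed_since(base, snapshots[i]))
    return archives


# Three runs: file "a" changes before run 1, file "b" changes before run 2.
runs = [
    {"a": "a0", "b": "b0", "c": "c0"},
    {"a": "a1", "b": "b0", "c": "c0"},
    {"a": "a1", "b": "b1", "c": "c0"},
]

incremental = plan_archives(runs, "incremental")
differential = plan_archives(runs, "differential")
print(incremental[2])   # only "b": it changed since the previous run
print(differential[2])  # "a" and "b": both changed since the full backup
```

In this model a differential restore needs the full backup plus one archive, while an incremental restore needs the full backup plus every archive in the chain.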
Backup by exporting list of programs
This page seems to omit documentation on backing up the installed state of the operating system (as explained for other distros here). I guess maybe that's out of scope, but pointing towards it might be helpful. Ben Creasy (talk) 04:50, 24 October 2016 (UTC)
 replaced, only in the "Data synchronization" section, some direct links to official websites with direct links to packages for applications with an ArchWiki article.
This is a problem that I've also always seen in the App template, anyway I see it at least as redundant to link to both the package(s) and the main application article, which in turn surely links to the packages itself, 1) because the user may be tempted to install the packages directly, skipping any particular instructions that may be present in the article (which exists for a purpose), and 2) because we create duplication of content, i.e. if a package is split, renamed, alternatives are added etc., we have to keep both locations in sync. For this reason I propose to revert the edit.
If I'm the only one thinking this way, however, the same change should be applied to the other tables for consistency.
- I don't regard it as a problem. By providing a link to the article and a link to the package readers can choose whether they want to directly install a package or follow the article's installation section. I don't think that the ArchWiki should be idiot-proof. Overviews always result in duplication. –Larivact (talk) 15:22, 11 October 2017 (UTC)
   removed a few legend entries because the meaning of the respective fields is, I agree, kind of obvious, however the legend still has entries for almost all fields, and only arbitrarily excluding a few ("obviousness" is subjective after all) gives me a sense of incompleteness, so I propose to restore the entries.
Again, as an alternative the change should be applied to the rest of the article consistently.
- Thanks Larivact for restoring the legend entries. "Maintenance" is the only one left out, what if we reinstate it with the advice to add a "last-checked" date? -- Kynikos (talk) 12:26, 12 October 2017 (UTC)
taskd removal proposal
taskd is specific to taskwarrior; it can't be used for generic file sync or backup. Should it be removed from this page?
Add luckybackup to the list
Luckybackup is a simple GUI tool for backup & sync. Although it is not actively maintained, the author is active on their discussion page. It is also available in the AUR as luckybackup[AUR]. --RaZorr (talk) 13:23, 20 December 2021 (UTC)
clarify Change propagation
The legend for data synchronization programs states:
- Change propagation
- Specifies in how many directions changes can be propagated.
- unidirectional means one-way synchronization of two locations,
- bidirectional means two-way synchronization of two locations and
- multidirectional means full synchronization of more than two locations.
However, there are two aspects or models of change propagation. For example, git is a truly distributed system, yet every data propagation in it is unidirectional.
On the other hand, there are file synchronization models where there is a central server with the master version, and clients can only pull data from there (not push). I searched on this topic, and could only find a piece of documentation for mutagen: one-way-safe: In this unidirectional synchronization mode, changes are only allowed to propagate from alpha to beta.
In this sense, rsync is bidirectional (not unidirectional, as it is written), or even multidirectional, like how it is written for git-annex (I used that program, and its data propagation works like git).
Maybe the distinction is subjective, but it would be nice to hear what you think about that.
- Your model where you claim that rsync is bidirectional assumes that there is "a central server with the master version, and clients can only pull data from there (not push)". That does not make sense as the clients cannot change the master version, and if the master version changes due to some other factors, these changes presumably overwrite anything that happened on the clients. This is definitely not bidirectional, let alone multidirectional, as the information cannot go from the client to the master. — Lahwaacz (talk) 17:54, 9 July 2022 (UTC)
- "you claim that rsync is bidirectional assumes that there is 'a central server with the master version'" - I don't. I say that this is typical for unidirectional synchronization. "the clients cannot change the master version" - yes, they can. For rsync, source and destination can each be local or remote. I'm pushing my files to the server, and it renames/deletes/overwrites files in my account on the server. I agree that it is probably possible to configure a remote rsync to be read-only (though it might be strange), but by default any changes can propagate to and from the remote. Ynikitenko (talk) 18:25, 10 July 2022 (UTC)
- You ended one sentence with "unidirectional" and continued with the phrase "On the other hand..." so it seemed the following was not about "unidirectional". If I parsed that differently than you meant, please try to explain yourself better.
- I'll just try to explain why rsync is unidirectional. If you change file A in the source and file B in the destination, rsync can't be run with flags to propagate file A from source to destination and file B from destination to source. Unless you have some frontend which can detect this case and run rsync twice with partial file lists – but this is not pure rsync, but a different software.
- — Lahwaacz (talk) 19:02, 10 July 2022 (UTC)
- I see. No, I meant another aspect. I usually do my best to clarify the meaning, and can answer the questions, of course. Thanks for the note!
- Now I understand your point; some googling shows me that you are right in this regard. However, I still can't find an "official" definition for these transfers (uni/bi/multi-directional). Do you have a link? Could it be added to the wiki? Ynikitenko (talk) 19:27, 10 July 2022 (UTC)
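The "file A changed in the source, file B changed in the destination" case from this thread can be modelled in a few lines. This is only a sketch of the two propagation models, not a description of rsync's actual algorithm or flags: locations are plain dictionaries, and `sync_one_way` and `sync_two_way` are hypothetical names. The two-way variant assumes a remembered common state (`base`) from the last synchronization, which a single one-way run does not have.

```python
def sync_one_way(src: dict, dst: dict) -> dict:
    """Unidirectional model: every path in src overwrites the copy in dst."""
    result = dict(dst)
    result.update(src)
    return result


def sync_two_way(left: dict, right: dict, base: dict) -> dict:
    """Bidirectional model: merge the changes each side made since `base`.

    A path that is missing from a mapping is treated as deleted there.
    Raises ValueError when both sides changed the same path differently.
    """
    merged = {}
    for path in set(left) | set(right) | set(base):
        l, r, b = left.get(path), right.get(path), base.get(path)
        if l == r:
            value = l          # both sides agree
        elif l == b:
            value = r          # only the right side changed
        elif r == b:
            value = l          # only the left side changed
        else:
            raise ValueError(f"conflict on {path}")
        if value is not None:
            merged[path] = value
    return merged


base = {"A": "a0", "B": "b0"}         # state after the last synchronization
source = {"A": "a1", "B": "b0"}       # file A edited in the source
destination = {"A": "a0", "B": "b1"}  # file B edited in the destination

one_way = sync_one_way(source, destination)
two_way = sync_two_way(source, destination, base)
print(one_way)  # the destination's edit to B is overwritten
print(two_way)  # both edits survive
```

In the one-way model the change to B never travels from destination to source, no matter which side is chosen as the source; running it once in each direction would still lose one of the two edits unless something decides per file which copy wins.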