Talk:Synchronization and backup programs
- 1 should we mention par2cmdline?
- 2 Guidelines for adding programs?
- 3 Cloud backups
- 4 Should a link to a performance comparison between Rsync, Rdiff-backup, Duplicity, Areca and Link-Backup be included?
- 5 The Console / Graphical categorization doesn't really make sense any more
- 6 Radical reorganization
- 7 Recommendations?
- 8 Backup by exporting list of programs
- 9 Unidirectional or bidirectional data synchronization
- 10 2017-10-09 changes
should we mention par2cmdline?
I had used Parchive before and was looking for its package name here. Maybe it should be mentioned; it feels backup-related enough to me.
Guidelines for adding programs?
Should a particular program (assuming it is open source) already have a package for Arch Linux before being added to this list? Also, if it is relatively new and doesn't have many users yet, should it go through a minimum amount of testing by other users before being added? In the interest of full disclosure, I have recently contributed a backup system as open source (currently hosted on GitHub) that I've personally used for a while. It is similar in concept to the rsync-based backups, but it includes full file-level deduplication (even across multiple clients), an SQLite-based catalog, and the client side uses standard GNU tar, find, and a wrapper shell script (no binaries to install on the client side). So you end up with the simplicity of rsync plus some of the features of the heavyweight backup programs.
- You are free to add your tool. But since this is the Arch Wiki, the tool should be easily accessible to Arch users. I think at least an AUR package is needed. Creating an AUR package is very easy; see AUR and PKGBUILD for how to do it. -- Fengchao (talk) 07:42, 17 February 2013 (UTC)
Cloud backups
Can I add these three other options, or are they unsuitable for some reason?
- Sure, considering that MEGA is listed in Backup_Programs#Cloud_backups, these three would fit there much better ;) But please keep the structure the same as for other software. -- Lahwaacz (talk) 22:01, 13 December 2013 (UTC)
Wuala is shutting down
- Yes, please remove Wuala. I don't know Tresorit, you may add it if you want, or leave it to somebody else. — Kynikos (talk) 01:38, 30 September 2015 (UTC)
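Should a link to a performance comparison between Rsync, Rdiff-backup, Duplicity, Areca and Link-Backup be included?
I am one of the authors of this paper, where we compare the performance and system resource usage of five backup tools. Should it be included in this wiki?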
The Console / Graphical categorization doesn't really make sense any more
I just added Bacula, which appears to be the most downloaded open source backup solution. Bacula can be used in console and graphical mode, and includes a web interface as well. For the time being I just created a new category: Console & Graphical. -- Pgoetz (talk) 13:23, 20 December 2014 (UTC)
- Hi, I don't get it, are you just reporting the fact that you've added Bacula, or are you proposing something about the Console/Graphical categorization? -- Kynikos (talk) 02:22, 21 December 2014 (UTC)
Radical reorganization
I'm working on a possible restructuring of the page, especially using a table to compare the numerous incremental backup applications; see the draft at User:Kynikos/Backup programs. Here I'd like to find out whether anybody dislikes the idea in general, ideally with reasoned objections, so that I avoid wasting time on a project that eventually wouldn't be merged.
Positive feedback, ideas and direct contributions to the draft are also welcome, of course.
Distributed file systems
Samba and NFS are not distributed in the same sense as the other entries. They may be seen as "distributed" from the client's point of view, but the storage is (usually) not distributed as it is with, e.g., GlusterFS. Samba and NFS can be used to export the final mount of a distributed file system over the network to the clients, but they are not responsible for the "distribution".
Even Wikipedia is very confused about this: e.g. w:File_system#Network_file_systems shows NFS and Samba as examples and links to w:Distributed_file_system as the main article, which also does a clumsy job of explaining all the differences.
- Right, thanks for clarifying. A secondary reason why I added NFS and Samba to the list is that the only "overview" article that links to them is General recommendations: for the moment I've moved them to the Related box of File systems, but if we move the whole "Distributed file systems" section there from here, maybe those two links can also get their own section? — Kynikos (talk) 03:57, 1 February 2016 (UTC)
Anyway, are Archers supposed to run their own clusters for personal backups? :P
- Eheh, initially the article only mentioned Tahoe-LAFS among the Cloud storage applications. That's what encouraged me to add Ceph, for which I remembered we have an article, and then I decided to also mention GlusterFS and Sheepdog, eventually splitting them into the current list. Although the article doesn't explicitly limit its scope to personal backups, I agree that those projects may not fit well here, and that's why I proposed to move them to File systems instead of just deleting the section, so we don't reorphan Ceph. What do you think? — Kynikos (talk) 03:57, 1 February 2016 (UTC)
- Update: File systems#Clustered file systems has been expanded. I'm in favor of merging List of applications/Internet#Distributed file systems there. — Kynikos (talk) 10:46, 27 February 2017 (UTC)
Adding new information and reorganising the table of chunk-based increments archive/backup tools
Here's what I propose for this table:
- Added: Column "Increment Basis" to differentiate between full+incremental design and tools that base new increments off all prior data
- Added: Column "Historical archives can be removed" to differentiate between tools that are designed so prior archives can be removed easily or not
- Also moved GUI frontends of CLI tools to the "other interfaces" column
Here's the modified table: User:Level323
- About "Increment Basis", doesn't that simply distinguish between applications that allow creating subsequent full backups and those that create only one at the beginning? I think it's misleading to color the "Prior full backup" case in red, because an additional full backup is done intentionally, and it's obvious that the subsequent increments are then based on it. If one type really should be considered "better" than the other (green vs red), then the ability to create subsequent full backups would be an additional feature, so that is the case that should be colored green. If adding a column like this, however, I wouldn't color it at all.
- About "Historical archives can be removed" I agree it can be useful; perhaps the heading can be simplified with "Old archives removable".
- About merging GUI frontends into Other interfaces, I tend to disagree because a frontend may add or hide features when compared to the backend application. Also, at least kup has ambiguous documentation about the backend, search "Needed backup programs" in https://www.linux-apps.com/p/1127689/
- — Kynikos (talk) 11:50, 10 August 2016 (UTC)
- Re your comments about "increment basis": I think my perhaps poor choice of column title led you down the wrong track. The feature I intended to distinguish was the deduplication abilities of the tool. The "traditional" approach of full backups followed by incrementals (typically) only avoids duplication of entire files that have not changed since the last full backup. This means that the destination "archive" will still retain considerable duplication (the unchanged parts of larger files as well as files in the next full backup that have not changed since the last full backup). In contrast, tools like borg/attic/obnam/bup commit all backup data to a single monolithic store and employ a rolling hash. This eliminates the concept of full+incremental backups (and avoids the associated duplication) and it also avoids the duplication of parts of larger files that have not changed. So I've changed the column title to "deduplication method". I've also removed the colouring of that column, but would note that most common use cases (personal and SME backup roles) would see the old full+incremental paradigm as inferior to the monolithic store + rolling-hash method. About the only beneficial use case I can think of for the full+incremental approach is when the archive is being written to write-once media.
- Re column "Historical archives can be removed" I agree the name is clunky. Have changed it to your suggestion.
- Re merging GUI frontends: I don't mind either way.
- Level323 (talk) 01:09, 19 August 2016 (UTC)
- Please pardon my stubbornness, but the meaning of the deduplication column isn't very clear yet to me :) Are we talking about the difference between w:Differential backup and w:Incremental backup (well explained in w:Differential backup#Illustration)? Then I'd call the column "Diff basis" or similar, and I would directly and simply use "differential" or "incremental" in the cells, leaving the explanation to a "Specific legend" under Synchronization and backup programs#Chunk-based increments, as it's already done for Synchronization and backup programs#File-based increments, with links to the Wikipedia articles. Also, I'd move the column between "Implementation" and "Compressed storage", and I'd indeed leave it uncolored.
- Green light for the "Old archives removable" column then.
- About merging the GUI frontends, let's leave them separate until somebody else shares their opinion here.
- — Kynikos (talk) 11:55, 19 August 2016 (UTC)
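For readers unfamiliar with the rolling-hash approach Level323 describes above, here is a minimal Python sketch of content-defined chunking feeding a single store keyed by SHA-256 digest. It is only an illustration under assumptions: the Rabin-Karp style hash, the 16-byte window, the roughly 1 KiB average chunk size and the names `split_chunks`, `store_backup` and `restore` are all hypothetical choices, not the actual algorithm or parameters of borg, attic, obnam or bup.

```python
import hashlib
import random

# Assumed, illustrative parameters (not taken from any real backup tool).
WINDOW = 16                       # bytes covered by the rolling hash
BASE = 257                        # polynomial base of the Rabin-Karp style hash
MOD = 1 << 32                     # keep the hash in 32 bits
MASK = (1 << 10) - 1              # cut when the low 10 bits are zero (~1 KiB chunks)
POW = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window


def split_chunks(data: bytes) -> list[bytes]:
    """Hypothetical chunker: cut where the hash of the last WINDOW bytes hits MASK.

    Boundaries depend on content, not on offsets, so an insertion only
    changes the chunks around it instead of shifting every later chunk.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD  # drop the byte leaving the window
        h = (h * BASE + byte) % MOD                 # add the byte entering the window
        if i - start + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store_backup(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Add data to the shared store; return its recipe (ordered chunk digests)."""
    recipe = []
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # chunks already present cost no space
        recipe.append(digest)
    return recipe


def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    """Reassemble a backup from its recipe."""
    return b"".join(store[digest] for digest in recipe)


# Two "backups" of a large file, the second with a few bytes inserted mid-file.
original = random.Random(0).randbytes(200_000)
modified = original[:100_000] + b"a few new bytes" + original[100_000:]

store: dict[str, bytes] = {}
store_backup(original, store)
chunks_before = len(store)
recipe = store_backup(modified, store)
print(len(recipe), "chunks referenced,", len(store) - chunks_before, "new")
assert restore(recipe, store) == modified
```

With random input the second backup should reference about as many chunks as the first while adding only a handful of new ones; splitting at fixed offsets instead would change every chunk after the insertion point, which is the duplication the full+incremental design cannot avoid for partially changed files.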
Recommendations?
We can't just give out recommendations, I suppose, but is there any way we can figure out which solutions are the most popular, or cite reviews of some of the software? It would be nice to help users figure out the best software in a neutral way. Ben Creasy (talk) 04:46, 24 October 2016 (UTC)
Backup by exporting list of programs
This page seems to omit documentation on backing up the installed state of the operating system (as explained for other distros here). I guess that may be out of scope, but pointing towards it might be helpful. Ben Creasy (talk) 04:50, 24 October 2016 (UTC)
Unidirectional or bidirectional data synchronization
As far as I can tell, the table in Synchronization_and_backup_programs#Data_synchronization does not consider how changes are merged or discarded on the nodes involved in the synchronization. To explain what I mean, consider a directory synchronized between two machines (i.e. the state is identical on both ends). Then let's create file A and remove file B on one machine, and create file C and remove file D on the other machine. After that, bidirectional synchronization results in files A and C being created on both machines and files B and D being removed from both machines, whereas unidirectional synchronization discards the changes made on one of the machines.
Notable examples are rsync (unidirectional) and unison (bidirectional), so I think it's important to distinguish them in the table. Conceptually, there hasn't been any conflict yet, because all changes affected different files, so the "Conflict resolution" column does not apply: definitely not in the case of unison and Syncthing, and in the case of rsync I don't know what "preview changes" means.
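The example above can be modelled in a few lines of Python. This is a toy sketch, assuming each replica is just a set of file names and that the last synchronized state is known; `mirror` and `merge` are hypothetical names, and neither rsync nor unison is implemented this way.

```python
# Toy model: a replica is a set of file names; file contents are ignored.


def mirror(source: set[str]) -> set[str]:
    """Unidirectional sync: the target becomes a copy of the source,
    so whatever changed on the target side is discarded."""
    return set(source)


def merge(base: set[str], left: set[str], right: set[str]) -> set[str]:
    """Bidirectional sync as a three-way merge against the last common state.

    Creations and deletions from both sides are propagated. Since only names
    are compared, two sides editing the same file (a real conflict) cannot
    occur in this model.
    """
    created = (left - base) | (right - base)
    deleted = (base - left) | (base - right)
    return (base | created) - deleted


base = {"B", "D"}                # identical on both machines after the last sync
left = (base | {"A"}) - {"B"}    # machine 1: create A, remove B
right = (base | {"C"}) - {"D"}   # machine 2: create C, remove D

print(sorted(merge(base, left, right)))  # ['A', 'C']
print(sorted(mirror(left)))              # ['A', 'D']: C is lost, D comes back
```

Note that `merge` needs the last synchronized state (`base`): without it, a file present on only one side is ambiguous, since it could have been created there or deleted on the other side. This is why bidirectional tools such as unison keep a record of the previous synchronization.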
- What if we change the "Multidirectional" column to "Direction" and allow values such as "unidirectional", "bidirectional" or "multidirectional", and change "preview changes" to "n/a" for rsync and rdiff-backup? -- Kynikos (talk) 11:13, 7 July 2017 (UTC)
- So the purpose of the two columns would change such that "Conflict resolution" refines the meaning of "bidirectional" and "multidirectional" from the "Direction" column?
- I don't think that "Direction: unidirectional" is a good characteristic of rsync, it can transfer both ways (local to remote and vice versa), but not at the same time. Maybe we could just name the column "Change detection" or "Change propagation" or something like that?
- -- Lahwaacz (talk) 12:34, 7 July 2017 (UTC)
- I've fixed the "Conflict resolution" problem, but I don't have the time to review all the applications to see which are uni-, bi- or multidirectional, and also the entry in the legend will require a longer description. I like "Change detection" or "Change propagation" though. -- Kynikos (talk) 11:22, 8 July 2017 (UTC)
- I've renamed the "Multidirectional" column to "Change propagation" and swapped it with the "Conflict resolution" column. From the little that can be found about some programs, the current values in the table seem correct. Closing, feel free to reopen if there is anything to be added. -- Lahwaacz (talk) 14:30, 5 August 2017 (UTC)
2017-10-09 changes
replaced, only in the "Data synchronization" section, some direct links to official websites with direct links to packages for applications with an ArchWiki article.
I have always seen this as a problem in the App template too. At the very least I consider it redundant to link to both the package(s) and the main application article, which itself links to the packages, for two reasons: 1) the user may be tempted to install the packages directly, skipping any particular instructions that may be present in the article (which exists for a purpose), and 2) it duplicates content, i.e. if a package is split, renamed, gains alternatives, etc., we have to keep both locations in sync. For this reason I propose to revert the edit.
If I'm the only one thinking this way, however, the same change should be applied to the other tables for consistency.
- I don't regard it as a problem. By providing a link to the article and a link to the package readers can choose whether they want to directly install a package or follow the article's installation section. I don't think that the ArchWiki should be idiot-proof. Overviews always result in duplication. –Larivact (talk) 15:22, 11 October 2017 (UTC)
 possibly simplified the terminology of the "Change propagation" section, but as touched on in #Unidirectional or bidirectional data synchronization the meaning may be ambiguous, so I propose to at least restore the longer legend description. -- Kynikos (talk) 15:20, 10 October 2017 (UTC)
removed a few legend entries because the meaning of the respective fields is, I agree, kind of obvious. However, the legend still has entries for almost all fields, and arbitrarily excluding only a few ("obviousness" is subjective, after all) gives me a sense of incompleteness, so I propose to restore the entries.
Again, as an alternative the change should be applied to the rest of the article consistently.
- Thanks Larivact for restoring the legend entries. "Maintenance" is the only one left out, what if we reinstate it with the advice to add a "last-checked" date? -- Kynikos (talk) 12:26, 12 October 2017 (UTC)
Merging CLI with other interfaces
merged the CLI and Other interfaces columns, only in the "Data synchronization" section, but this inevitably loses the green/red colors that highlighted the popular CLI/GUI distinction, which we frequently use in List of applications, for example. I propose to restore the previous separate columns and colors.
Alternatively, the same change should be applied to the other tables, for consistency.