Talk:Synchronization and backup programs

From ArchWiki
Latest comment: 10 July 2022 by Lahwaacz in topic clarify Change propagation

should we mention par2cmdline?

I had used parchive before and was looking here for its package name. Maybe it should be mentioned; it feels backup-related enough to me.

Thanks --Kristianlm (talk) 13:01, 15 October 2012 (UTC)Reply

Guidelines for adding programs?

Should a program (assuming it is open source) already have a package for Arch Linux before it is added to this list? Also, if it is relatively new and doesn't have many users yet, should it go through a minimum amount of testing by other users first? In the interest of full disclosure, I have recently released a backup system as open source (currently hosted on GitHub) that I've personally used for a while. It is similar in concept to the rsync-based backups, but it includes full file-level deduplication (even across multiple clients) and an SQLite-based catalog, and the client side uses standard GNU tar, find, and a wrapper shell script (no binaries to install on the client side). So you end up with the simplicity of rsync, with some of the features of the heavyweight backup programs.

Thought I would ask here first before posting a writeup on the main wiki page -- don't want to appear like I'm spamming links or anything. Derekp7 (talk) 01:48, 17 February 2013 (UTC)Reply

You are free to add your tool. But since this is the ArchWiki, the tool should be easily accessible to Arch users. I think at least an AUR package is needed. Creating an AUR package is very easy. See AUR and PKGBUILD for how to do it. -- Fengchao (talk) 07:42, 17 February 2013 (UTC)
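
To illustrate the file-level deduplication with an SQLite catalog described in this thread, here is a minimal sketch. It is not Derekp7's implementation: the table layout, the function names `open_catalog` and `record_file`, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import sqlite3


def open_catalog(db_path=":memory:"):
    """Open a catalog with one table of unique contents and one of file entries."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS blobs (sha256 TEXT PRIMARY KEY, size INTEGER)"
    )
    con.execute(
        "CREATE TABLE IF NOT EXISTS files (client TEXT, path TEXT, sha256 TEXT, "
        "PRIMARY KEY (client, path))"
    )
    return con


def record_file(con, client, path, data):
    """Record one file; return True if its content is new, False if deduplicated."""
    digest = hashlib.sha256(data).hexdigest()
    known = con.execute(
        "SELECT 1 FROM blobs WHERE sha256 = ?", (digest,)
    ).fetchone()
    if known is None:
        # First time this content is seen on any client: store it once.
        con.execute(
            "INSERT INTO blobs (sha256, size) VALUES (?, ?)", (digest, len(data))
        )
    # The file entry only points at the content by hash.
    con.execute(
        "INSERT OR REPLACE INTO files (client, path, sha256) VALUES (?, ?, ?)",
        (client, path, digest),
    )
    return known is None


con = open_catalog()
print(record_file(con, "hostA", "/etc/motd", b"hello"))  # content stored
print(record_file(con, "hostB", "/srv/copy", b"hello"))  # same content, other client
```

Because the content table is keyed by hash only, identical files from different clients share one stored copy, which is the cross-client deduplication mentioned above.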

Cloud backups

Can I add these three other options, or are they unsuitable for some reason?

- Drive[1] - AUR[2]
- Copy[3] - AUR[4]
- Bitcasa[5] - AUR[6]

--Gabriel B. Casella (talk) 21:37, 13 December 2013 (UTC)Reply

Sure, considering that MEGA is listed in Backup_Programs#Cloud_backups, these three would fit there much better ;) But please keep the structure the same as for other software. -- Lahwaacz (talk) 22:01, 13 December 2013 (UTC)Reply

Wuala is shutting down

Wuala is shutting down, see notice[7]. Can I remove it? Also, its recommended alternative Tresorit[8] is not exactly the same; it's more of an encrypted Dropbox.

-- Folti (talk) 14:08, 29 September 2015 (UTC)Reply

Yes, please remove Wuala. I don't know Tresorit, you may add it if you want, or leave it to somebody else. — Kynikos (talk) 01:38, 30 September 2015 (UTC)Reply

Should a link to a performance comparison between Rsync, Rdiff-backup, Duplicity, Areca and Link-Backup be included?

I am one of the authors of this paper, where we compare the performance and system resource usage of five backup tools. Should it be included in this wiki?

Thanks —This unsigned comment is by Aurhe (talk) Oct 2014. Please sign your posts with ~~~~!

Yes, you can use the See also section at the bottom. -- Kynikos (talk) 01:35, 8 October 2014 (UTC)Reply

The Console / Graphical categorization doesn't really make sense any more

I just added Bacula, which appears to be the most downloaded open source backup solution. Bacula can be used in console and graphical mode, and includes a web interface as well. For the time being I just created a new category: Console & Graphical. -- Pgoetz (talk) 13:23, 20 December 2014 (UTC)

Hi, I don't get it, are you just reporting the fact that you've added Bacula, or are you proposing something about the Console/Graphical categorization? -- Kynikos (talk) 02:22, 21 December 2014 (UTC)Reply
Bacula is the example of why the categorization scheme doesn't make sense. -- Pgoetz (talk) 09:41, 22 December 2014 (UTC)Reply
I'm not a big fan of that distinction either for the same reason, but it's derived from List of applications, and I feel many users would object to its removal. Let's leave this open and see if there are more comments. -- Kynikos (talk) 03:25, 23 December 2014 (UTC)Reply

Radical reorganization

I'm working on a possible restructuring of the page, especially using a table to compare the numerous incremental backup applications, see the draft at User:Kynikos/Backup programs. Here I wanted to see if there's somebody who doesn't like the idea in general, possibly bringing reasoned objections, so that I avoid wasting time on a project that eventually wouldn't be merged.

Positive feedback, ideas and direct contributions to the draft are also welcome, of course.

Kynikos (talk) 14:03, 28 January 2016 (UTC)Reply

Great idea! --Edh (talk) 17:08, 29 January 2016 (UTC)Reply
Thanks, I will merge soon if there are no objections, because I can't fill the table by myself :) — Kynikos (talk) 03:31, 31 January 2016 (UTC)Reply
Merged, hopefully it will improve over time. — Kynikos (talk) 16:20, 31 January 2016 (UTC)Reply
I will see what I can contribute as soon as my semester is over. -- Edh (talk) 18:14, 31 January 2016 (UTC)Reply

Distributed file systems

Samba and NFS are not distributed in the same sense as the other entries. They may be seen as "distributed" from the client's point of view, but the storage is (usually) not distributed as is the case of e.g. GlusterFS. Samba and NFS can be used to export the final mount of a distributed file system over the network to the clients, but they are not responsible for the "distribution".

Even Wikipedia is very confused about this, e.g. w:File_system#Network_file_systems shows NFS and Samba as examples and links to w:Distributed_file_system as the main article, which is also very clumsy with explaining all the differences.

Lahwaacz (talk) 16:39, 31 January 2016 (UTC)Reply

Right, thanks for clarifying. A secondary reason why I added NFS and Samba to the list is that the only "overview" article that links to them is General recommendations: for the moment I've moved them in the Related box of File systems, but if we move the whole "Distributed file systems" section there from here, maybe also those two links can get their own section? — Kynikos (talk) 03:57, 1 February 2016 (UTC)Reply
They could be added to List of applications#File sharing, next to FTP. This page already links there. -- Lahwaacz (talk) 07:49, 1 February 2016 (UTC)Reply
Agreed, done. — Kynikos (talk) 07:08, 21 February 2016 (UTC)Reply

Anyway, are Archers supposed to run their own clusters for personal backups? :P

Lahwaacz (talk) 16:39, 31 January 2016 (UTC)Reply

Eheh initially the article was only mentioning Tahoe-LAFS among the Cloud storage applications. That's what encouraged me to add Ceph, for which I remembered we have an article, and then I decided to mention also GlusterFS and Sheepdog, eventually splitting them in the current list. Although the article doesn't explicitly set its scope to personal backups, I agree that those projects may not fit well here, and that's why I proposed to move them to File systems, instead of just deleting the section, so we don't reorphan Ceph, what do you think? — Kynikos (talk) 03:57, 1 February 2016 (UTC)Reply
There could be an introduction to "network targets" somewhere, listing all the common options: SSH/SFTP, rsync, FTP, NFS, Samba, 3rd party cloud services and then the distributed file systems. -- Lahwaacz (talk) 07:49, 1 February 2016 (UTC)Reply
Update: File systems#Clustered file systems has been expanded. I'm in favor of merging List of applications/Internet#Distributed file systems there. — Kynikos (talk) 10:46, 27 February 2017 (UTC)Reply

Adding new information and reorganising the table of chunk-based increments archive/backup tools

Here's what I propose for this table:

  • Added: Column "Increment Basis" to differentiate between full+incremental design and tools that base new increments off all prior data
  • Added: Column "Historical archives can be removed" to differentiate between tools that are designed so prior archives can be removed easily or not
  • Also moved GUI frontends of CLI tools to the "other interfaces" column

Here's the modified table: User:Level323

Level323 (talk) 22:20, 8 August 2016 (UTC)Reply

  • About "Increment Basis", doesn't that simply distinguish between applications that allow creating subsequent full backups and those that create only one at the beginning? I think it's misleading to color the "Prior full backup" case in red, because an additional full backup is done intentionally, and it's obvious that the subsequent increments are then based on it. If really one type should be considered "better" than the other (green vs red), then the ability to create subsequent full backups would be an additional feature, therefore that should be colored in green. If adding a column like this, however, I wouldn't color it at all.
  • About "Historical archives can be removed" I agree it can be useful; perhaps the heading can be simplified with "Old archives removable".
  • About merging GUI frontends into Other interfaces, I tend to disagree because a frontend may add or hide features when compared to the backend application. Also, at least kup has ambiguous documentation about the backend, search "Needed backup programs" in https://www.linux-apps.com/p/1127689/
Kynikos (talk) 11:50, 10 August 2016 (UTC)Reply
  • Re your comments about "increment basis": I think my perhaps poor choice of column title led you down the wrong track. The feature I intended to distinguish was the deduplication abilities of the tool. The "traditional" approach of full backups followed by incrementals (typically) only avoids duplication of entire files that have not changed since the last full backup. This means that the destination "archive" will still retain considerable duplication (the unchanged parts of larger files as well as files in the next full backup that have not changed since the last full backup). In contrast, tools like borg/attic/obnam/bup commit all backup data to a single monolithic store and employ a rolling hash. This eliminates the concept of full+incremental backups (and avoids the associated duplication) and it also avoids the duplication of parts of larger files that have not changed. So I've changed the column title to "deduplication method". I've also removed the colouring of that column, but would note that most common use cases (personal and SME backup roles) would see the old full+incremental paradigm as inferior to the monolithic store + rolling-hash method. About the only beneficial use case I can think of for the full+incremental approach is when the archive is being written to write-once media.
  • Re column "Historical archives can be removed" I agree the name is clunky. Have changed it to your suggestion.
  • Re merging GUI frontends: I don't mind either way.
Level323 (talk) 01:09, 19 August 2016 (UTC)Reply
Kynikos (talk) 11:55, 19 August 2016 (UTC)Reply
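
The difference between whole-file increments and chunk-level deduplication with a rolling hash, as discussed above, can be sketched as follows. This is a toy illustration only: the gear-style hash, the table `GEAR`, the `mask` value and the function names are assumptions, and it is not the algorithm that borg, attic, obnam or bup actually use.

```python
import hashlib
import random

_rng = random.Random(0)
# Fixed pseudo-random table with one 32-bit value per possible byte.
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data, mask=0xFFC00000):
    """Split data at content-defined boundaries (about 1 KiB per chunk on average)."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes are shifted out of the 32-bit state, so h depends
        # only on the last 32 bytes, not on the position in the file.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & mask == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def store(chunks, repo):
    """Add chunks to repo (a dict keyed by SHA-256); return how many were new."""
    new = 0
    for c in chunks:
        key = hashlib.sha256(c).hexdigest()
        if key not in repo:
            repo[key] = c
            new += 1
    return new


rng = random.Random(1)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
modified = b"a few inserted bytes" + original  # small edit at the front

repo = {}
first = store(chunk(original), repo)
second = store(chunk(modified), repo)
print("new chunks for first backup:", first)
print("new chunks for second backup:", second)
```

A whole-file scheme would have to store the modified file again in full. Here the chunk boundaries follow the content, so they line up again shortly after the inserted bytes and only the chunks around the edit need to be stored.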

Backup by exporting list of programs

This page seems to omit documentation on backing up the installed state of the operating system (as explained for other distros here). I guess maybe that's out of scope, but pointing towards it might be helpful. Ben Creasy (talk) 04:50, 24 October 2016 (UTC)
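
As a rough sketch of what such a backup could look like on Arch, the snippet below saves the names of explicitly installed packages to a file. It assumes `pacman -Qqe` as the listing command; the function name `export_package_list` and the `command` parameter are hypothetical and exist only so the example can be tried with another command.

```python
import subprocess


def export_package_list(path, command=("pacman", "-Qqe")):
    """Run the listing command and write one package name per line to path."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    names = result.stdout.split()
    with open(path, "w") as f:
        f.write("\n".join(names) + "\n")
    return names
```

Calling `export_package_list("pkglist.txt")` on an Arch system would write the list; the packages can later be reinstalled with `pacman -S --needed - < pkglist.txt`.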

2017-10-09 changes

Bypassing articles with links to packages

[9] replaced, only in the "Data synchronization" section, some direct links to official websites with direct links to packages for applications with an ArchWiki article.

This is a problem that I've always seen in the App template as well. In any case, I find it at least redundant to link to both the package(s) and the main application article, which in turn surely links to the packages itself: 1) the user may be tempted to install the packages directly, skipping any particular instructions that may be present in the article (which exists for a purpose), and 2) we create duplication of content, i.e. if a package is split or renamed, alternatives are added, etc., we have to keep both locations in sync. For this reason I propose to revert the edit.

If I'm the only one thinking this way, however, the same change should be applied to the other tables for consistency.

-- Kynikos (talk) 15:20, 10 October 2017 (UTC)Reply

I don't regard it as a problem. By providing a link to the article and a link to the package readers can choose whether they want to directly install a package or follow the article's installation section. I don't think that the ArchWiki should be idiot-proof. Overviews always result in duplication. –Larivact (talk) 15:22, 11 October 2017 (UTC)Reply
Let's leave this open for a third opinion then, the article can stay in the current state meanwhile, this isn't a big issue after all. -- Kynikos (talk) 12:23, 12 October 2017 (UTC)Reply
I also updated the rest of the tables to be consistent with other comparison tables. --Larivact (talk) 04:21, 14 September 2018 (UTC)Reply

Partial legends

[10] [11] [12] removed a few legend entries because the meaning of the respective fields is, I agree, kind of obvious. However, the legend still has entries for almost all fields, and arbitrarily excluding only a few ("obviousness" is subjective after all) gives me a sense of incompleteness, so I propose to restore the entries.

Again, as an alternative the change should be applied to the rest of the article consistently.

-- Kynikos (talk) 15:20, 10 October 2017 (UTC)Reply

Thanks Larivact for restoring the legend entries. "Maintenance" is the only one left out, what if we reinstate it with the advice to add a "last-checked" date? -- Kynikos (talk) 12:26, 12 October 2017 (UTC)Reply

taskd removal proposal

taskd is specific to taskwarrior; it can't be used for generic file sync or backup. Should it be removed from this page?

Apollo22 (talk) 13:45, 10 April 2021 (UTC)Reply

Add luckybackup to the list

Luckybackup is a simple GUI tool for backup & sync. Although it is not actively maintained, the author is active on their discussion page. It is also available in the AUR as luckybackup[AUR]. --RaZorr (talk) 13:23, 20 December 2021 (UTC)

clarify Change propagation

The legend for data synchronization programs states:

Change propagation
Specifies in how many directions changes can be propagated.
  • unidirectional means one-way synchronization of two locations,
  • bidirectional means two-way synchronization of two locations and
  • multidirectional means full synchronization of more than two locations.

However, there are two aspects or models of change propagation. For example, git is a truly distributed system, yet every individual data transfer in it is unidirectional.

On the other hand, there are file synchronization models in which there is a central server with the master version, and clients can only pull data from there (not push). I searched for this topic and could only find a piece of documentation for mutagen: one-way-safe: In this unidirectional synchronization mode, changes are only allowed to propagate from alpha to beta.

In this sense, rsync is bidirectional (not unidirectional, as it is written), or even multidirectional, like how it is written for git-annex (I used that program, and its data propagation works like git).

Maybe the distinction is subjective, but it would be nice to express what you think on that.

Ynikitenko (talk) 10:10, 3 July 2022 (UTC)Reply

Your model where you claim that rsync is bidirectional assumes that there is "a central server with the master version, and clients can only pull data from there (not push)". That does not make sense as the clients cannot change the master version, and if the master version changes due to some other factors, these changes presumably overwrite anything that happened on the clients. This is definitely not bidirectional, let alone multidirectional, as the information cannot go from the client to the master. — Lahwaacz (talk) 17:54, 9 July 2022 (UTC)Reply
"you claim that rsync is bidirectional assumes that there is 'a central server with the master version'": I don't. I say that this is typical for unidirectional synchronization. "the clients cannot change the master version": yes, they can. For rsync, the source and the destination can each be local or remote. I'm pushing my files to the server, and it renames/deletes/overwrites files in my account on the server. I agree that it is probably possible to tune remote rsync to be read-only (though it might be strange), but by default any changes can propagate to and from the remote. Ynikitenko (talk) 18:25, 10 July 2022 (UTC)
You ended one sentence with "unidirectional" and continued with the phrase "On the other hand..." so it seemed the following was not about "unidirectional". If I parsed that differently than you meant, please try to explain yourself better.
I'll just try to explain why rsync is unidirectional. If you change file A in the source and file B in the destination, rsync can't be run with flags to propagate file A from source to destination and file B from destination to source. Unless you have some frontend which can detect this case and run rsync twice with partial file lists – but this is not pure rsync, but a different software.
Lahwaacz (talk) 19:02, 10 July 2022 (UTC)Reply
I see. No, I meant another aspect. I usually do my best to clarify the meaning, and can answer the questions, of course. Thanks for the note!
Now I understand your point; some googling shows me that you are right in this regard. However, I still can't find an "official" definition for these transfers (uni/bi/multi-directional). Do you have a link? Could it be added to the wiki? Ynikitenko (talk) 19:27, 10 July 2022 (UTC)
I don't have a link at hand, but there might be something – I didn't look into this topic in a while. — Lahwaacz (talk) 20:36, 10 July 2022 (UTC)Reply
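
The rsync example from this thread (file A changed in the source, file B changed in the destination) can be modelled with a toy sketch. Everything here is an assumption for illustration: replicas are plain dicts mapping path to content, `one_way` stands in for a single rsync-like run from source to destination, and `two_way` stands in for a tool that remembers the state after the last sync (`base`). Real rsync by default decides what to transfer from file size and modification time, so this is a simplification.

```python
def one_way(src, dst):
    """Propagate src to dst only: every path in src overwrites dst."""
    dst.update(src)


def _copy(src, dst, path):
    """Make dst agree with src for one path, including deletions."""
    if path in src:
        dst[path] = src[path]
    else:
        dst.pop(path, None)


def two_way(a, b, base):
    """Propagate changes in both directions; return paths changed on both sides."""
    conflicts = []
    for path in sorted(set(a) | set(b)):
        a_changed = a.get(path) != base.get(path)
        b_changed = b.get(path) != base.get(path)
        if a_changed and b_changed and a.get(path) != b.get(path):
            conflicts.append(path)  # both sides edited it differently
        elif a_changed:
            _copy(a, b, path)
        elif b_changed:
            _copy(b, a, path)
    return conflicts


base = {"A": "a1", "B": "b1"}    # state after the last sync
source = {"A": "a2", "B": "b1"}  # file A edited on the source
dest = {"A": "a1", "B": "b2"}    # file B edited on the destination

uni_dest = dict(dest)
one_way(source, uni_dest)
print(uni_dest)  # the destination's edit to B is overwritten

bi_source, bi_dest = dict(source), dict(dest)
conflicts = two_way(bi_source, bi_dest, base)
print(bi_source, bi_dest, conflicts)  # both edits survive on both sides
```

In this model a single one-way run cannot carry the edit to B back to the source; only a tool that tracks both sides, or a frontend that runs two one-way transfers with separate file lists, can do that.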