I’m probably not going to undertake this, but I’m considering it. Do you just mean downloading all your submitted posts and attached images? If so, it seems very simple.
The mastodon-archive man page gives an idea of what it does:

usage: mastodon-archive.py [-h] [--quiet]
                           {archive,replies,media,text,context,html,split,expire,report,followers,following,mutuals,whitelist,fix-boosts,login,meow}
                           ...

Archive your toots, favourites and bookmarks, and work with them.

positional arguments:
  {archive,replies,media,text,context,html,split,expire,report,followers,following,mutuals,whitelist,fix-boosts,login,meow}
    archive             archive your toots, favourites and bookmarks
    replies             archive missing toots you replied to
    media               download media referred to by toots in your archive
    text                search and export toots in the archive as plain text
    context             show a toot in context (i.e. with its ancestors and
                        its descendants
    html                export toots and media in the archive as static HTML
    split               split an archive into two
    expire              delete older toots from the server and unfavour
                        favourites if and only if they are in your archive
    report              report some numbers about your toots, favourites and
                        bookmarks
    followers           show followers
    following           find people you are following but who never mention
                        you
    mutuals             find people you are following and who follow you back
    whitelist           print the whitelist to help you debug problems
    fix-boosts          mark all the boosts as not deleted (triggering their
                        deletion)
    login               login to the instance for testing purposes
    meow                import your backup into Meow, a browser-based export
                        viewer (see https://purr.neocities.org/about/)

options:
  -h, --help            show this help message and exit
  --quiet, -q           do not output normal status messages

Once you have created archives in the current directory, you can use 'all'
instead of your account and the commands will be run once for every archive in
the directory.
Something that merely performs the backup function would be a great start, and it is the most useful function. From there it could get quite complex with useful features. Mastodon-archive goes as far as fetching content by others when it is a direct reply to something the user said. I would favor going further and grabbing the whole of any thread the user spoke in.
Some non-essential but useful features in a Lemmy variant might be:
Detect censorship on the user’s comments, which would entail a periodic comparison.
Fetch new versions of edited posts - ideally use a simple version tracking tool.
Find alternative links to the same post, thus tracking down cached versions on other hosts, and perhaps even detecting host-specific censorship.
A flexible restore mechanism, so a user could cross-post, potentially in batches. So if my1sthost.net/c/foo vanishes, the posts could be reconstituted on my2ndhost.net/c/foo. (Note that someone attributed their decision not to use Lemmy to the lack of data portability.)
Shadowban detection: run as logged-in and also run cookieless, and flag differences.
Non-Lemmy support (Kbin, Mbin, PieFed).
A MOVE option, which would make a local copy and then delete the original from the server. This could get tricky because a user might want to do this only on messages that have no replies.
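Several of the features above (censorship detection, edit tracking, shadowban detection) boil down to comparing two views of the same set of comments. A minimal sketch of that comparison, assuming each view has already been reduced to a plain dict of comment id to text; the names `SnapshotDiff` and `diff_snapshots` are made up for illustration and are not part of any existing tool:

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotDiff:
    """Result of comparing two {comment_id: text} snapshots."""
    missing: list = field(default_factory=list)  # in the old view, absent from the new one
    changed: list = field(default_factory=list)  # in both views, but the text differs
    added: list = field(default_factory=list)    # only in the new view


def diff_snapshots(old: dict, new: dict) -> SnapshotDiff:
    """Compare an earlier snapshot (or the logged-in view) against a later
    snapshot (or the cookieless view) and report the differences."""
    diff = SnapshotDiff()
    for comment_id, text in old.items():
        if comment_id not in new:
            diff.missing.append(comment_id)
        elif new[comment_id] != text:
            diff.changed.append(comment_id)
    diff.added = [comment_id for comment_id in new if comment_id not in old]
    return diff
```

For periodic censorship or edit checks, `old` would be the local archive and `new` a fresh fetch. For shadowban detection, `old` would be what the logged-in session sees and `new` what an anonymous request sees, so anything in `missing` is worth a closer look. A changed text only says that something differs; telling a moderator removal from the author's own edit would need more fields than this sketch keeps.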
In any case, a simple one-off backup would not need a huge effort. I suspect it would start with the get_user API call (https://lemmy.readme.io/reference/get_user). The natural next advancement would be to run the same job multiple times and expect it not to re-fetch content it has already fetched. For some reason that’s a little messy on Mastodon… not sure about Lemmy. On Mastodon, Kensanata’s code decides that once it has seen an already-archived message 5 times, it need not crawl deeper. That feels a bit ad hoc and non-deterministic.
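One way the re-run could avoid the "seen it 5 times" heuristic is a page-level stop rule. This is only a sketch under assumptions: the listing is sorted newest-first, the previous run finished completely, and `fetch_page` is a hypothetical callable you would write around the get_user call that returns a list of dicts with an `id` key for a given page number. None of these names come from Lemmy or mastodon-archive.

```python
from typing import Callable, Iterable


def fetch_new_items(
    fetch_page: Callable[[int], list],
    archived_ids: Iterable[int],
    max_pages: int = 1000,
) -> list:
    """Walk pages newest-first and collect items not yet in the archive.

    Stops at the first empty page or the first page that contains nothing
    new, so the stopping point depends only on the archive contents.
    """
    seen = set(archived_ids)
    new_items = []
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break  # ran off the end of the listing
        fresh = [item for item in items if item["id"] not in seen]
        if not fresh:
            break  # whole page already archived; older pages should be too
        new_items.extend(fresh)
        seen.update(item["id"] for item in fresh)
    return new_items
```

With the archive holding ids 1 to 3 and the server now listing 5, 4, 3, 2, 1 across pages of two, this fetches pages 1 and 2 and returns the items with ids 5 and 4. The rule does not notice edits to old comments, so edit or removal detection would still need a separate full pass from time to time.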