For years I’ve looked, on and off, for web archiving software that can capture most sites, including “complex” ones that rely heavily on AJAX and require logins, like Reddit. Which ones have worked best for you?

Ideally I want one that can be started programmatically or from the command line, opens a Chromium instance (or any browser), and captures everything shown on the page. I could also open the instance myself to log into sites and install add-ons like uBlock Origin. (By the way, archiveweb.page must be started manually.)

  • klangcola@reddthat.com
    3 days ago

    SingleFile provides a faithful representation of the original webpage, so bloated webpages are indeed saved as bloated HTML files.

    On the plus side you’re getting an exact copy, but on the downside an exact copy may not be necessary and takes a huge amount of space.

    • N0x0n@lemmy.ml
      3 days ago

      You’re right! And because OP wants to archive Reddit pages, I propose an alternative to reduce that bloated site to a minimum :).

      From my tests, it can go from 20 MB to 700 bytes. IMO that’s still big for a chat conversation, but the readability of the alternative front-end is a plus!