ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline.

Without active preservation effort, everything on the internet eventually disappears or degrades. Archive.org does a great job as a free central archive, but they require all archives to be public, and they can't save every type of content.

ArchiveBox is an open source tool that helps you archive web content on your own (or privately within an organization): save copies of browser bookmarks, preserve evidence for legal cases, back up photos from FB / Insta / Flickr, download your media from YT / Soundcloud / etc., snapshot research papers & academic citations, and more…

➡️ Use ArchiveBox as a command-line package and/or self-hosted web app on Linux, macOS, or in Docker.

You can feed ArchiveBox URLs one at a time, or schedule regular imports from browser bookmarks or history, feeds like RSS, bookmark services like Pocket/Pinboard, and more.

It saves snapshots of the URLs you feed it in several redundant formats. It also detects any content featured inside each webpage & extracts it out into a folder:

- HTML/Generic Websites -> HTML, PDF, PNG, WARC, Singlefile
- Audio/video (YT / Soundcloud / etc.) -> MP3/MP4 + subtitles, description, thumbnail
- news articles -> article body TXT + title, author, featured images

It uses normal filesystem folders to organize archives (no complicated proprietary formats), and offers a CLI + web UI.

ArchiveBox is used by many professionals and hobbyists who save content off the web, for example:

- Individuals: backing up browser bookmarks/history, saving FB/Insta/etc.
- Journalists: crawling and collecting research, preserving quoted material, fact-checking and review.
- Lawyers: evidence collection, hashing & integrity verifying, search, tagging, & review.
- Researchers: collecting AI training sets, feeding analysis / web crawling pipelines.

The goal is to sleep soundly knowing the part of the internet you care about will be automatically preserved in durable, easily accessible formats for decades after it goes down.

Key features:

- Free & open source, doesn't require signing up online, stores all data locally.
- Powerful, intuitive command line interface with modular optional dependencies.
- Comprehensive documentation, active development, and rich community.
- Extracts a wide variety of content out-of-the-box: media (yt-dlp), articles (readability), code (git), etc.
- Supports scheduled/realtime importing from many types of sources.
- Uses standard, durable, long-term formats like HTML, JSON, PDF, PNG, MP4, TXT, and WARC.
- Usable as a oneshot CLI, self-hosted web UI, Python API (BETA), REST API (ALPHA), or desktop app (ALPHA).
- Saves all pages to archive.org as well by default for redundancy (can be disabled for local-only mode).
- Advanced users: support for archiving content requiring login/paywall/cookies (see wiki security caveats!).
- Planned: support for running JS during archiving to adblock, autoscroll, modal-hide, thread-expand.

Contact us if your non-profit institution/org wants to use ArchiveBox professionally.

- setup & support, team permissioning, hashing, audit logging, backups, custom archiving etc.
- for individuals, NGOs, academia, governments, journalism, law, and more…

All our work is open-source and primarily geared towards non-profits. Support/consulting pays for hosting and funds new ArchiveBox open-source development.

Supported OSs: Linux/BSD, macOS, Windows (Docker)
CPUs: amd64 (x86_64), arm64 (arm8), arm7 (raspi>=3)
Note: On arm7, the playwright package, which provides easy Chromium management, is not yet available.

✳️ Easy Setup

docker-compose (macOS/Linux/Windows), recommended

Docker Compose is recommended for the easiest install/update UX + best security + all the extras working out-of-the-box.

1. Install Docker and Docker Compose on your system (if not already installed).
2. Download the docker-compose.yml file into a new empty directory (can be anywhere).
3. Run the initial setup and create an admin user:
       docker compose run archivebox init --setup
4. Optional: Start the server, then log in to the Web UI ⇢ Admin. This step is completely optional; the CLI can always be used without running a server.
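The Docker Compose setup steps above can be sketched as a short shell script. This is a minimal sketch, assuming Docker and Docker Compose are already installed and that docker-compose.yml has already been downloaded into the current (new, empty) directory. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences for this example, not part of ArchiveBox or Docker. The text above names no command for starting the server, so `docker compose up` (the standard Docker Compose command) is an assumption here, as is the `add` example with its placeholder URL.

```shell
#!/bin/sh
# Sketch of the setup steps, meant to be run from the directory that
# holds docker-compose.yml.
#
# DRY_RUN is a hypothetical safety switch for this example only:
# while it is 1 (the default here), commands are printed, not executed.
# Set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run the initial setup and create an admin user.
run docker compose run archivebox init --setup

# The CLI can be used without running a server, for example to add a URL
# (assumed example; the URL is a placeholder).
run docker compose run archivebox add 'https://example.com'

# Optional: start the server, then log in to the Web UI (assumed command).
run docker compose up
```

Printing by default keeps the sketch safe to try on a machine without Docker; with `DRY_RUN=0` each line is passed through unchanged to `docker compose`.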