Multi-Page Application Archive

Crawls a Multi-Page Application (MPA) into a zip file, then serves the application from that zip file. An MPA archiver that could also be used as a site generator.

Installation

npm install -g mpa-archive

Usage

Crawling

mpa http://example.net

Crawls the URL recursively and saves it in example.net.zip. Once done, it displays a report and can serve the files from the zip.

Serving

mpa

Creates a server for each zip file in the current directory. The host is localhost, with a port seeded from the zip file path.
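To illustrate what a port seeded from the zip file path can look like, here is a minimal sketch. It is an assumption, not mpa-archive's actual algorithm: the helper name `portForZip`, the FNV-1a hash, and the port range are all hypothetical. The point is only that the same path always maps to the same port, so each archive keeps a stable address between runs.

```javascript
// Hypothetical helper, not part of mpa-archive: maps a zip path to a stable port.
function portForZip(zipPath, min = 1025, max = 65535) {
  // FNV-1a 32-bit hash over the UTF-8 bytes of the path (assumed hash choice).
  let hash = 0x811c9dc5;
  for (const byte of new TextEncoder().encode(zipPath)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value unsigned 32-bit
  }
  // Fold the hash into the inclusive range [min, max].
  return min + (hash % (max - min + 1));
}

// Same path in, same port out.
console.log(portForZip('/srv/sites/example.net.zip'));
```

Two different paths can still hash to the same port, so a real server would need a fallback when the port is already in use.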

Features

It uses headless puppeteer

Crawls http://example.net with cpu count / 2 threads

Progress is displayed in the console

Fetches sitemap.txt and sitemap.xml as a seed point

Reports HTTP status codes other than 200, 204, 206, and 304

Crawls on-site URLs only, but will fetch external resources

Intercepts site resources and saves them too

Generates mpa/sitemap.txt and mpa/sitemap.xml

Saves site sourcemaps

Can resume if the process exits; saves a checkpoint every 250 URLs
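Two of the features above, crawling on-site URLs only and checkpointing every 250 URLs, can be sketched with a small crawl frontier. This is an illustration under assumptions, not mpa-archive's implementation: the `Frontier` class, its method names, and the snapshot format are hypothetical.

```javascript
// Hypothetical crawl frontier; names and structure are illustrative only.
class Frontier {
  constructor(rootUrl, checkpointEvery = 250) {
    const start = new URL(rootUrl);
    this.origin = start.origin;          // only this origin is crawled
    this.seen = new Set([start.href]);   // every URL ever queued
    this.queue = [start.href];           // URLs still waiting to be visited
    this.done = 0;                       // pages finished so far
    this.checkpointEvery = checkpointEvery;
  }

  // Queue a link found on `pageUrl` if it is on-site and not seen before.
  add(link, pageUrl) {
    let target;
    try {
      target = new URL(link, pageUrl);   // resolves relative links
    } catch {
      return false;                      // unparsable links are skipped
    }
    target.hash = '';                    // fragments point at the same page
    if (target.origin !== this.origin || this.seen.has(target.href)) return false;
    this.seen.add(target.href);
    this.queue.push(target.href);
    return true;
  }

  // Next URL to visit, or undefined when the queue is empty.
  next() {
    return this.queue.shift();
  }

  // Call after each page; returns true when a checkpoint is due.
  markDone() {
    this.done += 1;
    return this.done % this.checkpointEvery === 0;
  }

  // Serializable state that a later run could load to resume.
  snapshot() {
    return JSON.stringify({ seen: [...this.seen], queue: this.queue, done: this.done });
  }
}

const frontier = new Frontier('http://example.net');
frontier.add('/about', 'http://example.net/');                 // queued
frontier.add('http://other.example/x', 'http://example.net/'); // skipped: external
console.log(frontier.queue);
```

A crawler built on this would write `snapshot()` to disk whenever `markDone()` returns true, and rebuild the frontier from that file on restart.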

to consider