Open Source Projects
Hanzo Archives develops leading-edge commercial web archiving tools and services, some of which have been released as open source. The first project to be released is Hanzo WARC Tools.
Hanzo WARC Tools is an open source implementation of WARC Tools. It is intended to facilitate and promote adoption of the WARC file format (ISO 28500), the standard for storing web archives, by the mainstream web development community. The library comprises a set of command-line tools and Python scripts for manipulating and managing WARC files.
WARC files are produced by web archiving crawlers such as Hanzo’s crawler and Heritrix, the open-source crawler developed by the Internet Archive.
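To give a feel for what such a file contains, the sketch below reads records from an uncompressed WARC byte stream using only the Python standard library. It is an illustration of the record layout (a version line, named header fields, a blank line, then a content block of Content-Length bytes) and is not the Hanzo WARC Tools API. The function name read_warc_records and the sample record are hypothetical, and the sketch assumes no gzip compression and no folded (multi-line) header values.

```python
import io


def read_warc_records(stream):
    """Yield (headers, payload) pairs from an uncompressed WARC byte stream.

    Hypothetical helper for illustration only; not part of Hanzo WARC Tools.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # skip the blank lines that separate records
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")
        headers = {"version": version}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # Content-Length gives the size of the content block in bytes.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# A made-up record, built in memory so the example is self-contained.
body = b"hello archive"
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Record-ID: <urn:uuid:12345678-1234-5678-1234-567812345678>\r\n"
    b"WARC-Date: 2009-01-01T00:00:00Z\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    b"\r\n" + body + b"\r\n\r\n"
)

for headers, payload in read_warc_records(io.BytesIO(sample)):
    print(headers["WARC-Type"], headers["WARC-Target-URI"], len(payload))
```

Real archives are usually gzip-compressed and much larger, so a dedicated library is the better choice for anything beyond inspection.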
The project is led by Hanzo Archives, in collaboration with the Internet Archive Web team, and is supported by the International Internet Preservation Consortium (IIPC).
More information about this project can be found on the Hanzo WARC Tools Project Wiki.