Open Source Projects

Hanzo WARC Tools

The main goal of WARC Tools is to promote adoption of the WARC file format for storing web archives among mainstream web developers. To that end, the project provides an open source software library, a set of command line tools, web server plug-ins, and technical documentation for manipulating and managing web archive files (WARC files).

WARC files are produced by web archiving crawlers, such as Hanzo’s own commercial crawlers and Heritrix, the open-source, extensible, web-scale, archival-quality web crawler developed by the Internet Archive with the Nordic national libraries.

The project is led by Hanzo Archives, in collaboration with the Internet Archive Web team, and is supported by the International Internet Preservation Consortium (IIPC).

WARC Tools is implemented as a set of core libraries, whose functionality is made available to end users as command line tools, extensions to existing tools, and simple web applications for accessing WARC content. In addition, all the libraries have APIs, and dynamic language bindings are available as software libraries for developers.

The library and tools are mostly implemented in ANSI C and are highly portable: they build and install on various Linux and Unix distributions as well as Windows, and they ship with Unix man pages, build and installation guides, developer guides, and similar documentation.
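
For readers unfamiliar with the format these tools handle, the sketch below shows the basic layout of a single uncompressed WARC record: a version line, named header fields, a blank line, a content block of Content-Length bytes, and two trailing CRLFs. It is written in Python for brevity and does not use the WARC Tools API; the function name read_warc_record and the sample record are illustrative assumptions, and real WARC files are often gzip-compressed per record, which this sketch does not handle.

```python
import io


def read_warc_record(stream):
    """Read one uncompressed WARC record from a binary stream.

    Hypothetical helper for illustration only; not part of WARC Tools.
    Returns (version, headers, payload), or None at end of stream.
    """
    version = stream.readline()
    if not version:
        return None
    version = version.strip().decode("utf-8")
    if not version.startswith("WARC/"):
        raise ValueError("not a WARC record: %r" % version)

    # Named fields run until the first blank line.
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()

    # The content block is exactly Content-Length bytes long.
    payload = stream.read(int(headers["Content-Length"]))
    stream.read(4)  # skip the two CRLFs that terminate the record
    return version, headers, payload


# A made-up record, used only to exercise the function above.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
    b"WARC-Date: 2008-01-01T00:00:00Z\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

record = read_warc_record(io.BytesIO(sample))
```

Calling read_warc_record repeatedly on the same stream walks through consecutive records until it returns None, which is the basic loop any WARC reader builds on.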

Hanzo Search Tools

The main goal of Search Tools is to promote adoption of the WARC file format for storing web archives among mainstream web developers. To that end, the project provides an open source software library, a set of command line tools, web server plug-ins, and technical documentation for full-text and metadata search of web archive files (WARC files).

The project is led by Hanzo Archives.

Hanzo s3 Tools

A set of *nix-like command line tools for working with Amazon Web Services S3. They are backed by a simple Python library that can be used to work with the material you have uploaded to S3.