20 Feb 08
Hanzo releases open source WARC Tools
With the generous support of the International Internet Preservation Consortium (IIPC), the people behind the world's largest and most comprehensive web archives, Hanzo has developed an extensive open source library and tools for the creation and manipulation of ISO standard compliant WARC-based web archives.
As the new ISO standard for WARC (Web Archive file format) nears completion, Hanzo's release of an advanced software library, 'libwarc', together with command line tools and web services to exploit this new format, will enable software developers, libraries and archivists around the world to easily migrate to and support this new standard.
WARC Tools have been made available to the development community under the Apache License 2.0; download the source code, read the documentation and join the discussion here:
- WARC Tools Project Home: http://code.google.com/p/warc-tools
- WARC Tools Mailing List: http://groups.google.com/group/warc-tools
The core component of WARC Tools is 'libwarc', an extensible and fast C library that can:
- Read WARC's predecessor format, ARC, originating from the Internet Archive and used extensively throughout the world
- Read WARC files
- Write WARC files
- Provide a range of powerful record-level iterators
- Provide SWIG-based APIs for a range of dynamic languages and Java
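To illustrate what a record-level iterator over a WARC file involves, here is a minimal sketch in Python. It is not libwarc's API or its SWIG bindings; the function name `iter_warc_records` and the helper `make_record` are hypothetical, and the sketch assumes an uncompressed stream laid out in the WARC/1.0 style later published as ISO 28500 (a version line, named header fields, a blank line, then `Content-Length` bytes of content). Folded header lines and gzip compression are deliberately ignored.

```python
import io


def iter_warc_records(stream):
    """Yield (version, headers, payload) for each record in an uncompressed WARC stream.

    Illustrative only: assumes one header field per line and a Content-Length field.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # skip the blank lines that separate records
        version = line.strip().decode("ascii")
        if not version.startswith("WARC/"):
            raise ValueError(f"expected a WARC version line, got {version!r}")

        # Read named fields up to the blank line that ends the record header.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The content block is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield version, headers, payload


def make_record(record_type, body):
    """Build a tiny in-memory record for the demonstration below (hypothetical helper)."""
    head = f"WARC/1.0\r\nWARC-Type: {record_type}\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("utf-8") + body + b"\r\n\r\n"


sample = make_record("warcinfo", b"software: example") + make_record("resource", b"hello")
records = list(iter_warc_records(io.BytesIO(sample)))
for version, headers, payload in records:
    print(version, headers["WARC-Type"], len(payload))
```

Written as a generator, the sketch holds only one record in memory at a time, which is the property that makes record-level iteration practical for very large archive files.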
In addition, a number of command line tools provide high-level functions:
- ARC to WARC migration -- solving a problem facing many of the original web archiving institutions
- WARC verification
- WARC web service -- access the functionality of libwarc over the web
- mod_warc -- fast web access to WARC records, enabling WARC archive access tools development
Developing WARC Tools is in line with Hanzo's aim of commoditising web archiving software and services to enable the widespread collection and preservation of Web content.
"Hanzo began development of libwarc as a means to embrace the emerging standard for web archive files, thereby enabling seamless interoperability across web archives at a fundamental level," said Mark Middleton, CEO of Hanzo Archives Limited.
"IIPC are happy to support the development of WARC Tools. This is an extremely important development for the web archiving community as the tools make a significant contribution to the overall standardisation effort and de-risk the migration of important archives to the new standard," said Gildas Illien, Technical Officer of IIPC.
Hanzo continues to develop the library and tools, focussing on documentation and on high-level command line and web-based services. In addition, Hanzo will incorporate libwarc within its own products in the coming months.
WARC Tools are important to the web archiving community as they will make a significant contribution to the standardisation effort and de-risk the migration of important archives to the new standard.