Archiving Internal Web Content


When archiving web content, we use different methods depending on the type of content. Each client may require a different service to suit their needs, so here at Hanzo we’ve developed software with sufficient breadth for various types of web archiving. In this blog post we’re going to take a look at internal web content and how we go about archiving it.

What is ‘internal’ web content?

Internal web content can be defined by its type, its location or its use case. In our experience, it includes password-protected websites, dynamic websites and form results, and internal company resources such as wikis, SharePoint and business social networks. More simply: your company intranet.

A great example of how we archive internal content is the work we did for LOCOG (London Organising Committee of the Olympic and Paralympic Games). The committee had a secure SharePoint site that they wished to have archived. It was used to organize the games and included web pages, media, wikis, shared collaborative workspaces, projects and documents.

How do we do it?

Hanzo’s crawlers were configured with the necessary security credentials to access the site securely and capture it. The advantage that Hanzo has over other archiving products is that it can work with a broad range of the security systems frequently needed for access to internal web content. Furthermore, Hanzo does not just capture the documents and associated data in SharePoint; it makes a working replica of the site, with all the content, media, documentation and information presented as they appeared on the live site itself.

Our software’s crawlers are configured to the requirements of the client and are deployed to the site in the same way a person would access it. They capture the site content and write it to ISO 28500:2009 WARC files, which are then scanned for viruses and malware. Afterwards, reports and indexes are created based on the captured content.
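To give a feel for what a capture looks like on disk, here is a minimal sketch of how one captured HTTP response could be serialized as a WARC/1.0 "response" record. It is an illustration only, not Hanzo's crawler code: the function name build_warc_response_record and the example intranet URL are our own assumptions, and a real crawler would also write request, metadata and digest information.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_response_record(target_uri: str, http_response: bytes,
                               captured_at: datetime) -> bytes:
    """Serialize one captured HTTP response as a WARC/1.0 'response' record.

    Hypothetical helper for illustration; not part of any Hanzo product.
    """
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        # WARC dates are UTC timestamps in ISO 8601 form.
        ("WARC-Date",
         captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        # Content-Length counts the bytes of the record block only.
        ("Content-Length", str(len(http_response))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line ends the header; two CRLFs end the record.
    return head + CRLF + http_response + CRLF + CRLF


if __name__ == "__main__":
    # Assumed example payload: the raw HTTP response as received by the crawler.
    payload = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
               b"<html>Intranet home</html>")
    record = build_warc_response_record(
        "https://intranet.example.com/", payload,
        datetime(2012, 8, 1, 12, 0, tzinfo=timezone.utc))
    print(record.decode("utf-8"))
```

Because the record stores the full HTTP response, headers included, the page can later be replayed as it was served rather than as a loose file.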

One index consists of various metadata fields, and a full-text index is also produced for traditional search and discovery purposes. Together with the WARC files, the indexes are used by Hanzo’s access software to make the captured material available to users.
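As a rough illustration of the two kinds of index described above, the sketch below builds a small metadata index and a simple inverted full-text index from a list of captures. The field names (uri, captured_at, content_type, text) and the functions build_indexes and search are assumptions made for this example; they do not describe the fields or software Hanzo actually uses.

```python
import re
from collections import defaultdict


def build_indexes(captures):
    """Return (metadata_index, full_text_index) for a list of capture dicts.

    Each capture is assumed to have 'uri', 'captured_at', 'content_type'
    and 'text' keys (hypothetical field names).
    """
    metadata_index = {}
    full_text_index = defaultdict(set)
    for capture in captures:
        uri = capture["uri"]
        # Metadata index: one entry of descriptive fields per captured URI.
        metadata_index[uri] = {
            "captured_at": capture["captured_at"],
            "content_type": capture["content_type"],
        }
        # Full-text index: map each lowercased word to the URIs containing it.
        for word in re.findall(r"[a-z0-9]+", capture["text"].lower()):
            full_text_index[word].add(uri)
    return metadata_index, dict(full_text_index)


def search(full_text_index, query):
    """Return the sorted URIs whose text contains every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(full_text_index.get(w, set()) for w in words))
    return sorted(hits)
```

A search for several words returns only the captures that contain all of them, and the metadata index can then supply the capture date and content type for each hit.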

This access system gives the client control over archived content and system functionality, and it also allows varying permission levels.

So despite the sophisticated nature of today’s collaborative systems, wikis, and intranets, Hanzo can accurately preserve such websites and all of their complicated inner workings. In LOCOG’s case, an archived SharePoint website is an asset for both legal and historical reasons, and many companies may benefit from the same.

Find out more…

Hanzo’s range of products can archive a variety of website and social media content, giving you protection against litigation. Get in contact with us to discuss your web-archiving needs.
