The Hanzo Blog
30 Apr 10
Professional Services Engineer Wanted
Note to agencies: Do not contact us about this vacancy. If we want to use an agency, we'll contact you!
About the Company
We are an innovative company, providing web archiving services across various sectors in the UK and USA. Our clients range from government institutions to Fortune 100 businesses, including globally recognised brands such as The Coca-Cola Company.
Job Summary
Reporting to the Chief Technical Officer, the Professional Services Engineer will primarily work alongside clients in delivering our software solution. You will work on the deployment and support of our advanced archival crawler, crawler tools, and the technical side of the archiving operations for customers. You will ensure the delivery of high-quality services to our customers. This will include writing plans and documentation, operating and supporting our software at a technical level, and working within client guidelines and contractual arrangements.
How to Apply
Please forward a covering letter stating your interest and suitability along with your CV and contact details (such as home address, mobile number, email, IM/skype) to:
Interviews will be held in August/September 2010.
30 Apr 10
Retaining Control of Social Media
In the near future, a single dedicated website will seem as old-fashioned as newspaper and television ads. The cloud and the social web are forcing all our data off our own machines and onto those of bigger companies. Sometimes that’s simply because it’s less hassle (who wants to be operating a data center anyway?), and at other times it’s because we want to take advantage of the ability of social media to engage with our customers. We want to hear what they have to say and get instant feedback on our products and services.
Of course, this causes all sorts of problems. When everything was on your own machines, you might just have been able to keep track of it: you could use backups to see what you did in the past, and carefully constructed processes to track what you published and keep appropriate records. Now you are publishing to Facebook and Twitter, commenting on blogs, and who knows what else, and that makes keeping records, backups and so on an impossibility.
This is where web archiving can let you regain control. By monitoring and archiving every little thing you put out onto the web, and by archiving the links between all of these things (and your customers), you can build a comprehensive, one-stop collection of everything you do. You can track back over time to see what you said and when. If you have compliance obligations or legal issues, you can “dial back” and see exactly what was communicated, when and how. Your marketing team can see how they presented things at any given time and use it as inspiration or to help them stay fresh. You can keep and search records of everything your customers say about you on social media sites. In short, web archiving returns ownership of vital corporate information to you rather than leaving it in the hands of third parties like Google, Facebook and Twitter.
28 Apr 10
Authentication and Admissibility of Website Printouts
A customer’s attorney asked us to outline admissibility issues concerning printouts directly from the web.
Printouts directly from the web are easily challenged because there is no inherent authentication. A printout can be fabricated easily, with little skill. Critically, a printout (whether on paper, a PDF, or a screenshot/image of a page) does not contain forensic information that can be used as proof of authenticity. Furthermore, the live web page at the URL shown on a printout may or may not display the same information as the printout, so its presence on the live web is not proof that the printout is authentic. There is no tangible connection between a printout and the live web page. A printout, PDF or screenshot may easily be tampered with so that it matches, or does not match, the web page. I would argue strongly that a printout of a live web page is the weakest form of evidence there is.
If the printout (or PDF or screenshot) is of an archived web page from Hanzo Archives, then it can be shown that the printout can be reproduced time and again, on demand, because the archive is properly preserved. There is a strong link between the printout and the archive. Moreover, the archive has a chain of evidence that proves it is an authentic capture of the live page at a particular point in time, and that the archive data has not been tampered with since the archive was created. In this way, a printout backed up by a proper archive is a strong form of evidence.
The strongest evidence, of course, is the archive itself, provided it is in native form, with all the attendant forensic information, such as timestamps, digests, and so on.
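To illustrate what a digest adds that a printout lacks, here is a minimal Python sketch, assuming SHA-256 as the digest algorithm and an ISO 8601 UTC timestamp. The record layout and the names `capture_record` and `verify` are hypothetical and are not Hanzo's archive format. Recomputing the digest over the stored bytes shows whether they have changed since capture.

```python
import hashlib
from datetime import datetime, timezone


def capture_record(url: str, payload: bytes) -> dict:
    """Hypothetical capture record: URL, UTC capture time, SHA-256 of the exact bytes."""
    return {
        "url": url,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def verify(record: dict, payload: bytes) -> bool:
    """True only if the payload is byte-for-byte what was captured."""
    return hashlib.sha256(payload).hexdigest() == record["sha256"]


if __name__ == "__main__":
    page = b"<html><body>Fund price: 10.42</body></html>"
    rec = capture_record("http://www.example.com/funds", page)
    print(verify(rec, page))                                # unchanged bytes
    print(verify(rec, page.replace(b"10.42", b"11.42")))    # one digit altered
```

A digest is only as trustworthy as the record that holds it, so in practice such records would themselves need protecting, for example by storing them separately from the content or signing them.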
28 Apr 10
Web Archive Provides Instant Access
Happy to hear that in the first quarter of deployment, a FINRA-regulated client put their web archive to good use following a recent incident concerning the display of incorrect fund data. Their inside counsel were pleasantly surprised to be able to review archived websites captured daily over the last month, and to be able to identify each website and page affected from the day the error occurred through to its resolution. Instantly - no back-up tapes to find, no restores, no delays.
Interesting ROI there.
11 Jan 10
Hanzo at LegalTech NY 2010
Hanzo will be at LegalTech NY 2010, exhibiting in booth 536.
Visit us to find out why leading banks choose Hanzo to archive their websites for FINRA and SEC 17a-4 compliance, or why the world’s most successful brand uses Hanzo to archive its online branding and promotions activities.
If you require FINRA or SEC compliance for your websites, or if your websites are included in your litigation readiness plans, visit Hanzo in booth 536 to find out how.
13 Nov 09
Hanzo takes archiving of websites to a forensic level
Bruce Wilson is a law, technology, and business development consultant whose experience includes leadership roles in business consulting, law and IT. Following our meeting at LegalTech NY in February, and a couple of conversations this fall, Bruce has just published this great interview:
http://wilsonig.com/2009/11/12/forensic-archiving-and-search-of-web-2-0-sites/
Thanks Bruce!
01 Sep 09
Python Developers Wanted
Note to agencies: Do not contact us about this vacancy as we do not use agencies as a rule. If we ever do, we’ll contact you!
About the Company
Hanzo Archives Limited is a small, cutting-edge web archiving software and service company providing website e-discovery and compliance solutions to the corporate Global 500 market. Founded and currently operating in Europe, we are now set to expand into the North American markets following early commercial successes.
Job Summary
Reporting to the Chief Technical Officer, the Python software developer will primarily help with the development of our advanced archival crawler, crawler tools, and the technical side of the archiving operations for customers. The software engineer will ensure that high-quality software products are produced and that we continue to deliver innovative, high-quality services. This will include writing software products and tools to help with these tasks.
How to Apply
Please forward a covering letter stating your interest and suitability along with your CV and contact details (such as home address, mobile number, email, IM/skype) to:
Interviews will be held in London in September/October 2009.
Job Description
Roles and Responsibilities
- Maintain and enhance existing software (both internal products and our open source projects)
- Work on crawler operations, diagnosing crawler short-comings and working to resolve them
- Rigorously document your work
- Design applications to capture archive material, including dynamic and rich media material, from the web
- Translate feedback from operations into software development to enhance our technology base
- Solve problems and think laterally, both individually and as part of a team, to meet the needs of the company
- Communicate systematically and at the right time
- Work proactively and enthusiastically seek problems in the software and systems and find solutions
Skills and Abilities Required for the Role
- Ability to diagnose technical problems effectively
- Ability to work in a startup environment and work on any, sometimes disparate, tasks that need to be completed in a timely manner
- Ability to document software rigorously
- Ability to work with and without supervision
- Ability to be a team player
- Ability to communicate and ask for advice when needed
- Ability to actively seek problems and find solutions
Person Specification - Essential
- Demonstrated ability to write quality code
- Demonstrated ability to understand and work with other people’s code
- Demonstrated ability to solve technical problems
- Experience in Python and JavaScript
- Experience in C an advantage
- Understanding of Regular Expressions
- Strong Unix or Linux background including knowledge of tools like grep, find and awk
- In-depth understanding of HTTP and the web
- Ability to write clearly
- Ability to be responsible and self-motivated
- Eagerness to learn and solve problems
- Willing to firefight where necessary
Education/Qualifications and Experience
- A minimum of a Bachelor’s degree in a computing-related subject
- 1-3 years’ experience within the software or web industry
17 Jun 09
Job Vacancy: Contract Crawl Engineer
Hanzo are seeking bright, enthusiastic, self-motivated software engineers with strong problem-solving skills, who love a challenge and are keen to work on the delivery of our highly technical crawler operations and on the development of our products. A proven ability in Python and JavaScript and knowledge of the workings of the web are essential. Strong Unix or Linux skills, including scripting with command-line tools like find, grep and awk, will be important.
This vacancy is closed. Thanks!
25 Mar 09
Celebrating Ada Lovelace Day
Today I’d like to celebrate Ada Lovelace Day with a brief mention of these great women in technology:
- Kris Carpenter, Director, Web Archive
- Kristine Hanna, Director, Web Archiving Services (she was co-founder of GeekGirls too, for goodness’ sake)
- Molly Bragg, Partner Specialist, Web Archiving Services
These creative and brilliant women work tirelessly to collect and preserve the public web for our good friends, and yours too incidentally, the Internet Archive, whose simple motto sums up their contribution to technology and society so well: “Universal Access to All Human Knowledge (for Free, for Ever).”
from Mark
17 Mar 09
World Wide Web of Humanities Presentation at University of Oxford
Mark Middleton of Hanzo will present Search and Analysis of Data in WWWoH at the “Humanities on the Web: Is it working?” workshop at the Tsuzuki Lecture Theatre, St Anne’s College, Oxford, on 19 March 2009. This presentation is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License.
The presentation is a summary of our open source Search Tools project. A demo of the search tools is here.
Permissions beyond the scope of this license may be available; please contact us for more information.
12 Mar 09
Hanzo Archives show web archiving at LegalTech
From Hanzo Archives show web archiving at LegalTech « Chris Dale Lawyer Support
Short but absolutely on-the-button summary of Hanzo's selling message on Chris Dale's blog. Which, incidentally, is one of the few must-read blogs on e-discovery from the UK.
With web content now under greater scrutiny, web content collection and preservation for compliance and litigation should be incorporated into records management practices. Hanzo's Webhold and Hanzo Enterprise products meet these requirements.
10 Feb 09
Hanzo at LegalTech NY 2009
Hanzo exhibited at LegalTech NY in Feb 2009. Here's what we learned:
- Lawyers, litigation support people, and records managers didn't know it was possible to archive websites in any way other than backup tapes
- Backup tapes are a nightmare when it comes to reviewing or producing website content
- These same people also didn't know it was possible to review and produce archived websites in native form, still browsable and searchable
- Intranet archiving is a HOT topic for the larger corporations
- Records managers are our friends
I hope to write more on these topics on our website and blog.
02 Dec 08
How Websites Differ From Other Electronic Files
Regarding my previous post, covering Judge Hedges’ decision:
From Hanzo Archives - Finding “No Reason to Treat Websites Differently than Other Electronic Files,” Court Grants Adverse Inference for Failure to Preserve Website : Electronic Discovery Law
I think it is important to recognise that while Judge Hedges is absolutely right from a legal perspective, it doesn't automatically follow from a technical perspective. This needs some explaining.
Websites are compound, complex, interconnected and hyperlinked collections of compound, complex, interconnected and hyperlinked documents. Lots of moving parts (and syllables).
This is quite different from other ESI. An electronic file such as a Word or Excel document, for example, corresponds to a single document; an email is an envelope containing a single message, with metadata and attachments. A web-based document, on the other hand, more often than not consists of many files: an HTML page, JavaScript code, style sheet(s), images, embedded media (possibly streaming), and links to other parts of the same HTML file, to other HTML files or documents, or to other websites. Websites are not the same!
At the human level, courts view web-based documents the same way as your average human reader does, i.e. as the compound document described above, not as its individual component parts. To preserve such documents in their native form, it is necessary to collect all the component parts correctly and store each and all of them unchanged.
As such, file-oriented methods for preservation are clearly inappropriate for websites.
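To make the "many files" point concrete, here is a minimal sketch using only Python's standard library that lists the separate files a single HTML page pulls in, and separately the pages it links to. The tag list is an illustrative assumption and far from complete (CSS `url(...)` references, `srcset`, and script-generated requests are ignored), and `ComponentFinder` is a hypothetical name; this is not how Hanzo's crawler works.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag -> attribute pairs that commonly reference a page's component files.
# Illustrative only; every <link href> is treated as a component here,
# whereas a real crawler would inspect the rel attribute.
RESOURCE_ATTRS = {
    "img": "src",
    "script": "src",
    "link": "href",    # style sheets, icons
    "iframe": "src",
    "embed": "src",
    "source": "src",   # audio/video sources
}


class ComponentFinder(HTMLParser):
    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.components = set()  # files needed to render this page
        self.links = set()       # hyperlinks to other pages

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.add(urljoin(self.base_url, attrs["href"]))
        name = RESOURCE_ATTRS.get(tag)
        if name and attrs.get(name):
            # Resolve relative references against the page's own URL.
            self.components.add(urljoin(self.base_url, attrs[name]))


if __name__ == "__main__":
    finder = ComponentFinder("http://www.example.com/index.html")
    finder.feed('<link rel="stylesheet" href="/css/site.css">'
                '<script src="app.js"></script>'
                '<img src="images/logo.png"><a href="/about.html">About</a>')
    print(sorted(finder.components))
    print(sorted(finder.links))
```

Even this tiny page depends on three files besides the HTML itself, each of which would have to be fetched and stored unchanged for the page to be preserved in native form.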
30 Oct 08
Websites Are Like Any Other ESI
Re: Arteria Prop. Pty Ltd. v. Universal Funding V.T.O., Inc., 2008 WL 4513696 (D.N.J. Oct. 1, 2008):
From Finding "No Reason to Treat Websites Differently than Other Electronic Files," Court Grants Adverse Inference for Failure to Preserve Website : Electronic Discovery Law
This is a great decision for Hanzo customers. Here are the key issues raised:
- You are responsible for your website, no matter who maintains or hosts it
- If you reasonably anticipate litigation you are required to preserve your website - "litigation hold"
This decision clearly underlines our product strategy.
We've designed our web archiving tool for exactly this scenario. As the responsible owner of your websites, you should have records and information management policies in place to systematically archive your websites in a "web archive". You can't rely on the developers or agencies involved in developing or hosting them.
Hanzo archives any number of websites, from multiple URLs, CMS, databases and technologies, according to an agreed archive policy, and stores them in a secure, authenticating web archive. The web archive is an independent store for all your website content, enabling you to retain it according to your information management policies. This requires no additional effort by, or consent from, your developers, website designers, marketing agencies or hosting partners.
Secondly, the web archive provides a litigation hold for any or all of the web archive content. Moreover, it is fully browsable, searchable and exportable, enabling discovery of your web resources in a fraction of the time taken using traditional preservation methods.
A more complete description of the case and the decision is on the Electronic Discovery Law blog.
21 Oct 08
Producer required to re-produce .TIFF documents in a “reasonably searchable” format
Producing web resources in native format can be burdensome. But we’ve changed that dramatically with our web e-discovery products. Don’t expect to get away with this anymore…
Hanzo uses client-side archiving technology, including proprietary web crawlers, APIs and plug-ins, that enables preservation of web resources in their native format: exactly the same format presented to browsers.
These resources are stored in archive files together with metadata verifying their authenticity. The archive files are ingested into a web archive and indexed. These can then be browsed the same way as the original website, along any captured timeline, and searched across full text, metadata and time.
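Browsing a captured timeline comes down to answering one question per URL: which capture was current at a given moment? Here is a minimal Python sketch of that lookup, assuming captures are indexed per URL by capture time; `CaptureTimeline` and the record ids are hypothetical and not part of Hanzo's products.

```python
from bisect import bisect_right
from datetime import datetime


class CaptureTimeline:
    """Hypothetical per-URL index of capture times, kept in sorted order."""

    def __init__(self):
        self.times = []    # sorted capture datetimes
        self.records = []  # archive record ids, parallel to self.times

    def add(self, when: datetime, record_id: str) -> None:
        # Insert while keeping both lists sorted by capture time.
        i = bisect_right(self.times, when)
        self.times.insert(i, when)
        self.records.insert(i, record_id)

    def as_of(self, when: datetime):
        """Record id of the latest capture at or before `when`, or None."""
        i = bisect_right(self.times, when)
        return self.records[i - 1] if i else None


if __name__ == "__main__":
    tl = CaptureTimeline()
    tl.add(datetime(2008, 10, 1), "rec-001")
    tl.add(datetime(2008, 10, 3), "rec-002")
    print(tl.as_of(datetime(2008, 10, 2, 12)))  # the 1 October capture
    print(tl.as_of(datetime(2008, 9, 30)))      # before the first capture
```

Binary search keeps each lookup logarithmic in the number of captures; insertion is linear because of `list.insert`, which is fine for a sketch, though a production index would more likely live in a database.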
More information on this is in our white paper “E-discovery: Why Archiving Your Web Presence is a Business Necessity”.
If a Web page, blog, thread in a customer forum, or your whole website were required by the courts, how would you be able to obtain the exact version required and present it as it was originally? How would you verify its authenticity? As regulations concerning corporate records and e-discovery proceedings are extended to include Web content, can you be certain you are compliant?
This white paper looks at compliance and e-discovery issues relating to Web content, and assesses the technologies you need to archive your Total Web Presence.
