Hanzo Archives is a web archiving company selling to global corporations, which use our products and services to capture, archive, preserve, and make discoverable web-based electronically stored information (ESI) in native format. Our customers' needs are primarily driven by eDiscovery, information governance, and heritage requirements. We operate from the UK and the USA.

Hanzo has implemented the entire technology stack required to capture and archive the modern web, with a sophisticated crawler at its core. This job is at the heart of crawler operations for a large research project on the collection, archiving, and preservation of linked open data. We are looking for a bright, enthusiastic, self-motivated, self-learning developer with an interest in devops and enthusiasm for big data, linked open data, and the semantic web.

Salary is negotiable depending on experience. The role is home-based, with the option of working from our office in Bristol, UK.

To find out more or apply for this job, please email Shuba Rao, including a cover note, your CV, and a link to your work.

Job Description

Reporting to the Director – Service Delivery, the Devops / Crawl Engineer will be a significant contributor to a research project involving the collection, archiving and preservation of datasets / linked open data. The project is scheduled to run into 2016, after which the technology and expertise will be transferred to Hanzo’s commercial offering.

Candidates should have 2–3 years' industry experience, be willing to learn on the job, have a strong interest in problem solving, diagnostics, operations, and systems, and be comfortable working in a busy and challenging environment. Candidates must demonstrate experience in Python and JavaScript, a good knowledge of how the web works, solid Unix / Linux skills, and scripting with command-line tools such as find, grep, and awk.
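To give a flavour of the command-line scripting this paragraph refers to, here is a small illustrative sketch (the log file and its layout are invented for the example, not part of the role): summarising HTTP status codes from a crawl log with awk and sort.

```shell
# Hypothetical crawl log excerpt: one request per line, status code in the third field.
printf 'GET /a 200\nGET /b 404\nGET /c 200\n' > /tmp/crawl_sample.log

# Count responses per status code, as one might when diagnosing a crawl.
awk '{ counts[$3]++ } END { for (c in counts) print c, counts[c] }' /tmp/crawl_sample.log | sort
```

Day-to-day work of this kind typically chains find, grep, and awk in similar one-liners over real crawl output.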

The role will include collaboration with partner organisations around Europe and with Hanzo’s engineering and crawl operations teams in the UK and USA. Technical aspects include development of crawler plug-ins, API integration, crawl operations for pilot projects and customers, and evaluating and documenting research products.

Roles and Responsibilities:

  • Run crawler operations, including configuring crawls, diagnosing and resolving issues
  • Develop crawler plug-ins for datasets, including data identification and extraction
  • Collaborate with research partners on integrating data analysis tools and systems
  • Translate feedback from research partners and operations into software development
  • Maintain and enhance existing software (both internal products and open source)
  • Communicate clearly, consistently, and at the right time
  • Work proactively, seeking out problems in the software and systems and finding solutions
  • Be responsible for completing time-critical day-to-day tasks
  • Solve problems independently and as part of a team

Skills and Abilities Required for the Role:

  • Diagnose technical problems effectively
  • Work effectively on product development, research projects, crawl operations, and data analysis
  • Document software rigorously
  • Communicate and collaborate effectively
  • Offer or ask for advice when needed
  • Work remotely and with geographically dispersed teams

Person Specification:

  • Computer Science degree
  • 2-3 years industry experience
  • Understand technically complex systems and ideas
  • Write quality code
  • Understand and work with other people’s code
  • Solve technical and operational problems
  • Python and JavaScript
  • Regular expressions
  • Unix / Linux, including scripting with tools like grep, find, and awk
  • In-depth understanding of HTTP and the web
  • Knowledge of, or a willingness to learn, the semantic web, RDF, linked open data, etc.
  • Responsible and self-motivated
  • Eager to learn, teach, and solve problems