Monday, February 10, 2014

Fedora/Hydra/Blacklight on SmartOS (Part 1; Background, Research & Planning)

Software Background

ContentDM is digital collection management software that is widely used by libraries. For a complete list of features, please take a look at their informational page.

An alternative to using ContentDM is to use a combination of FOSS projects:

  1. Fedora Commons: Digital content management
  2. Solr: Enterprise search platform (this will require a separate server)
  3. Project Blacklight: Discovery interface for Solr
  4. The Hydra Project: Front end component

Planning & Research

My research on the infrastructure used for these alternatives points towards a simple operating system architecture built on any of a number of Linux distributions.

Our ContentDM infrastructure was given a five-year life cycle. That was ten years ago. Because of the size of our digital collection, the current system now handles approximately ten terabytes. The collection consists of a very large library of images and documents in various formats.

While ten terabytes today is not much, we also have close to sixty terabytes of video objects we wish to integrate.

That would bring the project storage to a whopping seventy terabytes. The life cycle for the project is again five years, but given the number of objects the solution will need to scale considerably over that time.
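To make the sizing a little more concrete, here is a back-of-the-envelope sketch in Python. The ten and sixty terabyte figures are the ones quoted above; the annual growth rate is purely an assumption for illustration, not something we have measured.

```python
# Rough capacity projection over the project life cycle.
# The starting sizes come from the post; ASSUMED_ANNUAL_GROWTH is a
# hypothetical figure chosen only to illustrate compound growth.
IMAGES_AND_DOCS_TB = 10
VIDEO_TB = 60
ASSUMED_ANNUAL_GROWTH = 0.20  # hypothetical: 20% more data each year
LIFECYCLE_YEARS = 5


def project_storage(start_tb, annual_growth, years):
    """Return the projected size in TB for year 0 through the final year."""
    return [start_tb * (1 + annual_growth) ** year for year in range(years + 1)]


start_tb = IMAGES_AND_DOCS_TB + VIDEO_TB
projection = project_storage(start_tb, ASSUMED_ANNUAL_GROWTH, LIFECYCLE_YEARS)

for year, size_tb in enumerate(projection):
    print(f"year {year}: {size_tb:.1f} TB")
```

Swapping in a growth rate taken from real ingest statistics would turn this into a usable first estimate for SAN sizing.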

From a purely DevOps perspective, the traditional route is to implement a SAN, place a load balancer in front of a group of machines, keep them synced, backed up and monitored, and you're up and running.

This project, however, will use a different strategy: implement a SAN, use a load balancer, fire up SmartOS, use a KVM instance or zone per application (Fedora/Hydra/Solr/Blacklight), use ACLs per VM instance, point all storage for Fedora (KVM/zones) to the SAN, and keep everything synced, backed up and monitored.
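As a rough illustration of the "one zone per application" idea, the sketch below generates one JSON manifest per application, in the general shape that SmartOS's vmadm accepts. The image UUID, IP addresses, netmask, gateway and memory sizes are all placeholders I made up for the example; real values would come from `imgadm` and from the actual network plan.

```python
import json

# Hypothetical placeholders: substitute a real image UUID and real addresses.
PLACEHOLDER_IMAGE_UUID = "00000000-0000-0000-0000-000000000000"
ASSUMED_NETMASK = "255.255.255.0"
ASSUMED_GATEWAY = "10.0.0.1"

# One entry per application named in the plan; sizes are illustrative only.
APPLICATIONS = {
    "fedora": {"ip": "10.0.0.11", "memory_mb": 4096},
    "hydra": {"ip": "10.0.0.12", "memory_mb": 2048},
    "solr": {"ip": "10.0.0.13", "memory_mb": 4096},
    "blacklight": {"ip": "10.0.0.14", "memory_mb": 2048},
}


def build_zone_manifest(alias, ip, memory_mb):
    """Build a minimal zone manifest as a plain dictionary."""
    return {
        "brand": "joyent",  # an OS zone; a KVM guest would use "kvm" instead
        "alias": alias,
        "hostname": alias,
        "image_uuid": PLACEHOLDER_IMAGE_UUID,
        "max_physical_memory": memory_mb,
        "nics": [
            {
                "nic_tag": "admin",
                "ip": ip,
                "netmask": ASSUMED_NETMASK,
                "gateway": ASSUMED_GATEWAY,
            }
        ],
    }


manifests = {
    name: build_zone_manifest(name, spec["ip"], spec["memory_mb"])
    for name, spec in APPLICATIONS.items()
}

for name, manifest in manifests.items():
    print(f"# {name}.json")
    print(json.dumps(manifest, indent=2))
```

Each printed document could be saved to a file and passed to `vmadm create -f <file>` on the SmartOS host. A KVM guest needs additional properties (disks, vCPUs and so on) that this sketch leaves out.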

The infrastructure

The planned environment:

  1. Scalability for physical hardware:
    • HAProxy can be used in front of any physical SmartOS VM server and physical Solr servers, see for details
  2. Scalability for KVM/Zones:
    • Fedora VM instances can be added to each server instance quickly as the need arises and can use an NFS share on the SAN
    • Hydra VM instances can also be added to each server instance as new hardware is added to each server (memory, disks, etc.)
    • Blacklight VM instances can also be spun up when the need arises
  3. Security:
    • Fedora/Blacklight VM instances can be restricted with host-based ACLs to internal VM networking only
    • Hydra VM instances can be publicly accessible on each SmartOS host installation, creating an internal DMZ per server
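
To give a feel for the load-balancing piece of the plan, here is a small Python sketch that renders an HAProxy frontend/backend fragment for the publicly reachable Hydra instances. The host names, addresses and the backend name are hypothetical, and the `global` and `defaults` sections (timeouts, logging and so on) are left out, so treat it as a starting point and not a complete configuration.

```python
# Hypothetical SmartOS hosts that each expose a public Hydra instance.
HYDRA_HOSTS = {
    "smartos1": "192.0.2.11",
    "smartos2": "192.0.2.12",
}


def render_haproxy_config(backend_name, hosts, port=80):
    """Render a frontend/backend pair that round-robins across the hosts."""
    lines = [
        "frontend public_http",
        "    bind *:80",
        "    mode http",
        f"    default_backend {backend_name}",
        "",
        f"backend {backend_name}",
        "    mode http",
        "    balance roundrobin",
    ]
    # One "server" line per host; "check" enables HAProxy health checks.
    for name, address in sorted(hosts.items()):
        lines.append(f"    server {name} {address}:{port} check")
    return "\n".join(lines) + "\n"


config = render_haproxy_config("hydra_public", HYDRA_HOSTS)
print(config)
```

Adding a new physical SmartOS server then comes down to adding one entry to the host table and regenerating the fragment.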
