NIH/NLM Oral Histories Served by DLXS

December 07, 2010

As the creators of DLXS, we're quite impressed with NLM/NIH's adaptation of Text Class to provide access to oral histories. The following is an excerpt from the press release.

The National Library of Medicine's History of Medicine Division is pleased to announce the release of a new Web interface (http://www.nlm.nih.gov/hmd/manuscripts/oh.html) to its oral history collections, as part of its growing electronic texts program. Content includes digital editions of transcripts and any accompanying audio content when feasible. Users can browse content by title, interviewee name, and subject. Full-text searching is available across all sub-collections, across each sub-collection, and within each transcript.

Browse the Archives and Modern Manuscripts Oral Histories to jump right into the audio content.

Posted by John Weise at 09:33 AM.

New List of Digital Collections

September 17, 2010

The Digital Library Production Service or DLPS (part of our MLibrary Library Information Technology division) has created and hosted a very large number of digital collections over the last 10+ years. We have been working for many years to integrate those collections into MLibrary services, and we are now ready to present the next link in this chain, a more easily navigable and more fully featured list of these collections:

http://quod.lib.umich.edu/lib/colllist/

The road to building this collection list has been long and sometimes difficult. Four years ago, Suzanne Chapman and John Weise discovered the Exhibit tool (part of the SIMILE set of tools from MIT Libraries). They worked through many of the initial hurdles regarding how, where, and what we host using this tool. After languishing for a period, the project was revitalized and brought to its current state by Kat Hagedorn, Jose Blanco, Roger Espinosa, and Saurabh Koparkar. We would not have gotten half so far without two ULAs, Ellen Wilson and Lorelei Rutledge, who kindly volunteered their services to help categorize, describe, and find image thumbnails for each collection. Many hearty thanks to all involved in the process of putting this tool together.

The new collection list replaces the old one, which was originally created to gather administrative information about each collection, including the responsible party and statistics about size and usage. Public discovery and usage of the list were neither intended nor anticipated, but the list became a popular public access point for DLPS collections. The old list can still be seen: http://quod.lib.umich.edu/cgi/c/collsize/collsize. Better, no?

Posted by Kat Hagedorn at 02:09 PM.

Brief Survey of Digital Library Software Systems

July 08, 2010

DLPS is currently (July 2010) exploring possible avenues for the future development of DLXS. DLXS is a mature and robust digital library information retrieval and repository system that has been in use here at the University of Michigan Library and at several other institutions for roughly a decade, with deeper roots going back to the early 1990s. DLXS has four classes, or components, supporting text (e.g., books), images (e.g., photos), finding aids (e.g., EAD), and bibliographic databases. At Michigan we host over 250 collections with DLXS, which you may wish to visit to get a better sense of its capabilities and of the baseline for this survey.

I compiled this brief survey of existing digital library software systems to gain a better understanding of where DLXS fits in the current landscape, and what other systems have to offer. I have included systems that are designed for libraries and that are either comparable or potentially complementary to DLXS. I thought I might encounter many I had not heard of, but other than Veridian and, more recently, SimpleDL, I did not. I intentionally excluded content management systems, such as Drupal, which have the potential to be implemented as digital library systems but were not designed specifically for that purpose. I also excluded viable production systems that are not packaged and distributed as a product, such as what is behind HathiTrust. I mention HathiTrust in particular because Michigan has played a major role in its development, and we have gained valuable experience that will be useful as we continue with DLXS. If you are interested, there is an emerging HathiTrust Collaborative Development Environment.

You will find minimal, broad-stroke commentary expressed in terms relative to DLXS. If it seems I've mischaracterized a system, or overlooked a key strength, feel free to comment. If you have other solutions to share, please do.

We'll start with DLXS because it was the basis for our exploration. This survey is neither comprehensive nor in-depth, but I hope you find it useful, and maybe you can help fill in some of the blanks.

* DLXS: Summary, Features, Technical Details
Examples:
- Examples from multiple institutions.
- University of Michigan Library: Text Collections, Finding Aids (EAD) Collections, Image Collections, Scholarly Publishing Office Collections
Notes:
- Developed by the University of Michigan Digital Library (that's us!).
- Strong support for search and display of highly structured XML (which is a rare and powerful feature).
- The XPAT search engine has a one-time license fee. Image collections use MySQL, not XPAT.
- Scales reasonably well.
- Strong as an access system with very good support for collections of content, and searching across multiple collections within a class (text, image, finding aid).
- Similar to DLXS in that it IS DLXS.

* XTF: Summary, Features, Technical Details
Examples:
- http://www.marktwainproject.org/
- http://www.calisphere.universityofcalifornia.edu/
- http://www.oac.cdlib.org/ (Finding Aids from numerous institutions)
Notes:
- Developed by CDL (California Digital Library, University of California).
- Replaced DLXS, Greenstone, Dynaweb for CDL.
- Used for text, finding aids, image collections and more.
- Uses Lucene.
- Similar to DLXS in that it is strong as an access system and there is an affinity for presenting content as collections.

* Greenstone: Summary, Features (same as Summary)
Technical Details:
- GreenstoneWiki: documentation for users/content managers.
- Developer's Guide
- Greenstone FAQ
- Collection Size Limitations
- Overview from the developer's point of view: Witten, I.H. and Bainbridge, D. (2007) "A retrospective look at Greenstone: Lessons from the first decade." Proc Joint Conference of Digital Libraries, Vancouver, Canada, pp. 147-156, June.
- Architecture and DTD (see last 2 pages for DTD and internal document format): Witten, I., Bainbridge, D., Paynter, G., & Boddie, S. (2002). "Importing Documents and Metadata into Digital Libraries: Requirements Analysis and an Extensible Architecture." In Research and Advanced Technology for Digital Libraries (pp. 219-229).
Examples:
- General
- The Greenstone discussion list archive is a Greenstone collection.
- Examples of Practical Digital Libraries: Collections Built Internationally Using Greenstone, Witten, Ian H., D-Lib Magazine, March 2003.
Notes:
- Developed at University of Waikato, Hamilton, New Zealand in cooperation with UNESCO and the Human Info NGO in Belgium.
- Provides GUI desktop applications for building and distributing digital library collections on the Internet and CD-ROM. Also has command line support. Runs on Windows and Mac OS X.
- A great deal of effort went into supporting easy installation and configuration.
- Apparently strong support for multiple languages in the system (documentation, application interface, etc.).
- Similar to DLXS in that it is strong as an access system and there is an affinity for presenting content as collections. Different from DLXS (and pretty much everything else) in that it provides a desktop application for building collections. 
- Supports searching across collections.

* ContentDM (commercial): Summary, Features (same as Summary), Technical Details
Examples:
Notes:
- Owned by OCLC.
- Same search engine as WorldCat.
- Supports images, newspapers, EAD Finding Aids, audio, video and any other web format.
- Supports cross-collection searching and cross-server searching.
- Option to include metadata in WorldCat for increased visibility.
- Many licensing, hosting, and functionality options.
- Strong as an access system, and better with metadata than full-text.

* DSpace: Summary, Features, Technical Details
Examples:
Notes:
- Fedora and DSpace receive stewardship from not-for-profit DuraSpace.
- We use DSpace for the University of Michigan institutional repository, Deep Blue.
- Primarily a repository system, but used in many ways (see examples).
- Provision of functionality for end-user interaction with objects is a weakness.
- Different from DLXS in that it is primarily a repository system.

* Fedora Commons: Summary (General, Structure of DuraSpace), Features, Technical Details (General, Fedora Create Community)
Examples:
Notes:
- Fedora and DSpace receive stewardship from not-for-profit DuraSpace.
- Fedora is first and foremost core repository functionality.
- Fedora has a large development community (see Fedora Create Community) working on additional services, frameworks, content models, and more.
- Different from DLXS in that it is primarily a repository system.
- There is some energy currently around Islandora and Hydra as applications for managing and providing access to content in Fedora.
- Can plug in different search systems: MySQL, Solr, Mulgara.

* Misc Systems
- Veridian (commercial)
- Cumulus digital asset management - mostly for images (commercial)
- Luna Insight - specific for images (commercial)
- JSTOR - all journals
- ArtSTOR - all images
- EPrints - institutional repository deposit, similar to DSpace
- bepress - ditto
- OJS - journals, but no customization
- SPO has looked at Drupal, OJS, WordPress (the latter promising)
- SimpleDL (commercial)
- raven.scholarslab.org: Interesting example of how Solr and XSLT can be used to achieve the desired level of search granularity. XML is split into different types of Solr documents as needed, and client XML/XSLT libraries are used to provide more granular search results on a per-page basis. From the TEI List.
- Acumen (Deserves a closer look.)
- Omeka (Deserves a closer look.)
- Blacklight (Deserves a closer look.)
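
To make the raven.scholarslab.org note above a little more concrete, here is a minimal sketch of what splitting one XML text into different types of Solr documents could look like. It is not the Scholars' Lab implementation: the element names (text, title, page) and the field names (doc_type, parent_id, and so on) are hypothetical, chosen only for illustration, and the sketch stops at building the records rather than posting them to a Solr server.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML; real TEI is far richer than this.
SAMPLE = """<text id="doc1">
  <title>Sample Transcript</title>
  <page n="1">First page text about medicine.</page>
  <page n="2">Second page text about libraries.</page>
</text>"""


def split_into_solr_docs(xml_string):
    """Return one work-level record plus one record per page.

    Each record is a plain dict that could later be serialized and
    sent to a search index. All field names here are assumptions.
    """
    root = ET.fromstring(xml_string)
    doc_id = root.get("id")

    # Work-level record: supports searching and browsing by title.
    docs = [{
        "id": doc_id,
        "doc_type": "work",
        "title": root.findtext("title", default=""),
    }]

    # Page-level records: allow hits to be reported per page.
    for page in root.iter("page"):
        n = page.get("n")
        docs.append({
            "id": f"{doc_id}_p{n}",
            "doc_type": "page",
            "parent_id": doc_id,
            "page": n,
            "text": "".join(page.itertext()).strip(),
        })
    return docs


docs = split_into_solr_docs(SAMPLE)
for doc in docs:
    print(doc["doc_type"], doc["id"])
```

Keeping a parent_id on each page record is one way a search on page text could be grouped back under its parent work when results are displayed.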

* Log of updates to this posting
- Added placeholders for Omeka and Acumen and Blacklight. (7/9/2010)
- I originally wrote that XTF uses Lucene/Solr, but it uses Lucene, not Solr. Corrected above. (7/9/2010)

Posted by John Weise at 01:54 PM.