Putting a Librarian's Face on Search

July 26, 2010

When you search the University of Michigan Library's web site, you get not only results from the catalog, web site, online journal and database collections, and more, but also a librarian who is a subject specialist related to your search term. While the matching is not perfect, it puts a human face on search results. So, for example, if you search for "Kant," in addition to books and databases, you also get the subject specialist librarians for Humanities and Philosophy. If you search for "Jupiter," you get the subject specialists for Astronomy & Astrophysics and Humanities (after all, we don't know if you're searching for the planet or the Roman deity). When we can't make a reliable match to a subject specialist, we provide a link to Ask a Librarian, our reference service.

How does the matching work?

The University of Michigan Library has long maintained a database of Library of Congress call numbers and "Academic Disciplines" -- which is what we call our subject taxonomy. (You can see it in action in the site's Browse function.) These categories broadly mirror the schools and departments at the University of Michigan. Librarians have assigned Library of Congress call numbers to each Academic Discipline. This mapping was originally done for our New Books tool so that students and faculty could find out when the library acquired a new book related to their area of study. A single call number can be assigned to multiple Academic Disciplines, so a given book could appear in multiple places.
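A rough sketch of what a lookup against such a mapping might look like in Python follows. It is only an illustration: the assumption that the mapping is stored as call number ranges, the sample ranges themselves, and the names `RANGES` and `disciplines_for` are all hypothetical and are not taken from the actual database.

```python
import re

# Hypothetical sample data: (class letters, low, high, Academic Discipline).
# These ranges are illustrative only, not the library's real mapping.
RANGES = [
    ("B", 1, 5802, "Philosophy"),
    ("B", 1, 5802, "Humanities"),
    ("BL", 1, 2790, "Humanities"),
    ("QB", 1, 991, "Astronomy & Astrophysics"),
]

# Leading class letters followed by the class number, e.g. "QB661".
CALL_NUMBER = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def disciplines_for(call_number):
    """Return every discipline whose range contains the call number."""
    match = CALL_NUMBER.match(call_number.upper())
    if not match:
        return []
    letters, number = match.group(1), float(match.group(2))
    found = []
    for cls, low, high, discipline in RANGES:
        in_range = cls == letters and low <= number <= high
        if in_range and discipline not in found:
            found.append(discipline)
    return found


print(disciplines_for("B2798 .K3"))
print(disciplines_for("QB661 .R64"))
```

Because one call number can fall in more than one range, the function returns a list, which is what lets a single book show up under several disciplines.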

In a site search, we do a special query of the library catalog behind the scenes and get the first 100 catalog results (sorted according to the catalog's relevance ranking algorithms). We sort those results into Academic Disciplines. If more than 25 items are in a single Academic Discipline, we include the subject specialist responsible for that particular area. (We set the threshold at 25 matches to help ensure a relevant match, but a librarian specializing in the "wrong" subject is arguably better than no librarian at all.)
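The counting step can be sketched roughly as below. This is a minimal illustration in Python rather than the site's actual code: `match_specialists`, `lookup_disciplines`, and the `specialists` dictionary are hypothetical names, and only the 100-result window, the more-than-25 threshold, and the Ask a Librarian fallback come from the description above.

```python
from collections import Counter

MAX_RESULTS = 100  # only the first 100 catalog results are examined
THRESHOLD = 25     # a discipline needs more than 25 of those results


def match_specialists(call_numbers, lookup_disciplines, specialists):
    """Pick specialists for disciplines that dominate the top results.

    call_numbers       -- call numbers of the results, in relevance order
    lookup_disciplines -- hypothetical function: call number -> disciplines
    specialists        -- hypothetical dict: discipline -> specialist name
    """
    counts = Counter()
    for call_number in call_numbers[:MAX_RESULTS]:
        # One item may belong to several disciplines; count each once.
        for discipline in set(lookup_disciplines(call_number)):
            counts[discipline] += 1

    matched = [d for d, n in counts.most_common() if n > THRESHOLD]
    people = [specialists[d] for d in matched if d in specialists]
    # With no reliable match, fall back to the reference service.
    return people or ["Ask a Librarian"]
```

Here `lookup_disciplines` stands in for whatever resolves a call number to its Academic Disciplines. Lowering `THRESHOLD` would trade precision for more frequent specialist matches.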

We make the call number-to-academic discipline mapping available on our site at http://www.lib.umich.edu/browse/categories/. There is also an XML version of the mapping free for all to use or adapt.

Posted by Ken Varnum at 04:10 PM.

Library Gateway Usability Testing

July 22, 2010

The Usability Group & its Usability Task Force conducted a series of evaluations of the Library Gateway (http://www.lib.umich.edu/) during the Fall 2009 and Winter 2010 semesters. We used a number of different methods, some new to us, to conduct our evaluations.

Participatory Design
This method was designed to gain a better understanding of which parts of the Gateway users find most and least useful, and to help inform our follow-up evaluations. (Discussed more fully in a later post.)

Card Sorting
This method was designed to help us re-categorize content currently grouped under Services, Departments and Libraries.

For the card sorting, we purchased a license to OptimalSort that would allow us to place a card sorting exercise in front of many individual users. We sent this exercise to all of our Library staff and received 104 responses to the exercise, an excellent rate of return. We also ran group card sorting sessions, a new method for us, with undergraduates and graduate students. Groups of up to 5 people sorted paper cards into categories through consensus.

Several similar categories surfaced across the various user groups, whether they performed a paper sort or used the online tool:
* Physical Locations: libraries and/or services with a physical location and hours of operation.
* Publishing: MPublishing, SPO and University of Michigan Press.
* Services: a broad category used by all groups which ranged from getting help with library resources to internal services for library staff.
* Administration: background support for library staff or, as one student said, “Stuff that students wouldn’t necessarily need.”

As a group, the Task Force also came up with "unified" categories that carried the general scope of the categories suggested by our participants. Our categories were based on the categories the participants created, as well as the comments they made during the card sort. Both the similar groupings and the "unified" categories were suggested as bases for further tests.

Guerrilla Tests
This method was designed a) to help determine the order of the headings on our search results and browse results pages, and b) to fine-tune the contents & labels for our Quick Links section.

We have used this method for many years. We call this "guerrilla testing" because we hope to get quick and short answers to quick and short questions. Five minutes is our goal!

For the search and browse results pages, we found that the section labels were confusing and inconsistent across the results templates, and that there was not enough metadata available for users to make informed choices. Participants in our guerrilla tests also wanted to see sections in a different order (e.g., Databases before Catalog). Our recommendations were to add more metadata to the catalog results (e.g., author, publication information, format) and to change the order on the results pages according to participant consensus.

For the Quick Links section, we found that our Library Outages link (when databases are inactive or not working correctly) was not understood or considered to be useful inside this section. More than half of users also requested the addition of a University-wide Webmail link. The Quick Links section was modified to take into account what we heard from participants.

You may access the full reports of the evaluations:
* Organization of Services, Departments and Libraries: http://www.lib.umich.edu/files/services/usability/libs-svces-depts-card-sort-report.pdf
* Search and Browse Results: http://www.lib.umich.edu/files/services/usability/Search_Browse.pdf
* Quick Links: http://www.lib.umich.edu/files/services/usability/QuickLinks.pdf

We were also fortunate enough to have a poster accepted at ALA Annual 2010 detailing our year's work: "Budget Usability without a Usability Budget".

Many thanks to the Task Force project managers (Kat Hagedorn & Ken Varnum) and the group members (Gillian Mayman, Devon Persing, Val Waldron, Sue Wortman, and Karen Reiman-Sendi) for all their hard work!

Posted by Kat Hagedorn at 09:48 AM.

HathiTrust Digital Library Functionality Enhancements

July 09, 2010

We have recently made a number of significant updates to the HathiTrust Digital Library:

* Users at the University of Michigan (and at a number of other HathiTrust partner institutions) can now log in to the Digital Library using Shibboleth.
* All users can now download full PDFs of public domain volumes that were not digitized by Google. This currently includes nearly 100,000 Internet Archive-digitized volumes that were contributed by the University of California and thousands of volumes digitized locally by the University of Michigan.
* Authenticated users can now download full PDFs of ALL public domain volumes.
* All users can now add items to public or private collections via the full-text search results pages.

Questions or comments? Submit via the feedback link on the HathiTrust website or via DLPS-help@umich.edu.

Posted by Kat Hagedorn at 04:18 PM.

Brief Survey of Digital Library Software Systems

July 08, 2010

DLPS is currently (July 2010) exploring possible avenues for the future development of DLXS. DLXS is a mature and robust digital library information retrieval and repository system that has been in use here at the University of Michigan Library and at several other institutions for roughly a decade, with deeper roots going back to the early '90s. DLXS has four classes, or components, supporting text (e.g., books), images (e.g., photos), finding aids (e.g., EAD) and bibliographic databases. At Michigan we host over 250 collections with DLXS, which you may wish to visit to get a better sense of its capabilities and the baseline for this survey.

I compiled this brief survey of existing digital library software systems to gain a better understanding of where DLXS fits in the current landscape, and what other systems have to offer. I have included systems that are designed for libraries and that are either comparable or potentially complementary to DLXS. I thought I might encounter many I had not heard of, but other than Veridian, and more recently SimpleDL, I did not. I intentionally excluded content management systems, such as Drupal, which have the potential to be implemented as digital library systems but were not designed specifically for that purpose. I also excluded viable production systems not packaged and distributed as a product, such as what is behind HathiTrust. I mention HathiTrust in particular because Michigan has played a major role in its development, and we have gained valuable experience that will be useful as we continue with DLXS. If you are interested, there is an emerging HathiTrust Collaborative Development Environment.

You will find minimal, broad-stroke commentary expressed in terms relative to DLXS. If it seems I've mischaracterized a system, or overlooked a key strength, feel free to comment. If you have other solutions to share, please do.

We'll start with DLXS because it was the basis for our exploration. This survey is neither comprehensive nor in-depth, but I hope you find it useful, and maybe you can help fill in some of the blanks.

* DLXS: Summary, Features, Technical Details
- Examples from multiple institutions.
- University of Michigan Library: Text Collections, Finding Aids (EAD) Collections, Image Collections, Scholarly Publishing Office Collections
- Developed by the University of Michigan Digital Library (that's us!).
- Strong support for search and display of highly structured XML (which is a rare and powerful feature).
- The XPAT search engine has a one-time license fee. Image collections use MySQL, not XPAT.
- Scales reasonably well.
- Strong as an access system with very good support for collections of content, and searching across multiple collections within a class (text, image, finding aid).
- Similar to DLXS in that it IS DLXS.

* XTF: Summary, Features, Technical Details
- http://www.marktwainproject.org/
- http://www.calisphere.universityofcalifornia.edu/
- http://www.oac.cdlib.org/ (Finding Aids from numerous institutions)
- Developed by CDL (California Digital Library, University of California).
- Replaced DLXS, Greenstone, Dynaweb for CDL.
- Used for text, finding aids, image collections and more.
- Uses Lucene.
- Similar to DLXS in that it is strong as an access system and there is an affinity for presenting content as collections.

* Greenstone: Summary, Features (same as Summary), Technical Details:
- GreenstoneWiki: documentation for users/content managers.
- Developer's Guide
- Greenstone FAQ
- Collection Size Limitations
- Overview from the developer's point of view: Witten, I.H. and Bainbridge, D. (2007) "A retrospective look at Greenstone: Lessons from the first decade." Proc Joint Conference of Digital Libraries, Vancouver, Canada, pp. 147-156, June.
- Architecture and DTD (see last 2 pages for DTD and internal document format): Witten, I., Bainbridge, D., Paynter, G., & Boddie, S. (2002). "Importing Documents and Metadata into Digital Libraries: Requirements Analysis and an Extensible Architecture." In Research and Advanced Technology for Digital Libraries (pp. 219-229).
- General
- The Greenstone discussion list archive is a Greenstone collection.
- Examples of Practical Digital Libraries: Collections Built Internationally Using Greenstone, Witten, Ian H., D-Lib Magazine, March 2003.
- Developed at University of Waikato, Hamilton, New Zealand in cooperation with UNESCO and the Human Info NGO in Belgium.
- Provides GUI desktop applications for building and distributing digital library collections on the Internet and CD-ROM. Also has command line support. Runs in Windows and Mac OS X.
- A great deal of effort was made to support easy installation and configuration.
- Apparently strong support for multiple languages in the system (documentation, application interface, etc.).
- Similar to DLXS in that it is strong as an access system and there is an affinity for presenting content as collections. Different from DLXS (and pretty much everything else) in that it provides a desktop application for building collections. 
- Supports searching across collections.

* ContentDM (commercial): Summary, Features (same as Summary), Technical Details
- Owned by OCLC.
- Same search engine as WorldCat.
- Supports images, newspapers, EAD Finding Aids, audio, video and any other web format.
- Supports cross-collection and cross-server searching.
- Option to include metadata in WorldCat for increased visibility.
- Many licensing, hosting, and functionality options.
- Strong as an access system, and better with metadata than full-text.

* DSpace: Summary, Features, Technical Details
- Fedora and DSpace receive stewardship from not-for-profit DuraSpace.
- We use DSpace for the University of Michigan institutional repository, Deep Blue.
- Primarily a repository system, but used in many ways (see examples).
- Provision of functionality for end-user interaction with objects is a weakness.
- Different from DLXS in that it is primarily a repository system.

* Fedora Commons: Summary (General, Structure of DuraSpace), Features, Technical Details (General, Fedora Create Community)
- Fedora and DSpace receive stewardship from not-for-profit DuraSpace.
- Fedora is first and foremost core repository functionality.
- Fedora has a large development community (see Fedora Create Community) working on additional services, frameworks, content models, and more.
- Different from DLXS in that it is primarily a repository system.
- There is some energy currently around Islandora and Hydra as applications for managing and providing access to content in Fedora.
- Can plug in different search systems: MySQL, Solr, Mulgara.

* Misc Systems
- Veridian (commercial)
- Cumulus digital asset management - mostly for images (commercial)
- Luna Insight - specific for images (commercial)
- JSTOR - all journals
- ArtSTOR - all images
- EPrints - institutional repository deposit, similar to DSpace
- bepress - ditto
- OJS - journals, but no customization
- SPO has looked at Drupal, OJS, WordPress (the latter promising)
- SimpleDL (commercial)
- raven.scholarslab.org: Interesting example of how Solr and XSLT can be used to achieve the desired level of search granularity. XML is split into different types of Solr documents as needed, and client XML/XSLT libraries are used to provide more granular search results on a per-page basis. From the TEI List.
- Acumen (Deserves a closer look.)
- Omeka (Deserves a closer look.)
- Blacklight (Deserves a closer look.)

* Log of updates to this posting
- Added placeholders for Omeka and Acumen and Blacklight. (7/9/2010)
- I originally wrote that XTF uses Lucene/Solr, but it uses Lucene, not Solr. Corrected above. (7/9/2010)

Posted by John Weise at 01:54 PM.

Making Personal Collections from Large Scale Search Results

July 07, 2010

We just released a new feature in our full-text Large Scale Search. When you do a search, you will see check boxes next to each search result. You can select the items you want from the search results and create a personal collection. This should make it much easier to do repeated, focused searches within a targeted subset of the HathiTrust volumes. If you are not logged in, the collection will be temporary. If you log in, you can save the collection permanently.

Posted by Tom Burton-West at 01:41 PM.