<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>[BLT] Blog for Library Technology</title>
<link>http://mblog.lib.umich.edu/blt/</link>
<description>Food for thought for library technologists</description>
<language>en</language>
<copyright>Copyright 2008</copyright>
<lastBuildDate>Wed, 03 Sep 2008 11:50:17 -0500</lastBuildDate>
<generator>http://www.movabletype.org/?v=3.17</generator>
<docs>http://blogs.law.harvard.edu/tech/rss</docs> 

<item>
<title>MBooks is now HathiTrust</title>
<description><![CDATA[<p>MBooks is becoming HathiTrust. See the new website for more information: <a href="http://www.hathitrust.org">http://www.hathitrust.org</a>.</p>
<p>Roy Tennant has already <a href="http://www.libraryjournal.com/blog/1090000309/post/260032226.html">commented</a>.</p>
]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/09/mbooks_is_now_h.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/09/mbooks_is_now_h.html</guid>
<category>HathiTrust</category>
<pubDate>Wed, 03 Sep 2008 11:50:17 -0500</pubDate>
</item>
<item>
<title>Searching for MBooks in Mirlyn</title>
<description><![CDATA[<p>There are three ways to find MBooks in <a href="http://mirlyn.lib.umich.edu">Mirlyn</a>, the U-M online catalog:</p>
<blockquote>
<p>1. Click on "Find Other Library Catalogs" in the upper right side of the Mirlyn screen, and you'll see the entry for MBooks/HathiTrust in the center of the page.</p>

<p>2. Limit searches in Advanced Search to "MBooks only" using the checkbox.</p>

<p>3. In Command Language, search for "wct=mdp".</p>
</blockquote>
<p>You may have noticed that many MBooks records contain this reproduction note:</p>

<blockquote>
<p>Electronic text and image data Ann Arbor, Mich. : University of Michigan Library 2008 Includes both image files and keyword searchable text. [Michigan Digitization Project]</p>
</blockquote>
<p>These notes are going away. Searching on the phrase "michigan digitization project" in Mirlyn no longer retrieves all MBooks. Instead, use one of the methods described above.</p>

<p>Finally, we have gotten questions about items in Mirlyn with links to Google Book Search, but no link to MBooks. This occurs when Google digitizes a book from another source before they digitize our copy. An example of this can be found (for the moment) in <a href="http://mirlyn.lib.umich.edu:80/F/?func=direct&doc_number=005683315&local_base=AA_PUB">this record</a>. The link to GBS is created using <a href="http://mblog.lib.umich.edu/blt/archives/2008/06/google_links_in.html">Google's API</a>. Eventually, Google will digitize the U-M copy and a link to MBooks will appear in Mirlyn.</p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/08/searching_for_m.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/08/searching_for_m.html</guid>
<category>MBooks</category>
<pubDate>Fri, 15 Aug 2008 08:14:21 -0500</pubDate>
</item>
<item>
<title>Languages in MBooks</title>
<description><![CDATA[<p>Many people have asked us about the languages available in MBooks. In particular, they want to know if Google is providing searchable text for non-Western languages or difficult scripts. Most Western European languages have been available from the beginning of the project, but here are some examples of books in languages that Google has added in the past few years:</p>

<p>* Chinese: <a href="http://hdl.handle.net/2027/mdp.39015055131992">http://hdl.handle.net/2027/mdp.39015055131992</a><br />
* Japanese: <a href="http://hdl.handle.net/2027/mdp.39015067188378">http://hdl.handle.net/2027/mdp.39015067188378</a><br />
* Hebrew: <a href="http://hdl.handle.net/2027/mdp.39015019327512">http://hdl.handle.net/2027/mdp.39015019327512</a><br />
* German/Fraktur: <a href="http://hdl.handle.net/2027/mdp.39015070866887">http://hdl.handle.net/2027/mdp.39015070866887</a><br />
* Russian: <a href="http://hdl.handle.net/2027/mdp.39015028011768">http://hdl.handle.net/2027/mdp.39015028011768</a><br />
* Czech: <a href="http://hdl.handle.net/2027/mdp.39015026722820">http://hdl.handle.net/2027/mdp.39015026722820</a><br />
* Polish: <a href="http://hdl.handle.net/2027/mdp.39015055374857">http://hdl.handle.net/2027/mdp.39015055374857</a><br />
* Greek: <a href="http://hdl.handle.net/2027/mdp.39015047659472">http://hdl.handle.net/2027/mdp.39015047659472</a></p>

<p>The process used to convert from page images to text is called Optical Character Recognition, or OCR. (You can view the OCR text of any of the pages by switching to "text" under "view page as" on the left-hand menu in the pageturner.) Without good OCR, there's no way to search the books. Google is all about search, and they're working to improve the OCR they produce. However, the multitude of languages, scripts, and fonts in this collection poses a serious problem for OCR, and it's likely that Google won't be able to OCR all languages as they encounter them. In addition, the quality of the page itself is critical to good OCR. In many older books, particularly those published between 1850 and 1950, the paper has deteriorated and discolored, resulting in lower quality OCR.</p>

<p>I can read German, so I know that the OCR for the Fraktur script in the above example isn't perfect. However, given that there isn't much OCR software that can handle Fraktur, it's not bad. I don't read any of the other languages, so I can't make any judgment about the accuracy of the OCR in the rest of the list.</p>

<p>I think that this is an area that Google will continue to improve. You will be able to find examples of books in these languages with very poor OCR. Google is reprocessing texts and will send us new and improved versions, so we will get better OCR as the project progresses. </p>

<p>One of the complexities of this work is assessing the quality of OCR in languages that you don't read. I don't read Italian or Spanish, but they use the same alphabet and Latin roots as other western European languages, so I'm able to at least verify the words without knowing the exact meaning. Chinese, Japanese, Korean, Hebrew, Russian and Greek present many more problems for me. For instance, the text in most of the books in Chinese that I've seen runs from top to bottom (including the example in Chinese above), but the OCR goes left to right. Is that right? Are all the characters there, in the correct order?</p>

<p>The Greek title in the list shows another complexity with OCR: the pages alternate between Latin and Greek, but the text has Greek characters throughout. It's difficult for most OCR software to handle multiple languages in the same book.</p>

<p>We don't have a lot of experience in dealing with non-Western languages in the Digital Library Production Service department, and we'll be reaching out to experts--in the library, in the university, in our consortium--to help us answer questions. </p>

<p>Hebrew and other languages that read right-to-left present special problems for us. In looking at the example in Hebrew in the above list, it looks like the glyphs have been converted correctly, but we're using a right-justified margin rather than left-justified. Here's a sample page image:</p>

<p><img src="http://www.lib.umich.edu/graphics/litblog/MBooksLanguage1.png" alt="MBooks Language page image" /></p>

<p>And here's the OCR:</p>

<p><img src="http://www.lib.umich.edu/graphics/litblog/MBooksLanguage2.png" alt="MBooks Language text example"/><br />
We'd welcome hearing from users about these issues. </p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/08/languages_in_mb.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/08/languages_in_mb.html</guid>
<category>MBooks</category>
<pubDate>Fri, 01 Aug 2008 10:26:14 -0500</pubDate>
</item>
<item>
<title>More MTagger Usability Research</title>
<description><![CDATA[<p>As previously mentioned, the Usability Working Group (UWG), along with our 2 fantastic and hardworking interns, has been conducting usability research on <a href="http://www.lib.umich.edu/mtagger/">MTagger</a>. </p>

<p>We've now completed 5 studies (heuristic evaluation, cognitive walk-through, interviews, an informal "guerilla" test, and a comparative evaluation). We've just completed 6 formal usability tests and are in the process of analyzing the results.</p>

<p><a href="http://www.lib.umich.edu/usability/projects/MTagger.html">Link to MTagger Usability Reports</a></p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/07/more_mtagger_us.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/07/more_mtagger_us.html</guid>
<category>MTagger</category>
<pubDate>Fri, 25 Jul 2008 12:18:09 -0500</pubDate>
</item>
<item>
<title>Drupal:  MLibrary&apos;s Future CMS</title>
<description><![CDATA[<p>The University of Michigan Library is in the process of a major site redesign.  Part of this redesign is moving -- at long last -- from static pages built on SSI to a full-blown content management system.  We started with a review of the (mostly open source) CMS software landscape, and winnowed this list down to nine candidate systems worth a deeper look:</p>

<ul>
	<li><a href = "http://www.alfresco.com/">Alfresco CMS</a></li>
	<li><a href = "http://cocoondev.org/daisy/index.html">Daisy</a></li>
	<li><a href = "http://drupal.org/">Drupal</a></li>
	<li><a href = "http://www.joomla.org/">Joomla!</a></li>
	<li><a href = "http://modxcms.com/">ModX CMS</a></li>
	<li><a href = "http://plone.org/">Plone</a></li>
	<li><a href = "http://www.silverstripe.com/">SilverStripe CMS</a></li>
	<li><a href = "http://typo3.com/">TYPO3</a></li>
	<li><a href = "http://www.plainblack.com/webgui">WebGUI</a></li>
</ul>

<p>We took a look at each of these in some detail, taking into consideration programming languages and local expertise, amount of documentation, vibrancy of the active developer community, comparable "peer" installations (either in other libraries or at other similarly large-scale sites), and a very subjective review of how the admin and authoring interfaces acted.  We summarized our findings in a spreadsheet:</p>

<div align = "center">
	<a href = "http://www.lib.umich.edu/graphics/litblog/CMS_Overview-lg.png"><img src = "http://www.lib.umich.edu/graphics/litblog/CMS_Overview-sm.png" width = "450" height =  "115" alt = "CMS Comparison Table" border = "0"></a>
</div>

<p>After this first pass, we ended up with a short list of 3 tools, the ones with the highest average score:</p>

<ul>
	<li><a href = "http://www.drupal.org/">Drupal</a></li>
	<li><a href = "http://www.joomla.org/">Joomla!</a></li>
	<li><a href = "http://www.plone.org/">Plone</a></li>
</ul>

<p>We then set up out-of-the-box test installations of these three finalists and compared them in terms of workflow (our final designs weren't ready, so we didn't focus on making the test installs look "right").  We arranged phone conversations with library IT folks who were using these tools.  And in the end, we selected Drupal.  While the other two tools had strengths, we ended up deciding against Joomla! because of what we perceived as a surfeit of security problems over time with frequent releases to patch bugs or security holes.  We liked Plone, as well, but we felt taking on a new programming environment (Python) for something as critical as our web presence was not sensible. </p>

<p>In the end, though, it was Drupal's strengths in terms of its modular construction, very lively development community, and the number of large academic libraries using it that drove our decision.</p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/07/drupal_mlibrary.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/07/drupal_mlibrary.html</guid>
<category>Drupal</category>
<pubDate>Thu, 24 Jul 2008 11:10:11 -0500</pubDate>
</item>
<item>
<title>Google Still Not Indexing Hidden Web URLs</title>
<description><![CDATA[<p>Read our recent article in D-Lib Magazine:<br />
<a href="http://dx.doi.org/10.1045/july2008-hagedorn">http://dx.doi.org/10.1045/july2008-hagedorn</a>.</p>

<p>This report is a follow-up to the McCown et al. article in IEEE Internet Computing two years ago [1], in which the researchers investigated the percentage of URLs from OAI records in Google, Yahoo and MSN search indexes. We were interested in whether Google in particular had increased the number of OAI-based resources in its search index.</p>

<p>Google's indexing does not seem to have retrieved more of the hidden web since the publication of the McCown et al. article in 2006. We would venture to conclude that Google has not endeavoured to increase its support for and access to OAI materials. Even taking into account the caveats in our report, we would also conclude that aggregations of OAI records are at least as valuable for user research purposes as they were two years ago.</p>

<p>[1] McCown, F., Liu, X., Nelson, M. L., and Zubair, M. "<a href="http://doi.ieeecomputersociety.org/10.1109/MIC.2006.41">Search engine coverage of the OAI-PMH corpus.</a>" IEEE Internet Computing 10:2 (March/April 2006) pp. 66-73.</p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/07/google_still_no.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/07/google_still_no.html</guid>
<category>OAI</category>
<pubDate>Tue, 22 Jul 2008 16:38:41 -0500</pubDate>
</item>
<item>
<title>Top Ten MBooks Collections</title>
<description><![CDATA[<p>Three weeks after it was launched, we can say a little bit about MBooks collection builder usage.  Right now, there are 47 public collections (more than half were created by LIT staff) and 170 personal collections.</p>

<p>I've done a little bit of rough assessment, and can report on the ten most-used MBooks collections (they are all public collections).  Collection usage includes viewing the collection page, searching the collection, sorting the books in the collection, and copying items to another collection.  It does not include searching or viewing the items within that collection -- tracking use of a book from a collection vs. from Mirlyn vs. from links from blogs was outside the scope of my quick-and-dirty analysis.  Usage from our network range was not included in this assessment.</p>

<p>Here they are:</p>

<ol><li><a href="http://sdr.lib.umich.edu/cgi/mb?a=listis;c=1276272457">Abraham Lincoln: Fact and Fable</a></li>
<li><a href="http://sdr.lib.umich.edu/cgi/mb?a=listis;c=897641408">Great Britain</a></li>
<li><a href="http://sdr.lib.umich.edu/cgi/mb?a=listis;c=1874608773">Ann Arbor History</a></li>
<li><a href="http://sdr.lib.umich.edu/cgi/mb?a=listis;c=1679046231">How to be a Domestic Goddess</a></li>
<li><a href="http://sdr.lib.umich.edu/cgi/mb?a=listis;c=379">Gothic literature</a></li>
<li><a href="http://sdr.lib.umich.edu/cgi/mb?a=listis;c=272972852">Historical Bicycling</a></li>
<li><a href="http://sdr.lib.umich.edu/cgi/mb?a=listis;c=464226859">Adventure Novels: G.A. Henty</a></li>
<li><a href="http://sdr.lib.umich.edu/cgi/mb?a=listis;c=1580161751">What It Was, Was Football</a></li>
<li><a href="http://sdr.lib.umich.edu/cgi/mb?a=listis;c=503653486">Patents</a></li>
<li><a href="http://sdr.lib.umich.edu/cgi/mb?a=listis;c=1984268397">French Texts</a></li></ol>

<p><img src="http://www.lib.umich.edu/graphics/litblog/lincoln.jpg" class="left" alt="" />Abraham Lincoln: Fact and Fable is twice as popular as the next-most popular collection, Great Britain, which is almost twice as popular as Ann Arbor History.   As far as I can tell, none of these collections is linked from anywhere else except for the G. A. Henty Adventure Novels, which is included as a link in Henty's Wikipedia entry.  Even with the minimal metadata presently available on the Public Collections page, people are finding and using collections that are interesting to them.</p>

]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/07/top_ten_mbooks.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/07/top_ten_mbooks.html</guid>
<category>MBooks</category>
<pubDate>Mon, 21 Jul 2008 15:17:26 -0500</pubDate>
</item>
<item>
<title>New MBooks!</title>
<description><![CDATA[<p>As <a href="http://mblog.lib.umich.edu/blt/archives/2008/06/preview_of_the.html">previously mentioned</a>, we've been working on expanding the functionality of our MBooks system.</p>

<p>The new interface now allows users to create their own collections of MBooks items and view public collections created by others. Users can also do full text searching across all items within a collection. </p>

<p>So, check it out! <a href="http://sdr.lib.umich.edu/cgi/mb">MBooks Public Collections Page</a></p>

<p>We have quite a few more enhancements planned down the road that include adding <a href="http://www.lib.umich.edu/mtagger/">MTagger</a> and making it easier to find MBooks items in <a href="http://mirlyn.lib.umich.edu/">Mirlyn</a>.</p>

<p>We quietly released it last week so we could discover any remaining bugs and (my personal nemesis) browser display problems. We hope we caught them all, but please let us know if you experience any weird behavior. You can contact us via <a href="mailto:mdp-help@umich.edu">mdp-help@umich.edu</a> or the feedback form linked to from the top of every MBooks page.</p>

<p>And please take a few minutes to fill out our <a href="http://www.lib.umich.edu/survey/public/survey.php?name=MBooksCB_1">quick survey</a> to help us decide what features to add next.</p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/07/new_mbooks.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/07/new_mbooks.html</guid>
<category>MBooks</category>
<pubDate>Tue, 01 Jul 2008 12:57:10 -0500</pubDate>
</item>
<item>
<title>Browsing in MBooks?</title>
<description><![CDATA[<p>Last month I attended the annual Digital Library Federation spring meeting and David Rumsey, renowned for his collection of historical maps, was one of the keynote speakers.  Prompted by David Rumsey’s map ticker (<a href="http://www.davidrumsey.com/ticker.html">http://www.davidrumsey.com/ticker.html</a>) and what he said in passing about “moving among the maps” in Second Life, I’ve been brooding about the perceived lack of browsability in the digital library context.  How would we “move among the books” in MBooks?</p>

<p>Presumably, one way we could do it would be to make a book ticker – perhaps with covers or title page thumbnails, arranged in call number order (as one would browse a shelf).</p>

<p>That raises a few immediate practical questions:</p>

<p>1.	Do we have identified title pages or cover thumbnails for all the books?  What do we do for cases where we don’t?<br />
2.	Should we precompute thumbnails or try to derive them on the fly?<br />
3.	Can we use Mirlyn call numbers to browse?  They aren’t in the MARC record per se.</p>

<p>These practical questions raise a number of other usability issues, of course.  Some are about thumbnails – what size would the thumbnails have to be to make them useful?  When you clicked on them, where would you end up?  Could you hover over them and see some volume metadata?  Can we show thumbnails for in-copyright items?  Others are about call number browsing – would you really want to browse all items by call number, or just those from a given library?  That is, browse the “real” stacks for a holding location, like Shapiro Undergraduate Library, or the superset of all libraries, the stacks as they’ve never been in the physical world?  </p>

<p>To me, the latter choice seems like the best one – it’s something that is only possible in a digital library, as we’d be drawing together items that are housed in separate buildings yet may be related.  How do you imagine browsing in the digital library?</p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/06/browsing_in_mbo.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/06/browsing_in_mbo.html</guid>
<category>MBooks</category>
<pubDate>Wed, 18 Jun 2008 16:23:15 -0500</pubDate>
</item>
<item>
<title>Google Book Search links in Mirlyn</title>
<description><![CDATA[<p>You may have noticed that the links to Google Books in Mirlyn have a little more information lately. We have always provided links to online copies in both Google Book Search and MBooks. We're now using the <a href="http://code.google.com/apis/books/">Google API</a> to provide links to any book in Mirlyn that is also in Google Book Search.</p>
<p>
We provide a thumbnail image of the cover or title page (although there's been some <a href="http://www.librarything.com/thingology/2008/06/covers-from-google-too-good-to-be-true.php">controversy</a> about this lately). In addition, we also tell you what level of access you can expect if you follow the link to Google Book Search. Google Books has three levels of access, while MBooks has only two:</p>
<table border="1" cellpadding="3">
<tr><td><strong>Google Book Search terms</strong></td><td><strong>MBooks terms</strong></td></tr>
<tr><td>Snippet view</td><td>Search Only</td></tr>
<tr><td>Limited view</td><td></td></tr>
<tr><td>Full-text</td><td>Full Text</td></tr>
</table>

<p>In Google Book Search, "Snippet view" means that you cannot view the full-text, but can see up to three text snippets; "Search Only" in MBooks means that you can search for keywords, and discover where all the matches occur, but can't view the pages. (See <a href="http://mblog.lib.umich.edu/blt/archives/2008/05/what_to_do_with.html">this previous post</a> for more about "Search Only.") "Limited view" means that the book is part of Google's Publisher Partnership, and a limited number of pages is available for reading. You won't be able to see the entire book, but you will have access to a significant number of pages. "Full-text" in Google Book Search means that you can view the entire text, and get a PDF file of the entire text, while "Full Text" in MBooks means that you can view the page images using the MBooks pageturner, and get a 10-page PDF excerpt.</p>
<p>If you look at very many records for MBooks in Mirlyn, you will soon note that in some cases the access levels differ between MBooks and Google Book Search.</p>
<ul>
<li>We make US Federal documents freely available, while Google restricts access to "snippet view" in many cases: <a href="http://mirlyn.lib.umich.edu:80/F/?func=direct&doc_number=003485219&local_base=AA_PUB">2010 and beyond : preparing Medicare for the baby boomers : hearing before the Special Committee on Aging</a>.</li>
<li>In this case, Google Book Search offers "limited view," which allows you access to many pages in the book, while MBooks offers "search only" access: <a href="http://mirlyn.lib.umich.edu:80/F/?func=direct&doc_number=005417795&local_base=AA_PUB">500 bracelets</a>.</li>
<li>In this case both Google Book Search and MBooks provide "full text" access: <a href="http://mirlyn.lib.umich.edu:80/F/?func=direct&doc_number=000324022&local_base=AA_PUB">The £1,000,000 bank-note, and other new stories, by Mark Twain</a>.</li>
</ul>
<p>In this last example you'll have full-text in either Google Books or MBooks, so you can decide which interface you prefer. Knowing how to read the Mirlyn record will help you find the best access for any given book. Happy reading!</p>

<p>--Perry Willett<br />
--Head, Digital Library Production Service</p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/06/google_links_in.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/06/google_links_in.html</guid>
<category>MBooks</category>
<pubDate>Fri, 13 Jun 2008 09:13:00 -0500</pubDate>
</item>
<item>
<title>Preview of the new Collection Builder tool</title>
<description><![CDATA[<p>Over the past year we've been developing a new collection building tool to be used in conjunction with the MBooks "page-turning" application already available. This tool will allow users to create their own collections of MBooks items and view public collections created by others. Users will also be able to do full text searching across all items within a collection.</p>

<p>We're still working out some bugs and interface issues but hope to release soon. Check back in July!</p>

<p><img src="http://www.lib.umich.edu/graphics/litblog/MBooks_CB1.jpg" alt="MBooks preview"/></p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/06/preview_of_the.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/06/preview_of_the.html</guid>
<category>MBooks</category>
<pubDate>Mon, 09 Jun 2008 11:36:37 -0500</pubDate>
</item>
<item>
<title>MTagger Usability Research</title>
<description><![CDATA[<p>The Usability Working Group (UWG), along with our 2 fantastic and hardworking interns, is spending the summer conducting usability research on <a href="http://www.lib.umich.edu/mtagger/tags/faq">MTagger</a>. We started by doing a heuristic evaluation and cognitive walkthrough. The goal for these evaluations was to reveal a preliminary set of issues pertaining to the usability, functionality and aesthetics of <a href="http://www.lib.umich.edu/mtagger/tags/faq">MTagger</a> and to facilitate prioritizing further benchmarks. This report is now online.</p>

<p>We've also completed a "guerilla" test and we're in the process of conducting interviews and preparing for formal user tests and a survey. Reports for those studies will also be put online when they're done.</p>

<p><a href="http://www.lib.umich.edu/usability/projects/MTagger.html">Link to MTagger Usability Reports</a></p>

<p>-- Suzanne Chapman<br />
-- UWG chair/DLPS Interface & User Testing Specialist</p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/06/mtagger_usabili.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/06/mtagger_usabili.html</guid>
<category>MTagger</category>
<pubDate>Sun, 08 Jun 2008 14:22:46 -0500</pubDate>
</item>
<item>
<title>Page numbers and URLs in MBooks</title>
<description><![CDATA[<p>We get questions from MBooks users (most recently from dfulmer in the comments to <a href="http://mblog.lib.umich.edu/blt/archives/2008/05/search_full-tex.html">this post</a>) about how to link to pages, what the URL parameters such as "num" and "seq" mean, and other questions about links and page numbers.</p>

<p>There are a couple of issues. The first is about URLs. The most stable and persistent URL is the one that we include in the Mirlyn record, and also at the top of the pageturner with other descriptive metadata. It's called a "handle" and is a robust persistent identifier managed by CNRI (more on handles at <a href="http://www.handle.net/">http://www.handle.net/</a>). They look like this:</p>
<p><a href="http://hdl.handle.net/2027/mdp.39015021038404">http://hdl.handle.net/2027/mdp.39015021038404</a></p>
<p>and this is the URL that we encourage people to use and save. However, since they all start with http://hdl.handle.net/2027, people don't recognize them as belonging to the University of Michigan. Users are much more familiar with URLs that include the umich.edu domain. Nevertheless, since these handles are persistent and robust ("2027" is registered with CNRI as belonging to us) these are the URLs that should be used.</p>
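<p>As a minimal sketch of the pattern above (not code we run in production): a handle URL is the resolver address, the "2027" prefix, and the volume identifier shown in the example. The function name <code>handle_url</code> and its signature are made up for illustration.</p>

<pre>
```python
# Hypothetical helper: builds the persistent handle URL for an MBooks volume.
# The resolver host and the "2027" prefix come from the example URL above;
# the function name and signature are assumptions for illustration only.
HANDLE_RESOLVER = "http://hdl.handle.net"
UM_PREFIX = "2027"


def handle_url(volume_id):
    """Return the handle URL for an identifier such as 'mdp.39015021038404'."""
    volume_id = volume_id.strip()
    if not volume_id:
        raise ValueError("volume_id must not be empty")
    return "/".join([HANDLE_RESOLVER, UM_PREFIX, volume_id])


print(handle_url("mdp.39015021038404"))
```
</pre>

<p>Building links this way, rather than copying whatever is in the browser's address bar, keeps them independent of our local domain names.</p>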

<p>Other URLs will be less stable. The sharper-eyed among our readers will have noted that our URLs recently changed from starting with "mdp.lib.umich.edu" to "sdr.lib.umich.edu". We will redirect users any time they use a URL starting with "mdp.lib.umich.edu" but these local domain names will change over time. The same is true for the URL parameters such as "page," "num," "seq," "orient," etc. Phil Farber's response to <a href="http://mblog.lib.umich.edu/blt/archives/2008/05/search_full-tex.html">the same post noted above</a> provides documentation on what these mean, but be aware that these will change without warning. URL hacking will lead to tears before bedtime. </p>

<p>The other related issue has to do with page numbers and other metadata. People will notice that many MBooks include a table of contents with page numbers on the left-hand side, such as <a href="http://hdl.handle.net/2027/mdp.39015002064486">this one</a>. You may also notice that some books lack this table of contents, and use "sequence" instead of page numbers. <a href="http://hdl.handle.net/2027/mdp.39015021038404">Here's</a> an example of a book for which we do not have page numbers.</p>

<p>It all has to do with the metadata. At a minimum, we know the sequence in which the pages of any given book should be displayed. The pageturner buttons for forward and backward use this information to work properly, but for some books, this is all the information we have. Since the sequence of pages starts with the front cover, it's unlikely that the sequence number will match the actual page number. (And as Suzanne noted in her comments to <a href="http://mblog.lib.umich.edu/blt/archives/2008/05/search_full-tex.html">this post</a>, if someone has a better term than "sequence" please let us know!) Many of these books without page numbers were early efforts by Google; they are sending us newer, better versions of these books, so eventually the entire collection will include page numbers.</p>

<p>In many (soon most or all) cases we will have page numbers, along with additional metadata identifying title pages, tables of contents, first pages of sections, and other page features. We get these metadata from Google. We don't know how Google generates them, but it's undoubtedly an automated method. This means that they won't be perfect. When we do have metadata indicating the title page, we will open the book to the title page as a default. If we don't have any metadata about the title page, we will open to the first image (usually the front cover).</p>

<p>Page numbers are, to quote the kids, whack. In some books, they are out of sequence, or repeated, or misnumbered, or missing. With many journals, the library has bound together two or more issues, each with its own pagination from 1 to whatever. Therefore, the online volume could have multiple pages numbered 207, as in the example that David points to in his comments to <a href="http://mblog.lib.umich.edu/blt/archives/2008/05/search_full-tex.html">the post mentioned above</a>. Right now, MBooks will take you to the first instance of p. 207 if you type that into the "goto" box. We could probably do something to alert people to the fact that there are multiple pages numbered 207, and give them links to each of them.</p> 
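<p>To make the idea concrete, here is a rough sketch, assuming a volume's metadata can be read as a list of printed page labels in sequence order. That list, the sample data, and the function name are all invented for illustration; this is not how the pageturner is implemented.</p>

<pre>
```python
# Hypothetical sketch: page_labels[i] is the printed page number on the image
# at sequence i + 1, or None where the page has no printed number.
# The sample data below is made up.
def sequences_for_page(page_labels, wanted):
    """Return every 1-based sequence number whose printed label equals `wanted`."""
    return [seq for seq, label in enumerate(page_labels, start=1) if label == wanted]


# Two journal issues bound together, each with its own page 207.
bound_volume = [None, "206", "207", "208", None, "206", "207"]

matches = sequences_for_page(bound_volume, "207")
print(matches)  # [3, 7]
```
</pre>

<p>With a lookup like this, the "goto" box could still jump to the first match and offer links to the rest whenever there is more than one.</p>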

<p>We need to consider having persistent URLs to individual pages. People want to refer to individual pages, and we should have a method with a stable URL to allow them to do it. We could also do more to have a predictive method of referring to a page. Ed Vielmetti recently wrote some ideas about this in <a href="http://vielmetti.typepad.com/superpatron/2008/05/how-to-structur.html">his blog</a>.</p>

<p>We will look at this more carefully soon, once we get through the current round of development for collection builder and other new features.</p>

<p>--Perry Willett<br />
--Head, Digital Library Production Service</p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/06/page_numbers_an.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/06/page_numbers_an.html</guid>
<category>MBooks</category>
<pubDate>Fri, 06 Jun 2008 10:38:39 -0500</pubDate>
</item>
<item>
<title>New REST-ful API for Mirlyn</title>
<description><![CDATA[<p>Earlier this week, I had a chance to give a brown-bag session on a new API into our catalog, Mirlyn (Ex Libris's Aleph software).</p>

<p>One of the great things about working at a library is the depth and breadth of data at our disposal. One of the more frustrating things is how terribly locked-up all that data is.</p>

<p>What, I wondered, would happen if I could radically lower the bar of entry to the catalog for programmers of even marginal ability? The University of Michigan has a pretty big collection, and there's no telling what people could do with that data if getting at it were a lot easier, if they didn't need special permission or access to a particular machine, and if it were useful inside the browser using JavaScript as well as in server-side operations.</p>

<p>So I went about trying to create a system that fulfilled, at least partially, those criteria. Unlike many integrated library systems, Aleph already provides a whole suite of interfaces, including an XML-based API they call the XServer. Unfortunately, the XServer has, in my opinion, a number of shortcomings:</p>

<ul>
<li>As its name suggests, it's based on XML, which can be confusing for the uninitiated. Remember, my focus is on weekend programmers, maybe just writing JavaScript inline in an HTML document.</li>
<li>URLs for an XServer search don't mean anything. First you do a search, and get back a search set. Then you ask for some records using that search set in the URL. The search set is essentially a random identifier, so looking at the URL doesn't tell you anything about what search was done or what you're getting, and you're sure not going to construct one by hand.</li>
<li>The interface is...messy. It's clearly a system that grew up organically, and there are a lot of inconsistencies concerning how things are named, what parameters are called, etc. I wanted an interface where you could take a good guess at what the URL should look like and be right 90% of the time.</li>
</ul>

<p>When I was all done, I had a system that supports queries like this:</p>

<dl>
<dt style="font-weight: bold; margin-top: 0.5em;">A book by ISBN:</dt>
<dd><a href="http://mirlyn.lib.umich.edu/cgi-bin/api/basic.json/isbn/097669400x">http://mirlyn.lib.umich.edu/cgi-bin/api/basic.json/isbn/097669400x</a></dd>
<dt style="font-weight: bold; margin-top: 0.5em;">Or a couple:</dt>
<dd><a href="http://mirlyn.lib.umich.edu/cgi-bin/api/basic.json/isbn/0596000278;097669400x?records=all">http://mirlyn.lib.umich.edu/cgi-bin/api/basic.json/isbn/0596000278;097669400x?records=all</a></dd>
<dt style="font-weight: bold; margin-top: 0.5em;">The most recent 10 books by anyone named 'Bonk':</dt>
<dd><a href="http://mirlyn.lib.umich.edu/cgi-bin/api/basic.json/author/bonk?records=1-10">http://mirlyn.lib.umich.edu/cgi-bin/api/basic.json/author/bonk?records=1-10</a></dd>
<dt style="font-weight: bold; margin-top: 0.5em;">And how about the next ten?</dt>
<dd><a href="http://mirlyn.lib.umich.edu/cgi-bin/api/basic.json/author/bonk?records=11-20">http://mirlyn.lib.umich.edu/cgi-bin/api/basic.json/author/bonk?records=11-20</a></dd>
</dl>
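<p>The URL pattern above is regular enough to build programmatically. Here is a rough Python sketch that only assembles the URLs and makes no requests. The helper names <code>mirlyn_url</code> and <code>records_page</code> are invented for illustration and are not part of the API, and percent-encoding the search terms is an assumption about how the server expects them.</p>

```python
from urllib.parse import quote

# Base path taken from the example URLs above.
API_ROOT = "http://mirlyn.lib.umich.edu/cgi-bin/api"


def mirlyn_url(index, terms, view="basic.json", records=None):
    """Build a query URL of the form ROOT/view/index/term[;term...][?records=...].

    `terms` may be a single string or a list of strings; multiple terms are
    joined with ';' as in the two-ISBN example. Percent-encoding each term
    is an assumption, not documented behavior.
    """
    if isinstance(terms, str):
        terms = [terms]
    path = ";".join(quote(term, safe="") for term in terms)
    url = f"{API_ROOT}/{view}/{index}/{path}"
    if records is not None:
        url += f"?records={records}"
    return url


def records_page(number, size=10):
    """Return a 'first-last' range string for the given 1-based page of results."""
    first = (number - 1) * size + 1
    return f"{first}-{first + size - 1}"


print(mirlyn_url("isbn", "097669400x"))
print(mirlyn_url("isbn", ["0596000278", "097669400x"], records="all"))
print(mirlyn_url("author", "bonk", records=records_page(1)))
print(mirlyn_url("author", "bonk", records=records_page(2)))
```

<p>The four printed URLs should match the four examples in the list above, which is a quick way to check that the guess about the pattern holds.</p>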

<p>You can search by title, author, keyword, most standard numbers -- just about anything you can use to search the catalog via the website. The <a href="http://webservices.itcs.umich.edu/mediawiki/MLibraryAPI/index.php/Mirlynapi:Searchable_Indexes">full list of searchable indexes and their aliases</a>, as well as <a href="http://webservices.itcs.umich.edu/mediawiki/MLibraryAPI/index.php/Mirlynapi:Home">all the current Mirlyn API documentation</a>, is on the new <a href="http://webservices.itcs.umich.edu/mediawiki/MLibraryAPI/index.php/Main_Page">MLibrary API wiki</a>.</p>

<p>While all the above examples return a subset of available data in the <a href="http://www.json.org">JSON</a> format, you can also return XML if you're more comfortable with it, and besides the "basic" data you can get circulation status or full MARC records (expanded into either XML or JSON). Just replace "basic.json" in the above URLs with something like "marc.xml" or "circstatus.json".</p>
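<p>Switching views really is just a string substitution. A small Python sketch, assuming the view name always appears as its own path segment as it does in the example URLs (the <code>with_view</code> helper is hypothetical, and only the view names mentioned in this post are used):</p>

```python
def with_view(url, view):
    """Swap the '/basic.json/' path segment of an example URL for another view.

    Only the first occurrence is replaced, so a search term that happens to
    contain the same text is left alone.
    """
    return url.replace("/basic.json/", f"/{view}/", 1)


basic = "http://mirlyn.lib.umich.edu/cgi-bin/api/basic.json/isbn/097669400x"

# The same ISBN lookup, asking for a full MARC record as XML
# or for circulation status as JSON.
print(with_view(basic, "marc.xml"))
print(with_view(basic, "circstatus.json"))
```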

<p>There's still a lot to do (allow user-defined sorting, let people browse by call number, etc.), but it works, it's useful, and it's friendly enough that people can start digging into it if they want.</p>

<p>I've put some simple JavaScript examples on the <a href="http://www.lib.umich.edu/labs/">MLibrary Labs page</a>; check them out, and drop me a note (or, better yet, comment here!) if you have questions or ideas.</p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/06/new_rest-ful_ap.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/06/new_rest-ful_ap.html</guid>
<category>API</category>
<pubDate>Mon, 02 Jun 2008 08:51:33 -0500</pubDate>
</item>
<item>
<title>Full-Text MBook Searches from the Library Catalog</title>
<description><![CDATA[<p>At the University of Michigan Library, in partnership with Google, we have been busily scanning our collections.  This opens up lots of possibilities, including an exciting one that launches today:  search the full text of a book from within <a href = "http://mirlyn.lib.umich.edu/">Mirlyn</a>, the library's catalog.</p>

<p>If a book has been scanned by Google, there is a "search in this book" field within the library catalog record. Depending on the particular book, a search will return either full-text results (if the book is in the public domain) or a search-term-only view (if the book is in copyright).</p>

<p>Here is an example of an out-of-copyright book (with full-text results available): <a href = "http://mirlyn.lib.umich.edu/F/?func=direct&doc_number=001962145&local_base=AA_PUB"><cite>1931: A Glance at the Twentieth Century</cite></a>.  The record in the catalog looks like this:</p>

<div class="center">
  <h3 class="caption">Screen Shot of Mirlyn Record with "Search in this Book" Option</h3>
  <img src = "http://www.lib.umich.edu/graphics/litblog/mirlyn-search-in-book.png" border = "1" alt = "Screen shot of Mirlyn record with 'search in this book' option" width = "450">
</div>

<p>And here are the results of that search:</p>

<div class="center">
  <h3 class="caption">Screen Shot of MBook Search Results</h3>
  <img src = "http://www.lib.umich.edu/graphics/litblog/mirlyn-search-results.png" border = "1" alt = "Screen shot of MBook search results" width = "450">
</div>

<p>All books that have been scanned -- <a href = "http://www.lib.umich.edu/news/millionth.html">one million and counting</a> -- are searchable.  Search results are linked to the full text for those works that are in the public domain.  Search results for books that are still under copyright are shown in brief view.  Brief view displays a phrase or two on either side of the search term, but doesn't include full-text display of the page.  In either case, the search in the book tool will help you know if you want to get the actual book off the shelf before you visit the library or make a delivery request.</p>

<p>Try these sample records:</p>

<p><b>Full-text</b>:  <a href = "http://mirlyn.lib.umich.edu:80/F/?func=direct&doc_number=000538790&local_base=AA_PUB">The Miscellaneous Writings of Lord Macaulay</a></p>

<p><b>Search only</b>:  <a href = "http://mirlyn.lib.umich.edu:80/F/?func=direct&doc_number=005417795&local_base=AA_PUB">500 Bracelets: An Inspiring Collection of Extraordinary Designs</a></p>]]></description>
<link>http://mblog.lib.umich.edu/blt/archives/2008/05/search_full-tex.html</link>
<guid>http://mblog.lib.umich.edu/blt/archives/2008/05/search_full-tex.html</guid>
<category>Mirlyn</category>
<pubDate>Fri, 30 May 2008 09:25:39 -0500</pubDate>
</item>


</channel>
</rss>