Full-Text MBook Searches from the Library Catalog

May 30, 2008

At the University of Michigan Library, in partnership with Google, we have been busily scanning our collections. This opens up lots of possibilities, including an exciting one that launches today: search the full text of a book from within Mirlyn, the library's catalog.

If a book has been scanned by Google, its record in the library catalog includes a "search in this book" field. Depending on the book, a search returns full-text results (if the book is in the public domain) or a search-term-only view (if the book is in copyright).

Here is an example of an out-of-copyright book (with full-text results available): 1931: A Glance at the Twentieth Century. The record in the catalog looks like this:

Screen Shot of Mirlyn Record with "Search in this Book" Option

And here are the results of that search:

Screen Shot of MBook Search Results

All books that have been scanned (one million and counting) are searchable. Search results link to the full text for works in the public domain. Results for books still under copyright appear in brief view, which shows a phrase or two on either side of the search term but does not display the full text of the page. In either case, the search-in-this-book tool can help you decide whether you need the physical book before you visit the library or make a delivery request.
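Brief view is essentially a keyword-in-context display. The Python sketch below illustrates the idea; it is not the MBooks implementation. It assumes the page's OCR text is available as a plain string, matches a single search word case-insensitively, and uses a hypothetical `window` of words on either side in place of "a phrase or two".

```python
import re


def brief_view_snippets(page_text: str, term: str, window: int = 8) -> list[str]:
    """Return one snippet per occurrence of a single-word `term`, with up to
    `window` words of context on either side of it."""
    words = page_text.split()
    target = term.lower()
    snippets = []
    for i, word in enumerate(words):
        # Ignore case and any punctuation attached to the word before comparing.
        if re.sub(r"\W+", "", word).lower() == target:
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            snippets.append("..." + " ".join(words[start:end]) + "...")
    return snippets
```

For example, `brief_view_snippets("The quick brown fox jumps over the lazy dog", "fox", window=2)` returns `["...quick brown fox jumps over..."]`.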

Try these sample records:

Full-text: The Miscellaneous Writings of Lord Macaulay

Search only: 500 Bracelets: An Inspiring Collection of Extraordinary Designs

Posted by Ken Varnum at 09:25 AM.

Comments

That's nice, but "Sequence" means nothing outside the spiraling towers of the Michigan libraries. And perhaps Google. Is a sequence a page? Two pages? A folio? A sentence? A placeholder?

Posted by: hampelm at May 31, 2008 06:23 PM

Sequence is indeed a page, but numbering starts with the cover. We originally didn't receive actual page-number data (sequence 12 might be page 7, for example), so we had to use sequence numbers. We've since begun getting actual page-number data, and when we have that information, the interface uses "page" instead of the "sequence" number. Eventually we'll have actual page data for all our items.

We debated the term but couldn't come up with anything better. I'd welcome suggestions!

Posted by: suzchap at June 1, 2008 08:27 AM
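The display rule in the reply above (show "page" when page-number data exists, otherwise "sequence") can be sketched in a few lines of Python. The `printed_pages` mapping from sequence number to printed page label is a hypothetical stand-in for however the page-number metadata is actually stored.

```python
def page_label(seq: int, printed_pages: dict[int, str]) -> str:
    """Label a scanned page for display.

    `printed_pages` (hypothetical) maps sequence numbers, where 1 is the
    front cover, to the page number printed on that page.
    """
    printed = printed_pages.get(seq)
    if printed is not None:
        return f"page {printed}"
    # No page-number data for this scan: fall back to its position in scan order.
    return f"sequence {seq}"


print(page_label(12, {12: "7"}))  # page 7
print(page_label(12, {}))         # sequence 12
```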

Matt, "Sequence" doesn't even mean anything inside the spiraling towers of the Michigan libraries!

Maybe this would be a good place to explain the anatomy of an MDP url. It looks to me like the interface uses "num" instead of "page" and "page" always equals "root" (when it doesn't equal "search"). Also, what does "u=1" mean? The num doesn't seem to work without it.

You might also explain all the other parameters and defaults (view, size, seq, num, page, u, q1, start, etc.) and whether they are required or not.

What sort of behavior might we expect when trying to link to page 208 in a book with two volumes bound together and hence two pages numbered 208, like this one: http://sdr.lib.umich.edu/cgi/m/mdp/pt?view=image;size=100;id=mdp.39015000547821;page=root;u=1;num=208

Is there a parameter that will always get me to the table of contents?

Posted by: dfulmer at June 3, 2008 11:02 PM

David,

There is a set of complex issues here, with page numbers, URLs and metadata, probably worthy of its own blogpost. We'll work on addressing your questions.

The URL you include isn't valid--I think you mean this:

http://hdl.handle.net/2027/mdp.39015000547821

Thanks,

Perry

Posted by: pwillett at June 4, 2008 09:23 AM
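Because the stable link is the handle rather than the pageturner URL, a script that has collected pageturner links could rewrite them. This Python sketch assumes, from the single example in this thread, that every handle is `http://hdl.handle.net/2027/` followed by the `id` parameter, and that pageturner query strings separate parameters with semicolons. Both are inferences, not documented rules. Recent versions of Python's `urllib.parse.parse_qs` split only on `&` by default, so the query is split by hand.

```python
from urllib.parse import unquote, urlsplit

# Assumption: inferred from the one handle example in this thread.
HANDLE_PREFIX = "http://hdl.handle.net/2027/"


def parse_pt_params(url: str) -> dict[str, str]:
    """Split a pageturner query string into name/value pairs.

    Parameters are separated by ';' in the example links; '&' is accepted too.
    """
    query = urlsplit(url).query
    params = {}
    for pair in query.replace("&", ";").split(";"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        params[unquote(name)] = unquote(value)
    return params


def handle_url(pt_url: str) -> str:
    """Build the handle link for the item named by the `id` parameter."""
    return HANDLE_PREFIX + parse_pt_params(pt_url)["id"]
```

Applied to the pageturner link from the earlier comment, `handle_url` yields `http://hdl.handle.net/2027/mdp.39015000547821`.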

Following on Perry's post, I'll try to explain parameter semantics in more detail. I'd note, however, that pageturner URL parameters are not intended to provide an API to the data. They have meaning mainly within the context of the pageturner application as experienced by the user. So:

id - the item identifier

page - the web page to display, which would be one of 'root' (the view of the item) or 'search' (the search results page)

seq - the sequential number of the scanned page starting at 1 (usually the front cover)

num - a page number as printed on the page. This could be 2 or 7, or xxi and so on if the item has page number metadata available. Not all items have page number metadata yet.

view - one of 'image' (a page image), 'pdf' (a page image rendered as a pdf), 'text' (the OCR of the page).

size - on page views, a percentage scaling of the full-resolution TIFF relative to a nominal width of 680 pixels; on the search results page, the number of results shown at once (the size of the result-list slice).

q1 - the query string entered by the user when searching

start - the beginning offset into the list of search results

u=1 - indicates that the page number (num) or seq value was entered by the user. It is part of an algorithm that lets us handle the fact that a number must sometimes be treated as a sequence number and sometimes as a page number.
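To make the parameter list concrete, here is a Python sketch that assembles a pageturner URL from these parameters. The base URL comes from the example link earlier in this thread; adding `u=1` only alongside `num`, choosing `page=search` whenever a query is given, and percent-encoding the values are assumptions. As noted above, these parameters are not intended as an API, so links built this way may stop working.

```python
from urllib.parse import quote

# Base URL taken from the example link in this thread; not a documented API.
PT_BASE = "http://sdr.lib.umich.edu/cgi/m/mdp/pt"


def pt_url(item_id, *, seq=None, num=None, view="image", size=100, q1=None, start=None):
    """Assemble a pageturner link.

    seq   -- scan position, starting at 1 (usually the front cover)
    num   -- page number as printed on the page (needs page-number metadata)
    q1    -- search query; when given, the search results page is requested
    start -- offset into the list of search results
    """
    if view not in ("image", "pdf", "text"):
        raise ValueError(f"unknown view: {view!r}")
    if seq is not None and num is not None:
        raise ValueError("pass seq or num, not both")
    page = "search" if q1 is not None else "root"
    params = [("view", view), ("size", size), ("id", item_id), ("page", page)]
    if seq is not None:
        params.append(("seq", seq))
    if num is not None:
        # Assumption: num is only honored when flagged as user-entered with u=1.
        params += [("u", 1), ("num", num)]
    if q1 is not None:
        params.append(("q1", q1))
    if start is not None:
        params.append(("start", start))
    # Join with ';' to match the example URLs in this thread.
    return PT_BASE + "?" + ";".join(f"{k}={quote(str(v), safe='')}" for k, v in params)
```

With these choices, `pt_url("mdp.39015000547821", num=208)` reproduces the example link from earlier in the thread.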

There is no URL parameter that indicates the table of contents page. If the item has page feature metadata, such as Table of Contents or Title Page, links to the corresponding sequence-numbered page images will appear in the left-hand sidebar.

For a volume that has repeating page numbers, assuming that page number metadata is available, entering a given page number will take you to the first page so numbered. If page number metadata is not available, the number entered is treated as a sequence number which is always unique.

Phil

Posted by: pfarber at June 4, 2008 10:17 AM
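The rule for repeated page numbers described in the reply above can be sketched in Python as follows. The `printed_pages` mapping (sequence number to printed label) is hypothetical, and raising an error when metadata exists but no page carries the entered number is an assumption; the reply does not say what happens in that case.

```python
from typing import Optional


def resolve_to_seq(entered: str, printed_pages: Optional[dict[int, str]]) -> int:
    """Turn a user-entered number into a sequence number.

    printed_pages (hypothetical) maps sequence number -> printed page label
    for volumes with page-number metadata, or is None when there is none.
    """
    if printed_pages:
        # Scan in sequence order so a repeated label such as "208"
        # resolves to the first page so numbered.
        for seq in sorted(printed_pages):
            if printed_pages[seq] == entered:
                return seq
        # Assumption: the original behavior for a missing page is unspecified.
        raise LookupError(f"no page numbered {entered!r}")
    # Without page metadata the number is a sequence number, which is unique.
    return int(entered)
```

For two volumes bound together, a mapping such as `{10: "207", 11: "208", 250: "207", 251: "208"}` sends `"208"` to sequence 11, while the same input with no metadata goes to sequence 208.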

Thanks for that information!

As for my URL, it isn't displaying in full, but if you triple-click on the line you can copy it.

Posted by: dfulmer at June 5, 2008 01:43 AM

This makes me think about making links to mbooks from MY catalog, for titles we also hold. How would you all feel about that? I'm trying to think of ways to do that without putting unreasonable load on your servers. It would be great if you wanted to contact me in email to discuss this further (or a phone conversation?).

Posted by: rochkind@jhu.edu at June 16, 2008 01:26 PM

Hi Jonathan,

As I mentioned on the next gen catalog listserv a few weeks ago, we have OAI records for all the freely available titles. The OAI records have OCLC numbers in them. I'd be happy to discuss other strategies. Thanks,

Perry
pwillett@umich.edu

Posted by: pwillett at June 16, 2008 03:42 PM
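For a catalog that wants to link to MBooks without per-title lookups, harvesting the OAI records once and matching on OCLC numbers limits the load on the library's servers to a periodic bulk harvest. The Python sketch below uses the standard OAI-PMH `ListRecords` verb with `oai_dc` metadata and follows resumption tokens. The base URL is left to the caller, and the assumption that OCLC numbers appear in `dc:identifier` as `(OCoLC)NNNN` is hypothetical; check the actual records before relying on it.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Hypothetical: assumes OCLC numbers appear in dc:identifier as "(OCoLC)NNNN".
OCLC_RE = re.compile(r"\(OCoLC\)\D*(\d+)")


def http_get(url):
    with urlopen(url) as resp:
        return resp.read()


def harvest_oclc_numbers(base_url, fetch=http_get):
    """Yield (oai_identifier, oclc_number) pairs from an OAI-PMH ListRecords harvest."""
    query = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(query)))
        for record in root.iterfind(".//oai:record", NS):
            header_id = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
            for ident in record.iterfind(".//dc:identifier", NS):
                match = OCLC_RE.search(ident.text or "")
                if match:
                    # int() drops leading zeros so numbers compare cleanly.
                    yield header_id, str(int(match.group(1)))
        # An absent or empty resumptionToken means the list is complete.
        token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
        if not token:
            break
        query = {"verb": "ListRecords", "resumptionToken": token}
```

The resulting pairs can then be joined against the OCLC numbers in a local catalog.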
