HathiTrust "Search in this text." Now with relevance ranking and better multilingual support!
August 29, 2011
Today we released the third high-priority feature identified by the HathiTrust Full-Text Working Group: relevance ranking for "Search in this text." Instead of having to scroll through numerous pages of results in page order, users of "Search in this text" now get results in relevance order, with the most relevant pages at the top of the list. By default, only pages that contain all the words in a user's search are listed (a Boolean "AND" search). There is also a link that will search for pages containing one or more of the search terms. If this option is selected, pages containing more of the user's search terms are ranked higher.
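To make the difference between the default "AND" search and the "one or more terms" option concrete, here is a minimal sketch in Python. It is not the code behind "Search in this text"; the function names, the page dictionary, and the scoring rule (distinct matching terms first, then total occurrences, then page order) are assumptions chosen only for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def search_pages(pages, query, require_all=True):
    """Return page numbers ordered by a toy relevance score.

    pages: dict mapping page number -> page text (hypothetical structure).
    require_all=True mimics the default Boolean "AND" search.
    require_all=False mimics the "one or more terms" option.
    """
    terms = set(tokenize(query))
    scored = []
    for page_no, text in pages.items():
        tokens = tokenize(text)
        present = terms & set(tokens)
        if not present:
            continue  # page matches none of the query terms
        if require_all and present != terms:
            continue  # "AND" search: every term must appear on the page
        occurrences = sum(1 for t in tokens if t in terms)
        # Assumed ranking: more distinct terms, then more occurrences,
        # then original page order as a tie-breaker.
        scored.append((-len(present), -occurrences, page_no))
    return [page_no for _, _, page_no in sorted(scored)]


# Made-up sample pages, for illustration only.
pages = {
    1: "The whale surfaced near the ship.",
    2: "A white whale, the white whale!",
    3: "The ship sailed on.",
    4: "White sails in the distance.",
}

print(search_pages(pages, "white whale"))                     # AND search
print(search_pages(pages, "white whale", require_all=False))  # any term
```

With this sample data the "AND" search returns only page 2, while the "one or more terms" search also returns pages 1 and 4, ranked below page 2 because each contains only one of the two terms.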
In addition to relevance ranking, searching in languages written in non-Latin scripts, such as Hindi, Arabic, Hebrew, or Thai, now matches the capabilities of the Full-Text search of all 9 million volumes.
HathiTrust Full-Text search: Now with Facets!
August 10, 2011
On July 27th we went live with faceted search and relevance ranking based on both OCR and MARC metadata in Full-Text search (www.hathitrust.org). These are the top two features identified by the HathiTrust Full-Text Working Group.
Relevance ranking now gives a volume that matches a user's query terms in both the OCR and the title, author, or subject a higher ranking than a volume that matches only in the OCR. There is much more work to be done in tuning relevance ranking, but this is a first step.
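The idea of ranking a volume higher when it matches in the metadata as well as the OCR can be sketched as a weighted sum over fields. This is a rough illustration only: the weights, field names, and functions below are hypothetical, and the post does not say how the production ranking is actually computed.

```python
# Hypothetical field weights; the real values are not given in this post.
FIELD_WEIGHTS = {"ocr": 1.0, "title": 3.0, "author": 2.0, "subject": 2.0}


def score_volume(volume, query):
    """Sum, over all fields, the field weight times the number of query terms found."""
    terms = query.lower().split()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = volume.get(field, "").lower().split()
        matched = sum(1 for t in terms if t in words)
        score += weight * matched
    return score


def rank_volumes(volumes, query):
    """Return ids of matching volumes, highest score first (ties broken by id)."""
    scored = [(score_volume(v, query), v["id"]) for v in volumes]
    scored = [(s, vid) for s, vid in scored if s > 0]
    return [vid for s, vid in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Made-up sample records, for illustration only.
volumes = [
    {"id": "v1", "ocr": "a history of whaling voyages", "title": "gardening manual"},
    {"id": "v2", "ocr": "whaling ships and crews", "title": "whaling in the pacific",
     "subject": "whaling"},
    {"id": "v3", "ocr": "roses and tulips", "title": "flowers"},
]

print(rank_volumes(volumes, "whaling"))
```

Here "v2" matches in the OCR, the title, and the subject, so it outranks "v1", which matches only in the OCR; "v3" does not match at all and is left out.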
Search results can now be refined by selecting facets such as subject, date, or author. Although selecting facets can help users drill down to narrow large result sets, using very specific terms, and especially using phrases in quotes, remains one of the best ways to get reasonably small result sets.
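For readers unfamiliar with facets, the sketch below shows the two operations involved: counting how many results fall under each facet value, and narrowing the result set once a value is selected. The record layout and function names are assumptions made for this example and do not describe the Full-Text search implementation.

```python
from collections import Counter


def facet_counts(results, field):
    """Count how many results carry each value of the given facet field."""
    return Counter(r[field] for r in results if field in r)


def refine(results, **selected):
    """Keep only results whose fields equal every selected facet value."""
    return [
        r for r in results
        if all(r.get(field) == value for field, value in selected.items())
    ]


# Made-up sample results, for illustration only.
results = [
    {"id": "v1", "subject": "Whaling", "date": "1851", "author": "Author A"},
    {"id": "v2", "subject": "Whaling", "date": "1820", "author": "Author B"},
    {"id": "v3", "subject": "Botany", "date": "1851", "author": "Author C"},
]

print(facet_counts(results, "subject"))
print([r["id"] for r in refine(results, subject="Whaling", date="1851")])
```

Selecting the subject "Whaling" and the date "1851" narrows the three sample results to the single volume "v1".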
Over the next few months we will be releasing further improvements in ranking and more of the features identified by the task force.