
June 27, 2007

In the Chronicle's Blog - Librarians & Web 2.0

We will get back to more how-to entries, but in the meantime, you might find this of interest from the Chronicle.

The Chronicle of Higher Education: Wired Campus Blog: Librarians Find a Place in a 'Web 2.0' World: http://chronicle.com/wiredcampus/index.php?id=2182

If you didn't already, now you know where to go ... ;)

Posted by pfa at 08:45 AM | Comments (0)

June 10, 2007

MLibrary2.0 Kickoff, Part 2: Kristin Antelman

The MLibrary 2.0 forum series got off to a great start on Friday. A capacity crowd of about 140 people joined us in the Michigan Union for presentations by Peter Morville, Kristin Antelman and Jessamyn West (speaker bios are here). I'll be writing up my notes from the event and posting the writeups here. Slides and videos from Peter's and Kristin's presentations will be posted to the Events page within the next few days; many thanks to Peter Knox for documenting this event!

Update: Having lost Saturday's blogging battle to the forces of beautiful weather, and seeing Patricia's excellent post covering Peter Morville's presentation, I've decided to skip my coverage of Peter's talk (which has also been blogged here and here) and go straight to Kristin's. Note: Since this was the geekiest presentation by far, and hence my automatic favorite, I'm going to include as many links as I can to the things Kristin mentioned in her talk. I might throw in a few related references of my own, and I will label them as such.

Kristin Antelman: Next generation catalogs

MLibrary 2.0: Kristin Antelman

Preview: Kristin's going to describe some features of the NCSU catalog, then ask us where we think we as a profession are going with this whole catalog thing.

We've had online catalogs for a long time, but we've never quite gotten it right - in fact, for many purposes the OPAC is worse than the card catalog. Subject browsing is one of them, and there have been several experiments with advanced subject browsing in the OPAC: Mark Ludwig's project at SUNY Buffalo, which indexed MARCXML records; Casey Bisson's WPopac (now Scriblio), which uses faceted browse and runs entirely on free software; Casey Durfee's "Open Source Endeca" (Apache Solr + Django) from Code4lib 2007. Kristin mentioned a couple of recent reports that gave some theoretical background to such efforts: the Karen Calhoun report for the Library of Congress and the BSTF (University of California Bibliographic Services Task Force) report.

Editorial aside: Advanced browsing in the OPAC has a longer history than many people realize. One of the things Peter Morville mentioned in his talk was the Flamenco Project at UC Berkeley, which has been going on for a while. And Endeca may have succeeded (finally) in patenting their particular implementation of faceted browsing, but a quick glance at the patent itself reveals a reference to Dr. Stephen Pollitt's work on HIBROWSE, one of the first faceted OPACs from the mid-1990s. Many of the ideas behind subject-enhanced browsing in the OPAC were first explored by Karen Markey (for OCLC) in 1983, so it's great to see some of those ideas finally coming to fruition.

Now let's talk about Endeca at NCSU. Shortcomings of the old catalog included a subject search "feature" that was nearly impossible to use, and results were always presented in "system sort" order which made them difficult to navigate (sound familiar?). Title was the standard default search, but of course many users expect a keyword search. Clearly a better solution was needed. NCSU's Endeca implementation uses two search boxes, because they weren't willing to get rid of authority searching (a future goal is to combine them). The top box is for keyword search, while the bottom is for authority (primarily known-item) searching.

Let's try a really general search in the keyword box, like "art history." Sure, we get thousands of results, but the LCSH-based subject facets at the top allow us to narrow the result set very quickly based on readily apparent criteria. So, faceted navigation actually works very well with MARC metadata to allow for fast narrowing. Another benefit of the Endeca software is relevance ranking of results (this was actually their primary goal, not the faceted navigation!). So even for large result sets, the most relevant books are right at the top. Studies they've done show a vast improvement in relevance of keyword searches. The new system also allows users to see at a glance where the book is located, both on campus and in the stacks, and whether it is checked out or not. We can even filter the results to include only books that are currently available. No need to click half a dozen times! We can also sort results by "popularity" (uses circulation data); this is one of the most frequently used sort criteria, along with availability. Another useful feature is the "did you mean" spelling suggestion (ALL catalogs should have this feature!). And the new catalog allows users to subscribe to new books lists (or any catalog search) via RSS.
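To make the mechanics concrete, here is a minimal sketch of faceted narrowing, an availability filter, and a popularity sort. It is not Endeca's or NCSU's implementation; the record layout, the sample titles and counts, and the function names are all made up for illustration, and a real catalog would build the subject facets from MARC subject fields.

```python
from collections import Counter

# Hypothetical, simplified records (not real NCSU data).
RECORDS = [
    {"title": "Art History: A Survey", "subjects": ["Art -- History"],
     "available": True, "checkouts": 42},
    {"title": "Renaissance Art", "subjects": ["Art, Renaissance", "Art -- History"],
     "available": False, "checkouts": 17},
    {"title": "Methods of Art History", "subjects": ["Art -- Historiography"],
     "available": True, "checkouts": 5},
]


def facet_counts(records, field="subjects"):
    """Count how many records carry each facet value."""
    counts = Counter()
    for rec in records:
        # set() so a value repeated within one record is counted once
        counts.update(set(rec[field]))
    return counts


def narrow(records, subject=None, available_only=False):
    """Keep records that match the chosen subject facet and availability filter."""
    kept = []
    for rec in records:
        if subject is not None and subject not in rec["subjects"]:
            continue
        if available_only and not rec["available"]:
            continue
        kept.append(rec)
    return kept


def sort_by_popularity(records):
    """Order records by circulation count, most-borrowed first."""
    return sorted(records, key=lambda rec: rec["checkouts"], reverse=True)
```

Calling facet_counts on the keyword result set gives the numbers shown next to each facet, and narrow applies whichever facet the user clicks. Relevance ranking, which was NCSU's primary goal, is a separate problem that this sketch does not attempt.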

Kristin modestly asserts that the NCSU catalog is Library 1.1 (not even close to 2.0). However, they have been taking additional steps toward building the catalog of the future. One big step is CatalogWS, a generic XML web service layer implemented on top of the existing catalog. This allows users to search the library from their web-enabled cell phone or other mobile device; it also includes library locations and hours. They are currently working on integrating the catalog into the website, so you'll be able to search the whole library right from the front page using a single search box. [Editorial comment: the UPenn library has had a similar cross-search feature for some time, and it is amazing.]

MLibrary 2.0: Antelman Slides

Kristin then presented some usage data from a few studies they have done since implementing the new catalog. About 67% of transactions are search ONLY. Facets are still used, however (the most popular facets are subject and LC classification). Publication date is the most popular sort option. So users are still searching the catalog in similar ways, but they are able to do it much more efficiently and effectively.

However, Kristin is quick to point out the limitations of the current system. One major shortcoming is that it doesn't really solve the perennial problem of syndetics (how to make a connection between the user's search vocabulary and the controlled vocabulary of LCSH). It's still a keyword search at bottom, and the keywords are not mapped onto LCSH terminology. As an example of this, Kristin did a search for "revolutionary war" and found 870 hits. If you knew the proper LCSH heading to search, you'd get over 3,000 hits, with many useful subdivisions to help narrow your search. Unfortunately these are still not exposed in the catalog in any meaningful way - in fact, faceted navigation "disguises" the problem of syndetics by presenting the results of the keyword search as if it constituted the entire universe of materials on that topic. This leads users (and even librarians) to "satisfice" because they will generally find something, even if it isn't the best or most comprehensive information that's available. [Editorial comment: this is a problem Thomas Mann discusses in his book Library Research Models. Often finding something is worse than finding nothing, because most users will make do with whatever they happen to find on the first try.] It's important to make sure patrons are finding the right book (Ranganathan's second law: every reader his/her book). In principle it should be possible to correct this problem programmatically, because in LCSH there is an entry for "revolutionary war" with a see reference to the proper subject heading. The NCSU folks are working on mapping these references and leveraging them in the catalog.
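As a rough illustration of what "leveraging see references" could look like, the sketch below looks up the user's keywords in a table of variant terms and returns the authorized heading alongside the original query. The table, its single entry, and the function names are assumptions made for this example; this is not NCSU's code, and a real system would load the references from the LCSH authority file.

```python
# Hypothetical see-reference table: variant term -> authorized heading.
# The single entry is illustrative only.
SEE_REFERENCES = {
    "revolutionary war": "United States -- History -- Revolution, 1775-1783",
}


def normalize(term):
    """Lowercase and collapse whitespace so lookups tolerate minor variation."""
    return " ".join(term.lower().split())


def expand_query(user_query, see_refs=SEE_REFERENCES):
    """Return the keyword query plus the authorized heading it maps to, if any."""
    heading = see_refs.get(normalize(user_query))
    return {"keyword": user_query, "subject_heading": heading}
```

With something like this in place, the catalog could run the keyword search as usual and also offer the subject-heading search, so the user sees the larger, subdivided result set instead of only the keyword hits.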

Kristin then rattled off a list of other "experimental" catalogs and related websites which she considers part of the "next generation catalog" trend:

MLibrary 2.0: Antelman at the Podium

Having discussed the NCSU catalog and a few kindred efforts, Kristin moved on to the future of the catalog and its relationship to the evolution of bibliographic control. There has been a lot of recent activity toward developing a new metadata framework, including the Library of Congress working group on the Future of Bibliographic Control and the recently announced partnership between the RDA (Resource Description and Access, the successor to AACR2) and DCMI (Dublin Core Metadata Initiative) efforts (see Karen Coyle's writeup for more info). Much of the discussion has been taking place on mailing lists, notably NGC4lib and RDA-L, so anyone who is interested in these recent developments should go check them out.

What's more important, making holdings available, or bringing them under bibliographic control? There is a real tension here (which was brought out in the Karen Calhoun report), because libraries have scarce financial resources, and when our catalogers are creating metadata that's difficult to leverage in our fancy new OPACs, all that cataloging labor begins to look like a diminishing ROI. [Editorial note: I may be interpolating a bit here; my notes are somewhat sketchy at this point.] There seems to be a cultural disconnect between the cataloging and metadata standards community (MARBI, etc.) and the community of OPAC interface hackers (Code4lib). These two groups move at very different paces (RDA has been in the works for, what, 6-7 years now?). You can't have search without metadata, at least not with our current means of bibliographic control, so there is a real danger that the recent advances in catalog interfaces will be stymied by the failure of the cataloging community to keep up.

As a case in point, Kristin asks the question: "What is an identifier?" Librarians have one idea of what it is: a title and an author string. But to web programmers, this is a hopelessly unreliable means of identifying unique objects - in a networked environment you really need some sort of URI. Most OPACs that are currently in use don't even have stable URLs for each record! This leads Kristin to ask what is the most effective way to expose our library metadata to the web - can't we just dump it into an index and have Google crawl it all, so that it shows up in people's web search results? But think of how confusing it would be if everybody did that! We need to develop our own networked services for exchange of bibliographic data - a sort of distributed "bibliographic cloud." Kristin mentioned Jason Griffey's recent comment on NGC4lib that "the true future for bibliographic data has to be in some P2P form, distributed and shared in the background of our systems" (read Jason's follow-up post for more explanation).

So the bibliographic standards community shares many goals with the Semantic Web community; this is what makes the recent decision to use RDF and SKOS to disclose the new RDA element vocabulary (which will be based on the DCMI Abstract Model) so very exciting [Editorial comment: my geek sense is tingling!]. As a further harbinger of momentous changes to come, the most recent issue of Cataloging and Classification Quarterly, "Knitting the Semantic Web," is devoted entirely to the intersection of bibliographic data and Semantic Web technology (UMich people can read the whole issue online). One of the big problems with the current standard of AACR + MARC is that it fails to achieve a separation between the metadata elements and the cataloging rules. Kristin gave an example of how an alternative scheme could work, using SKOS to describe some microform materials [my notes are pretty sparse here, sorry!].

Given all this churn within the bibliographic standards arena, what direction should we be going in with our next-generation OPACs? One thing Kristin mentioned was using FAST to achieve better faceting than regular LCSH can provide; they are investigating migrating to FAST at NCSU. The Worldcat Identities project uses aggregate data from Worldcat to display works clustered by author (see Thom Hickey's post for more info). However, these are both OCLC projects and are based on closed, proprietary technology. On the other hand, open frameworks do exist for achieving the same results - this is what semantic web technology is all about. The biggest problem the library community currently faces is that our metadata vocabularies are not prepared to be incorporated into semantic web applications. For example, authority files are not identified by a URI. Cataloging practices are only loosely standardized; we use partial information to fill in our records and because our search systems rely on vague identifiers that's usually good enough. If librarians want to play on the semantic web, we're going to need a serious cleanup of our metadata models. We also need to have more openness and less concern about authority. How much bibliographic control do we really need? Many people will scoff at this question, but with the volume of information we face today it's a serious issue. This is not to say we should jettison the idea of authority, but we could do much more with our data if our metadata vocabularies were open and extensible.

So what are some requirements for the ideal catalog? Among other things it should recognize clusters of knowledge, show the lineage of publications, identify authors, make previously unknown connections between works visible to the user, and show the authoritativeness and popularity of sources. [Editorial comment: I believe Kristin was referring to a recent thread on NGC4lib which featured a discussion of one person's wishlist for a next generation library catalog. See Futurelib Wiki for more details.] Some of these ideas have been explored by people at the Institute for the Future of the Book. Kristin referred to this post by Ben Vershbow about the idea of a "people's card catalog" built from open source software, to serve as an alternative to commercial products like Google Book Search. Google Book Search has raised the spectre of the elimination of metadata: why spend all that time creating document surrogates when all of the library's text is online and searchable? Needless to say, the library science community is not yet comfortable with such a flagrant conflation of data with metadata, but it's something to which we should give serious thought.

Kristin mentioned David Weinberger's new book Everything Is Miscellaneous (which Peter Morville also referenced in his talk - read his review here) with its notion of three "orders" of information: the book on the shelf, the catalog, and the web. We need to get our data ready for the "third order," but our legacy metadata sets and arcane cataloging practices create a cultural barrier that prevents easy integration with the current state of the art. Librarians cherish consensus on an international level, which makes it tough to innovate in the library world. How can we standardize internationally and still be innovative? Individual libraries can do a lot, and the current technology tools make it much easier, but we could do so much more if our vocabularies were open and extensible! We've put so much effort into the metadata vocabularies that we use, but they are still owned by private institutions (LoC, OCLC). How can we migrate our data into an open web environment? The objective is to be able to control the data that goes into (and comes out of) our ILS. This is an impedance mismatch that needs to be fixed, and soon!

MLibrary 2.0: Antelman with Superpatron and JP Wilkin

Q & A

Q: Can you give a brief definition of the semantic web?
A: Basically it means we can search for meaning instead of just keywords. The semantic web shares an objective with the OPAC, in that we're trying to apply controlled vocab in a way that allows this kind of searching.

Q: What about licensed content, like articles? Aren't there going to be some legal challenges in obtaining better control over those?
A: NCSU has looked at clustering metasearch results using something like Vivisimo, but there is no good solution to on-the-fly clustering in metasearch due to huge inconsistencies in the results you get from different vendors. This is a hard problem: as long as we don't have control over the data, there's not much we can do about it. Unfortunately librarians made a decision long ago to give up control over our data, and now we are paying the price.

Q: What about recommender systems and "find similar?"
A: This is a great idea and some libraries have tried it, but it's hard to do well without the ability to get more detailed use data out of our ILS. Our current systems don't track much of the info that would be required for building systems like what Amazon provides. We need to do a better job of leveraging use data, but there are also privacy concerns. Think about the long tail effect: we need to be able to make connections between items that may not be very popular in themselves, but that might mean a lot to a student or scholar in the right discipline.
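One simple way to picture a "find similar" feature is co-borrowing: items that are checked out together get linked. The sketch below assumes the ILS can export anonymized sessions (sets of item IDs with no patron identifiers), which is exactly the kind of use data the answer above says is hard to get. The session data, item IDs, and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical anonymized checkout sessions: each set holds items borrowed
# together, with no patron identifiers attached.
SESSIONS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "D"},
]


def co_borrow_counts(sessions):
    """For each item, count how often every other item was borrowed with it."""
    counts = {}
    for items in sessions:
        for x, y in combinations(sorted(items), 2):
            counts.setdefault(x, Counter())[y] += 1
            counts.setdefault(y, Counter())[x] += 1
    return counts


def find_similar(item, counts, limit=3):
    """Return up to `limit` items most often borrowed alongside `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(limit)]
```

Because the links come from what was borrowed together and not from overall popularity, even a rarely circulated item can surface a neighbor, which is the long tail connection described above. A production system would need far more data and a more careful similarity measure than raw counts.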

Posted by jkglenn at 04:06 PM | Comments (0)

June 09, 2007

MLibrary2.0 Kickoff, Part 1: Peter Morville via Twitter

Remember, read this post from the bottom up.


PM The values of librarianship are very important, and we need to find a way to ensure those are incorporated into the new environment.

PM People exaggerate the way web20 / library20 obviate the need for the old

PLEASE NOTE: These events will be podcasts at the site http://www.lib.umich.edu/lib20

PM: embedded information dense spaces - walking becomes a new form of query

PM: Trends: push for local information / yellow pages will disappear

Q&A - location, location, location = Google rank and what else?

MLibrary 2.0: Morville Slides: Contact Info

PM: Libraries as cathedrals of knowledge

PM: Story of the 3 Stone Cutters: 1. making a living; 2. the best stonecutting job in the county; 3. building a cathedral

http://tinyurl.com/2gjzwr

PM: Shaping Things / Everyware

PM: Julian Bleecker - blogjects and pigeons / manifesto for networked objects

http://www.delicious-monste...

PM - delicious library scans barcodes and ISBNs for personal libraries / neighborhoods / etc.

PM http://semapedia.org -- tagging RL objects and spaces

PM: Google Book Search / podzinger -- expanding what we consider the web

PM: flickr successful with clustering tags that often appear together

PM - hybrid solution - clustering driven by human selected taxonomy

PM: Clusty & Automated categorization http://clusty.com

PM: NCSU Libraries using guided navigation for site / Flamenco project

PM - http://buzzilions.com - guided navigation / search

PM: Search is one of the most important ways we learn.

PM: Marcia Bates - Berrypicking, 1989

PM: Interfaces - one size does not fit all.

PM: John Battelle "search has become the new interface of commerce."

PM - http://etsy.com taxonomic shopping, vendor driven w/ tags and feedback loops

MLibrary 2.0: Morville Slides: Pace Layering

PM: Stewart Brand - how buildings learn - pace layering (important concepts evolve slowly, less critical concepts quickly - fashion)

PM: Leaves become food for trees.

PM: David Weinberger - Everything is miscellaneous "The old way creates a tree. The new rakes leaves together."

PM: "This is not your mother's metadata."

PM: Who can help? :) Revenge of the Librarians

PM: http://map.net --- interesting products that are fun but not useful

PM: How do we create bigger needles for our haystacks?

PM: David Brin's Transparent Society -- YAYYYY!!!! David! :) http://www.davidbrin.com/ts...

PM - Google StreetView

PM: http://amal.net/rfid

PM: Bruce Sterling - the internet of things / the internet of objects

PM - Apple iPhone - web in your pocket, full featured

PM: Control granularity of information and location, and who sees it.

PM - Privacy concerns of ubiquitous geo-info for real people

MLibrary 2.0: Morville Slides: Scary Gadget

PM: device to scare people about the future - wristwatch to track your child's location.

Sorry - http://neighbor.com

PM: neighbor.com beta - mashup of political affiliations

PM: http://microsoft.com/surface

MLibrary 2.0: Morville Slides: The Other Ambience

PM: All sorts of alternate interfaces -- it won't just be about PDAs and smartphones and ...

PM: highlighting David Rose - http://AmbientDevices.com

MLibrary 2.0: Morville Slides: Dilbert

PM: Wealth of information creates poverty of attention. Shift from push to pull. What happens to how we make decisions?

MLibrary 2.0: Morville Slides: Chained Libraries

PM: The good old days when librarians had *real* power. (Library thieves in middle ages were cursed forever)

PM: Perfect findability is impossible.

@GardnerCampbell -- Listening to Peter Morville at http://www.lib.umich.edu/lib20

PM: We can talk about findability at the object and system levels. We need to think across channels, in transmedia terms.

PM: Every architect needs to have one foot in the past and one foot in the present. We also need to design for the future.

MLibrary 2.0: Morville Slides: Findability

PM: NCI portal. Findability example. Search broad (ie "cancer") NCI comes up; search narrow (ie "ovarian cancer"), they don't.

PM: Trust is associated with high Google results - findability and credibility are interrelated

PM: Credibility audit.

PM: Ask 3 qs: Can users find our site / find their way around our site / can they find our services in spite of our site

PM: Strive for desirability ... Attractive things work better.

MLibrary 2.0: Morville Slides: What's Important

PM: "What does usability really mean?"

MLibrary 2.0: Peter Morville

PM: I tell my mom I organize web sites so people can find things.

Peter Morville - "I'm one of those librarians who fell in love with the Internet."

"Rather than going to someplace on the web, the library comes to you. That's what it means to me." Eric Frierson

"Library 2.0? Sounds like a buzzword to me."

Library Revolution highlighted http://libraryrevolution.com/

John Seely Brown - Learning Reconceived for the Networked Age http://tinyurl.com/detxs

MLibrary 2.0 begins -- http://www.lib.umich.edu/lib20/

Posted by pfa at 11:59 AM | Comments (0)

MLibrary2.0 Kickoff Twittered!

I will say that I have never seen a library workshop here with so many computers and digital gadgets present!

MLibrary 2.0: The Computers Await

Because so many other people were liveblogging during the MLibrary 2.0 Kickoff, I twittered throughout the event. Twitter is a way of communicating and chatting with people about what you're doing at the moment in very small soundbytes (no more than 140 characters). More on Twitter itself later. For now, as a follow-up to the blog entry by JK Glenn, I'd like to provide the Twitter stream from Peter Morville's talk, with photos from Flickr embedded at appropriate places. Because Twitter is always arranged in reverse chronological order, you will want to read the Twitter stream below from the bottom up to get a sense of how this flowed at the moment. See the next blog entry for the Peter Morville Twitterstream.

MLibrary 2.0: The Crowd Awaits

Posted by pfa at 11:40 AM | Comments (0)

June 08, 2007

Prepackaging RSS Feeds to Share & Teach

Are you a fan of RSS Feeds? Have you found a number of favorites you are tracking? Ah, but now you want your students to review a select set as part of their regular class readings.

There is an easy way to share selected RSS feeds with others. Pageflakes is a service that allows you to collect functional tools (flakes) and RSS feeds into a single web page. Here is an example that collects example RSS feeds in medicine and dentistry.

Pageflakes: Med/Dent RSS Demo: http://www.pageflakes.com/pfa/11118181

Notice you'll see organizations, journal table of contents, blogs, image streams, videos, and more.
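If you are curious what a service like this is reading behind the scenes, here is a small sketch that pulls the item titles and links out of an RSS 2.0 feed. It is not how Pageflakes works internally; the sample feed, its example.org URLs, and the function name are invented for illustration, and the feed text is embedded so the example runs without a network connection.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed for illustration; a real feed would be fetched by URL.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Dental Journal</title>
    <item><title>Issue 1 contents</title><link>http://example.org/1</link></item>
    <item><title>Issue 2 contents</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def list_items(rss_text):
    """Return (title, link) pairs for every item in an RSS 2.0 feed."""
    root = ET.fromstring(rss_text)
    channel = root.find("channel")
    if channel is None:
        return []
    return [(item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")]
```

Running list_items over several feeds and merging the results gives you a plain reading list you could paste into a course page, though a start page does the same job with no code at all.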

Pageflakes: Med/Dent RSS Demo 1

Here is a screenshot that shows this entire collection, to give a better sense of the range of information included.

PageFlakes Med/Dent RSS Demo 2

Just imagine what you could do!

Posted by pfa at 06:57 AM | Comments (0)

June 01, 2007

Web 2.0 and Television

Someone at work was raving today about Zattoo, one of the new Web 2.0 approaches to television. Yes, really -- you can watch television on your computer, tag your favorite choices, see what other people watch who like the same shows you do, and all the regular web 2.0 interactions. Zattoo isn't the only one, either. I've been hearing buzz about Joost and Babelgum as well, and there are probably others.

What John liked best about Zattoo wasn't the social aspects, though -- it was watching international news shows, and seeing how the same news events are reported differently in different places. That's because these online TV aggregators often collect different channels than you might see on local cable. For example, a lot of Zattoo's content comes from European sources.

If you want to check out this idea further, here are a few links.

SOURCES:

Babelgum: http://www.babelgum.com/

blip.tv: http://blip.tv/

Joost (Beta): http://joost.com/

Zattoo: http://zattoo.com/

MORE INFO:

AdministerIT: Internet TV: Zatoo and Joost (beta): http://winmaclin.wordpress.com/2007/03/15/internet-tv-zatoo-and-joost-beta/

Ghacks.net: Say Goodbye to Joost and Babelgum, here comes Zattoo: http://www.ghacks.net/2007/04/08/say-goodbye-to-joost-and-bablegum-here-comes-zattoo/

Watching TV Online: http://www.textually.org/tv/

Posted by pfa at 08:45 PM | Comments (1)