June 27, 2007
In the Chronicle's Blog - Librarians & Web 2.0
We will get back to more how-to entries, but in the meantime, you might find this of interest from the Chronicle.
The Chronicle of Higher Education: Wire Campus Blog: Librarians Find a Place in a 'Web 2.0' World: http://chronicle.com/wiredcampus/index.php?id=2182
If you didn't already, now you know where to go ... ;)
Posted by pfa at 08:45 AM | Comments (0)
June 10, 2007
MLibrary2.0 Kickoff, Part 2: Kristin Antelman
The MLibrary 2.0 forum series got off to a great start on Friday. A capacity crowd of about 140 people joined us in the Michigan Union for presentations by Peter Morville, Kristin Antelman and Jessamyn West (speaker bios are here). I'll be writing up my notes from the event and posting the writeups here. Slides and videos from Peter's and Kristin's presentations will be posted to the Events page within the next few days; many thanks to Peter Knox for documenting this event!
Update: Having lost Saturday's blogging battle to the forces of beautiful weather, and seeing Patricia's excellent post covering Peter Morville's presentation, I've decided to skip my coverage of Peter's talk (which has also been blogged here and here) and go straight to Kristin's. Note: Since this was the geekiest presentation by far, and hence my automatic favorite, I'm going to include as many links as I can to the things Kristin mentioned in her talk. I might throw in a few related references of my own, and I will label them as such.
Kristin Antelman: Next generation catalogs
Preview: Kristin's going to describe some features of the NCSU catalog, then ask us where we think we as a profession are going with this whole catalog thing.
We've had online catalogs for a long time, but we've never quite gotten it right - in fact, for many purposes the OPAC is worse than the card catalog. Subject browsing is one of them, and there have been several experiments with advanced subject browsing in the OPAC: Mark Ludwig's project at SUNY Buffalo, which indexed MARCXML records; Casey Bisson's WPopac (now Scriblio), which uses faceted browse and runs entirely on free software; Casey Durfee's "Open Source Endeca" (Apache Solr + Django) from Code4lib 2007. Kristin mentioned a couple of recent reports that gave some theoretical background to such efforts: the Karen Calhoun report for the Library of Congress and the BSTF (University of California Bibliographic Services Task Force) report.
Editorial aside: Advanced browsing in the OPAC has a longer history than many people realize. One of the things Peter Morville mentioned in his talk was the Flamenco Project at UC Berkeley, which has been going on for a while. And Endeca may have succeeded (finally) in patenting their particular implementation of faceted browsing, but a quick glance at the patent itself reveals a reference to Dr. Stephen Pollitt's work on HIBROWSE, one of the first faceted OPACs from the mid-1990s. Many of the ideas behind subject-enhanced browsing in the OPAC were first explored by Karen Markey (for OCLC) in 1983, so it's great to see some of those ideas finally coming to fruition.
Now let's talk about Endeca at NCSU. Shortcomings of the old catalog included a subject search "feature" that was nearly impossible to use, and results were always presented in "system sort" order which made them difficult to navigate (sound familiar?). Title was the standard default search, but of course many users expect a keyword search. Clearly a better solution was needed. NCSU's Endeca implementation uses two search boxes, because they weren't willing to get rid of authority searching (a future goal is to combine them). The top box is for keyword search, while the bottom is for authority (primarily known-item) searching.
Let's try a really general search in the keyword box, like "art history." Sure, we get thousands of results, but the LCSH-based subject facets at the top allow us to narrow the result set very quickly based on readily apparent criteria. So, faceted navigation actually works very well with MARC metadata to allow for fast narrowing. Another benefit of the Endeca software is relevance ranking of results (this was actually their primary goal, not the faceted navigation!). So even for large result sets, the most relevant books are right at the top. Studies they've done show a vast improvement in relevance of keyword searches. The new system also allows users to see at a glance where the book is located, both on campus and in the stacks, and whether it is checked out or not. We can even filter the results to include only books that are currently available. No need to click half a dozen times! We can also sort results by "popularity" (uses circulation data); this is one of the most frequently used sort criteria, along with availability. Another useful feature is the "did you mean" spelling suggestion (ALL catalogs should have this feature!). And the new catalog allows users to subscribe to new books lists (or any catalog search) via RSS.
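Editorial aside: for readers who want to see the faceting idea in miniature, here is a rough Python sketch. It is not how Endeca or the NCSU catalog actually works; the records, field names and function names are all made up for illustration, and a real system would draw these values from MARC subject and holdings data.

```python
from collections import Counter

# Hypothetical, hand-made records (not real catalog data).
RECORDS = [
    {"title": "Art History: A Survey",
     "subjects": ["Art--History"], "available": True},
    {"title": "Renaissance Painting",
     "subjects": ["Art--History", "Painting, Renaissance"], "available": False},
    {"title": "Modern Sculpture",
     "subjects": ["Sculpture, Modern"], "available": True},
]


def facet_counts(records, field):
    """Count how many records carry each value of a multi-valued field."""
    counts = Counter()
    for record in records:
        # set() so a value repeated within one record is counted once
        counts.update(set(record[field]))
    return counts


def narrow(records, field, value):
    """Keep only the records whose field includes the chosen facet value."""
    return [r for r in records if value in r[field]]


def only_available(records):
    """Keep only the records flagged as currently available."""
    return [r for r in records if r["available"]]


if __name__ == "__main__":
    print(facet_counts(RECORDS, "subjects").most_common())
    hits = only_available(narrow(RECORDS, "subjects", "Art--History"))
    print([r["title"] for r in hits])
```

The counts are what would appear next to each facet label; clicking a facet corresponds to calling narrow, and the "available now" filter is just one more narrowing step on the same result set.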
Kristin modestly asserts that the NCSU catalog is Library 1.1 (not even close to 2.0). However, they have been taking additional steps toward building the catalog of the future. One big step is CatalogWS, a generic XML web service layer implemented on top of the existing catalog. This allows users to search the library from their web-enabled cell phone or other mobile device; it also includes library locations and hours. They are currently working on integrating the catalog into the website, so you'll be able to search the whole library right from the front page using a single search box. [Editorial comment: the UPenn library has had a similar cross-search feature for some time, and it is amazing.]
Kristin then presented some usage data from a few studies they have done since implementing the new catalog. About 67% of transactions are search ONLY. Facets are still used, however (the most popular facets are subject and LC classification). Publication date is the most popular sort option. So users are still searching the catalog in similar ways, but they are able to do it much more efficiently and effectively.
However, Kristin is quick to point out the limitations of the current system. One major shortcoming is that it doesn't really solve the perennial problem of syndetics (how to make a connection between the user's search vocabulary and the controlled vocabulary of LCSH). It's still a keyword search at bottom, and the keywords are not mapped onto LCSH terminology. As an example of this, Kristin did a search for "revolutionary war" and found 870 hits. If you knew the proper LCSH heading to search, you'd get over 3,000 hits, with many useful subdivisions to help narrow your search. Unfortunately these are still not exposed in the catalog in any meaningful way - in fact, faceted navigation "disguises" the problem of syndetics by presenting the results of the keyword search as if it constituted the entire universe of materials on that topic. This leads users (and even librarians) to "satisfice" because they will generally find something, even if it isn't the best or most comprehensive information that's available. [Editorial comment: this is a problem Thomas Mann discusses in his book Library Research Models. Often finding something is worse than finding nothing, because most users will make do with whatever they happen to find on the first try.] It's important to make sure patrons are finding the right book (Ranganathan's second law: every reader his/her book). In principle it should be possible to correct this problem programmatically, because in LCSH there is an entry for "revolutionary war" with a see reference to the proper subject heading. The NCSU folks are working on mapping these references and leveraging them in the catalog.
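Editorial aside: the see-reference mapping described above can be sketched in a few lines of Python. This is only my guess at the general shape of such a lookup, not the NCSU implementation; the table is a tiny hand-made stand-in for the real LCSH authority file, and the function names are hypothetical.

```python
# Hypothetical see-reference table: entry vocabulary -> authorized heading.
# A real table would be built from the "see" references in LCSH authority records.
SEE_REFERENCES = {
    "revolutionary war": "United States--History--Revolution, 1775-1783",
    "american revolution": "United States--History--Revolution, 1775-1783",
}


def normalize(query):
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


def expand_query(query, see_references=SEE_REFERENCES):
    """Return the keyword query plus the authorized heading it refers to, if any."""
    heading = see_references.get(normalize(query))
    return {"keywords": query, "subject_heading": heading}


if __name__ == "__main__":
    print(expand_query("Revolutionary War"))
    print(expand_query("art history"))
```

A catalog could then run the subject search alongside the keyword search, or at least offer the authorized heading as a suggestion, so the user sees the fuller result set instead of stopping at the keyword hits.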
Kristin then rattled off a list of other "experimental" catalogs and related websites which she considers part of the "next generation catalog" trend:
- Phoenix Public Library has a new Endeca implementation that uses the BISAC subject headings to provide browse functionality independent of MARC records.
- UVA's Project Blacklight, a faceted catalog prototype based on Ruby on Rails and Apache Solr (more information on Bess Sadler's blog).
- The Communicat project at Georgia Tech, which integrates MARC records and user-supplied data into one big open catalog using the Daisy CMS.
- Worldcat Local, which has been implemented by the University of Washington (the University of California is also working on an implementation, in response to issues raised in the BSTF report.)
- Librarything, which has been very successful in creating a kind of "community FRBR" by allowing users to cluster books into editions. (Kristin also mentioned the Thingology blog, which has accumulated a lot of interesting material over the past year.)
- Google Book Search, which at least has the potential to create a scholarly community around works.
- Editorial comment: for more in this vein, see MLibrary2's bookmarks tagged with 'OPAC' on del.icio.us.
Having discussed the NCSU catalog and a few kindred efforts, Kristin moved on to the future of the catalog and its relationship to the evolution of bibliographic control. There has been a lot of recent activity toward developing a new metadata framework, including the Library of Congress working group on the Future of Bibliographic Control and the recently announced partnership between the RDA (Resource Description and Access, the successor to AACR2) and DCMI (Dublin Core Metadata Initiative) efforts (see Karen Coyle's writeup for more info). Much of the discussion has been taking place on mailing lists, notably NGC4lib and RDA-L, so anyone who is interested in these recent developments should go check them out.
What's more important, making holdings available, or bringing them under bibliographic control? There is a real tension here (which was brought out in the Karen Calhoun report), because libraries have scarce financial resources, and when our catalogers are creating metadata that's difficult to leverage in our fancy new OPACs then all that cataloging labor begins to look like a diminishing ROI. [Editorial note: I may be interpolating a bit here; my notes are somewhat sketchy at this point.] There seems to be a cultural disconnect between the cataloging and metadata standards community (MARBI, etc.) and the community of OPAC interface hackers (Code4lib). These two groups move at very different paces (RDA has been in the works for, what, 6-7 years now?). You can't have search without metadata, at least not with our current means of bibliographic control, so there is a real danger that the recent advances in catalog interfaces will be stymied by the failure of the cataloging community to keep up.
As a case in point, Kristin asks the question: "What is an identifier?" Librarians have one idea of what it is: a title and an author string. But to web programmers, this is a hopelessly unreliable means of identifying unique objects - in a networked environment you really need some sort of URI. Most OPACs that are currently in use don't even have stable URLs for each record! This leads Kristin to ask what is the most effective way to expose our library metadata to the web - can't we just dump it into an index and have Google crawl it all, so that it shows up in people's web search results? But think of how confusing it would be if everybody did that! We need to develop our own networked services for exchange of bibliographic data - a sort of distributed "bibliographic cloud." Kristin mentioned Jason Griffey's recent comment on NGC4lib that "the true future for bibliographic data has to be in some P2P form, distributed and shared in the background of our systems" (read Jason's follow-up post for more explanation).
So the bibliographic standards community shares many goals with the Semantic Web community; this is what makes the recent decision to use RDF and SKOS to disclose the new RDA element vocabulary (which will be based on the DCMI Abstract Model) so very exciting [Editorial comment: my geek sense is tingling!]. As a further harbinger of momentous changes to come, the most recent issue of Cataloging and Classification Quarterly, "Knitting the Semantic Web," is devoted entirely to the intersection of bibliographic data and Semantic Web technology (UMich people can read the whole issue online). One of the big problems with the current standard of AACR + MARC is that it fails to achieve a separation between the metadata elements and the cataloging rules. Kristin gave an example of how an alternative scheme could work, using SKOS to describe some microform materials [my notes are pretty sparse here, sorry!].
Given all this churn within the bibliographic standards arena, what direction should we be going in with our next-generation OPACs? One thing Kristin mentioned was using FAST to achieve better faceting than regular LCSH can provide; they are investigating migrating to FAST at NCSU. The Worldcat Identities project uses aggregate data from Worldcat to display works clustered by author (see Thom Hickey's post for more info). However, these are both OCLC projects and are based on closed, proprietary technology. On the other hand, open frameworks do exist for achieving the same results - this is what semantic web technology is all about. The biggest problem the library community currently faces is that our metadata vocabularies are not prepared to be incorporated into semantic web applications. For example, authority files are not identified by a URI. Cataloging practices are only loosely standardized; we use partial information to fill in our records and because our search systems rely on vague identifiers that's usually good enough. If librarians want to play on the semantic web, we're going to need a serious cleanup of our metadata models. We also need to have more openness and less concern about authority. How much bibliographic control do we really need? Many people will scoff at this question, but with the volume of information we face today it's a serious issue. This is not to say we should jettison the idea of authority, but we could do much more with our data if our metadata vocabularies were open and extensible.
So what are some requirements for the ideal catalog? Among other things it should recognize clusters of knowledge, show the lineage of publications, identify authors, make previously unknown connections between works visible to the user, and show the authoritativeness and popularity of sources. [Editorial comment: I believe Kristin was referring to a recent thread on NGC4lib which featured a discussion of one person's wishlist for a next generation library catalog. See Futurelib Wiki for more details.] Some of these ideas have been explored by people at the Institute for the Future of the Book. Kristin referred to this post by Ben Vershbow about the idea of a "people's card catalog" built from open source software, to serve as an alternative to commercial products like Google Book Search. Google Book Search has raised the spectre of the elimination of metadata: why spend all that time creating document surrogates when all of the library's text is online and searchable? Needless to say, the library science community is not yet comfortable with such a flagrant conflation of data with metadata, but it's something to which we should give serious thought.
Kristin mentioned David Weinberger's new book Everything Is Miscellaneous (which Peter Morville also referenced in his talk - read his review here) with its notion of three "orders" of information: the book on the shelf, the catalog, and the web. We need to get our data ready for the "third order," but our legacy metadata sets and arcane cataloging practices create a cultural barrier that prevents easy integration with the current state of the art. Librarians cherish consensus on an international level, which makes it tough to innovate in the library world. How can we standardize internationally and still be innovative? Individual libraries can do a lot, and the current technology tools make it much easier, but we could do so much more if our vocabularies were open and extensible! We've put so much effort into the metadata vocabularies that we use, but they are still owned by private institutions (LoC, OCLC). How can we migrate our data into an open web environment? The objective is to be able to control the data that goes into (and comes out of) our ILS. This is an impedance mismatch that needs to be fixed, and soon!
Q & A
Q: Can you give a brief definition of the semantic web?
A: Basically it means we can search for meaning instead of just keywords. The semantic web shares an objective with the OPAC, in that we're trying to apply controlled vocab in a way that allows this kind of searching.
Q: What about licensed content, like articles? Aren't there going to be some legal challenges in obtaining better control over those?
A: NCSU has looked at clustering metasearch results using something like Vivisimo, but there is no good solution to on-the-fly clustering in metasearch due to huge inconsistencies in the results you get from different vendors. This is a hard problem: as long as we don't have control over the data, there's not much we can do about it. Unfortunately librarians made a decision long ago to give up control over our data, and now we are paying the price.
Q: What about recommender systems and "find similar?"
A: This is a great idea and some libraries have tried it, but it's hard to do well without the ability to get more detailed use data out of our ILS. Our current systems don't track much of the info that would be required for building systems like what Amazon provides. We need to do a better job of leveraging use data, but there are also privacy concerns. Think about the long tail effect: we need to be able to make connections between items that may not be very popular in themselves, but that might mean a lot to a student or scholar in the right discipline.
Posted by jkglenn at 04:06 PM | Comments (0)
June 09, 2007
MLibrary2.0 Kickoff, Part 1: Peter Morville via Twitter
Remember, read this post from the bottom up.
PM The values of librarianship are very important, and we need to find a way to ensure those are incorporated into the new environment.
PM People exaggerate the way web20 / library20 obviate the need for the old
PLEASE NOTE: These events will be podcasts at the site http://www.lib.umich.edu/lib20
PM: embedded information dense spaces - walking becomes a new form of query
PM: Trends: push for local information / yellow pages will disappear
Q&A - location, location, location = Google rank and what else?
PM: Libraries as cathedrals of knowledge
PM: Story of the 3 Stone Cutters: 1. making a living; 2. the best stonecutting job in the county; 3. building a cathedral
http://tinyurl.com/2gjzwr
PM: Shaping Things / Everyware
PM: Julian Bleecker - blogjects and pigeons / manifesto for networked objects
http://www.delicious-monste...
PM - delicious library scans barcodes and ISBNs for personal libraries / neighborhoods / etc.
PM http://semapedia.org -- tagging RL objects and spaces
PM: Google Book Search / podzinger -- expanding what we consider the web
PM: flickr successful with clustering tags that often appear together
PM - hybrid solution - clustering driven by human selected taxonomy
PM: Clusty & Automated categorization http://clusty.com
PM: NCSU Libraries using guided navigation for site / Flamenco project
PM - http://buzzilions.com - guided navigation / search
PM: Search is one of the most important ways we learn.
PM: Marcia Bates - Berrypicking, 1989
PM: Interfaces - one size does not fit all.
PM: John Battelle "search has become the new interface of commerce."
PM - http://etsy.com taxonomic shopping, vendor driven w/ tags and feedback loops
PM: Stewart Brand - how buildings learn - pace layering (important concepts evolve slowly, less critical concepts quickly - fashion)
PM: Leaves become food for trees.
PM: David Weinberger - Everything is miscellaneous "The old way creates a tree. The new rakes leaves together."
PM: "This is not your mother's metadata."
PM: Who can help? :) Revenge of the Librarians
PM: http://map.net --- interesting products that are fun but not useful
PM: How do we create bigger needles for our haystacks?
PM: David Brin's Transparent Society -- YAYYYY!!!! David! :) http://www.davidbrin.com/ts...
PM - Google StreetView
PM: http://amal.net/rfid
PM: Bruce Sterling - the internet of things / the internet of objects
PM - Apple iPhone - web in your pocket, full featured
PM: Control granularity of information and location, and who sees it.
PM - Privacy concerns of ubiquitous geo-info for real people
PM: device to scare people about the future - wristwatch to track your child's location.
Sorry - http://neighbor.com
PM: neighbor.com beta - mashup of political affiliations
PM: http://microsoft.com/surface
PM: All sort of alternate interfaces -- it won't just be about PDAs and smartphones and ...
PM: highlighting David Rose - http://AmbientDevices.com
PM: Wealth of information creates poverty of attention. Shift from push to pull. What happens to how we make decisions?
PM: The good old days when librarians had *real* power. (Library thieves in middle ages were cursed forever)
PM: Perfect findability is impossible.
@GardnerCampbell -- Listening to Peter Morville at http://www.lib.umich.edu/lib20
PM: We can talk about findability at the object and system levels. We need to think across channels, in transmedia terms.
PM: Every architect needs to have one foot in the past and one foot in the present. We also need to design for the future.
PM: NCI portal. Findability example. Search broad (ie "cancer") NCI comes up; search narrow (ie "ovarian cancer"), they don't.
PM: Trust is associated with high Google results - findability and credibility are interrelated
PM: Credibility audit.
PM: Ask 3 qs: Can users find our site / find their way around our site / can they find our services in spite of our site
PM: Strive for desirability ... Attractive things work better.
PM: "What does usability really mean?"
PM: I tell my mom I organize web sites so people can find things.
Peter Morville - "I'm one of those librarians who fell in love with the Internet."
"Rather than going to someplace on the web, the library comes to you. That's what it means to me." Eric Frierson
"Library 2.0? Sounds like a buzzword to me."
Library Revolution highlighted http://libraryrevolution.com/
John Seely Brown - Learning Reconceived for the Networked Age http://tinyurl.com/detxs
MLibrary 2.0 begins -- http://www.lib.umich.edu/lib20/
Posted by pfa at 11:59 AM | Comments (0)
MLibrary2.0 Kickoff Twittered!
I will say that I have never seen a library workshop here with so many computers and digital gadgets present!
Because so many other people were liveblogging during the MLibrary 2.0 Kickoff, I twittered throughout the event. Twitter is a way of communicating and chatting with people about what you're doing at the moment in very small sound bites (no more than 140 characters). More on Twitter itself later. For now, as a follow-up to the blog entry by JK Glenn, I'd like to provide the Twitter stream from Peter Morville's talk, with photos from Flickr embedded at appropriate places. Because Twitter is always arranged in reverse chronological order, you will want to read the Twitter stream below from the bottom up to get a sense of how this flowed at the moment. See the next blog entry for the Peter Morville Twitterstream.
Posted by pfa at 11:40 AM | Comments (0)
June 08, 2007
Prepackaging RSS Feeds to Share & Teach
Are you a fan of RSS Feeds? Have you found a number of favorites you are tracking? Ah, but now you want your students to review a select set as part of their regular class readings.
There is an easy way to share selected RSS feeds with others. Pageflakes is a service that allows you to collect functional tools (flakes) and RSS feeds into a single web page. Here is an example that collects example RSS feeds in medicine and dentistry.
Pageflakes: Med/Dent RSS Demo: http://www.pageflakes.com/pfa/11118181
Notice you'll see organizations, journal table of contents, blogs, image streams, videos, and more.
Here is a screenshot that shows this entire collection, to give a better sense of the range of information included.
Just imagine what you could do!
Posted by pfa at 06:57 AM | Comments (0)
June 01, 2007
Web 2.0 and Television
Someone at work was raving today about Zattoo, one of the new Web 2.0 approaches to television. Yes, really -- you can watch television on your computer, tag your favorite choices, see what other people watch who like the same shows you do, and all the regular web 2.0 interactions. Zattoo isn't the only one, either. I've been hearing buzz about Joost and Babelgum as well, and there are probably others.
What John liked best about Zattoo wasn't the social aspects, though -- it was watching international news shows, and seeing how the same news events are reported differently in different places. That is possible because these online TV aggregators often carry different channels from the ones you might see on local cable. For example, a lot of Zattoo's content comes from European sources.
If you want to check out this idea further, here are a few links.
SOURCES:
Babelgum: http://www.babelgum.com/
blip.tv: http://blip.tv/
Joost (Beta): http://joost.com/
Zattoo: http://zattoo.com/
MORE INFO:
AdministerIT: Internet TV: Zatoo and Joost (beta): http://winmaclin.wordpress.com/2007/03/15/internet-tv-zatoo-and-joost-beta/
Ghacks.net: Say Goodbye to Joost and Babelgum, here comes Zattoo: http://www.ghacks.net/2007/04/08/say-goodbye-to-joost-and-bablegum-here-comes-zattoo/
Watching TV Online: http://www.textually.org/tv/
Posted by pfa at 08:45 PM | Comments (1)
















