14 December 2009

In the continuing saga of our information overlords, they’ve come out with Google Translate. As a former Latin teacher, I mostly love and partly dislike this system:

+ The on-the-fly translation is pretty sweet. In particular I love seeing how it recalibrates its concept of whole phrases as it gets new input, something I would have liked to show my students as good practice.

+ It supports a bunch of languages and lets you choose any pair of them as source and target (including some helpful options for non-Latin scripts and Romanization).

– Latin is not among the languages it supports, which limits my ability to probe it.

-/+ Using a language I know less well but can hack at lower levels (Spanish), I can see there are definite (and unsurprising) weaknesses, especially as sentences get longer (and presumably as grammar gets more complex, although once that happens my ability to translate the Spanish is also hampered). So minus, it doesn’t work as well, but plus, it still won’t be supplanting anyone’s language-homework-doing any time soon ;).

-/+ It uses statistical patterns derived from really big corpora (as we might expect of Google), not computational rules. On the one hand, my inner linguistics nerd is sorta sad. On the other hand, it’s awesomely googly (and more pragmatic/scalable, I’m sure).

However! The library angle I was getting at here is that you can search web sites in other languages. Enter terms in the language you know, and it’ll translate and search. Looks like it will only search one language at a time, and I don’t know how it deals with ambiguous terms, and I’m sure the quality degrades with phrase searches, but this does increase our ability to find all relevant information on a query, and I’m sure the tools will improve with time.

10 November 2009

data mining for fun and…

That slideset yesterday was funny, so I’ve RSSed the guy’s blog. Liked this recent post about data-mining your circ records. His university now has a recommender system (both “people who liked this book also liked” and “people in this course of study tend to like”) and a course-of-study-specific search functionality (nursing and law students want different books when they search for “ethics”). Turns out the recommender service is very popular and noticeably increases how much of their collection circulates (which my little ROI neurons like). Also provides suggestions for refining large searches based on search data. And keep an eye out for the very clever acronym which will warm your heart if you, like me, were online in the early ’90s.
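For the curious, here is roughly how the two flavors of recommendation could work under the hood. This is a minimal sketch, not the university's actual system: the loan records, the field layout, and the function names (`build_co_borrows`, `also_borrowed`, `popular_in_course`) are all made up for illustration, and it assumes plain co-borrowing counts rather than whatever scoring they really use.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical circulation records: (borrower_id, course_of_study, book_title).
# Invented sample data, not drawn from any real library system.
LOANS = [
    ("p1", "nursing", "Clinical Ethics"),
    ("p1", "nursing", "Patient Care Basics"),
    ("p2", "nursing", "Clinical Ethics"),
    ("p2", "nursing", "Patient Care Basics"),
    ("p2", "nursing", "Pharmacology Primer"),
    ("p3", "law", "Legal Ethics"),
    ("p3", "law", "Contracts"),
    ("p4", "law", "Legal Ethics"),
    ("p4", "law", "Clinical Ethics"),
]


def build_co_borrows(loans):
    """Count how often each pair of books was borrowed by the same person."""
    by_borrower = defaultdict(set)
    for borrower, _course, book in loans:
        by_borrower[borrower].add(book)

    co = defaultdict(Counter)
    for books in by_borrower.values():
        for a, b in combinations(sorted(books), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def also_borrowed(co, book, n=3):
    """'People who borrowed this book also borrowed...'"""
    return [title for title, _count in co[book].most_common(n)]


def popular_in_course(loans, course, n=3):
    """'People in this course of study tend to borrow...'"""
    counts = Counter(book for _b, c, book in loans if c == course)
    return [title for title, _count in counts.most_common(n)]


if __name__ == "__main__":
    co = build_co_borrows(LOANS)
    print(also_borrowed(co, "Clinical Ethics"))
    print(popular_in_course(LOANS, "law"))
```

The same per-course counts could, in principle, be used to re-rank search results so that "ethics" surfaces different books for nursing and law students, though that part is not sketched here.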

12 October 2009

discovery interfaces in the Chronicle

Chronicle of Higher Ed article on discovery layers in library catalogs. Doesn’t say much I haven’t already seen (although if you have no idea what I mean by “discovery layers” do read it; it’s a good overview). I did like this bit, though:

“It’s sort of our answer to, Why is it you need a library when you have Google?” said Ms. Gibbons [vice provost and dean of the University of Rochester’s River Campus Libraries]. “What this is going to do is show how much you’ve been missing.”

Positioning libraries to stay relevant is, of course, a major obsession these days, and I liked how she phrased it — not exactly as “let’s present ourselves in ways that are familiar to the users” (although I do think that matters), but “by presenting ourselves in ways that are familiar to the users, we can better showcase ways that we are already awesome.”

Comments section is kind of disheartening. I shouldn’t be surprised that the demographic that reads the Chronicle is the demographic that is conversant with old-school catalog searching ;), but so many of the comments read as “fix the user, not the catalog” and…that just never works. Even if the user is uneducated about, e.g., subject headings (and let me tell you, one semester of library school showed me it is amazing how undereducated you can be about catalogs after even a humanities MA), even if the existing technology works really well once you put in the time to learn it — fixing users just never works.

It would make me sad if discovery layers made it impossible to do the sort of precise, controlled searching library nerds get good at, but another of the lessons of Google (or, for that matter, of any number of intimidating databases) is that your clean searchbox doesn’t mean you can’t have that functionality. But if you say to users “you can’t even play until you’ve spent a couple hours learning how” — well, just like my last post — that means there will be a lot of users you never get at all.

Make it easy. Or, at least: make the first hit free.

3 October 2009

“why google and apple win and you don’t”

This cartoon doesn’t realize that it’s making the same claim as discovery interfaces and other current OPAC design thinking, but it is.

18 September 2009

the perfect is the enemy of the good; the good is the enemy of the perfect?

In my Library Automation class yesterday, the concept of satisficing came up.

Digression: satisficing is where I feel most acutely the cultural conflict between the librarians I read and talk with in school, and the software geeks I socialize with. So any time that comes up, there’s a lot going on in my head.

Someone noted how the nature of research was changing as new search tools became available: not, to be tactful, that the quality was suffering, but that people were drawn to accessibility over exhaustivity. A favorite classmate of mine leaned over and said, “How is that quality not suffering?”

Well, class is not the time to go into that, but here’s my answer to her:

It depends.

Making search easier, making records and then content more accessible, means that more searches come up with something. It means that people are more prone to treat searching for information as a realistic tactic. It means that the generation of ideas, and the development of content and other products based on those ideas, is easier. It means we will have a world with more generation, more creativity, more content, more entrepreneurship.

And that content will cover our world with information kudzu which, like kudzu, will often have to be macheted away. Some of that content, those prototypes, those ideas, will be horribly flawed (broken, misleading, decontextualized) because they were based on incomplete or inaccurate information. But sometimes, the idea that exists, the product that exists, even if broken, is better than the idea or product that does not. I’m typing this on a browser with bugs on an operating system with bugs on hardware that’s getting increasingly apoplectic, but my life is better for having these.

So satisficing, yes, you are my little love for what you bring to our lives. But I think the cataloguers and old-school library theorists of the world have a very real point as well when they decry you. Because sometimes, the incomplete search really isn’t enough. There are objectives and applications for which good-enough is good-enough, but if I’m talking academic research (at least, past the undergraduate level)? If I’m talking, good heavens, medical research? Intelligence and security work? I would really rather the investigators not satisfice. And to this extent, the easy availability of patchy search, the least-effort temptation, really is a problem, and even a threat.

So there you go, M: the answer behind my expression.

13 August 2009

Create Your Own Economy (part I?)

I’ve just started reading Tyler Cowen’s new book, Create Your Own Economy. (That is to say, I’ve just finished Chapter 1.) I should preface this by saying that Cowen is one of my great intellectual crushes and his blog, Marginal Revolution, has taught me a lot and strongly influenced my thinking on some matters (as well as introducing me to one of my other great intellectual crushes, Sudhir Venkatesh). And I say all of these complimentary things because I’m going to spend the rest of the post cranky.

Chapter 1, roughly speaking, is about two things: the information explosion in modern society, including the tools that both generate and help us manage it; and the autism spectrum as a frame for helping Cowen understand his own thinking, and all of us better manage that information explosion in our own lives.

Now, I’m fascinated by the autism spectrum. I will download/read anything I come across with Temple Grandin in it, I’m fascinated by the way non-normative minds both illuminate the norm and broaden the meaning of humanity, and reports (particularly self-reports) from that spectrum tend to be the most personally gripping of all dispatches from non-normative terrain. But I can’t stand the way geekdom, a few years back, flocked to the spectrum (or, rather, the metaphor of the spectrum) for self-understanding. There’s a reason the DSM includes differential diagnoses, and a reason therapy involves outside (and perhaps neutral) observers. The faddishness of self-diagnosis, the appropriation of the metaphor as an explanation (or perhaps excuse) for oneself without the actual diagnostic process and its consequences, the cherry-picking of personally useful or (dare I say) sexy elements of a descriptive sketch on a web site without taking into account the full picture…right. Drives me crazy. For all that it’s a fascinating spectrum and, even, sometimes, a great metaphor.

And then (page 9!) I hit the word “catalog”.

Librarians have a passionate conversation going on the nature and meaning and management of information overload. Part of this passion surrounds the idea of cataloguing. And one of the key things here is that a lot of librarians get apoplectic about the lack of cataloguing online (in the very services Cowen refers to: Flickr, del.icio.us, iTunes, among others). Cataloguing’s a technical term, a technical idea in librarianship. It involves high (often very exacting) standards for metadata which facilitate precise and comprehensive searches. (Which are, really, often neither as precise nor as comprehensive as some librarians would like to think, but let’s leave that aside for the moment.)

Cowen sees a world of technical tools helping us to manage information overload…I see a world of tools which, don’t get me wrong, I spend a ton of time on and am madly in love with, but which create as many problems as they solve in managing it. I can get freakishly excited about crowdsourcing and folksonomies and what-have-you, but they also have very serious flaws with regard to some of the problems that cataloguing, in the librarianship sense, aims to solve. The tools we have now are very nascent. Our ability to organize information with them is in some ways very limited. (Why does my iPod have three different genres with names like “Electronica/Dance”, except differently punctuated? Did the geeks at the wedding I was just at get around to creating a hashtag for their photo uploads of the event? And if not, how will I find out what happened after I left, and even if so, how many sites is it scattered across, and how many photos will I miss because they missed the message? Why does my task management software not freaking integrate with my calendar?)

The fact that I can even ask these questions is, don’t get me wrong, pretty cool. This sort of participatory, decentralized information culture is going to lead us in all sorts of great directions, even though few to none of them will, I expect, resemble cataloguing (and somewhere in the dusty corners of librarianship, people will be shaking their fists at the sky about this). But Cowen’s view of what is going on in information tools is so very, very different from a lot of the views I encountered in my Information Organization class.

And that’s the other thing that made it hard to read this chapter — hard because some little bat of an idea was beating its wings against the cage of the book, wanting to argue and break and go off some other way. It’s one of the major difficulties I had in 415 in reverse. In 415, I read librarians’ conversations on these themes, and they had so little in common with conversations, on the same topics, that I’ve seen socially, in the worlds of computer geeks and online communities; I kept ranting at the papers I was reading, when they’d say something was obviously impossible but I could point to real-world examples, when they’d make statements with fundamentally different assumptions than those I’m used to seeing and take them as absolute truth. And here, I read Cowen’s piece of the conversation, and it has so little in common with what librarians have to say. “Libraries” appears precisely once in the index (page 43!). A brief scan of the index suggests that none of the philosophies and technical contributions of librarianship make an appearance in this book at all — and Cowen has a tremendously wide-ranging intellect and is a heavy user of his local libraries. Among non-librarians, he seems one of the most likely to really know things about library ideas.

I kept having the feeling in 415 that if librarians and non-librarians are having separate conversations about information tools, culture, philosophy, and if non-librarians are the ones out there generating and using the tools, with or without the theories, in a flawed but fecund creative explosion, then librarians, convening slow committees to generate precise tools, will be obsolete and never even notice. Cowen’s book, thus far, does not bode well on that front.

How do we bridge those divided networks? How do we bring some of those conversations, and conversationalists, into a common sphere?

4 July 2009

why even the future needs librarians

So I was watching the new Star Trek movie and… (bear with me here).

At the end of the movie, offstage, we’ve got 10,000 Vulcans on some colony, bereft of their planet, trying to rebuild their culture. And what’s one of the first things they’re going to do? Re-establish libraries. And hand out research grants to anyone who wants to fly around the galaxy combing libraries and archives and museums for vestiges of Vulcan culture. (Because, come on. Even the ten thousand Vulcans remaining are sure to be ludicrously wealthy, due to their skills with Science, and the Banking System of the Future has to be massively distributed, or it’d be incompatible with widespread spaceflight. They’ve still got access to their cash.)

So why (I think to myself) do they not just sit at their awesome future computers, with their faster-than-light internet and digital libraries, instead of handing out all these research grants to people going on long trips to interact with physical objects?

Well, they do that too, of course. But the future (while it may boldly go where no one has gone before, having toppled racial and species barriers) has probably not toppled bureaucracy, and funding shortages, and backlogs. Museums which have five copies of something have only gotten around to digitizing (or uploading) one, because they have more pressing things to do than be comprehensive, and it’s probably one of the other four that has some marginalia of suddenly crucial importance. Or they’ve digitized (and uploaded) all five, but they’re in some cruddy format that’s hard to search, like today’s jpgs of pages of text, or one that’s utterly obsolete. Or they had enough cheap interns from Starfleet Library School to get everything online in whatever the cutting-edge format is, but their indexing systems can’t keep up. Or weren’t designed for the kind of queries that a nearly-extinct civilization on a sudden cultural heritage binge is going to generate. (Because, seriously, what are those? I can’t even imagine.)

Doubtless I’m projecting the present too much into the future here. Maybe the future has robots that digitize everything for you, and seamlessly cross-platform file types, and automatic indexing so perfect that unicorns and rainbows pour forth from the servers.

And yet…I doubt it. I doubt that even the Gene Roddenberry utopia is free from everyday logistical constraints.

And even if it is, in the present, those everyday logistical constraints are hard. And indexing is desperately hard, and even more desperately underappreciated. You can’t connect people with information if you don’t have findability, per the Peter Morville book whence my tagline comes. And one of the best tools for bridging that findability gap is between our ears. (Even if they aren’t pointy.)