11/7/2005

Search Engine

Filed under: — Massimo @ 9:53 am

I agree with Mike that the Balzac example is an interesting one, although it clearly applies to a corpus (oeuvre) by a single author. With that in mind, let’s remember that VHL is not a “single author” project - I find more affinities with the WWP or the EEBO. Of course, the search engine is a valuable tool for annotating. However, what our search engine should eventually be able to do is maximize the possibilities embedded in our “differentiated” encoding: for example, cross-referencing names, places and dates and visualizing text strings and paragraphs, but also allowing more sophisticated searches for authorial, thematic and semantic/rhetorical structures as we identify and encode them in the various texts (what fields would be appropriate for these other tasks?). Our goal is to enable a comparative and explorative approach to texts that belong to the same cultural context but also to different typologies of writing and rhetorical genres (we have chosen these texts precisely because of the wide spectrum they represent). How does the search engine help us reach that goal?

Another question raised by Mike: keeping commentary and text separate is OK, but isn’t encoding a form of embedded commentary? Does Mike mean annotations? Will we be able to search annotations as well, in relation to the text, once we have a significant number of annotations?

I suppose we can proceed by stages and add functionality and power to our engine as we progress in the encoding and annotating process. However, in designing it, one of the fundamental prerequisites we should keep in mind is its “expandability”: keeping it open to the possibilities that lie ahead of us, including potential applications in the seminar room.

11/3/2005

What do you want to search for?

Filed under: — vika @ 1:29 pm

Anticipating Paul’s work on the search engine, a question for the text scholars:

What do you want a semantic search engine operating on a text to do for you?

Please have one or more texts in mind, whether they’re texts we’re putting up or ones that interest you personally. The functionality, however, should be generalized. For example: you want to search for words in proximity to each other. How much proximity? Occurring within 3, 5, 10, or some other number of words of each other? Or: you want to search for words with similar spellings, like love, lov’d and loves.
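To make those two examples concrete, here is a minimal Python sketch of a proximity search and a similar-spelling search. It is only an illustration, not a description of the engine Paul is building: the function names, the sample sentence and the 0.6 similarity threshold are all assumptions made for this example, and the spelling comparison uses the standard library’s general-purpose string similarity rather than anything tuned for historical orthography.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into words, keeping internal apostrophes
    so that a form like "lov'd" stays a single token."""
    return re.findall(r"[^\W\d_]+(?:['’][^\W\d_]+)*", text.lower())


def proximity_matches(tokens, first, second, window):
    """Return (i, j) token positions where `first` and `second` occur
    within `window` words of each other, in either order."""
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = [j for j, tok in enumerate(tokens) if tok == second]
    return [
        (i, j)
        for i in first_positions
        for j in second_positions
        if i != j and abs(i - j) <= window
    ]


def spelling_variants(tokens, target, threshold=0.6):
    """Return the distinct tokens whose similarity to `target` is at least
    `threshold` (an assumed cut-off; 1.0 means identical strings)."""
    return sorted(
        {tok for tok in tokens if SequenceMatcher(None, target, tok).ratio() >= threshold}
    )


# Hypothetical sample sentence, invented for this sketch.
sample = "Love is not love which alters; he lov'd her, and she loves him still."
tokens = tokenize(sample)

near = proximity_matches(tokens, "love", "alters", window=3)
variants = spelling_variants(tokens, "love")
```

In this sketch the window is counted in tokens, so "within 3 words" means the two positions differ by at most 3. A real engine would more likely normalize spellings against a dictionary or the text’s own encoding than rely on a raw similarity score, which can also pull in unrelated words that merely look alike.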

Examples of search engines for various corpora can be found here. The features you want may or may not be available on them, and you are certainly not limited to what you see – this is just to get you going.

5/14/2005

Esposizioni Mach 1: Verifying the Index

I am pleased to announce that there is now stuff to play with.

A part of the Esposizioni, the part that has been most thoroughly encoded so far, has been put up. From this rather large chunk (I’m guessing roughly 175 modern print pages), we have built an index of people’s names. Now, this index must be verified, and we need your enthusiastic help.

Not many people besides the project’s participants read this blog, so on Monday I’ll compose an email to be sent out (with modifications as you see fit) to various pertinent mailing lists. I’ll be happy to send it out to Humanist and Digital Medievalist lists. Anyone else willing to forward it along to colleagues or lists? If so, would you please let me/us know which lists you’re going to cover?

Please keep the project’s current status in mind; it is critically important for a smooth interaction with the site. For the moment, most of VHL’s stuff (everything except this weblog and a discussion forum, about which more below) lives on the development server of the Scholarly Technology Group here at Brown. It is very much a work in progress. At any time, it may simply not work, or work in unexpected ways. If you’re really lucky (?), you could happen upon a moment when one of us is working on the site, and the same page loaded twice a minute apart could well be completely different the second time around!

Believe it or not, however, this isn’t the most exciting part. The exciting part is this (n.b.: don’t use Internet Explorer to look at these):

  • The Esposizioni table of contents; click on a chapter to see it. Note, when viewing the text, that some terms are highlighted: proper names in blue, themes that we have begun to encode in pink, and words or phrases that Boccaccio regards as terms, and defines, in green. Hovering over a highlighted segment of text reveals more information about it. (For now, this information is in rudimentary form. We’ll be working on that.)
  • Indexes -> Esposizioni: People. The only index we have finished thus far. If you are interested in contributing verifications, additions or corrections for the entries in that index, we would welcome your contribution. You can click on any one of them to see a page of paragraphs in which a given entry appears. There are instructions on the main index page as well as on the index entry matches page; they explain how to contribute using the discussion forum.
  • Discussion forum. Regardless of whether you participate in work on the Index, if you would like to discuss other ideas about the Esposizioni or the way our project is working out so far, please let us know by starting a discussion!

Please note: the annotation engine, built by Paul Caton, is not quite ready for use yet, and we will not be using it for verifying the index. When it’s ready, we would like for the annotators with sufficient access privileges to focus on their individual research, or that done with a small group of people on a specific issue. It would be beneficial for projects being researched by a larger group to be discussed on the forum, so as to alert the public and perhaps increase the level of interest and participation.

Thoughts?

9/14/2004

The Laws of Cool

Filed under: — vika @ 1:09 pm

Alan Liu’s new book The Laws of Cool: The Culture of Information looks really interesting. I’ve put in a purchase request at our library, and am thinking of buying it myself. (If it were in stock at the Brown Bookstore, it would’ve been an impulse purchase. Luckily for my finances, they’re fresh out.)

The ultimate message of The Laws of Cool is that “cool” may be the most authentic response of contemporary culture to postindustrial knowledge work because it holds open a reserve of counter- or anti-knowledge (an “ethos of the unknown”), but nevertheless in its current form cool is often also know-nothing, narrow, shallow, self-centered, cruel, and coopted. Laws of Cool posits that the task of the humanities and arts at the present time is to educate the cool to use technology in a way that mediates between knowledge work and a fuller lifework glimpsed in historically other lives and works.

I liked this quote so much that I printed it out. It’s so easy to get carried away in the flashy-toy aspect of technology, especially before we have a good idea of what kind of work and expertise is required in order to use the flashiest of tech toys. It’s more challenging, and I think more interesting, to make things that are both elegant and supremely useful; either one of these taken separately is a common occurrence, but together they can be hard to find.

I look forward to reading Alan’s book. Thanks for the pointer, Scott!