11/3/2005

What do you want to search for?

Filed under: — vika @ 1:29 pm

Anticipating Paul’s work on the search engine, a question for the text scholars:

What do you want a semantic search engine operating on a text to do for you?

Please have one or more texts in mind, whether they're texts we're putting up or ones that interest you personally. The functionality, however, should be generalized. (For example: you want to search for words in proximity to each other. How much proximity? Occurring within 3/5/10/? words of each other. Or: you want to search for words with similar spellings, like love and lov'd and loves.)
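To make the two examples above concrete, here is a minimal sketch in Python. It is not a description of the search engine Paul is building; the function names, the tokenizing rule, and the similarity cutoff of 0.6 are all assumptions chosen for illustration.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into words, keeping internal apostrophes (lov'd)."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())


def proximity_matches(text, first, second, window):
    """Return (i, j) token positions where `first` and `second` lie within `window` words."""
    tokens = tokenize(text)
    first_positions = [i for i, tok in enumerate(tokens) if tok == first]
    second_positions = [j for j, tok in enumerate(tokens) if tok == second]
    return [
        (i, j)
        for i in first_positions
        for j in second_positions
        if 0 < abs(i - j) <= window
    ]


def spelling_variants(text, word, cutoff=0.6):
    """Return words in the text whose spelling resembles `word`.

    The cutoff is an arbitrary assumption: a lower value finds more variants
    but also more false matches (dove, live, and so on).
    """
    vocabulary = sorted(set(tokenize(text)))
    if not vocabulary:
        return []
    return difflib.get_close_matches(word, vocabulary, n=len(vocabulary), cutoff=cutoff)
```

Character similarity is only a rough stand-in for real variant handling. A lemmatizer or a hand-built list of historical spellings would likely do better on early texts, at the cost of more preparation.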

Examples of search engines for various corpora can be found here. The features you want may or may not be available on them, and you are certainly not limited to what you see – this is just to get you going.


3 Comments »

Comment by mike
2005-11-03 14:24:47

Great idea. Out of these examples, I have to say that I especially like the Balzac one.

Since other interesting things are being put into the encoding of our texts, what about doing something like getting a list of the names that are within a variable number of words of the search object?

Regarding Villani, are the dates being encoded as dates? It would be great to be able to search for some common word by restricting its location to one or more of the dates that appear in the text (or to a predetermined array).

For the Esposizioni, I would be absolutely overjoyed to have the pieces of the commented text marked as distinct from the commentary that surrounds it so that, for example, you can search for something in one without getting the other. Similarly, users could strip all the commentary from the text and see what would amount to the copy of the Inf that B had in front of him (which would be particularly useful as a place to stick variants in annotations). That would be a fantastic philological reconstruction perhaps worthy of an article…

Other than these special things, nothing comes to my mind. It would be super, super fantastic to get the results in KWIC format. How does that seem as a possibility?

M
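For readers who have not met the KWIC (keyword in context) format requested above, a minimal sketch in Python follows. The function names, the context width, and the punctuation-stripping rule are assumptions made for illustration and do not describe the project's actual search engine.

```python
import re

# Punctuation stripped from a token before comparing it to the keyword (an assumption).
PUNCTUATION = ".,;:!?\"'()"


def kwic(text, keyword, width=3):
    """Return (left context, keyword as written, right context) for each hit.

    `width` is the number of words of context kept on each side.
    """
    tokens = re.findall(r"\S+", text)
    rows = []
    for i, token in enumerate(tokens):
        if token.strip(PUNCTUATION).lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


def format_kwic(rows):
    """Right-align the left contexts so that the keywords line up in one column."""
    pad = max((len(left) for left, _, _ in rows), default=0)
    return "\n".join(f"{left:>{pad}}  [{kw}]  {right}" for left, kw, right in rows)
```

Calling `print(format_kwic(kwic(some_text, "love")))` prints one line per occurrence with the keyword in a fixed column, which is what makes recurring clusters of words easy to spot by eye.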

Comment by matt
2005-11-07 00:39:52

Mike,
As you’ll see when the full description of Villani encoding principles is up, and as you may be able to imagine from the entry below, dates are encoded. The dates are given as Villani gives them. The values include 1) the date and 2) the year. Since Villani jumps around a great deal in his chapters, we’ve encoded the date expressions themselves rather than whole chapters.

Comment by guyda
2005-11-07 10:07:13

I agree with Mike that KWIC results would be terrific - exactly the kind of thing which would constitute a real value-added function for textual scholars. Since Boccaccio tends to cluster lexical items (words, associations, intertextual references) in the same way across his corpus, to display this feature across a number of his texts would be tremendous.

Likewise, distinguishing between commentary and commented text would be welcome, as would distinguishing the citations. I’ll post on that separately, ‘momentarily’, as you say in American English.
