11/3/2005

What do you want to search for?

Filed under: — vika @ 1:29 pm

Anticipating Paul’s work on the search engine, a question for the text scholars:

What do you want a semantic search engine operating on a text to do for you?

Please have one or more texts in mind, regardless of whether they’re texts we’re putting up or those that interest you personally. The functionality, however, should be generalized. (For example: want to search for words in proximity to each other. How much proximity? Occurring within 3/5/10/? words of each other. Or: want to search for words with similar spellings, like love and lov’d and loves.)
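To make the two examples concrete, here is a minimal sketch in Python of a proximity search and a spelling-variant search. It is an illustration only, not how Paul's search engine works: the function names `tokenize`, `near` and `variants` are made up for this post, and matching variants by a shared stem is a deliberately naive assumption (it would also catch "lovely").

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens, keeping apostrophes inside words."""
    return re.findall(r"[a-z]+(?:['’][a-z]+)*", text.lower())

def near(text, first, second, window=5):
    """Return (i, j) token positions where `first` and `second` occur within `window` words of each other."""
    tokens = tokenize(text)
    first_at = [i for i, t in enumerate(tokens) if t == first]
    second_at = [j for j, t in enumerate(tokens) if t == second]
    return [(i, j) for i in first_at for j in second_at
            if i != j and abs(i - j) <= window]

def variants(text, stem):
    """Return the distinct tokens that begin with `stem` (naive stand-in for similar spellings)."""
    return sorted({t for t in tokenize(text) if t.startswith(stem)})

sample = "She loves him, and he lov'd her; love is blind."
print(near(sample, "love", "blind", window=3))
print(variants(sample, "lov"))
```

The `window` parameter is the "how much proximity?" question from above: 3, 5, 10 or whatever the scholar needs.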

Examples of search engines for various corpora can be found here. The features you want may or may not be available on them, and you are certainly not limited to what you see – this is just to get you going.

10/25/2005

A couple very practical questions

Filed under: — mike @ 5:32 pm

While reading the posts on the encoding of Villani, several not-too-pleasant memories came to mind from months of encoding the Decameron. I too was desperately looking forward to using the encoding, like Matt, for pedagogical purposes, but once Dynaweb went awol all that work went down the drain, leaving us with no ability to run cross-searches on the text. (I still lose sleep thinking about it.) So, this leads me to pose a couple questions, which - I hope - lead in productive directions. Forgive me if I’m missing something simple.

1. Is there a plan within the framework of the VHL to recuperate this functionality in the Decameron’s part of the project? If so, could it be done before the course starts up in January? Indeed, is there any way to tweak it so that it will come back to life within what Paul is doing?

2. How reliant are we now with the new projects upon an external structure like this? I would hate to go through all of this only to have it die a Dynaweb death.

Hopefully,

M

10/17/2005

Villani: next steps

Filed under: — vika @ 7:51 pm

I’ve been thinking about what to do next with the Villani text. Considering we’ve been dealing in detail with the Esposizioni, the work we need to do between now and putting Villani online should be simple and quite similar to what we’ve done before, shouldn’t it? Well, yes and no: our two Villani editors-and-encoders are both at institutions other than Brown, not particularly close to each other. I’ll be getting a bit more involved in Villani as well, documenting the process as thoroughly as possible. All of this points to the blog as a good, systematic communication system that hopefully opens us up to critique and other helpful input from colleagues we don’t even know we have.

So, this is a first attempt at a to-do list for Villani. It’s tentative, but I’m pretty sure we have to do the first two bullet points before moving on to other things.

  • Go through the DTD and correct the typos: I’ve used the power of software to create a DTD (Document Type Definition, where the formal rules for encoding are stored). We won’t actually be using the DTD to prescribe encoding, but making it automatically has helpfully revealed some typos in the XML, which need to be corrected. More on this below.
  • Debug attributes: A bunch of unconventional (to me) stuff was done with the encoding; some attributes have more than one value and we decided to use blank spaces to separate the values (for example, <person role="duke captain">). But we must go through all the attributes again and remove any spaces that do not separate two values: for example, it would be confusing to write <person role="duke captain ruler of the people"> because “of” is not on the same semantic level as “duke”. I will leave that to Matt and Rala, and will help as needed.
  • Run text by Paul: I’m pretty sure the blank space thing is going to be all right, but I’m not comfortable enough with how XML interacts with other languages we’re using in the back end (notably PHP). So I’ll show it to Paul and ask him whether our mark-up is unnecessarily clunky in a glaring way. Anything that needs fixing won’t need re-encoding from scratch; we’ll find a way to automate as much of the transformation as possible, if it comes to that.
  • Put the text online: …and debug until it works. :)
  • Blog encoding principles: Matt and Rala will post here a completist account of their encoding principles, perhaps with a chart or table of all elements and attributes accompanied by the reasoning behind them. This will be the beginning of a document that will be publicly linked from the main VHL site, and that will serve as a starting point for introducing other scholars to our work.
  • Move on!: Then the real fun begins: I will perhaps do a first pass at indexes (*cough* with a little help from my friends…) while the Villani scholars annotate?
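The attribute debugging in the second bullet could be partly automated. What follows is a minimal sketch in Python, assuming the Villani files are well-formed XML: it splits each role attribute on blank spaces and flags values containing words that should never stand alone as a role. The function name `check_roles` and the stop list `SUSPECT_TOKENS` are hypothetical, invented for this illustration; the real list would have to come from Matt and Rala.

```python
import xml.etree.ElementTree as ET

# Hypothetical stop list: words that should not stand alone as a role value.
SUSPECT_TOKENS = {"of", "the", "and", "di", "de"}

def check_roles(xml_text, element="person", attribute="role"):
    """Return (attribute value, suspect tokens) pairs for values that read like phrases, not lists."""
    root = ET.fromstring(xml_text)
    problems = []
    for node in root.iter(element):
        value = node.get(attribute, "")
        tokens = value.split()  # blank spaces separate the individual values
        suspects = [t for t in tokens if t.lower() in SUSPECT_TOKENS]
        if suspects:
            problems.append((value, suspects))
    return problems

sample = (
    '<text>'
    '<person role="duke captain">A</person>'
    '<person role="duke captain ruler of the people">B</person>'
    '</text>'
)
for value, suspects in check_roles(sample):
    print(value, "->", suspects)
```

A check like this only points at candidates; deciding how to re-encode "ruler of the people" is still an editorial call.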

Comments? Additions? Subtractions?

9/28/2005

Meeting with Paul: abandon all despair!

Yesterday, Massimo and I met with Paul to talk about where our various projects stand and where we go from here. The following are some highlights of that exciting hour and a half, full of cautious optimism and web browsing, as well as a general recap of the project so far.

The VHL interface is currently located here, on STG’s development server. It is very much a work in progress, and some features may not work at any given moment. But it’s coming along!

Part of Boccaccio’s Esposizioni sopra la Comedia is already online, and the rest is currently being encoded. The current plan is to have the entire text up online by the end of the current semester. The text can be viewed by exposition (example). We are looking for alpha testers of the annotation system! If you are a scholar with relevant expertise and would like to get an account, please email me (vhl-at-wordsend-org). There is no quantity commitment; however, at this point we’re looking for people who can both annotate the text and give us constructive feedback: what is good, what needs work (and what kind of work), what features would be desirable. The content of the annotations is up to the participant scholars. Current project participants: if you can think of possible interested parties, please email me as well.

We also have indexes, notably of people and places in the Esposizioni. If you would like to help us verify the entries, please help yourself! Instructions are on the above-linked pages.

Pico’s Conclusiones Nongentae, also known as the 900 Theses, is coming along. A group of scholars is ready to start annotating it as well. In order to render it more easily cross-referenced with other texts, Paul will merge the Pico database with the VHL database (which contains the Esposizioni annotations). This will not affect the user’s experience.

Massimo showed us a Latin lemmatizer called LemLat, the standalone software version of which looks potentially useful for the Pico text(s). We’re looking into it.

Paul told us about PhiloLogic, a search engine that STG has been studying. It is a powerful piece of software, which copies texts into its electronic brain and does its own thing with them, but allows you to modify the interface to fit into your project. We can potentially ask it to search annotations, if they are located within a file (as opposed to a database). Paul is looking into its redundancy with MySQL; if it has unique features that we like, it may be our search engine soon.

The search engine is the largest overall VHL task for the year, technically speaking. What would you, o Researcher, like to be able to search for in our texts? Aside, that is, from simple string searches and already-developed things like word collocations?

One wish list item, which perhaps we’ll get to before the end of the grant, is a comparative Boccaccio/Villani glossary. This would be in addition to a glossary of terms that Boccaccio defines in the Esposizioni.

That’s about it, for the moment. Massimo, Paul: have I missed something?

8/12/2005

Linking indexes and forum

Filed under: — vika @ 12:14 pm

Pipe dream: automatically put an asterisk (or something) by an index entry, if there is a topic for that index entry on the discussion forum.

I’ll try to figure this out later on; or if someone else wants to, well, great. :)
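For whoever picks this up, a minimal sketch of the idea in Python. It assumes, and this is a big assumption, that a forum topic's title matches the index entry's name exactly apart from case and surrounding spaces; the function name `mark_discussed` and the sample names are made up for illustration.

```python
def mark_discussed(index_entries, forum_topics):
    """Append an asterisk to every index entry that has a forum topic of the same name."""
    topics = {t.strip().lower() for t in forum_topics}
    return [e + " *" if e.strip().lower() in topics else e
            for e in index_entries]

print(mark_discussed(["Virgilio", "Omero"], ["virgilio"]))
```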

5/14/2005

Esposizioni Mach 1: Verifying the Index

I am pleased to announce that there is now stuff to play with.

A part of the Esposizioni, the part that has been most thoroughly encoded so far, has been put up. From this rather large chunk (I’m guessing roughly 175 modern print pages), we have built an index of people’s names. Now, this index must be verified, and we need your enthusiastic help.

Not many people besides the project’s participants read this blog, so on Monday I’ll compose an email to be sent out (with modifications as you see fit) to various pertinent mailing lists. I’ll be happy to send it out to Humanist and Digital Medievalist lists. Anyone else willing to forward it along to colleagues or lists? If so, would you please let me/us know which lists you’re going to cover?

Knowing the project’s current status is critically important for a smooth interaction with it. For the moment, most of VHL’s stuff (everything except for this weblog and a discussion forum, about which below) lives on the development server of the Scholarly Technology Group here at Brown. It is very much a work in progress. At any time, it may simply not work, or work in unexpected ways. If you’re really lucky (?), you could happen upon a moment when one of us is working on the site, and the same page loaded twice a minute apart could well be completely different the second time around!

Believe it or not, however, this isn’t the most exciting part. The exciting part is this (n.b.: don’t use Internet Explorer to look at these):

  • The Esposizioni table of contents; click on a chapter to see it. Note, when viewing the text, that some terms are highlighted: proper names in blue, themes that we have begun to encode in pink, and words or phrases that Boccaccio regards as terms, and defines, in green. Hovering over a highlighted segment of text reveals more information about it. (For now, this information is in rudimentary form. We’ll be working on that.)
  • Indexes -> Esposizioni: People. The only index we have finished thus far. If you are interested in contributing verifications, additions or corrections for the entries in that index, we would welcome your contribution. You can click on any one of them to see a page of paragraphs in which a given entry appears. There are instructions on the main index page as well as on the index entry matches page; they explain how to contribute using the discussion forum.
  • The discussion forum. Regardless of whether you participate in work on the Index, if you would like to discuss other ideas about the Esposizioni or the way our project is working out so far, please let us know by starting a discussion!

Please note: the annotation engine, built by Paul Caton, is not quite ready for use yet, and we will not be using it for verifying the index. When it’s ready, we would like for the annotators with sufficient access privileges to focus on their individual research, or that done with a small group of people on a specific issue. It would be beneficial for projects being researched by a larger group to be discussed on the forum, so as to alert the public and perhaps increase the level of interest and participation.

Thoughts?

3/3/2005

New semester, new update.

Filed under: — vika @ 5:47 pm

Okay, so it’s well into the new semester. But, although things have been slow here on the weblog, they’ve been moving at a pretty exciting speed here at VHL HQ.

In a nutshell:

  • Paul has created the annotation engine! It was tested with a dummy text, and next we’ll be testing it with real ones.
  • Part of the Esposizioni, just over 2000 of its 5000 paragraphs, is now ready to be put up, tested and cleaned up. Segments of Villani to follow imminently.
  • It looks like several of us are presenting VHL in many different venues; as such, at a meeting held last weekend we decided it would be great to have some business cards. I’m looking into that.
  • This semester will be dedicated to testing the annotation engine, beginning to actually annotate the texts we’re putting up, and hopefully gaining some momentum in terms of public interest by contacting people who might want to annotate specifically the texts we’re putting up. Italianists, medievalists – anyone reading this want to play? Please let us know in comments if you do.
  • We’re leaving the search engine and generating indices mostly for the second year of the grant. That’ll be a big project.

Aside from writing the papers I’ll be presenting in March, June and July, as well as checking in on the progress of various project components, I’ll be concentrating on getting the texts to play nicely with the code. Having just spent several hours cleaning up the Esposizioni chunk a bit (nothing major, just structural well-formedness), I’m excited to get my hands on actual code again. It’s so… logical. XML is a great toy, and tool.

Guyda and I have been asked for a few screenshots of the annotation engine, for the Digital Medievalist article; I’ll post them here when I make them.

My colleagues can update the world about their own work in more detail. Undoubtedly I’m missing something(s). Feel free to supplement this short report, o Team.

1/6/2005

dropdown fix

Filed under: — mike @ 1:59 pm

Here’s a dynamic dropdown that I used for another site. It works fine with IE.
http://www.dynamicdrive.com/dynamicindex1/hvmenu/

You can see that I set it up very similarly: http://www.holycross.edu/departments/mll/website/flas/flafront2.htm

11/10/2004

More on the prototype

Okay, so we’ve got some problems. That’s why I wanted feedback — thanks to all who responded.

The problem is this: Internet Explorer is notoriously bad at interpreting CSS (Cascading Style Sheets, the language that tells your browser how and where to display things). CSS is a standard that has been around for years; all other browsers conform to it, but IE doesn’t. Why? Because it’s Microsoft, no other reason. I am biased, it’s true; but it’s also true that Microsoft designers should get it together and un-break their software.

Rebecca’s comment about Firefox 1.0 does give me pause. I think I know how to fix it, though, and will do so momentarily.

Because there are now many browsers available for download at no charge, all of which interpret CSS correctly, I don’t think it’s practical to spend too much time fixing the IE bugs. I will look into it, to be sure; but it’s likely to require more time than we have. I spoke to Paul about this yesterday, too, and he agreed: for now, we should work with standards-compliant software and make things work there; then, if we still have the time and money, we can go back and fix the IE bugs. Meanwhile, if you have a chance, please look at the prototype again with Mozilla/Safari/Firefox/your non-IE poison, and let me know if there are any more problems.

10/18/2004

What information to store on the VHL site?

Filed under: — vika @ 1:00 pm

Dear all,

Paul and I are working on building the overall VHL site these days. I need a bit of brainstorming help (and if anyone’s reading, please feel free to chime in).

Simply put: every time a user enters a URL, that user’s browser “talks” to the server hosting the site. Sometimes, the server will store unique identifying information about that particular computer so that, for instance, you don’t have to re-enter your login information at nytimes.com between visits; it just “remembers” you.

For various reasons, within each session, the VHL site will want to remember what a user is doing. For this, we have to be able to store information dynamically; in other words, we have to not only be able to put files on the server and deliver them to the browser, we must also be able to take information from a user (when they click on a link, or fill out a web form) and store it for later retrieval. The question is, what kinds of information to store? Paul came up with a list, and I’ve added to it. Please take a look and tell us if we’re missing something. The list items starting with question marks are only tentative suggestions; please let us know in comments whether you think they’d be useful or not, and why. The more detail, the better.

N.B.: this isn’t a full list of features for the site. It just tackles one area of technical development. More actual web-based fun coming soon.

And so: we want to store information on at least two things: annotations and people.

For annotations we want to store at least:

  • a unique ID
  • text of the annotation
  • IDs of the text/texts the annotation is relevant to
  • the specific word or phrase that the annotation refers to [if any]
  • ? do we want to have different types of annotation (as in Pico); if so, what would the types be? (I would think at least two types – one being “normal,” prose annotation of a sentence or a paragraph, and another being a variant encoding. Thoughts?)
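To make the list above easier to argue with, here it is as a minimal Python sketch of one annotation record. The field names are placeholders I made up, and the `kind` field with its two values is only the tentative suggestion from the last bullet, not a decision.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Annotation:
    annotation_id: int                                   # a unique ID
    body: str                                            # text of the annotation
    text_ids: List[str] = field(default_factory=list)    # IDs of the text(s) it is relevant to
    target_phrase: Optional[str] = None                  # specific word or phrase, if any
    kind: str = "prose"                                  # tentative: "prose" or "variant"

note = Annotation(1, "A gloss on this paragraph.", ["esposizioni"])
print(note)
```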

For people we want to store:

  • user name
  • password
  • different levels of access: it’d be nice to have an admin level (can edit others’ comments and do everything else), a participant user level (can edit own comments and post new ones), and a registration option for guest users that would remember their login info and perhaps the posts they’ve made on a forum. (It would, I think, make the site a lot more widely useful if there were a place where anyone at all could post a question or comment, regardless of their stature.) Others?
  • ? email address and institution, so that we may generate lists of annotators and send email to them as a group, if we need to?
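The access levels could be sketched along these lines in Python. This is one possible reading of the list above, not a design: the names `Access` and `can_edit` are hypothetical, and the rule that guests cannot edit anything is my assumption, since the list does not say either way.

```python
from enum import IntEnum

class Access(IntEnum):
    GUEST = 1        # registered guest: login is remembered, may post on the forum
    PARTICIPANT = 2  # can post new comments and edit their own
    ADMIN = 3        # can edit others' comments and do everything else

def can_edit(user_level, user_name, comment_author):
    """Decide whether a user may edit a given comment."""
    if user_level == Access.ADMIN:
        return True
    if user_level == Access.PARTICIPANT:
        return user_name == comment_author
    return False  # assumption: guests cannot edit comments

print(can_edit(Access.PARTICIPANT, "rala", "rala"))
print(can_edit(Access.PARTICIPANT, "rala", "matt"))
```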

So, o Team: what’s missing from this list?

9/30/2004

Update on what I’ve been doing

It’s been an eventful month here at VHL HQ. Guyda and I have just submitted a paper proposal to a publication, and separately from that I’ve submitted a couple of others, for conferences. We’ve set up people with encoding, which for now seems to be going well. Have met with Rala and Matt regarding Villani – I’ll let them expound on that. Esposizioni work seems to be going well too: I have repeatedly met with Roberto, who is kicking pretty seriously on the encoding; and Cristiana, Guyda and Mike have been researching aspects of the text.

Paul and I have started talking about the overall structure of the interface. For now, we’re focusing on the seminar room features, which will be the most widely used; he is thinking about back-end architecture and I am playing with layouts. For now, there’s a lot of preliminary dirty work on this front; as soon as there’s something concrete to share, we’ll post.