12/10/2004

Trip report: The Face of Text (post the last)

Filed under: — vika @ 3:11 pm

On Sunday, things were generally slower, as often happens on the last day of a conference. It was a half-day, and everybody was tired; consequently, although the presentations themselves were interesting, my notes on them ended up more laconic.

Jean-Guy Meunier talked of two related but distinct projects, CARAT and SATIM. (The latter had been presented in more depth the previous day by his student Dominic Forest.) CARAT is an approach whose name stands for “computer assisted reading and analysis of text”; SATIM is a tool, a piece of software, for text analysis. CARAT, Meunier said, refuses to view reading and analysis as automatable processes. The computer is seen as a tool, not a robot; the work is performed using an interpretive, as opposed to analytical, paradigm.

Here, Meunier is advocating for building software that learns, as opposed to software that only computes according to pre-established, static algorithms. The CARAT abstract from ACH/ALLC 2001 gives more detail on learning software, in section 2 titled “Methodology.”

Next, Pamela Asquith and Peter Ryan gave us a tour of the Kinji Imanishi Digital Archive Project, dedicated to a remarkable Japanese scientist and mountaineer. With this one… well, like HyperPo, it just has to be seen. What a well-made site, both information-wise and aesthetically.

In the course of presenting the project itself, Asquith and Ryan spoke of design types. I know next to nothing about them, but perhaps they are worth researching. The four types of design mentioned are: system-centered; user-centered; interaction design (bridges the first two); and situated activity (unclear to me what exactly this is, but Lucy Suchman wrote about it in 1987).

Eugene Lyman presented the Piers Plowman Electronic Archive. A good talk involving a lot of common-sense wisdom about usability. Ironically, the project’s site merely provides information about the project, which is being slowly released on a CD-ROM. And get this: you can only run the SGML version on Windows. If you’ve got a Mac, it’s HTML only, presumably with some semantic functionality missing. Usability, indeed.

Marc Plamondon spoke of “Computer-assisted phonetic analysis of English poetry.” Can we quantify beauty, he asked? Can computers give us insights with regard to poetic beauty? His hypothesis was the following: poetry that is not melodious would have many unpleasant, hard-to-pronounce consonants, whereas melodious poetry would contain many pleasant-sounding vowels. Plamondon did a phonetic analysis of poems by Browning and Tennyson, and found the results almost identical.

So he presented this approach as a tested one. Unfortunately, he has not tested it against a more random sample of the English language — prose, for example, or contemporary poetry. The approach is interesting, to be sure; but the research did not have a control group, and is thus not complete.
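The control-group point can be made concrete with a minimal sketch. This is not the method presented in the talk: it counts letters rather than phonemes (a crude orthographic stand-in for real phonetic analysis), and the function names `vowel_share` and `compare_to_control` are invented for illustration. The idea is simply that a vowel proportion only means something once it is measured against a baseline such as prose.

```python
VOWELS = set("aeiou")  # assumption: letters as a rough proxy for vowel sounds


def vowel_share(text):
    """Return the fraction of alphabetic characters that are vowels."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in VOWELS for c in letters) / len(letters)


def compare_to_control(samples, control):
    """Return each sample's vowel share minus the control text's share."""
    baseline = vowel_share(control)
    return {name: vowel_share(text) - baseline for name, text in samples.items()}
```

With real data, `samples` would hold the poems under study and `control` a body of prose or contemporary verse; differences close to zero would suggest that the measure does not separate the poems from ordinary English.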

Jason Boyd spoke about REED (Records of Early English Drama). The site is dedicated to “Patrons and Performances,” and the project’s current aim is to “facilitate… research by undertaking the challenging task of abstracting the ‘hard’ data from these often ambiguous and imperfect historical documents and by enabling the user to effectively search this data through multiple avenues and angles which encompass a spectrum of research interests.” Interesting; certainly worth keeping an eye on this one.

Finally, Elaine Toms presented the results of a web-based survey she and colleagues conducted, in a talk titled “Modelling the humanities scholar at work.” Is there a “generic” humanities scholar, they wondered, in terms of their use of e-texts and electronic text tools? If not, are there groupings with common sets of characteristics? Much statistical data flew about, and Toms made recommendations for where to go next in humanities computing. I admit to being sceptical about the results: the pool of surveyed humanists was both very small (under 300 people, I think) and heavily skewed towards computing humanists. In addition, from what I remember of the survey (which I took, and which is no longer online), it wasn’t constructed particularly well. Too bad, since the kind of data that Toms (along with Geoffrey Rockwell, Ray Siemens, Lynne Siemens and Stéfan Sinclair) is after would actually be very useful. Perhaps, based on this experience, a more effective and thorough survey might be conducted in the near future, provided that the funding gods smile on the idea.

Picture this: Sunday afternoon. Everybody exhausted. The last keynote is coming up: how do you think it would go? Well, I’ll tell you what: Steve Ramsay made sure the conference went out with a bang. He didn’t just speak but performed, enthusiastically praising pattern and cracking effortless jokes of varying subtlety. His talk was so good that when the videos are published I’ll be watching it repeatedly, to relive the sheer pleasure of hearing it, and again to get inspired when I’m feeling down. Here are just a few things that Steve touched upon.

- It’s easy to use computers to amass empirical data about texts; it’s harder to make these data fully participate in the dialogue and research of the “hard” humanities.
- We [in humanities computing] are not out to provide objective solutions to interpretive problems.
- If you love computers very much, they will eventually lead you to study mathematics.
- Diagrams should offer readers the open possibility of interpretive insight. (insert pipe dreams of a 3D, rhizomatic diagram of, say, The Decameron)
- A good humanities research methodology should enjoy as much serendipity as possible. (I particularly like this one. It encourages us to expose ourselves to as many new experiences as possible, because otherwise, how are you going to get increased serendipity?)
- Some things you guess… or choose, based on past experience, possibly based on years of study. Or you could stand back and enjoy the variety of possibilities. A computer, however, refuses to choose arbitrarily, so you are forced to make concrete decisions.
- [Ergo,] Software must explicitly assert its utter lack of neutrality. (Through good documentation, sorely lacking in the field!)

There was much more, but I was too mesmerized to write it down.

Thus ends the Face of Text trip report. Corrections, conversation and competing conceptions cheerfully craved.

12/9/2004

Trip report: The Face of Text (post three)

Filed under: — vika @ 7:42 pm

Day two, Saturday, started with Julia Flanders’ keynote titled “Text Analysis and the Problem of Pedantry.” Julia is an engaging speaker, and it is tough to do descriptive justice to any talk she gives. McMaster techies recorded the entire conference (or was it just the keynotes?) on video, which they will be making available on the conference website. When this is done, I highly recommend watching Julia’s talk. In the meantime, here are some disjointed notes (any particularly clever turns of phrase are Julia’s own; most of the rest I’m paraphrasing):

- Why do we feel the way we do about detail? How do our tools engage detail-related concerns?
- Expressions such as “…and the like” and “et cetera” connect details to a larger whole. (vz: It occurs to me that providing one or two detailed examples followed by “et cetera” makes them more than mere examples: it singles them out as exemplary of a category, elevates their status.)
- Textual analysis is caught up in a methodological bind. Historically, it has tried to be Scientific; at the same time, alliances with the more empirical sciences are treated with suspicion.
- Pedants, far from being innocuous drudges, hold standards and convictions about scholarship. These are loudly voiced and hard to evade.
- The mark of a pedant is “the itch of contradicting great men on very slight grounds.” (This quote was borrowed from Richard Bentley, 18th-century textual critic.)
- So attention to detail is both important and unattractive.
- Tools (!) like TextArc may make quantitative analysis more attractive to people who do not consider themselves That Kind of Scholar.
- We are challenged to project ourselves through detail on to a larger something-else. Use pattern as a clue to something, a further causality.
- The role of the interface (there’s that all-important visualization again) in expressing text analysis has changed. It heightens and alters our perceptions, seeks to stimulate an open-ended interpretive process.
- Pedantry makes explicit (postmodernism provides other clues) that totality - of a text, a corpus, a biography - is an illusion.
- We have to not only seek and value the pattern, but be inquisitive as to why certain patterns seek us out, and why we build the tools we build to seek them.

The concept of completeness in presenting research results has long bothered me. Conference papers, articles, books - especially books - are expected to be nigh unto perfectly researched. The impossibility of such an endeavor (nothing would ever be written!) does not entirely discard this expectation. I think VHL is partially addressing this issue. One of our aims, as I understand it, is to allow scholars to annotate a tiny bit of text, perhaps draw a parallel between two or more segments where the connection may not be obvious… It’s crucial to have a space where people can present an idea that may well be more free-discourse than the result of months of research. This kind of environment has the potential to stimulate public humanistic conversation on a large scale, unhindered by the months it takes to publish an article or a book and possibly years to see a published response to it.

The morning session brought us Claire Warwick’s analysis exploring whether scholars of English literature actually use computational methods in their research. Her findings are predictably disappointing; there are smatterings here and there, but humanities computing continues to be on the fringe in English studies. In the end, the presentation confirmed, for a specific field of literary inquiry, what we already knew about the general [non-]use of computing in the humanities.

Following this, Susan Brown presented The Orlando Project, which - hooray, finally! - is going at least partially public. The public site contains no more than teasers, but it is a very exciting women’s-writing project indeed.

Finally, Paul Scifleet and Concepcion S. Wilson from New South Wales spoke about “The Markup Analysis Engine.” These are heavily technical application development folks; I did not understand much of their talk. The abstract, though, is available here.

After a break, another keynote, this time from John Bradley. “What you (fore)see is what you get,” he bluntly states in the title. A clear, mostly easy-to-follow presentation of some complex models, all of which speak to a known truth: our tools affect what we do, and what we can do. Bradley quoted a 2001 paper by Brockmann et al.: “[T]he functions on which research in the humanities depends are neither well understood nor well supported by librarians.” Before we build tools, we need to figure out what scholars actually do. Through the analysis of scholars’ practices, Bradley said, we can conceptualize the type of information environment that would best support their activities.

Simple enough. Who are your users? What are they likely to do with your gadget? But first - what do they do now, and how will your gadget help them do it better?

In the course of his talk, Bradley presented four models (or perhaps categories?) of a computing humanist’s mental activity. I’ll try to quickly summarize them here; perhaps they will be useful to us in the future.

- The conduit model. The web itself; digital editions; digital libraries. Paradigm: “user has access - user will be supplied.” Really, this is the point of view of an editor. It does not take into account what the user will actually do.
- The markup model. Editor works with the tags, not with the concepts that the tags represent. (Here I have to disagree: how are the tags created in the first place? Shouldn’t you find the concepts you want to encode, before you encode them? Perhaps the point here is, sometimes encoding happens without a whole lot of thought put into the actual tagging scheme.)
- The transformation model. User controls transformation automaton (tool), sees results. Text passes through transformation automaton, into results.
- The object model. User/researcher reads text, records and organizes notes into an annotation collection. The text in question is connected to each of the notes in the annotation collection. Of course, users of a particular resource may have different levels of access.

A session chock-full of McMaster presentations was next. Jenna Wells and Madeleine Jeay presented Hyperlistes (site in French). Their project presents medieval French texts in which long lists - of things, actions, human qualities - were a common trope. The lists are cross-linked, so that a user may find, for example, all of the texts containing lists in which one of the items is wine. Or thieving. Or Gawain. You get the idea. This project is particularly near to my heart; both the primary texts and the researchers’ approach to them resemble my own work on Roland.
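The cross-linking described above amounts to an inverted index from list items to the texts that contain them. Here is a minimal sketch of that idea, not a description of how Hyperlistes is actually built: the function names and the placeholder titles ("Text A", "Text B") are invented for illustration.

```python
from collections import defaultdict


def build_item_index(lists_by_text):
    """Map each list item (lowercased) to the set of text titles containing it."""
    index = defaultdict(set)
    for title, lists in lists_by_text.items():
        for items in lists:
            for item in items:
                index[item.lower()].add(title)
    return index


def texts_with_item(index, item):
    """Return the sorted titles of all texts with a list containing the item."""
    return sorted(index.get(item.lower(), set()))


# Hypothetical data: each text holds one or more lists of items.
lists_by_text = {
    "Text A": [["wine", "bread"], ["Gawain", "Lancelot"]],
    "Text B": [["thieving", "wine"]],
}
index = build_item_index(lists_by_text)
print(texts_with_item(index, "wine"))
```

Looking up "wine" returns both placeholder texts, while "Gawain" returns only the first. Lowercasing the keys is a design choice that makes lookups case-insensitive at the cost of losing the original capitalization.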

Nicholas Griffin (McMaster) and James Chartrand (Open Sky Solutions) presented the Bertrand Russell Archives, a well-done project about the British philosopher. To finish up, Stéfan Sinclair gave us a more in-depth tour of HyperPo. I’ve raved about it before; seriously, the only way to fully grasp its coolness is to go play with it. Well, what are you waiting for? Click! Play!

The afternoon session brought us a demo of LetSum, a legal text summarizer developed at the Université de Montréal (abstract, tool is not online from what I can see), a paper on the Trend Mining Framework (an approach to research that seems to combine the dream of the semantic web and the never-ending search for pattern), and a presentation on just-in-time text analysis by two researchers attempting to hack Google.

Thus ends the second day… or the official part of it, anyway. A tasty and jovial banquet was held, at the end of which many of us decided that we hadn’t had nearly enough of each other and headed to the bar for beer and pool. We played in pairs; my partner, who was most excellent, was also supremely patient.

Day three, perhaps, tomorrow.

Trip report: The Face of Text (post two)

Filed under: — vika @ 6:21 pm

[day one continued]

On to John Unsworth’s keynote, then. It was titled “Forms of Attention: Digital Humanities Beyond Representation.” He spoke of the ways in which we value and attend to works of art, which segued into a discussion of how (and why) forms of attention in humanities computing change over time.

Most of the talk was about tools. Unsworth whizzed us through a fast and thorough history of humanities computing tools, starting with old gems like TUSTEP and mentioning also TACT. These are useful for gathering quantifiable data; but, Unsworth said, statistical methods have had a “limited vogue,” not because the tools aren’t there, but because up until lately, the available tools did not answer the questions that are most interesting to humanists. Such as: what don’t we know?

Humanities aren’t about problem solving, he said, but about appreciation. There are tools, and then there are texts as tools (like dictionaries). Archives are also tools, and here we heard of those sprouted by IATH, in particular the Rossetti Archive and the Blake Archive. (Unsworth did not fail to remark that none of these archives have a long-term plan for preservation; I wonder if computing humanists are using “Acid-Free Bits: Recommendations for Long-Lasting Electronic Literature” in their work?)

These and other archives are fascinating projects that, among other things, model their subject matter. Modeling projects, Unsworth said, strive to show us not so much what’s there as what is no longer there; and there is a difficult problem inherent in distinguishing what you do and do not know for sure.

So he proposes that visualization is the Next Step in humanities computing. This proposal reiterates what has been proposed by others before, but contextualizes it perfectly. Unsworth’s talk showed a nicely presented, logical evolution of the field. Now that we’ve got a healthy amount of theory down, complete with as-yet unrealized Exciting Ideas and the beginnings of technology that will help us realize them, it’s time to focus on aesthetics. Visualization, then, seems a natural next point of focus. It’s no wonder that humanists don’t like the word “tool”: it carries implicit baggage of heavy, repetitive tedium. Wouldn’t it be nice someday, Unsworth asked us, instead of telling our colleagues “this is a tool for text mining,” to be able to propose to them: “would you like to play a game of text exploration?”

(Ivanhoe comes to mind again. Please oh please, someone, make this excitement public already…)

The audience chuckled appreciatively. Yes, we all want to play.

I confess, the afternoon sessions took place towards the end of a very long and saturated day. This batch was more presentation of specific projects than general theory; instead of trying to summarize them all here, I’ll cheat and point you to the five abstracts available online for your perusal.

12/8/2004

Trip report: The Face of Text (post one of several)

Filed under: — vika @ 8:25 pm

Now that Thanksgiving has passed and my laptop has a shiny new hard drive (the old one died; and when did you last back up your data?), I am catching up on my blogging. It’s been quiet here. This trip report is rather long, so I will split it up into several posts. Any misrepresentation of what others said at the conference is, of course, entirely my fault, and corrections from participants are most welcome. My own reactions are interspersed; where it’s not obvious which thoughts are mine and which are the speaker’s, I’ve put mine in italics.

In late November, I went to McMaster for a most excellent conference organized by Geoffrey Rockwell and his colleagues. The Face of Text: Computer Assisted Text Analysis in the Humanities was a three-day extravaganza of lucid thought eloquently presented, stimulating conversation, and surprisingly good beer at the graduate student pub. As full disclosure, I’ll also state that I discovered my quota in playing pool: one ball in per game. It’s clear that I need to attend more conferences with pool tables.

Highlights in chronological order, then. Friday started off with Prof. Rockwell’s presentation of the conference and of TAPoR (Text Analysis Portal for Research), under whose auspices the conference was held. TAPoR is funded by the Canada Foundation for Innovation and is co-hosted at six Canadian universities. It’s a wonderful project; their site is worth a look or three.

Stéfan Sinclair presented HyperPo, a toolset that he is currently developing independently from TAPoR, but that will be integrated into the portal later on. Sinclair wants to focus on tools that provide user friendliness, and so far he is succeeding admirably. HyperPo can take any reasonably short text you give it and present it in a number of ways that are both clear and fun to play with. Word frequency, co-occurrences, a concordance of sorts, word distribution graphs, Oulipian functions like the ability to search for palindromes, anagrams, pangrams… and that’s just for now. More exciting visualization toys coming soon, watch that space.
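For a feel of what a few of these functions involve, here is a minimal sketch of word frequency, palindrome search and anagram grouping. It is an illustration only and has nothing to do with HyperPo's actual code; the function names and the sample sentence are invented.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and return its runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text):
    """Count how often each word occurs."""
    return Counter(tokenize(text))


def palindromes(text, min_length=3):
    """Return the distinct words that read the same in both directions."""
    return sorted({w for w in tokenize(text)
                   if len(w) >= min_length and w == w[::-1]})


def anagram_groups(text):
    """Group distinct words that use exactly the same letters."""
    groups = defaultdict(set)
    for word in tokenize(text):
        groups["".join(sorted(word))].add(word)
    return [sorted(group) for group in groups.values() if len(group) > 1]


sample = "Madam, the level of evil in this vile tale is live on the radar."
print(word_frequencies(sample).most_common(1))
print(palindromes(sample))
print(anagram_groups(sample))
```

The tokenizer is deliberately naive: it drops accents, digits and apostrophes, which would matter for French texts or contractions.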

Jerome McGann’s keynote was the first of six. (!) I’d never heard him speak live; what a treat. McGann spoke of the gap in communication between computing humanists and non-computing humanists, something that became a recurring topic during the conference, and stated that the problem was institutional. Using the word “tools” with non-computing humanists will turn them off, he said; he also claimed that throwing around common terminology, such as TEI and XML, turns off the minds of your interlocutors.

Terminology in conversation, and indeed in publication (which is one way of conversing for academics, no?), is indeed a fine line to walk. Still, the statements above were perhaps a bit too categorical for me, too black-and-white. Conversation starts small, and hopefully grows outward; so there is plenty of time to fine-tune your speech to your audience. McGann’s implicit point about keeping audience foremost in mind, though, is well taken.

We don’t have an integrity of approach to the new media in humanities computing as a whole. McGann, along with NINES (Networked Interface for Nineteenth-Century Electronic Scholarship), is going after this institutional problem.

NINES is based at UVA, but is separate from IATH. Its purpose is to integrate peer-reviewed electronic scholarly activity. They want scholars to go about doing their traditional work in an easy-to-grasp, non-forbidding electronic format. This goal is being pursued in two different ways. On one hand, they are putting together a set of traditional editorial boards, which will act as a peer-reviewing and vetting mechanism. On the other hand, easy-to-use tools are being built.

Sounds great. I haven’t looked at the NINES website in depth yet, but the NINES thinking along the lines of accessibility to non-computing humanists does ring a familiar bell, where VHL’s work is concerned. That’s gratifying.

Some of this keynote was, of course, devoted to the Ivanhoe Game. There was much discussion of the ludic in research, and whether the Ivanhoe Project is a game or a playspace; but it does not seem to be public yet, so I’ll not go into details here.

After a break, David Hoover spoke on “(De)Facing the Text: Irradiated Textuality and Deformed Interpretations.” Some of the key points from that talk:

- Thesis: valorizing instability of texts and celebrating indeterminacy of interpretation are likely to lead us astray.
- We forget that all texts are marked, semantically and graphically; they are not containers of meaning but rather algorithms for generating themselves.
- Hoover likes deforming poems, rewriting them as a sort of call-and-response with the original author. Is this interpretation or not? Hoover brings in McGann’s claim that “interpreting a poem after it has been deformed clarifies the secondary status of interpretation.” [But isn’t a deformed poem a different text altogether? What specifically does “deformation” have to do with the status of interpretation?]
- Useful reminder: it is important to discard “incorrect” interpretation arising from one’s own cultural context. (For example, Shakespeare’s use of the word “gay” does not in itself lend a work to queer-studies analysis.)

Ray Siemens spoke about “Modelling Humanistic Activity in the Electronic Scholarly Edition.” We model texts, Siemens said; but let’s go further and model humanistic activity itself, so that we may better understand how electronic environments can best serve that activity. Excellent paper, indeed so saturated with information that I failed to keep good notes, finally giving up and asking Ray for his PowerPoint presentation, which he graciously promised to provide. (As an aside, computing humanists’ generosity in sharing knowledge is astounding, and one of my favorite features of this field as a whole.)

Patrick Juola spoke of “Proving and Improving Authorship Attribution Technologies.” Authorship attribution is an old problem, he said, traceable at least to the late 19th century. The assumption seems to be: there exists an authorial fingerprint, omnipresent and unchangeable, and detectable by some sort of black magic (Patrick’s words). To get better results when proving authorship, we need to use more information, more common information, more unconscious information.

I confess, I am sceptical about (and thus biased against) the reliability of authorship attribution in any form. Patrick’s example of tracing a misspelled “toutch” to link different texts to the same person is all right, but there do exist common misspellings in any culture. How do you know which of a dozen teenage authors under consideration consistently uses “wierd” instead of “weird”? Etc. I may be unfair here, but there has yet to be an authorship attribution tool that convinces me of its usefulness. So far, they all seem too unreliable.
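For readers wondering what "more common, more unconscious information" might look like in practice, here is a minimal sketch of one familiar approach: comparing how often texts use a handful of very common function words. It is an assumption-laden toy and not the technique presented in the talk; the word list, the distance measure and the function names are all chosen for illustration.

```python
from collections import Counter
import re

# Assumption: a tiny, hand-picked list; real studies use far more words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it")


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def distance(p, q):
    """Sum of absolute differences between two profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_author(disputed, candidates):
    """Return the candidate whose known text has the nearest profile."""
    target = profile(disputed)
    return min(candidates,
               key=lambda name: distance(target, profile(candidates[name])))
```

The sketch also shows where the scepticism bites: `closest_author` always returns somebody, however poor the match, and on short texts the profiles are mostly noise.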

I’ve arrived at John Unsworth’s keynote; it’s long, so I’ll finish this post here. Until next time, may the gods of computing and black magic be with you.