Choosing tools, or why your computer is an abacus

If you wanted to put your small basil plant in your garden, would you find a backhoe to do it? Probably not, since you could dig a reasonably sized hole pretty quickly with a small shovel. Just as you don’t need to bring out heavy machinery for a simple gardening task, you probably don’t need complex tools for small bits of text analysis. Is the heavy machinery impressive? Sure. Is it really necessary? Um, no, probably not.

When it comes to choosing and using digital tools for text analysis, it’s a bit like gardening: you want to choose the right tool for the task. Some projects, like planting our basil, don’t require anything complex, and you might be better off doing them the “old-fashioned” way, by reading, than overcomplicating them with tools that don’t actually add anything to your analysis.

Computers are very good at counting things. There is a nearly endless supply of tools that will count a variety of things in texts for you, and any list of them is probably incomplete. How do you know if you’re using the right one? Digital projects can be great, and digital analysis can be really useful, but if you can see something with your own eyes, you probably don’t need a computer to tell you about it. When you compare texts against one another, thematic vocabulary often stands out as distinctive to a particular text. In The Tempest, words like ‘drown’, ‘island’, ‘isle’, ‘fish’ and ‘sea’ are comparatively likely to appear, but, as Jonathan Hope points out, you really don’t need a computer or complex statistics to tell you that. Digital counting tools are much better suited to larger projects, and to questions that are less thematic and more specific.
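For a sense of what comparing texts against one another involves under the hood, here is a minimal Python sketch of one common keyness measure from corpus linguistics, the log-likelihood (G2) statistic, which flags words that are unusually frequent in a target text compared with a reference text. It is an illustration only, not how WordHoard or any other tool mentioned here computes its results; the crude tokenizer, the function names `tokenize`, `log_likelihood` and `keywords`, and the toy sentences are my own assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately crude: lowercase, then keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 for a word seen a times in a target text of c tokens
    and b times in a reference text of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the target text
    e2 = d * (a + b) / (c + d)  # expected count in the reference text
    ll = 0.0
    # A zero observed count contributes nothing (0 * log 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target: str, reference: str, top: int = 10) -> list[tuple[str, float]]:
    """Rank words that are relatively more frequent in target than in reference."""
    t, r = Counter(tokenize(target)), Counter(tokenize(reference))
    c, d = sum(t.values()), sum(r.values())
    scores = {}
    for word in t:
        a, b = t[word], r[word]  # Counter returns 0 for unseen words
        # Keep only words the target uses proportionally more often.
        if a / c > b / d:
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Toy example with invented strings: a "sea" text versus a "court" text.
target = "the sea the isle the island full of noises the sea"
reference = "the king the crown the court the king of france"
print(keywords(target, reference)[0])  # top keyword is 'sea'
```

On these toy strings the sketch ranks ‘sea’ first and ignores ‘the’, which both texts use at similar rates: exactly the kind of result you could have spotted by reading. A statistic like this earns its keep on corpora too large to read closely.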

So how do you know if you’re using the right tool for the task at hand? Well, you don’t always. I currently have at least six tools for straight-up text analysis installed on my computer, and I can access more than a few others from my web browser. I’m compiling one myself. Do I really need all of these? In a word: yes. None of them is better than the others, though one might be more robustly informative than the rest, depending on what I’m looking for.

In my recent research on the Shakespeare corpus, I’ve found myself cross-slicing between a concordance program (AntConc), a statistical analysis tool (WordHoard), and the texts themselves (Open Source Shakespeare), and I will pull in others as they’re useful. It’s not that these tools individually aren’t doing enough; it’s that between the three of them, I get a much clearer picture of what’s actually happening in my texts. Professor Alan Bryman has an excellent 2004 paper on triangulation, where he argues for a three-check system “to enhance credibility and persuasiveness of a research account” (2004: 4). In other words: if you can find it once, that’s exciting; if you can find it twice, even better; but if you can find it three times, it’s a truth. Understandably, it’s even more exciting when someone using entirely different tools and asking an entirely different question arrives at the same conclusion that you did, albeit on a much larger scale. Of course, I have the unspoken benefit of working on Shakespeare, who is widely digitized, but I’d return to the texts regardless of who I’m working on; I just might have to change my approach a bit.

When it comes to choosing tools for text analysis, “it was there so I used it” is not an acceptable answer. You should know what your tool can and cannot do, its benefits and its limitations, and you should be able to account for them. A tool is just an interpretation of data, as I said previously, and what you can see in one tool might not be enough to justify your claim. Trying a variety of approaches might show you something that you missed the first, second, or third time around: a small detail can lead to much bigger and better questions than simply accepting the first thing you try. A KWIC concordance might not be showing you enough of your data, a log-likelihood analysis might be telling you too much, and your word cloud might not be showing you anything useful at all. As with anything else, I have my favorite tools, and I’m likely to turn to them first and recommend them above other text analysis tools. Are they right for your project? In all honesty: I don’t know.
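For readers who haven’t used one, a key-word-in-context (KWIC) concordance lists every occurrence of a search term with a few words on either side. Here is a minimal Python sketch of the idea; the function name `kwic`, the simple tokenizer, and the fixed window of `width` tokens are my illustrative assumptions, not how AntConc or any other concordancer is implemented. The narrow window is also a concrete reminder of why a KWIC view might not be showing you enough of your data.

```python
import re


def kwic(text: str, keyword: str, width: int = 4) -> list[str]:
    """Return one line per hit: up to `width` tokens of left context,
    the keyword in brackets, then up to `width` tokens of right context."""
    # Hypothetical tokenizer: runs of letters and apostrophes; punctuation is dropped.
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        # Case-insensitive match, but keep the original spelling in the output.
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


song = ("Full fathom five thy father lies; Of his bones are coral made; "
        "Those are pearls that were his eyes")
for line in kwic(song, "his", width=2):
    print(line)
```

Run on the opening of Ariel’s song from The Tempest with a window of two words, this prints `lies Of [his] bones are` and `that were [his] eyes`. Widen `width` and you see more context; dedicated concordancers also add features such as sorting hits by the words around them.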

None of this should stop you from using digital tools, though. I occasionally use KWIC tools as a search engine for a specific corpus, and I will introduce friends and colleagues to them for that purpose, which is probably poor scholarship. But much more interesting things can happen when you break the rules of what a tool should do, and that’s a blog post in and of itself.