post-text information retrieval

I’m interested lately in how the internet would work without text, especially with regard to information retrieval. I’m aware that the internet is not just about information retrieval - plenty of what goes on has more to do with social interaction (including commerce and all types of communication) than it does with looking stuff up.

I have long been fascinated by an article by David Forsyth et al., Finding Images in Large Collections, which includes a description of training a computer algorithm to recognize horses in images - that is, without tagging anything with the text “horse”.

What information retrieval systems can we think of that don’t use text at all?

For audio, Songtapper compares your keyboard tapping with MIDI files, Midomi compares your humming or singing with a database of the same, and services like Shazam will identify recorded music that you play into your phone.

For images, a couple of services retrieve “similar” images based on one you provide. retrievr and BYO Image Search Lab are interesting to compare. retrievr is also the one that lets you draw your own picture and BYO Image Search is made by the same folks that created multicolr, an image search by color.

I could probably do some research in the information science and computer science literatures, but for discussion purposes I am just going to guess about how they work.

The mothers of these searches are speech recognition, which matches audio with text, and optical character recognition, which matches images to text. But I think we can move beyond that - images can be indexed by shape and colors, and audio by its frequencies and tones.

So first off, I don’t know enough about sound or visual images to say much about how they are encoded on computers, but I do know that they are encoded in a linear fashion. Computer files have a start and an end, and everything in between is the data. Whether it is a two-dimensional picture or a multi-layered sound, it is straightened out into a line of ones and zeroes.
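Here is a minimal sketch of that straightening-out, in Python. The tiny image and the helper names flatten and unflatten are made up for illustration, and real image formats add headers and compression on top of this, so treat it as a picture of the idea rather than of any actual file format.

```python
# A tiny 2-row by 3-column grayscale "image": each number is a pixel's
# brightness from 0 (black) to 255 (white). Purely illustrative data.
image = [
    [0, 128, 255],
    [64, 32, 16],
]

def flatten(rows):
    """Straighten a 2D grid into one line of bytes, row by row."""
    return bytes(pixel for row in rows for pixel in row)

def unflatten(data, width):
    """Rebuild the grid; this only works if you know the row width."""
    return [list(data[i:i + width]) for i in range(0, len(data), width)]

flat = flatten(image)
width = len(image[0])

# The pixel directly below image[0][0] ends up `width` steps away in the line.
pixel_below_first = flat[0 + width]
restored = unflatten(flat, width)
```

Notice that two pixels stacked on top of each other in the picture are no longer neighbors in the line, which is part of why the linear form of an image is not much help for searching it.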

It occurs to me that text is linear, that audio is too, and that images are definitely not. This makes me think that you could search audio in much the same way you search text: isolate phrases, index them, and use the index to look up queries later.
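To make that guess concrete, here is a rough sketch in Python of indexing melodies the way a text index handles phrases. Everything in it is an assumption for discussion purposes: the songs are toy note lists (MIDI note numbers), the function names are invented, and the "phrase" is just three consecutive pitch steps. I am not claiming any real service works this way.

```python
from collections import defaultdict

def intervals(notes):
    """Turn absolute pitches into steps between notes, so key doesn't matter."""
    return tuple(b - a for a, b in zip(notes, notes[1:]))

def phrases(seq, n=3):
    """Isolate every run of n consecutive steps, like word n-grams in text."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

def build_index(songs, n=3):
    """Map each phrase to the set of songs that contain it."""
    index = defaultdict(set)
    for title, notes in songs.items():
        for phrase in phrases(intervals(notes), n):
            index[phrase].add(title)
    return index

def search(index, query_notes, n=3):
    """Rank songs by how many phrases they share with the query."""
    votes = defaultdict(int)
    for phrase in phrases(intervals(query_notes), n):
        for title in index.get(phrase, ()):
            votes[title] += 1
    return sorted(votes, key=lambda title: (-votes[title], title))

# Toy data: opening notes as MIDI note numbers.
songs = {
    "Twinkle Twinkle": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
}
index = build_index(songs)

# The same Twinkle opening, hummed two semitones higher.
hummed = [62, 62, 69, 69, 71, 71, 69]
results = search(index, hummed)
```

Using the steps between notes rather than the notes themselves is the audio equivalent of ignoring capitalization: the query still matches when it is hummed in a different key.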

Songtapper works by looking at the rhythm of notes in a song, Midomi is probably looking at a number of factors like tempo and pitch and the linear sequence of things, and Shazam might be compressing samples of recorded music to compare directly to other compressed samples.
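Sticking with guesses, comparing rhythms could be as simple as the Python sketch below. It assumes a tap is recorded as a time in seconds and that both rhythms have the same number of taps. The names rhythm_signature and rhythm_distance and the two library songs are hypothetical, and this is not a description of how Songtapper is actually built.

```python
def rhythm_signature(tap_times):
    """Describe a rhythm by the relative length of each gap between taps.

    Dividing by the total makes the signature independent of tempo.
    Assumes at least two taps at distinct times.
    """
    gaps = [b - a for a, b in zip(tap_times, tap_times[1:])]
    total = sum(gaps)
    return [gap / total for gap in gaps]

def rhythm_distance(sig_a, sig_b):
    """Smaller means more alike; different tap counts never match here."""
    if len(sig_a) != len(sig_b):
        return float("inf")
    return sum(abs(x - y) for x, y in zip(sig_a, sig_b))

# Hypothetical library: tap times in seconds for two rhythms.
library = {
    "song A": [0.0, 0.5, 1.0, 2.0],   # short, short, long
    "song B": [0.0, 1.0, 1.5, 2.0],   # long, short, short
}

# Someone taps song A's rhythm twice as fast and a little unevenly.
query = rhythm_signature([0.0, 0.26, 0.5, 1.0])

best = min(
    library,
    key=lambda name: rhythm_distance(rhythm_signature(library[name]), query),
)
```

A real system would need to cope with missed or extra taps, which this sketch simply refuses to match.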

But images are different; images are two-dimensional, and there isn’t an obvious analogue to the way we index and search for text.

retrievr is making some decisions about what shapes are in a query image and then comparing that to images in a database. BYO Image Search Lab, like multicolr, seems to be comparing only the colors. Both do interesting things, but I don’t think either works consistently well. I love this, of course. Imagine if image searching were like text searching, and all you got were boring predictable lists of hits.
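For the color-only case, my guess at the simplest possible version looks like the Python sketch below: count how much of each rough color an image contains and compare the counts. The function names, the four-levels-per-channel bucketing, and the three tiny "images" are all invented for illustration. I have no idea whether multicolr does anything like this.

```python
from collections import Counter

def color_histogram(pixels, levels=4):
    """Fraction of pixels in each coarse color bucket.

    `pixels` is a non-empty list of (r, g, b) tuples with values 0-255.
    Each channel is reduced to `levels` steps, so similar shades share a bucket.
    """
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}

def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical color mixes, 0.0 for none shared."""
    return sum(min(share, h2.get(bucket, 0.0)) for bucket, share in h1.items())

# Toy images as flat pixel lists: mostly orange with some dark blue, and all green.
sunset = [(250, 120, 30)] * 3 + [(20, 20, 60)]
other_sunset = [(20, 30, 50)] + [(240, 110, 20)] * 3
forest = [(20, 120, 30)] * 4

sunset_vs_sunset = histogram_similarity(
    color_histogram(sunset), color_histogram(other_sunset)
)
sunset_vs_forest = histogram_similarity(
    color_histogram(sunset), color_histogram(forest)
)
```

Because the histogram throws away where each pixel sits, an orange sky over dark water and dark water over an orange sky look identical to it, which may be one reason color-only results feel so unpredictable.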

So the question I am trying to answer is, do we need text? I think we don’t, and I think life gets pretty interesting when we move on.

Discussion

2 comments for “post-text information retrieval”

  1. Reading this post and the one before it has got me thinking. I wonder what kind of ideas/experiences can only be expressed/understood/communicated through text. Not in just an Algebra vs. Geometry kind of way, although that could be part of it. I guess a question I would want answered is: once you have been exposed to text, how closely are thought and text related? How are text or pictures or sounds related to consciousness? How far removed from the internal are these external things (for example, which is farther removed: text and thought, speech and thought, or images and thought)?

    Because of its abstractness, I sort of feel text encapsulates our ideas of pictures and sounds in a way real pictures and sounds do not (not to mention the nuances of punctuation). The written word “bird” communicates a very different ‘thing’ from a picture of a bird or the spoken word “bird” (which can’t be removed from the speaker).

    Posted by Luke | January 7, 2009, 5:59 pm
  2. I think you’re at least partly right, Luke. Literacy is not required for abstract thought, but the way we read and write in our culture certainly influences the way we think, especially when the people doing the thinking make their livings reading and writing.

    Posted by caleb | January 11, 2009, 9:14 pm
