Defense Systems Wednesday, August 20, 2008

March 24, 2008 issue

|  Features  |

Digital graffiti



Software seeks to read the writing on the wall — and elsewhere

Making timely sense of information in printed documents, handwritten letters and even graffiti scrawled on a wall can be of huge value to warfighters. Doing that with English sources is hard enough, let alone with Arabic script.

The Defense Advanced Research Projects Agency is trying to overcome those barriers with a new language technology program called Multilingual Automatic Document Classification, Analysis and Translation (MADCAT), whose goal is to develop ways to automatically convert foreign-language text images into English transcripts.

Such a system would reduce the military’s dependence on linguists and analysts who are now needed to help decide what information is valuable and what is not. Often, the value of information is drastically reduced by the time the experts arrive on the scene and sort through it all.

But researchers face a number of significant technical challenges, according to Prem Natarajan, the principal MADCAT investigator at BBN Technologies, which was recently awarded a $5.7 million DARPA grant for work on the project.

“This is the first organized attempt to go after this kind of hard-copy document processing,” he said. “It’s similar to the problems associated with [optical character recognition scanning], which works well for English-language, well-structured documents but not at all well for degraded, real-world documents.”

BBN has recently shown that the kind of vocabulary training given to current OCR systems to recognize and translate English documents can also be used with handwritten documents, and that it can probably be applied to similar Arabic and Chinese documents, he said.

However, a big problem is the variability of language and script used by writers, he said.

“We’re talking about handwritten messages here, of various orientations, with certainly less-than-perfect lettering and spelling,” said Howard Bender, chairman of Any Language Communications. “The image software has to recognize individual letters so they can be expressed in Unicode. If the image software can’t do it, no language analysis can be done.”

DARPA has set a goal of accurately translating 90 to 95 percent of the content in 95 percent of the material scanned, which is “quite a high bar,” Bender said.

On the other hand, Natarajan said, if the problems that DARPA has set out are solved over the next four or five years, “it will revolutionize the field.”

