Technology Detail

Topic Classification

Contact Us

To learn more about Raytheon BBN's technology development, call 617-873-8000 or email us at technology@bbn.com.

Topic classification is the process of assigning topic labels to a piece of text or a document. A typical text covers several topics. At Raytheon BBN Technologies, we have pioneered a technology, based on hidden Markov models, that assigns multiple topic labels to a document, each with an associated confidence score. The system we developed, called OnTopic, is trained automatically from text in which each document or story has been manually annotated with a list of topics. During training, the system learns which words in a story are related to which of the topics assigned to that story.
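The idea of learning which words belong to which of a story's annotated topics can be illustrated with a small sketch. This is not the OnTopic implementation: it is a simplified word-mixture model trained with expectation-maximization, assuming each word in a story is explained by one of the story's labeled topics or by a catch-all general topic. The names `GENERAL`, `train`, `score`, and `DOCS`, the toy data, and the use of "share of words attributed to a topic" as a crude confidence score are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

GENERAL = "<general>"  # hypothetical catch-all topic for non-topical words

def train(docs, iterations=20, smoothing=0.01):
    """docs: list of (tokens, topic_labels). Returns {topic: {word: p(word | topic)}}."""
    vocab = {w for tokens, _ in docs for w in tokens}
    topics = {t for _, labels in docs for t in labels} | {GENERAL}
    # Start from uniform word distributions for every topic.
    p = {t: {w: 1.0 / len(vocab) for w in vocab} for t in topics}
    for _ in range(iterations):
        counts = {t: defaultdict(float) for t in topics}
        for tokens, labels in docs:
            candidates = list(labels) + [GENERAL]
            for w in tokens:
                # E-step: split each word occurrence among the story's own topics.
                total = sum(p[t][w] for t in candidates)
                for t in candidates:
                    counts[t][w] += p[t][w] / total
        # M-step: re-estimate smoothed word distributions from the soft counts.
        for t in topics:
            norm = sum(counts[t].values()) + smoothing * len(vocab)
            p[t] = {w: (counts[t][w] + smoothing) / norm for w in vocab}
    return p

def score(tokens, p):
    """Crude confidence per topic: the share of known words attributed to it."""
    known = [w for w in tokens if w in p[GENERAL]]
    if not known:
        return {}
    share = Counter()
    for w in known:
        total = sum(p[t][w] for t in p)
        for t in p:
            share[t] += p[t][w] / total
    return {t: share[t] / len(known) for t in p if t != GENERAL}

# Toy annotated stories (invented for this sketch).
DOCS = [
    ("the team won the match with a late goal".split(), ["sports"]),
    ("the bank raised interest rates on the loan".split(), ["finance"]),
    ("the team sold shares to the bank".split(), ["sports", "finance"]),
]

model = train(DOCS)
scores = score("goal match".split(), model)
```

Because the general topic appears in every story, it tends to absorb words such as "the" that occur everywhere, while words seen only in stories labeled with one topic end up associated with that topic. A story can receive several labels by keeping every topic whose score exceeds a chosen threshold.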

Often, manually annotating the training data with topic labels is not feasible. For that common case, we have developed a new technology, called Unsupervised Topic Discovery, which, given a set of documents, automatically "discovers" the topic labels that best describe them. This technology has been shown to produce reasonable topic labels for both English and Arabic documents.
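To give a feel for what discovering labels without annotation can look like, here is a minimal sketch that uses a different, much simpler technique than the one described above: it proposes each document's highest-scoring TF-IDF terms as candidate topic labels. The function name `discover_labels`, the `top_k` parameter, and the toy documents are hypothetical, and nothing here reflects how Unsupervised Topic Discovery actually works.

```python
import math
from collections import Counter

def discover_labels(docs, top_k=2):
    """docs: list of token lists. Returns the top_k highest TF-IDF terms per document."""
    n = len(docs)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(w for tokens in docs for w in set(tokens))
    labels = []
    for tokens in docs:
        tf = Counter(tokens)
        # Words frequent in this document but rare across the collection score highest.
        weight = {
            w: (c / len(tokens)) * math.log(n / doc_freq[w])
            for w, c in tf.items()
        }
        # Sort by descending weight, breaking ties alphabetically for determinism.
        ranked = sorted(weight, key=lambda w: (-weight[w], w))
        labels.append(ranked[:top_k])
    return labels

# Toy unannotated documents (invented for this sketch).
corpus = [
    ["the", "election", "vote", "election"],
    ["the", "football", "match", "football"],
    ["the", "vote", "match"],
]
found = discover_labels(corpus)
```

Words that appear in every document, such as "the", receive a weight of zero and fall to the bottom of each ranking, so the proposed labels favor terms that distinguish one document from the rest.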