Abstract

This work presents a language-independent, keyword-based document indexing and retrieval system that uses a support vector machine (SVM) as the classifier. Word spotting is an attractive alternative to traditional Optical Character Recognition (OCR) systems: instead of converting the image into text, retrieval is based on matching images of words using pattern classification techniques. The proposed technique extracts words from images of handwritten documents and converts each word image into a shape represented by its contour. A set of features is then extracted from each word image, and instances of the same word are grouped into clusters. These clusters are used to train a multi-class SVM that learns the different word classes. The documents to be indexed are segmented into words, and the closest cluster for each word is determined using the SVM. An index file is maintained for each word, containing the locations of that word within each document. A query word presented to the system is matched against the clusters in the database, and the documents containing occurrences of the query word are retrieved. The system achieves promising precision and recall rates on the IAM database of handwritten documents.
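
The indexing and retrieval steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors are made-up two-dimensional points, the names `CENTROIDS`, `classify`, `build_index`, and `retrieve` are hypothetical, and a nearest-centroid rule stands in for the multi-class SVM so that the example needs only the Python standard library.

```python
from collections import defaultdict
from math import dist

# Hypothetical cluster centroids: word-class label -> feature vector.
# In the described system a multi-class SVM trained on the clusters
# plays this role; nearest-centroid is only a stand-in here.
CENTROIDS = {
    "the": (0.0, 0.0),
    "letter": (5.0, 5.0),
    "house": (10.0, 0.0),
}


def classify(features):
    """Stand-in for the SVM: return the label of the nearest centroid."""
    return min(CENTROIDS, key=lambda label: dist(CENTROIDS[label], features))


def build_index(documents):
    """Map each word class to the (doc_id, position) pairs where it occurs.

    `documents` maps a document id to a list of word feature vectors in
    reading order, as an assumed word segmenter might produce them.
    """
    index = defaultdict(list)
    for doc_id, words in documents.items():
        for position, features in enumerate(words):
            index[classify(features)].append((doc_id, position))
    return index


def retrieve(index, query_features):
    """Return the sorted ids of documents containing the query's word class."""
    label = classify(query_features)
    return sorted({doc_id for doc_id, _ in index.get(label, [])})


# Toy collection: three documents, two segmented words each.
documents = {
    "doc1": [(0.1, -0.2), (5.2, 4.9)],
    "doc2": [(9.7, 0.3), (0.2, 0.1)],
    "doc3": [(4.8, 5.1), (10.1, -0.1)],
}
index = build_index(documents)
```

Calling `retrieve(index, query_features)` with the features of a query word image returns the documents whose index entry for the matched word class is non-empty. Keeping positions alongside document ids mirrors the per-word index file of word locations.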