Peelle Technologies Document Scanning and Document Management Services Campbell, CA
197 East Hamilton Avenue
Campbell, CA 95008
Phone: 800.233.5006

Section Two: Document Management Basics


Indexing and Retrieval

In a recent survey, three-fourths of executives said that information is their organization’s most important asset. Ensuring that this information is readily available to the employees who need it is one of the major challenges for today’s executives.


An enterprise-quality digital document management system is well suited to helping employees quickly search thousands of documents and pinpoint the information they need. Many employees rely on search tools that are nearly identical to commercial Internet search engines. These engines are efficient at helping consumers find information such as retailers’ Web pages for a certain product. However, they are not designed for the specialized searches that many business environments require.


Most commercial search engines only support basic keyword searches. The user types in a word or phrase, and the engine returns a group of matching documents. Typically, the engine ranks results according to its own logic; depending on the user’s needs, this ranking system may or may not be helpful.


A full-featured document management system makes it easy to find what you want when you want it. Retrieval of relevant documents should be fast, easy and efficient, supported by multiple methods of indexing (categorizing) information. Indexing lets users quickly narrow large volumes of data down to the right document. Whatever combination of indexing methods is used, the search methods must be easy to use and understand. That applies both to the people who retrieve documents and to those who file them.


There are three primary ways of indexing files in a document management system:

  • Full-text indexing, or indexing every word in a document
  • Template fields, or indexing through keyword categories of documents
  • Folder/file structure, or indexing by associated document groups

Retrieval is where the quality of the indexing system is most evident. Some document management systems let users search only by indexed keywords, which requires a person to know how the document was categorized and what template fields were assigned to it. A powerful indexing system will make it possible for users to find any document based on what they know, even if that amounts to no more than a word or phrase within the document. The more a document management system adapts to an organization’s existing procedures, the less upheaval and training are involved for users of the system.


Full-Text Indexing

Full-text indexing allows users to locate any word or phrase that appears in the document.
By providing full-text indexing, document management systems can eliminate the need to read and manually index documents using keywords.
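Conceptually, a full-text index is an "inverted index." It maps each word to the documents and word positions where that word occurs, so a lookup does not have to scan every file. The following is a minimal Python sketch of this idea. It assumes the text has already been extracted, either by OCR or from the electronic file, and it uses a simple lowercase tokenizer. The function names are hypothetical and are not taken from any particular product.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [word positions]} across all documents."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def search(index, word):
    """Return the sorted IDs of documents that contain the word."""
    return sorted(index.get(word.lower(), {}))

docs = {
    "memo-1": "The county budget was approved.",
    "memo-2": "Budget hearings continue next week.",
}
index = build_index(docs)
print(search(index, "budget"))   # ['memo-1', 'memo-2']
print(search(index, "county"))   # ['memo-1']
```

The index records word positions rather than just document IDs. That is what makes the proximity searches described later in this section possible.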


To enable full-text indexing of scanned paper documents, the software must be able to perform Optical Character Recognition (OCR). OCR converts the image of printed text into machine-readable alphanumeric characters. On clean, typed originals it is typically highly accurate, and the application can then track each occurrence of a word. OCR dramatically reduces the cost of manual indexing while providing improved search capabilities.


However, standard OCR cannot reliably read handwriting, and it cannot extract text from photographs, drawings or other images. Moreover, OCR software typically uses English as its default language. If documents in multiple languages must be processed, the document management system should support OCR and full-text searches in those languages. To avoid creating extra work, a well-designed document management system should also automate the OCR and full-text indexing of documents.


Several options help maximize the effectiveness of full-text searches, including fuzzy logic, wildcards, Boolean operators and proximity searches.


Most searches assume that the search words have been spelled correctly and perfectly indexed during the OCR process or during manual entry into template fields. Unfortunately, people frequently misspell words and no OCR process is 100% accurate.


Fuzzy logic compensates for these errors by searching for spelling variations. A document management system should allow the user to control the search by setting how many letters can be wrong or what percentage of a word can be wrong. For example, a fuzzy logic search for the word “goat” would also find “gout” and “coat.”
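A common way to implement this kind of tolerance, usually called fuzzy matching, is edit distance. Edit distance counts the minimum number of single-letter insertions, deletions or substitutions needed to turn one word into another. The search then accepts any indexed word within a user-chosen limit. The sketch below uses Levenshtein distance. This is an assumption about how such a feature might work, not a description of any specific product. The hypothetical `max_edits` parameter plays the role of the "how many letters can be wrong" setting.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if letters match)
            ))
        prev = curr
    return prev[-1]

def fuzzy_search(term, vocabulary, max_edits=1):
    """Return indexed words within max_edits of the search term."""
    term = term.lower()
    return [w for w in vocabulary if edit_distance(term, w.lower()) <= max_edits]

words = ["goat", "gout", "coat", "boat", "ghost", "cat"]
print(fuzzy_search("goat", words))  # ['goat', 'gout', 'coat', 'boat']
```

A percentage-based setting could be approximated by deriving `max_edits` from the length of the search term, for example allowing one error per four letters.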


Wildcards are characters, like the asterisk and the question mark, which can be used in searches to compensate for misspellings or unknown spellings. The asterisk stands for any character or characters, while the question mark stands for any single character. For example, searching for “c*t” would find the words “cat,” “cot,” “coat,” “cut” and “chest,” while searching for “c?t” would only find the words “cat,” “cot” and “cut.”
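One straightforward way to support these two wildcards is to translate the search pattern into a regular expression and match it against every indexed word. The Python sketch below does this. The function names are hypothetical. As in most wildcard syntaxes, "*" is assumed to also match zero characters.

```python
import re

def wildcard_to_regex(pattern):
    """Translate * (any run of characters, possibly empty) and ? (one character)."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # treat every other character literally
    return re.compile("".join(parts), re.IGNORECASE)

def wildcard_search(pattern, vocabulary):
    """Return the indexed words that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [w for w in vocabulary if regex.fullmatch(w)]

words = ["cat", "cot", "coat", "cut", "chest", "dog"]
print(wildcard_search("c*t", words))  # ['cat', 'cot', 'coat', 'cut', 'chest']
print(wildcard_search("c?t", words))  # ['cat', 'cot', 'cut']
```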


Whenever full-text searches are performed, there are usually several documents that meet the search criteria. Boolean operators (AND, OR and NOT) help fine-tune searches and reduce the number of unrelated documents on the results page. For example, to find documents relating to Gray Davis, the former governor of California, and not to the University of California at Davis, you could search for “Davis AND governor.”
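If the index records which documents contain each word, the Boolean operators correspond directly to set operations. AND is intersection, OR is union, and NOT (read as "AND NOT") is set difference. The minimal Python sketch below illustrates this with three made-up documents. The helper names are hypothetical.

```python
import re

def build_index(docs):
    """Map each lowercase word to the set of document IDs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index

docs = {
    "a": "Governor Gray Davis signed the budget.",
    "b": "UC Davis announced a new campus.",
    "c": "The governor met with county officials.",
}
index = build_index(docs)

def term(word):
    """Documents containing a word (empty set if it was never indexed)."""
    return index.get(word.lower(), set())

davis_and_governor = term("Davis") & term("governor")  # AND = intersection
davis_or_governor = term("Davis") | term("governor")   # OR = union
davis_not_governor = term("Davis") - term("governor")  # NOT = difference

print(sorted(davis_and_governor))  # ['a']
print(sorted(davis_or_governor))   # ['a', 'b', 'c']
print(sorted(davis_not_governor))  # ['b']
```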


Proximity searches can also be used to narrow the search results. They are used to find words that occur within a certain number of words, sentences or paragraphs of each other. For example, to find documents relating to tobacco lawsuits, but not smoking ordinances or tobacco growing, users could search for “tobacco” within one sentence of the word “lawsuit.”
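A proximity search needs to know where each word occurs, not just whether it occurs. The Python sketch below shows a word-distance version. It is a simplification: it measures distance in words, while sentence or paragraph windows would also require detecting sentence and paragraph boundaries. The function names and the sample documents are hypothetical.

```python
import re

def positions(text, word):
    """Word offsets at which a given word occurs (case-insensitive)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [i for i, t in enumerate(tokens) if t == word.lower()]

def within(text, word1, word2, max_gap):
    """True if word1 and word2 occur no more than max_gap words apart."""
    p1 = positions(text, word1)
    p2 = positions(text, word2)
    return any(abs(i - j) <= max_gap for i in p1 for j in p2)

docs = {
    "suit": "The tobacco company settled the lawsuit last year.",
    "farm": "Tobacco growing in the county declined. A lawsuit over water rights was filed.",
}
hits = [d for d, text in docs.items() if within(text, "tobacco", "lawsuit", 5)]
print(hits)  # ['suit']
```

Both documents contain both words, so a plain "tobacco AND lawsuit" search would return both. Only the proximity condition separates them.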


Template Field Searches

Template field searches enable users to comb through millions of records in seconds to find necessary documents. The ability to use index field information to locate documents is important in cases where a topic search is more expedient than finding every occurrence of a particular word or where the repository contains images without printed text, such as photographs or maps.


A template field search is roughly equivalent to searching a library’s card catalog by subject. If you were in a library looking for information on the Pacific Ocean, you would pull the card for “Pacific Ocean” (or, today, enter “Pacific Ocean” in the computerized catalog). You would then see a list of all the books that discuss the Pacific Ocean. A template field search works the same way.


If you are searching for information on a particular county, for instance, you would enter the county’s name in the search field. The document management system would then retrieve every document, image, electronic file and audio or video recording in the repository whose template field contains that county’s name.
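Because a template field search looks at metadata rather than content, it works equally well for scanned maps, photographs or audio files that contain no searchable text. The minimal Python sketch below illustrates this. The record layout and the field names ("County", "Year", "Department") are hypothetical examples, not a real product schema.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A stored item plus its template (metadata) field values."""
    doc_id: str
    file_type: str
    fields: dict = field(default_factory=dict)

def field_search(records, field_name, value):
    """Return IDs of records whose template field equals value (case-insensitive)."""
    value = value.casefold()
    return [r.doc_id for r in records
            if str(r.fields.get(field_name, "")).casefold() == value]

repository = [
    Record("deed-0412", "image", {"County": "Santa Clara", "Year": 1998}),
    Record("survey-map", "image", {"County": "Monterey"}),
    Record("hearing.mp3", "audio", {"County": "santa clara"}),
    Record("memo.docx", "document", {"Department": "Planning"}),
]
print(field_search(repository, "County", "Santa Clara"))  # ['deed-0412', 'hearing.mp3']
```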


Template fields are based on metadata, or simply, data about data. Metadata is used to facilitate the understanding, use and management of data. The metadata required for this will vary with the type of data and context of use. So, in the context of a library, where the data is the content of the titles stocked, metadata about a title might typically include a description of the content, the author, the publication date and the physical location. In the context of a digital document management system, where the data is the content of computer files, metadata might include the name of the file, the type of file (document, email message, spreadsheet or image) and the name of the data administrator.


A document management system should allow users to customize templates, create multiple templates and use different types of field data within each template, such as date, numeric and alphanumeric fields. Template fields can be used to categorize documents, track creation or retention dates, or record subject matter, among other information. A document management system should also offer pull-down lists of common keywords to speed field entry, along with tools that help automate the data entry process.
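To make typed fields and pull-down lists concrete, the Python sketch below defines a hypothetical invoice template. Each field has a type (date, number or alphanumeric) and an optional list of allowed values, and the code validates user entries against it. The template name, field names and vendor list are illustrative assumptions only.

```python
from datetime import date

# Hypothetical template: field name -> (field type, optional pull-down choices)
INVOICE_TEMPLATE = {
    "Vendor": ("alphanumeric", ["Acme Supply", "Bay Printing", "Coastal Freight"]),
    "Invoice Number": ("alphanumeric", None),
    "Amount": ("number", None),
    "Invoice Date": ("date", None),
}

def parse_value(kind, raw):
    """Convert a raw text entry to the field's type; raises ValueError if invalid."""
    if kind == "number":
        return float(raw)
    if kind == "date":
        return date.fromisoformat(raw)  # expects YYYY-MM-DD
    return raw.strip()

def fill_template(template, entries):
    """Validate user entries against a template and return typed values."""
    values = {}
    for name, (kind, choices) in template.items():
        value = parse_value(kind, entries[name])
        if choices is not None and value not in choices:
            raise ValueError(f"{name!r} must be one of {choices}")
        values[name] = value
    return values

record = fill_template(INVOICE_TEMPLATE, {
    "Vendor": "Bay Printing",
    "Invoice Number": "INV-2231",
    "Amount": "1250.00",
    "Invoice Date": "2015-03-09",
})
print(record["Amount"], record["Invoice Date"])  # 1250.0 2015-03-09
```

Storing dates and amounts as typed values, rather than as free text, is what allows range searches such as "invoices dated in March" or "amounts over 1,000."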


An enterprise-quality document management system should also let users define their own template fields. Keyword-based template searches have an obvious limit, however. When the person who selected the keywords is not the one searching for the files, the searcher may not know which keywords were assigned.


Folder/File Structure Searches

Along with full-text and template field searches, a document management system should let users locate documents by browsing the folder/file structure. A full-featured document management system lets an organization electronically recreate its existing filing system through a nested folder structure. Because employees can keep filing documents the way they always have, a flexible folder structure makes the move from paper filing to digital document management smoother.
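A nested folder structure is essentially a tree. Each level can mirror a cabinet, drawer or folder in the paper system. The Python sketch below builds such a tree from file paths and lists what a user would see when opening a given folder. The paths and function names are hypothetical.

```python
from pathlib import PurePosixPath

def build_tree(paths):
    """Nest folder names into dicts; file names are stored under the None key."""
    tree = {}
    for p in map(PurePosixPath, paths):
        node = tree
        for folder in p.parent.parts:
            node = node.setdefault(folder, {})
        node.setdefault(None, []).append(p.name)
    return tree

def browse(tree, *folders):
    """Return (subfolders, files) at a folder path, like opening a file drawer."""
    node = tree
    for folder in folders:
        node = node[folder]
    subfolders = sorted(k for k in node if k is not None)
    return subfolders, sorted(node.get(None, []))

# Hypothetical paths mirroring a paper filing system: cabinet/drawer/folder/file
paths = [
    "Permits/2014/Smith Residence/application.pdf",
    "Permits/2014/Smith Residence/inspection.pdf",
    "Permits/2015/Oak Plaza/site-plan.tif",
    "Contracts/Vendors/acme.pdf",
]
tree = build_tree(paths)
print(browse(tree, "Permits"))                             # (['2014', '2015'], [])
print(browse(tree, "Permits", "2014", "Smith Residence"))  # ([], ['application.pdf', 'inspection.pdf'])
```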


The way search results are displayed has a considerable impact on the usability of a document management system. Even the most specific full-text searches can produce several hits when large document databases are involved. In addition to listing the documents that meet the search criteria, some document management systems display lines of context showing each occurrence of the search word in each document. Lines of context help users pinpoint the appropriate document without having to open every document in the search results.
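Lines of context are commonly produced as a "keyword in context" snippet: a few words on either side of each match. The Python sketch below shows one way to generate them. The `radius` parameter (words kept on each side) and the function name are hypothetical choices for illustration.

```python
import re

def lines_of_context(text, word, radius=3):
    """Return a short snippet around each occurrence of word (case-insensitive)."""
    tokens = text.split()
    target = word.lower()
    snippets = []
    for i, tok in enumerate(tokens):
        # Ignore surrounding punctuation so "budget," still matches "budget".
        if re.sub(r"^\W+|\W+$", "", tok).lower() == target:
            start, end = max(0, i - radius), min(len(tokens), i + radius + 1)
            snippets.append(" ".join(tokens[start:end]))
    return snippets

text = ("The board reviewed the budget in March. After public comment, "
        "the revised budget was adopted without changes.")
for line in lines_of_context(text, "budget"):
    print("...", line, "...")
```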


Once a document is identified, the search word still needs to be located within it. To help with this, some document management systems display the relevant page of the document and highlight the search word in both the text and the document image. The user can then go straight to the relevant section instead of paging through the document. The importance of this becomes obvious when the needed word occurs on page 97 of a 200-page document.
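The Python sketch below covers the text side of this feature. It finds the first page containing the search word and wraps each occurrence on that page in a highlight marker. It assumes the document's text is stored page by page. Highlighting on the scanned image would additionally require the word coordinates that OCR engines can record, which this sketch does not model. The function name and the `**` marker are hypothetical.

```python
import re

def find_and_highlight(pages, word, marker="**"):
    """Return (page_number, highlighted_text) for the first page containing word.

    Page numbers are 1-based; returns None if the word does not occur.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for number, text in enumerate(pages, start=1):
        if pattern.search(text):
            highlighted = pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)
            return number, highlighted
    return None

pages = ["Table of contents", "Background and history", "The easement expires in 2030."]
print(find_and_highlight(pages, "easement"))
# (3, 'The **easement** expires in 2030.')
```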




© 2015 Peelle Technologies 197 East Hamilton Avenue Campbell, CA 95008 Phone: 800.233.5006

Peelle Technologies is a leading provider of document scanning, document conversion, document imaging, document management, microfilm scanning, microfilm conversion, microfiche conversion services and software products. Our document scanning, microfilm scanning and other document management services are provided in the San Jose, San Francisco, Bay Area, Silicon Valley, Oakland, Napa, Monterey, Sonoma, Visalia, Stockton, Modesto areas of Northern California. Our other products and services are available across the United States.