
How to Automate Document Storage and Routing

Automating where your files are stored is a simple first step toward organizing your documents in a meaningful, structured way. For patient records, invoices, bills of lading, student records, or any other document type, structured storage saves time and can ease a later conversion to electronic records management.

To achieve this level of automation, information must be extracted from the documents during scanning or from existing electronic files. Below are some approaches that help automate this process.

Working with Existing Electronic Files

Illustration: Data Mining to extract text

If your documents are already text-based (PDF, OCR scans, Word, Excel, etc.), you can take advantage of data mining tools that search a document for keywords and extract the adjoining text strings using regular expression scripts. This is most common in AP/AR and other form-based applications, where standard keywords such as "Invoice Number" help locate the desired text strings, which are then extracted and tagged for later use in the file name or destination path.

This process is more commonly referred to as Data Mining, and you can learn how this is used with some of our products.
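
As a rough illustration of the keyword-plus-regular-expression idea described above, the Python sketch below looks for the keyword "Invoice Number" and captures the text string next to it. The pattern, the function name, and the sample text are assumptions made for this example; they do not represent the extraction scripts used by any particular product.

```python
import re

# Hypothetical pattern: the keyword "Invoice Number", an optional colon or "#",
# then the adjoining run of letters, digits, and hyphens.
INVOICE_NUMBER = re.compile(
    r"Invoice\s+Number\s*[:#]?\s*([A-Za-z0-9-]+)", re.IGNORECASE
)

def extract_invoice_number(text):
    """Return the text string that follows the keyword, or None if it is absent."""
    match = INVOICE_NUMBER.search(text)
    return match.group(1) if match else None

# Made-up sample text standing in for the text layer of a document.
sample = "ACME Freight\nInvoice Number: INV-20417\nAmount Due: 1,250.00"
print(extract_invoice_number(sample))  # prints INV-20417
```

The captured value can then be tagged as a field and reused when naming the file or building its folder path.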


With the data captured, any of the extracted fields can be used for the following:

  • Define splitting rules (e.g., create a new file when a value changes)
  • Define the destination path (referring to the illustration, "C:\output\%field1\%field2\%field3" would store the file in "C:\output\EXCH136600\10000003001\EXCHANGE CHARGE")
  • Define the file name using any of the extracted fields.
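
The three uses listed above can be sketched in a few lines of Python. This is a minimal illustration, not the way any specific product implements it: the function names are hypothetical, the third path token is assumed to refer to a third field (written here as %field3), and the second field1 value in the page list is made up to show a split.

```python
import re
from itertools import groupby
from pathlib import PureWindowsPath

def split_on_change(pages, key):
    """Start a new group of pages each time the chosen field value changes."""
    return [list(group) for _, group in groupby(pages, key=lambda p: p[key])]

def expand_template(template, fields):
    """Replace each %fieldN token with the matching extracted value."""
    return re.sub(r"%(field\d+)", lambda m: fields[m.group(1)], template)

# Splitting rule: a new file begins when field1 changes (values are illustrative).
pages = [
    {"page": 1, "field1": "EXCH136600"},
    {"page": 2, "field1": "EXCH136600"},
    {"page": 3, "field1": "EXCH136700"},
]
groups = split_on_change(pages, "field1")
print([[p["page"] for p in g] for g in groups])  # prints [[1, 2], [3]]

# Destination path and file name built from the extracted fields.
fields = {
    "field1": "EXCH136600",
    "field2": "10000003001",
    "field3": "EXCHANGE CHARGE",
}
folder = expand_template(r"C:\output\%field1\%field2\%field3", fields)
file_path = PureWindowsPath(folder) / (fields["field1"] + ".pdf")
print(folder)  # prints C:\output\EXCH136600\10000003001\EXCHANGE CHARGE
```

PureWindowsPath is used so the sketch handles Windows-style paths the same way on any operating system.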

Working with Scanned Files

Pulling intelligence from scanned documents requires some element of image recognition.  Without it, the documents are just a large series of on and off pixels (for black and white scans). 

Many of the methods used rely on OCR technology, and image quality has a significant impact on OCR accuracy. The most common approaches used today include:

  • Full Page OCR - which converts all identified text into searchable text, allowing for full-page indexing, search, and retrieval.
  • Zonal OCR - which converts specific zones into text for automated indexing or extraction.
  • Form Recognition - which identifies the type of document to aid with OCR data extraction and automated workflow.
  • Barcodes - which are one-dimensional or two-dimensional patterns that can be read with very high accuracy.

Using Barcodes

Diagram: Barcodes can help automate file naming and folder destinations

Barcodes represent the most accurate method if they are readily available on documents. While this is more common in specific industries, you can also use a number of free tools to create cover sheets that can then be used to separate and identify document groups for large stack scanning.

In a typical environment, the pages are scanned for barcodes, or text is mined from the existing text using extraction scripts. Once the barcode is interpreted or the text is captured, settings that define how to name the file, where to place it, and how to store index data can automate much of the filing work for users.
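
To make the cover-sheet approach more concrete, here is a small Python sketch that separates a scanned stack into document groups. It assumes the barcode on each page has already been decoded by some barcode reader, so each page arrives as a (page number, barcode value) pair; the function name and the barcode values are hypothetical.

```python
def split_by_cover_sheets(pages):
    """Group scanned pages into documents, starting a new one at each cover sheet.

    `pages` is a list of (page_number, barcode_value) tuples; barcode_value is
    None for ordinary pages. Decoding the barcode is assumed to happen upstream.
    """
    documents = []
    current = None
    for page_number, barcode in pages:
        if barcode is not None:
            # A cover sheet opens a new document and is not kept as content.
            current = {"barcode": barcode, "pages": []}
            documents.append(current)
        elif current is not None:
            current["pages"].append(page_number)
        # Pages before the first cover sheet are skipped in this sketch.
    return documents

# Made-up stack: pages 1 and 4 are cover sheets carrying a barcode.
scanned = [(1, "PATIENT-0001"), (2, None), (3, None), (4, "PATIENT-0002"), (5, None)]
for doc in split_by_cover_sheets(scanned):
    print(doc["barcode"], doc["pages"])
# prints:
# PATIENT-0001 [2, 3]
# PATIENT-0002 [5]
```

Each group's barcode value can then drive the file name, the destination folder, and any index data stored with the file.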

In the diagram, patient records are being captured for integration into an EMR system. The goal is a multi-level folder structure that organizes the patient records by Practice ID, Doctor, and Patient ID. This is done by setting the output path to the values captured from the mined text data or barcodes (e.g., %bar1\%bar4\%bar7).
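
The sketch below shows one way an output path such as %bar1\%bar4\%bar7 might be resolved. It assumes that %barN refers to the Nth barcode read from the document (counting from 1), which is an interpretation rather than a documented rule, and the barcode values are invented. It also replaces characters that Windows does not allow in folder names, since captured values are used directly as folder names.

```python
import re

# Characters that are not allowed in Windows file and folder names.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')

def path_from_barcodes(template, barcodes):
    """Expand %barN tokens, where N is assumed to be a 1-based barcode position."""
    def lookup(match):
        value = barcodes[int(match.group(1)) - 1]
        return INVALID_CHARS.sub("_", value).strip()
    return re.sub(r"%bar(\d+)", lookup, template)

# Hypothetical barcode values: positions 1, 4, and 7 hold the Practice ID,
# Doctor, and Patient ID; the other positions are placeholders.
barcodes = ["PRAC-12", "x", "x", "Dr. Lee", "x", "x", "PT/00457"]
print(path_from_barcodes(r"%bar1\%bar4\%bar7", barcodes))
# prints PRAC-12\Dr. Lee\PT_00457
```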

What's Next

DocuFi is a provider of document capture and delivery solutions including ImageRamp™ Batch, a batch-based solution that helps automate the document to folder process. Call for more information.