Document Capture for Alfresco – Part 1

This write-up describes how documents can be scanned into the Alfresco document management system. In this scenario, InstaCapture Advantage with the Alfresco connector provides distributed scanning and indexing, and releases the captured documents into Alfresco.

The write-up has two parts:

  1. Configuring InstaCapture for Scanning & Indexing
  2. Configuring iConnect for Alfresco

Terms used

Document: A document is a logical collection of one or more scanned pages that pertain to a specific process.

Batch: A batch is a set of business documents that are typically scanned together as a logical collection. It usually contains related documents that pertain to one business process. For example, an Invoice Batch may contain a Purchase Order, a set of Invoices, Delivery Receipts, scanned copies of Cheques and any other Supporting Documents.

Pages: Pages are the individual scanned images that together form a document in a batch.

Properties: Properties, or metadata, are used to index or reference a batch or document and to classify it logically for search and retrieval. A batch or a document may have one or more properties.
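The relationship between these terms can be sketched as a small data model. This is only an illustration of the definitions above, not InstaCapture's internal format; the class names, field names and sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One scanned image (hypothetical representation: just a file name)."""
    image_path: str


@dataclass
class Document:
    """A logical collection of one or more pages, indexed by properties."""
    doc_type: str
    pages: list[Page] = field(default_factory=list)
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class Batch:
    """A set of related documents that are scanned together."""
    batch_type: str
    documents: list[Document] = field(default_factory=list)
    properties: dict[str, str] = field(default_factory=dict)

    def page_count(self) -> int:
        # Total number of pages across every document in the batch.
        return sum(len(doc.pages) for doc in self.documents)


# Sample values are invented for illustration only.
invoice_batch = Batch(
    batch_type="Invoice Batch",
    properties={"Vendor": "Example Vendor"},
    documents=[
        Document("Purchase Order", [Page("po_1.tif")], {"PO Number": "PO-1001"}),
        Document("Invoice", [Page("inv_1.tif"), Page("inv_2.tif")],
                 {"Invoice Number": "INV-77"}),
    ],
)
```

Properties live on both the batch and its documents, which mirrors the statement that either may carry one or more of them.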

Setup

The first step in the setup is to use the InstaCapture Configuration utility to define the structures of the documents and batches that will be scanned. This involves:

  1. Creating properties and assigning them to documents and batches.
  2. Selecting document recognition mechanisms and assigning them to the newly created document types. Three recognition methods are available:
     • Position: The document is recognized by its position in a batch.
     • Barcode: A barcode is specified to distinguish the document.
     • ADR: Automatic Document Recognition uses templates to recognize documents. ADR is used only when the documents to be recognized have a uniform structure and form.
  3. Selecting the batch recognition mechanism, which can be either manual or barcode-based. Commit the batch after making the configuration changes.
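As a rough sketch of what such a definition has to contain, the snippet below models document types with one of the three recognition methods and checks that each type carries the detail its method needs. The names (`Recognition`, `DocumentType`, `validate`) are hypothetical and do not correspond to the InstaCapture Configuration utility, which is a graphical tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Recognition(Enum):
    POSITION = "position"
    BARCODE = "barcode"
    ADR = "adr"


@dataclass
class DocumentType:
    name: str
    recognition: Recognition
    position: Optional[int] = None   # needed for POSITION
    barcode: Optional[str] = None    # needed for BARCODE
    template: Optional[str] = None   # needed for ADR


def validate(doc_types: list[DocumentType]) -> list[str]:
    """Return a list of problems; an empty list means the setup is consistent."""
    problems = []
    seen_positions = set()
    for dt in doc_types:
        if dt.recognition is Recognition.POSITION:
            if dt.position is None:
                problems.append(f"{dt.name}: position missing")
            elif dt.position in seen_positions:
                problems.append(f"{dt.name}: position {dt.position} already used")
            else:
                seen_positions.add(dt.position)
        elif dt.recognition is Recognition.BARCODE and not dt.barcode:
            problems.append(f"{dt.name}: barcode value missing")
        elif dt.recognition is Recognition.ADR and not dt.template:
            problems.append(f"{dt.name}: ADR template missing")
    return problems
```

Checking a definition like this before committing catches the most obvious mistakes, such as two document types claiming the same position in a batch.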

[Figure: Batch Types]

Scan

Once the configuration is complete, we can move on to the scan step. First, a scan profile has to be created. It specifies the scanner to be used, the image format, the desired brightness and contrast, the paper size, the DPI and similar settings. One or more profiles may be created, and one of them can be set as the default. Once a profile is set, scanning can begin.
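A scan profile is essentially a named bundle of settings. The sketch below shows one way to picture it, with a small store that tracks the default profile. The field names, value ranges and defaults are assumptions chosen for the example and are not taken from InstaCapture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScanProfile:
    name: str
    scanner: str
    image_format: str = "TIFF"
    brightness: int = 50      # assumed scale: 0-100
    contrast: int = 50        # assumed scale: 0-100
    paper_size: str = "A4"
    dpi: int = 300

    def __post_init__(self):
        for label, value in (("brightness", self.brightness),
                             ("contrast", self.contrast)):
            if not 0 <= value <= 100:
                raise ValueError(f"{label} must be between 0 and 100")
        if self.dpi <= 0:
            raise ValueError("dpi must be positive")


class ProfileStore:
    """Holds the profiles and remembers which one is the default."""

    def __init__(self):
        self._profiles = {}
        self._default = None

    def add(self, profile: ScanProfile, make_default: bool = False) -> None:
        self._profiles[profile.name] = profile
        # The first profile added becomes the default unless overridden later.
        if make_default or self._default is None:
            self._default = profile.name

    def get(self, name: Optional[str] = None) -> ScanProfile:
        key = name if name is not None else self._default
        if key not in self._profiles:
            raise LookupError(f"no such scan profile: {key!r}")
        return self._profiles[key]
```

Calling `get()` without a name returns the default profile, which matches the idea of starting a scan without choosing settings each time.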

Batches can be scanned using the batch scan option, or by explicitly creating a new batch of the desired type and then scanning documents into it. Once batches are scanned, they can be sent to the next step, which is separation.

[Figure: Scan]

Separate

Documents are separated automatically during this step. Separation is the process by which scanned pages are assigned to the documents they belong to. It is based on the document recognition mechanism specified while setting up the batches and document types. If an error occurs, separation can switch to manual mode. The separated documents are then sent on to the indexing stage.
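To make the idea concrete, here is a minimal sketch of barcode-based separation. It assumes that a page carrying a known barcode starts a new document and that the pages after it belong to that document. This is an illustrative assumption, not a description of how InstaCapture works internally, and the function name and data shapes are hypothetical.

```python
def separate_by_barcode(pages, barcode_to_type):
    """Group pages into documents.

    pages: list of (page_id, barcode_or_None) in scan order.
    barcode_to_type: maps a barcode value to a document type name.
    Returns (documents, unassigned); unassigned pages need manual separation.
    """
    documents = []
    unassigned = []
    current = None
    for page_id, barcode in pages:
        if barcode is not None:
            doc_type = barcode_to_type.get(barcode)
            if doc_type is None:
                # Unknown barcode: the document type cannot be determined.
                current = None
                unassigned.append(page_id)
                continue
            current = {"type": doc_type, "pages": [page_id]}
            documents.append(current)
        elif current is None:
            # No document has been started yet, so nothing to attach to.
            unassigned.append(page_id)
        else:
            current["pages"].append(page_id)
    return documents, unassigned


scanned = [("p1", "PO"), ("p2", None), ("p3", "INV"), ("p4", None)]
docs, leftover = separate_by_barcode(scanned, {"PO": "Purchase Order", "INV": "Invoice"})
```

Pages that end up in `leftover` correspond to the error case described above, where an operator would have to separate them by hand.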

Index

The documents can then be indexed by entering values for their properties. Indexing can be automated by configuring index zones for zonal OCR and mapping each zone to its property. If the documents are not structured uniformly, however, manual indexing is preferable. After indexing is complete, the documents are moved into the Alfresco repository.
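The mapping from index zones to properties can be pictured as follows. The sketch does not call any real OCR engine. Instead, `read_zone` is a placeholder for whatever function returns the text found in a rectangle of the page, and the zone coordinates and property names are invented for the example.

```python
from typing import Callable, NamedTuple


class Zone(NamedTuple):
    """A rectangle on the page, in pixels (assumed coordinate system)."""
    x: int
    y: int
    width: int
    height: int


def index_page(page_image, zones: dict[str, Zone],
               read_zone: Callable[[object, Zone], str]):
    """Fill properties from zones; report the ones that need manual entry."""
    values = {}
    needs_manual = []
    for prop, zone in zones.items():
        text = read_zone(page_image, zone).strip()
        if text:
            values[prop] = text
        else:
            # Nothing readable in this zone, so an operator must key it in.
            needs_manual.append(prop)
    return values, needs_manual


# Hypothetical zone layout for a uniformly structured invoice.
invoice_zones = {
    "Invoice Number": Zone(400, 40, 180, 30),
    "Invoice Date": Zone(400, 80, 180, 30),
}
```

Properties returned in `needs_manual` are the ones an operator would enter by hand, which is also the path taken when documents are not uniform enough for zones to work.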


Part 2
