Document AI Accelerator Scales and Improves Efficiency
Cindy Barrientos, Senior Data Science Engineer, Rackspace Technology
An estimated 80% of all business data is in the form of unstructured data, including emails, images, multimedia files and PDFs. This unstructured data contains highly valuable information, such as HR records, patient records, insurance claims, invoices, purchase orders, maintenance logs, and order scheduling and tracking.
Onica by Rackspace Technology™ leverages AWS’s powerful Intelligent Document Processing (IDP) managed services as an Accelerator program that helps organizations make their unstructured data searchable, so they can intelligently access their data quickly.
Onica IDP is entirely built on AWS cloud infrastructure. It is fully customizable and configurable, allowing it to seamlessly plug into an organization's existing systems. The solution employs Amazon Textract and Amazon Comprehend Models to increase data quality and reduce costs by operationalizing document processing workflows. In addition, it provides scalability, improves efficiency and enhances governance and security while supporting compliance and regulatory requirements. The uses for this solution are vast, and include:
- Classifying and tagging documents and images
- Creating intelligent filing systems
- Eliminating the need for manual data entry
- Formatting document/form fields into logical formats
- Translating documents into different languages
- Digitizing paper documents like invoices, receipts, photos, forms and other files
- Reducing physical storage space and costs
Solution Overview
Onica has built a front-end user interface that makes it easy to upload, search and analyze documents. Analysis begins with Amazon Textract, which runs either text detection or document analysis. Wait steps monitor whether the AWS managed service jobs have completed. If a job is still running, the wait persists until it completes, and then the transformation and extraction of the output continue. A word map built from the Textract output is the precursor for combining the outputs of Textract and Comprehend. The Comprehend classifier and entity recognition run in parallel.
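The wait step described above can be sketched as a simple polling loop. This is a minimal illustration, not the pipeline's actual implementation: the function name `wait_for_job`, the polling interval and the poll limit are assumptions. The status values mirror the `JobStatus` strings that Textract's asynchronous operations report, and the status lookup is injected as a callable so the loop can be exercised without calling AWS.

```python
import time
from typing import Callable

# JobStatus values after which an asynchronous Textract job will not change again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "PARTIAL_SUCCESS"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 5.0,   # assumed interval, tune per workload
    max_polls: int = 60,         # assumed upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the job reaches a terminal state.

    get_status is any zero-argument callable returning the current status
    string, for example a wrapper around a boto3 Textract call such as
    get_document_text_detection(JobId=...)["JobStatus"].
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        # Job is still running: wait before checking again.
        sleep(poll_seconds)
    raise TimeoutError(f"job did not finish after {max_polls} polls")


if __name__ == "__main__":
    # Simulated job that finishes on the third poll.
    fake_statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "SUCCEEDED"])
    print(wait_for_job(lambda: next(fake_statuses), poll_seconds=0))
```

Only when the returned status indicates success would the transformation and extraction steps proceed; a failed status would more likely be logged and surfaced through monitoring.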
Any documents classified with low confidence scores are sent to Amazon Augmented AI (Amazon A2I) via a Lambda function. End users are invited to review and label the data to ensure that the labels are correct and to retrain Comprehend. Post-processed Textract and Comprehend outputs are sent to S3 and an OpenSearch cluster.
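To illustrate the low-confidence routing, here is a hedged sketch of how classifier results might be split into accepted labels and documents queued for human review. The 0.80 cutoff, the function names and the document IDs are assumptions for illustration; the article does not state the threshold used. The `Name` and `Score` keys follow the shape of the `Classes` list returned by a Comprehend custom classifier.

```python
from typing import Dict, List, Tuple

REVIEW_THRESHOLD = 0.80  # hypothetical cutoff; the real value is not given in the article


def top_class(classes: List[dict]) -> dict:
    """Return the highest-scoring class from a Comprehend-style Classes list."""
    return max(classes, key=lambda c: c["Score"])


def split_for_review(
    results: Dict[str, List[dict]],
    threshold: float = REVIEW_THRESHOLD,
) -> Tuple[Dict[str, str], List[str]]:
    """Separate confidently classified documents from those needing human review.

    Returns (accepted, needs_review): accepted maps document ID to its label,
    needs_review lists the IDs that would be handed to a human review loop.
    """
    accepted: Dict[str, str] = {}
    needs_review: List[str] = []
    for doc_id, classes in results.items():
        best = top_class(classes)
        if best["Score"] >= threshold:
            accepted[doc_id] = best["Name"]
        else:
            needs_review.append(doc_id)
    return accepted, needs_review


if __name__ == "__main__":
    sample = {
        "doc-001": [{"Name": "invoice", "Score": 0.97}, {"Name": "receipt", "Score": 0.03}],
        "doc-002": [{"Name": "invoice", "Score": 0.55}, {"Name": "purchase_order", "Score": 0.45}],
    }
    print(split_for_review(sample))
```

In the pipeline described here, the documents in the review list would be the ones forwarded to Amazon A2I, and the corrected labels would feed back into retraining.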
Monitoring
CloudWatch metrics of the document processing Lambdas are tracked to prevent or quickly recognize failures. Considering the hard limits of AWS Lambda, we want to be aware of processing loads that are reaching capacity before problems arise.
We provide a CloudWatch dashboard with an overview of the health of the processing pipeline. When one or more Lambdas in the pipeline reach an 80% threshold of memory utilization or execution duration, an alarm is triggered to notify an established point of contact. This user can then determine the source by investigating the CloudWatch dashboard, and opt to change the processing schedule. Because of the scheduled batch processing feature, our pipeline can handle most document workloads. To reach a target pipeline utilization, we set a schedule to match the typical day-to-day usage.
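The 80% alarm rule above amounts to a ratio check of observed usage against the configured limits. The sketch below is an illustration only: the `LambdaRun` record, its field names and the sample figures are hypothetical, and in practice the comparison would typically be expressed as CloudWatch alarms rather than application code.

```python
from dataclasses import dataclass
from typing import List

ALARM_THRESHOLD = 0.80  # the 80% threshold described in the text


@dataclass
class LambdaRun:
    """Hypothetical summary of one Lambda invocation's resource usage."""
    name: str
    max_memory_used_mb: float
    memory_size_mb: float   # configured memory for the function
    duration_ms: float
    timeout_ms: float       # configured timeout for the function


def breached_metrics(run: LambdaRun, threshold: float = ALARM_THRESHOLD) -> List[str]:
    """Return the metrics ("memory", "duration") at or above the threshold."""
    breaches: List[str] = []
    if run.max_memory_used_mb / run.memory_size_mb >= threshold:
        breaches.append("memory")
    if run.duration_ms / run.timeout_ms >= threshold:
        breaches.append("duration")
    return breaches


if __name__ == "__main__":
    run = LambdaRun(
        name="postprocess-textract",   # hypothetical function name
        max_memory_used_mb=900,
        memory_size_mb=1024,
        duration_ms=120_000,
        timeout_ms=900_000,
    )
    breaches = breached_metrics(run)
    if breaches:
        print(f"ALARM for {run.name}: {', '.join(breaches)} at or above 80%")
```

A non-empty result corresponds to the case where the point of contact is notified and can decide whether to adjust the processing schedule.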
Logging and Data Visualization
We export CloudWatch logs to S3 using Amazon Kinesis Data Firehose. AWS Glue then runs crawlers to check the tables and pick up the common values, which are run through Athena, allowing users to query the data quickly via Amazon QuickSight. A dashboard with a high-level overview of the document processing pipeline is published for easy access through the front-end interface.
Front-End Interface
The front-end is hosted and deployed on AWS Amplify for easy infrastructure management. We then use Amazon Cognito to authenticate users who are granted access to the document data.
New documents may be uploaded for processing on the upload page. The processing workflow is run on a customizable schedule (e.g., hourly, weekly, monthly).
You can find documents of interest based on their document-type classification for your use case. The following example shows case study documents grouped by relevant industry. When we select an industry in the search bar, previews of the documents are displayed and can be clicked to enlarge.
Alternatively, or in addition to filtering by classification, you can search for text in the documents, such as a client name, invoice line item or prescriber's name.
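Combining a classification filter with free-text search maps naturally onto an OpenSearch `bool` query. The following is a minimal sketch of how such a request body could be built. The field names `content` and `doc_class` are assumptions about the index mapping (with `doc_class` assumed to be a keyword field), not details taken from the solution itself.

```python
from typing import Optional


def build_search_body(
    text: Optional[str] = None,
    doc_class: Optional[str] = None,
    size: int = 20,
) -> dict:
    """Build an OpenSearch query body for text search and/or a class filter.

    - text: free text to match against the (assumed) "content" field
    - doc_class: exact classification label to filter on (assumed "doc_class" field)
    """
    # Full-text match when text is given, otherwise match every document.
    must = [{"match": {"content": text}}] if text else [{"match_all": {}}]
    # Exact-value filter on the classification label, if one was selected.
    filters = [{"term": {"doc_class": doc_class}}] if doc_class else []
    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}


if __name__ == "__main__":
    import json

    body = build_search_body(text="Acme Corp", doc_class="invoice")
    print(json.dumps(body, indent=2))
```

The resulting dictionary could be passed as the request body of a search call against the cluster; filtering with `term` rather than `match` keeps the classification check exact and leaves relevance scoring to the text clause.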
The ability to quickly classify documents and detect entities opens up many workflow possibilities for increasing efficiency and improving employee productivity. For instance, we can include additional post-processing on documents that have been determined to be purchase orders, receipts or invoices to extract and organize specific line items into a database. Or we could route incoming pages from a fax to their respective departments for more streamlined communications. Automating these processes can take your business to the next level by helping you improve accuracy and freeing your employees to work on more meaningful tasks.
Harness the power of AI quickly and responsibly with Foundry for AI by Rackspace Technology (FAIR™). FAIR™ is at the forefront of global AI innovation, paving the way for businesses to accelerate the responsible adoption of AI solutions. FAIR aligns with hundreds of AI use cases across a wide range of industries while allowing for customization through the creation of a tailored AI strategy that applies to your specific business needs. Capable of deployment on any private, hybrid or hyperscale public cloud platform, FAIR solutions empower businesses worldwide by going beyond digital transformation to unlock creativity, unleash productivity and open the door to new areas of growth for our customers. Follow FAIR on LinkedIn.