How is your unstructured data processed?

DATA INGESTION 

Unstructured documents come in a wide range of formats and layouts. Files commonly arrive as .pdf, .docx, .xlsx, .pptx, .las and .segy, and span disciplines such as:

Geology, Reservoir engineering, Drilling, Geomechanics, Administrative, Geophysics, Petrophysics, Production, Facilities, QHSE

The files are ingested through a sequential pipeline of workflows that use machine learning techniques.

Automated information extraction starts with a set of heuristic algorithms that identify blocks (segments) within a document. Supervised machine learning then classifies each segment as either text or non-text.
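The two-step idea of heuristic segmentation followed by supervised classification can be illustrated with a deliberately small sketch. It is not Iraya's implementation: it assumes the page is already available as plain text lines rather than as a scanned image, uses blank lines as the only segmentation heuristic, and stands in a nearest-centroid classifier over two hand-picked features for whatever supervised model is actually used. All names (`Block`, `segment_blocks`, `features`, `train_centroids`, `classify`) and the training snippets are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    lines: list[str]


def segment_blocks(page: str) -> list[Block]:
    """Heuristic segmentation: runs of non-blank lines form one block."""
    blocks, current = [], []
    for line in page.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(Block(current))
            current = []
    if current:
        blocks.append(Block(current))
    return blocks


def features(block: Block) -> tuple[float, float]:
    """Two toy features: share of alphabetic characters and mean token length."""
    text = " ".join(block.lines)
    chars = [c for c in text if not c.isspace()]
    tokens = text.split()
    alpha_ratio = sum(c.isalpha() for c in chars) / len(chars)
    mean_token_len = sum(len(t) for t in tokens) / len(tokens)
    return alpha_ratio, mean_token_len


def train_centroids(samples: list[tuple[str, str]]) -> dict[str, tuple[float, float]]:
    """Supervised step: average the feature vectors of each labelled class."""
    grouped: dict[str, list[tuple[float, float]]] = {}
    for text, label in samples:
        grouped.setdefault(label, []).append(features(Block([text])))
    return {
        label: (sum(f[0] for f in feats) / len(feats),
                sum(f[1] for f in feats) / len(feats))
        for label, feats in grouped.items()
    }


def classify(feats: tuple[float, float], centroids: dict[str, tuple[float, float]]) -> str:
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(feats, centroids[label])),
    )


# Made-up labelled examples, for illustration only.
TRAINING = [
    ("The well was drilled to total depth", "text"),
    ("Core samples were taken from the reservoir", "text"),
    ("1200 0.21 35 1250 0.19 40", "non_text"),
    ("| 12 | 34 | 56 |", "non_text"),
]

PAGE = (
    "Drilling started in March and reached the target\n"
    "formation without incident.\n"
    "\n"
    "2100 0.18 22\n"
    "2150 0.20 25\n"
)

centroids = train_centroids(TRAINING)
labels = [classify(features(b), centroids) for b in segment_blocks(PAGE)]
print(labels)
```

A production system would segment the page image itself and use far richer features; the point here is only the division of labour between a rule-based splitter and a trained classifier.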

TEXT

Optical Character Recognition (OCR) is applied to the text segments to convert them into editable text. Named-Entity Recognition (NER) and Pattern-Based Recognition (PBR) techniques are then applied to the OCR output to extract metadata. For a well report, for example, this includes the well name, kelly bushing, spud date, and contractors.
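As a rough illustration of the pattern-based side of this step, the sketch below runs regular expressions over OCR output to pull out a few well-report fields. It assumes the OCR text uses simple "Label: value" lines, which real reports often do not; the field names, patterns, and sample text are all invented for the example. NER is not shown, since it would require a trained model.

```python
import re

# Hypothetical patterns; real reports would need many more variants per field.
PATTERNS = {
    "well_name": re.compile(r"Well[ \t]+Name[ \t]*:[ \t]*(?P<value>[^\n]+)", re.IGNORECASE),
    "kelly_bushing_m": re.compile(
        r"(?:Kelly[ \t]+Bushing|KB)[ \t]*(?:Elevation)?[ \t]*:[ \t]*"
        r"(?P<value>\d+(?:\.\d+)?)[ \t]*m\b",
        re.IGNORECASE,
    ),
    "spud_date": re.compile(
        r"Spud[ \t]+Date[ \t]*:[ \t]*(?P<value>\d{1,2}[ \t]+[A-Za-z]+[ \t]+\d{4})",
        re.IGNORECASE,
    ),
    "contractor": re.compile(r"Contractor[ \t]*:[ \t]*(?P<value>[^\n]+)", re.IGNORECASE),
}


def extract_metadata(ocr_text: str) -> dict[str, str]:
    """Return the first match of each pattern; fields with no match are omitted."""
    found = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            found[field] = match.group("value").strip()
    return found


# Made-up OCR output, for illustration only.
SAMPLE_OCR = (
    "WELL NAME: Example-1\n"
    "KB Elevation: 25.3 m\n"
    "Spud Date: 12 March 1998\n"
    "Contractor: Example Drilling Co.\n"
)

metadata = extract_metadata(SAMPLE_OCR)
print(metadata)
```

Omitting unmatched fields rather than guessing keeps downstream metadata honest: a missing key signals that the document needs another pattern or a different extraction technique.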

IMAGES AND TABLES

In a separate data pipeline, non-text components such as images and tables are tagged. Deep convolutional neural networks (DCNNs) learn to automatically classify the different image types, including seismic images, stratigraphic charts, maps, cores, drawings, and tables, so that images can be aggregated by type.
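The aggregation step that follows classification might look roughly like the sketch below. The DCNN itself is out of scope here: it is represented by any callable that returns a (label, confidence) pair, and the example uses a lookup-table stub in its place. The label names, the `min_confidence` threshold, and the "unclassified" bucket are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable

# Image types named in the text above; identifiers are illustrative.
IMAGE_TYPES = ("seismic", "stratigraphic_chart", "map", "core", "drawing", "table")

Prediction = tuple[str, float]  # (predicted label, confidence in [0, 1])


def aggregate_by_type(
    image_ids: list[str],
    classify: Callable[[str], Prediction],
    min_confidence: float = 0.5,
) -> dict[str, list[str]]:
    """Group image ids by predicted type; weak or unknown predictions go to 'unclassified'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for image_id in image_ids:
        label, confidence = classify(image_id)
        if label not in IMAGE_TYPES or confidence < min_confidence:
            label = "unclassified"
        groups[label].append(image_id)
    return dict(groups)


# Stub standing in for a trained DCNN; predictions are made up.
FAKE_PREDICTIONS = {
    "img_001": ("seismic", 0.97),
    "img_002": ("map", 0.88),
    "img_003": ("seismic", 0.91),
    "img_004": ("core", 0.31),
}


def stub_classifier(image_id: str) -> Prediction:
    return FAKE_PREDICTIONS[image_id]


grouped = aggregate_by_type(list(FAKE_PREDICTIONS), stub_classifier)
print(grouped)
```

Passing the classifier in as a function keeps the grouping logic independent of the model, so the same code works whether predictions come from a local network or a remote service.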

Copyright © 2024 Iraya Energies. All rights reserved.