How is your unstructured data processed?
DATA INGESTION
Unstructured documents come in a vast range of formats and layouts. The files are commonly in .pdf, .docx, .xlsx, .pptx, .las and .segy formats, and they span disciplines such as:
Geology | Reservoir engineering | Drilling | Geomechanics | Administrative
Geophysics | Petrophysics | Production | Facilities | QHSE
The files are ingested through a sequential pipeline of workflows that use machine learning techniques.
The workflow for automatically extracting information from the documents starts with a set of heuristic algorithms that identify blocks (segments) within a document. Supervised machine learning is then used to classify each segment as either text or non-text.
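As an illustration of the block-identification step, here is a minimal sketch of one possible heuristic: grouping lines into blocks wherever the vertical gap between them stays small. The `Line` type, the `segment_blocks` function and the `max_gap` threshold are all hypothetical names and values chosen for this example; the heuristics used in the actual pipeline are not described here.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One detected line on a page (hypothetical structure for this sketch)."""
    top: float      # y-coordinate of the line's top edge
    bottom: float   # y-coordinate of the line's bottom edge
    content: str


def segment_blocks(lines, max_gap=12.0):
    """Group lines into blocks.

    A new block starts whenever the vertical gap to the previous line
    exceeds max_gap (an assumed threshold, in page units).
    """
    blocks = []
    for line in sorted(lines, key=lambda item: item.top):
        if blocks and line.top - blocks[-1][-1].bottom <= max_gap:
            blocks[-1].append(line)   # close enough: same block
        else:
            blocks.append([line])     # large gap: start a new block
    return blocks


page = [
    Line(0, 10, "WELL COMPLETION REPORT"),
    Line(12, 22, "Prepared by the operator"),
    Line(60, 70, "Figure 1: Location map"),
]
blocks = segment_blocks(page)
```

Each resulting block could then be passed to a supervised classifier that labels it as text or non-text.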
TEXT
Optical Character Recognition (OCR) is applied to the text segments to convert them into editable text. Named-Entity Recognition (NER) and Pattern-Based Recognition (PBR) techniques are then applied to the OCR results to extract metadata. For a well report, for example, this includes the well name, kelly bushing, spud date, and contractors.
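To make the pattern-based step concrete, the sketch below pulls a few well-report fields out of OCR text with regular expressions. The field labels, the patterns and the sample text are assumptions made up for this example; real reports vary widely in layout, and the patterns used in production are not shown here.

```python
import re

# Hypothetical patterns for a report that uses "Label: value" lines.
PATTERNS = {
    "well_name": re.compile(r"Well\s+Name\s*:\s*(.+)", re.IGNORECASE),
    "kelly_bushing_m": re.compile(
        r"(?:KB|Kelly\s+Bushing)[^:\n]*:\s*([\d.]+)\s*m\b", re.IGNORECASE
    ),
    "spud_date": re.compile(r"Spud\s+Date\s*:\s*(.+)", re.IGNORECASE),
    "contractor": re.compile(r"Contractor\s*:\s*(.+)", re.IGNORECASE),
}


def extract_metadata(text):
    """Return the first match of each pattern found in the OCR text."""
    found = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[field] = match.group(1).strip()
    return found


# Invented sample text standing in for OCR output.
ocr_text = (
    "Well Name: ABC-1\n"
    "KB Elevation: 25.3 m\n"
    "Spud Date: 12 March 1998\n"
    "Drilling Contractor: Example Drilling Ltd\n"
)
metadata = extract_metadata(ocr_text)
```

Patterns like these work well for values that follow a predictable label, while NER is better suited to entities that appear in free-running text.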
IMAGES AND TABLES
In a separate data pipeline, non-text components such as images and tables are tagged. Deep convolutional neural networks (DCNNs) then learn to classify the image types automatically, including seismic images, stratigraphic charts, maps, cores, drawings, and tables, so that the images can be aggregated by type.
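The aggregation step can be sketched as a simple grouping of classifier output. The example below assumes the classifier returns a (file name, label, confidence) tuple per image; that output shape, the `min_confidence` threshold and the "unclassified" bucket are assumptions for illustration, not a description of the actual system.

```python
from collections import defaultdict


def aggregate_by_type(predictions, min_confidence=0.5):
    """Group image file names by predicted type.

    predictions: iterable of (file_name, label, confidence) tuples,
    an assumed output shape for the image classifier.
    Images below min_confidence go into an "unclassified" bucket.
    """
    groups = defaultdict(list)
    for file_name, label, confidence in predictions:
        key = label if confidence >= min_confidence else "unclassified"
        groups[key].append(file_name)
    return dict(groups)


# Invented classifier output for four extracted images.
predictions = [
    ("page03_img1.png", "seismic image", 0.97),
    ("page07_img1.png", "map", 0.88),
    ("page09_img2.png", "seismic image", 0.91),
    ("page12_img1.png", "core", 0.31),
]
images_by_type = aggregate_by_type(predictions)
```

Holding back low-confidence predictions is one way to keep the per-type collections clean; the threshold would need tuning against real data.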
Copyright © 2024 Iraya Energies. All rights reserved.