Data ingestion | Iraya Energies

How is your unstructured data processed?

DATA INGESTION

Unstructured documents come in a vast range of formats and layouts. The files are commonly in .pdf, .docx, .xlsx, .pptx, .las and .segy formats, and span disciplines such as:

Geology	Reservoir engineering	Drilling	Geomechanics	Administrative
Geophysics	Petrophysics	Production	Facilities	QHSE

The files are ingested through a sequential pipeline of workflows that use machine learning techniques.

The workflow for automatically extracting information from the documents starts with a set of heuristic algorithms that identify blocks (segments) within a document. Supervised machine learning is then used to classify each document segment as either text or non-text.
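The two steps above can be illustrated with a minimal sketch. It is not Iraya's implementation: the gap-based grouping rule, the `max_gap` threshold, the two-number feature vectors, and the nearest-centroid classifier are all assumptions chosen to keep the example short and dependency-free.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Box:
    """Vertical extent of one detected line on a page (hypothetical units: points)."""
    top: float
    bottom: float


def group_into_blocks(boxes, max_gap=12.0):
    """Heuristic step: start a new block whenever the vertical gap exceeds max_gap."""
    blocks = []
    for box in sorted(boxes, key=lambda b: b.top):
        if blocks and box.top - blocks[-1][-1].bottom <= max_gap:
            blocks[-1].append(box)
        else:
            blocks.append([box])
    return blocks


def train_centroids(samples):
    """Supervised step: average the labeled feature vectors per class."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def classify(features, centroids):
    """Assign the label of the nearest class centroid (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(features, centroids[label]))


# Illustrative, made-up training data: [share of character-like pixels, share of blank area]
TRAINING = [
    ([0.9, 0.1], "text"),
    ([0.8, 0.2], "text"),
    ([0.1, 0.9], "non-text"),
    ([0.2, 0.7], "non-text"),
]
CENTROIDS = train_centroids(TRAINING)
```

In practice the features and the classifier would be far richer; the point here is only the order of operations: segment first with cheap rules, then let a trained model label each segment.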

TEXT

Optical Character Recognition (OCR) is applied to the text segments to convert them into editable text. Named-Entity Recognition (NER) and Pattern-Based Recognition (PBR) techniques are then applied to the OCR results to extract metadata. For a well report, for example, this metadata includes the well name, kelly bushing, spud date, and contractors.
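As a rough illustration of the pattern-based part, the sketch below runs a few regular expressions over OCR output. The field names, the patterns, and the sample text are all hypothetical; real well reports vary widely in layout, and an NER model would normally complement rules like these.

```python
import re

# Hypothetical patterns; each captures the value that follows a field label.
PATTERNS = {
    "well_name": re.compile(r"well\s+name\s*[:\-]\s*(.+)", re.IGNORECASE),
    "kelly_bushing_m": re.compile(
        r"(?:kelly[\s\-]*bushing|KB)(?:\s+elevation)?\s*[:\-]\s*([\d.]+)\s*m\b",
        re.IGNORECASE,
    ),
    "spud_date": re.compile(
        r"spud\s+date\s*[:\-]\s*(\d{1,2}[/\-.]\d{1,2}[/\-.]\d{2,4})",
        re.IGNORECASE,
    ),
    "contractor": re.compile(r"contractor\s*[:\-]\s*(.+)", re.IGNORECASE),
}


def extract_metadata(ocr_text):
    """Return the first match for each field; fields with no match are omitted."""
    found = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            found[field] = match.group(1).strip()
    return found


# Made-up OCR output, for illustration only.
SAMPLE = """Well Name: DEMO-1
KB Elevation: 25.3 m
Spud Date: 12/03/1998
Contractor: Example Drilling Co.
"""

metadata = extract_metadata(SAMPLE)
```

Returning only the fields that matched makes gaps explicit, so downstream steps can decide whether to fall back to another technique or flag the document for review.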

IMAGES AND TABLES

On a separate data pipeline, the non-text components, such as images and tables, are tagged. Using deep convolutional neural networks (DCNN), the machine learns to automatically classify the different image types, including seismic images, stratigraphic charts, maps, cores, drawings, and tables, so that the images can be aggregated per type.
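The aggregation step can be sketched as follows. The classifier itself is not shown: `predict` is a hypothetical stand-in for a trained DCNN that returns a probability per image type, and the `min_confidence` threshold and the "unclassified" bucket are assumptions added for the example.

```python
from collections import defaultdict

# Image types named in the text above; the identifiers are illustrative.
IMAGE_TYPES = ("seismic", "stratigraphic_chart", "map", "core", "drawing", "table")


def aggregate_by_type(image_ids, predict, min_confidence=0.5):
    """Group image ids under the most probable type predicted for each image.

    predict(image_id) must return a mapping of type -> probability.
    Images whose best score is below min_confidence go to "unclassified".
    """
    groups = defaultdict(list)
    for image_id in image_ids:
        scores = predict(image_id)
        label = max(scores, key=scores.get)
        if scores[label] < min_confidence:
            label = "unclassified"
        groups[label].append(image_id)
    return dict(groups)


# Stand-in for a trained model: fixed, made-up scores per image id.
FAKE_SCORES = {
    "img_001": {"seismic": 0.92, "map": 0.05, "table": 0.03},
    "img_002": {"seismic": 0.10, "map": 0.85, "table": 0.05},
    "img_003": {"seismic": 0.88, "map": 0.07, "table": 0.05},
    "img_004": {"seismic": 0.34, "map": 0.33, "table": 0.33},
}

grouped = aggregate_by_type(FAKE_SCORES, FAKE_SCORES.__getitem__)
```

Keeping low-confidence images in their own bucket avoids polluting the per-type collections with uncertain predictions.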
