Shahed University

A New Machine Learning Ensemble Model for Class Imbalance Problem of Screening Enhanced Oil Recovery Methods

Seyede Alemohammad | Meysam Pirizadeh | Mohammad Manthouri | Mohsen Pirizadeh

URL :   http://research.shahed.ac.ir/WSR/WebPages/Report/PaperView.aspx?PaperID=148110
Date :  2020/12/13
Published in :    Journal of Petroleum Science and Engineering
DOI :  https://doi.org/10.1016/j.petrol.2020.108214
Link :  http://dx.doi.org/10.1016/j.petrol.2020.108214
Keywords : Classification, Ensemble learning, EOR screening, Class imbalance problem, Mutual information, Hyper-parameter tuning

Abstract :
Enhanced Oil Recovery (EOR) methods have received considerable attention because global oil demand is rising while production capacity from natural extraction methods is declining. EOR methods are a solution for extracting oil from heavy and extra-heavy reservoirs, but reservoir engineers face the problem of choosing the right EOR method for a given oilfield. The number of EOR methods and of the features associated with them is large, and the financial risk of implementing the wrong method creates the need for an expert screening tool. Various individual classifiers have been used for this purpose, but because of the class imbalance problem of the EOR dataset, most of them do not perform well. Consequently, to increase model accuracy, EOR methods with few records in the dataset are ignored, even though industry needs all EOR methods. In this study, an ensemble learning-based approach is used to overcome the class imbalance problem at the algorithmic level instead of the data level. For this purpose, an effective model called B2S is proposed that combines the advantages of the ensemble-based methods Bagging, Boosting, and Stacking. By balancing variance and bias, B2S achieved an average test accuracy of 96.94 on a dataset of 426 samples from 10 different EOR methods with a highly skewed distribution. After grouping some relatively similar EOR methods, the average accuracy increases to 98.12 with a standard deviation of 1.63, which indicates the stability of the proposed method. The study also demonstrates a direct relation between model performance and precise pre-processing, nonlinear feature selection using Mutual Information rather than conventional linear methods, and hyper-parameter tuning with newer methods such as Random Search.
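
The abstract names the ingredients (Bagging, Boosting, Stacking, Mutual Information feature selection, Random Search) but does not describe how B2S wires them together. The following is a minimal sketch of one way such pieces can be combined, not the authors' B2S model. It assumes scikit-learn; the synthetic imbalanced dataset, the choice of base learners, the meta-learner, and every hyper-parameter range are illustrative assumptions rather than values taken from the paper.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import (RandomizedSearchCV, StratifiedKFold,
                                     train_test_split)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a skewed screening dataset (assumption: the real
# EOR data is not reproduced here). Three classes with a 70/20/10 split.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, weights=[0.7, 0.2, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Stacking combines a bagged learner (variance reduction) and a boosted
# learner (bias reduction) through a simple meta-learner.
stack = StackingClassifier(
    estimators=[
        ("bagging", BaggingClassifier(DecisionTreeClassifier(),
                                      n_estimators=10, random_state=0)),
        ("boosting", GradientBoostingClassifier(n_estimators=20,
                                                random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Pre-processing, nonlinear feature selection by mutual information,
# then the stacked ensemble.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(partial(mutual_info_classif, random_state=0))),
    ("stack", stack),
])

# Hypothetical search space; random search samples a few combinations.
param_distributions = {
    "select__k": [4, 6, 8],
    "stack__bagging__n_estimators": [10, 20],
    "stack__boosting__learning_rate": [0.05, 0.1, 0.2],
}
search = RandomizedSearchCV(
    pipeline, param_distributions, n_iter=3,
    scoring="balanced_accuracy",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    random_state=0,
)
search.fit(X_train, y_train)

# Balanced accuracy averages per-class recall, so minority classes count
# as much as the majority class.
score = balanced_accuracy_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("balanced accuracy on held-out split:", round(score, 3))
```

Stratified splits are used in the sketch so that each fold keeps some samples of the rare classes, and balanced accuracy is reported alongside the search because plain accuracy can look high on a skewed dataset even when rare classes are misclassified.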