Iranian Journal of Oil & Gas Science and Technology, Vol. 4 (2015), No. 2, pp. 1-14
http://ijogst.put.ac.ir

Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir

Mohammad Ali Sebtosheikh 1, Reza Motafakkerfard 1, Mohammad Ali Riahi 2, and Siyamak Moradi 1,*

1 Department of Petroleum Exploration, Petroleum University of Technology, Abadan, Iran
2 Geophysics Institute, University of Tehran, Tehran, Iran

Received: July 16, 2013; revised: June 29, 2014; accepted: August 23, 2014
Abstract

The prediction of lithology is necessary in all areas of petroleum engineering: to design a project in any branch of petroleum engineering, the lithology must be well known. Support vector machines (SVM's) use an analytical approach to classification based on statistical learning theory and the principles of structural risk minimization and empirical risk minimization. In this research, the SVM classification method is used to predict lithology from petrophysical well logs, based on petrographic studies of core lithology, in a heterogeneous carbonate reservoir in southwestern Iran. Data preparation, including normalization and attribute selection, was performed on the data. A well-by-well data separation technique was used for data partitioning, so that the instances of each well were predicted by an SVM trained on the other wells. The effect of different kernel functions on SVM performance was investigated. The results show that the SVM performance in the lithology prediction of wells using the well-by-well data partitioning technique is good, and that in two data separation cases the radial basis function (RBF) kernel gives a higher lithology misclassification rate than the polynomial and normalized polynomial kernels. Moreover, the lithology misclassification rate associated with the RBF kernel increases with increasing training set size.

Keywords: Lithology Prediction, Support Vector Machines, Kernel Functions, Heterogeneous Carbonate Reservoirs, Petrophysical Well Logs
1. Introduction

Lithology prediction is one of the most important issues in all fields of petroleum engineering, such as reservoir characterization, formation evaluation, geological studies, reservoir modeling, enhanced oil recovery processes, and well planning including drilling and well completion management. It is absolutely necessary to identify the exact lithology at a predetermined depth, especially in heterogeneous carbonate reservoirs, in order to make petroleum engineering related decisions.

Lithology prediction from drilling cuttings is not accurate due to problems associated with depth matching of cuttings; lithology determination from core plugs is also not economic because of operation costs. Accordingly, petrophysical well logs are used for lithology identification as a more efficient and cheaper approach than drilling cutting and core analysis (Rider, 2002). Some traditional lithology identification methods have been developed from petrophysical well logs by combining them and using cross plots. These methods are still useful today for quick evaluations (Ellis and Singer, 2008). However, traditional cross plotting methods have lost their efficiency in
* Corresponding Author: Email: moradi.s@put.ac.ir
large data sets of heterogeneous reservoirs. Several approaches have been introduced for lithology classification, such as cross plot interpretation and statistical analysis (Delfiner et al., 1987), statistical analysis based on histogram plotting (Busch et al., 1987), an associative analysis combining fuzzy logic, neural network, and multivariable statistical methodologies (Carrasquilla et al., 2008), an artificial intelligence approach with multivariate statistical analysis (Lim et al., 1999), the fuzzy logic technique (Cuddy, 2000), artificial neural network methodologies (Chang et al., 2002; Chikhi and Batouche, 2005; Katz et al., 1999; Raeesi et al., 2012; Tang, 2009; Chikhi and Batouche, 2007), a multi-agent collaborative learning architecture (Gifford and Agah, 2010), a multivariate statistical method (Tang and White, 2008), facies classification from seismic attributes by SVM (Bagheri and Riahi, 2013), an aggregation of principal components, clustering, and discriminant analysis (Teh et al., 2012), and statistical characterization, discrimination, and stratigraphic correction methodologies (Borsaru et al., 2006).

The performance of artificial neural network and fuzzy logic approaches is better than that of statistical analyses (Busch et al., 1987; Carrasquilla et al., 2008; Chang et al., 2002; Katz et al., 1999; Raeesi et al., 2012; Tang, 2009; Tang and White, 2008). The self-organizing map (SOM) neural network method shows a better performance in lithology classification compared with other methods (Chikhi and Batouche, 2005). The probabilistic neural network involves more computational steps and is thus slower than other kinds of neural networks (Tang, 2009). The minimum misclassification rate of the said methods is 19%, which belongs to fuzzy logic techniques (Cuddy, 2000), and the maximum misclassification rate is 26%, belonging to probabilistic neural networks (PNN) (Tang, 2009).
The support vector machine (SVM) is based on statistical learning theory and was first introduced by Boser, Guyon, and Vapnik at the Computational Learning Theory conference, where they presented their paper in 1992 (Boser et al., 1992). SVM has shown good performance in classification tasks. This is attributed to the fact that SVM's minimize an upper bound of the generalization error by maximizing the margin between the separating hyperplanes (Amari and Wu, 1999). Recently, SVM's have successfully been applied to a number of applications such as drug design (Burbidge et al., 2001), fault diagnosis in power transmission systems (Ravikumar et al., 2008), microarray data classification (Huerta et al., 2006), protein structure prediction (Hua and Sun, 2001), text detection in digital videos (Shin et al., 2000), microarray data and satellite radiance data classification (Lee et al., 2004), speaker identification (Mezghani et al., 2010), document classification (Wang and Sun, 2011), and hyperspectral image classification (Ding, 2011).
This study examines the performance of different SVM kernel functions in lithology prediction, using a well-by-well data separation technique in a heterogeneous carbonate reservoir in southwestern Iran and applying petrophysical well logs. An attribute selection approach was used to choose the well logs most effective on SVM performance, and a grid search technique was utilized to find the best SVM parameters. The effect of different kernel types on SVM performance was investigated.
2. Methodology

2.1. Support vector machines

Support vector machines represent a machine learning algorithm for both classification and regression tasks (Alpaydin, 2010; Hamel, 2009). Support vector classifiers are maximum margin classifiers that find a decision function for pattern vectors $X$ of dimension $n$ attributes belonging to either of two classes (Boser et al., 1992). Maximum margin classifiers construct decision surfaces that are
equidistant to the class boundaries, called hyperplanes, and maximize the margin between the two classes' supporting hyperplanes (Hamel, 2009).

An SVM, as a maximum margin classifier, is an optimization routine: an ideal method to select, from a number of feasible solutions, the one with the maximum distance from the two class hyperplanes, where the limiting instances of the two classes, called support vectors, act as the constraint conditions of the optimization problem. Support vectors are instances that the class hyperplanes cannot cross over while maximizing the margin (Hamel, 2009). To find an optimal decision function, the margin between the class boundaries is first formulated in the direct space and is then transformed into the dual space by means of the Lagrangian (Boser et al., 1992; Hamel, 2009).
2.2. SVM's as linear classifiers

In its simplest form, a data set is assumed to be linearly separable. A training data point is $X_i \in \mathbb{R}^n$, $i = 1, \ldots, l$, where $l$ is the total number of training instances and $n$ is the dimension of the input attributes. Every instance has a class labeled with $y_i \in \{-1, +1\}$. The decision function is given as:

$$f(X) = W^T X + b \tag{1}$$
where $W, X \in \mathbb{R}^n$; $b$ is a scalar bias and $W$ is a weight vector called the normal vector. A standard hyperplane that facilitates the computation of the support vectors is given by:

$$\min_{X_i \in \mathbb{R}^n} \left| W^T X_i + b \right| = 1 \tag{2}$$
The normal vector is obtained from the following optimization problem (Kecman, 2005):

$$\text{Minimize} \quad \frac{1}{2} W^T W \tag{3}$$

$$\text{subject to} \quad y_i \left( W^T X_i + b \right) \geq 1, \quad i = 1, \ldots, l \tag{4}$$

where Equation 4 is the constraint of Equation 3. Equation 3, subject to Equation 4, has a solution at its saddle point, which can be determined by using the following Lagrangian functional:
$$L(W, b, \alpha) = \frac{1}{2} W^T W - \sum_{i=1}^{l} \alpha_i \left[ y_i \left( W^T X_i + b \right) - 1 \right] \tag{5}$$

where $\alpha_i$ are the Lagrangian multipliers, i.e. the dual parameters (Kecman, 2005).
At the saddle point, the Lagrangian $L$ is minimized with respect to $W$ and $b$ and maximized with respect to the non-negative $\alpha_i$.

To solve the optimization problem, the dual problem is formed. From the Karush-Kuhn-Tucker (KKT) conditions, the following relationships must be satisfied at the saddle point; differentiating with respect to $W_s$ and $b_s$ and setting the derivatives equal to zero yields (Kecman, 2005):
$$\frac{\partial L}{\partial W_s} = 0 \;\Rightarrow\; W_s = \sum_{i=1}^{l} \alpha_i y_i X_i \tag{6}$$
$$\frac{\partial L}{\partial b_s} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \alpha_i y_i = 0 \tag{7}$$
where the subscript $s$ denotes the values at the saddle point. In order to find $b_s$, Equations 6 and 7 are first substituted into Equation 5, and the following dual variables Lagrangian $L_d(\alpha)$ is obtained:

$$\text{maximize} \quad L_d(\alpha) = -\frac{1}{2} \alpha^T H \alpha + e^T \alpha \tag{8}$$
$$\text{subject to} \quad y^T \alpha = 0, \quad \alpha_i \geq 0, \quad i = 1, \ldots, l \tag{9}$$
where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_l]^T$; $H$ is the Hessian matrix ($H_{ij} = y_i y_j X_i^T X_j$) and $e$ is the unit vector. Solving Equation 8 gives the optimal values of $\alpha_i$, denoted by $\alpha_{si}$. The optimal values $W_s$ and $b_s$ are then found as follows (Kecman, 2005):
$$W_s = \sum_{i=1}^{l} \alpha_{si} y_i X_i \tag{10}$$

$$b_s = \frac{1}{N_{sv}} \sum_{t=1}^{N_{sv}} \left( y_t - X_t^T W_s \right), \quad t = 1, \ldots, N_{sv} \tag{11}$$
where $N_{sv}$ is the number of support vectors. The decision function $f(X)$ and the indicator labeling function $F_i$ are then given as:

$$f(X) = W_s^T X + b_s = \sum_{i=1}^{l} \alpha_{si} y_i X_i^T X + b_s \tag{12}$$
$$F_i = \operatorname{sign}\left( f(X) \right) \tag{13}$$

where the indicator labeling function, by performing the labeling task, defines the classification between the categories $F_i = -1$ and $F_i = +1$.
2.3. SVM's as nonlinear classifiers

Very few data sets in the real world are linearly separable. A remarkable characteristic of support vector classifiers is that the basic linear framework extends to the case where the data is not linearly separable. The fundamental idea behind this extensibility is to transform the input space, where the data set is not linearly separable, into a higher dimensional space called the feature space (Hamel, 2009). The input vectors $X \in \mathbb{R}^n$ are projected onto vectors $\Phi(X)$ of a higher dimensional feature space, where the data set is linearly separable, and the SVM then separates the images of the projections of $X$ using the linear classifier formulation. This transformation results in a constrained quadratic programming optimization problem in the feature space (Kecman, 2005). The indicator labeling function becomes:

$$F_i = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i y_i \, \Phi(X_i)^T \Phi(X) + b \right) = \operatorname{sign}\left( \sum_{i=1}^{l} v_i \, k(X_i, X) + b \right) \tag{14}$$
where $v_i$ are the weights and $k(X_i, X)$ is the kernel function. Kernel functions evaluate a dot
product in feature space, and the defining characteristic of a kernel is that the value of this dot product is actually computed in the input space (Hamel, 2009). Some common kernel functions and their associated mathematical formulas are listed in Table 1.
Table 1
Common kernel functions and their associated mathematical formulas.

Kernel function                 Mathematical formula
Linear kernel                   $k(X_i, X) = X_i^T X$
Polynomial kernel               $k(X_i, X) = (X_i^T X)^e$
Normalized polynomial kernel    $k(X_i, X) = (X_i^T X)^e / \sqrt{(X_i^T X_i)^e \,(X^T X)^e}$
Radial basis kernel             $k(X_i, X) = \exp\left( -\|X - X_i\|^2 / (2\sigma^2) \right)$
Following the linear support vector classifier, the formulation for the nonlinear support vector classifier is a generalization of the linear SVM and is given by (Kecman, 2005):

$$\text{Maximize} \quad L_d(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{l} \alpha_i \alpha_j y_i y_j \, k(X_i, X_j) \tag{15}$$
$$\text{subject to} \quad \alpha_i \geq 0, \; i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0 \quad \text{for a separable nonlinear classifier} \tag{16}$$
or

$$\text{subject to} \quad C \geq \alpha_i \geq 0, \; i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0 \quad \text{for an overlapping nonlinear classifier} \tag{17}$$
where $C$ is a positive nonzero value called the penalty parameter (or cost factor), which determines the trade-off between the training error and the Vapnik-Chervonenkis (VC) dimension of the model. The VC dimension of a classifier in a model class is the number of instances in the training data set that can be separated by the classifier (Hamel, 2009). More precisely, a large value of $C$ forces the optimization to consider solutions with small margins (Hamel, 2009). The penalty parameter $C$ is an important parameter on which the performance of the SVM strongly depends (Witten et al., 2011; Hamel, 2009). In this work, the penalty parameter was chosen by a grid search technique using the Weka software (Hall et al., 2009).
After the optimization on the training data to determine the Lagrangian multipliers, the decision function and the indicator labeling function are obtained by (Kecman, 2005):

$$f(X) = \sum_{i=1}^{l} \alpha_i y_i \, k(X_i, X) + b \tag{18}$$

$$F_i = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i y_i \, k(X_i, X) + b \right) \tag{19}$$
By choosing an appropriate kernel function, it is possible to classify nonlinearly separable data sets. The evaluation criterion for a nominal prediction task is defined as the number of testing set instances
which were predicted incorrectly divided by the total number of testing set instances. This criterion is called the misclassification rate.
3. Data preparation

The lithologies of 427 instances from petrographic analysis with exact core depths were prepared. These instances belong to three wells of a heterogeneous carbonate reservoir in southwestern Iran. Three lithologies diagnosed from petrographic analysis (dolomite, limestone, and anhydrite) were used. All the instances had been matched with the log depth, so that every instance has exact values of petrophysical well logs as its associated attributes. Deep laterolog (LLD), shallow laterolog (LLS), micro spherically focused log (MSFL), sonic transit time log (DT), neutron porosity log (NPHI), caliper log (CALI), photoelectric factor log (PEF), density log (RHOB), and gamma ray log (GR) were the nine petrophysical well log measurements in the three wells used for lithology prediction. Since the learning process is easier on input data limited to an equal range, all the input petrophysical well log data of every well were normalized separately in the range of [-1, +1].
4. Implementation

4.1. Cross-validation

The credibility of the results of a data mining process depends on choosing an efficient method for data partitioning. In practical terms, it is common to hold out one-third of the data for testing and use the remaining two-thirds for training. When trying to perform an accurate grid search for the optimization of kernel function parameters, the instances used for training or testing might not be representative, and all instances of a certain class might be omitted from the training set. To overcome these shortcomings of the hold-out method in grid search, an important statistical technique called cross-validation was used. In cross-validation, the data set is divided into a fixed number of folds or partitions. Since it has experimentally been shown that a 10-fold cross-validation gives the best estimate of the misclassification rate, using it as a standard cross-validation technique is recommended. In a 10-fold cross-validation, the whole data set is randomly separated into ten equal partitions; each part is held out in turn while the learning scheme is trained on the remaining nine, so the procedure is executed ten times with a different held-out partition each time (Witten et al., 2011).
4.2. Attribute selection

In practice, adding irrelevant or distracting attributes to the input data set often confuses machine learning systems such as SVM's (Witten et al., 2011). The Weka software (Hall et al., 2009) was used to perform attribute selection on the whole data set, where all the petrophysical well logs were ranked. Table 2 shows the well logs ranked by the attribute selection operation. The top six relevant attributes, RHOB, LLS, NPHI, LLD, DT, and PEF, were selected as the input data.
Case 3: Prediction of well 3 instances

Three kernel types with their associated optimal parameter values were tested separately to predict all 105 instances of well 3 after training with the 322 instances of well 1 and well 2. In this case, the ratio of the training set size to the whole data set size is 75.40%. Tables 11-13 show the confusion matrices for the RBF, polynomial, and normalized polynomial kernel types, respectively. The total lithology misclassification rates associated with the well 3 predictions for the RBF, polynomial, and normalized polynomial kernels were 18.09%, 17.14%, and 17.14%, respectively, as displayed in Figure 3.
Table 11
Confusion matrix of well 3 instances using RBF kernel.

                          Predicted
                  Dolomite  Limestone  Anhydrite
Actual Dolomite         28         11          0
       Limestone         7         58          0
       Anhydrite         1          0          0
Table 12
Confusion matrix of well 3 instances using polynomial kernel function.

                          Predicted
                  Dolomite  Limestone  Anhydrite
Actual Dolomite         29         10          0
       Limestone         7         58          0
       Anhydrite         1          0          0
Table 13
Confusion matrix of well 3 instances using normalized polynomial kernel function.

                          Predicted
                  Dolomite  Limestone  Anhydrite
Actual Dolomite         30          9          0
       Limestone         8         57          0
       Anhydrite         1          0          0
Figure 3
Total lithology misclassification rate of well 3 instances using RBF, polynomial, and normalized polynomial kernels. [Bar chart of misclassification rate (%) versus kernel type.]
6. Conclusions

A support vector machine, as a supervised learning classifier, was used to predict the lithology from petrophysical well logs in a heterogeneous carbonate reservoir in southwestern Iran based on core lithology verification. In order to remove irrelevant or distracting input well logs, an attribute selection approach was employed to rank the input well logs; the three least relevant well logs were therefore omitted from the inputs. The RHOB, LLS, NPHI, LLD, DT, and PEF logs are the well logs most affected by lithology in the investigated reservoir. Well-by-well data separation was performed as the data partitioning criterion for generating the training and testing data sets. Because of the dependency of the SVM on its associated parameters, a grid search algorithm was used for parameter optimization for each kernel function and each data separation case. It is concluded that the SVM is capable of predicting lithology in heterogeneous carbonate reservoirs; all of the misclassification rates of this study are less than those in previous works. The results show that the SVM performance in the lithology prediction of wells using the well-by-well data partitioning technique is good and that, in two data partitioning cases, the radial basis function (RBF) kernel gives a higher lithology misclassification rate than the polynomial and normalized polynomial kernels. In addition, the lithology misclassification rate associated with the RBF kernel function increases with increasing training data set size. Using these kernels with their associated optimal parameter values, it is possible to predict lithology in the investigated reservoir. In most cases, the anhydrite instances were predicted to be dolomite; this seems to occur because of the similarity of the petrophysical properties of these two lithology types. Therefore, using well logs more strongly affected by lithology is recommended to overcome this shortcoming.
Acknowledgements

The authors would like to thank the Exploration Directorate of the National Iranian Oil Company (NIOC) for providing the data for this research.
Nomenclature

b : Bias constant
C : Penalty parameter
e : Exponent parameter of polynomial kernel
H : Hessian matrix
k : Kernel function
l : Number of instances
L : Lagrangian equation
X : Input vector of attributes
y : Output vector of class labels
α, β : Lagrangian multipliers
σ : RBF parameter
ξ : Slack variable
W : Normal vector
Ф : Mapping function from input space to feature space

Superscripts and subscripts

d : Dual
i, j : Indices
n : Input space dimension
s : Optimal
p : Primal
SV : Support vectors
T : Transpose