DATA BALANCING METHODS BY FUZZY ROUGH SETS
Tran Thanh Huyen1,2*, Le Ba Dung3, Mai Dinh4
Abstract: The robustness of rough set theory in data cleansing has been demonstrated in
many studies. Recently, fuzzy rough sets have also been used to deal with imbalanced data in two
ways. The first is a combination of fuzzy rough instance selection and balancing
methods. The second uses different criteria to clean the majority and minority
classes of imbalanced data. This work is an extension of the second method, which was
presented in [16]. The paper presents a complete study of the second method together with some
proposed algorithms. It focuses mainly on binary classification with kNN and SVM for
imbalanced data. Experiments and comparisons among related methods confirm the pros
and cons of each method with respect to classification accuracy and time consumption.
Keywords: Rough Set theory; Fuzzy-rough sets; Granular computing; Imbalanced data; Instance selection.
1. INTRODUCTION
Rough set theory was first introduced by Pawlak [21, 22] in the early 1980s as a machine
learning method for knowledge acquisition from data. It provides a mathematical
approach to dealing with inconsistency between items in datasets. It can consequently be
used for pattern extraction, decision rule generation, feature selection and data reduction.
In the case of data reduction/selection, the main ideas are extracting fuzzy memberships of
positive regions [23] and choosing the instances with large membership degrees
(degrees have to be larger than given thresholds) for training phases [3, 13, 28]. It can thus reduce
noise by identifying and eliminating low quality instances in balanced datasets.
Another issue that may cause an inconsistency problem is imbalanced data [15]. A
dataset is imbalanced when the numbers of instances in some classes are much larger
than in others. Such classes are called majority classes. The classes with small
cardinality are referred to as minority classes.
There are two possible approaches to using fuzzy rough sets [23] to select instances
from imbalanced datasets. The first is a combination of balancing methods and a rough set
based noise removal technique [24, 25, 29]. In these approaches, fuzzy rough sets are
first used to remove low quality instances. Then, a well-known balancing technique
called "Synthetic Minority Oversampling Technique (SMOTE)" [4] is employed to form
a candidate set. Finally, fuzzy-rough sets are used again to select quality instances from
the candidate set.
The second approach uses different criteria to define different thresholds for the
majority and minority classes [16, 27]. By using a small threshold for the minority class, more
items from minority classes can be selected. Besides, the research in [16] introduces another
method to deal with highly imbalanced data by keeping all instances in the minority class while
removing or relabeling instances in the majority class. The experimental results in those studies show
considerable improvement in classification performance. However, the absence of
methods for generating candidates and optimizing thresholds limits their application in practice.
There are also some studies that use fuzzy-rough sets as classification methods for
imbalanced data [26, 30]. However, building a classifier is not in the scope of this paper.
This research presents a complete study of using fuzzy rough sets to select instances in
imbalanced datasets without any balancing techniques. The paper is structured as follows:
Section 2 reviews the original rough set theory and an extension called fuzzy-rough sets;
in Section 3, the fuzzy rough instance selection approach to dealing with inconsistency is
discussed along with its issues; the proposed algorithms of this research, with methods
to choose parameters, are introduced in Section 4; experiments with results and
comparisons are discussed in Section 5; finally, Section 6 concludes the study.
2. THEORETICAL BACKGROUND
2.1. Information Systems and Tolerance Relation
An information system is represented as a data table. Each row of this table represents
an instance of an object such as people, things, etc. Information about every object is
described by object attribute (feature) values.
An information system in rough set studies is formally defined as a pair $I = (U, A)$,
where $U$ is a non-empty finite set of objects called the universe and $A$ is a non-empty
finite set of attributes such that $f_a : U \to V_a$ for every $a \in A$ [21, 22]. The non-empty
discrete value set $V_a$ is called the domain of $a$. The original rough set theory deals with
complete information systems in which $f_a(x)$ is a precise value for every $x \in U$ and $a \in A$.
If $U$ contains at least one object with an unknown value, then $I$ is called an incomplete
information system, otherwise complete [14]. In incomplete information systems,
unknown values are denoted by the special symbol "$*$" and are supposed to be contained in
the set $V_a$.
Any information system taking the form $I = (U, A \cup \{d\})$ is called a decision table,
where $d \notin A$ is called a decision (or label) and the elements of $A$ are called conditions. Let
$V_d = \{d_1, \dots, d_k\}$ denote the value set of the decision attribute; decision $d$ then determines a
set of partitions $C_1, C_2, \dots, C_k$ of universe $U$, where $C_i = \{x \in U \mid f_d(x) = d_i\}$, $1 \le i \le k$. Set $C_i$
is called the $i$-th decision class or concept on $U$. We assume that every object in $U$ has a
certain decision value in $V_d$.
Formally, in incomplete information systems, the relation $TOR_P(x, y)$, $P \subseteq A$, denotes a
binary relation between objects that are possibly equivalent in terms of the values of
attributes in $P$ [14]. The relation is reflexive and symmetric, but need not be
transitive. Let $T_P(x) = \{y \in U \mid TOR_P(y, x)\}$ be the set of all objects that are equivalent to $x$
by $P$, which is then called an equivalence class. The family of all equivalence classes on
$U$ based on an equivalence relation is referred to as a category and is denoted by
$U / TOR_P$.
From equivalence classes, Kryszkiewicz [14] defined an approximation space that
contains lower and upper approximations, denoted by $\underline{appr}\,X$ and $\overline{appr}\,X$ respectively, of a
set $X \subseteq U$ as follows:

$$\underline{appr}_P X = \bigcup \{ T_P(x) \mid x \in U, T_P(x) \subseteq X \} = \{ x \in U \mid T_P(x) \subseteq X \} \quad (1)$$
$$\overline{appr}_P X = \bigcup \{ T_P(x) \mid x \in U, T_P(x) \cap X \neq \emptyset \} = \{ x \in U \mid T_P(x) \cap X \neq \emptyset \} \quad (2)$$
In decision table $I = (U, A \cup \{d\})$, the positive region $POS_P(X)$ of class $X$ in terms of
attribute set $P \subseteq A$ is defined as:

$$POS_P(X) = \bigcup_{x \in X} \underline{appr}_P\, T_d(x) \quad (3)$$
Apart from using tolerance relations to define a rough set in incomplete information
systems, there are also numerous studies [11, 18, 19, 20] that deal with incomplete or
imperfect information systems in which data are not described by precise and crisp values.
2.2. Fuzzy Rough Set
A classical (crisp) set is normally defined as a collection of elements $x \in X$ that can
be finite, countable or over-countable. Each single element can either belong to or not
belong to a set $X' \subseteq X$. For a fuzzy set, a characteristic function allows various degrees
of membership for the elements of a given set. A fuzzy set $\mathbf{X}$ in $X$ is then a set of ordered
pairs [34]:

$$\mathbf{X} = \{ (x, \mu_{\mathbf{X}}(x)) \mid x \in X \} \quad (4)$$

where $\mu_{\mathbf{X}}(x) \in [0, 1]$ is called the membership function or grade of membership (also
the degree of compatibility or degree of truth) of $x$ in $\mathbf{X}$.
In crisp sets, for the two special sets $\emptyset$ and $U$, the approximations are simply defined as
$\mu_{\underline{appr}_R U}(x) = 1$ and $\mu_{\underline{appr}_R \emptyset}(x) = 0$. Based on the two equivalent definitions, lower and upper
approximations may be interpreted as follows: an element $x$ belongs to the lower
approximation $\underline{appr}_R X$ if all elements equivalent to $x$ belong to $X$. In other words, $x$
belongs to the lower approximation of $X$ if any element not in $X$ is not equivalent to $x$,
namely, $\mu_R(x, y) = 0$. Likewise, $x$ belongs to the upper approximation of $X$ if some
element $y \in X$ satisfies $\mu_R(x, y) = 1$.
Now the notion of rough sets in crisp sets is extended to include fuzzy sets. Let $\mu_{\mathbf{X}}$
and $\mu_R$ denote the membership functions of the set $\mathbf{X}$ and of the fuzzy relation
$R = \{((x, y), \mu_R(x, y)) \mid (x, y) \in U \times U\}$, respectively. The fuzzy approximation space [23] of
fuzzy set $\mathbf{X}$ on $X$ in terms of fuzzy relation $\mu_R(x, y)$ can be defined as follows:

$$\mu_{\underline{appr}_R \mathbf{X}}(x) = \inf_{y \in U} \mathcal{I}(\mu_R(x, y), \mu_{\mathbf{X}}(y)) \quad (5)$$

$$\mu_{\overline{appr}_R \mathbf{X}}(x) = \sup_{y \in U} \mathcal{T}(\mu_R(x, y), \mu_{\mathbf{X}}(y)) \quad (6)$$

where $\mathcal{I}$ and $\mathcal{T}$ are a fuzzy implicator¹ and a triangular norm (t-norm)², respectively.
¹A fuzzy implicator is a function $\mathcal{I} : [0,1] \times [0,1] \to [0,1]$ which satisfies the following properties:
$\mathcal{I}(0, 0) = 1$, $\mathcal{I}(1, a) = a$; $\mathcal{I}$ is decreasing in the first argument and increasing in the second argument.
From the above equations, it can be noticed that the lower and upper approximation
memberships of one instance depend strongly on only one other instance, since they are
defined by minimum and maximum. To soften this definition, Cornelis et al. [5] suggest
a fuzzy-rough set definition using ordered weighted averaging (OWA) aggregation [32]. An
OWA operator $F_W$ is a mapping $F : \mathbb{R}^n \to \mathbb{R}$ using a weight vector $W = \langle w_1, \dots, w_n \rangle$ such that

$$F_W(a_1, \dots, a_n) = \sum_{i=1}^{n} w_i b_i \quad (7)$$

where $b_i$ is the $i$-th largest of the values in $a_1, \dots, a_n$.
An OWA operator is bounded by the minimum and maximum operators; it can thus
soften them. Suppose that there are two weight vectors
$W^{\min} = \langle w_1^{\min}, \dots, w_n^{\min} \rangle$ and $W^{\max} = \langle w_1^{\max}, \dots, w_n^{\max} \rangle$, where $w_1^{\min} \le \dots \le w_n^{\min}$ and $w_1^{\max} \ge \dots \ge w_n^{\max}$;
we can then define OWA fuzzy rough sets as follows:

$$\mu_{\underline{appr}_R \mathbf{X}}(x) = F_{W^{\min}}\big( \mathcal{I}(\mu_R(x, y), \mu_{\mathbf{X}}(y)) \big)_{y \in U} \quad (8)$$

$$\mu_{\overline{appr}_R \mathbf{X}}(x) = F_{W^{\max}}\big( \mathcal{T}(\mu_R(x, y), \mu_{\mathbf{X}}(y)) \big)_{y \in U} \quad (9)$$
The effectiveness of using OWA to define fuzzy rough sets for dealing with imbalance has
been demonstrated in the literature [26, 30]. In fact, the definition of a fuzzy relation on an attribute
set depends on the individual system. One example of defining relations to deal
with incomplete information systems is shown in [17]. Further investigation of the
combination of fuzzy and rough sets can be found in [5, 8, 9, 23, 33].
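To make the OWA construction concrete, the following minimal numpy sketch computes the memberships in equations (8) and (9) for all instances at once. It is an illustration only: the Lukasiewicz implicator and t-norm used here are an assumption (the paper only fixes the t-norm later, in Section 5), and the inputs `R` and `mu_X` are hypothetical.

```python
import numpy as np

def owa(values, weights):
    # OWA aggregation (Eq. 7): weights are applied to values sorted descending.
    return np.sum(np.sort(values)[::-1] * weights)

def owa_approximations(R, mu_X, w_min, w_max):
    """OWA fuzzy-rough approximations (Eqs. 8-9).
    R: (n, n) fuzzy relation matrix with R[i, j] = mu_R(x_i, x_j);
    mu_X: (n,) class membership vector; w_min increasing, w_max decreasing."""
    n = len(mu_X)
    lower, upper = np.empty(n), np.empty(n)
    for i in range(n):
        impl = np.minimum(1.0 - R[i] + mu_X, 1.0)   # Lukasiewicz implicator
        tnorm = np.maximum(R[i] + mu_X - 1.0, 0.0)  # Lukasiewicz t-norm
        lower[i] = owa(impl, w_min)                 # softened infimum
        upper[i] = owa(tnorm, w_max)                # softened supremum
    return lower, upper
```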
3. ROUGH SELECTION AND PROBLEM STATEMENT
Significant research on instance selection using rough set models was introduced in the
work of Caballero et al. [3]. For this purpose, the authors calculated approximations and
positive regions for each class in the training sets. Instances for the training phases are then
selected in two ways. In the first method, they try to delete all instances in the boundary region.
The second employs a nearest neighborhood algorithm to relabel instances in the boundary
region. The issues associated with these approaches were raised in Jensen and Cornelis'
study [13]. Another possible limitation is that they deal only with crisp sets of attributes
in decision tables.
To deal with the problems identified in the study of Caballero et al., an approach
called Fuzzy-Rough Instance Selection (FRIS) has been proposed [13]. The idea of
this approach is to use memberships of positive regions to determine which instances
should be kept and which should be discarded for training. First, the method
calculates the relations among instances and then measures the quality of each instance by its
positive region membership. An instance is removed if its membership degree is
less than a specified threshold. For the same purpose, another approach to measuring
²A t-norm is a function $\mathcal{T} : [0,1] \times [0,1] \to [0,1]$ which satisfies the following properties.
Commutativity: $\mathcal{T}(a, b) = \mathcal{T}(b, a)$; monotonicity: $\mathcal{T}(a, b) \le \mathcal{T}(c, d)$ if $a \le c$ and $b \le d$;
associativity: $\mathcal{T}(a, \mathcal{T}(b, c)) = \mathcal{T}(\mathcal{T}(a, b), c)$; the number 1 acts as an identity element: $\mathcal{T}(a, 1) = a$.
the quality of instances was introduced in [28]. Such approaches will undoubtedly refine the
positive regions with quality instances.
However, one limitation to be considered is that only one threshold is used for
all classes. In some applications, such as medical disease prediction, it is not uncommon
to be interested in some specific groups, disease groups for example, rather than the
others (healthy groups). When only one threshold is used, once noise is eliminated,
valuable instances in the interesting classes also disappear. Figure 1 illustrates the approximations
of two datasets, namely "vehicle3" and "yeast6"³. In "vehicle3", if we use one threshold
for the lower approximations of both classes, the number of deleted items could be the
same for the negative and positive classes. In contrast, if the threshold is 0.5 in dataset "yeast6",
the system will remove almost all data in the positive class.
Figure 1. Distribution of approximations of the two datasets.
Some approaches take advantage of fuzzy-rough sets to qualify the instance sets formed by
SMOTE [24, 25, 29]. The issues above may therefore be avoided. The steps of these
methodologies can be summarized as follows (a sketch is given after the list):
1. Use fuzzy-rough sets to calculate the quality of every instance and choose
instances of high quality.
2. Use SMOTE to create artificial instances from the first step and add them to the dataset.
3. Use fuzzy-rough sets to eliminate low quality instances from the dataset after the
second step.
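The following sketch shows the shape of this three-step pipeline, assuming the SMOTE implementation from the third-party imbalanced-learn package; `quality_fn` and the thresholds `t1`, `t2` are hypothetical stand-ins for the quality measures of the individual papers, which differ between steps 1 and 3 as discussed below.

```python
import numpy as np
from imblearn.over_sampling import SMOTE  # third-party balancing library

def fuzzy_rough_smote_pipeline(X, y, quality_fn, t1, t2):
    """Step 1: drop low-quality originals; step 2: oversample with SMOTE;
    step 3: re-filter the balanced candidate set."""
    keep = quality_fn(X, y) >= t1
    X1, y1 = X[keep], y[keep]
    X2, y2 = SMOTE().fit_resample(X1, y1)
    keep = quality_fn(X2, y2) >= t2
    return X2[keep], y2[keep]
```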
There are some differences among these methods. In Ramentol et al.'s study, named
SMOTE-RSB∗ [24, 25], the first step does not appear, and in step 3 the algorithm removes only
low quality instances from the artificial instance set. In Verbiest's research [29], the quality
measurements differ between steps 1 and 3: the first measures performance for imbalanced
data while the last deals with balanced information. They therefore call the algorithm FRIPS-
SMOTE-FRBPS (hereinafter referred to as FRPSS in this paper). It is also claimed in [29]
that FRPSS produces better training sets than SMOTE-RSB∗.
On the other hand, measuring the quality of each instance and applying different thresholds
for the majority and minority classes can be a good approach for imbalanced datasets [16]. The
³Properties of the two datasets are described in Table 1.
idea is that choosing different thresholds results in removing more data from the majority
classes. In [27], each class has its own threshold for positive regions, so that
method can be placed in this group when applied to imbalanced datasets. In [16],
another method for sub-sampling the majority classes and oversampling the minority class was
proposed based on this idea. Some experiments have already shown considerable
improvement in classification performance.
There are two issues that need to be solved. Firstly, in some datasets, the quality
measurements of all instances are the same within each class; therefore it cannot be decided
which instances should be removed or retained. Secondly, in the previous study of
multiple-threshold fuzzy rough instance selection, there are no methods to optimize the
thresholds.
4. MULTIPLE THRESHOLDS FUZZY ROUGH INSTANCE SELECTION
As stated in the last section, the study in [16] was not complete. This section first
revises the two algorithms proposed in [16] and introduces a new method with some
mathematical definitions. It then presents methods to increase the standard deviation
so that the approximations are spread over a wider range of values. Last, we show the
methods of choosing thresholds for the algorithms.
4.1. The algorithms
In decision table $I = (U, A \cup \{d\})$, let $R_a$ and $\mu_{R_a}(x, y)$ denote a fuzzy relation
and its membership function, respectively, between objects $x, y \in U$ on attribute $a \in A$.
Then the membership function of the relation on an attribute set $P \subseteq A$ is defined as:

$$\mu_{R_P}(x, y) = \mathcal{T}_{a \in P}\, \mu_{R_a}(x, y) \quad (10)$$

where $\mathcal{T}$ is a t-norm.
In fact, there are many ways to calculate the membership function of a fuzzy relation
between two instances. It is also possible to use distance-based similarity, such as
normalized Euclidean distance. It depends on the characteristics of each system.
The membership functions of the approximation spaces for an object set $X \subseteq U$ on
attribute set $P \subseteq A$ can be defined by either equations (5) and (6) or (8) and (9). In this
study, as mentioned in Section 2, we must note that the decision table $I = (U, A \cup \{d\})$
is a single-label decision table such that $f_d(x)$ is a precise value. In fuzzy sets, for a
single-label decision table, $\mu_{C_i}(x) > 0$ if $f_d(x) = d_i$ and $\mu_{C_i}(x) = 0$ otherwise.
In selecting learning instances, we first define a function to measure the quality of an
instance for each class $X$ in terms of relation $R$:

$$\gamma_{\mathbf{X}}^{R}(x) = \alpha\, \mu_{\underline{appr}_R \mathbf{X}}(x) + (1 - \alpha)\, \mu_{\overline{appr}_R \mathbf{X}}(x) \quad (11)$$

where $\alpha \in [0, 1]$ and $\mathbf{X}$ is the fuzzy set on $X$. In the above equation, note that $x$ may or may
not belong to $X$; this is where this approach differs from other approaches. When
comparing $\gamma_{\mathbf{X}}^{R}(x)$ for $x \in X$, we can assume that $\alpha = 1$ as in the study of Jensen et al.
[13] and $\alpha = 0.5$ as in the approach of Verbiest et al. [29].
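As a small illustration, the quality measure in equation (11) is a convex combination of the two approximation memberships; a minimal sketch, reusing the `owa_approximations` helper sketched in Section 2:

```python
def instance_quality(lower, upper, alpha=0.5):
    # Eq. (11): gamma(x) = alpha * lower(x) + (1 - alpha) * upper(x).
    # `lower` and `upper` are the OWA approximation membership vectors.
    return alpha * lower + (1.0 - alpha) * upper
```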
In this paper, we focus on developing methods for balancing data with binary classes.
$X$ and $Y$ denote the two classes of an imbalanced dataset: the former is the minority class
and the latter is the majority class. The terms positive/negative classes are sometimes used
interchangeably with minority/majority classes in this article.
Now, let $t_X$ and $t_Y$ be the selection thresholds for the minority and majority classes, let $\mathbf{X}$
and $\mathbf{Y}$ be the fuzzy sets on minority class $X$ and majority class $Y$, respectively, and let $R$
denote the fuzzy relation on the universe. The set of instances selected for the training
phase can be defined as:

$$S = \{ x \in X \mid \gamma_{\mathbf{X}}^{R}(x) \ge t_X \} \cup \{ x \in Y \mid \gamma_{\mathbf{Y}}^{R}(x) \ge t_Y \} \quad (12)$$

The algorithm to select instances can then be described as shown in Algorithm 1.
Algorithm 1 - MFRIS1 - Choosing or eliminating instances for both majority and
minority classes
Require:
  $X, Y$: the minority and majority classes;
  $t_X, t_Y$: the selection thresholds for the minority and majority classes.
Ensure: Decision table $(S, A \cup \{d\})$
  Calculate the quality measurement of all instances for their classes
  $S \leftarrow \emptyset$
  for $x \in X$ do
    if $\gamma_{\mathbf{X}}^{R}(x) \ge t_X$ then
      $S \leftarrow S \cup \{x\}$
    end if
  end for
  for $x \in Y$ do
    if $\gamma_{\mathbf{Y}}^{R}(x) \ge t_Y$ then
      $S \leftarrow S \cup \{x\}$
    end if
  end for
In the first algorithm, depending on the label of each instance, the quality measurement of
the instance with respect to its class is compared with the minority or majority
threshold. This means that an instance will be deleted from the training set if it is of
low quality, even if it is in the minority class. The different thresholds may help us keep
more instances of the minority class if necessary.
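A minimal sketch of MFRIS1 as a boolean selection mask, assuming the per-instance qualities have already been computed (all names here are illustrative, not from the paper):

```python
import numpy as np

def mfris1(q_min, q_maj, is_minority, t_x, t_y):
    """Eq. (12): keep minority instances whose minority-class quality reaches
    t_x and majority instances whose majority-class quality reaches t_y.
    q_min, q_maj: per-instance qualities gamma_X(x), gamma_Y(x);
    is_minority: boolean mask of minority instances."""
    return (is_minority & (q_min >= t_x)) | (~is_minority & (q_maj >= t_y))
```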
In addition, we also propose a second algorithm to select instances. The idea of
this method is to keep all instances in the minority class while removing or relabeling
some instances in the majority class. To describe the algorithm, we first introduce some
definitions.
Let $\mathbf{X}, \mathbf{Y}$ denote the fuzzy sets on $X, Y$, respectively, and let $t_X$ and $t_Y$ be the thresholds for
the minority and majority classes. The set of instances whose labels could
be changed can be defined as:

$$S_{Y \to X} = \{ x \in Y \mid \gamma_{\mathbf{Y}}^{R}(x) < t_Y \wedge \gamma_{\mathbf{X}}^{R}(x) \ge t_X \} \quad (13)$$

From the above definition, there are some instances in the majority class whose labels
could be changed to the minority class label. We can then re-calculate the class
membership functions of an instance $x \in S_{Y \to X}$ as $\mu_{\mathbf{X}}(x) = \mu_{\mathbf{Y}}(x)$; $\mu_{\mathbf{Y}}(x) = 0$. Finally, the
selected instances for the training phase can be defined as:

$$S = X \cup \{ x \in Y \mid \gamma_{\mathbf{Y}}^{R}(x) \ge t_Y \} \cup S_{Y \to X} \quad (14)$$
The algorithm MFRIS2 is then described as shown in Algorithm 2.
Algorithm 2 - MFRIS2 - Choosing, removing or relabelling instances for majority
classes
Require:
  $X, Y$: the minority and majority classes;
  $t_X, t_Y$: the selection thresholds for the minority and majority classes.
Ensure: Decision table $(S, A \cup \{d\})$
  Calculate the quality measurement of all instances in the majority class for all
  classes
  $S_Y \leftarrow \emptyset$; $S_{Y \to X} \leftarrow \emptyset$
  for $x \in Y$ do
    if $\gamma_{\mathbf{Y}}^{R}(x) \ge t_Y$ then
      $S_Y \leftarrow S_Y \cup \{x\}$
    else if $\gamma_{\mathbf{X}}^{R}(x) \ge t_X$ then
      $\mu_{\mathbf{X}}(x) \leftarrow \mu_{\mathbf{Y}}(x)$; $\mu_{\mathbf{Y}}(x) \leftarrow 0$
      $S_{Y \to X} \leftarrow S_{Y \to X} \cup \{x\}$
    end if
  end for
  $S \leftarrow X \cup S_Y \cup S_{Y \to X}$
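A compact sketch of MFRIS2 over label arrays (the names and array layout are illustrative assumptions):

```python
import numpy as np

def mfris2(q_min, q_maj, y, minority_label, t_x, t_y):
    """Eqs. (13)-(14): keep the whole minority class; in the majority class,
    keep high-quality instances and relabel those that are low quality for Y
    but high quality for X. Returns (selection mask, updated labels)."""
    new_y = y.copy()
    is_maj = (y != minority_label)
    keep = ~is_maj | (q_maj >= t_y)                    # X plus quality Y items
    relabel = is_maj & (q_maj < t_y) & (q_min >= t_x)  # S_{Y->X}, Eq. (13)
    new_y[relabel] = minority_label
    return keep | relabel, new_y
```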
It was discussed in our earlier papers that the second algorithm works well on highly
imbalanced data. This is due to the oversampling of the minority class and the sub-sampling
of the majority side. However, if the quality of the original instances in the positive class is low, it
may consequently cause low performance. Therefore, at this point, another algorithm is
introduced for such kinds of datasets.
First, the set of negative instances whose labels need to be changed is defined by Equation (13). The quality
memberships of all instances, including the original and synthesized items, are then compared
with the thresholds to choose a valuable training set. The selected set is defined as follows:

$$S = \{ x \in X \cup S_{Y \to X} \mid \gamma_{\mathbf{X}}^{R}(x) \ge t_X \} \cup \{ x \in Y \mid \gamma_{\mathbf{Y}}^{R}(x) \ge t_Y \} \quad (15)$$
The algorithm MFRIS3 is then depicted as shown in Algorithm 3.
Algorithm 3 - MFRIS3 - Choosing instances for positive, removing or relabeling
instances for majority classes
Require:
  $X, Y$: the minority and majority classes;
  $t_X, t_Y$: the selection thresholds for the minority and majority classes.
Ensure: Decision table $(S, A \cup \{d\})$
  Calculate the quality measurement of all instances for all classes
  $S_X \leftarrow \emptyset$; $S_Y \leftarrow \emptyset$; $S_{Y \to X} \leftarrow \emptyset$
  for $x \in Y$ do
    if $\gamma_{\mathbf{Y}}^{R}(x) \ge t_Y$ then
      $S_Y \leftarrow S_Y \cup \{x\}$
    else if $\gamma_{\mathbf{X}}^{R}(x) \ge t_X$ then
      $\mu_{\mathbf{X}}(x) \leftarrow \mu_{\mathbf{Y}}(x)$; $\mu_{\mathbf{Y}}(x) \leftarrow 0$
      $S_{Y \to X} \leftarrow S_{Y \to X} \cup \{x\}$
    end if
  end for
  for $x \in X$ do
    if $\gamma_{\mathbf{X}}^{R}(x) \ge t_X$ then
      $S_X \leftarrow S_X \cup \{x\}$
    end if
  end for
  $S \leftarrow S_X \cup S_Y \cup S_{Y \to X}$
4.2. Thresholds Optimization and Granular Tuning
In [16], the researchers only illustrated improvement in classification performance by
changing the thresholds for the minority and majority classes. However, at that stage,
they had to choose thresholds manually for each dataset. Furthermore, in some cases, the
qualities of instances with respect to their classes are not distinguishable, which makes it
impossible to cleanse the dataset. The second row of Figure 2 shows an example of a histogram of
negative class approximations with low dispersion: all instances in the negative class have
the same average approximation membership of 0.8. Therefore, while optimizing
thresholds, we try to spread out the membership values whenever such issues occur.
Because instances are chosen or removed by comparing quality degrees with
thresholds, it is possible to use the qualities of the instances themselves as threshold candidates [29],
and then use a leave-one-out strategy to test and compute accuracy on the learning data.
However, this results in thousands of train-and-test rounds when the number of instances is large.
Figure 2. Histograms showing the distribution of approximations and average approximations for a dataset.
For the algorithms in this paper, we divide the qualities of the instances into groups by a
limited number of cuts and use these cuts as threshold candidates. For a dataset with
low dispersion of approximations/instance qualities, we first need to spread out the
degrees by changing the relations among instances. To do this, the membership
function of the relation between two objects is defined as follows:

$$\mu_{R_a}(x, y) = 1 - \frac{\delta_a(x, y)}{\sigma} \quad (16)$$

where $\delta_a(x, y) \in [0, 1]$ is the normalized distance between $x$ and $y$, $\sigma \in (0, 1]$ is called the
granularity, and memberships are truncated to $[0, 1]$.
For discrete values, $\mu_{R_a}(x, y) = 1$ if $f_a(x) = f_a(y)$, $f_a(x) = *$ or $f_a(y) = *$, and
$\mu_{R_a}(x, y) = 0$ otherwise. In the case of continuous values,

$$\delta_a(x, y) = \frac{|f_a(x) - f_a(y)|}{l(a)}$$

where $l(a)$ is the range of the value domain of $a$.
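A sketch of this relation construction for a continuous attribute, under the truncation assumption noted for equation (16):

```python
import numpy as np

def normalized_distance(values):
    # delta_a(x, y) = |f_a(x) - f_a(y)| / l(a) for one continuous attribute.
    rng = values.max() - values.min()
    d = np.abs(values[:, None] - values[None, :])
    return d / rng if rng > 0 else np.zeros_like(d)

def relation_membership(delta, sigma):
    # Eq. (16) with truncation: a smaller sigma makes the relation drop to zero
    # faster, which spreads the resulting instance qualities further apart.
    return np.clip(1.0 - delta / sigma, 0.0, 1.0)
```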
Figure 3. Standard deviation for the ozone_one_hr dataset.
Figure 3 depicts the change in the standard deviation of the negative instance qualities for the
ozone_one_hr dataset. The higher the standard deviation, the more spread out the data points
are. However, it is unnecessary to find the granularity that yields the maximal standard
deviation. For the three algorithms in this research, it suffices to find a granularity
that broadens the instance quality data points to some extent. Returning to Figure 2, the
first row shows that with granularity equal to 0.125, the approximations on the dataset
spread out more than with granularity equal to 1.0. Algorithm 4 shows the
method to tune the granularity.
Algorithm 4 - Tuning granularity
Require: $X, Y$: the minority and majority classes;
  $ngroups$: the number of threshold candidates.
Ensure: Granular quality measurements
  $\sigma \leftarrow 1$; $maxgroups \leftarrow 0$
  for $\sigma$ from 1 down to 0, step $-0.05$ do
    Calculate the quality measurements with $\sigma$
    $cut \leftarrow make\_cut(\{\gamma_{\mathbf{Y}}^{R}(x) \mid x \in Y\}, ngroups)$
    if $cut > maxgroups$ then
      $maxgroups \leftarrow cut$; store the qualities
    end if
    if $cut \ge ngroups$ then
      break
    end if
  end for
Note that granularity in this research differs from the study in [28]. In that
approach, the authors try to find a maximal granularity in $[0, \infty)$ for each instance: every instance
has its own granularity, by which the instance fully belongs to the positive region. In
this research, there is only one granularity for the whole system, used to spread out the
instance qualities.
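A sketch of this tuning loop, where `qualities_for_sigma` is a hypothetical callback recomputing the majority-class qualities for a given granularity, and the count of distinct quality levels stands in for the paper's make_cut candidate count:

```python
import numpy as np

def tune_granularity(qualities_for_sigma, ngroups, step=0.05):
    """Algorithm 4 sketch: scan sigma from 1 downward, keeping the granularity
    that yields the most distinct quality levels, and stop early once enough
    threshold candidates are available."""
    best_sigma, best_cuts = 1.0, 0
    for sigma in np.arange(1.0, 0.0, -step):
        q = qualities_for_sigma(sigma)
        cuts = len(np.unique(np.round(q, 6)))  # distinct quality levels
        if cuts > best_cuts:
            best_sigma, best_cuts = sigma, cuts
        if cuts >= ngroups:
            break
    return best_sigma
```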
The tuned granularity is then used to find optimal thresholds as depicted in Algorithm 5.
Algorithm 5 - Optimizing thresholds
Require:
  $X, Y$: the minority and majority classes;
  $ngroups$: the number of threshold candidates for each class;
  $\gamma_{\mathbf{X}}^{R}(x), \gamma_{\mathbf{Y}}^{R}(x), x \in U$: the qualities for both $X$ and $Y$.
Ensure: Optimal thresholds
  if MFRIS1 then
    $c(t_X) \leftarrow cut(\{\gamma_{\mathbf{X}}^{R}(x) \mid x \in X\}, ngroups)$
    $c(t_Y) \leftarrow cut(\{\gamma_{\mathbf{Y}}^{R}(x) \mid x \in Y\}, ngroups)$
  else if MFRIS2 then
    $c(t_X) \leftarrow cut(\{\gamma_{\mathbf{X}}^{R}(x) \mid x \in Y\}, ngroups)$
    $c(t_Y) \leftarrow cut(\{\gamma_{\mathbf{Y}}^{R}(x) \mid x \in Y\}, ngroups)$
  else if MFRIS3 then
    $c(t_X) \leftarrow cut(\{\gamma_{\mathbf{X}}^{R}(x) \mid x \in X\}, ngroups) \cup cut(\{\gamma_{\mathbf{X}}^{R}(x) \mid x \in Y\}, ngroups)$
    $c(t_Y) \leftarrow cut(\{\gamma_{\mathbf{Y}}^{R}(x) \mid x \in Y\}, ngroups)$
  end if
  for $t_X \in c(t_X)$, $t_Y \in c(t_Y)$ do
    $TrainSet \leftarrow MFRIS(U, t_X, t_Y)$ by the chosen algorithm
    for $x \in U$ do
      $TrainSet \leftarrow TrainSet - \{x\}$
      Determine the decision of $x$ based on $\mu_R(x, y), y \in TrainSet$
    end for
    Calculate $AUC(t_X, t_Y)$
  end for
  Optimize: $(t_X, t_Y) = \arg\max AUC(t_X, t_Y)$
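A grid-search sketch of this optimization, where `select_fn` and `loo_auc` are hypothetical callbacks wrapping one of the MFRIS algorithms and a leave-one-out AUC evaluation, respectively:

```python
from itertools import product

def optimize_thresholds(cand_tx, cand_ty, select_fn, loo_auc):
    """Algorithm 5 sketch: try every candidate threshold pair, score the
    selected training set by leave-one-out AUC, and return the best pair."""
    best_tx, best_ty, best_auc = None, None, -1.0
    for tx, ty in product(cand_tx, cand_ty):
        auc = loo_auc(select_fn(tx, ty))
        if auc > best_auc:
            best_tx, best_ty, best_auc = tx, ty, auc
    return best_tx, best_ty
```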
5. EXPERIMENTS
5.1. Setting up experiments
Experiments were conducted on datasets from the KEEL dataset repository [1] and the UCI
Machine Learning Repository [2]. The datasets and their properties are shown in Table 1. In
this table, "IR" shows the imbalance ratio between the numbers of instances in the majority and
minority classes. Each dataset is divided into three parts for both the minority and majority
classes. For each train/test run, we used two parts to form the training data and the rest were
used as testing instances.
Experiments were conducted using R running in RStudio.
Table 1. Datasets for experiments.
Datasets Instances Features IR
iris0 150 4 2.00
Indian Liver Patient 579 10 2.51
vehicle3 846 18 2.99
glass-0-1-2-3 vs 4-5-6 214 9 3.20
new-thyroid1 215 5 5.14
newthyroid2 215 5 5.14
segment0 2308 19 6.02
glass6 214 9 6.38
paw02a-800-7-30-BI 800 2 7.00
glass-0-1-5_vs_2 172 9 9.12
yeast-0-2-5-6 vs 3-7-8-9 1004 8 9.14
yeast-0-2-5-7-9 vs 3-6-8 1004 8 9.14
ecoli-0-4-6 vs 5 203 6 9.15
ecoli-0-6-7 vs 5 220 6 10.00
ecoli-0-1-4-7 vs 2-3-5-6 336 7 10.59
ecoli-0-1 vs 5 240 6 11.00
glass2 214 9 11.59
cleveland-0 vs 4 177 13 12.62
ecoli-0-1-4-6 vs 5 280 6 13.00
ozone eight hr 1847 72 13.43
shuttle-c0-vs-c4 1829 9 13.87
glass4 214 9 15.46
abalone9-18 731 8 16.40
yeast-1-4-5-8 vs 7 693 8 22.10
glass5 214 9 22.78
kr-vs-k-one vs fifteen 2244 6 27.77
yeast4 1484 8 28.10
winequality-red-4 1599 11 29.17
yeast-1-2-8-9 vs 7 947 8 30.57
ozone one hr 1848 72 31.42
yeast5 1484 8 32.73
winequality-red-8 vs 6 656 11 35.44
ecoli-0-1-3-7 vs 2-6 280 7 39.00
yeast6 1484 8 41.40
winequality-red-8 vs 6-7 855 11 46.50
abalone-19 vs 10-11-12-13 1622 8 49.69
shuttle-2 vs 5 3316 9 66.67
winequality-red-3 vs 5 691 11 68.10
poker-8-9 vs 5 2075 10 82.00
The training data are first cleaned by the instance selection algorithms. For the rough set based
instance selection methods, we measure the fuzzy tolerance relation on an attribute by the
membership function $\mu_{R_a}(x, y)$ as discussed in Section 4.2.
The fuzzy tolerance relation on a set of attributes can be calculated by combining the
relations on each attribute with Lukasiewicz's t-norm $\mathcal{T}(x_1, x_2) = \max(x_1 + x_2 - 1, 0)$. For
the methods which use OWA fuzzy rough sets, the OWA operators are set as follows:

$$W^{\min}: \quad w_{n+1-i}^{\min} = \frac{2^{3-i}}{2^3 - 1} \text{ for } i = 1, 2, 3 \text{ and } w_j^{\min} = 0 \text{ for } j = 1, \dots, n - 3,$$

$$W^{\max}: \quad w_i^{\max} = \frac{2^{3-i}}{2^3 - 1} \text{ for } i = 1, 2, 3 \text{ and } w_i^{\max} = 0 \text{ for } i = 4, \dots, n,$$

where $n$ is the number of instances in the dataset.
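In other words, the three non-zero weights are $4/7$, $2/7$ and $1/7$. A small helper to build these vectors (illustrative, intended for use with the `owa_approximations` sketch from Section 2):

```python
import numpy as np

def owa_weights(n, k=3):
    """Exponential OWA weights: the k non-zero weights are 2^(k-1)/(2^k - 1),
    ..., 1/(2^k - 1). W_max places them on the largest values (decreasing),
    W_min mirrors them onto the smallest values (increasing)."""
    w = 2.0 ** np.arange(k - 1, -1, -1) / (2.0 ** k - 1)  # [4/7, 2/7, 1/7]
    w_max = np.concatenate([w, np.zeros(n - k)])
    w_min = w_max[::-1].copy()
    return w_min, w_max
```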
For our approach, we also use OWA fuzzy rough sets with the above operators and set
$\alpha = 0.5$ to calculate $\gamma_{\mathbf{X}}^{R}$. That is the same as the instance quality measurement in FRIPS-
SMOTE-FRBPS in terms of distinguishing instance qualities.