The performance issue model classifies performance issues into five categories

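| Metric | MARS (Features) | MARS (PCA) | RF (Features) | RF (PCA) | GPR (Features) | GPR (PCA) |
|---|---|---|---|---|---|---|
| Accuracy | 0.74 | 0.64 | 0.90 | 0.74 | 0.90 | 0.67 |
| Precision | 0.81 | 0.66 | 0.90 | 0.74 | 0.93 | 0.67 |
| MCC | 0.55 | -0.11 | 0.78 | 0.36 | 0.79 | 0 |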

Table 2. Performance of classification models on test data

| Metric | RF | GPR |
|---|---|---|
| Accuracy | 0.89 | 0.89 |
| Precision | 0.92 | 0.87 |
| MCC | 0.72 | 0.72 |

For all of the machine learning techniques tested, the classification models using the model-selected features yielded better performance (Table 1). This suggests that while the principal components successfully explain the variance in the data, they fail to accurately characterize the relationship between the features and the cardiomyocyte content. RF models and GPR had similar performance with an accuracy and precision both of about 90%, while MARS models did not perform as accurately.

The performances of the RF and GPR classification models trained using the model-selected features were evaluated on the test data (Table 2). Both classification models performed comparably for the test data with an accuracy of 89%, precisions near 90%, and MCC values of 0.72. The results obtained for the test data are comparable to those obtained from LOO cross validation on the training data, indicating that the models accurately captured the relationship between the features and the cardiomyocyte content, while avoiding overfitting.
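The three metrics reported in the tables can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' code; the function name `binary_metrics` and the counts used are hypothetical and chosen only for illustration.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, MCC) from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Precision is undefined when nothing is predicted positive; use 0.0 here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, precision, mcc

# Hypothetical counts, not taken from the study.
acc, prec, mcc = binary_metrics(tp=40, fp=4, tn=10, fn=2)

# A model that always predicts the positive class.
const_acc, const_prec, const_mcc = binary_metrics(tp=10, fp=5, tn=0, fn=0)
```

The second call shows why MCC is a useful complement to accuracy: a constant predictor on imbalanced data can reach an accuracy and precision of about 0.67 while its MCC is 0. The source does not say whether this explains any particular entry in the tables.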

URL: https://www.sciencedirect.com/science/article/pii/B9780128233771502743

Internet of animal health things (IoAT): A new frontier in animal biometrics and data analytics research

Santosh Kumar, ... Mithilesh K. Chaube, in IoT-Based Data Analytics for the Healthcare Industry, 2021

3.4 Classification module

A classification model has been built using supervised machine learning techniques to perform animal disease prediction over sensory data (shown in Fig. 3).

The sensory database is prepared from sensor devices fitted to the animal's body. The data are collected by intelligent sensors such as accelerometers, gyroscopes, temperature sensors (fitted to the animal's neck), load (weight) sensors, microphones, heartbeat sensors, electrocardiogram (ECG) recording sensors, and gas sensors (attached to the animal's nose, or muzzle).

Furthermore, to achieve better system performance, we have also captured multimedia databases (the animal's video and images, including complete profile, backside, frontal, and left-side body images) using surveillance systems. The multimodal database supports better analysis alongside the collected sensory data.

The sensory and multimedia databases are stored in the distributed integrated database system to train the learning models (shown in Fig. 3).

If the classification model fails to predict the correct health status (healthy or unhealthy) of an animal using supervised machine learning techniques, veterinary experts can then analyze the animal's health status quickly using its profile images and videos. The proposed model's solutions can be used for future applications such as real-time monitoring of animal health, clinical decision systems, animal tracking, and disease classification of animals.

It can also help establish a new avenue for smart farming using smart sensing models or disruptive technologies in precision agriculture frameworks.

URL: https://www.sciencedirect.com/science/article/pii/B978012821472500003X

Handbook of Statistics

N. Mohanty, ... T.M. Rath, in Handbook of Statistics, 2013

4 Features

Good classification models are not sufficient to appropriately classify and retrieve images but instead have to work in conjunction with good features that suitably characterize the images. In the case of shape-related images it is frequently desired that the features be invariant to rotation, translation, and change in scale. Latecki et al. (2000) compared a number of different shape-related features that would possess the above-mentioned characteristics. The broadly defined shape descriptors are classified into two categories:

1. Contour-based descriptors: The contour of an object is obtained and represented appropriately, which serves as a shape descriptor.

2. Image-based descriptors: The shape descriptor is obtained by summing the pixel values in the image and deriving a number of parameters from it.

The features described in this paper fall into the first category. In the following sections we will describe how the features were extracted including the preprocessing involved. Following the description of the representation used to characterize the contour of the object we will present a set of operations performed to ensure that the shape descriptors are invariant to translation, rotation, and change in scale. The paper describes two different sets of features to characterize the image where both sets of features are derived from the contour of an image.

4.1 Preprocessing

In order to extract features we assume that each object can be characterized by a single closed contour. To ensure this we preprocess each image by filling in all the holes and breaks in the contour of the object.

4.2 Centroid distance function

Our first feature set involved extracting a single feature vector from each object using a centroid distance function (Zhang and Lu, 2003). We calculate the distance between every point on the boundary of the object and the centroid of the object. The feature results from recording one number per pixel in the contour, thus creating a time series whose length equals the length of the contour. The centroid distance function is given below:

(20) $s(t) = \left[(x(t) - x_0)^2 + (y(t) - y_0)^2\right]^{1/2},$

where $x_0$ and $y_0$ are the coordinates of the centroid in the x and y directions, respectively. The centroid distance function provides us with a unique shape signature for each object.
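Eq. (20) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the contour is given as an ordered list of (x, y) boundary points, and the centroid is taken as the mean of those boundary points (the source does not say whether the region centroid is used instead). The name `centroid_distance` is hypothetical.

```python
import math

def centroid_distance(contour):
    """Distance from each boundary point to the centroid, in traversal order.

    contour: list of (x, y) tuples along a single closed contour.
    Assumption: the centroid is the mean of the boundary points.
    """
    n = len(contour)
    x0 = sum(x for x, _ in contour) / n
    y0 = sum(y for _, y in contour) / n
    return [math.hypot(x - x0, y - y0) for x, y in contour]

# Boundary of a small square, traversed point by point.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
signature = centroid_distance(square)
```

For the square, the signature alternates between the corner distance and the edge-midpoint distance, which gives the periodic series that the later Fourier step summarizes.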

4.3 Profile feature set

This feature set representation of the shape of the object is based on a modification of the features described by Rath and Manmatha (2003) for the purpose of handwriting recognition and retrieval. The features are:

1. Horizontal projection profile: The value of each column is the number of white pixels in the column.

2. Vertical projection profile: The value of each row is the number of white pixels in the row.

3. Upper shape profile: The value of each column is the distance between the top of the image and the first white pixel in the column.

4. Lower shape profile: Same as the upper shape profile except that it is the distance from the bottom of the image.

5. Right shape profile: Distance from the right of the image.

6. Left shape profile: Distance from the left of the image.

An example of the projection profile for an image labeled “camel” can be seen in Fig. 10.2.
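The six profiles listed above can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes a binary image stored as a list of rows with 1 for white (object) pixels, assumes the left and right profiles are measured per row, and assigns the full image extent to rows or columns containing no white pixel. The name `profile_features` is hypothetical.

```python
def profile_features(img):
    """Compute the six profiles for a binary image (list of rows of 0/1)."""
    h, w = len(img), len(img[0])
    cols = [[img[r][c] for r in range(h)] for c in range(w)]

    def first_white(seq):
        # Index of the first white pixel; len(seq) if none exists (assumption).
        return next((i for i, v in enumerate(seq) if v), len(seq))

    return {
        "horizontal": [sum(col) for col in cols],          # white count per column
        "vertical": [sum(row) for row in img],             # white count per row
        "upper": [first_white(col) for col in cols],       # distance from top
        "lower": [first_white(col[::-1]) for col in cols], # distance from bottom
        "left": [first_white(row) for row in img],         # distance from left
        "right": [first_white(row[::-1]) for row in img],  # distance from right
    }

img = [
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
profiles = profile_features(img)
```

Each profile is a one-dimensional curve whose length depends on the image width or height, which is why a fixed-length summary is needed before classification.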

Fig. 1. Sample images and associated labels from the MPEG-7 database.

Fig. 2. Preprocessed image of a camel and its projection profile.

4.4 Fourier transform

Both the centroid distance function and the profile features capture the shape of an object in great detail. However, the length of the feature vector varies depending on the object and our models for classification and image retrieval require a fixed length feature vector. In order to convert each of the aforementioned profiles into a fixed length feature vector we compute its Discrete Fourier Transform.

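We perform the Discrete Fourier Transform on the time series $s = s_0, s_1, \ldots, s_{n-1}$, where $s_t = s(t)$. This results in a frequency space representation $S = S_0, S_1, \ldots, S_{n-1}$, where for $0 \leqslant k \leqslant n-1$

(21) $S_k = \sum_{t=0}^{n-1} s_t \, e^{-2\pi i k t / n}.$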

The DFT takes into account that different objects are of different sizes since a single period of the DFT basis function is equal to the number of sample points. The lower order coefficients provide an overall shape representation while the higher order coefficients provide the details of the shape.

From the DFT of the centroid distance function we extracted the first 40 coefficients and used them as features. These lower order coefficients are sufficient features since they are a good approximation of the complete frequency space representation and the goal is not to represent the original shape in great detail but rather to get a good approximation of it with a fixed number of feature values.

In the case of the profile features, each profile is a one-dimensional curve. We computed the DFT of each profile and extracted its first seven coefficients. Therefore, 6 × 7 = 42 features are obtained from the profiles. We do not consider higher order coefficients since they frequently contain information about noise. Combined with the 40 features from the centroid distance function, this gives a total of 82 features which are used to describe the shape of each object.
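The assembly of the 82-dimensional feature vector can be sketched as below. This is a sketch under assumptions: a direct DFT is written out with the standard library, magnitudes are used as the feature values, and series shorter than the requested number of coefficients are zero-padded (the source does not discuss that case). The names `dft_coeffs`, `fixed_length_features` and `shape_features` are hypothetical.

```python
import cmath

def dft_coeffs(series, n_coeffs):
    """First n_coeffs DFT coefficients, S_k = sum_t s_t * exp(-2*pi*i*k*t/n)."""
    n = len(series)
    return [
        sum(s * cmath.exp(-2j * cmath.pi * k * t / n) for t, s in enumerate(series))
        for k in range(min(n_coeffs, n))
    ]

def fixed_length_features(series, n_coeffs):
    """Magnitudes of the leading coefficients, zero-padded if the series is short."""
    mags = [abs(c) for c in dft_coeffs(series, n_coeffs)]
    return mags + [0.0] * (n_coeffs - len(mags))

def shape_features(signature, profiles):
    """40 features from the centroid distance signature plus 7 per profile."""
    feats = fixed_length_features(signature, 40)
    for profile in profiles:
        feats += fixed_length_features(profile, 7)
    return feats

# Hypothetical inputs of differing lengths.
signature = [1.0 + 0.1 * (t % 5) for t in range(50)]
profiles = [[float((t * 3) % 7) for t in range(n)] for n in (20, 30, 20, 20, 30, 30)]
features = shape_features(signature, profiles)
flat = dft_coeffs([1.0, 1.0, 1.0, 1.0], 4)
```

Whatever the lengths of the input curves, the output always has 40 + 6 × 7 = 82 entries, which is the property the classifier requires.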

4.5 Invariance

Shapes generated through scaling, rotation, and translation of a shape are similar to the original shape, and the shape descriptors chosen should be invariant to these operations (Zhang and Lu, 2003). It is possible to adjust the Fourier descriptors such that they are invariant to changes in scaling, rotation, translation, and a change in starting point of the contour. Let $s(t)$ be the original shape signature and $S_0, S_1, \ldots, S_{n-1}$ be its corresponding Fourier representation. We note that we only use the magnitude of the Fourier descriptors. The operations on shapes and the resulting Fourier coefficients can be described as follows:

1. Change in starting point: A change in starting point can be expressed as

(22) $s'(t) = s(t + \tau),$

where $s(t)$ describes the original contour and $\tau$ is the offset of the new starting point. The new Fourier coefficient may be expressed as

(23) $S_k' = e^{2\pi i k \tau / n} S_k,$

where $S_k$ is the $k$th Fourier coefficient of the original shape. The exponential factor has unit modulus, so the magnitude of the Fourier coefficient is invariant under a change in starting point.

2. Rotation: Given that the center of the object is at the origin, rotation of a curve $s(t)$ about the origin by an angle $\theta$ results in a new curve

(24) $s'(t) = s(t) e^{i\theta}.$

The Fourier coefficients of this new contour may be expressed as

(25) $S_k' = S_k e^{i\theta}.$

Again, the magnitude of the Fourier coefficient is invariant under rotation.

3. Scaling: The scaling of a curve can be expressed as

(26) $s'(t) = h \cdot s(t),$

where $h$ is the scaling constant. This results in the new Fourier coefficient

(27) $S_k' = h \cdot S_k.$

If we rescale the Fourier coefficients so that one of the coefficients is always one, then the Fourier coefficients are also invariant under scaling.

4. Translation: The translation of a shape may be expressed as

(28) $s'(t) = s(t) + c,$

where $c$ is a constant by which the shape is translated. This results in new Fourier coefficients that can be expressed as follows:

(29) $S_k' = \begin{cases} S_k + nc & \text{if } k = 0, \\ S_k & \text{otherwise,} \end{cases}$

where the factor $n$ comes from the unnormalized sum in Eq. (21).

Hence, all the Fourier coefficients are invariant to translation except the first coefficient. The first coefficient is not a good descriptor of the shape since it only reflects the position or average scale of the shape (Zhang and Lu, 2003), and so it can be ignored while examining the shape of a contour.
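The four invariances can be checked numerically. The sketch below is illustrative only: it assumes a real-valued signature, drops $S_0$ for translation, and divides by $|S_1|$ for scale (the source only says that "one of the coefficients" is made equal to one, so the choice of $S_1$ is an assumption). The names `dft` and `invariant_descriptor` are hypothetical.

```python
import cmath

def dft(series):
    """Direct DFT, S_k = sum_t s_t * exp(-2*pi*i*k*t/n)."""
    n = len(series)
    return [
        sum(s * cmath.exp(-2j * cmath.pi * k * t / n) for t, s in enumerate(series))
        for k in range(n)
    ]

def invariant_descriptor(series, n_coeffs):
    """Magnitudes of S_1..S_n_coeffs divided by |S_1|; S_0 is discarded."""
    mags = [abs(c) for c in dft(series)]
    # Assumption: |S_1| is non-zero for the shapes of interest.
    return [m / mags[1] for m in mags[1:n_coeffs + 1]]

base = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
shifted = base[3:] + base[:3]            # change in starting point
scaled = [2.5 * s for s in base]         # scaling by h = 2.5
translated = [s + 7.0 for s in base]     # translation by c = 7

reference = invariant_descriptor(base, 4)
variants = [invariant_descriptor(v, 4) for v in (shifted, scaled, translated)]
```

All three variants give the same descriptor as the original series up to floating-point error, because the phase factor is removed by taking magnitudes, the scale factor cancels in the ratio, and the constant offset only affects the discarded $S_0$.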

URL: https://www.sciencedirect.com/science/article/pii/B9780444538598000102

Data Fusion Methodology and Applications

D. Ballabio, ... V. Consonni, in Data Handling in Science and Technology, 2019

3.3 Validation Protocol

Performances of classification models calibrated on the single analytical sources and on high-level data fusion approaches were estimated through a strict validation protocol based on random resampling (Fig. 5.3). This protocol was adopted to get realistic estimations of predictive performances despite the low number of available samples. The following steps were thus iterated 1000 times:

Figure 5.3. Scheme of the validation procedure.

1. Divide I samples into training and test sets, containing 75% and 25% of the total number of considered samples (Itrain and Itest), respectively. The selection was performed maintaining the class proportions, that is, the number of test samples of each class was proportional to the number of training samples of that class.

2. Use the Itrain training samples to build the classification models for each of the B available analytical sources (blocks). Use cross-validation on training samples (fivefold venetian blinds) to select the optimal parameters of models (i.e., the number of latent variables when dealing with PLSDA, the number of neighbors for kNN, and the number of principal components for PCA-LDA).

3. Predict the class of the Itest test samples by means of the B models previously calibrated on the training set.
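The resampling part of the steps above can be sketched as follows. This is a simplified illustration, not the authors' code: it covers only the stratified 75/25 split and the fivefold venetian-blinds fold assignment (every fifth sample goes to the same fold), and leaves model fitting out. The function names and the class sizes are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, rng=random):
    """Return (train_idx, test_idx) with class proportions preserved."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        idx = idx[:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test += idx[:n_test]
        train += idx[n_test:]
    return sorted(train), sorted(test)

def venetian_blinds(n_samples, n_folds=5):
    """Yield (train, validation) index lists; sample i belongs to fold i % n_folds."""
    for f in range(n_folds):
        val = [i for i in range(n_samples) if i % n_folds == f]
        trn = [i for i in range(n_samples) if i % n_folds != f]
        yield trn, val

# Hypothetical data set: 40 samples of class A and 20 of class B.
labels = ["A"] * 40 + ["B"] * 20
rng = random.Random(0)
splits = [stratified_split(labels, 0.25, rng) for _ in range(1000)]
folds = list(venetian_blinds(10, 5))
```

In a full pipeline, each of the 1000 splits would be followed by parameter selection on the venetian-blinds folds of the training set and prediction of the held-out test samples.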

At the end of the 1000 iterations, experimental class labels and cross-validated classification of training samples were concatenated (thus obtaining for each classification model vectors of size 1000 times Itrain) and used to calculate likelihood and conditional probabilities. This procedure was applied to get a better estimation of these probabilities, owing to the lack of samples. Finally, experimental class labels and predictions of test samples were again concatenated (thus obtaining for each classification model vectors of size 1000 times Itest) to calculate the predictive performances for each classification model. Predictions of test samples were thus organized into a matrix P with 1000 × Itest rows and B columns, where each entry pib represents the prediction achieved on a specific test sample along the 1000 iterations by means of the classification model of the b-th analytical block.

The P matrix, collecting predictions achieved on the test samples along the validation procedure, was then used to integrate the outcomes and calculate the final fused predictions by means of the high-level data fusion approaches. Thus, for each high-level method, each row of the P matrix was independently integrated and 1000 × Itest fused predictions were obtained. Finally, classification performances for each fusion method were determined by comparing the 1000 × Itest fused predictions with the experimental class of the corresponding test samples.

The model classification ability was evaluated on the basis of non error rate (NER), which is the arithmetic mean of the class sensitivity values. The sensitivity of a class represents the ability of a classifier to correctly identify the samples of the class and is calculated as the ratio of correctly assigned samples of the class over the total number of samples of the class [31,32].
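The NER definition translates directly into code. The following is a minimal sketch with hypothetical labels; the name `non_error_rate` is not from the source.

```python
def non_error_rate(y_true, y_pred):
    """Arithmetic mean of the per-class sensitivities."""
    classes = sorted(set(y_true))
    sensitivities = []
    for c in classes:
        n_class = sum(1 for t in y_true if t == c)
        n_correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        sensitivities.append(n_correct / n_class)
    return sum(sensitivities) / len(sensitivities)

# Hypothetical labels: class A sensitivity is 3/4, class B sensitivity is 1/2.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "A"]
ner = non_error_rate(y_true, y_pred)
```

Because each class contributes equally regardless of its size, NER is less flattering than plain accuracy when the classes are unbalanced, as in this small example.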

URL: https://www.sciencedirect.com/science/article/pii/B9780444639844000053

Information Security Essentials for IT Managers

Albert Caballero, in Managing Information Security (Second Edition), 2014

Data Classification

Various data classification models are available for different environments. Some security models focus on the confidentiality of the data (such as Bell-LaPadula) and use different classifications. For example, the U.S. military uses a model that goes from most confidential (Top Secret) to least confidential (Unclassified) to classify the data on any given system. On the other hand, most corporate entities prefer a model whereby they classify data by business unit (HR, Marketing, R&D …) or use terms such as Company Confidential to define items that should not be shared with the public. Other security models focus on the integrity of the data (for example, Biba); yet others are expressed by mapping security policies to data classification (for example, Clark-Wilson). In every case there are areas that require special attention and clarification.

URL: https://www.sciencedirect.com/science/article/pii/B9780124166882000015

A Neuro-Fuzzy Inference Model for Diabetic Retinopathy Classification

Mohammed Imran, Sarah A. Alsuhaibani, in Intelligent Data Analysis for Biomedical Applications, 2019

Several DR classification models were developed to detect and classify DR using different techniques. In Ref. [8], Parmar et al. proposed a model to detect DR from retinal images and classify the images into five classes. The proposed model was based on deep convolutional neural networks. The model yielded an accuracy of 85%. In Ref. [9], Kumar et al. presented a model for retinal classification using two-field mydriatic fundus photography. First, the authors located the optic disc by using multilevel wavelet decomposition and recursive region growing. Then, the blood vessels were extracted by using histogram analysis on the two median filtered images. Microaneurysms and hemorrhages were detected using three-stage intensity transformation, while exudates were detected by using multilevel histogram analysis. Finally, the extracted lesions were aggregated to classify the image as being affected by DR or not. The model yielded a sensitivity of 80% and a specificity of 50%. In Ref. [10], Xu et al. used a deep convolutional neural network method to classify diabetic retinopathy using a color fundus image. Their method yielded an accuracy rate of 94.5%. In Ref. [11], Islam et al. proposed a DR classification model based on a bag-of-words (BOW) approach with support vector machines (SVM) to classify normal and abnormal retinas. This model yielded an accuracy of 94.4%, precision of 94%, and recall of 94%. In Ref. [12], Ravishankar et al. proposed a new model to find the approximate location of the optic disk by using the intersection found from the major blood vessels detection model. They also used different morphological operations to detect DR features like exudates and microaneurysms. They evaluated the algorithm on a database of 516 images with different contrast, illumination, and disease stages.
The algorithm achieved 97.1% accuracy for optic disk localization, a sensitivity and specificity of 95.7% and 94.2%, respectively, for exudate detection, and 95.1% and 90.5% for microaneurysm and hemorrhage detection. In Ref. [13], Zhang et al. proposed a bright lesion detection and classification model using local contrast enhancement with fuzzy C-means and a support vector machine (SVM) classifier. The proposed model achieved a sensitivity of 97% and specificity of 96% in classifying bright lesions and bright nonlesions, and a sensitivity of 88% and specificity of 84% in classifying exudates and cotton wool spots. In Ref. [14], Wang et al. proposed a classification model to classify retinal images into two classes, normal and abnormal, by classifying each pixel as lesion or nonlesion using a Bayesian statistical classifier. The authors achieved an accuracy of 100% in classifying all the retinal images with exudates, and 70% accuracy in classifying normal retinal images as normal. In Ref. [15], Thin et al. used recursive region growing segmentation algorithms in combination with the Moat operator to classify DR. The Moat operator raises the contrast between the retinal red lesions and the background to make red lesion segmentation easier [16]. The proposed classification model yielded a sensitivity of 88.5% and a specificity of 99.7% for exudate detection, and a sensitivity of 77.5% and specificity of 88.7% for hemorrhage and microaneurysm detection. In Ref. [17], Thin et al. classified the retinal changes in digital retinal images as normal, abnormal, and unknown, by first applying a local contrast enhancement technique in the preprocessing stage. Then, the retinal landmarks were eliminated from the fundus image. Finally, DR signs were recognized by using two algorithms, recursive region growing segmentation and color and template matching, to detect exudates and hemorrhages, respectively.
In the end, all the information was collected and construed as normal, abnormal, or unknown. The proposed system yielded a sensitivity of 74.8% and specificity of 82.7%. In Ref. [18], Kahai et al. proposed the use of a decision support system (DSS) for DR classification. The authors used Bayes optimality criteria to detect microaneurysms. The proposed model yielded a sensitivity of 100% and specificity of 67%. In Ref. [19], Antal et al. introduced an ensemble-based method to detect DR. The model was built on features extracted from the output of numerous retinal image processing algorithms, such as image-level features (quality and prescreening), disease signs such as microaneurysms and exudates, and retinal landmarks such as the macula and the optic disc. The decision on the presence of the disease is computed by an ensemble of machine learning classifiers. The authors used the public Messidor dataset to test the model, where 90% sensitivity, 91% specificity, 90% accuracy, and an AUC of 0.989 were reached. In Ref. [20], Acharya et al. proposed a method to automatically identify normal, mild, moderate, severe, and proliferative DR by using higher-order spectra (HOS). First, the features of the raw image were extracted using HOS. Then, a support vector machine (SVM) classifier was used to classify the DR level. The proposed model yielded a sensitivity of 82% and specificity of 88%. In Ref. [21], Kavitha et al. proposed a model for exudate detection by using ANFIS. Features such as texture and homogeneity properties and the area of the exudates were used as input to ANFIS. The classifier was tested on 200 fundus images and achieved an accuracy of 99.5%.

Some researchers proposed different models to classify the severity of DR using different classification methods, but most of these methods depend on features extracted from the retinal fundus images without taking into consideration the patient's history or the disease risk factors. For example, in Ref. [22], Choomchuay et al. proposed a model to classify the severity of DR. First, the authors extracted the lesions on the retina; then different features such as area, perimeter, and count were used with an artificial neural network (ANN) to classify the disease severity. This model achieved a classification accuracy of 96%. Also, in Ref. [23], Anidha et al. proposed a model for exudate detection and DR severity classification using ANFIS. The authors performed different preprocessing techniques on the retinal images, such as green channel extraction, median filtering, and histogram equalization. Then a morphological operation was applied to the image. Some of the retinal features were eliminated by using the connected component technique and statistical features such as exudate area, size, and color. Finally, the images were classified as normal eye or affected eye. The model yielded an accuracy of 100% in classifying retinal images at the pre-proliferative stage, and 97% in classifying retinal images at the proliferative stage.

This chapter aims to introduce an automated method that classifies the DR severity level into four classes using ANFIS by taking into consideration both DR signs and risk factors. To the best of our knowledge, no such work has been done before.

URL: https://www.sciencedirect.com/science/article/pii/B9780128155530000070

Biosignal time-series analysis

Serkan Kiranyaz, ... Moncef Gabbouj, in Deep Learning for Robot Perception and Cognition, 2022

19.3.2.4 Development and validation of classification model

Several different classification models were investigated in this study: multilayer perceptron (MLP), random forest, support vector machine (SVM), and logistic regression. The top-ranked 10 features from two imputation techniques were investigated to identify the best-performing feature combinations. Each feature was therefore investigated individually and in combination (top 2, top 3, top 4, etc.) using different machine learning models. Since this work not only aims to classify the death and survival groups using an important feature matrix but also intends to develop a prediction scoring system using the nomogram technique, it was important to identify a common classification and nomogram development technique. Logistic regression is a supervised machine learning (ML) method dedicated to classification tasks [60] and is a very useful tool for binary classification [61]. Moreover, logistic regression uses the sigmoid function, which converts linear inputs into a probability between 0 and 1, making it suitable for clinical application. Another important reason for choosing a supervised logistic regression classifier [62] is that it can also be used for nomogram development. This ensures coherent performance of the classification and regression models.

The different combinations of features as discussed earlier were validated using 5-fold cross-validation and receiver operating characteristics (ROC) curves were used to calculate the area under the curve (AUC) for the predictor variables separately and also in combination. The AUC values for different combinations of top-ranked features were investigated for the binary classification. Several performance metrics including AUC values, specificity, sensitivity, negative likelihood ratio (NLR), and positive likelihood ratio (PLR) were calculated to evaluate the models. For the five unseen test folds, the overall confusion matrix was calculated and the per-class metrics were computed.

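(19.23) $\text{Sensitivity} = \dfrac{TP}{TP + FN},$

(19.24) $\text{Specificity} = \dfrac{TN}{TN + FP},$

(19.25) $\text{PLR} = \dfrac{\text{Sensitivity}}{1 - \text{Specificity}},$

(19.26) $\text{NLR} = \dfrac{1 - \text{Sensitivity}}{\text{Specificity}}.$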

In Eqs. (19.23)–(19.26), true negative (TN), true positive (TP), false negative (FN), and false positive (FP) refer to the survivors identified as survivors, deceased patients identified as deceased, deceased patients incorrectly recognized as survivors, and survivors erroneously recognized as deceased patients, respectively.
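Eqs. (19.23) to (19.26) can be computed from the four confusion-matrix counts as in the sketch below. The counts are hypothetical and the function name `diagnostic_metrics` is not from the source; returning infinity for PLR when specificity equals 1 is an assumption made to avoid division by zero.

```python
import math

def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PLR and NLR, with death as the positive class."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # PLR is unbounded when there are no false positives (assumption: report inf).
    plr = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    nlr = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
    }

# Hypothetical counts accumulated over the test folds.
m = diagnostic_metrics(tp=45, fn=5, tn=80, fp=20)
```

With these counts the sensitivity is 0.9 and the specificity is 0.8, so a positive prediction raises the odds of death by a factor of about 4.5, and a negative prediction lowers them by a factor of about 0.125.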

URL: https://www.sciencedirect.com/science/article/pii/B9780323857871000245

Key technologies and software platforms for radiomics

Jie Tian, ... Jingwei Wei, in Radiomics and Its Clinical Application, 2021

2.5.2 Linear classification model

The linear classification model is basically the same as linear regression, but the target value is generally dichotomous (1/0) or discrete (Fig. 2.14), rather than continuous as in regression. A linear classifier is an algorithm that separates two types of objects by a line or a hyperplane. Commonly used linear classifiers are logistic regression, SVM, and their variants. Their basic formula is similar to that of linear regression, and different classification effects are achieved by changing the form of the objective function.

Figure 2.14. Schematic diagram of linear classification.

1. Logistic regression

Logistic regression is a generalized linear model. It is more suitable for classification tasks than a direct linear regression model. It introduces the sigmoid function as an activation function to constrain the output of the linear regression model.

(2.35) $f(x) = \dfrac{1}{1 + e^{-x}}$

The above formula is the sigmoid function; its domain is $(-\infty, +\infty)$ and its range is $(0, 1)$. In neural networks, it is also often used as an activation function to constrain the output, so that the original continuous and unbounded output is limited to a fixed small range (Fig. 2.15).

Figure 2.15. Schematic diagram of the sigmoid function.

Consider a vector $x = (x_1, x_2, \ldots, x_n)$ of $n$ independent variables, and let the conditional probability $P(y=1|x) = p$ be the probability that the event occurs given the observation $x$. Then, the logistic regression model can be expressed as follows:

(2.36) $P(y=1|x) = \dfrac{1}{1 + e^{-(a_0 + \sum a_i x_i)}}$

where $a_0 + \sum a_i x_i$ has the same form as the formula for linear regression, and the probability that the event does not occur is as follows:

(2.37) $P(y=0|x) = 1 - \dfrac{1}{1 + e^{-(a_0 + \sum a_i x_i)}}$

The ratio of these two probabilities gives the odds of the event:

(2.38) $\dfrac{P(y=1|x)}{P(y=0|x)} = e^{(a_0 + \sum a_i x_i)}$

Taking the logarithm of the odds recovers the linear regression model $a_0 + \sum a_i x_i$. From this equation, the likelihood function of the logistic regression can be obtained. Let $p_i = P(y_i = 1|x_i)$ for the $i$th observation; then the likelihood function to be maximized is as follows:

(2.39) $J = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}$
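Eqs. (2.35) to (2.39) can be illustrated with a short sketch. The coefficients and data below are hypothetical, and the function names are not from the source; the code evaluates the model and its log-likelihood but does not fit the coefficients.

```python
import math

def sigmoid(z):
    """Eq. (2.35): maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(a0, a, x):
    """Eq. (2.36): P(y = 1 | x) for intercept a0 and coefficient list a."""
    return sigmoid(a0 + sum(ai * xi for ai, xi in zip(a, x)))

def log_likelihood(a0, a, X, y):
    """Logarithm of the likelihood J in Eq. (2.39)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(a0, a, xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

# Hypothetical coefficients and observations.
a0, a = 0.5, [1.0, -2.0]
p = predict_proba(a0, a, [1.0, 0.5])   # linear part equals 0.5
odds = p / (1 - p)                     # Eq. (2.38)

X = [[1.0, 0.5], [0.0, 1.0], [2.0, 0.0]]
y = [1, 0, 1]
ll = log_likelihood(a0, a, X, y)
```

In practice the log-likelihood is maximized rather than the product itself, since sums of logarithms are numerically better behaved than long products of probabilities.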

2. SVM

SVM is a popular supervised learning method. It is a classification technique that originated in Vapnik's work in 1963 and was later developed by his research team at AT&T Bell Laboratories. It is a pattern recognition method based on statistical learning theory and shows many unique advantages in solving small-sample, nonlinear, and high-dimensional pattern recognition problems. Based on structural risk minimization theory, the optimal hyperplane is constructed in the feature space, so that the learner is globally optimized and the expected risk over the entire sample space satisfies a certain upper bound with a certain probability.

The difference between SVM and ordinary linear regression lies in the objective function. By using the hinge loss function, it uses only the samples closest to the separating surface (the support vectors) to evaluate that surface. The idea of SVM is to find a separating hyperplane that perfectly separates the two categories and is equidistant from them.

This formula is also based on a linear regression model, and in order to facilitate the calculation, the linear regression is rewritten as

(2.40) $g(x) = wx + b$

where $w$ denotes a coefficient vector, $x$ denotes a feature vector, and $b$ denotes a bias. From this formula, the normal direction $w / \|w\|$ of the classifier can be obtained, and the closest distance from the classification surface is as follows:

(2.41) $d = \min_{i = 1, \ldots, n} \dfrac{|w x_i + b|}{\|w\|}$

Here we set $|w x_i + b| = 1$ for the closest samples, so the margin between the two sides is $m = 2d = 2 / \|w\|$, which gives the following constraints:

(2.42) $\begin{cases} w x_i + b \geq 1, & \text{when } y_i = +1 \\ w x_i + b \leq -1, & \text{when } y_i = -1 \end{cases}$

In this way, the target becomes minimizing $\frac{1}{2}\|w\|^2$ while guaranteeing $y_i (w x_i + b) \geq 1$, which establishes the objective function:

(2.43) $J = \dfrac{1}{2}\|w\|^2 - \sum_{i=1}^{n} a_i \left( y_i (w x_i + b) - 1 \right), \quad a_i \geq 0$

Differentiating the objective function gives two conditions for minimizing it:

(2.44) $\begin{cases} \dfrac{\partial J}{\partial w} = w - \sum_{i=1}^{n} a_i y_i x_i = 0 \\ \dfrac{\partial J}{\partial b} = -\sum_{i=1}^{n} a_i y_i = 0 \end{cases}$

Substituting these conditions back into the objective function gives the simplified objective function:

(2.45) $J = \sum_{i=1}^{n} a_i - \dfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j y_i y_j x_i^{T} x_j, \quad a_i \geq 0, \quad \sum_{i=1}^{n} a_i y_i = 0$

The classifier obtained by iterating on this objective function is the SVM. As shown in Fig. 2.16, the triangles and circles represent the support vectors of the two types of data. It can be seen from both the formula and the graph that the iterative process of the SVM only needs the data features of the support vectors, and other data points do not directly contribute to the classifier. The SVM can therefore perform the classification task effectively on small data volumes, but it is also more sensitive to outlying support vectors, which can shift the whole decision boundary.

Figure 2.16. Schematic diagram of SVM classification.

However, in practice, it is difficult to strictly guarantee the margin between the two sides, so there is an SVM using slack variables, and the formula changes as follows:

(2.46) $\begin{cases} \text{minimize } \dfrac{1}{2}\|w\|^2, & y_i (w x_i + b) \geq 1 \\ \quad \downarrow \\ \text{minimize } \dfrac{1}{2}\|w\|^2 + C \sum_{i} \zeta_i, & y_i (w x_i + b) \geq 1 - \zeta_i, \ \zeta_i \geq 0 \end{cases}$

In this way, the slackness of the SVM, that is, how strictly the two margin boundaries are enforced, can be controlled by the slack variables $\zeta_i$ and the parameter $C$. The larger the C, the higher the fit to the data, and the easier it is to overfit. Conversely, the smaller the C, the lower the degree of fit and the more robust it is.
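The soft-margin objective in Eq. (2.46) can be illustrated with a small sketch. Note the swapped-in technique: instead of solving the dual problem of Eq. (2.45), this sketch minimizes the equivalent primal form $\frac{1}{2}\|w\|^2 + C \sum_i \max(0, 1 - y_i(w x_i + b))$ by plain subgradient descent. The function names, learning rate, epoch count, and toy data are all hypothetical choices for illustration.

```python
def train_linear_svm(X, y, C=1.0, lr=0.001, epochs=20000):
    """Subgradient descent on 0.5*||w||^2 + C * sum of hinge losses."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = w[:], 0.0  # gradient of 0.5*||w||^2 is w itself
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # this sample needs slack, so its hinge term is active
                for j in range(d):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Sign of the decision function g(x) = w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data.
X = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y, C=1.0)
predictions = [predict(w, b, x) for x in X]
```

Only samples with a margin below 1 contribute to the update, which mirrors the observation that the support vectors determine the classifier. Raising C in this sketch penalizes margin violations more heavily, in line with the overfitting trade-off described above.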

URL: https://www.sciencedirect.com/science/article/pii/B9780128181010000033

Anomaly detection, classification and CEP with ML methods

Patrick Schneider, Fatos Xhafa, in Anomaly Detection and Complex Event Processing over IoT Data Streams, 2022

End-to-end one-class classification

This method trains a one-class classifier end to end so that it learns to distinguish consistently between normal and abnormal instances. It does not rely on existing one-class classifiers such as one-class SVM or SVDD; instead, it combines GANs with the concept of one-class classification (e.g., adversarially learned one-class classification). The idea is to train a one-class discriminator on the normal instances so that it separates them from adversarially generated pseudo-anomalies. This differs from the GAN-based approach in two ways. First, GAN-based methods aim to learn a generative distribution that approximates the data distribution as closely as possible, yielding a generative model that captures the behavior of the normal training instances. Second, GAN-based methods compute anomaly scores from the residuals between the real instances and the corresponding generated instances, whereas the methods described here use the discriminator directly to classify anomalies (e.g., the discriminator D acts as τ).

This type of method assumes that data instances approximating anomalies can be effectively synthesized, and that a discriminative one-class model can summarize all normal instances. The concept of adversarially learned one-class classification (ALOCC) was first explored in [129]. The idea is to train two deep neural networks: one is trained as a one-class model that separates normal instances from anomalies, and the other improves the normal instances and generates biased outliers. The two networks are instantiated and optimized using the GAN approach. The one-class model is based on the discriminator network, and the generator network is based on a denoising AE [152]. The goal of the AE-empowered GAN is defined as:

(9.38) $\min_{AE} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_x}[\log D(x)] + \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}[\log(1 - D(AE(\hat{x})))]$

where $p_{\hat{x}}$ stands for the distribution of the data $X$ corrupted by Gaussian noise (e.g., $\hat{x} = x + n$ with $n \sim \mathcal{N}(0, \sigma^2 I)$). The objective is optimized together with the following reconstruction error of the AE:

(9.39) $l_{ae} = \|x - AE(\hat{x})\|^2$

The intuition behind Eq. 9.39 is that the AE can reconstruct and improve normal instances, yet it is confused by input outliers and therefore produces biased outliers. Through minimax optimization, the discriminator D learns to distinguish normal instances from outliers better than it would using the original instances. Therefore, $D(AE(\hat{x}))$ can be used directly to detect anomalies. The outliers in [129] are randomly drawn from classes other than the normal instance classes. However, reference outliers from outside the given training data, as used in [129], may not be available in many domains.
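The two quantities in Eqs. 9.38 and 9.39 can be estimated on a batch as in the following sketch. It is only an illustration of the formulas: `toy_ae` and `toy_d` are hypothetical stand-ins for the trained denoising AE and discriminator, and the batch values and noise level are made up.

```python
import math
import random

def add_gaussian_noise(x, sigma, rng):
    """Corrupt x as x_hat = x + n with n ~ N(0, sigma^2 I)."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]

def reconstruction_loss(x, x_rec):
    """Squared L2 error ||x - AE(x_hat)||^2, as in Eq. 9.39."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))

def value_estimate(D, AE, batch, noisy_batch):
    """Batch estimate of V in Eq. 9.38 (sample means replace expectations)."""
    real = sum(math.log(D(x)) for x in batch) / len(batch)
    fake = sum(math.log(1.0 - D(AE(xh))) for xh in noisy_batch) / len(noisy_batch)
    return real + fake

# Hypothetical stand-ins, NOT trained networks.
def toy_ae(x_hat):
    return [0.9 * v for v in x_hat]

def toy_d(x):
    # Logistic score in (0, 1); higher for points near the origin.
    s = 1.0 - sum(v * v for v in x)
    return 1.0 / (1.0 + math.exp(-s))

rng = random.Random(0)
batch = [[0.1, 0.2], [0.0, -0.3], [0.2, 0.1]]
noisy = [add_gaussian_noise(x, 0.1, rng) for x in batch]

v = value_estimate(toy_d, toy_ae, batch, noisy)
l_ae = sum(reconstruction_loss(x, toy_ae(xh)) for x, xh in zip(batch, noisy)) / len(batch)
print(v, l_ae)
```

In an actual training loop the discriminator would be updated to increase the value estimate while the AE is updated to decrease it together with the reconstruction loss; at test time, the score `toy_d(toy_ae(x))` plays the role of $D(AE(\hat{x}))$.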

In [179], one-class adversarial networks (OCAN) are introduced to exploit the concept of bad GANs [32] to generate edge instances based on the distribution of normal training data. Instead of traditional generators in GANs, the generator network in bad GANs is trained to generate instances that are complementary and do not match the training data. The formulation of the complement generator is:

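(9.40) $\min_{G} \; -H(p_z) + \mathbb{E}_{\hat{z} \sim p_z} \log p_x(\hat{z}) \, \mathbb{I}[p_x(\hat{z}) > \epsilon] + \left\| \mathbb{E}_{\hat{z} \sim p_z} h(\hat{z}) - \mathbb{E}_{z \sim p_x} h(z) \right\|^2$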

where $H(\cdot)$ is the entropy, $\mathbb{I}[\cdot]$ is an indicator function, $\epsilon$ is a threshold hyperparameter, and $h$ is a feature map from an intermediate layer of the discriminator. The first two terms are used to generate low-density samples in the original feature space. However, it is computationally infeasible to obtain the probability distribution of the training data. Instead, the density estimate $p_x(\hat{z})$ is approximated via the discriminator of a regular GAN. The last term is the feature matching loss, which helps to generate data instances within the original space. An additional conditional entropy term extends the discriminator objective in OCAN to enable high-confidence detection:

(9.41) $\max_{D} \; \mathbb{E}_{x \sim p_x}[\log D(x)] + \mathbb{E}_{\hat{z} \sim p_z}[\log(1 - D(\hat{z}))] + \mathbb{E}_{x \sim p_x}[D(x) \log D(x)]$

In [109], Fence GAN generates data instances that lie closely along the boundary of the training data distribution. This is done by adding two loss functions to the generator that force the generated instances to be uniformly distributed along the spherical boundary of the training data. Formally, the goal of the generator is defined as follows:

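(9.42) $\min_{G} \; \mathbb{E}_{z \sim p_z}\left[\log\left[\,|\alpha - D(G(z))|\,\right]\right] + \beta \, \dfrac{1}{\mathbb{E}_{z \sim p_z} \|G(z) - \mu\|_2}$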

Here $\alpha \in (0, 1)$ is a hyperparameter used as a discrimination reference value for the generator to create the edge instances, and $\mu$ is the midpoint of the generated data instances. The first term is called the enclosure loss, which enforces that the generated instances have the same discrimination value, resulting in a tight enclosure of the training data. The second term is called the dispersion loss, which enforces that the generated instances cover the entire perimeter uniformly. Other methods have been introduced to generate the reference instances effectively. For example, uniformly distributed instances are generated to enforce that the normal instances are uniformly distributed over the latent space [120]. An ensemble of generators is used in [89], where each generator synthesizes edge instances for a given cluster of normal instances.
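The two generator terms can be computed on a batch as in the sketch below. It reads the dispersion term as $\beta$ divided by the mean distance of the generated instances from their midpoint $\mu$, which is an assumption about the flattened formula; the function name `fence_gan_generator_loss` and all numeric values are hypothetical.

```python
import math

def fence_gan_generator_loss(d_scores, generated, alpha, beta):
    """Batch estimate of enclosure loss + dispersion loss.

    d_scores  : discriminator outputs D(G(z)) for each generated instance
    generated : generated instances G(z), each a sequence of floats
    Note: log|alpha - D(G(z))| is undefined when a score equals alpha exactly.
    """
    n = len(generated)
    dim = len(generated[0])
    # Enclosure loss: mean of log|alpha - D(G(z))|
    enclosure = sum(math.log(abs(alpha - s)) for s in d_scores) / n
    # mu: midpoint (mean) of the generated instances
    mu = [sum(g[k] for g in generated) / n for k in range(dim)]
    # Dispersion loss: beta / mean Euclidean distance to mu
    mean_dist = sum(
        math.sqrt(sum((g[k] - mu[k]) ** 2 for k in range(dim)))
        for g in generated
    ) / n
    dispersion = beta / mean_dist
    return enclosure + dispersion

# Hypothetical batch: four points on the unit circle, scores near alpha.
generated = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
d_scores = [0.4, 0.6, 0.4, 0.6]
loss = fence_gan_generator_loss(d_scores, generated, alpha=0.5, beta=2.0)
print(loss)
```

The enclosure term falls as the scores approach $\alpha$, and the dispersion term falls as the generated instances spread away from their midpoint, which matches the roles described for the two losses.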

Advantages

Its anomaly classification model is adversarially optimized in an end-to-end fashion.

It can be developed and supported by the rich techniques and theories of adversarial learning and one-class classification.

Disadvantages

It is difficult to guarantee that the generated reference instances resemble the unknown anomalies well.

The instability of GANs may lead to generated instances of varying quality and, consequently, unstable anomaly classification performance. This problem was investigated in [172], where it was shown that the performance of this type of anomaly detector can vary across training steps.

The application is limited to semisupervised anomaly detection scenarios.

The adversarially learned one-class classifiers learn to generate realistic edge/boundary instances, which enables the learning of meaningful low-dimensional normality representations.

What are the steps in the performance issue model?

The first step of the performance management process is planning.
1.1 The defining stage. ...
1.2 The feedback stage. ...
1.3 The approval stage. ...
2.1 Organize meetings on a timely, regular basis. ...
2.2 Provide necessary training, coaching and solutions. ...
2.3 Solicit feedback on both sides. ...
2.4 Revisit objectives as necessary.

What are the types of performance issue?

Types of performance problems include lateness, absenteeism, and leaving without permission; excessive visiting, phone use, break time, and use of the Internet; and misuse of sick leave.

What are the five categories of personnel performance problems?

Here are five performance problems present in the modern workforce, and five simple ways you can help employees overcome them.
Shallow Work. ...
Inability to Prioritize. ...
False Sense of Urgency. ...
Productive Procrastination. ...
Low-Quality Output.

What is a performance issue?

Generally, a performance problem is the result of some workload not getting the resources it needs to complete in time. Or the resource is obtained but is not fast enough to provide the desired response time. The most frequent cause of performance problems is having several address spaces compete for the same resource.