In the previous part, we analyzed individual processes in order to detect events that would indicate that the tool used had not performed as it should. To do so, we first determined the maximum value for each process, calculated a range of features (statistical measures) and assigned a score to each process, the scores being calculated using the first principal component of a PCA.

This method is a good candidate for detecting deviant processes, but it says nothing about the nature of the deviation, only that “something went wrong”. Knowing that a given process has failed is worth more when it comes with the reason or reasons why. In each individual process, the tool, the material and the operator are among the things that can influence the outcome. Is the tool programmed as it should be? Are any parts of the tool defective (e.g. eroded or loose)? Is the treated material different from what was previously processed (e.g. a different batch with unusual material properties)? Is there any substance on the material that shouldn’t be there (e.g. water or oil)? Is the operator handling the tool as they should? As you can see, the list of plausible explanations for a failed process can be made very long, and it would be of considerable value to operating crews, as well as maintenance and management, to understand why each individual process has failed.

Given proper knowledge of the specific industrial process, one can envisage that the nature of a failure can be read from the trace (the measurements recorded during each individual process). Even without such knowledge, and assuming that different causes of failure leave distinct “fingerprints” in the form of traces that differ from some expected curve, one can use clustering techniques to classify the ensemble of traces. The next, and essential, step is to study the processes belonging to each class to determine what the causes of failure might be.

In this part, we present several techniques for classifying process traces by their distinct characteristics. If a process has been designed and programmed to perform in a particular manner, all traces are expected to have approximately the same shape in the neighborhood of every measurement point. A failed process can show up as a shift or distortion away from what is considered normal behavior. In other cases, the shape and position of a trace can be as expected while the trace contains a lot of noise, which would indicate that the process did not run as intended. These are two distinct types of events that may demand different approaches to classification. In the first case, a simple k-means or hclust approach might be sufficient to classify the traces, while in the second, a decomposition into harmonics followed by some clustering method might be more appropriate. Also, given the difficulty of generating data that would give any remarkable clustering results, we choose to be more descriptive than technical in this part.

## k-means

The goal of k-means is to group an ensemble of observations into *k* clusters. If each observation consists of *q* variables, the method partitions the *q*-dimensional space into *k* disjoint regions. The borders of the clusters are determined by *k* different points, $latex m_1, \dots, m_k$. A point in the *q*-dimensional space belongs to the cluster corresponding to the closest cluster point $latex m_i$. k-means is a numerical iterative algorithm, which begins either by carefully choosing where the cluster centers should be initialized or by randomly placing *k* points in the *q*-dimensional space. In our setting, we have no preconceived idea of where the cluster centers should be located, so we let their initial positions be randomly assigned. The algorithm is known to be robust, cheap and fast to converge. Random starting points are therefore usually not an issue, although running the algorithm from several random starts guards against ending up in a poor local optimum. For a description of the algorithm, we invite the reader to visit this site (see https://home.deib.polimi.it/matteucc/Clustering/tutorial_html/kmeans.html).
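To make the two alternating steps concrete, here is a minimal sketch of k-means in plain Python. It is an illustration only, not the implementation used for this post: the function names, the toy `traces` and the choice of drawing the initial centers from the observations themselves are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of q-dimensional tuples) into k groups.

    Returns (centers, labels), where labels[i] is the index of the
    center closest to points[i].
    """
    rng = random.Random(seed)
    # Random initialization (an assumption of this sketch):
    # pick k distinct observations as starting centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its closest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


# Hypothetical toy traces: three short "low" curves and three "high" ones.
traces = [
    (1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (0.9, 1.9, 3.1),
    (5.0, 6.0, 7.0), (5.2, 5.9, 7.1), (4.9, 6.1, 6.8),
]
centers, labels = kmeans(traces, k=2)
print(labels)
```

On data this well separated, the first three traces end up sharing one label and the last three the other; which group gets label 0 depends on the random initialization.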

Of course, k-means is not a perfect clustering algorithm. It has a few benefits, in that it is computationally cheap and robust in finding satisfactory clusters. However, it is worth mentioning some of the limitations that k-means imposes. Finding “optimal” clusters with k-means is often not feasible, since k-means only produces linear boundaries between clusters and not every problem can be solved with linear discrimination. Another limitation of k-means is that it tends to produce clusters of comparable size in terms of the number of samples within each cluster. This means that k-means is most suitable when the cluster sizes are well balanced within the data, which is far from certain if we are looking for rare outliers. Further time would be needed to test and validate the choice of k-means for this specific clustering problem.

## Hierarchical clustering – hclust

Suppose that our data set contains a collection of process traces. The goal of our clustering endeavor is to group individual objects that share similarities. One method, known as hierarchical clustering, can easily be used to perform this task. There are two types of hierarchical clustering methods. In the first, each element in the ensemble of traces starts in its own cluster, and clusters are merged if they share similarities. The second does basically the opposite: all elements initially belong to one single cluster, which is then split by grouping the elements that share some similarities. In both cases, the measure of similarity or dissimilarity depends, of course, on the metric used. We have chosen to use the default settings of the hclust() function in R, applied to Euclidean distances between traces, which means that the method of “farthest neighbor” clustering, also known as complete linkage, is used. With this method, the clusters are sequentially combined into larger clusters until all elements end up in the same cluster, unless the procedure is stopped earlier to obtain a required number of clusters. At each step, the two clusters separated by the shortest distance are combined. In complete-linkage clustering, the link between two clusters contains all element pairs, and the distance between clusters equals the distance between the two elements (one in each cluster) that are farthest away from each other. The shortest of these links that remains at any step causes the fusion of the two clusters whose elements are involved. For a thorough description of the algorithm, see https://home.deib.polimi.it/matteucc/Clustering/tutorial_html/hierarchical.html
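The merging procedure described above can be sketched in a few lines. The example below is written in plain Python rather than R and is a naive illustration of complete-linkage agglomerative clustering, not a reproduction of hclust(): the function names and the toy `traces` are assumptions, and the pairwise search is far too slow for large datasets.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with complete linkage.

    Starts with one cluster per point and repeatedly merges the two
    clusters whose farthest pair of members is closest, until
    `n_clusters` clusters remain. Returns a sorted list of clusters,
    each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members.
        return max(euclidean(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        # Find the pair of clusters separated by the shortest linkage distance.
        a, b = min(
            (
                (i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            ),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        # Delete the higher index first so that index a stays valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical toy traces: three short "low" curves and three "high" ones.
traces = [
    (1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (0.9, 1.9, 3.1),
    (5.0, 6.0, 7.0), (5.2, 5.9, 7.1), (4.9, 6.1, 6.8),
]
groups = complete_linkage(traces, n_clusters=2)
print(groups)  # [[0, 1, 2], [3, 4, 5]]
```

Stopping at `n_clusters` corresponds to ending the procedure early to obtain a required number of clusters; letting it run down to a single cluster would trace out the full hierarchy.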

A major drawback of hierarchical clustering is that it is computationally onerous for large datasets. Being a greedy heuristic, it does not correct at later stages the mis-groupings that occur early in the clustering process. Many implementations support bootstrapping or other resampling techniques to assess the stability of a clustering solution and suggest a consensus grouping, but for large datasets such resampling may be prohibitively expensive. One should also be aware that outliers can strongly affect the resulting clustering. It is therefore essential to conduct a thorough outlier analysis, which is also time-consuming.

## Self-organizing maps

Finally, another approach which has gained popularity in the past few years, and which is used in a wide range of settings, is the self-organizing map, or SOM. A SOM is a type of artificial neural network trained using unsupervised learning to produce a low-dimensional, discretized representation of the input space of the training samples, called a map. Self-organizing maps differ from other artificial neural networks in that they apply competitive learning as opposed to error-correction learning, and in that they use a neighborhood function to preserve the topological properties of the input space, which in our case consists of the measurements made during a process. A SOM consists of components called nodes or neurons. Associated with each node are a weight vector of the same dimension as the input data vectors and a position in the map space. The most common arrangement of nodes is a two-dimensional regular spacing in a hexagonal or rectangular grid. The self-organizing map describes a mapping from a higher-dimensional input space to a lower-dimensional map space. To place a vector from data space onto the map, one finds the node whose weight vector is closest (smallest distance metric) to the data space vector. We do not go further into the description of the SOM algorithm and instead refer to a previous blog post, Artificial Neural Network and Patient Segmentation, see

https://kentoranalytics.com/blog/2017/3/21/prgq2srnxrjimjd5jc36dq377p7w90
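For readers who prefer code to prose, the sketch below shows the competitive-learning loop of a small rectangular SOM in plain Python. It is a toy illustration under several assumptions that are not taken from the post: the function names, the linearly decaying learning rate and neighborhood radius, the Gaussian neighborhood function, the initialization from random samples and the toy `traces` are all choices made for this example.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, n_iter=500, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # One weight vector per node, initialized from randomly chosen samples.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3  # neighborhood radius shrinks too
        x = rng.choice(data)
        bmu = best_matching_unit(weights, x)
        for node, w in weights.items():
            # Gaussian neighborhood measured on the map grid around the BMU.
            grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull the node's weight vector towards the sample.
            for d in range(dim):
                w[d] += lr * h * (x[d] - w[d])
    return weights


# Hypothetical toy traces: three short "low" curves and three "high" ones.
traces = [
    (1.0, 2.0, 3.0), (1.1, 2.1, 2.9), (0.9, 1.9, 3.1),
    (5.0, 6.0, 7.0), (5.2, 5.9, 7.1), (4.9, 6.1, 6.8),
]
som = train_som(traces, rows=2, cols=2)
for trace in traces:
    print(trace, "->", best_matching_unit(som, trace))
```

Traces that land on the same node, or on neighboring nodes, are the ones the map considers similar, so the node positions can be used as cluster labels or inspected visually.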

## Frequency-domain analysis

As we mentioned in the introduction, all the above clustering methods can be used on raw data from the measuring process, provided that the traces analyzed are comparable, but they can also be combined with a frequency-domain, or spectral, analysis. In cases where we want to identify periodically recurring patterns, such as discordant sounds or vibrations, it can be useful to analyze the frequency domain rather than the time domain. The Fourier transform is a commonly used method for decomposing a time series signal into a superposition of sinusoidal basis functions. Measurements of industrial processes are discrete-time signals, which suggests the use of the Fast Fourier Transform (*FFT*). The *FFT* is a computationally efficient implementation of the *DFT* (Discrete Fourier Transform). Given a signal $latex x_0, \dots, x_{N-1}$, with *N* the number of samples, and an integer *k* indexing the different harmonics, the transformed signal is expressed by

$latex X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \quad k = 0, \dots, N-1.$

In our case, we are interested in finding the contributions from the different harmonic frequencies in terms of their strength or, in other words, in calculating features that describe the characteristics of the signal.

The strength of the contribution of the different harmonics can be calculated as the complex norm of the complex-valued coefficients $latex X_k$. As an example, the complex norm of $latex X_0$ is the strength of the DC component of the signal, the complex norm of $latex X_1$ is the strength of the first harmonic, and so forth. The motivation for using the FFT to classify processes is that noise in a trace can be the result of vibrations, which can be caused by a wide range of external factors. Different traces carrying noisy signals can then be distinguished by which harmonics contribute the most. This is done by applying clustering techniques to the Fourier-transformed signals, where we expect traces that are similar from a frequency point of view to end up in the same cluster.
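As a small worked example of these features, the sketch below computes the strengths of the first few harmonics of a synthetic trace in plain Python. It uses a naive DFT for clarity (an FFT routine returns the same coefficients, only faster); the function names and the synthetic signal are assumptions made for the illustration.

```python
import cmath
import math


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n_samples = len(signal)
    return [
        sum(
            x * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n, x in enumerate(signal)
        )
        for k in range(n_samples)
    ]


def harmonic_strengths(signal, n_harmonics):
    """Complex norm of the DC component and of the first `n_harmonics` harmonics."""
    coeffs = dft(signal)
    return [abs(c) for c in coeffs[: n_harmonics + 1]]


# Hypothetical trace: a constant level of 2 plus a vibration at the third harmonic.
N = 32
trace = [2.0 + math.sin(2.0 * math.pi * 3 * n / N) for n in range(N)]

strengths = harmonic_strengths(trace, n_harmonics=5)
for k, s in enumerate(strengths):
    print(k, round(s, 6))
```

For this signal the DC strength is approximately 64 (the level 2 times *N*), the third harmonic is approximately 16 (*N*/2 times the unit amplitude), and the other listed harmonics are numerically zero. Feature vectors of this kind, one per trace, are what would then be passed to k-means, hierarchical clustering or a SOM.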

Check this YouTube video for a fairly good introduction to the decomposition of a signal into different harmonics.

https://www.youtube.com/watch?v=r18Gi8lSkfM

Together with Part 1 of this blog, we think we have given a fairly good introduction to possible techniques to be used in industrial process control. These rather straightforward methods can easily be implemented and have the potential to enable industries to better their production lines and thereby the quality of their products. The economic implications of these need not be stated.
