As discussed in the data preparation chapter, time series usually represent the dynamics of some process, so the order of the data entries has to be preserved. A time series is simply a set of data, usually events, arranged by a time marker. Typically, the entries are placed in the order in which the events occur or are recorded.
In the context of IoT systems, there might be several reasons why time series analysis is needed. The most common ones are the following:
Because anomaly detection tasks are so diverse, a wide range of algorithms might be used, including those that have been covered in previous chapters: clustering to find typical response clusters, regression to estimate normal future states and measure the distance between forecast and actual measurements, and classification to label states as normal or abnormal. An excellent example of a tree-based method for anomaly detection is Isolation forests 3).
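The regression-based idea (estimate the normal state, then measure the distance between forecast and actual measurement) can be illustrated with a minimal sketch. It is not the method used in this chapter: the forecast here is just a moving average of the previous readings, and the function name `detect_anomalies`, the `window` and `threshold` values and the sample readings are all hypothetical.

```python
from statistics import mean

def detect_anomalies(series, window=5, threshold=5.0):
    """Return indices whose value differs from a moving-average
    forecast by more than `threshold` (same units as the series)."""
    anomalies = []
    for i in range(window, len(series)):
        # Naive forecast: the mean of the previous `window` readings.
        forecast = mean(series[i - window:i])
        if abs(series[i] - forecast) > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical temperature readings in °C with one large spike.
readings = [-18.0, -18.2, -17.9, -18.1, -18.0, -18.1, -2.0, -17.8, -18.0]
print(detect_anomalies(readings))  # prints [6]
```

Note that the spike itself enters the moving average for the next few readings, so a real system would likely prefer a more robust forecast (for example a median, or a proper regression model) and a threshold tuned to the process.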
Although most of the methods covered here can be employed in time series analysis, this chapter outlines anomaly detection and classification through an example of an industrial cooling system.
A given industrial cooling system has to maintain a specific temperature mode of around -18 °C. Due to the technology specifics, it goes through a defrost cycle every few hours to avoid ice deposits, which lead to inefficiency and potential malfunction. At some point, however, a relatively short power supply interruption was noticed; such events need to be recognised in the future so that they can be reported appropriately. The logged data series is depicted in the following figure:
It is easy to notice two normal behaviour patterns, defrost (small spikes) and temperature maintenance (data between the spikes), and one anomaly: the high spike.
One possible approach to building a classification model is K-nearest neighbours (KNN). Whenever a new data fragment is collected, it is compared to the closest known samples, and a majority vote among them determines its class. In this example, three behaviour patterns are recognised; therefore, a sample collection must be composed for each pattern. This can be done by hand since, in this case, the time series is relatively short.
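The KNN idea can be sketched in a few lines. This is only an illustration under stated assumptions: every fragment has the same fixed length, similarity is measured with the Euclidean distance, and the sample values, the labels (`maintenance`, `defrost`, `power_outage`) and the function name `knn_classify` are hypothetical rather than taken from the logged data.

```python
from collections import Counter
from math import dist

def knn_classify(fragment, samples, k=3):
    """Classify `fragment` by majority vote among its k nearest samples.

    `samples` is a list of (values, label) pairs; every `values` list
    must have the same length as `fragment`.
    """
    # Sort the labelled samples by Euclidean distance to the new fragment.
    nearest = sorted(samples, key=lambda s: dist(fragment, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical hand-labelled fragments (temperatures in °C).
samples = [
    ([-18.0, -18.1, -17.9, -18.0], "maintenance"),
    ([-18.2, -18.0, -18.1, -17.9], "maintenance"),
    ([-17.9, -18.0, -18.0, -18.1], "maintenance"),
    ([-18.0, -14.0, -12.5, -16.0], "defrost"),
    ([-17.8, -13.5, -12.0, -15.5], "defrost"),
    ([-18.1, -14.5, -13.0, -16.5], "defrost"),
    ([-17.0, -8.0, -2.0, -6.0], "power_outage"),
    ([-16.5, -7.0, -1.0, -5.0], "power_outage"),
    ([-17.2, -9.0, -3.0, -7.0], "power_outage"),
]

print(knn_classify([-18.0, -13.8, -12.8, -16.2], samples))  # prints defrost
```

In practice, the fragments would be cut from the logged series with a sliding window, and both the window length and `k` would need to be chosen to suit the duration of a defrost cycle.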
Examples of the collected patterns (defrost on the left and temperature maintenance on the right):