Our big data analysis and prediction research falls into two categories.
Intelligent Manufacturing Big Data Analysis and Prediction
In Industry 4.0, it is essential that the components of a production system, such as the machines, warehouses, and production-line equipment, can communicate with one another, and the resulting IoT data streams are stored in large volumes. The goal of the industrial applications is to let machines read and analyze these big data to perform automatic operation, repair, self-diagnosis, defective-piece analysis, and fault-tolerant processing.

Factories face quality-control and yield problems with their products. Every job-shop manufacturer wants to improve its profit, and one way to do so is to reduce the number of defective products. This can be achieved by processing raw materials only on machines that are in very good condition, which in turn means repairing any machine whose condition is deteriorating as soon as possible. Which machine is deteriorating can be seen from each machine's yield-rate data in real time; when real-time data are unavailable, periodic data can be used instead. Other previous research suggests using maintenance history and/or machine deterioration rates, but these are often unreliable and difficult to obtain. On the other hand, the yield rate of every machine cannot be observed directly, because manufacturers do not install inspection tools on all machines: doing so is costly and time-consuming. Inspection tools are typically installed at only a few machines/stations, in the worst case only at the final stage, for quality-control purposes. This research therefore fills the gap by proposing a new approach that estimates the unknown yield rates of all machines from the available inspection data.

The defective-piece analysis program combines several academic disciplines, such as non-linear mathematical modeling and data mining, to solve these industry-related problems. It uses the EM algorithm to complete the machine learning task: the likelihood function at the core of the EM algorithm is used to trace the root cause of defects back from the final inspection data. With this artificial intelligence (AI) approach, the computer can draw inferences even under severe data insufficiency. In short, all of these efforts aim to help Taiwanese IT industries maximize their production outcomes.
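As a rough illustration of the idea (not the lab's actual model), the sketch below estimates per-machine yield rates by EM under a deliberately simple assumption: each machine passes a piece independently with an unknown yield rate, and a piece passes final inspection only if every machine on its route worked. The function name, the route/outcome encoding, and the initial guess of 0.9 are illustrative choices.

```python
# Minimal EM sketch for estimating unobserved per-machine yield rates from
# end-of-line inspection data only (hypothetical, simplified model).
def em_yield_rates(routes, outcomes, n_machines, n_iter=100):
    """routes[i]  : list of machine indices piece i visited
       outcomes[i]: 1 if piece i passed final inspection, 0 otherwise"""
    p = [0.9] * n_machines                       # initial yield-rate guess
    for _ in range(n_iter):
        worked = [0.0] * n_machines              # expected "machine worked" counts
        visits = [0] * n_machines                # pieces routed through each machine
        for route, good in zip(routes, outcomes):
            prod = 1.0
            for m in route:
                prod *= p[m]                     # P(piece survives its whole route)
            for m in route:
                visits[m] += 1
                if good:                         # good piece => every machine worked
                    worked[m] += 1.0
                else:                            # E-step: P(machine m worked | piece bad)
                    worked[m] += (p[m] - prod) / (1.0 - prod)
                    # a robust implementation should guard against prod == 1 here
        # M-step: new yield rate = expected number worked / number of visits
        p = [worked[m] / visits[m] if visits[m] else p[m] for m in range(n_machines)]
    return p

# Toy example: three machines, pieces take different routes, only the final
# pass/fail result is observed.
routes   = [[0, 1], [0, 2], [1, 2], [0, 1, 2], [0, 1, 2]]
outcomes = [1, 0, 1, 1, 0]
print(em_yield_rates(routes, outcomes, n_machines=3))
```

Because different pieces take different routes, the per-machine rates become identifiable even though only the end-of-line result is inspected; the posterior in the E-step is what "traces back" a defect to the machines most likely responsible.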
This lab consists of several professors and scholars who work on the Industry 4.0 project. The lab also applies key industrial techniques and technologies, such as production due-date models, special production routings, a big data analysis platform, regression models, Automated Optical Inspection (AOI), and Online Analytical Processing (OLAP), serving as a consultant to IT industries in northern Taiwan and helping them solve both academic and practical problems. In the past two years, the lab has held several grant projects from Taiwan's Ministry of Science and Technology (MOST) and three industrial-academic collaboration grant projects. At present, the lab operates one MOST project and one industrial-academic project.
Financial Prediction
Effective prediction of financially distressed firms is critical for financial institutions to make appropriate lending decisions. In general, three factors affect prediction performance: the input variables (or features), i.e. the financial ratios; the choice of feature selection method; and the use of appropriate statistical and machine learning techniques. Over the past decade, our research team has gained considerable insight into the financial distress prediction (FDP) problem in all three directions. In recent studies, corporate governance indicators (CGIs) have been found to be another important type of input variable in addition to financial ratios (FRs). However, the performance obtained by combining CGIs and FRs has not been fully examined, since related studies have used only selected CGIs and FRs and the chosen features differ from study to study. Our research in this direction therefore assesses the prediction performance obtained by combining seven categories of FRs with five categories of CGIs. The experimental results, based on a real-world dataset from Taiwan, show that the solvency and profitability categories of FRs and the board-structure and ownership-structure categories of CGIs are the most important features in bankruptcy prediction. In particular, the best prediction model is identified by jointly considering prediction accuracy, Type I/II errors, the ROC curve, and the misclassification cost. More details can be found in our recent publications [LLT16][LLL15].
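The kind of evaluation described above can be sketched as follows. The column-stacking of FR and CGI features, the logistic-regression baseline, and the cost weights are illustrative assumptions, not the models or costs used in [LLT16][LLL15].

```python
# Minimal sketch: combine FR and CGI feature categories, then score a
# bankruptcy classifier by accuracy, Type I/II errors, ROC AUC, and an
# asymmetric misclassification cost (all choices here are hypothetical).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

def evaluate(X, y, cost_type1=10.0, cost_type2=1.0):
    # y = 1 for financially distressed firms, 0 for healthy firms
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    prob = clf.predict_proba(X_te)[:, 1]
    pred = (prob >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    type1 = fn / (fn + tp)                    # distressed firm classified as healthy
    type2 = fp / (fp + tn)                    # healthy firm classified as distressed
    cost = cost_type1 * fn + cost_type2 * fp  # Type I errors are usually costlier
    return {"accuracy": accuracy_score(y_te, pred),
            "type1": type1, "type2": type2,
            "auc": roc_auc_score(y_te, prob), "cost": cost}

# Hypothetical usage: fr and cgi are arrays holding the FR and CGI feature
# categories for the same firms; combining the categories is a column stack.
# results = evaluate(np.hstack([fr, cgi]), labels)
```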
Since disputes remain regarding which financial ratios to use as input features for model development, many studies treat feature selection as a data-mining pre-processing step before constructing the models. Unlike most studies, which apply one specific feature selection method to FDP, we have conducted a comprehensive study examining the effects of both filter-based and wrapper-based feature selection methods. In addition, we investigate the effect of feature selection on prediction models built with various classification techniques. The experiments use two financial distress datasets, and three filter-based and two wrapper-based feature selection methods are combined with six different prediction models. Our experimental results indicate that filter-based feature selection methods perform better than wrapper-based ones. Moreover, depending on the chosen technique, performing feature selection does not always improve the prediction performance. Interested readers can refer to [LTW15][LLY14].
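A minimal sketch of the filter-versus-wrapper comparison follows, using an ANOVA F-score filter and recursive feature elimination (RFE) as stand-ins; the actual studies use different selection methods, classifiers, and datasets.

```python
# Sketch: compare a filter-based selector, a wrapper-based selector, and no
# selection at all, under cross-validation (illustrative choices only).
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

def compare_selectors(X, y, k=10):
    base = LogisticRegression(max_iter=1000)
    # Filter: rank features by a statistic computed independently of the model.
    filter_pipe = make_pipeline(SelectKBest(f_classif, k=k), base)
    # Wrapper: repeatedly refit the model, dropping the weakest features.
    wrapper_pipe = make_pipeline(
        RFE(LogisticRegression(max_iter=1000), n_features_to_select=k), base)
    return {
        "filter":  cross_val_score(filter_pipe, X, y, cv=5).mean(),
        "wrapper": cross_val_score(wrapper_pipe, X, y, cv=5).mean(),
        "none":    cross_val_score(base, X, y, cv=5).mean(),  # selection may not help
    }
```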
Last but not least, the machine learning technique used to construct the prediction model is also a key factor in its performance. We introduce a classifier ensemble approach to reduce the misclassification cost: the outputs produced by multiple classifiers are combined by the unanimous voting (UV) method to obtain the final prediction. Experimental results on four relevant datasets show that our UV ensemble approach outperforms many baseline single classifiers and classifier ensembles. More specifically, the UV ensemble not only provides relatively good prediction accuracy and Type I/II errors, but also produces the smallest misclassification cost.
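A minimal sketch of one plausible unanimous-voting rule is given below; the exact rule and base classifiers in our published work may differ. Here a firm is labelled distressed unless every base classifier votes healthy, which trades some Type II errors for fewer of the costlier Type I errors.

```python
# Sketch of a unanimous-voting (UV) classifier ensemble (illustrative rule).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

class UnanimousVotingEnsemble:
    def __init__(self, estimators):
        self.estimators = estimators

    def fit(self, X, y):
        for est in self.estimators:
            est.fit(X, y)
        return self

    def predict(self, X):
        votes = np.column_stack([est.predict(X) for est in self.estimators])
        # Predict "healthy" (0) only when all base classifiers unanimously vote 0;
        # any single "distressed" (1) vote makes the final prediction 1.
        return (votes.sum(axis=1) > 0).astype(int)

# Hypothetical usage with three common base classifiers:
# uv = UnanimousVotingEnsemble([LogisticRegression(max_iter=1000),
#                               DecisionTreeClassifier(),
#                               RandomForestClassifier()]).fit(X_train, y_train)
# y_pred = uv.predict(X_test)
```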