Center Achievement | Xu Xin, Xiong Feng, et al.: Using Machine Learning to Predict Corporate Fraud: Evidence Based on the GONE Framework

pubdate:2023/03/21

This article is excerpted from the paper "Using machine learning to predict corporate fraud: evidence based on the GONE framework" by Associate Professor Xiong Feng of the Accounting Department, Xiamen University Accounting Development Research Center/School of Management; Dr. Xu Xin of the Accounting Department, Xiamen University School of Management; and Associate Professor An Zhe of Monash University in Australia. The paper was published online in the Journal of Business Ethics.


01

Introduction

Corporate fraud erodes investor confidence and disrupts market order, so it has always attracted close attention from practitioners and regulators. Alongside rapid economic growth, corporate fraud in China's capital market has increased since the start of the 21st century. From 2008 to 2020, the China Securities Regulatory Commission (CSRC) issued a total of 7,285 penalty notices, with 482 companies punished for violations. On average, the CSRC takes 1.76 years to investigate and expose a fraud case. Predicting corporate fraud accurately and efficiently is therefore a major challenge for China's capital market. In the information technology era, new information channels and machine learning algorithms can help improve the accuracy of corporate fraud prediction models. Building on the GONE framework (Bologna et al., 1993), this paper uses a more comprehensive set of input variables and a new machine learning algorithm to improve the accuracy of corporate fraud prediction and to give regulators and investors more accurate early warning. The empirical results also confirm the importance of social media as an Exposure factor, which enriches the literature on social media's role in monitoring firms.


02

Causes and prediction of corporate fraud

Like any forecasting model, a corporate fraud prediction model rests on identifying the causes of fraud. Cressey (1953) put forward the well-known fraud triangle theory (incentive, opportunity and rationalization), and Bologna et al. (1993) went on to propose the GONE framework (greed, opportunity, need and exposure). With the rise of new information channels such as social media in recent years, integrating new media information into existing corporate fraud prediction models serves two purposes: it extends the literature on the incremental information content of social media, and it improves the accuracy of existing prediction models.

The GONE framework attributes corporate fraud to four main factors: greed, opportunity, need and exposure. Greed refers to an individual's ethical and moral character, an inner personality attribute that plays an important role in behavior and cognition. The moral awareness and cognition of key people in the organization therefore strongly affect the likelihood of corporate fraud; relevant characteristics include the age, education, academic background and professional qualifications of senior executives. Opportunity (the opportunity to commit fraud) refers to the corporate governance structure; for example, supervision by independent directors helps reduce the likelihood of fraud. As for need, an important fraud motive is economic need, which covers company performance, stock price and personal interests: improving the company's operating and stock market performance, satisfying covenants in existing debt contracts, raising funds on favorable terms, and avoiding suspension from the stock exchange, or, for managers, covering up deteriorating performance to protect their pay. Exposure refers to the likelihood that fraud is exposed and the severity of punishment once it is exposed. The likelihood of exposure depends on the degree of external supervision, including auditors and the media. For example, larger audit firms can supervise listed companies more effectively, while a change of auditor weakens external supervision and makes fraudulent activity harder to detect. Under the media spotlight, enterprises tend to improve the efficiency of the board of directors and are more likely to correct governance violations. When enterprises face harsher punishment for fraud, they are even less likely to commit it.
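The grouping described above can be sketched as a simple lookup structure. This is only an illustration: the column names are hypothetical placeholders chosen to mirror the examples in the text, not the variables actually used in the paper.

```python
# Hypothetical column names, grouped by the GONE factor each is meant to proxy.
# None of these names come from the paper; they only mirror the examples above.
GONE_FEATURES = {
    "greed": [
        "exec_age",
        "exec_education",
        "exec_academic_background",
        "exec_professional_qualification",
    ],
    "opportunity": [
        "independent_director_ratio",
    ],
    "need": [
        "operating_performance",
        "stock_performance",
        "debt_covenant_pressure",
        "external_financing",
    ],
    "exposure": [
        "big_auditor",
        "auditor_change",
        "media_attention",
        "social_media_attention",
        "penalty_severity",
    ],
}


def all_features(groups):
    """Flatten the factor groups into one ordered list of model input columns."""
    return [feature for features in groups.values() for feature in features]


def feature_to_factor(groups):
    """Map each column back to its GONE factor, rejecting duplicate columns."""
    mapping = {}
    for factor, features in groups.items():
        for feature in features:
            if feature in mapping:
                raise ValueError(f"{feature} is assigned to more than one factor")
            mapping[feature] = factor
    return mapping
```

Keeping the factor label attached to every input makes it possible to trace any variable a model relies on back to one of the four causes of fraud.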

The media's supervisory role lies in shaping the public image of enterprises and drawing the attention of regulators, which significantly raises the cost of corporate fraud. In recent years, a growing body of research has shown the value of social media in corporate information disclosure, but few empirical studies have examined the informational role of social media in corporate fraud.


03

Application, Opportunities and Challenges of Machine Learning in Corporate Governance (Forecasting)

As machine learning algorithms have developed, using machine learning models to solve accounting and financial problems has become a major trend. Existing studies draw on a variety of algorithms and data sources, such as the support vector machine model (Cecchini et al., 2010), the logistic regression model (Perols, 2011), and a random forest model based on features of management discussion texts (Purda and Skillicorn, 2015). More recent work includes the RUSBoost model based on raw accounting figures (Bao et al., 2020) and a study of material corporate misstatements based on gradient boosted regression trees (Bertomeu et al., 2021).

Using machine learning models to predict corporate fraud faces four major challenges. The first and most important is to ensure that the input variables are correct and appropriate, that is, that they cover all the fraud motives that should be included. The studies cited above mostly use raw data as input and lack a specific theoretical framework to support the choice of variables; they also lack comparisons across different algorithms and further tests by type of corporate fraud. The remaining challenges are ensuring that the fraud samples in the prediction model meet the matching requirements, setting the correct parameters in the prediction model, and ensuring that different prediction models can be applied to different capital markets, especially emerging ones such as China.


04

Machine learning model and its fraud prediction performance (accuracy) evaluation

This paper predicts corporate fraud with three ensemble learning models (random forest (RF), gradient boosting decision tree (GBDT) and RUSBoost) and three traditional machine learning models (logistic regression (LR), support vector machine (SVM) and artificial neural network (ANN)), all built with Scikit-Learn. The models are evaluated on a series of indicators: AUC, accuracy and recall, together with the ranking measures NDCG@k, Precision@k and Recall@k. On these indicators, the random forest (RF) model performs better than the other models in corporate fraud prediction, especially in predicting more serious fraud (that is, fictitious assets and fabricated profits). Further analysis shows that the social media variables, which belong to the "exposure" factor, are the most important of all the input variables.
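The ranking measures named above can be illustrated with a short sketch. The paper's own evaluation code is not reproduced here; what follows is one common way to compute Precision@k, Recall@k and NDCG@k for binary fraud labels, assuming each firm has a predicted fraud score and a label of 1 (fraud) or 0 (no fraud). The function names and the example data are assumptions made for illustration.

```python
import math


def top_k_labels(scores, labels, k):
    """Return the true labels of the k firms with the highest predicted scores."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked[:k]]


def precision_at_k(scores, labels, k):
    """Share of the top-k flagged firms that are actual fraud cases."""
    return sum(top_k_labels(scores, labels, k)) / k


def recall_at_k(scores, labels, k):
    """Share of all actual fraud cases that appear among the top-k flagged firms."""
    total_fraud = sum(labels)
    if total_fraud == 0:
        return 0.0
    return sum(top_k_labels(scores, labels, k)) / total_fraud


def ndcg_at_k(scores, labels, k):
    """Discounted gain of the top-k ranking divided by that of an ideal ranking."""
    def dcg(ordered_labels):
        # Rank 0 is discounted by log2(2) = 1, rank 1 by log2(3), and so on.
        return sum(
            label / math.log2(rank + 2)
            for rank, label in enumerate(ordered_labels)
        )

    ideal = sorted(labels, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0
    return dcg(top_k_labels(scores, labels, k)) / ideal_dcg


# Illustrative data only: six firms, three of which are true fraud cases.
example_scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
example_labels = [1, 0, 1, 0, 1, 0]
example_precision = precision_at_k(example_scores, example_labels, k=3)
example_recall = recall_at_k(example_scores, example_labels, k=3)
example_ndcg = ndcg_at_k(example_scores, example_labels, k=3)
```

Unlike AUC, these measures look only at the firms a model ranks as most suspicious, which fits a setting where regulators can investigate only a limited number of cases.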


05

Conclusion, contribution and enlightenment

Building on the GONE framework and incorporating an "exposure" factor represented by social media, this paper constructs a corporate fraud prediction model based on the random forest (RF) algorithm. First, the paper improves on the accuracy of existing prediction models, which helps regulators and investors detect corporate fraud in advance. Second, it confirms the importance of social media information, which extends both the literature on the incremental role of social media information and the research on the determinants of corporate fraud. Finally, by comparing six machine learning methods, it demonstrates the advantages of the random forest (RF) algorithm in predicting corporate fraud. As business models change, the forms and motivations of corporate fraud will change as well. Future research can try to integrate data from different sources and update the measures of each factor to further improve prediction accuracy.