Applications of Machine Learning in Accounting Research
- Publication Type: Thesis
- Issue Date: 2021
This item is open access.
Over the last two decades, accounting research has identified a growing number of incremental explanatory variables. However, because careful review studies are scarce and conventional tools are limited, researchers have expressed concern about this growth, particularly as p-hackers may manipulate test design and data selection to produce statistically significant results. This thesis presents a comprehensive overview of the acute challenges facing accounting research, including p-hacking, overreliance on t-statistics, arbitrary selection of control variables, the lack of a replication culture, and a shortage of careful review studies. In response to calls for more review studies and more advanced techniques in accounting research, this thesis applies a novel technique, machine learning, to systematically evaluate how well the large number of incremental variables explain two widely studied outcomes: audit fees and tax avoidance. Following developments in other areas of economics and business research, it applies two widely used variable-selection-oriented algorithms, LASSO and random forest, to the explanatory variables that the extant audit and tax literature has accumulated.
By focusing on two well-explored research questions (i.e., the determinants of audit fees and of tax avoidance), this thesis identifies strong variables that form robust baseline models. These models provide a solid foundation for subsequent audit fee and tax avoidance studies and thereby enhance the comparability and credibility of their results. By replicating a number of prior works and showing how sensitive their results are to these robust baseline models, this thesis also demonstrates that valid arguments, robust baseline models, and strong theory are needed before concluding that a novel independent variable is economically and statistically significant. Overall, this thesis provides an example of applying more advanced techniques to problems that are beyond the capability of the conventional regression approaches on which accounting research typically relies.
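To make the variable-selection idea concrete, the following is a minimal, dependency-free sketch of LASSO, one of the two algorithms named in the abstract, fitted by coordinate descent. It is not the thesis's implementation. The data are synthetic, and the variable names `x1` to `x6`, the penalty value, and all function names are assumptions chosen purely for illustration. Only `x1` and `x2` drive the simulated outcome, so a working LASSO should shrink the remaining coefficients to exactly zero.

```python
import random


def standardize(values):
    """Rescale a column to mean 0 and population variance 1."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(columns, y, penalty, sweeps=100):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by coordinate descent.

    Assumes every column is standardized and y is centred, so each
    coordinate update reduces to a single soft-thresholding step.
    """
    n = len(y)
    coefs = [0.0] * len(columns)
    residual = list(y)  # equals y - Xb, with b = 0 at the start
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Covariance of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(c * r for c, r in zip(col, residual)) / n + coefs[j]
            updated = soft_threshold(rho, penalty)
            step = updated - coefs[j]
            if step != 0.0:
                residual = [r - step * c for r, c in zip(residual, col)]
                coefs[j] = updated
    return coefs


# Synthetic illustration (hypothetical data, not from the thesis):
# six candidate variables, of which only x1 and x2 affect the outcome.
random.seed(0)
n_obs = 200
names = ["x1", "x2", "x3", "x4", "x5", "x6"]
raw = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in names]
outcome = [
    3.0 * raw[0][i] - 2.0 * raw[1][i] + random.gauss(0, 1)
    for i in range(n_obs)
]

columns = [standardize(col) for col in raw]
mean_y = sum(outcome) / n_obs
y = [v - mean_y for v in outcome]

coefs = lasso(columns, y, penalty=0.5)
selected = [name for name, b in zip(names, coefs) if b != 0.0]
print("selected variables:", selected)
```

The penalty controls how aggressively weak variables are dropped; in applied work it would normally be chosen by cross-validation, and a library implementation would replace the hand-written loop. Random forest, the second algorithm mentioned, ranks variables by importance rather than zeroing coefficients and is not sketched here.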