A Data Mining Model for an Effective Trading System
Authors: Ahmed Gomaa, Daniel Sputa, and Rex Dumdum (all Marywood University)
Abstract
This research explored how financial statement ratios can help predict the direction of a stock's price from one quarter to the next. The study focused on a contrarian investment strategy and tested whether financial data could be collected, organized, and used to train predictive models effectively.
Several data mining models were applied to identify the variables with the strongest predictive value. The findings suggested that the Association Rule model performed best, reaching 71.43% accuracy in predicting stock direction while using only a limited number of variables.
The analysis was based on published financial statements from companies listed in the S&P 500 over a five-year period.
Introduction
Investors use different methods to predict stock price direction, including fundamental analysis, technical analysis, and behavioral analysis. During this process, they choose which variables to focus on, often based on experience, investment style, or strategy.
At Marywood University in Scranton, Pennsylvania, the Pacer Investment Fund gave graduate students the opportunity to manage real investments under faculty supervision. The team focused primarily on value investing and contrarian strategies, aiming for long-term profitability with controlled risk.
Their investment decisions typically began with fundamental analysis, reviewing financial statements and calculating key ratios to evaluate a company’s financial health. This was followed by technical analysis to study stock price trends and behavioral analysis to consider broader economic, political, and market influences.
Because each stage of the decision-making process depends on valid assumptions and reliable data, understanding which financial variables matter most is critical.
Research Problem
One of the main challenges in fundamental analysis is determining which financial variables most strongly influence stock price direction. It is also difficult to identify the threshold values those variables should reach in order to suggest whether a stock price is likely to rise or fall.
The goal of this study was to identify the most important financial variables affecting stock price direction and determine the thresholds associated with them, particularly in the context of a contrarian investment strategy.
Related Work
Previous research on contrarian investing often focused on constructing portfolios using selected variables and then measuring performance over time. However, many studies did not clearly explain why specific variables were chosen.
For example, prior research used measures such as beta, market risk premium, earnings-to-price ratio, book-to-market value, cash flow-to-price, and growth rates. Although these studies reported results, they often did not justify their choice of variables.
This study aimed to address that gap by using data mining techniques to identify which variables actually matter most when analyzing financial statements.
Methodology
The analysis used published financial statements from S&P 500 companies over the previous five years. Quarterly financial data was collected, and a variety of ratios were calculated, including measures related to liquidity, capital structure, and inventory management.
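As a rough illustration of the ratio step, the sketch below computes a few common liquidity and inventory ratios from one quarterly statement using their standard textbook definitions. The field names and the `QuarterlyStatement` type are hypothetical, since the study does not list its exact inputs or formulas.

```python
from dataclasses import dataclass


@dataclass
class QuarterlyStatement:
    # Hypothetical field names; the study does not publish its schema.
    current_assets: float
    inventory: float
    cash_and_equivalents: float
    current_liabilities: float
    cost_of_goods_sold: float
    average_inventory: float


def liquidity_ratios(s: QuarterlyStatement) -> dict[str, float]:
    """Standard textbook definitions; zero denominators are not handled."""
    return {
        "current_ratio": s.current_assets / s.current_liabilities,
        "quick_ratio": (s.current_assets - s.inventory) / s.current_liabilities,
        "cash_ratio": s.cash_and_equivalents / s.current_liabilities,
        "inventory_turnover": s.cost_of_goods_sold / s.average_inventory,
    }
```

Each company-quarter row would then carry these ratios as model inputs, alongside any others the analyst chooses to compute.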
The initial dataset included thousands of rows of company-quarter observations. Data was reviewed internally for validity and completeness. Because the study focused on contrarian stocks, only companies with a beta of 1 or lower were included.
The data was then matched to stock price movement by quarter, indicating whether the stock price increased or decreased compared with the previous quarter.
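The labeling step can be sketched as follows. This is a minimal example that assumes one closing price per quarter and treats an unchanged price as "down"; the study does not say how it handled ties, and the function name is illustrative.

```python
def label_direction(quarterly_closes: list[float]) -> list[str]:
    """Label each quarter 'up' or 'down' relative to the previous quarter.

    The first quarter has no predecessor, so the result is one element
    shorter than the input. Assumption: an unchanged price counts as 'down'.
    """
    labels = []
    for previous, current in zip(quarterly_closes, quarterly_closes[1:]):
        labels.append("up" if current > previous else "down")
    return labels
```

The resulting labels serve as the target column that the models try to predict from the ratios.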
Different data mining models were then trained using part of the dataset and tested on the remaining portion. The models received only financial ratios as input and attempted to predict whether the stock price would go up or down. Their predictions were then compared with actual market outcomes.
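A minimal sketch of this train-and-test procedure is shown below. The 30% hold-out fraction, the random shuffle, and the function names are assumptions made for illustration; the study does not state how the data was partitioned.

```python
import random


def train_test_split(rows: list, test_fraction: float = 0.3, seed: int = 0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Share of predictions that match the actual market direction."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)
```

For quarterly data, a chronological split (training on earlier quarters and testing on later ones) may be a safer design choice than a random shuffle, because it keeps future information out of the training set.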
Results
Association Rules (AR)
The Association Rule model identified meaningful relationships among variables linked to stock price increases and decreases.
For stock price increases, the strongest indicators included:
- High P/E ratio
- High current ratio
- High quick ratio
- Inventory turnover within a specific range
The most relevant thresholds included:
- Current ratio ≥ 2.2
- Inventory turnover between 1.8 and 4.2
- Cash ratio ≥ 0.64
- P/E ratio > 14.04
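The thresholds listed above can be expressed as a simple screening check. The sketch below reports each threshold separately, because the study does not say whether the thresholds form a single combined rule or several independent rules. The dictionary keys are hypothetical, and the inventory turnover bounds are assumed to be inclusive.

```python
def matches_increase_thresholds(ratios: dict[str, float]) -> dict[str, bool]:
    """Flag which of the reported 'price increase' thresholds a stock meets."""
    return {
        "current_ratio": ratios["current_ratio"] >= 2.2,
        # Assumption: "between 1.8 and 4.2" includes both endpoints.
        "inventory_turnover": 1.8 <= ratios["inventory_turnover"] <= 4.2,
        "cash_ratio": ratios["cash_ratio"] >= 0.64,
        "pe_ratio": ratios["pe_ratio"] > 14.04,
    }


example = {
    "current_ratio": 2.5,
    "inventory_turnover": 3.0,
    "cash_ratio": 0.7,
    "pe_ratio": 15.0,
}
flags = matches_increase_thresholds(example)
print(flags)
```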

For stock price decreases, the model found significance in:
- High receivables turnover
- Low current ratio
- Low cash ratio
- Low debt-to-equity ratio
- Low P/E ratio
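Association rules of this kind are commonly scored by support (how often the whole rule appears in the data) and confidence (how often the outcome follows when the conditions hold). The sketch below computes both for one rule. The item names such as "low_pe" and the sample rows are invented for illustration, and the study does not state which rule-mining algorithm or scoring it used.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    Each transaction is a set of items describing one company-quarter,
    for example {"low_pe", "low_cash_ratio", "down"}.
    """
    antecedent = frozenset(antecedent)
    both = antecedent | {consequent}
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_both = sum(both <= t for t in transactions)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Invented sample rows, not data from the study.
rows = [
    {"low_pe", "low_cash_ratio", "down"},
    {"low_pe", "down"},
    {"low_pe", "up"},
    {"high_pe", "up"},
]
print(support_and_confidence(rows, {"low_pe"}, "down"))
```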

Decision Trees (DT)
The Decision Tree model confirmed the importance of the P/E ratio. When the P/E ratio was below 11.46, more than half of the stocks decreased in price. When the P/E ratio was 14 or higher, stocks were much more likely to increase in price.
This suggested a general pattern: higher P/E ratios were associated with stronger stock price growth.
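The two P/E cut-offs reported above can be read as the branches of a very small tree. The sketch below encodes only those two cut-offs. The study does not describe the range between 11.46 and 14, so that branch is left undetermined, and the returned labels describe tendencies rather than certain outcomes.

```python
def pe_split(pe_ratio: float) -> str:
    """Map a P/E ratio to the direction the reported tree branches lean toward."""
    if pe_ratio < 11.46:
        return "down"          # more than half of these stocks decreased
    if pe_ratio >= 14:
        return "up"            # these stocks were more likely to increase
    return "undetermined"      # the study reports no finding for this range
```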

Naïve Bayes (NB)
The Naïve Bayes model produced similar results, also indicating that higher P/E ratios were associated with an increased likelihood of stock price growth.

Neural Network (NN)
The Neural Network model assigned each variable a probability and a measure of its significance. For example, when the P/E ratio was below 14, the model predicted a stock price decline in 57% of cases.

Which Model Performed Best?
Each model approached prediction differently, and their accuracy varied. The Association Rule model was the most accurate of the models tested, with a prediction accuracy of 71.43%.

Conclusion
This research demonstrated that financial statement ratios can be used to predict stock price direction, with the best model reaching 71.43% accuracy. More importantly, it showed that data mining methods can help identify not only which variables matter, but also the threshold values that may signal upward or downward movement.
The study also validated the overall learning method: collecting financial data, loading it into a system, training models, and testing predictions on separate data. In that sense, the project served as a proof of concept and established a foundation for further work.
Among all tested models, Association Rules delivered the strongest results and highlighted a small group of financial indicators with the greatest predictive relevance.