This is a description of a simple data science experiment I conducted while learning about basic data science technologies and approaches. I’ll explain what question I was trying to answer, how I answered it, and what lessons I learned in the process.
Context
Automatic, computer-driven stock trading programs have probably been around since the ENIAC was invented in 1946. I’ve always suspected that these programs can’t be any good, or else everyone would be using them and stock brokers would have gone the way of elevator operators and travel agents by now. But just how bad are they? I can’t test the effectiveness of the sophisticated trading programs that big investment banks use, but I can devise and test a simple algorithmic trading program of the sort an unsophisticated individual investor might use.
The question I’m trying to answer
How well would a dead-simple algorithmic trading program have worked against five years’ worth of historical data, compared to buying and holding over that same time period?
The dataset
As any data scientist knows, Kaggle is a treasure trove of free data, available for anyone to download and use. It turns out to have exactly what I need to get this experiment off the ground: five years’ worth of historical price data for the S&P 500 companies for every trading day between 8 February 2013 and 7 February 2018. The data was released under a CC0 (public domain) license.
Preparing the data
The dataset was packaged as a well-structured CSV file that was easy to import into whatever data structure I wanted to use. The sane thing to do would be to load the CSV directly into a pandas DataFrame. But I wanted to brush up on my SQL and experiment with SQLite, so I first loaded the CSV into a SQLite database, and then slurped it from there into a DataFrame.
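In outline, the detour through SQLite looks something like this (a minimal sketch using a tiny in-memory stand-in for the Kaggle CSV; the table name and sample rows are my own, not the actual code from the experiment):

```python
import sqlite3

import pandas as pd

# Tiny stand-in for the Kaggle CSV (hypothetical rows; the real file has
# date, open, high, low, close, volume, and Name columns).
raw = pd.DataFrame({
    "date": ["2013-02-08", "2013-02-08"],
    "close": [14.75, 68.00],
    "Name": ["AAL", "AAPL"],
})

# Round-trip the data through SQLite: CSV -> table -> DataFrame.
with sqlite3.connect(":memory:") as conn:
    raw.to_sql("prices", conn, if_exists="replace", index=False)
    df = pd.read_sql("SELECT * FROM prices", conn)
```

With a real file you would replace the hand-built frame with `pd.read_csv(...)` and `:memory:` with a path on disk; everything else stays the same.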
Because the data was so well groomed before I got it, not much cleanup was required. All I had to do was remove any companies that either joined or left the S&P 500 midway through the five-year cycle.
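One way to sketch that cleanup is to keep only the tickers that appear on every trading day in the span (hypothetical rows for illustration; the real dataset keys companies by its `Name` column):

```python
import pandas as pd

# Hypothetical frame: AAPL is missing a trading day, so it gets dropped.
df = pd.DataFrame({
    "date": ["2013-02-08", "2013-02-11", "2013-02-08"],
    "close": [14.75, 14.46, 68.00],
    "Name": ["AAL", "AAL", "AAPL"],
})

# A company with a full history appears on every unique trading day.
full_span = df["date"].nunique()
counts = df.groupby("Name")["date"].nunique()
keepers = counts[counts == full_span].index
df = df[df["Name"].isin(keepers)]
```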
Analyzing the data
The trading algorithm consisted of the simplest rules I could come up with:
- At the beginning of the five-year span, buy $1000 worth of each stock.
- Whenever a stock I hold goes up by a certain percent (the rise limit), take profits by selling all shares I own in that stock.
- Whenever a stock I hold goes down by a certain percent (the sink limit), guard against further losses by selling all shares I own in that stock.
- After selling a stock, wait for a certain number of days (the cool-off span) and then get back in by buying $1000 worth of that stock.
- At the end of the five years, sell whatever I’m still holding.
- Ignore dividends and commissions.
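The rules above can be sketched for a single stock as follows (a simplified illustration, not the exact code used in the experiment; the function name and its bookkeeping are my own):

```python
def simulate_stock(prices, rise_limit, sink_limit, cooloff, stake=1000.0):
    """Apply the sell-high/sell-low rules to one stock's daily prices.

    Returns net profit (or loss) in dollars: total proceeds minus the
    $1000 spent on the initial buy plus $1000 per re-entry.
    """
    shares = stake / prices[0]      # buy $1000 worth on day one
    buy_price = prices[0]
    invested = stake
    cash = 0.0
    wait = 0                        # days left in the cool-off span
    for price in prices[1:]:
        if shares == 0:             # sitting out after a sale
            wait -= 1
            if wait <= 0:           # cool-off over: buy back in
                shares = stake / price
                buy_price = price
                invested += stake
            continue
        change = (price - buy_price) / buy_price * 100
        if change >= rise_limit or change <= -sink_limit:
            cash += shares * price  # take profits / cut losses
            shares = 0
            wait = cooloff
    cash += shares * prices[-1]     # sell whatever is left at the end
    return cash - invested
```

For example, with prices [100, 120, 60, 66], a rise limit of 15, a sink limit of 10, and a one-day cool-off, the rules sell at 120, buy back at 60, and liquidate at 66.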
I ran this simulation 1000 times, once for each combination of 10 values for each of the rise limit, sink limit, and cool-off span variables. The values ranged between 1 and 100, and NumPy’s logspace function clustered them tightly for smaller values and spread them out as they got larger:
>>> numpy.logspace(0.1, 2, 10, dtype=int)
array([ 1, 2, 3, 5, 8, 14, 23, 37, 61, 100])
For each combination of rise limit, sink limit, and cool-off span, I ran the simulation and added up the total profit (or loss) for each stock over the five-year period. This gave me a grand percent change on the $500,000 that it took to buy $1000 of each of the 500 stocks at the beginning. I used the grand percent change to evaluate how successful each combination of the three variables was. That number, along with the three variable values that produced it, went into a CSV results file.
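The sweep itself amounts to a triple loop over the log-spaced values, writing one row per run to the results file (a sketch: `simulate` stands in for the full 500-stock simulation and is assumed, not shown):

```python
import csv
import itertools

import numpy as np

# 10 log-spaced values per variable, as described above
values = np.logspace(0.1, 2, 10, dtype=int)  # 1 .. 100

def sweep(simulate, outfile="results.csv"):
    """Run every combination of the three variables and record results.

    simulate(rise, sink, cool) is a hypothetical helper assumed to return
    the total dollar profit across all 500 stocks for one combination.
    """
    with open(outfile, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["rise_limit", "sink_limit", "cooloff", "pct_change"])
        for rise, sink, cool in itertools.product(values, repeat=3):
            profit = simulate(rise, sink, cool)
            # grand percent change on the $500,000 initial outlay
            writer.writerow([rise, sink, cool, profit / 500_000 * 100])
```

`itertools.product(values, repeat=3)` yields all 10 × 10 × 10 = 1000 combinations, which is where the 1000 simulation runs come from.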
As a basis for comparison, I also ran the simulation using a strict buy-and-hold strategy, where I didn’t sell any of the stocks until the five-year span was up.
Visualizing the results
Now the fun began. I loaded the results CSV into a pandas DataFrame and did some exploratory data analysis.
First, let’s answer the question posed at the beginning of this write-up. How do the algorithm’s results compare to a buy-and-hold strategy?
The best-performing combination of variables is… buy-and-hold (setting the three variables to -1 was a signal to the simulator to buy at the beginning and not sell anything until the end).
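Finding the winner is a one-line sort once the results are in a DataFrame (the numbers below are made up purely for illustration; the real file has one row per run):

```python
import pandas as pd

# Made-up subset of the results file; -1 in all three variable columns
# flags the buy-and-hold run.
results = pd.DataFrame({
    "rise_limit": [100, 23, -1],
    "sink_limit": [2, 8, -1],
    "cooloff":    [1, 3, -1],
    "pct_change": [88.5, 61.2, 104.0],
})

# Sort best-first and pull out the top-performing combination.
top = results.sort_values("pct_change", ascending=False)
best = top.iloc[0]
```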
Behind buy-and-hold, we see what is essentially a many-way tie between combinations of variables that all include the maximum rise limit, a small sink limit, and a very small cool-off span. This isn’t surprising given that those variables are about as close to buy-and-hold in spirit as we can get: they all result in staying in the market with very little churn.
Let’s try to get a sense of how much each of the three variables affects the result. Time to haul out the graphs!
This scatter plot shows how different values of sink limit affect the total profit or loss at the end of each simulation run.
Each dot represents one combination of the three variables, so each vertical bar represents all the simulated runs for a fixed value of sink limit. The fact that the vertical bars are tall for every value of sink limit tells us that sink limit doesn’t by itself affect the end result that much (in other words, setting the other two variables to different values produces a wide range of final results while sink limit is held fixed). It does appear that we avoid truly terrible results by pushing sink limit above 15 or so (notice how the bottoms of the bars rise until around that value). But it also looks like raising sink limit too high (above about 22) limits how high the results can go when the other two variables are adjusted (notice how the tops of the bars aren’t as high for large values of sink limit). The moral of the story is that although sink limit isn’t all that important, keeping it around five is necessary for maximizing overall results. Lo and behold, this is borne out by fig. 1 above.
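A plot like this takes only a few lines of Matplotlib (sketched here with synthetic random data standing in for the real results file):

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the results: 100 runs per sink-limit value.
rng = np.random.default_rng(0)
sink = np.repeat(np.logspace(0.1, 2, 10, dtype=int), 100)
pct = rng.normal(40, 25, size=sink.size)

# Each dot is one simulated run; runs with the same sink limit stack
# into the vertical bars described above.
fig, ax = plt.subplots()
ax.scatter(sink, pct, s=8, alpha=0.4)
ax.set_xlabel("sink limit (%)")
ax.set_ylabel("grand percent change")
fig.savefig("sink_limit_scatter.png")
```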
How about the role played by the other two variables? Here’s how rise limit affects the end result.
This variable has a far bigger impact than sink limit. Notice that as you increase rise limit, both the low and high ends of the bar rise significantly. Conclusion: when it comes to rise limit, go big or go home.
And here’s cool-off span.
Wow, look at that. The bars’ top and bottom ends drop as cool-off span increases, which means that the smallest values of cool-off span produce not only the highest maximum results but also the highest minimum results. In addition, notice how smaller values of cool-off span have shorter bars; that means small values of cool-off span constrain the effects of the other two variables. The lesson: setting cool-off span to the smallest possible value is probably the single most important factor in maximizing the simulation’s results.
Conclusion
For the S&P 500 during the span of 8 February 2013 to 7 February 2018, my simple trading algorithm doesn’t come close to matching the results I would have gotten using a simple buy-and-hold strategy. This is true no matter how you adjust the three variables of rise limit, sink limit, and cool-off span.
Future work
The historical prices used in this experiment spanned a bull market. I’m curious if the algorithm fares any better during a bear market or a mixed market.
Also, I’d like to analyze the interplay between variable values and end results using more sophisticated statistical techniques, as I become familiar with them.
Technical details
All of the Python code for this experiment is freely available. I used the NumPy, pandas, and Matplotlib libraries.
Since my life before data science focused on software development with a special focus on testing and quality, I used the BDD (behavior-driven development) philosophy for this project. My code is covered by BDD-style user acceptance tests written with the behave test framework (Python’s version of the well-known Cucumber test framework for Ruby and Java). One of the advantages of BDD is that these tests not only prove that the algorithm works correctly, but also serve as documentation for the simulation and its data-loading tool.
Lessons learned
- SQLite is a gem of a package: easy to set up, easy to use, devoid of surprises. Unfortunately it’s also a lot slower than I expected — at least when accessed via its Python API.
- If you need any confidence whatsoever in your results, automated tests for your data analysis code are essential. Budget at least as much time for writing tests as for writing code. Neglect the tests only if your project is a quick proof of concept or a learning exercise.
- I’ve got to figure out how to embed a Jupyter Notebook cell into a WordPress page. Suggestions are welcome!
- Pandas has a massive number of features, and that’s just for its DataFrame class. But I bet you can get a lot of work done knowing just 10% of its feature set.