Bad Data – Garbage In, Garbage Out
This lesson is about quotes and data quality. We take it for granted that you backtest all your trading ideas, so the focus here is on the quality of the data behind those backtests. As you have probably heard many times already: garbage in, garbage out. You want to protect your capital, and one way of doing that is by using good datasets.
Having data is important for two reasons:
- You need data for backtesting, and
- You need data for trading
Your backtest is only as good as the data you test on. Make sure you are backtesting on reliable and “clean” data. In the long run, it pays off to spend money on a good data source for backtesting. Your data must not only be accurate, it must also cover a long enough period to be useful.
As a safety valve against bad data and errors in the backtest, you must also perform out-of-sample testing (see later lesson).
Errors in datasets:
When you backtest, the quotes might contain errors. For example, the daily high and low are wrong in many datasets. One reason is that some trades from one day might be reported the day after.
This is an example:
The “tail” below is plain wrong. Here is another example:
How do we know it’s wrong? We know it’s wrong because we compared it to good and clean data. Besides, it doesn’t look right either: the tail is far too long compared to the “body” of the candle.
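The "tail versus body" check described above can be sketched in code. This is a minimal sketch, not a definitive filter: the `Bar` class, the `suspicious_wicks` function, the `max_ratio` threshold of 10 and the sample prices are all assumptions made up for illustration. A flagged bar is only a candidate for manual review against a clean data source, because real markets do produce long tails.

```python
from dataclasses import dataclass


@dataclass
class Bar:
    date: str
    open: float
    high: float
    low: float
    close: float


def suspicious_wicks(bars, max_ratio=10.0, min_body_pct=0.001):
    """Return the dates of bars whose upper or lower tail is more than
    max_ratio times the size of the candle body.

    max_ratio and min_body_pct are arbitrary illustrative values.
    """
    flagged = []
    for bar in bars:
        body = abs(bar.close - bar.open)
        # Floor the body at a small fraction of the open so that bars with
        # open == close do not flag every tail, however small.
        body = max(body, bar.open * min_body_pct)
        upper_tail = bar.high - max(bar.open, bar.close)
        lower_tail = min(bar.open, bar.close) - bar.low
        if upper_tail > max_ratio * body or lower_tail > max_ratio * body:
            flagged.append(bar.date)
    return flagged


# Made-up daily bars; the second one has an implausible spike in the high.
bars = [
    Bar("2020-01-02", 100.0, 101.0, 99.5, 100.8),
    Bar("2020-01-03", 100.8, 112.0, 100.5, 101.2),
    Bar("2020-01-06", 101.2, 101.9, 100.9, 101.5),
]

print(suspicious_wicks(bars))  # ['2020-01-03']
```

A ratio test like this is deliberately crude. It only narrows down which bars to compare against a second source.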
Below is a comparison of quotes downloaded manually from Interactive Brokers (IB) and the free end-of-day (EOD) quotes from Yahoo!finance. It shows the percentage difference between the two sources for the OPEN to HIGH and the OPEN to LOW moves (that is, the difference between IB’s OPEN to HIGH and Yahoo!finance’s OPEN to HIGH, and likewise for the lows).
The first chart shows that many of the highs in Yahoo!finance are a lot higher than IB’s. The second chart shows the same pattern for the lows: many of the lows in Yahoo!finance are a lot lower than IB’s.
Bad quotes can distort your backtest, especially if you use the HIGH and LOW for entries or exits.
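A comparison like this can be reproduced for any two data sources. The sketch below is only an illustration: the function names, the dictionary layout and the prices are hypothetical, and the sign convention (candidate minus reference) is an assumption chosen so that a positive number means the candidate source shows a larger move away from the open than the reference.

```python
def open_to_high_pct(bar):
    """Percentage move from the open up to the high."""
    return (bar["high"] - bar["open"]) * 100.0 / bar["open"]


def open_to_low_pct(bar):
    """Percentage move from the open down to the low, as a positive number."""
    return (bar["open"] - bar["low"]) * 100.0 / bar["open"]


def compare_sources(reference, candidate):
    """Compare two {date: bar} dictionaries on the dates they share.

    Returns (date, high_diff, low_diff) tuples, where each diff is the
    candidate's percentage move minus the reference's percentage move.
    """
    rows = []
    for date in sorted(reference.keys() & candidate.keys()):
        ref, cand = reference[date], candidate[date]
        high_diff = open_to_high_pct(cand) - open_to_high_pct(ref)
        low_diff = open_to_low_pct(cand) - open_to_low_pct(ref)
        rows.append((date, high_diff, low_diff))
    return rows


# Made-up prices: "reference" stands in for the trusted source,
# "candidate" for the source being checked.
reference = {
    "2020-01-02": {"open": 100.0, "high": 101.0, "low": 99.0},
    "2020-01-03": {"open": 50.0, "high": 51.0, "low": 49.5},
}
candidate = {
    "2020-01-02": {"open": 100.0, "high": 103.0, "low": 98.0},
    "2020-01-03": {"open": 50.0, "high": 51.0, "low": 49.5},
    "2020-01-06": {"open": 51.0, "high": 52.0, "low": 50.0},  # no match, skipped
}

for date, high_diff, low_diff in compare_sources(reference, candidate):
    print(date, round(high_diff, 2), round(low_diff, 2))
```

Dates that appear in only one source are skipped here. In practice, missing days are worth listing as well, since gaps are a data quality problem in their own right.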
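To see why this matters, consider a strategy that exits at a profit target. The sketch below uses a hypothetical `target_hit` helper and made-up prices. It assumes a simple rule where the target counts as filled whenever the day's high reaches it.

```python
def target_hit(entry_price, target_pct, day_high):
    """Return True if a profit target target_pct percent above the entry
    price would have been reached, judged only by the day's high."""
    target = entry_price * (1 + target_pct / 100.0)
    return day_high >= target


entry = 100.0
clean_high = 101.0  # made-up high from a clean data source
bad_high = 103.0    # made-up high from a source with an erroneous spike

print(target_hit(entry, 2.0, clean_high))  # False
print(target_hit(entry, 2.0, bad_high))    # True
```

With the erroneous high, the backtest books a winning exit that could never have been filled in live trading. Repeated over many trades, this inflates the backtested results.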
Correct data
If you are serious about trading and backtesting, we recommend paying a premium to obtain good data. We don’t recommend any free data.
