We’ve long been interested in starting our own quantitative stock fund, for a few reasons:
It’s a good data science exercise. There are lots of signals, including quarterly reports, prices at time t, and an internet full of information about the individual businesses, and lots of training labels – those are the prices at time t + 1.
You find yourself learning more about interesting areas: data science, the history of technology and economics.
There are reasons to think you can beat the market with a moderate amount of attentive work. (Caveat: there are reasons to think you can’t beat the market.) For example, a very good investing approach is simply to invest in the market, e.g. by owning Vanguard’s whole-stock-market fund (VTI). But that means your choices are market-cap-weighted, and market cap may not be the best indicator of future growth.
You can apply your preferences to exclude some stocks without reducing yield too much. Preferences might include sustainability, consumer value produced, no tobacco, etc.
Once you purchase the stocks (or fantasy-purchase them), you have another angle through which to follow the daily news. For example, it’s fun to follow clinical trial results.
So we spun up a side project to predict stock growth. First let’s talk about how easy that is to do (pretty easy!), then about the industry-specific signals we generate for biotech/pharma companies, and about Moderna in particular.
(Legal Disclaimer: This post is for informational purposes only. You should not construe it as legal, tax, investment, financial, or other advice.)
How to Build Your Own Quant Fund: Getting Historical Data
To predict future stock growth we need three main kinds of data: a list of stocks to investigate, quarterly financial reports for those stocks that go back as far as possible, and price data that goes back as far as possible.
If you just want to get your feet wet without spending money, you can get the NASDAQ’s list of all NASDAQ and NYSE tickers from the NASDAQ FTP site (remember FTP?) and use yfinance, a Python package, to get the prices and financials. It has nice instructions, but the financials data covers only the last 4 quarters and the last 4 years – not nearly enough training data to make good predictions.
EOD Historical Data is a better data source. Their documentation is clear, brief when it should be, and extensive. A month of their $50/month Fundamentals Data Feed subscription was all we needed to get started. It includes:
stock ticker lists, including delisted tickers (which we want so we can avoid survivorship bias), and exchange lists.
quarterly financial report data that tests out pretty well going back 30 years, and that goes back further for some stocks.
How to Build Your Own Quant Fund: Predicting Stock Gains
Once we downloaded this data we were ready to build features and models. What features might predict success? We learned a lot about features from Patrick O'Shaughnessy's book. He suggests several that our own analysis confirms are strongly predictive of stock price gains, at least on the data from the last 30 years. An illustrative example: price-to-cash-flow, which is related to the more popular price-to-earnings and price-to-sales. Other predictive features involve revenues, stock prices, other cash flow quantities, income numbers, etc.
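To make the price-to-cash-flow feature concrete, here is a minimal sketch. The post doesn't say which cash-flow measure or window is used, so this assumes operating cash flow summed over the trailing four quarters; the function name and arguments are hypothetical.

```python
from typing import Optional, Sequence


def price_to_cash_flow(
    price: float,
    shares_outstanding: float,
    quarterly_operating_cash_flows: Sequence[float],
) -> Optional[float]:
    """Market cap divided by trailing-four-quarter operating cash flow.

    Assumption: operating cash flow over the last four quarters is the
    denominator. Returns None when the ratio is not meaningful.
    """
    if len(quarterly_operating_cash_flows) < 4:
        return None  # not enough history for a trailing-year figure
    trailing_cash_flow = sum(quarterly_operating_cash_flows[-4:])
    if trailing_cash_flow <= 0:
        return None  # a ratio on non-positive cash flow is misleading
    market_cap = price * shares_outstanding
    return market_cap / trailing_cash_flow


# Made-up numbers: a $100 share price, 10 shares, 250 of trailing cash flow.
example_ratio = price_to_cash_flow(100.0, 10.0, [50.0, 50.0, 50.0, 100.0])
```

Returning None rather than a negative ratio keeps companies that are burning cash from looking artificially cheap when the feature is ranked.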
Given data and meaningful features, how do we predict gains? We use a simple approach that will be familiar to practicing data scientists.
First, we train our model at various points in time, using only data that was available before each point. For financials, that means knowing the reporting dates and not acting on a financial report (by buying or pretending to buy a stock) until after the report has been released, and indeed until enough time has passed that your pipelines would have run and you would have had time to do pre-launch tests and buy the stock.
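The point-in-time rule above can be sketched as a lookup that only returns reports released a safe number of days before the trade date. The seven-day lag, the function name, and the example dates are all assumptions for illustration, not values from our pipeline.

```python
from bisect import bisect_right
from datetime import date, timedelta


def latest_usable_report(reports, as_of, lag_days=7):
    """Return the newest report payload released at least lag_days before as_of.

    reports: list of (report_date, payload) tuples sorted by report_date.
    Returns None if no report is old enough to have been acted on.
    """
    cutoff = as_of - timedelta(days=lag_days)
    report_dates = [report_date for report_date, _ in reports]
    # Count the reports released on or before the cutoff date.
    usable_count = bisect_right(report_dates, cutoff)
    if usable_count == 0:
        return None
    return reports[usable_count - 1][1]


# Hypothetical reporting dates, for illustration only.
example_reports = [
    (date(2022, 2, 24), "Q4 2021"),
    (date(2022, 5, 4), "Q1 2022"),
    (date(2022, 8, 3), "Q2 2022"),
]
# Two days after the Q2 release, the model may still only see Q1.
seen_early = latest_usable_report(example_reports, date(2022, 8, 5))
# A week after the release, Q2 becomes usable.
seen_later = latest_usable_report(example_reports, date(2022, 8, 10))
```

Erring on the side of a longer lag costs a little freshness but keeps look-ahead bias out of the backtest.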
Second, we use supervised learning. The training labels are price growth measured starting from the time we buy the stock, after discounting the gains we would have gotten from a standard index investment (VTI is strongest over the long term, but many like the S&P 500), and after adding dividends and discounting capital gains taxes. Growth labels can be computed over any horizon you like: a quarter, a year, 5 years, whatever.
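Here is one way such a label could be computed; treat it as a sketch under stated assumptions rather than the exact formula. It assumes capital gains tax applies only to positive price gains, that dividends are added untaxed, that the benchmark return is supplied as a precomputed fraction over the same horizon, and a 15% tax rate chosen purely for illustration.

```python
def excess_growth_label(
    buy_price: float,
    sell_price: float,
    dividends_per_share: float,
    benchmark_return: float,
    capital_gains_tax_rate: float = 0.15,  # illustrative assumption
) -> float:
    """After-tax total return on the stock minus the benchmark's return.

    All returns are fractions of the purchase price (0.10 means 10%).
    """
    price_gain = sell_price - buy_price
    # Assumption: tax is owed only on gains; losses get no tax credit here.
    tax_owed = capital_gains_tax_rate * max(price_gain, 0.0)
    stock_return = (price_gain - tax_owed + dividends_per_share) / buy_price
    return stock_return - benchmark_return


# Made-up numbers: buy at 100, sell at 120, 2 in dividends, index up 10%.
example_label = excess_growth_label(100.0, 120.0, 2.0, 0.10)
```

With these made-up inputs the stock returns 19% after tax and dividends, so the label is roughly 0.09 above the index.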
Finally, we use a deep lattice network with monotonicity constraints (and also simpler calibrated linear models for comparison), with the features I mentioned above as input and price growth as the output, thus predicting the label. Calibrated linear models are broadly used and sometimes called generalized additive models. You can read about deep lattice networks in this NIPS paper.
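To show the shape of a calibrated linear model, here is a small prediction-only sketch in plain Python: each feature goes through its own piecewise-linear calibrator, and the calibrated values are combined with a weighted sum. It does no training, and the keypoints, outputs, weights, and the choice of a decreasing constraint on price-to-cash-flow are invented for illustration.

```python
from bisect import bisect_right


def pwl_calibrate(x, keypoints, outputs):
    """Piecewise-linear interpolation through (keypoint, output) pairs, clamped at the ends."""
    if x <= keypoints[0]:
        return outputs[0]
    if x >= keypoints[-1]:
        return outputs[-1]
    i = bisect_right(keypoints, x) - 1
    fraction = (x - keypoints[i]) / (keypoints[i + 1] - keypoints[i])
    return outputs[i] + fraction * (outputs[i + 1] - outputs[i])


def is_monotonic_decreasing(outputs):
    """True if the calibrator can never map a larger input to a larger output."""
    return all(a >= b for a, b in zip(outputs, outputs[1:]))


def calibrated_linear_predict(features, calibrators, weights, bias=0.0):
    """Weighted sum of per-feature calibrated values."""
    total = bias
    for name, value in features.items():
        keypoints, outputs = calibrators[name]
        total += weights[name] * pwl_calibrate(value, keypoints, outputs)
    return total


# Hypothetical parameters: a cheaper stock never gets a lower prediction.
example_calibrators = {"price_to_cash_flow": ([0.0, 10.0, 40.0], [1.0, 0.5, 0.0])}
example_weights = {"price_to_cash_flow": 0.2}  # non-negative, so monotonicity is preserved
example_prediction = calibrated_linear_predict(
    {"price_to_cash_flow": 5.0}, example_calibrators, example_weights
)
```

Constraining the calibrator outputs to be monotonic and the weights to be non-negative is what keeps the model from learning, say, that a pricier stock is a better buy just because of noise in the training data.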
You can do this exercise with any machine learning architecture (e.g. a DNN), but beware that this scenario is rife with opportunities to overfit. Stock prices are buffeted by unpredictable external events that won’t be repeated. The only way to collect new test data is to wait a quarter. So you want a lot of regularization and not too much flexibility.
We’ve done many model runs, and learned interesting things. Patrick is right: price-to-cash-flow is strongly predictive of future performance. It’s a much better indicator than price-to-sales (by about 10 to 1) or price-to-earnings (which usually provides no additional value). But of course remember that the three are correlated. These price measures are strongly predictive for a good reason. The key to not losing money when you buy anything is to buy it at a cheap price compared to its long-term value. Price-to-cash-flow is a bit like price-per-square-foot when you’re buying a house. The cash flows, like the square feet, are real value. Patrick argues, and I agree, that cash flow is a more transparent indicator of how well the company is doing than earnings, which are easier to manipulate because companies have more choices about how to book them.
The algorithm kicked out lists of stocks with the best expected yields for each of the last 120 quarters. Most of them are stocks I’ve never heard of, and it’s been interesting learning about a few of them. A few are quite familiar, including Moderna.
What does the model think of Moderna? Above all, it likes that Moderna has a price-to-cash-flow around 4, which is about 30% of the industry median or the market-wide median. Why is Moderna so cheap? It’s not like anyone’s forgotten it exists. But analysts predict that Moderna’s profits will fall as Covid vaccine rates fall. My analysis suggests that in itself this is overstated. Moderna’s profits are falling, and when they report Thursday perhaps we’ll hear they’ve fallen even further, but I see analysts throwing around “forward price-to-earnings (P/E)” predictions like 19. Today’s P/E is 4.7, so earnings would have to fall a lot to make that happen.
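The size of the earnings drop implied by those two P/E figures is quick to check. This sketch assumes the share price stays where it is, so the change in P/E comes entirely from earnings; the function name is hypothetical.

```python
def implied_earnings_change(current_pe: float, forward_pe: float) -> float:
    """Fractional change in earnings needed to move from current_pe to forward_pe.

    Assumption: the share price is unchanged, so
    forward_earnings / current_earnings = current_pe / forward_pe.
    """
    return current_pe / forward_pe - 1.0


# The two P/E figures quoted above.
moderna_change = implied_earnings_change(4.7, 19.0)
```

At a constant price, going from a P/E of 4.7 to 19 means earnings falling by roughly three quarters.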
And this brings us to the second domain of analysis: Moderna’s product pipeline.
Biotech/Pharma Company Pipelines
The core of a biotech company is its product pipeline. This is a list of all drugs that they’ve announced as in development, along with their clinical trial status. Drugs have five main statuses: preclinical, phase 1, phase 2, phase 3, and launched. It can take $100M+ and many years to get from preclinical status (which mostly means doing experiments in dishes or with mice) through the three phases of clinical studies of safety and efficacy in patients, past the final FDA approval to launch. Clinical trials are such a big deal that when trial results are reported it routinely moves the stock, about as reliably as an earnings report. For example, when Mirum reported a positive Phase 3 liver-disease trial on October 25th, its stock jumped 18%. This happens all the time (both up and down), and is part of the fun of owning biotech stocks. Biotechs go public very early relative to software companies, and their products are very expensive to develop, so it’s easy to watch their struggles and successes.
What’s the likelihood that a drug will make it across the finish line, given where it is now? I use the data from Pharma Intelligence (see page 6):
8% from start of Phase 1
16% from start of Phase 2
53% from start of Phase 3
91% from start of a new drug application or biologics license application
Those are tough numbers, and especially notice that Phase 2 trials sink a lot of promising drugs. Success rates also vary with features of the drug and with what diseases they attack and with time.
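To see why Phase 2 stands out, the cumulative rates above can be turned into per-stage pass rates by dividing each rate by the next one. This is a rough sketch: it assumes the four figures describe a consistent population of drugs, which may not hold exactly for the source data.

```python
# Chance of eventual approval from the start of each stage (figures listed above).
approval_from_stage = [
    ("Phase 1", 0.08),
    ("Phase 2", 0.16),
    ("Phase 3", 0.53),
    ("NDA/BLA", 0.91),
]


def stage_pass_rates(cumulative):
    """Convert approval-from-here rates into the chance of clearing each single stage."""
    rates = {}
    for (stage, from_here), (_, from_next) in zip(cumulative, cumulative[1:]):
        # P(approval from here) = P(clear this stage) * P(approval from next stage)
        rates[stage] = from_here / from_next
    # The final stage leads directly to approval.
    last_stage, last_rate = cumulative[-1]
    rates[last_stage] = last_rate
    return rates


pass_rates = stage_pass_rates(approval_from_stage)
for stage, rate in pass_rates.items():
    print(f"{stage}: {rate:.0%} chance of clearing this stage")
```

Under that assumption, about half of drugs clear Phase 1, only about 30% clear Phase 2, and about 58% clear Phase 3.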
Moderna has a very rich pipeline. Not counting additional Covid vaccine variants (they have a bunch of those, including one that’s easier to refrigerate), it has three products in Phase 3:
Flu vaccine
Respiratory syncytial virus (RSV) vaccine for older adults
Cytomegalovirus (CMV) vaccine
Each of these has a nominal 53% chance of becoming an approved product. If the three products have independent chances of success, that’s three roughly even coin tosses at bringing another product online, with about 1.6 approvals expected (3 × 0.53). Moderna Chairman Noubar Afeyan is fond of saying that they strive to create multiple independent programs each containing multiple related drugs. Thus they get multiple independent chances at success, and when one of the drugs does succeed, they then know they can press forward with the other drugs in that program, with an increased chance of success. How well have they done at this? Time will tell. For now, one positive sign is that these RNA vaccines have some similarities to the approved Covid RNA vaccines. So perhaps each has more than a 53% chance of getting to market. There are other important numbers, which I’ll skip over today: market sizes for Flu, RSV, and CMV, and the competitive landscape.
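The coin-toss arithmetic can be spelled out with a binomial distribution. This sketch takes the independence assumption at face value and uses the nominal 53% rate for all three products; as noted above, both are simplifications.

```python
from math import comb


def approval_distribution(num_products: int, success_rate: float):
    """Probability of exactly k approvals for k = 0..num_products, assuming independence."""
    return [
        comb(num_products, k)
        * success_rate**k
        * (1.0 - success_rate) ** (num_products - k)
        for k in range(num_products + 1)
    ]


phase3_distribution = approval_distribution(3, 0.53)
expected_approvals = sum(k * p for k, p in enumerate(phase3_distribution))
chance_of_at_least_one = 1.0 - phase3_distribution[0]

for k, probability in enumerate(phase3_distribution):
    print(f"{k} approvals: {probability:.1%}")
```

Under these assumptions the expected number of approvals is 1.59, and the chance that at least one of the three reaches the market is close to 90%.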
The Phase 3 drugs aren’t all Moderna has, though. They also have four Phase 2 drugs, and many Phase 1 and preclinical products. Ordinarily those four Phase 2 drugs would look like four 16% shots, but we know that on October 12th, before the market opened, Merck paid Moderna $250M to exercise an option on their Phase 2 personalized cancer vaccine. My understanding is they’re all seeing intermediate results from that trial, so that raises the expected value. Certainly the market liked the news – Moderna’s price went up 8% that day. They expect to announce results from this trial later this quarter.
Overall Moderna’s pipeline indicates suitcases full of future cash. I believe bearish analysts are missing that revenue.
Conclusion
We have two lines of evidence that indicate future success for Moderna. One is from a careful machine-learning analysis of the stock fundamentals, especially the price-to-cash-flow and the very similar price-to-earnings. The other is from looking at their rich pipeline of future products. We bought most of our Moderna shares in recent months at $132 or even $124, but it’s still cheap at $155. We bought for the long term, and expect to learn more from Thursday’s earnings report and from announcements about future clinical trial results over the coming years.