Project · 2021
NLP on 10-K filings
A text-based stock selection model built on the Lazy Prices paper: scraping SEC filings, scoring year-over-year changes in their text, and back-testing the resulting alphas.
Inspired by the academic paper Lazy Prices (Cohen, Malloy, Nguyen), this project tests whether year-over-year changes in how a company writes its 10-K filing predict future returns.
I scraped filings directly from the SEC, vectorized them with TF-IDF, measured cosine similarity between each company's consecutive annual filings, and ran the resulting factor through Alphalens to compare Sharpe ratios by quantile. Filings that changed the most, especially in risk language, showed meaningfully different forward returns than filings that barely changed.
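As a rough illustration of the similarity step, here is a minimal sketch in plain Python. It is not the project's actual code: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `yoy_similarity`), the regex tokenizer, and the smoothed IDF formula are all assumptions made for the example, and the real pipeline may well have used a library vectorizer with different settings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    # Assumed weighting: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def yoy_similarity(filings_by_year):
    """Map each year to its similarity with the previous available filing.

    `filings_by_year` is a hypothetical {year: filing_text} dict for one company.
    Years are compared in sorted order, so a gap in the data means the
    comparison spans more than one year.
    """
    years = sorted(filings_by_year)
    vectors = tfidf_vectors([filings_by_year[y] for y in years])
    return {years[i]: cosine(vectors[i - 1], vectors[i]) for i in range(1, len(years))}
```

A low score for a given year means the filing's wording moved away from the prior year's. Under these assumptions, that score (or one minus it) would be the per-company factor value passed on to the back-test.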