Project · 2021

NLP on 10-K filings

A text-based stock selection model built on the Lazy Prices paper: scraping SEC filings, scoring sentiment drift, and back-testing the alphas.

Stack Python · BeautifulSoup · TF-IDF · Alphalens · NLTK

Role Solo research project


Inspired by the academic paper Lazy Prices (Cohen, Malloy, Nguyen), this project tests whether year-over-year changes in how a company writes its 10-K filing predict future returns.

I scraped filings directly from the SEC, vectorized them with TF-IDF, measured the cosine similarity between each company's filings in consecutive years, and ran the resulting factor through Alphalens to look at Sharpe ratios by quantile. Filings that changed the most, especially in risk language, showed meaningfully different forward returns from filings that barely changed.
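The similarity step can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`, `yoy_similarity`), the regex tokenizer, the smoothed IDF formula, and the toy filing text are all assumptions made for the example. A real run would more likely use a library vectorizer on full 10-K text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a crude, assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) so no term gets a zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def yoy_similarity(filings_by_year):
    """Map each year to its similarity with the previous year's filing."""
    years = sorted(filings_by_year)
    vectors = tfidf_vectors([filings_by_year[y] for y in years])
    return {
        years[i]: cosine_similarity(vectors[i - 1], vectors[i])
        for i in range(1, len(years))
    }


# Toy text standing in for one company's filings (not real 10-K content).
filings = {
    2018: "We face risks from competition and interest rates.",
    2019: "We face risks from competition and interest rates.",
    2020: "We face litigation risks and a material weakness in internal controls.",
}
scores = yoy_similarity(filings)
for year, score in scores.items():
    print(year, round(score, 3))
```

A low score marks a filing that changed a lot relative to the prior year. One plausible way to turn this into a factor is to use the similarity itself, or one minus it, as the per-company value for that filing date before handing it to the back-test.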
