import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
from pathlib import Path
from datetime import datetime
# Read parameters if using R
# SYMBOL = params.get("symbol", "AAPL")
# START = params.get("start_date", "2018-01-01")
# END = params.get("end_date", "")
# ROLL = int(params.get("rolling", 20))
if not END:
    END = pd.Timestamp.today().strftime("%Y-%m-%d")
SYMBOL, START, END, ROLL
3 Session 3 — Quarto Reports (Python)
Assumptions: Students already have (from Sessions 1–2) a repo like `unified-stocks-teamX` in Drive (or they can create it now) and a basic Git push workflow with a short-lived token. Today focuses on Quarto.
3.1 Session 3 — Quarto Reports (Python) — 75 minutes
3.1.1 Learning goals
By the end of class, students can:
- Create a parameterized Quarto report (`.qmd`) that runs Python code.
- Render a report from Colab using the Quarto CLI (with caching).
- Pass parameters on the command line to re-render for different tickers/date ranges.
- Configure a minimal Quarto website that builds to `docs/` and publish it via GitHub Pages.
3.2 Agenda (75 min)
- (8 min) Why Quarto for DS: literate programming, parameters, caching, publishing
- (12 min) Anatomy of a `.qmd`: YAML front matter, `params:`, code chunks, `execute:` options, figures
- (35 min) In-class lab: install Quarto in Colab → create `_quarto.yml` → write `reports/eda.qmd` → render for AAPL/MSFT → output to `docs/`
- (10 min) GitHub Pages walkthrough + troubleshooting + homework briefing
- (10 min) Buffer for hiccups (first Quarto install/render often needs a minute)
3.3 Slides
Why Quarto
- One source of truth for code + prose + figures → reproducibility and explainability.
- Parameterization = fast re‑runs with different inputs (ticker/horizon).
- Publishing to GitHub Pages gives a permanent, shareable artifact.
Key concepts
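- Front matter: `format:` controls HTML/PDF/RevealJS (we’ll use HTML). `execute:` controls caching, echo, warnings. `params:` declares inputs that the Knitr (R) engine exposes as a `params` object; with the Jupyter engine used here, inputs are plain variables defined in a Python cell tagged `parameters`, and `-P name:value` overrides them at render time.
- Performance: enable `execute.cache: true` to avoid refetching/recomputing.
- Publishing: write to `docs/`, then enable GitHub Pages (Settings → Pages → “Deploy from a branch” → `main` / `/docs`).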
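As a mental model for parameterization, the sketch below merges default inputs with `key:value` overrides of the kind passed via `-P`. It is not Quarto's implementation: `apply_overrides` is a hypothetical helper, and the value parsing (integers only) is a deliberate simplification.

```python
# Mental model only: defaults plus command-line style overrides.
# `apply_overrides` is a hypothetical helper, not part of Quarto.

DEFAULTS = {"symbol": "AAPL", "start_date": "2018-01-01", "end_date": "", "rolling": 20}


def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` updated from 'key:value' strings."""
    merged = dict(defaults)
    for item in overrides:
        key, _, raw = item.partition(":")
        if key not in merged:
            raise KeyError(f"unknown parameter: {key}")
        # Keep the type of the default: ints stay ints, everything else is text.
        merged[key] = int(raw) if isinstance(merged[key], int) else raw
    return merged


msft = apply_overrides(DEFAULTS, ["symbol:MSFT", "start_date:2019-01-01", "rolling:30"])
print(msft)
```

The point of the design is that the report body never changes; only the small dictionary of inputs does.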
Ethics/footnote
- Financial data EDA here is educational only; not trading advice.
3.4 In‑class lab (35 min)
Instructor tip: Ask students to follow step‑by‑step. If they didn’t complete Session 2’s clone, they can create a fresh folder under Drive and initialize a new GitHub repo afterward.
3.4.1 0) Mount Drive and set repo paths
Run each block as a separate Colab cell.
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

REPO_OWNER = "YOUR_GITHUB_USERNAME_OR_ORG"  # <- change
REPO_NAME = "unified-stocks-teamX"  # <- change
BASE_DIR = "/content/drive/MyDrive/dspt25"
REPO_DIR = f"{BASE_DIR}/{REPO_NAME}"
REPO_URL = f"https://github.com/{REPO_OWNER}/{REPO_NAME}.git"
import pathlib, os, subprocess
pathlib.Path(BASE_DIR).mkdir(parents=True, exist_ok=True)

if not pathlib.Path(REPO_DIR).exists():
    !git clone {REPO_URL} {REPO_DIR}
else:
    %cd {REPO_DIR}
    !git pull --ff-only
%cd {REPO_DIR}
3.4.2 1) Install Quarto CLI on Colab and verify
# Install Quarto CLI (one-time per Colab runtime)
!wget -q https://quarto.org/download/latest/quarto-linux-amd64.deb -O /tmp/quarto.deb
!dpkg -i /tmp/quarto.deb || (apt-get -y -f install >/dev/null && dpkg -i /tmp/quarto.deb)
!quarto --version
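If you prefer to check the install from Python (for example before a long render), a minimal sketch follows. `quarto_version` is a hypothetical helper name; it only looks the executable up on `PATH` and asks it for its version.

```python
import shutil
import subprocess


def quarto_version(executable="quarto"):
    """Return the CLI version string, or None if the executable is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


version = quarto_version()
print("Quarto not found; rerun the install cell" if version is None else f"Quarto {version}")
```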
3.4.3 2) Minimal project config: `_quarto.yml` (website to `docs/`)
from textwrap import dedent
qproj = dedent("""\
project:
  type: website
  output-dir: docs
website:
  title: "Unified Stocks — EDA"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - href: reports/eda.qmd
        text: EDA (parametrized)
format:
  html:
    theme: cosmo
    toc: true
    code-fold: false
execute:
  echo: true
  warning: false
  cache: true
""")
open("_quarto.yml","w").write(qproj)
print(open("_quarto.yml").read())
Create a simple homepage:
index = """\
---
title: "Unified Stocks Project"
---

Welcome! Use the navigation to view the EDA report.

- **Stock set**: see `tickers_25.csv`
- **Note**: Educational use only — no trading advice.
"""
open("index.qmd","w").write(index)
print(open("index.qmd").read())
3.4.4 3) Create the parameterized EDA report: reports/eda.qmd
import os, pathlib
pathlib.Path("reports/figs").mkdir(parents=True, exist_ok=True)

eda_qmd = """\
---
title: "Stock EDA"
format:
  html:
    toc: true
    number-sections: false
execute-dir: "/content/drive/MyDrive/dspt25/STAT4160/reports"
execute:
  echo: false
  warning: false
  cache: false  # keep off while testing params
jupyter: python3
params:
  symbol: "AAPL"
  start_date: "2018-01-01"
  end_date: ""
  rolling: 20
---

::: callout-note
This report is parameterized. To change inputs without editing code, pass
`-P symbol:MSFT -P start_date:2019-01-01 -P end_date:2025-08-01 -P rolling:30` to `quarto render`.
:::

## Setup if using Python

```{python}
#| tags: [parameters]
# Default values (overridden by -P at render time).
# Names must match the -P keys exactly; Python is case-sensitive.
symbol = "AAPL"
start_date = "2018-01-01"
end_date = ""
rolling = 20
```

```{python}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
from pathlib import Path

# Map the (possibly overridden) parameters to the names used below
SYMBOL = str(symbol)
START = str(start_date)
END = str(end_date) if end_date else pd.Timestamp.today().strftime("%Y-%m-%d")
ROLL = int(rolling)
SYMBOL, START, END, ROLL
```

## Download and prepare data

```{python}
# Fetch adjusted OHLCV
try:
    data = yf.download(SYMBOL, start=START, end=END, auto_adjust=True, progress=False)
    if data.empty:  # failed downloads may return an empty frame instead of raising
        raise ValueError(f"no rows returned for {SYMBOL}")
    if isinstance(data.columns, pd.MultiIndex):  # some yfinance versions return (field, ticker) columns
        data.columns = data.columns.get_level_values(0)
except Exception as e:
    print("yfinance failed, falling back to synthetic series:", e)
    idx = pd.bdate_range(START, END)
    rng = np.random.default_rng(42)
    ret = rng.normal(0, 0.01, len(idx))
    price = 100 * np.exp(np.cumsum(ret))
    vol = rng.integers(100_000, 5_000_000, len(idx))
    data = pd.DataFrame({"Close": price, "Volume": vol}, index=idx)

# Tidy & features
df = data.rename(columns=str.lower).copy()
df = df[["close","volume"]].dropna()
df["log_return"] = np.log(df["close"]).diff()
df["roll_mean"] = df["log_return"].rolling(ROLL, min_periods=ROLL//2).mean()
df["roll_vol"] = df["log_return"].rolling(ROLL, min_periods=ROLL//2).std()
df = df.dropna()
df.head()
```

## Price over time

```{python}
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(df.index, df["close"])
ax.set_title(f"{SYMBOL} — Adjusted Close")
ax.set_xlabel("Date"); ax.set_ylabel("Price")
fig.tight_layout()
# figpath = Path("reports/figs")/f"{SYMBOL}_price.png"
figpath = Path("figs")/f"{SYMBOL}_price.png"  # same change for the rest of the figures
fig.savefig(figpath, dpi=144)
figpath
```

## Daily log returns — histogram

```{python}
fig, ax = plt.subplots(figsize=(6,3))
ax.hist(df["log_return"], bins=50, alpha=0.8)
ax.set_title(f"{SYMBOL} — Daily Log Return Distribution")
ax.set_xlabel("log return"); ax.set_ylabel("count")
fig.tight_layout()
figpath = Path("figs")/f"{SYMBOL}_hist.png"
fig.savefig(figpath, dpi=144)
figpath
```

## Rolling mean & volatility

```{python}
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(df.index, df["roll_mean"], label="rolling mean")
ax.plot(df.index, df["roll_vol"], label="rolling std")
ax.set_title(f"{SYMBOL} — Rolling Return Stats (window={ROLL})")
ax.set_xlabel("Date"); ax.set_ylabel("value")
ax.legend()
fig.tight_layout()
figpath = Path("figs")/f"{SYMBOL}_rolling.png"
fig.savefig(figpath, dpi=144)
figpath
```

## Summary table

```{python}
summary = pd.DataFrame({
    "n_days": [len(df)],
    "start": [df.index.min().date()],
    "end": [df.index.max().date()],
    "mean_daily_ret": [df["log_return"].mean()],
    "std_daily_ret": [df["log_return"].std()],
    "ann_vol_approx": [df["log_return"].std()*np.sqrt(252)]
})
summary
```

Note: Educational use only. This is not trading advice.
"""
open("reports/eda.qmd","w").write(eda_qmd)
print("Wrote reports/eda.qmd")
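To see what the report's feature columns compute, here is a dependency-free sketch of log returns, a trailing sample standard deviation, and the annualized volatility. The prices are made up, and `log_returns` and `rolling_std` are hypothetical helpers that mimic (not replace) the pandas calls; `statistics.stdev` is the sample standard deviation, matching pandas' default.

```python
import math
import statistics


def log_returns(prices):
    """Daily log returns: ln(P_t / P_{t-1})."""
    return [math.log(cur / prev) for prev, cur in zip(prices, prices[1:])]


def rolling_std(values, window, min_periods):
    """Sample standard deviation over a trailing window; None until enough points exist."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        enough = len(chunk) >= max(min_periods, 2)  # a sample std needs at least 2 points
        out.append(statistics.stdev(chunk) if enough else None)
    return out


prices = [100.0, 101.0, 99.0, 102.0, 103.0]  # made-up closing prices
rets = log_returns(prices)
vol = rolling_std(rets, window=3, min_periods=1)
ann_vol = statistics.stdev(rets) * math.sqrt(252)  # same scaling as ann_vol_approx
print([round(r, 4) for r in rets])
print(vol, round(ann_vol, 4))
```

Log returns add up over time (their sum equals the log of the total price ratio), which is why the report can take cumulative sums of them later.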
3.9.1 4) Render the report for one ticker (AAPL) and put outputs in docs/
# Single render with defaults (AAPL)
!quarto render reports/eda.qmd --output-dir docs/
Open the produced HTML (Colab file browser → `docs/reports/eda.html`). If the HTML is under `docs/reports/eda.html`, that’s expected (Quarto keeps layout mirroring source folders).
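The mirroring rule can be written down as a tiny path computation. This is a sketch under the assumption that the output keeps the source's relative folder and only swaps the extension, as described above; `expected_html` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def expected_html(qmd_path, output_dir="docs"):
    """Mirror the source folder under the output directory and swap the extension."""
    return PurePosixPath(output_dir) / PurePosixPath(qmd_path).with_suffix(".html")


print(expected_html("reports/eda.qmd"))
print(expected_html("index.qmd"))
```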
3.9.2 5) Render for multiple tickers by passing parameters
# Render for MSFT with custom dates and rolling window
!quarto render reports/eda.qmd -P symbol:MSFT -P start_date:2019-01-01 -P end_date:2025-08-01 -P rolling:30 --output-dir docs/
# Render for NVDA with a different window
!quarto render reports/eda.qmd -P symbol:NVDA -P start_date:2018-01-01 -P end_date:2025-08-01 -P rolling:60 --output-dir docs/
This will create `docs/reports/eda.html` for the last render (Quarto overwrites the same output path by default). If you want separate pages per ticker, render to different filenames:
# Example: write MSFT to docs/reports/eda-MSFT.html via project copy
import shutil, os
shutil.copy("reports/eda.qmd", "reports/eda-MSFT.qmd")
!quarto render reports/eda-MSFT.qmd -P symbol:MSFT -P start_date:2019-01-01 -P end_date:2025-08-01 -P rolling:30 --output-dir docs/
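For more than one or two tickers, the copy-and-render pattern can be generated rather than typed. The sketch below only builds the argument lists; `render_commands` is a hypothetical helper, and the `eda-<TICKER>.qmd` naming follows the MSFT example above.

```python
def render_commands(tickers, start_date, end_date, rolling, output_dir="docs/"):
    """Build one `quarto render` argv list per ticker (nothing is executed here)."""
    commands = []
    for ticker in tickers:
        qmd = f"reports/eda-{ticker}.qmd"  # per-ticker copy of reports/eda.qmd
        commands.append([
            "quarto", "render", qmd,
            "-P", f"symbol:{ticker}",
            "-P", f"start_date:{start_date}",
            "-P", f"end_date:{end_date}",
            "-P", f"rolling:{rolling}",
            "--output-dir", output_dir,
        ])
    return commands


for argv in render_commands(["AAPL", "MSFT", "NVDA"], "2018-01-01", "2025-08-01", 30):
    print(" ".join(argv))
```

To actually run them, copy `reports/eda.qmd` to each per-ticker filename with `shutil.copy` first, then pass each list to `subprocess.run(argv, check=True)`.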
3.9.4 7) Commit and push site to GitHub (so Pages can serve `docs/`)
!git add _quarto.yml index.qmd reports/eda*.qmd reports/figs docs
!git status
!git commit -m "feat: add parameterized Quarto EDA and publish to docs/"
# Push using a short-lived fine-grained token (as in Session 2)
from getpass import getpass
token = getpass("GitHub token (not stored): ")
push_url = f"https://{token}@github.com/{REPO_OWNER}/{REPO_NAME}.git"
!git push {push_url} HEAD:main
del token, push_url  # the URL embeds the token, so drop both
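Because the token is embedded in the push URL, any text that echoes that URL can leak it. One defensive habit is to mask the token before printing or saving such text. This is a minimal sketch: `redact` is a hypothetical helper, and the token and message are made up.

```python
def redact(text, secret, placeholder="***"):
    """Mask every occurrence of `secret` in `text`; an empty secret leaves the text unchanged."""
    return text.replace(secret, placeholder) if secret else text


fake_token = "EXAMPLE-TOKEN-123"  # made-up value, not a real credential
push_url = f"https://{fake_token}@github.com/OWNER/REPO.git"
message = f"push failed for {push_url}"  # made-up message that embeds the URL
print(redact(message, fake_token))
```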
3.9.5 8) Enable GitHub Pages (one-time, UI)
On GitHub: Settings → Pages
- Source: Deploy from a branch
- Branch: `main`
- Folder: `/docs`
Save. Wait ~1–3 minutes. Your site will be live at the URL GitHub shows (usually `https://<owner>.github.io/<repo>/`).
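To know which URL to expect, the usual pattern can be computed from the owner and repo names. This sketch assumes the default `github.io` domain (no custom domain); `pages_url` is a hypothetical helper, and the special case covers repositories named `<owner>.github.io`, which serve from the root.

```python
def pages_url(owner, repo):
    """Guess the default GitHub Pages URL for a repository."""
    if repo.lower() == f"{owner.lower()}.github.io":
        return f"https://{owner}.github.io/"  # user/organization site
    return f"https://{owner}.github.io/{repo}/"  # project site


print(pages_url("YOUR_GITHUB_USERNAME_OR_ORG", "unified-stocks-teamX"))
```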
3.10 Wrap‑up (10 min)
- Re-rendering with `-P` lets you build many variants quickly.
- Keep data fetches cached and/or saved to files to speed up renders.
- Your team can add more pages (e.g., Methodology, Results, Model Card) and link them via `_quarto.yml`.
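One way to act on the "saved to files" advice is a small file cache around the fetch step. The sketch below uses only the standard library; `cached_rows`, the CSV layout, and the one-day freshness window are all assumptions for illustration, and the demo row is made up.

```python
import csv
import tempfile
import time
from pathlib import Path


def cached_rows(path, fetch, max_age_seconds=24 * 3600):
    """Read rows from `path` if it is fresh enough; otherwise call `fetch` and save first."""
    path = Path(path)
    fresh = path.exists() and time.time() - path.stat().st_mtime < max_age_seconds
    if not fresh:
        rows = fetch()  # expected: a non-empty list of dicts sharing the same keys
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    with path.open(newline="") as handle:
        return list(csv.DictReader(handle))  # values are read back as strings


calls = []


def fake_fetch():
    """Stand-in for a network download; records each call."""
    calls.append(1)
    return [{"date": "2024-01-02", "close": 185.64}]  # made-up row


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "AAPL.csv"
    first = cached_rows(target, fake_fetch)
    second = cached_rows(target, fake_fetch)  # served from the file, no second fetch
print(len(calls), first == second)
```

Reading back from the file on both paths keeps the return type identical whether or not the cache was hit.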
3.11 Homework (due before Session 4)
Goal: Enhance the EDA report with two features and publish distinct pages for three tickers from `tickers_25.csv`.
3.11.1 Part A — Add drawdown & simple regime shading
Edit `reports/eda.qmd`. After computing `df["log_return"]`, compute:
- `cum_return` and drawdown
- A simple volatility regime indicator (e.g., rolling std quantiles)
# Add to the "Tidy & features" section in eda.qmd
df["cum_return"] = df["log_return"].cumsum().fillna(0.0)
peak = df["cum_return"].cummax()
df["drawdown"] = df["cum_return"] - peak

# Regime via rolling volatility terciles
vol = df["log_return"].rolling(ROLL, min_periods=ROLL//2).std()
q1, q2 = vol.quantile([0.33, 0.66])
def regime(v):
    if np.isnan(v): return "mid"
    return "low" if v < q1 else ("high" if v > q2 else "mid")
df["regime"] = [regime(v) for v in vol]
df["regime"].value_counts().to_frame("days").T
- Add a drawdown plot and shade high‑volatility regimes:
# Drawdown plot
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(df.index, df["drawdown"])
ax.set_title(f"{SYMBOL} — Drawdown (log-return cumulative)")
ax.set_xlabel("Date"); ax.set_ylabel("drawdown")
fig.tight_layout()
figpath = Path("figs")/f"{SYMBOL}_drawdown.png"  # relative to reports/, like the other figures
fig.savefig(figpath, dpi=144)
figpath

# Price with regime shading (simple)
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(df.index, df["close"])
ax.set_title(f"{SYMBOL} — Price with High-Volatility Shading")
ax.set_xlabel("Date"); ax.set_ylabel("Price")
# Shade where regime == 'high'
mask = (df["regime"] == "high")
# merge contiguous regions
in_region = False
start = None
for i, (ts, is_high) in enumerate(zip(df.index, mask)):
    if is_high and not in_region:
        in_region = True
        start = ts
    if in_region and (not is_high or i == len(df)-1):
        end = df.index[i-1] if not is_high else ts
        ax.axvspan(start, end, alpha=0.15)  # shaded band
        in_region = False
fig.tight_layout()
figpath = Path("figs")/f"{SYMBOL}_price_regimes.png"
fig.savefig(figpath, dpi=144)
figpath
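The two ideas in Part A (drawdown from cumulative log returns, and merging consecutive high-volatility days into spans) can be checked on a toy series without pandas. This is a sketch with made-up numbers; `drawdown` and `true_spans` are hypothetical helpers that mirror the logic above.

```python
import math
from itertools import accumulate


def drawdown(log_rets):
    """Cumulative log return minus its running peak (always <= 0)."""
    cum = list(accumulate(log_rets))
    peak = list(accumulate(cum, max))
    return [c - p for c, p in zip(cum, peak)]


def true_spans(flags):
    """Inclusive (start, end) index pairs for each run of consecutive True values."""
    spans, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        if start is not None and (not flag or i == len(flags) - 1):
            spans.append((start, i if flag else i - 1))
            start = None
    return spans


dd = drawdown([0.10, -0.05, -0.05, 0.20])  # made-up log returns
worst_pct = math.exp(min(dd)) - 1  # log drawdown converted to a simple-return fraction
print([round(d, 2) for d in dd], round(worst_pct, 4))
print(true_spans([False, True, True, False, True]))
```

Since the drawdown here is a difference of log values, `exp(d) - 1` recovers the familiar percentage drop from the peak price.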
3.11.3 Part C — Makefile convenience targets
Append these to your project `Makefile` (recipe lines must start with a tab character):
report:
	quarto render reports/eda.qmd --output-dir docs/
reports-trio:
	quarto render reports/eda-AAPL.qmd -P symbol:AAPL -P start_date:2018-01-01 -P end_date:2025-08-01 --output-dir docs/
	quarto render reports/eda-MSFT.qmd -P symbol:MSFT -P start_date:2018-01-01 -P end_date:2025-08-01 --output-dir docs/
	quarto render reports/eda-NVDA.qmd -P symbol:NVDA -P start_date:2018-01-01 -P end_date:2025-08-01 --output-dir docs/
On Colab, running `make` requires `make` to be available (it is). Otherwise, keep using `quarto render` commands.
3.11.4 Grading (pass/revise)
- `reports/eda.qmd` renders with parameters and caching enabled.
- At least three ticker pages rendered and linked in navbar.
- Drawdown and simple regime shading working on the EDA page(s).
- Site published via GitHub Pages (`docs/` present on `main` and live).
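For the navbar requirement, the per-ticker entries can be generated and pasted under `navbar: left:` in `_quarto.yml`. This is only a sketch: `navbar_entries` is a hypothetical helper, the `eda-<TICKER>.qmd` filenames follow the naming used in the lab, and the indent must match the nesting in your own file.

```python
def navbar_entries(tickers, indent="      "):
    """Return YAML lines for per-ticker navbar links (one href/text pair each)."""
    lines = []
    for ticker in tickers:
        lines.append(f"{indent}- href: reports/eda-{ticker}.qmd")
        lines.append(f"{indent}  text: EDA ({ticker})")
    return "\n".join(lines)


print(navbar_entries(["AAPL", "MSFT", "NVDA"]))
```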
3.12 Key points
- Parameters make reports reusable; don’t copy‑paste notebooks for each ticker.
- Cache for speed; docs/ for Pages.
- Keep figures saved under `reports/figs/` and referenced in the report.
- Keep secrets out of the repo; EDA uses public data only.
Next time (Session 4): a quick RStudio Quarto cameo and more report hygiene (citations, figure captions, alt text), then into Unix automation.