3  Session 3 — Quarto Reports (Python)

Assumptions: From Sessions 1–2, students already have a repo like unified-stocks-teamX in Drive (or they can create it now) and a basic Git push workflow that uses a short‑lived token. Today focuses on Quarto.


3.1 Session 3 — Quarto Reports (Python) — 75 minutes

3.1.1 Learning goals

By the end of class, students can:

  1. Create a parameterized Quarto report (.qmd) that runs Python code.
  2. Render a report from Colab using the Quarto CLI (with caching).
  3. Pass parameters on the command line to re‑render for different tickers/date ranges.
  4. Configure a minimal Quarto website that builds to docs/ and publish it via GitHub Pages.

3.2 Agenda (75 min)

  • (8 min) Why Quarto for DS: literate programming, parameters, caching, publishing
  • (12 min) Anatomy of a .qmd: YAML front matter, params:, code chunks, execute: options, figures
  • (35 min) In‑class lab: install Quarto in Colab → create _quarto.yml → write reports/eda.qmd → render for AAPL/MSFT → output to docs/
  • (10 min) GitHub Pages walkthrough + troubleshooting + homework briefing
  • (10 min) Buffer for hiccups (first Quarto install/render often needs a minute)

3.3 Slides

Why Quarto

  • One source of truth for code + prose + figures → reproducibility and explainability.
  • Parameterization = fast re‑runs with different inputs (ticker/horizon).
  • Publishing to GitHub Pages gives a permanent, shareable artifact.

Key concepts

  • Front matter:

    • format: controls HTML/PDF/RevealJS (we’ll use HTML).
    • execute: controls caching, echo, warnings.
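    • params: declares inputs and their defaults. With the Jupyter engine used here, the defaults live in a code cell tagged parameters, and -P name:value overrides the variable with that name at render time.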
  • Performance: enable execute.cache: true to avoid refetching/recomputing.

  • Publishing: write to docs/ then enable GitHub Pages (Settings → Pages → “Deploy from a branch” → main / /docs).
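
To make the parameter idea concrete, the sketch below mimics in plain Python what a `-P key:value` override does conceptually: start from the defaults and overlay whatever was passed. The helper names `coerce` and `parse_overrides` and the integer-coercion rule are illustrative assumptions, not Quarto's implementation.

```python
# Illustration only: mimic how "-P key:value" overrides default parameters.
# parse_overrides and coerce are hypothetical helpers, not part of Quarto.

DEFAULTS = {"symbol": "AAPL", "start_date": "2018-01-01", "end_date": "", "rolling": 20}

def coerce(value):
    """Turn a digits-only string such as '30' into an int; leave other strings alone."""
    return int(value) if value.isdigit() else value

def parse_overrides(pairs, defaults=DEFAULTS):
    """Overlay 'key:value' strings on a copy of the defaults."""
    params = dict(defaults)
    for pair in pairs:
        key, _, value = pair.partition(":")  # split on the first colon only
        if key not in params:
            raise KeyError(f"unknown parameter: {key}")
        params[key] = coerce(value)
    return params

msft = parse_overrides(["symbol:MSFT", "start_date:2019-01-01", "rolling:30"])
print(msft)
```

Parameters that are not overridden (here `end_date`) keep their defaults, which is why one source file can serve many tickers.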

Ethics/footnote

  • Financial data EDA here is educational only; not trading advice.

3.4 In‑class lab (35 min)

Instructor tip: Ask students to follow step‑by‑step. If they didn’t complete Session 2’s clone, they can create a fresh folder under Drive and initialize a new GitHub repo afterward.

3.4.1 0) Mount Drive and set repo paths

Run each block as a separate Colab cell.

from google.colab import drive
drive.mount('/content/drive', force_remount=True)

REPO_OWNER = "YOUR_GITHUB_USERNAME_OR_ORG"  # <- change
REPO_NAME  = "unified-stocks-teamX"         # <- change
BASE_DIR   = "/content/drive/MyDrive/dspt25"
REPO_DIR   = f"{BASE_DIR}/{REPO_NAME}"
REPO_URL   = f"https://github.com/{REPO_OWNER}/{REPO_NAME}.git"

import pathlib, os, subprocess
pathlib.Path(BASE_DIR).mkdir(parents=True, exist_ok=True)

if not pathlib.Path(REPO_DIR).exists():
    !git clone {REPO_URL} {REPO_DIR}
else:
    %cd {REPO_DIR}
    !git pull --ff-only
%cd {REPO_DIR}

3.4.2 1) Install Quarto CLI on Colab and verify

# Install Quarto CLI (one-time per Colab runtime)
!wget -q https://quarto.org/download/latest/quarto-linux-amd64.deb -O /tmp/quarto.deb
# Parentheses matter: without them the shell runs the second dpkg even when the first succeeds
!dpkg -i /tmp/quarto.deb || (apt-get -y -f install >/dev/null && dpkg -i /tmp/quarto.deb)
!quarto --version
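
Before rendering, it can help to confirm that the runtime has everything the report needs. The sketch below is one possible preflight check; the module list is an assumption based on the imports used in this lab, plus `jupyter_cache`, which Quarto's Jupyter caching relies on when `execute.cache` is true.

```python
# Preflight sketch: report what is missing before calling quarto render.
import importlib.util
import shutil

# Assumed module names for this lab (jupyter_cache only matters when caching is on).
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "yfinance", "jupyter_cache"]

def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this runtime."""
    return [name for name in names if importlib.util.find_spec(name) is None]

def quarto_on_path():
    """True if a 'quarto' executable is found on PATH."""
    return shutil.which("quarto") is not None

print("quarto on PATH:", quarto_on_path())
print("missing Python modules:", missing_modules())
```

Anything reported as missing can be installed with `pip install` in a separate cell before the first render.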

3.4.3 2) Minimal project config: _quarto.yml (website to docs/)

from textwrap import dedent
qproj = dedent("""\
project:
  type: website
  output-dir: docs

website:
  title: "Unified Stocks — EDA"
  navbar:
    left:
      - href: index.qmd
        text: Home
      - href: reports/eda.qmd
        text: EDA (parametrized)

format:
  html:
    theme: cosmo
    toc: true
    code-fold: false

execute:
  echo: true
  warning: false
  cache: true
""")
open("_quarto.yml","w").write(qproj)
print(open("_quarto.yml").read())

Create a simple homepage:

index = """\
---
title: "Unified Stocks Project"
---

Welcome! Use the navigation to view the EDA report.

- **Stock set**: see `tickers_25.csv`
- **Note**: Educational use only — no trading advice.
"""
open("index.qmd","w").write(index)
print(open("index.qmd").read())

3.4.4 3) Create the parameterized EDA report: reports/eda.qmd

import pathlib
pathlib.Path("reports/figs").mkdir(parents=True, exist_ok=True)

eda_qmd = """\
---
title: "Stock EDA"
format:
  html:
    toc: true
    number-sections: false
execute-dir: "/content/drive/MyDrive/dspt25/STAT4160/reports"
execute:
  echo: false
  warning: false
  cache: false     # keep off while testing params

jupyter: python3
params:
  symbol: "AAPL"
  start_date: "2018-01-01"
  end_date: ""
  rolling: 20
---

::: callout-note
This report is parameterized. To change inputs without editing code, pass
`-P symbol:MSFT -P start_date:2019-01-01 -P end_date:2025-08-01 -P rolling:30` to `quarto render`.
:::

## Setup

```{python}
#| tags: [parameters]
# Default values (overridden by -P at render time).
# Variable names must match the names passed with -P.
symbol = "AAPL"
start_date = "2018-01-01"
end_date = ""
rolling = 20
```

```{python}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
from pathlib import Path

# Map the parameters (defaults or -P overrides) to the names used below
SYMBOL = str(symbol)
START  = str(start_date)
END    = str(end_date) if end_date else pd.Timestamp.today().strftime("%Y-%m-%d")
ROLL   = int(rolling)

# Figures are saved relative to reports/, the folder this file lives in
Path("figs").mkdir(parents=True, exist_ok=True)

SYMBOL, START, END, ROLL
```

## Download and prepare data

```{python}
# Fetch adjusted OHLCV; yfinance may raise or return an empty frame on failure
try:
    data = yf.download(SYMBOL, start=START, end=END, auto_adjust=True, progress=False)
except Exception as e:
    print("yfinance failed:", e)
    data = pd.DataFrame()

# Newer yfinance versions return (field, ticker) MultiIndex columns; keep the field level
if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)

if data.empty:
    print("No data downloaded, falling back to a synthetic series")
    idx = pd.bdate_range(START, END)
    rng = np.random.default_rng(42)
    ret = rng.normal(0, 0.01, len(idx))
    price = 100 * np.exp(np.cumsum(ret))
    vol = rng.integers(100_000, 5_000_000, len(idx))
    data = pd.DataFrame({"Close": price, "Volume": vol}, index=idx)

# Tidy & features
df = data.rename(columns=str.lower).copy()
df = df[["close","volume"]].dropna()
df["log_return"] = np.log(df["close"]).diff()
df["roll_mean"]  = df["log_return"].rolling(ROLL, min_periods=ROLL//2).mean()
df["roll_vol"]   = df["log_return"].rolling(ROLL, min_periods=ROLL//2).std()
df = df.dropna()
df.head()
```

## Price over time

```{python}
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(df.index, df["close"])
ax.set_title(f"{SYMBOL} — Adjusted Close")
ax.set_xlabel("Date"); ax.set_ylabel("Price")
fig.tight_layout()
figpath = Path("figs")/f"{SYMBOL}_price.png"  # relative to reports/; same for the other figures
fig.savefig(figpath, dpi=144)
figpath
```

## Daily log returns — histogram

```{python}
fig, ax = plt.subplots(figsize=(6,3))
ax.hist(df["log_return"], bins=50, alpha=0.8)
ax.set_title(f"{SYMBOL} — Daily Log Return Distribution")
ax.set_xlabel("log return"); ax.set_ylabel("count")
fig.tight_layout()
figpath = Path("figs")/f"{SYMBOL}_hist.png"
fig.savefig(figpath, dpi=144)
figpath
```

## Rolling mean & volatility

```{python}
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(df.index, df["roll_mean"], label="rolling mean")
ax.plot(df.index, df["roll_vol"],  label="rolling std")
ax.set_title(f"{SYMBOL} — Rolling Return Stats (window={ROLL})")
ax.set_xlabel("Date"); ax.set_ylabel("value")
ax.legend()
fig.tight_layout()
figpath = Path("figs")/f"{SYMBOL}_rolling.png"
fig.savefig(figpath, dpi=144)
figpath
```

## Summary table

```{python}
summary = pd.DataFrame({
    "n_days": [len(df)],
    "start": [df.index.min().date()],
    "end":   [df.index.max().date()],
    "mean_daily_ret": [df["log_return"].mean()],
    "std_daily_ret":  [df["log_return"].std()],
    "ann_vol_approx": [df["log_return"].std()*np.sqrt(252)]
})
summary
```

Note: Educational use only. This is not trading advice.
"""

open("reports/eda.qmd","w").write(eda_qmd)
print("Wrote reports/eda.qmd")

3.9.1 4) Render the report for one ticker (AAPL) and put outputs in docs/

# Single render with defaults (AAPL)
!quarto render reports/eda.qmd --output-dir docs/

Open the rendered HTML from the Colab file browser at docs/reports/eda.html. The reports/ subfolder is expected: Quarto mirrors the source folder layout under the output directory.

3.9.2 5) Render for multiple tickers by passing parameters

# Render for MSFT with custom dates and rolling window
!quarto render reports/eda.qmd -P symbol:MSFT -P start_date:2019-01-01 -P end_date:2025-08-01 -P rolling:30 --output-dir docs/

# Render for NVDA with a different window
!quarto render reports/eda.qmd -P symbol:NVDA -P start_date:2018-01-01 -P end_date:2025-08-01 -P rolling:60 --output-dir docs/

Both renders write to the same path, docs/reports/eda.html, so only the last one survives (Quarto overwrites the output by default). For a separate page per ticker, copy the source to a ticker-specific filename and render the copy:

# Example: write MSFT to docs/reports/eda-MSFT.html via project copy
import shutil, os
shutil.copy("reports/eda.qmd", "reports/eda-MSFT.qmd")
!quarto render reports/eda-MSFT.qmd -P symbol:MSFT -P start_date:2019-01-01 -P end_date:2025-08-01 -P rolling:30 --output-dir docs/
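
The copy-and-render pattern above generalizes to a loop. The sketch below only builds and prints the commands so it is safe to run anywhere; the function name `render_command`, the ticker list, and the default dates are illustrative assumptions, while the flags are the same ones used in the cells above.

```python
# Sketch: build one quarto render command per ticker (nothing is executed here).
import shlex

def render_command(ticker, start="2018-01-01", end="2025-08-01", rolling=20, output_dir="docs/"):
    """Return the argument list that renders the ticker-specific copy of the report."""
    return [
        "quarto", "render", f"reports/eda-{ticker}.qmd",
        "-P", f"symbol:{ticker}",
        "-P", f"start_date:{start}",
        "-P", f"end_date:{end}",
        "-P", f"rolling:{rolling}",
        "--output-dir", output_dir,
    ]

TICKERS = ["AAPL", "MSFT", "NVDA"]  # example tickers; pick yours from tickers_25.csv
commands = [render_command(t) for t in TICKERS]
for cmd in commands:
    print(shlex.join(cmd))
    # To actually render from the repo root, copy the template first, then run:
    #   shutil.copy("reports/eda.qmd", cmd[2]); subprocess.run(cmd, check=True)
```

Passing the command as a list to `subprocess.run` avoids shell quoting problems if a parameter value ever contains spaces.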

3.9.4 7) Commit and push site to GitHub (so Pages can serve docs/)

!git add _quarto.yml index.qmd reports/eda*.qmd reports/figs docs
!git status
!git commit -m "feat: add parameterized Quarto EDA and publish to docs/"
# Push using a short-lived fine-grained token (as in Session 2)
from getpass import getpass
token = getpass("GitHub token (not stored): ")
push_url = f"https://{token}@github.com/{REPO_OWNER}/{REPO_NAME}.git"
!git push {push_url} HEAD:main
del token

3.9.5 8) Enable GitHub Pages (one-time, UI)

  • On GitHub: Settings → Pages

    • Source: Deploy from a branch
    • Branch: main
    • Folder: /docs
  • Save. Wait ~1–3 minutes. Your site will be live at the URL GitHub shows (usually https://<owner>.github.io/<repo>/).
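
If the site does not appear, a quick local check often narrows the problem down. The sketch below, with hypothetical helper names `pages_url` and `site_problems`, prints the usual project-site URL pattern and lists obvious gaps in the docs/ folder; it assumes it is run from the repo root.

```python
# Sketch: sanity-check docs/ before (or after) pushing.
from pathlib import Path

def pages_url(owner, repo):
    """Usual project-site URL pattern; GitHub shows the actual URL in Settings."""
    return f"https://{owner}.github.io/{repo}/"

def site_problems(docs_dir="docs"):
    """List reasons the folder would not serve a working home page."""
    docs = Path(docs_dir)
    if not docs.is_dir():
        return [f"{docs_dir} is not a directory"]
    problems = []
    if not (docs / "index.html").is_file():
        problems.append("index.html is missing")
    if not any(docs.rglob("*.html")):
        problems.append("no HTML files found")
    return problems

print(pages_url("YOUR_GITHUB_USERNAME_OR_ORG", "unified-stocks-teamX"))
print(site_problems("docs"))
```

An empty list means docs/ at least contains a home page; it does not confirm that the folder was committed and pushed.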


3.10 Wrap‑up (10 min)

  • Re‑rendering with -P lets you build many variants quickly.
  • Keep data fetches cached and/or saved to files to speed up renders.
  • Your team can add more pages (e.g., Methodology, Results, Model Card) and link them via _quarto.yml.

3.11 Homework (due before Session 4)

Goal: Enhance the EDA report with two features and publish distinct pages for three tickers from tickers_25.csv.

3.11.1 Part A — Add drawdown & simple regime shading

  1. Edit reports/eda.qmd. After computing df["log_return"], compute:

    • cum_return and drawdown
    • A simple volatility regime indicator (e.g., rolling std quantiles)
# Add to the "Tidy & features" section in eda.qmd
df["cum_return"] = df["log_return"].cumsum().fillna(0.0)
peak = df["cum_return"].cummax()
df["drawdown"] = df["cum_return"] - peak

# Regime via rolling volatility terciles
vol = df["log_return"].rolling(ROLL, min_periods=ROLL//2).std()
q1, q2 = vol.quantile([0.33, 0.66])
def regime(v):
    if np.isnan(v): return "mid"
    return "low" if v < q1 else ("high" if v > q2 else "mid")
df["regime"] = [regime(v) for v in vol]
df["regime"].value_counts().to_frame("days").T
  2. Add a drawdown plot and shade high‑volatility regimes:
# Drawdown plot
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(df.index, df["drawdown"])
ax.set_title(f"{SYMBOL} — Drawdown (log-return cumulative)")
ax.set_xlabel("Date"); ax.set_ylabel("drawdown")
fig.tight_layout()
figpath = Path("figs")/f"{SYMBOL}_drawdown.png"  # relative to reports/, like the other figures in eda.qmd
fig.savefig(figpath, dpi=144)
figpath
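
To check the drawdown logic by hand, the same running-sum and running-maximum steps can be traced on a tiny series. This is a pure-Python illustration with made-up returns, not part of the report itself.

```python
# Worked example: drawdown of cumulative log returns on four made-up days.
import math
from itertools import accumulate

log_returns = [0.10, -0.05, -0.10, 0.20]   # hypothetical daily log returns

cum_return = list(accumulate(log_returns))            # running sum
peak = list(accumulate(cum_return, max))              # running maximum so far
drawdown = [round(c - p, 10) for c, p in zip(cum_return, peak)]

# A log drawdown d corresponds to a simple-return drawdown of exp(d) - 1
pct_drawdown = [math.exp(d) - 1 for d in drawdown]

print(drawdown)
print([round(p, 4) for p in pct_drawdown])
```

The drawdown is zero whenever the series sits at a new high (the first and last days here) and is most negative on day three, where the cumulative log return is about 0.15 below its earlier peak.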
# Price with regime shading (simple)
fig, ax = plt.subplots(figsize=(8,3))
ax.plot(df.index, df["close"])
ax.set_title(f"{SYMBOL} — Price with High-Volatility Shading")
ax.set_xlabel("Date"); ax.set_ylabel("Price")

# Shade where regime == 'high'
mask = (df["regime"] == "high")
# merge contiguous regions
in_region = False
start = None
for i, (ts, is_high) in enumerate(zip(df.index, mask)):
    if is_high and not in_region:
        in_region = True
        start = ts
    if in_region and (not is_high or i == len(df)-1):
        end = df.index[i-1] if not is_high else ts
        ax.axvspan(start, end, alpha=0.15)  # shaded band
        in_region = False
fig.tight_layout()
figpath = Path("figs")/f"{SYMBOL}_price_regimes.png"  # relative to reports/, like the other figures in eda.qmd
fig.savefig(figpath, dpi=144)
figpath
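
The manual loop that merges contiguous high-volatility days can also be written with `itertools.groupby`. The sketch below is an alternative under the assumption that regime labels are plain strings; `high_runs` is a hypothetical helper name.

```python
# Alternative sketch: find contiguous runs of "high" with itertools.groupby.
from itertools import groupby

def high_runs(labels):
    """Return (first_index, last_index) for each contiguous run of 'high'."""
    runs = []
    pos = 0
    for label, group in groupby(labels):
        length = sum(1 for _ in group)      # size of this run of equal labels
        if label == "high":
            runs.append((pos, pos + length - 1))
        pos += length
    return runs

labels = ["mid", "high", "high", "low", "high", "mid", "high", "high", "high"]
print(high_runs(labels))  # [(1, 2), (4, 4), (6, 8)]
```

In the report this could drive the shading with something like `for i, j in high_runs(df["regime"].tolist()): ax.axvspan(df.index[i], df.index[j], alpha=0.15)`.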

3.11.3 Part C — Makefile convenience targets

Append these to your project Makefile:

report:
	quarto render reports/eda.qmd --output-dir docs/

reports-trio:
	quarto render reports/eda-AAPL.qmd -P symbol:AAPL -P start_date:2018-01-01 -P end_date:2025-08-01 --output-dir docs/
	quarto render reports/eda-MSFT.qmd -P symbol:MSFT -P start_date:2018-01-01 -P end_date:2025-08-01 --output-dir docs/
	quarto render reports/eda-NVDA.qmd -P symbol:NVDA -P start_date:2018-01-01 -P end_date:2025-08-01 --output-dir docs/

Recipe lines must start with a tab character, not spaces. Colab runtimes ship with make, so make report works there; where make is not installed, keep using the quarto render commands directly.

3.11.4 Grading (pass/revise)

  • reports/eda.qmd renders with parameters and caching enabled.
  • At least three ticker pages rendered and linked in navbar.
  • Drawdown and simple regime shading working on the EDA page(s).
  • Site published via GitHub Pages (docs/ present on main and live).
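
For the navbar criterion, the entries follow the same pattern as the ones already in _quarto.yml. The sketch below generates that YAML fragment for a list of tickers; `navbar_entries` is a hypothetical helper, and the indentation assumes the entries go under `navbar: left:` exactly as in the config from the lab.

```python
# Sketch: generate navbar entries for per-ticker pages to paste into _quarto.yml.
def navbar_entries(tickers, indent=6):
    """Return YAML list items (href + text) indented to sit under 'navbar: left:'."""
    pad = " " * indent
    lines = []
    for ticker in tickers:
        lines.append(f"{pad}- href: reports/eda-{ticker}.qmd")
        lines.append(f"{pad}  text: {ticker}")
    return "\n".join(lines)

snippet = navbar_entries(["AAPL", "MSFT", "NVDA"])
print(snippet)
```

After pasting the output below the existing entries, re-render the site so the new links appear on every page.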

3.12 Key points

  • Parameters make reports reusable; don’t copy‑paste notebooks for each ticker.
  • Cache for speed; docs/ for Pages.
  • Keep figures saved under reports/figs/ and referenced in the report.
  • Keep secrets out of the repo; EDA uses public data only.

Next time (Session 4): a quick RStudio Quarto cameo and more report hygiene (citations, figure captions, alt text), then into Unix automation.