4  Session 4 — RStudio Quarto cameo + Report Hygiene

Assumptions:

  • Students already have a repo (e.g., unified-stocks-teamX) with the Quarto site scaffolding from Sessions 2–3.
  • Python‑first; the RStudio cameo demonstrates that Quarto is editor‑agnostic (no R coding required).

4.1 Session 4 — RStudio cameo + Report Hygiene (75 min)

4.1.1 Learning goals

By the end of class, students can:

  1. Render a Python‑only Quarto report from RStudio (or RStudio Cloud) as a proof that Quarto is editor‑agnostic.
  2. Add hygiene features to the project: citations (references.bib), figure/table captions + cross‑references, alt text, better site navigation, custom CSS, and freeze/caching for reproducibility.
  3. Produce a Data Dictionary section that documents columns and dtypes, and reference it from the EDA page.
  4. Render & publish the cleaned site to GitHub Pages.

Python environment in RStudio (Windows paths shown):

  1. Go to Tools → Global Options → Python, click Select, and choose an available virtual environment (e.g., a conda environment). If no virtual environment is listed, run conda env list in a terminal to show all environments and their paths, copy the path of the environment you want, and append python.exe, e.g., C:/Users/ywang2/.conda/envs/stat1010/python.exe. Paste this into the Python interpreter path box. Also switch the Environment tab from R to Python. You might need to execute one Python code block before using the Render button.
  2. Click the Terminal dropdown arrow and switch to a Command Prompt terminal. You might first need to set Global Options → Terminal → New terminals open with → Command Prompt. Then activate the virtual environment.
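To confirm that RStudio picked up the intended environment, a quick check like the one below can be run in a Python chunk or in the Python console. This is a minimal sketch; the helper name describe_interpreter is illustrative and not part of any library.

```python
import platform
import sys


def describe_interpreter() -> dict:
    """Return basic facts about the Python interpreter that is currently running."""
    return {
        "executable": sys.executable,          # should point inside the environment you selected
        "version": platform.python_version(),  # version installed in that environment
        "prefix": sys.prefix,                  # root folder of the active environment
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If the printed executable is not inside the environment you selected, revisit the interpreter setting before rendering.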

4.2 Agenda (75 min)

  • (10 min) Why report hygiene matters (credibility, accessibility, reusability)

  • (15 min) RStudio cameo: Render the Python‑based Quarto report in RStudio

  • (30 min) In‑class lab (Colab): add citations, cross‑refs, alt text, freeze/caching, CSS, data dictionary, rebuild site

  • (10 min) Wrap‑up + troubleshooting + homework briefing

  • (10 min) Buffer (for first‑time installs or Git pushes)

4.4 Slides

4.4.1 Why hygiene?

  • Credibility: citations + model/report lineage
  • Accessibility: alt text, readable fonts, color‑safe figures
  • Reusability: parameters, freeze/caching, stable page links
  • Assessability: clear captions, labeled figures & tables, cross‑references

4.4.2 Quarto features we’ll use

  • Captions & labels: #| label: fig-price, #| fig-cap: "Price over time" → reference in text with @fig-price
  • Tables: #| label: tbl-summary, #| tbl-cap: "Summary statistics" → reference with @tbl-summary
  • Alt text: #| fig-alt: "One‑sentence description of the figure"
  • Citations: add bibliography: references.bib and cite with [@key]
  • Freeze: execute: freeze: auto in the project‑level _quarto.yml reuses stored results until a page's source changes, so rebuilds stay stable
  • Cache: execute: cache: true to avoid redoing expensive steps
  • CSS (Cascading Style Sheets): small tweaks to readability (font size, code block width)
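The label, caption, and alt‑text options above are easy to forget on one chunk out of many. The sketch below is a rough lint, not a Quarto feature: it scans .qmd text for Python chunks and reports any fig- or tbl- labelled chunk that lacks a caption or alt text. The function names and the choice of required options are assumptions made for illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the sketch can sit inside a fenced block
CHUNK_RE = re.compile(FENCE + r"\{python\}\n(.*?)\n" + FENCE, re.DOTALL)
OPTION_RE = re.compile(r"^#\|[ \t]*([\w-]+)[ \t]*:[ \t]*(.*)$", re.MULTILINE)


def chunk_options(chunk_body: str) -> dict:
    """Collect the '#| key: value' options found in one chunk body."""
    return {key: value.strip() for key, value in OPTION_RE.findall(chunk_body)}


def missing_hygiene(qmd_text: str) -> list:
    """List labelled figure/table chunks that lack a caption or alt text."""
    problems = []
    for body in CHUNK_RE.findall(qmd_text):
        opts = chunk_options(body)
        label = opts.get("label", "")
        if label.startswith("fig-"):
            required = ("fig-cap", "fig-alt")
        elif label.startswith("tbl-"):
            required = ("tbl-cap",)
        else:
            continue  # unlabelled chunks are not cross-referenced, so skip them
        for name in required:
            if not opts.get(name):
                problems.append(f"{label}: missing {name}")
    return problems
```

A typical use would be `missing_hygiene(open("reports/eda.qmd").read())` before rendering; an empty list means every labelled chunk carries the expected options.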

4.4.3 RStudio cameo (no R required)

  • RStudio integrates Quarto; the Render button runs quarto render under the hood.
  • Your .qmd can be Python‑only; RStudio is just the IDE.

4.5 RStudio cameo (15 min, live demo steps)

  1. Open RStudio (Desktop or Cloud).
  2. File → Open Project and select your repo folder (e.g., unified-stocks-teamX).
  3. Confirm Quarto: Help → About Quarto (or run quarto --version in the RStudio terminal).
  4. Open reports/eda.qmd. Click Render (or run quarto render reports/eda.qmd).
  5. Show the generated HTML preview. Note: no R code, just Python chunks.
  6. Mention that R Markdown is the predecessor; Quarto unifies Python and R (and more), so this course uses Quarto.
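The Quarto check in step 3 can also be done from Python, which is handy when the same notebook runs in both RStudio and Colab. This is a small sketch that only relies on the quarto --version command used above; the function name quarto_version is illustrative.

```python
import shutil
import subprocess
from typing import Optional


def quarto_version(executable: str = "quarto") -> Optional[str]:
    """Return the Quarto CLI version string, or None if the CLI is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = quarto_version()
print("Quarto not found on PATH" if version is None else f"Quarto {version}")
```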

4.6 In‑class lab (30 min, Colab‑friendly)

We’ll: ensure Quarto CLI is present, upgrade _quarto.yml (freeze, bibliography, CSS), add references.bib, rewrite EDA with captions/labels/alt text, generate a Data Dictionary, re‑render, and push to GitHub.

4.6.1 0) Mount Drive, set repo path, and ensure Quarto CLI

from google.colab import drive
drive.mount('/content/drive', force_remount=True)

REPO_OWNER = "YOUR_GITHUB_USERNAME_OR_ORG"  # <- change
REPO_NAME  = "unified-stocks-teamX"         # <- change
BASE_DIR   = "/content/drive/MyDrive/dspt25"
REPO_DIR   = f"{BASE_DIR}/{REPO_NAME}"
REPO_URL   = f"https://github.com/{REPO_OWNER}/{REPO_NAME}.git"

import pathlib, os
pathlib.Path(BASE_DIR).mkdir(parents=True, exist_ok=True)

if not pathlib.Path(REPO_DIR).exists():
    !git clone {REPO_URL} {REPO_DIR}
%cd {REPO_DIR}

# Ensure Quarto CLI
!quarto --version || (wget -q https://quarto.org/download/latest/quarto-linux-amd64.deb -O /tmp/quarto.deb && dpkg -i /tmp/quarto.deb || (apt-get -y -f install >/dev/null && dpkg -i /tmp/quarto.deb))
!quarto --version

4.6.2 1) Upgrade _quarto.yml: freeze, bibliography, CSS, nav polish

# Install ruamel.yaml for safe YAML edits
!pip -q install ruamel.yaml

from ruamel.yaml import YAML
from pathlib import Path

yaml = YAML()
cfg_path = Path("_quarto.yml")
if cfg_path.exists():
    cfg = yaml.load(cfg_path.read_text())
else:
    cfg = {"project": {"type": "website", "output-dir": "docs"},
           "website": {"title": "Unified Stocks", "navbar": {"left": [{"href":"index.qmd","text":"Home"}]}},
           "format":{"html":{"theme":"cosmo","toc":True}}}

# Add/ensure features
cfg.setdefault("format", {}).setdefault("html", {})
cfg["format"]["html"]["toc"] = True
cfg["format"]["html"]["code-fold"] = False
cfg["format"]["html"]["toc-depth"] = 2
cfg["format"]["html"]["page-navigation"] = True
cfg["format"]["html"]["code-tools"] = True
cfg["format"]["html"]["fig-cap-location"] = "bottom"
cfg["format"]["html"]["tbl-cap-location"] = "top"
cfg["format"]["html"]["css"] = "docs/style.css"

cfg.setdefault("execute", {})
cfg["execute"]["echo"] = True
cfg["execute"]["warning"] = False
cfg["execute"]["cache"] = True

# Freeze: reuse stored results until the source changes
# (freeze is an execute option, not a project option)
cfg["execute"]["freeze"] = "auto"

# Bibliography
cfg["bibliography"] = "references.bib"

# Ensure navbar has EDA link
nav = cfg.setdefault("website", {}).setdefault("navbar", {}).setdefault("left", [])
if not any(item.get("href") == "reports/eda.qmd" for item in nav if isinstance(item, dict)):
    nav.append({"href": "reports/eda.qmd", "text": "EDA"})

# Write the config back and close the file before reading it again
with open("_quarto.yml", "w") as f:
    yaml.dump(cfg, f)
print(cfg_path.read_text())
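After editing the config it helps to confirm that the settings landed under the right keys. The sketch below checks a loaded config mapping against the values this lab sets; the function name missing_settings and the list of expected keys are assumptions chosen to mirror the cell above, not a Quarto API.

```python
from collections.abc import Mapping

# Settings this lab expects, keyed by their path inside _quarto.yml
EXPECTED = {
    ("execute", "cache"): True,
    ("execute", "freeze"): "auto",
    ("bibliography",): "references.bib",
    ("format", "html", "css"): "docs/style.css",
}


def missing_settings(cfg: Mapping) -> list:
    """Return a message for every expected setting that is absent or different."""
    problems = []
    for path, want in EXPECTED.items():
        node = cfg
        for key in path:
            node = node.get(key) if isinstance(node, Mapping) else None
            if node is None:
                break
        if node != want:
            problems.append(f"{'.'.join(path)}: expected {want!r}, found {node!r}")
    return problems
```

In the lab, `missing_settings(cfg)` should return an empty list once the cell above has run.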

4.6.3 2) Add references.bib (sample entries; students will refine later)

refs = r"""@book{hyndman-fpp3,
  title = {Forecasting: Principles and Practice},
  author = {Hyndman, Rob J. and Athanasopoulos, George},
  edition = {3},
  year = {2021},
  url = {https://otexts.com/fpp3/}
}
@misc{quarto-docs,
  title = {Quarto Documentation},
  author = {{Posit}},
  year = {2025},
  url = {https://quarto.org/}
}
@misc{yfinance,
  title = {yfinance: Yahoo! Finance market data downloader},
  author = {Ran Aroussi},
  year = {2024},
  url = {https://github.com/ranaroussi/yfinance}
}
"""
open("references.bib","w").write(refs)
print(open("references.bib").read())
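A citation such as [@yfinance] only resolves if the key exists in references.bib. The sketch below compares the two with regular expressions. It is a rough check under simplifying assumptions: it reads prose only, treats keys starting with fig-, tbl-, sec-, eq-, or lst- as cross‑references rather than citations, and does not parse BibTeX fully. All function names are illustrative.

```python
import re

BIB_KEY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
CITE_RE = re.compile(r"(?<![\w.])@([A-Za-z0-9_][\w:.-]*)")
XREF_PREFIXES = ("fig-", "tbl-", "sec-", "eq-", "lst-")


def bib_keys(bib_text: str) -> set:
    """Keys defined in a BibTeX file, e.g. 'yfinance' from '@misc{yfinance,'."""
    return set(BIB_KEY_RE.findall(bib_text))


def cited_keys(qmd_text: str) -> set:
    """Citation keys used in the text, ignoring cross-references like @fig-price."""
    keys = set()
    for key in CITE_RE.findall(qmd_text):
        key = key.rstrip(".:-")  # drop sentence punctuation glued to the key
        if key and not key.startswith(XREF_PREFIXES):
            keys.add(key)
    return keys


def undefined_citations(qmd_text: str, bib_text: str) -> set:
    """Keys cited in the text that have no entry in the bibliography."""
    return cited_keys(qmd_text) - bib_keys(bib_text)
```

For example, `undefined_citations(open("reports/eda.qmd").read(), refs)` should be empty once every cited key has an entry.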

4.6.4 3) Overwrite reports/eda.qmd with captions, labels, alt text, citations, and cross‑refs

This replaces the earlier EDA with a hygienic version. Feel free to adjust wording later.

from textwrap import dedent
eda = dedent("""\
---
title: "Stock EDA"
format:
  html:
    toc: true
    number-sections: false
execute:
  echo: true
  warning: false
  cache: true
params:
  symbol: "AAPL"
  start_date: "2018-01-01"
  end_date: ""
  rolling: 20
---

> *Educational use only — not trading advice.* Data pulled via **yfinance** [@yfinance].

This page is **parameterized**; see the **Parameters** section for usage.

## Setup parameters if using Python

```{python}
#| tags: [parameters]
# Default values (overridden by -P at render time)
SYMBOL = "AAPL"
START  = "2018-01-01"
END    = ""
ROLL   = 20
```


## Setup

```{python}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
from pathlib import Path

# SYMBOL = params.get("symbol", "AAPL")
# START  = params.get("start_date", "2018-01-01")
# END    = params.get("end_date", "")
# ROLL   = int(params.get("rolling", 20))
if not END:
  END = pd.Timestamp.today().strftime("%Y-%m-%d")
```


## Download and tidy

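```{python}
try:
  data = yf.download(SYMBOL, start=START, end=END, auto_adjust=True, progress=False)
  # Some yfinance releases may return two-level columns even for a single ticker
  if isinstance(data.columns, pd.MultiIndex):
    data.columns = data.columns.get_level_values(0)
  # A failed download typically yields an empty frame rather than an exception
  if data.empty:
    raise ValueError("empty download")
except Exception:
  # Synthetic fallback: seeded random walk so the page still renders offline
  idx = pd.bdate_range(START, END)
  rng = np.random.default_rng(42)
  ret = rng.normal(0, 0.01, len(idx))
  price = 100 * np.exp(np.cumsum(ret))
  vol = rng.integers(100_000, 5_000_000, len(idx))
  data = pd.DataFrame({"Close": price, "Volume": vol}, index=idx)

# Tidy: lowercase names, keep close/volume, add log returns and rolling stats
df = (data.rename(columns=str.lower)[["close","volume"]]
        .dropna()
        .assign(log_return=lambda d: np.log(d["close"]).diff()))
df["roll_mean"] = df["log_return"].rolling(ROLL, min_periods=ROLL//2).mean()
df["roll_vol"]  = df["log_return"].rolling(ROLL, min_periods=ROLL//2).std()
df = df.dropna()
```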


## Price over time



As shown in @fig-price, prices vary over time with changing volatility.

## Return distribution



@fig-hist shows the return distribution; many assets exhibit heavy tails [@hyndman-fpp3, pp. 20–21].

## Rolling statistics (window = {params.rolling})



## Summary table



See @tbl-summary for overall statistics.

## Data dictionary



## Parameters

This page accepts parameters: `SYMBOL`, `START`, `END`, and `ROLL` (the variables in the cell tagged `parameters`). You can re‑render with:

```
quarto render reports/eda.qmd \\
  -P SYMBOL:MSFT -P START:2019-01-01 -P END:2025-08-01 -P ROLL:30
```

## References

""")
open("reports/eda.qmd","w").write(eda)
print("Wrote reports/eda.qmd with hygiene features.")
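In this copy of the template, the sections “Price over time”, “Return distribution”, “Rolling statistics”, “Summary table”, and “Data dictionary” contain no code chunk, so the cross‑references have nothing to resolve to. The sketch below shows one way to slot a chunk in under a heading. The label fig-price and the caption come from the slides in 4.4.2; the alt text, the plotting code, and the helper insert_after_heading are assumptions for illustration, not the original chunk.

```python
FENCE = "`" * 3  # three backticks, built this way so the sketch can sit inside a fenced block

# Hypothetical figure chunk: label and caption follow 4.4.2, the rest is illustrative
PRICE_CHUNK = "\n".join([
    FENCE + "{python}",
    "#| label: fig-price",
    '#| fig-cap: "Price over time"',
    '#| fig-alt: "Line chart of the daily closing price over the selected date range."',
    "fig, ax = plt.subplots(figsize=(7, 3))",
    'ax.plot(df.index, df["close"])',
    'ax.set_xlabel("Date"); ax.set_ylabel("Close")',
    "fig.tight_layout()",
    "plt.show()",
    FENCE,
])


def insert_after_heading(qmd_text: str, heading: str, chunk: str) -> str:
    """Return qmd_text with chunk placed directly under the first matching heading line."""
    lines = qmd_text.splitlines()
    if heading not in lines:
        raise ValueError(f"heading not found: {heading!r}")
    i = lines.index(heading)
    lines[i + 1:i + 1] = ["", chunk, ""]
    return "\n".join(lines) + "\n"
```

In the lab this could be applied as `eda = insert_after_heading(eda, "## Price over time", PRICE_CHUNK)` before the file is written; the other empty sections would need their own chunks with matching labels (fig-hist, tbl-summary).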

4.6.5 4) Add a minimal CSS for readability

from pathlib import Path
Path("docs").mkdir(exist_ok=True)
css = """\
/* Increase base font and widen code blocks slightly */
body { font-size: 1.02rem; }
pre code { white-space: pre-wrap; }
img { max-width: 100%; height: auto; }
"""
open("docs/style.css","w").write(css)
print("Wrote docs/style.css")

4.6.6 5) Render site to docs/ and preview

!quarto render --output-dir docs/

Open docs/reports/eda.html in the Colab file browser to preview. Confirm:

  • Captions under figures, tables titled at top
  • Cross‑refs like “Figure 1”/“Table 1” clickable
  • “References” section at bottom with your 2–3 entries
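Alt text is not visible in the preview, so it is the item most likely to slip through the checklist above. The sketch below scans rendered HTML for img tags without a non‑empty alt attribute, using only the standard library. The class and function names are illustrative.

```python
from html.parser import HTMLParser


class ImgAltAudit(HTMLParser):
    """Collect the src of every <img> whose alt attribute is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        if not (attr.get("alt") or "").strip():
            self.missing.append(attr.get("src") or "<no src>")


def images_missing_alt(html_text: str) -> list:
    """Return the sources of images that carry no usable alt text."""
    audit = ImgAltAudit()
    audit.feed(html_text)
    return audit.missing
```

After rendering, `images_missing_alt(open("docs/reports/eda.html").read())` should return an empty list if every figure chunk set fig-alt.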

4.6.7 6) Commit and push (short‑lived token method)

!git add _quarto.yml references.bib reports/eda.qmd docs/style.css docs/
!git commit -m "chore: report hygiene (captions, cross-refs, alt text, freeze, bibliography, CSS)"
from getpass import getpass
token = getpass("GitHub token (not stored): ")
push_url = f"https://{token}@github.com/{REPO_OWNER}/{REPO_NAME}.git"
!git push {push_url} HEAD:main
del token

4.7 Wrap‑up (10 min)

  • Your report now has citations, captions, cross‑refs, alt text, and frozen outputs for stable rebuilds.
  • RStudio can render the exact same Python‑based .qmd. Teams can mix editors without friction.
  • Next: Unix automation and Makefile targets to run reports end‑to‑end.

4.8 Homework (due before Session 5)

Goal: Extend hygiene and add one analytic section—ACF plot—with proper captions/labels/alt text/citations.

4.8.1 Part A — Add an ACF figure with cross‑ref + alt text

Append this code chunk to reports/eda.qmd after the “Rolling statistics” section:

import numpy as np
import matplotlib.pyplot as plt

# simple ACF (biased) up to max_lag
x = df["log_return"].fillna(0.0).values
x = x - x.mean()
max_lag = 20
acf = []
for k in range(1, max_lag+1):
    num = np.sum(x[:-k] * x[k:])
    den = np.sum(x * x)
    acf.append(num/den if den != 0 else 0.0)

fig, ax = plt.subplots(figsize=(6,3))
ax.bar(range(1, max_lag+1), acf)
ax.axhline(0, linewidth=1)
ax.set_xlabel("Lag"); ax.set_ylabel("ACF")
fig.tight_layout()
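(Rendered output: Figure 4.1, the ACF bar chart.)

Then reference it in the prose. In the .qmd source, write the cross‑reference as @ followed by the chunk's label, without typing the word “Figure” in front (Quarto supplies it). Rendered, the sentence reads:

Short‑memory patterns are visible in Figure 4.1 (see also Hyndman and Athanasopoulos 2021, Chapter 2).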
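To judge whether a bar in the ACF plot is large, a common rule of thumb compares it with ±1.96/√n, the approximate 95% band for white noise. The sketch below restates the homework's biased ACF in plain Python so it can be checked by hand and adds that band. The function names are illustrative, and the band is an assumption‑laden approximation, not a formal test.

```python
import math


def acf_biased(values, max_lag):
    """Biased sample autocorrelation for lags 1..max_lag (same estimator as the loop above)."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    den = sum(c * c for c in centered)
    if den == 0:
        return [0.0] * max_lag  # constant series: define the ACF as zero
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / den
        for k in range(1, max_lag + 1)
    ]


def white_noise_band(n, z=1.96):
    """Approximate 95% band for the ACF of white noise with n observations."""
    return z / math.sqrt(n)
```

With the homework's variables, `band = white_noise_band(len(x))` followed by `ax.axhline(band, linestyle="--")` and `ax.axhline(-band, linestyle="--")` would draw the band on the existing plot.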

4.8.2 Part B — Add a monthly returns table with caption + label

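Add a new section “Monthly returns” with a cross‑referenced table. Give the chunk a tbl- label and a #| tbl-cap: option; the rendered caption and the chunk body are: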

Table 4.1: {SYMBOL} — Monthly mean of daily log returns
monthly = (df["log_return"]
           .groupby([df.index.year.rename("year"), df.index.month.rename("month")])
           .mean()
           .reset_index()
           .pivot(index="year", columns="month", values="log_return")
           .round(4))
monthly

In text: “See Table 4.1 for month‑by‑month averages.”

4.8.3 Part C — Add two real citations and tidy your references

  1. Replace the placeholder references with at least two credible sources (textbook, API docs, or peer‑reviewed).
  2. Cite them in relevant sections of eda.qmd.
  3. Ensure References renders at the bottom.

(Tip: you can add more @misc{key, title=..., url=...} entries for web docs.)

4.8.4 Part D — Verify freeze and caching behavior

  • In _quarto.yml, ensure:

    
    execute:
      cache: true
      freeze: auto
  • Re‑render once (quarto render --output-dir docs/), note speed.

  • Change a small line in eda.qmd and re‑render; confirm only affected chunks rebuild.
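One way to see freeze at work is to look at what Quarto stored. The sketch below lists the JSON result files under a _freeze/ folder at the project root, which is where Quarto keeps frozen results by default as far as this handout assumes. The function name frozen_results is illustrative.

```python
from pathlib import Path


def frozen_results(project_dir: str = ".") -> list:
    """List stored execution results under <project_dir>/_freeze, as relative paths."""
    freeze_dir = Path(project_dir) / "_freeze"
    if not freeze_dir.is_dir():
        return []  # nothing frozen yet, or freeze is not enabled
    return sorted(
        p.relative_to(freeze_dir).as_posix() for p in freeze_dir.rglob("*.json")
    )


for entry in frozen_results("."):
    print(entry)
```

An empty list after a full project render suggests that freeze is not set under execute in _quarto.yml.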

4.8.5 Part E — Commit & push

!git add reports/eda.qmd references.bib docs/
!git commit -m "feat: ACF figure and monthly returns table; references updated"
from getpass import getpass
token = getpass("GitHub token (not stored): ")
push_url = f"https://{token}@github.com/{REPO_OWNER}/{REPO_NAME}.git"
!git push {push_url} HEAD:main
del token

4.8.6 Grading (pass/revise)

  • EDA page includes ACF figure with caption, label, and alt text; cross‑referenced in text.
  • Monthly returns table present with caption/label; referenced in text.
  • At least two new, relevant citations included and rendered under References.
  • freeze and cache enabled; site renders to docs/ and loads on GitHub Pages.

4.9 Key points

  • Accessibility is part of professionalism: always write alt text, don’t rely on color alone, and keep captions informative.
  • Citations are not optional for serious work; treat the report like a short paper.
  • Freeze + cache save time and prevent accidental drift.
  • RStudio is a comfortable alternative editor for Quarto even in a Python‑only workflow.

Next up (Session 5): Unix for data work—shell power tools and Make automation to glue everything together.