4 Session 4 — RStudio Quarto cameo + Report Hygiene
Assumptions:
- Students already have a repo (e.g., unified-stocks-teamX) with the Quarto site scaffolding from Sessions 2–3.
- Python-first; the RStudio cameo demonstrates that Quarto is editor-agnostic (no R coding required).
4.1 Session 4 — RStudio cameo + Report Hygiene (75 min)
4.1.1 Learning goals
By the end of class, students can:
- Render a Python‑only Quarto report from RStudio (or RStudio Cloud) as a proof that Quarto is editor‑agnostic.
- Add hygiene features to the project: citations (references.bib), figure/table captions + cross-references, alt text, better site navigation, custom CSS, and freeze/caching for reproducibility.
- Produce a Data Dictionary section that documents columns and dtypes, and reference it from the EDA page.
- Render & publish the cleaned site to GitHub Pages.
Python environment in RStudio:
1. Go to Tools → Global Options → Python → Select, then choose one of the available virtual environments (e.g., a conda environment). You might need to run one Python code block before using the “Render” button. If no virtual environment is listed, run conda env list to show all environments and their paths, copy the path of the environment you want, and append python.exe (on Windows), e.g., C:/Users/ywang2/.conda/envs/stat1010/python.exe. Paste this into the Python interpreter path box. Also, in the Environment tab, switch from R to Python.
2. Click the “Terminal” dropdown arrow and switch to a “Command Prompt” terminal. You might need to go to Tools → Global Options → Terminal → “New terminals open with” → Command Prompt. Then activate the virtual environment.
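If the path for the interpreter box is hard to find by hand, one option is to ask Python itself from inside the activated environment. This is a minimal sketch; the helper name interpreter_path is illustrative and not part of RStudio or Quarto.

```python
import sys
from pathlib import Path


def interpreter_path() -> str:
    """Return the running interpreter's path using forward slashes."""
    # as_posix() yields forward slashes, matching the example path style above.
    return Path(sys.executable).as_posix()


print(interpreter_path())
```

Run it in the environment you want RStudio to use, then paste the printed path into the Python interpreter path box.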
4.2 Agenda (75 min)
(10 min) Why report hygiene matters (credibility, accessibility, reusability)
(15 min) RStudio cameo: Render the Python‑based Quarto report in RStudio
(30 min) In‑class lab (Colab): add citations, cross‑refs, alt text, freeze/caching, CSS, data dictionary, rebuild site
(10 min) Wrap‑up + troubleshooting + homework briefing
4.3 (10 min) Buffer (for first‑time installs or Git pushes)
4.4 Slides
4.4.1 Why hygiene?
- Credibility: citations + model/report lineage
- Accessibility: alt text, readable fonts, color‑safe figures
- Reusability: parameters, freeze/caching, stable page links
- Assessability: clear captions, labeled figures & tables, cross‑references
4.4.2 Quarto features we’ll use
- Captions & labels: #| label: fig-price, #| fig-cap: "Price over time" → reference in text with @fig-price
- Tables: #| label: tbl-summary, #| tbl-cap: "Summary statistics" → reference with @tbl-summary
- Alt text: #| fig-alt: "One-sentence description of the figure"
- Citations: add bibliography: references.bib and cite with [@key]
- Freeze: project-level freeze: auto re-executes a page only when its source changes, which keeps rebuilds stable
- Cache: execute: cache: true to avoid redoing expensive steps
- CSS (Cascading Style Sheets): small tweaks to readability (font size, code block width)
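The chunk options in this list are written as #| comment lines at the top of a code chunk. As a small illustration of how those lines look together for one figure, here is a sketch with a hypothetical helper, format_chunk_options (not a Quarto API), and an alt text invented for the example.

```python
def format_chunk_options(options: dict) -> str:
    """Render a dict as Quarto-style '#| key: value' option lines."""
    lines = []
    for key, value in options.items():
        if isinstance(value, bool):
            text = "true" if value else "false"  # YAML booleans are lowercase
        elif isinstance(value, str) and (" " in value or ":" in value):
            # Quote free text so spaces and colons stay inside one YAML string
            text = '"' + value.replace('"', '\\"') + '"'
        else:
            text = str(value)
        lines.append(f"#| {key}: {text}")
    return "\n".join(lines)


header = format_chunk_options({
    "label": "fig-price",
    "fig-cap": "Price over time",
    "fig-alt": "Line chart of the adjusted closing price by date.",
})
print(header)
```

The printed lines go at the very top of the figure's code chunk; the prose can then refer to the figure as @fig-price.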
4.4.3 RStudio cameo (no R required)
- RStudio integrates Quarto; the Render button runs quarto render under the hood.
- Your .qmd can be Python-only; RStudio is just the IDE.
4.5 RStudio cameo (15 min, live demo steps)
- Open RStudio (Desktop or Cloud).
- File → Open Project and select your repo folder (e.g., unified-stocks-teamX).
- Confirm Quarto: Help → About Quarto (or run quarto --version in the RStudio terminal).
- Open reports/eda.qmd. Click Render (or run quarto render reports/eda.qmd).
- Show the generated HTML preview. Note: no R code, just Python chunks.
- RMarkdown is the predecessor; Quarto unifies Python & R (and more). We use Quarto.
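The "Confirm Quarto" step can also be scripted, which helps when a team mixes RStudio, Colab, and local terminals. This sketch assumes only that the CLI is named quarto and supports --version, as used above; the function name quarto_version is illustrative.

```python
import shutil
import subprocess
from typing import Optional


def quarto_version(executable: str = "quarto") -> Optional[str]:
    """Return the Quarto CLI version string, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None


print(quarto_version() or "Quarto CLI not found on PATH")
```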
4.6 In‑class lab (30 min, Colab‑friendly)
We’ll: ensure Quarto CLI is present, upgrade _quarto.yml (freeze, bibliography, CSS), add references.bib, rewrite EDA with captions/labels/alt text, generate a Data Dictionary, re-render, and push to GitHub.
4.6.1 0) Mount Drive, set repo path, and ensure Quarto CLI
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

REPO_OWNER = "YOUR_GITHUB_USERNAME_OR_ORG"  # <- change
REPO_NAME = "unified-stocks-teamX"  # <- change
BASE_DIR = "/content/drive/MyDrive/dspt25"
REPO_DIR = f"{BASE_DIR}/{REPO_NAME}"
REPO_URL = f"https://github.com/{REPO_OWNER}/{REPO_NAME}.git"

import pathlib, os
pathlib.Path(BASE_DIR).mkdir(parents=True, exist_ok=True)
# Clone only if the repo is not already on Drive
if not pathlib.Path(REPO_DIR).exists():
    !git clone {REPO_URL} {REPO_DIR}
%cd {REPO_DIR}
# Ensure Quarto CLI
!quarto --version || (wget -q https://quarto.org/download/latest/quarto-linux-amd64.deb -O /tmp/quarto.deb && dpkg -i /tmp/quarto.deb || (apt-get -y -f install >/dev/null && dpkg -i /tmp/quarto.deb))
!quarto --version
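The lab overview mentions upgrading _quarto.yml (freeze, bibliography, CSS), but this listing jumps from step 0 to step 2. The following is only a sketch of what such settings could look like: the project block and the css path are assumptions (the path mirrors where the CSS step later writes the stylesheet), and the helper write_quarto_config is hypothetical. It refuses to overwrite an existing file, because the site scaffolding from Sessions 2–3 is not shown here and should be merged by hand.

```python
from pathlib import Path

# Assumed settings: the keys mirror the features named in the lab overview.
QUARTO_HYGIENE_YAML = """\
project:
  type: website
  output-dir: docs

execute:
  freeze: auto   # re-execute a page only when its source changes
  cache: true    # reuse results of expensive computations

bibliography: references.bib

format:
  html:
    css: docs/style.css   # assumed path; adjust to your layout
"""


def write_quarto_config(path: str = "_quarto.yml", overwrite: bool = False) -> bool:
    """Write the sketch config; by default, never clobber an existing file."""
    target = Path(path)
    if target.exists() and not overwrite:
        print(f"{target} exists; merge these keys by hand:\n{QUARTO_HYGIENE_YAML}")
        return False
    target.write_text(QUARTO_HYGIENE_YAML, encoding="utf-8")
    return True
```

Calling write_quarto_config() in a repo that already has a _quarto.yml only prints the keys to merge, so existing navigation settings stay intact.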
4.6.3 2) Add references.bib (sample entries; students will refine later)
refs = r"""@book{hyndman-fpp3,
  title = {Forecasting: Principles and Practice},
  author = {Hyndman, Rob J. and Athanasopoulos, George},
  edition = {3},
  year = {2021},
  url = {https://otexts.com/fpp3/}
}
@misc{quarto-docs,
  title = {Quarto Documentation},
  author = {{Posit}},
  year = {2025},
  url = {https://quarto.org/}
}
@misc{yfinance,
  title = {yfinance: Yahoo! Finance market data downloader},
  author = {Ran Aroussi},
  year = {2024},
  url = {https://github.com/ranaroussi/yfinance}
}
"""
# Write the bibliography next to _quarto.yml and echo it back as a check
open("references.bib","w").write(refs)
print(open("references.bib").read())
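Once citations are in use, a quick consistency check can catch keys that are cited in a .qmd but missing from references.bib. The following is a minimal sketch using regular expressions rather than a real BibTeX parser; the function names and sample strings are illustrative, and unusual entry formats may slip past the patterns.

```python
import re

# Prefixes Quarto reserves for cross-references rather than citations
CROSSREF_PREFIXES = ("fig-", "tbl-", "sec-", "eq-")


def bib_keys(bib_text: str) -> set:
    """Collect entry keys such as 'yfinance' from '@misc{yfinance,'."""
    return set(re.findall(r"@\w+\s*\{\s*([^,\s]+)\s*,", bib_text))


def cited_keys(qmd_text: str) -> set:
    """Collect keys from bracketed citations like '[@key]' or '[@a; @b, p. 3]'."""
    keys = set()
    for group in re.findall(r"\[([^\]]*@[^\]]*)\]", qmd_text):
        for key in re.findall(r"@([\w][\w:.-]*)", group):
            key = key.rstrip(".:-")  # drop trailing punctuation
            if not key.startswith(CROSSREF_PREFIXES):
                keys.add(key)
    return keys


sample_bib = "@misc{yfinance,\n  title = {yfinance}\n}\n@book{hyndman-fpp3,\n  title = {FPP3}\n}\n"
sample_qmd = "Data via yfinance [@yfinance]; see [@hyndman-fpp3, pp. 20-21] and [@missing-key]."
missing = cited_keys(sample_qmd) - bib_keys(sample_bib)
print(sorted(missing))  # ['missing-key']
```

In the repo, the same check would read reports/eda.qmd and references.bib from disk instead of the sample strings.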
4.6.4 3) Overwrite reports/eda.qmd with captions, labels, alt text, citations, and cross-refs
This replaces the earlier EDA with a hygienic version. Feel free to adjust wording later.
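from textwrap import dedent
# Plain (non-raw) string: the trailing backslash on the next line suppresses the
# leading newline, so the file starts directly with the YAML front matter.
eda = dedent("""\
---
title: "Stock EDA"
format:
  html:
    toc: true
    number-sections: false
execute:
  echo: true
  warning: false
  cache: true
params:
  symbol: "AAPL"
  start_date: "2018-01-01"
  end_date: ""
  rolling: 20
---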
> *Educational use only — not trading advice.* Data pulled via **yfinance** [@yfinance].
This page is **parameterized**; see the **Parameters** section for usage.
## Setup parameters if using Python
```{python}
#| tags: [parameters]
# Default values (overridden by -P at render time)
SYMBOL = "AAPL"
START = "2018-01-01"
END = ""
ROLL = 20
```
## Setup
```{python}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
from pathlib import Path
# SYMBOL = params.get("symbol", "AAPL")
# START = params.get("start_date", "2018-01-01")
# END = params.get("end_date", "")
# ROLL = int(params.get("rolling", 20))
# An empty END means "up to today"
if not END:
    END = pd.Timestamp.today().strftime("%Y-%m-%d")
```
## Download and tidy
```{python}
try:
    data = yf.download(SYMBOL, start=START, end=END, auto_adjust=True, progress=False)
    # Some yfinance versions return MultiIndex columns even for a single ticker
    if isinstance(data.columns, pd.MultiIndex):
        data.columns = data.columns.get_level_values(0)
    # A failed download often yields an empty frame instead of an exception
    if data.empty:
        raise ValueError("empty download")
except Exception as e:
    # Synthetic fallback
    idx = pd.bdate_range(START, END)
    rng = np.random.default_rng(42)
    ret = rng.normal(0, 0.01, len(idx))
    price = 100 * np.exp(np.cumsum(ret))
    vol = rng.integers(100_000, 5_000_000, len(idx))
    data = pd.DataFrame({"Close": price, "Volume": vol}, index=idx)
df = (data.rename(columns=str.lower)[["close","volume"]]
      .dropna()
      .assign(log_return=lambda d: np.log(d["close"]).diff()))
df["roll_mean"] = df["log_return"].rolling(ROLL, min_periods=ROLL//2).mean()
df["roll_vol"] = df["log_return"].rolling(ROLL, min_periods=ROLL//2).std()
df = df.dropna()
```
## Price over time
As shown in @fig-price, prices vary over time with changing volatility.
## Return distribution
@fig-hist shows the return distribution; many assets exhibit heavy tails [@hyndman-fpp3, pp. 20–21].
## Rolling statistics (window = {params.rolling})
## Summary table
See @tbl-summary for overall statistics.
## Data dictionary
## Parameters
This page accepts parameters. With the Python (Jupyter) engine, `-P` overrides the variables defined in the cell tagged `parameters`: `SYMBOL`, `START`, `END`, and `ROLL`. You can re-render with:
```
quarto render reports/eda.qmd -P SYMBOL:MSFT -P START:2019-01-01 -P END:2025-08-01 -P ROLL:30
```
## References
""")
open("reports/eda.qmd","w").write(eda)
print("Wrote reports/eda.qmd with hygiene features.")
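The “Data dictionary” section of the template has no code in this listing. One way to fill it, sketched here under assumptions, is a small pandas helper that tabulates each column's dtype and non-null count next to a hand-written description. The helper name data_dictionary and the demo frame are illustrative; the column descriptions follow how the “Download and tidy” chunk builds df.

```python
import numpy as np
import pandas as pd


def data_dictionary(df: pd.DataFrame, descriptions: dict) -> pd.DataFrame:
    """One row per column: name, dtype, non-null count, and a description."""
    return pd.DataFrame({
        "column": list(df.columns),
        "dtype": [str(t) for t in df.dtypes],
        "non_null": df.notna().sum().to_numpy(),
        "description": [descriptions.get(c, "") for c in df.columns],
    })


# Small stand-in for the EDA frame so the sketch runs on its own
idx = pd.bdate_range("2024-01-01", periods=5)
demo = pd.DataFrame({"close": [100.0, 101.0, 100.5, 102.0, 103.0],
                     "volume": [10, 12, 11, 15, 14]}, index=idx)
demo["log_return"] = np.log(demo["close"]).diff()

dd = data_dictionary(demo, {
    "close": "Adjusted closing price (auto_adjust=True)",
    "volume": "Shares traded that day",
    "log_return": "Difference of log close between consecutive trading days",
})
print(dd.to_string(index=False))
```

Inside eda.qmd the call would go in its own chunk with a table label and caption, so the EDA text can cross-reference it like any other table.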
4.6.5 4) Add a minimal CSS for readability
from pathlib import Path
Path("docs").mkdir(exist_ok=True)
css = """\
/* Slightly larger base font; wrap long code lines; keep images within the page */
body { font-size: 1.02rem; }
pre code { white-space: pre-wrap; }
img { max-width: 100%; height: auto; }
"""
open("docs/style.css","w").write(css)
print("Wrote docs/style.css")
4.6.6 5) Render site to docs/ and preview
!quarto render --output-dir docs/
Open docs/reports/eda.html in the Colab file browser to preview. Confirm:
- Captions under figures, tables titled at top
- Cross‑refs like “Figure 1”/“Table 1” clickable
- “References” section at bottom with your 2–3 entries
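Part of this checklist can be automated before rendering. The sketch below compares the @fig-… and @tbl-… references in a page's text with the labels declared via #| label: lines. The function names and the sample text are illustrative, and the sample omits code fences for brevity.

```python
import re


def defined_labels(qmd_text: str) -> set:
    """Labels declared as '#| label: fig-...' or '#| label: tbl-...'."""
    pattern = r"^#\|\s*label:\s*((?:fig|tbl)-[\w-]+)\s*$"
    return set(re.findall(pattern, qmd_text, flags=re.M))


def referenced_labels(qmd_text: str) -> set:
    """Labels referenced in prose as '@fig-...' or '@tbl-...'."""
    return set(re.findall(r"@((?:fig|tbl)-[\w-]+)", qmd_text))


sample = """\
As shown in @fig-price, prices drift. See @tbl-summary and @fig-hist.

#| label: fig-price
#| fig-cap: "Price over time"

#| label: tbl-summary
"""
unresolved = referenced_labels(sample) - defined_labels(sample)
print(sorted(unresolved))  # ['fig-hist']
```

An empty result means every figure or table mentioned in the text has a matching labeled chunk.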
4.6.7 6) Commit and push (short‑lived token method)
!git add _quarto.yml references.bib reports/eda.qmd docs/style.css docs/
!git commit -m "chore: report hygiene (captions, cross-refs, alt text, freeze, bibliography, CSS)"
from getpass import getpass
token = getpass("GitHub token (not stored): ")
push_url = f"https://{token}@github.com/{REPO_OWNER}/{REPO_NAME}.git"
!git push {push_url} HEAD:main
# Remove both variables: push_url also contains the token
del token, push_url
4.7 Wrap‑up (10 min)
- Your report now has citations, captions, cross‑refs, alt text, and frozen outputs for stable rebuilds.
- RStudio can render the exact same Python-based .qmd. Teams can mix editors without friction.
- Next: Unix automation and Makefile targets to run reports end-to-end.
4.8 Homework (due before Session 5)
Goal: Extend hygiene and add one analytic section—ACF plot—with proper captions/labels/alt text/citations.
4.8.1 Part A — Add an ACF figure with cross‑ref + alt text
Append this code chunk to reports/eda.qmd after the “Rolling statistics” section:
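#| label: fig-acf
#| fig-cap: "Autocorrelation of daily log returns, lags 1 to 20"
#| fig-alt: "Bar chart of the sample autocorrelation of daily log returns at lags 1 through 20."
# The label, caption, and alt text above are example values; adapt them to your report.
import numpy as np
import matplotlib.pyplot as plt
# simple ACF (biased) up to max_lag
x = df["log_return"].fillna(0.0).values
x = x - x.mean()
max_lag = 20
acf = []
for k in range(1, max_lag+1):
    num = np.sum(x[:-k] * x[k:])
    den = np.sum(x * x)
    acf.append(num/den if den != 0 else 0.0)
fig, ax = plt.subplots(figsize=(6,3))
ax.bar(range(1, max_lag+1), acf)
ax.axhline(0, linewidth=1)
ax.set_xlabel("Lag"); ax.set_ylabel("ACF")
fig.tight_layout()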
Then reference it in the prose:
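Short-memory patterns are visible in @fig-acf (see also [@hyndman-fpp3], Chapter 2).
Here fig-acf stands for whatever label you gave the ACF chunk. Quarto supplies the word “Figure” itself, so writing “Figure @fig-acf” would render it twice.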
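For reference, the loop in the ACF chunk computes the biased sample autocorrelation. Written out, with n the number of observations and the mean-centered series denoted by a tilde (notation introduced here for illustration):

```latex
\tilde{x}_t = x_t - \bar{x}, \qquad
\hat{\rho}_k = \frac{\sum_{t=1}^{n-k} \tilde{x}_t \, \tilde{x}_{t+k}}{\sum_{t=1}^{n} \tilde{x}_t^{2}},
\qquad k = 1, \dots, 20 .
```

The numerator corresponds to np.sum(x[:-k] * x[k:]) and the denominator to np.sum(x * x). The estimator is called biased because the numerator sums only n - k products while the denominator sums n squares, with no rescaling, which shrinks the estimate slightly toward zero at larger lags.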
4.8.2 Part B — Add a monthly returns table with caption + label
Add a new section “Monthly returns” with a cross‑ref’d table:
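#| label: tbl-monthly
#| tbl-cap: "Mean daily log return by year and month"
# The label and caption above are example values; adapt them to your report.
monthly = (df["log_return"]
           .groupby([df.index.year.rename("year"), df.index.month.rename("month")])
           .mean()
           .reset_index()
           .pivot(index="year", columns="month", values="log_return")
           .round(4))
monthly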
In text: “See @tbl-monthly for month-by-month averages.” Replace tbl-monthly with the label of your table chunk; Quarto supplies the word “Table” itself.
4.8.3 Part C — Add two real citations and tidy your references
- Replace the placeholder references with at least two credible sources (textbook, API docs, or peer‑reviewed).
- Cite them in relevant sections of eda.qmd.
- Ensure References renders at the bottom.
(Tip: you can add more @misc{key, title=..., url=...} entries for web docs.)
4.8.4 Part D — Verify freeze and caching behavior
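- In _quarto.yml, ensure:
  execute:
    cache: true
    freeze: auto
- Re-render once (quarto render --output-dir docs/) and note the speed.
- Change a small line in eda.qmd and re-render; confirm that pages you did not touch are not re-executed. For Python chunks, cache: true relies on the jupyter-cache package, which caches per document rather than per chunk, so an edited code cell re-executes that whole page.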
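To make “note speed” concrete, the renders can be timed. This sketch times any command with a wall clock; the helper name time_command is illustrative, and a trivial stand-in command is used so the example runs even without Quarto installed.

```python
import subprocess
import sys
import time


def time_command(cmd: list) -> float:
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start


# Stand-in command so the sketch runs anywhere. Inside the repo, swap in
# ["quarto", "render", "--output-dir", "docs/"] to time real renders.
demo_cmd = [sys.executable, "-c", "pass"]
first = time_command(demo_cmd)
second = time_command(demo_cmd)
print(f"first: {first:.2f}s, second: {second:.2f}s")
```

With the real render command, a clearly shorter second run is a sign that freeze and cache are doing their job.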
4.8.5 Part E — Commit & push
!git add reports/eda.qmd references.bib docs/
!git commit -m "feat: ACF figure and monthly returns table; references updated"
from getpass import getpass
token = getpass("GitHub token (not stored): ")
push_url = f"https://{token}@github.com/{REPO_OWNER}/{REPO_NAME}.git"
!git push {push_url} HEAD:main
# Remove both variables: push_url also contains the token
del token, push_url
4.8.6 Grading (pass/revise)
- EDA page includes ACF figure with caption, label, and alt text; cross‑referenced in text.
- Monthly returns table present with caption/label; referenced in text.
- At least two new, relevant citations included and rendered under References.
- freeze and cache enabled; site renders to docs/ and loads on GitHub Pages.
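Before submitting, the alt-text criterion can be checked on the rendered page. The sketch below uses Python's standard HTML parser to list images without a non-empty alt attribute; the class and function names and the sample HTML are illustrative. Purely decorative images may legitimately use an empty alt, so treat the result as a prompt for review rather than a verdict.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Collect the src of every <img> that lacks a non-empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # A valueless or absent alt attribute counts as missing
        if not (attr.get("alt") or "").strip():
            self.missing.append(attr.get("src", "<no src>"))


def images_missing_alt(html_text: str) -> list:
    audit = AltTextAudit()
    audit.feed(html_text)
    return audit.missing


sample_html = ('<img src="a.png" alt="Line chart of price">'
               '<img src="b.png"><img src="c.png" alt="">')
print(images_missing_alt(sample_html))  # ['b.png', 'c.png']
```

For the real page, pass the contents of docs/reports/eda.html to images_missing_alt.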
4.9 Key points
- Accessibility is part of professionalism: always write alt text, don’t rely on color alone, and keep captions informative.
- Citations are not optional for serious work; treat the report like a short paper.
- Freeze + cache save time and prevent accidental drift.
- RStudio is a comfortable alternative editor for Quarto even in a Python‑only workflow.
Next up (Session 5): Unix for data work—shell power tools and Make automation to glue everything together.