Retrieve Actions logs on private repos (token-accepting path or cookie/CSRF auth) #103


Open

opened 2026-06-10 23:49:52 +00:00 by stephen · 1 comment

stephen commented

2026-06-10 23:49:52 +00:00

Owner

What

Make fj run view <n> --log/--log-failed actually retrieve step logs on private repos. Today the only working log path is the Forgejo web frontend route under /{owner}/{repo}/actions/runs/{run}/jobs/{job}/attempt/{attempt} (see the module header in src/api/workflow_view.rs), which authenticates with a session cookie + CSRF token, not an API token. On a private repo that route 404s under token auth, so logs are unreadable through fj.

Find a path that works under the credentials fj actually holds (an API token in the OS keychain). Options, roughly in order of preference:

1. A token-accepting /api/v1 Actions log endpoint, if the target Forgejo version exposes one. rasterstate/fj#91 reports that on the observed Forgejo (11.0.x / gitea-1.22 era) /api/v1/.../actions/runs/{n}/jobs, /api/v1/.../actions/jobs/{id}/logs, and /api/v1/.../actions/runs/{n}/logs all 404, so this may simply not exist there. Re-verify per version before building on it.
2. Cookie/CSRF auth against the web log route: obtain a session and CSRF token (e.g. via a stored browser cookie or an interactive login) and replay them on the existing web-route requests. This is a second auth path and must not bypass Client::request / is_trusted_url casually; design it deliberately.
3. If neither is available on a given Forgejo, document the limitation and point users at the fallback (/actions/tasks for pass/fail, web UI for logs).
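
A minimal sketch of the preference order above, to make the fallback behaviour concrete. The names `LogCapabilities`, `LogPath`, and `choose_log_path` are hypothetical and do not exist in fj; how the capabilities would actually be detected per Forgejo version is left open.

```rust
/// Hypothetical summary of what a given Forgejo instance and the local
/// credential store make possible. Not an existing fj type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LogCapabilities {
    /// The instance exposes a token-accepting `/api/v1` Actions log endpoint.
    pub token_api_logs: bool,
    /// A session cookie + CSRF token is available for the web log route.
    pub web_session: bool,
}

/// The three options from the list above, in order of preference.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogPath {
    TokenApi,
    WebSession,
    DocumentedLimitation,
}

/// Pick the most preferred path that the capabilities allow.
pub fn choose_log_path(caps: LogCapabilities) -> LogPath {
    if caps.token_api_logs {
        LogPath::TokenApi
    } else if caps.web_session {
        LogPath::WebSession
    } else {
        // Neither path exists: point the user at /actions/tasks and the web UI.
        LogPath::DocumentedLimitation
    }
}
```

Keeping the choice in one pure function would make it easy to cover all three outcomes in unit tests without touching the network.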

Why

fj run view --log is the primary way to read CI output without leaving the terminal, and most of our repos are private (rasterstate/fjord-ios, rasterstate/flux, ...). rasterstate/fj#91 and its duplicate rasterstate/fj#92 both hit this during real CI triage and had to fall back to the /actions/tasks API for pass/fail and to server-side sqlite (action_task_step.log_length) for ground truth. The error-message fix in rasterstate/fj#91 stops the misdiagnosis, but it does not restore the feature: private-repo logs still cannot be read through fj.

Acceptance

fj run view <n> --log and --log-failed print real step logs for a private repo run that has logs, against a Forgejo version where a supported path exists.
The chosen auth path funnels through Client::request and respects is_trusted_url (no second, unguarded HTTP/credential path).
If no token-accepting or cookie path is viable on a given Forgejo version, the limitation is documented (docs/troubleshooting) and the actionable error from rasterstate/fj#91 names the fallback.
Coverage for the working path (wiremock for a token-accepting endpoint, or an equivalent test for the cookie path).
cargo fmt --check, cargo clippy --all-targets --all-features -- -D warnings, cargo test --all pass.

Dependencies

Builds on rasterstate/fj#91 (the error-message + private-repo-detection slice). That ships first and is independent; this item restores the capability.
Uncertain / parked: blocked on confirming whether the target Forgejo version exposes ANY token-accepting Actions-log endpoint. rasterstate/fj#91 suggests it may not, in which case this becomes cookie/CSRF auth or a documented limitation rather than a typed API wrapper. Do not start until that contract is pinned down for a concrete Forgejo version.

Out of scope

The error-message / auth-vs-missing distinction and its private-repo test (rasterstate/fj#91).
Public-repo log retrieval, which already works through the web route.

Size

L (uncertain; may reduce to "document the limitation" if no auth path exists)


stephen added the backlog and parked labels 2026-06-10 23:50:02 +00:00

stephen referenced this issue

2026-06-10 23:50:28 +00:00

fj run view: distinguish private-repo log auth failure from a missing run, instead of masking it as "no logs" #91

stephen commented

2026-06-11 02:30:23 +00:00

Author

Owner

Product-validation + feasibility: private-repo Actions log retrieval

Posting this as the operator-facing decision record for this parked item. Two parts: (1) is the pain worth the work, and (2) if so, by which path. TL;DR up front: the acute pain is already cured by rasterstate/fj#91, the remaining pain is a recurring papercut, and the only client-side path (cookie/CSRF) costs far more than the papercut. Pursue a server-side/gateway endpoint and an upstream request instead; keep this parked behind the shipped documented-limitation.

1. Product validation

Who needs it, and how often. This is a real, recurring need, not a hypothetical. fj run watch / fj run view --log is the headline reason to use fj over the web UI at all: read CI output without leaving the terminal. Most of our repos are private (rasterstate/fjord-ios, rasterstate/flux, and the rest), so "private-repo logs" is not an edge case, it is the common case for anyone triaging CI here. Frequency tracks how often private-repo CI goes red and someone wants the log in-terminal, which for active repos is daily-ish. So on raw frequency this scores high.

The cost of today's fallback. After rasterstate/fj#91 shipped, the fallback is three layers, and they are better than they sound:

The rasterstate/fj#91 actionable error. This is the important one. The original failure mode was not "I have to open a browser," it was silent misdiagnosis: every run, green or red, returned the error "no logs for that run/job/attempt", which reads as "the job died before producing logs." rasterstate/fj#92 documents that exact error sending a triage on a multi-step detour through runner provisioning and autoscaler limits for what was an ordinary step failure whose logs existed the whole time. rasterstate/fj#91 replaced that with a message that names the real cause (web log route rejects token auth; private-repo logs need session auth) and points here. That converts a wrong answer into a correct "go look in the web UI" signpost. The expensive part of the pain (chasing a nonexistent cause) is already gone.
The web UI. One click to the run page gives full, rendered logs. It works today on every private repo. The cost is leaving the terminal, a papercut, not a blocker.
The token-accessible /actions/tasks list. Still readable under the PAT, so fj run list and pass/fail status work in-terminal on private repos. You lose the log body, not run state.

Does the pain justify the work? Honest read: the remaining pain is a recurring papercut, not a fire, and it does not justify an expensive fix. Two of our own issues (rasterstate/fj#91, rasterstate/fj#92) were filed from real CI triage that this blocked, which is genuine evidence the feature matters, but both predate the rasterstate/fj#91 error fix. The thing that actually hurt (misdiagnosis) is cured. What's left is "on a private repo, --log can't print the body, click through to the web UI instead," correctly signposted. That justifies a cheap fix (an upstream ask, or a server-side endpoint we get nearly for free) and the already-shipped documentation. It does not justify forking the auth model to chase it. The work is only worth doing if the path is cheap.

2. Feasibility

Technical baseline (re-confirmed against the code). src/api/workflow_view.rs is explicit that /api/v1 exposes no Actions log surface: single-run, per-run /jobs, /jobs/{id}/logs, and /runs/{id}/logs all 404. Both the run summary and the logs live only behind the human web routes under /{owner}/{repo}/actions/.... On a public repo those routes answer a token-bearing request; on a private repo they 404 because they want a session cookie + CSRF token, not a PAT. So the only client-side path to private logs is to authenticate as the browser does.
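
To illustrate the baseline, here is a rough sketch of how a token-bearing response from the web log route could be interpreted. The enum and function names are made up for this comment and are not the code that rasterstate/fj#91 shipped; the sketch only encodes the behaviour described above (public repos answer, private repos 404).

```rust
/// Hypothetical outcome of a token-authenticated request to the web log route.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LogFetchOutcome {
    /// The route answered; the body should contain the step logs.
    LogsAvailable,
    /// 404 on a private repo: most likely the route wants a session cookie
    /// + CSRF token rather than a PAT (a truly missing run looks the same).
    SessionAuthRequired,
    /// 404 on a public repo: the run/job/attempt probably does not exist.
    NotFound,
    /// Any other status, passed through for the caller to report.
    Unexpected(u16),
}

/// Classify an HTTP status from the web log route, given repo visibility.
pub fn classify_web_log_response(status: u16, repo_is_private: bool) -> LogFetchOutcome {
    match (status, repo_is_private) {
        (200, _) => LogFetchOutcome::LogsAvailable,
        (404, true) => LogFetchOutcome::SessionAuthRequired,
        (404, false) => LogFetchOutcome::NotFound,
        (other, _) => LogFetchOutcome::Unexpected(other),
    }
}
```

The point of the split is that the same 404 means two different things depending on visibility, which is exactly the ambiguity behind the original misdiagnosis.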

What browser-session auth concretely requires. This is a second, heavier auth system, not a tweak:

Login flow. POST the Forgejo login form (user + password) to mint a session cookie. fj auth login today prompts for a PAT and stores one opaque string per host (src/auth/mod.rs). Cookie auth means prompting for a password, a regression in credential hygiene (PATs are scoped and revocable; a password is not).
CSRF fetch. Scrape the _csrf token from a form/meta tag or cookie and replay it on every state-changing web request. Brittle, tied to the HTML, unversioned.
Cookie jar. reqwest is currently used statelessly through one Client::request; we'd add cookie-store handling and a parallel notion of "logged-in session" alongside the token.
Secure session storage. The session cookie is a bearer credential and would have to live in the keychain next to the PAT, doubling the secret surface (src/auth/mod.rs deliberately stores only tokens, never to disk).
2FA. Accounts with TOTP/WebAuthn can't be driven by a form POST without an interactive second-factor dance. There is no clean headless story.
Session expiry / refresh. Sessions expire server-side on Forgejo's schedule. We'd need to detect expiry and then either re-prompt or silently log in again, which is ongoing complexity the PAT model never has.
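
To show how brittle the CSRF step alone would be, here is a small sketch of scraping the token from a page. It assumes the token appears as a `<meta name="_csrf" content="...">` tag with the attributes in that order; that markup is an assumption for illustration and would need checking against each Forgejo release. `extract_csrf_token` is a hypothetical helper, not part of fj.

```rust
/// Extract the CSRF token from HTML, assuming a tag shaped like
/// `<meta name="_csrf" content="TOKEN">` (assumed markup, not a contract).
/// Returns `None` if the tag is absent or shaped differently.
pub fn extract_csrf_token(html: &str) -> Option<&str> {
    let marker = "name=\"_csrf\"";
    // Position just past the `name="_csrf"` attribute.
    let start = html.find(marker)? + marker.len();
    let rest = &html[start..];
    // Only search the remainder of the same tag, not the whole document.
    let tag_end = rest.find('>')?;
    let tag = &rest[..tag_end];
    let attr = "content=\"";
    let value_start = tag.find(attr)? + attr.len();
    let value = &tag[value_start..];
    // The token runs up to the closing quote of the content attribute.
    let value_end = value.find('"')?;
    Some(&value[..value_end])
}
```

Note that merely swapping the attribute order (`content` before `name`) makes this return `None`. A real implementation would need an HTML parser and would still break whenever upstream moves the token, which is the maintenance cost this comment is warning about.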

Security + maintenance burden. High and permanent. We'd be storing a second class of credential, prompting for passwords, parsing CSRF out of HTML, and depending on unversioned web routes plus a login form that upstream can restyle or rework at any release, with no API contract. The module header already flags these routes as "stable in practice but can shift between Forgejo releases." Every one of those shifts becomes a fj-breaks-on-upgrade incident. For a papercut-sized payoff, this is a bad trade.

Alternatives, ranked.

1. Upstream Forgejo token-log endpoint request (do this first; cheapest). File an issue/PR upstream asking for a token-accepting /api/v1/.../actions/jobs/{id}/logs (or equivalent). If it lands, fj deletes the whole web-route hack and private logs become an ordinary typed /api/v1 wrapper, the cleanest possible end state. Cost to us is a writeup; risk is timeline (not ours to control). Low effort, high upside, no downside, so it's worth filing regardless of what else we do.
2. Paragon / server-side gateway log endpoint (the real fix if we build anything). We already run the piece that makes this natural: src/fjord/mod.rs shows Fjord-Account traffic routing through fjord-platform's /api/v1/forge-gateway/:instance-id/api/v1/..., and the platform already performs a credentialed sign-in (it even handles a 2FA recovery code). Crucially, workflow_view.rs notes the gateway only proxies /api/v1, so it doesn't forward the web log routes today, which is exactly the gap. The server side can hold the session (or read action_task_step.log_length / the log blobs directly, the same ground truth rasterstate/fj#91 and rasterstate/fj#92 had to reach via sqlite) and expose a clean token-or-bearer log endpoint that fj calls like any other gateway route. This keeps all the session/CSRF/2FA/expiry burden server-side where it's already solved, instead of shipping it to every client. Effort is real but bounded, and it only helps Fjord-Account users (PAT-direct users still need option 1 or the web UI). This is where to invest if the papercut proves worth investing in.
3. Document as limitation (the current state; keep it). rasterstate/fj#91 already shipped: the actionable error names session-vs-token auth and links here, and /actions/tasks still gives in-terminal pass/fail. Add a troubleshooting note ("private-repo log bodies aren't retrievable via PAT; use the web UI or a Fjord-Account instance") and this is a defensible resting state indefinitely.
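
For option 2, a rough sketch of why a gateway log endpoint would slot in like any other gateway route. It follows the /api/v1/forge-gateway/:instance-id/api/v1/... shape quoted above; the function name, the platform base URL, and the rejection of non-/api/v1 paths are assumptions made for illustration, not the actual src/fjord/mod.rs code.

```rust
/// Build a gateway URL for an upstream Forgejo API path.
/// Hypothetical helper: mirrors the `/api/v1/forge-gateway/:instance-id/api/v1/...`
/// shape and refuses anything that is not under `/api/v1/`, since the
/// gateway is described as proxying only the API surface.
pub fn gateway_url(platform_base: &str, instance_id: &str, api_path: &str) -> Option<String> {
    if !api_path.starts_with("/api/v1/") {
        // Web routes such as /{owner}/{repo}/actions/... are not forwarded.
        return None;
    }
    Some(format!(
        "{}/api/v1/forge-gateway/{}{}",
        platform_base.trim_end_matches('/'),
        instance_id,
        api_path
    ))
}
```

Under this shape, a new server-side log endpoint exposed beneath /api/v1 would be reachable with no client-side auth changes, while today's web log route is rejected before any request is made.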

Recommendation

Do not build client-side cookie/CSRF auth. It forks the auth model (passwords, cookie jar, CSRF scraping, 2FA, expiry, second keychain secret) against unversioned web routes, a permanent maintenance and security tax, to fix a papercut that rasterstate/fj#91 already de-fanged.

Instead:

File the upstream Forgejo token-log-endpoint request now (option 1). Cheap, and it's the only path that fully removes the hack.
Keep this item parked behind the rasterstate/fj#91 documented-limitation, which is the correct resting state today (option 3). Land the small troubleshooting-doc note if it isn't already in.
Only promote to active work as a server-side gateway endpoint (option 2), and only when private-repo log triage is demonstrably costing more than the per-incident web-UI click, i.e. when this papercut starts drawing blood again. If/when it does, the gateway is the place to build it, not the client.

No production code written and no labels changed here; leaving the parked/backlog state for you to set.
