hf_io#

HuggingFace Hub I/O wrappers for Vulcan.

Vulcan uses huggingface_hub.HfApi.create_commit to upload many staged shards in one commit – avoiding the one-commit-per-file pattern in stringforge.vacua_writer.VacuaWriter.designate_vacua() that would hit HF’s 100-commit-per-hour cap almost immediately on a cluster workload.

The HuggingFace SDK is an optional runtime dependency: this module loads it lazily so the rest of Vulcan (worker writes, schema validation, run-id rendering, queries against a local cache) keeps working in environments without huggingface_hub installed.

class stringforge.vulcan.hf_io.CommitResult(ok, commit_oid, paths, error=None, retry_after_s=None, is_transient=False)#

Bases: object

Outcome of a single batched commit.

ok#

True if the commit landed on HF.

commit_oid#

The commit OID returned by the Hub (None on failure or dry-run).

paths#

HF-repo-relative paths that were carried in this commit.

error#

Stringified error when ok is False.

retry_after_s#

Minimum retry interval for rate-limit responses; 0.0 when the server didn’t suggest one, None when the failure is not rate-limit-shaped.

is_transient#

True when the failure looks like a transient network error (connection reset, timeout) with no HTTP response attached. The sync tier treats this as retry-eligible under its own back-off schedule.

stringforge.vulcan.hf_io.create_batched_commit(*, repo_id, paths_and_locals, commit_message, token=None, repo_type='dataset', create_pr=False, revision=None, dry_run=False)#

Upload a batch of files to repo_id in a single commit.

The call is atomic on the HuggingFace side: either every file lands or none does. Failures (auth, network, rate-limit) are reported back as a CommitResult with ok=False rather than being raised, so the sync tier can route them to failed/ or retry without losing the batch.

On dry_run=True no network call is made and ok is reported as True with commit_oid=None.

Parameters:
  • repo_id (str) – "user/repo" on the Hub.

  • paths_and_locals (Sequence[tuple[str, Path]]) – Iterable of (path_in_repo, local_path) pairs. Each local_path must be readable.

  • commit_message (str) – Single commit message for the whole batch.

  • token (Optional[str]) – HF write token. Falls back to HF_TOKEN env var if unset.

  • repo_type (str) – "dataset" for Vulcan; the SDK default would be "model".

  • create_pr (bool) – When True, opens a PR instead of pushing to the default branch.

  • revision (Optional[str]) – Optional ref name; defaults to the repo’s default branch.

  • dry_run (bool) – Skip the network call. Used by the sync CLI in inspect mode.

Returns:
  • CommitResult – Summary of the outcome. retry_after_s is

  • populated only on rate-limit responses.

Return type:

CommitResult

stringforge.vulcan.hf_io.ensure_repo_exists(repo_id, *, token=None, repo_type='dataset', private=False)#

Idempotent huggingface_hub.HfApi.create_repo() wrapper.

Safe to invoke before every sync; the SDK no-ops when the repo already exists (using exist_ok=True where available, with a fallback to swallowing “already exists” errors on older SDKs).

Parameters:
  • repo_id (str) – "user/repo" on the Hub.

  • token (Optional[str]) – HF write token; falls back to HF_TOKEN.

  • repo_type (str) – "dataset" for Vulcan.

  • private (bool) – Mark the repo private at creation time.

Returns:

None

Return type:

None