hf_io#
HuggingFace Hub I/O wrappers for Vulcan.
Vulcan uses huggingface_hub.HfApi.create_commit to upload
many staged shards in one commit – avoiding the
one-commit-per-file pattern in
stringforge.vacua_writer.VacuaWriter.designate_vacua() that
would hit HF’s 100-commit-per-hour cap almost immediately on a cluster
workload.
The HuggingFace SDK is an optional runtime dependency: this module
loads it lazily so the rest of Vulcan (worker writes, schema
validation, run-id rendering, queries against a local cache) keeps
working in environments without huggingface_hub installed.
- class stringforge.vulcan.hf_io.CommitResult(ok, commit_oid, paths, error=None, retry_after_s=None, is_transient=False)#
Bases:
objectOutcome of a single batched commit.
- ok#
Trueif the commit landed on HF.
- commit_oid#
The commit OID returned by the Hub (
Noneon failure or dry-run).
- paths#
HF-repo-relative paths that were carried in this commit.
- error#
Stringified error when
okis False.
- retry_after_s#
Minimum retry interval for rate-limit responses;
0.0when the server didn’t suggest one,Nonewhen the failure is not rate-limit-shaped.
- is_transient#
Truewhen the failure looks like a transient network error (connection reset, timeout) with no HTTP response attached. The sync tier treats this as retry-eligible under its own back-off schedule.
- stringforge.vulcan.hf_io.create_batched_commit(*, repo_id, paths_and_locals, commit_message, token=None, repo_type='dataset', create_pr=False, revision=None, dry_run=False)#
Upload a batch of files to
repo_idin a single commit.The call is atomic on the HuggingFace side: either every file lands or none does. Failures (auth, network, rate-limit) are reported back as a
CommitResultwithok=Falserather than being raised, so the sync tier can route them tofailed/or retry without losing the batch.On
dry_run=Trueno network call is made andokis reported asTruewithcommit_oid=None.- Parameters:
repo_id (
str) –"user/repo"on the Hub.paths_and_locals (
Sequence[tuple[str,Path]]) – Iterable of(path_in_repo, local_path)pairs. Eachlocal_pathmust be readable.commit_message (
str) – Single commit message for the whole batch.token (
Optional[str]) – HF write token. Falls back toHF_TOKENenv var if unset.repo_type (
str) –"dataset"for Vulcan; the SDK default would be"model".create_pr (
bool) – WhenTrue, opens a PR instead of pushing to the default branch.revision (
Optional[str]) – Optional ref name; defaults to the repo’s default branch.dry_run (
bool) – Skip the network call. Used by the sync CLI in inspect mode.
- Returns:
CommitResult – Summary of the outcome.
retry_after_sispopulated only on rate-limit responses.
- Return type:
- stringforge.vulcan.hf_io.ensure_repo_exists(repo_id, *, token=None, repo_type='dataset', private=False)#
Idempotent
huggingface_hub.HfApi.create_repo()wrapper.Safe to invoke before every sync; the SDK no-ops when the repo already exists (using
exist_ok=Truewhere available, with a fallback to swallowing “already exists” errors on older SDKs).