VideoModelStudio / docs /huggingface /Downloading files from the hub.md
jbilcke-hf's picture
jbilcke-hf HF staff
upgrade finetrainers + gradio
ecd5028
[](#downloading-files)Downloading files
=======================================
[](#download-a-single-file)Download a single file
-------------------------------------------------
### [](#huggingface_hub.hf_hub_download)hf\_hub\_download
#### huggingface\_hub.hf\_hub\_download
[](#huggingface_hub.hf_hub_download)[< source \>](https://github.com/huggingface/huggingface_hub/blob/v0.29.2/src/huggingface_hub/file_download.py#L663)
( repo\_id: strfilename: strsubfolder: typing.Optional\[str\] = Nonerepo\_type: typing.Optional\[str\] = Nonerevision: typing.Optional\[str\] = Nonelibrary\_name: typing.Optional\[str\] = Nonelibrary\_version: typing.Optional\[str\] = Nonecache\_dir: typing.Union\[str, pathlib.Path, NoneType\] = Nonelocal\_dir: typing.Union\[str, pathlib.Path, NoneType\] = Noneuser\_agent: typing.Union\[typing.Dict, str, NoneType\] = Noneforce\_download: bool = Falseproxies: typing.Optional\[typing.Dict\] = Noneetag\_timeout: float = 10token: typing.Union\[bool, str, NoneType\] = Nonelocal\_files\_only: bool = Falseheaders: typing.Optional\[typing.Dict\[str, str\]\] = Noneendpoint: typing.Optional\[str\] = Noneresume\_download: typing.Optional\[bool\] = Noneforce\_filename: typing.Optional\[str\] = Nonelocal\_dir\_use\_symlinks: typing.Union\[bool, typing.Literal\['auto'\]\] = 'auto' ) β†’ export const metadata = 'undefined';`str`
Expand 16 parameters
Parameters
* [](#huggingface_hub.hf_hub_download.repo_id)**repo\_id** (`str`) β€” A user or an organization name and a repo name separated by a `/`.
* [](#huggingface_hub.hf_hub_download.filename)**filename** (`str`) β€” The name of the file in the repo.
* [](#huggingface_hub.hf_hub_download.subfolder)**subfolder** (`str`, _optional_) β€” An optional value corresponding to a folder inside the model repo.
* [](#huggingface_hub.hf_hub_download.repo_type)**repo\_type** (`str`, _optional_) β€” Set to `"dataset"` or `"space"` if downloading from a dataset or space, `None` or `"model"` if downloading from a model. Default is `None`.
* [](#huggingface_hub.hf_hub_download.revision)**revision** (`str`, _optional_) β€” An optional Git revision id which can be a branch name, a tag, or a commit hash.
* [](#huggingface_hub.hf_hub_download.library_name)**library\_name** (`str`, _optional_) β€” The name of the library to which the object corresponds.
* [](#huggingface_hub.hf_hub_download.library_version)**library\_version** (`str`, _optional_) β€” The version of the library.
* [](#huggingface_hub.hf_hub_download.cache_dir)**cache\_dir** (`str`, `Path`, _optional_) β€” Path to the folder where cached files are stored.
* [](#huggingface_hub.hf_hub_download.local_dir)**local\_dir** (`str` or `Path`, _optional_) β€” If provided, the downloaded file will be placed under this directory.
* [](#huggingface_hub.hf_hub_download.user_agent)**user\_agent** (`dict`, `str`, _optional_) β€” The user-agent info in the form of a dictionary or a string.
* [](#huggingface_hub.hf_hub_download.force_download)**force\_download** (`bool`, _optional_, defaults to `False`) β€” Whether the file should be downloaded even if it already exists in the local cache.
* [](#huggingface_hub.hf_hub_download.proxies)**proxies** (`dict`, _optional_) β€” Dictionary mapping protocol to the URL of the proxy passed to `requests.request`.
* [](#huggingface_hub.hf_hub_download.etag_timeout)**etag\_timeout** (`float`, _optional_, defaults to `10`) β€” When fetching ETag, how many seconds to wait for the server to send data before giving up which is passed to `requests.request`.
* [](#huggingface_hub.hf_hub_download.token)**token** (`str`, `bool`, _optional_) β€” A token to be used for the download.
* If `True`, the token is read from the HuggingFace config folder.
* If a string, it’s used as the authentication token.
* [](#huggingface_hub.hf_hub_download.local_files_only)**local\_files\_only** (`bool`, _optional_, defaults to `False`) β€” If `True`, avoid downloading the file and return the path to the local cached file if it exists.
* [](#huggingface_hub.hf_hub_download.headers)**headers** (`dict`, _optional_) β€” Additional headers to be sent with the request.
Returns
export const metadata = 'undefined';
`str`
export const metadata = 'undefined';
Local path of file or if networking is off, last version of file cached on disk.
Raises
export const metadata = 'undefined';
[RepositoryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RepositoryNotFoundError) or [RevisionNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RevisionNotFoundError) or [EntryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.EntryNotFoundError) or [LocalEntryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.LocalEntryNotFoundError) or `EnvironmentError` or `OSError` or `ValueError`
export const metadata = 'undefined';
* [RepositoryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RepositoryNotFoundError) β€” If the repository to download from cannot be found. This may be because it doesn’t exist, or because it is set to `private` and you do not have access.
* [RevisionNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RevisionNotFoundError) β€” If the revision to download from cannot be found.
* [EntryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.EntryNotFoundError) β€” If the file to download cannot be found.
* [LocalEntryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.LocalEntryNotFoundError) β€” If network is disabled or unavailable and file is not found in cache.
* [`EnvironmentError`](https://docs.python.org/3/library/exceptions.html#EnvironmentError) β€” If `token=True` but the token cannot be found.
* [`OSError`](https://docs.python.org/3/library/exceptions.html#OSError) β€” If ETag cannot be determined.
* [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) β€” If some parameter value is invalid.
Download a given file if it’s not already present in the local cache.
The new cache file layout looks like this:
* The cache directory contains one subfolder per repo\_id (namespaced by repo type)
* inside each repo folder:
* refs is a list of the latest known revision => commit\_hash pairs
* blobs contains the actual file blobs (identified by their git-sha or sha256, depending on whether they’re LFS files or not)
* snapshots contains one subfolder per commit, each β€œcommit” contains the subset of the files that have been resolved at that particular commit. Each filename is a symlink to the blob at that particular commit.
[](#huggingface_hub.hf_hub_download.example)
Copied
\[ 96\] .
└── \[ 160\] models\--julien-c--EsperBERTo-small
β”œβ”€β”€ \[ 160\] blobs
β”‚ β”œβ”€β”€ \[321M\] 403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
β”‚ β”œβ”€β”€ \[ 398\] 7cb18dc9bafbfcf74629a4b760af1b160957a83e
β”‚ └── \[1.4K\] d7edf6bd2a681fb0175f7735299831ee1b22b812
β”œβ”€β”€ \[ 96\] refs
β”‚ └── \[ 40\] main
└── \[ 128\] snapshots
β”œβ”€β”€ \[ 128\] 2439f60ef33a0d46d85da5001d52aeda5b00ce9f
β”‚ β”œβ”€β”€ \[ 52\] README.md -> ../../blobs/d7edf6bd2a681fb0175f7735299831ee1b22b812
β”‚ └── \[ 76\] pytorch\_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
└── \[ 128\] bbc77c8132af1cc5cf678da3f1ddf2de43606d48
β”œβ”€β”€ \[ 52\] README.md -> ../../blobs/7cb18dc9bafbfcf74629a4b760af1b160957a83e
└── \[ 76\] pytorch\_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
If `local_dir` is provided, the file structure from the repo will be replicated in this location. When using this option, the `cache_dir` will not be used and a `.cache/huggingface/` folder will be created at the root of `local_dir` to store some metadata related to the downloaded files. While this mechanism is not as robust as the main cache-system, it’s optimized for regularly pulling the latest version of a repository.
### [](#huggingface_hub.hf_hub_url)hf\_hub\_url
#### huggingface\_hub.hf\_hub\_url
[](#huggingface_hub.hf_hub_url)[< source \>](https://github.com/huggingface/huggingface_hub/blob/v0.29.2/src/huggingface_hub/file_download.py#L171)
( repo\_id: strfilename: strsubfolder: typing.Optional\[str\] = Nonerepo\_type: typing.Optional\[str\] = Nonerevision: typing.Optional\[str\] = Noneendpoint: typing.Optional\[str\] = None )
Parameters
* [](#huggingface_hub.hf_hub_url.repo_id)**repo\_id** (`str`) β€” A namespace (user or an organization) name and a repo name separated by a `/`.
* [](#huggingface_hub.hf_hub_url.filename)**filename** (`str`) β€” The name of the file in the repo.
* [](#huggingface_hub.hf_hub_url.subfolder)**subfolder** (`str`, _optional_) β€” An optional value corresponding to a folder inside the repo.
* [](#huggingface_hub.hf_hub_url.repo_type)**repo\_type** (`str`, _optional_) β€” Set to `"dataset"` or `"space"` if downloading from a dataset or space, `None` or `"model"` if downloading from a model. Default is `None`.
* [](#huggingface_hub.hf_hub_url.revision)**revision** (`str`, _optional_) β€” An optional Git revision id which can be a branch name, a tag, or a commit hash.
Construct the URL of a file from the given information.
The resolved address can either be a huggingface.co-hosted url, or a link to Cloudfront (a Content Delivery Network, or CDN) for large files which are more than a few MBs.
[](#huggingface_hub.hf_hub_url.example)
Example:
Copied
\>>> from huggingface\_hub import hf\_hub\_url
\>>> hf\_hub\_url(
... repo\_id="julien-c/EsperBERTo-small", filename="pytorch\_model.bin"
... )
'https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch\_model.bin'
Notes:
Cloudfront is replicated over the globe so downloads are way faster for the end user (and it also lowers our bandwidth costs).
Cloudfront aggressively caches files by default (default TTL is 24 hours), however this is not an issue here because we implement a git-based versioning system on huggingface.co, which means that we store the files on S3/Cloudfront in a content-addressable way (i.e., the file name is its hash). Using content-addressable filenames means cache can’t ever be stale.
In terms of client-side caching from this library, we base our caching on the objects’ entity tag (`ETag`), which is an identifier of a specific version of a resource \[1\]\_. An object’s ETag is: its git-sha1 if stored in git, or its sha256 if stored in git-lfs.
References:
* \[1\] [https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag)
[](#huggingface_hub.snapshot_download)Download a snapshot of the repo
---------------------------------------------------------------------
#### huggingface\_hub.snapshot\_download
[](#huggingface_hub.snapshot_download)[< source \>](https://github.com/huggingface/huggingface_hub/blob/v0.29.2/src/huggingface_hub/_snapshot_download.py#L20)
( repo\_id: strrepo\_type: typing.Optional\[str\] = Nonerevision: typing.Optional\[str\] = Nonecache\_dir: typing.Union\[str, pathlib.Path, NoneType\] = Nonelocal\_dir: typing.Union\[str, pathlib.Path, NoneType\] = Nonelibrary\_name: typing.Optional\[str\] = Nonelibrary\_version: typing.Optional\[str\] = Noneuser\_agent: typing.Union\[typing.Dict, str, NoneType\] = Noneproxies: typing.Optional\[typing.Dict\] = Noneetag\_timeout: float = 10force\_download: bool = Falsetoken: typing.Union\[bool, str, NoneType\] = Nonelocal\_files\_only: bool = Falseallow\_patterns: typing.Union\[typing.List\[str\], str, NoneType\] = Noneignore\_patterns: typing.Union\[typing.List\[str\], str, NoneType\] = Nonemax\_workers: int = 8tqdm\_class: typing.Optional\[tqdm.asyncio.tqdm\_asyncio\] = Noneheaders: typing.Optional\[typing.Dict\[str, str\]\] = Noneendpoint: typing.Optional\[str\] = Nonelocal\_dir\_use\_symlinks: typing.Union\[bool, typing.Literal\['auto'\]\] = 'auto'resume\_download: typing.Optional\[bool\] = None ) β†’ export const metadata = 'undefined';`str`
Expand 18 parameters
Parameters
* [](#huggingface_hub.snapshot_download.repo_id)**repo\_id** (`str`) β€” A user or an organization name and a repo name separated by a `/`.
* [](#huggingface_hub.snapshot_download.repo_type)**repo\_type** (`str`, _optional_) β€” Set to `"dataset"` or `"space"` if downloading from a dataset or space, `None` or `"model"` if downloading from a model. Default is `None`.
* [](#huggingface_hub.snapshot_download.revision)**revision** (`str`, _optional_) β€” An optional Git revision id which can be a branch name, a tag, or a commit hash.
* [](#huggingface_hub.snapshot_download.cache_dir)**cache\_dir** (`str`, `Path`, _optional_) β€” Path to the folder where cached files are stored.
* [](#huggingface_hub.snapshot_download.local_dir)**local\_dir** (`str` or `Path`, _optional_) β€” If provided, the downloaded files will be placed under this directory.
* [](#huggingface_hub.snapshot_download.library_name)**library\_name** (`str`, _optional_) β€” The name of the library to which the object corresponds.
* [](#huggingface_hub.snapshot_download.library_version)**library\_version** (`str`, _optional_) β€” The version of the library.
* [](#huggingface_hub.snapshot_download.user_agent)**user\_agent** (`str`, `dict`, _optional_) β€” The user-agent info in the form of a dictionary or a string.
* [](#huggingface_hub.snapshot_download.proxies)**proxies** (`dict`, _optional_) β€” Dictionary mapping protocol to the URL of the proxy passed to `requests.request`.
* [](#huggingface_hub.snapshot_download.etag_timeout)**etag\_timeout** (`float`, _optional_, defaults to `10`) β€” When fetching ETag, how many seconds to wait for the server to send data before giving up which is passed to `requests.request`.
* [](#huggingface_hub.snapshot_download.force_download)**force\_download** (`bool`, _optional_, defaults to `False`) β€” Whether the file should be downloaded even if it already exists in the local cache.
* [](#huggingface_hub.snapshot_download.token)**token** (`str`, `bool`, _optional_) β€” A token to be used for the download.
* If `True`, the token is read from the HuggingFace config folder.
* If a string, it’s used as the authentication token.
* [](#huggingface_hub.snapshot_download.headers)**headers** (`dict`, _optional_) β€” Additional headers to include in the request. Those headers take precedence over the others.
* [](#huggingface_hub.snapshot_download.local_files_only)**local\_files\_only** (`bool`, _optional_, defaults to `False`) β€” If `True`, avoid downloading the file and return the path to the local cached file if it exists.
* [](#huggingface_hub.snapshot_download.allow_patterns)**allow\_patterns** (`List[str]` or `str`, _optional_) β€” If provided, only files matching at least one pattern are downloaded.
* [](#huggingface_hub.snapshot_download.ignore_patterns)**ignore\_patterns** (`List[str]` or `str`, _optional_) β€” If provided, files matching any of the patterns are not downloaded.
* [](#huggingface_hub.snapshot_download.max_workers)**max\_workers** (`int`, _optional_) β€” Number of concurrent threads to download files (1 thread = 1 file download). Defaults to 8.
* [](#huggingface_hub.snapshot_download.tqdm_class)**tqdm\_class** (`tqdm`, _optional_) β€” If provided, overwrites the default behavior for the progress bar. Passed argument must inherit from `tqdm.auto.tqdm` or at least mimic its behavior. Note that the `tqdm_class` is not passed to each individual download. Defaults to the custom HF progress bar that can be disabled by setting `HF_HUB_DISABLE_PROGRESS_BARS` environment variable.
Returns
export const metadata = 'undefined';
`str`
export const metadata = 'undefined';
folder path of the repo snapshot.
Raises
export const metadata = 'undefined';
[RepositoryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RepositoryNotFoundError) or [RevisionNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RevisionNotFoundError) or `EnvironmentError` or `OSError` or `ValueError`
export const metadata = 'undefined';
* [RepositoryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RepositoryNotFoundError) β€” If the repository to download from cannot be found. This may be because it doesn’t exist, or because it is set to `private` and you do not have access.
* [RevisionNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RevisionNotFoundError) β€” If the revision to download from cannot be found.
* [`EnvironmentError`](https://docs.python.org/3/library/exceptions.html#EnvironmentError) β€” If `token=True` and the token cannot be found.
* [`OSError`](https://docs.python.org/3/library/exceptions.html#OSError) β€” if ETag cannot be determined.
* [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError) β€” if some parameter value is invalid.
Download repo files.
Download a whole snapshot of a repo’s files at the specified revision. This is useful when you want all files from a repo, because you don’t know which ones you will need a priori. All files are nested inside a folder in order to keep their actual filename relative to that folder. You can also filter which files to download using `allow_patterns` and `ignore_patterns`.
If `local_dir` is provided, the file structure from the repo will be replicated in this location. When using this option, the `cache_dir` will not be used and a `.cache/huggingface/` folder will be created at the root of `local_dir` to store some metadata related to the downloaded files. While this mechanism is not as robust as the main cache-system, it’s optimized for regularly pulling the latest version of a repository.
An alternative would be to clone the repo but this requires git and git-lfs to be installed and properly configured. It is also not possible to filter which files to download when cloning a repository using git.
[](#get-metadata-about-a-file)Get metadata about a file
-------------------------------------------------------
### [](#huggingface_hub.get_hf_file_metadata)get\_hf\_file\_metadata
#### huggingface\_hub.get\_hf\_file\_metadata
[](#huggingface_hub.get_hf_file_metadata)[< source \>](https://github.com/huggingface/huggingface_hub/blob/v0.29.2/src/huggingface_hub/file_download.py#L1246)
( url: strtoken: typing.Union\[bool, str, NoneType\] = Noneproxies: typing.Optional\[typing.Dict\] = Nonetimeout: typing.Optional\[float\] = 10library\_name: typing.Optional\[str\] = Nonelibrary\_version: typing.Optional\[str\] = Noneuser\_agent: typing.Union\[typing.Dict, str, NoneType\] = Noneheaders: typing.Optional\[typing.Dict\[str, str\]\] = None )
Parameters
* [](#huggingface_hub.get_hf_file_metadata.url)**url** (`str`) β€” File url, for example returned by [hf\_hub\_url()](/docs/huggingface_hub/v0.29.2/en/package_reference/file_download#huggingface_hub.hf_hub_url).
* [](#huggingface_hub.get_hf_file_metadata.token)**token** (`str` or `bool`, _optional_) β€” A token to be used for the download.
* If `True`, the token is read from the HuggingFace config folder.
* If `False` or `None`, no token is provided.
* If a string, it’s used as the authentication token.
* [](#huggingface_hub.get_hf_file_metadata.proxies)**proxies** (`dict`, _optional_) β€” Dictionary mapping protocol to the URL of the proxy passed to `requests.request`.
* [](#huggingface_hub.get_hf_file_metadata.timeout)**timeout** (`float`, _optional_, defaults to 10) β€” How many seconds to wait for the server to send metadata before giving up.
* [](#huggingface_hub.get_hf_file_metadata.library_name)**library\_name** (`str`, _optional_) β€” The name of the library to which the object corresponds.
* [](#huggingface_hub.get_hf_file_metadata.library_version)**library\_version** (`str`, _optional_) β€” The version of the library.
* [](#huggingface_hub.get_hf_file_metadata.user_agent)**user\_agent** (`dict`, `str`, _optional_) β€” The user-agent info in the form of a dictionary or a string.
* [](#huggingface_hub.get_hf_file_metadata.headers)**headers** (`dict`, _optional_) β€” Additional headers to be sent with the request.
Fetch metadata of a file versioned on the Hub for a given url.
### [](#huggingface_hub.HfFileMetadata)HfFileMetadata
### class huggingface\_hub.HfFileMetadata
[](#huggingface_hub.HfFileMetadata)[< source \>](https://github.com/huggingface/huggingface_hub/blob/v0.29.2/src/huggingface_hub/file_download.py#L147)
( commit\_hash: typing.Optional\[str\]etag: typing.Optional\[str\]location: strsize: typing.Optional\[int\] )
Parameters
* [](#huggingface_hub.HfFileMetadata.commit_hash)**commit\_hash** (`str`, _optional_) β€” The commit\_hash related to the file.
* [](#huggingface_hub.HfFileMetadata.etag)**etag** (`str`, _optional_) β€” Etag of the file on the server.
* [](#huggingface_hub.HfFileMetadata.location)**location** (`str`) β€” Location where to download the file. Can be a Hub url or not (CDN).
* [](#huggingface_hub.HfFileMetadata.size)**size** (`size`) β€” Size of the file. In case of an LFS file, contains the size of the actual LFS file, not the pointer.
Data structure containing information about a file versioned on the Hub.
Returned by [get\_hf\_file\_metadata()](/docs/huggingface_hub/v0.29.2/en/package_reference/file_download#huggingface_hub.get_hf_file_metadata) based on a URL.
[](#caching)Caching
-------------------
The methods displayed above are designed to work with a caching system that prevents re-downloading files. The caching system was updated in v0.8.0 to become the central cache-system shared across libraries that depend on the Hub.
Read the [cache-system guide](../guides/manage-cache) for a detailed presentation of caching at at HF.
[< \> Update on GitHub](https://github.com/huggingface/huggingface_hub/blob/main/docs/source/en/package_reference/file_download.md)
HfApi Client
[←Hugging Face Hub API](/docs/huggingface_hub/en/package_reference/hf_api) [Mixins & serialization methodsβ†’](/docs/huggingface_hub/en/package_reference/mixins)