[](#downloading-files)Downloading files
=======================================
[](#download-a-single-file)Download a single file
-------------------------------------------------
### [](#huggingface_hub.hf_hub_download)hf\_hub\_download
#### huggingface\_hub.hf\_hub\_download
[](#huggingface_hub.hf_hub_download)[< source \>](https://github.com/huggingface/huggingface_hub/blob/v0.29.2/src/huggingface_hub/file_download.py#L663)
( repo\_id: str, filename: str, subfolder: typing.Optional\[str\] = None, repo\_type: typing.Optional\[str\] = None, revision: typing.Optional\[str\] = None, library\_name: typing.Optional\[str\] = None, library\_version: typing.Optional\[str\] = None, cache\_dir: typing.Union\[str, pathlib.Path, NoneType\] = None, local\_dir: typing.Union\[str, pathlib.Path, NoneType\] = None, user\_agent: typing.Union\[typing.Dict, str, NoneType\] = None, force\_download: bool = False, proxies: typing.Optional\[typing.Dict\] = None, etag\_timeout: float = 10, token: typing.Union\[bool, str, NoneType\] = None, local\_files\_only: bool = False, headers: typing.Optional\[typing.Dict\[str, str\]\] = None, endpoint: typing.Optional\[str\] = None, resume\_download: typing.Optional\[bool\] = None, force\_filename: typing.Optional\[str\] = None, local\_dir\_use\_symlinks: typing.Union\[bool, typing.Literal\['auto'\]\] = 'auto' ) → `str`
Parameters
* [](#huggingface_hub.hf_hub_download.repo_id)**repo\_id** (`str`): A user or an organization name and a repo name separated by a `/`.
* [](#huggingface_hub.hf_hub_download.filename)**filename** (`str`): The name of the file in the repo.
* [](#huggingface_hub.hf_hub_download.subfolder)**subfolder** (`str`, _optional_): An optional value corresponding to a folder inside the model repo.
* [](#huggingface_hub.hf_hub_download.repo_type)**repo\_type** (`str`, _optional_): Set to `"dataset"` or `"space"` if downloading from a dataset or space, `None` or `"model"` if downloading from a model. Default is `None`.
* [](#huggingface_hub.hf_hub_download.revision)**revision** (`str`, _optional_): An optional Git revision id which can be a branch name, a tag, or a commit hash.
* [](#huggingface_hub.hf_hub_download.library_name)**library\_name** (`str`, _optional_): The name of the library to which the object corresponds.
* [](#huggingface_hub.hf_hub_download.library_version)**library\_version** (`str`, _optional_): The version of the library.
* [](#huggingface_hub.hf_hub_download.cache_dir)**cache\_dir** (`str`, `Path`, _optional_): Path to the folder where cached files are stored.
* [](#huggingface_hub.hf_hub_download.local_dir)**local\_dir** (`str` or `Path`, _optional_): If provided, the downloaded file will be placed under this directory.
* [](#huggingface_hub.hf_hub_download.user_agent)**user\_agent** (`dict`, `str`, _optional_): The user-agent info in the form of a dictionary or a string.
* [](#huggingface_hub.hf_hub_download.force_download)**force\_download** (`bool`, _optional_, defaults to `False`): Whether the file should be downloaded even if it already exists in the local cache.
* [](#huggingface_hub.hf_hub_download.proxies)**proxies** (`dict`, _optional_): Dictionary mapping protocol to the URL of the proxy passed to `requests.request`.
* [](#huggingface_hub.hf_hub_download.etag_timeout)**etag\_timeout** (`float`, _optional_, defaults to `10`): How many seconds to wait for the server to send data when fetching the ETag before giving up. The value is passed to `requests.request`.
* [](#huggingface_hub.hf_hub_download.token)**token** (`str`, `bool`, _optional_): A token to be used for the download.
    * If `True`, the token is read from the HuggingFace config folder.
    * If a string, it's used as the authentication token.
* [](#huggingface_hub.hf_hub_download.local_files_only)**local\_files\_only** (`bool`, _optional_, defaults to `False`): If `True`, avoid downloading the file and return the path to the local cached file if it exists.
* [](#huggingface_hub.hf_hub_download.headers)**headers** (`dict`, _optional_): Additional headers to be sent with the request.
Returns
`str`
Local path of the file or, if networking is off, the last version of the file cached on disk.
Raises
[RepositoryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RepositoryNotFoundError) or [RevisionNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RevisionNotFoundError) or [EntryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.EntryNotFoundError) or [LocalEntryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.LocalEntryNotFoundError) or `EnvironmentError` or `OSError` or `ValueError`
* [RepositoryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RepositoryNotFoundError): If the repository to download from cannot be found. This may be because it doesn't exist, or because it is set to `private` and you do not have access.
* [RevisionNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RevisionNotFoundError): If the revision to download from cannot be found.
* [EntryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.EntryNotFoundError): If the file to download cannot be found.
* [LocalEntryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.LocalEntryNotFoundError): If network is disabled or unavailable and file is not found in cache.
* [`EnvironmentError`](https://docs.python.org/3/library/exceptions.html#EnvironmentError): If `token=True` but the token cannot be found.
* [`OSError`](https://docs.python.org/3/library/exceptions.html#OSError): If ETag cannot be determined.
* [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError): If some parameter value is invalid.
Download a given file if it's not already present in the local cache.
The new cache file layout looks like this:
* The cache directory contains one subfolder per repo\_id (namespaced by repo type).
* Inside each repo folder:
    * refs is a list of the latest known revision => commit\_hash pairs.
    * blobs contains the actual file blobs (identified by their git-sha or sha256, depending on whether they're LFS files or not).
    * snapshots contains one subfolder per commit. Each "commit" contains the subset of the files that have been resolved at that particular commit. Each filename is a symlink to the blob at that particular commit.
[](#huggingface_hub.hf_hub_download.example)

    [  96]  .
    └── [ 160]  models--julien-c--EsperBERTo-small
        ├── [ 160]  blobs
        │   ├── [321M]  403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
        │   ├── [ 398]  7cb18dc9bafbfcf74629a4b760af1b160957a83e
        │   └── [1.4K]  d7edf6bd2a681fb0175f7735299831ee1b22b812
        ├── [  96]  refs
        │   └── [  40]  main
        └── [ 128]  snapshots
            ├── [ 128]  2439f60ef33a0d46d85da5001d52aeda5b00ce9f
            │   ├── [  52]  README.md -> ../../blobs/d7edf6bd2a681fb0175f7735299831ee1b22b812
            │   └── [  76]  pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd
            └── [ 128]  bbc77c8132af1cc5cf678da3f1ddf2de43606d48
                ├── [  52]  README.md -> ../../blobs/7cb18dc9bafbfcf74629a4b760af1b160957a83e
                └── [  76]  pytorch_model.bin -> ../../blobs/403450e234d65943a7dcf7e05a771ce3c92faa84dd07db4ac20f592037a1e4bd

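As a rough illustration of the layout above, the following sketch resolves a cached file by hand using only the standard library. The helper names `repo_folder_name` and `find_cached_file` are hypothetical, and the `{repo_type}s--{namespace}--{name}` folder naming is inferred from the `models--julien-c--EsperBERTo-small` folder in the tree, so treat it as an assumption rather than a stable contract.

```python
from pathlib import Path
from typing import Optional, Union


def repo_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Assumed naming scheme, e.g. 'models--julien-c--EsperBERTo-small'."""
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


def find_cached_file(
    cache_dir: Union[str, Path],
    repo_id: str,
    filename: str,
    revision: str = "main",
    repo_type: str = "model",
) -> Optional[Path]:
    """Return the snapshot path of `filename`, or None if it is not cached."""
    repo_dir = Path(cache_dir) / repo_folder_name(repo_id, repo_type)
    # refs/<revision> stores the commit hash a branch or tag points to;
    # if no such ref exists, assume `revision` is already a commit hash.
    ref_file = repo_dir / "refs" / revision
    commit_hash = ref_file.read_text().strip() if ref_file.is_file() else revision
    # snapshots/<commit_hash>/<filename> is a symlink into blobs/.
    candidate = repo_dir / "snapshots" / commit_hash / filename
    return candidate if candidate.exists() else None
```

For everyday use, passing `local_files_only=True` to `hf_hub_download` performs a cache-only lookup without any hand-written path logic.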
If `local_dir` is provided, the file structure from the repo will be replicated in this location. When using this option, the `cache_dir` will not be used and a `.cache/huggingface/` folder will be created at the root of `local_dir` to store some metadata related to the downloaded files. While this mechanism is not as robust as the main cache-system, it's optimized for regularly pulling the latest version of a repository.
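A minimal usage sketch, assuming `huggingface_hub` is installed and the Hub is reachable: it downloads several files from one repo at a pinned revision. The wrapper `download_pinned` and its `download` argument are hypothetical; the argument exists only so that the sketch can be exercised offline with a stand-in function.

```python
from typing import Callable, Dict, Iterable, Optional


def download_pinned(
    repo_id: str,
    filenames: Iterable[str],
    revision: str,
    download: Optional[Callable[..., str]] = None,
) -> Dict[str, str]:
    """Map each filename to the local path returned by the downloader."""
    if download is None:
        # Imported lazily so the sketch can be read and tested without the library.
        from huggingface_hub import hf_hub_download

        download = hf_hub_download
    return {
        name: download(repo_id=repo_id, filename=name, revision=revision)
        for name in filenames
    }
```

Pinning `revision` to a commit hash rather than a branch name keeps the resolved files identical across runs, since a branch can move while a commit cannot.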
### [](#huggingface_hub.hf_hub_url)hf\_hub\_url
#### huggingface\_hub.hf\_hub\_url
[](#huggingface_hub.hf_hub_url)[< source \>](https://github.com/huggingface/huggingface_hub/blob/v0.29.2/src/huggingface_hub/file_download.py#L171)
( repo\_id: str, filename: str, subfolder: typing.Optional\[str\] = None, repo\_type: typing.Optional\[str\] = None, revision: typing.Optional\[str\] = None, endpoint: typing.Optional\[str\] = None )
Parameters
* [](#huggingface_hub.hf_hub_url.repo_id)**repo\_id** (`str`): A namespace (user or an organization) name and a repo name separated by a `/`.
* [](#huggingface_hub.hf_hub_url.filename)**filename** (`str`): The name of the file in the repo.
* [](#huggingface_hub.hf_hub_url.subfolder)**subfolder** (`str`, _optional_): An optional value corresponding to a folder inside the repo.
* [](#huggingface_hub.hf_hub_url.repo_type)**repo\_type** (`str`, _optional_): Set to `"dataset"` or `"space"` if downloading from a dataset or space, `None` or `"model"` if downloading from a model. Default is `None`.
* [](#huggingface_hub.hf_hub_url.revision)**revision** (`str`, _optional_): An optional Git revision id which can be a branch name, a tag, or a commit hash.
Construct the URL of a file from the given information.
The resolved address can either be a huggingface.co-hosted url, or a link to Cloudfront (a Content Delivery Network, or CDN) for large files which are more than a few MBs.
[](#huggingface_hub.hf_hub_url.example)
Example:

    >>> from huggingface_hub import hf_hub_url
    >>> hf_hub_url(
    ...     repo_id="julien-c/EsperBERTo-small", filename="pytorch_model.bin"
    ... )
    'https://huggingface.co/julien-c/EsperBERTo-small/resolve/main/pytorch_model.bin'

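To make the URL shape in the example above concrete, here is a standalone sketch that rebuilds it with the standard library. The function `sketch_hub_url` is hypothetical and is not the library's implementation. The `datasets/` and `spaces/` prefixes, the `main` default revision, and the percent-encoding rules are assumptions based on how Hub URLs commonly look.

```python
from typing import Optional
from urllib.parse import quote

# Assumed URL prefixes for non-model repos; model repos have no prefix.
_REPO_TYPE_PREFIXES = {"dataset": "datasets/", "space": "spaces/"}


def sketch_hub_url(
    repo_id: str,
    filename: str,
    *,
    subfolder: Optional[str] = None,
    repo_type: Optional[str] = None,
    revision: Optional[str] = None,
    endpoint: str = "https://huggingface.co",
) -> str:
    """Build '<endpoint>/<prefix><repo_id>/resolve/<revision>/<path>'."""
    if subfolder:
        filename = f"{subfolder}/{filename}"
    prefix = _REPO_TYPE_PREFIXES.get(repo_type or "model", "")
    revision = revision or "main"
    # Encode '/' inside the revision so it stays a single path segment,
    # but keep '/' in the file path so subfolders remain readable.
    return f"{endpoint}/{prefix}{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"
```

With the arguments from the example, the sketch yields the same string that `hf_hub_url` is documented to return. For real code, prefer `hf_hub_url` itself.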
Notes:
Cloudfront is replicated over the globe so downloads are way faster for the end user (and it also lowers our bandwidth costs).
Cloudfront aggressively caches files by default (default TTL is 24 hours). This is not an issue here, however, because we implement a git-based versioning system on huggingface.co, which means that we store the files on S3/Cloudfront in a content-addressable way (i.e., the file name is its hash). Using content-addressable filenames means the cache can never be stale.
In terms of client-side caching from this library, we base our caching on the objects' entity tag (`ETag`), which is an identifier of a specific version of a resource \[1\]. An object's ETag is its git-sha1 if stored in git, or its sha256 if stored in git-lfs.
References:
* \[1\] [https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag)
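The two hash flavors mentioned in the notes above can be reproduced locally. This sketch assumes the standard git object format, where a blob is hashed as `blob <size>\0` followed by the content, and a plain SHA-256 of the content for git-lfs files. The function names are illustrative, and an ETag returned by a server may carry extra quoting that is not handled here.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """SHA-1 that git assigns to a blob: hash of 'blob <size>\\0' + content."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def lfs_sha256(content: bytes) -> str:
    """SHA-256 of the raw content, as used to identify git-lfs objects."""
    return hashlib.sha256(content).hexdigest()
```

The result is a 40-character hex string for git blobs and a 64-character one for LFS objects, which matches the two lengths of blob names visible in the cache tree of `hf_hub_download`.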
[](#huggingface_hub.snapshot_download)Download a snapshot of the repo
---------------------------------------------------------------------
#### huggingface\_hub.snapshot\_download
[](#huggingface_hub.snapshot_download)[< source \>](https://github.com/huggingface/huggingface_hub/blob/v0.29.2/src/huggingface_hub/_snapshot_download.py#L20)
( repo\_id: str, repo\_type: typing.Optional\[str\] = None, revision: typing.Optional\[str\] = None, cache\_dir: typing.Union\[str, pathlib.Path, NoneType\] = None, local\_dir: typing.Union\[str, pathlib.Path, NoneType\] = None, library\_name: typing.Optional\[str\] = None, library\_version: typing.Optional\[str\] = None, user\_agent: typing.Union\[typing.Dict, str, NoneType\] = None, proxies: typing.Optional\[typing.Dict\] = None, etag\_timeout: float = 10, force\_download: bool = False, token: typing.Union\[bool, str, NoneType\] = None, local\_files\_only: bool = False, allow\_patterns: typing.Union\[typing.List\[str\], str, NoneType\] = None, ignore\_patterns: typing.Union\[typing.List\[str\], str, NoneType\] = None, max\_workers: int = 8, tqdm\_class: typing.Optional\[tqdm.asyncio.tqdm\_asyncio\] = None, headers: typing.Optional\[typing.Dict\[str, str\]\] = None, endpoint: typing.Optional\[str\] = None, local\_dir\_use\_symlinks: typing.Union\[bool, typing.Literal\['auto'\]\] = 'auto', resume\_download: typing.Optional\[bool\] = None ) → `str`
Parameters
* [](#huggingface_hub.snapshot_download.repo_id)**repo\_id** (`str`): A user or an organization name and a repo name separated by a `/`.
* [](#huggingface_hub.snapshot_download.repo_type)**repo\_type** (`str`, _optional_): Set to `"dataset"` or `"space"` if downloading from a dataset or space, `None` or `"model"` if downloading from a model. Default is `None`.
* [](#huggingface_hub.snapshot_download.revision)**revision** (`str`, _optional_): An optional Git revision id which can be a branch name, a tag, or a commit hash.
* [](#huggingface_hub.snapshot_download.cache_dir)**cache\_dir** (`str`, `Path`, _optional_): Path to the folder where cached files are stored.
* [](#huggingface_hub.snapshot_download.local_dir)**local\_dir** (`str` or `Path`, _optional_): If provided, the downloaded files will be placed under this directory.
* [](#huggingface_hub.snapshot_download.library_name)**library\_name** (`str`, _optional_): The name of the library to which the object corresponds.
* [](#huggingface_hub.snapshot_download.library_version)**library\_version** (`str`, _optional_): The version of the library.
* [](#huggingface_hub.snapshot_download.user_agent)**user\_agent** (`str`, `dict`, _optional_): The user-agent info in the form of a dictionary or a string.
* [](#huggingface_hub.snapshot_download.proxies)**proxies** (`dict`, _optional_): Dictionary mapping protocol to the URL of the proxy passed to `requests.request`.
* [](#huggingface_hub.snapshot_download.etag_timeout)**etag\_timeout** (`float`, _optional_, defaults to `10`): How many seconds to wait for the server to send data when fetching the ETag before giving up. The value is passed to `requests.request`.
* [](#huggingface_hub.snapshot_download.force_download)**force\_download** (`bool`, _optional_, defaults to `False`): Whether the file should be downloaded even if it already exists in the local cache.
* [](#huggingface_hub.snapshot_download.token)**token** (`str`, `bool`, _optional_): A token to be used for the download.
    * If `True`, the token is read from the HuggingFace config folder.
    * If a string, it's used as the authentication token.
* [](#huggingface_hub.snapshot_download.headers)**headers** (`dict`, _optional_): Additional headers to include in the request. Those headers take precedence over the others.
* [](#huggingface_hub.snapshot_download.local_files_only)**local\_files\_only** (`bool`, _optional_, defaults to `False`): If `True`, avoid downloading the file and return the path to the local cached file if it exists.
* [](#huggingface_hub.snapshot_download.allow_patterns)**allow\_patterns** (`List[str]` or `str`, _optional_): If provided, only files matching at least one pattern are downloaded.
* [](#huggingface_hub.snapshot_download.ignore_patterns)**ignore\_patterns** (`List[str]` or `str`, _optional_): If provided, files matching any of the patterns are not downloaded.
* [](#huggingface_hub.snapshot_download.max_workers)**max\_workers** (`int`, _optional_): Number of concurrent threads to download files (1 thread = 1 file download). Defaults to 8.
* [](#huggingface_hub.snapshot_download.tqdm_class)**tqdm\_class** (`tqdm`, _optional_): If provided, overwrites the default behavior for the progress bar. The passed argument must inherit from `tqdm.auto.tqdm` or at least mimic its behavior. Note that the `tqdm_class` is not passed to each individual download. Defaults to the custom HF progress bar, which can be disabled by setting the `HF_HUB_DISABLE_PROGRESS_BARS` environment variable.
Returns
`str`
Folder path of the repo snapshot.
Raises
[RepositoryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RepositoryNotFoundError) or [RevisionNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RevisionNotFoundError) or `EnvironmentError` or `OSError` or `ValueError`
* [RepositoryNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RepositoryNotFoundError): If the repository to download from cannot be found. This may be because it doesn't exist, or because it is set to `private` and you do not have access.
* [RevisionNotFoundError](/docs/huggingface_hub/v0.29.2/en/package_reference/utilities#huggingface_hub.errors.RevisionNotFoundError): If the revision to download from cannot be found.
* [`EnvironmentError`](https://docs.python.org/3/library/exceptions.html#EnvironmentError): If `token=True` and the token cannot be found.
* [`OSError`](https://docs.python.org/3/library/exceptions.html#OSError): If ETag cannot be determined.
* [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError): If some parameter value is invalid.
Download repo files.
Download a whole snapshot of a repo's files at the specified revision. This is useful when you want all files from a repo, because you don't know which ones you will need a priori. All files are nested inside a folder in order to keep their actual filename relative to that folder. You can also filter which files to download using `allow_patterns` and `ignore_patterns`.
If `local_dir` is provided, the file structure from the repo will be replicated in this location. When using this option, the `cache_dir` will not be used and a `.cache/huggingface/` folder will be created at the root of `local_dir` to store some metadata related to the downloaded files. While this mechanism is not as robust as the main cache-system, it's optimized for regularly pulling the latest version of a repository.
An alternative would be to clone the repo but this requires git and git-lfs to be installed and properly configured. It is also not possible to filter which files to download when cloning a repository using git.
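The effect of `allow_patterns` and `ignore_patterns` can be approximated offline. The sketch below is not the library's code: the helper `filter_repo_files` is hypothetical, and it assumes shell-style matching via the standard `fnmatch` module, which the parameter descriptions above do not specify.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional, Union

Patterns = Optional[Union[List[str], str]]


def filter_repo_files(
    paths: Iterable[str],
    allow_patterns: Patterns = None,
    ignore_patterns: Patterns = None,
) -> List[str]:
    """Keep paths matching any allow pattern and no ignore pattern."""
    # Accept a single pattern string as shorthand for a one-element list.
    if isinstance(allow_patterns, str):
        allow_patterns = [allow_patterns]
    if isinstance(ignore_patterns, str):
        ignore_patterns = [ignore_patterns]

    kept = []
    for path in paths:
        if allow_patterns is not None and not any(fnmatch(path, p) for p in allow_patterns):
            continue  # an allow-list is set and nothing in it matches
        if ignore_patterns is not None and any(fnmatch(path, p) for p in ignore_patterns):
            continue  # explicitly excluded
        kept.append(path)
    return kept
```

Under this assumption, `*` also matches `/`, so a pattern such as `*.onnx` would select `onnx/model.onnx` as well as top-level files. A real call would pass the same patterns directly, for example `snapshot_download(repo_id, allow_patterns="*.json")`.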
[](#get-metadata-about-a-file)Get metadata about a file
-------------------------------------------------------
### [](#huggingface_hub.get_hf_file_metadata)get\_hf\_file\_metadata
#### huggingface\_hub.get\_hf\_file\_metadata
[](#huggingface_hub.get_hf_file_metadata)[< source \>](https://github.com/huggingface/huggingface_hub/blob/v0.29.2/src/huggingface_hub/file_download.py#L1246)
( url: str, token: typing.Union\[bool, str, NoneType\] = None, proxies: typing.Optional\[typing.Dict\] = None, timeout: typing.Optional\[float\] = 10, library\_name: typing.Optional\[str\] = None, library\_version: typing.Optional\[str\] = None, user\_agent: typing.Union\[typing.Dict, str, NoneType\] = None, headers: typing.Optional\[typing.Dict\[str, str\]\] = None )
Parameters
* [](#huggingface_hub.get_hf_file_metadata.url)**url** (`str`): File url, for example returned by [hf\_hub\_url()](/docs/huggingface_hub/v0.29.2/en/package_reference/file_download#huggingface_hub.hf_hub_url).
* [](#huggingface_hub.get_hf_file_metadata.token)**token** (`str` or `bool`, _optional_): A token to be used for the download.
    * If `True`, the token is read from the HuggingFace config folder.
    * If `False` or `None`, no token is provided.
    * If a string, it's used as the authentication token.
* [](#huggingface_hub.get_hf_file_metadata.proxies)**proxies** (`dict`, _optional_): Dictionary mapping protocol to the URL of the proxy passed to `requests.request`.
* [](#huggingface_hub.get_hf_file_metadata.timeout)**timeout** (`float`, _optional_, defaults to 10): How many seconds to wait for the server to send metadata before giving up.
* [](#huggingface_hub.get_hf_file_metadata.library_name)**library\_name** (`str`, _optional_): The name of the library to which the object corresponds.
* [](#huggingface_hub.get_hf_file_metadata.library_version)**library\_version** (`str`, _optional_): The version of the library.
* [](#huggingface_hub.get_hf_file_metadata.user_agent)**user\_agent** (`dict`, `str`, _optional_): The user-agent info in the form of a dictionary or a string.
* [](#huggingface_hub.get_hf_file_metadata.headers)**headers** (`dict`, _optional_): Additional headers to be sent with the request.
Fetch metadata of a file versioned on the Hub for a given url.
### [](#huggingface_hub.HfFileMetadata)HfFileMetadata
#### class huggingface\_hub.HfFileMetadata
[](#huggingface_hub.HfFileMetadata)[< source \>](https://github.com/huggingface/huggingface_hub/blob/v0.29.2/src/huggingface_hub/file_download.py#L147)
( commit\_hash: typing.Optional\[str\], etag: typing.Optional\[str\], location: str, size: typing.Optional\[int\] )
Parameters
* [](#huggingface_hub.HfFileMetadata.commit_hash)**commit\_hash** (`str`, _optional_): The commit\_hash related to the file.
* [](#huggingface_hub.HfFileMetadata.etag)**etag** (`str`, _optional_): ETag of the file on the server.
* [](#huggingface_hub.HfFileMetadata.location)**location** (`str`): Location from which to download the file. It may be a Hub URL or another location such as a CDN.
* [](#huggingface_hub.HfFileMetadata.size)**size** (`int`, _optional_): Size of the file. In case of an LFS file, contains the size of the actual LFS file, not the pointer.
Data structure containing information about a file versioned on the Hub.
Returned by [get\_hf\_file\_metadata()](/docs/huggingface_hub/v0.29.2/en/package_reference/file_download#huggingface_hub.get_hf_file_metadata) based on a URL.
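To show how the four fields relate to an HTTP response, here is a standalone sketch of a similar structure populated from response headers. Everything in it is an assumption made for illustration: the class `FileMetadataSketch`, the helper `metadata_from_headers`, and in particular the header names (`X-Repo-Commit`, `X-Linked-Etag`, `X-Linked-Size`) are not taken from this page and may differ from what the library actually reads.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FileMetadataSketch:
    """Illustrative stand-in mirroring the fields of HfFileMetadata."""

    commit_hash: Optional[str]
    etag: Optional[str]
    location: str
    size: Optional[int]


def _normalize_etag(etag: Optional[str]) -> Optional[str]:
    """Drop the weak-validator prefix and the surrounding quotes."""
    if etag is None:
        return None
    if etag.startswith("W/"):
        etag = etag[2:]
    return etag.strip('"')


def metadata_from_headers(url: str, headers: Mapping[str, str]) -> FileMetadataSketch:
    # Assumed header names: the "linked" variants would describe the real
    # LFS file rather than the small pointer file stored in git.
    etag = headers.get("X-Linked-Etag") or headers.get("ETag")
    size = headers.get("X-Linked-Size") or headers.get("Content-Length")
    return FileMetadataSketch(
        commit_hash=headers.get("X-Repo-Commit"),
        etag=_normalize_etag(etag),
        # Fall back to the request URL when there is no redirect target.
        location=headers.get("Location") or url,
        size=int(size) if size else None,
    )
```

The sketch takes a plain mapping, so header lookup is case-sensitive here. In real code, call `get_hf_file_metadata(url)` and read the attributes of the returned object instead of parsing headers yourself.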
[](#caching)Caching
-------------------
The methods displayed above are designed to work with a caching system that prevents re-downloading files. The caching system was updated in v0.8.0 to become the central cache-system shared across libraries that depend on the Hub.
Read the [cache-system guide](../guides/manage-cache) for a detailed presentation of caching at HF.