updating issue

#3
by minaiosu - opened

Hi John, with some profiles containing over 500 LoRAs/models, I find it challenging to download or update them because the one-hour limit is triggered too quickly. Typically it downloads around 150–200 files, and after one hour I try again. However, it appears to download and overwrite everything from the start, so after another 250–300 files the one-hour limit is triggered again. This means the effective download rate for updating a profile is only about 50 new files per hour.

Is there a way to check the already downloaded files and skip them, so that only the missing files are downloaded instead of everything being redownloaded? I have a profile with over 1,000 LoRAs that I've been updating via the monthly tab since it had only around 100, and downloading or updating profiles with that much content is difficult. Roughly, I imagine a check like the sketch below.
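A rough sketch of the kind of check I mean (hypothetical; local_sha256 and needs_download are placeholder names, and the remote hash would come from whatever API the downloader queries):

import hashlib
from pathlib import Path

def local_sha256(path: Path) -> str:
    # Hash the file in 1 MiB chunks so large checkpoints don't load into memory.
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest().lower()

def needs_download(path: Path, remote_sha256) -> bool:
    # Only skip when the file already exists and its hash matches the remote one;
    # if no remote hash is available, err on the side of redownloading.
    if not path.exists() or not remote_sha256: return True
    return local_sha256(path) != remote_sha256.lower()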

I'll give it a try. Since the Hugging Face API is well developed, it's easy to check the hash in advance there; the problem was obtaining the SHA256 hash from Civitai. I managed to do that part today and plan to work on the rest tomorrow. (A sketch of the Hugging Face side follows the Civitai snippet below.)

import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry
import urllib.parse
import re

def get_user_agent():
    return 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0'

def get_civitai_sha256(dl_url: str, api_key=""):
    def matches_filter(qs: dict, file: dict, k: str):
        # A file passes when the download URL does not constrain k, or when its
        # value agrees with the requested one. "format", "size" and "fp" sit
        # under file["metadata"] in the API response, so fall back to that.
        v = file.get(k, (file.get("metadata") or {}).get(k))
        return k not in qs or v is None or qs[k][0] == v

    if not dl_url.startswith("https://civitai.com/api/download/models/"): return None
    headers = {'User-Agent': get_user_agent(), 'content-type': 'application/json'}
    if api_key: headers['Authorization'] = f'Bearer {api_key}'
    base_url = 'https://civitai.com/api/v1/model-versions/'
    session = requests.Session()
    retries = Retry(total=5, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
    session.mount("https://", HTTPAdapter(max_retries=retries))
    m = re.match(r'https://civitai\.com/api/download/models/(\d+)\??(.+)?', dl_url)
    if m is None: return None
    url = base_url + m.group(1)
    # parse_qs tolerates None, but be explicit about a missing query string.
    qs = urllib.parse.parse_qs(m.group(2) or "")
    if "type" not in qs: qs["type"] = ["Model"]  # default to the model file itself
    try:
        r = session.get(url, headers=headers, timeout=(5.0, 15))
        if not r.ok: return None
        data = r.json()
        if not isinstance(data.get("files"), list): return None
        sha256 = None
        for f in data["files"]:
            # Skip files that contradict the type/format/size/fp filters in the URL.
            if not all(matches_filter(qs, f, k) for k in ("type", "format", "size", "fp")): continue
            sha256 = (f.get("hashes") or {}).get("SHA256", "").lower() or sha256
        return sha256
    except Exception as e:
        print(e)
        return None

print(get_civitai_sha256("https://civitai.com/api/download/models/1335639"))
print(get_civitai_sha256("https://civitai.com/api/download/models/1335639?type=Model&format=SafeTensor"))
print(get_civitai_sha256("https://civitai.com/api/download/models/1335639?type=Training%20Data"))
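For comparison, the Hugging Face side can be read straight from the Hub API. A minimal sketch, assuming a recent huggingface_hub release where per-file LFS metadata exposes a sha256 attribute (older versions return a dict), and with placeholder repo/file names:

from huggingface_hub import HfApi

def get_hf_sha256(repo_id: str, filename: str):
    # files_metadata=True makes the Hub include per-file LFS info in the response.
    info = HfApi().model_info(repo_id, files_metadata=True)
    for s in info.siblings:
        if s.rfilename == filename and s.lfs is not None:
            return s.lfs.sha256.lower()  # SHA256 of the LFS payload itself
    return None

print(get_hf_sha256("some-user/some-model", "model.safetensors"))  # placeholders

Non-LFS files (configs, small text files) carry no lfs entry, so this returns None for them.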

I found a troublesome bug related to Transformers and Spaces, so I'll investigate that first.
