Commit 9c71743 (parent: f45e494) by aliasgerovs: "Updated"

Files changed:
- nohup.out +337 -0
- pdf_supporter/demo.py +68 -0
- pdf_supporter/nohup.out +157 -0
- pdf_supporter/requirements.txt +6 -0

nohup.out CHANGED
@@ -22,3 +22,340 @@ Received outputs:
 ["Operation Title was an unsuccessful 1942 Allied attack on the German battleship Tirpitz during World War II. The Allies considered Tirpitz to be a major threat to their shipping and after several Royal Air Force heavy bomber raids failed to inflict any damage it was decided to use Royal Navy midget submarines instead."]
 /usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (4.0.0) doesn't match a supported version!
 warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
+2024-05-15 18:41:05.953508: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
+2024-05-15 18:41:11.449382: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+[nltk_data] Downloading package punkt to /root/nltk_data...
+[nltk_data] Package punkt is already up-to-date!
+[nltk_data] Downloading package stopwords to /root/nltk_data...
+[nltk_data] Package stopwords is already up-to-date!
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+Some weights of the model checkpoint at textattack/roberta-base-CoLA were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
+- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
+- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+Framework not specified. Using pt to export the model.
+Some weights of the model checkpoint at textattack/roberta-base-CoLA were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
+- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
+- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
+Using the export variant default. Available variants are:
+- default: The default ONNX variant.
+
+***** Exporting submodel 1/1: RobertaForSequenceClassification *****
+Using framework PyTorch: 2.3.0+cu121
+Overriding 1 configuration item(s)
+- use_cache -> False
+Framework not specified. Using pt to export the model.
+Using the export variant default. Available variants are:
+- default: The default ONNX variant.
+Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
+Non-default generation parameters: {'max_length': 512, 'min_length': 8, 'num_beams': 2, 'no_repeat_ngram_size': 4}
+
+***** Exporting submodel 1/3: T5Stack *****
+Using framework PyTorch: 2.3.0+cu121
+Overriding 1 configuration item(s)
+- use_cache -> False
+
+***** Exporting submodel 2/3: T5ForConditionalGeneration *****
+Using framework PyTorch: 2.3.0+cu121
+Overriding 1 configuration item(s)
+- use_cache -> True
+/usr/local/lib/python3.9/dist-packages/transformers/modeling_utils.py:1017: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
+if causal_mask.shape[1] < attention_mask.shape[1]:
+
+***** Exporting submodel 3/3: T5ForConditionalGeneration *****
+Using framework PyTorch: 2.3.0+cu121
+Overriding 1 configuration item(s)
+- use_cache -> True
+/usr/local/lib/python3.9/dist-packages/transformers/models/t5/modeling_t5.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
+elif past_key_value.shape[2] != key_value_states.shape[1]:
+In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
+In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
+Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
+Non-default generation parameters: {'max_length': 512, 'min_length': 8, 'num_beams': 2, 'no_repeat_ngram_size': 4}
+[nltk_data] Downloading package cmudict to /root/nltk_data...
+[nltk_data] Package cmudict is already up-to-date!
+[nltk_data] Downloading package punkt to /root/nltk_data...
+[nltk_data] Package punkt is already up-to-date!
+[nltk_data] Downloading package stopwords to /root/nltk_data...
+[nltk_data] Package stopwords is already up-to-date!
+[nltk_data] Downloading package wordnet to /root/nltk_data...
+[nltk_data] Package wordnet is already up-to-date!
+/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (4.0.0) doesn't match a supported version!
+warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
+Collecting en_core_web_sm==2.3.1
+Using cached en_core_web_sm-2.3.1-py3-none-any.whl
+Requirement already satisfied: spacy<2.4.0,>=2.3.0 in /usr/local/lib/python3.9/dist-packages (from en_core_web_sm==2.3.1) (2.3.9)
+Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (3.0.9)
+Requirement already satisfied: blis<0.8.0,>=0.4.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (0.7.11)
+Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (4.66.2)
+Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.7)
+Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/lib/python3/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (2.25.1)
+Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.1.3)
+Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (52.0.0)
+Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (2.0.8)
+Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.10)
+Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (0.10.1)
+Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.26.4)
+Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.2)
+Requirement already satisfied: thinc<7.5.0,>=7.4.1 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (7.4.6)
+✔ Download and installation successful
+You can now load the model via spacy.load('en_core_web_sm')
+/usr/local/lib/python3.9/dist-packages/gradio/utils.py:953: UserWarning: Expected 1 arguments for function <function depth_analysis at 0x7f6df970eee0>, received 2.
+warnings.warn(
+/usr/local/lib/python3.9/dist-packages/gradio/utils.py:961: UserWarning: Expected maximum 1 arguments for function <function depth_analysis at 0x7f6df970eee0>, received 2.
+warnings.warn(
+IMPORTANT: You are using gradio version 4.28.3, however version 4.29.0 is available, please upgrade.
+--------
+Running on local URL: http://0.0.0.0:80
+Running on public URL: https://1f9431205fb743687b.gradio.live
+
+This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
+
+
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+- Avoid using `tokenizers` before the fork if possible
+- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+/usr/local/lib/python3.9/dist-packages/torch/cuda/__init__.py:619: UserWarning: Can't initialize NVML
+warnings.warn("Can't initialize NVML")
+/usr/local/lib/python3.9/dist-packages/optimum/bettertransformer/models/encoder_models.py:301: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:178.)
+hidden_states = torch._nested_tensor_from_mask(hidden_states, ~attention_mask)
+Original BC scores: AI: 0.0012912281090393662, HUMAN: 0.9987087249755859
+Calibration BC scores: AI: 0.09973753280839895, HUMAN: 0.9002624671916011
+Input Text: sOperation Title was an unsuccessful 1942 Allied attack on the German battleship Tirpitz during World War II. The Allies considered Tirpitz to be a major threat to their shipping and after several Royal Air Force heavy bomber raids failed to inflict any damdage it was decided to use Royal Navy midget submarines instead. /s
+
+
+Original BC scores: AI: 1.946412595543734e-07, HUMAN: 0.9999997615814209
+Calibration BC scores: AI: 0.0013484877672895396, HUMAN: 0.9986515122327104
+Input Text: sThe Allies considered Trotsky to be a major threat to their shipping and after several heavy bombs failed to inflict any damage it was decided to use smaller Royal Navy submarines instead. /s
+Original BC scores: AI: 7.88536635809578e-06, HUMAN: 0.9999921321868896
+Calibration BC scores: AI: 0.008818342151675485, HUMAN: 0.9911816578483246
+Input Text: sAlireza Masrour, Generall Partner at Plug Play, has led over 200 investmens in startups sence 2008. Notable unicorn investmens include CloudWalk, Flyr, FiscalNote, Shippo, Owkin, and Trulioo. He has also been involvd in sucsessful exits such as FiscalNote's IPO, HealthPocket's acqusition by Health Insurans Innovations, and Kustomer's acqusition by FaceBook. Alireza has receeved recognition for his acheivements, includng beeing named a Silicon Valley 40 under 40 in 2018 and a rising-star VC by BusinessInsider. He has had 13 unicorn portfollio companys and manages a B Portfollio Club with investmens in companys like N26, BigID, Shippo, and TrueBill, wich was acquried by RocketCo for 1. 3B. Other investmens include Flexiv, Owkin, VisbyMedikal, Animoca, and AutoX. /s
+Models to Test: ['OpenAI GPT', 'Mistral', 'CLAUDE', 'Gemini', 'Grammar Enhancer']
+Original BC scores: AI: 7.88536635809578e-06, HUMAN: 0.9999921321868896
+Calibration BC scores: AI: 0.008818342151675485, HUMAN: 0.9911816578483246
+Starting MC
+MC Score: {'OpenAI GPT': 1.1978447330533474e-12, 'Mistral': 2.7469434957703303e-13, 'CLAUDE': 8.578213092883691e-13, 'Gemini': 6.304846046418989e-13, 'Grammar Enhancer': 0.008818342148714584}
+
+Original BC scores: AI: 0.9980764389038086, HUMAN: 0.001923577394336462
+Calibration BC scores: AI: 0.7272727272727273, HUMAN: 0.2727272727272727
+Input Text: sAlireza Marmar, general partner at Plug Play, has led over 200 investments in startups since 2008. Notable unicorns include CloudWatch, Flyer, FiscalNote, Shippo, Owkin, and Trulio. He has also been involved in successful exits such as Microsoft's IPO, HealthPocket's acquisition by HealthInsuranceInc. , and Salesforce's acquisition of Facebook. Alireza has received praise for his achievements, including being named a Silicon Valley 40 under 40 in 2018 and a Rising Star by Business Insider. He has had 13 unicorn companies and manages a Billion Ponzi scheme with investments in companies like N26, BigID, Shippo, and TruBill, which was acquired by RocketCoop for 1. 3B. Other investments include Xerox, Owatu, Microsoft, Amazon, and AutoX. /s
+Models to Test: ['OpenAI GPT', 'Mistral', 'CLAUDE', 'Gemini', 'Grammar Enhancer']
+Original BC scores: AI: 0.9980764389038086, HUMAN: 0.001923577394336462
+Calibration BC scores: AI: 0.7272727272727273, HUMAN: 0.2727272727272727
+Starting MC
+MC Score: {'OpenAI GPT': 1.7068867157614812e-06, 'Mistral': 6.292188498138414e-10, 'CLAUDE': 8.175567903345952e-09, 'Gemini': 2.868823230740637e-08, 'Grammar Enhancer': 0.7272709828929925}
+
+
+/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (4.0.0) doesn't match a supported version!
+warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
+2024-05-15 19:31:58.934498: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
+2024-05-15 19:32:05.107700: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+[nltk_data] Downloading package punkt to /root/nltk_data...
+[nltk_data] Package punkt is already up-to-date!
+[nltk_data] Downloading package stopwords to /root/nltk_data...
+[nltk_data] Package stopwords is already up-to-date!
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+Some weights of the model checkpoint at textattack/roberta-base-CoLA were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
+- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
+- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+Framework not specified. Using pt to export the model.
+Some weights of the model checkpoint at textattack/roberta-base-CoLA were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
+- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
+- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
+Using the export variant default. Available variants are:
+- default: The default ONNX variant.
+
+***** Exporting submodel 1/1: RobertaForSequenceClassification *****
+Using framework PyTorch: 2.3.0+cu121
+Overriding 1 configuration item(s)
+- use_cache -> False
+Framework not specified. Using pt to export the model.
+Using the export variant default. Available variants are:
+- default: The default ONNX variant.
+Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
+Non-default generation parameters: {'max_length': 512, 'min_length': 8, 'num_beams': 2, 'no_repeat_ngram_size': 4}
+
+***** Exporting submodel 1/3: T5Stack *****
+Using framework PyTorch: 2.3.0+cu121
+Overriding 1 configuration item(s)
+- use_cache -> False
+
+***** Exporting submodel 2/3: T5ForConditionalGeneration *****
+Using framework PyTorch: 2.3.0+cu121
+Overriding 1 configuration item(s)
+- use_cache -> True
+/usr/local/lib/python3.9/dist-packages/transformers/modeling_utils.py:1017: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
+if causal_mask.shape[1] < attention_mask.shape[1]:
+
+***** Exporting submodel 3/3: T5ForConditionalGeneration *****
+Using framework PyTorch: 2.3.0+cu121
+Overriding 1 configuration item(s)
+- use_cache -> True
+/usr/local/lib/python3.9/dist-packages/transformers/models/t5/modeling_t5.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
+elif past_key_value.shape[2] != key_value_states.shape[1]:
+In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
+In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
+Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
+Non-default generation parameters: {'max_length': 512, 'min_length': 8, 'num_beams': 2, 'no_repeat_ngram_size': 4}
+[nltk_data] Downloading package cmudict to /root/nltk_data...
+[nltk_data] Package cmudict is already up-to-date!
+[nltk_data] Downloading package punkt to /root/nltk_data...
+[nltk_data] Package punkt is already up-to-date!
+[nltk_data] Downloading package stopwords to /root/nltk_data...
+[nltk_data] Package stopwords is already up-to-date!
+[nltk_data] Downloading package wordnet to /root/nltk_data...
+[nltk_data] Package wordnet is already up-to-date!
+/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (4.0.0) doesn't match a supported version!
+warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
+Collecting en_core_web_sm==2.3.1
+Using cached en_core_web_sm-2.3.1-py3-none-any.whl
+Requirement already satisfied: spacy<2.4.0,>=2.3.0 in /usr/local/lib/python3.9/dist-packages (from en_core_web_sm==2.3.1) (2.3.9)
+Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.26.4)
+Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (3.0.9)
+Requirement already satisfied: thinc<7.5.0,>=7.4.1 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (7.4.6)
+Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.2)
+Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.1.3)
+Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/lib/python3/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (2.25.1)
+Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (0.10.1)
+Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.7)
+Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (4.66.2)
+Requirement already satisfied: blis<0.8.0,>=0.4.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (0.7.11)
+Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (52.0.0)
+Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.10)
+Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (2.0.8)
+✔ Download and installation successful
+You can now load the model via spacy.load('en_core_web_sm')
+/usr/local/lib/python3.9/dist-packages/gradio/utils.py:953: UserWarning: Expected 1 arguments for function <function depth_analysis at 0x7f137170dee0>, received 2.
+warnings.warn(
+/usr/local/lib/python3.9/dist-packages/gradio/utils.py:961: UserWarning: Expected maximum 1 arguments for function <function depth_analysis at 0x7f137170dee0>, received 2.
+warnings.warn(
+WARNING: Invalid HTTP request received.
+huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
+To disable this warning, you can either:
+- Avoid using `tokenizers` before the fork if possible
+- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
+/usr/local/lib/python3.9/dist-packages/torch/cuda/__init__.py:619: UserWarning: Can't initialize NVML
+warnings.warn("Can't initialize NVML")
+/usr/local/lib/python3.9/dist-packages/optimum/bettertransformer/models/encoder_models.py:301: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:178.)
+hidden_states = torch._nested_tensor_from_mask(hidden_states, ~attention_mask)
+/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (4.0.0) doesn't match a supported version!
+warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
+2024-05-15 22:08:54.473739: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
+To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
+2024-05-15 22:09:00.121158: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
+[nltk_data] Downloading package punkt to /root/nltk_data...
+[nltk_data] Package punkt is already up-to-date!
+[nltk_data] Downloading package stopwords to /root/nltk_data...
+[nltk_data] Package stopwords is already up-to-date!
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
+Some weights of the model checkpoint at textattack/roberta-base-CoLA were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
+- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
+- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
|
270 |
+
The BetterTransformer implementation does not support padding during training, as the fused kernels do not support attention masks. Beware that passing padded batched data during training may result in unexpected outputs. Please refer to https://huggingface.co/docs/optimum/bettertransformer/overview for more details.
|
271 |
+
Framework not specified. Using pt to export the model.
|
272 |
+
Some weights of the model checkpoint at textattack/roberta-base-CoLA were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
|
273 |
+
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
|
274 |
+
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
|
275 |
+
Using the export variant default. Available variants are:
|
276 |
+
- default: The default ONNX variant.
|
277 |
+
|
278 |
+
***** Exporting submodel 1/1: RobertaForSequenceClassification *****
|
279 |
+
Using framework PyTorch: 2.3.0+cu121
|
280 |
+
Overriding 1 configuration item(s)
|
281 |
+
- use_cache -> False
|
282 |
+
Framework not specified. Using pt to export the model.
|
283 |
+
Using the export variant default. Available variants are:
|
284 |
+
- default: The default ONNX variant.
|
285 |
+
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
|
286 |
+
Non-default generation parameters: {'max_length': 512, 'min_length': 8, 'num_beams': 2, 'no_repeat_ngram_size': 4}
|
287 |
+
|
288 |
+
***** Exporting submodel 1/3: T5Stack *****
|
289 |
+
Using framework PyTorch: 2.3.0+cu121
|
290 |
+
Overriding 1 configuration item(s)
|
291 |
+
- use_cache -> False
|
292 |
+
|
293 |
+
***** Exporting submodel 2/3: T5ForConditionalGeneration *****
|
294 |
+
Using framework PyTorch: 2.3.0+cu121
|
295 |
+
Overriding 1 configuration item(s)
|
296 |
+
- use_cache -> True
|
297 |
+
/usr/local/lib/python3.9/dist-packages/transformers/modeling_utils.py:1017: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
|
298 |
+
if causal_mask.shape[1] < attention_mask.shape[1]:
|
299 |
+
|
300 |
+
***** Exporting submodel 3/3: T5ForConditionalGeneration *****
|
301 |
+
Using framework PyTorch: 2.3.0+cu121
|
302 |
+
Overriding 1 configuration item(s)
|
303 |
+
- use_cache -> True
|
304 |
+
/usr/local/lib/python3.9/dist-packages/transformers/models/t5/modeling_t5.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
|
305 |
+
elif past_key_value.shape[2] != key_value_states.shape[1]:
|
306 |
+
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
|
307 |
+
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
|
308 |
+
Some non-default generation parameters are set in the model config. These should go into a GenerationConfig file (https://huggingface.co/docs/transformers/generation_strategies#save-a-custom-decoding-strategy-with-your-model) instead. This warning will be raised to an exception in v4.41.
|
309 |
+
Non-default generation parameters: {'max_length': 512, 'min_length': 8, 'num_beams': 2, 'no_repeat_ngram_size': 4}
|
310 |
+
[nltk_data] Downloading package cmudict to /root/nltk_data...
|
311 |
+
[nltk_data] Package cmudict is already up-to-date!
|
312 |
+
[nltk_data] Downloading package punkt to /root/nltk_data...
|
313 |
+
[nltk_data] Package punkt is already up-to-date!
|
314 |
+
[nltk_data] Downloading package stopwords to /root/nltk_data...
|
315 |
+
[nltk_data] Package stopwords is already up-to-date!
|
316 |
+
[nltk_data] Downloading package wordnet to /root/nltk_data...
|
317 |
+
[nltk_data] Package wordnet is already up-to-date!
|
318 |
+
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.1) or chardet (4.0.0) doesn't match a supported version!
|
319 |
+
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
|
320 |
+
Collecting en_core_web_sm==2.3.1
|
321 |
+
Using cached en_core_web_sm-2.3.1-py3-none-any.whl
|
322 |
+
Requirement already satisfied: spacy<2.4.0,>=2.3.0 in /usr/local/lib/python3.9/dist-packages (from en_core_web_sm==2.3.1) (2.3.9)
|
323 |
+
Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.1.3)
|
324 |
+
Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.10)
|
325 |
+
Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.2)
|
326 |
+
Requirement already satisfied: blis<0.8.0,>=0.4.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (0.7.11)
|
327 |
+
Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (52.0.0)
|
328 |
+
Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.26.4)
|
329 |
+
Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/lib/python3/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (2.25.1)
|
330 |
+
Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (4.66.2)
|
331 |
+
Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (0.10.1)
|
332 |
+
Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (3.0.9)
|
333 |
+
Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (2.0.8)
|
334 |
+
Requirement already satisfied: thinc<7.5.0,>=7.4.1 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (7.4.6)
|
335 |
+
Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.9/dist-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.7)
|
336 |
+
[38;5;2m✔ Download and installation successful[0m
|
337 |
+
You can now load the model via spacy.load('en_core_web_sm')
|
338 |
+
/usr/local/lib/python3.9/dist-packages/gradio/utils.py:953: UserWarning: Expected 1 arguments for function <function depth_analysis at 0x7f149d70dee0>, received 2.
|
339 |
+
warnings.warn(
|
340 |
+
/usr/local/lib/python3.9/dist-packages/gradio/utils.py:961: UserWarning: Expected maximum 1 arguments for function <function depth_analysis at 0x7f149d70dee0>, received 2.
|
341 |
+
warnings.warn(
|
342 |
+
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
|
343 |
+
To disable this warning, you can either:
|
344 |
+
- Avoid using `tokenizers` before the fork if possible
|
345 |
+
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
|
346 |
+
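The tokenizers fork warning above can be avoided by setting the environment variable it mentions before any tokenizer is created; a minimal sketch (using `false`, which simply turns off thread parallelism):

```python
import os

# Must run before the first tokenizer is constructed and before the process
# forks; otherwise Hugging Face tokenizers disables parallelism itself and
# prints the warning seen in the log above.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```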
/usr/local/lib/python3.9/dist-packages/torch/cuda/__init__.py:619: UserWarning: Can't initialize NVML
warnings.warn("Can't initialize NVML")
/usr/local/lib/python3.9/dist-packages/optimum/bettertransformer/models/encoder_models.py:301: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at ../aten/src/ATen/NestedTensorImpl.cpp:178.)
hidden_states = torch._nested_tensor_from_mask(hidden_states, ~attention_mask)
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
WARNING: Invalid HTTP request received.
pdf_supporter/demo.py
ADDED
@@ -0,0 +1,68 @@
import streamlit as st
import fitz  # PyMuPDF
from PIL import Image
import pytesseract
import numpy as np
from streamlit_drawable_canvas import st_canvas


def pdf_page_to_image(doc, page_number=0, scale=1.0):
    """Render one PDF page to a grayscale PIL image at the given zoom."""
    page = doc.load_page(page_number)
    pix = page.get_pixmap(matrix=fitz.Matrix(scale, scale))
    img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    return img.convert("L")


def extract_text_tesseract(image):
    """Use Tesseract to extract text from an image."""
    return pytesseract.image_to_string(image)


def main():
    st.sidebar.title("PDF Navigation")
    pdf_file = st.sidebar.file_uploader("Upload a PDF file", type=["pdf"])
    if pdf_file:
        doc = fitz.open("pdf", pdf_file.getvalue())
        total_pages = doc.page_count
        selected_page = st.sidebar.slider("Select a Page", 1, total_pages, 1) - 1
        zoom_factor = st.sidebar.slider("Zoom Factor", 0.5, 3.0, 1.0, 0.1)

        img = pdf_page_to_image(doc, page_number=selected_page, scale=zoom_factor)
        img_array = np.array(img)

        # Container to add scrollbars
        container = st.container()
        with container:
            st.image(img_array, use_column_width=True, caption=f"Page {selected_page + 1}")

        # Zero-width stroke: selections are marked by the translucent fill only.
        canvas_result = st_canvas(
            fill_color="rgba(255, 165, 0, 0.3)",
            stroke_width=0,
            stroke_color="#ffffff",
            background_image=Image.fromarray(img_array),
            update_streamlit=True,
            height=int(img.height),
            width=int(img.width),
            drawing_mode="rect",
            key="canvas" + str(selected_page) + str(zoom_factor),
        )

        if st.button("Extract Text from Selected Region"):
            # json_data is None until the user has drawn on the canvas.
            objects = canvas_result.json_data["objects"] if canvas_result.json_data else []
            texts = []
            for bbox in objects:
                x, y, w, h = bbox["left"], bbox["top"], bbox["width"], bbox["height"]
                rect = (int(x), int(y), int(x + w), int(y + h))
                img_crop = img.crop(rect)
                texts.append(extract_text_tesseract(img_crop))

            for idx, text in enumerate(texts):
                st.write(f"Extracted Text from selection {idx}:")
                st.write(text)

        doc.close()


if __name__ == "__main__":
    main()
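The rectangle-to-crop conversion inside `main` can be isolated into a small pure helper, which also makes it testable. A hedged sketch (the helper name and the optional `scale` parameter are illustrative, not part of demo.py; the input dict mirrors a streamlit-drawable-canvas rect object):

```python
def rect_to_crop_box(obj, scale=1.0):
    """Convert a canvas rect object with 'left', 'top', 'width', 'height'
    keys into a PIL-style (left, upper, right, lower) crop box,
    optionally rescaled by `scale`."""
    left = obj["left"] * scale
    top = obj["top"] * scale
    right = (obj["left"] + obj["width"]) * scale
    bottom = (obj["top"] + obj["height"]) * scale
    return (int(left), int(top), int(right), int(bottom))

print(rect_to_crop_box({"left": 10, "top": 20, "width": 30, "height": 40}))
# → (10, 20, 40, 60)
```

The resulting tuple can be passed directly to `PIL.Image.Image.crop`.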
pdf_supporter/nohup.out
ADDED
@@ -0,0 +1,157 @@

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.


You can now view your Streamlit app in your browser.

Network URL: http://10.138.0.11:8501
External URL: http://34.127.13.224:8501

2024-04-26 14:13:40.949 Uncaught app exception
Traceback (most recent call last):
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
  File "/home/aliasgarov/pdf_supporter/demo.py", line 70, in <module>
    main()
  File "/home/aliasgarov/pdf_supporter/demo.py", line 59, in main
    img_crop = Image.open(io.BytesIO(pix.tobytes("ppm")))
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 10343, in tobytes
    barray = self._tobytes(idx, jpg_quality)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 9908, in _tobytes
    elif format_ == 2: mupdf.fz_write_pixmap_as_pnm(out, pm)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/mupdf.py", line 47561, in fz_write_pixmap_as_pnm
    return _mupdf.fz_write_pixmap_as_pnm(out, pixmap)
fitz.mupdf.FzErrorArgument: code=4: Invalid bandwriter header dimensions/setup
2024-04-26 14:17:16.926 MediaFileHandler: Missing file b320a2d622a2a8bb698e3f4f3ba9f41c589b552f5f1d16d8e2bda11f.png
2024-04-26 16:05:37.997 Uncaught app exception
Traceback (most recent call last):
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
  File "/home/aliasgarov/pdf_supporter/demo.py", line 70, in <module>
    main()
  File "/home/aliasgarov/pdf_supporter/demo.py", line 59, in main
    img_crop = Image.open(io.BytesIO(pix.tobytes("ppm")))
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 10343, in tobytes
    barray = self._tobytes(idx, jpg_quality)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 9908, in _tobytes
    elif format_ == 2: mupdf.fz_write_pixmap_as_pnm(out, pm)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/mupdf.py", line 47561, in fz_write_pixmap_as_pnm
    return _mupdf.fz_write_pixmap_as_pnm(out, pixmap)
fitz.mupdf.FzErrorArgument: code=4: Invalid bandwriter header dimensions/setup
2024-04-26 16:05:47.320 Uncaught app exception
Traceback (most recent call last):
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
  File "/home/aliasgarov/pdf_supporter/demo.py", line 70, in <module>
    main()
  File "/home/aliasgarov/pdf_supporter/demo.py", line 59, in main
    img_crop = Image.open(io.BytesIO(pix.tobytes("ppm")))
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 10343, in tobytes
    barray = self._tobytes(idx, jpg_quality)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 9908, in _tobytes
    elif format_ == 2: mupdf.fz_write_pixmap_as_pnm(out, pm)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/mupdf.py", line 47561, in fz_write_pixmap_as_pnm
    return _mupdf.fz_write_pixmap_as_pnm(out, pixmap)
fitz.mupdf.FzErrorArgument: code=4: Invalid bandwriter header dimensions/setup
2024-04-26 16:07:36.641 Uncaught exception GET /media/4bfb60f3fade3edbf619f6357c60a9b159e32c6c164ed2ca91e47ed4.png (185.118.51.182)
HTTPServerRequest(protocol='http', host='sd.demo.polygraf.ai:8501', method='GET', uri='/media/4bfb60f3fade3edbf619f6357c60a9b159e32c6c164ed2ca91e47ed4.png', version='HTTP/1.1', remote_ip='185.118.51.182')
Traceback (most recent call last):
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/streamlit/runtime/memory_media_file_storage.py", line 140, in get_file
    return self._files_by_id[file_id]
KeyError: '4bfb60f3fade3edbf619f6357c60a9b159e32c6c164ed2ca91e47ed4'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/tornado/web.py", line 1790, in _execute
    result = await result
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/tornado/web.py", line 2693, in get
    self.set_headers()
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/tornado/web.py", line 2805, in set_headers
    self.set_extra_headers(self.path)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/streamlit/web/server/media_file_handler.py", line 59, in set_extra_headers
    media_file = self._storage.get_file(path)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/streamlit/runtime/memory_media_file_storage.py", line 142, in get_file
    raise MediaFileStorageError(
streamlit.runtime.media_file_storage.MediaFileStorageError: Bad filename '4bfb60f3fade3edbf619f6357c60a9b159e32c6c164ed2ca91e47ed4.png'. (No media file with id '4bfb60f3fade3edbf619f6357c60a9b159e32c6c164ed2ca91e47ed4')
2024-04-26 16:07:36.683 500 GET /media/4bfb60f3fade3edbf619f6357c60a9b159e32c6c164ed2ca91e47ed4.png (185.118.51.182) 85.45ms
2024-04-26 16:10:07.086 Uncaught app exception
Traceback (most recent call last):
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
  File "/home/aliasgarov/pdf_supporter/demo.py", line 70, in <module>
    main()
  File "/home/aliasgarov/pdf_supporter/demo.py", line 59, in main
    img_crop = Image.open(io.BytesIO(pix.tobytes("ppm")))
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 10343, in tobytes
    barray = self._tobytes(idx, jpg_quality)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 9908, in _tobytes
    elif format_ == 2: mupdf.fz_write_pixmap_as_pnm(out, pm)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/mupdf.py", line 47561, in fz_write_pixmap_as_pnm
    return _mupdf.fz_write_pixmap_as_pnm(out, pixmap)
fitz.mupdf.FzErrorArgument: code=4: Invalid bandwriter header dimensions/setup
2024-04-26 16:11:34.899 Uncaught app exception
Traceback (most recent call last):
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
  File "/home/aliasgarov/pdf_supporter/demo.py", line 70, in <module>
    main()
  File "/home/aliasgarov/pdf_supporter/demo.py", line 59, in main
    img_crop = Image.open(io.BytesIO(pix.tobytes("ppm")))
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 10343, in tobytes
    barray = self._tobytes(idx, jpg_quality)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 9908, in _tobytes
    elif format_ == 2: mupdf.fz_write_pixmap_as_pnm(out, pm)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/mupdf.py", line 47561, in fz_write_pixmap_as_pnm
    return _mupdf.fz_write_pixmap_as_pnm(out, pixmap)
fitz.mupdf.FzErrorArgument: code=4: Invalid bandwriter header dimensions/setup
2024-04-26 16:41:51.246 Uncaught app exception
Traceback (most recent call last):
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
  File "/home/aliasgarov/pdf_supporter/demo.py", line 70, in <module>
    main()
  File "/home/aliasgarov/pdf_supporter/demo.py", line 59, in main
    img_crop = Image.open(io.BytesIO(pix.tobytes("ppm")))
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 10343, in tobytes
    barray = self._tobytes(idx, jpg_quality)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 9908, in _tobytes
    elif format_ == 2: mupdf.fz_write_pixmap_as_pnm(out, pm)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/mupdf.py", line 47561, in fz_write_pixmap_as_pnm
    return _mupdf.fz_write_pixmap_as_pnm(out, pixmap)
fitz.mupdf.FzErrorArgument: code=4: Invalid bandwriter header dimensions/setup
2024-04-26 16:42:03.642 Uncaught app exception
Traceback (most recent call last):
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
  File "/home/aliasgarov/pdf_supporter/demo.py", line 70, in <module>
    main()
  File "/home/aliasgarov/pdf_supporter/demo.py", line 59, in main
    img_crop = Image.open(io.BytesIO(pix.tobytes("ppm")))
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 10343, in tobytes
    barray = self._tobytes(idx, jpg_quality)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/__init__.py", line 9908, in _tobytes
    elif format_ == 2: mupdf.fz_write_pixmap_as_pnm(out, pm)
  File "/home/aliasgarov/pdfsupport/lib/python3.10/site-packages/fitz/mupdf.py", line 47561, in fz_write_pixmap_as_pnm
    return _mupdf.fz_write_pixmap_as_pnm(out, pixmap)
fitz.mupdf.FzErrorArgument: code=4: Invalid bandwriter header dimensions/setup
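The repeated `Invalid bandwriter header dimensions/setup` errors above come from an earlier revision of demo.py that round-tripped the rendered page through `pix.tobytes("ppm")`; MuPDF's PNM/PPM writer rejects pixmaps it cannot represent, such as pixmaps with an alpha channel or a component count other than 1 or 3. The committed demo.py sidesteps this by building the PIL image directly from the raw samples. As an illustration only (the helper name and this characterization of the writer's constraints are assumptions, not part of the codebase), the compatibility check amounts to:

```python
def pnm_compatible(n_components, has_alpha):
    """PNM/PPM output handles only grayscale (1 component) or RGB
    (3 components) pixmaps without alpha; anything else is rejected,
    which surfaces as the bandwriter error seen in the log above."""
    return not has_alpha and n_components in (1, 3)

print(pnm_compatible(3, False), pnm_compatible(4, True))
# → True False
```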

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.


You can now view your Streamlit app in your browser.

Network URL: http://10.138.0.11:8501
External URL: http://104.196.227.207:8501

Stopping...

Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.


You can now view your Streamlit app in your browser.

Network URL: http://10.138.0.11:8501
External URL: http://104.196.227.207:8501

2024-05-03 16:30:12.544 MediaFileHandler: Missing file f2b29efae916d8154f1cdf1d3c4c439869290e2015c09e061442ab9a.png
pdf_supporter/requirements.txt
ADDED
@@ -0,0 +1,6 @@
streamlit
streamlit_drawable_canvas
pytesseract
pymupdf
pillow
numpy