Hung Bui and ashrma committed
Commit 92d8eff, 0 parent(s)

Duplicate from ashrma/Chat-with-Docs

Co-authored-by: Anoop Sharma <[email protected]>

Files changed (5)
  1. .gitattributes +34 -0
  2. README.md +51 -0
  3. app.py +131 -0
  4. documents/sample.txt +176 -0
  5. requirements.txt +5 -0
.gitattributes ADDED
@@ -0,0 +1,34 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,51 @@
+ ---
+ title: Chat with Docs
+ emoji: 🦙
+ license: mit
+ sdk: streamlit
+ python_version: 3.9
+ app_file: app.py
+ colorFrom: pink
+ colorTo: blue
+ pinned: false
+ duplicated_from: ashrma/Chat-with-Docs
+ ---
+ # Chat-with-Docs
+
+ ![image](https://user-images.githubusercontent.com/26565263/236671146-5fc5d5f0-4acb-40c7-9d9a-dc072efd8078.png)
+
+ Chat with your docs and gain better insights. Powered by `LlamaIndex`, with `Streamlit` for the UI.
+ Handles `CSV/PDFs/Txt/Doc` files. CSV files are handled by the [PandasAI](https://llamahub.ai/l/pandas_ai) loader; all other documents are handled by
+ `GPTVectorStoreIndex`.
+
+ Clone the repo or copy the `.py` file to your local machine.
+
+ ## Install required Dependencies
+ ```
+ pip install -r requirements.txt
+ ```
+
+ ## Create a folder in the root dir and name it `documents`
+
+ ## Run the application
+ `streamlit run app.py`
+
+ ## How to Contribute
+ Feel free to open an issue or pull request. This small application helps anyone interact with their docs in just 2-3 steps.
+
+ ## Roadmap
+ - [ ] Add support for choosing between GPT-3/GPT-3.5/GPT-4 or a HuggingFace model for creating vectors and generating responses.
+ - [x] Blog explaining the entire application in detail.
+ - [ ] Add Docker support.
+ - [ ] Deploy the project on Streamlit or DataButton platform.
+ - [ ] Add support to handle multiple files at once.
+
+ ## Snapshots
+ - Upload a CSV file, get insights by asking questions, and render graphs based on the data.
+ ![image](https://user-images.githubusercontent.com/26565263/236671237-8517eecd-59f5-4961-8e33-772a26e92962.png)
+ ![image](https://user-images.githubusercontent.com/26565263/236671280-e5e9da7a-dd32-4af2-bd79-42545ad67d07.png)
+ ![image](https://user-images.githubusercontent.com/26565263/236671344-31967a79-2601-4cf2-bb2e-12a9eaf9429d.png)
+
+ - In the Doc section, upload PDFs/Txt/Docs to chat with your docs directly. No need to press `CTRL+F` to search the docs.
+ ![image](https://user-images.githubusercontent.com/26565263/236671378-650d387f-57ad-4738-9bd0-15229f7e2e1d.png)
+ ![image](https://user-images.githubusercontent.com/26565263/236671580-0b032941-6c89-430a-a42c-f68655d39f71.png)
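The README routes CSV files to the PandasAI loader and every other document type to `GPTVectorStoreIndex`. A minimal sketch of that routing rule follows, assuming the choice is made by file extension (app.py itself uses two separate upload tabs instead). The function `pick_pipeline`, the pipeline labels, and the suffix sets are hypothetical names introduced only for illustration.

```python
from pathlib import Path

# Hypothetical suffix sets, assumed from the README's "CSV/PDFs/Txt/Doc" list.
CSV_SUFFIXES = {".csv"}
DOC_SUFFIXES = {".pdf", ".txt", ".doc", ".docx"}


def pick_pipeline(file_name: str) -> str:
    """Return which pipeline (an illustrative label) should handle the file."""
    suffix = Path(file_name).suffix.lower()
    if suffix in CSV_SUFFIXES:
        return "pandasai"        # tabular data goes to the PandasAI loader
    if suffix in DOC_SUFFIXES:
        return "vector_index"    # everything else goes to the vector index
    raise ValueError(f"Unsupported file type: {file_name!r}")


if __name__ == "__main__":
    for name in ("sales.csv", "report.pdf", "notes.txt"):
        print(name, "->", pick_pipeline(name))
```

Matching on the lowercased suffix keeps the rule independent of how the upload widget reports the file name.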
app.py ADDED
@@ -0,0 +1,131 @@
+ from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader
+ from llama_index import download_loader
+ from pandasai.llm.openai import OpenAI
+ from matplotlib import pyplot as plt
+ import streamlit as st
+ import pandas as pd
+ import os
+
+
+ documents_folder = "./documents"
+
+ # Load the PandasAI loader, which is a wrapper over the PandasAI library
+ PandasAIReader = download_loader("PandasAIReader")
+
+ st.title("Welcome to `ChatwithDocs`")
+ st.header("Interact with Documents such as `PDFs/CSV/Docs` using the power of LLMs\nPowered by `LlamaIndex🦙` \nCheck out the [GitHub repo here](https://github.com/anoopshrma/Chat-with-Docs) and leave a star⭐")
+
+
+ def get_csv_result(df, query):
+     reader = PandasAIReader(llm=csv_llm)
+     response = reader.run_pandas_ai(
+         df,
+         query,
+         is_conversational_answer=False
+     )
+     return response
+
+ def save_file(doc):
+     fn = os.path.basename(doc.name)
+     # Write the uploaded file into the documents folder on the server
+     with open(os.path.join(documents_folder, fn), 'wb') as f:
+         f.write(doc.read())
+     # If the filename changed, clear the previously cached vectors
+     # and remember the current filename
+     if st.session_state.get('file_name'):
+         if st.session_state.file_name != fn:
+             st.cache_resource.clear()
+             st.session_state['file_name'] = fn
+     else:
+         st.session_state['file_name'] = fn
+
+     return fn
+
+ def remove_file(file_path):
+     # Remove the file from the documents folder once
+     # the vectors have been created
+     if os.path.isfile(os.path.join(documents_folder, file_path)):
+         os.remove(os.path.join(documents_folder, file_path))
+
+
+
+ @st.cache_resource
+ def create_index():
+     # Create vectors for every file stored under the documents folder.
+     # NOTE: You can create vectors for multiple files at once.
+     documents = SimpleDirectoryReader(documents_folder).load_data()
+     index = GPTVectorStoreIndex.from_documents(documents)
+     return index
+
+
+
+ def query_doc(vector_index, query):
+     # Runs a similarity search to find the nearest match, then sends
+     # the match and the user query to OpenAI for a rich response
+     query_engine = vector_index.as_query_engine()
+     response = query_engine.query(query)
+     return response
+
+
+ api_key = st.text_input("Enter your OpenAI API key here:", type="password")
+ if api_key:
+     os.environ['OPENAI_API_KEY'] = api_key
+     csv_llm = OpenAI(api_token=api_key)
+ else:
+     st.stop()  # Nothing below works without an API key (csv_llm would be undefined)
+ tab1, tab2 = st.tabs(["CSV", "PDFs/Docs"])
+
+ with tab1:
+
+     st.write("Chat with CSV files using PandasAI loader with LlamaIndex")
+     input_csv = st.file_uploader("Upload your CSV file", type=['csv'])
+
+     if input_csv is not None:
+         st.info("CSV Uploaded Successfully")
+         df = pd.read_csv(input_csv)
+         st.dataframe(df, use_container_width=True)
+
+
+     st.write("---")
+
+     input_text = st.text_area("Ask your query")
+
+     if input_csv is not None and input_text:
+         if st.button("Send"):
+             st.info("Your query: "+ input_text)
+             with st.spinner('Processing your query...'):
+                 response = get_csv_result(df, input_text)
+             if plt.get_fignums():
+                 st.pyplot(plt.gcf())
+             else:
+                 st.success(response)
+
+
+ with tab2:
+     st.write("Chat with PDFs/Docs")
+     input_doc = st.file_uploader("Upload your Docs")
+
+     if input_doc is not None:
+         st.info("Doc Uploaded Successfully")
+         file_name = save_file(input_doc)
+         index = create_index()
+         remove_file(file_name)
+
+
+     st.write("---")
+     input_text = st.text_area("Ask your question")
+
+     if input_doc is not None and input_text:
+         if st.button("Ask"):
+             st.info("Your query: \n" +input_text)
+             with st.spinner("Processing your query.."):
+                 response = query_doc(index, input_text)
+                 print(response)
+
+             st.success(response)
+
+             st.write("---")
+             # Shows the source document context that
+             # was used to prepare the response
+             st.write("Source Documents")
+             st.write(response.get_formatted_sources())
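The comment in `query_doc` describes retrieval as a similarity search that finds the nearest match before the query is sent to the LLM. The sketch below illustrates that idea with cosine similarity over toy vectors. It is not LlamaIndex's implementation; `cosine_similarity`, `top_k`, and the example vectors are hypothetical, and a real index would obtain the vectors from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunks most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


if __name__ == "__main__":
    # Toy 2-D "embeddings" for three document chunks (made-up values).
    chunks = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(top_k([1.0, 0.0], chunks, k=2))
```

The text of the selected chunks would then be combined with the user's question to form the prompt.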
documents/sample.txt ADDED
@@ -0,0 +1,176 @@
+ UNITED STATES
+ SECURITIES AND EXCHANGE COMMISSION
+ Washington, D.C. 20549
+ ____________________________________________
+ FORM 10-K
+ ____________________________________________
+ (Mark One)
+
+ ☒ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
+ For the fiscal year ended December 31, 2019
+ OR
+ ☐ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934
+ For the transition period from_____ to _____
+ Commission File Number: 001-38902
+ ____________________________________________
+ UBER TECHNOLOGIES, INC.
+ (Exact name of registrant as specified in its charter)
+ ____________________________________________
+ Delaware
+ 45-2647441
+ (State or other jurisdiction of incorporation or organization)
+ (I.R.S. Employer Identification No.)
+ 1455 Market Street, 4th Floor
+ San Francisco, California 94103
+ (Address of principal executive offices, including zip code)
+ (415) 612-8582
+ (Registrant’s telephone number, including area code)
+ ____________________________________________
+ Securities registered pursuant to Section 12(b) of the Act:
+ Title of each class
+
+ Trading Symbol(s)
+
+ Name of each exchange on which registered
+ Common Stock, par value $0.00001 per share
+
+ UBER
+
+ New York Stock Exchange
+ Securities registered pursuant to Section 12(g) of the Act: None
+ Indicate by check mark whether the registrant is a well-known seasoned issuer, as defined in Rule 405 of the Securities Act. Yes ☐ No ☒
+ Indicate by check mark whether the registrant is not required to file reports pursuant to Section 13 or Section 15(d) of the Act. Yes ☐ No ☒
+ Indicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months (or for such shorter period that the registrant was required to file such reports), and (2) has been subject to such filing requirements for the past 90 days. Yes ☒ No ☐
+ Indicate by check mark whether the registrant has submitted electronically every Interactive Data File required to be submitted pursuant to Rule 405 of Regulation S-T (§232.405 of this chapter) during the preceding 12 months (or for such shorter period that the registrant was required to submit such files). Yes ☒ No ☐
+ Indicate by check mark whether the registrant is a large accelerated filer, an accelerated filer, a non-accelerated filer, a smaller reporting company, or an emerging growth company. See the definitions of “large accelerated filer,” “accelerated filer,” “smaller reporting company,” and “emerging growth company” in Rule 12b-2 of the Exchange Act.
+
+
+ Large accelerated filer
+
+
+
+ Accelerated filer
+
+ Non-accelerated filer
+
+
+
+ Smaller reporting company
+
+
+
+
+
+ Emerging growth company
+
+ If an emerging growth company, indicate by check mark if the registrant has elected not to use the extended transition period for complying with any new or revised financial accounting standards provided pursuant to Section 13(a) of the Exchange Act.
+
+ Indicate by check mark whether the registrant is a shell company (as defined in Rule 12b-2 of the Exchange Act). Yes ☐ No ☒
+ The aggregate market value of the voting and non-voting common equity held by non-affiliates of the registrant as of June 28, 2019, the last business day of the registrant's most recently completed second fiscal quarter, was approximately $59.7 billion based upon the closing price reported for such date on the New York Stock Exchange.
+ The number of shares of the registrant's common stock outstanding as of February 19, 2020 was 1,723,775,076.
+ DOCUMENTS INCORPORATED BY REFERENCE
+ Portions of the registrant’s Definitive Proxy Statement relating to the Annual Meeting of Stockholders are incorporated by reference into Part III of this Annual Report on Form 10-K where indicated. Such Definitive Proxy Statement will be filed with the Securities and Exchange Commission within 120 days after the end of the registrant’s fiscal year ended December 31, 2019.
+
+
+ UBER TECHNOLOGIES, INC.
+ TABLE OF CONTENTS
+
+
+ Pages
+
+ Special Note Regarding Forward-Looking Statements
+ 2
+
+
+
+ PART I
+
+
+ Item 1.
+ Business
+ 4
+ Item 1A.
+ Risk Factors
+ 9
+ Item 1B.
+ Unresolved Staff Comments
+ 44
+ Item 2.
+ Properties
+ 44
+ Item 3.
+ Legal Proceedings
+ 44
+ Item 4.
+ Mine Safety Disclosures
+ 45
+
+
+
+ PART II
+
+
+ Item 5.
+ Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities
+ 45
+ Item 6.
+ Selected Financial Data
+ 46
+ Item 7.
+ Management’s Discussion and Analysis of Financial Condition and Results of Operations
+ 48
+ Item 7A.
+ Quantitative and Qualitative Disclosures About Market Risk
+ 75
+ Item 8.
+ Financial Statements and Supplementary Data
+ 76
+ Item 9.
+ Changes in and Disagreements with Accountants on Accounting and Financial Disclosure
+ 139
+ Item 9A.
+ Controls and Procedures
+ 139
+ Item 9B.
+ Other Information
+ 140
+
+
+
+ PART III
+
+
+ Item 10.
+ Directors, Executive Officers and Corporate Governance
+ 140
+ Item 11.
+ Executive Compensation
+ 140
+ Item 12.
+ Security Ownership of Certain Beneficial Owners and Management and Related Stockholder Matters
+ 140
+ Item 13.
+ Certain Relationships and Related Transactions, and Director Independence
+ 140
+ Item 14.
+ Principal Accounting Fees and Services
+ 140
+
+
+
+ PART IV
+
+
+ Item 15.
+ Exhibits, Financial Statement Schedules
+ 140
+ Item 16.
+ Form 10-K Summary
+ 140
+
+ Exhibit Index
+ 141
+
+ Signatures
+ 143
+
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ llama-index
+ streamlit
+ pandas
+ pandasai
+ PyPDF2
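Before launching the app, it can help to confirm that the packages listed in requirements.txt are actually installed. The sketch below uses the standard-library `importlib.metadata` for that check; the helper `missing_packages` and the `REQUIRED` list are hypothetical names, and the list simply mirrors the five entries in this commit's requirements.txt.

```python
from importlib import metadata

# Distribution names copied from requirements.txt in this commit.
REQUIRED = ["llama-index", "streamlit", "pandas", "pandasai", "PyPDF2"]


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises PackageNotFoundError if absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All required packages are installed.")
```

Because requirements.txt pins no versions, this only checks presence, not compatibility between the installed releases.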