Commit 329ead2 (unverified) by Umang Chaudhry: Add files via upload
README.md ADDED
# Future of Free Speech App

In this project we'll create a tool that empowers people to respond to hate speech and misinformation: an action dashboard that provides relevant background and drafts a response that respects:

1. The principles of the Future of Free Speech program
2. The principles of the responder
3. Background information on the original post
4. Guidance on possible best responses to the original post (e.g., counter-information)
5. The style and preferences of the responder

## Access the Beta Version Here
https://free-speech.streamlit.app/

### Draft Main Screen
![image](https://github.com/vanderbilt-data-science/free-speech-app/assets/5521243/9cb9dbd4-fa8d-4f18-bb7e-85fb7076f23a)

### Draft Principles and Sources Screen
![image](https://github.com/vanderbilt-data-science/free-speech-app/assets/5521243/95b932d8-20fb-4f94-aadb-aec71590f882)

Students who are particularly interested in this project:
- Eleanor Beers
- Katrina Rbeiz
- Sovann Chang

## Features

- The prototype solution could be a stand-alone dashboard application into which the responder pastes the post they would like to respond to.
- Brainstorming ideas live in the [Google Drive](https://drive.google.com/drive/folders/1XvUthVpeZJ449nAEjeCPwvKy9v3oEHPd?usp=sharing)

## Getting Started

## Contributing

To contribute to the project, please fork the repository and submit a pull request. Our community is supportive, and we provide training and classes if you're new to any of the frameworks used in the project. Everyone is welcome to contribute, as we believe participating in data science and AI projects is an excellent way to learn.

### How to Contribute
`nbdev` will try to execute every cell of every notebook in the `nbs` directory unless you tell it not to. To suppress execution, include the following as a **raw** cell in every new source notebook you create in `nbs`:
```
---
skip_exec: true
---
```
This suppresses execution and tests: no cells are run, no models are built.

#### Setup
Make sure you've followed the setup instructions above, including `nbdev_install_hooks`.

#### Passing GH Actions
Locally, make whatever edits are needed to finish your issue. After you've saved all of your changes (before you add/commit), run this at the command line:
```
nbdev_prepare
```
Then add and commit the changed files. Keep an eye out for files in the `free_speech_app` module, e.g., `_modidx.py` or any new modules you have created; make sure you add and commit them in the same commit. Don't miss any! `nbdev_prepare` strips notebook metadata, builds the library (so it matches the source notebooks), runs tests, and creates docs; with `skip_exec` set, the last two steps are skipped.

Preparing your repo locally this way helps you pass the GH Actions: if an Action would fail, `nbdev_prepare` usually tells you why before you push to GH. For unknown reasons, `nbdev_prepare` sometimes passes locally while the GH Actions still fail remotely; if that happens, here's some troubleshooting help.

If you're failing Actions, the error usually means one of the following:
* Detected unstripped notebooks: run `nbdev_prepare`, which strips out all of the metadata.
* The library doesn't match the source notebooks: run `nbdev_prepare`, which converts your source notebooks into modules. **Also make sure you have added and committed every `.py` file that was generated or changed.**
* Something about `nbdev_install_hooks`: run `nbdev_install_hooks`.
* Tests failed, or the GH Actions log looks like it's trying to run code: add or fix the raw cell with `skip_exec: true` between the `---` YAML delimiters in every notebook. Run `nbdev_prepare` and it will error out on the notebook that's missing the cell, which tells you where to add it.
* If none of these apply, your local version of `nbdev` might differ from the one used in GH Actions. If that's the case, hop over to PyPI to see if there has been a recent update.

## Community Guidelines

We aim to create a welcoming and inclusive community where everyone can feel comfortable and valued, regardless of skill level, background, ability, or identity. To ensure a positive atmosphere, please adhere to our code of conduct and community guidelines.

## Meetings

[Zoom Link](https://vanderbilt.zoom.us/j/93382420275?pwd=Z0xrSFZJelcrY1ZCNk9xQ3A1R3Irdz09)
- Sprint Planning: 9:30am Mondays
- Backlog Grooming: 11am Thursdays
- Retrospective: 9:30am Fridays
- Demos: 3pm Fridays

## Additional Resources

- LangChain documentation
- Introduction to transformers and generative AI on our [YouTube channel](https://www.youtube.com/channel/UC8C2_3L5gR9qLmL7rmb2BdQ)
- AI Summer and AI Winter sessions (free and open to all)

## Reporting Issues

If you encounter a bug, please submit an issue and label it with "Bug." To escalate the issue, email [[email protected]](mailto:[email protected]).

## Contact Information

- Organization: Data Science Institute at Vanderbilt University and Future of Free Speech Program
- Main Email: [[email protected]](mailto:[email protected])
- Staff Lead: [[email protected]](mailto:[email protected])
- Staff Lead: [[email protected]](mailto:[email protected])

Data Scientists:
- Eleanor Beers
- Katrina Rbeiz
- Sovann Chang

Remember to replace "Staff lead and email" with the actual name and email address of the staff lead.

## Red Team Observations
[redteam.md](https://github.com/vanderbilt-data-science/free-speech-app/files/12718272/redteam.md)
apikeyex.png ADDED
app.py ADDED
import streamlit as st
import streamlit_authenticator as stauth
from deta import Deta
import yaml
from yaml.loader import SafeLoader
import os
from cryptography.fernet import Fernet

from free_speech_app.DataLoadDb import *
from free_speech_app.FreeSpeechPromptsResponses import *
from langchain.chat_models import ChatOpenAI

# Connect to/create the Deta user database
deta = Deta(st.secrets["deta_key"])
db = deta.Base("user_data")

# Fernet key (generated locally, stored in Streamlit secrets)
fernet = Fernet(bytes(st.secrets["fernet_key"], 'utf-8'))

# Activeloop token
os.environ["ACTIVELOOP_TOKEN"] = st.secrets["deeplake_key"]

# Load the authenticator config from Deta Drive
config_drive = deta.Drive("passwords")
config = config_drive.get("config.yaml").read()
config = yaml.load(config, Loader=SafeLoader)

# Create an authenticator
authenticator = stauth.Authenticate(
    config['credentials'],
    config['cookie']['name'],
    config['cookie']['key'],
    config['cookie']['expiry_days'],
    config['preauthorized']
)


def get_user_data(user):
    """Return the stored record for this username, or None if absent."""
    data = db.fetch().items
    for person in data:
        if person['key'] == user:
            return person
    return None


def encrypt(api_key: str, fernet) -> bytes:
    """Encrypt the API key."""
    return fernet.encrypt(api_key.encode())


def decrypt(encrypted_api_key: bytes, fernet) -> str:
    """Decrypt the encrypted API key."""
    return fernet.decrypt(encrypted_api_key).decode()


# Render the login module
name, authentication_status, username = authenticator.login('Login', 'main')

# If the user is authenticated
if authentication_status:
    authenticator.logout('Logout', 'main', key='unique_key')
    st.write(f'Welcome *{name}*')

    # Sidebar for navigation
    page = st.sidebar.radio("Choose a page", ["OpenAI API Key Setup", "Account Setup", "Respond to Post"])

    # Fetch user data from the database
    user_data = get_user_data(username)

    if page == "Account Setup":
        st.title("Account Setup")
        st.markdown("Please use this page to provide your OpenAI API Key, Principles and Writing Style. **Please make sure to press the Save Changes button after providing the information.**")

        # Input boxes with existing data
        if 'api_key' not in st.session_state:
            st.session_state.api_key = ""
        # The stored value is str(encrypt(...)), so strip the b'...' wrapper before decrypting
        api_input = st.text_input("OpenAI API Key", value=decrypt(user_data["api_key"].encode()[2:-1], fernet) if user_data and "api_key" in user_data else "", type="password")
        encrypted_api_key = str(encrypt(api_input, fernet))
        st.session_state.api_key = api_input

        principles = st.text_input("My Principles", placeholder="Enter the main principles of your life you wish this response to uphold", value=user_data["principles"] if user_data and "principles" in user_data else "")
        writing_style = st.text_input("My Writing Style (Paste Examples)", placeholder="Provide examples of your writing style here", value=user_data["writing_style"] if user_data and "writing_style" in user_data else "")
        sources = st.text_input("Sources (Provide all sources you would like to use)", value=st.session_state.sources if 'sources' in st.session_state else '', key='sources_key')

        # Update button
        if st.button("Save Changes"):
            db.put({"key": username, "principles": principles, "writing_style": writing_style, "sources": sources, "api_key": encrypted_api_key})

    if page == "OpenAI API Key Setup":
        st.title("OpenAI API Key Setup")

        st.header('What is an API key?')
        st.write('An API (Application Programming Interface) key is like a password that allows you to access certain functions or data from a website or service. Many sites use API keys to identify you and control access to their APIs.')

        st.header('Why do you need an API key?')
        st.write('API keys allow sites to track usage and prevent abuse of their services. They help keep things secure. When you request an API key, the site knows the calls are coming from you.')

        image = 'free_speech_app/apikeyex.png'
        st.header('How to get an OpenAI API key:')
        st.write('1. Go to https://platform.openai.com/account/api-keys')
        st.write('2. Log in or create an OpenAI account if you do not have one')
        st.write('3. Click "Create new secret key" and give your key a name')
        st.image(image, caption=None, width=None, use_column_width=None, clamp=False, channels="RGB", output_format="auto")
        st.write('4. Copy the generated API key and keep it private like a password')

        st.header('Using your API key')
        st.write('When making calls to the OpenAI API, include your API key in the request headers or parameters to authenticate.')
        st.code('headers = {"Authorization": f"Bearer {YOUR_API_KEY}"}')

        st.warning('Treat your API key like a secret! Do not share it publicly.')

    elif page == "Respond to Post":
        st.title("Respond to Post")

        left_col, right_col = st.columns(2)

        # Input boxes
        with right_col:
            background_info = st.text_area("Background information on original post (references, relevant information, best practices for responding)", height=700, value=st.session_state.background_info if 'background_info' in st.session_state else '', key='background_info_key')

        with left_col:
            original_post = st.text_area("Paste Original Post Here \n", height=100)

        chat_mdl = None
        draft_response = ''

        # Check if the "Submit" button is clicked
        if st.button("Submit"):
            if st.session_state.api_key:
                os.environ["OPENAI_API_KEY"] = st.session_state.api_key
                # Check for the passphrase that allows use of the DSI API key stored in secrets
                if os.environ["OPENAI_API_KEY"] == st.secrets["secret_passphrase"]:
                    os.environ["OPENAI_API_KEY"] = st.secrets["dsi_openai_key"]
                chat_mdl = ChatOpenAI(model_name='gpt-4', temperature=0.1)

            if chat_mdl is not None:
                if user_data is None:
                    draft_response, background_text, sources_text = generate_custom_response(original_post, chat_mdl, "", "")
                else:
                    draft_response, background_text, sources_text = generate_custom_response(original_post, chat_mdl, user_data['principles'], user_data['writing_style'])
                st.session_state.draft_response = draft_response.content
                st.session_state.background_text = background_text
                st.session_state.sources_text = sources_text
                st.session_state.background_info = background_text
                st.session_state.sources = sources_text
                st.rerun()

        # Ensure session state variables are initialized
        if 'draft_response' not in st.session_state:
            st.session_state.draft_response = ''
        if 'regenerate_prompt' not in st.session_state:
            st.session_state.regenerate_prompt = ''

        # Output from function
        response_textarea = st.text_area(
            label="Draft Response. Please edit here or prompt suggestions in the box below.",
            value=st.session_state.draft_response,
            height=350,
            key='draft_response_key'
        )

        # Initialize the regeneration flag
        if 'is_regenerating' not in st.session_state:
            st.session_state.is_regenerating = False

        # Check if the app is in the "regeneration" phase
        if st.session_state.is_regenerating:
            # Display the regenerated response explicitly
            regenerate_prompt = st.text_area(
                "Request a new draft",
                value=st.session_state.regenerate_prompt,
                placeholder="You may edit the regenerated draft directly above, or request further changes here.",
                height=100,
                key='regenerate_prompt_key'
            )
            # Reset the regeneration flag
            st.session_state.is_regenerating = False
        else:
            # Normal behavior: display the text area for manual input
            regenerate_prompt = st.text_area(
                "Request a new draft",
                placeholder="You may edit the draft directly above, or request a new draft with additional guidance here.",
                height=100,
                key='regenerate_prompt_key'
            )

        if (draft_response is not None) and (regenerate_prompt is not None):
            if st.button("Regenerate"):
                if st.session_state.api_key:
                    os.environ['OPENAI_API_KEY'] = st.session_state.api_key
                    # Check for the passphrase that allows use of the DSI API key stored in secrets
                    if os.environ["OPENAI_API_KEY"] == st.secrets["secret_passphrase"]:
                        os.environ["OPENAI_API_KEY"] = st.secrets["dsi_openai_key"]
                    chat_mdl = ChatOpenAI(model_name='gpt-4', temperature=0.1)

                if chat_mdl is not None:
                    updated_response = regenerate_custom_response(chat_mdl, regenerate_prompt, st.session_state.draft_response).content
                    st.session_state.regenerate_prompt = updated_response
                    st.session_state.is_regenerating = True
                    st.rerun()

elif authentication_status is False:
    st.error('Username/password is incorrect')

elif authentication_status is None:
    st.warning('Please enter your username and password')

try:
    if authenticator.register_user('New User Registration', preauthorization=False):
        st.success('User Registered Successfully! Please log in above.')
except Exception as e:
    st.error(e)

with open('config.yaml', 'w') as file:
    yaml.dump(config, file, default_flow_style=False)

config_drive.put("config.yaml", path="config.yaml")
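The `encrypt`/`decrypt` helpers in `app.py` are a plain Fernet round trip. A minimal standalone sketch (the key is generated on the fly here; the app reads it from `st.secrets["fernet_key"]`, and the example key string is made up):

```python
from cryptography.fernet import Fernet

# Standalone round trip mirroring encrypt()/decrypt() in app.py.
# Note: app.py stores str(token) in Deta and strips the b'...'
# wrapper with .encode()[2:-1] when reading it back.
key = Fernet.generate_key()  # in the app: st.secrets["fernet_key"]
fernet = Fernet(key)

token = fernet.encrypt("sk-example-key".encode())  # bytes token
plain = fernet.decrypt(token).decode()
assert plain == "sk-example-key"
```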
config.yaml ADDED
cookie:
  expiry_days: 0
  key: some_signature_key
  name: some_cookie_name
credentials:
  usernames:
    umang2:
      name: Umang Chaudhry
      password: $2b$12$me2dX2o/2lD1JBHEqzq7PegGsM32S.3SS4kOyMf2Oh/jUz5GbvHAG
    umangchaudhry:
      name: Umang Chaudhry
      password: $2b$12$Q.TJFPp9dcyEIpzlVtQeluYMsEIx//ei0tx7cMKBm/aNqOGaezSfi
preauthorized:
  emails:
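`app.py` parses this file with PyYAML's `SafeLoader` before handing it to `stauth.Authenticate`. A small sketch of the resulting structure, using the same keys as the file above with placeholder values:

```python
import yaml
from yaml.loader import SafeLoader

# Illustrative config mirroring config.yaml; values are placeholders.
raw = """
cookie:
  expiry_days: 0
  key: some_signature_key
  name: some_cookie_name
credentials:
  usernames:
    umang2:
      name: Umang Chaudhry
preauthorized:
  emails:
"""
config = yaml.load(raw, Loader=SafeLoader)
assert config["cookie"]["expiry_days"] == 0
assert "umang2" in config["credentials"]["usernames"]
assert config["preauthorized"]["emails"] is None  # an empty key parses as None
```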
free_speech_app/DataLoadDb.py ADDED
# AUTOGENERATED! DO NOT EDIT! File to edit: ../nbs/free-speech-stores.ipynb.

# %% auto 0
__all__ = ['setup_openai_api_key', 'setup_db']

# %% ../nbs/free-speech-stores.ipynb 4
# libraries required for functionality
import os
from getpass import getpass

from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.document_loaders import UnstructuredFileLoader
from langchain.document_loaders.merge import MergedDataLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.vectorstores import DeepLake  # required by setup_db below

# %% ../nbs/free-speech-stores.ipynb 12
def setup_openai_api_key():
    openai_api_key = getpass()
    os.environ["OPENAI_API_KEY"] = openai_api_key

# %% ../nbs/free-speech-stores.ipynb 15
import nltk
nltk.download('averaged_perceptron_tagger')

# %% ../nbs/free-speech-stores.ipynb 27
def setup_db(local_path, hub_path, chunk_size=1000, chunk_overlap=5):
    file_list = os.listdir(local_path)

    # set up loaders
    loaders_list = []
    for file_path in file_list:
        file_path = local_path + file_path
        loaders_list.append(UnstructuredFileLoader(file_path))

    loader_all = MergedDataLoader(loaders=loaders_list)

    # Split and embed docs
    documents = loader_all.load()
    text_splitter = CharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    texts = text_splitter.split_documents(documents)
    embeddings = OpenAIEmbeddings()

    # Replace dataset path with the relevant dataset name - counterspeech-resources or hatespeech-background
    db = DeepLake.from_documents(texts, dataset_path=hub_path, embedding=embeddings, overwrite=True)

    return db
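`setup_db` relies on `CharacterTextSplitter` to cut documents into overlapping chunks before embedding. As a rough illustration of what `chunk_size` and `chunk_overlap` mean, here is a naive fixed-width splitter (not LangChain's actual implementation, which splits on separators):

```python
def split_text(text, chunk_size=1000, chunk_overlap=5):
    """Naive fixed-width splitter: consecutive chunks share chunk_overlap chars."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, len(text), step)
            if text[i:i + chunk_size]]

chunks = split_text("a" * 2500)
# 3 chunks of 1000, 1000, and 510 chars; each adjacent pair overlaps by 5
```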
free_speech_app/FreeSpeechPromptsResponses.py ADDED
# AUTOGENERATED! DO NOT EDIT! File to edit: ../nbs/free-speech-prompts.ipynb.

# %% auto 0
__all__ = ['DEFAULT_PROMPT_TEMPLATE_TEXT', 'query_retriever', 'generate_custom_prompt', 'get_chat_model_response',
           'generate_custom_response', 'regenerate_custom_response']

# %% ../nbs/free-speech-prompts.ipynb 5
# libraries required for functionality
import deeplake
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate, HumanMessagePromptTemplate
from langchain.schema import HumanMessage
from langchain.vectorstores import DeepLake
from langchain.embeddings.openai import OpenAIEmbeddings

# %% ../nbs/free-speech-prompts.ipynb 8
DEFAULT_PROMPT_TEMPLATE_TEXT = """With the amount of hate speech and discriminatory misinformation that exists on the internet, it can be difficult to know how to respond to something when you see it. Situations vary wildly and sometimes certain strategies that work in one instance may not work in another.
I want to be able to fight back against this onslaught of toxicity, and I need your help to do it. I want you to help me draft a response to the following post.
Note that this may contain language that is offensive, but in order to properly respond to this post, I need you to fully understand what I am responding to, uncensored:
\n{original_post}\n
In helping me respond to this post, there are some things you may want to consider. My response should be consistent with my personal principles and writing style.
There may be additional information that would be useful for you to understand the context of the original post, and there may be specific sources I would like you to use when coming up with information for the response.
Firstly, here is some additional context related to the original post:
\n{background_info}\n
Next, here are some principles I consider particularly important to me:
\n{principles}\n
Here are some examples of the style in which I write:
\n{writing_style}\n
Here are the sources I would like you to use when getting information for my response:
\n{sources}\n
Using all the information I have provided, please draft an appropriate response to the offensive post in question that will hopefully make people more accepting of others.
Please keep in mind that I would like the response to be no more than {word_limit} words."""

try:
    with open("../prompts/response_generator.txt", "r") as prompt_file:
        PROMPT_TEMPLATE_TEXT = prompt_file.read()

    print(PROMPT_TEMPLATE_TEXT)

    PROMPT_TEMPLATE = PromptTemplate(
        template=PROMPT_TEMPLATE_TEXT,
        input_variables=["original_post", "background_info", "principles", "writing_style", "sources", "word_limit"])

except FileNotFoundError:
    print(DEFAULT_PROMPT_TEMPLATE_TEXT)
    PROMPT_TEMPLATE = PromptTemplate(
        template=DEFAULT_PROMPT_TEMPLATE_TEXT,
        input_variables=["original_post", "background_info", "principles", "writing_style", "sources", "word_limit"])


# %% ../nbs/free-speech-prompts.ipynb 9
def query_retriever(db, query, num_results=3):
    retriever = db.as_retriever(search_kwargs={"k": num_results})
    docs = retriever.get_relevant_documents(query)

    return docs

# %% ../nbs/free-speech-prompts.ipynb 10
def generate_custom_prompt(original_post, principles=None, writing_style=None, word_limit=None):

    # Get databases and query retrievers
    background_db = DeepLake(dataset_path="hub://vanderbilt-dsi/hatespeech-background", embedding=OpenAIEmbeddings())
    sources_db = DeepLake(dataset_path="hub://vanderbilt-dsi/counterspeech-resources", embedding=OpenAIEmbeddings())

    # Use defaults in the case of None
    if principles is None:
        principles = "There are no principles which I consider more important to me than the average person might."

    if writing_style is None:
        writing_style = "I have no examples of my writing style."

    if word_limit is None:
        word_limit = "an infinite amount of"

    retriever_query = original_post
    background_info = query_retriever(background_db, retriever_query)
    sources = query_retriever(sources_db, retriever_query)

    # Fill the prompt
    filled_prompt = PROMPT_TEMPLATE.format(original_post=original_post, background_info=background_info, principles=principles, writing_style=writing_style, sources=sources, word_limit=word_limit)

    return filled_prompt, background_info, sources

# %% ../nbs/free-speech-prompts.ipynb 11
def get_chat_model_response(mdl, input_prompt):

    messages = [HumanMessage(content=input_prompt)]

    return mdl(messages)

# %% ../nbs/free-speech-prompts.ipynb 12
def generate_custom_response(original_post, chat_mdl, principles=None, writing_style=None, word_limit=None):

    # create customized prompt
    customized_prompt, background_info, sources = generate_custom_prompt(original_post, principles, writing_style, word_limit)

    # get response
    draft_response = get_chat_model_response(chat_mdl, customized_prompt)

    return draft_response, background_info, sources

# %% ../nbs/free-speech-prompts.ipynb 13
def regenerate_custom_response(chat_mdl, regenerate_prompt, draft_response):

    # create customized prompt
    customized_prompt = f"Please update the original response according to the following request: {regenerate_prompt}. Here is the original response: {draft_response}"

    # get response
    updated_response = get_chat_model_response(chat_mdl, customized_prompt)

    return updated_response
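`generate_custom_prompt` ultimately just fills named variables into the template. A minimal sketch of that step using plain `str.format` (LangChain's `PromptTemplate.format` behaves the same way over its `input_variables`; the template and values here are made up):

```python
# Tiny stand-in for PROMPT_TEMPLATE; the real template carries far more guidance.
template = (
    "Draft a response to this post:\n{original_post}\n"
    "Context:\n{background_info}\n"
    "My principles:\n{principles}\n"
    "Keep it under {word_limit} words."
)
prompt = template.format(
    original_post="an example post",
    background_info="retrieved background documents",
    principles="be factual and respectful",
    word_limit=100,
)
assert "an example post" in prompt
assert prompt.endswith("Keep it under 100 words.")
```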
requirements.txt ADDED
streamlit
streamlit-authenticator
deeplake
deta
cryptography
pyyaml
langchain
openai
chromadb
tiktoken
unstructured
nltk