Spaces: eaglelandsonce/TensorFlowClass

eaglelandsonce committed on Jul 9, 2024

Commit cb06d03 (verified) · 1 Parent(s): 87c68a6

Update pages/21_GraphRag.py

pages/21_GraphRag.py +59 -26

@@ -1,42 +1,68 @@
 import streamlit as st
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
 from collections import Counter
-import nltk
-from nltk.corpus import stopwords
 @st.cache_resource
 def load_model():
-    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
-    tokenizer = AutoTokenizer.from_pretrained(model_name)
-    model = AutoModelForSequenceClassification.from_pretrained(model_name)
-    return tokenizer, model
-def analyze_sentiment(text, tokenizer, model):
-    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
     with torch.no_grad():
         outputs = model(**inputs)
-    probabilities = torch.softmax(outputs.logits, dim=1)
     sentiment = "Positive" if probabilities[0][1] > probabilities[0][0] else "Negative"
     confidence = probabilities[0][1].item() if sentiment == "Positive" else probabilities[0][0].item()
-    return sentiment, confidence
-st.title("Text Sentiment Analysis")
-tokenizer, model = load_model()
-lincoln_text = """Abraham Lincoln (⫽ˈlɪŋkən⫽ LING-kən; February 12, 1809 – April 15, 1865) was an American lawyer, politician, and statesman who served as the 16th president of the United States from 1861 until his assassination in 1865. Lincoln led the United States through the American Civil War, defending the nation as a constitutional union, defeating the insurgent Confederacy, playing a major role in the abolition of slavery, expanding the power of the federal government, and modernizing the U.S. economy.
-Lincoln was born into poverty in a log cabin in Kentucky and was raised on the frontier, mainly in Indiana. He was self-educated and became a lawyer, Whig Party leader, Illinois state legislator, and U.S. representative from Illinois. In 1849, he returned to his successful law practice in Springfield, Illinois. In 1854, angered by the Kansas–Nebraska Act, which opened the territories to slavery, he re-entered politics. He soon became a leader of the new Republican Party. He reached a national audience in the 1858 Senate campaign debates against Stephen A. Douglas. Lincoln ran for president in 1860, sweeping the North to gain victory. Pro-slavery elements in the South viewed his election as a threat to slavery, and Southern states began seceding from the nation. They formed the Confederate States of America, which began seizing federal military bases in the South. A little over one month after Lincoln assumed the presidency, Confederate forces attacked Fort Sumter, a U.S. fort in South Carolina. Following the bombardment, Lincoln mobilized forces to suppress the rebellion and restore the union.
-Lincoln, a moderate Republican, had to navigate a contentious array of factions with friends and opponents from both the Democratic and Republican parties. His allies, the War Democrats and the Radical Republicans, demanded harsh treatment of the Southern Confederates. He managed the factions by exploiting their mutual enmity, carefully distributing political patronage, and by appealing to the American people. Anti-war Democrats (called "Copperheads") despised Lincoln, and some irreconcilable pro-Confederate elements went so far as to plot his assassination. His Gettysburg Address came to be seen as one of the greatest and most influential statements of American national purpose. Lincoln closely supervised the strategy and tactics in the war effort, including the selection of generals, and implemented a naval blockade of the South's trade. He suspended habeas corpus in Maryland and elsewhere, and he averted war with Britain by defusing the Trent Affair. In 1863, he issued the Emancipation Proclamation, which declared the slaves in the states "in rebellion" to be free. It also directed the Army and Navy to "recognize and maintain the freedom of said persons" and to receive them "into the armed service of the United States." Lincoln pressured border states to outlaw slavery, and he promoted the Thirteenth Amendment to the U.S. Constitution, which abolished slavery, except as punishment for a crime.
-Lincoln managed his own successful re-election campaign. He sought to heal the war-torn nation through reconciliation. On April 14, 1865, just five days after the Confederate surrender at Appomattox, he was attending a play at Ford's Theatre in Washington, D.C., with his wife, Mary, when he was fatally shot by Confederate sympathizer John Wilkes Booth. Lincoln is remembered as a martyr and a national hero for his wartime leadership and for his efforts to preserve the Union and abolish slavery. Lincoln is often ranked in both popular and scholarly polls as the greatest president in American history."""
-text_input = st.text_area("Enter text for analysis:", value=lincoln_text, height=300)
 if st.button("Analyze Text"):
     if text_input:
-        sentiment, confidence = analyze_sentiment(text_input, tokenizer, model)
         st.write(f"Sentiment: {sentiment}")
         st.write(f"Confidence: {confidence:.2f}")
@@ -44,15 +70,22 @@ if st.button("Analyze Text"):
         word_count = len(text_input.split())
         st.write(f"Word count: {word_count}")
-        # Most common words (excluding stop words)
-        nltk.download('stopwords', quiet=True)
-        stop_words = set(stopwords.words('english'))
-        words = [word.lower() for word in text_input.split() if word.isalnum() and word.lower() not in stop_words]
         word_freq = Counter(words).most_common(5)
-        st.write("Top 5 most common words (excluding stop words):")
         for word, freq in word_freq:
             st.write(f"- {word}: {freq}")
     else:
         st.write("Please enter some text to analyze.")
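
Review note on the word-frequency step: both the removed and the added version keep a token only when `word.isalnum()` is true, so any whitespace-separated token that carries punctuation (for example `Lincoln,` or `Kentucky.`) is dropped rather than counted. The following is a minimal sketch of that behaviour in plain Python. The helper name `top_words` and the tiny `STOP_WORDS` set are hypothetical stand-ins for illustration; they are not part of the commit and not NLTK's real list.

```python
from collections import Counter

# Hypothetical stand-in for NLTK's English stop-word list (assumption, not the real list).
STOP_WORDS = {"the", "a", "was", "and", "in"}


def top_words(text, n=5, stop_words=frozenset()):
    """Mirror the diff's filter: lower-case, keep only purely alphanumeric tokens."""
    words = [
        w.lower()
        for w in text.split()
        if w.isalnum() and w.lower() not in stop_words
    ]
    return Counter(words).most_common(n)


sample = "Lincoln was born in Kentucky. Lincoln, a lawyer, led the Union."
result = top_words(sample, stop_words=STOP_WORDS)
# "Kentucky.", "Lincoln,", "lawyer," and "Union." fail isalnum() and are not counted,
# so "lincoln" is counted once even though the name appears twice in the sample.
```

If punctuated tokens should count, one option would be to strip punctuation from each token (for example with `str.strip(string.punctuation)`) before the `isalnum()` check; that is a suggestion, not something this commit does.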

 import streamlit as st
+from transformers import GraphormerForGraphClassification, GraphormerTokenizer
+from datasets import Dataset
+from transformers.models.graphormer.collating_graphormer import preprocess_item, GraphormerDataCollator
 import torch
+import networkx as nx
+import matplotlib.pyplot as plt
 from collections import Counter
 @st.cache_resource
 def load_model():
+    model = GraphormerForGraphClassification.from_pretrained(
+        "clefourrier/pcqm4mv2_graphormer_base",
+        num_classes=2,  # Binary classification (positive/negative sentiment)
+        ignore_mismatched_sizes=True,
+    )
+    tokenizer = GraphormerTokenizer.from_pretrained("clefourrier/pcqm4mv2_graphormer_base")
+    return model, tokenizer
+def text_to_graph(text):
+    words = text.split()
+    G = nx.Graph()
+    for i, word in enumerate(words):
+        G.add_node(i, word=word)
+        if i > 0:
+            G.add_edge(i-1, i)
+    edge_index = [[e[0] for e in G.edges()] + [e[1] for e in G.edges()],
+                  [e[1] for e in G.edges()] + [e[0] for e in G.edges()]]
+    return {
+        "edge_index": edge_index,
+        "num_nodes": len(G.nodes()),
+        "node_feat": [[ord(word[0])] for word in words],  # Use ASCII value of first letter as feature
+        "edge_attr": [[1] for _ in range(len(G.edges()) * 2)],  # All edges have the same attribute
+        "y": [1]  # Placeholder label, will be ignored during inference
+    }
+def analyze_text(text, model, tokenizer):
+    graph = text_to_graph(text)
+    dataset = Dataset.from_dict({"train": [graph]})
+    dataset_processed = dataset.map(preprocess_item, batched=False)
+    inputs = GraphormerDataCollator()(dataset_processed["train"])
+    inputs = {k: v.to(model.device) for k, v in inputs.items()}
     with torch.no_grad():
         outputs = model(**inputs)
+    logits = outputs.logits
+    probabilities = torch.softmax(logits, dim=1)
     sentiment = "Positive" if probabilities[0][1] > probabilities[0][0] else "Negative"
     confidence = probabilities[0][1].item() if sentiment == "Positive" else probabilities[0][0].item()
+    return sentiment, confidence, graph
+st.title("Graph-based Text Analysis")
+model, tokenizer = load_model()
+text_input = st.text_area("Enter text for analysis:", height=200)
 if st.button("Analyze Text"):
     if text_input:
+        sentiment, confidence, graph = analyze_text(text_input, model, tokenizer)
         st.write(f"Sentiment: {sentiment}")
         st.write(f"Confidence: {confidence:.2f}")
         word_count = len(text_input.split())
         st.write(f"Word count: {word_count}")
+        # Most common words
+        words = [word.lower() for word in text_input.split() if word.isalnum()]
         word_freq = Counter(words).most_common(5)
+        st.write("Top 5 most common words:")
         for word, freq in word_freq:
             st.write(f"- {word}: {freq}")
+        # Visualize graph
+        G = nx.Graph()
+        G.add_edges_from(zip(graph["edge_index"][0], graph["edge_index"][1]))
+        plt.figure(figsize=(10, 6))
+        nx.draw(G, with_labels=False, node_size=30, node_color='lightblue', edge_color='gray')
+        plt.title("Text as Graph")
+        st.pyplot(plt)
     else:
         st.write("Please enter some text to analyze.")
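
Review note on `text_to_graph`: the added function links each word position to its predecessor, which yields a path graph, and then lists every edge in both directions. For n words that gives n - 1 undirected edges and 2(n - 1) entries in each row of `edge_index`. The sketch below reproduces that construction without `networkx`, assuming the same field names as the diff. The function name `chain_graph` is hypothetical, the placeholder label `y` is omitted, and the sketch deliberately does not touch the Graphormer preprocessing, collator, or model calls.

```python
def chain_graph(text):
    """Build the path-graph dict described in the diff's text_to_graph, in plain Python."""
    words = text.split()
    # Undirected edges (i-1, i) between consecutive word positions.
    edges = [(i - 1, i) for i in range(1, len(words))]
    sources = [u for u, _ in edges]
    targets = [v for _, v in edges]
    return {
        # Each edge appears twice: forward direction first, then reversed.
        "edge_index": [sources + targets, targets + sources],
        "num_nodes": len(words),
        # Code point of each word's first character; split() never yields empty strings.
        "node_feat": [[ord(w[0])] for w in words],
        # One constant attribute per directed edge entry.
        "edge_attr": [[1] for _ in range(2 * len(edges))],
    }


g = chain_graph("graphs from text")
```

Because the only node feature is the first character's code point and the only structure is word order, two texts whose words share first letters and length map to the same graph, which is worth keeping in mind when reading the reported sentiment.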