Spaces:
Sleeping
Sleeping
A newer version of the Gradio SDK is available:
5.33.1
Simple Website Loader
This loader is a simple web scraper that fetches the text from static websites by converting the HTML to text.
Usage
To use this loader, you need to pass in an array of URLs.
from llama_index import download_loader
SimpleWebPageReader = download_loader("SimpleWebPageReader")
loader = SimpleWebPageReader()
documents = loader.load_data(urls=["https://google.com"])
Examples
This loader is designed to be used as a way to load data into LlamaIndex and/or subsequently used as a Tool in a LangChain Agent.
LlamaIndex
from llama_index import VectorStoreIndex, download_loader
SimpleWebPageReader = download_loader("SimpleWebPageReader")
loader = SimpleWebPageReader()
documents = loader.load_data(urls=["https://google.com"])
index = VectorStoreIndex.from_documents(documents)
index.query("What language is on this website?")
LangChain
Note: Make sure you change the description of the Tool
to match your use-case.
from llama_index import VectorStoreIndex, download_loader
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from langchain.chains.conversation.memory import ConversationBufferMemory
SimpleWebPageReader = download_loader("SimpleWebPageReader")
loader = SimpleWebPageReader()
documents = loader.load_data(urls=["https://google.com"])
index = VectorStoreIndex.from_documents(documents)
tools = [
Tool(
name="Website Index",
func=lambda q: index.query(q),
description=f"Useful when you want answer questions about the text on websites.",
),
]
llm = OpenAI(temperature=0)
memory = ConversationBufferMemory(memory_key="chat_history")
agent_chain = initialize_agent(
tools, llm, agent="zero-shot-react-description", memory=memory
)
output = agent_chain.run(input="What language is on this website?")