Cédric KACZMAREK
first commit
70b87af

A newer version of the Gradio SDK is available: 5.33.1

Upgrade

Simple Website Loader

This loader is a simple web scraper that fetches the text from static websites by converting the HTML to text.

Usage

To use this loader, you need to pass in an array of URLs.

from llama_index import download_loader

SimpleWebPageReader = download_loader("SimpleWebPageReader")

loader = SimpleWebPageReader()
documents = loader.load_data(urls=["https://google.com"])

Examples

This loader is designed to be used as a way to load data into LlamaIndex and/or subsequently used as a Tool in a LangChain Agent.

LlamaIndex

from llama_index import VectorStoreIndex, download_loader

SimpleWebPageReader = download_loader("SimpleWebPageReader")

loader = SimpleWebPageReader()
documents = loader.load_data(urls=["https://google.com"])
index = VectorStoreIndex.from_documents(documents)
index.query("What language is on this website?")

LangChain

Note: Make sure you change the description of the Tool to match your use-case.

from llama_index import VectorStoreIndex, download_loader
from langchain.agents import initialize_agent, Tool
from langchain.llms import OpenAI
from langchain.chains.conversation.memory import ConversationBufferMemory

SimpleWebPageReader = download_loader("SimpleWebPageReader")

loader = SimpleWebPageReader()
documents = loader.load_data(urls=["https://google.com"])
index = VectorStoreIndex.from_documents(documents)

tools = [
    Tool(
        name="Website Index",
        func=lambda q: index.query(q),
        description=f"Useful when you want answer questions about the text on websites.",
    ),
]
llm = OpenAI(temperature=0)
memory = ConversationBufferMemory(memory_key="chat_history")
agent_chain = initialize_agent(
    tools, llm, agent="zero-shot-react-description", memory=memory
)

output = agent_chain.run(input="What language is on this website?")