Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki. Wikipedia
is the largest and most-read reference work in history.
This notebook shows how to retrieve wiki pages from wikipedia.org into the `Document` format that is used downstream.
Integration details
Setup
To enable automated tracing of individual tools, set your LangSmith API key:
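A sketch of that setup, assuming you have a LangSmith account (both environment variables are read by LangChain at runtime):

```python
import getpass
import os

# Optional: enable LangSmith tracing (assumes a LangSmith account and API key).
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
```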
Installation

The integration lives in the `langchain-community` package. We also need to install the `wikipedia` Python package itself.
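In a notebook, for example:

```python
%pip install -qU langchain-community wikipedia
```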
Instantiation
Now we can instantiate our retriever; a minimal example follows the parameter list. `WikipediaRetriever` parameters include:

- optional `lang`: default="en". Use it to search in a specific language part of Wikipedia.
- optional `load_max_docs`: default=100. Use it to limit the number of downloaded documents. It takes time to download all 100 documents, so use a small number for experiments; there is a hard limit of 300 for now.
- optional `load_all_available_meta`: default=False. By default only the most important fields are downloaded: `Published` (the date the document was published or last updated), `title`, and `Summary`. If True, other fields are also downloaded.

`get_relevant_documents()` has one argument, `query`: free text used to find documents in Wikipedia.
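A minimal instantiation sketch, using the parameters described above (the small `load_max_docs` just keeps experiments fast):

```python
from langchain_community.retrievers import WikipediaRetriever

# English Wikipedia, limited to two documents per query for quick experiments.
retriever = WikipediaRetriever(lang="en", load_max_docs=2, load_all_available_meta=False)
```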
Usage
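For example, pass a free-text query to `invoke` (the standard runnable entry point, which wraps `get_relevant_documents`); the query and printed slice here are illustrative:

```python
docs = retriever.invoke("TOKYO GHOUL")

# Each result is a Document carrying Wikipedia metadata and the page text.
print(docs[0].metadata["title"])
print(docs[0].page_content[:400])
```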
Use within a chain
Like other retrievers, `WikipediaRetriever` can be incorporated into LLM applications via chains.
We will need an LLM or chat model:
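A sketch of such a chain, assuming an OpenAI chat model is available (any chat model works; the model name, prompt wording, and question are illustrative):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# Assumption: OPENAI_API_KEY is set and langchain-openai is installed.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)


def format_docs(docs):
    # Join the retrieved Wikipedia pages into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)


chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

chain.invoke("Who is the main character in Tokyo Ghoul?")
```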
API reference

For detailed documentation of all `WikipediaRetriever` features and configurations, head to the API reference.