01. Structure of Document

Document & Document Loaders

Reference

Document used for practice

Software Policy & Research Institute (SPRi) - December 2023 issue

  • Authors: Jaeheung Lee (Senior Researcher, AI Policy Research Lab), Lee Ji-soo (Commissioned Researcher, AI Policy Research Lab)

  • File name: SPRI_AI_Brief_2023년12월호_F.pdf

Document

Document is the basic document object in LangChain.

Properties

  • page_content: A string containing the content of the document.

  • metadata: A dictionary containing the document's metadata.

from langchain_core.documents import Document

# Create a Document; page_content holds the text, metadata defaults to an empty dict
document = Document(page_content="Hello, this is LangChain's document.")

Add properties to metadata
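Because metadata is an ordinary dictionary, properties can be added by assigning keys. The snippet below is a minimal sketch of that idea; the key names and values ("source", "page", "author") are illustrative assumptions, not values from the original tutorial. So that it also runs without LangChain installed, it falls back to a small stand-in class that has the same two fields.

```python
try:
    from langchain_core.documents import Document
except ImportError:
    # Fallback stand-in (an assumption for this sketch) so it runs without LangChain.
    from dataclasses import dataclass, field

    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

document = Document(page_content="Hello, this is LangChain's document.")

# metadata starts out as an empty dictionary
print(document.metadata)

# Add properties by assigning keys (key names and values are illustrative)
document.metadata["source"] = "example.txt"
document.metadata["page"] = 1
document.metadata["author"] = "Example Author"

print(document.metadata)
```

Metadata added this way travels with the document, so later steps can use it, for example to show where a piece of text came from.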

Document Loader

A document loader converts content from various file formats into Document objects.

Main Loaders

  • PyPDFLoader: A loader that loads PDF files.

  • CSVLoader: A loader that loads CSV files.

  • UnstructuredHTMLLoader: A loader that loads HTML files.

  • JSONLoader: A loader that loads JSON files.

  • TextLoader: A loader that loads text files.

  • DirectoryLoader: A loader that loads the files in a directory.

load()

  • Loads the documents and returns them.

  • The result is returned as a List[Document].

load_and_split()

  • Loads the documents, splits them with a text splitter, and returns the resulting chunks.

  • The result is returned as a List[Document].

lazy_load()

  • Loads documents lazily, yielding them one at a time as a generator.

aload()

  • Loads documents asynchronously (async).
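The difference between these methods is easiest to see side by side. The following is a minimal sketch using only the standard library: SimpleLineLoader is a hypothetical loader written for illustration, not a LangChain class, and the Document dataclass is a stand-in with the same two fields. It assumes one document per non-empty line of a text file. load_and_split() is left out because it additionally needs a text splitter.

```python
import asyncio
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator, List


@dataclass
class Document:
    """Stand-in with the same two fields as LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class SimpleLineLoader:
    """Hypothetical loader: one Document per non-empty line of a text file."""

    def __init__(self, file_path: str) -> None:
        self.file_path = file_path

    def lazy_load(self) -> Iterator[Document]:
        # Generator: each Document is produced only when the caller asks for it.
        with open(self.file_path, encoding="utf-8") as f:
            for line_number, line in enumerate(f, start=1):
                if line.strip():
                    yield Document(
                        page_content=line.strip(),
                        metadata={"source": self.file_path, "line": line_number},
                    )

    def load(self) -> List[Document]:
        # Eager: consume the generator and return everything as a list.
        return list(self.lazy_load())

    async def aload(self) -> List[Document]:
        # Async: run the blocking load in a worker thread.
        return await asyncio.to_thread(self.load)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "sample.txt"
    path.write_text("first line\n\nsecond line\n", encoding="utf-8")
    loader = SimpleLineLoader(str(path))

    docs = loader.load()                                    # whole list at once
    lazy_contents = [d.page_content for d in loader.lazy_load()]  # one at a time
    async_docs = asyncio.run(loader.aload())                # awaited result

print(len(docs), lazy_contents, len(async_docs))
```

In this sketch load() is just list(lazy_load()), so the lazy version is the one to prefer when a file is too large to hold in memory at once.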
