06. Word
Microsoft Word
Microsoft Word is a word processor developed by Microsoft.
This section covers how to load a Word document into a Document format that can be used downstream.
Docx2txtLoader
You can use docx2txt to import .docx files into documents.
# installation
# !pip install -qU docx2txt
from langchain_community.document_loaders import Docx2txtLoader
loader = Docx2txtLoader("./data/sample-word-document.docx") # Initialize document loader
docs = loader.load() # loading documents
print(len(docs))
1
UnstructuredWordDocumentLoader
from langchain_community.document_loaders import UnstructuredWordDocumentLoader
# Initialize the document loader
loader = UnstructuredWordDocumentLoader("./data/sample-word-document.docx")
# data loading
docs = loader.load()
print(len(docs))
The result is loaded as a single Document.
Internally, Unstructured creates a separate "element" for each chunk of text.
By default these elements are combined into one Document, but you can keep them separate by specifying mode="elements".
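As a minimal sketch of the mode="elements" option described above, the snippet below loads the same sample file with one Document per element and then tallies the element types. The helper names `load_word_elements` and `count_categories` are hypothetical, introduced only for illustration. The sketch also assumes that the `unstructured` package is installed and that each element Document carries a "category" entry in its metadata, which may vary by version.

```python
from collections import Counter


def load_word_elements(path):
    """Load a .docx file as one Document per element (hypothetical helper).

    Assumes langchain_community and unstructured are installed; the import
    is kept inside the function so the rest of the module works without them.
    """
    from langchain_community.document_loaders import UnstructuredWordDocumentLoader

    # mode="elements" keeps each text chunk as its own Document
    loader = UnstructuredWordDocumentLoader(path, mode="elements")
    return loader.load()


def count_categories(docs):
    """Count Documents per element category (hypothetical helper).

    Assumes each Document exposes a metadata dict; Documents without a
    "category" key are counted under "unknown".
    """
    return Counter(doc.metadata.get("category", "unknown") for doc in docs)
```

Calling `docs = load_word_elements("./data/sample-word-document.docx")` should return more than one Document for a file containing several paragraphs, and `count_categories(docs)` gives a quick view of how the text was split. Keeping elements separate is useful when downstream steps need to treat titles, body text, and list items differently.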