# 12. UpstageLayoutAnalysisLoader

## UpstageLayoutAnalysisLoader <a href="#upstagelayoutanalysisloader" id="upstagelayoutanalysisloader"></a>

`UpstageLayoutAnalysisLoader` is a document analysis tool provided by Upstage AI. It is a document loader that can be used together with the LangChain framework.

**Main features:**

* Perform layout analysis on various types of documents, including PDFs and images
* Automatically recognize and extract structural elements of documents (titles, paragraphs, tables, images, etc.)
* OCR support (optional)

UpstageLayoutAnalysisLoader goes beyond simple text extraction to understand the structure of documents and identify relationships between elements, enabling more accurate document analysis.

**Installation**

Install the `langchain-upstage` package before use.

```
pip install -U langchain-upstage
```

**API Key Settings**

Set `UPSTAGE_API_KEY` in your `.env` file.
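Once the environment has been loaded (for example with `load_dotenv()`), a quick check can confirm that the key is actually visible to the process. The following is a minimal sketch using only the standard library; `get_upstage_api_key` is a hypothetical helper for illustration and is not part of `langchain-upstage`.

```python
import os


def get_upstage_api_key(env=None):
    """Return the Upstage API key, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    This helper is hypothetical and not part of langchain-upstage.
    """
    env = os.environ if env is None else env
    key = env.get("UPSTAGE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "UPSTAGE_API_KEY is not set. Add it to your .env file and load it first."
        )
    return key
```

Calling `get_upstage_api_key()` before creating the loader turns a missing key into an immediate, readable error instead of a failed API request later on.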

**Reference**

* See the [Upstage developer documentation](https://developers.upstage.ai/docs/getting-started/quick-start).

### Environment setup <a href="#id-1" id="id-1"></a>

```
# Manage the API key as an environment variable through a .env file
from dotenv import load_dotenv

# Load the API key information
load_dotenv()
```

```
True
```

```
# Set up LangSmith tracing. https://smith.langchain.com
# !pip install langchain-teddynote
from langchain_teddynote import logging

# Enter a project name.
logging.langsmith("CH07-DocumentLoader")
```

```
 Start tracking LangSmith. 
[Project name] 
CH07-DocumentLoader 
```

### UpstageLayoutAnalysisLoader <a href="#upstagelayoutanalysisloader_1" id="upstagelayoutanalysisloader_1"></a>

**Main parameters**

* `file_path` : Path of the document to analyze
* `output_type` : Output format, either `'html'` (default) or `'text'`
* `split` : Document splitting method, one of `'none'`, `'element'`, `'page'`
* `use_ocr=True` : Enable OCR
* `exclude=["header", "footer"]` : Exclude headers and footers from the output
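Because `output_type` and `split` accept only a small set of values, it can help to validate them before calling the API. The sketch below is illustrative only: `build_loader_kwargs` is a hypothetical helper, the allowed values are taken from the list above, and the defaults for `split` and `use_ocr` are assumptions rather than documented library defaults.

```python
# Allowed values as listed in the parameter summary above
ALLOWED_OUTPUT_TYPES = {"html", "text"}
ALLOWED_SPLITS = {"none", "element", "page"}


def build_loader_kwargs(output_type="html", split="none", use_ocr=False,
                        exclude=("header", "footer")):
    """Validate loader options and return them as a keyword-argument dict.

    Hypothetical helper; the split/use_ocr defaults here are assumptions.
    """
    if output_type not in ALLOWED_OUTPUT_TYPES:
        raise ValueError(f"output_type must be one of {sorted(ALLOWED_OUTPUT_TYPES)}")
    if split not in ALLOWED_SPLITS:
        raise ValueError(f"split must be one of {sorted(ALLOWED_SPLITS)}")
    return {
        "output_type": output_type,
        "split": split,
        "use_ocr": bool(use_ocr),
        "exclude": list(exclude),
    }


kwargs = build_loader_kwargs(output_type="text", split="page", use_ocr=True)
# The dict could then be passed on as:
#   UpstageLayoutAnalysisLoader(file_path, **kwargs)
```

Validating locally means a typo such as `split="pages"` fails immediately with a readable message.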

```
from langchain_upstage import UpstageLayoutAnalysisLoader

# File Path
file_path = "./data/SPRI_AI_Brief_2023년12월호_F.pdf"

# Document loader settings
loader = UpstageLayoutAnalysisLoader(
    file_path,
    output_type="text",
    split="page",
    use_ocr=True,
    exclude=["header", "footer"],
)

# Load Document
docs = loader.load()

# Output the results
for doc in docs[:3]:
    print(doc)
```

```
page_content='SPRi AI Brief The latest trends in the AI industry December 2023' metadata={'page': 1} 
page_content='December 2023 CONTENTS I Artificial Industry Trend Brief 1. Policy/Act ▷ United States announces administrative order on safe and reliable AI development and use 1 
▷ G7, Hiroshima AI process to agree on international action decree for AI companies 2 
▷ 28 countries participating in the UK AI Safety Summit, Joint Response to AI Risk 3 
▷ US court dismisses copyright lawsuits filed by artists to AI companies 4 
▷ US Federal Trade Commission submits AI comments on consumer protection and competition to Copyright Office 5 
▷ EU AI law third-party negotiations, based model regulation views on turbulence 6 2. Corporate/Industrial ▷ American Frontier Model Forum raises $10 million AI safety fund 7 
▷ Cohir unveils data source explorer to ensure data transparency 8 
▷ Alibaba Cloud unveils the latest LLM'Gunichien One 2.0' 9 
▷ Samsung Electronics unveils self-developed AI'Samsung Gauss' 10 
▷ Google strengthens AI cooperation generated by $20 billion investment in Antwerp 11 
▷ IDC forecasts $250 billion in AI software sales in 2027 12 
▷ Bill Gates predicts paradigm shift in computer use due to AI agents 13 
▷ YouTube mandates AI-generated content display from 2024 14 3. Technology / Research ▷ UK Science and Innovation Technology Department Announces Establishment of AI Safety Institute 15 
▷ Google Deepmind announces classification system for features and behavior of universal AI models 16 
▷ GPT-4 is the best in Galileo's LLM hallucinations index assessment 17 4. Workforce/Education ▷ Oxford Internet Institute, UK, AI technician wages average 21% higher 18 II. Main Events ▷ CES 2024 19 
▷ AIMLA 2024 19 
▷ AAAI Conference on Artificial Intelligence 19' metadata={'page': 2} 
page_content='I. Artificial Industry Trend Brief' metadata={'page': 3} 
```
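With `split="page"`, each returned document carries its page number in `metadata`, as the output above shows. A compact per-page summary is often easier to scan than the raw printout. The following is a minimal sketch: `Doc` is a stand-in class used here so the example runs without LangChain installed (it mirrors only the `page_content` and `metadata` attributes), and `summarize_pages` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document: only page_content and metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_pages(docs, preview_chars=40):
    """Return (page, character count, preview) tuples sorted by page number."""
    rows = []
    for doc in docs:
        # Documents without a page number sort first with page -1
        page = doc.metadata.get("page", -1)
        text = doc.page_content
        rows.append((page, len(text), text[:preview_chars]))
    return sorted(rows, key=lambda row: row[0])


# Sample data shaped like the loader output shown above
sample_docs = [
    Doc("I. Artificial Industry Trend Brief", {"page": 3}),
    Doc("SPRi AI Brief The latest trends in the AI industry December 2023", {"page": 1}),
]

for page, size, preview in summarize_pages(sample_docs):
    print(f"page {page}: {size} chars | {preview}")
```

The same function should work on the `docs` list returned by `loader.load()`, since it relies only on the `page_content` and `metadata` attributes.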
