04. CSV

CSV

A Comma-Separated Values (CSV) file is a delimited text file in which values are separated by commas. Each line of the file is one data record.

Each record consists of one or more fields separated by commas.
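
The sketch below makes the record and field structure concrete using Python's standard csv module. The sample row is the first passenger of the Titanic data used later in this page. It shows that a field which itself contains a comma (here, the name) is wrapped in double quotes. A proper CSV parser honors those quotes, and a naive split on commas does not.

```python
import csv
import io

# A header record followed by one data record (first passenger of the Titanic data).
# The Name field contains a comma, so it is enclosed in double quotes.
raw = 'PassengerId,Name,Age\n1,"Braund, Mr. Owen Harris",22.0\n'

header, record = list(csv.reader(io.StringIO(raw)))
print(header)  # ['PassengerId', 'Name', 'Age']
print(record)  # ['1', 'Braund, Mr. Owen Harris', '22.0']

# Splitting on commas by hand ignores the quotes and breaks the name in two.
naive = raw.splitlines()[1].split(",")
print(len(naive))  # 4
```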

CSVLoader

  • CSVLoader loads CSV data as one document per row.

```python
from langchain_community.document_loaders.csv_loader import CSVLoader

# Create a CSV loader
loader = CSVLoader(file_path="./data/titanic.csv")

# Load the data: one Document per row
docs = loader.load()

print(len(docs))  # 891
print(docs[0].metadata)  # {'source': './data/titanic.csv', 'row': 0}
```
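
To show what "one row per document" means, here is a stdlib-only sketch that approximates the mapping CSVLoader performs. Each row becomes content made of "column: value" lines, and the file path and 0-based row index go into the metadata. The helper rows_to_documents is hypothetical. It returns plain tuples rather than LangChain Document objects and is not CSVLoader's actual source, so the exact content formatting may differ between LangChain versions.

```python
import csv
import io


def rows_to_documents(text, source):
    """Hypothetical helper approximating CSVLoader: one (content, metadata) pair per row."""
    documents = []
    for index, row in enumerate(csv.DictReader(io.StringIO(text))):
        # Every column becomes a "column: value" line in the document content.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        # Metadata records where the row came from and its 0-based position.
        documents.append((content, {"source": source, "row": index}))
    return documents


# Two rows from the Titanic data (subset of columns).
sample = (
    "PassengerId,Survived,Name\n"
    '1,0,"Braund, Mr. Owen Harris"\n'
    '3,1,"Heikkinen, Miss. Laina"\n'
)

docs = rows_to_documents(sample, "./data/titanic.csv")
print(len(docs))  # 2
print(docs[0][1])  # {'source': './data/titanic.csv', 'row': 0}
print(docs[0][0])
```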

Customizing CSV parsing and loading

See the Python csv module documentation for more information on the arguments that can be passed through csv_args.
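
CSVLoader passes the dictionary given as csv_args on to csv.DictReader, so you can try the effect of those arguments with the csv module alone. The sketch below uses delimiter, quotechar, and fieldnames. The replacement column names are hypothetical. Once fieldnames is supplied, the file's own header line is no longer consumed as a header and is returned as an ordinary record.

```python
import csv
import io

sample = (
    "PassengerId,Survived,Name\n"
    '1,0,"Braund, Mr. Owen Harris"\n'
)

# The dict you would pass as CSVLoader(file_path=..., csv_args=csv_args).
# The fieldnames below are hypothetical replacements for the header names.
csv_args = {
    "delimiter": ",",
    "quotechar": '"',
    "fieldnames": ["id", "survived", "name"],
}

# CSVLoader forwards csv_args to csv.DictReader, so calling it directly shows the effect.
rows = list(csv.DictReader(io.StringIO(sample), **csv_args))

print(len(rows))  # 2, the original header line is now read as a data record
print(rows[0])  # {'id': 'PassengerId', 'survived': 'Survived', 'name': 'Name'}
print(rows[1]["name"])  # Braund, Mr. Owen Harris
```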

Use the source_column argument to choose the column whose value becomes the source of the document generated from each row. If source_column is not set, file_path is used as the source for every document.

This is useful when documents loaded from a CSV file are used in a question-answering chain that cites its sources.
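
The source rule above can be sketched with the standard library. With LangChain, the corresponding call would be CSVLoader(file_path="./data/titanic.csv", source_column="PassengerId"). The helper build_metadata below is hypothetical and only mirrors the described rule. It is not CSVLoader's implementation.

```python
import csv
import io


def build_metadata(row, index, file_path, source_column=None):
    """Hypothetical helper mirroring the rule above for a document's source."""
    # With source_column, the source is that column's value in the current row;
    # without it, every document shares file_path as its source.
    source = row[source_column] if source_column else file_path
    return {"source": source, "row": index}


sample = (
    "PassengerId,Name\n"
    '1,"Braund, Mr. Owen Harris"\n'
    '3,"Heikkinen, Miss. Laina"\n'
)
rows = list(csv.DictReader(io.StringIO(sample)))

by_file = [build_metadata(r, i, "./data/titanic.csv") for i, r in enumerate(rows)]
by_id = [build_metadata(r, i, "./data/titanic.csv", "PassengerId") for i, r in enumerate(rows)]

print(by_file[1])  # {'source': './data/titanic.csv', 'row': 1}
print(by_id[1])  # {'source': '3', 'row': 1}
```

With a per-row source, a question-answering chain that reports its sources can point to an individual passenger record instead of repeating the same file path.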

UnstructuredCSVLoader

You can also load tables with UnstructuredCSVLoader. One advantage of UnstructuredCSVLoader is that, in "elements" mode, the document metadata includes an HTML representation of the table.
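
In LangChain, the HTML is stored under the text_as_html metadata key. The comments at the top of the block show a typical call, assuming the unstructured package is installed. The runnable part is only a stdlib illustration of what an HTML representation of a CSV table looks like. csv_to_html_table is a hypothetical helper, and the markup that unstructured actually produces will differ in detail.

```python
import csv
import html
import io

# Typical LangChain usage (not executed here; requires the unstructured package):
#   from langchain_community.document_loaders.csv_loader import UnstructuredCSVLoader
#   loader = UnstructuredCSVLoader(file_path="./data/titanic.csv", mode="elements")
#   docs = loader.load()
#   print(docs[0].metadata["text_as_html"][:500])


def csv_to_html_table(text):
    """Hypothetical helper: render CSV text as a simple HTML table."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    parts = ["<table>"]
    # Header cells use <th>; values are HTML-escaped so characters like & or < stay safe.
    parts.append("<tr>" + "".join(f"<th>{html.escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        parts.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    parts.append("</table>")
    return "".join(parts)


sample = 'PassengerId,Name\n1,"Braund, Mr. Owen Harris"\n'
table_html = csv_to_html_table(sample)
print(table_html)
```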

  • Output HTML text metadata for the first document

DataFrameLoader

Query the first 5 rows of the Titanic data loaded into a pandas DataFrame:
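
A minimal sketch of this step, assuming pandas is installed. With the real file you would call pd.read_csv("./data/titanic.csv"). To keep the sketch self-contained, it reads a few columns of the five rows shown below from an in-memory string.

```python
import io

import pandas as pd

# In-memory stand-in for ./data/titanic.csv (a few columns of the rows shown below).
csv_text = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age,Fare\n"
    "1,0,3,male,22.0,7.2500\n"
    "2,1,1,female,38.0,71.2833\n"
    "3,1,3,female,26.0,7.9250\n"
    "4,1,1,female,35.0,53.1000\n"
    "5,0,3,male,35.0,8.0500\n"
)

df = pd.read_csv(csv_text)

# Query the first 5 rows
first_rows = df.head(n=5)
print(first_rows)
```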

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
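
DataFrameLoader turns each row of a DataFrame into a document. The column named by page_content_column becomes the content, and the remaining columns become metadata. The comment at the top of the block shows a typical call. The runnable code is a pandas-only sketch of that row mapping using the hypothetical helper dataframe_to_documents, which returns plain tuples rather than LangChain Document objects.

```python
import pandas as pd

# Typical LangChain usage (not executed here):
#   from langchain_community.document_loaders import DataFrameLoader
#   docs = DataFrameLoader(df, page_content_column="Name").load()

df = pd.DataFrame(
    {
        "PassengerId": [1, 3],
        "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
        "Sex": ["male", "female"],
    }
)


def dataframe_to_documents(data_frame, page_content_column):
    """Hypothetical sketch of DataFrameLoader's mapping: one (content, metadata) pair per row."""
    documents = []
    for _, row in data_frame.iterrows():
        metadata = row.to_dict()
        # The chosen column is moved out of the metadata and used as the content.
        content = metadata.pop(page_content_column)
        documents.append((content, metadata))
    return documents


docs = dataframe_to_documents(df, page_content_column="Name")
for content, metadata in docs:
    print(content, metadata)
```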
