# 08. Recursive JSON split (RecursiveJsonSplitter)

## RecursiveJsonSplitter <a href="#recursivejsonsplitter" id="recursivejsonsplitter"></a>

This JSON splitter traverses JSON data depth-first and produces smaller JSON chunks.

The splitter tries to keep nested JSON objects intact, but splits them when necessary to keep chunk sizes between min\_chunk\_size and max\_chunk\_size. If a value is a very large string rather than a nested JSON object, that string is not split.
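The traversal described above can be pictured with a simplified, hypothetical splitter written in plain Python. It is a sketch and not the library's code: the names `json_size`, `set_nested` and `split_depth_first` are made up for illustration, it ignores min\_chunk\_size, and it assumes the top-level value is a dictionary.

```python
import json


def json_size(obj) -> int:
    """Size of an object, taken as the length of its serialized JSON."""
    return len(json.dumps(obj))


def set_nested(target: dict, path: list, value) -> None:
    """Store value in target under the nested keys listed in path."""
    for key in path[:-1]:
        target = target.setdefault(key, {})
    target[path[-1]] = value


def split_depth_first(data, max_chunk_size: int, path=None, chunks=None) -> list:
    """Hypothetical, simplified depth-first splitter (not the library code)."""
    path = path or []
    chunks = chunks if chunks is not None else [{}]
    if isinstance(data, dict):
        for key, value in data.items():
            remaining = max_chunk_size - json_size(chunks[-1])
            if json_size({key: value}) < remaining:
                # The whole subtree fits: keep the nested object intact.
                set_nested(chunks[-1], path + [key], value)
            else:
                # Too large: start a new chunk and descend into the subtree.
                if chunks[-1]:
                    chunks.append({})
                split_depth_first(value, max_chunk_size, path + [key], chunks)
    else:
        # Non-dict value (string, number, list, ...): stored whole, never split.
        set_nested(chunks[-1], path, data)
    return chunks


sample = {
    "info": {"title": "demo", "version": "1.0"},
    "paths": {"/a": {"get": "x" * 40}, "/b": {"get": "y" * 40}},
    "note": "z" * 200,
}
chunks = split_depth_first(sample, max_chunk_size=80)
print([json_size(chunk) for chunk in chunks])
```

In this sample the small `info` object stays whole, `paths` is split per route, and the 200-character `note` string ends up in a chunk of its own that is larger than the limit, because a plain string is never cut.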

If you need a strict upper limit on chunk size, consider running a recursive text splitter over the chunks produced by this splitter.
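As a rough illustration of such a second pass, the sketch below applies a hard cap with plain fixed-width slicing. This is a deliberately simple stand-in for a recursive text splitter, and `enforce_max_length` is a hypothetical helper rather than part of any library. Note that sliced pieces are generally no longer valid JSON.

```python
import json


def enforce_max_length(chunks: list[str], max_length: int) -> list[str]:
    """Slice every chunk longer than max_length into fixed-width pieces."""
    capped = []
    for chunk in chunks:
        if len(chunk) <= max_length:
            capped.append(chunk)
        else:
            capped.extend(
                chunk[start:start + max_length]
                for start in range(0, len(chunk), max_length)
            )
    return capped


# Hypothetical JSON chunks: one short, one holding a long string value.
json_chunks = [json.dumps({"a": 1}), json.dumps({"note": "z" * 50})]
capped = enforce_max_length(json_chunks, max_length=30)
print([len(piece) for piece in capped])
```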

**Split criteria**

1. Text splitting method: based on JSON value
2. Chunk size measurement method: based on number of characters
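To make the character-based measurement concrete, the snippet below serializes a small dictionary and counts its characters. It assumes the size of a chunk is the length of its `json.dumps` output; the sample dictionary is made up for illustration.

```python
import json

# A small, hypothetical chunk.
chunk = {"info": {"title": "LangSmith", "version": "0.1.0"}}

# Serialize the chunk and count characters (not tokens or bytes).
serialized = json.dumps(chunk)
size = len(serialized)
print(serialized, size)
```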

```
%pip install -qU langchain-text-splitters
```

* Use the `requests.get()` function to fetch JSON data from the "<https://api.smith.langchain.com/openapi.json>" URL.
* The fetched JSON data is converted to a Python dictionary by the `json()` method and stored in the `json_data` variable.

```
import requests

# Load the JSON data.
json_data = requests.get("https://api.smith.langchain.com/openapi.json").json()
```

The following example splits JSON data using `RecursiveJsonSplitter`.

```
from langchain_text_splitters import RecursiveJsonSplitter

# Create a RecursiveJsonSplitter that splits JSON data into chunks of at most 300 characters.
splitter = RecursiveJsonSplitter(max_chunk_size=300)
```

Use the `splitter.split_json()` method to split the JSON data recursively.

```
# Split the JSON data recursively. Useful when you need to access or manipulate small pieces of JSON.
json_chunks = splitter.split_json(json_data=json_data)
```

* Use the `splitter.create_documents()` method to convert the JSON data into documents.
* Use the `splitter.split_text()` method to split the JSON data into a list of strings.

```
# Create documents from the JSON data.
docs = splitter.create_documents(texts=[json_data])

# Create string chunks from the JSON data.
texts = splitter.split_text(json_data=json_data)

# Print the content of the first document.
print(docs[0].page_content)

print("===" * 20)
# Print the first string chunk.
print(texts[0])
```

```
{"openapi": "3.1.0", "info": {"title": "LangSmith", "version": "0.1.0"}, "paths": {"/api/v1/sessions/{ 
============================================================ 
{"openapi": "3.1.0", "info": {"title": "LangSmith", "version": "0.1.0"}, "paths": {"/api/v1/sessions/{ 
```

Printing `texts[2]` to inspect one of the larger chunks shows that it contains a list object.

* The chunk at index 2 exceeds the 300-character limit because it contains a list object.
* This is because `RecursiveJsonSplitter` **does not split list objects**.

```
# Check the sizes of the first ten chunks.
print([len(text) for text in texts][:10])

# Examining one of the larger chunks (index 2) shows that it contains a list object.
print(texts[2])
```

```
[232, 197, 469, 210, 213, 237, 271, 191, 232, 215] 
{"paths": {"/api/v1/sessions/{session_id}": {"get": {" operationId": "read_tracer_session_api_v1_sessions__session_id 
```
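The same check can be done programmatically. The sketch below lists every chunk that exceeds the limit and reports whether it contains a list. The helper names `contains_list` and `oversized_chunks` and the sample strings are hypothetical; in the tutorial you would pass the `texts` list instead.

```python
import json


def contains_list(obj) -> bool:
    """Return True if obj is a list or holds a list anywhere inside it."""
    if isinstance(obj, list):
        return True
    if isinstance(obj, dict):
        return any(contains_list(value) for value in obj.values())
    return False


def oversized_chunks(texts, max_chunk_size):
    """Yield (index, length, has_list) for each chunk longer than the limit."""
    for index, text in enumerate(texts):
        if len(text) > max_chunk_size:
            yield index, len(text), contains_list(json.loads(text))


# Hypothetical stand-ins for the string chunks returned by the splitter.
sample_texts = [
    json.dumps({"a": {"b": 1}}),
    json.dumps({"params": [{"name": "session_id"}] * 5}),
]
report = list(oversized_chunks(sample_texts, max_chunk_size=50))
print(report)
```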

The chunk at index 2 can be parsed with the `json` module as follows.

```
import json

# Parse the chunk into a separate variable so the original json_data is not overwritten.
parsed_chunk = json.loads(texts[2])
parsed_chunk["paths"]
```

```
'1':'G','G','G'Accept'}}]}}} 
```

Setting the `convert_lists` parameter to `True` converts the lists inside the JSON into `key:value` pairs of the form `index:item`.

```
# Preprocess the JSON so that each list becomes a dictionary of index:item key:value pairs.
texts = splitter.split_text(json_data=json_data, convert_lists=True)
```

```
# Check the result after the lists have been converted to dictionaries.
print(texts[2])
```

```
 "3G1>"paths": {"/api/v1/sessions/{session_id{": }"geters": [{"name": "session_id", "in":}], "title": "  Accept" }}]}}}}
```
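The list-to-dictionary preprocessing can be sketched in a few lines of plain Python. This is an illustration of the idea, not the library's implementation: `lists_to_dicts` is a hypothetical helper, and using the index as a string key is an assumption.

```python
def lists_to_dicts(data):
    """Recursively replace each list with a dict keyed by item index."""
    if isinstance(data, dict):
        return {key: lists_to_dicts(value) for key, value in data.items()}
    if isinstance(data, list):
        # Assumption: indices become string keys ("0", "1", ...).
        return {str(index): lists_to_dicts(item) for index, item in enumerate(data)}
    return data


# Hypothetical sample with a list of objects and a list of strings.
sample = {
    "parameters": [{"name": "session_id"}, {"name": "accept"}],
    "tags": ["a", "b"],
}
converted = lists_to_dicts(sample)
print(converted)
```

Once lists are dictionaries, a splitter that only descends into dictionaries can break them up like any other nested object, which is why the oversized chunks get smaller.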

You can inspect the document at a specific index of the `docs` list.

```
# Inspect the document at index 2.
docs[2]
```

```
 "3:Face": "Gex": {" type": "null" }], "  title": "Accept" }}]}}}}') 
```
