Split and print the results.
```python
md_splitter = RecursiveCharacterTextSplitter.from_language(
    # Create a text splitter using the Markdown language
    language=Language.MARKDOWN,
    # Set the chunk size to 60 characters
    chunk_size=60,
    # Make sure chunks do not overlap
    chunk_overlap=0,
)
# Create documents by splitting the Markdown text
md_docs = md_splitter.create_documents([markdown_text])
# Output the generated documents
md_docs
```
[Document(page_content='# 🦜️🔗 LangChain\n\n⚡ Build a second-speed application using LLM ⚡'),
 Document(page_content='## Fast installation\n\n```bash\npip install` Ministry of Mass 🙏')]
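A quick way to sanity-check a split like the one above is to compare each chunk's length against `chunk_size`. The sketch below is only an illustration: `report_chunk_lengths` is a hypothetical helper, not part of LangChain, and it uses `SimpleNamespace` stand-ins with a `page_content` attribute (the only thing it assumes about a document) so that it runs without any dependencies.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def report_chunk_lengths(docs: Iterable, chunk_size: int) -> List[Tuple[int, bool]]:
    """Return (length, fits) per document, where fits means length <= chunk_size."""
    return [
        (len(doc.page_content), len(doc.page_content) <= chunk_size)
        for doc in docs
    ]


# Stand-in objects (assumption): anything exposing `page_content` works here.
sample_docs = [
    SimpleNamespace(page_content="# Title"),
    SimpleNamespace(page_content="A short paragraph of text."),
]

for length, fits in report_chunk_lengths(sample_docs, chunk_size=60):
    print(f"{length:>3} characters, within limit: {fits}")
```

With real results, pass `md_docs` and the same `chunk_size` that was given to the splitter.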
# Raw string (r-prefix) so that backslashes such as the \b in \begin stay literal.
latex_text = r"""
\documentclass{article}
\begin{document}
\maketitle
\section{Introduction}
% An LLM is a type of machine learning model that can learn from large amounts of text data and generate human-like language.
% In recent years, LLMs have made significant progress in a variety of natural language processing tasks, including language translation, text generation, and sentiment analysis.
\subsection{History of LLMs}
% Early LLMs were developed in the 1980s and 1990s, but were limited by the amount of data they could process and the computing power available at the time.
% However, over the past decade, advances in hardware and software have made it possible to train LLMs on large datasets, leading to significant improvements in performance.
\subsection{Applications of LLMs}
% LLMs have many applications across industries, including chatbots, content creation, and virtual assistants.
% They can also be used in academia for research in linguistics, psychology, and computational linguistics.
\end{document}
"""
latex_splitter = RecursiveCharacterTextSplitter.from_language(
    # Split text using the LaTeX language.
    language=Language.LATEX,
    # Set the size of each chunk to 60 characters.
    chunk_size=60,
    # Set the number of overlapping characters between chunks to 0.
    chunk_overlap=0,
)
# Generate a list of documents by splitting latex_text.
latex_docs = latex_splitter.create_documents([latex_text])
# Print the list of generated documents.
latex_docs
[Document(page_content='\\documentclass{article}\n\\begin{document}\n\\maketitle'),
 Document(page_content='\\section{ Data can be used for various natural language processing operations, such as emotional analysis.'),
 Document(page_content='\\subsection{History of LLMs}\n% Initial LLM was developed in 1980s and 1990s'),
 Document(page_content=', which led to a great improvement in performance.'),
 Document(page_content='\\subsection{Applications of LLMs}\n% LLM has chatbots, content creation, virtual'),
 Document(page_content=' \n% can also be used in academia for linguistics, psychology, computer linguistics'),
 Document(page_content=' research.\n\n\\end{document}')]
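LaTeX source is full of backslashes, and in a regular Python string literal `\b` is the escape for the backspace character (`\x08`), so `\begin` does not survive intact unless the literal is a raw string. A minimal illustration that relies only on built-in string behavior:

```python
plain = "\begin{document}"   # "\b" is parsed as the backspace character (\x08)
raw = r"\begin{document}"    # the raw string keeps the backslash literally

print(repr(plain))  # '\x08egin{document}'
print(repr(raw))    # '\\begin{document}'
```

Prefixing the triple-quoted LaTeX sample with `r` is therefore the safer choice when the text is meant to be split as real LaTeX.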
html_text = """
<!DOCTYPE html>
<html>
    <head>
        <title>🦜️🔗 LangChain</title>
        <style>
            body {
                font-family: Arial, sans-serif;
            }
            h1 {
                color: darkblue;
            }
        </style>
    </head>
    <body>
        <div>
            <h1>🦜️🔗 LangChain</h1>
            <p>⚡ Building applications with LLMs through composability ⚡</p>
        </div>
        <div>
            As an open-source project in a rapidly developing field, we are extremely open to contributions.
        </div>
    </body>
</html>
"""
html_splitter = RecursiveCharacterTextSplitter.from_language(
    # Create a text splitter using the HTML language
    language=Language.HTML,
    # Set the chunk size to 60 characters
    chunk_size=60,
    # Make sure chunks do not overlap
    chunk_overlap=0,
)
# Split the given HTML text to create documents
html_docs = html_splitter.create_documents([html_text])
# Output the generated documents
html_docs
[Document(page_content='\n'),
 Document(page_content='\n '),
 Document(page_content=' \n
Solidity
Here is an example using the Solidity text splitter:
- Store the Solidity code as a string in the `SOL_CODE` variable.
- Use `RecursiveCharacterTextSplitter` to create `sol_splitter`, which splits the Solidity code into chunks.
  - Set the `language` parameter to `Language.SOL` to specify the Solidity language.
  - Set `chunk_size` to 128 to specify the maximum size of each chunk.
  - Set `chunk_overlap` to 0 so that chunks do not overlap.
- Call the `sol_splitter.create_documents()` method to split `SOL_CODE` into chunks, and store the result in the `sol_docs` variable.
- Output `sol_docs` to check the resulting Solidity code chunks.
SOL_CODE = """
pragma solidity ^0.8.20;
contract HelloWorld {
    function add(uint a, uint b) pure public returns(uint) {
        return a + b;
    }
}
"""
# Split and print the results
sol_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.SOL, chunk_size=128, chunk_overlap=0
)
sol_docs = sol_splitter.create_documents([SOL_CODE])
sol_docs
[Document(page_content='pragma solidity ^0.8.20;'),
 Document(page_content='contract HelloWorld {\n function add(uint a, uint b) pure public returns(uint) {
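To give some intuition for how `chunk_size` interacts with separators, here is a deliberately simplified sketch of recursive splitting. It is not LangChain's implementation: `recursive_split` and the separator list are hypothetical, the separators are plain strings rather than the language-specific patterns that `from_language` selects, and separators are dropped at chunk boundaries.

```python
from typing import List


def recursive_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Separators are tried in order; pieces that are still too long are split
    again with the remaining separators, and finally by fixed-width slicing.
    """
    text = text.strip()
    if not text:
        return []
    if len(text) <= chunk_size:
        return [text]

    # Find the first separator that actually occurs in the text.
    for index, sep in enumerate(separators):
        if sep and sep in text:
            remaining = separators[index + 1:]
            break
    else:
        # No usable separator: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
        else:
            if current:
                chunks.extend(recursive_split(current, remaining, chunk_size))
            current = piece
    if current:
        chunks.extend(recursive_split(current, remaining, chunk_size))
    return chunks


print(recursive_split("aaa bbb\nccc ddd eee", ["\n", " "], chunk_size=7))
# ['aaa bbb', 'ccc ddd', 'eee']
```

The coarse separator (`"\n"`) is tried first, and the finer one (`" "`) is used only for the piece that is still longer than `chunk_size`, which is the same general idea as preferring structural boundaries in source code before falling back to lines and words.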
C#
Here is an example using the C# text splitter:
C_CODE = """
using System;
class Program
{
    static void Main()
    {
        Console.WriteLine("Enter a number (1-5):");
        int input = Convert.ToInt32(Console.ReadLine());
        for (int i = 1; i <= input; i++)
        {
            if (i % 2 == 0)
            {
                Console.WriteLine($"{i} is even.");
            }
            else
            {
                Console.WriteLine($"{i} is odd.");
            }
        }
        Console.WriteLine("Goodbye!");
    }
}
"""
# Split and print the results.
c_splitter = RecursiveCharacterTextSplitter.from_language(
language=Language.CSHARP, chunk_size=128, chunk_overlap=0
)
c_docs = c_splitter.create_documents([C_CODE])
c_docs
[Document(page_content='using System;'),
 Document(page_content='class Program\n{\n static void Main()\n {\n Console.WriteLine("Enter a number (1-5):"),
 Document(page_content='if (i % 2 == 0)\n {\n Console.WriteLine($"{i} is even."),
 Document(page_content;'),
 Document(page_content='}\n}')]