05. Code splitting (Python, Markdown, Java, C++, C#, Go, JS, LaTeX, etc.)

Split code

CodeTextSplitter allows you to split code written in various programming languages.

To do this, just import the Language enum and specify the corresponding programming language.

%pip install -qU langchain-text-splitters

This is an example of splitting text using RecursiveCharacterTextSplitter.

  • Import the Language and RecursiveCharacterTextSplitter classes from the langchain_text_splitters module.

  • RecursiveCharacterTextSplitter is a text splitter that recursively splits text at the character level.

from langchain_text_splitters import (
    Language,
    RecursiveCharacterTextSplitter,
)
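To make the "recursive" idea concrete, here is a simplified pure-Python sketch. It is not the library's implementation: the function name recursive_split, the separator list, and the sample text are all hypothetical and chosen only for illustration. It tries the coarsest separator first and falls back to finer ones only for pieces that are still too long.

```python
def recursive_split(text, separators, chunk_size):
    """Split text into chunks of at most chunk_size characters.

    Simplified illustration only (hypothetical helper, not the library code).
    Separators must be non-empty strings, ordered from coarse to fine.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    if sep not in text:
        # This separator does not occur, so try the next, finer one.
        return recursive_split(text, finer, chunk_size)

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # The piece still fits, so keep merging it into the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with the finer separators.
            chunks.extend(recursive_split(piece, finer, chunk_size))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "def a():\n    return 1\n\ndef b():\n    return 2\n\nprint(a() + b())"
chunks = recursive_split(sample, ["\n\n", "\n", " "], chunk_size=25)
for chunk in chunks:
    print(repr(chunk))
```

With this sample, each blank-line-separated block fits within 25 characters, so the finer separators are never needed.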

Get a complete list of supported languages.

# Get a full list of supported languages
[e.value for e in Language]
['cpp', 'go', 'java', 'kotlin', 'js', 'ts', 'php', 'proto', 'python', 'rst', 'ruby', 'rust', 'scala', 'swift', 'markdown', 'latex', 'html', 'sol', 'csharp']

You can use the get_separators_for_language method of the RecursiveCharacterTextSplitter class to check the separators used for a particular language.

  • In the example, the Language.PYTHON enum value is passed as the argument to check the separators used for Python.

# You can check the delimiters used for a given language.
RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON)

Python

An example using RecursiveCharacterTextSplitter is as follows:

  • Use RecursiveCharacterTextSplitter to split Python code into document units.

  • Specify Language.PYTHON for the language parameter to use the Python language.

  • Set chunk_size to 50 to limit the maximum size of each document.

  • Set chunk_overlap to 0 so that consecutive documents do not overlap.

Create the Document objects. The created Documents are returned as a list.

JS

Here is an example using a JS text splitter.

TS

Here is an example using a TS text splitter.

Markdown

Here is an example using a Markdown text splitter.

Sample text: "This is an open-source project in a rapidly developing field. 🙏"

LaTeX

LaTeX is a markup language for writing documents, widely used to express mathematical symbols and formulas.

Here is an example of LaTeX text.

Split the text and print the results.

HTML

Here is an example using an HTML text splitter:

Split the text and print the results.
