5 Levels Of Text Splitting

https://github.com/langchain-ai/langchain/blob/master/cookbook/Multi_modal_RAG.ipynb

 

In this tutorial we are reviewing the 5 Levels Of Text Splitting. This is an unofficial list put together for fun and educational purposes.

Ever tried to put a long piece of text into ChatGPT, only for it to tell you it's too long? Or tried to give your application better long-term memory, only to find it's still not quite working?

One of the most effective strategies to improve the performance of your language model applications is to split your large data into smaller pieces. This is called splitting or chunking (we'll use these terms interchangeably). In the multi-modal world, splitting also applies to images.
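To make the idea concrete, here is a minimal sketch of the simplest possible approach: cutting text into fixed-size character chunks with an optional overlap. The function name split_text and its parameters chunk_size and chunk_overlap are hypothetical names chosen for illustration; this is not the LangChain or Llama Index API, only plain Python showing the general idea.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share chunk_overlap characters. Illustrative
    helper only; names and behavior are assumptions, not a library API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


if __name__ == "__main__":
    sample = "Splitting long text into smaller pieces helps language models."
    for chunk in split_text(sample, chunk_size=20, chunk_overlap=5):
        print(repr(chunk))
```

A splitter like this ignores sentence and paragraph boundaries, so chunks can cut words in half. It is useful mainly as a baseline to compare against more structure-aware strategies.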

We are going to cover a lot, but if you make it to the end, I guarantee you’ll have a solid grasp on chunking theory, strategies, and resources to learn more.

Levels Of Text Splitting

Notebook resources:

  • Video Overview - Walkthrough of this code with commentary
  • ChunkViz.com - Visual representation of chunk splitting methods
  • RAGAS - Retrieval evaluation framework

This tutorial was created with ❤️ by Greg Kamradt. MIT license, attribution is always welcome.

This tutorial uses code from LangChain (pip install langchain) and Llama Index (pip install llama-index).

Posted on 2024-09-11 18:08 by Sanny.Liu-CV&&ML
