Let's walk through how you can build your own Notion chatbot!

An overview of the app

To convert all the content from your Notion pages into numerical representations (vectors), you'll use LangChain to split the text into smaller chunks that OpenAI's embedding model can process. The model converts the text chunks into vectors, which you'll then store in a vector database.

Let's start by examining the project structure and installing the necessary dependencies. And don't forget to get your OpenAI API key and duplicate a public Notion page (to use as a knowledge base).

The project structure of notion-chatbot consists of the following:

- .streamlit/secrets.toml: stores your OpenAI API key.
- faiss_index: a FAISS index (vector database) that stores all the vectors.
- notion_content: a folder containing the Notion content in markdown files.
- .gitignore: ignores your OpenAI API key and Notion content.
- app.py: the script for the Streamlit chat application.
- ingest.py: the script that converts the Notion content to vectors and stores them in a vector index.
- utils.py: the script that creates a Conversational Retrieval Chain.
- requirements.txt: a file listing the packages needed to deploy to Streamlit Community Cloud.

You'll create each file step by step, so there is no need to create them all at once.

Start by creating a project folder called notion-chatbot. Then create a new environment and install the required dependencies:

```
pip install streamlit langchain openai tiktoken faiss-cpu
```

Create a .gitignore file to specify which files Git should not track. In the .streamlit folder, create the file secrets.toml to store your OpenAI API key.

Use the Blendle Employee Handbook as your knowledge base. If you don't have a Notion account, create one first. Open the handbook and select Duplicate in the top-right corner to duplicate it into your Notion workspace.

Next, export the content:

1. Go to the main Notion page of the Blendle Employee Handbook.
2. In the top-right corner, click on the three dots.
3. Select Export and choose Markdown & CSV as the export format.
4. Place the exported notion_content folder into your notion-chatbot project folder.

You can also get your Notion content using Notion's API, but to keep it simple, just export the content manually. Great! You should now have all the Notion content as .md files in the notion_content folder within your notion-chatbot project folder.

To use the content of your Notion page as the knowledge base of your chatbot, convert all the content into vectors and store them. To do this, use LangChain, an OpenAI embedding model, and FAISS.

Open your project folder in your favorite IDE and create a new file called ingest.py:

```python
# ingest.py
from langchain.document_loaders import NotionDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Load the Notion content located in the folder 'notion_content'
loader = NotionDirectoryLoader("notion_content")
documents = loader.load()

# Split the Notion content into smaller chunks
markdown_splitter = RecursiveCharacterTextSplitter(
    separators=["#", "##", "###", "\n\n", "\n"],
    chunk_size=1500,
    chunk_overlap=100,
)
docs = markdown_splitter.split_documents(documents)

# Convert all chunks into vector embeddings using the OpenAI embedding model
embeddings = OpenAIEmbeddings()

# Store all vectors in a FAISS index and save it to the local folder 'faiss_index'
db = FAISS.from_documents(docs, embeddings)
db.save_local("faiss_index")

print("Local FAISS index has been successfully saved.")
```

The script does the following:

- Loads the Notion content located in the notion_content folder using NotionDirectoryLoader.
- Splits the content into smaller text chunks using RecursiveCharacterTextSplitter. Your Notion content consists of markdown files with headings (# for H1, ## for H2, ### for H3), so split on those specific characters. If the split can't be done on headings, the splitter falls back to the later separators in the list. This ensures that you split the content at the best place: between paragraphs, and not between the sentences of the same paragraph. RecursiveCharacterTextSplitter follows the order of the list of separators you provide, meaning it moves on to the next separator in the list until the chunks are small enough. Use a chunk size of 1500 with an overlap of 100 (feel free to experiment with different values).
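The contents of the secrets file were not preserved in this copy. A minimal sketch, assuming the app reads the key under the name `OPENAI_API_KEY` (both the key name and the placeholder value are hypothetical; use whatever name your app code expects):

```toml
# .streamlit/secrets.toml -- never commit this file
OPENAI_API_KEY = "your-openai-api-key-here"
```

Your .gitignore would then typically list at least `.streamlit/` and `notion_content/`, since the goal stated above is to keep your OpenAI API key and Notion content out of Git.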
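To build intuition for why the separator order matters, here is a simplified, pure-Python sketch of the recursive splitting idea (an illustration only, not LangChain's actual implementation): split on the first separator, and any piece still larger than the chunk size is re-split with the next separator in the list.

```python
def recursive_split(text, separators, chunk_size):
    """Simplified sketch of recursive character splitting.

    Illustration only -- not LangChain's real implementation
    (it also merges small pieces and applies chunk overlap).
    """
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in (p for p in text.split(sep) if p):
        if len(piece) <= chunk_size:
            chunks.append(piece)
        else:
            # Piece is still too large: fall back to the next separator.
            chunks.extend(recursive_split(piece, rest, chunk_size))
    return chunks


doc = "## Intro\nShort paragraph.\n\n" + "A" * 50 + " " + "B" * 50
chunks = recursive_split(doc, ["##", "\n\n", " "], chunk_size=40)
```

Here the long line exceeds the chunk size, so it is split on `"\n\n"` and then on spaces, while the short heading paragraph is kept intact.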
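To see what the vector database buys you at query time, here is a tiny, self-contained sketch of similarity search over embeddings (toy 3-dimensional vectors, not real OpenAI embeddings, which have far more dimensions): the chunk whose vector is most similar to the question's vector is the one retrieved as context.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" for two stored chunks.
chunk_vectors = {
    "vacation policy": [0.9, 0.1, 0.0],
    "office address": [0.0, 0.2, 0.9],
}

# Toy embedding for a question like "How many vacation days do I get?"
query_vector = [0.8, 0.2, 0.1]

best_chunk = max(
    chunk_vectors,
    key=lambda name: cosine_similarity(query_vector, chunk_vectors[name]),
)
```

FAISS does the same kind of nearest-neighbor lookup, but over thousands of high-dimensional vectors with efficient indexing instead of a brute-force loop.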