AgenticFlow

Q: Can it scrape a URL or two for its knowledge base before performing each task? I ask because the storage seems low.

I'm just trying to figure out if there are ways to increase the knowledge base without filling up in-app storage.

jsamplesjr (Jun 3, 2025)

SeanP_AgenticFlowAI, Founder Team (Jun 4, 2025)

A: Hey Jsamplesjr,

That's a really smart question about managing knowledge and storage, and you've hit on a key concept!

1. Understanding AgenticFlow Knowledge Storage (It's Not Just File Size):

You're right, the storage limits (e.g., 100MB on Tier 1/2, up to 2GB on Tier 4) might seem modest if you're thinking purely in terms of raw PDF or DOCX file sizes.

However, our "Knowledge Storage" refers to the space taken up by the vectorized embeddings of your content. When you upload a document or provide a URL, we process the text, break it into meaningful chunks, and then convert those chunks into these special numerical representations (embeddings) that the AI uses for fast, semantic searching (this is the RAG part).
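If it helps to picture that indexing step, here's a minimal, generic sketch of the chunk-and-embed process. This illustrates the general RAG indexing idea only, not AgenticFlow's internal code; the OpenAI client, model name, and chunk sizes are assumptions for the example.

```python
# Illustrative RAG indexing sketch: split text into chunks, embed each chunk,
# and keep the vectors for semantic search. Not AgenticFlow internals.
from openai import OpenAI  # assumes the official OpenAI Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Naive character-based chunking with a little overlap between chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Turn each text chunk into a 1536-dimensional embedding vector."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    return [item.embedding for item in resp.data]

vectors = embed_chunks(chunk_text(open("my_doc.txt").read()))
# Each vector is then stored in a vector index alongside its source chunk,
# so the agent can retrieve the most relevant chunks at question time.
```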

A single embedding for a text chunk is quite small (e.g., around 6KB for 1536 dimensions). This means 1GB of our "Knowledge Storage" can hold a massive amount of textual information – think tens of thousands, or even hundreds of thousands, of text chunks. So that 2GB on Tier 4 really is a lot of actual, usable knowledge for your agents.
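As a quick back-of-the-envelope check on those numbers (assuming roughly 4 bytes per dimension, i.e. about 6KB per 1536-dimension embedding):

```python
# Rough capacity estimate: how many ~6KB embeddings fit in 1GB / 2GB.
bytes_per_embedding = 1536 * 4              # 1536 float32 dims ≈ 6,144 bytes
chunks_per_gb = (1024 ** 3) // bytes_per_embedding
print(chunks_per_gb)        # ~174,000 chunks per GB
print(2 * chunks_per_gb)    # ~349,000 chunks in the 2GB Tier 4 allowance
```

Real-world capacity will be somewhat lower once stored chunk text and metadata are counted, but it shows why the limits go much further than raw file sizes suggest.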

2. Dynamically Scraping URLs Before Each Task (Your Excellent Idea):

Yes, you can absolutely design your AgenticFlow agents or workflows to scrape a URL (or a couple of URLs) for fresh context before performing each task, rather than relying solely on pre-loaded, static knowledge. This is a great way to work with dynamic information or to augment a smaller persistent knowledge base.

Here's how (a minimal code sketch of the same pattern follows these steps):

Workflow/Agent Step 1: Web Scraping:
When a task starts (e.g., user asks the agent a question), the first step can be to use our Web Scraping node or a more robust MCP like Firecrawl (https://agenticflow.ai/mcp/firecrawl) or Apify (https://agenticflow.ai/mcp/apify) to fetch live content from the specific URL(s) relevant to that task.

Workflow/Agent Step 2: AI Processing:
The scraped text from these URLs is then passed as dynamic, just-in-time context to a subsequent LLM node along with the user's original query or the main task input.

The LLM uses this freshly scraped information (plus any information it retrieves from your persistent vectorized Knowledge Base, if you've also configured one) to generate its response or complete the task.
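Outside the visual builder, the same two-step pattern looks roughly like this in plain Python. The requests, BeautifulSoup, and OpenAI libraries here are stand-ins for AgenticFlow's Web Scraping/MCP and LLM nodes, purely for illustration; the model name and prompt are assumptions.

```python
# Stand-alone sketch of the "scrape first, then ask the LLM" pattern.
import requests
from bs4 import BeautifulSoup
from openai import OpenAI

client = OpenAI()

def scrape_page(url: str) -> str:
    """Step 1: fetch the page and strip it down to plain text."""
    html = requests.get(url, timeout=15).text
    return BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)

def answer_with_live_context(question: str, urls: list[str]) -> str:
    """Step 2: pass the freshly scraped text to the LLM as just-in-time context."""
    context = "\n\n".join(scrape_page(u) for u in urls)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer_with_live_context("What changed in the latest release?",
                               ["https://example.com/changelog"]))
```

In AgenticFlow itself you'd simply chain a scraping node into an LLM node and map the scraped text into the prompt, so no code is required; the sketch is just to make the data flow concrete.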

Advantages of This "Just-in-Time" Scraping:
- Always Fresh Info: The agent uses the most up-to-date content from the web for that specific task.
- Optimizes Persistent Storage: You reserve your persistent Knowledge Storage (the 100MB-2GB) for core, foundational, or less volatile information, while highly dynamic info is fetched on demand.
- Targeted Knowledge: You scrape only the pages most relevant to the immediate task, providing highly focused context to the LLM.

Considerations:
- Scraping Time: Each live scrape adds a little to the task execution time.
- Reliability: Success depends on the target site's accessibility and structure (robust scrapers like Apify/Firecrawl help here).
- Credit Usage: Each web scraping step and LLM processing step will consume AgenticFlow credits.

This dynamic scraping approach is a very powerful way to keep your agents informed with the latest data without necessarily filling up all your persistent vectorized storage with content that changes daily. You're thinking exactly right!

— Sean
