Hi, I'm just trying to create my first website support chatbot, using our website (https://www.doorsonlineuk.co.uk) as the Knowledge Base. I want the whole site to be crawled, but after a few pages I get a ton of "Network Error RESPONSE HEADERS: Undefined" errors scrolling up the right-hand side of the page. I assume this is our server or Cloudflare detecting an unknown bot or overzealous crawling and blocking Lindy's crawler. Can someone tell me if I'm correct, and if so, what IP/user-agent I need to whitelist to get the whole site crawled? (I intend to crawl by XML sitemap rather than by web pages, as otherwise a ton of query-parameter URLs will get crawled, and I can't see a way to set rules for that kind of thing in Lindy's knowledge base crawl.)
Hey Ian! Oh, interesting. I'm actually not sure. Are you able to send over the Lindy that's having the issue, plus a screenshot of where you see this? I can connect with my team.
Sure, no problem. Here's a video of the errors flying up the page (I couldn't get a screenshot while they were rolling through!): https://www.awesomescreenshot.com/video/44811261?key=ad4695069c4efe3e509e39808d212489 The Lindy is https://chat.lindy.ai/doors-more/lindy/website-customer-support-68dc0da494bf5717c7cf5b41/editor and the sitemap I'm trying to crawl is here: https://www.doorsonlineuk.co.uk/sitemap_index.xml
OK, got it, thanks Ian! Ugh, unfortunately I'm unable to help here, so I'll have to report this to my eng team. I'll follow up ASAP!
Hey Ian! Ugh, apologies, unfortunately nothing yet from my team 😞 I can see it hasn't been dropped, though; they're still looking into it. I'll ping them 🙏
