Given a page:

page x(arg : String) {
  ...
}

With app URL example.org, the same page is currently accessible from any URL starting with example.org/x/<put anything here>, including URLs with an arbitrary number of sub-paths (e.g. example.org/x/s/u/b/pa/th/s).
Is it safe to block URLs with too many arguments and serve the notfound page instead?

The current behavior causes crawlers to index the same page at infinite depth when the page contains a relative URL. At least, that is what we just saw with HTTrack (used for static website copies): it copied the same page over and over, each time going one level deeper via a relative URL that appeared on the page.
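The crawler trap can be reproduced with plain relative-URL resolution (a minimal sketch; the link target `x/index` is a hypothetical relative link that might appear on the page, not taken from the actual app):

```python
from urllib.parse import urljoin

# A relative link on page x, resolved against the URL the crawler is
# currently visiting. Because every deeper URL still serves page x,
# the same link keeps resolving to an ever-deeper URL.
relative_link = "x/index"
url = "http://example.org/x"

seen = []
for _ in range(3):
    url = urljoin(url, relative_link)
    seen.append(url)

# Each iteration adds one path level, so a crawler never runs out of
# "new" URLs even though they all render the same page.
for u in seen:
    print(u)
```

This matches the HTTrack observation: each copy of the page adds one more path segment, so the crawl never terminates on its own.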

Submitted by Elmer van Chastelet on 4 March 2024 at 13:44
