Block URLs with more arguments than page definition
Given a page
page x( arg : String ){ ... }
with app url example.org, the same page is currently accessible from any URL starting with example.org/x/<put anything here>, including URLs with an arbitrary number of sub-paths (example.org/x/s/u/b/pa/th/s).
Is it safe to block URLs with too many arguments, and serve the notfound page instead? The current behavior causes crawlers to index the same page at infinite depth when the page contains a relative URL. At least, this is what we just saw with HTTrack (used for static website copies), which copied the same page over and over again, each time adding one path level via a relative URL that appeared on the page.
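For illustration only, here is a minimal sketch (in Python, not WebDSL internals) of the proposed behavior: a dispatcher that compares the number of path segments against the page's declared argument count and serves notfound on a mismatch. The page table, function names, and return values are hypothetical.

# Minimal sketch, assuming a table mapping page names to their
# declared argument counts (hypothetical, not the WebDSL implementation).
PAGE_ARG_COUNTS = {
    "x": 1,  # page x( arg : String ) declares exactly one argument
}

def resolve(path: str) -> str:
    """Map a request path to a page, serving notfound when the number
    of path segments does not match the declared argument count."""
    segments = [s for s in path.strip("/").split("/") if s]
    if not segments:
        return "root"
    page, args = segments[0], segments[1:]
    expected = PAGE_ARG_COUNTS.get(page)
    if expected is None or len(args) != expected:
        return "notfound"  # block URLs with too many (or too few) arguments
    return f"page {page}({', '.join(args)})"

print(resolve("/x/a"))      # -> page x(a)
print(resolve("/x/s/u/b"))  # -> notfound (extra sub-paths are rejected)

Under this scheme, example.org/x/s/u/b/pa/th/s would no longer resolve to page x, so a crawler following relative URLs could not descend indefinitely.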
Submitted by Elmer van Chastelet on 4 March 2024 at 13:44