
Sitemap issue



This follows on from other threads about the change of URL structure breaking inbound links, and in particular the need for search engines to re-spider the site so that they pick up the new URLs. I was going to post in one of those threads that the search engines should update fairly quickly: so long as the old links now return a 404 code (which they do, I checked) and there's a valid sitemap, it should sort itself out.

 

Now, I was going to say that there's a valid sitemap, so I checked. But... there isn't. There is a valid robots.txt, which is good, because search engines will check there first when looking for a sitemap. However, the relevant line in robots.txt says:

 

Sitemap: https://www.rmweb.co.uk/community-hide/sitemap.php

 

Unfortunately, that's not the correct URL. The actual location of the sitemap is https://www.rmweb.co.uk/sitemap.php (which I found by just guessing), but that in turn has links to child sitemaps, all of which have the same spurious 'community-hide' in the URL.

 

So, at the moment, there isn't a sitemap accessible to search engines: the one in robots.txt isn't valid, there isn't one at the fallback default URL, and search engines aren't going to guess by chopping and changing the URL in the way that I did. That means the only way for search engines to index the site right now is to start at the front page and recursively follow all the links, which is a lot slower than using a sitemap.
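
For anyone who wants to repeat the check, here is a minimal Python sketch of what I did by hand: pull the Sitemap entries out of a robots.txt body, then try the same URL with the suspect path segment removed. The function names `sitemap_urls` and `drop_segment` are my own for illustration, and the sample text contains only the one line quoted above, not the site's full robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit

# Only the line quoted in this post; the real file has more in it.
ROBOTS_TXT = "Sitemap: https://www.rmweb.co.uk/community-hide/sitemap.php\n"


def sitemap_urls(robots_txt):
    """Return the URLs named on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so the URL's own colons survive.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def drop_segment(url, segment):
    """Return url with every path segment equal to `segment` removed."""
    parts = urlsplit(url)
    kept = [p for p in parts.path.split("/") if p != segment]
    return urlunsplit(parts._replace(path="/".join(kept)))


listed = sitemap_urls(ROBOTS_TXT)
guessed = [drop_segment(u, "community-hide") for u in listed]
print(listed)
print(guessed)
```

Neither function touches the network; each resulting URL still has to be requested to confirm that it really serves a sitemap.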

 

I'm pretty sure this is something which needs to be fixed by Invision, given that the sitemap is dynamically generated by the software (and the robots.txt is otherwise correct for the current version, so it's not just a case of old data still being on the site). And I'm aware that it's Sunday evening as I write, and this is hardly a mission critical problem, so I'm not expecting anyone to look at this until tomorrow at the earliest, or maybe even later in the week, and so I'm certainly not going to complain if I don't get an update for a while. But I thought I'd mention it now, while it's fresh in my mind, and you can pick up on it as and when you've got sufficient circular tuits in stock!


  • 2 weeks later...

I should add that this does now seem to be resolved, so presumably someone has given the software the appropriate prod. Google is now indexing the correct URLs; at last check it had around 56,000 in the new form and 177,000 in the old form. That should give an idea of how long it's likely to take before they're all correct.
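
To put a rough number on that, here is a back-of-envelope sketch in Python. It rests on assumptions of mine rather than anything measured: that Google has been picking up new-form URLs at a steady rate, that every old-form URL will eventually be replaced one for one, and that about 14 days have passed since the sitemap was fixed (I don't know the actual date, so adjust `elapsed_days` to suit).

```python
def indexing_progress(new_count, old_count, elapsed_days):
    """Estimate the share of URLs already in the new form and the days left.

    Assumes a constant indexing rate and a one-for-one replacement of old
    URLs by new ones; both are simplifications.
    """
    fraction_done = new_count / (new_count + old_count)
    rate_per_day = new_count / elapsed_days
    days_remaining = old_count / rate_per_day
    return fraction_done, days_remaining


# 56,000 and 177,000 are the counts above; 14 days is a guess.
fraction, days_left = indexing_progress(56_000, 177_000, elapsed_days=14)
print(f"{fraction:.0%} in the new form, roughly {days_left:.0f} more days at this rate")
```

On those assumptions about a quarter of the URLs have moved over, with something like six more weeks to go. Crawl rates are rarely that steady in practice, so treat it as an order of magnitude only.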

 

One thing that might help would be to add an explicit entry in robots.txt to disallow crawling of URLs in /community/*. I don't think that will necessarily speed up indexing of the new ones, but it should help clear out the old ones, so that people don't end up following dead links.
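
As a quick sanity check of what such a rule would and wouldn't cover, here is a small sketch using Python's standard `urllib.robotparser`. The two-line rule set is hypothetical, not the site's actual robots.txt, and the example paths other than `/sitemap.php` are made up for illustration. Python's parser does plain prefix matching rather than wildcard matching, so the rule is written as `/community/` instead of `/community/*`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set, for illustration only.
PROPOSED_RULES = """\
User-agent: *
Disallow: /community/
"""

parser = RobotFileParser()
parser.parse(PROPOSED_RULES.splitlines())

# Example URLs; only /sitemap.php is taken from this thread.
checks = [
    "https://www.rmweb.co.uk/community/index.php",
    "https://www.rmweb.co.uk/sitemap.php",
    "https://www.rmweb.co.uk/community-hide/sitemap.php",
]
for url in checks:
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(f"{verdict}: {url}")
```

The prefix `/community/` doesn't catch `/community-hide/`, so the two can't be confused. Bear in mind that a Disallow rule tells crawlers not to fetch those URLs; whether and how quickly a search engine then drops them from its index is up to the search engine.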

