Monday, September 16, 2013

Why Your Website Is Not Cached in Google

For people who are asking “what is the reason not to show cache of a website”, here is a brief rundown of possible explanations.

Someone left the following comment a week ago. I have tried twice to send an email to the person, but their mail service says the mailbox is unavailable. Normally I would leave matters at that, but this question comes up from time to time, so I thought I would write a brief article about it. To the original commenter, I hope you see this. Your site should be fine.

The reason why your site is not cached in Google is the “NoArchive” directive that you found in the settings. Turning that off effectively removes the “noarchive” directive from your pages. Google and other search engines will now cache your pages.

Does the “NoArchive” directive affect rankings or search visibility? Not that I have ever seen. People sometimes use this directive to prevent search engines from copying their content for redistribution. That is, even if you do something on your Web pages to prevent visitors from copying and pasting the content elsewhere, a search engine’s cached copy of the page can still be used to copy and paste it. Preventing the search engine from caching the page makes copying it a little more difficult.

For people who want to prevent copying of their content, I should point out that every time a visitor loads your page, their browser copies everything to their local hard drive. The visitor can ALWAYS grab your content and republish it elsewhere. Streaming content doesn’t work quite that way, but there are programs available on the Web that will record streaming feeds.

Remember the old adage: If you put something on the Internet, it’s out there to stay.

There is at least one other reason why a search engine may show a page in its SERPs without caching it. For example, in the comment above, the writer checked his “robots.txt” file. If you block a URL in your “robots.txt” file but a search engine finds links pointing to that URL, the search engine may still show that URL in its results. These listings are sometimes referred to as “URL-only” or “uncrawled” listings.

Google won’t show these URLs in its safe search mode, but if you loosen the search restrictions you may see them from time to time. The search engine displays the uncrawled URL to let you know it’s there, but that it doesn’t know anything about the content of the page.
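To illustrate, a robots.txt rule like this one (the “/private/” directory is a made-up example) tells crawlers not to fetch those pages, but it does not tell the search engine to forget the URLs:

User-agent: *
Disallow: /private/

If other sites link to a page under /private/, Google can still list that URL — it just cannot crawl or cache whatever is on it.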

For this reason, when you want to remove content from a search engine’s listings, the search engines recommend that you allow them to crawl the pages but embed a “noindex” directive in your meta tags. Robots meta directives look like this:

<meta name=”robots” content=”" > (No restrictions)
<meta name=”robots” content=”noindex” > (Do not index this page)
<meta name=”robots” content=”noarchive” > (Do not archive/cache this page)
<meta name=”robots” content=”nofollow” > (Do not follow any links on this page)
<meta name=”robots” content=”noodp” > (Do not use the Open Directory Project description in SERPs)
<meta name=”robots” content=”noydir” > (Do not use the Yahoo! Directory description in SERPs)

You can combine them, separated by commas:

<meta name="robots" content="noindex,noarchive,nofollow,noodp,noydir" >

You can give positive directives as well, although most SEOs feel these are redundant and unnecessary:

<meta name="robots" content="index,archive,follow" >
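
By the way, these directives do not have to appear in the HTML at all. Google also honors them when they are sent in an “X-Robots-Tag” HTTP response header, which is useful for non-HTML files such as PDFs that have no meta tags. Here is a minimal sketch for Apache’s mod_headers module (the .pdf pattern is only an example):

<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>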

Another reason why a search engine may not show cache data for a page is that it has only recently crawled that page and has not yet fully integrated the data into its index. This situation is temporary and, in my experience, usually only lasts for a few days.


Finally, if you serve your page content through AJAX or other JavaScript code, the search engine may cache only the JavaScript itself and not the content that the code renders for visitors.
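
Here is a minimal illustration of the problem (the element id and message are made up). The HTML the crawler fetches and caches contains only an empty placeholder; the visible text appears only after the script runs in a visitor’s browser:

<!-- What the crawler fetches and caches: an empty placeholder -->
<div id="content"></div>
<script>
// What a visitor's browser runs; this text never appears in the cached HTML
document.getElementById("content").innerHTML = "Hello, visitor!";
</script>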
