Wordpress Themes - WP Forum at BFA
There will be no more development for Atahualpa (or any other theme), and no support. Also no new registrations. I turned off the donation system. I may turn the forum to read only if it gets abused for spam. Unfortunately I have no time for the forum or the themes. Thanks a lot to the people who helped in all these years, especially Larry and of course: Paul. Take care and stay healthy -- Flynn, Atahualpa developer, Sep 2021


I'm using WP + Atahualpa 3.5.3; not sure how to stop Google from crawling non-existent URLs


#1 · Aug 30, 2011, 11:50 PM · adventure (15 posts · Oct 2010)
I'm using WP + Atahualpa 3.5.3, and I'm not sure how to stop Googlebot from crawling non-existent URLs.
My permalink structure is: "/%category%/%postname%.html"

The URL that users see is: domain.com/category/postname.html
But the URL that Googlebot crawled and reported as a 404 error is: domain.com/postname.html

Does anyone know how to disallow Googlebot from crawling those URLs?
In Webmaster Tools, "Link From" shows as unavailable, so I'm not sure how to find the cause of the problem.
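A minimal sketch of why the 404 URL has that shape: substituting values into the permalink structure gives the URL users see, and the crawled 404 URL is exactly what you get when the %category% segment is left out. The helper `expand_permalink` and the sample values are hypothetical, and this is a simplification, not WordPress's own rewrite code.

```python
def expand_permalink(structure: str, values: dict) -> str:
    # Hypothetical helper: replace each %tag% placeholder with its value,
    # a simplified picture of how a post URL is built from the structure.
    for tag, value in values.items():
        structure = structure.replace(f"%{tag}%", value)
    return structure


post = {"category": "category", "postname": "postname"}

# The configured structure produces the URL users see.
print(expand_permalink("/%category%/%postname%.html", post))  # /category/postname.html

# The same post with the category segment dropped produces the 404 URL.
print(expand_permalink("/%postname%.html", post))             # /postname.html
```

So whatever links to domain.com/postname.html is likely building the URL without the category part, which points at a page, plugin, or sitemap rather than at Googlebot itself.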


Thanks everyone for the help.

Last edited by adventure; Aug 30, 2011 at 11:52 PM. Reason: typo

#2 · Aug 31, 2011, 05:18 AM · juggledad (23,765 posts · Mar 2009)
OSX 10.11.5 WP 4.x Atahualpa(all) Safari, Firefox, Chrome
Remove the '.html' from the permalink structure.
__________________
"Tell me and I forget, teach me and I may remember, involve me and I learn." - Benjamin Franklin
Juggledad | Forum Moderator/Support

#3 · Aug 31, 2011, 06:54 AM · adventure (15 posts · Oct 2010)
Hello Sir,

Thank you so very much.

Let me try deleting the .html, and I'll post an update here soon.

PS 1: But won't all my currently working URLs be gone?

PS 2: I like this saying: 'Give a man a solution and you solve that one issue. Teach a man how to solve problems and you've freed him for a lifetime.'

#4 · Aug 31, 2011, 07:19 AM · adventure (15 posts · Oct 2010)
Dear Mr. Juggledad,

I deleted the .html from the permalink structure on my other sites, and it doesn't work: all the current URLs that are already in Google's search results return 404, because those URLs are already indexed in Google's SERPs.
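Changing the permalink structure breaks indexed URLs unless each old URL is 301-redirected to its new form. A minimal sketch of that old-to-new mapping, assuming the new structure is the old one with ".html" simply deleted (pass "/" as `new_suffix` if the new structure ends with a slash). The function name is hypothetical, and actually serving the redirect (plugin or server rules) is not shown.

```python
from typing import Optional

OLD_SUFFIX = ".html"


def new_path_for(old_path: str, new_suffix: str = "") -> Optional[str]:
    # Only URLs produced by the old "/%category%/%postname%.html" structure
    # end in ".html"; anything else is left alone.
    if not old_path.endswith(OLD_SUFFIX):
        return None
    # Drop ".html" and append whatever the new structure ends with.
    return old_path[: -len(OLD_SUFFIX)] + new_suffix


for old in ["/category/computer.html", "/about/"]:
    target = new_path_for(old)
    if target is not None:
        print(f"301 {old} -> {target}")
```

Running it prints a single line, `301 /category/computer.html -> /category/computer`; the non-.html path is skipped.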

My exact current problem:

Google crawls both WP-domain.com/category/computer.html (this has been working for 2 years)

and WP-domain.com/computer.html, which I started seeing after installing 4 plugins (link checking, XML sitemap, bot check, W3 Total Cache).

Do you think the URL WP-domain.com/computer.html was generated by one of those plugins?

I just disabled three of them (link checking, XML sitemap, bot check), since I think one of them is the source of the problem.


Big thanks

#5 · Aug 31, 2011, 12:12 PM · juggledad (23,765 posts · Mar 2009)
OSX 10.11.5 WP 4.x Atahualpa(all) Safari, Firefox, Chrome
Without knowing the URL it is hard to tell, but with the permalink structure you have, if you created a post or page titled 'computer' it would show up as 'computer.html'.

You might want to look for a plugin that can do a redirect.
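A minimal sketch of the lookup such a redirect needs: a single-segment URL like /computer.html is mapped to /category/computer.html when the slug belongs to a known post. The slug-to-category table and `redirect_target` are hypothetical; on a real site the table would come from the WordPress database, and it should hold posts only, since (as noted above) a page titled 'computer' legitimately lives at computer.html.

```python
import re
from typing import Dict, Optional

# A URL with a single path segment ending in ".html", e.g. "/computer.html".
ORPHAN = re.compile(r"^/([^/]+)\.html$")


def redirect_target(path: str, categories: Dict[str, str]) -> Optional[str]:
    """Return the /category/postname.html URL a 301 should point to, or None."""
    m = ORPHAN.match(path)
    if m is None:
        return None  # already has a category segment, or isn't an .html URL
    slug = m.group(1)
    category = categories.get(slug)
    if category is None:
        return None  # no post with this slug: leave it as a 404
    return f"/{category}/{slug}.html"


# Hypothetical slug -> category table, using the example URLs from this thread.
CATEGORIES = {"computer": "category"}

print(redirect_target("/computer.html", CATEGORIES))  # /category/computer.html
```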

#6 · Aug 31, 2011, 12:59 PM · adventure (15 posts · Oct 2010)
Thank you for your kind reply.

My website is: http://tinyurl.com/3uyuanz

I just added more rules to robots.txt:

User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Disallow: /xmlrpc
Disallow: /tag/*
Disallow: /tag
Disallow: /rss
Disallow: /archives
Disallow: /wget/

Is this correct?
I'm also not sure whether the W3 Total Cache plugin is related to the 404 URL errors.
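One way to check rules like the ones above is to test real paths against them. A minimal sketch, assuming a simplified reading of Google's pattern syntax ("*" matches any run of characters, a trailing "$" anchors the end, matching starts at the beginning of the path) and no Allow rules. Only a subset of the posted Disallow lines is used, and /tagline.html is a hypothetical URL.

```python
import re

# A subset of the Disallow rules posted above.
DISALLOW = ["/wp-admin", "/feed", "*/feed", "*/comments", "/*?", "/tag", "/archives"]


def rule_matches(rule: str, path: str) -> bool:
    # Simplified Google-style matching (assumption, not Google's own parser).
    if not rule:
        return False  # an empty Disallow blocks nothing
    anchored = rule.endswith("$")
    core = rule[:-1] if anchored else rule
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


def is_blocked(path: str, rules=DISALLOW) -> bool:
    return any(rule_matches(rule, path) for rule in rules)


for path in ["/category/computer.html", "/computer.html",
             "/category/feed", "/?s=test", "/tagline.html"]:
    print(f"{path:26} blocked={is_blocked(path)}")
```

Under these assumptions, /computer.html is not blocked by any of the rules, so the robots.txt changes don't address the 404 URLs, while /category/feed is blocked by "*/feed". Note also that "Disallow: /tag" is a prefix rule, so it would block any URL starting with /tag, such as the hypothetical /tagline.html; "/tag/" is narrower.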


After that, I'll try deleting plugins that seem related to link generation, like the sitemap, link checking, and bot checker plugins.

I also deleted the sitemap file on the server, since I'm not sure whether it links to invalid URLs (it might have been generated by a plugin that isn't compatible with my permalink/WordPress configuration).
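Rather than deleting the sitemap blindly, it can be checked for URLs that lack the category segment. A minimal sketch, assuming a standard sitemaps.org `urlset` file; the sample XML is made up from this thread's example URLs. Pages legitimately live at single-segment .html URLs under this structure, so flagged entries still need a manual look.

```python
import re
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Posts under "/%category%/%postname%.html" always have a category segment,
# so a single-segment ".html" path is suspicious.
NO_CATEGORY = re.compile(r"^/[^/]+\.html$")


def suspicious_urls(sitemap_xml: str) -> List[str]:
    root = ET.fromstring(sitemap_xml)
    found = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if NO_CATEGORY.match(urlparse(url).path):
            found.append(url)
    return found


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://domain.com/category/postname.html</loc></url>
  <url><loc>http://domain.com/postname.html</loc></url>
</urlset>"""

print(suspicious_urls(SAMPLE))  # ['http://domain.com/postname.html']
```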

I'll wait a day or two and post an update on the crawling issue.

Thanks, Mr. Juggledad and all readers.

Last edited by adventure; Aug 31, 2011 at 01:06 PM.

#7 · Sep 1, 2011, 01:43 AM · adventure (15 posts · Oct 2010)
Another issue:

How do I set up Atahualpa so that Googlebot does NOT crawl RSS feeds,
or URLs like domain.com/postname.html?


Google Webmaster Tools lists http://mydomain.com/category/feed under the robots.txt restriction tab.
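The feed entry is consistent with the "*/feed" rule posted earlier, and it also shows a limit of robots.txt for the second question: a pattern cannot say "one path segment only", so a rule aimed at domain.com/postname.html also catches the working /category/postname.html URLs. A minimal sketch, using the same simplified reading of Google's wildcard syntax ("*" matches anything, trailing "$" anchors the end); "/*.html$" is a hypothetical rule, not one from this thread.

```python
import re


def google_style_match(rule: str, path: str) -> bool:
    # Simplified Google-style matching (assumption): "*" matches any run of
    # characters, a trailing "$" anchors the end, matching starts at the
    # beginning of the path.
    anchored = rule.endswith("$")
    core = rule[:-1] if anchored else rule
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None


# Why /category/feed is reported: "*/feed" matches it, "/feed" alone would not.
print(google_style_match("*/feed", "/category/feed"))  # True
print(google_style_match("/feed", "/category/feed"))   # False

# A rule meant for /postname.html also blocks the real post URLs.
for path in ["/computer.html", "/category/computer.html"]:
    print(path, google_style_match("/*.html$", path))
```

Under this reading, both .html paths print True, so a redirect (or fixing whatever generates the links) is a better fit than robots.txt for the single-segment URLs.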



Thanks, sir.

#8 · Sep 1, 2011, 06:09 AM · adventure (15 posts · Oct 2010)
It seems the 404 URLs come from the W3 Total Cache plugin.

Does anyone know how to fix the problem ?

#9 · Sep 2, 2011, 12:03 PM · jcsites (1 post · Aug 2011 · U.S.)
This is also my concern. Will this be permanent? Or will a non-existent URL eventually drop down in Google's results once the existing pages are crawled? Thank you


Last edited by jcsites; Dec 12, 2011 at 03:20 AM.

All times are GMT -6.