Duplicate Content Pages Issue

Dear friends,

I have a problem: pages on one of my client's websites are being read in a text version.

An external webmaster has linked to us as "www.abz.com/xyz.php/", which is the wrong URL: he put a slash '/' at the end. Google is now reading that URL in a text version and will treat it as a duplicate of our real page, "www.abz.com/xyz.php".

Please help me understand why, and from where, Google is reading these files.
Posts

  • It's a problem that I've faced many times: people will link to all sorts of incorrect versions of your pages. You need strict URL validation rules set up in your .htaccess (provided you are on Apache hosting, of course) to strip out these incorrect URLs and rewrite them to the way they should be.

    Add this to your .htaccess file in your site root:

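    # Turn on URL rewriting
    RewriteEngine on
    # Skip real directories, which legitimately end in a slash
    RewriteCond %{REQUEST_FILENAME} !-d
    # Only act on requests that end in a trailing slash
    RewriteCond %{REQUEST_URI} ^(.+)/$
    # 301-redirect to the same path without the trailing slash
    RewriteRule ^(.+)/$ /$1 [R=301,L]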

    You MAY need

    RewriteBase /

    as well at the start of that, depending on your specific host. For instance, Rackspace Cloud requires it.

    That should sort it out for you!

    MOGmartin
  • PS: Google is reading those files because someone linked to them, and they render a page.

    The .htaccess hack I mentioned above will solve the problem though.
  • Thank you very much.

    I will try it now and get back to you.
  • Dear Martin,

    I have discussed this with our developer and he says it cannot be done. Our web server does not have these files, so why is Google reading them whenever a visitor or developer puts '/' at the end of the URL?
  • If the .htaccess file doesn't work on your server, it's because it's a Windows / ASP server, which isn't the best, to be honest, for exactly this reason.

    You can still do it though by adding this to your server config... your developer should know how to add this:










    Let me know if it works!


    EDIT: It's VERY unusual for a PHP server not to recognise a .htaccess file, mate. Who do you host with?
  • Are you talking about redirecting the URL in some other way?
  • No, not really...

    So that we don't get fragmented here:

    1) If your server is Linux / Apache / PHP / MySQL
    This is what you do:

    Create a file in Notepad and call it ".htaccess"

    Copy and paste this into the file:

    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_URI} ^(.+)/$
    RewriteRule ^(.+)/$ /$1 [R=301,L]

    Save it, and then upload it to the root directory of your website.

    2) If your server is IIS / MS / ASP / MSSQL

    Add these rules to your server config file:










    --------------------------------------------------------------------------------

    As you said in your original post that your file names end in .php, I'm 99.999% sure that the first one is the one that you should do.

    PHP generally runs on Linux Apache servers.

    I would be happy to bet that it would solve your problem. If your developer is saying otherwise, I would get a second opinion, mate.


    Thanks!

    MOGmartin
  • Dear Martin,

    The use of .htaccess files can be disabled completely by setting the AllowOverride directive to "none"

    AllowOverride None

    What does this mean?

    Can you please explain this to me?
  • The only time I've EVER heard of a host using AllowOverride None is because they are a free host with security issues that .htaccess could be used to abuse.

    EVERY decent host in the world supports .htaccess. Without it, things like WordPress just wouldn't work properly.

    Who are you hosting with?
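
    For context, AllowOverride lives in the main server or virtual host configuration, not in .htaccess itself, so only the host or server administrator can change it. A minimal sketch of a setting that would let the rewrite rules in this thread work from .htaccess is shown below. The directory path is a placeholder, and the exact options a given host uses are an assumption:

    ```apache
    # Main server config or virtual host (this cannot go in .htaccess).
    # "/var/www/example" is a hypothetical document root.
    <Directory "/var/www/example">
        # FileInfo is the override class that permits mod_rewrite
        # directives (RewriteEngine, RewriteCond, RewriteRule) in .htaccess
        AllowOverride FileInfo
        # mod_rewrite in .htaccess also needs symlink following enabled
        Options +FollowSymLinks
    </Directory>
    ```

    With AllowOverride None, Apache ignores .htaccess files entirely, so the rules would have no effect there.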
  • RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_URI} ^(.+)/$
    RewriteRule ^(.+)/$ /$1 [R=301,L]

    Can you explain this to me? Is this rewriting the URL?

    Note: My main doubt is that there are no files ending in '/' on our server, yet Google is still able to read them from our website. How? Is there a problem with the URL rewriting code?
  • RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_URI} ^(.+)/$
    RewriteRule ^(.+)/$ /$1 [R=301,L]

    If you don't mind, can you explain what this code does?
  • said:

    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_URI} ^(.+)/$
    RewriteRule ^(.+)/$ /$1 [R=301,L]

    If you don't mind, can you explain what this code does?

    Line #1 – The ability for the server to rewrite URLs needs to be turned on.

    Line #2 – Check that the requested path is not a directory, signified by !-d (to check that it is not a regular file, use !-f instead).

    Line #3 – Check whether the URI that was requested has a trailing slash.

    Line #4 – If the requested URI did have a trailing slash, use a regular expression to capture the URL up to the slash and then perform a 301 redirect to the URI without the trailing slash.




    ------




    It's not a problem of coding, it's just a webserver issue.

    If you can go to

    yoursite.com/file.php
    AND
    yoursite.com/file.php/

    and they BOTH display a page (the same one), then Google will see them as two different pages, as they have distinct URLs. It's a mistake on Google's part really.

    But add that code to your .htaccess file as I outlined before, and your problem is solved!
  • Thank you, but I am still a little bit confused...

    How is Google able to see and read these pages if they are not in the server's root directory?
  • What you are describing is the solution...

    but why and from where is Google reading these pages?
  • You can see them in a browser, right?

    If so, they "exist", irrespective of whether they actually exist in your file structure.
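
    A likely reason the slash URL renders at all, assuming the site runs PHP as an Apache handler: Apache can treat everything after the script name as extra "path info" and still run the script. Relative links to stylesheets and images then resolve against the longer URL and fail to load, which may be why the page looks like a text version. As a sketch, the AcceptPathInfo directive can switch this behaviour off so that such URLs return a 404 instead:

    ```apache
    # .htaccess in the site root (needs AllowOverride FileInfo on the host).
    # Reject requests that carry trailing path info after a real file,
    # such as /xyz.php/ or /xyz.php/index.php.
    AcceptPathInfo Off
    ```

    This is an alternative to redirecting rather than a companion to it: a 404 simply drops the bad URL, whereas a 301 sends visitors and crawlers on to the correct page. Other PHP setups (for example, PHP behind a proxy) may not honour this directive.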
  • Actually they are indexed by Google.

    We have blocked all the URLs that end in '/' in robots.txt as well as in Google Webmaster Tools.

    I am not convinced by this solution. So I am trying to find out how Google is reading these pages when they are not in the root directory. I have confirmed with the developer that these pages do not exist in the root directory.

    So, I need to know from where Google is reading these pages and how it is possible for it to read them.

    Yes, I could see them in the browser before I blocked them in robots.txt.
  • said:

    Actually they are indexed by Google.

    We have blocked all the URLs that end in '/' in robots.txt as well as in Google Webmaster Tools.

    I am not convinced by this solution. So I am trying to find out how Google is reading these pages when they are not in the root directory. I have confirmed with the developer that these pages do not exist in the root directory.

    So, I need to know from where Google is reading these pages and how it is possible for it to read them.

    Yes, I could see them in the browser before I blocked them in robots.txt.

    Mate, you're getting a bit confused here:

    1) The pages that you are seeing in your browser don't need to exist as files.

    2) Blocking them in robots.txt will have no effect on your browser.

    3) You DON'T need to block them in robots.txt.

    4) You MUST redirect them using the .htaccess file that I mentioned before, to consolidate link strength.

    5) Google is reading these pages because someone has LINKED to them incorrectly!!


    If you follow the steps I gave earlier, the problem will be fixed!!!
  • Yes, you're right...

    But how long can we keep correcting external links?

    I want a permanent solution instead of redirection.


    Google is making another page in a text version (just like the Google cache text version) from the original page and indexing it in its database.
    Ex:
    home page: www.mypage.com/index.php
    original page: www.mypage.com/xyx.php
    text version: www.mypage.com/xyx.php/index.php

    I want to know how it is making and reading this text-version page.

    Here is my example. Google should read only up to .php, but because of the external link mistake the URL has been published with a slash at the end. I want to know how Google is able to pick up content for the part after the slash if the server does not have these files.
  • Line #4 – If the requested URI did have a trailing slash, then lets use a regular expression to copy the URL up to the slash and then perform a 301 redirect to the URI without the trailing slash.

    This means we are agreeing and telling Google to redirect the '/' URL to the main page.

    I think we are accepting our mistake, that we do have these types of files on our server, but instructing Google not to read them.

    Is that correct?
  • No. The URL with a trailing slash is rendering the page that in reality doesn't have a trailing slash.

    That's your webserver doing that. It's not a mistake, it's just the way that your server is configured.

    There is no right/wrong way, just an optimal and a non-optimal way ;)

    By redirecting the pages to the exact same address you are making it optimal.
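
    The rule given earlier only removes a single trailing slash. A URL such as www.mypage.com/xyx.php/index.php, mentioned above, has extra path segments after the script name and would need a broader pattern. A minimal sketch, assuming the site has no real directories whose names end in ".php" and that the rule sits in the .htaccess of the site root:

    ```apache
    RewriteEngine on
    # Match a .php script followed by a slash and anything else,
    # e.g. xyx.php/ or xyx.php/index.php, and 301-redirect to the script.
    # The lazy quantifier (+?) stops at the first ".php/" in the path.
    RewriteRule ^(.+?\.php)/.*$ /$1 [R=301,L]
    ```

    Under those assumptions a request for /xyx.php/index.php should be redirected to /xyx.php. This is a variation on the thread's rule, not something the original poster supplied, so it is worth testing on a staging copy first.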
  • Martin, I have another doubt...

    Is there any chance that the rewrite code is wrong? Is Google reading the pages like this because of a mistake in the rewrite code?
  • If a page will render on your webserver, and it has a link to it, Google will find it eventually.

    It doesn't matter that your server doesn't contain these files, only that you can navigate to them in a web browser.

    The redirection IS the permanent solution: if you had had it in place, the person would not likely have linked to the "wrong" pages.

    You do understand that the redirection will work for ALL pages on your site, not just the ones where you have this problem, right?
  • said:

    Martin, I have another doubt...

    Is there any chance that the rewrite code is wrong? Is Google reading the pages like this because of a mistake in the rewrite code?

    Nope. It's a fairly normal problem. Just use the .htaccess I wrote and you will be fine.
  • Can I have the correct rewrite code for dynamic pages?
  • said:

    Can I have the correct rewrite code for dynamic pages?

    That depends entirely on the format of the dynamic pages, mate, so "standard" code might not work properly...
  • Our website is built entirely in PHP. When I add articles, it automatically creates a dynamic page.

    Can you give me rewrite code for these pages?
  • It depends on the URL structure... it's not something that I can give an accurate generic answer to. It depends on the case.
  • Where do I get this type of code for dynamic URLs?

    My URL is like this: www.xyz.co.uk/news.php?nid=105486
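
    If the concern is only the stray trailing slash on a URL like the one above, the rule already given in this thread may be enough, because mod_rewrite matches the pattern against the path alone and carries the query string over by default. A sketch under that assumption (other kinds of dynamic URL rewriting would depend on the site's URL structure):

    ```apache
    RewriteEngine on
    RewriteCond %{REQUEST_FILENAME} !-d
    # The pattern is matched against the path only, never the query string.
    # Because the substitution has no "?" of its own, the original query
    # string is appended to the redirect target unchanged.
    RewriteRule ^(.+)/$ /$1 [R=301,L]
    ```

    With this in place, a request for /news.php/?nid=105486 should be redirected to /news.php?nid=105486.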