Dmoz Import - New Mod.


zootreeves
11-16-2005, 05:45 PM
Hi,

Added by David: I have attached the mod to the first post.
I haven't tested, but look forward to seeing what people say about it.

I noticed that people were having some problems with the dmoz import mod and I myself needed one, so I created a new one.

It imports data from the dmoz RDF files (http://rdf.dmoz.org/) and so far seems to be working well. I have indexed all 590,000 categories plus their sites with no problems (other than that it takes several days).

It is not ready to post yet, as it needs a readme and some tidying up, but if anyone is interested I will add the finishing touches and post it.

{url no longer in service}

agolkar
11-16-2005, 06:48 PM
Would be very interested :) Thanks

zootreeves
11-16-2005, 10:07 PM
How do I put an attachment on the post?

Ap0s7le
11-16-2005, 11:25 PM
Hey Ben, you'll need to email it to one of the admins (David, yktan and myself) and we'll attach it to the post for you.

I look forward to seeing the script, best of luck :)

-Casey

zootreeves
11-19-2005, 01:09 AM
Sorry for the slow posting.

Here is the script: (removed, as it has been attached to the first post) (If anyone can post a mirror, or if the mods want to attach it to the post instead, I would be grateful.)

Instructions are included in readme.txt.

Thanks

zootreeves
11-19-2005, 05:50 PM
Anyone have any comments?

Hizson
11-21-2005, 06:18 PM
Since I am fairly new to the directory world, I am wondering why you would import so many links and categories at one time. I thought I had read somewhere that acquiring a lot of links at one time is harmful to PR. Or does this only apply to backlinks?

I am somewhat interested in this mod, but would like to know if there is any way to limit the number of links that get imported, and whether the import can be limited to specific categories.

Thanks,

Jere

David
11-21-2005, 07:36 PM
I've attached the mod to the first post.

zootreeves
11-21-2005, 08:26 PM
Yes, too many outbound links are meant to damage your PR, but I am not actually using it as a link directory for PR. I want to build my own DMOZ-style directory and am just using the phpLinkDirectory script as a base.

You can use links like:

<a href="http://thisisnotmysite.com" rel="nofollow">link text</a>

and this will tell Google not to follow it, so you will not lose any PR. But the owners of the websites you're linking to may not be too pleased about it.

You can limit the imported links to a single category, and you can change the maximum number of links to be imported (although it is not very accurate). Just open up buildcats.php once the mod is installed and it will ask you.

Davilac
11-22-2005, 12:03 PM
Outbound links DO NOT hurt your Google PR, nor your rankings.

ibold
12-30-2005, 12:07 PM
This is not accurate.
Each link is given importance based on the PR of the linking page divided by the number of links on that page, plus other factors such as category, content, etc.
Therefore, if you link to 10 other sites from one of your pages, the internal links on that page will not pass as much PR to your other pages.
Otherwise, a PR8 or PR9 site with PR8 internal pages could link to 10,000 pages (for example) and give them all PR7 or PR8 (from its main page and all its internal pages). It would be a poor system on Google's part if that didn't hurt your PR.
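For reference, the original PageRank formula published by Brin and Page in 1998 spells out the "divided by the number of links" part of this argument. Google's live ranking has changed a great deal since then and is not public, and the toolbar PR (0 to 10) is a rescaled score rather than this raw value, so treat the following as the textbook model only. Here T_1 ... T_n are the pages linking to page A, C(T) is the number of outbound links on page T, and d is the damping factor, usually set to 0.85. The second line is a worked example for a linking page with a raw PR of 4:

```latex
PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}

\frac{0.85 \times 4}{10} = 0.34 \text{ per link (10 outbound links)}, \qquad
\frac{0.85 \times 4}{20} = 0.17 \text{ per link (20 outbound links)}
```

In this model every extra link on a page shrinks the share passed through each of the other links on it, internal ones included; the formula by itself says nothing about penalties or rankings beyond that.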

webwriter
01-04-2006, 06:12 PM
If you are building a directory, you are going to have a lot of outbound links. That's what a directory is.

There are tons of SEO forums out there if you want to know more about it, but the bottom line is that outbound links each get an equal percentage of the "link vote" that page has to give.

Some people think it's better to "hoard" the votes by limiting their outbound links so that all links are internal, however there is considerable evidence that a site designated a "hub" with tightly themed links actually benefits from outbound links and can even become an authority.

Here's an article I read a long time ago about it, but it still is good advice. Links are Good for Business (http://www.highrankings.com/issue100.htm#guest)

Now, does this mod ONLY import the entire DMOZ dump, or can you limit it to certain categories in some way?

ibold
01-09-2006, 03:30 AM
You can index only certain categories, I believe (I have not tried this yet). The import page has a prompt like this:
Base Directory? (/Top for all)
For example, to index only "Kids and Teens" it would be Top/Kids_and_Teens

It also takes quite some time... it has been running for about 30 minutes and just importing the categories hasn't finished yet.

FB
01-09-2006, 07:21 AM
I tried this on one category and it just ended up timing out sometime during the night...tried it again and the same thing. I finally just gave up.

ibold
01-09-2006, 09:05 AM
Had the same experience myself =(

djhomeless
01-16-2006, 03:16 PM
Now what?

I've run through the process twice now, and both times I got the success message. However, there's nothing in the system, and no new records in the database.

Any ideas? I assume this script doesn't hard code the database name?? Is there something to do after the script runs?

Thanks,

Geoffrey

Syphon
01-16-2006, 09:12 PM
Does this only work with DMOZ, or will it extract from any directory? Also, if I extract from DMOZ, can I extract only a certain area and then place it in a different area on my site?

zootreeves
01-27-2006, 08:46 PM
This version should work much better, there was a bug in the last one which resulted in it becoming an endless loop:

{url no longer in service}

nate_king1
01-31-2006, 05:01 AM
How the hell do you download the .gz files? I started it in Firefox 5 times and it always crashes. I started the file with MSIE and it crashed too. Please let us know exactly, step by step, how to get the data, or maybe you can host it so I can download it? I just want the categories that DMOZ has, not the links.
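A browser is a poor tool for files this size. A minimal sketch of the shell route, assuming wget and gzip are available on the machine doing the download: `wget -c` resumes a partially downloaded file instead of starting over, and `gzip -t` reads the whole archive and fails if it is truncated or corrupt, so a broken download is caught before it is uploaded or imported. The function names `fetch_dump` and `check_gz` are hypothetical, not part of the mod.

```shell
# Hypothetical helper: report whether a .gz archive is complete.
# gzip -t tests the archive without writing any output file.
check_gz() {
  if gzip -t "$1" 2>/dev/null; then
    echo "$1: OK"
  else
    echo "$1: damaged or incomplete, download it again" >&2
    return 1
  fi
}

# Hypothetical helper: download (resuming a partial file if one is
# already present) and then test the result.
fetch_dump() {
  wget -c "$1" && check_gz "$(basename "$1")"
}
```

For example, since only the categories are wanted here: `fetch_dump http://rdf.dmoz.org/rdf/structure.rdf.u8.gz`. If the check fails because the download was merely cut short, running the same command again should continue from where it stopped.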

Kevuk2k
01-31-2006, 09:59 AM
I have the entire .gz files and if you want them you can have them; it's 601 MB just for the categories. It does time out, but I think that is misleading, as it is their servers that cause this, not yours. They are under a heavy load as more and more people keep running their script. You need a super fast connection like me! 8)

What I can do is create a zip file, break it down into little pieces for you and upload it to a peer-to-peer file-sharing system, or alternatively you can have the whole thing via MSN Messenger. Amazingly, this is the fastest P2P system out there and they don't know it. All you need is to stay online with me until the whole file has transferred. You WON'T time out, as my connection is fast enough and I will only allow one person at a time to receive it. Seems to be the obvious way around it!

Crow
02-02-2006, 01:32 AM
I've downloaded both files but they downloaded as XML files; is that correct? Also, Structure is 600 MB and Content is 2 GB; is that right? I want to make sure before I upload these to my host.

Julius Romo
03-07-2006, 02:55 PM
Fix runs great!

5 minutes, 136,000 categories imported.

In a few minutes I'll tell you whether the links were imported correctly.

Julius Romo
03-08-2006, 12:56 AM
When categories are finished, the script stops.... :cry: :cry: :cry:

zootreeves
03-08-2006, 10:56 AM
Did you check the checkbox to import links? Try it again, but this time you don't have to select import categories, and see what it does.

Julius Romo
03-08-2006, 11:02 AM
Yes, I tried it twice....

It's a subcategory (Top/World/Español)

Julius Romo
03-08-2006, 11:10 AM
I did it with Firefox... could that be the reason? Did you use Firefox, Explorer...?

See The Web
03-09-2006, 11:50 AM
I've tried using this mod with version 3.0.2 but have received the following errors:

Deleting All existing categories
MySQL error 1061: Duplicate key name 'DIR_INDEX'

Checking for RDF data:
structure.rdf.u8 does not exist in /home/adz1980/public_html/seetheweb/admin/! exiting

Can anyone help me with this?

zootreeves
03-09-2006, 01:21 PM
You need to get the DMOZ RDF data and put it in your admin directory. You can do it via the shell:

# cd /home/path/to/phpLD/admin
# wget http://rdf.dmoz.org/rdf/structure.rdf.u8.gz
# wget http://rdf.dmoz.org/rdf/content.rdf.u8.gz
# gzip -d content.rdf.u8.gz
# gzip -d structure.rdf.u8.gz

I will have a look at the other error, but I don't think it will cause any adverse effects.
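The "structure.rdf.u8 does not exist in .../admin/! exiting" error quoted earlier means the import script did not find the uncompressed file in the admin directory. A small pre-flight check along these lines could catch that before running buildcats.php. It assumes the two file names used by the error message and by the commands above; `prepare_rdf` is a hypothetical helper, not part of the mod.

```shell
# Hypothetical pre-flight check: prepare_rdf [ADMIN_DIR]
# Ensures structure.rdf.u8 and content.rdf.u8 exist (non-empty) in the
# directory, decompressing a .gz copy if only that is present.
prepare_rdf() {
  dir=${1:-.}
  for f in structure.rdf.u8 content.rdf.u8; do
    if [ -s "$dir/$f" ]; then
      echo "$f: present"
    elif [ -f "$dir/$f.gz" ]; then
      # gzip -dc writes to stdout, so the .gz file is left in place
      if gzip -dc "$dir/$f.gz" > "$dir/$f"; then
        echo "$f: decompressed"
      else
        rm -f "$dir/$f"
        echo "$f: could not decompress $f.gz" >&2
        return 1
      fi
    else
      echo "$f: missing from $dir" >&2
      return 1
    fi
  done
}
```

Running `prepare_rdf /home/path/to/phpLD/admin` before opening buildcats.php should either report both files as present or say which one is missing.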

jgsketch
03-14-2006, 04:14 PM
Has anyone been able to get this to work on specific directories yet?

jgsketch
03-15-2006, 07:53 PM
Sorry, never mind. What I meant to ask is: how do you download just a particular category without downloading the entire RDF file? Thanks.
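The thread does not mention a per-category download, so one workaround is to fetch the full dumps once and cut them down locally before uploading and importing. A minimal sketch, assuming the DMOZ RDF layout where each category is a `<Topic r:id="Top/...">` block, each site is an `<ExternalPage>` block with a `<topic>Top/...</topic>` child, and each block's opening and closing tags sit on their own lines. It also assumes the mod's importer accepts a file that simply contains fewer topics; that is untested here. `filter_rdf` is a hypothetical helper name.

```shell
# Hypothetical helper: filter_rdf PREFIX < full.rdf.u8 > subset.rdf.u8
# Keeps the file header/footer plus every Topic, Alias or ExternalPage
# block whose category is PREFIX itself or lies underneath PREFIX.
filter_rdf() {
  awk -v p="$1" '
    # true if category t is p or a descendant of p
    function keep(t) { return t == p || index(t, p "/") == 1 }
    {
      # Start buffering when a top-level block opens.
      if (!inblk && $0 ~ /^[ \t]*<(Topic|Alias|ExternalPage)[ >]/) {
        inblk = 1; buf = ""; id = ""
        # Topic and Alias blocks carry their category in r:id="..."
        if (match($0, /r:id="[^"]*"/)) id = substr($0, RSTART + 6, RLENGTH - 7)
      }
      # Lines outside any block (XML header, RDF open/close) pass through.
      if (!inblk) { print; next }
      buf = buf $0 "\n"
      # ExternalPage blocks carry their category in <topic>...</topic>
      if (match($0, /<topic>[^<]*<\/topic>/)) id = substr($0, RSTART + 7, RLENGTH - 15)
      # On the closing tag, emit the whole block only if it is in range.
      if ($0 ~ /<\/(Topic|Alias|ExternalPage)>/) {
        if (keep(id)) printf "%s", buf
        inblk = 0
      }
    }'
}
```

For example, `filter_rdf Top/Kids_and_Teens < structure.rdf.u8 > structure-kids.rdf.u8`, repeated for content.rdf.u8, then renaming the results to the file names the mod expects. The full files still have to be downloaded once; this only shrinks what gets uploaded and parsed.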

David
03-15-2006, 08:36 PM
I may have that solution shortly.
I am working on this for someone... in this case to get only certain categories.

jgsketch
03-15-2006, 10:40 PM
That's great news, I shall wait patiently.

See The Web
03-18-2006, 02:19 PM
When I use this mod I seem to be able to get the structure in place, but the submit button starts to fail. Any clues?

Notes from start to problem:
-Complete clean install of the latest PHP LD
-Turned ModRewrite On
-Installed Mod and uploaded DMOZ files
-Uncompressed DMOZ files
-Ran Mod...waited for a while
-Checked web directory (www.seetheweb.co.uk) all categories seem to look fine
-Submit button not working

David
03-18-2006, 04:33 PM
Here is the problem:
DMOZ has 11,000,000 categories

With regard to the submit page, we would need to use the mod where only the category the user is submitting to is viewable. Otherwise, it will try to populate the category drop-down list with too much data.

Also, I don't believe phpLD could handle 11,000,000 categories without some advanced MySQL procedures and load balancing.

What I think would be nice is to see how deep we can go and still run. Surely we can go down to the third category level,
e.g. top/business/consulting/internet/
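Before picking a cutoff, it may help to see how many categories each level actually holds. A sketch, assuming each category in structure.rdf.u8 appears as a `<Topic r:id="Top/...">` tag; here depth 0 is `Top`, depth 1 is `Top/Business`, depth 2 is `Top/Business/Consulting`, and so on. `count_topics_by_depth` is a hypothetical helper name.

```shell
# Hypothetical helper: count_topics_by_depth structure.rdf.u8
# Prints "DEPTH COUNT" lines, one per level, sorted by depth.
count_topics_by_depth() {
  # Extract every Topic id, strip the tag around it, then count ids by
  # the number of "/"-separated parts after "Top".
  grep -o '<Topic r:id="[^"]*"' "$1" |
    sed 's/^<Topic r:id="//; s/"$//' |
    awk -F/ '{ n[NF - 1]++ } END { for (d in n) print d, n[d] }' |
    sort -n
}
```

Adding up the counts from depth 1 to the chosen depth gives the number of categories a cutoff at that level would keep, which is a more concrete basis for the performance question than the total.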

zootreeves
03-22-2006, 01:51 PM
Performance is not too bad with all the categories; I just set up a test at http://www.qkos.com/phpLD/.

It is the searches which are really slow. Please do not use the search on the link above, because you will overload the server.

Search speed could be improved by using MySQL full-text search instead of LIKE '%...%' pattern matching.

Garfield
04-07-2006, 01:26 AM
What the heck is 'shell'?
I was able to download the DMOZ files, but when I try to upload them I get an error message that the file type is not supported and that the target cannot handle this type of document... help!

anon
04-07-2006, 02:26 AM
If you don't know what 'shell' is, then you're probably not going to want to go there. And while I could explain it, time doesn't permit, so I shall let Wikipedia do it for me.

http://en.wikipedia.org/wiki/Secure_Shell

Garfield
04-07-2006, 02:37 AM
Yikes! That looks extremely complicated for a web moron like me. I'll just have to figure out a way to get those files there and decompressed. I've gotten this far not knowing anything, maybe I can get just a little further. :)

jgsketch
04-07-2006, 07:45 PM
Lol, it was too much for my brain too. I'm still doing it by hand. Got 435 links and counting for the first month. I'll continue until I find a solution that works. I should be done in about 9 more months. :)

I'm kind of glad, though: I'm able to weed out all the foreign links, of which there are a lot, and free-site links from Geosites, Homestead, etc. I even found some links that did not belong in their category, such as web design in a pet category. I didn't realize how many mistakes are in DMOZ until I started adding sites by hand. I even found some redirects to [censored] sites, which I reported. So even if I do find a solution, I might just stick to my current method: better-quality links, unless I make a mistake myself. I would never attempt to do this on a large scale, though.

anon
04-07-2006, 08:57 PM
I'm a web nerd. :D

florian
04-21-2006, 07:35 PM
Importing categories works, but when it comes to importing links the script stops. Any idea, anyone?

bkiyer
05-05-2006, 03:40 PM
Your "submit link" menu should point to "submit.php", which is the original name of the file. Right now, it is trying to call a file named "submitlink.php". Do you have any file by that name?

Steven Myers
07-23-2006, 04:01 AM
*Bump* When buildcats.php says "/public_html/admin/!", does the ! mean this is a folder?

kservik
08-28-2006, 04:58 AM
Could you provide a .txt with information on how to set this script up?

maxxfusion
08-29-2006, 05:21 PM
I just bought and installed PLD and am now trying to use this script. I get

Warning: Smarty error: unable to read resource: "admin/buildcats.tpl" in /home/thepumas/public_html/links/libs/smarty/Smarty.class.php on line 1095

David
08-29-2006, 05:42 PM
I can tell you that we have never gotten a "perfect" solution to DMOZ importing.

Are you trying to import all of DMOZ or just part of it?

David
08-29-2006, 05:45 PM
I might add that Google seems to identify copies of DMOZ, and in most cases, the pages get nailed with a duplicate content penalty. Creating your own categories or importing from another source seems to be more solid from an SEO standpoint.

maxxfusion
08-29-2006, 05:50 PM
Part of it would be fine. I just want some links in there to get things started.

Steven Myers
08-29-2006, 05:57 PM
Offer a free service for a few weeks or a month, and you'll start to receive a good flow of traffic and submissions :)

kservik
08-29-2006, 11:33 PM
I just want a small portion of the ODP, but I can't seem to get the DMOZ import working :(

Regarding SEO, I am just using it as a base, as a DMOZ clone is useless and only brands your site negatively in Google's eyes.

maxxfusion
08-30-2006, 12:56 AM
Like what? Any ideas?

Steven Myers
08-30-2006, 01:04 AM
Let them have free submissions for at least 1 month, but no longer than that; it gives you a chance to get noticed and improve your traffic a bit. Then do A LOT of manual submitting to free directories. This builds up your backlinks and gets Google indexing your site through the links on those other sites.

maxxfusion
08-30-2006, 01:30 AM
When you say free, do you mean free for links without the reciprocal link? Right now I have it set up so that those are free and a standard link is 3.95 for the year.

hani
08-30-2006, 04:49 PM
Hello,

I downloaded the mod "dirzip - fix.rar" and copied the files into the right folders.
I logged in to the admin menu, but I don't see any changes.
The download doesn't contain any readme.txt.

How can I use this mod?
Maybe anyone can post the readme.txt in this forum?

Best regards
hani

Steven Myers
08-30-2006, 05:38 PM
If you're not getting at least 150+ unique hits a day, I wouldn't even recommend charging a fee; the basic rule is to charge only when you have enough traffic to justify the payment they make to you.

Set the Reciprocal as free and activate the Free Links option to YES. Set the featured to 3.95 as this is a basic price if you're not getting over 150+ unique hits a day. (See your cPanel or any other webstats tool you have for this.)

hani
09-21-2006, 11:23 AM
Hello,

I can't run it with the new version of phpLD.
Please, would someone (with PHP knowledge) modify it for the new phpLD version?
Maybe the admin?

Best regards,
hani