DMOZ Import Mod - Broken
David
08-04-2005, 04:22 PM
Please note: as the writer of this mod (f1gm3nt) has said, there are some bugs, and for some categories it can take hours and hours to import everything, so use it at your own risk. Using it on an existing directory is not recommended until you have tried it and understand how this mod works. Create a test directory first.
f1gm3nt
08-07-2005, 02:11 AM
Hey guys, I just wanted to post and tell everyone that I'm working on improving this thing. However, if you do use it, be sure to back up your current DB first. If you want to start from scratch, then go for it. The script outputs a lot of debug info that leaves you guessing about what it's doing. I'd like to make it a little cleaner. I'll keep everyone informed of my progress.
David
08-07-2005, 02:35 AM
That's great. I really appreciate your contribution here.
f1gm3nt
08-07-2005, 02:39 AM
Don't worry about it, I've also fixed a few more bugs =) My site now has 9274 active links and 1326 active categories. All of that comes from about one or two hours of watching TV and letting the script do its thing. I'd like to encourage people to post comments and suggestions in here, because I'm almost ready to release a newer version of this.
latino
08-07-2005, 02:47 AM
Hi
f1gm3nt, could you post the latest version of the mod so we can use the latest available code ;). I am eager to test it. Also, I am a fairly experienced PostNuker.
I plan on trying a PostNuke integration asap. It will not be a PostNuke module but a hack/mod. The important thing for me right now is to make PHPLD fully functional within PostNuke CMS.
Regards!
:wink:
David
08-07-2005, 02:52 AM
That would be great, and then maybe we can post on a couple of the phpnuke community boards to let them know about the script, and maybe one day it will evolve into a plugin. :)
latino
08-07-2005, 02:58 AM
Hi David:
I posted about PHPLD about an hour ago on the PostNuke forum ;) :
http://forums.postnuke.com/index.php?name=PNphpBB2&file=viewtopic&t=42711
Yes, having a module or plugin would be nice! Let's see how this evolves. Also, when I have it working within PostNuke, I will post here.
Take care!
:wink:
David
08-07-2005, 03:12 AM
Cool, I posted a response over there as well. I hope what I said was ok.
Now...back to this DMOZ mod. I am going to try it now. :)
David
08-07-2005, 03:30 AM
Ok, I uploaded the files and created the tables, but I get this error message:
"SQL ERROR:1146: Table 'webmast_tools.dmoz' doesn't exist"
IMHO, it DOES exist! But surely I am missing something.
I'm not complaining, just hoping you'll know what to say next.
Thanks,
David :)
latino
08-07-2005, 05:08 AM
Hi:
One question: could the script be run more than once to extract different categories? I mean, could it _append_ new entries to the DB?
Thanks!
:wink:
David
08-07-2005, 05:12 AM
I took a look at it. I think it would do that just fine. You would just change the values in the dmoz table and the capture.php file. You might also have to go in and clear the old data in phpMyAdmin.
f1gm3nt
08-07-2005, 01:23 PM
Real quick, here's how this thing works. It checks the dmoz table for URLs to crawl. If it doesn't find any, or they are all crawled, then it doesn't do jack. If, however, it finds URLs to crawl, it crawls them and adds the categories and links to the database. When this is done, you run convert.php, which converts all the data into phpLD. Once you upload the script, be sure to add the URL you want to start the crawl from, and also edit crawl.php with those values (crawl.php uses those values to make sure it only pulls from that category and its subdirectories).
- Joshua
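The scope check Joshua describes (only pulling from the start category and its subdirectories) could look something like the sketch below. It is a hypothetical illustration, not the code in crawl.php: the name in_category_scope() is made up, and it assumes subcategory URLs simply extend their parent's URL, as in http://www.dmoz.org/Arts/Music/.

```php
<?php
// Hypothetical helper, not taken from crawl.php: decide whether a URL lies
// inside the start category (the category itself or any subcategory).
function in_category_scope(string $url, string $root): bool
{
    // Normalise both to end with exactly one slash so that
    // ".../Arts" matches ".../Arts/Music/" but not ".../Artsy/".
    $root = rtrim($root, '/') . '/';
    $candidate = rtrim($url, '/') . '/';

    // In scope when the candidate URL starts with the root category URL.
    return strncmp($candidate, $root, strlen($root)) === 0;
}

var_dump(in_category_scope('http://www.dmoz.org/Arts/Music/', 'http://www.dmoz.org/Arts')); // bool(true)
var_dump(in_category_scope('http://www.dmoz.org/Health/', 'http://www.dmoz.org/Arts'));     // bool(false)
```

Normalising the trailing slash matters: a plain prefix test without it would wrongly treat a sibling such as "Artsy" as part of "Arts".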
latino
08-09-2005, 05:40 AM
Hi:
I installed it on localhost to try it from home. It says:
Crawl complete
But it's not working. Is there any way to make it work from localhost? I just want to avoid exposing my public IP and getting it banned by DMOZ.
I installed phpMyAdmin, installed PHPLD and created the DMOZ extractor tables.
TIA
:wink:
David
08-09-2005, 06:37 AM
I've got an updated version for you (courtesy of f1gm3nt)
I haven't tested it yet, but I think it is probably better still.
f1gm3nt
08-09-2005, 08:26 AM
I've fixed a few bugs and it should work well enough to use in a live directory. If you experience problems, post them here or PM me. Also be sure to back up your database, because there might be a few bugs that I haven't found.
latino
08-10-2005, 12:07 AM
Hi
I am having trouble: the program reports 'Crawler Complete' on two different servers, but nothing gets done. I have modified the SQL and DMOZ info in the class files...
Setup:
Apache 2.0.54
PHP 5.0.4
MySQL 4.1.12
RHEL 4
Later
:roll:
f1gm3nt
08-10-2005, 12:52 AM
You have to add the first URL into the dmoz table. For example, if you want to crawl the category Arts, you would add http://www.dmoz.org/Arts and leave the crawl setting at 0. I haven't yet got to the point where, if it doesn't find anything in the dmoz table, it starts at what you entered in class.dmoz.php. Hope this helps, let me know.
- Joshua
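The selection step Joshua describes, including the empty-table fallback he says isn't written yet, could be sketched as below. This is an illustration only: urls_to_crawl() is a hypothetical name, and each dmoz row is assumed to be an array with a url column and a crawl flag where 0 means "not crawled yet" (the flag name follows the "crawl setting" above and may differ in the real table). When every row is already crawled it returns an empty list, which matches the "Crawl complete, nothing done" symptom reported earlier in the thread.

```php
<?php
// Hypothetical sketch of choosing what to crawl next; not the mod's actual code.
// Each row is assumed to look like ['url' => string, 'crawl' => int],
// where crawl = 0 means the page has not been crawled yet.
function urls_to_crawl(array $rows, string $startUrl): array
{
    // Empty dmoz table: fall back to the start URL configured in
    // class.dmoz.php (the behaviour described as not yet implemented).
    if (count($rows) === 0) {
        return [$startUrl];
    }

    // Otherwise return only the rows still waiting to be crawled.
    $pending = [];
    foreach ($rows as $row) {
        if ((int) $row['crawl'] === 0) {
            $pending[] = $row['url'];
        }
    }
    return $pending; // empty when everything is crawled: "Crawl complete"
}

// Seeding by hand, as suggested above: one Arts row with crawl = 0.
$rows = [['url' => 'http://www.dmoz.org/Arts', 'crawl' => 0]];
print_r(urls_to_crawl($rows, 'http://www.dmoz.org/Arts'));
```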
David
08-11-2005, 06:41 AM
test
We were having some problems replying to this topic. I think it is working now...not sure what the glitch was...? :?
wehost4u
10-09-2005, 06:35 AM
Great mod - but need a little guidance!
I was able to add the sql tables, no problem.
Edited the 2 files. No problem.
Added the url as a row in sql.
Ran crawl.php - great! It worked!
Ran convert.php - great! That worked too!
Now - silly me - I am totally lost! I SEE the links in a table called "links" in the sql.
HOW in the world do I get them to show in my phplinkdirectory?
Thanks!
Ciao121
10-17-2005, 04:44 PM
Hi all,
I successfully ran crawl.php. When I run convert.php I get this:
Content visible to registered users only.
FOUND SOLUTION
This patch is not present in the current version.
webwriter
10-21-2005, 08:58 PM
This mod doesn't work and isn't supported. I suggest it be removed from the mod list.
Ciao121
10-21-2005, 10:06 PM
I worked on it for two days, then I started crawling and successfully imported more than 180,000 links...
David
10-21-2005, 10:10 PM
It's good to hear some people got it working. Hopefully, one day soon someone will submit an improved version that works more smoothly.
jminscoe
10-21-2005, 10:41 PM
I think I got it to crawl, but then I got this error:
SQL ERROR:1054: Unknown column 'url' in 'where clause'
Does anyone know what it means? :?:
OK, I got it to work. I just had to go to where it said Men's Health and get rid of the apostrophe. You can look at it here: http://www.ahealthylifestyle4u.net/directory/ (I am still redoing the colors) :o
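The apostrophe in "Men's Health" possibly ended a quoted string in a hand-built SQL query, which would explain why deleting it made the error go away. A safer option than editing category names is a prepared statement with a bound parameter, sketched below with PDO. The function find_category_id() and the dmoz_categories table with id/title columns are hypothetical names for illustration, not the mod's real schema.

```php
<?php
// Hypothetical sketch: look up a category by title with a bound parameter,
// so a title such as "Men's Health" cannot break the SQL quoting.
// Table and column names are illustrative, not the mod's real schema.
function find_category_id(PDO $db, string $title): ?int
{
    $stmt = $db->prepare('SELECT id FROM dmoz_categories WHERE title = ?');
    $stmt->execute([$title]); // the value is sent separately from the SQL text

    $id = $stmt->fetchColumn(); // false when no row matches
    return $id === false ? null : (int) $id;
}
```

The same function works with a MySQL PDO connection; only the DSN passed to new PDO(...) changes.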
Now I am getting this error:
Warning: file_get_contents(http://www.dmoz.orgHealth/Conditions_and_Diseases/O): failed to open stream: Success in /home/jminscoe/public_html/directory/crawl.php on line 21
Crawling: Conditions_and_Diseases/O
no pages to add for crawling.
no categories found
no links found
Line 21 of crawl.php says this:
$data = file_get_contents($row['url']);
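The URL in that warning, http://www.dmoz.orgHealth/Conditions_and_Diseases/O, has no slash between the host and the category path, so the base URL and the category path were probably concatenated without a separator before line 21 fetched them. Below is a sketch of a safer join, plus a check on file_get_contents(), which returns false on failure. Both dmoz_url() and fetch_page() are hypothetical helpers, not functions from crawl.php.

```php
<?php
// Hypothetical helper: join the dmoz base URL and a category path with
// exactly one slash, whether or not either part already has one.
function dmoz_url(string $base, string $path): string
{
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}

// Hypothetical wrapper around the call on line 21: file_get_contents()
// returns false on failure, so log it instead of parsing an empty page.
function fetch_page(string $url): ?string
{
    $data = file_get_contents($url);
    if ($data === false) {
        error_log("Could not fetch $url");
        return null;
    }
    return $data;
}

echo dmoz_url('http://www.dmoz.org', 'Health/Conditions_and_Diseases/O'), "\n";
// prints http://www.dmoz.org/Health/Conditions_and_Diseases/O
```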