Working DMOZ import Mod


ibold
01-21-2006, 12:11 AM
I have in hand a working DMOZ import mod, fixed from the script provided in this thread:
{link to old forum removed}

Please PM me if you are interested in getting a copy of it.

If it's helpful to you: it did cost me a bit of money to have it fixed via Scriptlance, so a donation would be appreciated (don't go overboard, only if you have something you can spare).
My paypal is:
admin at ibold.net

Once I get a mod's help, I can attach it to this post. Enjoy :)

EDIT:
I uploaded it to my own server:
{unfortunately this link was broken and is now removed}

dagger
01-21-2006, 06:31 PM
hi ibold

Do you mean you want some $ to give it to me, or is it free? Because I am looking for this mod :(

djhomeless
01-21-2006, 08:03 PM
I am more than happy to throw down some green for a working mod. I've sent a PM; if you can send me a copy of the working mod, I would really appreciate it.

I set up an auction on Scriptlance to fix this as well, so I would hate to double up resources on the same fix! :)

Geoffrey

ibold
01-21-2006, 08:41 PM
All PMs sent. I gave David a copy of the script in case he would like to include a link to it here. In the meantime, I uploaded it to my site:

{unfortunately this link was broken and is now removed}

Enjoy, and let me know if you run into any problems or have questions.

djhomeless
01-21-2006, 09:01 PM
**censored****censored****censored****censored**. Really got my hopes up.

Same result as before:

Content visible to registered users only.

Now every time I retry crawl.php, it finishes straight away.

Any ideas??

Geoffrey

djhomeless
01-21-2006, 09:10 PM
Hold the phone.

It can't handle root! I mean the root of Dmoz (/). I just tried a subdirectory and it seems to be working.

David
01-22-2006, 12:42 AM
I'll be trying to add this version to 3.0. Understandably, importing from DMOZ, especially from root is an enormous task. We have to start somewhere, and then we'll make improvements as we go. Sound good?

ibold
01-22-2006, 01:05 AM
Thanks to David for making a donation to help support this :)

Neticus
01-22-2006, 07:20 AM
Sounds like a similar issue to one I had with the old mod. I got around it by deleting the previously crawled URL from the Dmoz_TABLE, as well as emptying the dmoz Category and Links tables in phpMyAdmin, before attempting another crawl. This applies regardless of whether convert.php had worked or not.

- Neticus

resource
01-22-2006, 04:10 PM
Take a look at this
http://rdf.dmoz.org/rdf/content.rdf.u8.gz
You just need to extract it; it's better than crawling all the pages. I don't have time to write the script for it, but you can just google "Parsing RDF".

Cheers
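For reference, the dump linked above could be read as a stream instead of crawling page by page. This is a minimal PHP sketch, not part of the mod: it assumes each listing in content.rdf.u8 is an ExternalPage element whose about attribute holds the URL, with d:Title, d:Description and topic children. Those element names and the dmoz_external_pages() function are assumptions. XMLReader is a pull parser, so memory use stays small even for a very large file, and filtering on a topic prefix such as Top/Games/ would let you take one branch instead of the whole tree.

```php
<?php
// Minimal sketch (assumed DMOZ dump layout): stream ExternalPage records out of
// content.rdf.u8 with XMLReader so the file is never loaded into memory at once.
function dmoz_external_pages(XMLReader $reader, string $topicPrefix): Generator
{
    $page = null;   // record being assembled for the current ExternalPage
    $field = null;  // record field that the next text node belongs to
    // localName of a child element => key in the record (assumed element names)
    $fieldFor = ['Title' => 'title', 'Description' => 'description', 'topic' => 'topic'];

    while ($reader->read()) {
        $type = $reader->nodeType;
        if ($type === XMLReader::ELEMENT) {
            if ($reader->localName === 'ExternalPage') {
                $page = [
                    'url' => (string) $reader->getAttribute('about'),
                    'title' => '',
                    'description' => '',
                    'topic' => '',
                ];
            } elseif ($page !== null && isset($fieldFor[$reader->localName])) {
                $field = $fieldFor[$reader->localName];
            }
        } elseif (($type === XMLReader::TEXT || $type === XMLReader::CDATA)
                  && $page !== null && $field !== null) {
            $page[$field] .= $reader->value;
        } elseif ($type === XMLReader::END_ELEMENT) {
            if ($reader->localName === 'ExternalPage' && $page !== null) {
                // Keep only listings in the requested branch, e.g. "Top/Games/".
                if (strpos($page['topic'], $topicPrefix) === 0) {
                    yield $page;
                }
                $page = null;
            }
            $field = null;
        }
    }
}

// Usage (assumes the zlib extension; the file path is a placeholder):
// $reader = new XMLReader();
// $reader->open('compress.zlib://content.rdf.u8.gz');
// foreach (dmoz_external_pages($reader, 'Top/Games/') as $page) {
//     // insert $page['url'], $page['title'], $page['description'] here
// }
```

Each yielded record is a plain array, so the SQL INSERT step stays separate from the parsing step.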

David
01-22-2006, 07:27 PM
I could probably get someone to write a Windows-based script that generates all the SQL code for inserting into the tables. Then you would need to import the BIG file using something like BigDump.

djhomeless
01-22-2006, 11:26 PM
Well, I'm back at square one!

I tried to import just the /Games category. 200k links, 4k categories, and 24 hours later it was done.

Sadly, the hamsters in my pokey server couldn't run fast enough once the links were all added. Granted, it's a LOT of links, but whenever a search was run, MySQL and/or Apache would use up to 90% of the CPU.

Without branching this thread, is there a theoretical or logical limit to the number of cats/links that phplinks can handle? I ask because afterwards, the add/edit categories link in the backend kept timing out.

If this were a perfect world, zootreeves's script would actually work, and you would be given a chance to cherry-pick cats and subcats instead of having to take the entire tree. zootreeves's DMOZ script assumes you have the RDF file downloaded locally, which should be faster than doing the scrape.

my 2 cents

f1gm3nt
01-24-2006, 05:56 AM
I think that you are trying to sell my work and that kind of pisses me off. I'm very busy and don't sleep much because I am working on a lot of different projects. If you made some improvements great let me know and I'll update the script. However, it's bull **censored****censored****censored****censored** that you are doing this and asking for donations. I've worked hard on this thing and for some it works and others it doesn't. If I knew an ass like you would rip off my work then I would have never posted this in here in the first place.

ibold
01-24-2006, 07:34 AM
Please read the entire post. I paid someone else to do this. As in real money. I'm only asking for a little bit of that back. I posted it for free download up above; I didn't charge anyone for it. This is the idea of an 'open source' product. I don't think that asking for donations from appreciative people goes against the spirit of this. Your distasteful remarks, however, make me just as reluctant to share as you suddenly are. It doesn't help anything. I at least hope you feel better.
How is this any different from you asking for donations? Is your time or money worth more than mine? You invested something in this and so did I. Just because I started with your work doesn't mean I ripped anything off. I distributed it exactly the same way you did with yours.
How often do you take advantage of other people's work and ask for donations because you have invested time or money in it? Made any donations or even given credit to Andi Gutmans? To Zeev Suraski? How about Rasmus Lerdorf? Maybe if they knew that a person like you would rip off their work, they never would have shared it in the first place.

djhomeless
01-24-2006, 08:25 AM
You should be thanking users like ibold. He went out and took your script and fixed it, using his own money. There were many forum threads where users like us were searching for an import script that actually worked. I appreciate the time you put into writing your script, but at the end of the day it didn't work, and some of us, myself included, were willing to put our money where our mouths were to get something that did the job.

As ibold suggests, do read the thread. He was by no means looking to profit from this, just to make back the money (or close to it) that he spent at Scriptlance.

You sir should first watch your language, then offer an apology to ibold. If it wasn't him, it would have been a dozen others.

ibold
01-24-2006, 08:38 AM
As for an apology, it's not necessary. I certainly don't want to bias a member who makes valuable contributions to this script against making more. I was just trying to make a point. If you make something available for free, what I did (what we both did...) is relatively common practice. I do appreciate this mod and your contribution to this community.

On a more productive front, this fix didn't address the issue within the crawl.php file, but rather the convert.php file.
I am currently crawling most of the larger sections of the DMOZ site (by this I mean the main categories on the main page), and assuming nothing breaks over the next day or two on my computer (it happens...), I will be able to just offer SQL dumps to anyone who is interested in them. Right now I have 'Health' and 'Computers'; just PM me if either of these interests you and I can get you an SQL dump.

H2O
01-26-2006, 01:49 AM
hi,

I keep getting an SQL error when running convert.php:

SQL ERROR:1136: Column count doesn't match value count at row 1

Any idea?

Do I have to start with a totally empty database?

H2O
01-26-2006, 08:42 AM
I tried again a number of times, emptying the tables and re-crawling, but got the same result.

Neticus
01-26-2006, 03:49 PM
Before you installed the DMOZ mod, you may have installed another mod that added extra fields (columns) to your default PLD tables. Check for mods that told you to CREATE or ALTER tables through SQL commands, especially your PLD_LINK or PLD_CONFIG tables.

The DMOZ mod (the convert.php file) assumes you have not changed the original structure of the PLD tables.

Also see this thread;

{link to old forum url removed}

H2O
01-26-2006, 10:39 PM
Hi, thanks for the reply.

That would be the problem, then. So if I do the crawl first and then install other mods, would that work?

Thanks

Neticus
01-26-2006, 10:52 PM
Yes, that should clear up the error you described, if you only intend to do one crawl and that's it. However, if you attempt more crawls after the mods are reinstalled, expect to get the same error again.

H2O
01-27-2006, 09:22 AM
I have the Rate_Links mod installed.

I have no idea how to make changes to convert.php like the ones in the link above.

The changes made by rate_links.mod are below:

#-----[ SQL ]------------------------------------------
#
CREATE TABLE `PLD_LINK_RATE` (
`ID` int( 11 ) NOT NULL AUTO_INCREMENT ,
`LINK_ID` int( 11 ) NOT NULL default '0',
`IPADDRESS` varchar( 15 ) NOT NULL default '',
PRIMARY KEY ( `ID` ) ,
KEY `LINK_ID` ( `LINK_ID` , `IPADDRESS` )
) ENGINE = InnoDB;
ALTER TABLE `PLD_LINK` ADD `RATE_TOTAL` INT( 11 ) DEFAULT '0' NOT NULL ,
ADD `RATE_COUNT` INT( 11 ) DEFAULT '0' NOT NULL ,
ADD `RATE` DECIMAL( 2, 2 ) DEFAULT '0' NOT NULL ,
ADD `RATE_ENABLED` TINYINT( 4 ) DEFAULT '1' NOT NULL;
INSERT INTO `PLD_CONFIG` (`ID`, `VALUE`) VALUES ('ENABLE_RATE', '1');
#
#-----[ COPY ]------------------------------------------

Neticus
01-27-2006, 05:23 PM
OK, so the rates mod told you to ALTER the table PLD_LINK to add four columns whose defaults, from top to bottom, are '0' '0' '0' '1'.

In phpMyAdmin, go to your PLD database, select PLD_LINK, then click the 'Structure' icon. You will see that the PLD_LINK structure has a field named LINK_TYPE. Any fields after this one were created by your rates mod.

DMOZ convert.php does not account for these fields. You need to adjust the section in convert.php where it says "INSERT INTO PLD_LINK VALUES...."

In convert.php the current sequence for INSERT INTO PLD_LINK VALUES is:

Content visible to registered users only.

The last value is '0'. In the PLD_LINK table, look at LINK_TYPE and you will see '0' is allocated to this field. This is where the above code ends its processing. However, since you have added more fields via the rates mod, you will have to extend the above code by adding '0', '0', '0', '1' at the end of the PLD_LINK VALUES list; '0' '0' '0' '1' are the defaults the rates mod told you to create. Thus convert.php should be:

Content visible to registered users only.

And that should, in effect, take care of the error:

SQL ERROR:1136: Column count doesn't match value count at row 1

Should.... :D
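Another way to avoid the column count error, whatever mods are installed, is to name the columns in the INSERT instead of relying on their position; MySQL then fills the columns you leave out (such as the four rate columns) with their DEFAULT values. This is a minimal PHP sketch under that assumption, not the mod's own code: build_insert(), quote_identifier() and SqlExpr are hypothetical names, and the escape callback stands for the escaping function of your MySQL connection. It also keeps NOW() as a function call instead of the quoted text 'NOW()'.

```php
<?php
// Hypothetical helper, not part of the DMOZ mod: builds
//   INSERT INTO `table` (`COL`, ...) VALUES (value, ...)
// from one column => value array, so the column and value lists can never
// differ in length. Columns you leave out get their DEFAULT from MySQL.

/** Marks a raw SQL expression, e.g. NOW(), that must not be quoted. */
final class SqlExpr
{
    /** @var string */
    public $sql;

    public function __construct(string $sql)
    {
        $this->sql = $sql;
    }
}

/** Backtick-quotes a table or column name. */
function quote_identifier(string $name): string
{
    return '`' . str_replace('`', '``', $name) . '`';
}

/**
 * @param string   $table  e.g. 'PLD_LINK'
 * @param array    $row    column name => value (null, scalar or SqlExpr)
 * @param callable $escape string escaper, e.g. [$mysqli, 'real_escape_string']
 */
function build_insert(string $table, array $row, callable $escape): string
{
    $cols = [];
    $vals = [];
    foreach ($row as $col => $val) {
        $cols[] = quote_identifier((string) $col);
        if ($val instanceof SqlExpr) {
            $vals[] = $val->sql;        // NOW() stays a function call
        } elseif ($val === null) {
            $vals[] = 'NULL';
        } else {
            $vals[] = "'" . $escape((string) $val) . "'";
        }
    }
    return 'INSERT INTO ' . quote_identifier($table)
        . ' (' . implode(', ', $cols) . ')'
        . ' VALUES (' . implode(', ', $vals) . ')';
}

// Example (assumes a mysqli connection in $mysqli and the mod's $link / $phpld_catid):
// $query = build_insert('PLD_LINK', [
//     'TITLE'         => $link['title'],
//     'DESCRIPTION'   => $link['description'],
//     'URL'           => $link['url'],
//     'CATEGORY_ID'   => $phpld_catid,
//     'STATUS'        => 2,
//     'DATE_ADDED'    => new SqlExpr('NOW()'),
//     'DATE_MODIFIED' => new SqlExpr('NOW()'),
// ], [$mysqli, 'real_escape_string']);
```

Because only the named columns are listed, a later ALTER TABLE that adds columns with defaults does not break the query.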

H2O
01-27-2006, 10:35 PM
I hope it works.

Thanks for your help Neticus

nate_king1
01-31-2006, 01:59 AM
Doesn't it let you set where the files will be placed?
I crawled Dmoz.org/Arts for about three hours, then I ran convert, and all the categories went into the top-level directory, even though there was a mysite.com/art/ directory in place?

Now when I load the page it just says "Crawl complete! Now run covert.php", and it didn't do a thing? Uhhh?

Fred
02-11-2006, 10:06 PM
Hmm, I am a beginner, please help.
I tried it too.

I put this in phpMyAdmin:

Content visible to registered users only.

but when I try to run crawl.php,
I just end up at index.php.

I don't know how to solve this;
please help a n00b.

webjunkie
02-12-2006, 03:17 PM
When I ran it, there was a small bug in the link to convert.php; I fixed it on line 20:
Content visible to registered users only.

should be
Content visible to registered users only.

Don't worry, I can't spell either lol


thanks :)

Fred
02-12-2006, 06:35 PM
I got this mod working now. Thanks... :))

sc0ttish
02-12-2006, 10:33 PM
Hi everyone, I'm looking for a bit of help.

I've just downloaded the directory script and the DMOZ mod; however, I think I have a problem with the tables.

I've never set up tables before, so there's a good chance I've messed something up.

When I point my browser to crawl.php, I get the following error:

Crawling: http://www.dmoz.org/Regional/Europe/United_Kingdom/Scotland/

no pages to add for crawling

no categories found

SQL ERROR:1054: Unknown column 'url' in 'where clause'


I used phpMyAdmin to set up the table, but I'd never used it before.

Could someone point me in the right direction? Thanks,
Peter

Fred
02-13-2006, 06:06 PM
Spoke too early again *doh*
Anyway,
when I run convert.php
I get this:
Content visible to registered users only.
And I can only have 197 categories; when I add another category, they all get mixed up.
Any ideas?

Fred
02-18-2006, 04:27 PM
*PUSH*

jgsketch
03-15-2006, 09:54 PM
Almost had this working. It crawled just fine, even put the links into the database. But it could not convert them over to the directory. Also got a quick warning about there not being a directory?? Do I have to have the same directory setup as Dmoz? So if I had a directory about online role playing games, my top directory would be dmoz --> Games --> Online --> Roleplaying. I couldn't just have Roleplaying? I'm a little confused. If I could just get the links that are in my database to convert over, I would be all set. Thanks.

David
03-15-2006, 10:13 PM
All I can say right now is we will be "playing" with it tomorrow. :)

gesugefu
03-18-2006, 08:49 AM
Hello from Germany

Here is the code for convert.php that works correctly for me.

Content visible to registered users only.

I have installed some mods.

In my SQL there are 4 fields after LINK_TYPE in PLD_LINK: 0 0 0.00 1

Greetings, Jürgen

jgsketch
03-18-2006, 09:09 PM
I tried the above code, but got a Parse error: syntax error, unexpected T_STRING. Unfortunately, I do not know enough about PHP to figure out where the mistake is in that code. Oh well. I've started to enter links manually; it should take me a couple of years, lol.

anon
03-19-2006, 02:12 AM
I can tell you that I tried this mod just to, well, 'try it', and it worked without a flaw. It took several days to complete.

jgsketch
03-20-2006, 01:45 AM
I did a copy and paste, so I'm not sure why I get a syntax error. Has anyone else tried the above code?

jgsketch
03-20-2006, 08:51 PM
Well, after copying and pasting several times, I figured out that my cursor was not grabbing the very last character, so I no longer get the syntax error. I am now getting a "Column count doesn't match value count at row 1" error. After looking at my table and the code in the convert file, they do not seem to match or even come close. I do not understand this, since I have not installed any mods. Can someone look at my table shown below and let me know if it is way off? Is NOW() the same as NULL?


Content visible to registered users only.
Content visible to registered users only.

Should it look like this
Content visible to registered users only.

Thanks

indonaziaopen
04-20-2006, 06:30 AM
I need a design for my website, who can do it?

schlapp
05-06-2006, 08:23 AM
I have the following problem:

SQL ERROR:1062: Duplicate entry '5104' for key 1

Does anyone have any advice?

Can we use the crawler for the Yahoo directory or others?

Regards,
Ralf

schlapp
05-06-2006, 10:15 AM
Wrong thread.

gesugefu
08-24-2006, 11:57 AM
Does anyone have convert.php for version 3.0.5??????

Greetings, Jürgen

proprod
08-24-2006, 07:25 PM
This is the only convert.php code that gives me any different result, except that with this one I get nothing: the page loads, and that's it.

Should I post a screenshot of my table layout from phpMyAdmin? If I can get this working, I'm willing to donate, lol... PLEASE!!!

proprod
08-24-2006, 07:41 PM
Update, this is a screenshot of my links database...

There are links in the created 'links' table and categories in the created 'categories' table... now how do I get those into my Approve Links, or into my directory somewhere? lol, sorry, I'm SQL-illiterate.

{non working image removed}

Please help, hehehe

hani
08-30-2006, 09:16 PM
Hello,

I use PLD 3.05 and get this error after I click on "Crawl complete! Now run covert.php":

SQL ERROR:1136: Column count doesn't match value count at row 1

I have a clean install without any mods.

Can someone help me?

Best regards,
hani

hani
09-21-2006, 11:31 AM
Hello,

Can anyone help?
I would pay for a solution.

Best regards,
hani

Jeff
10-20-2006, 06:22 AM
Getting the same thing. Have a virgin install of 3.06. Crawl works. But when I try to run convert, I get:

SQL ERROR:1136: Column count doesn't match value count at row 1

David
10-20-2006, 06:28 AM
Probably this import was created for a different phpLD version, when there were fewer fields in this table. I think the help of a pro is needed to get this one working.

hani
10-25-2006, 01:17 AM
Hello,

Does anyone have a solution for this?
I would pay for it.

thanks!

erikveldman
10-25-2006, 10:45 AM
Hi,

With PHPLD 3.06 I have the category download working on my test server (I haven't installed a directory online yet).
I still have problems with crawling and converting the links; can somebody help?

It looks like you have to clear the dmoz table and insert, as the first 2 rows, a crawled (1) and an uncrawled (0) record for the subdirectory you'd like to crawl.
Then delete the contents of the dmoz table completely and build only one record, marked uncrawled (0).

My code is below. There may still be errors in it; please help me get it working.

For categories, old (original DMOZ import mod) and new:


// old ID, TITLE, TITLE_URL, DESCRIPTION, PARENT_ID, STATUS, DATE_ADDED, HITS, SYMBOLIC, SYMBOLIC_ID
// new 3.06 table PLD_CATEGORY: ID, TITLE, CACHE_TITLE, TITLE_URL, CACHE_URL, DESCRIPTION, PARENT_ID, STATUS, DATE_ADDED, HITS, SYMBOLIC, SYMBOLIC_ID, META_KEYWORDS, META_DESCRIPTION
// Old $query = "INSERT INTO PLD_CATEGORY VALUES(NULL,'".$dmoz->ReplaceChr($cat[$i])."','".$cat[$i]."','','".$parent_id."','2',NOW(),'0','0','0')";

//New updated 3.06
$query = "INSERT INTO PLD_CATEGORY VALUES(NULL,'".$dmoz->ReplaceChr($cat[$i])."',NULL,'".$cat[$i]."',NULL,'','".$parent_id."','2',NOW(),'0','0','0','".$cat[$i]."','".$dmoz->ReplaceChr($cat[$i])."')";


And links, old (original DMOZ import mod) and new:


// old ID, TITLE, DESCRIPTION, URL, CATEGORY_ID, RECPR_URL, RECPR_REQUIRED, STATUS, VALID, RECPR_VALID, OWNER_NAME, OWNER_EMAIL, OWNER_NOTIF, IPADDRESS, DATE_MODIFIED, DATE_ADDED, HITS, LAST_CHECKED, RECPR_LAST_CHECKED, PAGERANK, RECPR_PAGERANK, FEATURED, EXPIRY_DATE, NOFOLLOW, RECPR_ID, PAYED, LINK_TYPE
// new ID, TITLE, DESCRIPTION, URL, CATEGORY_ID, RECPR_URL, RECPR_REQUIRED, STATUS, VALID, RECPR_VALID, OWNER_ID, OWNER_NAME, OWNER_EMAIL, OWNER_NOTIF, DATE_MODIFIED, DATE_ADDED, HITS, LAST_CHECKED, RECPR_LAST_CHECKED, PAGERANK, RECPR_PAGERANK, FEATURED_MAIN, FEATURED, EXPIRY_DATE, NOFOLLOW, RECPR_ID, PAYED, LINK_TYPE, IPADDRESS, DOMAIN, OTHER_INFO, META_KEYWORDS, META_DESCRIPTION
// old $query = "INSERT INTO PLD_LINK VALUES(NULL,'".$sql->sql_real_escape_string($link['title'])."','".$sql->sql_real_escape_string($link['description'])."','".$sql->sql_real_escape_string($link['url'])."','".$phpld_catid."','','','2','2','','','','','127.0.0.1',NOW(),NOW( ),'0',NOW(),NOW(),'-1','-1','',NULL,'',NULL,'-1','0')";

//New updated 3.06
$query = "INSERT INTO PLD_LINK VALUES(NULL,'".$sql->sql_real_escape_string($link['title'])."','".$sql->sql_real_escape_string($link['description'])."','".$sql->sql_real_escape_string($link['url'])."','".$phpld_catid."','','','2','2',NULL,'','','','',NOW(),NOW(),'0',NOW(),NOW(),'-1','-1','',NULL,'',NULL,'-1','0','212.127.200.17','212-127-200-17.cable.quicknet.nl','','".$sql->sql_real_escape_string($link['title'])."','".$sql->sql_real_escape_string($link['description'])."')";

Thanks in advance!
erik

royden
10-26-2006, 01:45 AM
I also had a virgin install of 3.06, and the crawl worked, so the required db tables (dmoz, links & category) were populated by the crawl (which took close to 2 hours to run to fetch approx. 6500 links).
But I got the same error above running convert.php.

But I got it to run by doing the following (note that in my case the PLD tables were empty and the version was the standard 3.06, no mods).
Before I get to the code: you will notice the code is rather long. This is deliberate, as I found it much easier to follow and read. The computer doesn't need it spelt out in such an extended version :-) But this way you can assign your own default values easily and ensure the correct number and order of required fields. You can easily change it to a one-liner if that's your style.

This was in DEV only, not production. Don't forget to back up your tables before doing any updates. And don't forget to back up your tables before doing any updates.

In covert.php (spelling error?? anyway), replace the 2 SQL queries in function CheckCategory(),
under the comment '// Category isn't found, we need to add it!':
1. $query = "INSERT INTO PLD_CATEGORY VALUES(NULL,'".$dmoz->ReplaceChr($cat[$i])."','".$cat[$i]."','','".$parent_id."','2',NOW(),'0','0','0')";
Replace with:

$catID = "NULL"; //`ID` int(11) NOT NULL auto_increment,
$catTitle = $dmoz->ReplaceChr($cat[$i]); //`TITLE` varchar(255) collate latin1_general_ci NOT NULL default ''
$catCacheTitle = $catTitle; //`CACHE_TITLE` text collate latin1_general_ci,
$catTitleURL = $cat[$i]; //`TITLE_URL` varchar(255) collate latin1_general_ci default NULL,
$catCacheURL = "index.php?c=". $cid ; //`CACHE_URL` text collate latin1_general_ci,
$catDescription = $catTitleURL; //`DESCRIPTION` longtext collate latin1_general_ci,
$catParentID = $parent_id; //`PARENT_ID` int(11) NOT NULL default '0',
$catStatus = "2"; //`STATUS` int(11) NOT NULL default '1',
$catDateAdded = "NOW()"; //`DATE_ADDED` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
$catHits = "0"; //`HITS` int(11) NOT NULL default '0',
$catSymbolic = "0"; //`SYMBOLIC` int(11) NOT NULL default '0',
$catSymbolicID = "0"; //`SYMBOLIC_ID` int(11) NOT NULL default '0',
$catMetaKeywords = ""; //`META_KEYWORDS` text collate latin1_general_ci,
$catMetaDescription = ""; //`META_DESCRIPTION` text collate latin1_general_ci,


$query = "INSERT INTO `pld_category` (`ID`, `TITLE`, `CACHE_TITLE`, `TITLE_URL`, `CACHE_URL`, `DESCRIPTION`, `PARENT_ID`, `STATUS`, `DATE_ADDED`, `HITS`, `SYMBOLIC`, `SYMBOLIC_ID`, `META_KEYWORDS`, `META_DESCRIPTION`) ";
$query .= "VALUES ($catID
, '". $catTitle ."'
, '". $catCacheTitle ."'
, '". $catTitleURL ."'
, '". $catCacheURL ."'
, '". $catDescription ."'
, '". $catParentID ."'
, $catStatus
, $catDateAdded
, $catHits
, $catSymbolic
, $catSymbolicID
, '". $catMetaKeywords ."'
, '". $catMetaDescription . "'
)";


2. $query = "INSERT INTO PLD_LINK VALUES(NULL,'".$sql->sql_real_escape_string($link['title'])."','".$sql->sql_real_escape_string($link['description'])."','".$sql->sql_real_escape_string($link['url'])."','".$phpld_catid."','','','2','2','','','','','127.0.0.1',NOW(),NOW( ),'0',NOW(),NOW(),'-1','-1','',NULL,'',NULL,'-1','0')";
Replace with:
$linkID = "NULL"; // `ID` int(11) NOT NULL auto_increment,
$linkTitle = $sql->sql_real_escape_string($link['title']); // `TITLE` varchar(255) collate latin1_general_ci NOT NULL default '',
$linkDescription = $sql->sql_real_escape_string($link['description']); // `DESCRIPTION` longtext collate latin1_general_ci,
$linkURL = $sql->sql_real_escape_string($link['url']); // `URL` varchar(255) collate latin1_general_ci NOT NULL default '',
$linkCatID = $phpld_catid; // `CATEGORY_ID` int(11) NOT NULL default '0',
$linkRecprURL = "NULL"; // `RECPR_URL` varchar(255) collate latin1_general_ci default NULL,
$linkRecprReq = "0"; // `RECPR_REQUIRED` tinyint(4) NOT NULL default '0',
$linkStatus = "2"; // `STATUS` int(11) NOT NULL default '0',
$linkValid = "0"; // `VALID` tinyint(4) NOT NULL default '0',
$linkRecprValid = "0"; // `RECPR_VALID` tinyint(4) NOT NULL default '0',
$linkOwnerID = "NULL"; // `OWNER_ID` int(11) default NULL,
$linkOwnerName = "NULL"; // `OWNER_NAME` varchar(255) collate latin1_general_ci default NULL,
$linkOwnerEMail = "NULL"; // `OWNER_EMAIL` varchar(255) collate latin1_general_ci default NULL,
$linkOwnerNotif = "0"; // `OWNER_NOTIF` int(11) NOT NULL default '0',
$linkDateModified = "NOW()"; // `DATE_MODIFIED` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
$linkDateAdded = "NOW()"; // `DATE_ADDED` timestamp NOT NULL default '0000-00-00 00:00:00',
$linkHits = "0"; // `HITS` int(11) NOT NULL default '0',
$linkLastChecked = "NULL"; // `LAST_CHECKED` datetime default NULL,
$linkRecprLastChk = "NULL"; // `RECPR_LAST_CHECKED` datetime default NULL,
$linkPageRank = "-1"; // `PAGERANK` int(11) NOT NULL default '-1',
$linkRecprPageRank = "-1"; // `RECPR_PAGERANK` int(11) NOT NULL default '-1',
$linkFeaturedMain = "0"; // `FEATURED_MAIN` int(11) NOT NULL default '0',
$linkFeatured = "0"; // `FEATURED` int(11) NOT NULL default '0',
$linkExpiryDate = "NULL"; // `EXPIRY_DATE` datetime default NULL,
$linkNoFollow = "0"; // `NOFOLLOW` tinyint(4) NOT NULL default '0',
$linkRecprID = "NULL"; // `RECPR_ID` varchar(6) collate latin1_general_ci default NULL,
$linkPayed = "-1"; // `PAYED` int(11) NOT NULL default '-1',
$linkLinkType = "0"; // `LINK_TYPE` int(11) NOT NULL default '0',
$linkIPAddress = "NULL"; // `IPADDRESS` varchar(15) collate latin1_general_ci default NULL,
$linkDomain = "NULL"; // `DOMAIN` varchar(250) collate latin1_general_ci default NULL,
$linkOtherInfo = ""; // `OTHER_INFO` text collate latin1_general_ci,
$linkMetaKeywords = ""; // `META_KEYWORDS` text collate latin1_general_ci,
$linkMetaDescription = ""; // `META_DESCRIPTION` text collate latin1_general_ci,
$linkRecprExpired = "0"; // `RECPR_EXPIRED` tinyint(4) NOT NULL default '0',

$query = "INSERT INTO `pld_link` (`ID`, `TITLE`, `DESCRIPTION`, `URL`, `CATEGORY_ID`, `RECPR_URL`, `RECPR_REQUIRED`, `STATUS`, `VALID`, `RECPR_VALID`, `OWNER_ID`, `OWNER_NAME`, `OWNER_EMAIL`, `OWNER_NOTIF`, `DATE_MODIFIED`, `DATE_ADDED`, `HITS`, `LAST_CHECKED`, `RECPR_LAST_CHECKED`, `PAGERANK`, `RECPR_PAGERANK`, `FEATURED_MAIN`, `FEATURED`, `EXPIRY_DATE`, `NOFOLLOW`, `RECPR_ID`, `PAYED`, `LINK_TYPE`, `IPADDRESS`, `DOMAIN`, `OTHER_INFO`, `META_KEYWORDS`, `META_DESCRIPTION`, `RECPR_EXPIRED`) ";
$query .= "VALUES ($linkID
, '". $linkTitle . "'
, '". $linkDescription . "'
, '". $linkURL . "'
, '". $linkCatID . "'
, $linkRecprURL
, $linkRecprReq
, $linkStatus
, $linkValid
, $linkRecprValid
, $linkOwnerID
, $linkOwnerName
, $linkOwnerEMail
, $linkOwnerNotif
, $linkDateModified
, $linkDateAdded
, $linkHits
, $linkLastChecked
, $linkRecprLastChk
, $linkPageRank
, $linkRecprPageRank
, $linkFeaturedMain
, $linkFeatured
, $linkExpiryDate
, $linkNoFollow
, $linkRecprID
, $linkPayed
, $linkLinkType
, $linkIPAddress
, $linkDomain
, '" . $linkOtherInfo . "'
, '" . $linkMetaKeywords . "'
, '" . $linkMetaDescription . "'
, $linkRecprExpired
)";

If you want additional information from DMOZ that the version 3.06 tables can handle, you will need to update crawl.php as well as convert.php; I don't need it, and I'm not going to update them.
I also installed the Page Rank mod afterwards, and that took a while to run too, but with no trouble, so now I have an excellent starting database. :-)
Still in dev. Cheers.

shutzu
12-15-2006, 04:51 PM
Can I stop it while crawling and then reopen the crawling page? Will it continue from where it was when I closed the browser? Gosh... hope so.

urbanv
12-20-2006, 02:32 AM
Can anybody please update this script or post the code to get it working? I'm using v3.1.0 and get the same error as most people: SQL ERROR:1136: Column count doesn't match value count at row 1

Looking at what this inserts, the crawl adds categories like this:
Content visible to registered users only.
When creating a test category via admin, the table looks like this:
Content visible to registered users only.
Links created by crawl.php:
Content visible to registered users only.
When creating a test link in the admin panel, the output SQL looks like this:
Content visible to registered users only.
I would be willing to donate to anyone who can get this working.

Regards

uv

braydond
09-11-2007, 06:24 AM
What's the purpose of importing links from DMOZ into a new directory? Doesn't this defeat the purpose of making money by having users pay for link submissions? Why would anyone submit their link if it's already in your directory?

sjharvey
09-11-2007, 06:57 PM
I'm guessing that by having (what appears to be) a 'busy directory', it will attract more search engine attention, which will, in turn, bring those that aren't listed in DMOZ to their site to (perhaps) pay for submission.

IMO, it is used by those who are too lazy to go out and find their own sites to link to, to get their directories started.

floppy
09-11-2007, 08:52 PM
I agree with you. However, I don't believe in just giving away free links. I gave away a few just for testing purposes and a few to friends. If I have to wait 20 years for people to decide to add their links to my directory, that's OK with me. I am hoping that after a PageRank update or two the links will start to roll in. Time will tell, I guess.

However, it seems the more links you have, the more people submit. That's just a fact.

braydond
09-12-2007, 05:59 AM
Thanks! That's what I figured. I'm glad I found the category dump. That's good enough for me. I'm building my site on a PR3 domain that's a few years old, and it'll be PR5 in no time. I just started it yesterday.

zdanovicz
10-03-2007, 01:43 AM
Please, could any of you PM me the mod?

arnonel
10-17-2007, 07:14 AM
Could someone please PM me the category mod? I want to import everything from DMOZ under a certain category.

thanks