Request: New Mod that crawls DMOZ and creates Categories...


nate_king1
01-30-2006, 07:04 AM
I'm requesting a mod that helps with SEO and indexing by pulling the categories of DMOZ (or any other large directory), without content or links, to give a new directory a head start.

For example, dmoz.org/shopping alone has 100,000 pages cached...

If you are interested in making something like this, I'm willing to donate to you. Or, if you want to go through an auction site like scriptlance.com, that would be okay too.

Let me know...

nate_king1
01-31-2006, 04:09 AM
bump

VSDan
01-31-2006, 04:47 AM
Not too difficult for DMOZ, as they provide the structure.rdf.u8.gz file. What exactly did you want? A script to retrieve the structure.rdf.u8.gz file, extract the categories / subcategories, and then add them to the phpLD database?
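A minimal sketch of the "extract the categories" step, assuming the dump is well-formed XML in which each category is a `<Topic>` element whose `r:id` attribute holds a slash-separated path such as `Top/Shopping/Toys`. The namespace URI in `R_NS`, the `SAMPLE` data, and the helper names (`iter_topics`, `iter_dump`) are assumptions for illustration, not taken from the thread; check them against the real file. Reading straight through `gzip` avoids writing the unpacked file to disk, which may reduce the disk space the job needs.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator

# ASSUMPTION: namespace URI bound to the "r:" prefix in structure.rdf.u8.
# Verify against the <RDF ...> root element of the real dump.
R_NS = "http://www.w3.org/TR/RDF/"

# Tiny hand-written sample in the assumed layout (not real dmoz data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns:d="http://purl.org/dc/elements/1.0/" xmlns="http://dmoz.org/rdf/">
<Topic r:id="Top"><catid>1</catid></Topic>
<Topic r:id="Top/Shopping"><catid>2</catid><d:Title>Shopping</d:Title></Topic>
<Topic r:id="Top/Shopping/Toys"><catid>3</catid></Topic>
</RDF>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_topics(stream: IO[bytes]) -> Iterator[str]:
    """Yield the r:id path of every <Topic>, e.g. 'Top/Shopping/Toys'."""
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # first event: start of the <RDF> root
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == "Topic":
            topic_id = elem.get(f"{{{R_NS}}}id")
            if topic_id:
                yield topic_id
            # Drop finished Topics so memory stays flat on the full dump.
            root.clear()


def iter_dump(path: str) -> Iterator[str]:
    """Stream topics out of the .gz file without unpacking it to disk first."""
    with gzip.open(path, "rb") as stream:
        yield from iter_topics(stream)


if __name__ == "__main__":
    demo = gzip.GzipFile(fileobj=io.BytesIO(gzip.compress(SAMPLE)))
    for topic in iter_topics(demo):
        print(topic)
```

On a real run you would call `iter_dump("structure.rdf.u8.gz")` after downloading the file. If the actual dump turns out not to parse as XML, a line-by-line scan for `<Topic r:id="...">` would be the fallback.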

nate_king1
01-31-2006, 06:54 PM
Exactly!

VSDan
01-31-2006, 07:16 PM
I have this mod nearly completed. Okay, some questions:

1. You only want to import dmoz categories / subcategories, and not links (at least at this time)?

2. Did you want the imported dmoz categories / subcategories in their own phpLD root category (e.g., top > dmoz)?

3. What about language-specific special characters? Should they be ignored, or converted to something like underscores? Or should categories / subcategories that contain special characters be left out entirely? There aren't many, and they are mostly in the 'World' dmoz category.
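The two open questions above (a separate "top > dmoz" root, and what to do with special characters) can be sketched as one mapping step. This assumes topic paths are slash-separated and start with `Top`, and it treats "special" as "non-ASCII". The policy names `keep`, `underscore` and `skip` and the functions `clean_segment` / `map_topic` are hypothetical labels for the three options listed, not part of any existing mod.

```python
from typing import List, Optional

# Hypothetical names for the three options discussed above.
POLICIES = ("keep", "underscore", "skip")


def clean_segment(segment: str, policy: str) -> Optional[str]:
    """Apply a special-character policy to one path segment (None = drop it)."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    if segment.isascii() or policy == "keep":
        return segment
    if policy == "underscore":
        # Replace each non-ASCII character with an underscore.
        return "".join(ch if ch.isascii() else "_" for ch in segment)
    return None  # policy == "skip"


def map_topic(topic: str, root: str = "dmoz",
              policy: str = "underscore") -> Optional[List[str]]:
    """Turn 'Top/Shopping/Toys' into ['dmoz', 'Shopping', 'Toys']."""
    parts = topic.split("/")
    if parts[0] == "Top":
        parts = parts[1:]  # dmoz's own root is replaced by `root`
    path = [root]
    for part in parts:
        cleaned = clean_segment(part, policy)
        if cleaned is None:
            return None  # leave out the whole category (and so its subtree)
        path.append(cleaned)
    return path
```

For example, `map_topic("Top/World/Fran\u00e7ais")` gives `["dmoz", "World", "Fran_ais"]`. Note that the underscore policy can map two different names to the same string, which a duplicate check on insert would then merge.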

Some things about the script:

1. It is a CGI-Perl script that must be run from the command-line shell (over SSH), for example: perl dmoz.cgi. Otherwise, the browser and/or web server will time out, as the script can take up to a few hours to execute. Do you have shell access to your server that lets you run processes (in this case, a CGI-Perl script from the command line)?

2. You need at least about 700 MB of available disk space to run the script. The space is freed again when the script finishes. Do you have the disk space?

3. The script downloads the categories file from dmoz, gunzips it, extracts the categories, and writes them to the SQL database. It checks for, and ignores, duplicates. It uses wget. Do you have wget (or xget) available on your server?

4. The script uses a lot of CPU and can slow a server down considerably, so it is only recommended if you run a dedicated server. Are you on a shared or a dedicated server? Most, if not all, web hosts will not allow such a process to run on shared servers. It may be a good idea to run the script during a low-traffic period, like late at night. However, you should only have to run it once, as dmoz rarely changes its directory structure, at least not significantly. It may take a few runs to squash any bugs, though: I have successfully tested everything up to, but not including, the step that updates the SQL database (populating it with the new dmoz categories / subcategories).
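The "write to the SQL database, ignoring duplicates" step from point 3 can be sketched as a get-or-create over each category path. This is only an illustration: phpLD keeps its categories in MySQL under its own table and column names, so the `category` table, its columns, the `parent_id = 0` root convention, and the helpers `open_db` / `get_or_create` / `import_paths` are all hypothetical. `sqlite3` is used here just so the sketch runs on its own.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple

# HYPOTHETICAL schema: map these names onto phpLD's real category table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS category (
    id        INTEGER PRIMARY KEY,
    title     TEXT    NOT NULL,
    parent_id INTEGER NOT NULL DEFAULT 0,  -- 0 = directory root (assumed)
    UNIQUE (parent_id, title)
)
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM category").fetchone()[0]


def get_or_create(conn: sqlite3.Connection, cache: Dict[Tuple[int, str], int],
                  parent_id: int, title: str) -> int:
    """Return the id of (parent_id, title), inserting it only if missing."""
    key = (parent_id, title)
    if key not in cache:
        row = conn.execute(
            "SELECT id FROM category WHERE parent_id = ? AND title = ?", key
        ).fetchone()
        if row is None:
            cur = conn.execute(
                "INSERT INTO category (title, parent_id) VALUES (?, ?)",
                (title, parent_id),
            )
            cache[key] = cur.lastrowid
        else:
            cache[key] = row[0]  # duplicate: reuse the existing row
    return cache[key]


def import_paths(conn: sqlite3.Connection, paths: Iterable[List[str]]) -> int:
    """Create every category along each path; return how many rows were new."""
    before = count_rows(conn)
    cache: Dict[Tuple[int, str], int] = {}
    for path in paths:
        parent_id = 0
        for title in path:
            parent_id = get_or_create(conn, cache, parent_id, title)
    conn.commit()  # one commit for the whole run
    return count_rows(conn) - before
```

Each path is a list like `["dmoz", "Shopping", "Toys"]`; missing parents are created on the way down, so the order of the input does not matter. Because existing rows are looked up rather than re-inserted, a second run over the same data adds nothing, which fits the "may take a few runs" caveat.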

You may want to address these questions to your web host if you are not absolutely sure.

Club2Share.com
02-02-2006, 02:50 AM
I may use this script later for another project, but I have given my little brother the job of adding DMOZ categories :D almost 1000+ :lol:

Club2share directory

@VSDan

I run my own server, and of course I have all that space and more. The other requirements can easily be met as well.

Thank you :)