View Full Version : script for specific url extraction from a url list



buzzbuilder
07-23-2008, 05:54 PM
I've been looking for a script that will take my URL list, extract a certain type of URL from it, and put those URLs into another list.

Let's say that I had a list of URLs in Notepad or MS Excel like this:


dogtaining.com/dogfood/order
beagleobedience.com/beagle.care/products.html
dogtraining.com
catfancy.com/reviewcatbook/bestcatbook.html
kittenbookreview.net
cattoysforkittens.info/whatdocatslike/toy.html


This script should take all bare domains (entries with nothing after the domain name, i.e. no subdirectories or pages) out of this list and put them into another Notepad/Excel document. So in the case above,

dogtraining.com
kittenbookreview.net

would be removed from the original list and then put into a new list.

This is a very simple script, but I'm having problems locating anything like it. Does anyone know where I can find this type of script? I'm using Windows.

Thanks

Medyman
07-23-2008, 10:09 PM
You could accomplish that using some simple RegEx in either PHP or ASP. I'm horribly inept with RegEx, but knowing that might help you find a script that suits you. Or just post in the proper forum about this; I'm sure someone can whip something up fairly quickly.
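
For what it's worth, the RegEx route mentioned above might look roughly like this in PHP. This is only a sketch under a few assumptions: the inline $urls array stands in for the real list, entries carry no http:// prefix, and a "bare" domain is simply a line with no path after the domain name (a single trailing slash is tolerated). The variable names are made up for illustration.

```php
<?php
// Sample list copied from the first post (stands in for the real file).
$urls = [
    'dogtaining.com/dogfood/order',
    'beagleobedience.com/beagle.care/products.html',
    'dogtraining.com',
    'catfancy.com/reviewcatbook/bestcatbook.html',
    'kittenbookreview.net',
    'cattoysforkittens.info/whatdocatslike/toy.html',
];

// Assumed rule: a bare domain has no "/" except an optional trailing one.
$barePattern = '~^[^/\s]+/?$~';

// Entries that match the pattern go into the new list...
$bareDomains = array_values(preg_grep($barePattern, $urls));
// ...and PREG_GREP_INVERT keeps everything else in the original list.
$remaining = array_values(preg_grep($barePattern, $urls, PREG_GREP_INVERT));

print_r($bareDomains);
print_r($remaining);
```

With the sample list, $bareDomains ends up holding dogtraining.com and kittenbookreview.net, and the other four entries stay in $remaining. Either array could then be written out with something like file_put_contents() and implode("\r\n", ...) to get a Notepad-friendly text file.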

benslayton
07-25-2008, 08:15 AM
Try this. From what I understand, this is what you want, man.

Just copy and paste the results from your browser and save them as a .txt file, and you've got what you need!

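<?php

// Read the list, one URL per line, skipping blank lines.
$domainList = file("domains.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($domainList === false) {
    die("Could not read domains.txt");
}

foreach ($domainList as $url)
{
    // trim() drops stray whitespace, such as a "\r" from Windows line endings.
    $url = trim($url);
    if ($url === '') {
        continue;
    }

    // Everything before the first "/" is the domain.
    $domain = explode('/', $url);
    echo $domain[0] . "<br />\n";
}

?>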

Oh, and if you want it to strip out subdomains too, let me know. It would then take mail.google.com and display google.com.
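
A rough sketch of what that subdomain stripping could look like, in case it helps. The function name stripSubdomains is hypothetical, and the approach is deliberately naive: it assumes the part worth keeping is always the last two dot-separated labels, which breaks on suffixes such as co.uk. A real script would probably need a public suffix list.

```php
<?php
// Naive sketch: keep only the last two dot-separated labels of a host name.
// Assumption: the registrable domain is always two labels long, which is
// not true for suffixes like "co.uk".
function stripSubdomains(string $host): string
{
    $labels = explode('.', strtolower(trim($host)));

    // A negative offset takes labels from the end; shorter hosts pass through.
    return implode('.', array_slice($labels, -2));
}

echo stripSubdomains('mail.google.com'), "\n";  // google.com
echo stripSubdomains('dogtraining.com'), "\n";  // dogtraining.com
```

It would be called on the domain part only, i.e. on $domain[0] from the loop above, not on the full URL with its path.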