Want to change a url



itivae
02-16-2013, 05:55 AM
Hi I have a question.

I have a site that runs an indexing search engine (SearchBlox). There is a subdirectory that is mapped to its own domain name, but it is actually a directory on another domain. Basically, http://www.myrealdomain/something/anotherdir/thisdir/index.shtml is served as http://www.mymappeddomain.com/index.shtml. Can someone point me in the right direction to deal with something like this? As I understand it, the index will return the first, full-path URL. Is there any way to make it go to the second URL instead? Is preg_match or preg_replace (or a combination of both) a valid solution? Thanks in advance for any help.

james438
02-16-2013, 06:40 AM
As a general practice it is best to avoid PCRE unless you need it, because regular expressions carry more overhead than plain string functions when all you want is a fixed-string replacement.

If you just want to change the following:

http://www.myrealdomain/something/anotherdir/thisdir/index.shtml

to

http://www.myrealdomain/index.shtml

and something/anotherdir/thisdir/

is always the same, then str_replace() may be a better choice in this situation.
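A minimal sketch of that str_replace() idea, using the URL exactly as posted (including the missing top-level domain). It assumes the middle path is fixed and appears only once in the string; the variable names are only for illustration.

```php
<?php
// Assumption: the middle directories never change and occur once in the URL.
$url = "http://www.myrealdomain/something/anotherdir/thisdir/index.shtml";

// Replace the fixed middle path with an empty string.
$short = str_replace("something/anotherdir/thisdir/", "", $url);

echo $short;
```

If the middle path can vary, a fixed-string replacement like this will simply leave the URL unchanged.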

I should add that I am not familiar with searchblox.

EDIT: you seem to be missing your top-level domain. For example, in example.com the top-level domain is .com.

itivae
02-16-2013, 02:38 PM
Hi James,

Thank you for the suggestions. The missing .com is a typo. Basically, SearchBlox is indexing and search software. It pulls the


<title>
<url>
<description>

tags from the indexed content. These tags are generated for each object returned by a given query. My question (forgive me if I am not explaining it correctly): can I turn the returned URL (which would be
http://www.myrealdomain.com/something/anotherdir/thisdir/index.shtml) into
http://www.mymappeddomain.com/index.shtml and have the URLs linkable?

Also, the URL string isn't always the same, i.e.


http://www.myrealdomain.com/something/anotherdir/thisdir/index.shtml is just one directory that needs to be indexed. There are several directories (all in the same parent directory). For example:

http://www.myrealdomain.com/something/anotherdir/thisdir1/index.shtml

http://www.myrealdomain.com/something/anotherdir/thisdir2/index.shtml

Since the beginning of the URL is always the same, is str_replace() still a viable option?

Thanks

james438
02-16-2013, 04:38 PM
If I understand you correctly, http://www.myrealdomain.com is always the same and index.shtml is almost always different, correct?

Here is the PCRE that would replace the middle directories that you said you don't want:


<?php
$test="http://www.myrealdomain.com/something/anotherdir/thisdir/index.shtml";
$test=preg_replace('/\.com\/.*\//','.com/',$test);
echo "$test";
?>

What would an example link look like as displayed to the user?

itivae
02-16-2013, 04:54 PM
The url that the user would see would look like this:


http://www.mymappeddomain.com/dir1/somename.shtml

or


http://www.mymappeddomain.com/dir2/somename.shtml

etc.

I guess my question is more about getting the data to look like it is coming from


http://www.mymappeddomain.com

instead of


http://www.myrealdomain.com

where it is actually located. I hope that makes sense.

james438
02-16-2013, 06:11 PM
This is not what you asked for earlier. To avoid confusion please give a real example of the preformatted data and what you want it to look like afterwards. It should be data you are working with instead of the hypothetical ones we've been using. Please show the link that you want users to see complete with the anchor tags like the following:


<a href="http://www.mysite.com/text.txt">http://www.mysite.com/text.txt</a>

preferably using the code tags so that it is formatted like above.

itivae
02-16-2013, 08:10 PM
<?php

// Without a query there is nothing to search for: show the message and stop.
if (!isset($_GET['qry'])) {
    echo '<ul class="listings-results search">';
    echo "<li><h4>We’re sorry, search is currently undergoing maintenance.</h4></li></ul>"; //error check
    return; // ends this script, or hands control back to the including file
}

$qry = urlencode($_GET['qry']);
$url = "http://mysearchserver.com/searchblox/servlet/SearchServlet?cname=inchicago&fe=utf-8&st=adv&q_phr=&q_low=&q_not=&oc=all&pagesize=100&q_all={$qry}";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$xml_string = curl_exec($ch);
curl_close($ch);

// curl_exec() returns false on failure, so do not try to parse that.
if ($xml_string === false) {
    return;
}

$xml = new SimpleXMLElement($xml_string);

echo '<ul class="listings-results search">';
foreach ($xml->results->result as $the) {
    echo '<li class="listing-results">'.'<a class="list-title" href="'.$the->url.'">'.'<h4>'.$the->title.'</h4>'.'</a>'.'<span class="desc">'.$the->description.'</span>'.'</li>'; //output results
}
echo '</ul>'; // close the list once, after all results have been printed
?>


When this is indexed, the XML takes on the title, URL, and description of each page. However, the files reside at http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml, and the domain http://www.inchicago.com is mapped to the "thiscity" directory. That gives:

The real indexed URL:

http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml

The one that needs to be returned to the client:


http://www.inchicago.com/0/apage.shtml


Hopefully that will make my question clearer.

Thanks

james438
02-16-2013, 08:53 PM
Much clearer thanks. I am playing around with it to see what I can come up with. preg_match seems to be the way to go with this one.

james438
02-16-2013, 09:39 PM
I'm betting this pattern could be simplified, but here is what I came up with:


<?php
$test="http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml";
preg_match('/\/(?!\/)[^\/]*?\/(?!(\/|.*?\/)).*/',$test, $extract); // the matched text ends up in $extract[0]
echo "http://www.inchicago.com$extract[0]";
?>

I'm a bit out of practice with my PCRE knowledge.

itivae
02-17-2013, 12:00 PM
Thank you for the information. I will let you know how it goes.

james438
02-17-2013, 02:07 PM
The following does the same thing as the code I posted earlier, but has been simplified somewhat.


<?php
$test="http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml";
preg_match('/\/{1}[^\/]*?\/{1}(?!.*?\/).*/',$test, $extract);
echo "http://www.inchicago.com$extract[0]";
?>

EDIT: and simplified a little further:


<?php
$test="http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml";
preg_match('/\/[^\/]*?\/(?!.*?\/).*/',$test, $extract);
echo "http://www.inchicago.com$extract[0]";
?>
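For comparison, here is a sketch that gets the same result for this sample URL without a regular expression, by keeping only the last two path segments. The helper name last_two_segments() is made up for illustration, and it is not guaranteed to behave like the pattern above on every input (for example, a URL that ends in a slash).

```php
<?php
// Hypothetical helper: keep only the last directory and the file name of a URL's path.
function last_two_segments($url) {
    // parse_url() with PHP_URL_PATH returns just the path part of the URL.
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Split the path into its segments, ignoring leading/trailing slashes.
    $parts = explode('/', trim($path, '/'));
    // Re-join the final two segments with a leading slash.
    return '/' . implode('/', array_slice($parts, -2));
}

$test = "http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml";
echo "http://www.inchicago.com" . last_two_segments($test);
```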

itivae
02-20-2013, 03:38 PM
I am having a bit of trouble with this. I am a beginner with regular expressions, so please bear with me. What I am trying to do is match

http://www.mytravelsite.com/states/cities/thiscity/

to

http://www.inchicago.com/

so that when I pass the url through my object


<a href="http://www.mytravelsite.com/states/cities/thiscity/dir0-6/whateverpageisindexed.shtml">title</a>

becomes a clickable link to that file's location that looks like


<a href="http://www.inchicago.com/dir0-6[whicheverisreturned]/whateverpageisindexed.shtml">title</a>

The code
$test="http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml";
preg_match('/\/[^\/]*?\/(?!.*?\/).*/',$test, $extract);
echo "http://www.inchicago.com$extract[0]";

returns


http://www.inchicago.com/0/apage.shtml

but the numbered directory may vary, and so may the returned page. Hopefully that makes sense. How can I make


http://www.mytravelsite.com/states/cities/thiscity/

resolve to


http://www.inchicago.com/

but leave


/0/apage.shtml alone so that it will be flexible?

fastsol1
02-20-2013, 05:48 PM
Will the grabbed URL always start with this? http://www.mytravelsite.com/states/cities/thiscity/
If that part of the URL is static, then you could use str_replace() on it and set it to nothing, so you are left with the last part of the URL.


$url = "http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml";
$new_url = str_replace("http://www.mytravelsite.com/states/cities/thiscity/", "", $url);
echo $new_url;

itivae
02-20-2013, 05:54 PM
So to make it

http://www.inchicago.com/dir0-6/somepage.shtml

Would I do this?


$url = "http://www.mytravelsite.com/states/cities/thiscity/";
$new_url = str_replace("http://www.mytravelsite.com/states/cities/thiscity/", "http://www.inchicago.com/", $url);
$endurl = $passedurldata;
echo $new_url.$endurl;

fastsol1
02-20-2013, 06:00 PM
Looks like that should work.
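One way to sketch this without splitting the URL first is to run the replacement on the full URL as it comes back from the query. The helper name map_url() below is hypothetical; it swaps the prefix only when the URL really begins with it, and leaves anything else untouched.

```php
<?php
// Hypothetical helper: swap a known leading prefix for the mapped domain.
function map_url($url, $from, $to) {
    // Only rewrite when the URL actually starts with the prefix.
    if (strncmp($url, $from, strlen($from)) === 0) {
        return $to . substr($url, strlen($from));
    }
    return $url; // anything else is returned unchanged
}

$from = "http://www.mytravelsite.com/states/cities/thiscity/";
$to   = "http://www.inchicago.com/";

echo map_url("http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml", $from, $to);
```

Because only the prefix is touched, the numbered directory and the page name pass through exactly as returned.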

james438
02-20-2013, 08:56 PM
<?php
$test="http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml";
preg_match('/\/{1}[^\/]*?\/{1}(?!.*?\/).*[^\/]$/',$test, $extract);
// $extract stays empty when nothing matched (for example, when no file name is listed),
// so check that $extract[0] exists before using it.
if (!isset($extract[0])) $result="http://www.inchicago.com/";
else $result="http://www.inchicago.com$extract[0]";
echo $result;
?>

I modified it just a little to account for the possibility that no file name is listed. This will now handle


http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml
http://www.mytravelsite.com/states/cities/thiscity/
http://www.mytravelsite.com/states/cities/thiscity/27/apage.shtml
http://www.anysite.com/stage/theater/0who/apage.shtml

I do not think I fully understand what you are asking for.

How does my earlier example not give you what you are looking for?

itivae
02-20-2013, 09:04 PM
Hi James,

Basically there are several hundred indexed items. The beginning of the URL,


http://www.mytravelsite.com/states/cities/thiscity/

always needs to be mapped to


http://www.inchicago.com/

but the directories 0-6 come after it and have several hundred .shtml pages in each.

So I really need to match those URLs but leave the last two parts (e.g. /0/apage.shtml might be /1/apage23.shtml) exactly as they are served back from the query.

Hopefully that is clearer.

james438
02-20-2013, 09:36 PM
In that case it looks like what fastsol1 is suggesting would be better. I hope it helps.
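A rough sketch of how the str_replace() approach might slot into the result loop posted earlier. The sample XML here is an assumption: it only mirrors the $xml->results->result structure used in the thread, and the root element name and titles are invented for the example.

```php
<?php
// Assumed sample data standing in for the search server's response.
$xml_string = '<response><results>'
    . '<result><url>http://www.mytravelsite.com/states/cities/thiscity/0/apage.shtml</url><title>A page</title></result>'
    . '<result><url>http://www.mytravelsite.com/states/cities/thiscity/1/apage23.shtml</url><title>Another page</title></result>'
    . '</results></response>';

$xml = new SimpleXMLElement($xml_string);

$links = array();
foreach ($xml->results->result as $the) {
    // Cast to string first: SimpleXML hands back objects, not plain strings.
    $links[] = str_replace(
        "http://www.mytravelsite.com/states/cities/thiscity/",
        "http://www.inchicago.com/",
        (string) $the->url
    );
}

print_r($links);
```

In the real loop, the rewritten value would go into the href in place of $the->url.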