Removing unnecessary dots in a URL with preg_replace



qwikad.com
10-05-2013, 02:39 AM
Let's say somebody posts URLs that look like this:


http://something.com.
http://something.org.
http://something.info.

or like this:

http://something.com...
http://something.org...
http://something.info...

or like this:

...http://something.com
...http://something.org
...http://something.info

Is there a way to clean up URLs by removing any dots that are not supposed to be there?

Any input will be appreciated.

Deadweight
10-05-2013, 03:15 AM
php code:

trim('...http://example.com...','.');
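
Since trim() accepts any string, the same call can be applied to each URL in turn. A minimal sketch, assuming the URLs have already been isolated into an array; the helper name strip_outer_dots and the $samples array are hypothetical, not something from this thread:

```php
<?php
// Hypothetical helper (not from the thread): strip leading and trailing dots
// from a single URL string.
function strip_outer_dots(string $url): string
{
    // The second argument to trim() lists the characters to remove from both
    // ends of the string; dots inside the URL are left alone.
    return trim($url, '.');
}

// Sample inputs taken from the first post.
$samples = ['http://something.com.', 'http://something.org...', '...http://something.info'];

// Apply the helper to every URL in the array.
$cleaned = array_map('strip_outer_dots', $samples);

print_r($cleaned);
```

This only covers URLs that are already separate strings; finding them inside a longer text is a different step.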

qwikad.com
10-05-2013, 03:21 AM
php code:

trim('...http://example.com...','.');

But how do you translate this to ALL URLs?

I get the http://example.com part, but I want this to work for any URL. Is there a way?

Deadweight
10-05-2013, 03:26 AM
It would depend. What code are you using to locate which string is a URL?

djr33
10-05-2013, 04:54 AM
Separate the problem into logical steps, and ask about one at a time.

Steps (I'm guessing):
1. Get input.
2. Locate URLs in input.
3. With an individual URL, remove the extra content.
4. Store/display/send output.


The post above solves (3). What are your plans for (2)? Have you already dealt with (1) and (4)? Please give us the information we'd need to help you.
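
Steps (2) and (3) could be combined roughly as follows. This is only a sketch under assumptions: the function name clean_urls_in_text is hypothetical, and the pattern is a deliberately loose guess at what counts as a URL (a scheme followed by non-whitespace), not a complete URL matcher.

```php
<?php
// Hypothetical helper: locate URL-like runs in free text (step 2) and trim
// stray dots from each one (step 3).
function clean_urls_in_text(string $text): string
{
    // Assumed pattern: optional leading dots, a scheme, then any run of
    // non-whitespace characters. Real input may need something stricter.
    $pattern = '~\.*\b(?:https?|ftp)://\S+~i';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // $m[0] is the whole match, including any surrounding dots.
            return trim($m[0], '.');
        },
        $text
    );
}

echo clean_urls_in_text('See ...http://something.com... and http://something.org. for details');
```

Note that a sentence-ending period directly after a URL is removed as well, because the sketch cannot tell it apart from a stray dot.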


Additionally, remember that anything involving the processing of natural-language text, even a relatively basic task like this, can be very difficult. You can approximate good results, but it may never be perfect. Your requirements are phrased vaguely, which will only make the problem worse. If you simply want to remove the dots, that's much easier than removing any other stray text that might be there. (What about colons, commas, etc.? And parentheses?) Often the real problem in Step (3) above is predicting exactly how you'd want the text edited.
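
To illustrate why other punctuation is harder than dots, here is a small sketch that widens the trim() character list. The helper name strip_outer_punctuation and the choice of characters are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: trim a wider set of punctuation from both ends of a URL.
function strip_outer_punctuation(string $url): string
{
    // Assumed character list; adjust it to whatever the input really contains.
    return trim($url, '.,:;()');
}

// Works as hoped when the punctuation is not part of the URL.
echo strip_outer_punctuation('(http://something.com),'), "\n";

// Goes wrong when the URL legitimately ends in a parenthesis.
echo strip_outer_punctuation('http://example.com/page_(draft)'), "\n";
```

The second call drops the closing parenthesis that belongs to the URL, which is the kind of case that makes a perfect result unlikely.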

qwikad.com
10-05-2013, 12:47 PM
I got it figured out. I found something online (spent 2 hours this morning looking for it).


Sorry I couldn't explain better what I needed when I started the thread.

keyboard
10-05-2013, 10:26 PM
If you've found a solution, please feel free to share it so other people with the same question can read about it and find the answer!

qwikad.com
10-06-2013, 05:50 PM
Here's what I am using:



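// Wrap each link in spaces; the final character class excludes dots, so trailing dots fall outside the match.
$text = preg_replace( '/(\b)(http|ftp|https|mailto)(:\/\/|:)([A-Za-z0-9~!@$%&*()_+:?,.\/;=#-]{2,}[A-Za-z0-9~@$%&*_+\/=])/', " $2$3$4 ", $text);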


I decided to put a space on each end of a link, instead of trying to strip all/any characters that may come in contact with it (which in my case used to interfere with a link's accuracy).

It's not perfect: it won't work with domain names ending in .com.de, .org.uk, etc., but other than that I am pretty happy with it.
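
For anyone who wants to try the approach, here is a small self-contained sketch. It uses a copy of the pattern with explicit A-Za-z ranges and (:\/\/|:) after the scheme; the wrapper function space_out_links and the sample sentence are hypothetical.

```php
<?php
// Hypothetical wrapper around the preg_replace call: puts a space on each
// side of every link it finds.
function space_out_links(string $text): string
{
    // Group 2 is the scheme, group 3 the separator, group 4 the rest of the
    // link. The final character class has no dot, so the match cannot end in one.
    $pattern = '/(\b)(http|ftp|https|mailto)(:\/\/|:)'
        . '([A-Za-z0-9~!@$%&*()_+:?,.\/;=#-]{2,}[A-Za-z0-9~@$%&*_+\/=])/';

    return preg_replace($pattern, ' $2$3$4 ', $text);
}

echo space_out_links('Visit http://something.com... now');
// prints "Visit  http://something.com ... now"
```

The dots are not deleted; they are only separated from the link by a space, which matches the description above.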


Thanks!