Resolved: PCRE syntax as it relates to wwws and https



james438
09-26-2008, 03:25 PM
Hi,

I found a script that uses wwws and https as part of a pattern match, but neither https nor wwws is listed anywhere in the list of terms on the syntax pages at php.net or pcre.org. What other terms are usable in PCRE pattern matching that are not listed on the syntax page? How are https and wwws defined and used? Why is https used as opposed to http? From what I can see, the PCRE engine should not be able to recognize any URLs by using https, but it does.

The following is an example, not that it is really needed.

$text=preg_replace('/(https?:\\/\\/[-_.\\/\w\d!&%#?+\\,\\\\\'=:;@~]+)/i', '<a href="$1">$1</a>', $text);

Twey
09-26-2008, 07:22 PM
It does, because it's got 'https?' -- that is to say, the string 'http' followed optionally by the character 's'.
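A minimal sketch of that behaviour, assuming a simplified pattern that only checks the scheme (the sample URLs and variable names here are made up for illustration):

```php
<?php
// Simplified, illustrative pattern: 'http', then an optional 's', then '://'.
$pattern = '/^https?:\/\//i';

// Made-up sample inputs.
$samples = [
    'http://example.com',
    'https://example.com',
    'httpss://example.com',
    'ftp://example.com',
];

$results = [];
foreach ($samples as $url) {
    // preg_match returns 1 on a match and 0 on no match.
    $results[$url] = preg_match($pattern, $url) === 1;
}

foreach ($results as $url => $matched) {
    echo $url, ' => ', $matched ? 'match' : 'no match', PHP_EOL;
}
```

The quantifier applies only to the single character before it, so both 'http://' and 'https://' match, while a second 's' does not.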

HTTPS is 'secure HTTP' -- basically HTTP over an SSL (Secure Sockets Layer) connection, which provides transparent encryption.

There's no such thing as WWWS, unless you mean the American 'rhythmic oldies' radio station.

james438
09-26-2008, 10:17 PM
Twey wrote: "There's no such thing as WWWS, unless you mean the American 'rhythmic oldies' radio station."

hehe, no I didn't mean that :p

The wwws must be an error in my PCRE command. I didn't know that the ? could be used within single quotes like that. I have always used it only as a quantifier outside of a character class or subpattern, or in conjunction with a few other things like .*? or ?: etc. It looks like if I wanted to make more than one character optional, I would enclose them in round brackets, like "htt(ps)?".
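A rough sketch of the difference between the two forms, using a hypothetical helper named leadingMatch (not part of PHP) that returns whatever the pattern matched at the start of the input:

```php
<?php
// Hypothetical helper for illustration: returns the matched text, or null.
function leadingMatch(string $pattern, string $input): ?string
{
    return preg_match($pattern, $input, $m) === 1 ? $m[0] : null;
}

$single = '/^https?/';   // only the 's' is optional
$group  = '/^htt(ps)?/'; // 'ps' is optional as a unit

$single_https = leadingMatch($single, 'https://a'); // 'https'
$single_htt   = leadingMatch($single, 'htt://a');   // null: the 'p' is still required
$group_https  = leadingMatch($group, 'https://a');  // 'https'
$group_http   = leadingMatch($group, 'http://a');   // 'htt': 'ps' is absent, so the group is skipped
$group_htt    = leadingMatch($group, 'htt://a');    // 'htt'

var_dump($single_https, $single_htt, $group_https, $group_http, $group_htt);
```

Note that "htt(ps)?" is not a drop-in replacement for "https?": it treats 'ps' as all-or-nothing, so on 'http' it matches only the first three characters.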

Thanks for pointing me back in the right direction :)

Twey
09-27-2008, 08:12 AM
If you want the grouping effect of the brackets without actually capturing, you can write htt(?:ps)?. This is mostly a convenience: it keeps the group out of the numbered captures, and any performance difference is negligible in practice.
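As a small illustration of what changes in the captures, assuming a made-up URL and a second group added purely to show the numbering:

```php
<?php
// Made-up input for illustration.
$url = 'https://example.com';

// Capturing group: 'ps' takes slot 1, pushing the host to slot 2.
preg_match('/htt(ps)?:\/\/(\S+)/', $url, $capturing);

// Non-capturing group: 'ps' is grouped but not stored, so the host is slot 1.
preg_match('/htt(?:ps)?:\/\/(\S+)/', $url, $nonCapturing);

print_r($capturing);    // full match, 'ps', 'example.com'
print_r($nonCapturing); // full match, 'example.com'
```

This matters when a replacement string refers to groups by number, as the '$1' in the preg_replace call at the top of the thread does.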