
line terminators. Yes, PCRE again.



james438
08-04-2007, 05:33 AM
I seem to post a lot of PCRE questions lately. Well, this time I mostly want to know a few of the very basics.

1. Do you call PHP's PCRE functions PCRE, regular expressions, or regexp? I know there are differences between PCRE and Perl, so I have been calling it PCRE. What term should I be using?

2. Is \r\n known as a line terminator? What are they used for? I can only seem to get them to display when editing a script in Notepad. I figured they would be used for more than that. In fact, I used PCRE to create a program that displays code the way forum software does, including the typed-in line terminators, or at least similar to how forums do it. By the way, I have not found any built-in function among the very many available in PHP that will display the \r\n contained in a string.

3. Is it possible to use PCRE to match everything that is not in a set, like anything that is not a, e, i, o, u, or sometimes y or w? I have seen it done for things like... actually

$string = preg_replace('/[\d\s\Waeiouyw]/', '', $string);
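will remove anything that is not a letter (except the underscore, which counts as a word character) and will also get rid of the lowercase letters aeiouyw, using:

\s = whitespace, such as a space, tab, or line terminator.
\d = any decimal digit.
\W = any non-word character (anything other than a letter, digit, or underscore).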

sorry, that last one was supposed to be my main question :rolleyes: I know that isn't the best way to answer #3, but it is a start.
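For question 3 as originally asked, a PCRE character class can be negated by putting ^ right after the opening bracket: [^aeiouyw] matches any single character that is not one of those letters. A minimal sketch; the helper names keep_only_set() and remove_set() are hypothetical:

```php
<?php
// Hypothetical helper: delete every character that is NOT in the set,
// so only the listed letters survive.
function keep_only_set(string $s): string
{
    // [^...] is a negated character class: it matches any one character
    // except those listed inside the brackets.
    return preg_replace('/[^aeiouyw]/', '', $s);
}

// Hypothetical helper: the opposite, delete every character that IS in the set.
function remove_set(string $s): string
{
    return preg_replace('/[aeiouyw]/', '', $s);
}

echo keep_only_set('hello world'), "\n"; // eowo
echo remove_set('hello world'), "\n";    // hll rld
```

Adding the i modifier (/[^aeiouyw]/i) would make the class treat uppercase letters the same way.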

4. Is there a way to replace a space with a space? For example:

$string = preg_replace('/[\040]/', '/\040/', $string);
preg_replace('/((\040){2,2})/',"  ",$string); works, but
preg_replace('/((\040){2,2})/'," ",$string); does not.

fear not, I'll probably up and buy a book on PCRE patterns soon the way I am reading up on them...

Twey
08-04-2007, 06:30 AM
> 1. Do you call PHP's PCRE functions PCRE, regular expressions, or regexp? I know there are differences between PCRE and Perl, so I have been calling it PCRE. What term should I be using?
Because PHP has non-Perl-compatible regular expressions as well (ereg() and friends), most PHP developers refer to them as PCRE to avoid confusion.
> 2. Is \r\n known as a line terminator?
It's a Windows line terminator, consisting of a carriage return and a line feed, thus sometimes abbreviated CRLF. UNIX derivatives use a single line feed character.
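Because Windows uses CRLF and UNIX derivatives use a bare LF, text from mixed sources is often normalized to one convention before further processing. A minimal sketch, assuming everything should become a single \n; the function name normalize_newlines() is hypothetical:

```php
<?php
// Hypothetical helper: convert CRLF ("\r\n") and lone CR ("\r") line endings to LF ("\n").
function normalize_newlines(string $s): string
{
    // \r\n? matches a carriage return optionally followed by a line feed,
    // so a CRLF pair is replaced as one unit instead of becoming two breaks.
    return preg_replace('/\r\n?/', "\n", $s);
}

$text = "first\r\nsecond\rthird\nfourth";
var_dump(normalize_newlines($text) === "first\nsecond\nthird\nfourth"); // bool(true)
```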
> What are they used for?
Er... terminating lines? :p
> I seem to only get them to display when in notepad mode for editing a script.
HTML "ignores" line breaks: a browser renders a newline in the page source as ordinary whitespace.
> I figured it would be used for more than that. In fact I used PCRE to create a program that would display code just like forum programs do to display code including the typed in line terminators as well. Or at least similar to how forums do it.
Forums tend to just do str_replace("\n", "<br>\n", str_replace("\r", "", $post)). This isn't always semantically correct, which is why I prefer Markdown (http://daringfireball.net/projects/markdown/) or similar.
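The forum-style conversion described above can be wrapped up as a function. PHP also has a built-in, nl2br(), which inserts a break tag before each newline (treating \r\n as a single newline) and leaves the original line break in place. A sketch only: the wrapper name forum_breaks() is hypothetical, and a real forum would also escape the post with htmlspecialchars() first.

```php
<?php
// Hypothetical wrapper around the expression quoted above:
// strip carriage returns, then put "<br>" before every remaining line feed.
function forum_breaks(string $post): string
{
    return str_replace("\n", "<br>\n", str_replace("\r", "", $post));
}

$post = "line one\r\nline two";

// Hand-rolled version: the \r is removed, each \n gets "<br>" in front of it.
var_dump(forum_breaks($post) === "line one<br>\nline two"); // bool(true)

// Built-in version: "<br />" is inserted and the original \r\n is kept.
var_dump(nl2br($post) === "line one<br />\r\nline two");    // bool(true)
```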
> By the way, I have not found any built in function among the very many available with PHP that will display \r\n located in a string.
"Display" how? In HTML, the code above will work (although it's not usually the best approach). If you mean actually displaying them as the text \r and \n, try str_replace("\r", '\r', str_replace("\n", '\n', $string)).
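The str_replace() suggestion above can be checked directly; for a built-in alternative, addcslashes() escapes the characters listed in its second argument, writing CR and LF in C style as \r and \n. A minimal sketch; the helper name show_line_endings() is hypothetical:

```php
<?php
// Hypothetical helper built from the suggestion above: replace the real
// control characters with the two-character text sequences \r and \n.
function show_line_endings(string $s): string
{
    // Single-quoted '\n' is a literal backslash followed by the letter n.
    return str_replace("\r", '\r', str_replace("\n", '\n', $s));
}

$s = "one\r\ntwo\nthree";
echo show_line_endings($s), "\n";   // one\r\ntwo\nthree

// Built-in alternative: the charlist "\r\n" holds a real CR and LF,
// and addcslashes() writes those characters out C-style.
echo addcslashes($s, "\r\n"), "\n"; // one\r\ntwo\nthree
```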

james438
08-04-2007, 07:31 AM
heh, thanks for the quick answers! I wasn't sure they would be interesting enough to answer :P

for those that are wondering

CRLF = carriage return line feed

for those still wondering:

carriage returns are those things that I used to play with on my grandfather's manual typewriter in order to go down to the next line and all the way to the left.

I was wondering about the built-in functions because, for a long time, it was difficult for me to discover that \r\n even existed, until I discovered pattern matching. I was using htmlentities(), htmlspecialchars(), and others to try to display whatever was standing in for my CRLFs, because although they were not being rendered, I noticed they were not being deleted either. No matter, I know what is being used now. I learned it mostly by accident ;)

anyway, I did some more testing and I think I found out the answer to my last question.

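$r = "\40"; // "\40" is an octal escape in a double-quoted string: a space (ASCII 32)
$string = preg_replace('/[bd]/', $r, $string); // replace every "b" or "d" with a space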
This will display spaces, or any other octal code, just fine, but runs of spaces still seem to be condensed to just one. In fact, in the code I tried to post above, wherever there were two spaces next to each other, the forum condensed them down to one. (There is a typo in the code above that I will try to fix right now.)
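One way to confirm that preg_replace() is not what collapses the spaces is to inspect the string's length or bytes rather than its rendered HTML. A minimal sketch, reusing the octal-escape replacement from the post above on a made-up input string:

```php
<?php
$r = "\40"; // octal escape for a space (ASCII 32)
$string = preg_replace('/[bd]/', $r, 'abdc'); // "b" and "d" each become a space

// Both spaces are still there; only the browser would collapse them.
var_dump($string);          // string(4) "a  c"
var_dump(bin2hex($string)); // string(8) "61202063"
```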

Thanks again!

Twey
08-04-2007, 07:33 AM
> carriage returns are those things that I used to play with on my grandfather's manual typewriter in order to go down to the next line and all the way to the left.
Heh. Not quite: the carriage return sends the carriage to the left, then the line feed sends it down one line. You're right that this is a leftover from the days of typewriters. On computers, of course, the two operations are rarely seen independently of one another, so it makes sense to save that space (one byte per line) and eliminate the \r.
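The one-byte-per-line figure is easy to check: CR is byte 13 (0x0D) and LF is byte 10 (0x0A), so a CRLF-terminated line is exactly one byte longer than the same line ending in LF alone. A quick sketch:

```php
<?php
// CR and LF are single bytes with codes 13 and 10.
var_dump(ord("\r"));       // int(13)
var_dump(ord("\n"));       // int(10)
var_dump(bin2hex("\r\n")); // string(4) "0d0a"

// The same line with each terminator: CRLF costs one extra byte per line.
$windows = "hello\r\n";
$unix    = "hello\n";
var_dump(strlen($windows) - strlen($unix)); // int(1)
```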
> spaces still seem to be condensed to just one.
Again, this is an HTML whitespace parsing rule: browsers collapse any run of spaces, tabs, and line breaks into a single space.
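Because the browser collapses runs of whitespace, replacing a space with a space (question 4) changes nothing visible. To keep every space in the rendered page, two common options are emitting &nbsp; entities or wrapping the text in a pre element. A sketch with hypothetical helper names keep_spaces_nbsp() and keep_spaces_pre():

```php
<?php
// Hypothetical helper: escape HTML special characters, then turn every
// space into a non-breaking space entity so the browser does not collapse them.
function keep_spaces_nbsp(string $s): string
{
    return str_replace(' ', '&nbsp;', htmlspecialchars($s));
}

// Hypothetical helper: inside <pre>, the browser keeps spaces and line breaks as written.
function keep_spaces_pre(string $s): string
{
    return '<pre>' . htmlspecialchars($s) . '</pre>';
}

echo keep_spaces_nbsp('a  b'), "\n"; // a&nbsp;&nbsp;b
echo keep_spaces_pre('a  b'), "\n";  // <pre>a  b</pre>
```

The &nbsp; route also stops the browser from wrapping lines at those spaces; the pre route keeps normal spaces but renders in a monospace font by default.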