How to restore broken lines in links in my PHP editor?



qwikad.com
12-18-2012, 11:03 PM
What happens is that broken lines in links do not show correctly in my Markdown editor. For instance, the editor will not render border="0" in this case:

<a href="http://somedomain.com" rel="nofollow">

<img src="http://somedomain.com/images/someimage.png"
border="0"></a><br>

I need it to be on the same line as the rest of the link, like this:

<a href="http://somedomain.com" rel="nofollow">

<img src="http://somedomain.com/images/someimage.png" border="0"></a><br>

Which PHP command can accomplish this?

Thanks!

traq
12-19-2012, 01:50 AM
Your question is unclear.
Please provide more information, and be as specific as possible.
What do you want to accomplish? What have you already tried? What problems did you encounter?
Also, please be sure that you have included all relevant code and/or a link to the page in question.
You might also consider making a reduced test case (http://css-tricks.com/reduced-test-cases/) using an online tool like jsfiddle (http://jsfiddle.net).

What "markdown editor"?
"Markdown" is a text parser. Some WYSIWYG editors use markdown; if that's what you're talking about, please specify which editor you're using. Furthermore, you don't show any Markdown usage in your examples.

(Also, your title refers to a "PHP Editor", which, to me, implies a code editor or IDE, and seems entirely unrelated to what you talk about in the body of the thread.)

What do you mean by "broken lines"?
You seem to use "broken lines" to refer both to the border around the image, and to literal line breaks in your HTML markup (and, possibly, to the dotted outline normally displayed around hyperlinks). However, what you say below makes me suspect you don't really mean any of these.

What do you mean by "will not render"?
Both the examples you give are HTML markup, and they are functionally equivalent. They will be rendered identically in any browser.

What "PHP"?
There is no PHP code in your examples. Are you referring to the Markdown parser itself?

---------------

If all you want to accomplish is this

<a href="http://somedomain.com" rel="nofollow">

<img src="http://somedomain.com/images/someimage.png" border="0"></a><br>
as opposed to this

<a href="http://somedomain.com" rel="nofollow">

<img src="http://somedomain.com/images/someimage.png"
border="0"></a><br>
(which is not, as you seem to be implying, "on the same line as the rest of the link")

Then just write it that way - no PHP, beyond, possibly, echo, is required.

qwikad.com
12-19-2012, 08:23 PM
OK, I should have simply asked this: what PHP code should I use to get rid of new lines WITHIN a hyperlink?

Original code:

<img src="http://somedomain.com/images/someimage.png"
border="0">

I want it to be:

<img src="http://somedomain.com/images/someimage.png" border="0">

Thanks!

djr33
12-19-2012, 08:47 PM
"Hyperlinks" do not exist in PHP. Only text does. How is this text stored? Is it being generated by PHP? Is it on another page?

Are you attempting to reformat or parse HTML using PHP?


That's why traq's post asks for a lot of information. We can't understand how to answer without actually understanding what you're doing. On a technical level, it just doesn't make sense: "hyperlinks" aren't in PHP, so you can't change them. You can probably accomplish that indirectly, but we'd need to know how you have set this up.



Regardless, if you can get it as a string, then you can use str_replace() to change line breaks to an empty string ''. Or you can use some form of regular expression (probably the preg_ family of functions) to search out only the parts of the text that look like a URL, or something along those lines.

qwikad.com
12-20-2012, 02:10 AM
Of course it is done with preg_replace. The question is how?

djr33
12-20-2012, 02:38 AM
I have no idea. Your questions/explanations aren't enough for us to help you with this.

You can look up a regex tutorial online and see the PHP.net manual for preg_replace (http://php.net/preg_replace).

Do you already have it as a string?

What exactly are the conditions (in terms understandable to a computer, not "hyperlinks") that would make this occur? [This will be your regex environment]

What do you want to replace in that environment? Just line breaks? [this will be added to your regex environment to make the "find" part]

What do you want to replace it with? Always just delete it? You can use an empty string: ''. [this will be your "replace" part]



If you need more help, I imagine it will be with the regex for finding the environment. First, use Google to see if someone has this code available (I'm sure there are at least some similar examples out there). Second, if you want our help with it, then be very precise about what is happening. Show us the "before" string (exactly as it exists in your PHP) and then what you want as the "after" string, including any variation that might come up.

traq
12-20-2012, 04:51 AM
Of course it is done with preg_replace. The question is how?

Please re-read my earlier post (http://www.dynamicdrive.com/forums/showthread.php?72423-How-to-restore-broken-lines-in-links-in-my-PHP-editor&p=288235#post288235) - once you provide all the information I asked about (particularly your actual code), we'll have a foundation for helping you figure out the rest.



Please don't "simplify" the problem by leaving out contextual information/details that may be important. It is counter-productive.

From reading both your thread title and your original post, I suspect that your problem involves more than just your hyperlinks. I'll bet it involves the Markdown parser, and also a WYSIWYG editor of some sort on your webpage (therefore, indirectly, whatever markdown you actually enter into it, not to mention the form submission itself).

qwikad.com
12-20-2012, 12:33 PM
Here's some sample code from the file:

text = text.replace(/^[ \t]+$/mg,"");

I need code that would remove all new lines between < and > in HTML links.

Thanks!

traq
12-20-2012, 02:48 PM
That's javascript code. Are we talking about javascript now?
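For reference, the sample line from the file does something narrower than removing newlines. A minimal sketch (the wrapper function name is hypothetical): with the `m` flag, `^` and `$` match at every line boundary, so `/^[ \t]+$/mg` only empties lines made up entirely of spaces or tabs, and the line breaks themselves stay.

```javascript
// What the sample line from the file does: with the m flag, ^ and $ match
// at every line boundary, so only lines consisting entirely of spaces/tabs
// are emptied. The line breaks themselves are kept.
function blankWhitespaceOnlyLines(text) {
  return text.replace(/^[ \t]+$/mg, "");
}

console.log(JSON.stringify(blankWhitespaceOnlyLines("first line\n  \t\nsecond line")));
// "first line\n\nsecond line"

// A line break inside a tag is not touched by this pattern at all.
console.log(blankWhitespaceOnlyLines('<img src="a.png"\nborder="0">'));
```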

Beverleyh
12-20-2012, 03:26 PM
I'm getting dizzy just reading this thread.

Traq and Dan are trying to help you, but sadly, you are providing only very small, incomplete snippets of information, which is resulting in us making wild guesses.

At the risk of further convoluting the thread with my own suggestions, maybe this will help? http://stackoverflow.com/questions/6394416/replace-excess-whitespaces-and-line-breaks-with-php
Again, a total guess as to what you really need, but we are doing our best to offer assistance with the little direction you're giving us.

To help things along, can you provide a small zip package of what you are trying to describe? Including your php/javascripts, along with the link to this php editor you speak of, would be most helpful. Or is there a developers site and demo - maybe even with online documentation or a forum - where you can direct us?

We can appreciate how frustrated you might be - you need help, and it must seem like we're being finickity or awkward with the amount of information we're asking for, but it really is so we can pin-point your problem and offer you the most efficient and relevant answers. Please try to remember that you are very close to your own project, whereas we are just outsiders looking in, and we need to get our bearings, and a good feel for what is going on, before we can be of any real help to you.

Until we have all the information upfront, we will be unable to give you any solid resolution. We're trying our best but everything to do with your project/problem is rather alien to us at the moment.

qwikad.com
12-20-2012, 03:37 PM
that's javascript code. Are we talking about javascript now?

Oops... I feel so dumb right now. It is a javascript file (I checked it again). I am not a programmer, so I kind of thought it was a PHP file. In any case, using the example above, how do I eliminate new lines between < and > in HTML links?

Thank you.

PS: Please do not tell me I am confusing all of you. I realize I did. I just need to find a simple solution to this, and I know it is possible to do.

Beverleyh
12-20-2012, 03:48 PM
OK - so going back to what the guys have been asking you, what "markdown editor" are you using?

There are probably inbuilt markup/formatting functions that can be adapted, so it's best that you give us a link/download that will allow us to modify what already exists.

qwikad.com
12-20-2012, 04:04 PM
This is the one I am using:

#
# Markdown - A text-to-HTML conversion tool for web writers
#
# PHP Markdown
# Copyright (c) 2004-2008 Michel Fortin
# <http://www.michelf.com/projects/php-markdown/>
#
# Original Markdown
# Copyright (c) 2004-2006 John Gruber
# <http://daringfireball.net/projects/markdown/>
#

qwikad.com
12-20-2012, 04:10 PM
And by the way I do not need to eliminate white spaces. It's new lines that I want to disappear.

Beverleyh
12-20-2012, 04:12 PM
The PHP Markdown editor at http://www.michelf.com/projects/php-markdown/ ?

There are 2 downloads on their site - which one?

Also, have you already identified the file (that is part of markdown editor bundle) that needs to be modified? (to help speed things up)

Additionally, can you provide an example of a working editor on your website that is exhibiting the 'hyperlink-new-line-space' problem so we can see the behaviour firsthand?

qwikad.com
12-20-2012, 04:23 PM
OK, I give up. Can a moderator lock this thread up? Seriously... This is a dynamicdrive forum, for crying out loud. Can SOMEONE tell me what code I should use to eliminate new lines between < and > in HTML links using THIS example:

text = text.replace(/^[ \t]+$/mg,"");

????????

I have figured out a LOT of things on my own both in PHP and javascript with NO training. I am asking real programmers to help me with something as simple as this and I get nothing in return? It's kind of funny.

Argh!

Beverleyh
12-20-2012, 05:15 PM
Can I just say that it's probably not a good idea to bite the hand that feeds you - you may very well need help from these forums again in the future, and you're not exactly forging good relationships in our community.

3 people HAVE been trying to help you - 2 of them highly skilled moderators and all 3 of us very busy people.

We've remained polite and offered assistance despite having very little to go on, so the attitude in your last post is uncalled for.

traq
12-20-2012, 09:06 PM
And by the way I do not need to eliminate white spaces. It's new lines that I want to disappear.

Newlines (\n or \r\n) are a form of whitespace.
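A quick way to see this in JavaScript: the `\s` class matches line breaks as well as spaces and tabs, whereas a class that lists only a space and a tab (like the `[ \t]` in the sample line from the file) does not.

```javascript
// \s matches spaces and tabs, and also line terminators such as \n and \r.
const samples = [" ", "\t", "\n", "\r\n"];
for (const s of samples) {
  console.log(JSON.stringify(s), /^\s+$/.test(s)); // true for every sample
}

// A class listing only space and tab does not match a line break.
console.log(/[ \t]/.test("\n")); // false
```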


I understand that this might be frustrating. However, please consider that you are getting quite a bit in return. We're not running you in circles. We're asking very clear, straightforward questions about your situation so we can offer a solution to your problem.

Your problem is not "simple" until it is understood. At this point, there are many things that are "unknown" and may have an impact on the solution - it's not one-size-fits-all.

For example, we need to know "where" your <img> tag is (in a js variable? in part of a php script? simply text in the source code?) and how your code is acting on it. Without knowing these simple things, it is impossible to offer a useful suggestion - we'd just be making wild guesses, and they would probably be unhelpful.

Is there some reason you cannot provide the information we've asked for?

I'm going to leave this thread open for another 24 hours.

qwikad.com, please reply during that time if you'd like to offer more information. We would love to help you solve your problem.

Everyone else, please refrain from posting in the meantime.

qwikad.com
12-20-2012, 10:30 PM
traq, I see that you have some authority here. OK. It's NOT just the new lines that I need to get rid of. I can do that. I did it in the past. They are the new/broken lines between < and > that I can't figure out how to get rid of. Just like it is in the example:

<img src="http://something.com/images/image.png"
border="0">

Let's say someone cuts and pastes this code into my editor; I want it to automatically become this:

<img src="http://something.com/images/image.png" border="0">

Can you see the difference? Is it really this hard for someone who is a js/php programmer?

Use this analogy to do this:

text = text.replace(blah blah blah);

Thank you, again. Man, I am sweating as I am typing this. I thought it would be a simple question/answer thing. If you look up the beginning of this thread, this is exactly what I asked.

qwikad.com
12-20-2012, 10:41 PM
I mean, I confused js with php at the beginning of the thread, but if someone had offered a solution with preg_replace, I would have changed it into text = text.replace... but nobody did. I just don't know what should go instead of the "blah blah blah" part...

keyboard
12-20-2012, 11:30 PM
Sorry to barge in.
You could try running this -

text = text.replace(/(\r\n|\n|\r)/gm,"");
That should remove all line-breaks from the variable text.
But that'd only work if you can run javascript inside your editor...
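To see what "all line-breaks" means in practice, here is a small sketch (the wrapper function name is illustrative): the break inside the tag goes, but so do the breaks between ordinary lines of text, and because nothing is put in their place, the attribute ends up glued to the previous one. The `m` flag has no effect here, since the pattern uses no `^` or `$`.

```javascript
// keyboard's one-liner, wrapped in a function (the name is illustrative).
// It deletes every \r\n, \n or \r, wherever it appears in the string.
function removeAllLineBreaks(text) {
  return text.replace(/(\r\n|\n|\r)/gm, "");
}

const input =
  '<img src="http://something.com/images/image.png"\nborder="0">\nSome text\r\nmore text';
console.log(removeAllLineBreaks(input));
// <img src="http://something.com/images/image.png"border="0">Some textmore text
```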

Are you trying to program a code editor?

qwikad.com
12-21-2012, 01:47 AM
I am trying to make the editor do more than what it has been designed to do. It seems like one of its weak points is it doesn't read HTML codes between < and > if they start with new lines as I mentioned before. Have no idea why. Michel Fortin who redesigned the code from the previous work of John Gruber says that the code is not perfect and can be updated/changed and although I have done quite a bit of improvement to it, I can't figure out how to fix this issue. I will try what you've suggested. Looks promising.

traq
12-21-2012, 02:08 AM
traq I see that you have some authority here.
I don't want to move the thread off in a different direction, but aside from the fact that I'll close this thread if you ask me to, it doesn't matter one bit if I have any authority or not. The only thing that has been holding back answers is your persistence in not sharing the information that people have been asking you about.


OK. It's NOT just the new lines that I need to get rid of. I can do that. I did it in the past. They are the new/broken lines between < and > that I can't figure out how to get rid of. Just like it is in the example:


<img src="http://something.com/images/image.png"
border="0">
Let's say someone cuts and pastes this code into my editor, I want it to automatically become this:

<img src="http://something.com/images/image.png" border="0">
Can you see the difference? Is it really this hard for someone who is a js/php programmer?
Yes, I can see the difference. A web browser cannot - which brings up another question: is this being displayed as text, or rendered as HTML?

Keyboard offered a possible solution, but, as he said, it will only work if the text you're working with is already contained in the var text. We have no way of knowing whether or not this is the case - and, therefore, no way of knowing if this is a usable solution or not. As I pointed out earlier, we're moving into "wild guesses" territory. We prefer not to hang out there.


I mean, I confused js with php at the beginning of the thread, but if someone had offered a solution with preg_replace, I would have changed it into text = text.replace... but nobody did. I just don't know what should go instead of the "blah blah blah" part...
Actually, no - another example of why context is important.

PHP and javascript do not use identical regular expression syntax. For example, PHP requires you to put the regex (including its delimiters and modifiers) inside a string, whereas javascript regexes are bare. There are other differences as well (maybe not in this specific example), but that would have been the first one to trip you up. It's just *similar enough* to confuse people (especially if you don't really know how to read/write them).
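A short illustration of that first difference (the sample string is made up): in JavaScript, a quoted string passed to `replace()` is searched for as literal text, so a PHP-style `'/…/g'` string silently matches nothing. The pattern has to be a bare regex literal, or be built with `new RegExp()` from the pattern and flags given separately.

```javascript
const html = "<b>bold</b>";

// PHP style: pattern, delimiters and flags inside a string. JavaScript treats a
// string argument to replace() as literal text, so "/<b>/g" is never found.
console.log(html.replace("/<b>/g", "<strong>")); // <b>bold</b>

// JavaScript style: a bare regex literal; the slashes are syntax, not text.
console.log(html.replace(/<b>/g, "<strong>")); // <strong>bold</b>

// The same regex built at runtime: pattern and flags passed separately,
// without delimiters.
console.log(html.replace(new RegExp("<b>", "g"), "<strong>")); // <strong>bold</b>
```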


Thank you, again. Man, I am sweating as I am typing this. I thought it would be a simple question/answer thing.
Once the question is well-defined, yes.

Please believe me: I'm not trying to give you a hard time, here. No one is. I say this with all sincerity, and with every intention of helping you find the answer you need.

Work with us. This very well could have been a simple answer, if you'd only provided some of the info you were asked. You've got four of the most active members on the forum here in this thread, now. We've answered questions like yours before. We know how to ask targeted questions to get to the root of problems like this one. With some cooperation, in most cases we can get the best answer out there pretty quick, and we're all happier.


I am trying to make the editor do more than what it has been designed to do. It seems like one of its weak points is it doesn't read HTML codes between < and > if they start with new lines as I mentioned before. Have no idea why. Michel Fortin who redesigned the code from the previous work of John Gruber says that the code is not perfect and can be updated/changed and although I have done quite a bit of improvement to it, I can't figure out how to fix this issue. I will try what you've suggested. Looks promising.
THAT is a very helpful reply.

For example, we now know where in the script this is happening - before being submitted to the markdown parser. I was working under the assumption that it was being *returned* from markdown this way (which also explains a bit of my confusion about why it *mattered* if there was a newline in the tag).

It's always good to know exactly what it is you're trying to solve.

qwikad.com
12-21-2012, 02:18 AM
Yes, I can see the difference. A web browser cannot - which brings up another question: is this being displayed as text, or rendered as HTML?

The browsers don't care whether it's broken or not. It's the preview on my classifieds site that does. For instance, visit this page:

http://qwikad.com/?view=post&cityid=86&lang=en&catid=6&subcatid=94&shortcutregion=

and where it says "posting description," enter this one first:

<img src="http://qwikad.com/images/logo.png"
border="0">

You will notice that border="0" is visible in the preview.

But if you enter this code, it will all look normal:

<img src="http://qwikad.com/images/logo.png" border="0">

So if I eliminate broken lines while the code is being typed or cut and pasted this issue will disappear as well.

Thanks.

traq
12-21-2012, 02:22 AM
As I said, yes, makes sense now. Your last post was very helpful in clarifying what you need. Did you try KB's suggestion?

qwikad.com
12-21-2012, 02:26 AM
Work with us.

I am, I really am. Sorry if I sounded exasperated.

Using the same idea, how can this code be changed so that it would fix new lines ONLY between the tags < and >?

text = text.replace(/(\r\n|\n|\r)/gm,"");

qwikad.com
12-21-2012, 02:30 AM
As I said, yes, makes sense now. Your last post was very helpful in clarifying what you need. Did you try KB's suggestion?

I have not tried it yet. I will get to it tomorrow. It looks great but I am thinking it will be fixing ALL new lines.

traq
12-21-2012, 02:34 AM
I am, I really am. Sorry if I sounded exasperated.

Using the same idea, how can this code be changed so that it would fix new lines ONLY between the tags < and >?

You'd need a lookahead assertion. I'd have to experiment to figure that one out, and to get it working in javascript (vs. php).

djr33
12-21-2012, 02:36 AM
It looks great but I am thinking it will be fixing ALL new lines.
Yes, it will be. Using regular expressions to specifically detect new lines only within links is going to be quite difficult. If you must do that, I suggest searching google to find examples of this. The real issue is that it's very hard to predict what the input might look like and what should and should not count. Then there's the technical problem of actually writing the regex, but in some sense that's simpler because it's logical-- it's harder to work out what user input might look like.


There is an alternative, which actually might be easier:
1. Remove all lines (using KB's code).
2. Insert new lines back into the code in a systematic way. Something like HTML Tidy (http://tidy.sourceforge.net/) may be useful, if that works in Javascript. (Or, if you can, just wait and do this on the server side using PHP, where I know for sure that HTML Tidy works.)


Alternatively, go ahead and remove all new lines. Since it's just HTML, you don't need them. It will be harder to read, but if this is being generated automatically and edited in a WYSIWYG editor, that won't really be a practical problem.



Edit: traq's post is encouraging about trying to do this directly with regex. The problem is that it really can be hard to predict exactly what the code might look like. (Also, do you know whether this is browser-generated or Javascript-generated HTML? If it's generated by the browser, then it will look different in different browsers; that's the case when designMode is used, which is what allows you to get a preview mode for WYSIWYG in the majority of cases.) It will probably take a little experimentation to find the right strength for the search algorithm. Computers don't see "links" like we do, so it will be a little complicated.

traq
12-21-2012, 02:45 AM
Alternatively, go ahead and remove all new lines. Since it's just HTML, you don't need them. It will be harder to read, but if this is being generated automatically and edited in a WYSIWYG editor, that won't really be a practical problem.

That was my first thought as well, but Markdown relies on newlines to delineate paragraphs (among other elements). Your suggestion about HTMLTidy would probably work perfectly, but would add quite a bit of complexity (as well as processing time).
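A tiny illustration of traq's point, assuming the standard Markdown rule that paragraphs are separated by one or more blank lines: once every line break is stripped, the paragraph boundary is gone.

```javascript
// Markdown separates paragraphs with a blank line.
const markdown = "First paragraph.\n\nSecond paragraph.";

// Removing every line break merges the two paragraphs into one line,
// so a Markdown parser would no longer see a paragraph boundary.
const stripped = markdown.replace(/(\r\n|\n|\r)/g, "");
console.log(stripped); // First paragraph.Second paragraph.
```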

djr33
12-21-2012, 02:49 AM
If it's relying on new lines for paragraphs, then would it also have <a> tags for links? Wouldn't it be one or the other?

bernie1227
12-21-2012, 02:59 AM
http://jsfiddle.net/bernie1227/Sxs3C/14/

traq
12-21-2012, 03:32 AM
If it's relying on new lines for paragraphs, then would it also have <a> tags for links? Wouldn't it be one or the other?
Markdown generally allows HTML. Specific implementations may differ.


http://jsfiddle.net/bernie1227/Sxs3C/14/
http://jsfiddle.net/traq/vG3U6/

bernie1227
12-21-2012, 03:36 AM
You really didn't need a jsfiddle to ask me that, traq:
(text.split("<").length - 1) tells us how many opening tags if you want to know how many there are.
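A small sketch of that counting trick (the function name is illustrative). It counts `<` characters rather than tags, so closing tags are included, and a stray `<` in ordinary text is counted too.

```javascript
// Splitting on "<" gives one more piece than there are "<" characters.
function countOpeningBrackets(text) {
  return text.split("<").length - 1;
}

console.log(countOpeningBrackets('<a href="#"><img src="x.png"></a>')); // 3 (includes the closing </a>)
console.log(countOpeningBrackets("1 < 2")); // 1, even though there is no tag
```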

qwikad.com
12-21-2012, 03:47 AM
I don't know if I can nudge some of you in the right direction, but is there something that could go like this?


text = text.replace('/<([^<>]+)>/g','"<" ...... ">"');

Where the dots are we need to tell that only new lines between the tags must be stripped...

Or am I totally off?
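That outline is close to something workable. One way to express "only between the tags" in JavaScript (a sketch, not code posted in this thread; `joinTagLines` is a hypothetical name) is to match each tag with a regex literal, not a quoted string, and pass a function as the replacement, so the newline removal runs on the tag text only. This handles any number of breaks per tag in one pass. It assumes attribute values contain no `<` or `>`, and it also collapses line breaks inside quoted attribute values.

```javascript
// Collapse line breaks inside HTML tags only, in a single pass.
// joinTagLines is a hypothetical helper; it assumes attribute values
// contain no "<" or ">".
function joinTagLines(text) {
  // Match each tag; the function receives the matched tag text.
  return text.replace(/<[^<>]+>/g, function (tag) {
    // Inside the tag, each whitespace run containing a line break becomes one space.
    return tag.replace(/\s*[\r\n]\s*/g, " ");
  });
}

const pasted =
  '<img src="http://qwikad.com/images/logo.png"\n\nalt="something"\ntitle="something">\nText below';
console.log(joinTagLines(pasted));
// <img src="http://qwikad.com/images/logo.png" alt="something" title="something">
// Text below
```

Replacing with a single space rather than an empty string keeps neighbouring attributes separated.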

bernie1227
12-21-2012, 05:37 AM
You could have a try with something along the lines of:


text.replace(/(<\S+)\n*(\S+>)/g, "$1$2");
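A quick run of this pattern on two illustrative inputs suggests why it needs adjusting: `(<\S+)` and `(\S+>)` cannot span a space, so a tag that already has a space before the break is not matched at all, and when it does match, replacing with `$1$2` puts nothing in place of the newline.

```javascript
// bernie1227's suggestion, run on two inputs.
const bernie = (text) => text.replace(/(<\S+)\n*(\S+>)/g, "$1$2");

// A space before the break: (<\S+) cannot cross the space after "<img",
// so nothing matches and the newline stays.
console.log(JSON.stringify(bernie('<img src="a.png"\nborder="0">')));
// "<img src=\"a.png\"\nborder=\"0\">"

// A break directly after the tag name: the newline is removed, but nothing
// replaces it, so the tag name and the attribute are glued together.
console.log(bernie('<img\nsrc="a.png">')); // <imgsrc="a.png">
```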

traq
12-21-2012, 05:40 AM
http://jsfiddle.net/traq/kRsND/

Actually, I didn't end up using a look-ahead (my idea also required a look-behind, which javascript apparently doesn't implement).

I could only get it to work by putting it in a loop, though. I'm not sure why the g modifier isn't having the effect I expect (probably, I misunderstand what regex means by "global"). Removing it has no effect on the result.

for reference, here's the regex I'm using:
/(<[^>]*)\n+([^>]*>)/g

// explanation:

( // start $1
< // opening bracket
[^>]* // anything not a closing bracket
) // end first match
\n+ // one or more newlines
( // start $2
[^>]* // anything not a closing bracket
> // closing bracket
) // end $2

// then I just replaced the *whole* match with only $1 and $2.
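A plausible explanation for the loop, sketched below with an illustrative tag: with the `g` flag, matches cannot overlap, and each match of this regex consumes the whole tag from `<` to `>`, so one `replace()` call removes at most one run of newlines per tag (which is also why removing `g` made no difference on a single tag). Because `[^>]*` in the first group is greedy, the last break in the tag is the one that goes first. This regex joins the halves with nothing in between; the later version in this thread inserts a space instead.

```javascript
const firstRegex = /(<[^>]*)\n+([^>]*>)/g;
const tag = '<img src="x.png"\nalt="a"\ntitle="t">';

// One pass: only the last line break in the tag is removed.
const onePass = tag.replace(firstRegex, "$1$2");
console.log(JSON.stringify(onePass)); // "<img src=\"x.png\"\nalt=\"a\"title=\"t\">"

// Repeating until nothing changes removes the remaining breaks.
let result = tag;
let previous;
do {
  previous = result;
  result = result.replace(firstRegex, "$1$2");
} while (result !== previous);
console.log(result); // <img src="x.png"alt="a"title="t">
```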


You could have a try with something along the lines of:


text.replace(/(<\S+)\n*(\S+>)/g, "$1$2");


...beat by 3 minutes.

traq
12-21-2012, 04:14 PM
fixed. http://jsfiddle.net/traq/kRsND/


var regex = /(<[^>\n]*)\s+([^>\n]*>)/g;
/*

( // start $1
< // opening bracket
[^>\n]* // anything not a closing bracket *or* a newline
) // end $1
\s+ // one or more *whitespace* (including newlines)
( // start $2
[^>\n]* // anything not a closing bracket *or* a newline
> // closing bracket
) // end $2

*/

text = text.replace( regex,'$1 $2' );
// note the single (ordinary) space between $1 and $2, which keeps the joined attributes apart
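A small check of the fixed regex on the case from the thread (the input strings are illustrative): a break inside the tag, even with a blank line, becomes one space, and line breaks in plain text are left alone because the pattern has to start at `<` and end at `>`.

```javascript
const regex = /(<[^>\n]*)\s+([^>\n]*>)/g;

// Line break (plus a blank line) inside the tag: joined with one space.
const joined = '<img src="http://qwikad.com/images/logo.png"\n\nborder="0">'.replace(regex, "$1 $2");
console.log(joined); // <img src="http://qwikad.com/images/logo.png" border="0">

// Text outside tags keeps its line breaks.
const plain = "line one\nline two".replace(regex, "$1 $2");
console.log(JSON.stringify(plain)); // "line one\nline two"
```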

qwikad.com
12-21-2012, 07:32 PM
This has fixed it. Wow. Thanks!

I will still need to test it to find out whether or not it works with all sorts of HTML banner ads / HTML codes without messing something up, but as of now it works perfectly.




traq
12-21-2012, 08:32 PM
Glad we could help.

Once you've completed your testing, if your question has been answered, please mark your thread "resolved":
On your original post (post #1), click [edit], then click [go advanced]. In the "thread prefix" box, select "Resolved". Click [save changes].

traq
01-09-2013, 09:08 PM
qwikad, it's perfectly okay to post your question here. It's on the same topic, after all, and it's your thread. Let me know if you want me to un-resolve the thread and restore your post.

*****

So, your problem is that the regex matches when newlines separate the tag into two parts, but not more than two, correct?
works:
<tag
one>

works:
<tag


two>

breaks:
<tag
three
breaks>
new fiddle (http://jsfiddle.net/traq/duzuz/)

I have to admit, the only thing that occurs to me is adding additional, optional subpatterns in the middle of the regex.
This is not a solution, obviously, since you'll always be subject to a maximum of however many subpatterns you add.

I'll think about it. Anyone else have ideas?
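A quick check of why the fixed regex stops at two segments (inputs taken from the examples above): its second group, `([^>\n]*>)`, has to reach the closing `>` without crossing another newline, so when a tag contains two separate line breaks there is no position where the whole pattern fits, and the tag is left completely untouched, not even partially joined.

```javascript
const fixedRegex = /(<[^>\n]*)\s+([^>\n]*>)/g;

const two = "<tag\none>";
const three = "<tag\nthree\nbreaks>";

console.log(two.replace(fixedRegex, "$1 $2")); // <tag one>
console.log(JSON.stringify(three.replace(fixedRegex, "$1 $2")));
// "<tag\nthree\nbreaks>" (unchanged)
```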

qwikad.com
01-10-2013, 12:02 AM
No, it's a bit different (unless I misunderstood you).

This works:

<img src="http://qwikad.com/images/logo.png"
alt="something" title="something">

This also works:

<img src="http://qwikad.com/images/logo.png"

alt="something" title="something">

This, however, doesn't:

<img src="http://qwikad.com/images/logo.png"
alt="something"
title="something">

Neither does this:

<img src="http://qwikad.com/images/logo.png"

alt="something"

title="something">

Hope it helps.

traq
01-10-2013, 02:30 AM
No, that's exactly what I meant.

I know this can be done, but it's evading me right now.
I'm also a little preoccupied with some other stuff, so it's a bit hard to really get into it, but I'll keep the problem in mind.

qwikad.com
01-10-2013, 02:58 AM
Sounds fair, no rush. I'll be stopping by here from time to time to see if you had a chance to look at it again.

james438
01-10-2013, 05:46 AM
This looks like a javascript problem. I do have a fair amount of experience with PCRE (Perl Compatible Regular Expressions), though it has been a while for me. I have a fair collection of PCRE scripts and found one that looked similar to what is being worked on here. Can you reverse engineer it to suit your needs?


<?php
$text="<img src=\"http://qwikad.com/images/logo.png\"

alt=\"something\"

title=\"something\">this
is

fine
<img src=\"http://qwikad.com/images/logo.png\"

alt=\"something\"

title=\"something\">";
$text=preg_replace('/(\r\n){1,}(?=((?!<).)*>)/s','XX',$text);
echo "$text";
?>

I really need several more examples so that I can see exactly what it is you are trying to do, though. In short, I am looking for one or more newlines that are followed by a closing > before any opening < (which indicates that the newline is located within a tag), and replacing them with XX. The s modifier is set so that the dot will also match newlines.

Even though I wrote the above PCRE it has been a long enough time since I have worked with it that it is rather hazy for me. If PCRE does not help at all I will quietly bow out :).

Later

traq
01-10-2013, 06:20 AM
Very helpful. I had to change it a bit to work in javascript (what does the s flag do? It's not supported in js):

/(\s)+(?=((?!<).)*>)/g
This does what we need (http://jsfiddle.net/traq/duzuz/2/), but only one at a time: i.e., you have to run it twice if there are two line breaks in a tag, three times if there are three, etc.

james438
01-10-2013, 06:26 AM
That's why you need the s modifier. s alters the . so that it matches newlines as well as other characters; otherwise it stops once it gets to a newline, which in our situation is bad. You can read more about the s modifier, as well as most (not all) of the other modifiers, here (http://us.php.net/manual/en/reference.pcre.pattern.modifiers.php).

james438
01-10-2013, 06:39 AM
The equivalent of . with the s modifier would be [\s\S].

A reference that shows how javascript regular expressions differ from Perl regular expressions is http://www.regular-expressions.info/javascript.html

It is very short, but also has several workarounds.
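Putting these two hints together gives a JavaScript version of the PHP pattern (a sketch, untested against the editor itself). The lookahead `(?=(?:(?!<)[\s\S])*>)` means "a `>` comes before the next `<`", and since each match is only the whitespace run around one line break, the `g` flag handles every break in one pass. Replacing with a space rather than `XX` or an empty string keeps attributes on either side of the break apart. It also suggests why the version without `s` worked one break at a time: the plain `.` cannot cross a later newline, so only the last break in each tag could satisfy the lookahead. Engines implementing ES2018 or later also accept the `s` (dotAll) flag. Like the other approaches, this assumes a bare `>` does not appear in ordinary text after a line break.

```javascript
// JavaScript translation of james438's PHP pattern. [\s\S] stands in for "."
// with PHP's s modifier, so the lookahead can scan across line breaks.
// A whitespace run that contains a line break is replaced with one space,
// but only when a ">" comes before the next "<" (i.e. we are inside a tag).
// (?:(?!<)[\s\S])* is equivalent to the shorter [^<]*.
const insideTagBreak = /\s*[\r\n]\s*(?=(?:(?!<)[\s\S])*>)/g;

const text =
  '<img src="http://qwikad.com/images/logo.png"\n\nalt="something"\n\ntitle="something">this\nis\n\nfine\n' +
  '<img src="http://qwikad.com/images/logo.png"\nalt="something"\ntitle="something">';

// Every break inside both tags is handled in one call;
// "this\nis\n\nfine\n" between the tags keeps its breaks.
console.log(text.replace(insideTagBreak, " "));
```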

qwikad.com
01-10-2013, 01:32 PM
james438,

This is what was offered by traq, and it works well for one broken line:

var regex = /(<[^>\n]*)\s+([^>\n]*>)/g;
text = text.replace( regex,'$1 $2' );

I was thinking about how many broken lines an average HTML banner ad can possibly have and still be valid code, and I came up with 8:

<
img
src="http://something.com/images/something.png"
title="something"
alt="something"
width="300"
height="250"
border="0"
>

So if that var regex could be adjusted to restore 8 broken lines, I think it would cover every possible inconsistency of any HTML banner ad (and more).

james438
01-11-2013, 04:05 AM
Can I see an example of the javascript that uses regular expressions so that I can work on it to get it to work? The php code works just fine on my end, but I am less familiar with javascript's version of regular expressions which is probably the most limited of all the web coding languages with Perl easily being the best. Still there are enough similarities that I can probably work with it, but it would be helpful to see the sample code you are working with.

What you just posted, qwikad.com, is actually easier to work with. The following is trickier because the PCRE needs to know when to replace newlines and when not to.


<img src="http://qwikad.com/images/logo.png"

alt="something"

title="something">this
is

fine
<img src="http://qwikad.com/images/logo.png"

alt="something"

title="something">

Does this have to be in javascript or can php work? It should be fine either way though.

qwikad.com
01-12-2013, 01:04 AM
It has to be in javascript, but if you post your PHP idea, I could use it for the PHP part of my editor too.

james438
01-12-2013, 02:08 AM
I already did, but here it is again. The only difference is that I removed the s modifier:


<?php
$text="<img src=\"http://qwikad.com/images/logo.png\"

alt=\"something\"

title=\"something\">this
is

fine
<img src=\"http://qwikad.com/images/logo.png\"

alt=\"something\"

title=\"something\">";
$text=preg_replace('/(\r\n){1,}(?=((?!<)[\s\S])*>)/','',$text);
echo "$text";
?>

Could you post the complete javascript that you have so that I have something to work with?

qwikad.com
01-12-2013, 03:29 AM
Ok, got it. Will test it over the next couple days.

traq
01-12-2013, 06:50 AM
I was thinking about how many broken lines an average HTML banner ad can possibly have and still be valid code, and I came up with 8:

<
img
src="http://something.com/images/something.png"
title="something"
alt="something"
width="300"
height="250"
border="0"
>

So if that var regex could be adjusted to restore 8 broken lines, I think it would cover every possible inconsistency of any HTML banner ad (and more).

What about this?

<
img
src
=
"http://something.com/images/something.png"
title
=
"some
thing"
alt
=
"something"
width
=
"300"
height
=
"250"
border
=
"0"
>

That's still perfectly valid HTML (and creates the same DOM as your example).
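One caveat on these examples: under the HTML parsing rules the tag name has to follow `<` immediately, so a line break right after `<` makes the browser treat the `<` as plain text; breaks after the tag name, around `=`, and between attributes are fine. The example also puts a break inside a quoted value ("some\nthing"). If such values should be left alone, one option (a sketch, not from the thread; `joinTagLinesOutsideQuotes` is a hypothetical name) is to match each tag, then within it match either a complete quoted value, returned unchanged, or a whitespace run containing a line break, which becomes a space. There is no fixed limit on the number of breaks, but it still assumes no `<` or `>` inside attribute values.

```javascript
// Join line breaks inside tags, but leave quoted attribute values untouched.
// Hypothetical helper; assumes no "<" or ">" inside attribute values.
function joinTagLinesOutsideQuotes(text) {
  return text.replace(/<[^<>]+>/g, (tag) =>
    tag.replace(/("[^"]*"|'[^']*')|\s*[\r\n]\s*/g, (match, quoted) =>
      // Quoted value: keep as-is. Otherwise it is a line-break run: one space.
      quoted !== undefined ? quoted : " "
    )
  );
}

const messy = '<img\nsrc\n=\n"a.png"\ntitle\n=\n"some\nthing"\n>';
console.log(JSON.stringify(joinTagLinesOutsideQuotes(messy)));
// "<img src = \"a.png\" title = \"some\nthing\" >"
```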

**********

Have you looked at the links james provided? They're good references. You can experiment with the fiddle (http://jsfiddle.net/traq/duzuz/).

qwikad.com
01-12-2013, 02:13 PM
So, it can have up to 16 new lines... Which is possible. I saw HTML codes formed so poorly, from all the copying and pasting, that I am not even sure WHY they work.

You mean the link to the php site? I am not sure if I can make sense of what's going on there.

traq
01-12-2013, 03:13 PM
He also linked to regular-expressions.info. But take a look anyway; it can be daunting to get into, but it's not "impossible."

As for newlines, the number is almost unlimited. You can have newlines almost anywhere in HTML and they don't break anything (they're considered whitespace).

qwikad.com
01-12-2013, 10:53 PM
You know I think I can live with what I've got. If once or twice a month someone posts something that doesn't show, it's not going to negatively impact our service as a whole. People would have to make sure that their HTML ads are formed right... if they want to see them on our site.