View Full Version : Regular Expressions



newspapertragedy
11-18-2009, 06:29 PM
Hi there,

I'm currently working on remodeling an antiquated site. Part of the process involves adding page-based navigation (i.e. previous, next, home, etc.) at the top of each page, where originally it was only at the bottom.

Using my limited knowledge of Regular Expressions, I've already managed to update the code/styling for the existing navigation at the bottom (e.g. using Dreamweaver's Find and Replace with Regular Expressions to locate the tags in question, anchor tags for instance, and preserve the attribute data while modifying class names, adding divs, etc.), and I even batch-added the structure for the navigation at the appropriate place at the top of the page. What I'd like to do now is copy the attribute information from the links at the bottom of the page (hrefs and whatnot) into the corresponding attributes in the navigation at the top of the page.

Here's an example of the code I'm working with:


<div class="pageNav-top">
<ul>
<li class="previous-top"><a href="#">Previous</a></li>
<li class="home-top"><a href="../../index.html">Home</a></li>
<li class="needToKnow-top"><a href="needtoknow.html">What I Need to Know</a></li>
<li class="maps-top"><a href="../../maps/index.html">Maps</a></li>
<li class="next-top"><a href="#">Next Page</a></li>
</ul>
</div>

<div id="textBody">

<div class="imagetitle" align="center">Watch the Video</div>

<div align="center"><a class="flowplayer" href="../../hst111/10_civil-war/videos/Fort_Sumter_VP6_384K.flv"><img src="images/videostill-fort-sumter.jpg" alt="Fort Sumter" width="329" height="224" border="0"></a></div>

<div align="center"><span class="caption">Video Length= 4:20</span></div>


<h3>Fort Sumter</h3>
<p>Having declared separate status from the Union, South Carolina (and other states) waited to see what response the Union would make. Lincoln forced the South to fire the first shot by announcing his intention to resupply a federal fort in Charleston harbor. If South Carolina allowed the resupply, the rebellion would be effectively ended. The Fort was bombarded heavily by shore emplacements before the Union resupply ships could arrive. Surrender followed in turn and the Civil War was officially underway. </p>

</div><!-- end of textBody -->

<div class="pageNav-bottom">
<ul>
<li class="previous-bottom"><a href="election-1860.html">Previous</a></li>
<li class="home-bottom"><a href="../../index.html">Home</a></li>
<li class="needToKnow-bottom"><a href="needtoknow.html">What I Need to Know</a></li>
<li class="maps-bottom"><a href="../../maps/index.html">Maps</a></li>
<li class="next-bottom"><a href="early-war.html">Next Page</a></li>
</ul>
</div>


So what I want to do is copy the href attribute values from the "Previous" and "Next" links at the bottom of the page, into the (currently blank, href="#") corresponding "Previous" and "Next" href attributes in the navigation at the top of the page, while ignoring any content (tags, text, or otherwise) in-between.

It seems like with RegEx's there should be a way to do this, but I'm not quite sure how. I know that I can find and grab the attribute values using something like this href="([^"]*)", and then reference them with $1, $2, etc., but I'm not sure how to only deal with the sections of the page that have the top and bottom navigation, while ignoring everything else.

Any help would be greatly appreciated!
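
One way to approach this outside Dreamweaver is a short script run over each page. The following is only a minimal Python sketch, assuming the markup matches the example above exactly: the class names previous-top, next-top, previous-bottom and next-bottom, double-quoted attributes, and href as the first attribute of each anchor. The function name copy_nav_hrefs and the shortened sample page are made up for illustration.

```python
import re


def copy_nav_hrefs(html: str) -> str:
    """Copy the Previous/Next hrefs from the bottom navigation into the top one."""
    for name in ("previous", "next"):
        # Capture the href inside <li class="previous-bottom"> (or next-bottom).
        bottom = re.search(
            r'<li class="%s-bottom"><a href="([^"]*)"' % name, html
        )
        if bottom is None:
            continue  # no such link on this page; leave the top link untouched
        href = bottom.group(1)
        # Replace only the href value of the matching top link.
        html = re.sub(
            r'(<li class="%s-top"><a href=")[^"]*(")' % name,
            lambda m: m.group(1) + href + m.group(2),
            html,
            count=1,
        )
    return html


# Shortened, hypothetical version of the page shown above.
sample = """<div class="pageNav-top">
<ul>
<li class="previous-top"><a href="#">Previous</a></li>
<li class="home-top"><a href="../../index.html">Home</a></li>
<li class="next-top"><a href="#">Next Page</a></li>
</ul>
</div>
<div id="textBody"><h3>Fort Sumter</h3></div>
<div class="pageNav-bottom">
<ul>
<li class="previous-bottom"><a href="election-1860.html">Previous</a></li>
<li class="next-bottom"><a href="early-war.html">Next Page</a></li>
</ul>
</div>
"""

result = copy_nav_hrefs(sample)
print(result)
```

Because each pattern is anchored on the li class name, everything between the two navigation blocks is ignored rather than matched, which sidesteps the problem of skipping over the page body.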

sniperman
01-15-2010, 07:16 AM
Thanks a million for your contribution. I've been looking everywhere online (and I mean everywhere) trying to find a web developer app that will save class and id names from HTML files and dump them to a text file.

Going through 100+ HTML pages searching for class="*" or id="*" would be time-consuming. And I mean, no one offers this kind of service. Not even Firefox's Web Developer Toolbar, which only displays class and id but does not allow you to copy and paste them into a blank CSS Stylesheet like this developer did. (http://blog.corunet.com/automatic-css-the-stylizator/comment-page-1/#comment-1837).

In the end I had to download InfoRapid Search and Replace for Windows, and I came online because my Regular Expressions are scratchy. My favorite watering hole is Dynamic Drive, and so, voila, thanks for the regular expression. I can look for classes and ids now with ease.

Is there a way, however, to refine the regular expression so that the search results do not include the whole line, only the characters in question?
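
If a script is an option, the names can be pulled out directly rather than read off whole-line search results. This is a rough Python sketch, assuming the attributes are written with double quotes as in the example earlier in the thread. The pattern name ATTR_RE and the function name collect_names are invented here.

```python
import re

# Match class="..." or id="...", but not attributes such as data-id="...".
ATTR_RE = re.compile(r'(?<![\w-])(class|id)="([^"]*)"')


def collect_names(html: str) -> dict:
    """Return the sorted, de-duplicated class and id names found in html."""
    names = {"class": set(), "id": set()}
    for attr, value in ATTR_RE.findall(html):
        # A class attribute may hold several space-separated names.
        names[attr].update(value.split())
    return {attr: sorted(found) for attr, found in names.items()}


page = (
    '<div class="pageNav-top"><ul><li class="previous-top"></li></ul></div>'
    '<div id="textBody"><span class="caption">Video</span></div>'
)
found = collect_names(page)
print(found)
```

Only the captured group, the text between the quotes, is kept, so the surrounding line never appears in the output. To cover 100+ pages, the same function could be called on the contents of each file and the results written to a text file.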

newspapertragedy
01-15-2010, 02:45 PM
From what I can tell, some of this varies based on the program you are using. Right now, the expression href="([^"]*)" will grab whatever the value of the href attribute is (say, index.html). That is, the parentheses store the content inside them in a variable (which can be referenced as $1, $2, etc., depending on how many values you store in your search); the square brackets define a set of characters to match; the ^ sign at the start of the set negates it, so [^"] matches any character that is not a quote character (think of this as "up to the next quote"); and the asterisk after the brackets says to repeat that zero or more times, which in practice means until the parser runs into a quote character. So, that said, it shouldn't be grabbing the whole line. I know when I run a search like that in Dreamweaver, the search results show the whole line in context, but the only content that's captured is the content inside the parentheses. Sorry, I've realized in writing this that without examples, it's rather hard to explain. I'm still fairly new at this myself.
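
To make the difference between the whole match and the captured value concrete, here is a small Python illustration using one line from the example page. Python's re module is used only as a stand-in; it writes the back-reference as \1 where Dreamweaver's replace field uses $1, and the added class="nav" attribute is just a made-up replacement.

```python
import re

line = '<li class="home-top"><a href="../../index.html">Home</a></li>'

match = re.search(r'href="([^"]*)"', line)
whole = match.group(0)  # the entire matched text, including href= and the quotes
value = match.group(1)  # only what the parentheses captured

# Reuse the captured value in a replacement via the back-reference \1.
rewritten = re.sub(r'href="([^"]*)"', r'href="\1" class="nav"', line)

print(whole)
print(value)
print(rewritten)
```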

If you want to try to find more information on your own, two of the best tutorials I've found are:

http://www.adobe.com/devnet/dreamweaver/articles/regular_expressions.html - This is easy to follow, and a fairly good introduction to using RegEx in conjunction with HTML and CSS. It's a little limited in scope, though, which gets annoying since it's the most common resource you'll find when searching.
http://gnosis.cx/publish/programming/regular_expressions.html - This is a lot more thorough, but a little harder to follow.


I hope those help.