
Use Regex To Find Specific String Not In Html Tag

I'm having some difficulty with a specific regex. I'm searching for every occurrence of a string (for my purposes, I'll call it 'mystring') in a document, except where it appears inside an HTML tag.

Solution 1:

This should do it:

(?<!<[^>]*)_mystring_

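It uses a negative lookbehind to check that the matched string is not preceded by a < without a corresponding >. Because [^>]* makes the lookbehind variable-length, this only works in engines that support variable-length lookbehind, such as .NET.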

Solution 2:

If your regex engine doesn't support variable-length lookbehind, try this:

(<.+?>[^<>]*?)(_mystring_)([^<>]*?<.+?>)

Capture groups 1 and 3 hold the text between the target and the nearest tag on each side, including that tag. Keep them and replace only capture group 2:

For example, in Eclipse, find:

(<.+?>[^<>]*?)(_mystring_)([^<>]*?<.+?>)

and replace with:

$1_newString_$3

(Other regex engines may use a different syntax for group references in the replacement string, such as \1.)
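A minimal sketch of the same find-and-replace in Python (chosen here only for illustration; the thread itself uses Eclipse). Python's re.sub accepts the \1 style of group reference mentioned above. The helper names replace_once and replace_until_stable are hypothetical, and the loop assumes the replacement text does not itself contain the target.

```python
import re

# Solution 2's pattern: group 1 = a tag plus the text after it,
# group 2 = the target, group 3 = the text before the next tag plus that tag.
PATTERN = re.compile(r"(<.+?>[^<>]*?)(_mystring_)([^<>]*?<.+?>)")

# Keep groups 1 and 3, swap in the new string for group 2.
REPLACEMENT = r"\1_newString_\3"


def replace_once(html: str) -> str:
    """One pass of Solution 2's find-and-replace."""
    return PATTERN.sub(REPLACEMENT, html)


def replace_until_stable(html: str) -> str:
    """Repeat the pass until the text stops changing.

    Assumes the replacement does not contain the target; otherwise every
    pass would change the text again and the loop would never end.
    """
    while True:
        updated = replace_once(html)
        if updated == html:
            return html
        html = updated


print(replace_once('<p>_mystring_ and _mystring_</p>'))
print(replace_until_stable('<p>_mystring_ and _mystring_</p>'))
```

One pass replaces at most one occurrence per run of text between two tags, because group 3 consumes the rest of that run, hence the repeat loop. An occurrence that does not have a tag on both sides, such as text before the first tag, is never matched by this pattern.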

Solution 3:

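Another search regex that worked for me. It uses a negative lookahead instead of a lookbehind: the match is rejected when a > appears ahead before any <, which means the string sits inside a tag. Engines that only allow fixed-length lookbehind usually still accept this.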

(?![^<]*>)_mystring_

Source: https://stackoverflow.com/a/857819/1106878
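A quick check of the lookahead pattern, sketched in Python's built-in re module (an illustrative choice; the thread does not name a language for this answer). The sample markup is made up for the demonstration.

```python
import re

# Solution 3's pattern. At the match position, (?![^<]*>) fails if scanning
# forward over non-"<" characters reaches a ">", i.e. the position is
# inside a tag such as <a title="...">.
OUTSIDE_TAG = re.compile(r"(?![^<]*>)_mystring_")

html = '<a title="_mystring_">_mystring_ and _mystring_</a>'
result = OUTSIDE_TAG.sub("_newString_", html)
print(result)  # <a title="_mystring_">_newString_ and _newString_</a>
```

The heuristic assumes text content contains no bare > after the target: in <p>_mystring_ > 3</p> the lookahead sees the > and wrongly treats the occurrence as being inside a tag, so it is left unchanged.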

Solution 4:

A quick-and-dirty alternative is to use a regex replace function with a callback to encode the contents of each tag (everything between < and >), for example with base64, then run your search, then run another callback to decode the tag contents.

This can also save a lot of head-scratching when you need to exclude specific tags from a regex search: first obfuscate them and wrap them in a marker that your search can't match, then run your search, then deobfuscate whatever is inside the markers.
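A minimal sketch of the encode/search/decode idea in Python. The [[TAG:...]] marker format and the function names are hypothetical, and the sketch assumes the marker text never occurs in the document itself. It also relies on the target being unable to appear inside a marker: _mystring_ contains "_", which is not in the standard base64 alphabet.

```python
import base64
import re

TAG = re.compile(r"<[^>]*>")
# Hypothetical marker: "[[TAG:" + base64 of the original tag + "]]".
MARKER = re.compile(r"\[\[TAG:([A-Za-z0-9+/=]*)\]\]")


def hide_tags(html: str) -> str:
    """Replace each tag with a marker holding its base64-encoded text."""
    return TAG.sub(
        lambda m: "[[TAG:"
        + base64.b64encode(m.group(0).encode("utf-8")).decode("ascii")
        + "]]",
        html,
    )


def restore_tags(text: str) -> str:
    """Decode every marker back into the original tag."""
    return MARKER.sub(lambda m: base64.b64decode(m.group(1)).decode("utf-8"), text)


def replace_outside_tags(html: str, target: str, replacement: str) -> str:
    hidden = hide_tags(html)
    # The search now sees only text content plus markers.
    # A callback is used so backslashes in `replacement` are taken literally.
    replaced = re.sub(re.escape(target), lambda m: replacement, hidden)
    return restore_tags(replaced)


print(replace_outside_tags('<a title="_mystring_">_mystring_</a>', "_mystring_", "_newString_"))
```

To exclude only specific tags, as described above, the TAG pattern could be narrowed to just those tags; everything else stays the same.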

Solution 5:

Why use regex?

For XHTML, load it into XDocument / XmlDocument; for (non-X)HTML, the Html Agility Pack would seem a more sensible choice.

Either way, the document is parsed into a DOM, so you can iterate over the text nodes and inspect or modify them without touching tag names or attribute values.
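XDocument, XmlDocument and the Html Agility Pack are .NET libraries. As a rough Python analogue for the XHTML case only, here is a sketch using the standard library's xml.etree.ElementTree, which stores element text in .text and .tail. It assumes well-formed markup without an XML namespace declaration (ElementTree re-serializes namespaced elements with generated prefixes); the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def replace_in_text_nodes(xhtml: str, target: str, replacement: str) -> str:
    """Parse well-formed XHTML and replace `target` only in text content."""
    root = ET.fromstring(xhtml)
    for elem in root.iter():
        # .text is the text right after the element's start tag.
        if elem.text:
            elem.text = elem.text.replace(target, replacement)
        # .tail is the text after the element's end tag, before the next sibling.
        if elem.tail:
            elem.tail = elem.tail.replace(target, replacement)
    # Tag names and attribute values are never visited, so they stay unchanged.
    return ET.tostring(root, encoding="unicode")


out = replace_in_text_nodes(
    '<p title="_mystring_">a _mystring_ <b>_mystring_</b> _mystring_</p>',
    "_mystring_",
    "_newString_",
)
print(out)
```

For non-X HTML, which is often not well-formed XML, a tolerant HTML parser is needed instead, which is the role the Html Agility Pack plays in .NET.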
