javascript - Splitting HTML Content Into Sentences, But Keeping Subtags Intact
I'm using the code below to separate the text within paragraph tags into sentences. It is working okay, with a few exceptions. However, tags within the paragraphs get chewed up and spit out. For example:
<p>This is a sample of a <a href="#">link</a> getting chewed up.</p>
So, how can I ignore tags such as this, so that I can parse the sentences, place span tags around them, and keep the <a> and other tags in place? Or is it smarter to somehow walk the DOM and do it that way?
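// Split the text on the page into clickable sentences
$('p').each(function() {
    // Note: .text() returns only the paragraph's text content, so inner
    // tags such as <a> are already gone before the regex runs
    var sentences = $(this)
        .text()
        .replace(/(((?![.!?]['"]?\s).)*[.!?]['"]?)(\s|$)/g, '<span class="sentence">$1</span>$3');
    $(this).html(sentences);
});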
I am using this in a Chrome extension content script, which means the JavaScript is injected into any page it comes in contact with and parses the <p> tags on the fly. Therefore, it needs to be JavaScript.
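A minimal sketch, assuming the paragraph's markup is read with `.html()` instead of `.text()`: the sentence regex from the question can be made tag-aware by letting each step consume either a whole tag (`<[^>]*>`) or a single non-`<` character, so a match can contain tags but never starts or ends inside one. `SENTENCE` and `wrapSentencesInHtml` are hypothetical names, and the pattern assumes no attribute value contains a `>`.

```javascript
// Tag-aware variant of the question's sentence regex. Each repetition
// consumes either one complete tag (e.g. <a href="#">) or one character
// that does not start a sentence terminator (. ! ? plus optional quote
// followed by whitespace). Group 1 is the sentence, group 2 the trailing
// whitespace (or end of input), which is put back outside the span.
const SENTENCE = /((?:<[^>]*>|(?![.!?]['"]?\s)[^<])*[.!?]['"]?)(\s|$)/g;

// Hypothetical helper: wrap every sentence found in an HTML string.
function wrapSentencesInHtml(html) {
  return html.replace(SENTENCE, '<span class="sentence">$1</span>$2');
}

console.log(wrapSentencesInHtml('This is a sample of a <a href="#">link</a> getting chewed up.'));
console.log(wrapSentencesInHtml('One. Two <em>three</em>.'));
```

In the content script, the `.text()` call in the loop above would become `.html()`, e.g. `$(this).html(wrapSentencesInHtml($(this).html()))`. This approach has a clear limit: when a sentence boundary falls inside an inline element, as in `<b>One. Two</b> three.`, the result is `<span class="sentence"><b>One.</span> <span class="sentence">Two</b> three.</span>`, which is mis-nested markup the browser has to repair. That is the main argument for walking the DOM instead.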
Soapbox
We could craft a regex to match your specific case, but given that this is HTML parsing and your use case hints that any number of tags could be in there, you'd be best off using the DOM or a product like HTML Agility Pack (free).
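For the DOM route, one possible sketch (the helper names `splitTextRuns` and `wrapSentencesInDom` are hypothetical) collects a paragraph's text nodes with a `TreeWalker`, finds sentence boundaries on their concatenated text, and wraps each text fragment in its own span, so element nodes such as `<a>` are never rewritten. A sentence that runs through a link becomes several spans sharing one `data-sentence` index. The boundary rule (`.`, `!` or `?`, an optional quote, then whitespace or the end) is an assumption borrowed from the question's regex.

```javascript
// Pure helper: given the text of each text node in document order, split it
// into sentence pieces. Boundaries are found on the concatenated text, so a
// sentence that continues into an <a> keeps the same index across nodes.
function splitTextRuns(texts) {
  const full = texts.join("");
  const ends = []; // exclusive end offset of each sentence within `full`
  const boundary = /[.!?]['"]?(?:\s+|$)/g;
  let m;
  while ((m = boundary.exec(full)) !== null) {
    ends.push(m.index + m[0].length);
  }

  const runs = [];
  let offset = 0; // where the current node starts within `full`
  let sentence = 0;
  for (const text of texts) {
    const pieces = [];
    let pos = 0;
    while (pos < text.length) {
      // Advance past sentences that ended at or before this position.
      while (sentence < ends.length && ends[sentence] <= offset + pos) {
        sentence++;
      }
      const stop = sentence < ends.length
        ? Math.min(ends[sentence] - offset, text.length)
        : text.length;
      pieces.push({ text: text.slice(pos, stop), sentence: sentence });
      pos = stop;
    }
    runs.push(pieces);
    offset += text.length;
  }
  return runs;
}

// Browser-only part: replaces each text node with spans, one per piece,
// leaving element nodes such as <a> untouched.
function wrapSentencesInDom(paragraph) {
  const walker = document.createTreeWalker(paragraph, NodeFilter.SHOW_TEXT);
  const nodes = [];
  while (walker.nextNode()) nodes.push(walker.currentNode);

  const runs = splitTextRuns(nodes.map(function (n) { return n.nodeValue; }));
  nodes.forEach(function (node, i) {
    const fragment = document.createDocumentFragment();
    runs[i].forEach(function (piece) {
      const span = document.createElement("span");
      span.className = "sentence";
      span.dataset.sentence = String(piece.sentence);
      span.textContent = piece.text;
      fragment.appendChild(span);
    });
    node.parentNode.replaceChild(fragment, node);
  });
}
```

In the content script this could be called as `document.querySelectorAll('p').forEach(wrapSentencesInDom)`, or as `wrapSentencesInDom(this)` inside the existing `$('p').each(...)` loop. Only `splitTextRuns` runs outside a browser; the wrapper needs a real `document`. Splitting the pure boundary logic from the DOM mutation keeps the tricky part easy to check on plain strings.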
However
If you're just looking to pull out the inner text and are not interested in retaining any of the tag data, you could use this regex and replace all matches with null (an empty string):
(<[^>]*>)
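In JavaScript, the same tag-stripping pattern needs the `g` flag so that every match is replaced, not just the first one. A minimal sketch (the function name `stripTags` is hypothetical; like the pattern itself, it assumes no attribute value contains a `>`):

```javascript
// Remove every tag, keeping only the text between the tags.
function stripTags(html) {
  return html.replace(/(<[^>]*>)/g, "");
}

console.log(stripTags('<p class=supercoolstuff>This is a sample of a <a href="#">link</a> getting chewed up.</p>'));
```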
To retain your sentence including the sub tags:
((?:<p(?:\s[^>]*)?>).*?</p>)
- retains the paragraph tags and the entire sentence, but not any data outside the paragraph
(?:<p(?:\s[^>]*)?>)(.*?)(?:</p>)
- retains just the paragraph's inner text, including sub tags, and stores the sentence in group 1
(<p(?:\s[^>]*)?>)(.*?)(</p>)
- captures the open and close paragraph tags and the inner text, including sub tags
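To see what each of the three patterns captures, here is a JavaScript sketch run against a made-up sample string (`html` and the pattern variable names are illustrative only). Note that `/` must be escaped inside a JavaScript regex literal, hence `<\/p>`.

```javascript
// Sample input with markup before and after the paragraph.
const html = '<b>outside</b><p class=x>A <a href="#">link</a> here.</p><i>more</i>';

// 1. Whole paragraph, tags included, in group 1.
const wholeParagraph = /((?:<p(?:\s[^>]*)?>).*?<\/p>)/;
// 2. Only the paragraph's inner HTML (sub tags included) in group 1.
const innerOnly = /(?:<p(?:\s[^>]*)?>)(.*?)(?:<\/p>)/;
// 3. Opening tag, inner HTML and closing tag in groups 1, 2 and 3.
const openInnerClose = /(<p(?:\s[^>]*)?>)(.*?)(<\/p>)/;

console.log(html.match(wholeParagraph)[1]);
console.log(html.match(innerOnly)[1]);
console.log(html.match(openInnerClose).slice(1, 4));
```

The optional `(?:\s[^>]*)?` part lets `<p>` carry attributes while keeping tags such as `<pre>` or `<param>` from being mistaken for a paragraph.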
Granted, these are PowerShell examples, but the regex and the replace function should be similar in JavaScript.
$string = '<img> not this stuff either</img><p class=supercoolstuff>This is a sample of a <a href="#">link</a> getting chewed up.</p><a> other stuff</a>'

write-host "replace p tags with new span tag"
$string -replace '(?:<p(?:\s[^>]*)?>)(.*?)(?:</p>)', '<span class=sentence>$1</span>'

write-host
write-host "insert p tag's inner text into new span tag and return the entire thing including the p tags"
$string -replace '(<p(?:\s[^>]*)?>)(.*?)(</p>)', '$1<span class=sentence>$2</span>$3'

Yields:

replace p tags with new span tag
<img> not this stuff either</img><span class=sentence>This is a sample of a <a href="#">link</a> getting chewed up.</span><a> other stuff</a>

insert p tag's inner text into new span tag and return the entire thing including the p tags
<img> not this stuff either</img><p class=supercoolstuff><span class=sentence>This is a sample of a <a href="#">link</a> getting chewed up.</span></p><a> other stuff</a>
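A possible JavaScript translation of the two PowerShell replacements. The two flags are assumptions chosen to mirror PowerShell's behavior: `g`, because `-replace` replaces every match, and `i`, because `-replace` is case-insensitive by default. The variable names are hypothetical.

```javascript
const input = '<img> not this stuff either</img><p class=supercoolstuff>This is a sample of a <a href="#">link</a> getting chewed up.</p><a> other stuff</a>';

// Same patterns as the PowerShell examples; "/" is escaped for the literal.
const paragraphInner = /(?:<p(?:\s[^>]*)?>)(.*?)(?:<\/p>)/gi;
const paragraphParts = /(<p(?:\s[^>]*)?>)(.*?)(<\/p>)/gi;

// First replacement: swap the <p> wrapper for a span.
const replacedP = input.replace(paragraphInner, '<span class=sentence>$1</span>');

// Second replacement: keep the <p> tags and wrap their inner HTML in a span.
const wrappedInner = input.replace(paragraphParts, '$1<span class=sentence>$2</span>$3');

console.log(replacedP);
console.log(wrappedInner);
```

As in .NET, `.` in a JavaScript regex does not match line breaks, so a paragraph whose inner HTML spans several lines will not match `.*?`; the `s` (dotAll) flag or `[\s\S]*?` are common workarounds.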