Java - Excluding markup on lowercase letters in parentheses
A string can contain one or many lowercase letters in parentheses:
String content = "this (a) nightmare";
I want to transform this string into:
"<centamp>this </centamp>(a) <centamp>nightmare</centamp>";
So basically, add centamp markup around the string, but if it has a lowercase letter in parentheses, that part should be excluded from the markup.
This is what I have tried so far, but it doesn't achieve the desired result. There can be none to many parentheses in the string, and the exclusion from the markup should happen for every parenthesized group.
Pattern pattern = Pattern.compile("^(.*)?(\\([a-z]*\\))?(.*)?$", Pattern.MULTILINE);
String content = "this (a) nightmare";
System.out.println(content.matches("^(.*)?(\\([a-z]*\\))?(.*)?$"));
System.out.println(pattern.matcher(content).replaceAll("<centamp>$1$3</centamp>$2"));
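For reference, a runnable sketch of the attempt above shows what it actually produces. The class name `AttemptDemo` and the method `applyAttempt` are made up for illustration; the pattern and replacement string are taken unchanged from the attempt.

```java
import java.util.regex.Pattern;

public class AttemptDemo {
    // The pattern from the attempt above, unchanged.
    static final Pattern ATTEMPT =
        Pattern.compile("^(.*)?(\\([a-z]*\\))?(.*)?$", Pattern.MULTILINE);

    // Hypothetical helper: applies the attempted replacement to one string.
    static String applyAttempt(String content) {
        return ATTEMPT.matcher(content).replaceAll("<centamp>$1$3</centamp>$2");
    }

    public static void main(String[] args) {
        String content = "this (a) nightmare";
        // The greedy first group (.*) consumes the whole line, so the optional
        // parenthesis group $2 never gets a chance to match anything.
        System.out.println(applyAttempt(content));
        // prints: <centamp>this (a) nightmare</centamp>
    }
}
```

Because every group after the first is optional, the regex engine is never forced to give characters back to `(\\([a-z]*\\))?`, so the whole string ends up inside one tag.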
This can be done in one replaceAll:
String outputString = inputString.replaceAll("(?s)\\G((?:\\([a-z]+\\))*+)((?:(?!\\([a-z]+\\)).)+)", "$1<centamp>$2</centamp>");
It only allows a non-empty sequence of lowercase English letters inside the brackets: \\([a-z]+\\).
Features:
- Sequences consisting only of whitespace are still tagged.
- There will be no tag surrounding an empty string.
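A minimal, self-contained sketch for trying the one-line replacement against a few inputs. The class name `CentampTagger` and the method `tag` are assumptions made for illustration; only the regex and the replacement string come from the answer.

```java
public class CentampTagger {
    // Regex from the answer: an optional run of "(lowercase)" groups is kept
    // as-is ($1), followed by at least one character of other text ($2).
    private static final String REGEX =
        "(?s)\\G((?:\\([a-z]+\\))*+)((?:(?!\\([a-z]+\\)).)+)";

    // Hypothetical helper wrapping the replaceAll call.
    static String tag(String input) {
        return input.replaceAll(REGEX, "$1<centamp>$2</centamp>");
    }

    public static void main(String[] args) {
        System.out.println(tag("this (a) nightmare"));
        // prints: <centamp>this </centamp>(a)<centamp> nightmare</centamp>
        System.out.println(tag("no parentheses here"));
        // prints: <centamp>no parentheses here</centamp>
        System.out.println(tag("(a)(b) end (c)"));
        // prints: (a)(b)<centamp> end </centamp>(c)
    }
}
```

Note that the space after `(a)` lands inside the following tag rather than outside it, which is consistent with whitespace being tagged like any other text. An empty input produces no tag at all.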
Explanation:
\G asserts the match boundary, i.e. the next match can only start at the end of the last match. It can also match at the beginning of the string (when no match has been found yet).
Each match of the regex contains a sequence of: 0 or more consecutive \\([a-z]+\\) (no space allowed between them), followed by at least 1 character that does not form a \\([a-z]+\\) sequence.
The 0 or more consecutive \\([a-z]+\\) cover the case where the string does not start with \\([a-z]+\\), and the case where the string does not contain \\([a-z]+\\) at all. In the pattern for this portion, (?:\\([a-z]+\\))*+, note that the + after * makes the quantifier possessive; in other words, it disallows backtracking. Simply put, it is an optimization.
The restriction to at least one character is necessary to prevent adding a tag that encloses an empty string. In the pattern for this portion, (?:(?!\\([a-z]+\\)).)+, note that for every character, the negative lookahead (?!\\([a-z]+\\)) checks that the character does not begin a \\([a-z]+\\) sequence before . is allowed to match it.
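The structure of each match can be inspected with a small sketch that walks the matches one by one and records both capture groups. The class name `MatchBreakdown` and the method `breakdown` are hypothetical; the regex is the one from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchBreakdown {
    private static final Pattern P = Pattern.compile(
        "(?s)\\G((?:\\([a-z]+\\))*+)((?:(?!\\([a-z]+\\)).)+)");

    // Returns one "[group1|group2]" entry per successive match.
    static List<String> breakdown(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            parts.add("[" + m.group(1) + "|" + m.group(2) + "]");
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(breakdown("this (a) nightmare"));
        // prints: [[|this ], [(a)| nightmare]]
        System.out.println(breakdown("see (A) then (b) done"));
        // prints: [[|see (A) then ], [(b)| done]]
        System.out.println(breakdown("x(a)"));
        // prints: [[|x]]
    }
}
```

The second input shows the lookahead at work: `(A)` does not fit `\\([a-z]+\\)`, so it is consumed as ordinary text, while `(b)` ends the run. The third input shows that a trailing `(a)` with nothing after it is not matched at all, so it is left untouched and no empty tag is produced.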
The (?s) flag causes . to match any character, including a newline. This allows a tag to enclose text that spans multiple lines.
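To see the effect of the flag, the following sketch runs the same replacement with and without (?s) on a two-line input. The class `DotAllDemo`, its constants, and the `tag` helper are made up for illustration; the variant without the flag is included for comparison only and is not part of the answer.

```java
public class DotAllDemo {
    // Regex from the answer, with the (?s) flag.
    static final String WITH_DOTALL =
        "(?s)\\G((?:\\([a-z]+\\))*+)((?:(?!\\([a-z]+\\)).)+)";
    // Same regex with the (?s) flag removed, for comparison only.
    static final String WITHOUT_DOTALL =
        "\\G((?:\\([a-z]+\\))*+)((?:(?!\\([a-z]+\\)).)+)";

    // Hypothetical helper: applies the replacement with the given regex.
    static String tag(String regex, String input) {
        return input.replaceAll(regex, "$1<centamp>$2</centamp>");
    }

    public static void main(String[] args) {
        String input = "line one\nline two (a) end";
        // With (?s), the first tag spans the line break.
        System.out.println(tag(WITH_DOTALL, input));
        // Without it, tagging stops at the first newline.
        System.out.println(tag(WITHOUT_DOTALL, input));
    }
}
```

Without the flag, . cannot consume the newline, so the chain of matches anchored by \G breaks at that point and everything after the first line is left untagged.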