Let's suppose, for the sake of simplicity, we have the following string:
"John loves Mary, Mary loves Jake and Jake doesn't care about John and Mary."
Let's suppose I wanna use regex to change the characters of that story.
John -> Joseph
Mary -> Jessica
Jake -> Keith
Of course I can change each of those, one at a time.
But I'd like to know if it's possible to change all of them with just a single regex replacement, like a "multiple replacement" or "conditional replacement".
Something like:
regex: (?:(?<name1>John)|(?<name2>Mary)|(?<name3>Jake))
replacement: (?(name1)Joseph|(?(name2)Jessica|(?(name3)Keith)))
This is just a simple example.
In my application, I have to perform around 20 replacements for each string, which impacts the performance of the application.
The regex flavor I'm using is PCRE.
The application is being coded using C++ with Qt framework.
So you're using the so-called PCRE flavor. Good, except this doesn't say exactly which library you're using. Let's review a few options here, as a couple different libraries claim to be Perl-compatible.
The simplest solution is Boost.Regex: boost::regex supports exactly what you're asking for through its Boost-Extended Format String Syntax.
So you can replace the pattern:
(?<name1>John)|(?<name2>Mary)|(?<name3>Jake)
With the replacement string:
(?{name1}Joseph:(?{name2}Jessica:Keith))
And sure, it works. You can test it in Notepad++, but here's some sample code:
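#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main(int argc, char **argv) {
    std::string subject("John loves Mary, Mary loves Jake and Jake doesn't care about John and Mary.");
    // Conditional format: emit the text matching whichever named group participated.
    const char* replacement = "(?{name1}Joseph:(?{name2}Jessica:Keith))";
    // boost::regex::perl is the syntax option (boost::match_perl is a match flag, not a syntax flag).
    boost::regex re("(?<name1>John)|(?<name2>Mary)|(?<name3>Jake)", boost::regex::perl);
    // boost::format_all enables the extended (conditional) format syntax.
    std::string result = boost::regex_replace(subject, re, replacement, boost::format_all);
    std::cout << result << std::endl;
    return 0;
}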
PCRE2 caught up with Boost and introduced a richer substitution syntax through the PCRE2_SUBSTITUTE_EXTENDED option. As of this post (v10.20), this code isn't released yet, but it's available in the source repository (revision 381), so if you need this solution now, you'll have to build PCRE2 from source.
The pattern is the same but the replacement string has a different syntax:
${name1:+Joseph:${name2:+Jessica:Keith}}
Here's some sample C code:
#include <stdio.h>
#include <string.h>
#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>

int main(int argc, char **argv) {
    int error;
    PCRE2_SIZE erroffset;
    const PCRE2_SPTR pattern = (PCRE2_SPTR)"(?<name1>John)|(?<name2>Mary)|(?<name3>Jake)";
    const PCRE2_SPTR subject = (PCRE2_SPTR)"John loves Mary, Mary loves Jake and Jake doesn't care about John and Mary.";
    const PCRE2_SPTR replacement = (PCRE2_SPTR)"${name1:+Joseph:${name2:+Jessica:Keith}}";

    pcre2_code *re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED, 0, &error, &erroffset, 0);
    if (re == 0)
        return 1;

    /* JIT compilation is optional; matching still works if it fails. */
    pcre2_jit_compile(re, PCRE2_JIT_COMPLETE);

    PCRE2_UCHAR output[1024] = "";
    /* On input, outlen is the buffer size in code units; on success it receives the result length. */
    PCRE2_SIZE outlen = sizeof(output) / sizeof(PCRE2_UCHAR);

    int rc = pcre2_substitute(re, subject, PCRE2_ZERO_TERMINATED, 0, PCRE2_SUBSTITUTE_GLOBAL | PCRE2_SUBSTITUTE_EXTENDED, 0, 0, replacement, PCRE2_ZERO_TERMINATED, output, &outlen);

    /* A non-negative return value is the number of substitutions made. */
    if (rc >= 0)
        printf("%s\n", output);

    pcre2_code_free(re);
    return 0;
}
With PCRE (<v10), you're out of luck. It lacks a substitution function; this is left to the developer.
...which means if that's the library you're using, you'll have full control over the substitution process anyway. You could use a pattern such as:
John(*MARK:1)|Mary(*MARK:2)|Jake(*MARK:3)
And then, substitute by discriminating on the last encountered MARK.
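To give an idea of what such a hand-rolled substitution loop looks like, here's a minimal sketch. It uses std::regex from the standard library as a stand-in, not PCRE, so it's only an illustration of the idea: std::regex has no (*MARK), and the sketch discriminates on which capture group participated instead. The helper name replace_names and the hard-coded replacement table are assumptions made for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: replaces every name in a single pass over the subject.
std::string replace_names(const std::string& subject) {
    // One capture group per alternative. std::regex has no (*MARK), so the
    // index of the group that participated plays the role of the mark.
    static const std::regex re("(John)|(Mary)|(Jake)");
    static const char* const replacements[] = {"Joseph", "Jessica", "Keith"};

    std::string result;
    auto last = subject.cbegin();  // end of the previous match
    for (std::sregex_iterator it(subject.cbegin(), subject.cend(), re), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        assert(m.size() == 4);            // whole match plus three groups
        result.append(last, m[0].first);  // copy the text between matches
        for (std::size_t g = 1; g < m.size(); ++g) {
            if (m[g].matched) {           // this alternative matched
                result += replacements[g - 1];
                break;
            }
        }
        last = m[0].second;
    }
    result.append(last, subject.cend());  // copy the tail after the last match
    return result;
}
```

With PCRE itself, the loop would have the same shape: find the next match, copy the text before it, append the replacement selected by the mark, and continue from the end of the match.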
Qt's QRegularExpression class encapsulates the PCRE library (not PCRE2), but it doesn't seem to expose all of the PCRE features.
Anyway, the QString::replace overload which accepts a QRegularExpression doesn't look like it's fully featured:
QString & QString::replace(const QRegularExpression & re, const QString & after)
So you're on your own here.
Hey, maybe for such a simple replacement, a regular expression is overkill... If you have a performance issue, you should try to implement these replacements by hand - a carefully crafted algorithm should be faster than a regex solution. Just make sure to profile your code and see where the culprit is.
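As an illustration of the "by hand" approach, here's a minimal sketch using only the standard library. The function name replace_all_by_hand and the Replacements type are made up for this example, and the inner loop is deliberately naive (it tries every entry at every position), so treat it as a starting point to profile rather than a tuned implementation. Note that, like the regex in the question, it also matches inside longer words.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table type: (text to find, text to put in its place).
using Replacements = std::vector<std::pair<std::string, std::string>>;

// Scans the subject once, left to right. At each position the first table
// entry that matches is substituted; otherwise one character is copied.
std::string replace_all_by_hand(const std::string& subject,
                                const Replacements& table) {
    std::string result;
    result.reserve(subject.size());
    std::size_t pos = 0;
    while (pos < subject.size()) {
        bool replaced = false;
        for (const auto& entry : table) {
            const std::string& from = entry.first;
            assert(!from.empty());  // an empty key would match everywhere
            if (subject.compare(pos, from.size(), from) == 0) {
                result += entry.second;
                pos += from.size();  // skip past the replaced text
                replaced = true;
                break;
            }
        }
        if (!replaced)
            result += subject[pos++];
    }
    return result;
}
```

Because replaced text is never rescanned, a replacement can't be picked up again by another entry, which is something chained one-at-a-time replacements don't guarantee.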