Multiple replacement with just one regex

Rickforce

Let's suppose, for the sake of simplicity, we have the following string:

"John loves Mary, Mary loves Jake and Jake doesn't care about John and Mary."

Let's suppose I wanna use regex to change the characters of that story.

John -> Joseph

Mary -> Jessica

Jake -> Keith

Of course I can change each one of those, one at a time.

But I'd like to know if it's possible to change all of them with just a single regex replacement, like a "multiple replacement" or "conditional replacement".

Something like:

regex: (?:(?<name1>John)|(?<name2>Mary)|(?<name3>Jake))

replacement: (?(name1)Joseph|(?(name2)Jessica|(?(name3)Keith)))

This is just a simple example.

In my application, I have to perform around 20 replacements for each string, which impacts the performance of the application.

The regex flavor I'm using is PCRE.

The application is being coded using C++ with Qt framework.

Lucas Trzesniewski

So you're using the so-called PCRE flavor. Good, except this doesn't say exactly which library you're using. Let's review a few options here, as a couple different libraries claim to be Perl-compatible.

Boost

That's the simplest solution. boost::regex supports exactly what you're asking for through its Boost-Extended Format String Syntax.

So you can replace the pattern:

(?<name1>John)|(?<name2>Mary)|(?<name3>Jake)

With the replacement string:

(?{name1}Joseph:(?{name2}Jessica:Keith))

And sure, it works. You can test it in Notepad++, but here's some sample code:

#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main(int argc, char **argv) {
    std::string subject("John loves Mary, Mary loves Jake and Jake doesn't care about John and Mary.");
    const char* replacement = "(?{name1}Joseph:(?{name2}Jessica:Keith))";

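    // boost::regex::perl is the syntax option for the constructor (boost::match_perl is a match flag).
    boost::regex re("(?<name1>John)|(?<name2>Mary)|(?<name3>Jake)", boost::regex::perl);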

    std::string result = boost::regex_replace(subject, re, replacement, boost::format_all);
    std::cout << result << std::endl;

    return 0;
}

PCRE2

PCRE2 caught up with Boost and introduced a richer substitution syntax through the PCRE2_SUBSTITUTE_EXTENDED option. As of this post (v10.20), this code isn't released yet, but it's available in the source repository (revision 381), so if you need this solution now, you'll have to build PCRE2 from source.

The pattern is the same but the replacement string has a different syntax:

${name1:+Joseph:${name2:+Jessica:Keith}}

Here's some sample C code:

#include <stdio.h>
#include <string.h>

#define PCRE2_CODE_UNIT_WIDTH 8
#include <pcre2.h>

int main(int argc, char **argv) {
    int error;
    PCRE2_SIZE erroffset;

    const PCRE2_SPTR pattern = (PCRE2_SPTR)"(?<name1>John)|(?<name2>Mary)|(?<name3>Jake)";
    const PCRE2_SPTR subject = (PCRE2_SPTR)"John loves Mary, Mary loves Jake and Jake doesn't care about John and Mary.";
    const PCRE2_SPTR replacement = (PCRE2_SPTR)"${name1:+Joseph:${name2:+Jessica:Keith}}";

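    // Compile the pattern; on failure, error and erroffset describe the problem.
    pcre2_code *re = pcre2_compile(pattern, PCRE2_ZERO_TERMINATED, 0, &error, &erroffset, 0);
    if (re == 0)
        return 1;

    // JIT compilation is optional; matching falls back to the interpreter if it fails.
    pcre2_jit_compile(re, PCRE2_JIT_COMPLETE);

    PCRE2_UCHAR output[1024] = "";
    // On input, outlen is the buffer size in code units; on success it receives the result length.
    PCRE2_SIZE outlen = sizeof(output) / sizeof(PCRE2_UCHAR);

    // GLOBAL replaces every match; EXTENDED enables the ${name:+...} replacement syntax.
    int rc = pcre2_substitute(re, subject, PCRE2_ZERO_TERMINATED, 0, PCRE2_SUBSTITUTE_GLOBAL | PCRE2_SUBSTITUTE_EXTENDED, 0, 0, replacement, PCRE2_ZERO_TERMINATED, output, &outlen);
    if (rc >= 0)
        printf("%s\n", output);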

    pcre2_code_free(re);
    return 0;
}

PCRE

With PCRE (<v10), you're out of luck: it lacks a substitution function, so this is left to the developer.

...which means if that's the library you're using, you'll have full control over the substitution process anyway. You could use a pattern such as:

John(*MARK:1)|Mary(*MARK:2)|Jake(*MARK:3)

And then, substitute by discriminating on the last encountered MARK.

Qt

Qt's QRegularExpression class encapsulates the PCRE library (not PCRE2), but it doesn't seem to expose all of the PCRE features.

Anyway, the QString::replace overload which accepts a QRegularExpression doesn't look like it's fully featured:

QString & QString::replace(const QRegularExpression & re, const QString & after)

So you're on your own here.
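If you end up writing the substitution loop yourself, it could look roughly like the sketch below. It uses std::regex purely as a stand-in so that it builds without extra libraries; replaceNames and its lookup table are hypothetical names, not part of PCRE or Qt. The idea is to match one alternation, pick the replacement from the matched text, and copy everything between matches unchanged.

```cpp
#include <regex>
#include <string>
#include <unordered_map>

// Hypothetical helper: replaces every known name in a single pass over the subject.
// std::regex is an assumption here, standing in for whichever engine the application uses.
std::string replaceNames(const std::string& subject) {
    // Lookup table from matched text to its replacement (values taken from the question).
    static const std::unordered_map<std::string, std::string> table = {
        {"John", "Joseph"}, {"Mary", "Jessica"}, {"Jake", "Keith"}};
    // One alternation that matches any key of the table.
    static const std::regex re("John|Mary|Jake");

    std::string result;
    auto copyFrom = subject.cbegin();  // start of the text that has not been copied yet
    for (std::sregex_iterator it(subject.cbegin(), subject.cend(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        result.append(copyFrom, m[0].first);  // text between matches, copied unchanged
        result += table.at(m.str());          // replacement chosen by the matched text
        copyFrom = m[0].second;               // continue after this match
    }
    result.append(copyFrom, subject.cend());  // trailing text after the last match
    return result;
}
```

Calling replaceNames on the sentence from the question should give "Joseph loves Jessica, Jessica loves Keith and Keith doesn't care about Joseph and Jessica." Note that the pattern has no word boundaries, so a name embedded in a longer word would be replaced too.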

My 2 cents

Hey, maybe for such a simple replacement, a regular expression is overkill... If you have a performance issue, you should try to implement these replacements by hand - a carefully crafted algorithm should be faster than a regex solution. Just make sure to profile your code and see where the culprit is.
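To give an idea of what "by hand" could mean, here is a minimal sketch that uses no regex at all. The function name replaceAll and the table layout are my own assumptions, and this is the naive version: at each position it tries every entry in order, which should be fine for around 20 short keys but is not tuned in any way.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: single left-to-right pass, trying each (from, to) pair at the
// current position. The first matching entry wins, so put longer keys first if some
// keys are prefixes of others.
std::string replaceAll(const std::string& subject,
                       const std::vector<std::pair<std::string, std::string>>& table) {
    std::string result;
    result.reserve(subject.size());  // a guess; the result may end up longer or shorter
    std::size_t pos = 0;
    while (pos < subject.size()) {
        bool replaced = false;
        for (const auto& entry : table) {
            const std::string& from = entry.first;
            // compare() clamps the length, so a key longer than the remaining text never matches
            if (!from.empty() && subject.compare(pos, from.size(), from) == 0) {
                result += entry.second;  // emit the replacement
                pos += from.size();      // skip the matched key
                replaced = true;
                break;
            }
        }
        if (!replaced)
            result += subject[pos++];    // no key starts here: copy one character
    }
    return result;
}
```

Because replaced text is appended to the output and never rescanned, a replacement that happens to contain another key is left alone, which is also how the single-regex solutions above behave. Whether this actually beats a regex in your application is something only profiling can tell.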
