Class: RE2::Regexp

Inherits:
Object
Defined in:
ext/re2/re2.cc,
lib/re2/regexp.rb

Constructor Details

#initialize(pattern) ⇒ RE2::Regexp
#initialize(pattern, options) ⇒ RE2::Regexp

Returns a new RE2::Regexp object with a compiled version of pattern stored inside.

Overloads:

  • #initialize(pattern) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside, compiled with the default options.

    Parameters:

    • pattern (String)

      the pattern to compile

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern

  • #initialize(pattern, options) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside, compiled with the specified options.

    Parameters:

    • pattern (String)

      the pattern to compile

    • options (Hash)

      the options with which to compile the pattern

    Options Hash (options):

    • :utf8 (Boolean) — default: true

      text and pattern are UTF-8; otherwise Latin-1

    • :posix_syntax (Boolean) — default: false

      restrict regexps to POSIX egrep syntax

    • :longest_match (Boolean) — default: false

      search for longest match, not first match

    • :log_errors (Boolean) — default: true

      log syntax and execution errors to ERROR

    • :max_mem (Integer)

      approx. max memory footprint of RE2

    • :literal (Boolean) — default: false

      interpret string as literal, not regexp

    • :never_nl (Boolean) — default: false

      never match \n, even if it is in regexp

    • :case_sensitive (Boolean) — default: true

      match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode)

    • :perl_classes (Boolean) — default: false

      allow Perl's \d \s \w \D \S \W when in posix_syntax mode

    • :word_boundary (Boolean) — default: false

      allow \b \B (word boundary and not) when in posix_syntax mode

    • :one_line (Boolean) — default: false

      ^ and $ only match beginning and end of text when in posix_syntax mode

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern



# File 'ext/re2/re2.cc', line 905

static VALUE re2_regexp_initialize(int argc, VALUE *argv, VALUE self) {
  VALUE pattern, options;
  re2_pattern *p;

  rb_scan_args(argc, argv, "11", &pattern, &options);

  /* Ensure pattern is a string. */
  StringValue(pattern);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (RTEST(options)) {
    RE2::Options re2_options;
    parse_re2_options(&re2_options, options);

    p->pattern = new(std::nothrow) RE2(
        re2::StringPiece(RSTRING_PTR(pattern), RSTRING_LEN(pattern)), re2_options);
  } else {
    p->pattern = new(std::nothrow) RE2(
        re2::StringPiece(RSTRING_PTR(pattern), RSTRING_LEN(pattern)));
  }

  if (p->pattern == 0) {
    rb_raise(rb_eNoMemError, "not enough memory to allocate RE2 object");
  }

  return self;
}

Class Method Details

.initialize(pattern) ⇒ RE2::Regexp
.initialize(pattern, options) ⇒ RE2::Regexp

Returns a new RE2::Regexp object with a compiled version of pattern stored inside.

Overloads:

  • .initialize(pattern) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside, compiled with the default options.

    Parameters:

    • pattern (String)

      the pattern to compile

    Returns:

    • (RE2::Regexp)

      a new RE2::Regexp object

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern

  • .initialize(pattern, options) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside, compiled with the specified options.

    Parameters:

    • pattern (String)

      the pattern to compile

    • options (Hash)

      the options with which to compile the pattern

    Options Hash (options):

    • :utf8 (Boolean) — default: true

      text and pattern are UTF-8; otherwise Latin-1

    • :posix_syntax (Boolean) — default: false

      restrict regexps to POSIX egrep syntax

    • :longest_match (Boolean) — default: false

      search for longest match, not first match

    • :log_errors (Boolean) — default: true

      log syntax and execution errors to ERROR

    • :max_mem (Integer)

      approx. max memory footprint of RE2

    • :literal (Boolean) — default: false

      interpret string as literal, not regexp

    • :never_nl (Boolean) — default: false

      never match \n, even if it is in regexp

    • :case_sensitive (Boolean) — default: true

      match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode)

    • :perl_classes (Boolean) — default: false

      allow Perl's \d \s \w \D \S \W when in posix_syntax mode

    • :word_boundary (Boolean) — default: false

      allow \b \B (word boundary and not) when in posix_syntax mode

    • :one_line (Boolean) — default: false

      ^ and $ only match beginning and end of text when in posix_syntax mode

    Returns:

    • (RE2::Regexp)

      a new RE2::Regexp object

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern

.escape(unquoted) ⇒ String

Returns a version of unquoted with all potentially meaningful regexp characters escaped using QuoteMeta. The returned string, used as a regular expression, will exactly match the original string.

Examples:

RE2::Regexp.escape("1.5-2.0?") #=> "1\\.5\\-2\\.0\\?"

Parameters:

  • unquoted (String)

    the unquoted string

Returns:

  • (String)

    the escaped string

Raises:

  • (TypeError)

    if the given unquoted string cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1780

static VALUE re2_QuoteMeta(VALUE, VALUE unquoted) {
  StringValue(unquoted);

  std::string quoted_string = RE2::QuoteMeta(
      re2::StringPiece(RSTRING_PTR(unquoted), RSTRING_LEN(unquoted)));

  return rb_str_new(quoted_string.data(), quoted_string.size());
}

.match_has_endpos_argument? ⇒ Boolean

Returns whether the underlying RE2 version supports passing an endpos argument to Match. If it does not, #match raises RE2::Regexp::UnsupportedError when given an endpos option.

Returns:

  • (Boolean)

    whether the underlying Match has an endpos argument



# File 'ext/re2/re2.cc', line 1653

static VALUE re2_regexp_match_has_endpos_argument_p(VALUE) {
#ifdef HAVE_ENDPOS_ARGUMENT
  return Qtrue;
#else
  return Qfalse;
#endif
}

.quote(unquoted) ⇒ String

Returns a version of unquoted with all potentially meaningful regexp characters escaped using QuoteMeta. The returned string, used as a regular expression, will exactly match the original string.

Examples:

RE2::Regexp.quote("1.5-2.0?") #=> "1\\.5\\-2\\.0\\?"

Parameters:

  • unquoted (String)

    the unquoted string

Returns:

  • (String)

    the escaped string

Raises:

  • (TypeError)

    if the given unquoted string cannot be coerced to a String




Instance Method Details

#===(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1567

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#=~(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch. Unlike Ruby's built-in Regexp#=~, which returns the match offset or nil, this returns a Boolean.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String




#case_insensitive? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to false.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_insensitive? #=> false
re2.casefold?         #=> false

Returns:

  • (Boolean)

    the inverse of the case_sensitive option



# File 'ext/re2/re2.cc', line 1133

static VALUE re2_regexp_case_insensitive(const VALUE self) {
  return BOOL2RUBY(re2_regexp_case_sensitive(self) != Qtrue);
}

#case_sensitive? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_sensitive? #=> true

Returns:

  • (Boolean)

    the case_sensitive option



# File 'ext/re2/re2.cc', line 1116

static VALUE re2_regexp_case_sensitive(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().case_sensitive());
}

#casefold? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to false.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_insensitive? #=> false
re2.casefold?         #=> false

Returns:

  • (Boolean)

    the inverse of the case_sensitive option




#error ⇒ String?

If the RE2::Regexp could not be created properly, returns an error string; otherwise returns nil.

Returns:

  • (String, nil)

    the error string or nil



# File 'ext/re2/re2.cc', line 1191

static VALUE re2_regexp_error(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (p->pattern->ok()) {
    return Qnil;
  } else {
    return rb_str_new(p->pattern->error().data(), p->pattern->error().size());
  }
}

#error_arg ⇒ String?

If the RE2::Regexp could not be created properly, returns the offending portion of the regexp; otherwise returns nil.

Note: RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings will be returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour of any other encoding is undefined).

Returns:

  • (String, nil)

    the offending portion of the regexp or nil



# File 'ext/re2/re2.cc', line 1212

static VALUE re2_regexp_error_arg(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (p->pattern->ok()) {
    return Qnil;
  } else {
    return encoded_str_new(p->pattern->error_arg().data(),
        p->pattern->error_arg().size(),
        p->pattern->options().encoding());
  }
}

#full_match(text, options = {}) ⇒ RE2::MatchData, ...

Match the pattern against the entire given text and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Examples:

r = RE2::Regexp.new('w(o)(o)')
r.full_match('woo')                #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
r.full_match('woot')               #=> nil
r.full_match('woo', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.full_match('woo', submatches: 0) #=> true

Parameters:

  • text (String)

    the text to search

  • options (Hash) (defaults to: {})

    the options with which to perform the match

Options Hash (options):

  • :submatches (Integer)

    how many submatches to extract (0 is fastest), defaults to the total number of capturing groups

Returns:

  • (RE2::MatchData, nil)

    if extracting any submatches

  • (Boolean)

    if not extracting any submatches

Raises:

  • (ArgumentError)

    if given a negative number of submatches

  • (NoMemoryError)

    if there was not enough memory to allocate the matches

  • (TypeError)

    if given non-numeric submatches or non-hash options



# File 'lib/re2/regexp.rb', line 68

def full_match(text, options = {})
  match(text, Hash(options).merge(anchor: :anchor_both))
end

#full_match?(text) ⇒ Boolean

Returns true if the pattern matches the entire given text using FullMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1587

static VALUE re2_regexp_full_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::FullMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#inspect ⇒ String

Returns a printable version of the regular expression.

Note: RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings will be returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour of any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.inspect #=> "#<RE2::Regexp /woo?/>"

Returns:

  • (String)

    a printable version of the regular expression



# File 'ext/re2/re2.cc', line 947

static VALUE re2_regexp_inspect(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  std::ostringstream output;

  output << "#<RE2::Regexp /" << p->pattern->pattern() << "/>";

  return encoded_str_new(output.str().data(), output.str().length(),
      p->pattern->options().encoding());
}

#literal? ⇒ Boolean

Returns whether or not the regular expression was compiled with the literal option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", literal: true)
re2.literal? #=> true

Returns:

  • (Boolean)

    the literal option



# File 'ext/re2/re2.cc', line 1084

static VALUE re2_regexp_literal(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().literal());
}

#log_errors? ⇒ Boolean

Returns whether or not the regular expression was compiled with the log_errors option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", log_errors: true)
re2.log_errors? #=> true

Returns:

  • (Boolean)

    the log_errors option



# File 'ext/re2/re2.cc', line 1053

static VALUE re2_regexp_log_errors(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().log_errors());
}

#longest_match? ⇒ Boolean

Returns whether or not the regular expression was compiled with the longest_match option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", longest_match: true)
re2.longest_match? #=> true

Returns:

  • (Boolean)

    the longest_match option



# File 'ext/re2/re2.cc', line 1037

static VALUE re2_regexp_longest_match(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().longest_match());
}

#match(text) ⇒ RE2::MatchData, ...
#match(text, options) ⇒ RE2::MatchData, ...
#match(text, submatches) ⇒ RE2::MatchData, ...

General matching: match the pattern against the given text using Match and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Overloads:

  • #match(text) ⇒ RE2::MatchData, ...

    If the pattern contains capturing groups, returns a MatchData containing the overall match and all submatches found by searching for the regexp in text.

    If the pattern contains no capturing groups, returns true or false to indicate whether a match was found.

    Examples:

    Matching with capturing groups

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo') #=> #<RE2::MatchData "woo" 1:"o" 2:"o">

    Matching without capturing groups

    r = RE2::Regexp.new('woo')
    r.match('woo') #=> true

    Parameters:

    • text (String)

      the text to search

    Returns:

    • (RE2::MatchData, nil)

      if the pattern contains capturing groups

    • (Boolean)

      if the pattern does not contain capturing groups

    Raises:

    • (NoMemoryError)

      if there was not enough memory to allocate the submatches

    • (TypeError)

      if given text that cannot be coerced to a String

  • #match(text, options) ⇒ RE2::MatchData, ...

    Like match(text), but with customisable offsets at which to start and stop matching, optional anchoring to the start or both ends of the text, and a specific number of submatches to extract (padded with nils if necessary).

    Examples:

    Matching with capturing groups

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
    r.match('woo', submatches: 3) #=> #<RE2::MatchData "woo" 1:"o" 2:"o" 3:nil>
    r.match('woot', anchor: :anchor_both, submatches: 0)
    #=> false
    r.match('woot', anchor: :anchor_start, submatches: 0)
    #=> true

    Matching without capturing groups

    r = RE2::Regexp.new('wo+')
    r.match('woot', anchor: :anchor_both)  #=> false
    r.match('woot', anchor: :anchor_start) #=> true

    Parameters:

    • text (String)

      the text to search

    • options (Hash)

      the options with which to perform the match

    Options Hash (options):

    • :startpos (Integer) — default: 0

      offset at which to start matching

    • :endpos (Integer)

      offset at which to stop matching, defaults to the text length

    • :anchor (Symbol) — default: :unanchored

      one of :unanchored, :anchor_start, :anchor_both to anchor the match

    • :submatches (Integer)

      how many submatches to extract (0 is fastest), defaults to the number of capturing groups

    Returns:

    • (RE2::MatchData, nil)

      if extracting any submatches

    • (Boolean)

      if not extracting any submatches

    Raises:

    • (ArgumentError)

      if given a negative number of submatches, a negative startpos or endpos, an invalid anchor, or a startpos greater than endpos

    • (NoMemoryError)

      if there was not enough memory to allocate the matches

    • (TypeError)

      if given non-String text, non-numeric number of submatches, non-symbol anchor or non-hash options

    • (RE2::Regexp::UnsupportedError)

      if given an endpos argument on a version of RE2 that does not support it

  • #match(text, submatches) ⇒ RE2::MatchData, ...
    Deprecated.

    Legacy syntax for matching against text with a specific number of submatches to extract. Use match(text, submatches: n) instead.

    Examples:

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo', 0) #=> true
    r.match('woo', 1) #=> #<RE2::MatchData "woo" 1:"o">
    r.match('woo', 2) #=> #<RE2::MatchData "woo" 1:"o" 2:"o">

    Parameters:

    • text (String)

      the text to search

    • submatches (Integer)

      the number of submatches to extract

    Returns:

    • (RE2::MatchData, nil)

      if extracting any submatches

    • (Boolean)

      if not extracting any submatches

    Raises:

    • (ArgumentError)

      if given a negative number of submatches

    • (NoMemoryError)

      if there was not enough memory to allocate the submatches

    • (TypeError)

      if given non-numeric number of submatches



# File 'ext/re2/re2.cc', line 1411

static VALUE re2_regexp_match(int argc, VALUE *argv, const VALUE self) {
  re2_pattern *p;
  re2_matchdata *m;
  VALUE text, options;

  rb_scan_args(argc, argv, "11", &text, &options);

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  int n;
  int startpos = 0;
  int endpos = RSTRING_LEN(text);
  RE2::Anchor anchor = RE2::UNANCHORED;

  if (RTEST(options)) {
    if (RB_INTEGER_TYPE_P(options)) {
      n = NUM2INT(options);

      if (n < 0) {
        rb_raise(rb_eArgError, "number of matches should be >= 0");
      }
    } else {
      if (TYPE(options) != T_HASH) {
        options = rb_Hash(options);
      }

      VALUE endpos_option = rb_hash_aref(options, ID2SYM(id_endpos));
      if (!NIL_P(endpos_option)) {
#ifdef HAVE_ENDPOS_ARGUMENT
        endpos = NUM2INT(endpos_option);

        if (endpos < 0) {
          rb_raise(rb_eArgError, "endpos should be >= 0");
        }
#else
        rb_raise(re2_eRegexpUnsupportedError, "current version of RE2::Match() does not support endpos argument");
#endif
      }

      VALUE anchor_option = rb_hash_aref(options, ID2SYM(id_anchor));
      if (!NIL_P(anchor_option)) {
        Check_Type(anchor_option, T_SYMBOL);

        ID id_anchor_option = SYM2ID(anchor_option);
        if (id_anchor_option == id_unanchored) {
          anchor = RE2::UNANCHORED;
        } else if (id_anchor_option == id_anchor_start) {
          anchor = RE2::ANCHOR_START;
        } else if (id_anchor_option == id_anchor_both) {
          anchor = RE2::ANCHOR_BOTH;
        } else {
          rb_raise(rb_eArgError, "anchor should be one of: :unanchored, :anchor_start, :anchor_both");
        }
      }

      VALUE submatches_option = rb_hash_aref(options, ID2SYM(id_submatches));
      if (!NIL_P(submatches_option)) {
        n = NUM2INT(submatches_option);

        if (n < 0) {
          rb_raise(rb_eArgError, "number of matches should be >= 0");
        }
      } else {
        if (!p->pattern->ok()) {
          return Qnil;
        }

        n = p->pattern->NumberOfCapturingGroups();
      }

      VALUE startpos_option = rb_hash_aref(options, ID2SYM(id_startpos));
      if (!NIL_P(startpos_option)) {
        startpos = NUM2INT(startpos_option);

        if (startpos < 0) {
          rb_raise(rb_eArgError, "startpos should be >= 0");
        }
      }
    }
  } else {
    if (!p->pattern->ok()) {
      return Qnil;
    }

    n = p->pattern->NumberOfCapturingGroups();
  }

  if (startpos > endpos) {
    rb_raise(rb_eArgError, "startpos should be <= endpos");
  }

  if (n == 0) {
#ifdef HAVE_ENDPOS_ARGUMENT
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, endpos, anchor, 0, 0);
#else
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, anchor, 0, 0);
#endif
    return BOOL2RUBY(matched);
  } else {
    if (n == INT_MAX) {
      rb_raise(rb_eRangeError, "number of matches should be < %d", INT_MAX);
    }

    /* Because match returns the whole match as well. */
    n += 1;

    re2::StringPiece *matches = new(std::nothrow) re2::StringPiece[n];
    if (matches == 0) {
      rb_raise(rb_eNoMemError,
               "not enough memory to allocate StringPieces for matches");
    }

    text = rb_str_new_frozen(text);

#ifdef HAVE_ENDPOS_ARGUMENT
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, endpos, anchor, matches, n);
#else
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, anchor, matches, n);
#endif
    if (matched) {
      VALUE matchdata = rb_class_new_instance(0, 0, re2_cMatchData);
      TypedData_Get_Struct(matchdata, re2_matchdata, &re2_matchdata_data_type, m);

      RB_OBJ_WRITE(matchdata, &m->regexp, self);
      RB_OBJ_WRITE(matchdata, &m->text, text);
      m->matches = matches;
      m->number_of_matches = n;

      return matchdata;
    } else {
      delete[] matches;

      return Qnil;
    }
  }
}

#match?(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String




#max_mem ⇒ Integer

Returns the max_mem setting for the regular expression.

Examples:

re2 = RE2::Regexp.new("woo?", max_mem: 1024)
re2.max_mem #=> 1024

Returns:

  • (Integer)

    the max_mem option



# File 'ext/re2/re2.cc', line 1068

static VALUE re2_regexp_max_mem(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->options().max_mem());
}

#named_capturing_groups ⇒ Hash

Returns a hash mapping the name of each named capturing group to its index.

Note: RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings will be returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour of any other encoding is undefined).

Returns:

  • (Hash)

    a hash of names to capturing indices



# File 'ext/re2/re2.cc', line 1312

static VALUE re2_regexp_named_capturing_groups(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  const std::map<std::string, int>& groups = p->pattern->NamedCapturingGroups();
  VALUE capturing_groups = rb_hash_new();

  for (std::map<std::string, int>::const_iterator it = groups.begin(); it != groups.end(); ++it) {
    rb_hash_aset(capturing_groups,
        encoded_str_new(it->first.data(), it->first.size(),
          p->pattern->options().encoding()),
        INT2FIX(it->second));
  }

  return capturing_groups;
}

#never_nl? ⇒ Boolean

Returns whether or not the regular expression was compiled with the never_nl option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", never_nl: true)
re2.never_nl? #=> true

Returns:

  • (Boolean)

    the never_nl option



# File 'ext/re2/re2.cc', line 1100

static VALUE re2_regexp_never_nl(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().never_nl());
}

#number_of_capturing_groups ⇒ Integer

Returns the number of capturing subpatterns, or -1 if the regexp wasn't valid on construction. The overall match ($0) does not count: if the regexp is "(a)(b)", returns 2.

Returns:

  • (Integer)

    the number of capturing subpatterns



# File 'ext/re2/re2.cc', line 1296

static VALUE re2_regexp_number_of_capturing_groups(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->NumberOfCapturingGroups());
}

#ok? ⇒ Boolean

Returns whether or not the regular expression was compiled successfully.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.ok? #=> true

Returns:

  • (Boolean)

    whether or not compilation was successful



# File 'ext/re2/re2.cc', line 989

static VALUE re2_regexp_ok(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->ok());
}

#one_line? ⇒ Boolean

Returns whether or not the regular expression was compiled with the one_line option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", one_line: true)
re2.one_line? #=> true

Returns:

  • (Boolean)

    the one_line option



# File 'ext/re2/re2.cc', line 1178

static VALUE re2_regexp_one_line(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().one_line());
}

#options ⇒ Hash

Returns a frozen hash of the options currently set for the RE2::Regexp.

Returns:

  • (Hash)

    the options



# File 'ext/re2/re2.cc', line 1244

static VALUE re2_regexp_options(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  VALUE options = rb_hash_new();

  rb_hash_aset(options, ID2SYM(id_utf8),
      BOOL2RUBY(p->pattern->options().encoding() == RE2::Options::EncodingUTF8));

  rb_hash_aset(options, ID2SYM(id_posix_syntax),
      BOOL2RUBY(p->pattern->options().posix_syntax()));

  rb_hash_aset(options, ID2SYM(id_longest_match),
      BOOL2RUBY(p->pattern->options().longest_match()));

  rb_hash_aset(options, ID2SYM(id_log_errors),
      BOOL2RUBY(p->pattern->options().log_errors()));

  rb_hash_aset(options, ID2SYM(id_max_mem),
      INT2FIX(p->pattern->options().max_mem()));

  rb_hash_aset(options, ID2SYM(id_literal),
      BOOL2RUBY(p->pattern->options().literal()));

  rb_hash_aset(options, ID2SYM(id_never_nl),
      BOOL2RUBY(p->pattern->options().never_nl()));

  rb_hash_aset(options, ID2SYM(id_case_sensitive),
      BOOL2RUBY(p->pattern->options().case_sensitive()));

  rb_hash_aset(options, ID2SYM(id_perl_classes),
      BOOL2RUBY(p->pattern->options().perl_classes()));

  rb_hash_aset(options, ID2SYM(id_word_boundary),
      BOOL2RUBY(p->pattern->options().word_boundary()));

  rb_hash_aset(options, ID2SYM(id_one_line),
      BOOL2RUBY(p->pattern->options().one_line()));

  /* This is a read-only hash after all... */
  rb_obj_freeze(options);

  return options;
}

#partial_match(text, options = {}) ⇒ RE2::MatchData, ...

Matches the pattern against any substring of the given text and returns either an RE2::MatchData instance with the specified number of submatches (by default, the total number of capturing groups), nil if there is no match, or a Boolean if no submatches are requested.

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Examples:

r = RE2::Regexp.new('w(o)(o)')
r.partial_match('woot')                #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
r.partial_match('nope')                #=> nil
r.partial_match('woot', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.partial_match('woot', submatches: 0) #=> true

Parameters:

  • text (String)

    the text to search

  • options (Hash) (defaults to: {})

    the options with which to perform the match

Options Hash (options):

  • :submatches (Integer)

    how many submatches to extract (0 is fastest); defaults to the total number of capturing groups

Returns:

  • (RE2::MatchData, nil)

    if extracting any submatches

  • (Boolean)

    if not extracting any submatches

Raises:

  • (ArgumentError)

    if given a negative number of submatches

  • (NoMemoryError)

    if there was not enough memory to allocate the matches

  • (TypeError)

    if given non-numeric submatches or non-hash options



# File 'lib/re2/regexp.rb', line 39

def partial_match(text, options = {})
  match(text, Hash(options).merge(anchor: :unanchored))
end

#partial_match?(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text (using RE2's PartialMatch), and false otherwise.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1567

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#pattern ⇒ String

Returns a string version of the regular expression.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so the string is returned in UTF-8 by default, or in ISO-8859-1 if the RE2::Regexp was compiled with the :utf8 option set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.pattern #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 972

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#perl_classes? ⇒ Boolean

Returns whether or not the regular expression was compiled with the perl_classes option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", perl_classes: true)
re2.perl_classes? #=> true

Returns:

  • (Boolean)

    the perl_classes option



# File 'ext/re2/re2.cc', line 1146

static VALUE re2_regexp_perl_classes(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().perl_classes());
}

#posix_syntax? ⇒ Boolean

Returns whether or not the regular expression was compiled with the posix_syntax option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", posix_syntax: true)
re2.posix_syntax? #=> true

Returns:

  • (Boolean)

    the posix_syntax option



# File 'ext/re2/re2.cc', line 1021

static VALUE re2_regexp_posix_syntax(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().posix_syntax());
}

#program_size ⇒ Integer

Returns the program size, a very approximate measure of a regexp's "cost": a larger number indicates a more expensive regexp.

Returns:

  • (Integer)

    the regexp "cost"



# File 'ext/re2/re2.cc', line 1232

static VALUE re2_regexp_program_size(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->ProgramSize());
}

#scan(text) ⇒ RE2::Scanner

Returns a Scanner for scanning the given text incrementally with FindAndConsume.

Examples:

c = RE2::Regexp.new('(\w+)').scan("Foo bar baz")
#=> #<RE2::Scanner:0x0000000000000001>

Parameters:

  • text (String)

    the text to scan incrementally

Returns:

  • (RE2::Scanner)

    an RE2::Scanner for the given text

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1611

static VALUE re2_regexp_scan(const VALUE self, VALUE text) {
  /* Ensure text is a string. */
  StringValue(text);

  re2_pattern *p;
  re2_scanner *c;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  VALUE scanner = rb_class_new_instance(0, 0, re2_cScanner);
  TypedData_Get_Struct(scanner, re2_scanner, &re2_scanner_data_type, c);

  RB_OBJ_WRITE(scanner, &c->regexp, self);
  RB_OBJ_WRITE(scanner, &c->text, rb_str_new_frozen(text));
  c->input = new(std::nothrow) re2::StringPiece(
      RSTRING_PTR(c->text), RSTRING_LEN(c->text));
  if (c->input == 0) {
    rb_raise(rb_eNoMemError,
             "not enough memory to allocate StringPiece for input");
  }

  if (p->pattern->ok()) {
    c->number_of_capturing_groups = p->pattern->NumberOfCapturingGroups();
  } else {
    c->number_of_capturing_groups = 0;
  }

  c->eof = false;

  return scanner;
}

#source ⇒ String

Returns a string version of the regular expression.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so the string is returned in UTF-8 by default, or in ISO-8859-1 if the RE2::Regexp was compiled with the :utf8 option set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.source #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 972

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#to_s ⇒ String

Returns a string version of the regular expression.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so the string is returned in UTF-8 by default, or in ISO-8859-1 if the RE2::Regexp was compiled with the :utf8 option set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.to_s #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 972

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#to_str ⇒ String

Returns a string version of the regular expression.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so the string is returned in UTF-8 by default, or in ISO-8859-1 if the RE2::Regexp was compiled with the :utf8 option set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.to_str #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 972

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#utf8? ⇒ Boolean

Returns whether or not the regular expression was compiled with the utf8 option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", utf8: true)
re2.utf8? #=> true

Returns:

  • (Boolean)

    the utf8 option



# File 'ext/re2/re2.cc', line 1005

static VALUE re2_regexp_utf8(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().encoding() == RE2::Options::EncodingUTF8);
}

#word_boundary? ⇒ Boolean

Returns whether or not the regular expression was compiled with the word_boundary option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", word_boundary: true)
re2.word_boundary? #=> true

Returns:

  • (Boolean)

    the word_boundary option



# File 'ext/re2/re2.cc', line 1162

static VALUE re2_regexp_word_boundary(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().word_boundary());
}