Class: RE2::Regexp

Inherits:
Object
Defined in:
ext/re2/re2.cc,
lib/re2/regexp.rb

Constructor Details

#initialize(pattern) ⇒ RE2::Regexp
#initialize(pattern, options) ⇒ RE2::Regexp

Returns a new RE2::Regexp object with a compiled version of pattern stored inside.

Overloads:

  • #initialize(pattern) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the default options.

    Parameters:

    • pattern (String)

      the pattern to compile

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern

  • #initialize(pattern, options) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the specified options.

    Parameters:

    • pattern (String)

      the pattern to compile

    • options (Hash)

      the options with which to compile the pattern

    Options Hash (options):

    • :utf8 (Boolean) — default: true

      text and pattern are UTF-8; otherwise Latin-1

    • :posix_syntax (Boolean) — default: false

      restrict regexps to POSIX egrep syntax

    • :longest_match (Boolean) — default: false

      search for longest match, not first match

    • :log_errors (Boolean) — default: true

      log syntax and execution errors to ERROR

    • :max_mem (Integer)

      approx. max memory footprint of RE2

    • :literal (Boolean) — default: false

      interpret string as literal, not regexp

    • :never_nl (Boolean) — default: false

      never match \n, even if it is in regexp

    • :case_sensitive (Boolean) — default: true

      match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode)

    • :perl_classes (Boolean) — default: false

      allow Perl's \d \s \w \D \S \W when in posix_syntax mode

    • :word_boundary (Boolean) — default: false

      allow \b \B (word boundary and not) when in posix_syntax mode

    • :one_line (Boolean) — default: false

      ^ and $ only match beginning and end of text when in posix_syntax mode

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern



# File 'ext/re2/re2.cc', line 900

static VALUE re2_regexp_initialize(int argc, VALUE *argv, VALUE self) {
  VALUE pattern, options;
  re2_pattern *p;

  rb_scan_args(argc, argv, "11", &pattern, &options);

  /* Ensure pattern is a string. */
  StringValue(pattern);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (RTEST(options)) {
    RE2::Options re2_options;
    parse_re2_options(&re2_options, options);

    p->pattern = new(std::nothrow) RE2(
        re2::StringPiece(RSTRING_PTR(pattern), RSTRING_LEN(pattern)), re2_options);
  } else {
    p->pattern = new(std::nothrow) RE2(
        re2::StringPiece(RSTRING_PTR(pattern), RSTRING_LEN(pattern)));
  }

  if (p->pattern == 0) {
    rb_raise(rb_eNoMemError, "not enough memory to allocate RE2 object");
  }

  return self;
}

Class Method Details

.escape(unquoted) ⇒ String

Returns a version of unquoted with all potentially meaningful regexp characters escaped using QuoteMeta. The returned string, used as a regular expression, will exactly match the original string.

Examples:

RE2::Regexp.escape("1.5-2.0?") #=> "1\.5\-2\.0\?"

Parameters:

  • unquoted (String)

    the unquoted string

Returns:

  • (String)

    the escaped string

Raises:

  • (TypeError)

    if the given unquoted string cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1771

static VALUE re2_QuoteMeta(VALUE, VALUE unquoted) {
  StringValue(unquoted);

  std::string quoted_string = RE2::QuoteMeta(
      re2::StringPiece(RSTRING_PTR(unquoted), RSTRING_LEN(unquoted)));

  return rb_str_new(quoted_string.data(), quoted_string.size());
}

.match_has_endpos_argument? ⇒ Boolean

Returns whether the underlying RE2 version supports passing an endpos argument to Match. If it does not, #match raises RE2::Regexp::UnsupportedError when given an endpos.

Returns:

  • (Boolean)

    whether the underlying Match has an endpos argument



# File 'ext/re2/re2.cc', line 1644

static VALUE re2_regexp_match_has_endpos_argument_p(VALUE) {
#ifdef HAVE_ENDPOS_ARGUMENT
  return Qtrue;
#else
  return Qfalse;
#endif
}

.quote(unquoted) ⇒ String

Returns a version of unquoted with all potentially meaningful regexp characters escaped using QuoteMeta. The returned string, used as a regular expression, will exactly match the original string.

Examples:

RE2::Regexp.quote("1.5-2.0?") #=> "1\.5\-2\.0\?"

Parameters:

  • unquoted (String)

    the unquoted string

Returns:

  • (String)

    the escaped string

Raises:

  • (TypeError)

    if the given unquoted string cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1771

static VALUE re2_QuoteMeta(VALUE, VALUE unquoted) {
  StringValue(unquoted);

  std::string quoted_string = RE2::QuoteMeta(
      re2::StringPiece(RSTRING_PTR(unquoted), RSTRING_LEN(unquoted)));

  return rb_str_new(quoted_string.data(), quoted_string.size());
}

Instance Method Details

#===(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1562

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#=~(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1562

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#case_insensitive? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to false.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_insensitive? #=> false
re2.casefold?         #=> false

Returns:

  • (Boolean)

    the inverse of the case_sensitive option



# File 'ext/re2/re2.cc', line 1128

static VALUE re2_regexp_case_insensitive(const VALUE self) {
  return BOOL2RUBY(re2_regexp_case_sensitive(self) != Qtrue);
}

#case_sensitive? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_sensitive? #=> true

Returns:

  • (Boolean)

    the case_sensitive option



# File 'ext/re2/re2.cc', line 1111

static VALUE re2_regexp_case_sensitive(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().case_sensitive());
}

#casefold? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to false.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_insensitive? #=> false
re2.casefold?         #=> false

Returns:

  • (Boolean)

    the inverse of the case_sensitive option



# File 'ext/re2/re2.cc', line 1128

static VALUE re2_regexp_case_insensitive(const VALUE self) {
  return BOOL2RUBY(re2_regexp_case_sensitive(self) != Qtrue);
}

#error ⇒ String?

If the RE2::Regexp could not be created properly, returns an error string; otherwise, returns nil.

Returns:

  • (String, nil)

    the error string or nil



# File 'ext/re2/re2.cc', line 1186

static VALUE re2_regexp_error(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (p->pattern->ok()) {
    return Qnil;
  } else {
    return rb_str_new(p->pattern->error().data(), p->pattern->error().size());
  }
}

#error_arg ⇒ String?

If the RE2::Regexp could not be created properly, returns the offending portion of the regexp; otherwise, returns nil.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Returns:

  • (String, nil)

    the offending portion of the regexp or nil



# File 'ext/re2/re2.cc', line 1207

static VALUE re2_regexp_error_arg(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (p->pattern->ok()) {
    return Qnil;
  } else {
    return encoded_str_new(p->pattern->error_arg().data(),
        p->pattern->error_arg().size(),
        p->pattern->options().encoding());
  }
}

#full_match(text, options = {}) ⇒ RE2::MatchData, ...

Match the pattern against the given text exactly and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Examples:

r = RE2::Regexp.new('w(o)(o)')
r.full_match('woo')                #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
r.full_match('woot')               #=> nil
r.full_match('woo', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.full_match('woo', submatches: 0) #=> true

Parameters:

  • text (String)

    the text to search

  • options (Hash) (defaults to: {})

    the options with which to perform the match

Options Hash (options):

  • :submatches (Integer)

    how many submatches to extract (0 is fastest), defaults to the total number of capturing groups

Returns:

  • (RE2::MatchData, nil)

    if extracting any submatches

  • (Boolean)

    if not extracting any submatches

Raises:

  • (ArgumentError)

    if given a negative number of submatches

  • (NoMemoryError)

    if there was not enough memory to allocate the matches

  • (TypeError)

    if given non-numeric submatches or non-hash options



# File 'lib/re2/regexp.rb', line 68

def full_match(text, options = {})
  match(text, Hash(options).merge(anchor: :anchor_both))
end

#full_match?(text) ⇒ Boolean

Returns true if the pattern matches the given text using FullMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1582

static VALUE re2_regexp_full_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::FullMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#inspect ⇒ String

Returns a printable version of the regular expression.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.inspect #=> "#<RE2::Regexp /woo?/>"

Returns:

  • (String)

    a printable version of the regular expression



# File 'ext/re2/re2.cc', line 942

static VALUE re2_regexp_inspect(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  std::ostringstream output;

  output << "#<RE2::Regexp /" << p->pattern->pattern() << "/>";

  return encoded_str_new(output.str().data(), output.str().length(),
      p->pattern->options().encoding());
}

#literal? ⇒ Boolean

Returns whether or not the regular expression was compiled with the literal option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", literal: true)
re2.literal? #=> true

Returns:

  • (Boolean)

    the literal option



# File 'ext/re2/re2.cc', line 1079

static VALUE re2_regexp_literal(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().literal());
}

#log_errors? ⇒ Boolean

Returns whether or not the regular expression was compiled with the log_errors option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", log_errors: true)
re2.log_errors? #=> true

Returns:

  • (Boolean)

    the log_errors option



# File 'ext/re2/re2.cc', line 1048

static VALUE re2_regexp_log_errors(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().log_errors());
}

#longest_match? ⇒ Boolean

Returns whether or not the regular expression was compiled with the longest_match option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", longest_match: true)
re2.longest_match? #=> true

Returns:

  • (Boolean)

    the longest_match option



# File 'ext/re2/re2.cc', line 1032

static VALUE re2_regexp_longest_match(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().longest_match());
}

#match(text) ⇒ RE2::MatchData, ...
#match(text, options) ⇒ RE2::MatchData, ...
#match(text, submatches) ⇒ RE2::MatchData, ...

General matching: match the pattern against the given text using Match and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Overloads:

  • #match(text) ⇒ RE2::MatchData, ...

    Returns a MatchData containing the matching pattern and all submatches resulting from looking for the regexp in text if the pattern contains capturing groups.

    Returns either true or false indicating whether a successful match was made if the pattern contains no capturing groups.

    Examples:

    Matching with capturing groups

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo') #=> #<RE2::MatchData "woo" 1:"o" 2:"o">

    Matching without capturing groups

    r = RE2::Regexp.new('woo')
    r.match('woo') #=> true

    Parameters:

    • text (String)

      the text to search

    Returns:

    • (RE2::MatchData, nil)

      if the pattern contains capturing groups

    • (Boolean)

      if the pattern does not contain capturing groups

    Raises:

    • (NoMemoryError)

      if there was not enough memory to allocate the submatches

    • (TypeError)

      if given text that cannot be coerced to a String

  • #match(text, options) ⇒ RE2::MatchData, ...

    Like match(text), but with customisable offsets for starting and ending matches, optional anchoring to the start or both ends of the text, and a specific number of submatches to extract (padded with nils if necessary).

    Examples:

    Matching with capturing groups

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
    r.match('woo', submatches: 3) #=> #<RE2::MatchData "woo" 1:"o" 2:"o" 3:nil>
    r.match('woot', anchor: :anchor_both, submatches: 0)
    #=> false
    r.match('woot', anchor: :anchor_start, submatches: 0)
    #=> true

    Matching without capturing groups

    r = RE2::Regexp.new('wo+')
    r.match('woot', anchor: :anchor_both)  #=> false
    r.match('woot', anchor: :anchor_start) #=> true

    Parameters:

    • text (String)

      the text to search

    • options (Hash)

      the options with which to perform the match

    Options Hash (options):

    • :startpos (Integer) — default: 0

      offset at which to start matching

    • :endpos (Integer)

      offset at which to stop matching, defaults to the text length

    • :anchor (Symbol) — default: :unanchored

      one of :unanchored, :anchor_start, :anchor_both to anchor the match

    • :submatches (Integer)

      how many submatches to extract (0 is fastest), defaults to the number of capturing groups

    Returns:

    • (RE2::MatchData, nil)

      if extracting any submatches

    • (Boolean)

      if not extracting any submatches

    Raises:

    • (ArgumentError)

      if given a negative number of submatches, invalid anchor or invalid startpos, endpos pair

    • (NoMemoryError)

      if there was not enough memory to allocate the matches

    • (TypeError)

      if given non-String text, non-numeric number of submatches, non-symbol anchor or non-hash options

    • (RE2::Regexp::UnsupportedError)

      if given an endpos argument on a version of RE2 that does not support it

  • #match(text, submatches) ⇒ RE2::MatchData, ...
    Deprecated.

    Legacy syntax for matching against text with a specific number of submatches to extract. Use match(text, submatches: n) instead.

    Examples:

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo', 0) #=> true
    r.match('woo', 1) #=> #<RE2::MatchData "woo" 1:"o">
    r.match('woo', 2) #=> #<RE2::MatchData "woo" 1:"o" 2:"o">

    Parameters:

    • text (String)

      the text to search

    • submatches (Integer)

      the number of submatches to extract

    Returns:

    • (RE2::MatchData, nil)

      if extracting any submatches

    • (Boolean)

      if not extracting any submatches

    Raises:

    • (NoMemoryError)

      if there was not enough memory to allocate the submatches

    • (TypeError)

      if given non-numeric number of submatches



# File 'ext/re2/re2.cc', line 1406

static VALUE re2_regexp_match(int argc, VALUE *argv, const VALUE self) {
  re2_pattern *p;
  re2_matchdata *m;
  VALUE text, options;

  rb_scan_args(argc, argv, "11", &text, &options);

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  int n;
  int startpos = 0;
  int endpos = RSTRING_LEN(text);
  RE2::Anchor anchor = RE2::UNANCHORED;

  if (RTEST(options)) {
    if (FIXNUM_P(options)) {
      n = NUM2INT(options);

      if (n < 0) {
        rb_raise(rb_eArgError, "number of matches should be >= 0");
      }
    } else {
      if (TYPE(options) != T_HASH) {
        options = rb_Hash(options);
      }

      VALUE endpos_option = rb_hash_aref(options, ID2SYM(id_endpos));
      if (!NIL_P(endpos_option)) {
#ifdef HAVE_ENDPOS_ARGUMENT
        Check_Type(endpos_option, T_FIXNUM);

        endpos = NUM2INT(endpos_option);

        if (endpos < 0) {
          rb_raise(rb_eArgError, "endpos should be >= 0");
        }
#else
        rb_raise(re2_eRegexpUnsupportedError, "current version of RE2::Match() does not support endpos argument");
#endif
      }

      VALUE anchor_option = rb_hash_aref(options, ID2SYM(id_anchor));
      if (!NIL_P(anchor_option)) {
        Check_Type(anchor_option, T_SYMBOL);

        ID id_anchor_option = SYM2ID(anchor_option);
        if (id_anchor_option == id_unanchored) {
          anchor = RE2::UNANCHORED;
        } else if (id_anchor_option == id_anchor_start) {
          anchor = RE2::ANCHOR_START;
        } else if (id_anchor_option == id_anchor_both) {
          anchor = RE2::ANCHOR_BOTH;
        } else {
          rb_raise(rb_eArgError, "anchor should be one of: :unanchored, :anchor_start, :anchor_both");
        }
      }

      VALUE submatches_option = rb_hash_aref(options, ID2SYM(id_submatches));
      if (!NIL_P(submatches_option)) {
        Check_Type(submatches_option, T_FIXNUM);

        n = NUM2INT(submatches_option);

        if (n < 0) {
          rb_raise(rb_eArgError, "number of matches should be >= 0");
        }
      } else {
        if (!p->pattern->ok()) {
          return Qnil;
        }

        n = p->pattern->NumberOfCapturingGroups();
      }

      VALUE startpos_option = rb_hash_aref(options, ID2SYM(id_startpos));
      if (!NIL_P(startpos_option)) {
        Check_Type(startpos_option, T_FIXNUM);

        startpos = NUM2INT(startpos_option);

        if (startpos < 0) {
          rb_raise(rb_eArgError, "startpos should be >= 0");
        }
      }
    }
  } else {
    if (!p->pattern->ok()) {
      return Qnil;
    }

    n = p->pattern->NumberOfCapturingGroups();
  }

  if (startpos > endpos) {
    rb_raise(rb_eArgError, "startpos should be <= endpos");
  }

  if (n == 0) {
#ifdef HAVE_ENDPOS_ARGUMENT
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, endpos, anchor, 0, 0);
#else
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, anchor, 0, 0);
#endif
    return BOOL2RUBY(matched);
  } else {
    /* Because match returns the whole match as well. */
    n += 1;

    VALUE matchdata = rb_class_new_instance(0, 0, re2_cMatchData);
    TypedData_Get_Struct(matchdata, re2_matchdata, &re2_matchdata_data_type, m);
    m->matches = new(std::nothrow) re2::StringPiece[n];
    RB_OBJ_WRITE(matchdata, &m->regexp, self);
    if (!RTEST(rb_obj_frozen_p(text))) {
      text = rb_str_freeze(rb_str_dup(text));
    }
    RB_OBJ_WRITE(matchdata, &m->text, text);

    if (m->matches == 0) {
      rb_raise(rb_eNoMemError,
               "not enough memory to allocate StringPieces for matches");
    }

    m->number_of_matches = n;

#ifdef HAVE_ENDPOS_ARGUMENT
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(m->text), RSTRING_LEN(m->text)),
        startpos, endpos, anchor, m->matches, n);
#else
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(m->text), RSTRING_LEN(m->text)),
        startpos, anchor, m->matches, n);
#endif
    if (matched) {
      return matchdata;
    } else {
      return Qnil;
    }
  }
}

#match?(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1562

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#max_mem ⇒ Integer

Returns the max_mem setting for the regular expression.

Examples:

re2 = RE2::Regexp.new("woo?", max_mem: 1024)
re2.max_mem #=> 1024

Returns:

  • (Integer)

    the max_mem option



# File 'ext/re2/re2.cc', line 1063

static VALUE re2_regexp_max_mem(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->options().max_mem());
}

#named_capturing_groups ⇒ Hash

Returns a hash mapping the names of capturing groups to their indices.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Returns:

  • (Hash)

    a hash of names to capturing indices



# File 'ext/re2/re2.cc', line 1307

static VALUE re2_regexp_named_capturing_groups(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  const std::map<std::string, int>& groups = p->pattern->NamedCapturingGroups();
  VALUE capturing_groups = rb_hash_new();

  for (std::map<std::string, int>::const_iterator it = groups.begin(); it != groups.end(); ++it) {
    rb_hash_aset(capturing_groups,
        encoded_str_new(it->first.data(), it->first.size(),
          p->pattern->options().encoding()),
        INT2FIX(it->second));
  }

  return capturing_groups;
}

#never_nl? ⇒ Boolean

Returns whether or not the regular expression was compiled with the never_nl option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", never_nl: true)
re2.never_nl? #=> true

Returns:

  • (Boolean)

    the never_nl option



# File 'ext/re2/re2.cc', line 1095

static VALUE re2_regexp_never_nl(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().never_nl());
}

#number_of_capturing_groups ⇒ Integer

Returns the number of capturing subpatterns, or -1 if the regexp wasn't valid on construction. The overall match ($0) does not count: if the regexp is "(a)(b)", returns 2.

Returns:

  • (Integer)

    the number of capturing subpatterns



# File 'ext/re2/re2.cc', line 1291

static VALUE re2_regexp_number_of_capturing_groups(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->NumberOfCapturingGroups());
}

#ok? ⇒ Boolean

Returns whether or not the regular expression was compiled successfully.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.ok? #=> true

Returns:

  • (Boolean)

    whether or not compilation was successful



# File 'ext/re2/re2.cc', line 984

static VALUE re2_regexp_ok(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->ok());
}

#one_line? ⇒ Boolean

Returns whether or not the regular expression was compiled with the one_line option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", one_line: true)
re2.one_line? #=> true

Returns:

  • (Boolean)

    the one_line option



# File 'ext/re2/re2.cc', line 1173

static VALUE re2_regexp_one_line(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().one_line());
}

#options ⇒ Hash

Returns a hash of the options currently set for the RE2::Regexp.

Returns:

  • (Hash)

    the options



# File 'ext/re2/re2.cc', line 1239

static VALUE re2_regexp_options(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  VALUE options = rb_hash_new();

  rb_hash_aset(options, ID2SYM(id_utf8),
      BOOL2RUBY(p->pattern->options().encoding() == RE2::Options::EncodingUTF8));

  rb_hash_aset(options, ID2SYM(id_posix_syntax),
      BOOL2RUBY(p->pattern->options().posix_syntax()));

  rb_hash_aset(options, ID2SYM(id_longest_match),
      BOOL2RUBY(p->pattern->options().longest_match()));

  rb_hash_aset(options, ID2SYM(id_log_errors),
      BOOL2RUBY(p->pattern->options().log_errors()));

  rb_hash_aset(options, ID2SYM(id_max_mem),
      INT2FIX(p->pattern->options().max_mem()));

  rb_hash_aset(options, ID2SYM(id_literal),
      BOOL2RUBY(p->pattern->options().literal()));

  rb_hash_aset(options, ID2SYM(id_never_nl),
      BOOL2RUBY(p->pattern->options().never_nl()));

  rb_hash_aset(options, ID2SYM(id_case_sensitive),
      BOOL2RUBY(p->pattern->options().case_sensitive()));

  rb_hash_aset(options, ID2SYM(id_perl_classes),
      BOOL2RUBY(p->pattern->options().perl_classes()));

  rb_hash_aset(options, ID2SYM(id_word_boundary),
      BOOL2RUBY(p->pattern->options().word_boundary()));

  rb_hash_aset(options, ID2SYM(id_one_line),
      BOOL2RUBY(p->pattern->options().one_line()));

  /* This is a read-only hash after all... */
  rb_obj_freeze(options);

  return options;
}
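Because the returned hash is frozen, callers cannot modify it in place. A minimal sketch of what that means in practice, using a hand-built, abbreviated hash as a stand-in for the value returned by #options so that it runs without the gem (the variable names are illustrative only):

```ruby
# Hand-built stand-in for the hash returned by RE2::Regexp#options.
# Abbreviated: the real hash carries every option set in the source above.
options = {
  utf8: true,
  posix_syntax: false,
  longest_match: false,
  case_sensitive: true
}.freeze

# Mutating the frozen hash raises FrozenError.
mutation_error = nil
begin
  options[:longest_match] = true
rescue FrozenError => e
  mutation_error = e
end

# Hash#merge returns a new, unfrozen hash and leaves the original untouched.
derived = options.merge(longest_match: true)
```

A derived hash like this could then be passed as the options argument when constructing another RE2::Regexp.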

#partial_match(text, options = {}) ⇒ RE2::MatchData, ...

Match the pattern against any substring of the given text and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one and requesting zero submatches is faster still.

Examples:

r = RE2::Regexp.new('w(o)(o)')
r.partial_match('woot')                #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
r.partial_match('nope')                #=> nil
r.partial_match('woot', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.partial_match('woot', submatches: 0) #=> true

Parameters:

  • text (String)

    the text to search

  • options (Hash) (defaults to: {})

    the options with which to perform the match

Options Hash (options):

  • :submatches (Integer)

    how many submatches to extract (0 is fastest), defaults to the total number of capturing groups

Returns:

  • (RE2::MatchData, nil)

    if extracting any submatches

  • (Boolean)

    if not extracting any submatches

Raises:

  • (ArgumentError)

    if given a negative number of submatches

  • (NoMemoryError)

    if there was not enough memory to allocate the matches

  • (TypeError)

    if given non-numeric submatches or non-hash options



# File 'lib/re2/regexp.rb', line 39

def partial_match(text, options = {})
  match(text, Hash(options).merge(anchor: :unanchored))
end
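The option handling in the source above can be examined in isolation. The sketch below copies the `Hash(options).merge(anchor: :unanchored)` expression into a hypothetical helper, `partial_match_options`, which is not part of the gem; the `:anchor_start` value is used only to illustrate that a caller-supplied `:anchor` is overwritten.

```ruby
# Hypothetical helper mirroring the option handling inside #partial_match.
def partial_match_options(options = {})
  # Kernel#Hash turns nil into {} and raises TypeError for values that
  # cannot be converted to a Hash (for example, a String).
  Hash(options).merge(anchor: :unanchored)
end

with_submatches = partial_match_options(submatches: 1)
# => { submatches: 1, anchor: :unanchored }

from_nil = partial_match_options(nil)
# => { anchor: :unanchored }

# merge gives precedence to its argument, so any :anchor passed by the
# caller is replaced with :unanchored.
overridden = partial_match_options(anchor: :anchor_start)
# => { anchor: :unanchored }

# Non-hash options are rejected by Kernel#Hash.
type_error = begin
  partial_match_options("nope")
rescue TypeError => e
  e
end
```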

#partial_match?(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1562

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#pattern ⇒ String

Returns a string version of the regular expression.

Note that RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.pattern #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 967

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#perl_classes? ⇒ Boolean

Returns whether or not the regular expression was compiled with the perl_classes option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", perl_classes: true)
re2.perl_classes? #=> true

Returns:

  • (Boolean)

    the perl_classes option



# File 'ext/re2/re2.cc', line 1141

static VALUE re2_regexp_perl_classes(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().perl_classes());
}

#posix_syntax? ⇒ Boolean

Returns whether or not the regular expression was compiled with the posix_syntax option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", posix_syntax: true)
re2.posix_syntax? #=> true

Returns:

  • (Boolean)

    the posix_syntax option



# File 'ext/re2/re2.cc', line 1016

static VALUE re2_regexp_posix_syntax(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().posix_syntax());
}

#program_size ⇒ Integer

Returns the program size, a very approximate measure of a regexp's "cost". Larger numbers are more expensive than smaller numbers.

Returns:

  • (Integer)

    the regexp "cost"



# File 'ext/re2/re2.cc', line 1227

static VALUE re2_regexp_program_size(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->ProgramSize());
}

#scan(text) ⇒ RE2::Scanner

Returns a Scanner for scanning the given text incrementally with FindAndConsume.

Examples:

c = RE2::Regexp.new('(\w+)').scan("Foo bar baz")
#=> #<RE2::Scanner:0x0000000000000001>

Parameters:

  • text (String)

    the text to scan incrementally

Returns:

  • (RE2::Scanner)

    a scanner for the given text

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1606

static VALUE re2_regexp_scan(const VALUE self, VALUE text) {
  /* Ensure text is a string. */
  StringValue(text);

  re2_pattern *p;
  re2_scanner *c;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  VALUE scanner = rb_class_new_instance(0, 0, re2_cScanner);
  TypedData_Get_Struct(scanner, re2_scanner, &re2_scanner_data_type, c);

  c->input = new(std::nothrow) re2::StringPiece(
      RSTRING_PTR(text), RSTRING_LEN(text));
  RB_OBJ_WRITE(scanner, &c->regexp, self);
  RB_OBJ_WRITE(scanner, &c->text, text);

  if (p->pattern->ok()) {
    c->number_of_capturing_groups = p->pattern->NumberOfCapturingGroups();
  } else {
    c->number_of_capturing_groups = 0;
  }

  c->eof = false;

  return scanner;
}
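To illustrate the find-and-consume style of incremental scanning without depending on the gem, the sketch below uses Ruby's standard-library StringScanner as a stand-in. It is an analogy, not RE2::Scanner itself: `each_word` is a hypothetical helper, and `scan_until` plays the role of finding the next match anywhere in the remaining input and advancing past it.

```ruby
require "strscan"

# Hypothetical helper: collects the first capturing group of each
# successive match, consuming the input as it goes.
def each_word(text)
  scanner = StringScanner.new(text)
  words = []

  # scan_until finds the next match in the remaining input, moves the scan
  # position to the end of that match, and returns nil once nothing is left.
  while scanner.scan_until(/(\w+)/)
    words << scanner[1]
  end

  words
end

each_word("Foo bar baz") #=> ["Foo", "bar", "baz"]
```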

#source ⇒ String

Returns a string version of the regular expression.

Note that RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.source #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 967

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#to_s ⇒ String

Returns a string version of the regular expression.

Note that RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.to_s #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 967

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#to_str ⇒ String

Returns a string version of the regular expression.

Note that RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.to_str #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 967

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#utf8? ⇒ Boolean

Returns whether or not the regular expression was compiled with the utf8 option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", utf8: true)
re2.utf8? #=> true

Returns:

  • (Boolean)

    the utf8 option



# File 'ext/re2/re2.cc', line 1000

static VALUE re2_regexp_utf8(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().encoding() == RE2::Options::EncodingUTF8);
}

#word_boundary? ⇒ Boolean

Returns whether or not the regular expression was compiled with the word_boundary option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", word_boundary: true)
re2.word_boundary? #=> true

Returns:

  • (Boolean)

    the word_boundary option



# File 'ext/re2/re2.cc', line 1157

static VALUE re2_regexp_word_boundary(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().word_boundary());
}