Class: RE2::Regexp

Inherits:
Object
Defined in:
ext/re2/re2.cc,
lib/re2/regexp.rb

Constructor Details

#initialize(pattern) ⇒ RE2::Regexp
#initialize(pattern, options) ⇒ RE2::Regexp

Returns a new RE2::Regexp object with a compiled version of pattern stored inside.

Overloads:

  • #initialize(pattern) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the default options.

    Parameters:

    • pattern (String)

      the pattern to compile

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern

  • #initialize(pattern, options) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the specified options.

    Parameters:

    • pattern (String)

      the pattern to compile

    • options (Hash)

      the options with which to compile the pattern

    Options Hash (options):

    • :utf8 (Boolean) — default: true

      text and pattern are UTF-8; otherwise Latin-1

    • :posix_syntax (Boolean) — default: false

      restrict regexps to POSIX egrep syntax

    • :longest_match (Boolean) — default: false

      search for longest match, not first match

    • :log_errors (Boolean) — default: true

      log syntax and execution errors to ERROR

    • :max_mem (Integer)

      approx. max memory footprint of RE2

    • :literal (Boolean) — default: false

      interpret string as literal, not regexp

    • :never_nl (Boolean) — default: false

      never match \n, even if it is in regexp

    • :case_sensitive (Boolean) — default: true

      match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode)

    • :perl_classes (Boolean) — default: false

      allow Perl's \d \s \w \D \S \W when in posix_syntax mode

    • :word_boundary (Boolean) — default: false

      allow \b \B (word boundary and not) when in posix_syntax mode

    • :one_line (Boolean) — default: false

      ^ and $ only match beginning and end of text when in posix_syntax mode

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern



# File 'ext/re2/re2.cc', line 912

static VALUE re2_regexp_initialize(int argc, VALUE *argv, VALUE self) {
  VALUE pattern, options;
  re2_pattern *p;

  rb_scan_args(argc, argv, "11", &pattern, &options);

  /* Ensure pattern is a string. */
  StringValue(pattern);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (RTEST(options)) {
    RE2::Options re2_options;
    parse_re2_options(&re2_options, options);

    p->pattern = new(std::nothrow) RE2(
        re2::StringPiece(RSTRING_PTR(pattern), RSTRING_LEN(pattern)), re2_options);
  } else {
    p->pattern = new(std::nothrow) RE2(
        re2::StringPiece(RSTRING_PTR(pattern), RSTRING_LEN(pattern)));
  }

  if (p->pattern == 0) {
    rb_raise(rb_eNoMemError, "not enough memory to allocate RE2 object");
  }

  return self;
}

Class Method Details

.escape(unquoted) ⇒ String

Returns a version of unquoted with all potentially meaningful regexp characters escaped using QuoteMeta. The returned string, used as a regular expression, will exactly match the original string.

Examples:

RE2::Regexp.escape("1.5-2.0?") #=> "1\\.5\\-2\\.0\\?"

Parameters:

  • unquoted (String)

    the unquoted string

Returns:

  • (String)

    the escaped string

Raises:

  • (TypeError)

    if the given unquoted string cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1783

static VALUE re2_QuoteMeta(VALUE, VALUE unquoted) {
  StringValue(unquoted);

  std::string quoted_string = RE2::QuoteMeta(
      re2::StringPiece(RSTRING_PTR(unquoted), RSTRING_LEN(unquoted)));

  return rb_str_new(quoted_string.data(), quoted_string.size());
}

.match_has_endpos_argument? ⇒ Boolean

Returns whether the underlying RE2 version supports passing an endpos argument to Match. If not, #match will raise an error if attempting to pass an endpos.

Returns:

  • (Boolean)

    whether the underlying Match has an endpos argument



# File 'ext/re2/re2.cc', line 1656

static VALUE re2_regexp_match_has_endpos_argument_p(VALUE) {
#ifdef HAVE_ENDPOS_ARGUMENT
  return Qtrue;
#else
  return Qfalse;
#endif
}

.quote(unquoted) ⇒ String

Returns a version of unquoted with all potentially meaningful regexp characters escaped using QuoteMeta. The returned string, used as a regular expression, will exactly match the original string.

Examples:

RE2::Regexp.quote("1.5-2.0?") #=> "1\\.5\\-2\\.0\\?"

Parameters:

  • unquoted (String)

    the unquoted string

Returns:

  • (String)

    the escaped string

Raises:

  • (TypeError)

    if the given unquoted string cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1783

static VALUE re2_QuoteMeta(VALUE, VALUE unquoted) {
  StringValue(unquoted);

  std::string quoted_string = RE2::QuoteMeta(
      re2::StringPiece(RSTRING_PTR(unquoted), RSTRING_LEN(unquoted)));

  return rb_str_new(quoted_string.data(), quoted_string.size());
}

Instance Method Details

#===(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1574

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#=~(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1574

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#case_insensitive? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to false.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_insensitive? #=> false
re2.casefold?         #=> false

Returns:

  • (Boolean)

    the inverse of the case_sensitive option



# File 'ext/re2/re2.cc', line 1140

static VALUE re2_regexp_case_insensitive(const VALUE self) {
  return BOOL2RUBY(re2_regexp_case_sensitive(self) != Qtrue);
}

#case_sensitive? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_sensitive? #=> true

Returns:

  • (Boolean)

    the case_sensitive option



# File 'ext/re2/re2.cc', line 1123

static VALUE re2_regexp_case_sensitive(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().case_sensitive());
}

#casefold? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to false.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_insensitive? #=> false
re2.casefold?         #=> false

Returns:

  • (Boolean)

    the inverse of the case_sensitive option



# File 'ext/re2/re2.cc', line 1140

static VALUE re2_regexp_case_insensitive(const VALUE self) {
  return BOOL2RUBY(re2_regexp_case_sensitive(self) != Qtrue);
}

#error ⇒ String, nil

If the RE2::Regexp could not be created properly, returns an error string; otherwise returns nil.

Returns:

  • (String, nil)

    the error string or nil



# File 'ext/re2/re2.cc', line 1198

static VALUE re2_regexp_error(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (p->pattern->ok()) {
    return Qnil;
  } else {
    return rb_str_new(p->pattern->error().data(), p->pattern->error().size());
  }
}

#error_arg ⇒ String, nil

If the RE2::Regexp could not be created properly, returns the offending portion of the regexp; otherwise returns nil.

Note: RE2 supports only UTF-8 and ISO-8859-1 encodings, so strings are returned as UTF-8 by default, or as ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false. Behaviour with any other encoding is undefined.

Returns:

  • (String, nil)

    the offending portion of the regexp or nil



# File 'ext/re2/re2.cc', line 1219

static VALUE re2_regexp_error_arg(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (p->pattern->ok()) {
    return Qnil;
  } else {
    return encoded_str_new(p->pattern->error_arg().data(),
        p->pattern->error_arg().size(),
        p->pattern->options().encoding());
  }
}

#full_match(text, options = {}) ⇒ RE2::MatchData, ...

Match the pattern against the given text exactly and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Examples:

r = RE2::Regexp.new('w(o)(o)')
r.full_match('woo')                #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
r.full_match('woot')               #=> nil
r.full_match('woo', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.full_match('woo', submatches: 0) #=> true

Parameters:

  • text (String)

    the text to search

  • options (Hash) (defaults to: {})

    the options with which to perform the match

Options Hash (options):

  • :submatches (Integer)

    how many submatches to extract (0 is fastest), defaults to the total number of capturing groups

Returns:

  • (RE2::MatchData, nil)

    if extracting any submatches

  • (Boolean)

    if not extracting any submatches

Raises:

  • (ArgumentError)

    if given a negative number of submatches

  • (NoMemoryError)

    if there was not enough memory to allocate the matches

  • (TypeError)

    if given non-numeric submatches or non-hash options



# File 'lib/re2/regexp.rb', line 68

def full_match(text, options = {})
  match(text, Hash(options).merge(anchor: :anchor_both))
end

#full_match?(text) ⇒ Boolean

Returns true if the pattern matches the given text using FullMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1594

static VALUE re2_regexp_full_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::FullMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#inspect ⇒ String

Returns a printable version of the regular expression.

Note: RE2 supports only UTF-8 and ISO-8859-1 encodings, so strings are returned as UTF-8 by default, or as ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false. Behaviour with any other encoding is undefined.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.inspect #=> "#<RE2::Regexp /woo?/>"

Returns:

  • (String)

    a printable version of the regular expression



# File 'ext/re2/re2.cc', line 954

static VALUE re2_regexp_inspect(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  std::ostringstream output;

  output << "#<RE2::Regexp /" << p->pattern->pattern() << "/>";

  return encoded_str_new(output.str().data(), output.str().length(),
      p->pattern->options().encoding());
}

#literal? ⇒ Boolean

Returns whether or not the regular expression was compiled with the literal option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", literal: true)
re2.literal? #=> true

Returns:

  • (Boolean)

    the literal option



# File 'ext/re2/re2.cc', line 1091

static VALUE re2_regexp_literal(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().literal());
}

#log_errors? ⇒ Boolean

Returns whether or not the regular expression was compiled with the log_errors option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", log_errors: true)
re2.log_errors? #=> true

Returns:

  • (Boolean)

    the log_errors option



# File 'ext/re2/re2.cc', line 1060

static VALUE re2_regexp_log_errors(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().log_errors());
}

#longest_match? ⇒ Boolean

Returns whether or not the regular expression was compiled with the longest_match option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", longest_match: true)
re2.longest_match? #=> true

Returns:

  • (Boolean)

    the longest_match option



# File 'ext/re2/re2.cc', line 1044

static VALUE re2_regexp_longest_match(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().longest_match());
}

#match(text) ⇒ RE2::MatchData, ...
#match(text, options) ⇒ RE2::MatchData, ...
#match(text, submatches) ⇒ RE2::MatchData, ...

General matching: match the pattern against the given text using Match and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Overloads:

  • #match(text) ⇒ RE2::MatchData, ...

    Returns a MatchData containing the matching pattern and all submatches resulting from looking for the regexp in text if the pattern contains capturing groups.

    Returns either true or false indicating whether a successful match was made if the pattern contains no capturing groups.

    Examples:

    Matching with capturing groups

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo') #=> #<RE2::MatchData "woo" 1:"o" 2:"o">

    Matching without capturing groups

    r = RE2::Regexp.new('woo')
    r.match('woo') #=> true

    Parameters:

    • text (String)

      the text to search

    Returns:

    • (RE2::MatchData, nil)

      if the pattern contains capturing groups

    • (Boolean)

      if the pattern does not contain capturing groups

    Raises:

    • (NoMemoryError)

      if there was not enough memory to allocate the submatches

    • (TypeError)

      if given text that cannot be coerced to a String

  • #match(text, options) ⇒ RE2::MatchData, ...

    Like match(text), but with customisable offsets for starting and ending the match, optional anchoring to the start or both ends of the text, and a specific number of submatches to extract (padded with nils if necessary).

    Examples:

    Matching with capturing groups

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
    r.match('woo', submatches: 3) #=> #<RE2::MatchData "woo" 1:"o" 2:"o" 3:nil>
    r.match('woot', anchor: :anchor_both, submatches: 0)
    #=> false
    r.match('woot', anchor: :anchor_start, submatches: 0)
    #=> true

    Matching without capturing groups

    r = RE2::Regexp.new('wo+')
    r.match('woot', anchor: :anchor_both)  #=> false
    r.match('woot', anchor: :anchor_start) #=> true

    Parameters:

    • text (String)

      the text to search

    • options (Hash)

      the options with which to perform the match

    Options Hash (options):

    • :startpos (Integer) — default: 0

      offset at which to start matching

    • :endpos (Integer)

      offset at which to stop matching, defaults to the text length

    • :anchor (Symbol) — default: :unanchored

      one of :unanchored, :anchor_start, :anchor_both to anchor the match

    • :submatches (Integer)

      how many submatches to extract (0 is fastest), defaults to the number of capturing groups

    Returns:

    • (RE2::MatchData, nil)

      if extracting any submatches

    • (Boolean)

      if not extracting any submatches

    Raises:

    • (ArgumentError)

      if given a negative number of submatches, invalid anchor or invalid startpos, endpos pair

    • (NoMemoryError)

      if there was not enough memory to allocate the matches

    • (TypeError)

      if given non-String text, non-numeric number of submatches, non-symbol anchor or non-hash options

    • (RE2::Regexp::UnsupportedError)

      if given an endpos argument on a version of RE2 that does not support it

  • #match(text, submatches) ⇒ RE2::MatchData, ...

    Deprecated.

    Legacy syntax for matching against text with a specific number of submatches to extract. Use match(text, submatches: n) instead.

    Examples:

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo', 0) #=> true
    r.match('woo', 1) #=> #<RE2::MatchData "woo" 1:"o">
    r.match('woo', 2) #=> #<RE2::MatchData "woo" 1:"o" 2:"o">

    Parameters:

    • text (String)

      the text to search

    • submatches (Integer)

      the number of submatches to extract

    Returns:

    • (RE2::MatchData, nil)

      if extracting any submatches

    • (Boolean)

      if not extracting any submatches

    Raises:

    • (NoMemoryError)

      if there was not enough memory to allocate the submatches

    • (TypeError)

      if given non-numeric number of submatches



# File 'ext/re2/re2.cc', line 1418

static VALUE re2_regexp_match(int argc, VALUE *argv, const VALUE self) {
  re2_pattern *p;
  re2_matchdata *m;
  VALUE text, options;

  rb_scan_args(argc, argv, "11", &text, &options);

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  int n;
  int startpos = 0;
  int endpos = RSTRING_LEN(text);
  RE2::Anchor anchor = RE2::UNANCHORED;

  if (RTEST(options)) {
    if (FIXNUM_P(options)) {
      n = NUM2INT(options);

      if (n < 0) {
        rb_raise(rb_eArgError, "number of matches should be >= 0");
      }
    } else {
      if (TYPE(options) != T_HASH) {
        options = rb_Hash(options);
      }

      VALUE endpos_option = rb_hash_aref(options, ID2SYM(id_endpos));
      if (!NIL_P(endpos_option)) {
#ifdef HAVE_ENDPOS_ARGUMENT
        Check_Type(endpos_option, T_FIXNUM);

        endpos = NUM2INT(endpos_option);

        if (endpos < 0) {
          rb_raise(rb_eArgError, "endpos should be >= 0");
        }
#else
        rb_raise(re2_eRegexpUnsupportedError, "current version of RE2::Match() does not support endpos argument");
#endif
      }

      VALUE anchor_option = rb_hash_aref(options, ID2SYM(id_anchor));
      if (!NIL_P(anchor_option)) {
        Check_Type(anchor_option, T_SYMBOL);

        ID id_anchor_option = SYM2ID(anchor_option);
        if (id_anchor_option == id_unanchored) {
          anchor = RE2::UNANCHORED;
        } else if (id_anchor_option == id_anchor_start) {
          anchor = RE2::ANCHOR_START;
        } else if (id_anchor_option == id_anchor_both) {
          anchor = RE2::ANCHOR_BOTH;
        } else {
          rb_raise(rb_eArgError, "anchor should be one of: :unanchored, :anchor_start, :anchor_both");
        }
      }

      VALUE submatches_option = rb_hash_aref(options, ID2SYM(id_submatches));
      if (!NIL_P(submatches_option)) {
        Check_Type(submatches_option, T_FIXNUM);

        n = NUM2INT(submatches_option);

        if (n < 0) {
          rb_raise(rb_eArgError, "number of matches should be >= 0");
        }
      } else {
        if (!p->pattern->ok()) {
          return Qnil;
        }

        n = p->pattern->NumberOfCapturingGroups();
      }

      VALUE startpos_option = rb_hash_aref(options, ID2SYM(id_startpos));
      if (!NIL_P(startpos_option)) {
        Check_Type(startpos_option, T_FIXNUM);

        startpos = NUM2INT(startpos_option);

        if (startpos < 0) {
          rb_raise(rb_eArgError, "startpos should be >= 0");
        }
      }
    }
  } else {
    if (!p->pattern->ok()) {
      return Qnil;
    }

    n = p->pattern->NumberOfCapturingGroups();
  }

  if (startpos > endpos) {
    rb_raise(rb_eArgError, "startpos should be <= endpos");
  }

  if (n == 0) {
#ifdef HAVE_ENDPOS_ARGUMENT
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, endpos, anchor, 0, 0);
#else
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, anchor, 0, 0);
#endif
    return BOOL2RUBY(matched);
  } else {
    /* Because match returns the whole match as well. */
    n += 1;

    VALUE matchdata = rb_class_new_instance(0, 0, re2_cMatchData);
    TypedData_Get_Struct(matchdata, re2_matchdata, &re2_matchdata_data_type, m);
    m->matches = new(std::nothrow) re2::StringPiece[n];
    RB_OBJ_WRITE(matchdata, &m->regexp, self);
    if (!RTEST(rb_obj_frozen_p(text))) {
      text = rb_str_freeze(rb_str_dup(text));
    }
    RB_OBJ_WRITE(matchdata, &m->text, text);

    if (m->matches == 0) {
      rb_raise(rb_eNoMemError,
               "not enough memory to allocate StringPieces for matches");
    }

    m->number_of_matches = n;

#ifdef HAVE_ENDPOS_ARGUMENT
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(m->text), RSTRING_LEN(m->text)),
        startpos, endpos, anchor, m->matches, n);
#else
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(m->text), RSTRING_LEN(m->text)),
        startpos, anchor, m->matches, n);
#endif
    if (matched) {
      return matchdata;
    } else {
      return Qnil;
    }
  }
}

#match?(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1574

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#max_mem ⇒ Integer

Returns the max_mem setting for the regular expression.

Examples:

re2 = RE2::Regexp.new("woo?", max_mem: 1024)
re2.max_mem #=> 1024

Returns:

  • (Integer)

    the max_mem option



# File 'ext/re2/re2.cc', line 1075

static VALUE re2_regexp_max_mem(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->options().max_mem());
}

#named_capturing_groups ⇒ Hash

Returns a hash of names to capturing indices of groups.

Note: RE2 supports only UTF-8 and ISO-8859-1 encodings, so strings are returned as UTF-8 by default, or as ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false. Behaviour with any other encoding is undefined.

Returns:

  • (Hash)

    a hash of names to capturing indices



# File 'ext/re2/re2.cc', line 1319

static VALUE re2_regexp_named_capturing_groups(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  const std::map<std::string, int>& groups = p->pattern->NamedCapturingGroups();
  VALUE capturing_groups = rb_hash_new();

  for (std::map<std::string, int>::const_iterator it = groups.begin(); it != groups.end(); ++it) {
    rb_hash_aset(capturing_groups,
        encoded_str_new(it->first.data(), it->first.size(),
          p->pattern->options().encoding()),
        INT2FIX(it->second));
  }

  return capturing_groups;
}

#never_nl? ⇒ Boolean

Returns whether or not the regular expression was compiled with the never_nl option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", never_nl: true)
re2.never_nl? #=> true

Returns:

  • (Boolean)

    the never_nl option



# File 'ext/re2/re2.cc', line 1107

static VALUE re2_regexp_never_nl(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().never_nl());
}

#number_of_capturing_groups ⇒ Integer

Returns the number of capturing subpatterns, or -1 if the regexp wasn't valid on construction. The overall match ($0) does not count: if the regexp is "(a)(b)", returns 2.

Returns:

  • (Integer)

    the number of capturing subpatterns



# File 'ext/re2/re2.cc', line 1303

static VALUE re2_regexp_number_of_capturing_groups(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->NumberOfCapturingGroups());
}

#ok? ⇒ Boolean

Returns whether or not the regular expression was compiled successfully.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.ok? #=> true

Returns:

  • (Boolean)

    whether or not compilation was successful



# File 'ext/re2/re2.cc', line 996

static VALUE re2_regexp_ok(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->ok());
}

#one_line? ⇒ Boolean

Returns whether or not the regular expression was compiled with the one_line option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", one_line: true)
re2.one_line? #=> true

Returns:

  • (Boolean)

    the one_line option



# File 'ext/re2/re2.cc', line 1185

static VALUE re2_regexp_one_line(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().one_line());
}

#options ⇒ Hash

Returns a hash of the options currently set for the RE2::Regexp.

Returns:

  • (Hash)

    the options



# File 'ext/re2/re2.cc', line 1251

static VALUE re2_regexp_options(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  VALUE options = rb_hash_new();

  rb_hash_aset(options, ID2SYM(id_utf8),
      BOOL2RUBY(p->pattern->options().encoding() == RE2::Options::EncodingUTF8));

  rb_hash_aset(options, ID2SYM(id_posix_syntax),
      BOOL2RUBY(p->pattern->options().posix_syntax()));

  rb_hash_aset(options, ID2SYM(id_longest_match),
      BOOL2RUBY(p->pattern->options().longest_match()));

  rb_hash_aset(options, ID2SYM(id_log_errors),
      BOOL2RUBY(p->pattern->options().log_errors()));

  rb_hash_aset(options, ID2SYM(id_max_mem),
      INT2FIX(p->pattern->options().max_mem()));

  rb_hash_aset(options, ID2SYM(id_literal),
      BOOL2RUBY(p->pattern->options().literal()));

  rb_hash_aset(options, ID2SYM(id_never_nl),
      BOOL2RUBY(p->pattern->options().never_nl()));

  rb_hash_aset(options, ID2SYM(id_case_sensitive),
      BOOL2RUBY(p->pattern->options().case_sensitive()));

  rb_hash_aset(options, ID2SYM(id_perl_classes),
      BOOL2RUBY(p->pattern->options().perl_classes()));

  rb_hash_aset(options, ID2SYM(id_word_boundary),
      BOOL2RUBY(p->pattern->options().word_boundary()));

  rb_hash_aset(options, ID2SYM(id_one_line),
      BOOL2RUBY(p->pattern->options().one_line()));

  /* This is a read-only hash after all... */
  rb_obj_freeze(options);

  return options;
}

#partial_match(text, options = {}) ⇒ RE2::MatchData, ...

Match the pattern against any substring of the given text and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Examples:

r = RE2::Regexp.new('w(o)(o)')
r.partial_match('woot')                #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
r.partial_match('nope')                #=> nil
r.partial_match('woot', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.partial_match('woot', submatches: 0) #=> true

Parameters:

  • text (String)

    the text to search

  • options (Hash) (defaults to: {})

    the options with which to perform the match

Options Hash (options):

  • :submatches (Integer)

    how many submatches to extract (0 is fastest), defaults to the total number of capturing groups

Returns:

  • (RE2::MatchData, nil)

    if extracting any submatches

  • (Boolean)

    if not extracting any submatches

Raises:

  • (ArgumentError)

    if given a negative number of submatches

  • (NoMemoryError)

    if there was not enough memory to allocate the matches

  • (TypeError)

    if given non-numeric submatches or non-hash options



# File 'lib/re2/regexp.rb', line 39

def partial_match(text, options = {})
  match(text, Hash(options).merge(anchor: :unanchored))
end

#partial_match?(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1574

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#pattern ⇒ String

Returns a string version of the regular expression.

Note that RE2 supports only UTF-8 and ISO-8859-1, so the string is returned as UTF-8 by default, or as ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.pattern #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 979

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#perl_classes? ⇒ Boolean

Returns whether or not the regular expression was compiled with the perl_classes option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", perl_classes: true)
re2.perl_classes? #=> true

Returns:

  • (Boolean)

    the perl_classes option



# File 'ext/re2/re2.cc', line 1153

static VALUE re2_regexp_perl_classes(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().perl_classes());
}

#posix_syntax? ⇒ Boolean

Returns whether or not the regular expression was compiled with the posix_syntax option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", posix_syntax: true)
re2.posix_syntax? #=> true

Returns:

  • (Boolean)

    the posix_syntax option



# File 'ext/re2/re2.cc', line 1028

static VALUE re2_regexp_posix_syntax(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().posix_syntax());
}

#program_size ⇒ Integer

Returns the program size, a very approximate measure of a regexp's "cost". Larger numbers are more expensive than smaller numbers.

Returns:

  • (Integer)

    the regexp "cost"



# File 'ext/re2/re2.cc', line 1239

static VALUE re2_regexp_program_size(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->ProgramSize());
}

#scan(text) ⇒ RE2::Scanner

Returns a Scanner for scanning the given text incrementally with FindAndConsume.

Examples:

c = RE2::Regexp.new('(\w+)').scan("Foo bar baz")
#=> #<RE2::Scanner:0x0000000000000001>

Parameters:

  • text (String)

    the text to scan incrementally

Returns:

  • (RE2::Scanner)

    a Scanner for scanning the given text

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1618

static VALUE re2_regexp_scan(const VALUE self, VALUE text) {
  /* Ensure text is a string. */
  StringValue(text);

  re2_pattern *p;
  re2_scanner *c;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  VALUE scanner = rb_class_new_instance(0, 0, re2_cScanner);
  TypedData_Get_Struct(scanner, re2_scanner, &re2_scanner_data_type, c);

  c->input = new(std::nothrow) re2::StringPiece(
      RSTRING_PTR(text), RSTRING_LEN(text));
  RB_OBJ_WRITE(scanner, &c->regexp, self);
  RB_OBJ_WRITE(scanner, &c->text, text);

  if (p->pattern->ok()) {
    c->number_of_capturing_groups = p->pattern->NumberOfCapturingGroups();
  } else {
    c->number_of_capturing_groups = 0;
  }

  c->eof = false;

  return scanner;
}

#source ⇒ String

Returns a string version of the regular expression.

Note that RE2 supports only UTF-8 and ISO-8859-1, so the string is returned as UTF-8 by default, or as ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.source #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 979

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#to_s ⇒ String

Returns a string version of the regular expression.

Note that RE2 supports only UTF-8 and ISO-8859-1, so the string is returned as UTF-8 by default, or as ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.to_s #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 979

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#to_str ⇒ String

Returns a string version of the regular expression.

Note that RE2 supports only UTF-8 and ISO-8859-1, so the string is returned as UTF-8 by default, or as ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (behaviour with any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.to_str #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 979

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#utf8? ⇒ Boolean

Returns whether or not the regular expression was compiled with the utf8 option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", utf8: true)
re2.utf8? #=> true

Returns:

  • (Boolean)

    the utf8 option



# File 'ext/re2/re2.cc', line 1012

static VALUE re2_regexp_utf8(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().encoding() == RE2::Options::EncodingUTF8);
}

#word_boundary? ⇒ Boolean

Returns whether or not the regular expression was compiled with the word_boundary option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", word_boundary: true)
re2.word_boundary? #=> true

Returns:

  • (Boolean)

    the word_boundary option



# File 'ext/re2/re2.cc', line 1169

static VALUE re2_regexp_word_boundary(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().word_boundary());
}