Class: RE2::Regexp

Inherits:
Object
Defined in:
ext/re2/re2.cc,
lib/re2/regexp.rb

Constructor Details

#initialize(pattern) ⇒ RE2::Regexp
#initialize(pattern, options) ⇒ RE2::Regexp

Returns a new RE2::Regexp object with a compiled version of pattern stored inside.

Overloads:

  • #initialize(pattern) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the default options.

    Parameters:

    • pattern (String)

      the pattern to compile

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern

  • #initialize(pattern, options) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the specified options.

    Parameters:

    • pattern (String)

      the pattern to compile

    • options (Hash)

      the options with which to compile the pattern

    Options Hash (options):

    • :utf8 (Boolean) — default: true

      text and pattern are UTF-8; otherwise Latin-1

    • :posix_syntax (Boolean) — default: false

      restrict regexps to POSIX egrep syntax

    • :longest_match (Boolean) — default: false

      search for longest match, not first match

    • :log_errors (Boolean) — default: true

      log syntax and execution errors at ERROR level

    • :max_mem (Integer)

      approximate maximum memory footprint of RE2, in bytes

    • :literal (Boolean) — default: false

      interpret string as literal, not regexp

    • :never_nl (Boolean) — default: false

      never match \n, even if it is in regexp

    • :case_sensitive (Boolean) — default: true

      match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode)

    • :perl_classes (Boolean) — default: false

      allow Perl's \d \s \w \D \S \W when in posix_syntax mode

    • :word_boundary (Boolean) — default: false

      allow \b \B (word boundary and not) when in posix_syntax mode

    • :one_line (Boolean) — default: false

      ^ and $ only match beginning and end of text when in posix_syntax mode

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern



# File 'ext/re2/re2.cc', line 1221

static VALUE re2_regexp_initialize(int argc, VALUE *argv, VALUE self) {
  VALUE pattern, options;
  re2_pattern *p;

  rb_scan_args(argc, argv, "11", &pattern, &options);

  /* Ensure pattern is a string. */
  StringValue(pattern);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (p->pattern) {
    delete p->pattern;
  }

  if (RTEST(options)) {
    RE2::Options re2_options;
    parse_re2_options(&re2_options, options);

    p->pattern = new(std::nothrow) RE2(
        re2::StringPiece(RSTRING_PTR(pattern), RSTRING_LEN(pattern)), re2_options);
  } else {
    p->pattern = new(std::nothrow) RE2(
        re2::StringPiece(RSTRING_PTR(pattern), RSTRING_LEN(pattern)));
  }

  if (p->pattern == 0) {
    rb_raise(rb_eNoMemError, "not enough memory to allocate RE2 object");
  }

  return self;
}

Class Method Details

.initialize(pattern) ⇒ RE2::Regexp
.initialize(pattern, options) ⇒ RE2::Regexp

Returns a new RE2::Regexp object with a compiled version of pattern stored inside.

Overloads:

  • .initialize(pattern) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the default options.

    Parameters:

    • pattern (String)

      the pattern to compile

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern

  • .initialize(pattern, options) ⇒ RE2::Regexp

    Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the specified options.

    Parameters:

    • pattern (String)

      the pattern to compile

    • options (Hash)

      the options with which to compile the pattern

    Options Hash (options):

    • :utf8 (Boolean) — default: true

      text and pattern are UTF-8; otherwise Latin-1

    • :posix_syntax (Boolean) — default: false

      restrict regexps to POSIX egrep syntax

    • :longest_match (Boolean) — default: false

      search for longest match, not first match

    • :log_errors (Boolean) — default: true

      log syntax and execution errors at ERROR level

    • :max_mem (Integer)

      approximate maximum memory footprint of RE2, in bytes

    • :literal (Boolean) — default: false

      interpret string as literal, not regexp

    • :never_nl (Boolean) — default: false

      never match \n, even if it is in regexp

    • :case_sensitive (Boolean) — default: true

      match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode)

    • :perl_classes (Boolean) — default: false

      allow Perl's \d \s \w \D \S \W when in posix_syntax mode

    • :word_boundary (Boolean) — default: false

      allow \b \B (word boundary and not) when in posix_syntax mode

    • :one_line (Boolean) — default: false

      ^ and $ only match beginning and end of text when in posix_syntax mode

    Raises:

    • (TypeError)

      if the given pattern can't be coerced to a String

    • (NoMemoryError)

      if memory could not be allocated for the compiled pattern

.escape(unquoted) ⇒ String

Returns a copy of unquoted with all potentially meaningful regexp characters escaped using QuoteMeta. The returned string, used as a regular expression, will exactly match the original string.

Examples:

RE2.escape("1.5-2.0?")         #=> "1\\.5\\-2\\.0\\?"
RE2.quote("1.5-2.0?")          #=> "1\\.5\\-2\\.0\\?"
RE2::Regexp.escape("1.5-2.0?") #=> "1\\.5\\-2\\.0\\?"
RE2::Regexp.quote("1.5-2.0?")  #=> "1\\.5\\-2\\.0\\?"

Parameters:

  • unquoted (String)

    the unquoted string

Returns:

  • (String)

    the escaped string

Raises:

  • (TypeError)

    if the given unquoted string cannot be coerced to a String



# File 'ext/re2/re2.cc', line 2160

static VALUE re2_escape(VALUE, VALUE unquoted) {
  StringValue(unquoted);

  std::string quoted_string = RE2::QuoteMeta(
      re2::StringPiece(RSTRING_PTR(unquoted), RSTRING_LEN(unquoted)));

  return rb_str_new(quoted_string.data(), quoted_string.size());
}

.match_has_endpos_argument? ⇒ Boolean

Returns whether the underlying RE2 version supports passing an endpos argument to Match. If it does not, #match raises RE2::Regexp::UnsupportedError when given an :endpos option.

Returns:

  • (Boolean)

    whether the underlying Match has an endpos argument



# File 'ext/re2/re2.cc', line 1965

static VALUE re2_regexp_match_has_endpos_argument_p(VALUE) {
#ifdef HAVE_ENDPOS_ARGUMENT
  return Qtrue;
#else
  return Qfalse;
#endif
}

.quote(unquoted) ⇒ String

Returns a copy of unquoted with all potentially meaningful regexp characters escaped using QuoteMeta. The returned string, used as a regular expression, will exactly match the original string.

Examples:

RE2.escape("1.5-2.0?")         #=> "1\\.5\\-2\\.0\\?"
RE2.quote("1.5-2.0?")          #=> "1\\.5\\-2\\.0\\?"
RE2::Regexp.escape("1.5-2.0?") #=> "1\\.5\\-2\\.0\\?"
RE2::Regexp.quote("1.5-2.0?")  #=> "1\\.5\\-2\\.0\\?"

Parameters:

  • unquoted (String)

    the unquoted string

Returns:

  • (String)

    the escaped string

Raises:

  • (TypeError)

    if the given unquoted string cannot be coerced to a String



# File 'ext/re2/re2.cc', line 2160 (same re2_escape function as .escape)

Instance Method Details

#===(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Parameters:

  • text (String)

    the text to search

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1884

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  /* Ensure text is a string. */
  StringValue(text);

  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#=~(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Parameters:

  • text (String)

    the text to search

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1884 (same re2_regexp_match_p function as #===)

#case_insensitive? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to false.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_insensitive? #=> false
re2.casefold?         #=> false

Returns:

  • (Boolean)

    the inverse of the case_sensitive option



# File 'ext/re2/re2.cc', line 1460

static VALUE re2_regexp_case_insensitive(const VALUE self) {
  return BOOL2RUBY(re2_regexp_case_sensitive(self) != Qtrue);
}

#case_sensitive? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_sensitive? #=> true

Returns:

  • (Boolean)

    the case_sensitive option



# File 'ext/re2/re2.cc', line 1444

static VALUE re2_regexp_case_sensitive(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->options().case_sensitive());
}

#casefold? ⇒ Boolean

Returns whether or not the regular expression was compiled with the case_sensitive option set to false.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_insensitive? #=> false
re2.casefold?         #=> false

Returns:

  • (Boolean)

    the inverse of the case_sensitive option



# File 'ext/re2/re2.cc', line 1460 (same re2_regexp_case_insensitive function as #case_insensitive?)

#error ⇒ String?

If the RE2::Regexp could not be created properly, returns an error string; otherwise returns nil.

Returns:

  • (String, nil)

    the error string or nil



# File 'ext/re2/re2.cc', line 1515

static VALUE re2_regexp_error(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  if (p->pattern->ok()) {
    return Qnil;
  } else {
    return rb_str_new(p->pattern->error().data(), p->pattern->error().size());
  }
}

#error_arg ⇒ String?

If the RE2::Regexp could not be created properly, returns the offending portion of the regexp; otherwise returns nil.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour for any other encoding is undefined).

Returns:

  • (String, nil)

    the offending portion of the regexp or nil



# File 'ext/re2/re2.cc', line 1535

static VALUE re2_regexp_error_arg(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  if (p->pattern->ok()) {
    return Qnil;
  } else {
    return encoded_str_new(p->pattern->error_arg().data(),
        p->pattern->error_arg().size(),
        p->pattern->options().encoding());
  }
}

#full_match(text, options = {}) ⇒ RE2::MatchData, ...

Match the pattern against the given text exactly and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Examples:

r = RE2::Regexp.new('w(o)(o)')
r.full_match('woo')                #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
r.full_match('woot')               #=> nil
r.full_match('woo', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.full_match('woo', submatches: 0) #=> true

Parameters:

  • text (String)

    the text to search

  • options (Hash) (defaults to: {})

    the options with which to perform the match

Options Hash (options):

  • :submatches (Integer)

    how many submatches to extract (0 is fastest), defaults to the total number of capturing groups

Returns:

  • (RE2::MatchData, nil)

    if extracting any submatches

  • (Boolean)

    if not extracting any submatches

Raises:

  • (ArgumentError)

    if given a negative number of submatches

  • (NoMemoryError)

    if there was not enough memory to allocate the matches

  • (TypeError)

    if given non-numeric submatches or non-hash options



# File 'lib/re2/regexp.rb', line 68

def full_match(text, options = {})
  match(text, Hash(options).merge(anchor: :anchor_both))
end

#full_match?(text) ⇒ Boolean

Returns true if the pattern matches the given text using FullMatch.

Parameters:

  • text (String)

    the text to search

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1903

static VALUE re2_regexp_full_match_p(const VALUE self, VALUE text) {
  /* Ensure text is a string. */
  StringValue(text);

  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(RE2::FullMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

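#initialize_copy(other) ⇒ Object

Replaces the receiver's compiled pattern with a fresh compilation of other's pattern and options, and returns the receiver. Ruby calls this method when an RE2::Regexp is copied with #dup or #clone.

Raises:

  • (NoMemoryError)

    if memory could not be allocated for the copied pattern
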
# File 'ext/re2/re2.cc', line 1254

static VALUE re2_regexp_initialize_copy(VALUE self, VALUE other) {
  re2_pattern *self_p;
  re2_pattern *other_p = unwrap_re2_regexp(other);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, self_p);

  if (self_p->pattern) {
    delete self_p->pattern;
  }

  self_p->pattern = new(std::nothrow) RE2(other_p->pattern->pattern(),
                                          other_p->pattern->options());
  if (self_p->pattern == 0) {
    rb_raise(rb_eNoMemError, "not enough memory to allocate RE2 object");
  }

  return self;
}

#inspect ⇒ String

Returns a printable version of the regular expression.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour for any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.inspect #=> "#<RE2::Regexp /woo?/>"

Returns:

  • (String)

    a printable version of the regular expression



# File 'ext/re2/re2.cc', line 1286

static VALUE re2_regexp_inspect(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  std::ostringstream output;

  output << "#<RE2::Regexp /" << p->pattern->pattern() << "/>";

  return encoded_str_new(output.str().data(), output.str().length(),
      p->pattern->options().encoding());
}

#literal? ⇒ Boolean

Returns whether or not the regular expression was compiled with the literal option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", literal: true)
re2.literal? #=> true

Returns:

  • (Boolean)

    the literal option



# File 'ext/re2/re2.cc', line 1414

static VALUE re2_regexp_literal(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->options().literal());
}

#log_errors? ⇒ Boolean

Returns whether or not the regular expression was compiled with the log_errors option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", log_errors: true)
re2.log_errors? #=> true

Returns:

  • (Boolean)

    the log_errors option



# File 'ext/re2/re2.cc', line 1385

static VALUE re2_regexp_log_errors(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->options().log_errors());
}

#longest_match? ⇒ Boolean

Returns whether or not the regular expression was compiled with the longest_match option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", longest_match: true)
re2.longest_match? #=> true

Returns:

  • (Boolean)

    the longest_match option



# File 'ext/re2/re2.cc', line 1370

static VALUE re2_regexp_longest_match(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->options().longest_match());
}

#match(text) ⇒ RE2::MatchData, ...
#match(text, options) ⇒ RE2::MatchData, ...
#match(text, submatches) ⇒ RE2::MatchData, ...

General matching: match the pattern against the given text using Match and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Overloads:

  • #match(text) ⇒ RE2::MatchData, ...

    If the pattern contains capturing groups, returns a MatchData containing the overall match and all submatches found by searching text for the pattern, or nil if there is no match.

    If the pattern contains no capturing groups, returns true or false indicating whether a match was found. If the pattern failed to compile (see #ok?), returns nil.

    Examples:

    Matching with capturing groups

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo') #=> #<RE2::MatchData "woo" 1:"o" 2:"o">

    Matching without capturing groups

    r = RE2::Regexp.new('woo')
    r.match('woo') #=> true

    Parameters:

    • text (String)

      the text to search

    Returns:

    • (RE2::MatchData, nil)

      if the pattern contains capturing groups

    • (Boolean)

      if the pattern does not contain capturing groups

    Raises:

    • (NoMemoryError)

      if there was not enough memory to allocate the submatches

    • (TypeError)

      if given text that cannot be coerced to a String

  • #match(text, options) ⇒ RE2::MatchData, ...

    Like match(text), but with customisable byte offsets at which to start and stop matching, optional anchoring to the start or both ends of the text, and a specific number of submatches to extract (padded with nils if necessary).

    Examples:

    Matching with capturing groups

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
    r.match('woo', submatches: 3) #=> #<RE2::MatchData "woo" 1:"o" 2:"o" 3:nil>
    r.match('woot', anchor: :anchor_both, submatches: 0)
    #=> false
    r.match('woot', anchor: :anchor_start, submatches: 0)
    #=> true

    Matching without capturing groups

    r = RE2::Regexp.new('wo+')
    r.match('woot', anchor: :anchor_both)  #=> false
    r.match('woot', anchor: :anchor_start) #=> true

    Parameters:

    • text (String)

      the text to search

    • options (Hash)

      the options with which to perform the match

    Options Hash (options):

    • :startpos (Integer) — default: 0

      byte offset at which to start matching

    • :endpos (Integer)

      byte offset at which to stop matching, defaults to the length of the text in bytes

    • :anchor (Symbol) — default: :unanchored

      one of :unanchored, :anchor_start, :anchor_both to anchor the match

    • :submatches (Integer)

      how many submatches to extract (0 is fastest), defaults to the number of capturing groups

    Returns:

    • (RE2::MatchData, nil)

      if extracting any submatches

    • (Boolean)

      if not extracting any submatches

    Raises:

    • (ArgumentError)

      if given a negative number of submatches, invalid anchor or invalid startpos, endpos pair

    • (NoMemoryError)

      if there was not enough memory to allocate the matches

    • (RangeError)

      if the number of submatches, startpos or endpos is too large to fit in a C int

    • (TypeError)

      if given non-String text, non-numeric number of submatches, non-symbol anchor or non-hash options

    • (RE2::Regexp::UnsupportedError)

      if given an endpos argument on a version of RE2 that does not support it

  • #match(text, submatches) ⇒ RE2::MatchData, ...
    Deprecated.

    Legacy syntax for matching against text with a specific number of submatches to extract. Use match(text, submatches: n) instead.

    Examples:

    r = RE2::Regexp.new('w(o)(o)')
    r.match('woo', 0) #=> true
    r.match('woo', 1) #=> #<RE2::MatchData "woo" 1:"o">
    r.match('woo', 2) #=> #<RE2::MatchData "woo" 1:"o" 2:"o">

    Parameters:

    • text (String)

      the text to search

    • submatches (Integer)

      the number of submatches to extract

    Returns:

    • (RE2::MatchData, nil)

      if extracting any submatches

    • (Boolean)

      if not extracting any submatches

    Raises:

    • (ArgumentError)

      if given a negative number of submatches

    • (NoMemoryError)

      if there was not enough memory to allocate the submatches

    • (TypeError)

      if given non-numeric number of submatches



# File 'ext/re2/re2.cc', line 1727

static VALUE re2_regexp_match(int argc, VALUE *argv, const VALUE self) {
  re2_pattern *p;
  re2_matchdata *m;
  VALUE text, options;

  rb_scan_args(argc, argv, "11", &text, &options);

  /* Ensure text is a string. */
  StringValue(text);

  p = unwrap_re2_regexp(self);

  int n;
  int startpos = 0;
  int endpos = RSTRING_LEN(text);
  RE2::Anchor anchor = RE2::UNANCHORED;

  if (RTEST(options)) {
    if (RB_INTEGER_TYPE_P(options)) {
      n = NUM2INT(options);

      if (n < 0) {
        rb_raise(rb_eArgError, "number of matches should be >= 0");
      }
    } else {
      if (TYPE(options) != T_HASH) {
        options = rb_Hash(options);
      }

      VALUE endpos_option = rb_hash_aref(options, ID2SYM(id_endpos));
      if (!NIL_P(endpos_option)) {
#ifdef HAVE_ENDPOS_ARGUMENT
        endpos = NUM2INT(endpos_option);

        if (endpos < 0) {
          rb_raise(rb_eArgError, "endpos should be >= 0");
        }
#else
        rb_raise(re2_eRegexpUnsupportedError, "current version of RE2::Match() does not support endpos argument");
#endif
      }

      VALUE anchor_option = rb_hash_aref(options, ID2SYM(id_anchor));
      if (!NIL_P(anchor_option)) {
        Check_Type(anchor_option, T_SYMBOL);

        ID id_anchor_option = SYM2ID(anchor_option);
        if (id_anchor_option == id_unanchored) {
          anchor = RE2::UNANCHORED;
        } else if (id_anchor_option == id_anchor_start) {
          anchor = RE2::ANCHOR_START;
        } else if (id_anchor_option == id_anchor_both) {
          anchor = RE2::ANCHOR_BOTH;
        } else {
          rb_raise(rb_eArgError, "anchor should be one of: :unanchored, :anchor_start, :anchor_both");
        }
      }

      VALUE submatches_option = rb_hash_aref(options, ID2SYM(id_submatches));
      if (!NIL_P(submatches_option)) {
        n = NUM2INT(submatches_option);

        if (n < 0) {
          rb_raise(rb_eArgError, "number of matches should be >= 0");
        }
      } else {
        if (!p->pattern->ok()) {
          return Qnil;
        }

        n = p->pattern->NumberOfCapturingGroups();
      }

      VALUE startpos_option = rb_hash_aref(options, ID2SYM(id_startpos));
      if (!NIL_P(startpos_option)) {
        startpos = NUM2INT(startpos_option);

        if (startpos < 0) {
          rb_raise(rb_eArgError, "startpos should be >= 0");
        }
      }
    }
  } else {
    if (!p->pattern->ok()) {
      return Qnil;
    }

    n = p->pattern->NumberOfCapturingGroups();
  }

  if (startpos > endpos) {
    rb_raise(rb_eArgError, "startpos should be <= endpos");
  }

  if (n == 0) {
#ifdef HAVE_ENDPOS_ARGUMENT
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, endpos, anchor, 0, 0);
#else
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, anchor, 0, 0);
#endif
    return BOOL2RUBY(matched);
  } else {
    if (n == INT_MAX) {
      rb_raise(rb_eRangeError, "number of matches should be < %d", INT_MAX);
    }

    /* Because match returns the whole match as well. */
    n += 1;

    re2::StringPiece *matches = new(std::nothrow) re2::StringPiece[n];
    if (matches == 0) {
      rb_raise(rb_eNoMemError,
               "not enough memory to allocate StringPieces for matches");
    }

    text = rb_str_new_frozen(text);

#ifdef HAVE_ENDPOS_ARGUMENT
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, endpos, anchor, matches, n);
#else
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, anchor, matches, n);
#endif
    if (matched) {
      VALUE matchdata = rb_class_new_instance(0, 0, re2_cMatchData);
      TypedData_Get_Struct(matchdata, re2_matchdata, &re2_matchdata_data_type, m);

      RB_OBJ_WRITE(matchdata, &m->regexp, self);
      RB_OBJ_WRITE(matchdata, &m->text, text);
      m->matches = matches;
      m->number_of_matches = n;

      return matchdata;
    } else {
      delete[] matches;

      return Qnil;
    }
  }
}

#match?(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Parameters:

  • text (String)

    the text to search

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1884 (same re2_regexp_match_p function as #===)

#max_mem ⇒ Integer

Returns the max_mem setting for the regular expression.

Examples:

re2 = RE2::Regexp.new("woo?", max_mem: 1024)
re2.max_mem #=> 1024

Returns:

  • (Integer)

    the max_mem option



# File 'ext/re2/re2.cc', line 1399

static VALUE re2_regexp_max_mem(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return INT2FIX(p->pattern->options().max_mem());
}

#named_captures ⇒ Hash

Returns a hash mapping the name of each named capturing group to its index.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour for any other encoding is undefined).

Returns:

  • (Hash)

    a hash of names to capturing indices



# File 'ext/re2/re2.cc', line 1630

static VALUE re2_regexp_named_capturing_groups(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);
  const std::map<std::string, int>& groups = p->pattern->NamedCapturingGroups();
  VALUE capturing_groups = rb_hash_new();

  for (std::map<std::string, int>::const_iterator it = groups.begin(); it != groups.end(); ++it) {
    rb_hash_aset(capturing_groups,
        encoded_str_new(it->first.data(), it->first.size(),
          p->pattern->options().encoding()),
        INT2FIX(it->second));
  }

  return capturing_groups;
}

#named_capturing_groups ⇒ Hash

Returns a hash mapping the name of each named capturing group to its index.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour for any other encoding is undefined).

Returns:

  • (Hash)

    a hash of names to capturing indices



# File 'ext/re2/re2.cc', line 1630 (same re2_regexp_named_capturing_groups function as #named_captures)

#names ⇒ Array<String>

Returns an array of names of all named capturing groups. Names are returned in alphabetical order rather than definition order, as RE2 stores named groups internally in a sorted map.

Note: RE2 only supports the UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour for any other encoding is undefined).

Examples:

RE2::Regexp.new('(?P<a>\d+) (?P<b>\w+)').names #=> ["a", "b"]

Returns:

  • (Array<String>)

    an array of names of named capturing groups



# File 'ext/re2/re2.cc', line 296

static VALUE re2_regexp_names(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  const std::map<std::string, int>& groups = p->pattern->NamedCapturingGroups();
  VALUE names = rb_ary_new2(groups.size());

  for (std::map<std::string, int>::const_iterator it = groups.begin(); it != groups.end(); ++it) {
    rb_ary_push(names,
        encoded_str_new(it->first.data(), it->first.size(),
          p->pattern->options().encoding()));
  }

  return names;
}

#never_nl? ⇒ Boolean

Returns whether or not the regular expression was compiled with the never_nl option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", never_nl: true)
re2.never_nl? #=> true

Returns:

  • (Boolean)

    the never_nl option



# File 'ext/re2/re2.cc', line 1429

static VALUE re2_regexp_never_nl(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->options().never_nl());
}

#number_of_capturing_groups ⇒ Integer

Returns the number of capturing subpatterns, or -1 if the regexp wasn't valid on construction. The overall match ($0) does not count: if the regexp is "(a)(b)", returns 2.

Returns:

  • (Integer)

    the number of capturing subpatterns



# File 'ext/re2/re2.cc', line 1615

static VALUE re2_regexp_number_of_capturing_groups(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return INT2FIX(p->pattern->NumberOfCapturingGroups());
}

#ok? ⇒ Boolean

Returns whether or not the regular expression was compiled successfully.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.ok? #=> true

Returns:

  • (Boolean)

    whether or not compilation was successful



# File 'ext/re2/re2.cc', line 1325

static VALUE re2_regexp_ok(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->ok());
}

#one_line? ⇒ Boolean

Returns whether or not the regular expression was compiled with the one_line option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", one_line: true)
re2.one_line? #=> true

Returns:

  • (Boolean)

    the one_line option



# File 'ext/re2/re2.cc', line 1503

static VALUE re2_regexp_one_line(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->options().one_line());
}

#options ⇒ Hash

Returns a frozen hash of the options with which the RE2::Regexp was compiled.

Returns:

  • (Hash)

    the options



# File 'ext/re2/re2.cc', line 1565

static VALUE re2_regexp_options(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);
  VALUE options = rb_hash_new();

  rb_hash_aset(options, ID2SYM(id_utf8),
      BOOL2RUBY(p->pattern->options().encoding() == RE2::Options::EncodingUTF8));

  rb_hash_aset(options, ID2SYM(id_posix_syntax),
      BOOL2RUBY(p->pattern->options().posix_syntax()));

  rb_hash_aset(options, ID2SYM(id_longest_match),
      BOOL2RUBY(p->pattern->options().longest_match()));

  rb_hash_aset(options, ID2SYM(id_log_errors),
      BOOL2RUBY(p->pattern->options().log_errors()));

  rb_hash_aset(options, ID2SYM(id_max_mem),
      INT2FIX(p->pattern->options().max_mem()));

  rb_hash_aset(options, ID2SYM(id_literal),
      BOOL2RUBY(p->pattern->options().literal()));

  rb_hash_aset(options, ID2SYM(id_never_nl),
      BOOL2RUBY(p->pattern->options().never_nl()));

  rb_hash_aset(options, ID2SYM(id_case_sensitive),
      BOOL2RUBY(p->pattern->options().case_sensitive()));

  rb_hash_aset(options, ID2SYM(id_perl_classes),
      BOOL2RUBY(p->pattern->options().perl_classes()));

  rb_hash_aset(options, ID2SYM(id_word_boundary),
      BOOL2RUBY(p->pattern->options().word_boundary()));

  rb_hash_aset(options, ID2SYM(id_one_line),
      BOOL2RUBY(p->pattern->options().one_line()));

  /* This is a read-only hash after all... */
  rb_obj_freeze(options);

  return options;
}

#partial_match(text, options = {}) ⇒ RE2::MatchData, ...

Match the pattern against any substring of the given text and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Examples:

r = RE2::Regexp.new('w(o)(o)')
r.partial_match('woot')                #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
r.partial_match('nope')                #=> nil
r.partial_match('woot', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.partial_match('woot', submatches: 0) #=> true

Parameters:

  • text (String)

    the text to search

  • options (Hash) (defaults to: {})

    the options with which to perform the match

Options Hash (options):

  • :submatches (Integer)

    how many submatches to extract (0 is fastest), defaults to the total number of capturing groups

Returns:

  • (RE2::MatchData, nil)

    if extracting any submatches

  • (Boolean)

    if not extracting any submatches

Raises:

  • (ArgumentError)

    if given a negative number of submatches

  • (NoMemoryError)

    if there was not enough memory to allocate the matches

  • (TypeError)

    if given non-numeric submatches or non-hash options



# File 'lib/re2/regexp.rb', line 39

def partial_match(text, options = {})
  match(text, Hash(options).merge(anchor: :unanchored))
end

#partial_match?(text) ⇒ Boolean

Returns true if the pattern matches any substring of the given text using PartialMatch.

Parameters:

  • text (String)

    the text to search

Returns:

  • (Boolean)

    whether the match was successful

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1884

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  /* Ensure text is a string. */
  StringValue(text);

  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#pattern ⇒ String

Returns a string version of the regular expression.

Note: RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour of any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.pattern #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 1309

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#perl_classes? ⇒ Boolean

Returns whether or not the regular expression was compiled with the perl_classes option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", perl_classes: true)
re2.perl_classes? #=> true

Returns:

  • (Boolean)

    the perl_classes option



# File 'ext/re2/re2.cc', line 1473

static VALUE re2_regexp_perl_classes(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->options().perl_classes());
}

#posix_syntax? ⇒ Boolean

Returns whether or not the regular expression was compiled with the posix_syntax option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", posix_syntax: true)
re2.posix_syntax? #=> true

Returns:

  • (Boolean)

    the posix_syntax option



# File 'ext/re2/re2.cc', line 1355

static VALUE re2_regexp_posix_syntax(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->options().posix_syntax());
}

#program_size ⇒ Integer

Returns the program size, a very approximate measure of a regexp's "cost". Larger numbers are more expensive than smaller numbers.

Returns:

  • (Integer)

    the regexp "cost"



# File 'ext/re2/re2.cc', line 1554

static VALUE re2_regexp_program_size(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return INT2FIX(p->pattern->ProgramSize());
}

#scan(text) ⇒ RE2::Scanner

Returns a Scanner for scanning the given text incrementally with FindAndConsume.

Examples:

c = RE2::Regexp.new('(\w+)').scan("Foo bar baz")
#=> #<RE2::Scanner:0x0000000000000001>

Parameters:

  • text (String)

    the text to scan incrementally

Returns:

  • (RE2::Scanner)

    a scanner for incrementally scanning the text

Raises:

  • (TypeError)

    if text cannot be coerced to a String



# File 'ext/re2/re2.cc', line 1925

static VALUE re2_regexp_scan(const VALUE self, VALUE text) {
  /* Ensure text is a string. */
  StringValue(text);

  re2_pattern *p = unwrap_re2_regexp(self);
  re2_scanner *c;
  VALUE scanner = rb_class_new_instance(0, 0, re2_cScanner);
  TypedData_Get_Struct(scanner, re2_scanner, &re2_scanner_data_type, c);

  RB_OBJ_WRITE(scanner, &c->regexp, self);
  RB_OBJ_WRITE(scanner, &c->text, rb_str_new_frozen(text));
  c->input = new(std::nothrow) re2::StringPiece(
      RSTRING_PTR(c->text), RSTRING_LEN(c->text));
  if (c->input == 0) {
    rb_raise(rb_eNoMemError,
             "not enough memory to allocate StringPiece for input");
  }

  if (p->pattern->ok()) {
    c->number_of_capturing_groups = p->pattern->NumberOfCapturingGroups();
  } else {
    c->number_of_capturing_groups = 0;
  }

  c->eof = false;

  return scanner;
}

#source ⇒ String

Returns a string version of the regular expression.

Note: RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour of any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.source #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 1309

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#to_s ⇒ String

Returns a string version of the regular expression.

Note: RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour of any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.to_s #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 1309

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#to_str ⇒ String

Returns a string version of the regular expression.

Note: RE2 only supports UTF-8 and ISO-8859-1 encodings, so strings are returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (the behaviour of any other encoding is undefined).

Examples:

re2 = RE2::Regexp.new("woo?")
re2.to_str #=> "woo?"

Returns:

  • (String)

    a string version of the regular expression



# File 'ext/re2/re2.cc', line 1309

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#utf8? ⇒ Boolean

Returns whether or not the regular expression was compiled with the utf8 option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", utf8: true)
re2.utf8? #=> true

Returns:

  • (Boolean)

    the utf8 option



# File 'ext/re2/re2.cc', line 1340

static VALUE re2_regexp_utf8(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->options().encoding() == RE2::Options::EncodingUTF8);
}

#word_boundary? ⇒ Boolean

Returns whether or not the regular expression was compiled with the word_boundary option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", word_boundary: true)
re2.word_boundary? #=> true

Returns:

  • (Boolean)

    the word_boundary option



# File 'ext/re2/re2.cc', line 1488

static VALUE re2_regexp_word_boundary(const VALUE self) {
  re2_pattern *p = unwrap_re2_regexp(self);

  return BOOL2RUBY(p->pattern->options().word_boundary());
}