Class: RE2::Regexp

Inherits: Object

Defined in: ext/re2/re2.cc, lib/re2/regexp.rb

Class Method Summary

.compile ⇒ Object
Returns a new Regexp object with a compiled version of pattern stored inside.
.escape(unquoted) ⇒ String
Returns a version of the unquoted string with all potentially meaningful regexp characters escaped using QuoteMeta.
.match_has_endpos_argument? ⇒ Boolean
Returns whether the underlying RE2 version supports passing an endpos argument to Match.
.quote(unquoted) ⇒ String
Returns a version of the unquoted string with all potentially meaningful regexp characters escaped using QuoteMeta.

Instance Method Summary

#===(text) ⇒ Boolean
Returns true if the pattern matches any substring of the given text using PartialMatch.
#=~(text) ⇒ Boolean
Returns true if the pattern matches any substring of the given text using PartialMatch.
#case_insensitive? ⇒ Boolean
Returns whether or not the regular expression was compiled with the case_sensitive option set to false.
#case_sensitive? ⇒ Boolean
Returns whether or not the regular expression was compiled with the case_sensitive option set to true.
#casefold? ⇒ Boolean
Returns whether or not the regular expression was compiled with the case_sensitive option set to false.
#error ⇒ String, nil
If the Regexp could not be created properly, returns an error string; otherwise returns nil.
#error_arg ⇒ String, nil
If the Regexp could not be created properly, returns the offending portion of the regexp; otherwise returns nil.
#full_match(text, options = {}) ⇒ RE2::MatchData, ...
Match the pattern against the given text exactly and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).
#full_match?(text) ⇒ Boolean
Returns true if the pattern matches the given text using FullMatch.
#initialize(*args) ⇒ Object constructor
Returns a new Regexp object with a compiled version of pattern stored inside.
#inspect ⇒ String
Returns a printable version of the regular expression.
#literal? ⇒ Boolean
Returns whether or not the regular expression was compiled with the literal option set to true.
#log_errors? ⇒ Boolean
Returns whether or not the regular expression was compiled with the log_errors option set to true.
#longest_match? ⇒ Boolean
Returns whether or not the regular expression was compiled with the longest_match option set to true.
#match(*args) ⇒ Object
General matching: match the pattern against the given text using Match and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).
#match?(text) ⇒ Boolean
Returns true if the pattern matches any substring of the given text using PartialMatch.
#max_mem ⇒ Integer
Returns the max_mem setting for the regular expression.
#named_capturing_groups ⇒ Hash
Returns a hash of names to capturing indices of groups.
#never_nl? ⇒ Boolean
Returns whether or not the regular expression was compiled with the never_nl option set to true.
#number_of_capturing_groups ⇒ Integer
Returns the number of capturing subpatterns, or -1 if the regexp wasn't valid on construction.
#ok? ⇒ Boolean
Returns whether or not the regular expression was compiled successfully.
#one_line? ⇒ Boolean
Returns whether or not the regular expression was compiled with the one_line option set to true.
#options ⇒ Hash
Returns a hash of the options currently set for the Regexp.
#partial_match(text, options = {}) ⇒ RE2::MatchData, ...
Match the pattern against any substring of the given text and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).
#partial_match?(text) ⇒ Boolean
Returns true if the pattern matches any substring of the given text using PartialMatch.
#pattern ⇒ String
Returns a string version of the regular expression.
#perl_classes? ⇒ Boolean
Returns whether or not the regular expression was compiled with the perl_classes option set to true.
#posix_syntax? ⇒ Boolean
Returns whether or not the regular expression was compiled with the posix_syntax option set to true.
#program_size ⇒ Integer
Returns the program size, a very approximate measure of a regexp's "cost".
#scan(text) ⇒ RE2::Scanner
Returns a Scanner for scanning the given text incrementally with FindAndConsume.
#source ⇒ String
Returns a string version of the regular expression.
#to_s ⇒ String
Returns a string version of the regular expression.
#to_str ⇒ String
Returns a string version of the regular expression.
#utf8? ⇒ Boolean
Returns whether or not the regular expression was compiled with the utf8 option set to true.
#word_boundary? ⇒ Boolean
Returns whether or not the regular expression was compiled with the word_boundary option set to true.

Constructor Details

#initialize(pattern) ⇒ `RE2::Regexp`
#initialize(pattern, options) ⇒ `RE2::Regexp`

Returns a new RE2::Regexp object with a compiled version of pattern stored inside.

Overloads:

#initialize(pattern) ⇒ RE2::Regexp

Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the default options.
Parameters:
- pattern (String) —
  the pattern to compile
Raises:
- (TypeError) —
  if the given pattern can't be coerced to a String
- (NoMemoryError) —
  if memory could not be allocated for the compiled pattern
#initialize(pattern, options) ⇒ RE2::Regexp

Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the specified options.
Parameters:
- pattern (String) —
  the pattern to compile
- options (Hash) —
  the options with which to compile the pattern
Options Hash (options):
- :utf8 (Boolean) — default: true —
  text and pattern are UTF-8; otherwise Latin-1
- :posix_syntax (Boolean) — default: false —
  restrict regexps to POSIX egrep syntax
- :longest_match (Boolean) — default: false —
  search for longest match, not first match
- :log_errors (Boolean) — default: true —
  log syntax and execution errors to ERROR
- :max_mem (Integer) —
  approx. max memory footprint of RE2
- :literal (Boolean) — default: false —
  interpret string as literal, not regexp
- :never_nl (Boolean) — default: false —
  never match \n, even if it is in regexp
- :case_sensitive (Boolean) — default: true —
  match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode)
- :perl_classes (Boolean) — default: false —
  allow Perl's \d \s \w \D \S \W when in posix_syntax mode
- :word_boundary (Boolean) — default: false —
  allow \b \B (word boundary and not) when in posix_syntax mode
- :one_line (Boolean) — default: false —
  ^ and $ only match beginning and end of text when in posix_syntax mode
Raises:
- (TypeError) —
  if the given pattern can't be coerced to a String
- (NoMemoryError) —
  if memory could not be allocated for the compiled pattern
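
The options above can be combined in a single call. The following is a minimal sketch, assuming the re2 gem is installed and loadable as `re2`; the helper name `build_regexp` is hypothetical and not part of the library. It relies on the behaviour visible in the source below: the constructor does not raise on a syntax error, so the result has to be checked with #ok?.

```ruby
begin
  require "re2" # assumes the re2 gem is installed
rescue LoadError
  warn "re2 gem not available; skipping the example"
end

# Hypothetical helper (not part of re2): compile a pattern and fail loudly
# if RE2 rejected it, because RE2::Regexp.new itself does not raise on a
# syntax error.
def build_regexp(pattern, options = {})
  # log_errors: false stops RE2 from logging the syntax error itself.
  regexp = RE2::Regexp.new(pattern, { log_errors: false }.merge(options))
  raise ArgumentError, "invalid pattern: #{regexp.error}" unless regexp.ok?
  regexp
end

if defined?(RE2::Regexp)
  re = build_regexp("woo?", case_sensitive: false, longest_match: true)
  puts re.case_insensitive? # inverse of the :case_sensitive option
  puts re.longest_match?
  puts re.match?("WOO")     # partial match, ignoring case
end
```

Wrapping the constructor this way is only one possible design; callers that prefer to inspect #error themselves can call RE2::Regexp.new directly.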

[View source]

# File 'ext/re2/re2.cc', line 912

static VALUE re2_regexp_initialize(int argc, VALUE *argv, VALUE self) {
  VALUE pattern, options;
  re2_pattern *p;

  rb_scan_args(argc, argv, "11", &pattern, &options);

  /* Ensure pattern is a string. */
  StringValue(pattern);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (RTEST(options)) {
    RE2::Options re2_options;
    parse_re2_options(&re2_options, options);

    p->pattern = new(std::nothrow) RE2(
        re2::StringPiece(RSTRING_PTR(pattern), RSTRING_LEN(pattern)), re2_options);
  } else {
    p->pattern = new(std::nothrow) RE2(
        re2::StringPiece(RSTRING_PTR(pattern), RSTRING_LEN(pattern)));
  }

  if (p->pattern == 0) {
    rb_raise(rb_eNoMemError, "not enough memory to allocate RE2 object");
  }

  return self;
}

Class Method Details

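.initialize(pattern) ⇒ `RE2::Regexp`
.initialize(pattern, options) ⇒ `RE2::Regexp`

Returns a new RE2::Regexp object with a compiled version of pattern stored inside. This is the class method listed in the summary as .compile; it accepts the same arguments as the constructor.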

Overloads:

.initialize(pattern) ⇒ RE2::Regexp

Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the default options.
Parameters:
- pattern (String) —
  the pattern to compile
Returns:
- (RE2::Regexp) —
  a RE2::Regexp with the specified pattern
Raises:
- (TypeError) —
  if the given pattern can't be coerced to a String
- (NoMemoryError) —
  if memory could not be allocated for the compiled pattern
.initialize(pattern, options) ⇒ RE2::Regexp

Returns a new RE2::Regexp object with a compiled version of pattern stored inside with the specified options.
Parameters:
- pattern (String) —
  the pattern to compile
- options (Hash) —
  the options with which to compile the pattern
Options Hash (options):
- :utf8 (Boolean) — default: true —
  text and pattern are UTF-8; otherwise Latin-1
- :posix_syntax (Boolean) — default: false —
  restrict regexps to POSIX egrep syntax
- :longest_match (Boolean) — default: false —
  search for longest match, not first match
- :log_errors (Boolean) — default: true —
  log syntax and execution errors to ERROR
- :max_mem (Integer) —
  approx. max memory footprint of RE2
- :literal (Boolean) — default: false —
  interpret string as literal, not regexp
- :never_nl (Boolean) — default: false —
  never match \n, even if it is in regexp
- :case_sensitive (Boolean) — default: true —
  match is case-sensitive (regexp can override with (?i) unless in posix_syntax mode)
- :perl_classes (Boolean) — default: false —
  allow Perl's \d \s \w \D \S \W when in posix_syntax mode
- :word_boundary (Boolean) — default: false —
  allow \b \B (word boundary and not) when in posix_syntax mode
- :one_line (Boolean) — default: false —
  ^ and $ only match beginning and end of text when in posix_syntax mode
Returns:
- (RE2::Regexp) —
  a RE2::Regexp with the specified pattern and options
Raises:
- (TypeError) —
  if the given pattern can't be coerced to a String
- (NoMemoryError) —
  if memory could not be allocated for the compiled pattern

.escape(unquoted) ⇒ `String`

Returns a version of the unquoted string with all potentially meaningful regexp characters escaped using QuoteMeta. The returned string, used as a regular expression, will exactly match the original string.

Examples:

RE2::Regexp.escape("1.5-2.0?") #=> "1\\.5\\-2\\.0\\?"

Parameters:

unquoted (String) —
the unquoted string

Returns:

(String) —
the escaped string

Raises:

(TypeError) —
if the given unquoted string cannot be coerced to a String
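
As a rough illustration of why escaping matters, the sketch below builds a regexp that matches arbitrary input literally. It assumes the re2 gem is installed; `literal_regexp` is a hypothetical helper name.

```ruby
begin
  require "re2" # assumes the re2 gem is installed
rescue LoadError
  warn "re2 gem not available; skipping the example"
end

# Hypothetical helper (not part of re2): build a regexp that matches the
# given text literally by escaping it first.
def literal_regexp(text)
  RE2::Regexp.new(RE2::Regexp.escape(text))
end

if defined?(RE2::Regexp)
  re = literal_regexp("1.5-2.0?")
  puts re.full_match?("1.5-2.0?") # the escaped pattern matches the original
  puts re.full_match?("1x5-2y0")  # "." no longer acts as a wildcard
end
```

When the whole pattern is literal, compiling with the :literal option is an alternative to escaping.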

[View source]

# File 'ext/re2/re2.cc', line 1783

static VALUE re2_QuoteMeta(VALUE, VALUE unquoted) {
  StringValue(unquoted);

  std::string quoted_string = RE2::QuoteMeta(
      re2::StringPiece(RSTRING_PTR(unquoted), RSTRING_LEN(unquoted)));

  return rb_str_new(quoted_string.data(), quoted_string.size());
}

.match_has_endpos_argument? ⇒ `Boolean`

Returns whether the underlying RE2 version supports passing an endpos argument to Match. If it does not, #match raises RE2::Regexp::UnsupportedError when given an :endpos option.

Returns:

(Boolean) —
whether the underlying Match has an endpos argument
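
One way to use this check is to guard any call that passes :endpos. The sketch below assumes the re2 gem is installed; `match_within` is a hypothetical helper, and its fallback (slicing the text by bytes) is only an approximation of what endpos does, since text beyond the limit is no longer visible to the matcher as context.

```ruby
begin
  require "re2" # assumes the re2 gem is installed
rescue LoadError
  warn "re2 gem not available; skipping the example"
end

# Hypothetical helper (not part of re2): test whether the pattern matches
# within the first `limit` bytes of text.
def match_within(regexp, text, limit)
  if RE2::Regexp.match_has_endpos_argument?
    regexp.match(text, endpos: limit, submatches: 0)
  else
    # Approximate fallback for RE2 versions without endpos support.
    regexp.match(text.byteslice(0, limit), submatches: 0)
  end
end

if defined?(RE2::Regexp)
  puts match_within(RE2::Regexp.new("woo"), "woo hoo", 3)
  puts match_within(RE2::Regexp.new("hoo"), "woo hoo", 3)
end
```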

[View source]

# File 'ext/re2/re2.cc', line 1656

static VALUE re2_regexp_match_has_endpos_argument_p(VALUE) {
#ifdef HAVE_ENDPOS_ARGUMENT
  return Qtrue;
#else
  return Qfalse;
#endif
}

.quote(unquoted) ⇒ `String`

Returns a version of the unquoted string with all potentially meaningful regexp characters escaped using QuoteMeta. The returned string, used as a regular expression, will exactly match the original string.

Examples:

RE2::Regexp.quote("1.5-2.0?") #=> "1\\.5\\-2\\.0\\?"

Parameters:

unquoted (String) —
the unquoted string

Returns:

(String) —
the escaped string

Raises:

(TypeError) —
if the given unquoted string cannot be coerced to a String

[View source]

# File 'ext/re2/re2.cc', line 1783

static VALUE re2_QuoteMeta(VALUE, VALUE unquoted) {
  StringValue(unquoted);

  std::string quoted_string = RE2::QuoteMeta(
      re2::StringPiece(RSTRING_PTR(unquoted), RSTRING_LEN(unquoted)));

  return rb_str_new(quoted_string.data(), quoted_string.size());
}

Instance Method Details

#===(text) ⇒ `Boolean`

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

(Boolean) —
whether the match was successful

Raises:

(TypeError) —
if text cannot be coerced to a String
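
Because Ruby's case/when calls #=== on each candidate, an RE2::Regexp can be used directly in a when clause. A minimal sketch, assuming the re2 gem is installed; `classify` and its patterns are hypothetical.

```ruby
begin
  require "re2" # assumes the re2 gem is installed
rescue LoadError
  warn "re2 gem not available; skipping the example"
end

# Hypothetical classifier (not part of re2). The patterns are compiled on
# every call to keep the sketch short; real code would likely cache them.
def classify(line)
  error   = RE2::Regexp.new('\bERROR\b')
  warning = RE2::Regexp.new('\bWARN(ING)?\b')

  case line
  when error   then :error   # evaluated as error === line
  when warning then :warning
  else :other
  end
end

if defined?(RE2::Regexp)
  p classify("2024-05-01 ERROR disk full")
  p classify("WARN low battery")
  p classify("all good")
end
```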

[View source]

# File 'ext/re2/re2.cc', line 1574

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#=~(text) ⇒ `Boolean`

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

(Boolean) —
whether the match was successful

Raises:

(TypeError) —
if text cannot be coerced to a String

[View source]

# File 'ext/re2/re2.cc', line 1574

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#case_insensitive? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the case_sensitive option set to false.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_insensitive? #=> false
re2.casefold?         #=> false

Returns:

(Boolean) —
the inverse of the case_sensitive option

[View source]



# File 'ext/re2/re2.cc', line 1140

static VALUE re2_regexp_case_insensitive(const VALUE self) {
  return BOOL2RUBY(re2_regexp_case_sensitive(self) != Qtrue);
}

#case_sensitive? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the case_sensitive option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_sensitive? #=> true

Returns:

(Boolean) —
the case_sensitive option

[View source]

# File 'ext/re2/re2.cc', line 1123

static VALUE re2_regexp_case_sensitive(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().case_sensitive());
}

#casefold? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the case_sensitive option set to false.

Examples:

re2 = RE2::Regexp.new("woo?", case_sensitive: true)
re2.case_insensitive? #=> false
re2.casefold?         #=> false

Returns:

(Boolean) —
the inverse of the case_sensitive option

[View source]



# File 'ext/re2/re2.cc', line 1140

static VALUE re2_regexp_case_insensitive(const VALUE self) {
  return BOOL2RUBY(re2_regexp_case_sensitive(self) != Qtrue);
}

#error ⇒ `String`, `nil`

If the RE2::Regexp could not be created properly, returns an error string; otherwise returns nil.

Returns:

(String, nil) —
the error string or nil

[View source]

# File 'ext/re2/re2.cc', line 1198

static VALUE re2_regexp_error(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (p->pattern->ok()) {
    return Qnil;
  } else {
    return rb_str_new(p->pattern->error().data(), p->pattern->error().size());
  }
}

#error_arg ⇒ `String`, `nil`

If the RE2::Regexp could not be created properly, returns the offending portion of the regexp; otherwise returns nil.

Note that RE2 only supports the UTF-8 and ISO-8859-1 encodings, so the string is returned as UTF-8 by default, or as ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false. Behaviour with any other encoding is undefined.

Returns:

(String, nil) —
the offending portion of the regexp or nil
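
#error and #error_arg are typically read together after #ok? reports a failure. A minimal sketch, assuming the re2 gem is installed; `compile_failure` is a hypothetical helper and the exact wording of the error text comes from RE2, so it is not reproduced here.

```ruby
begin
  require "re2" # assumes the re2 gem is installed
rescue LoadError
  warn "re2 gem not available; skipping the example"
end

# Hypothetical helper (not part of re2): describe why compilation failed,
# or return nil when the regexp compiled successfully.
def compile_failure(regexp)
  return nil if regexp.ok?

  "#{regexp.error} (near #{regexp.error_arg.inspect})"
end

if defined?(RE2::Regexp)
  # log_errors: false stops RE2 from logging the syntax error itself.
  bad  = RE2::Regexp.new("wo(o", log_errors: false)
  good = RE2::Regexp.new("woo?")

  puts compile_failure(bad)
  p compile_failure(good) # nil, because the pattern is valid
end
```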

[View source]

# File 'ext/re2/re2.cc', line 1219

static VALUE re2_regexp_error_arg(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  if (p->pattern->ok()) {
    return Qnil;
  } else {
    return encoded_str_new(p->pattern->error_arg().data(),
        p->pattern->error_arg().size(),
        p->pattern->options().encoding());
  }
}

#full_match(text, options = {}) ⇒ `RE2::MatchData`, ...

Match the pattern against the given text exactly and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Examples:

r = RE2::Regexp.new('w(o)(o)')
r.full_match('woo')                #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
r.full_match('woot')               #=> nil
r.full_match('woo', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.full_match('woo', submatches: 0) #=> true

Parameters:

text (String) —
the text to search
options (Hash) (defaults to: {}) —
the options with which to perform the match

Options Hash (options):

:submatches (Integer) —
how many submatches to extract (0 is fastest), defaults to the total number of capturing groups

Returns:

(RE2::MatchData, nil) —
if extracting any submatches
(Boolean) —
if not extracting any submatches

Raises:

(ArgumentError) —
if given a negative number of submatches
(NoMemoryError) —
if there was not enough memory to allocate the matches
(TypeError) —
if given non-numeric submatches or non-hash options
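
A common use of #full_match is validating that an entire string has a given shape and then reading the captures. The sketch below assumes the re2 gem is installed and that RE2::MatchData#[] accepts an integer group index (documented on the MatchData class, not on this page); `parse_version` is a hypothetical helper.

```ruby
begin
  require "re2" # assumes the re2 gem is installed
rescue LoadError
  warn "re2 gem not available; skipping the example"
end

# Hypothetical parser (not part of re2): accept only strings that consist
# entirely of "MAJOR.MINOR" and return the two numbers.
def parse_version(text)
  match = RE2::Regexp.new('(\d+)\.(\d+)').full_match(text)
  return nil unless match # nil when the whole text does not match

  [match[1].to_i, match[2].to_i]
end

if defined?(RE2::Regexp)
  p parse_version("1.5")
  p parse_version("1.5-beta") # trailing text makes the full match fail
end
```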

[View source]



# File 'lib/re2/regexp.rb', line 68

def full_match(text, options = {})
  match(text, Hash(options).merge(anchor: :anchor_both))
end

#full_match?(text) ⇒ `Boolean`

Returns true if the pattern matches the given text using FullMatch.

Returns:

(Boolean) —
whether the match was successful

Raises:

(TypeError) —
if text cannot be coerced to a String

[View source]

# File 'ext/re2/re2.cc', line 1594

static VALUE re2_regexp_full_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::FullMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#inspect ⇒ `String`

Returns a printable version of the regular expression.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.inspect #=> "#<RE2::Regexp /woo?/>"

Returns:

(String) —
a printable version of the regular expression

[View source]

# File 'ext/re2/re2.cc', line 954

static VALUE re2_regexp_inspect(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  std::ostringstream output;

  output << "#<RE2::Regexp /" << p->pattern->pattern() << "/>";

  return encoded_str_new(output.str().data(), output.str().length(),
      p->pattern->options().encoding());
}

#literal? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the literal option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", literal: true)
re2.literal? #=> true

Returns:

(Boolean) —
the literal option

[View source]

# File 'ext/re2/re2.cc', line 1091

static VALUE re2_regexp_literal(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().literal());
}

#log_errors? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the log_errors option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", log_errors: true)
re2.log_errors? #=> true

Returns:

(Boolean) —
the log_errors option

[View source]

# File 'ext/re2/re2.cc', line 1060

static VALUE re2_regexp_log_errors(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().log_errors());
}

#longest_match? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the longest_match option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", longest_match: true)
re2.longest_match? #=> true

Returns:

(Boolean) —
the longest_match option

[View source]

# File 'ext/re2/re2.cc', line 1044

static VALUE re2_regexp_longest_match(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().longest_match());
}

#match(text) ⇒ `RE2::MatchData`, ...
#match(text, options) ⇒ `RE2::MatchData`, ...
#match(text, submatches) ⇒ `RE2::MatchData`, ...

General matching: match the pattern against the given text using Match and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one, and requesting zero submatches is faster still.

Overloads:

#match(text) ⇒ RE2::MatchData, ...

Returns a MatchData containing the matching pattern and all submatches resulting from looking for the regexp in text if the pattern contains capturing groups.

Returns either true or false indicating whether a successful match was made if the pattern contains no capturing groups.
Examples:

Matching with capturing groups
```
r = RE2::Regexp.new('w(o)(o)')
r.match('woo') #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
```
Matching without capturing groups
```
r = RE2::Regexp.new('woo')
r.match('woo') #=> true
```
Parameters:
- text (String) —
  the text to search
Returns:
- (RE2::MatchData, nil) —
  if the pattern contains capturing groups
- (Boolean) —
  if the pattern does not contain capturing groups
Raises:
- (NoMemoryError) —
  if there was not enough memory to allocate the submatches
- (TypeError) —
  if given text that cannot be coerced to a String
#match(text, options) ⇒ RE2::MatchData, ...

Behaves like match(text), but allows custom offsets for starting and ending the match, optional anchoring to the start or both ends of the text, and a specific number of submatches to extract (padded with nils if necessary).
Examples:

Matching with capturing groups
```
r = RE2::Regexp.new('w(o)(o)')
r.match('woo', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.match('woo', submatches: 3) #=> #<RE2::MatchData "woo" 1:"o" 2:"o" 3:nil>
r.match('woot', anchor: :anchor_both, submatches: 0)
#=> false
r.match('woot', anchor: :anchor_start, submatches: 0)
#=> true
```
Matching without capturing groups
```
r = RE2::Regexp.new('wo+')
r.match('woot', anchor: :anchor_both)  #=> false
r.match('woot', anchor: :anchor_start) #=> true
```
Parameters:
- text (String) —
  the text to search
- options (Hash) —
  the options with which to perform the match
Options Hash (options):
- :startpos (Integer) — default: 0 —
  offset at which to start matching
- :endpos (Integer) —
  offset at which to stop matching, defaults to the text length
- :anchor (Symbol) — default: :unanchored —
  one of :unanchored, :anchor_start, :anchor_both to anchor the match
- :submatches (Integer) —
  how many submatches to extract (0 is fastest), defaults to the number of capturing groups
Returns:
- (RE2::MatchData, nil) —
  if extracting any submatches
- (Boolean) —
  if not extracting any submatches
Raises:
- (ArgumentError) —
  if given a negative number of submatches, invalid anchor or invalid startpos, endpos pair
- (NoMemoryError) —
  if there was not enough memory to allocate the matches
- (TypeError) —
  if given non-String text, non-numeric number of submatches, non-symbol anchor or non-hash options
- (RE2::Regexp::UnsupportedError) —
  if given an endpos argument on a version of RE2 that does not support it
#match(text, submatches) ⇒ RE2::MatchData, ...

Deprecated.
Legacy syntax for matching against text with a specific number of submatches to extract. Use match(text, submatches: n) instead.
Examples:
```
r = RE2::Regexp.new('w(o)(o)')
r.match('woo', 0) #=> true
r.match('woo', 1) #=> #<RE2::MatchData "woo" 1:"o">
r.match('woo', 2) #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
```
Parameters:
- text (String) —
  the text to search
- submatches (Integer) —
  the number of submatches to extract
Returns:
- (RE2::MatchData, nil) —
  if extracting any submatches
- (Boolean) —
  if not extracting any submatches
Raises:
- (NoMemoryError) —
  if there was not enough memory to allocate the submatches
- (TypeError) —
  if given non-numeric number of submatches
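
The :startpos option can be used to resume a search partway through a string. A minimal sketch, assuming the re2 gem is installed and that RE2::MatchData#[] accepts an integer group index; `first_number_from` is a hypothetical helper. As the source below shows, offsets are compared against the byte length of the text, so they are byte offsets.

```ruby
begin
  require "re2" # assumes the re2 gem is installed
rescue LoadError
  warn "re2 gem not available; skipping the example"
end

# Hypothetical helper (not part of re2): return the first run of digits
# found at or after the given byte offset, or nil if there is none.
def first_number_from(text, offset)
  match = RE2::Regexp.new('(\d+)').match(text, startpos: offset)
  match && match[1]
end

if defined?(RE2::Regexp)
  p first_number_from("ab 12 cd 34", 0)
  p first_number_from("ab 12 cd 34", 6) # search starts after "12"
  p first_number_from("no digits", 0)
end
```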

[View source]

# File 'ext/re2/re2.cc', line 1418

static VALUE re2_regexp_match(int argc, VALUE *argv, const VALUE self) {
  re2_pattern *p;
  re2_matchdata *m;
  VALUE text, options;

  rb_scan_args(argc, argv, "11", &text, &options);

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  int n;
  int startpos = 0;
  int endpos = RSTRING_LEN(text);
  RE2::Anchor anchor = RE2::UNANCHORED;

  if (RTEST(options)) {
    if (FIXNUM_P(options)) {
      n = NUM2INT(options);

      if (n < 0) {
        rb_raise(rb_eArgError, "number of matches should be >= 0");
      }
    } else {
      if (TYPE(options) != T_HASH) {
        options = rb_Hash(options);
      }

      VALUE endpos_option = rb_hash_aref(options, ID2SYM(id_endpos));
      if (!NIL_P(endpos_option)) {
#ifdef HAVE_ENDPOS_ARGUMENT
        Check_Type(endpos_option, T_FIXNUM);

        endpos = NUM2INT(endpos_option);

        if (endpos < 0) {
          rb_raise(rb_eArgError, "endpos should be >= 0");
        }
#else
        rb_raise(re2_eRegexpUnsupportedError, "current version of RE2::Match() does not support endpos argument");
#endif
      }

      VALUE anchor_option = rb_hash_aref(options, ID2SYM(id_anchor));
      if (!NIL_P(anchor_option)) {
        Check_Type(anchor_option, T_SYMBOL);

        ID id_anchor_option = SYM2ID(anchor_option);
        if (id_anchor_option == id_unanchored) {
          anchor = RE2::UNANCHORED;
        } else if (id_anchor_option == id_anchor_start) {
          anchor = RE2::ANCHOR_START;
        } else if (id_anchor_option == id_anchor_both) {
          anchor = RE2::ANCHOR_BOTH;
        } else {
          rb_raise(rb_eArgError, "anchor should be one of: :unanchored, :anchor_start, :anchor_both");
        }
      }

      VALUE submatches_option = rb_hash_aref(options, ID2SYM(id_submatches));
      if (!NIL_P(submatches_option)) {
        Check_Type(submatches_option, T_FIXNUM);

        n = NUM2INT(submatches_option);

        if (n < 0) {
          rb_raise(rb_eArgError, "number of matches should be >= 0");
        }
      } else {
        if (!p->pattern->ok()) {
          return Qnil;
        }

        n = p->pattern->NumberOfCapturingGroups();
      }

      VALUE startpos_option = rb_hash_aref(options, ID2SYM(id_startpos));
      if (!NIL_P(startpos_option)) {
        Check_Type(startpos_option, T_FIXNUM);

        startpos = NUM2INT(startpos_option);

        if (startpos < 0) {
          rb_raise(rb_eArgError, "startpos should be >= 0");
        }
      }
    }
  } else {
    if (!p->pattern->ok()) {
      return Qnil;
    }

    n = p->pattern->NumberOfCapturingGroups();
  }

  if (startpos > endpos) {
    rb_raise(rb_eArgError, "startpos should be <= endpos");
  }

  if (n == 0) {
#ifdef HAVE_ENDPOS_ARGUMENT
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, endpos, anchor, 0, 0);
#else
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)),
        startpos, anchor, 0, 0);
#endif
    return BOOL2RUBY(matched);
  } else {
    /* Because match returns the whole match as well. */
    n += 1;

    VALUE matchdata = rb_class_new_instance(0, 0, re2_cMatchData);
    TypedData_Get_Struct(matchdata, re2_matchdata, &re2_matchdata_data_type, m);
    m->matches = new(std::nothrow) re2::StringPiece[n];
    RB_OBJ_WRITE(matchdata, &m->regexp, self);
    if (!RTEST(rb_obj_frozen_p(text))) {
      text = rb_str_freeze(rb_str_dup(text));
    }
    RB_OBJ_WRITE(matchdata, &m->text, text);

    if (m->matches == 0) {
      rb_raise(rb_eNoMemError,
               "not enough memory to allocate StringPieces for matches");
    }

    m->number_of_matches = n;

#ifdef HAVE_ENDPOS_ARGUMENT
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(m->text), RSTRING_LEN(m->text)),
        startpos, endpos, anchor, m->matches, n);
#else
    bool matched = p->pattern->Match(
        re2::StringPiece(RSTRING_PTR(m->text), RSTRING_LEN(m->text)),
        startpos, anchor, m->matches, n);
#endif
    if (matched) {
      return matchdata;
    } else {
      return Qnil;
    }
  }
}

#match?(text) ⇒ `Boolean`

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

(Boolean) —
whether the match was successful

Raises:

(TypeError) —
if text cannot be coerced to a String

[View source]

# File 'ext/re2/re2.cc', line 1574

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#max_mem ⇒ `Integer`

Returns the max_mem setting for the regular expression.

Examples:

re2 = RE2::Regexp.new("woo?", max_mem: 1024)
re2.max_mem #=> 1024

Returns:

(Integer) —
the max_mem option

[View source]

# File 'ext/re2/re2.cc', line 1075

static VALUE re2_regexp_max_mem(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->options().max_mem());
}

#named_capturing_groups ⇒ `Hash`

Returns a hash of names to capturing indices of groups.

Returns:

(Hash) —
a hash of names to capturing indices
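
The returned hash can be combined with a match to look captures up by name. A minimal sketch, assuming the re2 gem is installed and that RE2::MatchData#[] accepts an integer group index; `named_captures` is a hypothetical helper.

```ruby
begin
  require "re2" # assumes the re2 gem is installed
rescue LoadError
  warn "re2 gem not available; skipping the example"
end

# Hypothetical helper (not part of re2): map each group name to the text it
# captured, using the name-to-index hash from #named_capturing_groups.
def named_captures(regexp, text)
  match = regexp.match(text)
  return {} unless match

  regexp.named_capturing_groups.each_with_object({}) do |(name, index), out|
    out[name] = match[index]
  end
end

if defined?(RE2::Regexp)
  re = RE2::Regexp.new('(?P<year>\d{4})-(?P<month>\d{2})')
  p re.named_capturing_groups
  p named_captures(re, "released on 2024-05")
end
```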

[View source]

# File 'ext/re2/re2.cc', line 1319

static VALUE re2_regexp_named_capturing_groups(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  const std::map<std::string, int>& groups = p->pattern->NamedCapturingGroups();
  VALUE capturing_groups = rb_hash_new();

  for (std::map<std::string, int>::const_iterator it = groups.begin(); it != groups.end(); ++it) {
    rb_hash_aset(capturing_groups,
        encoded_str_new(it->first.data(), it->first.size(),
          p->pattern->options().encoding()),
        INT2FIX(it->second));
  }

  return capturing_groups;
}

#never_nl? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the never_nl option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", never_nl: true)
re2.never_nl? #=> true

Returns:

(Boolean) —
the never_nl option

[View source]

# File 'ext/re2/re2.cc', line 1107

static VALUE re2_regexp_never_nl(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().never_nl());
}

#number_of_capturing_groups ⇒ `Integer`

Returns the number of capturing subpatterns, or -1 if the regexp wasn't valid on construction. The overall match ($0) does not count: if the regexp is "(a)(b)", returns 2.

Returns:

(Integer) —
the number of capturing subpatterns
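
The cases described above can be checked with a short sketch, assuming the re2 gem is installed; `capture_count` is a hypothetical helper.

```ruby
begin
  require "re2" # assumes the re2 gem is installed
rescue LoadError
  warn "re2 gem not available; skipping the example"
end

# Hypothetical helper (not part of re2): count the capturing groups in a
# pattern without logging anything when the pattern is invalid.
def capture_count(pattern)
  RE2::Regexp.new(pattern, log_errors: false).number_of_capturing_groups
end

if defined?(RE2::Regexp)
  p capture_count("(a)(b)")   # two capturing groups
  p capture_count("(?:a)(b)") # the non-capturing group is not counted
  p capture_count("(a")       # invalid pattern
end
```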

[View source]

# File 'ext/re2/re2.cc', line 1303

static VALUE re2_regexp_number_of_capturing_groups(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->NumberOfCapturingGroups());
}

#ok? ⇒ `Boolean`

Returns whether or not the regular expression was compiled successfully.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.ok? #=> true

Returns:

(Boolean) —
whether or not compilation was successful

[View source]

# File 'ext/re2/re2.cc', line 996

static VALUE re2_regexp_ok(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->ok());
}

#one_line? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the one_line option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", one_line: true)
re2.one_line? #=> true

Returns:

(Boolean) —
the one_line option

[View source]

# File 'ext/re2/re2.cc', line 1185

static VALUE re2_regexp_one_line(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().one_line());
}

#options ⇒ `Hash`

Returns a hash of the options currently set for the RE2::Regexp.

Returns:

(Hash) —
the options

[View source]

# File 'ext/re2/re2.cc', line 1251

static VALUE re2_regexp_options(const VALUE self) {
  re2_pattern *p;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  VALUE options = rb_hash_new();

  rb_hash_aset(options, ID2SYM(id_utf8),
      BOOL2RUBY(p->pattern->options().encoding() == RE2::Options::EncodingUTF8));

  rb_hash_aset(options, ID2SYM(id_posix_syntax),
      BOOL2RUBY(p->pattern->options().posix_syntax()));

  rb_hash_aset(options, ID2SYM(id_longest_match),
      BOOL2RUBY(p->pattern->options().longest_match()));

  rb_hash_aset(options, ID2SYM(id_log_errors),
      BOOL2RUBY(p->pattern->options().log_errors()));

  rb_hash_aset(options, ID2SYM(id_max_mem),
      INT2FIX(p->pattern->options().max_mem()));

  rb_hash_aset(options, ID2SYM(id_literal),
      BOOL2RUBY(p->pattern->options().literal()));

  rb_hash_aset(options, ID2SYM(id_never_nl),
      BOOL2RUBY(p->pattern->options().never_nl()));

  rb_hash_aset(options, ID2SYM(id_case_sensitive),
      BOOL2RUBY(p->pattern->options().case_sensitive()));

  rb_hash_aset(options, ID2SYM(id_perl_classes),
      BOOL2RUBY(p->pattern->options().perl_classes()));

  rb_hash_aset(options, ID2SYM(id_word_boundary),
      BOOL2RUBY(p->pattern->options().word_boundary()));

  rb_hash_aset(options, ID2SYM(id_one_line),
      BOOL2RUBY(p->pattern->options().one_line()));

  /* This is a read-only hash after all... */
  rb_obj_freeze(options);

  return options;
}

#partial_match(text, options = {}) ⇒ `RE2::MatchData`, ...

Match the pattern against any substring of the given text and return a MatchData instance with the specified number of submatches (defaults to the total number of capturing groups) or a boolean (if no submatches are required).

The number of submatches has a significant impact on performance: requesting one submatch is much faster than requesting more than one and requesting zero submatches is faster still.

Examples:

r = RE2::Regexp.new('w(o)(o)')
r.partial_match('woot')                #=> #<RE2::MatchData "woo" 1:"o" 2:"o">
r.partial_match('nope')                #=> nil
r.partial_match('woot', submatches: 1) #=> #<RE2::MatchData "woo" 1:"o">
r.partial_match('woot', submatches: 0) #=> true

Parameters:

text (String) —
the text to search
options (Hash) (defaults to: {}) —
the options with which to perform the match

Options Hash (options):

:submatches (Integer) —
how many submatches to extract (0 is fastest), defaults to the total number of capturing groups

Returns:

(RE2::MatchData, nil) —
if extracting any submatches
(Boolean) —
if not extracting any submatches

Raises:

(ArgumentError) —
if given a negative number of submatches
(NoMemoryError) —
if there was not enough memory to allocate the matches
(TypeError) —
if given non-numeric submatches or non-hash options

[View source]


# File 'lib/re2/regexp.rb', line 39

def partial_match(text, options = {})
  match(text, Hash(options).merge(anchor: :unanchored))
end

#partial_match?(text) ⇒ `Boolean`

Returns true if the pattern matches any substring of the given text using PartialMatch.

Returns:

(Boolean) —
whether the match was successful

Raises:

(TypeError) —
if text cannot be coerced to a String

[View source]

# File 'ext/re2/re2.cc', line 1574

static VALUE re2_regexp_match_p(const VALUE self, VALUE text) {
  re2_pattern *p;

  /* Ensure text is a string. */
  StringValue(text);

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(RE2::PartialMatch(
        re2::StringPiece(RSTRING_PTR(text), RSTRING_LEN(text)), *p->pattern));
}

#pattern ⇒ `String`

Returns a string version of the regular expression.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.pattern #=> "woo?"

Returns:

(String) —
a string version of the regular expression

[View source]

# File 'ext/re2/re2.cc', line 979

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#perl_classes? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the perl_classes option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", perl_classes: true)
re2.perl_classes? #=> true

Returns:

(Boolean) —
the perl_classes option

[View source]

# File 'ext/re2/re2.cc', line 1153

static VALUE re2_regexp_perl_classes(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().perl_classes());
}

#posix_syntax? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the posix_syntax option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", posix_syntax: true)
re2.posix_syntax? #=> true

Returns:

(Boolean) —
the posix_syntax option

[View source]

# File 'ext/re2/re2.cc', line 1028

static VALUE re2_regexp_posix_syntax(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().posix_syntax());
}

#program_size ⇒ `Integer`

Returns the program size, a very approximate measure of a regexp's "cost". Larger numbers are more expensive than smaller numbers.

Returns:

(Integer) —
the regexp "cost"

[View source]

# File 'ext/re2/re2.cc', line 1239

static VALUE re2_regexp_program_size(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return INT2FIX(p->pattern->ProgramSize());
}

#scan(text) ⇒ `RE2::Scanner`

Returns a Scanner for scanning the given text incrementally with FindAndConsume.

Examples:

c = RE2::Regexp.new('(\w+)').scan("Foo bar baz")
#=> #<RE2::Scanner:0x0000000000000001>

Parameters:

text (String) —
the text to scan incrementally

Returns:

(RE2::Scanner) —
an Enumerable Scanner object

Raises:

(TypeError) —
if text cannot be coerced to a String

[View source]

# File 'ext/re2/re2.cc', line 1618

static VALUE re2_regexp_scan(const VALUE self, VALUE text) {
  /* Ensure text is a string. */
  StringValue(text);

  re2_pattern *p;
  re2_scanner *c;

  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);
  VALUE scanner = rb_class_new_instance(0, 0, re2_cScanner);
  TypedData_Get_Struct(scanner, re2_scanner, &re2_scanner_data_type, c);

  c->input = new(std::nothrow) re2::StringPiece(
      RSTRING_PTR(text), RSTRING_LEN(text));
  RB_OBJ_WRITE(scanner, &c->regexp, self);
  RB_OBJ_WRITE(scanner, &c->text, text);

  if (p->pattern->ok()) {
    c->number_of_capturing_groups = p->pattern->NumberOfCapturingGroups();
  } else {
    c->number_of_capturing_groups = 0;
  }

  c->eof = false;

  return scanner;
}

#source ⇒ `String`

Returns a string version of the regular expression.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.source #=> "woo?"

Returns:

(String) —
a string version of the regular expression

[View source]

# File 'ext/re2/re2.cc', line 979

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#to_s ⇒ `String`

Returns a string version of the regular expression.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.to_s #=> "woo?"

Returns:

(String) —
a string version of the regular expression

[View source]

# File 'ext/re2/re2.cc', line 979

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#to_str ⇒ `String`

Returns a string version of the regular expression.

Examples:

re2 = RE2::Regexp.new("woo?")
re2.to_str #=> "woo?"

Returns:

(String) —
a string version of the regular expression

[View source]

# File 'ext/re2/re2.cc', line 979

static VALUE re2_regexp_to_s(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return encoded_str_new(p->pattern->pattern().data(),
      p->pattern->pattern().size(),
      p->pattern->options().encoding());
}

#utf8? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the utf8 option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", utf8: true)
re2.utf8? #=> true

Returns:

(Boolean) —
the utf8 option

[View source]

# File 'ext/re2/re2.cc', line 1012

static VALUE re2_regexp_utf8(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().encoding() == RE2::Options::EncodingUTF8);
}

#word_boundary? ⇒ `Boolean`

Returns whether or not the regular expression was compiled with the word_boundary option set to true.

Examples:

re2 = RE2::Regexp.new("woo?", word_boundary: true)
re2.word_boundary? #=> true

Returns:

(Boolean) —
the word_boundary option

[View source]

# File 'ext/re2/re2.cc', line 1169

static VALUE re2_regexp_word_boundary(const VALUE self) {
  re2_pattern *p;
  TypedData_Get_Struct(self, re2_pattern, &re2_regexp_data_type, p);

  return BOOL2RUBY(p->pattern->options().word_boundary());
}
