re2 
Ruby bindings to RE2, a "fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python".
Current version: 2.1.2
Supported Ruby versions: 2.6, 2.7, 3.0, 3.1, 3.2
Bundled RE2 version: libre2.11 (2023-09-01)
Supported RE2 versions: libre2.0 (< 2020-03-02), libre2.1 (2020-03-02), libre2.6 (2020-03-03), libre2.7 (2020-05-01), libre2.8 (2020-07-06), libre2.9 (2020-11-01), libre2.10 (2022-12-01), libre2.11 (2023-07-01)
Installation
The gem comes bundled with a version of RE2 and will compile itself (and any dependencies) on install. As compilation can take a while, precompiled native gems are available for Linux, Windows and macOS.
In v2.0 and later, precompiled native gems are available for Ruby 2.6 to 3.2 on these platforms:
- aarch64-linux (requires: glibc >= 2.29)
- arm-linux (requires: glibc >= 2.29)
- arm64-darwin
- x64-mingw32 / x64-mingw-ucrt
- x86-linux (requires: glibc >= 2.17)
- x86_64-darwin
- x86_64-linux (requires: glibc >= 2.17)
If you wish to opt out of using the bundled libraries, you will need RE2 installed as well as a C++ compiler such as gcc (on Debian and Ubuntu, this is provided by the build-essential package). If you are using macOS, I recommend installing RE2 with Homebrew by running the following:
$ brew install re2
If you are using Debian, you can install the libre2-dev package like so:
$ sudo apt-get install libre2-dev
Recent versions of RE2 require a compiler with C++14 support such as clang 3.4 or gcc 5.
If you are using a packaged Ruby distribution, make sure you also have the Ruby header files installed such as those provided by the ruby-dev package on Debian and Ubuntu.
You can then install the library via RubyGems with gem install re2 --platform=ruby -- --enable-system-libraries or gem install re2 --platform=ruby -- --enable-system-libraries --with-re2-dir=/path/to/re2/prefix if RE2 is not installed in any of the following default locations:
/usr/local
/opt/homebrew
/usr
Alternatively, you can set the RE2_USE_SYSTEM_LIBRARIES environment variable instead of passing --enable-system-libraries to the gem command.
If you're using Bundler, you can use the force_ruby_platform option in your Gemfile.
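As a minimal sketch, a Gemfile using that option might look like the following. It assumes a version of Bundler recent enough to support the per-gem force_ruby_platform option, and the source URL is only the usual default, not something this README prescribes.

```ruby
# Gemfile (sketch): ask Bundler for the "ruby" platform gem rather than a
# precompiled native gem, so re2 is compiled from source on install.
source "https://rubygems.org"

gem "re2", force_ruby_platform: true
```

Combined with the RE2_USE_SYSTEM_LIBRARIES environment variable described above, this should let a Bundler-managed project build against a system copy of RE2.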
Documentation
Full documentation automatically generated from the latest version is available at http://mudge.name/re2/.
Note that RE2's regular expression syntax differs from PCRE and Ruby's built-in Regexp library; see the official syntax page for more details.
Usage
While re2 uses the same naming scheme as Ruby's built-in regular expression library (with Regexp and MatchData), its API is slightly different:
$ irb -rubygems
> require 're2'
> r = RE2::Regexp.new('w(\d)(\d+)')
=> #<RE2::Regexp /w(\d)(\d+)/>
> m = r.match("w1234")
=> #<RE2::MatchData "w1234" 1:"1" 2:"234">
> m[1]
=> "1"
> m.string
=> "w1234"
> m.begin(1)
=> 1
> m.end(1)
=> 2
> r =~ "w1234"
=> true
> r !~ "bob"
=> true
> r.match("bob")
=> nil
As RE2::Regexp.new (or RE2::Regexp.compile) can be quite verbose, a helper method has been defined against Kernel so you can use a shorter version to create regular expressions:
> RE2('(\d+)')
=> #<RE2::Regexp /(\d+)/>
Note the use of single quotes: inside double quotes, Ruby will interpret \d as d before RE2 ever sees the pattern, as in the following example:
> RE2("(\d+)")
=> #<RE2::Regexp /(d+)/>
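The difference comes from Ruby's string literals rather than from RE2, so it can be checked without the gem. The snippet below is a plain Ruby sketch (the variable names are only illustrative) comparing what each quoting style actually stores:

```ruby
# Compare the text that each quoting style produces for the same pattern.
single  = '(\d+)'   # single quotes keep the backslash
double  = "(\d+)"   # double quotes drop the backslash from the unknown escape \d
escaped = "(\\d+)"  # doubling the backslash keeps it inside double quotes

puts single             # prints (\d+)
puts double             # prints (d+)
puts single == escaped  # prints true
```

If a pattern needs interpolation and therefore double quotes, doubling each backslash as in the third line gives RE2 the same text as the single-quoted form.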
As of 0.3.0, you can use named groups:
> r = RE2::Regexp.new('(?P<name>\w+) (?P<age>\d+)')
=> #<RE2::Regexp /(?P<name>\w+) (?P<age>\d+)/>
> m = r.match("Bob 40")
=> #<RE2::MatchData "Bob 40" 1:"Bob" 2:"40">
> m[:name]
=> "Bob"
> m["age"]
=> "40"
As of 0.6.0, you can use RE2::Regexp#scan to incrementally scan text for matches (similar in purpose to Ruby's String#scan). Calling scan will return an RE2::Scanner which is enumerable meaning you can use each to iterate through the matches (and even use Enumerator::Lazy):
re = RE2('(\w+)')
scanner = re.scan("It is a truth universally acknowledged")
scanner.each do |match|
  puts match
end
scanner.rewind
enum = scanner.to_enum
enum.next #=> ["It"]
enum.next #=> ["is"]
As of 1.5.0, you can use RE2::Set to match multiple patterns against a string. Calling RE2::Set#add with a pattern will return an integer index of the pattern. After all patterns have been added, the set can be compiled using RE2::Set#compile, and then RE2::Set#match will return an Array<Integer> containing the indices of all the patterns that matched.
set = RE2::Set.new
set.add("abc") #=> 0
set.add("def") #=> 1
set.add("ghi") #=> 2
set.compile #=> true
set.match("abcdefghi") #=> [0, 1, 2]
set.match("ghidefabc") #=> [2, 1, 0]
As of 1.6.0, you can use Ruby's pattern matching against RE2::MatchData
with both array patterns and hash patterns:
case RE2('(\w+) (\d+)').match("Alice 42")
in [name, age]
  puts "My name is #{name} and I am #{age} years old"
else
  puts "No match!"
end
# My name is Alice and I am 42 years old
case RE2('(?P<name>\w+) (?P<age>\d+)').match("Alice 42")
in {name:, age:}
  puts "My name is #{name} and I am #{age} years old"
else
  puts "No match!"
end
# My name is Alice and I am 42 years old
Encoding
Note RE2 only supports UTF-8 and ISO-8859-1 encoding so strings will be returned in UTF-8 by default or ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (any other encoding's behaviour is undefined).
For backward compatibility: re2 won't automatically convert string inputs to the right encoding so this is the responsibility of the caller, e.g.
# By default, RE2 will process patterns and text as UTF-8
RE2(non_utf8_pattern.encode("UTF-8")).match(non_utf8_text.encode("UTF-8"))
# If the :utf8 option is false, RE2 will process patterns and text as ISO-8859-1
RE2(non_latin1_pattern.encode("ISO-8859-1"), :utf8 => false).match(non_latin1_text.encode("ISO-8859-1"))
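To make the conversion step concrete, here is a plain Ruby sketch of what String#encode does to a Latin-1 string before it would be handed to RE2. It does not call re2 itself, and the sample strings are illustrative assumptions.

```ruby
# A Latin-1 string: the single byte 0xE9 is "é" in ISO-8859-1.
latin1 = "caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)

# Transcode to UTF-8 before passing the text to a default (UTF-8) pattern.
utf8 = latin1.encode(Encoding::UTF_8)
puts utf8.encoding  # prints UTF-8
puts utf8.bytesize  # prints 5, because "é" takes two bytes in UTF-8

# Converting the other way (for patterns created with :utf8 => false) can
# fail when a character has no ISO-8859-1 equivalent, such as the euro sign.
begin
  "\u20AC".encode(Encoding::ISO_8859_1)
rescue Encoding::UndefinedConversionError => e
  puts "cannot convert: #{e.message}"
end
```

Since re2 leaves conversion to the caller, rescuing Encoding::UndefinedConversionError (or passing replacement options to encode) is one way to decide what happens to text that cannot be represented.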
Features
- Pre-compiling regular expressions with RE2::Regexp.new(re), RE2::Regexp.compile(re) or RE2(re) (including specifying options, e.g. RE2::Regexp.new("pattern", :case_sensitive => false))
- Extracting matches with re2.match(text) (and an exact number of matches with re2.match(text, number_of_matches) such as re2.match("123-234", 2))
- Extracting matches by name (both with strings and symbols)
- Checking for matches with re2 =~ text, re2 === text (for use in case statements) and re2 !~ text
- Incrementally scanning text with re2.scan(text)
- Search a collection of patterns simultaneously with RE2::Set
- Checking regular expression compilation with re2.ok?, re2.error and re2.error_arg
- Checking regular expression "cost" with re2.program_size
- Checking the options for an expression with re2.options or individually with re2.case_sensitive?
- Performing a single string replacement with pattern.replace(replacement, original)
- Performing a global string replacement with pattern.replace_all(replacement, original)
- Escaping regular expressions with RE2.escape(unquoted) and RE2.quote(unquoted)
- Pattern matching with RE2::MatchData
Contributions
- Thanks to Jason Woods who contributed the original implementations of RE2::MatchData#begin and RE2::MatchData#end;
- Thanks to Stefano Rivera who first contributed C++11 support;
- Thanks to Stan Hu for reporting a bug with empty patterns and RE2::Regexp#scan, contributing support for libre2.11 (2023-07-01) and for vendoring RE2 and abseil and compiling native gems in 2.0;
- Thanks to Sebastian Reitenbach for reporting the deprecation and removal of the utf8 encoding option in RE2;
- Thanks to Sergio Medina for reporting a bug when using RE2::Scanner#scan with an invalid regular expression;
- Thanks to Pritam Baral for contributing the initial support for RE2::Set.
Contact
All issues and suggestions should go to GitHub Issues.
License
This library is licensed under the BSD 3-Clause License, see LICENSE.txt.
Dependencies
The source code of RE2 is distributed in the ruby platform gem. This code is licensed under the BSD 3-Clause License, see LICENSE-DEPENDENCIES.txt.
The source code of Abseil is distributed in the ruby platform gem. This code is licensed under the Apache License 2.0, see LICENSE-DEPENDENCIES.txt.