re2 
Ruby bindings to RE2, a "fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python".
Current version: 2.4.3
Bundled RE2 version: libre2.11 (2023-11-01)
Supported Ruby versions: 2.6, 2.7, 3.0, 3.1, 3.2
Supported RE2 versions: libre2.0 (< 2020-03-02), libre2.1 (2020-03-02), libre2.6 (2020-03-03), libre2.7 (2020-05-01), libre2.8 (2020-07-06), libre2.9 (2020-11-01), libre2.10 (2022-12-01), libre2.11 (2023-07-01)
Installation
The gem comes bundled with a version of RE2 and will compile itself (and any dependencies) on install. As compilation can take a while, precompiled native gems are available for Linux, Windows and macOS.
In v2.0 and later, precompiled native gems are available for Ruby 2.6 to 3.2 on these platforms:
- aarch64-linux (requires: glibc >= 2.29)
- arm-linux (requires: glibc >= 2.29)
- arm64-darwin
- x64-mingw32 / x64-mingw-ucrt
- x86-linux (requires: glibc >= 2.17)
- x86_64-darwin
- x86_64-linux (requires: glibc >= 2.17)
If you wish to opt out of using the bundled libraries, you will need RE2 installed as well as a C++ compiler such as gcc (on Debian and Ubuntu, this is provided by the build-essential package). If you are using macOS, I recommend installing RE2 with Homebrew by running the following:
$ brew install re2
If you are using Debian, you can install the libre2-dev package like so:
$ sudo apt-get install libre2-dev
Recent versions of RE2 require CMake and a compiler with C++14 support such as clang 3.4 or gcc 5.
If you are using a packaged Ruby distribution, make sure you also have the Ruby header files installed such as those provided by the ruby-dev package on Debian and Ubuntu.
You can then install the library via RubyGems with:
$ gem install re2 --platform=ruby -- --enable-system-libraries
If RE2 is not installed in any of the default locations (/usr/local, /opt/homebrew or /usr), point the build at it with --with-re2-dir:
$ gem install re2 --platform=ruby -- --enable-system-libraries --with-re2-dir=/path/to/re2/prefix
Alternatively, you can set the RE2_USE_SYSTEM_LIBRARIES environment variable instead of passing --enable-system-libraries to the gem command.
If you're using Bundler, you can use the force_ruby_platform option in your Gemfile.
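For example, a Gemfile entry along these lines should make Bundler compile re2 from source rather than fetch a precompiled native gem. This is a sketch assuming a Bundler version recent enough to accept force_ruby_platform as a per-gem option; to build against a system RE2, the RE2_USE_SYSTEM_LIBRARIES environment variable described above could be set when running bundle install.

```ruby
# Gemfile (sketch)
source "https://rubygems.org"

# force_ruby_platform: true asks Bundler to install the "ruby" platform gem,
# which is compiled on install, instead of a precompiled native gem.
gem "re2", force_ruby_platform: true
```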
Windows users attempting to compile abseil must use pkgconf 2.1.0 or later, or builds will fail with undefined reference errors:
$ pacman -Sy mingw64/mingw-w64-x86_64-pkgconf
This is not needed when using the precompiled gem or building against a system RE2 library.
Documentation
Full documentation automatically generated from the latest version is available at http://mudge.name/re2/.
Important: RE2's regular expression syntax differs from PCRE and Ruby's built-in Regexp library; see the official syntax page for more details.
Usage
While re2 uses the same naming scheme as Ruby's built-in regular expression library (with Regexp and MatchData), its API is slightly different:
require "re2"
r = RE2::Regexp.new('w(\d)(\d+)') # => #<RE2::Regexp /w(\d)(\d+)/>
m = r.match("w1234") # => #<RE2::MatchData "w1234" 1:"1" 2:"234">
m[1] # => "1"
# Improve performance by requesting fewer submatches
m = r.match("w1234", 1) # => #<RE2::MatchData "w1234" 1:"1">
# Or no submatches at all
r.match("w1234", 0) # => true
r =~ "w1234" # => true
As RE2::Regexp.new (or RE2::Regexp.compile) can be quite verbose, a helper method has been defined against Kernel so you can use a shorter version to create regular expressions:
RE2('(\d+)') # => #<RE2::Regexp /(\d+)/>
Note the use of single quotes, as double quotes will interpret \d as d, as in the following example:
RE2("(\d+)") # => #<RE2::Regexp /(d+)/>
As of 0.3.0, you can use named groups:
r = RE2::Regexp.new('(?P<name>\w+) (?P<age>\d+)')
# => #<RE2::Regexp /(?P<name>\w+) (?P<age>\d+)/>
m = r.match("Bob 40") # => #<RE2::MatchData "Bob 40" 1:"Bob" 2:"40">
m[:name] # => "Bob"
m["age"] # => "40"
As of 0.6.0, you can use RE2::Regexp#scan to incrementally scan text for matches (similar in purpose to Ruby's String#scan). Calling scan will return an RE2::Scanner which is enumerable, meaning you can use each to iterate through the matches (and even use Enumerator::Lazy):
re = RE2('(\w+)')
scanner = re.scan("It is a truth universally acknowledged")
scanner.each do |match|
puts match
end
scanner.rewind
enum = scanner.to_enum
enum.next #=> ["It"]
enum.next #=> ["is"]
As of 1.5.0, you can use RE2::Set to match multiple patterns against a string. Calling RE2::Set#add with a pattern will return an integer index of the pattern. After all patterns have been added, the set can be compiled using RE2::Set#compile, and then RE2::Set#match will return an Array<Integer> containing the indices of all the patterns that matched.
set = RE2::Set.new
set.add("abc") #=> 0
set.add("def") #=> 1
set.add("ghi") #=> 2
set.compile #=> true
set.match("abcdefghi") #=> [0, 1, 2]
set.match("ghidefabc") #=> [2, 1, 0]
As of 1.6.0, you can use Ruby's pattern matching against RE2::MatchData with both array patterns and hash patterns:
case RE2('(\w+) (\d+)').match("Alice 42")
in [name, age]
puts "My name is #{name} and I am #{age} years old"
else
puts "No match!"
end
# My name is Alice and I am 42 years old
case RE2('(?P<name>\w+) (?P<age>\d+)').match("Alice 42")
in {name:, age:}
puts "My name is #{name} and I am #{age} years old"
else
puts "No match!"
end
# My name is Alice and I am 42 years old
Encoding
Important: RE2 only supports UTF-8 and ISO-8859-1 encoding, so strings will be returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (any other encoding's behaviour is undefined).
For backward compatibility, re2 won't automatically convert string inputs to the right encoding, so this is the responsibility of the caller, e.g.
# By default, RE2 will process patterns and text as UTF-8
RE2(non_utf8_pattern.encode("UTF-8")).match(non_utf8_text.encode("UTF-8"))
# If the :utf8 option is false, RE2 will process patterns and text as ISO-8859-1
RE2(non_latin1_pattern.encode("ISO-8859-1"), :utf8 => false).match(non_latin1_text.encode("ISO-8859-1"))
Features
- Pre-compiling regular expressions with RE2::Regexp.new(re), RE2::Regexp.compile(re) or RE2(re) (including specifying options, e.g. RE2::Regexp.new("pattern", :case_sensitive => false))
- Extracting matches with re2.match(text) (and an exact number of submatches with re2.match(text, number_of_submatches) such as re2.match("123-234", 2))
- Extracting matches by name (both with strings and symbols)
- Checking for matches with re2 =~ text, re2 === text (for use in case statements) and re2 !~ text
- Incrementally scanning text with re2.scan(text)
- Searching a collection of patterns simultaneously with RE2::Set
- Checking regular expression compilation with re2.ok?, re2.error and re2.error_arg
- Checking regular expression "cost" with re2.program_size
- Checking the options for an expression with re2.options or individually with re2.case_sensitive?
- Performing a single string replacement with pattern.replace(replacement, original)
- Performing a global string replacement with pattern.replace_all(replacement, original)
- Escaping regular expressions with RE2.escape(unquoted) and RE2.quote(unquoted)
- Pattern matching with RE2::MatchData
Contributions
- Thanks to Jason Woods who contributed the original implementations of RE2::MatchData#begin and RE2::MatchData#end.
- Thanks to Stefano Rivera who first contributed C++11 support.
- Thanks to Stan Hu for reporting a bug with empty patterns and RE2::Regexp#scan, contributing support for libre2.11 (2023-07-01) and for vendoring RE2 and abseil and compiling native gems in 2.0.
- Thanks to Sebastian Reitenbach for reporting the deprecation and removal of the utf8 encoding option in RE2.
- Thanks to Sergio Medina for reporting a bug when using RE2::Scanner#scan with an invalid regular expression.
- Thanks to Pritam Baral for contributing the initial support for RE2::Set.
- Thanks to Mike Dalessio for reviewing the precompilation of native gems in 2.0.
- Thanks to Peter Zhu for ruby_memcheck and helping find the memory leaks fixed in 2.1.3.
- Thanks to Jean Boussier for contributing the switch to Ruby's TypedData API and the resulting garbage collection improvements in 2.4.0.
Contact
All issues and suggestions should go to GitHub Issues.
License
This library is licensed under the BSD 3-Clause License; see LICENSE.txt.
Dependencies
The source code of RE2 is distributed in the ruby platform gem. This code is licensed under the BSD 3-Clause License; see LICENSE-DEPENDENCIES.txt.
The source code of Abseil is distributed in the ruby platform gem. This code is licensed under the Apache License 2.0; see LICENSE-DEPENDENCIES.txt.