re2

Ruby bindings to RE2, a "fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python".

Current version: 2.4.3
Bundled RE2 version: libre2.11 (2023-11-01)
Supported Ruby versions: 2.6, 2.7, 3.0, 3.1, 3.2
Supported RE2 versions: libre2.0 (< 2020-03-02), libre2.1 (2020-03-02), libre2.6 (2020-03-03), libre2.7 (2020-05-01), libre2.8 (2020-07-06), libre2.9 (2020-11-01), libre2.10 (2022-12-01), libre2.11 (2023-07-01)

Installation

The gem comes bundled with a version of RE2 and will compile itself (and any dependencies) on install. As compilation can take a while, precompiled native gems are available for Linux, Windows and macOS.

In v2.0 and later, precompiled native gems are available for Ruby 2.6 to 3.2 on these platforms:

  • aarch64-linux (requires: glibc >= 2.29)
  • arm-linux (requires: glibc >= 2.29)
  • arm64-darwin
  • x64-mingw32 / x64-mingw-ucrt
  • x86-linux (requires: glibc >= 2.17)
  • x86_64-darwin
  • x86_64-linux (requires: glibc >= 2.17)

If you wish to opt out of using the bundled libraries, you will need RE2 installed as well as a C++ compiler such as gcc (on Debian and Ubuntu, this is provided by the build-essential package). If you are using macOS, I recommend installing RE2 with Homebrew by running the following:

$ brew install re2

If you are using Debian, you can install the libre2-dev package like so:

$ sudo apt-get install libre2-dev

Recent versions of RE2 require CMake and a compiler with C++14 support such as clang 3.4 or gcc 5.

If you are using a packaged Ruby distribution, make sure you also have the Ruby header files installed such as those provided by the ruby-dev package on Debian and Ubuntu.

You can then install the library via RubyGems with gem install re2 --platform=ruby -- --enable-system-libraries. If RE2 is not installed in one of the default locations listed below, use gem install re2 --platform=ruby -- --enable-system-libraries --with-re2-dir=/path/to/re2/prefix instead. The default locations are:

  • /usr/local
  • /opt/homebrew
  • /usr

Alternatively, you can set the RE2_USE_SYSTEM_LIBRARIES environment variable instead of passing --enable-system-libraries to the gem command.

If you're using Bundler, you can use the force_ruby_platform option in your Gemfile.
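As a minimal sketch of the Bundler route, a Gemfile entry might look like the following. It assumes a Bundler version that supports the per-gem force_ruby_platform option (2.3.18 or later, as far as I know). It also assumes that RE2_USE_SYSTEM_LIBRARIES is set in the environment when running bundle install if you want to build against a system RE2.

```ruby
# Gemfile
source "https://rubygems.org"

# Skip the precompiled native gem and compile the "ruby" platform gem instead.
# Assumption: Bundler 2.3.18 or later, which accepts this per-gem option.
gem "re2", force_ruby_platform: true
```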

Windows users attempting to compile abseil must use pkgconf 2.1.0 or later, or builds will fail with undefined reference errors:

pacman -Sy mingw64/mingw-w64-x86_64-pkgconf

This is not needed when using the precompiled gem or building against a system RE2 library.

Documentation

Full documentation automatically generated from the latest version is available at http://mudge.name/re2/.

[!IMPORTANT] Note that RE2's regular expression syntax differs from PCRE and Ruby's built-in Regexp library; see the official syntax page for more details.

Usage

While re2 uses the same naming scheme as Ruby's built-in regular expression library (with Regexp and MatchData), its API is slightly different:

require "re2"

r = RE2::Regexp.new('w(\d)(\d+)') # => #<RE2::Regexp /w(\d)(\d+)/>
m = r.match("w1234")              # => #<RE2::MatchData "w1234" 1:"1" 2:"234">
m[1]                              # => "1"

# Improve performance by requesting fewer submatches
m = r.match("w1234", 1)           # => #<RE2::MatchData "w1234" 1:"1">

# Or no submatches at all
r.match("w1234", 0)               # => true
r =~ "w1234"                      # => true

As RE2::Regexp.new (or RE2::Regexp.compile) can be quite verbose, a helper method has been defined against Kernel so you can use a shorter version to create regular expressions:

RE2('(\d+)') # => #<RE2::Regexp /(\d+)/>

Note the use of single quotes: in a double-quoted string, Ruby interprets \d as a plain d, as in the following example:

RE2("(\d+)") # => #<RE2::Regexp /(d+)/>
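The difference comes from Ruby's string literal rules rather than from re2, so it can be checked without the gem. This is a small illustration using plain strings (the variable names are made up for the example):

```ruby
single  = '(\d+)'   # single quotes keep the backslash
double  = "(\d+)"   # double quotes drop the backslash of an unknown escape
escaped = "(\\d+)"  # doubling the backslash preserves it in double quotes

single.length     # => 5
double            # => "(d+)"
single == escaped # => true
```

If you need interpolation in a pattern, doubling each backslash in a double-quoted string gives the same result as the single-quoted form.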

As of 0.3.0, you can use named groups:

r = RE2::Regexp.new('(?P<name>\w+) (?P<age>\d+)')
# => #<RE2::Regexp /(?P<name>\w+) (?P<age>\d+)/>
m = r.match("Bob 40") # => #<RE2::MatchData "Bob 40" 1:"Bob" 2:"40">
m[:name]              # => "Bob"
m["age"]              # => "40"

As of 0.6.0, you can use RE2::Regexp#scan to incrementally scan text for matches (similar in purpose to Ruby's String#scan). Calling scan will return an RE2::Scanner, which is enumerable, meaning you can use each to iterate through the matches (and even use Enumerator::Lazy):

re = RE2('(\w+)')
scanner = re.scan("It is a truth universally acknowledged")
scanner.each do |match|
  puts match
end

scanner.rewind

enum = scanner.to_enum
enum.next #=> ["It"]
enum.next #=> ["is"]

As of 1.5.0, you can use RE2::Set to match multiple patterns against a string. Calling RE2::Set#add with a pattern will return an integer index of the pattern. After all patterns have been added, the set can be compiled using RE2::Set#compile, and then RE2::Set#match will return an Array<Integer> containing the indices of all the patterns that matched.

set = RE2::Set.new
set.add("abc") #=> 0
set.add("def") #=> 1
set.add("ghi") #=> 2
set.compile #=> true
set.match("abcdefghi") #=> [0, 1, 2]
set.match("ghidefabc") #=> [2, 1, 0]

As of 1.6.0, you can use Ruby's pattern matching against RE2::MatchData with both array patterns and hash patterns:

case RE2('(\w+) (\d+)').match("Alice 42")
in [name, age]
  puts "My name is #{name} and I am #{age} years old"
else
  puts "No match!"
end
# My name is Alice and I am 42 years old


case RE2('(?P<name>\w+) (?P<age>\d+)').match("Alice 42")
in {name:, age:}
  puts "My name is #{name} and I am #{age} years old"
else
  puts "No match!"
end
# My name is Alice and I am 42 years old

Encoding

[!IMPORTANT] Note that RE2 only supports the UTF-8 and ISO-8859-1 encodings, so strings will be returned in UTF-8 by default, or in ISO-8859-1 if the :utf8 option for the RE2::Regexp is set to false (any other encoding's behaviour is undefined).

For backward compatibility, re2 won't automatically convert string inputs to the right encoding, so this is the responsibility of the caller, e.g.

# By default, RE2 will process patterns and text as UTF-8
RE2(non_utf8_pattern.encode("UTF-8")).match(non_utf8_text.encode("UTF-8"))

# If the :utf8 option is false, RE2 will process patterns and text as ISO-8859-1
RE2(non_latin1_pattern.encode("ISO-8859-1"), :utf8 => false).match(non_latin1_text.encode("ISO-8859-1"))
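The conversion itself is plain Ruby and can be tried without re2. A minimal sketch, assuming a hypothetical helper named to_utf8 (not part of the gem) that callers could apply to patterns and text before handing them to RE2:

```ruby
# Hypothetical helper (not provided by re2): transcode to UTF-8 only
# when the string is not already tagged as UTF-8.
def to_utf8(text)
  text.encoding == Encoding::UTF_8 ? text : text.encode(Encoding::UTF_8)
end

# "café" as ISO-8859-1 bytes, where é is the single byte 0xE9.
latin1 = [0x63, 0x61, 0x66, 0xE9].pack("C*").force_encoding(Encoding::ISO_8859_1)

utf8 = to_utf8(latin1)
utf8.encoding # => #<Encoding:UTF-8>
utf8.bytes    # => [99, 97, 102, 195, 169]
```

Note that encode transcodes the bytes, whereas force_encoding only relabels them. Relabelling a string that is not actually valid UTF-8 would still leave RE2 with input it does not support.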

Features

  • Pre-compiling regular expressions with RE2::Regexp.new(re), RE2::Regexp.compile(re) or RE2(re) (including specifying options, e.g. RE2::Regexp.new("pattern", :case_sensitive => false))

  • Extracting matches with re2.match(text) (and an exact number of submatches with re2.match(text, number_of_submatches) such as re2.match("123-234", 2))

  • Extracting matches by name (both with strings and symbols)

  • Checking for matches with re2 =~ text, re2 === text (for use in case statements) and re2 !~ text

  • Incrementally scanning text with re2.scan(text)

  • Searching a collection of patterns simultaneously with RE2::Set

  • Checking regular expression compilation with re2.ok?, re2.error and re2.error_arg

  • Checking regular expression "cost" with re2.program_size

  • Checking the options for an expression with re2.options or individually with re2.case_sensitive?

  • Performing a single string replacement with pattern.replace(replacement, original)

  • Performing a global string replacement with pattern.replace_all(replacement, original)

  • Escaping regular expressions with RE2.escape(unquoted) and RE2.quote(unquoted)

  • Pattern matching with RE2::MatchData

Contributions

  • Thanks to Jason Woods who contributed the original implementations of RE2::MatchData#begin and RE2::MatchData#end.
  • Thanks to Stefano Rivera who first contributed C++11 support.
  • Thanks to Stan Hu for reporting a bug with empty patterns and RE2::Regexp#scan, contributing support for libre2.11 (2023-07-01) and for vendoring RE2 and abseil and compiling native gems in 2.0.
  • Thanks to Sebastian Reitenbach for reporting the deprecation and removal of the utf8 encoding option in RE2.
  • Thanks to Sergio Medina for reporting a bug when using RE2::Scanner#scan with an invalid regular expression.
  • Thanks to Pritam Baral for contributing the initial support for RE2::Set.
  • Thanks to Mike Dalessio for reviewing the precompilation of native gems in 2.0.
  • Thanks to Peter Zhu for ruby_memcheck and helping find the memory leaks fixed in 2.1.3.
  • Thanks to Jean Boussier for contributing the switch to Ruby's TypedData API and the resulting garbage collection improvements in 2.4.0.

Contact

All issues and suggestions should go to GitHub Issues.

License

This library is licensed under the BSD 3-Clause License; see LICENSE.txt.

Dependencies

The source code of RE2 is distributed in the ruby platform gem. This code is licensed under the BSD 3-Clause License; see LICENSE-DEPENDENCIES.txt.

The source code of Abseil is distributed in the ruby platform gem. This code is licensed under the Apache License 2.0; see LICENSE-DEPENDENCIES.txt.