regex:cheat_sheet [2021/05/20 23:54] (current) – peter (previous revision [2020/08/25 19:32] – 192.168.1.1)
====== Regex - Cheat Sheet ======

<code>
Cheat Sheet
Character classes
.             any character except newline
\w \d \s      word, digit, whitespace
\W \D \S      not word, not digit, not whitespace
[abc]         any of a, b, or c
[^abc]        not a, b, or c
[a-g]         any character from a to g
Anchors
^abc$         start / end of the string (of the line in multiline mode)
\b            word boundary
Escaped characters
\. \* \\      escaped special characters
\t \n \r      tab, linefeed, carriage return
\u00A9        Unicode escape for ©
Groups & Lookaround
(abc)         capture group
\1            backreference to group #1
(?:abc)       non-capturing group
(?=abc)       positive lookahead
(?!abc)       negative lookahead
Quantifiers & Alternation
a* a+ a?      0 or more, 1 or more, 0 or 1
a{5} a{2,}    exactly five, two or more
a{1,3}        between one & three
a+? a{2,}?    lazy: match as few as possible
ab|cd         match ab or cd
</code>
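
A few of the entries above, checked with Python 3's ''re'' module. Python is only an assumption for illustration (the cheat sheet itself is engine-neutral), the sample strings are made up, and other engines may differ in details.

<code python>
import re

# Character classes: \d = digit
digits = re.findall(r"\d", "a1 b2 c3")                  # ['1', '2', '3']

# Anchors: ^ and $ tie the pattern to the start and end of the string
whole = re.search(r"^abc$", "abc") is not None          # True
partial = re.search(r"^abc$", "xabc") is not None       # False

# \b word boundary: "cat" as a whole word, not inside "concat"
words = re.findall(r"\bcat\b", "cat concat cat.")       # ['cat', 'cat']

# Escaped characters: \. is a literal dot, \u00A9 is the copyright sign
dots = re.findall(r"\.", "a.b.c")                       # ['.', '.']
copyright_sign = re.search(r"\u00A9", "\u00A9 2021").group()

# Capture group + backreference \1: find a doubled word
doubled = re.search(r"\b(\w+) \1\b", "it is is here").group(1)   # 'is'

# Non-capturing group: (?:ab)+ repeats "ab" but is not numbered, so (c) is group 1
noncap = re.fullmatch(r"(?:ab)+(c)", "ababc").groups()  # ('c',)

# Quantifiers
exactly_five = re.findall(r"a{5}", "aaaaaaa")           # ['aaaaa']
one_to_three = re.findall(r"a{1,3}", "aaaaa")           # ['aaa', 'aa']

# Greedy vs lazy: .+ takes as much as possible, .+? as little as possible
greedy = re.findall(r"<.+>", "<a><b>")                  # ['<a><b>']
lazy = re.findall(r"<.+?>", "<a><b>")                   # ['<a>', '<b>']

# Alternation
either = re.findall(r"ab|cd", "ab xx cd")               # ['ab', 'cd']
</code>

The greedy ''%%<.+>%%'' swallows both tags in one match, while the lazy ''%%<.+?>%%'' stops at the first ''>''.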

----

===== Basic regex =====
^Character^Legend^Example^Sample Match^
|\s|Whitespace character. In .NET, Python 3 and JavaScript this includes any Unicode separator.|a\sb\sc|a b c|
|\D|One character that is not a digit as defined by your engine's \d|\D\D\D|ABC|
|\W|One character that is not a word character as defined by your engine's \w|\W\W\W\W\W|<nowiki>*-+=)</nowiki>|
|\S|One character that is not a whitespace character as defined by your engine's \s|\S\S\S\S|Yoyo|
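
The sample rows above can be checked with Python 3's ''re'' module (an assumed engine, chosen for illustration). The sketch also shows why the table says "as defined by your engine": for ''str'' patterns, Python 3's ''\s'' and ''\d'' are Unicode-aware unless ''re.ASCII'' is passed. Test strings other than the table's samples are made up.

<code python>
import re

# Sample rows from the table, each required to match the whole sample
rows = [
    (r"a\sb\sc", "a b c"),
    (r"\D\D\D", "ABC"),
    (r"\W\W\W\W\W", "*-+=)"),
    (r"\S\S\S\S", "Yoyo"),
]
results = [re.fullmatch(p, s) is not None for p, s in rows]
print(results)  # [True, True, True, True]

# \s is Unicode-aware by default: it also matches a no-break space (U+00A0)
nbsp_unicode = re.fullmatch(r"a\sb", "a\u00a0b") is not None            # True
nbsp_ascii = re.fullmatch(r"a\sb", "a\u00a0b", re.ASCII) is not None    # False

# \D is "not \d", so it inherits the engine's notion of a digit:
# U+0663 (Arabic-Indic digit three) is a digit in Unicode mode, not under re.ASCII
arabic_three = "\u0663"
d_unicode = re.fullmatch(r"\D", arabic_three) is not None               # False
d_ascii = re.fullmatch(r"\D", arabic_three, re.ASCII) is not None       # True
</code>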

^Lookaround^Legend^Example^Sample Match^
|(?=…)|Positive lookahead|(?=\d{10})\d{5}|01234 in 0123456789|
|<nowiki>(?<=…)</nowiki>|Positive lookbehind|<nowiki>(?<=\d)cat</nowiki>|cat in 1cat|
|(?!…)|Negative lookahead|(?!theatre)the\w+|theme|
|(?<!…)|Negative lookbehind|\w{3}(?<!mon)ster|Munster|

^Class Operation^Legend^Example^Sample Match^
|<nowiki>[…-[…]]</nowiki>|.NET: character class subtraction.|<nowiki>[\p{IsArabic}-[\D]]</nowiki>|An Arabic character that is not a non-digit, i.e., an Arabic digit|
|<nowiki>[…&&[…]]</nowiki>|Java, Ruby 2+: character class intersection. One character that is both in the class on the left and in the class after &&.|<nowiki>[\S&&[\D]]</nowiki>|A non-whitespace character that is not a digit.|
|<nowiki>[…&&[…]]</nowiki>|Java, Ruby 2+: chained character class intersection.|<nowiki>[\S&&[\D]&&[^a-zA-Z]]</nowiki>|A non-whitespace character that is neither a digit nor an ASCII letter.|
|<nowiki>[…&&[^…]]</nowiki>|Java, Ruby 2+: character class subtraction is obtained by intersecting a class with a negated class.|<nowiki>[a-z&&[^aeiou]]</nowiki>|An English lowercase letter that is not a vowel.|
|<nowiki>[…&&[^…]]</nowiki>|Java, Ruby 2+: character class subtraction|<nowiki>[\p{InArabic}&&[^\p{L}\p{N}]]</nowiki>|An Arabic character that is not a letter or a number|
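
The lookaround samples can be reproduced with Python 3's ''re'' module (an assumed engine), which supports all four lookarounds but requires lookbehind to be fixed-width. Python's ''re'' has no class intersection or subtraction, so this sketch swaps in a different technique: a negative lookahead in front of a class, which excludes the same characters as the ''&&[^…]'' form. Input strings are made up for illustration.

<code python>
import re

# The four lookaround rows (lookbehind must be fixed-width in re)
pos_ahead = re.search(r"(?=\d{10})\d{5}", "0123456789").group()          # '01234'
pos_behind = re.search(r"(?<=\d)cat", "1cat").group()                    # 'cat'
neg_ahead = re.search(r"(?!theatre)the\w+", "theatre theme").group()     # 'theme'
neg_behind = re.search(r"\w{3}(?<!mon)ster", "monster Munster").group()  # 'Munster'

# No [..&&[..]] in re: a negative lookahead before the class excludes the
# same characters. Stand-in for Java's [a-z&&[^aeiou]] (lowercase consonants):
consonants = re.findall(r"(?![aeiou])[a-z]", "regex")                    # ['r', 'g', 'x']

# Stand-in for [\S&&[\D]&&[^a-zA-Z]]: not whitespace, not a digit, not an ASCII letter
symbols = re.findall(r"(?![\da-zA-Z])\S", "a1 +b2 -")                    # ['+', '-']

# [\S&&[\D]] can also be written as one negated class
non_space_non_digit = re.findall(r"[^\s\d]", "a1 +")                     # ['a', '+']
</code>

Note that ''(?<!mon)'' is case-sensitive: with "Monster" instead of "monster", the first word would match too.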

----

===== Other Syntax =====

^Syntax^Legend^Example^Sample Match^
|\K|Keep Out. Perl, PCRE (C, PHP, R…), Python's alternate regex engine, Ruby 2+: drop everything matched so far from the overall match that is returned.|prefix\K\d+|12 in prefix12|
|\Q…\E|Perl, PCRE (C, PHP, R…), Java: treat everything between the delimiters as a literal string. Useful for escaping metacharacters.|\Q(C++ ?)\E|(C++ ?)|
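
Python's built-in ''re'' module supports neither ''\K'' nor ''\Q…\E''. A minimal sketch of the usual substitutes, assuming Python 3: a lookbehind (or a capture group) instead of ''\K'', and ''re.escape()'' instead of ''\Q…\E''. Unlike ''\K'', a lookbehind in ''re'' must be fixed-width. The sample sentence is made up.

<code python>
import re

# No \K in re: a lookbehind keeps "prefix" out of the returned match,
# mirroring prefix\K\d+ (the lookbehind has to be fixed-width)
kept = re.search(r"(?<=prefix)\d+", "prefix12").group()          # '12'

# Another common substitute: capture the wanted part and read only that group
kept_group = re.search(r"prefix(\d+)", "prefix12").group(1)      # '12'

# No \Q...\E in re: re.escape() backslash-escapes the metacharacters instead
literal = re.escape("(C++ ?)")
found = re.search(literal, "Do you know (C++ ?) yet").group()    # '(C++ ?)'
</code>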