regex:cheat_sheet

Differences

This shows you the differences between two versions of the page.

regex:cheat_sheet [2020/08/22 13:14] – [Inline Modifiers] 192.168.1.1
regex:cheat_sheet [2021/05/20 23:54] (current) peter
Line 1: Line 1:
 ====== Regex - Cheat Sheet ======
  
 +<code>
 +Cheat Sheet
 +Character classes
 +. any character except newline
 +\w \d \s word, digit, whitespace
 +\W \D \S not word, digit, whitespace
 +[abc] any of a, b, or c
 +[^abc] not a, b, or c
 +[a-g] character between a & g
 +Anchors
 +^abc$ start / end of the string
 +\b word boundary
 +Escaped characters
 +\. \* \\ escaped special characters
 +\t \n \r tab, linefeed, carriage return
 +\u00A9 unicode escaped ©
 +Groups & Lookaround
 +(abc) capture group
 +\1 backreference to group #1
 +(?:abc) non-capturing group
 +(?=abc) positive lookahead
 +(?!abc) negative lookahead
 +Quantifiers & Alternation
 +a* a+ a? 0 or more, 1 or more, 0 or 1
 +a{5} a{2,} exactly five, two or more
 +a{1,3} between one & three
 +a+? a{2,}? match as few as possible
 +ab|cd match ab or cd
 +</code>
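
A few of the entries above, tried out in Python's standard ''re'' module. This is only a minimal sketch; the sample strings and variable names are made up for illustration, and other engines may differ in detail.

```python
import re

# Character classes: \d is a digit, [^abc] is anything except a, b, or c.
digits = re.findall(r"\d", "a1b22")               # ['1', '2', '2']
not_abc = re.findall(r"[^abc]", "abcd")           # ['d']

# Anchors: \b marks a word boundary.
words = re.findall(r"\b\w+\b", "a cat sat")       # ['a', 'cat', 'sat']

# Groups: \1 refers back to whatever group #1 captured.
doubled = re.search(r"(\w)\1", "hello").group()   # 'll'

# Quantifiers: greedy by default, lazy with a trailing ?.
greedy = re.search(r"<.+>", "<a><b>").group()     # '<a><b>'
lazy = re.search(r"<.+?>", "<a><b>").group()      # '<a>'
two_or_more = re.findall(r"a{2,}", "a aa aaa")    # ['aa', 'aaa']

# Alternation: match ab or cd.
either = re.findall(r"ab|cd", "abcdab")           # ['ab', 'cd', 'ab']
```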
 +
 +----
  
 ===== Basic regex =====
Line 26: Line 57:
 |\s|.NET, Python 3, JavaScript: "whitespace character": any Unicode separator|a\sb\sc|a b c|
 |\D|One character that is not a digit as defined by your engine's \d|\D\D\D|ABC|
-|\W|One character that is not a word character as defined by your engine's \w|\W\W\W\W\W|*-+=)|
+|\W|One character that is not a word character as defined by your engine's \w|\W\W\W\W\W|<nowiki>*-+=)</nowiki>|
 |\S|One character that is not a whitespace character as defined by your engine's \s|\S\S\S\S|Yoyo|
  
Line 188: Line 219:
 |:::|:::|:::|3|
 |(?m)|In Ruby: the same as (?s) in other engines, i.e. DOTALL mode, i.e. dot matches line breaks.|(?m)From A.*to Z|From A to Z|
-|(?x)|Free-Spacing Mode mode (except JavaScript).  Also known as comment mode or whitespace mode.|(?x) # this is a comment|abc d|
+|(?x)|Free-spacing mode (except JavaScript). Also known as comment mode or whitespace mode.|(?x)abc<nowiki>[ ]</nowiki>d|abc d|
 |:::|Spaces must be in brackets|abc<nowiki>[ ]</nowiki>d| |
 |(?n)|.NET, PCRE 10.30+: named capture only|Turns all (parentheses) into non-capture groups. To capture, use named groups.| |
 |(?d)|Java: Unix linebreaks only|The dot and the <nowiki>^ and $</nowiki> anchors are only affected by \n| |
 |<nowiki>(?^)</nowiki>|PCRE 10.32+: unset modifiers|Unsets ismnx modifiers| |
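
Two of the inline modifiers above can be tried in Python's ''re'' module, which supports (?s) and (?x) but not (?n), (?d) or (?^). A minimal sketch follows; the variable names and the line-break test string are illustrative assumptions. Note that Python expects global inline flags at the very start of the pattern.

```python
import re

# (?s) DOTALL: the dot also matches line breaks.
text = "From A\nto Z"
with_dotall = re.search(r"(?s)From A.*to Z", text) is not None   # True
without_dotall = re.search(r"From A.*to Z", text) is not None    # False

# (?x) free-spacing mode: whitespace in the pattern is ignored and
# "#" starts a comment, so a literal space has to go in brackets.
free_spacing = r"""(?x)
    abc   # the literal text abc
    [ ]   # one space, protected by brackets
    d
"""
matches_spaced = re.fullmatch(free_spacing, "abc d") is not None  # True
matches_unspaced = re.fullmatch(free_spacing, "abcd") is not None # False
```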
 +
 +
 +----
 +
 +
 +===== Lookarounds =====
 +
 +^Lookaround^Legend^Example^Sample Match^
 +|(?=…)|Positive lookahead|(?=\d{10})\d{5}|01234 in 0123456789|
 +|<nowiki>(?<=…)</nowiki>|Positive lookbehind|<nowiki>(?<=\d)cat</nowiki>|cat in 1cat|
 +|(?!…)|Negative lookahead|(?!theatre)the\w+|theme|
 +|(?<!…)|Negative lookbehind|\w{3}(?<!mon)ster|Munster|
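
The four lookaround examples in the table can be checked with Python's ''re'' module. This is a small sketch; the counter-example strings ("theatre", "monster") are added here for illustration and are not part of the table.

```python
import re

# Positive lookahead: ten digits must follow, but only five are consumed.
ahead = re.search(r"(?=\d{10})\d{5}", "0123456789").group()    # '01234'

# Positive lookbehind: "cat" only when preceded by a digit.
behind = re.search(r"(?<=\d)cat", "1cat").group()               # 'cat'

# Negative lookahead: words starting with "the", except "theatre".
not_ahead = re.search(r"(?!theatre)the\w+", "theme").group()    # 'theme'
theatre = re.search(r"(?!theatre)the\w+", "theatre")            # None

# Negative lookbehind: three word characters that are not "mon", then "ster".
not_behind = re.search(r"\w{3}(?<!mon)ster", "Munster").group() # 'Munster'
monster = re.search(r"\w{3}(?<!mon)ster", "monster")            # None
```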
 +
 +----
 +
 +===== Character Class Operations =====
 +
 +^Class Operation^Legend^Example^Sample Match^
 +|<nowiki>[…-[…]]</nowiki>|.NET: character class subtraction. One character that is in those on the left, but not in the subtracted class.|<nowiki>[a-z-[aeiou]]</nowiki>|Any lowercase consonant|
 +|<nowiki>[…-[…]]</nowiki>|.NET: character class subtraction.|<nowiki>[\p{IsArabic}-[\D]]</nowiki>|An Arabic character that is not a non-digit, i.e., an Arabic digit|
 +|<nowiki>[…&&[…]]</nowiki>|Java, Ruby 2+: character class intersection. One character that is both in those on the left and in the && class.|<nowiki>[\S&&[\D]]</nowiki>|A non-whitespace character that is a non-digit.|
 +|<nowiki>[…&&[…]]</nowiki>|Java, Ruby 2+: character class intersection.|<nowiki>[\S&&[\D]&&[^a-zA-Z]]</nowiki>|A non-whitespace character that is a non-digit and not a letter.|
 +|<nowiki>[…&&[^…]]</nowiki>|Java, Ruby 2+: character class subtraction is obtained by intersecting a class with a negated class.|<nowiki>[a-z&&[^aeiou]]</nowiki>|An English lowercase letter that is not a vowel.|
 +|<nowiki>[…&&[^…]]</nowiki>|Java, Ruby 2+: character class subtraction|<nowiki>[\p{InArabic}&&[^\p{L}\p{N}]]</nowiki>|An Arabic character that is not a letter or a number|
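
Python's standard ''re'' module has neither class subtraction nor class intersection. As a workaround (a substitute technique, not the syntax from the table), a lookahead placed before the class gives the same single-character result. The sample strings below are made up for illustration.

```python
import re

# Subtraction, like [a-z-[aeiou]] or [a-z&&[^aeiou]]:
# a lowercase letter that is not a vowel.
consonants = re.findall(r"(?![aeiou])[a-z]", "regex cheat")
# ['r', 'g', 'x', 'c', 'h', 't']

# Intersection, like [\S&&[\D]]:
# a non-whitespace character that is also a non-digit.
non_ws_non_digit = re.findall(r"(?=\D)\S", "a1 b2")
# ['a', 'b']

# Chained, like [\S&&[\D]&&[^a-zA-Z]]:
# non-whitespace, non-digit, and not a letter.
punctuation_like = re.findall(r"(?=\D)(?![a-zA-Z])\S", "a1 -b_")
# ['-', '_']
```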
 +
 +----
 +
 +===== Other Syntax =====
 +
 +^Syntax^Legend^Example^Sample Match^
 +|\K|Keep Out.  Perl, PCRE (C, PHP, R…), Python's alternate regex engine, Ruby 2+: drop everything that was matched so far from the overall match to be returned.|prefix\K\d+|12|
 +|\Q…\E|Perl, PCRE (C, PHP, R…), Java: treat anything between the delimiters as a literal string. Useful to escape metacharacters.|\Q(C++ ?)\E|(C++ ?)|
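
Python's standard ''re'' module supports neither \K nor \Q…\E. The sketch below shows rough substitutes: a lookbehind or a capture group in place of \K, and ''re.escape'' in place of \Q…\E. The sample strings are illustrative assumptions. Unlike \K, a lookbehind in ''re'' must have a fixed width.

```python
import re

# In place of prefix\K\d+: a fixed-width lookbehind keeps "prefix"
# out of the returned match.
kept = re.search(r"(?<=prefix)\d+", "prefix12").group()       # '12'

# A capture group works too, and also handles variable-width prefixes.
captured = re.search(r"prefix(\d+)", "prefix12").group(1)     # '12'

# In place of \Q(C++ ?)\E: re.escape turns the string into a literal pattern.
literal = re.escape("(C++ ?)")
found = re.search(literal, "I like (C++ ?) a lot").group()    # '(C++ ?)'
```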
 +
 +
regex/cheat_sheet.1598102089.txt.gz · Last modified: 2020/08/22 13:14 by 192.168.1.1
