regex:cheat_sheet

Differences

This shows you the differences between two versions of the page.

regex:cheat_sheet [2020/08/22 13:08] – [Inline Modifiers] 192.168.1.1
regex:cheat_sheet [2021/05/20 23:54] (current) peter
Line 1: Line 1:
 ====== Regex - Cheat Sheet ======
  
 +<code>
 +Cheat Sheet
 +Character classes
 +. any character except newline
 +\w \d \s word, digit, whitespace
 +\W \D \S not word, digit, whitespace
 +[abc] any of a, b, or c
 +[^abc] not a, b, or c
 +[a-g] character between a & g
 +Anchors
 +^abc$ start / end of the string
 +\b word boundary
 +Escaped characters
 +\. \* \\ escaped special characters
 +\t \n \r tab, linefeed, carriage return
 +\u00A9 unicode escaped ©
 +Groups & Lookaround
 +(abc) capture group
 +\1 backreference to group #1
 +(?:abc) non-capturing group
 +(?=abc) positive lookahead
 +(?!abc) negative lookahead
 +Quantifiers & Alternation
 +a* a+ a? 0 or more, 1 or more, 0 or 1
 +a{5} a{2,} exactly five, two or more
 +a{1,3} between one & three
 +a+? a{2,}? match as few as possible
 +ab|cd match ab or cd
 +</code>
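
A few of the cheat-sheet entries above, tried out in Python's standard `re` module. This is only a sketch: the sample strings and variable names are made up for illustration, and other engines may differ in detail.

```python
import re

text = "cat hat bat 2021-05-20"  # hypothetical sample input

# [ch] is a character set, \b is a word boundary
set_matches = re.findall(r"\b[ch]at\b", text)

# \d with exact-count quantifiers inside capture groups
date_parts = re.search(r"(\d{4})-(\d{2})-(\d{2})", text).groups()

# \1 is a backreference to group #1: a character immediately repeated
doubled = re.search(r"(\w)\1", "letter").group(0)

# a+ is greedy, a+? matches as few characters as possible
greedy = re.search(r"a+", "aaaa").group(0)
lazy = re.search(r"a+?", "aaaa").group(0)

# Alternation inside a non-capturing group
alt = re.findall(r"(?:ab|cd)", "abxcd")

print(set_matches, date_parts, doubled, greedy, lazy, alt)
```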
 +
 +----
  
 ===== Basic regex =====
Line 26: Line 57:
 |\s|.NET, Python 3, JavaScript: "whitespace character": any Unicode separator|a\sb\sc|a b c|
 |\D|One character that is not a digit as defined by your engine's \d|\D\D\D|ABC|
-|\W|One character that is not a word character as defined by your engine's \w|\W\W\W\W\W|*-+=)|
+|\W|One character that is not a word character as defined by your engine's \w|\W\W\W\W\W|<nowiki>*-+=)</nowiki>|
 |\S|One character that is not a whitespace character as defined by your engine's \s|\S\S\S\S|Yoyo|
  
Line 183: Line 214:
 ^Modifier^Legend^Example^Sample Match^
 |(?i)|Case-insensitive mode (except JavaScript)|(?i)Monday|monDAY|
-|(?s)|DOTALL mode (except JS and Ruby). The dot (.) matches new line characters (\r\n).  Also known as "single-line mode" because the dot treats the entire input as a single line|(?s)From A.*to Z|From A to Z|
+|(?s)|DOTALL mode (except JS and Ruby). The dot (.) matches new line characters (\r\n).  Also known as "single-line mode" because the dot treats the entire input as a single line.|(?s)From A.*to Z|From A to Z|
-|(?m)|Multiline mode (except Ruby and JS) <nowiki>^ and $</nowiki> match at the beginning and end of every line|<nowiki>(?m)1\r\n^2$\r\n^3$</nowiki>|1|
+|(?m)|Multiline mode (except Ruby and JS) <nowiki>^ and $</nowiki> match at the beginning and end of every line.|<nowiki>(?m)1\r\n^2$\r\n^3$</nowiki>|1|
 |:::|:::|:::|2|
 |:::|:::|:::|3|
-|(?m)|In Ruby: the same as (?s) in other engines, i.e. DOTALL mode, i.e. dot matches line breaks|(?m)From A.*to Z|From A to Z|
+|(?m)|In Ruby: the same as (?s) in other engines, i.e. DOTALL mode, i.e. dot matches line breaks.|(?m)From A.*to Z|From A to Z|
-|(?x)|Free-spacing mode (except JavaScript).  Also known as comment mode or whitespace mode|(?x) # this is a  # comment|
-|:::|:::|abc # write on multiple # lines|abc d|
-|:::|:::|<nowiki>[ ]</nowiki># spaces must be in brackets| |
+|(?x)|Free-spacing mode (except JavaScript).  Also known as comment mode or whitespace mode.|(?x)abc<nowiki>[ ]</nowiki>d|abc d|
+|:::|Spaces must be in brackets|abc<nowiki>[ ]</nowiki>d| |
+
 |(?n)|.NET, PCRE 10.30+: named capture only|Turns all (parentheses) into non-capture groups. To capture, use named groups.| |
 |(?d)|Java: Unix linebreaks only|The dot and the <nowiki>^ and $</nowiki> anchors are only affected by \n| |
-|(?^)|PCRE 10.32+: unset modifiers|Unsets ismnx modifiers| |
+|<nowiki>(?^)</nowiki>|PCRE 10.32+: unset modifiers|Unsets ismnx modifiers| |
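
The (?i), (?s), (?m) and (?x) rows can be checked in Python's `re` module, which accepts these inline modifiers at the start of a pattern. A minimal sketch with made-up sample strings; note that Python's multiline mode treats only \n as a line break, so the sample uses \n rather than the \r\n shown in the table.

```python
import re

# (?i): case-insensitive matching
ci = re.search(r"(?i)Monday", "see you monDAY").group(0)

# (?s): without it the dot stops at a newline, with it the dot crosses lines
two_lines = "From A\nto Z"
without_s = re.search(r"From A.*to Z", two_lines)
with_s = re.search(r"(?s)From A.*to Z", two_lines).group(0)

# (?m): ^ and $ match at the start and end of every line
lines = re.findall(r"(?m)^\d$", "1\n2\n3")

# (?x): whitespace and # comments in the pattern are ignored,
# so a literal space has to be written inside brackets
verbose = re.search(r"""(?x)
    abc   # first part
    [ ]   # one literal space
    d     # last part
""", "abc d").group(0)

print(ci, without_s, repr(with_s), lines, verbose)
```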
 + 
 + 
 +---- 
 + 
 + 
 +===== Lookarounds ===== 
 + 
 +^Lookaround^Legend^Example^Sample Match^ 
 +|(?=…)|Positive lookahead|(?=\d{10})\d{5}|01234 in 0123456789| 
 +|<nowiki>(?<=…)</nowiki>|Positive lookbehind|<nowiki>(?<=\d)cat</nowiki>|cat in 1cat| 
 +|(?!…)|Negative lookahead|(?!theatre)the\w+|theme| 
 +|(?<!…)|Negative lookbehind|\w{3}(?<!mon)ster|Munster| 
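
The four lookaround rows above can be reproduced in Python's `re` module. This sketch reuses the table's patterns; the extra input strings ("theatre theme", "monster") are added here only to show a non-match.

```python
import re

# Positive lookahead: require ten digits ahead, then consume only five
ahead = re.search(r"(?=\d{10})\d{5}", "0123456789").group(0)

# Positive lookbehind: "cat" only when a digit comes right before it
behind = re.search(r"(?<=\d)cat", "1cat").group(0)

# Negative lookahead: words starting with "the", except "theatre"
neg_ahead = re.findall(r"(?!theatre)the\w+", "theatre theme")

# Negative lookbehind: three word characters that are not "mon", then "ster"
neg_behind_hit = re.search(r"\w{3}(?<!mon)ster", "Munster").group(0)
neg_behind_miss = re.search(r"\w{3}(?<!mon)ster", "monster")

print(ahead, behind, neg_ahead, neg_behind_hit, neg_behind_miss)
```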
 + 
 +---- 
 + 
 +===== Character Class Operations ===== 
 + 
 +^Class Operation^Legend^Example^Sample Match^ 
 +|<nowiki>[…-[…]]</nowiki>|.NET: character class subtraction. One character that is in those on the left, but not in the subtracted class.|<nowiki>[a-z-[aeiou]]</nowiki>|Any lowercase consonant| 
 +|<nowiki>[…-[…]]</nowiki>|.NET: character class subtraction.|<nowiki>[\p{IsArabic}-[\D]]</nowiki>|An Arabic character that is not a non-digit, i.e., an Arabic digit| 
 +|<nowiki>[…&&[…]]</nowiki>|Java, Ruby 2+: character class intersection. One character that is both in those on the left and in the && class.|<nowiki>[\S&&[\D]]</nowiki>|A non-whitespace character that is a non-digit.| 
 +|<nowiki>[…&&[…]]</nowiki>|Java, Ruby 2+: character class intersection.|<nowiki>[\S&&[\D]&&[^a-zA-Z]]</nowiki>|A non-whitespace character that is a non-digit and not a letter.| 
 +|<nowiki>[…&&[^…]]</nowiki>|Java, Ruby 2+: character class subtraction is obtained by intersecting a class with a negated class.|<nowiki>[a-z&&[^aeiou]]</nowiki>|An English lowercase letter that is not a vowel.| 
 +|<nowiki>[…&&[^…]]</nowiki>|Java, Ruby 2+: character class subtraction|<nowiki>[\p{InArabic}&&[^\p{L}\p{N}]]</nowiki>|An Arabic character that is not a letter or a number| 
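
Python's standard `re` module has neither the .NET subtraction syntax nor the Java/Ruby && intersection. One common workaround, sketched here as a substitute technique and not as the syntax from the table, is to put a lookahead in front of an ordinary class. The sample strings are made up.

```python
import re

# Subtraction, like [a-z-[aeiou]] or [a-z&&[^aeiou]]:
# a lowercase letter that is not a vowel
consonant = re.compile(r"(?![aeiou])[a-z]")
consonants = consonant.findall("regex101")

# Intersection, like [\S&&[\D]]:
# a character that is both non-whitespace and a non-digit
non_space_non_digit = re.findall(r"(?=\S)\D", "a 1-b")

print(consonants, non_space_non_digit)
```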
 + 
 +---- 
 + 
 +===== Other Syntax ===== 
 + 
 +^Syntax^Legend^Example^Sample Match^ 
 +|\K|Keep Out.  Perl, PCRE (C, PHP, R…), Python's alternate regex engine, Ruby 2+: drop everything that was matched so far from the overall match to be returned.|prefix\K\d+|12| 
 +|\Q…\E|Perl, PCRE (C, PHP, R…), Java: treat anything between the delimiters as a literal string. Useful to escape metacharacters.|\Q(C++ ?)\E|(C++ ?)| 
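
Python's standard `re` module supports neither \K nor \Q…\E. The sketch below shows rough equivalents, not the syntax from the table: a lookbehind or a capture group in place of \K, and `re.escape` in place of \Q…\E. The input strings are made up for illustration.

```python
import re

# In place of prefix\K\d+: a fixed-width lookbehind keeps "prefix" out of the match
kept = re.search(r"(?<=prefix)\d+", "prefix12").group(0)

# Alternative: match the prefix and read only the captured digits
kept_group = re.search(r"prefix(\d+)", "prefix12").group(1)

# In place of \Q(C++ ?)\E: re.escape turns the metacharacters into literals
literal = re.escape("(C++ ?)")
found = re.search(literal, "I use (C++ ?) daily").group(0)

print(kept, kept_group, found)
```

The lookbehind version only works when the prefix has a fixed width, which is a limitation of `re` that \K does not share.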
 + 
regex/cheat_sheet.1598101737.txt.gz · Last modified: 2020/08/22 13:08 by 192.168.1.1
