This is an old revision of the document!
Table of Contents
SQL Injection - Why not to use escape quotes as a defence
Raw SQL
When you're writing SQL – for anything that takes human input really, a lot of things have been done to avoid the injection.
Everyone that's heard of SQL injection knows that (I'm going to use PHP as a sample) doing something like this isn't safe:
$sql = "SELECT * FROM `users` . WHERE `userName` = "{$_POST["username"]}" . AND `pass` = "{$_POST["pass"]}";";
Magic
Why can't we escape all user input using “magic”?
Magic quotes was a controversial feature of the PHP scripting language, wherein strings are automatically escaped. Special characters are prefixed with a backslash before being passed on.
The idea of using “magic escape quotes” to deal with program input that isn't sanitized correctly does not really solve the issue with SQL injection for the following reasons.
- At the time the magic is applied, it is unknown where the data will end up. So magic quotes are destroying data, unless it is written unescaped to a database.
- Not all data that are supplied by the user are intended for insertion into a database. They may be rendered directly to the screen, stored in a session, or previewed before saving. This can result in backslashes being added where they are not wanted and being shown to the end user.
- It may just be used in the HTML response sent back to the client. Think of a form that has not been filled in completely and is therefore shown again to the user. With magic quotes, the data entered on the first attempt will now be SQL escaped, which is meaningless on an HTML page. Even worse: on the second submission the data is SQL escaped again.
- Adding magic quotes and subsequently removing them where appropriate incurs a small but unnecessary amount of performance overhead.
- Magic quotes also use the generic functionality provided by PHP's addslashes() function, which is not Unicode-aware and is still subject to SQL injection vulnerabilities in some multi-byte character encodings.
- Magic quotes do not protect against other common security vulnerabilities such as cross-site scripting attacks or SMTP header injection attacks
Adding slashes
The addslashes() command escapes all the quotes which prevents hackers using something like:
' OR 1=1
However even the documentation for addslashes says that you shouldn't use addslashes, but instead to use database-specific functions such as mysql_real_escape_string().
addslashes() also has issues with multibyte characters:
' 27
\ 5c
뼧 bf 5c
Those are two bytes, but only one Unicode character.
Since addslashes does not know anything about Unicode, it converts the input bf 27 to bf 5c 27. If this is read by a Unicode-aware program, it is seen as 뼧'. Boom.
There is a good explanation of this issue at http://shiflett.org/blog/2006/jan/addslashes-versus-mysql-real-escape-string.
Example of an addslashes() SQL injection attack
Demonstrating an SQL injection attack that is immune to addslashes() is a bit more involved, but still pretty straightforward.
In GBK, 0xbf27 is not a valid multi-byte character, but 0xbf5c is. Interpreted as single-byte characters, 0xbf27 is 0xbf (¿) followed by 0x27 ('), and 0xbf5c is 0xbf (¿) followed by 0x5c (\).
How does this help? If I want to attempt a SQL injection attack against a MySQL database, having single quotes escaped with a backslash is a bummer. If you're using addslashes(), however, I'm in luck. All I need to do is inject something like 0xbf27, and addslashes() modifies this to become 0xbf5c27, a valid multi-byte character followed by a single quote. In other words, I can successfully inject a single quote despite your escaping. That's because 0xbf5c is interpreted as a single character, not two. Oops, there goes the backslash.
Adding slashes specific to Database
So, we can't use DBMS specific *_real_escape_string, we can't use add slashes, the “magic quotes” thing caused lots of issues, and the web is full of short worded quotes such as:
“A dedicated hacker will find a way to jump through your quote-escaping loops, just use the DBAL prepared statements” - John Q any programmer Okay, so that scared me enough to use prepare statements and a DBAL. It didn't really explain anything, but it sounds good because I've heard it a lot.
Prepared statements
So now we're using PDO, or a DBAL from a framework, or something else that wraps all our sql and makes sure someone can't run an sql injection.
My question is basically a “why not?”, not a “what should I use?”. The web's full of people telling you to use this or use that or whatever, but no explanations of why these things had to happen.
Direct questions
Pointed questions (reminder, I'm asking about SQL, PHP was an example language because of it's bad rep around SQL, concepts are universal):
Why can't we escape all user input using “magic” ? Why wasn't addslashes “good enough”? Whats wrong with using DB-specific escape functions, and why was it better than addslashes? Why are prepared statements with frameworks and PDO being hailed as the gold standard of SQL? Why are they better? Why can't I do an SQL injection with these, where as I COULD have with the previously mentioned means? Can a programmer not somehow manage to still screw this up? What should they look out for? Any other concerns I haven't brought up?