Schneier on Security
A blog covering security and security technology.
« Dr. Dobb's Interview |
| Designing a Malicious Processor »
October 16, 2008
How to Write Injection-Proof SQL
It's about time someone wrote this paper:
Googling for "SQL injection" gets about 4 million hits. The topic excites interest and superstitious fear. This whitepaper dymystifies the topic and explains a straightforward approach to writing database PL/SQL programs that provably guarantees their immunity to SQL injection.
Only when a PL/SQL subprogram executes SQL that it creates at run time is there a risk of SQL injection; and you'll see that it's easier than you might think to freeze the SQL at PL/SQL compile time. Then you'll understand that you need the rules which prevent the risk only for the rare scenarios that do require run-time-created SQL. It turns out that these rules are simple to state and easy to follow.
EDITED TO ADD (10/26): Never mind; this seems to be a self-serving marketing piece.
Posted on October 16, 2008 at 5:56 AM
• 49 Comments
To receive these entries once a month by e-mail, sign up for the Crypto-Gram Newsletter.
the problem is the people who care already know this, and the vast majority of web-facing database apps are written by people who simply don't care...
Most SQL libraries I'm aware of have something called "placeholders" which sanitize dynamic data for you. As long as you use those instead of dynamically constructing your own SQL strings, you should be fine. See, for example:
Very nice. Looks like a definitive guide to SQL injection for Oracle. Maybe someone should write similar papers for other database systems.
When reading many programming forums, blogs, or Q&A sites on the web, it becomes clear how many ppl have difficulties in getting SQL to work in the first place. They will not go through the trouble of reading and trying to understand a 67 page document on how their program (that finally appears to do what they want) might be used against them.
Obviously this is paper targeted at the possibly more advanced corporate clients of Oracle, but these should know better already, shouldn't they?
Yah it's good but it's only for PL/SQL too, it's not for the rest of the non-Oracle using populace.
And it's not that most people don't care, they just don't know...as mentioned by Derob they have enough problems just getting their program to work without worrying about SQL injection.
"Obviously this is paper targeted at the possibly more advanced corporate clients of Oracle, but these should know better already, shouldn't they?" Derob
Now there's an assumption if ever I saw one...
Seriously it looks like the paper would be of more interest to those starting out in SQL injection "for fun and profit"...
As people have noted the majority of "web coders" out there either don't know or don't care about stoping SQL injection (or most other) attacks.
Of the two I suspect the "don't knows" are the home and mom&pop shop coders, who have read a book or two.
Of the "don't cares" there are those with "marketing" or "artistic" flare. I guess for them it's all about "style over content" with a lot of "as long as it looks good, it's gota be..." type thoughts.
Which brings you onto the "don't cares" who should (and probably do) know better, but due to time presures are just "cutting (the code) and running (to the next task)".
Once upon a time I used to think that DB Admins got paid so much money as compensation for the dullness of the job. These days I'm thinking it's for the thankless nature of keeping upto date in their own time to keep the DB's secure and up and running...
R('nuff of)TFP, it's on page 18 of the pdf.
Anyone using an Oracle database should be familiar with Tom Kyte. He is to Oracle (SQL) as Schneier is to security.
Search for "injection" and start reading. If you need a little motivation, Tom tells us that using bind variables are good for not only security but also performance.
Errr I haven't read it, but how does this warrant an entire paper?
There's usually just one function that escapes unsafe strings.
In libpq (PostgreSQL) use: PQescapeStringConn()
In MySQL use: mysql_real_escape_string()
...and so on. Really not that hard.
"Maybe someone should write similar papers for other database systems."
Maybe someone already has.
Programs that escape potentially-unsafe strings often have bugs.
FORTRAN II got it right: when a string is denoted by 5HHELLO, there's no way to do an injection, because you can't put the magic "end the string here" character inside the string.
Tim: Do read the paper, or just the first dozen pages.
Escaping unsafe strings is in the vast majority of cases the wrong solution and the mark of a barely (or not) competent programmer. The error is writing an applications such that escaping is needed in the first place. It's like building a wall with various holes in it and then cover the holes with plywood board to keep out burglars. What you should do is build a solid wall instead.
Henning, can you explain why "escaping unsafe strings is in the vast majority of cases the wrong solution"?
I second Seth's comment above.
Java, C# and query languages like the LINQ family don't have that problem.
Do the math.
This injection problem is a language problem. It can be solved with better and safer languages.
Tim: using mysql_real_escape string or its equivalents is generally an indication that the code has been written incorrectly. If you're going to apply a general rule it should be 'use bound parameters'.
Many common SQL injection attacks don't use any single quote characters at all which illustrates the pointlessness of this function.
ON ERROR RESUME NEXT...
Julien: This injection problem is a language problem. It can be solved with better and safer languages.
It can't be solved. It's a problem of interpreted languages in general, languages that don't have strong barriers between code and data. Those languages are extremely useful for many problems.
But every language type has a price you pay. If you program in C, you have to carefully handle memory allocation; if you program in Java, you pay an optimization price without gaining full late-binding; if you program in a fully interpreted language, it's up to you to carefully distinguish data from code.
That's life -- it's a bitch. Don't build giant database engines in perl, and be careful with your data. Don't build operating systems in Java. Don't build gui's in C.
I've two simple rules for SQL:
1. Don't generate literal SQL at run-time if you can at all avoid it.
2. If you do have to generate SQL at run-time, don't textually insert the contents of user-supplied strings (or anything which contains anything from a user-supplied string) into the generated SQL. You've got host variables, data areas, placeholders and other tools available, use them instead.
This seems to cover most of the bases.
What ever happened to the Oracle "unbreakable" marketing campaign?
"Provably guarantees their immunity" doesn't have the same ring to it.
@ Julien Couvreur,
"This injection problem is a language problem. It can be solved with better and safer languages."
Yes and no, there is the legacy issue.
Most new languages are effectivly written in older languages such as C, and in the process inherit some of the failings.
Either directly such as language translators (Occam to C) or by maintaining stack architecture compatability etc or indirectly by assumptions about language usage and functional usefullness in the mindset of the language designer.
Then there is the issue of a language reflecting the context of the time preceding it's development, and it's lack of relevance when (and if) it main streams.
Although it is changing languages have taken 20years to mainstream and many fall by the wayside (occam / modula2 / prolog / pilot) or become strictly niche (forth / lisp).
Partly this is due to the conservative nature of programers and their managers and partly the lack of development tools on a varied array of platforms.
This time to mainstream effectivly means that the language is designed to address issues that have long since been marginalised.
Interestingly scripting languages (perl / python / ruby) are showing much shorter times to mainstream.
However scripting lauguages tend to be used at the presentation level not OS level. It is this recognition that "one size does not fit all" that has encoraged their development.
That is, you need an efficient and close to the metal language to write an OS and the languages used have been developed on the assumption the programer knows what they are doing. Which usually means that the language forsakes mechanisums (strong type / bounds checking etc) usefull to security. Which in the past has actually not been to much of an issue.
However with the advent of graphical systems, commodity software and subsiquently the web aspects of the Internet the efficient low languages are inappropriate for the rapid development of presentation level software.
Hence the birth of scripting languages, which unlike other languages are usually designed to be platform and OS agnostic and have many safe guards built in (ie assume programer does not know what they are doing). Which tends to make scripting languages much more secure against old attck vectors but not an appropriate choice for OS development.
However things are likley to change again rather rapidly. For some time now I have banged on about web browsers becoming the new OSs and that we should build all we have learnt from OS security problems into browsers. The idea appeared to have been rejected and more and more attacks through the browser have been the consiquence.
Well it looks like other people (google) have quietly thought the same way (chrome). It will be interesting to see in what way this will influence language development, allong with the new rash of attacks (clickjacking CSS etc) using current scripting languages weaknesses.
Granularity of database user privilege is a more useful area of concern. Get that correct first, and sql injection becomes much less of a worry.
regarding bound parameters: it actually depends on the database; MySQL (an example, there are a few like this) hasn't (and may still not) have a client-side binding API. This is abstracted in database toolkits by mapping those bound variables to (yes, really) escaping routines. Of course, the escaping routines are generally thoroughly tested, but the reality is that's the price you pay for abstraction; ignorance of the details.
This "you can't avoid SQL injection without a procedural language" is pretty much hogwash though. Bound parameters on database systems that have client-side APIs for it (oracle, postgres, and yes, sqlite) serve the same purpose as PL/SQL in this regard; typically at that point it becomes a question of what can be bound and what can't... f.e. a table name. Although I think it's safe to say that if you're templating your table names you shouldn't be anywhere near untrusted user input.
Someone who writes one of those database abstraction layers.
Now such papers for XSS, CSRF, safe sessions, etc., please.
I just think it's funny that the first word of the paper's abstract is "Googling." That's enough to make me not read it right there.
It should come as no surprise to *no one* that static code cannot be injected. I mean... duh.
int x = 1;
compile that and x cannot be changed to something else. Duh.
The simplest thing to do in php is to write sanitizing functions to all input that might be hitting a database.
Even before it hits the database abstraction layer of your code do a simple regex based cleanup and that will go a long way in preventing injection.
"Escaping unsafe strings is in the vast majority of cases the wrong solution and the mark of a barely (or not) competent programmer."
"using mysql_real_escape string or its equivalents is generally an indication that the code has been written incorrectly."
Sorry but I skimmed the paper and it doesn't seem to offer any attacks that work if the strings have been escaped. What is the 'correct' way to do it (from the C API?)
You make the correct claim that you might miss an instance where you should have escaped a string. This is true (however I think it is one of the easiest security flaws to avoid & detect). I never got around to it but I was once going to write a database API layer in C++ that would force you to use escaped strings. I.e. have a EscapedString class and so on.
"do a simple regex based cleanup and that will go a long way in preventing injection"
Are you being sarcastic? Why use a regex that is likely incorrect to go 'a long way' towards preventing injection when you can use a pre-written correct function that goes all the way?
Aren't there any built in functions that can take care of these concerns? Why do I have to keep looking all over the place hoping I haven't skipped something crucial.
i think some of the comments here about one language being better than another for protecting against injections are silly. The same concepts apply to all languages: you filter or sanitize user input by a number of good library functions. On the web, which is mostly what we're talking about (since bad application user input simply breaks the app which doesn't do anything for the hacker except to prove to himself he can break something of his), you know exactly what *could* be sent over: text, cookies and files. Files and blobs you don't store in the database, you only link to their location on the filesystem. You can check them on your own and should never execute them without checking first. Text is something you always run a sanitizer on.
restricting all functionality to PL/SQL objects helps, but the database (esp. Oracle) is the most inflexible, non-portable, and financially expensive place to put your business logic. But then again, I've seen the sludge SQL that middleware crap produces and PL/SQL really is a handy language.
I really don't get how other people can't grasp that languages with true parameter binding are fundamentally safer than ones that can only sanitize. One of them has a strict separation between data and code, the other does not. Bind parameters are an impenetrable brick wall, sanitizers are doors with a tricky lock. Sooner or later someone smarter than the lockmaker will come along and pick it.
SQL injection is a very serious epidemic facing our young generation of hackers. Sure, it starts out small, a SELECT here, or a few INSERTS on the weekend. But soon, it engulfs their lives as they move onto stored procedures while trying to score triggers in the middle of the night.
We must speak to our children about the dangers of SQL injection before it is too late. Just say no!
So what I am failing to get is why the approach in this paper proves that a client trying to execute:
exec usermgmt.authenticate('username', 'password');
without sanitizing user-input correctly won't execute
exec usermgmt.authenticate('username', 'fakepassword'); exec usermgmt.changepassword('username', 'changed');
by sending password="fakepassword'); exec usermgmt.changepassword('username', 'changed"
In the section "Expose the database to clients only via a PL/SQL API" the authors write:
"This paradigm locates the responsibility to prevent SQL injection where it belongs: in the subsystem
of the overall application stack that executes the SQL."
Doesn't my example highlight that you still have to go through the less cost-effective, harder to guarantee
process of analysing all client code, so that attackers don't gain access to the PL/SQL-api which the
database user has privilege to in unintended ways?
Caveat lector: I don't develop for Oracle, so I might be misunderstanding how PL/SQL execution works horribly.
Could any of you bright minds enlighten me?
Test Cases: Does anyone know the URL for a good list of test cases?
Blocking SQL Injection is not just about good intentions when writing your code - you need to test thoroughly too.
A good way to handle this is to cleanse parameters at the API interface. A API
wrapper could clean up all parameters before using it (passing it to the `work'
layer). The API wrapper could be handled by a code generation layer. The
programmer could then be blisfully ignorant about how SQLs are written.
@Tim: Escaping strings is not the solution. That approach depends on being able to figure out every possible way that an attack could be made. That's a losing battle.
A much better solution is something like th mysqli PHP module (notice the 'i' at the end). This module pre-compiles the SQL query without user-provided parameters, and /then/ it inserts the user-provided parameters. Google for 'mysqli' and 'bind_param'. The advantage of this method is that (1) you don't have to worry about whether you are catching every possible bad character that could be misused and (2) it actually makes coding easier because you don't have to "un-escape" the data that you extract from the database.
poohneat: "The simplest thing to do in php is to write sanitizing functions to all input that might be hitting a database."
And that is the WRONG solution. You are never going to come up with every possible way that input can be misconstructed to produce an SQL injection attack. The attacker can send extended characters, unicode, etc. Are you sure that your sanitizing function is secure against the backspace character? If you are relying on a sanitizing function you are fighting a losing battle. A much better solution is to pre-compile the query without any user input and then use bound parameters. This will protect you even against injection attacks that you did not think of. And that's the right way to do security.
Tim: "Sorry but I skimmed the paper and it doesn't seem to offer any attacks that work if the strings have been escaped."
And how can you possibly know if you have successfully escaped every character that could be used in an attack? Do you escape the backspace character? Are you sure that there isn't a unicode character that could be used in an attack? If you (or the guy who wrote your API) are trying to think of every possible character that could be misused, you will never have security because there will always be someone somewhere who will think of something that you did not think of.
It is much better to cirumvent the problem with bound parameters. As a nice side-effect, this will make some aspects of your program easier to write too, because the stuff you are entering in the database would not be mangled by your escaping function.
I'll echo what Jeroen said about placeholders, and add that in practice you absolutely unavoidably have to use them even if you for some reason believe security is not an issue (e.g., for intranet apps where you are sure you can trust all the users), because if you don't use them your software breaks the first time somebody puts an apostrophe in the data (or a quote mark if you use double quotes in your SQL instead of single), which, frankly, happens far too frequently to be ignorable.
It takes any vaguely intelligent programmer about three days of programming DBI stuff without placeholders (well, okay, three days once the first working version is handed over to a beta tester) before he realizes how pervasive and annoying this problem can be and either starts using placeholders, or starts working around it in some other way, such as systematically decoding/encoding the data in a way that mangles metacharacters (e.g., HTML::Entities).
The latter approach (encoding and decoding all the data) is more time consuming for the programmer and has a significant adverse effect on the maintainability of the code, but it does also have the side effect of protecting against injection attacks, provided it's used on all fields. I suppose a sufficiently naive programmer might skip doing the numeric fields at first, but that's going to lead to bugs even in the absence of the security issue (yes, users DO inadvertently type quote marks and apostrophes into numeric fields and expect the program to still work), so sooner or later you end up protecting *all* fields. Doing all that encoding/decoding by hand (or, alternately, checking all input for SQL metacharacters and issuing an error message every time one is encountered, an approach that annoys the users more but is otherwise equally effective) gets old after a while, and you inevitably look for an easier way, which is generally either going to be wrapping 100% of your database access with functions that do this automatically, or switching to placeholders.
Yeah, I know, SQL injection attacks are all over the news, but most of the ones out there in the wild are against software written and maintained by extreme novice programmers, usually people who heard that PHP is almost as easy to learn as HTML, spent ten minutes reading a tutorial, copied and pasted one of the examples, and made a few modifications.
For experienced programmers, this is a non-issue.
"And how can you possibly know if you have successfully escaped every character that could be used in an attack? Do you escape the backspace character? Are you sure that there isn't a unicode character that could be used in an attack? If you (or the guy who wrote your API) are trying to think of every possible character that could be misused, you will never have security because there will always be someone somewhere who will think of something that you did not think of."
Well I assume the MySQL (for example) developers who *wrote the SQL-parsing code* are capable of ensuring that their escaping function is correct! Even if they get it wrong it is a security bug in MySQL, not my code. One can hardly be expected to do a complete audit of the entire MySQL codebase!
I do agree that you should always check that the values are valid.
Tim: Well I assume the MySQL (for example) developers who *wrote the SQL-parsing code* are capable of ensuring that their escaping function is correct!
That is absurd. It is nuts to assume that any human being can find every possible vunerability. There is always someone, somewhere in this planet, who will think of something that you didn't. Btw, the developers who write the escaping functions are not the developers who wrote MySQL or the parsing code. They are PHP developers.
Tim: Even if they get it wrong it is a security bug in MySQL, not my code.
And this is just sad. Telling your customers or your boss that the breakin is someone else's fault doesn't help. The break-in still happened.
Tim: One can hardly be expected to do a complete audit of the entire MySQL codebase!
I don't think you understand what people are telling you (not just me). If you are thinking in terms of auditing code, you missed the point. And btw, the problem is not even in the MySQL codebase. The problem is that it is impossible to find every form of input that can cause an SQL injection attack.
I've told you twice, and two others have told you as well: You need to pre-compile the SQL query and use bound parameters. Google for "SQL bound parameters" or "PHP mysqli". The idea is to compile the SQL query without any user input, and then insert the user data inside the already compiled query. This way you ensure that user data doesn not alter the compilation of the SQL query. This method is simple, easy to do and effective. It protects you even against the injection methods that you have not thought of.
@Julien Couvreur quote: "Java, C# and query languages like the LINQ family don't have that problem.
Do the math.
This injection problem is a language problem. It can be solved with better and safer languages. "
That's funny... I distinctly remember successfully performing SQL injections against apps written in both Java and C#. Injections are NOT a language problem. Any language that accepts outside input is susceptible to an injection attack.
You are missing the point. Imagine that nobody here can find an attack against PHP's real_escape_string(). Does that mean that there isn't one? When Microsoft releases a patch do you conclude that there are no more bugs in Windows and now it is safe? Of course, not. There are always new attacks that you just haven't found yet.
In any case, as chance would have it, I did manage to find an example that would probably break that PHP function:
1) 0xbf27 is an invalid multi-byte character, but 0xbf5c is a valid (Chinese). 0x27 is the character for a single quote and 0x5c is a backslash.
2) Enter the 0xbf27 multi-byte character. This gets interpreted as 0xbf (¿) followed by 0x27 (single quote). The latter character is an apostrophe so it should be escaped with a backslash (0x5c). This produces the byte sequence: 0xbf5c27
3) So the database server gets 0xbf5c27. It recognizes that 0xbf5c is a valid multi-byte character and is interpreted as such. Then it finds a 0x27 which is an appostrophe. So what the database server sees is a chinese character followed by an apostrophe.
Voila! We just managed to get an apostrophe into the SQL expression. Now, you might find a way to avoid this particular attack. But how do you know that there isn't another attack based on Tamil characters that you haven't thought of?
Another example. I notice that that function does not escape the DEL character (0x7f). Suppose that you send 0x7f followed by an apostrophe. The PHP function adds a backslash before the apostrophe. Are you confident that when it goes to the database server, the DEL character will not remove the backslash? Personally, I don't know, and it might even depend on the server, but are you willing to take the risk?
The important point is: You will never be sure that you caught all possible bad characters. Use a system that does not require you to come up with bad characters. Use msqli and bound parameters.
Uhhh if you have no privileges to do anything on my system except what I want you to do I have saved a bunch of hassle (no developer ever gets dbo and asking for sa is hilarious).
Then if I make you use parameters for a stored proc I have safe guarded even further.
This is late 90's stuff.
I run strings for _my_ stuff (find all foo in all db's etc). I am sa and can do what I like, strings are quicker for dba stuff.
String are so pitifully slower in ms sql (compared to a nicely written proc with a cached exe plan) that I laugh at the developer that tells me he needs a string executed. Then I show him my list of sites I have helped recover from sql injections.Then I say he can have them if he puts a couple of hours downtime in bond (he gets the money back when his code is retired plus interest).
And I am one of those fancy new "friendly" dba's that have turned up over the past decade. I find it hard to believe a serious dba in any language which supports procs would allow anything else.
Bernie: "Anyone using an Oracle database should be familiar with Tom Kyte. He is to Oracle (SQL) as Schneier is to security.:
Hardly. Schneier is independent and unafraid to say bad things, whilst Kyte is just a mouthpiece for Oracle Corporation.
Why do I ask.. but...
Shouldn't the application be handling security more than some random framework? You are relying on generics to solve specific problems. There are only X number of cases where injenction truly becomes an issue if you do proper validation and rules.
That isn't to say escaping is bad, but everyone seems to want to give the applications a by in all this and they really can't. I'd be curious to see how many applications TYPES that store data in a database truly need to have all allowable input of all allowable lengths.
I personally find the ';' (support for multiple/batch SQL statements in SQL Server) to be a very major concern.
It should be a 'feature' that can be disabled by the DBA.
By default, it is not supported by Oracle.
The half million websites that got injected last summer were due to ';' support.
Escaping the ', is not full proof, since, for one, you have to check for string truncation in the code too.
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT.