[Whole Web] SQL INJECTION
SQL injection (SQLi) is a vulnerability that allows the attacker to make arbitrary queries to an an application’s SQL database. Depending on circumstances, the attacker may be able to bypass access control, read data from the database, tamper with the records, or even gain remote code execution on the database server.
The anatomy of an authentication bypass
In the example we just saw, the attacker exploited an SQLi vulnerability to bypass the application’s authentication mechanisms (an attack which is aptly but unimaginatively named authentication bypass). To learn how these types of vulnerabilities happen, we’ll study the source code of a vulnerable application.
The following function handles login form submissions and uses the form data to authenticates users:
fastify.post('/', async (request, reply) => { // Read the username and password from the // submitted login form. const { username, password } = request.body; // Create an SQL query for retrieving the user // with the given username and password. // // Usernames are unique, so we can only get // 0 or 1 result rows. const sqlQuery = "SELECT * FROM users" + " WHERE username='" + username + "'" + " AND password='" + password + "'"; // Execute the query. fastify.db.query(sqlQuery, (err, res) => { if (res.rowCount == 1) { // Authentication succeeded. // Add the user's information to the session // and redirect to the profile page. request.session.username = res.rows[0].username; reply.redirect('/profile'); } else { // Authentication failed. // Redirect back to the login page. reply.redirect('/'); } }); });
By definition, in an SQLi vulnerability untrusted data can change the meaning of an SQL query that the vulnerable application executes. The only place where that can happen is in the place where such data meets the rest of the query, which in the example above is the statement const sqlQuery = ...
.
Code like this seems to work correctly when tested with simple and harmless data but shows its true colors when it meets more complicated or downright malicious input.
Testing with different inputs
Let’s assume that the login form is submitted with the username rob
and the password pwned
. Then the definition of sqlQuery
becomes equivalent to:
const sqlQuery = "SELECT * FROM users" +
" WHERE username='" + "rob" + "'" +
" AND password='" + "pwned" + "'";
});
This results in the following SQL query:
SELECT * FROM users WHERE username='rob' AND password='pwned'
Let’s assume that the login form is submitted with the username rob'--
and the password pwned
. Then the definition of sqlQuery
becomes equivalent to:
const sqlQuery = "SELECT * FROM users" +
" WHERE username='" + "rob'--" + "'" +
" AND password='" + "pwned" + "'";
});
SELECT * FROM users WHERE username='rob'--' AND password='pwned'
Let’s assume that the login form is submitted with the username rob' OR 'x'='
and the password pwned
. Then the definition of sqlQuery
becomes equivalent to:
const sqlQuery = "SELECT * FROM users" +
" WHERE username='" + "rob' OR 'x'='" + "'" +
" AND password='" + "pwned" + "'";
});
SELECT * FROM users WHERE username='rob' OR 'x'='' AND password='pwned'
Exploit 1
The first exploit appends --
at the end of the username. In SQL, two dashes mark the rest of the line as a comment that should be ignored when the query is parsed, so the database only performs the username check and ignores the password check.
Exploit 2
The second exploit makes the database ignore the result of the password check by inserting a boolean OR
operator after the username check. (It also needs to add some extra characters just to make the SQL query syntactically valid. To see why they’re necessary, return to the input testing section and try to remove some or all of them from the input.)
How to avoid SQLi
When thinking about how to solve this problem, your first instinct might be to sanitize the untrusted data – replace quotes '
with their escaped versions '''
so the attacker can’t break out of string literals, and so on – but it can be very hard to get right, so it should be the absolutely last resort that you only use when nothing else works.
Fortunately, all commonly used SQL databases provide a feature that is easy to use and eliminates all chance of SQL injection: parametrized queries.
Parametrized queries
When using parametrized queries, you don’t insert the untrusted data into the query string. Instead, you use placeholders like $1
and $2
(for the first and second parameter, respectively), and pass the parameter values to the database separately from the query, so there’s no chance of them ever getting mixed up.
Here’s the vulnerable application again:
// ❌ DON'T DO THIS! const sqlQuery = "SELECT * FROM users" + " WHERE username='" + username + "'" + " AND password='" + password + "'"; db.query(sqlQuery, (err, res) => { // ... }
Using a parametrized query the code would look like this:
// ✅ No SQLi here! const sqlQuery = "SELECT * FROM users" + " WHERE username=$1" + " AND password=$2"; db.query(sqlQuery, [username, password], (err, res) => { // ... }
When the username is rob
, the updated application passes the database the following query for execution:
SELECT * FROM users WHERE username=$1 AND password=$2
Benefits and limitations
This is one of those rare cases where the secure approach is at least as easy as the vulnerable one. You might even consider the safe query easier to read and modify without all those extra quotes and string concatenation!
Speaking of quotes, notice how there are no quotes around the parameter placeholders. Whenever the database sees a placeholder, it knows it’s dealing with a parameter value and not, for example, a column name, so the quotes are unnecessary.
This brings us to the one major limitation of parametrized queries: you can only use them for values. Sometimes you need untrusted data to affect the query in a way that’s only possible by changing the structure of the query itself. In the next lesson we’ll see how to handle these cases safely.
When you can’t use parametrized queries
In the last lesson you learned how parametrized queries are a safe and simple way to refer to untrusted values in your queries. But what if you need to use completely different queries for each possible input?
A common example would be changing the way the results are sorted based on the user’s preferences. In the following query, email
and DESC
are not runtime values, so we can’t specify them using query parameters. They are part of the query itself, and changing either one of them would produce a completely different query from the database’s point of view. (It may choose to use a completely different strategy, or query plan, for retrieving the data depending on the way the results need to be sorted.)
SELECT * FROM users ORDER BY email DESC
We’ve already seen how inserting any untrusted data directly into a query is always a recipe for disaster, so it is out of the question. Just to drive this point home, though, we’ll take a quick look at exploiting SQLi in the sort order.
Exploiting SQLi in an ORDER BY clause
One might think that an injection in the ASC
/DESC
position can’t be too bad, since those are the only two keywords the SQL syntax allows after ORDER BY <column>
. One would also be catastrophically wrong, as the following exploit demonstrates. In this case the attacker has provided the sort order ASC LIMIT CASE WHEN (SELECT version()) LIKE 'PostgreSQL 13.%' THEN 1 ELSE 2 END
:
SELECT * FROM users ORDER BY email ASC LIMIT CASE WHEN (SELECT version()) LIKE 'PostgreSQL 13.%' THEN 1 ELSE 2 END
The query will return one row if the PostgreSQL version is 13, and two rows otherwise, thus exfiltrating one bit of information from the database. In a later lesson we’ll see how this is enough to retrieve all data that the database user has access to.
Parse and discard
Whenever you deal with untrusted data that is supposed to represent a choice from a limited number of options – and in the case of sorting there is a limited number of sorting columns and directions – there is a simple two-step process to keep the dangerous data at arm’s length from the SQL query:
-
Parse the untrusted data to determine which option the data represents. If it doesn’t match any of the legal options, either reject it with an error or use a default value. You should do this as soon as possible and then discard the original data to minimize the amount of code that comes into contact with the untrusted data and is thus potentially vulnerable.
-
Pick a constant SQL fragment that represents the selected option and insert it into the query. You should do this as late as possible to minimize the chance of any code modifying the SQL fragment or the final query in a dangerous way.
This way the attacker can only ever influence which of the known-safe options gets selected.
Parsing untrusted input
The concrete representation you should use for your parsed data depends on your programming language. An enum type is usually the best choice, but as a simple example that would work in any language we’ll use integer constants:
SORT_FIELD_NAME = 0; SORT_FIELD_EMAIL = 1; function parseSortField(untrustedData) { if (untrustedData === 'name') { return SORT_FIELD_NAME; } if (untrustedData === 'email') { return SORT_FIELD_EMAIL; } throw new Error('Invalid sort field'); } // Parse any untrusted data as soon as possible and // discard the original data to make sure that no code // misuses it in a dangerous way. const sortField = parseSortField(request.body.sortField); // Only use `sortField` from this point on.
No matter what the untrusted data contains, the parsing function will either map it to one of the valid constants or pull the emergency break by throwing an exception if there’s something fishy going on.
Generating SQL
If you need to build many different SQL queries dynamically, or if the queries built are especially complex, you should probably consider using an SQL query building library or an Object-Relational Mapper (ORM) for your language to make the task easier and safer.
If no such library is available to you, or if you don’t want to add an extra dependency for a one-off query, you can use the following approach. It builds the SQL query with string concatenation like the vulnerable code we saw earlier, but in this case all SQL fragments are constant values that are known to be safe.
function orderByClause(sortField) { let column; if (sortField === SORT_FIELD_NAME) { column = 'name'; } else if (sortField == SORT_FIELD_EMAIL) { column = 'email'; } else { throw new Error('Invalid sort field'); } return 'ORDER BY ' + column; } // Map the parsed data to SQL as late as possible to make sure // that no code modifies the query in a dangerous way. const sql = 'SELECT * FROM users ' + orderByClause(sortField); executeQuery(sql);
Again, the important thing about this function is that no matter what the sortField
parameter contains, the column is chosen from a fixed set of constant values. This ensures that even if the attacker managed to somehow tamper with the sortField
value, they can’t use it to inject anything into the generated SQL query.
Prefer parametrized queries
Especially if you work without a library that helps you generate the queries safely, you should stick with parametrized queries whenever possible before employing this more complex method of building queries.
Many patterns that may at first seem impossible with static queries turn out to be possible with a good knowledge and clever application of your database’s features. For example, you can make a condition in the WHERE
clause optional by ignoring it when its value is NULL
:
SELECT * FROM users WHERE ($1 IS NULL OR city = $1) AND ($2 IS NULL OR country = $2)
This is much simpler – and thus easier to get right – than conditionally appending the WHERE
clause and its conditions to the query.
Exploiting error-based SQLi
The following application has a classic SQLi vulnerability where a resource ID is taken from the URL and inserted as-is into a SELECT
query. You can confirm this easily by adding a single quote '
at the end of the page name, which results in an SQL syntax error.
An SQLi where syntax errors are shown to the user who triggers them is called error-based SQLi. It may seem like a bug or a usability problem instead of a vulnerability, since no actual data is returned. Unfortunately there are tricks that allow the attacker to select arbitrary data from the database and get it displayed in the error message.
Your task is to find out the name of the database that the application uses. You will find the solution in good old swisskyrepo/PayloadsAllTheThings.
Some more information to help you with the hack:
- The database is PostgreSQL.
- You are trying to exploit an error-based SQL injection.
- PayloadsAllTheThings typically lists each exploit on its own line. You will only need to copy a single line of SQL!
- You will need to add the exploit to a vulnerable parameter in the URL.
One of these exploits will do what you need.
https://example.com/?page_name=test-site' and 1=cast((SELECT concat('DATABASE: ',current_database())) as int) and '1'='1
For extra challenge, use the error message to read the password of the user called user
from the database. You will need to modify the exploit to run your own query instead of the database lookup.
https://example.com/?page_name=test-site' and 1=cast((SELECT table_name FROM information_schema.tables LIMIT 1 OFFSET 0) as int) and '1'='1 users https://example.com/?page_name=test-site' and 1=cast((SELECT column_name FROM information_schema.columns WHERE table_name='users' LIMIT 1 OFFSET 1) as int) and '1'='1 username https://example.com/?page_name=test-site' and 1=cast((SELECT column_name FROM information_schema.columns WHERE table_name='users' LIMIT 1 OFFSET 2) as int) and '1'='1 password https://example.com/?page_name=test-site' and 1=cast((SELECT password FROM users WHERE username='user' LIMIT 1 OFFSET 0) as int) and '1'='1 NateSmithIsTheAceOfAces
Dos and Don’ts
The following rules will help you avoid SQL injections in the code you write and spot them in the code you review.
Do
-
Use parametrized queries
Parametrized queries should be your primary way of referring to untrusted data in your SQL queries. Other methods should only be considered in the relatively rare cases where parametrized queries can’t be used. -
Parse and discard
When handling untrusted data that represents one choice out of a fixed set of options, parse it to an enum type as soon as possible and discard the original value, which may be malicious. -
Build SQL queries out of constant fragments
If you must build SQL queries dynamically, only use untrusted data to make a choice between known-safe, constant SQL fragments.
Consider
- Consider using an SQL query builder or an ORM
If your application needs to build a lot of queries dynamically, consider using a library to make this easier and safer. Don’t blindly rely on the library to keep you safe, though. You need to understand how it inserts its inputs into the queries it generates.
Don’t
-
Do not add untrusted data to your SQL queries
Keep data and query structure separate with the approaches described in this course. -
Do not attempt to clean up untrusted data
If you try to remove unsafe characters or keywords to make the input safe to include in an SQL query, you are almost guaranteed to miss something. Instead, use approaches that eliminate any chance of SQLi. -
Do not pass around SQL query fragments
The wider you spread the code that can affect the resulting SQL query string, the harder it is to ensure that no untrusted data will end up in the query. Build your queries just before executing them to make it obvious that they only contain known-safe, constant SQL fragments. -
Do not ignore any instance of SQLi, no matter how harmless it seems
Every SQLi vulnerability gives the attacker at least read access with all of the database user’s privileges.