In a natural join, the column on which the join was made occurs twice in the new table

MySQL supports the following JOIN syntax for the table_references part of SELECT statements and multiple-table DELETE and UPDATE statements:

table_references: escaped_table_reference [, escaped_table_reference] ... escaped_table_reference: { table_reference | { OJ table_reference } } table_reference: { table_factor | joined_table } table_factor: { tbl_name [PARTITION (partition_names)] [[AS] alias] [index_hint_list] | table_subquery [AS] alias | ( table_references ) } joined_table: { table_reference [INNER | CROSS] JOIN table_factor [join_specification] | table_reference STRAIGHT_JOIN table_factor | table_reference STRAIGHT_JOIN table_factor ON search_condition | table_reference {LEFT|RIGHT} [OUTER] JOIN table_reference join_specification | table_reference NATURAL [{LEFT|RIGHT} [OUTER]] JOIN table_factor } join_specification: { ON search_condition | USING (join_column_list) } join_column_list: column_name [, column_name] ... index_hint_list: index_hint [, index_hint] ... index_hint: { USE {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list]) | {IGNORE|FORCE} {INDEX|KEY} [FOR {JOIN|ORDER BY|GROUP BY}] (index_list) } index_list: index_name [, index_name] ...

A table reference is also known as a join expression.

A table reference (when it refers to a partitioned table) may contain a PARTITION clause, including a list of comma-separated partitions, subpartitions, or both. This option follows the name of the table and precedes any alias declaration. The effect of this option is that rows are selected only from the listed partitions or subpartitions. Any partitions or subpartitions not named in the list are ignored. For more information and examples, see Section 19.5, “Partition Selection”.

The syntax of table_factor is extended in MySQL in comparison with standard SQL. The standard accepts only table_reference, not a list of them inside a pair of parentheses.

This is a conservative extension if each comma in a list of table_reference items is considered as equivalent to an inner join. For example:

SELECT * FROM t1 LEFT JOIN (t2, t3, t4) ON (t2.a = t1.a AND t3.b = t1.b AND t4.c = t1.c)

is equivalent to:

SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4) ON (t2.a = t1.a AND t3.b = t1.b AND t4.c = t1.c)

In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise.

In general, parentheses can be ignored in join expressions containing only inner join operations. MySQL also supports nested joins. See Section 8.2.1.7, “Nested Join Optimization”.

Index hints can be specified to affect how the MySQL optimizer makes use of indexes. For more information, see Section 8.9.3, “Index Hints”. The optimizer_switch system variable is another way to influence optimizer use of indexes. See Section 8.9.2, “Switchable Optimizations”.

The following list describes general factors to take into account when writing joins:

  • A table reference can be aliased using tbl_name AS alias_name or tbl_name alias_name:

    SELECT t1.name, t2.salary FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name; SELECT t1.name, t2.salary FROM employee t1 INNER JOIN info t2 ON t1.name = t2.name;
  • A table_subquery is also known as a derived table or subquery in the FROM clause. See Section 13.2.10.8, “Derived Tables”. Such subqueries must include an alias to give the subquery result a table name. A trivial example follows:

    SELECT * FROM (SELECT 1, 2, 3) AS t1;
  • The maximum number of tables that can be referenced in a single join is 61.

  • INNER JOIN and , (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).

    However, the precedence of the comma operator is less than that of INNER JOIN, CROSS JOIN, LEFT JOIN, and so on. If you mix comma joins with the other join types when there is a join condition, an error of the form Unknown column 'col_name' in 'on clause' may occur. Information about dealing with this problem is given later in this section.

  • The search_condition used with ON is any conditional expression of the form that can be used in a WHERE clause. Generally, the ON clause serves for conditions that specify how to join tables, and the WHERE clause restricts which rows to include in the result set.

  • If there is no matching row for the right table in the ON or USING part in a LEFT JOIN, a row with all columns set to NULL is used for the right table. You can use this fact to find rows in a table that have no counterpart in another table:

    SELECT left_tbl.* FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id WHERE right_tbl.id IS NULL;

    This example finds all rows in left_tbl with an id value that is not present in right_tbl (that is, all rows in left_tbl with no corresponding row in right_tbl). See Section 8.2.1.8, “Outer Join Optimization”.

  • The USING(join_column_list) clause names a list of columns that must exist in both tables. If tables a and b both contain columns c1, c2, and c3, the following join compares corresponding columns from the two tables:

    a LEFT JOIN b USING (c1, c2, c3)
  • The NATURAL [LEFT] JOIN of two tables is defined to be semantically equivalent to an INNER JOIN or a LEFT JOIN with a USING clause that names all columns that exist in both tables.

  • RIGHT JOIN works analogously to LEFT JOIN. To keep code portable across databases, it is recommended that you use LEFT JOIN instead of RIGHT JOIN.

  • The { OJ ... } syntax shown in the join syntax description exists only for compatibility with ODBC. The curly braces in the syntax should be written literally; they are not metasyntax as used elsewhere in syntax descriptions.

    SELECT left_tbl.* FROM { OJ left_tbl LEFT OUTER JOIN right_tbl ON left_tbl.id = right_tbl.id } WHERE right_tbl.id IS NULL;

    You can use other types of joins within { OJ ... }, such as INNER JOIN or RIGHT OUTER JOIN. This helps with compatibility with some third-party applications, but is not official ODBC syntax.

  • STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.

Some join examples:

SELECT * FROM table1, table2; SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.id; SELECT * FROM table1 LEFT JOIN table2 ON table1.id = table2.id; SELECT * FROM table1 LEFT JOIN table2 USING (id); SELECT * FROM table1 LEFT JOIN table2 ON table1.id = table2.id LEFT JOIN table3 ON table2.id = table3.id;

Natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard:

  • Redundant columns of a NATURAL join do not appear. Consider this set of statements:

    CREATE TABLE t1 (i INT, j INT); CREATE TABLE t2 (k INT, j INT); INSERT INTO t1 VALUES(1, 1); INSERT INTO t2 VALUES(1, 1); SELECT * FROM t1 NATURAL JOIN t2; SELECT * FROM t1 JOIN t2 USING (j);

    In the first SELECT statement, column j appears in both tables and thus becomes a join column, so, according to standard SQL, it should appear only once in the output, not twice. Similarly, in the second SELECT statement, column j is named in the USING clause and should appear only once in the output, not twice.

    Thus, the statements produce this output:

    +------+------+------+ | j | i | k | +------+------+------+ | 1 | 1 | 1 | +------+------+------+ +------+------+------+ | j | i | k | +------+------+------+ | 1 | 1 | 1 | +------+------+------+

    Redundant column elimination and column ordering occurs according to standard SQL, producing this display order:

    • First, coalesced common columns of the two joined tables, in the order in which they occur in the first table

    • Second, columns unique to the first table, in order in which they occur in that table

    • Third, columns unique to the second table, in order in which they occur in that table

    The single result column that replaces two common columns is defined using the coalesce operation. That is, for two t1.a and t2.a the resulting single join column a is defined as a = COALESCE(t1.a, t2.a), where:

    COALESCE(x, y) = (CASE WHEN x IS NOT NULL THEN x ELSE y END)

    If the join operation is any other join, the result columns of the join consist of the concatenation of all columns of the joined tables.

    A consequence of the definition of coalesced columns is that, for outer joins, the coalesced column contains the value of the non-NULL column if one of the two columns is always NULL. If neither or both columns are NULL, both common columns have the same value, so it doesn't matter which one is chosen as the value of the coalesced column. A simple way to interpret this is to consider that a coalesced column of an outer join is represented by the common column of the inner table of a JOIN. Suppose that the tables t1(a, b) and t2(a, c) have the following contents:

    t1 t2 ---- ---- 1 x 2 z 2 y 3 w

    Then, for this join, column a contains the values of t1.a:

    mysql> SELECT * FROM t1 NATURAL LEFT JOIN t2; +------+------+------+ | a | b | c | +------+------+------+ | 1 | x | NULL | | 2 | y | z | +------+------+------+

    By contrast, for this join, column a contains the values of t2.a.

    mysql> SELECT * FROM t1 NATURAL RIGHT JOIN t2; +------+------+------+ | a | c | b | +------+------+------+ | 2 | z | y | | 3 | w | NULL | +------+------+------+

    Compare those results to the otherwise equivalent queries with JOIN ... ON:

    mysql> SELECT * FROM t1 LEFT JOIN t2 ON (t1.a = t2.a); +------+------+------+------+ | a | b | a | c | +------+------+------+------+ | 1 | x | NULL | NULL | | 2 | y | 2 | z | +------+------+------+------+mysql> SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a = t2.a); +------+------+------+------+ | a | b | a | c | +------+------+------+------+ | 2 | y | 2 | z | | NULL | NULL | 3 | w | +------+------+------+------+
  • A USING clause can be rewritten as an ON clause that compares corresponding columns. However, although USING and ON are similar, they are not quite the same. Consider the following two queries:

    a LEFT JOIN b USING (c1, c2, c3) a LEFT JOIN b ON a.c1 = b.c1 AND a.c2 = b.c2 AND a.c3 = b.c3

    With respect to determining which rows satisfy the join condition, both joins are semantically identical.

    With respect to determining which columns to display for SELECT * expansion, the two joins are not semantically identical. The USING join selects the coalesced value of corresponding columns, whereas the ON join selects all columns from all tables. For the USING join, SELECT * selects these values:

    COALESCE(a.c1, b.c1), COALESCE(a.c2, b.c2), COALESCE(a.c3, b.c3)

    For the ON join, SELECT * selects these values:

    a.c1, a.c2, a.c3, b.c1, b.c2, b.c3

    With an inner join, COALESCE(a.c1, b.c1) is the same as either a.c1 or b.c1 because both columns have the same value. With an outer join (such as LEFT JOIN), one of the two columns can be NULL. That column is omitted from the result.

  • An ON clause can refer only to its operands.

    Example:

    CREATE TABLE t1 (i1 INT); CREATE TABLE t2 (i2 INT); CREATE TABLE t3 (i3 INT); SELECT * FROM t1 JOIN t2 ON (i1 = i3) JOIN t3;

    The statement fails with an Unknown column 'i3' in 'on clause' error because i3 is a column in t3, which is not an operand of the ON clause. To enable the join to be processed, rewrite the statement as follows:

    SELECT * FROM t1 JOIN t2 JOIN t3 ON (i1 = i3);
  • JOIN has higher precedence than the comma operator (,), so the join expression t1, t2 JOIN t3 is interpreted as (t1, (t2 JOIN t3)), not as ((t1, t2) JOIN t3). This affects statements that use an ON clause because that clause can refer only to columns in the operands of the join, and the precedence affects interpretation of what those operands are.

    Example:

    CREATE TABLE t1 (i1 INT, j1 INT); CREATE TABLE t2 (i2 INT, j2 INT); CREATE TABLE t3 (i3 INT, j3 INT); INSERT INTO t1 VALUES(1, 1); INSERT INTO t2 VALUES(1, 1); INSERT INTO t3 VALUES(1, 1); SELECT * FROM t1, t2 JOIN t3 ON (t1.i1 = t3.i3);

    The JOIN takes precedence over the comma operator, so the operands for the ON clause are t2 and t3. Because t1.i1 is not a column in either of the operands, the result is an Unknown column 't1.i1' in 'on clause' error.

    To enable the join to be processed, use either of these strategies:

    • Group the first two tables explicitly with parentheses so that the operands for the ON clause are (t1, t2) and t3:

      SELECT * FROM (t1, t2) JOIN t3 ON (t1.i1 = t3.i3);
    • Avoid the use of the comma operator and use JOIN instead:

      SELECT * FROM t1 JOIN t2 JOIN t3 ON (t1.i1 = t3.i3);

    The same precedence interpretation also applies to statements that mix the comma operator with INNER JOIN, CROSS JOIN, LEFT JOIN, and RIGHT JOIN, all of which have higher precedence than the comma operator.

  • A MySQL extension compared to the SQL:2003 standard is that MySQL permits you to qualify the common (coalesced) columns of NATURAL or USING joins, whereas the standard disallows that.


Page 2

SELECT ... UNION [ALL | DISTINCT] SELECT ... [UNION [ALL | DISTINCT] SELECT ...]

UNION combines the result from multiple SELECT statements into a single result set.

The result set column names are taken from the column names of the first SELECT statement.

Selected columns listed in corresponding positions of each SELECT statement should have the same data type. For example, the first column selected by the first statement should have the same type as the first column selected by the other statements. If the data types of corresponding SELECT columns do not match, the types and lengths of the columns in the UNION result take into account the values retrieved by all the SELECT statements. For example, consider the following, where the column length is not constrained to the length of the value from the first SELECT:

mysql> SELECT REPEAT('a',1) UNION SELECT REPEAT('b',20); +----------------------+ | REPEAT('a',1) | +----------------------+ | a | | bbbbbbbbbbbbbbbbbbbb | +----------------------+
UNION DISTINCT and UNION ALL

By default, duplicate rows are removed from UNION results. The optional DISTINCT keyword has the same effect but makes it explicit. With the optional ALL keyword, duplicate-row removal does not occur and the result includes all matching rows from all the SELECT statements.

You can mix UNION ALL and UNION DISTINCT in the same query. Mixed UNION types are treated such that a DISTINCT union overrides any ALL union to its left. A DISTINCT union can be produced explicitly by using UNION DISTINCT or implicitly by using UNION with no following DISTINCT or ALL keyword.

ORDER BY and LIMIT in Unions

To apply an ORDER BY or LIMIT clause to an individual SELECT, parenthesize the SELECT and place the clause inside the parentheses:

(SELECT a FROM t1 WHERE a=10 AND B=1 ORDER BY a LIMIT 10) UNION (SELECT a FROM t2 WHERE a=11 AND B=2 ORDER BY a LIMIT 10);

Use of ORDER BY for individual SELECT statements implies nothing about the order in which the rows appear in the final result because UNION by default produces an unordered set of rows. Therefore, ORDER BY in this context typically is used in conjunction with LIMIT, to determine the subset of the selected rows to retrieve for the SELECT, even though it does not necessarily affect the order of those rows in the final UNION result. If ORDER BY appears without LIMIT in a SELECT, it is optimized away because it has no effect.

To use an ORDER BY or LIMIT clause to sort or limit the entire UNION result, parenthesize the individual SELECT statements and place the ORDER BY or LIMIT after the last one:

(SELECT a FROM t1 WHERE a=10 AND B=1) UNION (SELECT a FROM t2 WHERE a=11 AND B=2) ORDER BY a LIMIT 10;

A statement without parentheses is equivalent to one parenthesized as just shown.

This kind of ORDER BY cannot use column references that include a table name (that is, names in tbl_name.col_name format). Instead, provide a column alias in the first SELECT statement and refer to the alias in the ORDER BY. (Alternatively, refer to the column in the ORDER BY using its column position. However, use of column positions is deprecated.)

Also, if a column to be sorted is aliased, the ORDER BY clause must refer to the alias, not the column name. The first of the following statements is permitted, but the second fails with an Unknown column 'a' in 'order clause' error:

(SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY b; (SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY a;

To cause rows in a UNION result to consist of the sets of rows retrieved by each SELECT one after the other, select an additional column in each SELECT to use as a sort column and add an ORDER BY that sorts on that column following the last SELECT:

(SELECT 1 AS sort_col, col1a, col1b, ... FROM t1) UNION (SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col;

To additionally maintain sort order within individual SELECT results, add a secondary column to the ORDER BY clause:

(SELECT 1 AS sort_col, col1a, col1b, ... FROM t1) UNION (SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col, col1a;

Use of an additional column also enables you to determine which SELECT each row comes from. Extra columns can provide other identifying information as well, such as a string that indicates a table name.

UNION Restrictions

In a UNION, the SELECT statements are normal select statements, but with the following restrictions:

  • HIGH_PRIORITY in the first SELECT has no effect. HIGH_PRIORITY in any subsequent SELECT produces a syntax error.

  • Only the last SELECT statement can use an INTO clause. However, the entire UNION result is written to the INTO output destination.


Page 3

A subquery is a SELECT statement within another statement.

All subquery forms and operations that the SQL standard requires are supported, as well as a few features that are MySQL-specific.

Here is an example of a subquery:

SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);

In this example, SELECT * FROM t1 ... is the outer query (or outer statement), and (SELECT column1 FROM t2) is the subquery. We say that the subquery is nested within the outer query, and in fact it is possible to nest subqueries within other subqueries, to a considerable depth. A subquery must always appear within parentheses.

The main advantages of subqueries are:

  • They allow queries that are structured so that it is possible to isolate each part of a statement.

  • They provide alternative ways to perform operations that would otherwise require complex joins and unions.

  • Many people find subqueries more readable than complex joins or unions. Indeed, it was the innovation of subqueries that gave people the original idea of calling the early SQL Structured Query Language.

Here is an example statement that shows the major points about subquery syntax as specified by the SQL standard and supported in MySQL:

DELETE FROM t1 WHERE s11 > ANY (SELECT COUNT(*) /* no hint */ FROM t2 WHERE NOT EXISTS (SELECT * FROM t3 WHERE ROW(5*t2.s1,77)= (SELECT 50,11*s1 FROM t4 UNION SELECT 50,77 FROM (SELECT * FROM t5) AS t5)));

A subquery can return a scalar (a single value), a single row, a single column, or a table (one or more rows of one or more columns). These are called scalar, column, row, and table subqueries. Subqueries that return a particular kind of result often can be used only in certain contexts, as described in the following sections.

There are few restrictions on the type of statements in which subqueries can be used. A subquery can contain many of the keywords or clauses that an ordinary SELECT can contain: DISTINCT, GROUP BY, ORDER BY, LIMIT, joins, index hints, UNION constructs, comments, functions, and so on.

A subquery's outer statement can be any one of: SELECT, INSERT, UPDATE, DELETE, SET, or DO.

In MySQL, you cannot modify a table and select from the same table in a subquery. This applies to statements such as DELETE, INSERT, REPLACE, UPDATE, and (because subqueries can be used in the SET clause) LOAD DATA.

For information about how the optimizer handles subqueries, see Section 8.2.2, “Optimizing Subqueries and Derived Tables”. For a discussion of restrictions on subquery use, including performance issues for certain forms of subquery syntax, see Section 13.2.10.12, “Restrictions on Subqueries”.


Page 4

13.2.10.1 The Subquery as Scalar Operand

In its simplest form, a subquery is a scalar subquery that returns a single value. A scalar subquery is a simple operand, and you can use it almost anywhere a single column value or literal is legal, and you can expect it to have those characteristics that all operands have: a data type, a length, an indication that it can be NULL, and so on. For example:

CREATE TABLE t1 (s1 INT, s2 CHAR(5) NOT NULL); INSERT INTO t1 VALUES(100, 'abcde'); SELECT (SELECT s2 FROM t1);

The subquery in this SELECT returns a single value ('abcde') that has a data type of CHAR, a length of 5, a character set and collation equal to the defaults in effect at CREATE TABLE time, and an indication that the value in the column can be NULL. Nullability of the value selected by a scalar subquery is not copied because if the subquery result is empty, the result is NULL. For the subquery just shown, if t1 were empty, the result would be NULL even though s2 is NOT NULL.

There are a few contexts in which a scalar subquery cannot be used. If a statement permits only a literal value, you cannot use a subquery. For example, LIMIT requires literal integer arguments, and LOAD DATA requires a literal string file name. You cannot use subqueries to supply these values.

When you see examples in the following sections that contain the rather spartan construct (SELECT column1 FROM t1), imagine that your own code contains much more diverse and complex constructions.

Suppose that we make two tables:

CREATE TABLE t1 (s1 INT); INSERT INTO t1 VALUES (1); CREATE TABLE t2 (s1 INT); INSERT INTO t2 VALUES (2);

Then perform a SELECT:

SELECT (SELECT s1 FROM t2) FROM t1;

The result is 2 because there is a row in t2 containing a column s1 that has a value of 2.

A scalar subquery can be part of an expression, but remember the parentheses, even if the subquery is an operand that provides an argument for a function. For example:

SELECT UPPER((SELECT s1 FROM t1)) FROM t2;



Page 5

13.2.10.2 Comparisons Using Subqueries

The most common use of a subquery is in the form:

non_subquery_operand comparison_operator (subquery)

Where comparison_operator is one of these operators:

= > < >= <= <> != <=>

For example:

... WHERE 'a' = (SELECT column1 FROM t1)

MySQL also permits this construct:

non_subquery_operand LIKE (subquery)

At one time the only legal place for a subquery was on the right side of a comparison, and you might still find some old DBMSs that insist on this.

Here is an example of a common-form subquery comparison that you cannot do with a join. It finds all the rows in table t1 for which the column1 value is equal to a maximum value in table t2:

SELECT * FROM t1 WHERE column1 = (SELECT MAX(column2) FROM t2);

Here is another example, which again is impossible with a join because it involves aggregating for one of the tables. It finds all rows in table t1 containing a value that occurs twice in a given column:

SELECT * FROM t1 AS t WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);

For a comparison of the subquery to a scalar, the subquery must return a scalar. For a comparison of the subquery to a row constructor, the subquery must be a row subquery that returns a row with the same number of values as the row constructor. See Section 13.2.10.5, “Row Subqueries”.



Page 6

13.2.10.3 Subqueries with ANY, IN, or SOME

Syntax:

operand comparison_operator ANY (subquery) operand IN (subquery) operand comparison_operator SOME (subquery)

Where comparison_operator is one of these operators:

= > < >= <= <> !=

The ANY keyword, which must follow a comparison operator, means return TRUE if the comparison is TRUE for ANY of the values in the column that the subquery returns. For example:

SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);

Suppose that there is a row in table t1 containing (10). The expression is TRUE if table t2 contains (21,14,7) because there is a value 7 in t2 that is less than 10. The expression is FALSE if table t2 contains (20,10), or if table t2 is empty. The expression is unknown (that is, NULL) if table t2 contains (NULL,NULL,NULL).

When used with a subquery, the word IN is an alias for = ANY. Thus, these two statements are the same:

SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2); SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);

IN and = ANY are not synonyms when used with an expression list. IN can take an expression list, but = ANY cannot. See Section 12.4.2, “Comparison Functions and Operators”.

NOT IN is not an alias for <> ANY, but for <> ALL. See Section 13.2.10.4, “Subqueries with ALL”.

The word SOME is an alias for ANY. Thus, these two statements are the same:

SELECT s1 FROM t1 WHERE s1 <> ANY (SELECT s1 FROM t2); SELECT s1 FROM t1 WHERE s1 <> SOME (SELECT s1 FROM t2);

Use of the word SOME is rare, but this example shows why it might be useful. To most people, the English phrase a is not equal to any b means there is no b which is equal to a, but that is not what is meant by the SQL syntax. The syntax means there is some b to which a is not equal. Using <> SOME instead helps ensure that everyone understands the true meaning of the query.



Page 7

13.2.10.4 Subqueries with ALL

Syntax:

operand comparison_operator ALL (subquery)

The word ALL, which must follow a comparison operator, means return TRUE if the comparison is TRUE for ALL of the values in the column that the subquery returns. For example:

SELECT s1 FROM t1 WHERE s1 > ALL (SELECT s1 FROM t2);

Suppose that there is a row in table t1 containing (10). The expression is TRUE if table t2 contains (-5,0,+5) because 10 is greater than all three values in t2. The expression is FALSE if table t2 contains (12,6,NULL,-100) because there is a single value 12 in table t2 that is greater than 10. The expression is unknown (that is, NULL) if table t2 contains (0,NULL,1).

Finally, the expression is TRUE if table t2 is empty. So, the following expression is TRUE when table t2 is empty:

SELECT * FROM t1 WHERE 1 > ALL (SELECT s1 FROM t2);

But this expression is NULL when table t2 is empty:

SELECT * FROM t1 WHERE 1 > (SELECT s1 FROM t2);

In addition, the following expression is NULL when table t2 is empty:

SELECT * FROM t1 WHERE 1 > ALL (SELECT MAX(s1) FROM t2);

In general, tables containing NULL values and empty tables are edge cases. When writing subqueries, always consider whether you have taken those two possibilities into account.

NOT IN is an alias for <> ALL. Thus, these two statements are the same:

SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2); SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);



Page 8

Scalar or column subqueries return a single value or a column of values. A row subquery is a subquery variant that returns a single row and can thus return more than one column value. Legal operators for row subquery comparisons are:

= > < >= <= <> != <=>

Here are two examples:

SELECT * FROM t1 WHERE (col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10); SELECT * FROM t1 WHERE ROW(col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);

For both queries, if the table t2 contains a single row with id = 10, the subquery returns a single row. If this row has col3 and col4 values equal to the col1 and col2 values of any rows in t1, the WHERE expression is TRUE and each query returns those t1 rows. If the t2 row col3 and col4 values are not equal the col1 and col2 values of any t1 row, the expression is FALSE and the query returns an empty result set. The expression is unknown (that is, NULL) if the subquery produces no rows. An error occurs if the subquery produces multiple rows because a row subquery can return at most one row.

For information about how each operator works for row comparisons, see Section 12.4.2, “Comparison Functions and Operators”.

The expressions (1,2) and ROW(1,2) are sometimes called row constructors. The two are equivalent. The row constructor and the row returned by the subquery must contain the same number of values.

A row constructor is used for comparisons with subqueries that return two or more columns. When a subquery returns a single column, this is regarded as a scalar value and not as a row, so a row constructor cannot be used with a subquery that does not return at least two columns. Thus, the following query fails with a syntax error:

SELECT * FROM t1 WHERE ROW(1) = (SELECT column1 FROM t2)

Row constructors are legal in other contexts. For example, the following two statements are semantically equivalent (and are handled in the same way by the optimizer):

SELECT * FROM t1 WHERE (column1,column2) = (1,1); SELECT * FROM t1 WHERE column1 = 1 AND column2 = 1;

The following query answers the request, find all rows in table t1 that also exist in table t2:

SELECT column1,column2,column3 FROM t1 WHERE (column1,column2,column3) IN (SELECT column1,column2,column3 FROM t2);

For more information about the optimizer and row constructors, see Section 8.2.1.18, “Row Constructor Expression Optimization”



Page 9

13.2.10.6 Subqueries with EXISTS or NOT EXISTS

If a subquery returns any rows at all, EXISTS subquery is TRUE, and NOT EXISTS subquery is FALSE. For example:

SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);

Traditionally, an EXISTS subquery starts with SELECT *, but it could begin with SELECT 5 or SELECT column1 or anything at all. MySQL ignores the SELECT list in such a subquery, so it makes no difference.

For the preceding example, if t2 contains any rows, even rows with nothing but NULL values, the EXISTS condition is TRUE. This is actually an unlikely example because a [NOT] EXISTS subquery almost always contains correlations. Here are some more realistic examples:

  • What kind of store is present in one or more cities?

    SELECT DISTINCT store_type FROM stores WHERE EXISTS (SELECT * FROM cities_stores WHERE cities_stores.store_type = stores.store_type);
  • What kind of store is present in no cities?

    SELECT DISTINCT store_type FROM stores WHERE NOT EXISTS (SELECT * FROM cities_stores WHERE cities_stores.store_type = stores.store_type);
  • What kind of store is present in all cities?

    SELECT DISTINCT store_type FROM stores s1 WHERE NOT EXISTS ( SELECT * FROM cities WHERE NOT EXISTS ( SELECT * FROM cities_stores WHERE cities_stores.city = cities.city AND cities_stores.store_type = stores.store_type));

The last example is a double-nested NOT EXISTS query. That is, it has a NOT EXISTS clause within a NOT EXISTS clause. Formally, it answers the question does a city exist with a store that is not in Stores? But it is easier to say that a nested NOT EXISTS answers the question is x TRUE for all y?



Page 10

13.2.10.7 Correlated Subqueries

A correlated subquery is a subquery that contains a reference to a table that also appears in the outer query. For example:

SELECT * FROM t1 WHERE column1 = ANY (SELECT column1 FROM t2 WHERE t2.column2 = t1.column2);

Notice that the subquery contains a reference to a column of t1, even though the subquery's FROM clause does not mention a table t1. So, MySQL looks outside the subquery, and finds t1 in the outer query.

Suppose that table t1 contains a row where column1 = 5 and column2 = 6; meanwhile, table t2 contains a row where column1 = 5 and column2 = 7. The simple expression ... WHERE column1 = ANY (SELECT column1 FROM t2) would be TRUE, but in this example, the WHERE clause within the subquery is FALSE (because (5,6) is not equal to (5,7)), so the expression as a whole is FALSE.

Scoping rule: MySQL evaluates from inside to outside. For example:

SELECT column1 FROM t1 AS x WHERE x.column1 = (SELECT column1 FROM t2 AS x WHERE x.column1 = (SELECT column1 FROM t3 WHERE x.column2 = t3.column1));

In this statement, x.column2 must be a column in table t2 because SELECT column1 FROM t2 AS x ... renames t2. It is not a column in table t1 because SELECT column1 FROM t1 ... is an outer query that is farther out.

For subqueries in HAVING or ORDER BY clauses, MySQL also looks for column names in the outer select list.

For certain cases, a correlated subquery is optimized. For example:

val IN (SELECT key_val FROM tbl_name WHERE correlated_condition)

Otherwise, they are inefficient and likely to be slow. Rewriting the query as a join might improve performance.

Aggregate functions in correlated subqueries may contain outer references, provided the function contains nothing but outer references, and provided the function is not contained in another function or expression.



Page 11

A derived table is an expression that generates a table within the scope of a query FROM clause. For example, a subquery in a SELECT statement FROM clause is a derived table:

SELECT ... FROM (subquery) [AS] tbl_name ...

The [AS] tbl_name clause is mandatory because every table in a FROM clause must have a name. Any columns in the derived table must have unique names.

For the sake of illustration, assume that you have this table:

CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);

Here is how to use a subquery in the FROM clause, using the example table:

INSERT INTO t1 VALUES (1,'1',1.0); INSERT INTO t1 VALUES (2,'2',2.0); SELECT sb1,sb2,sb3 FROM (SELECT s1 AS sb1, s2 AS sb2, s3*2 AS sb3 FROM t1) AS sb WHERE sb1 > 1;

Result:

+------+------+------+ | sb1 | sb2 | sb3 | +------+------+------+ | 2 | 2 | 4 | +------+------+------+

Here is another example: Suppose that you want to know the average of a set of sums for a grouped table. This does not work:

SELECT AVG(SUM(column1)) FROM t1 GROUP BY column1;

However, this query provides the desired information:

SELECT AVG(sum_column1) FROM (SELECT SUM(column1) AS sum_column1 FROM t1 GROUP BY column1) AS t1;

Notice that the column name used within the subquery (sum_column1) is recognized in the outer query.

A derived table can return a scalar, column, row, or table.

Derived tables are subject to these restrictions:

  • A derived table cannot be a correlated subquery.

  • A derived table cannot contain references to other tables of the same SELECT.

  • A derived table cannot contain outer references. This is a MySQL restriction, not a restriction of the SQL standard.

The optimizer determines information about derived tables in such a way that EXPLAIN does not need to materialize them. See Section 8.2.2.4, “Optimizing Derived Tables”.

It is possible under certain circumstances that using EXPLAIN SELECT modifies table data. This can occur if the outer query accesses any tables and an inner query invokes a stored function that changes one or more rows of a table. Suppose that there are two tables t1 and t2 in database d1, and a stored function f1 that modifies t2, created as shown here:

CREATE DATABASE d1; USE d1; CREATE TABLE t1 (c1 INT); CREATE TABLE t2 (c1 INT); CREATE FUNCTION f1(p1 INT) RETURNS INT BEGIN INSERT INTO t2 VALUES (p1); RETURN p1; END;

Referencing the function directly in an EXPLAIN SELECT has no effect on t2, as shown here:

mysql> SELECT * FROM t2; Empty set (0.02 sec) mysql> EXPLAIN SELECT f1(5)\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: No tables used 1 row in set (0.01 sec) mysql> SELECT * FROM t2; Empty set (0.01 sec)

This is because the SELECT statement did not reference any tables, as can be seen in the table and Extra columns of the output. This is also true of the following nested SELECT:

mysql> EXPLAIN SELECT NOW() AS a1, (SELECT f1(5)) AS a2\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL filtered: NULL Extra: No tables used 1 row in set, 1 warning (0.00 sec) mysql> SHOW WARNINGS; +-------+------+------------------------------------------+ | Level | Code | Message | +-------+------+------------------------------------------+ | Note | 1249 | Select 2 was reduced during optimization | +-------+------+------------------------------------------+ 1 row in set (0.00 sec) mysql> SELECT * FROM t2; Empty set (0.00 sec)

However, if the outer SELECT references any tables, the optimizer executes the statement in the subquery as well, with the result that t2 is modified:

mysql> EXPLAIN SELECT * FROM t1 AS a1, (SELECT f1(5)) AS a2\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: <derived2> type: system possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1 Extra: NULL *************************** 2. row *************************** id: 1 select_type: PRIMARY table: a1 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 1 Extra: NULL *************************** 3. row *************************** id: 2 select_type: DERIVED table: NULL type: NULL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: No tables used 3 rows in set (0.00 sec) mysql> SELECT * FROM t2; +------+ | c1 | +------+ | 5 | +------+ 1 row in set (0.00 sec)



Page 12

13.2.10.9 Subquery Errors

There are some errors that apply only to subqueries. This section describes them.

  • Unsupported subquery syntax:

    ERROR 1235 (ER_NOT_SUPPORTED_YET) SQLSTATE = 42000 Message = "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"

    This means that MySQL does not support statements of the following form:

    SELECT * FROM t1 WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1)
  • Incorrect number of columns from subquery:

    ERROR 1241 (ER_OPERAND_COL) SQLSTATE = 21000 Message = "Operand should contain 1 column(s)"

    This error occurs in cases like this:

    SELECT (SELECT column1, column2 FROM t2) FROM t1;

    You may use a subquery that returns multiple columns, if the purpose is row comparison. In other contexts, the subquery must be a scalar operand. See Section 13.2.10.5, “Row Subqueries”.

  • Incorrect number of rows from subquery:

    ERROR 1242 (ER_SUBSELECT_NO_1_ROW) SQLSTATE = 21000 Message = "Subquery returns more than 1 row"

    This error occurs for statements where the subquery must return at most one row but returns multiple rows. Consider the following example:

    SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);

    If SELECT column1 FROM t2 returns just one row, the previous query works. If the subquery returns more than one row, error 1242 occurs. In that case, the query should be rewritten as:

    SELECT * FROM t1 WHERE column1 = ANY (SELECT column1 FROM t2);
  • Incorrectly used table in subquery:

    Error 1093 (ER_UPDATE_TABLE_USED) SQLSTATE = HY000 Message = "You can't specify target table 'x' for update in FROM clause"

    This error occurs in cases such as the following, which attempts to modify a table and select from the same table in the subquery:

    UPDATE t1 SET column2 = (SELECT MAX(column1) FROM t1);

    You can use a subquery for assignment within an UPDATE statement because subqueries are legal in UPDATE and DELETE statements as well as in SELECT statements. However, you cannot use the same table (in this case, table t1) for both the subquery FROM clause and the update target.

For transactional storage engines, the failure of a subquery causes the entire statement to fail. For nontransactional storage engines, data modifications made before the error was encountered are preserved.



Page 13

13.2.10.10 Optimizing Subqueries

Development is ongoing, so no optimization tip is reliable for the long term. The following list provides some interesting tricks that you might want to play with. See also Section 8.2.2, “Optimizing Subqueries and Derived Tables”.

  • Use subquery clauses that affect the number or order of the rows in the subquery. For example:

    SELECT * FROM t1 WHERE t1.column1 IN (SELECT column1 FROM t2 ORDER BY column1); SELECT * FROM t1 WHERE t1.column1 IN (SELECT DISTINCT column1 FROM t2); SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 LIMIT 1);
  • Replace a join with a subquery. For example, try this:

    SELECT DISTINCT column1 FROM t1 WHERE t1.column1 IN ( SELECT column1 FROM t2);

    Instead of this:

    SELECT DISTINCT t1.column1 FROM t1, t2 WHERE t1.column1 = t2.column1;
  • Some subqueries can be transformed to joins for compatibility with older versions of MySQL that do not support subqueries. However, in some cases, converting a subquery to a join may improve performance. See Section 13.2.10.11, “Rewriting Subqueries as Joins”.

  • Move clauses from outside to inside the subquery. For example, use this query:

    SELECT * FROM t1 WHERE s1 IN (SELECT s1 FROM t1 UNION ALL SELECT s1 FROM t2);

    Instead of this query:

    SELECT * FROM t1 WHERE s1 IN (SELECT s1 FROM t1) OR s1 IN (SELECT s1 FROM t2);

    For another example, use this query:

    SELECT (SELECT column1 + 5 FROM t1) FROM t2;

    Instead of this query:

    SELECT (SELECT column1 FROM t1) + 5 FROM t2;
  • Use a row subquery instead of a correlated subquery. For example, use this query:

    SELECT * FROM t1 WHERE (column1,column2) IN (SELECT column1,column2 FROM t2);

    Instead of this query:

    SELECT * FROM t1 WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1=t1.column1 AND t2.column2=t1.column2);
  • Use NOT (a = ANY (...)) rather than a <> ALL (...).

  • Use x = ANY (table containing (1,2)) rather than x=1 OR x=2.

  • Use = ANY rather than EXISTS.

  • For uncorrelated subqueries that always return one row, IN is always slower than =. For example, use this query:

    SELECT * FROM t1 WHERE t1.col_name = (SELECT a FROM t2 WHERE b = some_const);

    Instead of this query:

    SELECT * FROM t1 WHERE t1.col_name IN (SELECT a FROM t2 WHERE b = some_const);

These tricks might cause programs to go faster or slower. Using MySQL facilities like the BENCHMARK() function, you can get an idea about what helps in your own situation. See Section 12.16, “Information Functions”.

Some optimizations that MySQL itself makes are:

  • MySQL executes uncorrelated subqueries only once. Use EXPLAIN to make sure that a given subquery really is uncorrelated.

  • MySQL rewrites IN, ALL, ANY, and SOME subqueries in an attempt to take advantage of the possibility that the select-list columns in the subquery are indexed.

  • MySQL replaces subqueries of the following form with an index-lookup function, which EXPLAIN describes as a special join type (unique_subquery or index_subquery):

    ... IN (SELECT indexed_column FROM single_table ...)
  • MySQL enhances expressions of the following form with an expression involving MIN() or MAX(), unless NULL values or empty sets are involved:

    value {ALL|ANY|SOME} {> | < | >= | <=} (uncorrelated subquery)

    For example, this WHERE clause:

    WHERE 5 > ALL (SELECT x FROM t)

    might be treated by the optimizer like this:

    WHERE 5 > (SELECT MAX(x) FROM t)

See also MySQL Internals: How MySQL Transforms Subqueries.


Page 14

13.2.10.11 Rewriting Subqueries as Joins

Sometimes there are other ways to test membership in a set of values than by using a subquery. Also, on some occasions, it is not only possible to rewrite a query without a subquery, but it can be more efficient to make use of some of these techniques rather than to use subqueries. One of these is the IN() construct:

For example, this query:

SELECT * FROM t1 WHERE id IN (SELECT id FROM t2);

Can be rewritten as:

SELECT DISTINCT t1.* FROM t1, t2 WHERE t1.id=t2.id;

The queries:

SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2); SELECT * FROM t1 WHERE NOT EXISTS (SELECT id FROM t2 WHERE t1.id=t2.id);

Can be rewritten as:

SELECT table1.* FROM table1 LEFT JOIN table2 ON table1.id=table2.id WHERE table2.id IS NULL;

A LEFT [OUTER] JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone. Prior to SQL-92, outer joins did not exist, so subqueries were the only way to do certain things. Today, MySQL Server and many other modern database systems offer a wide range of outer join types.

MySQL Server supports multiple-table DELETE statements that can be used to efficiently delete rows based on information from one table or even from many tables at the same time. Multiple-table UPDATE statements are also supported. See Section 13.2.2, “DELETE Statement”, and Section 13.2.11, “UPDATE Statement”.



Page 15

13.2.10.12 Restrictions on Subqueries

  • In general, you cannot modify a table and select from the same table in a subquery. For example, this limitation applies to statements of the following forms:

    DELETE FROM t WHERE ... (SELECT ... FROM t ...); UPDATE t ... WHERE col = (SELECT ... FROM t ...); {INSERT|REPLACE} INTO t (SELECT ... FROM t ...);

    Exception: The preceding prohibition does not apply if for the modified table you are using a derived table and that derived table is materialized rather than merged into the outer query. Example:

    UPDATE t ... WHERE col = (SELECT * FROM (SELECT ... FROM t...) AS _t ...);

    Here the result from the derived table is materialized as a temporary table, so the relevant rows in t have already been selected by the time the update to t takes place.

  • Row comparison operations are only partially supported:

    • For expr [NOT] IN subquery, expr can be an n-tuple (specified using row constructor syntax) and the subquery can return rows of n-tuples. The permitted syntax is therefore more specifically expressed as row_constructor [NOT] IN table_subquery

    • For expr op {ALL|ANY|SOME} subquery, expr must be a scalar value and the subquery must be a column subquery; it cannot return multiple-column rows.

    In other words, for a subquery that returns rows of n-tuples, this is supported:

    (expr_1, ..., expr_n) [NOT] IN table_subquery

    But this is not supported:

    (expr_1, ..., expr_n) op {ALL|ANY|SOME} subquery

    The reason for supporting row comparisons for IN but not for the others is that IN is implemented by rewriting it as a sequence of = comparisons and AND operations. This approach cannot be used for ALL, ANY, or SOME.

  • Subqueries in the FROM clause cannot be correlated subqueries. They are materialized in whole (evaluated to produce a result set) during query execution, so they cannot be evaluated per row of the outer query. Before MySQL 5.6.3, materialization takes place before evaluation of the outer query. As of 5.6.3, the optimizer delays materialization until the result is needed, which may permit materialization to be avoided. See Section 8.2.2.4, “Optimizing Derived Tables”.

  • MySQL does not support LIMIT in subqueries for certain subquery operators:

    mysql> SELECT * FROM t1 WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1); ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'
  • MySQL permits a subquery to refer to a stored function that has data-modifying side effects such as inserting rows into a table. For example, if f() inserts rows, the following query can modify data:

    SELECT ... WHERE x IN (SELECT f() ...);

    This behavior is an extension to the SQL standard. In MySQL, it can produce nondeterministic results because f() might be executed a different number of times for different executions of a given query depending on how the optimizer chooses to handle it.

    For statement-based or mixed-format replication, one implication of this indeterminism is that such a query can produce different results on the source and its slaves.

  • Before MySQL 5.6.3, a subquery in the FROM clause is evaluated by materializing the result into a temporary table, and this table does not use indexes. As of 5.6.3, the optimizer creates an index on the materialized table if this results in faster query execution. See Section 8.2.2.4, “Optimizing Derived Tables”.


Page 16

UPDATE is a DML statement that modifies rows in a table.

Single-table syntax:

UPDATE [LOW_PRIORITY] [IGNORE] table_reference SET assignment_list [WHERE where_condition] [ORDER BY ...] [LIMIT row_count] value: {expr | DEFAULT} assignment: col_name = value assignment_list: assignment [, assignment] ...

Multiple-table syntax:

UPDATE [LOW_PRIORITY] [IGNORE] table_references SET assignment_list [WHERE where_condition]

For the single-table syntax, the UPDATE statement updates columns of existing rows in the named table with new values. The SET clause indicates which columns to modify and the values they should be given. Each value can be given as an expression, or the keyword DEFAULT to set a column explicitly to its default value. The WHERE clause, if given, specifies the conditions that identify which rows to update. With no WHERE clause, all rows are updated. If the ORDER BY clause is specified, the rows are updated in the order that is specified. The LIMIT clause places a limit on the number of rows that can be updated.

For the multiple-table syntax, UPDATE updates rows in each table named in table_references that satisfy the conditions. Each matching row is updated once, even if it matches the conditions multiple times. For multiple-table syntax, ORDER BY and LIMIT cannot be used.

For partitioned tables, both the single-single and multiple-table forms of this statement support the use of a PARTITION clause as part of a table reference. This option takes a list of one or more partitions or subpartitions (or both). Only the partitions (or subpartitions) listed are checked for matches, and a row that is not in any of these partitions or subpartitions is not updated, whether it satisfies the where_condition or not.

Note

Unlike the case when using PARTITION with an INSERT or REPLACE statement, an otherwise valid UPDATE ... PARTITION statement is considered successful even if no rows in the listed partitions (or subpartitions) match the where_condition.

For more information and examples, see Section 19.5, “Partition Selection”.

where_condition is an expression that evaluates to true for each row to be updated. For expression syntax, see Section 9.5, “Expressions”.

table_references and where_condition are specified as described in Section 13.2.9, “SELECT Statement”.

You need the UPDATE privilege only for columns referenced in an UPDATE that are actually updated. You need only the SELECT privilege for any columns that are read but not modified.

The UPDATE statement supports the following modifiers:

  • With the LOW_PRIORITY modifier, execution of the UPDATE is delayed until no other clients are reading from the table. This affects only storage engines that use only table-level locking (such as MyISAM, MEMORY, and MERGE).

  • With the IGNORE modifier, the update statement does not abort even if errors occur during the update. Rows for which duplicate-key conflicts occur on a unique key value are not updated. Rows updated to values that would cause data conversion errors are updated to the closest valid values instead.

UPDATE IGNORE statements, including those having an ORDER BY clause, are flagged as unsafe for statement-based replication. (This is because the order in which the rows are updated determines which rows are ignored.) Such statements produce a warning in the error log when using statement-based mode and are written to the binary log using the row-based format when using MIXED mode. (Bug #11758262, Bug #50439) See Section 17.1.2.3, “Determination of Safe and Unsafe Statements in Binary Logging”, for more information.

If you access a column from the table to be updated in an expression, UPDATE uses the current value of the column. For example, the following statement sets col1 to one more than its current value:

UPDATE t1 SET col1 = col1 + 1;

The second assignment in the following statement sets col2 to the current (updated) col1 value, not the original col1 value. The result is that col1 and col2 have the same value. This behavior differs from standard SQL.

UPDATE t1 SET col1 = col1 + 1, col2 = col1;

Single-table UPDATE assignments are generally evaluated from left to right. For multiple-table updates, there is no guarantee that assignments are carried out in any particular order.

If you set a column to the value it currently has, MySQL notices this and does not update it.

If you update a column that has been declared NOT NULL by setting to NULL, an error occurs if strict SQL mode is enabled; otherwise, the column is set to the implicit default value for the column data type and the warning count is incremented. The implicit default value is 0 for numeric types, the empty string ('') for string types, and the zero value for date and time types. See Section 11.5, “Data Type Default Values”.

UPDATE returns the number of rows that were actually changed. The mysql_info() C API function returns the number of rows that were matched and updated and the number of warnings that occurred during the UPDATE.

You can use LIMIT row_count to restrict the scope of the UPDATE. A LIMIT clause is a rows-matched restriction. The statement stops as soon as it has found row_count rows that satisfy the WHERE clause, whether or not they actually were changed.

If an UPDATE statement includes an ORDER BY clause, the rows are updated in the order specified by the clause. This can be useful in certain situations that might otherwise result in an error. Suppose that a table t contains a column id that has a unique index. The following statement could fail with a duplicate-key error, depending on the order in which rows are updated:

UPDATE t SET id = id + 1;

For example, if the table contains 1 and 2 in the id column and 1 is updated to 2 before 2 is updated to 3, an error occurs. To avoid this problem, add an ORDER BY clause to cause the rows with larger id values to be updated before those with smaller values:

UPDATE t SET id = id + 1 ORDER BY id DESC;

You can also perform UPDATE operations covering multiple tables. However, you cannot use ORDER BY or LIMIT with a multiple-table UPDATE. The table_references clause lists the tables involved in the join. Its syntax is described in Section 13.2.9.2, “JOIN Clause”. Here is an example:

UPDATE items,month SET items.price=month.price WHERE items.id=month.id;

The preceding example shows an inner join that uses the comma operator, but multiple-table UPDATE statements can use any type of join permitted in SELECT statements, such as LEFT JOIN.

If you use a multiple-table UPDATE statement involving InnoDB tables for which there are foreign key constraints, the MySQL optimizer might process tables in an order that differs from that of their parent/child relationship. In this case, the statement fails and rolls back. Instead, update a single table and rely on the ON UPDATE capabilities that InnoDB provides to cause the other tables to be modified accordingly. See Section 13.1.17.5, “FOREIGN KEY Constraints”.

You cannot update a table and select directly from the same table in a subquery. You can work around this by using a multi-table update in which one of the tables is derived from the table that you actually wish to update, and referring to the derived table using an alias. Suppose you wish to update a table named items which is defined using the statement shown here:

CREATE TABLE items ( id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY, wholesale DECIMAL(6,2) NOT NULL DEFAULT 0.00, retail DECIMAL(6,2) NOT NULL DEFAULT 0.00, quantity BIGINT NOT NULL DEFAULT 0 );

To reduce the retail price of any items for which the markup is 30% or greater and of which you have fewer than one hundred in stock, you might try to use an UPDATE statement such as the one following, which uses a subquery in the WHERE clause. As shown here, this statement does not work:

mysql> UPDATE items > SET retail = retail * 0.9 > WHERE id IN > (SELECT id FROM items > WHERE retail / wholesale >= 1.3 AND quantity > 100); ERROR 1093 (HY000): You can't specify target table 'items' for update in FROM clause

Instead, you can employ a multi-table update in which the subquery is moved into the list of tables to be updated, using an alias to reference it in the outermost WHERE clause, like this:

UPDATE items, (SELECT id FROM items WHERE id IN (SELECT id FROM items WHERE retail / wholesale >= 1.3 AND quantity < 100)) AS discounted SET items.retail = items.retail * 0.9 WHERE items.id = discounted.id;

Because the optimizer tries by default to merge the derived table discounted into the outermost query block, this works only if you force materialization of the derived table. You can do this by setting the derived_merge flag of the optimizer_switch system variable to off before running the update, or by using the NO_MERGE optimizer hint, as shown here:

UPDATE /*+ NO_MERGE(discounted) */ items, (SELECT id FROM items WHERE retail / wholesale >= 1.3 AND quantity < 100) AS discounted SET items.retail = items.retail * 0.9 WHERE items.id = discounted.id;

The advantage of using the optimizer hint in such a case is that it applies only within the query block where it is used, so that it is not necessary to change the value of optimizer_switch again after executing the UPDATE.

Another possibility is to rewrite the subquery so that it does not use IN or EXISTS, like this:

UPDATE items, (SELECT id, retail / wholesale AS markup, quantity FROM items) AS discounted SET items.retail = items.retail * 0.9 WHERE discounted.markup >= 1.3 AND discounted.quantity < 100 AND items.id = discounted.id;

In this case, the subquery is materialized by default rather than merged, so it is not necessary to disable merging of the derived table.

An UPDATE on a partitioned table using a storage engine such as MyISAM that employs table-level locks locks only those partitions containing rows that match the UPDATE statement WHERE clause, as long as none of the table partitioning columns are updated. (For storage engines such as InnoDB that employ row-level locking, no locking of partitions takes place.) For more information, see Section 19.6.4, “Partitioning and Locking”.


Page 17

13.3 Transactional and Locking Statements

MySQL supports local transactions (within a given client session) through statements such as SET autocommit, START TRANSACTION, COMMIT, and ROLLBACK. See Section 13.3.1, “START TRANSACTION, COMMIT, and ROLLBACK Statements”. XA transaction support enables MySQL to participate in distributed transactions as well. See Section 13.3.7, “XA Transactions”.