Query To Generate Surrogate Key In Teradata
Surrogate keys are often used in data warehousing systems, where the high data volume makes query speed important. A surrogate key is advantageous because joining on a numeric field is quicker than joining on a non-numeric field.

When a primary key is generated at runtime, it is called a surrogate key. A surrogate key is typically a numeric value. In SQL Server, a column can be defined with an identity property to help generate surrogate key values. In Teradata Database, surrogate keys can be generated using the identity column feature. To use an identity column for this purpose, specify the GENERATED ALWAYS and NO CYCLE options and ensure that the data type for the column is either NUMERIC(18,0) or DECIMAL(18,0).

A common scenario from practice: a surrogate ID has to be generated for every row in a fact table. ROW_NUMBER worked for this at lower volumes, but once the data volume grows beyond 100 million rows the queries take a very long time, compared with much less time without ROW_NUMBER. The same difficulty appears when generating surrogate keys with Teradata and pushdown: without a sequence generator or an expression's variable ports, it is hard to come up with an elegant way of generating IDs.
Teradata: Surrogate Key Concept
A surrogate key is a unique, database-supplied or generated identifier, generally used as the primary key or primary index of a table.
➠ When to use a surrogate key
- A surrogate key should be used if the rows of a table cannot be uniquely identified using one or more existing columns.
- A surrogate key can also be used when the unique key is too long or non-numeric.
➠ Advantages of surrogate key
- Each row can be uniquely identified within a table using its surrogate key value.
- A surrogate key can be used as the primary index of a table to distribute data evenly across all the AMPs.
➠ Disadvantages of surrogate key
- No meaning or relationship can be derived between the surrogate key and the rest of the data columns in a row, so surrogate keys have no meaning to users.
- When data is shared among different databases, the same row may have different surrogate keys in different databases, and different rows may have the same surrogate key.
➠ How to generate surrogate key in Teradata
- By using analytical functions
  - By using the CSUM analytical function. Note that a single AMP (usually vproc 0) generally processes all the data when using CSUM(1,1).
  - By using the SUM analytical function.
  - By using the ROW_NUMBER analytical function.
  Note: '(SELECT ZEROIFNULL(MAX(emp_no)) FROM employee)' is used with each of these functions so that the new sequence is greater than the current max value present in the table.
- By using an identity column. Check the Identity Columns page for more detail on identity columns (sequences) in Teradata.
  Note: There will always be gaps in the generated numbers when using an identity column, because in Teradata it is not one sequence but multiple parallel sequences (one on each AMP).
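The ROW_NUMBER approach can be sketched as follows. This is only an illustration, run against SQLite through Python's sqlite3 module rather than against Teradata, so the standard COALESCE stands in for ZEROIFNULL. The staging table stg_employee, the emp_name column, and the sample rows are hypothetical; employee and emp_no come from the note above. Window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, emp_name TEXT);
    CREATE TABLE stg_employee (emp_name TEXT);  -- hypothetical staging table
    INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO stg_employee VALUES ('Cid'), ('Dee'), ('Eve');
""")

# ROW_NUMBER() numbers the incoming rows 1..n; adding the current maximum
# key keeps every new value above the keys already present in the target.
conn.execute("""
    INSERT INTO employee (emp_no, emp_name)
    SELECT ROW_NUMBER() OVER (ORDER BY emp_name)
           + (SELECT COALESCE(MAX(emp_no), 0) FROM employee),
           emp_name
    FROM stg_employee
""")

rows = conn.execute(
    "SELECT emp_no, emp_name FROM employee ORDER BY emp_no"
).fetchall()
print(rows)
```

The three staged rows receive keys that continue directly from the existing maximum, with no gaps.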
This article demonstrates how to “roll your own” surrogate keys and sequences in a platform-independent way, using standard SQL.
Surrogate keys
Relational theory talks about something called a “candidate key.” In SQL terms, a candidate key is any combination of columns that uniquely identifies a row (SQL and the relational model aren’t the same thing, but I’ll put that aside for this article). The data’s primary key is the minimal candidate key. Many people think a primary key is something the DBA defines, but that’s not true. The primary key is a property of the data, not the table that holds the data.
Unfortunately, the minimal candidate key is sometimes not a good primary key in the real world. For example, if the primary key is 6 columns wide and I need to refer to a row from another table, it’s impractical to make a 6-column wide foreign key. For this reason, database designers sometimes introduce a surrogate key, which uniquely identifies every row in the table and is “more minimal” than the inherently unique aspect of the data. The usual choice is a monotonically increasing integer, which is small and easy to use in foreign keys.
Every RDBMS of which I’m aware offers a feature to make surrogate keys easier by automatically generating the next larger value upon insert. In SQL Server, it’s called an IDENTITY column. In MySQL, it’s called AUTO_INCREMENT. It’s possible to generate the value in SQL, but it’s easier and generally safer to let the RDBMS do it instead. This does lead to some issues itself, such as the need to find out the value that was generated by the last insertion, but those are usually not hard to solve (LAST_INSERT_ID() and similar functions, for example).
It’s sometimes desirable not to use the provided feature. For instance, I might want to be sure I always use the next available number. In that case, I can’t use the built-in features, because they don’t generate the next available number under some circumstances. For example, SQL Server doesn’t decrement the internal counter when transactions are rolled back, leaving holes in the data (see my article on finding missing numbers in a sequence). Neither MySQL nor SQL Server decrements the counter when rows are deleted.
In these cases, it’s possible to generate the next value in the insert statement. Suppose my table, t1, has a key column c1 and a second column.
The next value for c1 is simply the maximum value + 1. If there is no maximum value, it is 1, which is the same as 0 + 1.
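A minimal sketch of that next-value query, using Python's sqlite3 module as a convenient way to run standard SQL. The names t1 and c1 come from the text; the second column's name c2 and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# c2 and the sample data below are illustrative assumptions.
conn.execute(
    "CREATE TABLE t1 (c1 INTEGER NOT NULL PRIMARY KEY, c2 INTEGER NOT NULL)"
)

NEXT_VALUE = "SELECT COALESCE(MAX(c1), 0) + 1 FROM t1"

# Empty table: MAX(c1) is NULL, COALESCE turns it into 0, so the result is 0 + 1.
next_when_empty = conn.execute(NEXT_VALUE).fetchone()[0]

conn.executemany("INSERT INTO t1 (c1, c2) VALUES (?, ?)", [(1, 10), (2, 20)])

# Two rows present: the result is the maximum key plus one.
next_after_two = conn.execute(NEXT_VALUE).fetchone()[0]

print(next_when_empty, next_after_two)
```

COALESCE is the standard SQL way to replace the NULL that MAX returns for an empty table.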
There are platform-dependent ways to write that statement as well, such as using SQL Server’s ISNULL function or MySQL’s IFNULL. This code can be combined into an INSERT statement, such as the following statement to insert 3 into the second column:
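A sketch of such a statement, run through Python's sqlite3 module. The column name c2 is hypothetical, since the text only calls it the second column. The key is computed and inserted by a single INSERT ... SELECT:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# c2 is an assumed name for the second column.
conn.execute(
    "CREATE TABLE t1 (c1 INTEGER NOT NULL PRIMARY KEY, c2 INTEGER NOT NULL)"
)

# One statement both finds the next key and inserts the row.
INSERT_NEXT = """
    INSERT INTO t1 (c1, c2)
    SELECT COALESCE(MAX(c1), 0) + 1, ?
    FROM t1
"""

conn.execute(INSERT_NEXT, (3,))  # first row gets c1 = 1
conn.execute(INSERT_NEXT, (3,))  # second row gets c1 = 2

rows = conn.execute("SELECT c1, c2 FROM t1 ORDER BY c1").fetchall()
print(rows)
```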
That INSERT is a single atomic statement and will prevent any two concurrent inserts from getting the same value for c1. It is not safe to find the next value in one statement and use it in another, unless both statements are in a transaction. I would consider that a bad idea, though. There’s no need for a transaction in the single-statement form.
Downsides to this approach are the inability to find the value of c1 immediately after inserting, and the inability to insert multiple rows at once. The first problem is inherently caused by inserting meaningless data, and is always a problem, even with the built-in surrogate keys where the RDBMS provides a mechanism to retrieve the value.
Sequences: a better surrogate key
Surrogate keys are often considered very bad practice, for a variety of good reasons I won’t discuss here. Sometimes, though, there is just nothing for it but to artificially unique-ify the data. In these cases, a sequence number can often be a less evil approach. A sequence is just a surrogate key that restarts at 1 for each group of related records. For example, consider a table of log entries related to records in my t1 table.
At this point I might want to enter some more records (0, 11) into t1.
Now suppose I want three log entries for the first row in t1.
There’s no good primary key in this data. I will have to add a surrogate key. It might seem I could add a date-time column instead, but that’s a dangerous design. It breaks as soon as two records are inserted within a timespan less than the maximum resolution of the data type. It also breaks if two records are inserted in a single transaction where the time is consistent from the first to the last statement. I’m much happier with a sequence column. The following statement will insert the log records as desired:
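A minimal sketch of such an insert, run through Python's sqlite3 module. The log table name t2, its columns (c1, seq, msg), the helper add_log, and the messages are all hypothetical. The sequence number is the maximum for that parent row plus one, so each row in t1 gets its own numbering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 INTEGER NOT NULL PRIMARY KEY, c2 INTEGER NOT NULL);
    -- Hypothetical log table: (c1, seq) identifies one log entry.
    CREATE TABLE t2 (
        c1  INTEGER NOT NULL REFERENCES t1 (c1),
        seq INTEGER NOT NULL,
        msg TEXT    NOT NULL,
        PRIMARY KEY (c1, seq)
    );
    INSERT INTO t1 (c1, c2) VALUES (1, 3), (2, 11);
""")

def add_log(parent, msg):
    """Insert a log entry whose seq is the next number for this parent row."""
    conn.execute(
        """
        INSERT INTO t2 (c1, seq, msg)
        SELECT ?, COALESCE(MAX(seq), 0) + 1, ?
        FROM t2
        WHERE c1 = ?
        """,
        (parent, msg, parent),
    )

for msg in ("created", "updated", "archived"):
    add_log(1, msg)
add_log(2, "created")  # a different parent row gets its own numbering

rows = conn.execute("SELECT c1, seq, msg FROM t2 ORDER BY c1, seq").fetchall()
print(rows)
```

The WHERE clause restricts MAX to the entries of one parent row, which is what makes the numbering restart for each group.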
If I want to enter a log record on another record in t1, the sequence will start at 1 for it, because that record has no existing log entries.
MySQL actually allows an AUTO_INCREMENT value to serve as a sequence for certain table types (MyISAM and BDB). To do this, just make the column the last column in a multi-column primary key. I’m not aware of any other RDBMS that does this.