
Implementing SQL Source Control

Every developer working with databases and backend magic needs DB source control. If you think you don’t, then you are crazy, or will become crazy soon after you realize you do. But not every developer has the money to pay for the awesome RedGate SQL Source Control tool (~$1500 per developer, or ~$6000 for a team of 5). Sometimes, the ramen noodles expression must be interpreted literally. So how do you solve this problem on the cheap? You may want to follow K. Scott Allen’s database version control blog series (links below).

  1. Three rules for database work
  2. The Baseline
  3. Change Scripts
  4. Views, Stored Procedures and the Like
  5. Branching and Merging

These posts are great for the conceptual background and the practices around change scripts and such. The problem is there is not much implementation in the series to help speed up the process. To complement the series, you can DOWNLOAD my implementation of SQL Source Control. It consists of roughly 4 tables and 4 sprocs. If you want to implement it on your SQL Server database, download this update script and run it against your database (make a backup first, just in case).

Here is the ERD of the version control entities:
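The ERD boils down to a handful of logging entities. As a rough sketch, the main version log table might look something like this; the columns are inferred from the Database_Update parameters used below, so the actual schema in the download may differ, and the CreatedOnUtc audit column is my own guess:

[sourcecode language="sql"]
-- Rough sketch only: columns inferred from the Database_Update sproc parameters.
CREATE TABLE dbo.DatabaseVersionLog
    (
      DatabaseVersionLogId INT IDENTITY(1, 1) PRIMARY KEY ,
      UpdateFromDatabaseVersion VARCHAR(32) NOT NULL ,
      UpdateToDatabaseVersion VARCHAR(32) NOT NULL ,
      UpdateScriptName VARCHAR(64) NULL ,
      UpdateScriptFileName VARCHAR(1024) NULL ,
      CreateScriptName VARCHAR(64) NULL ,
      CreateScriptFileName VARCHAR(1024) NULL ,
      Tags VARCHAR(MAX) NULL ,
      CreatedOnUtc DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME() --hypothetical audit column
    )
[/sourcecode]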

To use it, just create your change script (maybe following some of the advice from Scott Allen) and use this template to add the final touches. Essentially, all you have to do is call the Database_Update sproc, and inspect the DatabaseErrorLog table for details whenever you encounter a problem.

[sourcecode language="sql"]
BEGIN TRY

    BEGIN TRAN

    DECLARE @return_value INT , --Return value of the Database_Update call below.
        @DatabaseVersionLogId INT , --This is for later; you may need it if you want to report to a CI server.
        @UpdateFromDatabaseVersion VARCHAR(32) = '1.2.3' , --The database version you want to update. This is your starting point.
        @UpdateToDatabaseVersion VARCHAR(32) = '1.2.4' , --The new version of the database after a successful update.
        @UpdateScriptName VARCHAR(64) = N'1.2.3 to 1.2.4' , --A friendly name for this script.
        @UpdateScriptFileName VARCHAR(1024) = N'123 to 124.sql' , --The file name containing this script.
        @CreateScriptName VARCHAR(64) = N'1.2.4' , --A friendly name for ANOTHER script (not this one) that generates the new schema from scratch, as opposed to from the @UpdateFromDatabaseVersion database.
        @CreateScriptFileName VARCHAR(1024) = N'124.sql' , --The file name containing the @CreateScriptName script.
        @Tags VARCHAR(MAX) = N'1.2.4, 1.2.3, super release' --A CSV for tagging and searching your releases later on.

    -- ADD YOUR CHANGE SCRIPT IN THIS SECTION
    -- ADD YOUR CHANGE SCRIPT IN THIS SECTION
    -- ADD YOUR CHANGE SCRIPT IN THIS SECTION

    EXEC @return_value = [dbo].[Database_Update]
        @UpdateFromDatabaseVersion ,
        @UpdateToDatabaseVersion ,
        @UpdateScriptName ,
        @UpdateScriptFileName ,
        @CreateScriptName ,
        @CreateScriptFileName ,
        @Tags ,
        @DatabaseVersionLogId = @DatabaseVersionLogId OUTPUT

    SELECT @DatabaseVersionLogId AS N'@DatabaseVersionLogId'

    PRINT ( 'All good. Update finished... now go open a beer' )

    COMMIT

END TRY
BEGIN CATCH

    PRINT ( 'Michael Jackson''s moonwalk. The script is a piece of shit. Fix it and try again. Your database is safe.' )

    ROLLBACK

END CATCH
[/sourcecode]

You can download the same script HERE
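If the update script blows up, the details end up in the log tables. Here is a quick way to peek at them; the exact columns depend on the schema in the downloaded script, so ordering by the first column is just a guess at the identity column:

[sourcecode language="sql"]
-- Most recent error details logged by the framework:
SELECT TOP ( 10 ) * FROM dbo.DatabaseErrorLog ORDER BY 1 DESC

-- And which version the database currently thinks it is on:
SELECT TOP ( 10 ) * FROM dbo.DatabaseVersionLog ORDER BY 1 DESC
[/sourcecode]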

Enjoy and Happy coding!


SQL Convert CSV to Table. The Better Way.

A very common database operation is to transform a simple CSV string into a table so it can be used in more complex query operations. There is an overabundance of SQL functions online that perform such transformations. Here is one:

[sourcecode language="sql"]
CREATE FUNCTION [dbo].[fn_ParseCsvString]
    (
      @CSVString VARCHAR(MAX) ,
      @Delimiter VARCHAR(1)
    )
RETURNS @tbl TABLE ( s VARCHAR(128) )
AS
BEGIN
    DECLARE @i INT ,
        @j INT
    SELECT @i = 1
    WHILE @i <= LEN(@CSVString)
        BEGIN
            SELECT @j = CHARINDEX(@Delimiter, @CSVString, @i)
            IF @j = 0
                BEGIN
                    SELECT @j = LEN(@CSVString) + 1
                END
            INSERT @tbl
                    SELECT SUBSTRING(@CSVString, @i, @j - @i)
            SELECT @i = @j + LEN(@Delimiter)
        END
    RETURN
END
[/sourcecode]

And you’ll use it like this:

[sourcecode language="sql"]
SELECT * FROM dbo.fn_ParseCsvString('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15', ',')
[/sourcecode]

The problem is the way we use those functions and the fact that their performance is linear in the number of comma-separated values in the string. That works out OK for a small number of items, but not when you want to JOIN on a very large number of CSV values. When you use one of these functions in a sproc or a query, you end up writing JOIN statements against the function's result set in order to filter a real table in your database.

Let's look at a scenario: suppose you have an Address table in your Geo schema. Your Address table has a primary key 'Id' column and other columns that are irrelevant for the purpose of this article. If you wanted to query your Address table based on CSV values using your function, you'd do something like this:

[sourcecode language="sql"]
DECLARE @csv VARCHAR(MAX) = '1,2,3,4,5,6,7'

SELECT  a.Id
FROM    Geo.[Address] a
        JOIN dbo.fn_ParseCsvString(@csv, ',') tbl ON a.Id = tbl.s
[/sourcecode]
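For reference, here is a minimal stand-in for the Geo.Address table these examples assume; only the Id column matters here, so everything else is left out:

[sourcecode language="sql"]
-- Minimal sketch of the table used in the examples; the real table
-- would have many more (irrelevant) columns.
CREATE SCHEMA Geo
GO
CREATE TABLE Geo.[Address] ( Id INT NOT NULL PRIMARY KEY )
GO
[/sourcecode]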

Here is some benchmarking of the previous query, three runs per input size (all times in microseconds; 1 second = 1000 milliseconds = 1'000'000 microseconds):

1000 CSV  ~ 0.5 secs – T1: 557'272 – T2: 408'185 – T3: 446'485

5000 CSV  ~ 2.3 secs  – T1: 2'376'159 – T2: 2'307'394 – T3: 2'370'191

10000 CSV ~ 4.6 secs  – T1: 4'699'332 – T2: 4'737'877 – T3: 4'641'049

You get the idea, and it gets even worse when you do LEFT JOINs on the CSV results. Even if your tables are indexed on the columns you are joining to the function's output, sequentially calculating those values and inserting them into a table takes unnecessary time. So here is a solution that uses not a function but a stored procedure, and the reason for the stored procedure is the need to run dynamic SQL.

[sourcecode language="sql"]
CREATE PROCEDURE [dbo].[CsvToTable]
    @Csv VARCHAR(MAX) ,
    @SchemaName VARCHAR(128) ,
    @TableName VARCHAR(128) ,
    @KeyColumnName VARCHAR(32) = 'Id' ,
    @Debug BIT = 0
AS
    SET NOCOUNT ON ;
    BEGIN TRY
        DECLARE @mySql NVARCHAR(MAX) = 'select distinct ' + @KeyColumnName
            + ' from ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName)

        IF ( @Csv IS NULL OR @Csv = '' )
            BEGIN
                SET @mySql = @mySql + ' where 1=0'
            END
        ELSE
            BEGIN
                SET @mySql = @mySql + ' where ' + @KeyColumnName + ' in (' + @Csv + ')'
            END

        IF ( @Debug = 1 )
            PRINT @mySql

        EXEC sp_executesql @mySql
    END TRY
    BEGIN CATCH
        --Your catch logic here
        --EXEC dbo.Error_Rethrow
    END CATCH
[/sourcecode]

And this is how you’d use it:

[sourcecode language="sql"]
DECLARE @csv VARCHAR(MAX) = '1,2,3,4,5,6,7'

DECLARE @tbl TABLE ( Id INT )
INSERT  INTO @tbl
        EXEC dbo.CsvToTable @csv, N'Geo', N'Address', N'Id'

SELECT  a.Id
FROM    Geo.[Address] a
        JOIN @tbl tbl ON tbl.Id = a.Id
[/sourcecode]

Here is some benchmarking of the new solution, same three-run setup (all times in microseconds; 1 second = 1000 milliseconds = 1'000'000 microseconds):

10000 CSV ~ 0.2 secs – T1: 212'289 – T2: 183'635 – T3: 204'688

WOW! That's 1/5 of a second on average to look up 10K records. MUCH, MUCH better, eh?!
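If you want to reproduce this kind of microsecond measurement yourself, one crude but simple pattern is to bracket the query with SYSDATETIME() calls. This is just my own measuring sketch, not necessarily how the numbers above were taken:

[sourcecode language="sql"]
-- Crude timing harness around the CSV lookup (assumes the Geo.Address table
-- and the dbo.CsvToTable sproc from above already exist):
DECLARE @csv VARCHAR(MAX) = '1,2,3,4,5,6,7'
DECLARE @t0 DATETIME2(7) = SYSDATETIME()

DECLARE @tbl TABLE ( Id INT )
INSERT  INTO @tbl
        EXEC dbo.CsvToTable @csv, N'Geo', N'Address', N'Id'

SELECT  a.Id
FROM    Geo.[Address] a
        JOIN @tbl tbl ON tbl.Id = a.Id

SELECT  DATEDIFF(MICROSECOND, @t0, SYSDATETIME()) AS ElapsedMicroseconds
[/sourcecode]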

Let’s check pros and cons of this solution.

Pros:

  1. There is nothing faster than this solution to convert CSV into a table. Period.
  2. You can add indexes on the columns you want to look up via the CSV approach, and those indexes will actually speed up the look-up; the function, on the other hand, is always linear because it has no way to take advantage of indexes (see the sketch right after this list).
  3. Simple to use.
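To make Pro #2 concrete, here is a hypothetical example; the RegionId column is made up for illustration and is not part of the minimal Address table sketched earlier:

[sourcecode language="sql"]
-- Hypothetical: filter Geo.Address by a RegionId INT column supplied as CSV.
-- The index lets the generated 'in (...)' predicate seek instead of scan.
CREATE NONCLUSTERED INDEX IX_Address_RegionId ON Geo.[Address] (RegionId)

DECLARE @regions VARCHAR(MAX) = '10,20,30'
DECLARE @tbl TABLE ( RegionId INT )
INSERT  INTO @tbl
        EXEC dbo.CsvToTable @regions, N'Geo', N'Address', N'RegionId'
[/sourcecode]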

Cons:

  1. It’s using dynamic SQL, and that brings all the negative things that come with writing dynamic SQL. I recommend restricting execution of this sproc to a dedicated role so that regular users do not have permission to run it directly, and always calling it through another sproc (see the sketch after this list).
  2. The fact that it is a sproc “feels” wrong, since by their nature sprocs can have side effects, although this one has none. This is precisely the reason why a function does not allow execution of dynamic SQL.
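Here is a minimal sketch of the locked-down permissions mentioned in Con #1; the role names are made up for illustration:

[sourcecode language="sql"]
-- Only members of a dedicated role may run the dynamic-SQL sproc directly.
CREATE ROLE CsvToTableExecutors
GRANT EXECUTE ON [dbo].[CsvToTable] TO CsvToTableExecutors

-- Deny it to the general application role (hypothetical name).
DENY EXECUTE ON [dbo].[CsvToTable] TO AppUsers
[/sourcecode]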

I think I can live with that feeling in exchange for the unmatched performance this solution offers. A couple of things were done to prevent SQL injection in the dynamic SQL:

  1. Both schema and table names are provided as separate parameters of the sproc, and they are put together in the code with the '.' separator.
  2. Both schema and table names are wrapped using QUOTENAME().
  3. When executing the dynamic SQL, I'm using sp_executesql rather than just the plain EXEC() function.
  4. There is a @Debug parameter that prints the generated dynamic SQL when debugging (see the example after this list). This will save you a couple of headaches.
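For example, passing @Debug = 1 makes the sproc print the statement it is about to run, so you can see exactly what the CSV expanded into:

[sourcecode language="sql"]
-- Prints something like: select distinct Id from [Geo].[Address] where Id in (1,2,3)
EXEC dbo.CsvToTable @Csv = '1,2,3', @SchemaName = N'Geo', @TableName = N'Address',
    @KeyColumnName = N'Id', @Debug = 1
[/sourcecode]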
