Viewing entries tagged
sql tricks


SQL Convert CSV to Table. The Better Way.

A very common database operation is to transform a simple CSV string into a table so it can be used in more complex query operations. There is an overabundance of SQL functions online that perform such transformations. Here is one: [sourcecode language="sql"] CREATE FUNCTION [dbo].[fn_ParseCsvString] ( @CSVString VARCHAR(max) , @Delimiter VARCHAR(1) ) RETURNS @tbl TABLE ( s VARCHAR(128) ) AS BEGIN DECLARE @i INT , @j INT SELECT @i = 1 WHILE @i BEGIN SELECT @j = CHARINDEX(@Delimiter, @CSVString, @i) IF @j = 0 BEGIN SELECT @j = LEN(@CSVString) + 1 END INSERT @tbl SELECT SUBSTRING(@CSVString, @i, @j - @i) SELECT @i = @j + LEN(@Delimiter) END RETURN END [/sourcecode]

And you’ll use it like this:

[sourcecode language="sql"] select * from dbo.fn_ParseCSVString ('1,2,3,4,5,6,7,8,9,10,11,12,13,14,15', ',') [/sourcecode]

The problem is the way we use those functions and the fact that their performance order is linear to the number of comma separated values in the string. That works out ok for a small number of items, but not when you want to JOIN on a very large number of CSV. When you are using the CSV functions in a sproc or a query, you'd be writing JOIN statements to the result set from the function in order to filter a real table in your database.

Let's look at a case scenario: Suppose you have an Address table in your Geo schema. Your address table has a primary key 'Id' column and other columns that are irrelevant for the purpose of this article. If you wanted to query your Address based on CSV values using your function, you'd do something like this:

[sourcecode language="sql"]

DECLARE @csv VARCHAR(MAX) = '1,2,3,4,5,6,7'

SELECT a.Id FROM Geo.[Address] a JOIN dbo.fn_ParseCsvString(@csv, ',') tbl ON a.Id = tbl.s [/sourcecode]

Here is some benchmarking on the previous query (all in microseconds): 1 second = 1000 milliseconds = 1'000'000 microseconds

1000 CSV  ~ 0.5 secs – T1: 557'272 – T2: 408'185 – T3: 446'485

5000 CSV  ~ 2.3 secs  – T1: 2'376'159 – T2: 2'307'394 – T3: 2'370'191

10000 CSV ~ 4.6 secs  – T1: 4'699'332 – T2: 4'737'877 – T3: 4'641'049

You get the idea and it gets even worse when you do left joins on CSV. Even if your tables are indexed on the actual columns you are joining to your function, sequentially calculating those values and inserting them into a table will take some unnecessary time. So here is a solution that requires not a function but instead a store procedure and the reason for the store procedure is the need to run Dynamic SQL.

[sourcecode language="sql"]

PROCEDURE [dbo].[CsvToTable] @Csv VARCHAR(MAX) , @SchemaName VARCHAR(128) , @TableName VARCHAR(128) , @KeyColumnName VARCHAR(32) = 'Id' , @Debug BIT = 0 AS SET NOCOUNT ON ; BEGIN TRY DECLARE @mySql NVARCHAR(MAX)= 'select distinct ' + @KeyColumnName + ' from ' + QUOTENAME(@schemaName) + '.' + QUOTENAME(@tableName)

IF ( @Csv IS NULL OR @Csv = '' ) BEGIN SET @mySql = @mySql + ' where 1=0' END ELSE BEGIN

SET @mySql = @mySql + ' where ' + @KeyColumnName + ' in (' + @Csv + ')' END IF ( @Debug = 1 ) PRINT @mySql EXEC sp_executesql @mySql END TRY BEGIN CATCH --Your catch logic here --EXEC dbo.Error_Rethrow END CATCH


And this is how you’d use it:

[sourcecode language="sql"] DECLARE @csv VARCHAR(MAX) = '1,2,3,4,5,6,7'

DECLARE @tbl TABLE ( Id INT ) INSERT INTO @tbl EXEC dbo.CsvToTable @csv, N'Geo', N'Address', N'Id'

SELECT a.Id FROM Geo.Address a JOIN @tbl tbl ON tbl.Id = a.Id [/sourcecode]

Here is some benchmarking on the new solution (all in microseconds): 1 second = 1000 milliseconds = 1'000'000 microseconds

10000 CSV ~ 0.2 secs – T1: 212'289 – T2: 183'635 – T3: 204'688

WOW! That's  1/5 of a second on average to query 10K records. MUCH MUCH better ah?!

Let’s check pros and cons of this solution.


  1. There is nothing faster than this solution to convert CSV into a table. Period.
  2. You can configure additional indexes on the columns you want to lookup using a CSV approach and the indexed will have an impact on the speed of the CSV look-up; versus always having a linear execution order because the function is unaware of indexes.
  3. Simple to use.


  1. It’s using dynamic SQL, and that brings all the negative things writing dynamic SQL has. I will recommend having a dedicated role to runners of this sproc so that users do not have permissions to run it. Try to always use this sproc through another sproc.
  2. The fact that is a sproc “feels” wrong, since by their nature sprocs can have side effects, although this one has none.  This is precisely the reason why a function does not allow for execution of dynamic SQL.

I think I can live with the feeling in trade of the unmatched performance this solution offers. A couple of things to note that were done to prevent SQL injection on the Dynamic SQL:

  1. Both schema and table names are provides as separate parameters of the sproc and they are put together in the code with the '.' separator.
  2. Both schema and table name are wrapped using quotename()
  3. When executing the Dynamic SQL, I'm using sp_executesql rather than just the plain EXEC() function.
  4. There is a @Debug parameter that prints the generated Dynamic SQL when debugging. This will safe you a couple of headaches.



SQL - Select rows in chaotic order

Ok, another SQL trick. How to get a row set where the rows are ordered by a specific value sequence of a particular column?

This is not a very common problem, but if you happen to need something like this, you may scratch your head for a couple of minutes before you figure it out.

So, the challenge is the following, you have a table like the one below:


If you want to return all rows with ids 4, 5, 7 and 8  in order by any column, you can do so easily using order by [the_column] asc or desc; but if you want to return those same rows in a chaotic desired order is no so trivial.

Say we want the rows ids 4, 5, 7 and 8 and we want them such that the ids appear in the following order: 7, 4, 8 and then 5

Before looking at how to do it, challenge yourself and try to come up with that query.

Ok now let’s look at the solution. It involves the use of the CHARINDEX string function. CHARINDEX returns the starting index of parameter1 in the string given on parameter2, if parameter1 is a substring of parameter2. For example:

[sourcecode language="sql"] SELECT CHARINDEX ( '23', 'AA23BB') --returns 3 SELECT CHARINDEX ( 'abc', 'aapppabcbc') --returns 6 [/sourcecode]

That means we can do something similar with a set of numbers when treated as a strings, like so:

[sourcecode language="sql"] SELECT CHARINDEX ( '7', '7,4,8,5') --returns 1 SELECT CHARINDEX ( '4', '7,4,8,5') --returns 3 SELECT CHARINDEX ( '8', '7,4,8,5') --returns 5 SELECT CHARINDEX ( '5', '7,4,8,5') --returns 7 [/sourcecode]

Now you see how easy is toget order range using this style. Finally, we can create our query using this technique and return our rows in the order we want.

[sourcecode language="sql"] select * from Person where PersonID in (4,5,7,8) order by CHARINDEX(CONVERT(varchar, Person.PersonID), '7,4,8,5') [/sourcecode]

Happy coding!



SQL MaxInt and MinInt functions

Is kind of frustrating when you are trying to write a complex query and there is that one function not provided by SQL Server out of the box to use it right away. This is the case with a function to compare 2 integers to get the Max or Min between them, similar to the Math.Max and Math.Min methods in .NET. You could create your own function and re-use it everywhere, but the first implementation that comes to your mind is a very inconvenient way to deal with this, since you cannot use it (easily) within an inline Sql query. Let's take a look at the easiest way to implement our function:

[sourcecode language="sql"] create function dbo.MaxInt(@int1 int, @int2 int) returns int as begin if @int1 > @int2 return @int1 return @int2 end [/sourcecode]

As I mentioned, there is no easy way to have this query as a subquery in a complex sql statement. And if you make it possible, the performance implications can be kinda bad. But the world doesn't end here! There is a convenient solution that will allow you to write an efficient function for it, or even better, have it inline right within your query. The trick is to double the desired value and then divide it by 2. So if you want to find the max value, find the value that is 2 times the max and then divide it by 2; if you want to find the min value, find the value that is 2 times the min and then divide it by 2. Let's take a look:

Max Function

[sourcecode language="sql"] create function dbo.MaxInt(@int1 int, @int2 int) returns int as begin return ((@int1 + @int2) + ABS(@int1 - @int2))/2 end [/sourcecode]

Min Function

[sourcecode language="sql"] create function dbo.MinInt(@int1 int, @int2 int) returns int as begin return ((@int1 + @int2) - ABS(@int1 - @int2))/2 end [/sourcecode]


And using this doubling notion with the absolute value tricks, we can elegantly write min and max comparisons right inside the Sql statement without the need of a function. Here is an example of an inline Sql statement:

[sourcecode language="sql"] select a, b, ((a + b) + ABS(a - b))/2 as maxValue from myTable [/sourcecode]

Sweet ahh?! Happy coding!