Alternative Tricks To Importing Data From Excel

I often find myself with the task of importing some dataset into a database, and the source of that data is usually Excel. The common approach to importing data into a database like SQL Server is to import the data into a table. While that usually works, there are times where that doesn’t work, because the data is non-standard, or for who knows what reason.

The spreadsheet I’m about to show you will easily import, but let’s assume it doesn’t for some reason. If we had a datasheet like this:

AccountNumber LastName FirstName Active
12345 Smith Tom Yes
12643 Peter Tom Yes
2354365 Smith Amy No

Nothing complicated here. But assuming we can’t get it in, one trick i have found myself doing is using some Excel functions that generate a SQL statement. For instance, if we did the following equation in Excel:

=CONCATENATE("insert into Customers (AccountNumber, LastName, FirstName) values ('", A4, "', '", B4, "', '", C4, "')")

Concatenate is an Excel function, very similar to .NET String.Concat method, that takes each string and combines them into one. Instead of using the plus sign to concatenate values, each string entered, separated by commas, is concatenated together, which may also include a cell value. The following above takes the values in the row 4 set of ABC cells and combines them into the template insert statement.

If we copy this function across all of the rows, it would generate a bulk insert statement like:

insert into Customers (AccountNumber, LastName, FirstName) values ('12345', 'Smith', 'Tom')
insert into Customers (AccountNumber, LastName, FirstName) values ('12643', 'Peter', 'Tom')
insert into Customers (AccountNumber, LastName, FirstName) values ('2354365', 'Smith', 'Amy')

Again nothing complicated; however, this can be useful. It can even be used to not include inserts if the user is inactive; if we update our formula:

=IF(D2="Yes",CONCATENATE("insert into Customers (AccountNumber, LastName, FirstName) values ('",A2,"', '",B2,"', '",C2,"')"),"")

This would only output the two active records:

insert into Customers (AccountNumber, LastName, FirstName) values ('12345', 'Smith', 'Tom')
insert into Customers (AccountNumber, LastName, FirstName) values ('12643', 'Peter', 'Tom')

The inactive record didn’t copy. After generating the scripts, we can copy the entire bunch of statements and paste into SQL Server Management Studio, which will execute each statement separately. With this approach, we could also do updates; maybe we want to sync the active status (a boolean in this scenario). We could use the following formula:

=CONCATENATE("update dbo.Customers set Active = ", IF(D2 = "YES", "1", "0"), " where AccountNumber = '", A2, "'")

Which generates:

update dbo.Customers set Active = 1 where AccountNumber = '12345'
update dbo.Customers set Active = 1 where AccountNumber = '12643'
update dbo.Customers set Active = 0 where AccountNumber = '2354365'

This feature may not be useful as a whole, but one scenario it is useful in is to insert data into a table variable. Data in a table variable can be used to conduct research, do data massaging before inserting into a table, or whatever else you want. For instance, suppose we had the following script shell in SQL Server:

declare @output table
(
  ROWID int identity(1,1) primary key clustered not null,
  Account varchar(100),
  First varchar(100),
  Last varchar(100),
  Active bit
)

-- INSERT insert templated scripts here

select count(*) from @output where Account like 'BN%'

select count(*) from @output where Active = 1

select count(*) from @output where Active = 0

select count(*) from @output where Account like '%99'

We use the following formula to generate the inserts into the table:

=CONCATENATE("insert into @output values ('",A2,"', '",B2,"', '",C2,"', ",IF(D2="YES","1","0"),")")

Which produces the scripts:

insert into @output values ('12345', 'Smith', 'Tom', 1)
insert into @output values ('12643', 'Peter', 'Tom', 1)
insert into @output values ('2354365', 'Smith', 'Amy', 0)

And our final script is:

declare @output table
(
  ROWID int identity(1,1) primary key clustered not null,
  Account varchar(100),
  First varchar(100),
  Last varchar(100),
  Active bit
)

insert into @output values ('12345', 'Smith', 'Tom', 1)
insert into @output values ('12643', 'Peter', 'Tom', 1)
insert into @output values ('2354365', 'Smith', 'Amy', 0)


select count(*) from @output where Account like 'BN%'

select count(*) from @output where Active = 1

select count(*) from @output where Active = 0

select count(*) from @output where Account like '%99'

And know we’ve done analysis on our results without needing to create a SQL table; total time to create the script may be faster, depending on how proficient you are at Excel. Excel has a lot of nifty functions helpful for data conversion. Is this dirty? Yes. Is it a one-time throwaway solution? Yes. But it can be handy in a bind, and may simplify a one-time migration later on.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s