In this blog post I cover the solution for a weird behavior that bcp has when trying to import files that contain Unix line endings.
The weird behavior I’m referring to is that no data gets imported into SQL Server and bcp throws the following error messages:
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file

SQLState = 22005, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification

SQLState = 22003, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Numeric value out of range
Setting up the environment
I’m using the AdventureWorks2019 database as both source and target for the csv files.
This is similar to my post on exporting (and importing) multiple csv files but, to make things easier, I’m providing the necessary files in my blog’s GitHub repo.
Creating the CSV files
This section is just in case you want to follow along with the process I’ve used to create the CSV files for this demo, otherwise you can just download the “ready-to-use” ones from the repo.
Export the CSVs
For the export itself I use the following PowerShell code:
```powershell
$Server = "LOCALHOST\VSQL2019"
$Database = "AdventureWorks2019"
$OutputDir = "F:\CSVOut\"
$Tables = @("HumanResources.vEmployee", "HumanResources.vEmployeeDepartment", "HumanResources.vEmployeeDepartmentHistory")

foreach ($Table in $Tables) {
    $CsvPath = $OutputDir + $Table.replace('HumanResources.v', '') + ".csv"
    bcp "$Table" out "$CsvPath" -w -t"|" -S "$Server" -d "$Database" -T -k
}
```
As mentioned in that post, the following error message is actually a warning and the data is exported successfully:
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Warning: BCP import with a format file will convert empty strings in delimited columns to NULL.
Convert the CSVs
I’m using the resulting EmployeeDepartment.csv file as an example of what the files currently look like in terms of encoding and line endings.
Note the Windows-style CRLF (carriage return + line feed) line endings, as well as the UTF-16 LE encoding with a BOM.
To make the CSV files look and feel like they’ve been generated on a Linux or Unix OS, I do the following in Notepad++:
- I change the encoding to UTF-8: go to Encoding in the menu bar and click UTF-8
- I convert the line endings from CRLF to LF: go to Edit in the menu bar, then EOL Conversion, and click Unix (LF)
- Save the changes
At this point all the files should have LF line endings and use the UTF-8 encoding.
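If you’d rather script this conversion than click through Notepad++, the same two changes can be sketched in Python (the paths are hypothetical; this assumes the export is UTF-16 LE with a BOM and CRLF endings, which is what `bcp -w` produces):

```python
from pathlib import Path

def to_unix_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 LE (BOM) file with CRLF endings as UTF-8 with LF endings."""
    text = Path(src).read_text(encoding="utf-16")  # the "utf-16" codec consumes the BOM
    text = text.replace("\r\n", "\n")              # CRLF -> LF
    Path(dst).write_bytes(text.encode("utf-8"))    # UTF-8, no BOM

# Hypothetical usage:
# to_unix_utf8(r"F:\CSVOut\EmployeeDepartment.csv", r"F:\CSVOut\EmployeeDepartment.csv")
```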
First attempt to import the files
I’m already addressing the UTF-8 encoding with the following two command-line options:

- `-c` – this tells bcp to perform the operation using a character data type. The data is stored as char and the default row terminator is `\r\n` (CRLF). It’s recommended to use `-c` instead of `-w` when dealing with data that was exported from, or should be imported into, other applications or systems.
- `-C 65001` – this tells bcp to use code page 65001, which is the Windows code page for UTF-8. This is because my CSVs contain names such as Sánchez and other strings with Unicode characters that would get mangled when treated as plain char data (due to `-c`).
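To see why the code page matters, here’s a small Python illustration of the mangling; it’s not bcp itself, just the underlying encoding mismatch:

```python
name = "Sánchez"
utf8_bytes = name.encode("utf-8")   # "á" becomes the two bytes 0xC3 0xA1
# Decoding those bytes with a legacy single-byte code page (Windows-1252 here)
# turns "á" into two unrelated characters - the classic mojibake:
mangled = utf8_bytes.decode("cp1252")
print(mangled)  # SÃ¡nchez
```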
So, at this point the PowerShell code to import the CSVs looks like this:
```powershell
$Server = "LOCALHOST\VSQL2019"
$Database = "AdventureWorks2019"
$CSVPath = "F:\CSVOut"
$CsvFiles = Get-ChildItem -Path $CSVPath\*.csv | Select-Object -ExpandProperty Name

foreach ($CSV in $CsvFiles) {
    $FilePath = "$CSVPath\$CSV"
    $Target = $CSV.replace('.csv', '')
    Write-Host ""
    Write-Host " -- Importing $CSV --"
    bcp "$Target" in "$FilePath" -c -C 65001 -t"|" -S "$Server" -d "$Database" -T -k
}
```
But when doing the import I get the following errors:
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file
SQLState = 22005, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification
And, for CSVs where the last column is numeric (not the case in this demo, but I’ve seen it in the wild):
SQLState = 22003, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Numeric value out of range
And as a result 0 records are copied.
Small gotcha
At this point, dear reader, I’m sure you’ve already spotted something that might be causing this: the default row terminator is `\r\n` (CRLF) when using the `-c` option, which isn’t the row terminator used in the CSV files.
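A quick way to confirm the mismatch is to count terminators in the raw bytes of a converted file; the Python sketch below uses an inline sample instead of a real file:

```python
# Two rows as they would appear in one of the converted, LF-only CSV files:
data = "Sánchez|Engineering\nSmith|Sales\n".encode("utf-8")

crlf_count = data.count(b"\r\n")            # what bcp's default -c terminator expects
lf_count = data.count(b"\n") - crlf_count   # what the file actually contains
print(crlf_count, lf_count)  # 0 2
```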
So, changing the bcp command to specify the correct row terminator with `-r"\n"` fixes the errors, right?
This is how the code would look in that case:
```powershell
$Server = "LOCALHOST\VSQL2019"
$Database = "AdventureWorks2019"
$CSVPath = "F:\CSVOut"
$CsvFiles = Get-ChildItem -Path $CSVPath\*.csv | Select-Object -ExpandProperty Name

foreach ($CSV in $CsvFiles) {
    $FilePath = "$CSVPath\$CSV"
    $Target = $CSV.replace('.csv', '')
    Write-Host ""
    Write-Host " -- Importing $CSV --"
    bcp "$Target" in "$FilePath" -c -C 65001 -t"|" -r"\n" -S "$Server" -d "$Database" -T -k
}
```
But it throws the same errors, because when you specify `-r"\n"`, bcp slaps a `\r` in front of the `\n` to turn it into `\r\n`.
This sounds pretty silly, right?
Don’t take my word for it: Microsoft has actually documented this behavior.
The right way to get bcp to play nice with Unix line endings
The only way to ensure bcp doesn’t mess with the LF row terminator is to pass its hex equivalent to the `-r` option.
The hex equivalent of the line feed (or newline) character is `0x0a`.
You can check this using PowerShell:
```powershell
$LF = "`n"
$LF | Format-Hex
```
Or in ByteTool’s ASCII table.
The updated import command:
```powershell
$Server = "LOCALHOST\VSQL2019"
$Database = "AdventureWorks2019"
$CSVPath = "F:\CSVOut"
$CsvFiles = Get-ChildItem -Path $CSVPath\*.csv | Select-Object -ExpandProperty Name

foreach ($CSV in $CsvFiles) {
    $FilePath = "$CSVPath\$CSV"
    $Target = $CSV.replace('.csv', '')
    Write-Host ""
    Write-Host " -- Importing $CSV --"
    bcp "$Target" in "$FilePath" -c -C 65001 -t"|" -r"0x0a" -S "$Server" -d "$Database" -T -k
}
```
And bcp is able to successfully process the CSV files and import them into SQL Server.
I check the imported data using the following queries:
```sql
SELECT * FROM [dbo].[Employee];
SELECT * FROM [dbo].[EmployeeDepartment];
SELECT * FROM [dbo].[EmployeeDepartmentHistory];
```
And it looks like none of the Unicode characters have been mangled.
Does dbatools have the same issue with LF?
As long as you use the right options for this type of output file, such as `-Delimiter "|"` and `-NoHeaderRow`, the data is imported without any issues.
```powershell
$MyConnection = Connect-DbaInstance -SqlInstance LOCALHOST\VSQL2019
$Database = "AdventureWorks2019"

Get-ChildItem -Path F:\CSVOut\*.csv |
    Import-DbaCsv -SqlInstance $MyConnection `
        -Database $Database -UseFileNameForSchema -Delimiter "|" -NoHeaderRow
```
Why not just modify the CSV files?
Well, dear reader, you might run into silly situations where such CSV files are designed as test data to be loaded into multiple RDBMS platforms (such as Oracle, MySQL, SQL Server, etc.), and you can’t just modify them to your liking.
Conclusion
In order to successfully import files generated on Unix systems using bcp, be mindful of the three options `-c`, `-C`, and `-r`, and of the fact that `\n` will not work as expected, so its hex equivalent, `0x0a`, is the way to go.