
The weird case of bcp and Unix line endings

by Vlad Drumea

In this blog post I cover the solution for a weird behavior that bcp has when trying to import files that contain Unix line endings.

The weird behavior I’m referring to is no data being imported into SQL Server and bcp throwing the following error messages:

  • SQLState = S1000, NativeError = 0
    Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file
  • SQLState = 22005, NativeError = 0
    Error = [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification
  • SQLState = 22003, NativeError = 0
    Error = [Microsoft][ODBC Driver 17 for SQL Server]Numeric value out of range

Setting up the environment

I’m using the AdventureWorks2019 database as both source and target for the csv files.

This is similar to my post on exporting (and importing) multiple csv files, but, to make things easier, I’m providing the necessary files in my blog’s GitHub repo.

Creating the CSV files

This section is just in case you want to follow along with the process I’ve used to create the CSV files for this demo, otherwise you can just download the “ready-to-use” ones from the repo.

Export the CSVs

For the export itself I use the following PowerShell code:
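Something along these lines (a minimal sketch, not the exact code from the repo — the instance name, output directory, and table list are assumptions):

```powershell
# Export each table to a pipe-delimited CSV using bcp's queryout mode.
# -S localhost and the C:\CSVs path are placeholders; -T = Windows auth,
# -w = Unicode (UTF-16 LE) character mode, -t"|" = pipe column delimiter.
$Tables = @("HumanResources.Employee",
            "HumanResources.Department",
            "HumanResources.EmployeeDepartmentHistory")

foreach ($Table in $Tables) {
    $OutFile = "C:\CSVs\$($Table.Split('.')[1]).csv"
    bcp "SELECT * FROM $Table" queryout $OutFile `
        -d AdventureWorks2019 -S localhost -T -w -t"|"
}
```

Note that -w is what produces the UTF-16 LE files with CRLF line endings that we’ll be converting in the next step.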

As mentioned in the post linked above, the following error message is actually a warning, and the data is exported successfully:

SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Warning: BCP import with a format file will convert empty strings in delimited columns to NULL.

Convert the CSVs

I’m using the resulting EmployeeDepartment.csv file as an example of what the files currently look like in terms of encoding and line endings.

Note the Windows-style CRLF (carriage return + line feed) line endings, as well as the UTF-16 LE BOM encoding.

To make the CSV files look and feel like they’ve been generated on a Linux or Unix OS, I do the following in Notepad++:

  1. Change the encoding to UTF-8: go to Encoding (in the menu bar) and click on UTF-8
  2. Convert the line endings from CRLF to LF: go to Edit (in the menu bar), then EOL Conversion, and click on Unix (LF)
  3. Save the changes

At this point all the files should have LF line endings and use the UTF-8 encoding.
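If you’d rather script the conversion than click through Notepad++, the same two changes can be made in PowerShell (a sketch; the path is a placeholder):

```powershell
# Re-encode each CSV as UTF-8 (no BOM) and convert CRLF line endings to LF.
foreach ($File in Get-ChildItem "C:\CSVs\*.csv") {
    $Text = (Get-Content -Raw -Path $File.FullName) -replace "`r`n", "`n"
    # UTF8Encoding($false) writes UTF-8 without a byte order mark
    [System.IO.File]::WriteAllText($File.FullName, $Text,
        [System.Text.UTF8Encoding]::new($false))
}
```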

First attempt to import the files

I’m already addressing the UTF-8 encoding with the following two command line options:

  • -c – this tells bcp to perform the operation using a character data type. The data is stored as char and the row separator is \r\n (CRLF).
    It’s recommended to use -c instead of -w when dealing with data that was exported or should be imported into other applications or systems.
  • -C 65001 – this tells bcp to use code page 65001, which is the Windows code page identifier for UTF-8.
    This is because my CSVs contain names such as Sánchez and other strings with Unicode characters that would get mangled when treated as char data types (due to -c).

So, at this point the PowerShell code to import the CSVs looks like this:
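Roughly along these lines (a sketch; the instance name, schema, and paths are assumptions):

```powershell
# Import each CSV in character mode (-c) with the UTF-8 code page (-C 65001).
# -S localhost, the C:\CSVs path, and the HumanResources schema are placeholders.
foreach ($File in Get-ChildItem "C:\CSVs\*.csv") {
    Write-Output "-- Importing $($File.Name) --"
    bcp "HumanResources.$($File.BaseName)" in $File.FullName `
        -d AdventureWorks2019 -S localhost -T -c -C 65001 -t"|"
}
```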

But when doing the import I get the following errors:

SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file

SQLState = 22005, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification

PowerShell window showing the error messages:

-- Importing Employee.csv --
Starting copy...
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file

0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1

-- Importing EmployeeDepartment.csv --
Starting copy...
SQLState = 22005, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification

0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1

-- Importing EmployeeDepartmentHistory.csv --
Starting copy...
SQLState = 22005, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification

0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1

And, for CSVs where the last column is numeric (not the case in this demo, but I’ve seen it in the wild):

SQLState = 22003, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Numeric value out of range

And as a result 0 records are copied.

Small gotcha

At this point, dear reader, I’m sure you’ve already spotted something that might be causing this.

Specifically, the fact that the default row terminator is \r\n (CRLF) when using the -c option, which isn’t the row terminator used in the CSV files.

So, changing the bcp command to specify the correct row terminator -r"\n" fixes the errors, right?

Well yes, but actually no.

This is what the code would look like in that case:
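A sketch of that variant (instance, database, and paths are assumptions), with the row terminator now spelled out explicitly:

```powershell
# Same character-mode import, but with -r "\n" to match the LF line endings.
# Instance name and path are placeholders.
bcp HumanResources.Employee in C:\CSVs\Employee.csv `
    -d AdventureWorks2019 -S localhost -T -c -C 65001 -t"|" -r "\n"
```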

But it throws the same errors, because when specifying -r"\n", bcp slaps a \r in front of the \n to turn it into \r\n.

This sounds pretty silly, right?
Don’t take my word for it, Microsoft actually documented this.

The right way to get bcp to play nice with Unix line endings

The only way to ensure bcp doesn’t mess with the LF row terminator is to pass its hex equivalent to the -r option.

The hex equivalent of the linefeed (or new line) character is 0x0a.
You can check this using PowerShell:

PowerShell window:

PS C:\Users\Vlad> $LF = "`n"
PS C:\Users\Vlad> $LF | Format-Hex

   Label: String (System.String) <78B22376>

          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 0A

Or in ByteTool’s ASCII table.

The updated import command:
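A minimal single-table sketch (instance, database, and paths are assumptions; the loop version changes in the same way):

```powershell
# Pass the row terminator as its hex value so bcp doesn't rewrite it to \r\n.
bcp HumanResources.Employee in C:\CSVs\Employee.csv `
    -d AdventureWorks2019 -S localhost -T -c -C 65001 -t"|" -r "0x0a"
```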

And bcp is able to successfully process the CSV files and import them into SQL Server.

I check the imported data using the following queries:
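For instance, a quick spot-check from the same PowerShell session (the table name is a placeholder; -E uses Windows authentication):

```powershell
# Spot-check the imported rows, including the ones with Unicode characters.
sqlcmd -S localhost -d AdventureWorks2019 -E `
    -Q "SELECT TOP 10 * FROM HumanResources.Employee;"
```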

And it looks like none of the Unicode characters have been mangled.

Does dbatools have the same issue with LF?

As long as you use the right options for this type of file, such as -Delimiter "|" and -NoHeaderRow, the data is imported without any issues.
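For reference, a call to dbatools’ Import-DbaCsv with those options might look like this (instance, database, and path are assumptions):

```powershell
# Import-DbaCsv handles the LF line endings without complaint.
# -NoHeaderRow because the bcp-generated files have no header line.
Import-DbaCsv -Path C:\CSVs\Employee.csv -SqlInstance localhost `
    -Database AdventureWorks2019 -Schema HumanResources -Table Employee `
    -Delimiter "|" -NoHeaderRow
```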

Why not just modify the CSV files?

Well, dear reader, you might run into silly situations where such CSV files are designed as test data to be loaded into multiple RDBMS platforms (such as Oracle, MySQL, SQL Server, etc.), and you can’t just modify them to your liking.

Conclusion

To successfully import files generated on Unix systems using bcp, be mindful of three options: -c, -C, and -r. Keep in mind that \n will not work as expected as a row terminator, so its hex equivalent, 0x0a, is the way to go.
