
The weird case of bcp and Unix line endings

by Vlad Drumea

In this blog post I cover the solution for a weird behavior that bcp has when trying to import files that contain Unix line endings.

The weird behavior I’m referring to is no data being imported into SQL Server and bcp throwing the following error messages:

  • SQLState = S1000, NativeError = 0
    Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file
  • SQLState = 22005, NativeError = 0
    Error = [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification
  • SQLState = 22003, NativeError = 0
    Error = [Microsoft][ODBC Driver 17 for SQL Server]Numeric value out of range

Setting up the environment

I’m using the AdventureWorks2019 database as both source and target for the csv files.

This is similar to my post on exporting (and importing) multiple csv files, but, to make things easier, I’m providing the necessary files in my blog’s GitHub repo.

Creating the CSV files

This section is just in case you want to follow along with the process I’ve used to create the CSV files for this demo, otherwise you can just download the “ready-to-use” ones from the repo.

Export the CSVs

For the export itself I use the following PowerShell code:
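Something along these lines (a minimal sketch, not the exact code from the repo — the instance name, output directory, and table list are assumptions):

```powershell
# Export each table to a pipe-delimited CSV using bcp's queryout mode.
# -S localhost and the C:\CSVs path are placeholders; -T = Windows auth,
# -w = Unicode (UTF-16 LE) character mode, -t"|" = pipe column delimiter.
$Tables = @("HumanResources.Employee",
            "HumanResources.Department",
            "HumanResources.EmployeeDepartmentHistory")

foreach ($Table in $Tables) {
    $OutFile = "C:\CSVs\$($Table.Split('.')[1]).csv"
    bcp "SELECT * FROM $Table" queryout $OutFile `
        -d AdventureWorks2019 -S localhost -T -w -t"|"
}
```

Note that -w is what produces the UTF-16 LE files with CRLF line endings that we’ll be converting in the next step.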

As mentioned in the post linked above, the following error message is actually a warning, and the data is exported successfully:

SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Warning: BCP import with a format file will convert empty strings in delimited columns to NULL.

Convert the CSVs

I’m using the resulting EmployeeDepartment.csv file as an example of what the files currently look like in terms of encoding and line endings.

Note the Windows-style CRLF (carriage return + line feed) line endings, as well as the UTF-16 LE BOM encoding.

To make the CSV files look and feel like they’ve been generated on a Linux or Unix OS, I do the following in Notepad++:

  1. Change the encoding to UTF-8: go to Encoding (in the menu bar) and click on UTF-8
  2. Convert the line endings from CRLF to LF: go to Edit (in the menu bar), then EOL Conversion, and click on Unix (LF)
  3. Save the changes

At this point all the files should have LF line endings and use the UTF-8 encoding.
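If you’d rather script the conversion than click through Notepad++, the same two changes can be made in PowerShell (a sketch; the path is a placeholder):

```powershell
# Re-encode each CSV as UTF-8 (no BOM) and convert CRLF line endings to LF.
foreach ($File in Get-ChildItem "C:\CSVs\*.csv") {
    $Text = (Get-Content -Raw -Path $File.FullName) -replace "`r`n", "`n"
    # UTF8Encoding($false) writes UTF-8 without a byte order mark
    [System.IO.File]::WriteAllText($File.FullName, $Text,
        [System.Text.UTF8Encoding]::new($false))
}
```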

First attempt to import the files

I’m already addressing the UTF-8 encoding with the following two command line options:

  • -c – this tells bcp to perform the operation using a character data type. The data is stored as char and the row separator is \r\n (CRLF).
    It’s recommended to use -c instead of -w when dealing with data that was exported or should be imported into other applications or systems.
  • -C 65001 – this tells bcp to use code page 65001, which is the Windows code page identifier for UTF-8.
    This is because my CSVs contain names such as Sánchez and other strings with Unicode characters that would get mangled when treated as char data types (due to -c).

So, at this point the PowerShell code to import the CSVs looks like this:
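Roughly along these lines (a sketch; the instance name, schema, and paths are assumptions):

```powershell
# Import each CSV in character mode (-c) with the UTF-8 code page (-C 65001).
# -S localhost, the C:\CSVs path, and the HumanResources schema are placeholders.
foreach ($File in Get-ChildItem "C:\CSVs\*.csv") {
    Write-Output "-- Importing $($File.Name) --"
    bcp "HumanResources.$($File.BaseName)" in $File.FullName `
        -d AdventureWorks2019 -S localhost -T -c -C 65001 -t"|"
}
```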

But when doing the import I get the following errors:

SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file

SQLState = 22005, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification

PowerShell window showing the error messages:

-- Importing Employee.csv --
Starting copy...
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file

0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1

-- Importing EmployeeDepartment.csv --
Starting copy...
SQLState = 22005, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification

0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1

-- Importing EmployeeDepartmentHistory.csv --
Starting copy...
SQLState = 22005, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Invalid character value for cast specification

0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1

And, for CSVs where the last column is numeric (not the case in this demo, but I’ve seen it in the wild):

SQLState = 22003, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Numeric value out of range

And as a result 0 records are copied.

Small gotcha

At this point, dear reader, I’m sure you’ve already spotted something that might be causing this.

Specifically, the fact that the default row terminator is \r\n (CRLF) when using the -c option, which isn’t the row terminator used in the CSV files.

So, changing the bcp command to specify the correct row terminator -r"\n" fixes the errors, right?

Well yes, but actually no.

This is what the code would look like in that case:
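A sketch of that variant (instance, database, and paths are assumptions), with the row terminator now spelled out explicitly:

```powershell
# Same character-mode import, but with -r "\n" to match the LF line endings.
# Instance name and path are placeholders.
bcp HumanResources.Employee in C:\CSVs\Employee.csv `
    -d AdventureWorks2019 -S localhost -T -c -C 65001 -t"|" -r "\n"
```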

But it throws the same errors, because when specifying -r"\n", bcp slaps a \r in front of the \n to turn it into \r\n.

This sounds pretty silly, right?
Don’t take my word for it, Microsoft actually documented this.

The right way to get bcp to play nice with Unix line endings

The only way to ensure bcp doesn’t mess with the LF row terminator is to pass its hex equivalent to the -r option.

The hex equivalent of the linefeed (or new line) character is 0x0a.
You can check this using PowerShell:

PowerShell window:

PS C:\Users\Vlad> $LF = "`n"
PS C:\Users\Vlad> $LF | Format-Hex

   Label: String (System.String) <78B22376>

          Offset Bytes                                           Ascii
                 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
          ------ ----------------------------------------------- -----
0000000000000000 0A

Or in ByteTool’s ASCII table.

The updated import command:
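A minimal single-table sketch (instance, database, and paths are assumptions; the loop version changes in the same way):

```powershell
# Pass the row terminator as its hex value so bcp doesn't rewrite it to \r\n.
bcp HumanResources.Employee in C:\CSVs\Employee.csv `
    -d AdventureWorks2019 -S localhost -T -c -C 65001 -t"|" -r "0x0a"
```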

And bcp is able to successfully process the CSV files and import them into SQL Server.

I check the imported data using the following queries:
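For instance, a quick spot-check from the same PowerShell session (the table name is a placeholder; -E uses Windows authentication):

```powershell
# Spot-check the imported rows, including the ones with Unicode characters.
sqlcmd -S localhost -d AdventureWorks2019 -E `
    -Q "SELECT TOP 10 * FROM HumanResources.Employee;"
```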

And it looks like none of the Unicode characters have been mangled.

Does dbatools have the same issue with LF?

As long as you use the right options for this type of file, such as -Delimiter "|" and -NoHeaderRow, the data is imported without any issues.
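For reference, a call to dbatools’ Import-DbaCsv with those options might look like this (instance, database, and path are assumptions):

```powershell
# Import-DbaCsv handles the LF line endings without complaint.
# -NoHeaderRow because the bcp-generated files have no header line.
Import-DbaCsv -Path C:\CSVs\Employee.csv -SqlInstance localhost `
    -Database AdventureWorks2019 -Schema HumanResources -Table Employee `
    -Delimiter "|" -NoHeaderRow
```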

Why not just modify the CSV files?

Well, dear reader, you might run into silly situations where such CSV files are designed as test data to be loaded into multiple RDBMS platforms (such as Oracle, MySQL, SQL Server, etc.), and you can’t just modify them to your liking.

Conclusion

To successfully import files generated on Unix systems using bcp, be mindful of three options: -c, -C, and -r. Keep in mind that \n will not work as expected as a row terminator, so its hex equivalent, 0x0a, is the way to go.
