Continue to Site

Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals

  • Congratulations GregLocock on being selected by the Eng-Tips community for having the most helpful posts in the forums last week. Way to Go!

Excel corruption of .csv file data

Status
Not open for further replies.

carmelobeach69

Computer
May 30, 2015
1
I've become aware of the limitations of opening .csv data files into Excel 2007 (earlier versions have the same issue). Various changes are made to the data as the file is opened, including at least the following:
* Fields that appear to be numeric and have leading or trailing zeroes have the leading/trailing zeroes removed
* Values that Excel thinks match a date such as 5-10 get assigned a data type of date. Reformatting the field's data type to text gives a crazy value such as 40308 (probably the number of days from 1/1/1970 to the date 5/10 in the current year)

Saving changes to the original .csv file causes permanent data loss as the above changes are saved back to the file (What were they thinking?!).

This issue can be worked around by using the Data -> Get External Data From Text tool, but I deal with .csv files so frequently that I want an easier way of getting my data opened without corruption. I came up with the below set of actions:

When right clicking on a .csv file in Windows Explorer, a context menu appears with various options. It would be useful to have an option “Open without corruption using Excel” or similar, which does the following:

1.	Opens Excel
2.	Creates a blank spreadsheet
3.	Imports the select .csv file similar to the following commands:
a.	Select Data -> From Text
b.	Select the .csv file to be imported
c.	Choose the “Delimited” option
d.	Use comma as the delimiter character
e.	Select all columns (hold shift and click the right-most column header)
f.	Change the “Column data format” to Text
g.	Click Finish
h.	Put the data into cell A1
4.	And possibly even alter the currently open Excel file name from “Book1” (or whatever it is) to the name of the .csv file that was opened.

I have a colleague trying to learn to program with .NET that can attempt this, but the key questions are:

1. Will it be possible to perform the above actions with .NET?
2. What is a general programming plan (perhaps which modules to use, etc?) that I can pass on to my colleague for development?
 
Replies continue below

Recommended for you

I can't directly answer your two questions, but:
[ul]
[li]Change the file extension from csv to txt[/li]
[li]Open excel[/li]
[li]Use the open dialog to open the file as a text file[/li]
[li]Select Delimited, Next[/li]
[li]Uncheck Tab, check Comma, Next[/li]
[li]Highlight all of the columns, select Text, Finish[/li]
[/ul]
 
hi,

I NEVER open a text file of any kind directly with Excel. Excel cam and will CHANGE your data under certain well documented circumstances. And if you are using Excel, you would be really pissed off if Excel did NOT convert your data under certain circumstances, so it cuts both ways.

When you enter 1/2/2015, Excel analyses the entry and converts 1/2/2015 to 42006 but also changes the cell's Number Format to a default Date format.

In fact, if you enter 1/2 in a cell, Excel ASSUMES that you intended to enter a date for the current year and you get the same result as if you entered 1/2/2015

We had some Work Centers in our machine shop that were 171E2. Excel converts this input to 17100 as it analyses the input as a number entered in scientific notation.

12:00 is converted to .5 and formatted with a TIME format.

Numeric whole values with leading zeros are dropped.

Decimal values with trailing zeros are dropped.

So when I have a text file of any kind, I use the TEXT IMPORT feature in the Data TAB, and I EXPLICITLY DEFINE THE DATA TYPE OF EACH IMPORT COLUMN.
If I need leading or trailing zeros IN EXCEL (this is not like an output you might need for a COBOL file) or ANY colunm that will not be used in arithmetic like Part Numbers or Work Centers (Identifiers), I import that column as TEXT.

Skip,

[glasses]Just traded in my OLD subtlety...
for a NUance![tongue]
 
Let me address these two questions. In fast the FIRST question might be best addressed at
The best programming plan: I do all my Excel coding in Excel VBA. Your friend will need to be familiar with the Excel Object Model as he/she will be coding in .Net. But having the VBA code will help.

Once you know the steps to IMPORT, parse and define each field's data type appropriately, then turn on your macro recorder and record the process. The generated code usually need customization. You can get some specific help at
One of the advantages of the IMPORT method, is that once the Query Table has been added to the sheet, all that need to happen to get current data from the text file, is to REFRESH the Query Table. You can also import data from other databases and other Excel worksheets or workbooks.

BTW, Tek-Tips is a sister site to Eng-Tips, in case you had not noticed, for computer professionals.

Skip,

[glasses]Just traded in my OLD subtlety...
for a NUance![tongue]
 
A few comments:

I tried opening a csv file with a date column in Excel 2013 and saving it from Excel, and the date format was unchanged. I don't know if this is something new in 2013, or whether there is some setting that can be changed.

I have a text import spreadsheet that you might find useful. If you pre-format any column with dates as text it should keep the dates in the same format as imported. Also I have added a Date2String function, which should convert any Excel date number to a date in string format.
You can download from: I will add some examples of using Date2String in the next few days.

I am sure you can set up a program in .NET to do what you want, but it would be much easier in VBA.

Finally, why not let it convert dates to Excel date numbers? Then you can do many things more easily, like sort by date, calculate number of days between two dates, etc. (The date number is the number of days since 1 Jan 1900 by the way).

Doug Jenkins
Interactive Design Services
 
Sorry, ignore the first and last comments. In Excel 2013 it does convert a text string that looks like a date (such as 10-5) into a date, and leading zeros are also truncated.

If the import range columns in my text-in spreadsheet are set to Text format it will keep 5-10 as a text string, but it does truncate leading zeros. I will see if I can find a way to stop that.

Doug Jenkins
Interactive Design Services
 
Folks,

This is NOT corruption. This is Excel working as designed. Trouble is, most people don't know what Excel is designed to do.

Leading or trailing zeros removed.
Try entering 001.100. What happens? You get 1.1. The leading and trailing zeros are meaningless to the number. NOT A PROBLEM!

Values such as 5-10.
As I explained earlier, 5/10 or 5-10 Excel assumes a date (depending on your settings) 5/10/2015 (current year) and parses and converts that string to a DateSerial NUMBER while changing the cells Number Format to a default Date format. NOT A PROBLEM if you intend a Date. However, if you intend a STRING, you must change the Number Format to Text BEFORE entering such data, as simply changing the Number Format changes NOTHING IN THE UNDERLYING DATA.

If you understand how Excel data entry works and the features available to assure data integrity, like Data > Get external data > ..... the Parsing alternatives and the Column Data Type alternatives, you will not have these kind of problems that appear to be corruption, but are really Excel acting normally and the user not acting in their interest.





Skip,

[glasses]Just traded in my OLD subtlety...
for a NUance![tongue]
 
Skip - if you have a csv file with text strings that may look like a date or start with a zero, and those text strings are converted into date numbers or numbers with the leading zero deleted, then clearly that is a problem, under many circumstances.

There should be a built in way to open and save csv files without any automatic edits happening, without going through the text import process, but (as far as I know) there isn't.

Doug Jenkins
Interactive Design Services
 
Doug,

You think that's a problem, clearly. Hmmmmmmm?

I really don't believe it's that clear.

Let's say we have a text file that has columns of text strings that appear to be dates and other columns of text strings of digits with leading zeros.

YOU want them imported into your worksheet undisturbed with absolutely no conversion.

On the other hand, I want the date strings converted to REAL DATES, cuz I need to do math and sort, and I don't care about leading/trailing zeros, cuz they muck up my table with needless stuff and I know that 01.10 is merely 1.1.

So, the designers of Microsoft Excel, in their finite wisdom, made it so that we BOTH can accomplish our purposes. They designed Excel to be extremely flexible. And ya know what? An extremely flexible tool, makes it possible to go great feats of spreadsheet stuff, as well as great defeats.

So you can dig in your heels and fight the features. Or learn and leverage the features to your advantage. I chose the later long ago, and I am learning and leveraging yet.

Skip,

[glasses]Just traded in my OLD subtlety...
for a NUance![tongue]
 
BTW,

I've been on a minor crusade for a number of years. Here are some FAQs I've posted regarding these issues:

Understanding Dates and Times & why they seem to be so much trouble? Posted: 20 Apr 05 (Edited 25 Oct 11)

When is a NUMBER not a NUMBER? Posted: 4 Jun 07
BTW, this answer to this one is NOT, "when it's ajar!"

Be Aware: Excel can Change Your Data Posted: 16 Aug 10





Skip,

[glasses]Just traded in my OLD subtlety...
for a NUance![tongue]
 
Skip - I think your crusading has made you mis-read what people are saying.

No-one is saying that Excel should not have the facility to turn every string that looks like a date or a number into a date or a number. It's just that if you open a csv file that should not be the default, and if it is the default, there should be an easy way to switch it off.

It's just yet another case of Excel being too clever for its own good.

Doug Jenkins
Interactive Design Services
 
Well, Doug, you can switch it off, as described: column by column.

What you propose is to have a "switch" to import EVERYTHING as TEXT. That's really a bad option! It takes more effort, needless effort, to then figure out how to convert each text column that should be numeric, as Excel is primarily designed to work with numeric data. Many more steps than using the Text Import feature as designed!

BTW, with a few keystrokes, you can select ALL COLUMNS and set them ALL to Text om import if you so choose.

And remember, if all your data is Text, changing the Number Format changes NOTHING in the underlying data. It remains Text!

Skip,

[glasses]Just traded in my OLD subtlety...
for a NUance![tongue]
 
Skip - we aren't talking about text import, we are talking about opening a csv file and saving a csv file. If you open a file and then save it without changing anything the saved file shouldn't be changed, but if you do it in Excel it is.

But anyway, I'm going to add options to my text import spreadsheet to import everything as text, or import specific columns as text.

Doug Jenkins
Interactive Design Services
 
So you want Excel to open text files as ONLY TEXT?

How do you use data like that when it comes to what should be numeric data?

Skip,

[glasses]Just traded in my OLD subtlety...
for a NUance![tongue]
 
So you want Excel to open text files as ONLY TEXT?

How do you use data like that when it comes to what should be numeric data?

Either activate the option to import numbers as numbers, use import rather than open, or use one of the simple ways to convert numeric text strings into numbers.

Doug Jenkins
Interactive Design Services
 
I thought we were talking about opening text files in Excel?

But wishin don't make it so.

The fact of the matter is that the way Excel works when you ENTER data manually or when you OPEN a text file with Excel, is the same.

So if you don't understand how Excel works, you will get bit!

Skip,

[glasses]Just traded in my OLD subtlety...
for a NUance![tongue]
 
If you re-read the OP you will see he is well aware of how Excel works, knows how the problem can be worked around, and is looking for a more convenient solution.

Doug Jenkins
Interactive Design Services
 
And my rejoinder To the OP is, what good is numeric data that is imported as TEXT?

You haven't solved anything! You've just replaced one perceived problem with yet another actual problem.

In order to use data, be it from a text file or a database, you have to understand the data and a preconceived procedure that assumes Text, just doesn't cut it. Sorry, Jack!

Skip,

[glasses]Just traded in my OLD subtlety...
for a NUance![tongue]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor