Continue to Site

Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals

  • Congratulations KootK on being selected by the Eng-Tips community for having the most helpful posts in the forums last week. Way to Go!

20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report 2

Status
Not open for further replies.
Replies continue below

Recommended for you

Savvy Excel users know all about this.

... and even savvy biologists knew all about it more than 10 years ago:

That link is from the recent paper, which can be read in full at:

The graph of the frequency of these errors is quite interesting:
13059_2016_1044_Fig1_HTML.gif


As Skip says, we have discussed this "feature" here before, but to blame it all on Excel is ridiculous. Aren't these papers supposed to be peer reviewed? Do they blame unit conversion errors, or misplaced bracket errors on Excel as well?

Anyway, the solution (which apparently was pretty well known in 2005, and has since been forgotten) can be found in the 2004 paper:

There are a number of work-arounds to these behaviors in Excel, but all of them require continued attention on the part of the user to avoid introducing errors. The appropriate solutions depend on the context in which Excel is opened:

If Excel is configured to open a test file automatically from another application, then the data must be pre-processed in the upstream application. For example, a space character or an apostrophe can be placed in front of the gene name. That is the solution implemented in the Excel output format option of MatchMiner [1] and the primary approach recommended by Microsoft in their Knowledge Base Article on the issue [6].

If a text file is to be opened by Excel, open Excel first and then select the text file to read. Then select "text" mode for the column(s) containing potentially affected symbols in the Text Import Wizard Step 3 of 3.

If text is to be copied from another application (such as a text processor) and pasted into a pre-opened Excel spreadsheet, the formatting must be set in the spreadsheet. Within the pre-opened spreadsheet, prior to pasting, use Format -> cells to specify which columns of the recipient spreadsheet are to be treated as text [6]. That procedure works for copying from several text processors tested on Mac OS 10.2. However, changing the format of the spreadsheet column to text fails to solve the conversion problem when pasting data from a Microsoft Word file. In that case, in addition to the formatting, use the Paste Special -> Paste: As: Text command to insert the text.

But I don't agree with their next statement:
Despite the work-arounds, even the most vigilant investigator can inadvertently introduce conversion errors, and it is often necessary to screen data received from other sources.

How can a research scientist who lets errors as obvious as these to sslip through into published work be described as "vigilant"?



Doug Jenkins
Interactive Design Services
 
I agree, not very vigilant.
If we are using a more traditional programming language, it usually forces us to tell it what type of data going into each variable.
An Excel worksheet does not force us to tell it the type of data going into each cell, so we should expect some problems if we don't tell it what type of data (by preformatting).
Thanks skip for the FAQ's that made it pretty simple to grasp.


=====================================
(2B)+(2B)' ?
 
Every technical user ought to 1) be aware of and 2) be thinking of the fact that data conversion is happening every time they enter data into a spreadsheet. So it is incumbent upon each of us to verify our process and results, especially when such a tool is used in "mission critical" tasks. Every column of a spreadsheet table ought to have a proper Number Format applied BEFORE the table is used/filled. Structured Tables make the maintenance of such formatting as a table grows in size, seamless.

One of the other areas that often causes problems is numeric IDENTIFIERS. data like Invoice Numbers, Item Numbers, Part Numbers, Postal Numbers, Member IDs (and the list goes on): numeric data on which you will NEVER perform a mathematical operation, are often stored as numbers. WHY? It is asking for trouble that an IDENTIFIER would be entered and stored in a format that IS CONVERTED INTO SOMETHING EVERY TIME, and therefore is susceptible to being converted into something unwanted and is not in itself a meaningful number. Every identifier ought to be entered/stored as TEXT, as described in the FAQ. You might need a work-around when joining to other databases but it's worth the effort.

Skip,
[sub]
[glasses]Just traded in my OLD subtlety...
for a NUance![tongue][/sub]
 
In addition to fractions converting to dates, I get plagued by UK-US date conversion. 1/8 becomes 01-August (UK format) but 1/16 becomes 16-January (US format). Somehow, even though everything is UK, I still end up getting US dates. It wouldn't be so bad if I got 02 Jan, 04 Jan, 08 Jan - then I know it is all US format. If I got 01 Feb, 01 Apr, 01 Aug then I know it is UK format but when half converts to UK and the other half to US, it is just painful to sort out. 2016 is a particularly bad year - sometimes, 3/16 converts to 01 mar 2016 and sometimes 1/2 converts to 1 Feb 1900! I now use a macro to check the "fractions"
 
That would be extremely frustrating. WOW!

So are you saying that if you entered those examples on your sheet right now, that there would be absolutely no consistency in the date conversion, entering each of the fractions you mentioned, or does this happen under different circumstances?

[pre][/pre]I'm very interested in this, since I wrote the FAQs that I posted above and would like to understand the possible pitfalls related to data conversion.

Skip,
[sub]
[glasses]Just traded in my OLD subtlety...
for a NUance![tongue][/sub]
 
It is when I cut from IE or Acrobat and paste it in Excel. It is particularly bad on the 1/16. Next year it should be OK again. The next problem year will be 2032.
 
Well I can't duplicate this anomaly and it may be that I'm using USA format MDY and Bill designed Excel in the state of Washington.

You do realize that if you want to enter fractions you must format the cell/range as TEXT before entering any data. Changing the Number Format after the fact does nothing, because a Number Format is merely a display feature. Having entered a fraction as text, it renders the value virtually useless as a numerical value.

Skip,
[sub]
[glasses]Just traded in my OLD subtlety...
for a NUance![tongue][/sub]
 
Well why haven't you used the TEXT Column Data Format on Step 3 of 3 of the Text to Columns wizard?

You're shooting yourself in the foot!

Skip,
[sub]
[glasses]Just traded in my OLD subtlety...
for a NUance![tongue][/sub]
 
Happens regardless of whether I choose general or text.
 
Well I'm at a loss to explain your experience. I performed

1) copying the table in question,
2) pasting into a cell, which, BTW, resulted in one long string,
3) started the Text to columns wizard,
4) chose Delimited in Step 1,
5) Checked Space in Step 2,
6) in Step 3 FIRST made sure that ALL the columns were selected (darkened) in the data preview and then selected the TEXT Column data format, and finally
7) FINISH.

I suspect that you failed to SELECT the columns to import as TEXT in Step 3 of the Text to columns Wizard.

BTW, in Step 3, selecting all the columns (63) one at a time would be tedious: select the first, scroll to right, SHIFT + SELECT last column to select all.

Skip,
[sub]
[glasses]Just traded in my OLD subtlety...
for a NUance![tongue][/sub]
 
try this...
Code:
Sub ParseColumns()[b]
'SkipVought 9/16/2016 Eng-Tips
'copy table page 6 in [URL unfurl="true"]http://russell.htpgusa.com/images/documentation/products/condensers/RU-RDX-0816A.pdf[/URL]
'paste into A1[/b][b][b][/b][/b]

    Range("A1").TextToColumns _
        Destination:=Range("A1"), _
        DataType:=xlDelimited, _
        TextQualifier:=xlDoubleQuote, _
        ConsecutiveDelimiter:=True, _
        Tab:=False, _
        Semicolon:=False, _
        Comma:=False, _
        Space:=True, _
        Other:=False, _
        FieldInfo:=Array(Array(1, 2), Array(2, 2), Array(3, 2), Array(4, 2), Array(5, 2), Array(6, 2), _
        Array(7, 2), Array(8, 2), Array(9, 2), Array(10, 2), Array(11, 2), Array(12, 2), Array(13, 2), _
        Array(14, 2), Array(15, 2), Array(16, 2), Array(17, 2), Array(18, 2), Array(19, 2), Array(20, 2), _
        Array(21, 2), Array(22, 2), Array(23, 2), Array(24, 2), Array(25, 2), Array(26, 2), Array(27, 2), _
        Array(28, 2), Array(29, 2), Array(30, 2), Array(31, 2), Array(32, 2), Array(33, 2), Array(34, 2), _
        Array(35, 2), Array(36, 2), Array(37, 2), Array(38, 2), Array(39, 2), Array(40, 2), Array(41, 2), _
        Array(42, 2), Array(43, 2), Array(44, 2), Array(45, 2), Array(46, 2), Array(47, 2), Array(48, 2), _
        Array(49, 2), Array(50, 2), Array(51, 2), Array(52, 2), Array(53, 2), Array(54, 2), Array(55, 2), _
        Array(56, 2), Array(57, 2), Array(58, 2), Array(59, 2), Array(60, 2), Array(61, 2), Array(62, 2), _
        Array(63, 2)), _
        TrailingMinusNumbers:=True
End Sub

Skip,
[sub]
[glasses]Just traded in my OLD subtlety...
for a NUance![tongue][/sub]
 
That's exactly what I did. The way I got around it was to preset the columns to fractional format before converting.
 
Help me out. Are you stating that if you were to 1) paste in that PDF table to A1 and then 2) run my procedure, that the date conversion still occurs?

Also, you stated, "before converting." Could you explain what it is you were converting?

Skip,
[sub]
[glasses]Just traded in my OLD subtlety...
for a NUance![tongue][/sub]
 
Yes - if I paste the PDF table to A1 and then go from step 3 to 7, the fractions get converted to dates.

Sometimes they get converted to US dates, sometimes to UK dates. I didn't use the sub as there are lots of tables from lots of manufacturers and every one is different. Rewriting the sub for every table would be a nightmare. Sometimes, I have to input the table manually because the PDF just contains a bitmap [ponder]

If I note which columns should be fractions and preset the format of those column to fractions before running steps 3 to 7, everything comes out correctly.

DrawOh - apologies for hijacking your thread
 
Yes - if I paste the PDF table to A1 and then go from step 3 to 7, the fractions get converted to dates.

You and I are talking about two completely different things it appears. The Text to column wizard, Excel 2013, has THREE steps, not SEVEN. What feature are you using?

This lead me to believe that we are, maybe not worlds apart but at least not referring to the same process. The reason for me posting the macro was to determine if my process was the same as your process. Apparently NOT. I want to understand whats going on, because this is one of the features in Excel that is often misunderstood and misused, as the subject of this thread testifies.

Can, will you continue to pursue this with me? TIA


Skip,
[sub]
[glasses]Just traded in my OLD subtlety...
for a NUance![tongue][/sub]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor