Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals

  • Congratulations waross on being selected by the Eng-Tips community for having the most helpful posts in the forums last week. Way to Go!

Data analysis question 4

Status
Not open for further replies.

ACtrafficengr

Civil/Environmental
Jan 5, 2002
1,641
US
I'm trying to analyze some crash data for our county highways.

The goal is to find specific things causing crashes that can be addressed by changing the road design. For example, if "traffic control device disregarded" is over-represented at a specific location, maybe the stop sign is in the wrong place.

I have some data like this:

Case # Vehicle Contributing factor
11 1 Alcohol
11 1 Glare
25 1 Unsafe speed
25 1 not applicable
25 2 Traffic control device disregarded
25 2 Tire failure/inadequate

Each case number represents an individual crash. Each case has two contributing factors per vehicle involved, which means in practical terms, each case number has between 2 and 10 (usually 2-4) contributing factors.

For each of the 45 different factors, I need the percentage of case numbers that have that contributing factor. For example, 15% of cases involve speeding, 18% involve slippery pavement, etc. Also, I don't want to double-count cases where the same factor was attributed to both vehicles.

Any suggestions?

If your suggestion is, "You need to learn Access," my reply is, "Yes, that's true, but I don't have time."

Thanks!

"...students of traffic are beginning to realize the false economy of mechanically controlled traffic, and hand work by trained officers will again prevail." - Wm. Phelps Eno, ca. 1928

"I'm searching for the questions, so my answers will make sense." - Stephen Brust

 
Replies continue below

Recommended for you

Well, the simplest solution would be to assign a numeric label to each factor, and then you'd have a column for each factor that has something like =if(or(cause1 = Factor1, cause2 = Factor1),1,0).

Sum everything up at the bottom and divide by the total.

TTFN

FAQ731-376
 
If DMV gave me a single row for each crash, and a separate column for each contributing factor, that would work.

Unfortunately, I have multiple rows for each crash, and one column for contributing factors.



"...students of traffic are beginning to realize the false economy of mechanically controlled traffic, and hand work by trained officers will again prevail." - Wm. Phelps Eno, ca. 1928

"I'm searching for the questions, so my answers will make sense." - Stephen Brust

 
That would seem to depend on what you want to do with the other rows. If you want all the factors, then what I posted would basically work. If you only want the first factor, you'd compare against the previous row the accident number, and only put something in the factor column if the accident numbers are different, i.e., for row 2 factor column, something like if(AN1<>AN2,factor, "")

TTFN

FAQ731-376
 
It still seems to me that would give me the percentage of all factors for each factor, not the percentage of crashes where the factor in question played a role.

"...students of traffic are beginning to realize the false economy of mechanically controlled traffic, and hand work by trained officers will again prevail." - Wm. Phelps Eno, ca. 1928

"I'm searching for the questions, so my answers will make sense." - Stephen Brust

 
I hope this isn't too late..

Assuming that you want to do this all within the worksheet, and not use VBA or anything like that, perhaps, on a second sheet in the workbook, you can create it so that it lists the case number followed by whether or not the contributing factor applies to that case or not, and from there do something like what IRStuff suggests.

A way to do that is by using Countifs statements. For example:

IF(COUNTIFS(Cont. Factor List,Cont. Factor1,Case list,Case #)>0,"yes","no")

this basically says that count the number of times that: the list of the contributing factor has factor 1 AND the corresponding case list equals the case number. Then, if this number is >0 (basically it will equal 1 if it applies to 1 vehicle or 2 if it applies to 2 vehicles) then say "yes" in the cell.

I've wrote up a quick excel file and attached it to demonstrate this.

I hope this helps!
 
 http://files.engineering.com/getfile.aspx?folder=ee9c771e-601c-4a4a-9d7b-a6d4821cf125&file=Example.xlsx
Alas, 'twould work if only I had Excel 2007, which I don't. Still, it's worth a star!

"...students of traffic are beginning to realize the false economy of mechanically controlled traffic, and hand work by trained officers will again prevail." - Wm. Phelps Eno, ca. 1928
 
You should be able to download the Excel 2007 converter directly from Microsoft that would allow you to read 2007 files from older versions of Excel.

TTFN

FAQ731-376
 
Try this instead. I did a quick google search and found a solution here:
Basically intead of using COUNTIFS it makes use of multiplying two booleans together. So, if both columns aren't what you want, the multiplication will be zero (and it won't count it).

The only trick is that in sheet 2, if you change something inside a cell, you must hit Ctrl+Alt+Enter in it before dragging it across or down, since it is technically an array.

Good Luck!
 
IRStuff, the problem is that that particular function (Countifs) does not even exist in older versions of excel. So even if you have the compatibility pack, or if you save "down" the original file, that function won't work if you're not using excel 2007. There are a couple of other functions and instances where some thing that works in 2007 will not work in older versions no matter what.
 
The right tool really is Access, but if you would rather spend your time connecting wires together with a hammer, knock yourself out.

In Access it would be trivial. In Excel, it is going to require a bunch of really esoteric stuff. If I had to do this in Excel I would use VBA so I could loop within each case number.

David
 
Thanks Spatel... info provided is good for other stuff...

Dik
 
I'm thinking something along the lines of starting by sorting the data by Factor and then by Case. Now you can do a check if the line above has the same case number and factor, then don't count it. Might be a little easier than looking for duplicates when you can't specify one specific row to check.

-- MechEng2005
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top