Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals

  • Congratulations waross on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Help with very basic SQL queries

Status
Not open for further replies.

gphatch

Materials
Jan 31, 2003
45
0
0
US
Hi there - it's been so long since I did a SQL query that I've forgot what I need to do here...

I have three tables in an MS Access database:

1) "Name" with columns "NameID" and "Name".
2) "Category" with columns "CategoryID", "Category" and "Added?".
3) "Link" with columns "LinkID", "NameID" and "CategoryID".

The NameID and CategoryID items in the Link table are tied to those respective items in the other two tables.

So here's what I need to do with this data:

1) There are some duplicate names in the "Name" Table. This is because the same name may appear in different Categories. However, I really only want to have a single instance of any name, in the Names table. I need to run a query that will discover which names have duplicates, and then go in and replace the NameID-CategoryID link pairs in the Link table, with pairs that contain on one NameID in each instance. I would then want to delete the duplicate Name records in the Name table.

For example, The name Bob might appear three times in the Names table, with NameIDs of 1, 2 and 3. There might be three categories in the Category table that the name "Bob" is associated with, with CategoryIDs of 5, 6 and 7. So currently in the Links table [in this example], there would be Link pairs of 1-5, 2-6 and 3-7. I want to replace these pairs with 1-5, 1-6 and 1-7, and delete the records 2 and 3 in the Names table.

What's the easiest way to do this using SQL?

2) Once I've done all that - what's the easiest way to query the DB to produce:

a) A report / output that lists how many names there are in each category and
b) creates a report / output that lists all the names in alphabetical order, grouped by Category name but ONLY if the Category has at least 1 name associated with it?

THANKS in advance for any help!! I plan to use SQL in either ASP or CFML to generate this stuff, but if it's easier to do it directly in MS Access [rather than just using the DM as a datasource], I'd appreciate pointers on that too!
 
Replies continue below

Recommended for you

Made a backup, before, of the database
To find how many duplicates for each name:
select name,count(name) from name group by name

To modify Link table with the lowest nameid for each name:
update link set nameid=(select min(nameid) from name, link where (name.nameid=link.nameid) and (name.name='...'))
for each name with duplicates; i'm not sure this will work on Access but make a try.

To delete duplicate names, keeping only the lowest id for each name:
delete from name where nameid > (select min(nameid) from name where name='....')
for each name with duplicates

2a:
select categoryid, count(nameid) from link group by categoryid
2b:
select distinct name, category from name inner join link on name.nameid=link.nameid inner join category on category.categoryid=link.categoryid order by name

All the problems came from the bad design of the database. I will remove the link table and I will add a nameid lookup field in the category table. In this way I will have a direct link with the name table and ... goodbye duplicates and troubles.
 
Thank you so much for this assistance! I'll try doing this, this week.

On your comment regarding the database design - I can see how your suggestion would work if a name was associated with only one Category - but since it could be associated with more than one Category, how would I accommodate that with a single NameID lookup field in the Category table?
 
It will a typical one-to-many relation.
When you insert a line into Category table you'll have:
-a CategoryID which is a autonumbered field
-you set a name for Category
-you set a Yes or No for Added
-you set a NameID from the list of values given by the lookup field which take these values from Name table
The next line you insert will have the same fields as before. And so on.
Therefore, you will be assured that all Categories will have a NameID which exists in table Name and of course you'll have same name for multiple Categories if you selected it during the insertion.
 
Thanks for the explanation. I think I'm still missing one small piece of the puzzle - are you suggesting that I should have multiple records in the Category table, which each have the same Category Name, but unique CategoryID and NameID values?

E.g.

1 - History - Yes - 45
2 - History - Yes - 67
3 - History - Yes - 78

Could the same thing be achieved with a list of NameIDs [perhaps separated by commas?] in the NameID field, for each Category record - with only one record per Category? Or is the first option the better way, from a database design point of view?

E.g.

1 - History - Yes - 45, 67, 78

Gareth P. Hatch, Ph.D.
Director of Technology
Dexter Magnetic Technologies
 
I was thinking over about. The way the database is constructed suggest that is a many-to-many relation between table Name and table Category. Like the classical Customers-Products-Orders example: a Customer can have multiple Products and a Product can be bought by multiple Customers. Your Link table plays the same role as Orders table in this case. What is confusing is the duplicates in the Name table. Probably the person who entered the data has had no ideea about what is doing.
So, if your database is a many-to-many relation then you don't need to change nothing, just execute the SQL's to make the corrections.
A one-to-many relation can be translated as follows: a Name can have many Categories, but a Category can have JUST ONE name. After your last post, I beleive is a many-to-many relation. Let's do an example:
Table Name
1 Joe
2 Mary
3 Ann
Table Category
1 History Yes
2 Geography Yes
3 Math Yes
Now, Table Link should be:
1 1 ->Joe is at History
1 2 ->Joe is at Geography as well
2 1 ->Mary is at History
2 3 ->Mary is at Math as well
3 3 ->Ann is at Math
... and so on.
In this case multiple assignments are for Joe and Mary.
I think this is your case.
 
Yes - you describe the situation very well.

The problem with the original data entry, is that the person was entering a whole list of names, in each Category. They did not really know if the same name appeared in a different or older Category - they just knew that it was in the Category they were working on at the time, and entered the data.

Thanks for the input!

Gareth P. Hatch, Ph.D.
Director of Technology
Dexter Magnetic Technologies
 
You can improve your database more to avoid future problems.
Assign "unique" property for "name" in Name table. Therefore you can't have more than one Joe for example. If the user try to enter Joe second time (he forgot that Joe is already entered) then the database will refuse.
Same for "Category" in Category table.
For the link table, the pair "NameID-CategoryID" should be unique. As such, you can't have more than one pair Joe-History, but you can have Joe-Geography and so on.

Best regards,
rtmpxr
 
Me again... how would I automate the modification to the Link table with the lowest nameid for each name [which you kindly showed me in post 2 above], using the first query you gave me, which shows which names have duplicates?

I ask this becuase I just re-ran the COUNT query and I have over 1000 names that have duplicates - so doing this manually is not really feasible!

Thanks!

Gareth P. Hatch, Ph.D.
Director of Technology
Dexter Magnetic Technologies
 
You must create a procedure to automate the process. Maybe this can help
[link] www.databasejournal.com/features/msaccess/article.php/3407531[/url]
 
Status
Not open for further replies.
Back
Top