Help with very basic SQL queries

gphatch · May 8, 2005

Hi there - it's been so long since I did a SQL query that I've forgot what I need to do here...

I have three tables in an MS Access database:

1) "Name" with columns "NameID" and "Name".
2) "Category" with columns "CategoryID", "Category" and "Added?".
3) "Link" with columns "LinkID", "NameID" and "CategoryID".

The NameID and CategoryID items in the Link table are tied to those respective items in the other two tables.

So here's what I need to do with this data:

1) There are some duplicate names in the "Name" Table. This is because the same name may appear in different Categories. However, I really only want to have a single instance of any name, in the Names table. I need to run a query that will discover which names have duplicates, and then go in and replace the NameID-CategoryID link pairs in the Link table, with pairs that contain on one NameID in each instance. I would then want to delete the duplicate Name records in the Name table.

For example, The name Bob might appear three times in the Names table, with NameIDs of 1, 2 and 3. There might be three categories in the Category table that the name "Bob" is associated with, with CategoryIDs of 5, 6 and 7. So currently in the Links table [in this example], there would be Link pairs of 1-5, 2-6 and 3-7. I want to replace these pairs with 1-5, 1-6 and 1-7, and delete the records 2 and 3 in the Names table.

What's the easiest way to do this using SQL?

2) Once I've done all that - what's the easiest way to query the DB to produce:

a) A report / output that lists how many names there are in each category and
b) creates a report / output that lists all the names in alphabetical order, grouped by Category name but ONLY if the Category has at least 1 name associated with it?

THANKS in advance for any help!! I plan to use SQL in either ASP or CFML to generate this stuff, but if it's easier to do it directly in MS Access [rather than just using the DM as a datasource], I'd appreciate pointers on that too!

rtmpxr · May 8, 2005

Made a backup, before, of the database
To find how many duplicates for each name:
select name,count(name) from name group by name

To modify Link table with the lowest nameid for each name:
update link set nameid=(select min(nameid) from name, link where (name.nameid=link.nameid) and (name.name='...'))
for each name with duplicates; i'm not sure this will work on Access but make a try.

To delete duplicate names, keeping only the lowest id for each name:
delete from name where nameid > (select min(nameid) from name where name='....')
for each name with duplicates

2a:
select categoryid, count(nameid) from link group by categoryid
2b:
select distinct name, category from name inner join link on name.nameid=link.nameid inner join category on category.categoryid=link.categoryid order by name

All the problems came from the bad design of the database. I will remove the link table and I will add a nameid lookup field in the category table. In this way I will have a direct link with the name table and ... goodbye duplicates and troubles.

gphatch · May 9, 2005

Thank you so much for this assistance! I'll try doing this, this week.

On your comment regarding the database design - I can see how your suggestion would work if a name was associated with only one Category - but since it could be associated with more than one Category, how would I accommodate that with a single NameID lookup field in the Category table?

rtmpxr · May 9, 2005

It will a typical one-to-many relation.
When you insert a line into Category table you'll have:
-a CategoryID which is a autonumbered field
-you set a name for Category
-you set a Yes or No for Added
-you set a NameID from the list of values given by the lookup field which take these values from Name table
The next line you insert will have the same fields as before. And so on.
Therefore, you will be assured that all Categories will have a NameID which exists in table Name and of course you'll have same name for multiple Categories if you selected it during the insertion.

gphatch · May 9, 2005

Thanks for the explanation. I think I'm still missing one small piece of the puzzle - are you suggesting that I should have multiple records in the Category table, which each have the same Category Name, but unique CategoryID and NameID values?

E.g.

1 - History - Yes - 45
2 - History - Yes - 67
3 - History - Yes - 78

Could the same thing be achieved with a list of NameIDs [perhaps separated by commas?] in the NameID field, for each Category record - with only one record per Category? Or is the first option the better way, from a database design point of view?

E.g.

1 - History - Yes - 45, 67, 78

Gareth P. Hatch, Ph.D.
Director of Technology
Dexter Magnetic Technologies

http://www.dextermag.com

rtmpxr · May 9, 2005

I was thinking over about. The way the database is constructed suggest that is a many-to-many relation between table Name and table Category. Like the classical Customers-Products-Orders example: a Customer can have multiple Products and a Product can be bought by multiple Customers. Your Link table plays the same role as Orders table in this case. What is confusing is the duplicates in the Name table. Probably the person who entered the data has had no ideea about what is doing.
So, if your database is a many-to-many relation then you don't need to change nothing, just execute the SQL's to make the corrections.
A one-to-many relation can be translated as follows: a Name can have many Categories, but a Category can have JUST ONE name. After your last post, I beleive is a many-to-many relation. Let's do an example:
Table Name
1 Joe
2 Mary
3 Ann
Table Category
1 History Yes
2 Geography Yes
3 Math Yes
Now, Table Link should be:
1 1 ->Joe is at History
1 2 ->Joe is at Geography as well
2 1 ->Mary is at History
2 3 ->Mary is at Math as well
3 3 ->Ann is at Math
... and so on.
In this case multiple assignments are for Joe and Mary.
I think this is your case.

gphatch · May 10, 2005

Yes - you describe the situation very well.

The problem with the original data entry, is that the person was entering a whole list of names, in each Category. They did not really know if the same name appeared in a different or older Category - they just knew that it was in the Category they were working on at the time, and entered the data.

Thanks for the input!

Gareth P. Hatch, Ph.D.
Director of Technology
Dexter Magnetic Technologies

http://www.dextermag.com

rtmpxr · May 10, 2005

You can improve your database more to avoid future problems.
Assign "unique" property for "name" in Name table. Therefore you can't have more than one Joe for example. If the user try to enter Joe second time (he forgot that Joe is already entered) then the database will refuse.
Same for "Category" in Category table.
For the link table, the pair "NameID-CategoryID" should be unique. As such, you can't have more than one pair Joe-History, but you can have Joe-Geography and so on.

Best regards,
rtmpxr

gphatch · Jun 7, 2005

Me again... how would I automate the modification to the Link table with the lowest nameid for each name [which you kindly showed me in post 2 above], using the first query you gave me, which shows which names have duplicates?

I ask this becuase I just re-ran the COUNT query and I have over 1000 names that have duplicates - so doing this manually is not really feasible!

Thanks!

Gareth P. Hatch, Ph.D.
Director of Technology
Dexter Magnetic Technologies

http://www.dextermag.com

rtmpxr · Jun 9, 2005

You must create a procedure to automate the process. Maybe this can help
[link] www.databasejournal.com/features/msaccess/article.php/3407531[/url]

rtmpxr · Jun 9, 2005

Correction of the http link from above:

http://www.databasejournal.com/features/msaccess/article.php/3407531

Eng-Tips is the largest engineering community on the Internet

Intelligent Work Forums for Engineering Professionals

Help with very basic SQL queries

gphatch

Materials

Recommended for you

rtmpxr

Nuclear

gphatch

Materials

rtmpxr

Nuclear

gphatch

Materials

rtmpxr

Nuclear

gphatch

Materials

rtmpxr

Nuclear

gphatch

Materials

rtmpxr

Nuclear

rtmpxr

Nuclear

Similar threads