duplication in joined tables in access

  • 15 years ago

    I have a relationship between one table and another table which has given me a hard time. I found that if there were duplicate items in the primary key column in either table, those elements would be duplicated in the merged query. I tried to say the relationship between the two tables was a one - to many relationship to resolve this issue but it was still wrong. The way I solved it is to use excel instead of access. I did everything else the same except when it came time to join the two tables via the primary key that had the same data in one table that it had in the other table, I used the vlookup formula in excel, where both tables were located, to match up the transaction specific data to the main table instead of joining the tables in Access. In all cases where the join would have duplicated some transactions in the table, it no longer duplicated any transactions. I can now utilize that policy for future situations that I may encounter.

  • 15 years ago

    Hi Rick,
    Can you provide the structure for the tables concerned and some example data and the SQL used for your query as your description is difficult to follow.

    Quote:
    I found that if there were duplicate items in the primary key column in either table
    doesn't make sense, by definition you can't have duplicate data in a primary key column.


    Trevor

  • 15 years ago

    one table is a ledger of credit card transactions from the perspective of my companies bank, which is copied and pasted into access from Excel. It has 300,000 transactions in it. The other table is a ledger of the same transactions from the perspective of my companies in house online order processing application, which only has about 15,000 transactions in it because I just started merging the two tables this past June. I created a primary key column in the bank table made up of the amount, the first 4 digits and the last 4 digits of the credit card account number, whether the transaction was positive or negative, and the date of the transaction. The foreign key column in the in house table contained all the same things. What happens is this, a customer will purchase something, change their mind return it and we credit it back out, then the next hour they come back and purchase the item again. This generates a duplicate primary key in the bank table consisting of the first sale and the second sale (the credit only occured once that day so it is not involved in any duplicated key problems). We generate an invoice for the first sale, and a credit memo for the first return, and a second invoice for the second sale. This generates a duplicate foreign key in the in house table consisting of the first invoice and the second invoice. When I set up the relationships between the two tables, I tried to join the primary key in the bank table with a one to one join to the in house table. each duplicate in the bank table would be joined to each duplicate in the in house table. I wound up with a big mess, overstating the total sales for the period by the duplicate primary keys in each of the two tables. I needed to find a fast way to not overstate the total sales while still allowing the duplicate primary keys since both sales were true sales, only they each could be tied to different transactions numbers within the batch. There are many ways to come up with unique keys using additional fields in the bank table, and many ways to come up with unique keys using additional fields in the in house table also. The problem is, there is no additional field in either of the two tables that is common to both tables. I had already created the maximum number of items to put into the primary key and the foreign key, and it still had the duplicate transactions. By using Excel to join the two tables, the use of a join in the relationships dialogue is not necessary and the duplicate overstatement of sales does not occur. The Excel spreadsheet starts out being the bank table. Then I paste the in house table into a second sheet. I put the vlookup formula in the bank table sheet. Every time there is a match (including duplicate data that is matched) the additional details about the transaction can be appended to the bank table. The duplicate data in the bank table will grab only the last of the duplicate data in the in house table. But the duplicate data in the bank table won't get doubled because of the operation. For example, say I have 2 sales in the bank table for 25.00 on June 4 made up of transaction 1 and transaction 5 of batch 34 and one is invoice number 24  and the other is invoice number 29 in the in house table. I would in an ideal world want invoice number 24 to be matched to transaction 1 and invoice number 29 to be matched to transaction number 5. The method using the Join feature in Access can't do that without manual intervention in the form of extracting duplicate data out of the two tables and manually changing some of that data in some way so the subsequent join will not double up on the totals. However the one using vlookup in Excel is not able to do that either. Excel will match the second of the two duplicate 25.00 invoices to each transaction, calling transaction 1 a match to invoice 29 and also calling transaction 5 a match to invoice 29. The people who use the report will cry and moan and complain that my report is flawed, but I don't know what else to do. The bank report is not corrupted, just the match to in house table, and only in those situations where there were duplicate invoices for the same credit card on the same day. I have not received any complaints from any of my fellow workers that my report is flawed yet. I expect it, however and they will be right, the report is flawed and a manual matching up needs to be done by the fellow workers to get each invoice number to tie to each transaction number. I thought about adding a step where I would give the users a new report that just discussed the flaws contained in the big report so if someone wanted to complain I could say, "check to see if you are complaining about a known flaw in the big report."

  • 15 years ago


    Hi,


    I have just read your long missive and can see your problem.


    You say that the key that you generate includes the date, can it also include transaction time, because if so problem would go away for multiple transactions on a particular day ?



  • 15 years ago

    sorry, but that is not an available option because one of the tables does not have the time of day. I need the time field to be a common field to both tables and it is not.

  • 15 years ago

    I see your problem.

    If you added an autonumber field to each table so that you knew the records were in correct order, it would help.

    The only records for which you have problem are the ones with duplicate keys - how many do you have each day on average ?

    I would then write some vb to marry the records and ensure each pair had a unique key.

    For each day do you now have equal numbers of records in each table ?


  • 15 years ago
    how many do you have each day on average
    1 or 0, average is only 5 duplicates per month

    For each day do you now have equal numbers of records in each table ?
    The in house table averages 6000 transactions per month

    there is a second in house tabel with another 4000 transactions per month that I have not attempted to reconcile yet, coming in from a different direction.

    The bank table has an average of 10,000 transactions per month

    I agree that the duplicate items need to be married and paired up but I have not ever written any vb that would handle that.
    I have been thinking about the term "referential integrity"

Post a reply

Enter your message below

Sign in or Join us (it's free).

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“The trouble with programmers is that you can never tell what a programmer is doing until it's too late.” - Seymour Cray