Requires an Inbuilt function or any other solution

db, sql server · Kākināda, India
  • 13 years ago

    Hi experts,

    I am facing a performance issue. This is the scenario.

    For example, a table has the columns EmpId, EmpName, DeptId, DeptName and Sal. Except for Sal, all the remaining columns have the same values in every row (I mean the same data).

    Assume there are 4 rows in that table.

    So I wrote a select query to get the data as a single row, like 1,xx,1,yy,50,500,5000,1000

    where 50,500,5000,1000... are the Sal column values.

    I wrote a function that returns the Sal values as a comma-separated list when the remaining column values are the same. But for a huge number of records (assume 25,000 records), the performance is very bad.

    So I need an inbuilt query / any other solution for this problem.
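
    To make the requirement concrete, here is a sketch of the data, using the table and column names that appear in the replies below:

    -- four sample rows, identical except for Sal
    insert into Duplicates (EmpId, EmpName, DeptId, DeptName, Sal) values (1, N'xx', 1, N'yy', 50)
    insert into Duplicates (EmpId, EmpName, DeptId, DeptName, Sal) values (1, N'xx', 1, N'yy', 500)
    insert into Duplicates (EmpId, EmpName, DeptId, DeptName, Sal) values (1, N'xx', 1, N'yy', 5000)
    insert into Duplicates (EmpId, EmpName, DeptId, DeptName, Sal) values (1, N'xx', 1, N'yy', 1000)

    -- desired single result row:
    -- 1, xx, 1, yy, '50,500,5000,1000'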


  • 13 years ago

    I'm sure you already have this... the only way I think you can do this is with a function. I may of course be wrong, but here's what I'd do:

    select EmpId, EmpName, DeptId, DeptName, dbo.fselSalList(EmpId, EmpName, DeptId, DeptName)
    from Duplicates
    group by EmpId, EmpName, DeptId, DeptName


    create function fselSalList
    (
        @empId int,
        @empName varchar(50),
        @deptId int,
        @deptName varchar(50)
    )
    returns varchar(200)
    as
    begin
        declare @list varchar(200)

        -- build the comma-separated Sal list for the given key;
        -- COALESCE keeps the first value from getting a leading comma
        select @list = COALESCE(@list + ',', '') + cast(Sal as varchar(10))
        from Duplicates
        where EmpId = @empId and EmpName = @empName and DeptId = @deptId and DeptName = @deptName

        return @list
    end

    I'd also put an index on the EmpId, EmpName, DeptId, DeptName columns ... clustered if possible in order to help process the records. After I'd done that, Query Analyser showed virtually all of the cost of the select statement sitting at the function call. Sorry, but I can't test using your 25,000 records, so I have no idea whether this solution would be any better or worse than your current one. I'd certainly be interested to know one way or the other, though.
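
    If you're on SQL Server 2005 or later, another option worth trying is to build the list set-based with FOR XML PATH rather than calling a scalar function per group. A minimal sketch against the same Duplicates table, untested at your volume, so treat it as a starting point:

    select EmpId, EmpName, DeptId, DeptName,
           stuff((select ',' + cast(d2.Sal as varchar(10))
                  from Duplicates d2
                  where d2.EmpId = d.EmpId and d2.EmpName = d.EmpName
                  and d2.DeptId = d.DeptId and d2.DeptName = d.DeptName
                  for xml path('')), 1, 1, '') as SalCSV  -- stuff() strips the leading comma
    from Duplicates d
    group by EmpId, EmpName, DeptId, DeptName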

    Joe 

  • 13 years ago

    Yeah, I did the same thing to solve my problem. But calling the same function repeatedly for 25,000 records was hurting the performance, so I need some other solution to improve it.

    Please help me in this regard.


  • 13 years ago

    The only other thing that I can suggest is that you create another table in your database - kind of running along the lines of data warehousing now - with the columns EmpId, EmpName, DeptId, DeptName and SalCSV (de-normalising). You then initially load this table using the function you already have, and add triggers to the original table that 'adjust' the warehouse table's SalCSV with your function whenever a CRUD operation is made, so that the new warehouse table always holds the current "combination" data.

    In this way you will have a single table that has all the information ready to be extracted - and performance will not be hampered by the number of records you are extracting, since the data is already 'formatted' accordingly.

    Joe 

  • 13 years ago

    Hi Joe,

    Thanks for your reply.

    But I am new to the concepts of data warehousing, so if you don't mind, could you please explain this in detail?


  • 13 years ago

    Ok, generally speaking, relational databases are set up to optimise CRUD-type operations (create, read, update, delete). For this they tend to be normalised - usually to 3rd normal form. Some go to 5th, but that is perhaps overkill; each DBA will have his own thoughts on this. The issue with this is that when it comes time to extract data, there can be quite a lot of interplay between the tables in order to "structure" the data in a format that is useful for reporting and/or data extraction. In other words, it makes the extract process more complicated.

    Data warehousing can be regarded as simply de-normalising the CRUD database into a form that is set up to optimise the extraction of data. In your case, instead of having to use an inline function in the normal extraction process from the CRUD database, the table already exists with the data in the format that you require in the data warehouse database.

    Of course, you now have a dependency. The data warehouse database relies upon the CRUD database in order to function correctly, so some form of processing is required between the two of them. Now, depending upon the environment, this could be a batch job run at specific intervals during the day (or night), or almost immediate (as in CRUD triggers). It depends upon factors such as what performance hit is "acceptable" or how "fresh" the extract data needs to be. What can be said is that, almost inevitably, the data warehouse information "lags" behind the CRUD database.
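
    As an illustration of the batch approach, a minimal refresh sketch (it assumes the DuplicatesCSV table and fselSalList function shown further down this thread):

    -- rebuild the warehouse table from scratch on each scheduled run
    truncate table DuplicatesCSV

    insert into DuplicatesCSV(EmpId, EmpName, DeptId, DeptName, SalCSV)
    select distinct EmpId, EmpName, DeptId, DeptName,
           dbo.fselSalList(EmpId, EmpName, DeptId, DeptName)
    from Duplicates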

    Then again, the original database can be a mix of CRUD type tables and data warehouse tables - but pragmatism will usually dictate which approach is taken.

    I hope that this at least gives you a "flavour" of the differing concepts between the two types of database described.

    Joe 

  • 13 years ago

    Hi Joe,

    Thank you so much.

    If possible, could you please give me an example of how to implement this?

  • 13 years ago

    Here's one way in which it could be done:

    USE [TestBed]
    GO
    /****** Object:  Table [dbo].[Duplicates]    Script Date: 02/06/2008 10:22:18 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[Duplicates](
        [EmpId] [int] NOT NULL,
        [EmpName] [nvarchar](50) COLLATE Latin1_General_CI_AS NOT NULL,
        [DeptId] [int] NOT NULL,
        [DeptName] [nvarchar](50) COLLATE Latin1_General_CI_AS NOT NULL,
        [Sal] [int] NOT NULL
    ) ON [PRIMARY]
    GO
    /****** Object:  Table [dbo].[DuplicatesCSV]    Script Date: 02/06/2008 10:22:30 ******/
    CREATE TABLE [dbo].[DuplicatesCSV](
        [EmpId] [int] NOT NULL,
        [EmpName] [nvarchar](50) COLLATE Latin1_General_CI_AS NOT NULL,
        [DeptId] [int] NOT NULL,
        [DeptName] [nvarchar](50) COLLATE Latin1_General_CI_AS NOT NULL,
        [SalCSV] [nvarchar](200) COLLATE Latin1_General_CI_AS NOT NULL
    ) ON [PRIMARY]
    GO
    USE [TestBed]
    GO
    /****** Object:  Index [IX_Duplicates]    Script Date: 02/06/2008 10:26:18 ******/
    CREATE CLUSTERED INDEX [IX_Duplicates] ON [dbo].[Duplicates]
    (
        [EmpId] ASC,
        [DeptId] ASC,
        [DeptName] ASC,
        [EmpName] ASC
    )WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]
    GO
    /****** Object:  Index [IX_DuplicatesCSV]    Script Date: 02/06/2008 10:26:18 ******/
    CREATE CLUSTERED INDEX [IX_DuplicatesCSV] ON [dbo].[DuplicatesCSV]
    (
        [EmpId] ASC,
        [DeptId] ASC,
        [DeptName] ASC,
        [EmpName] ASC
    )WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]
    GO
    /****** Object:  UserDefinedFunction [dbo].[fselSalList]    Script Date: 02/06/2008 10:23:59 ******/
    CREATE function [dbo].[fselSalList]
    (
        @empId int,
        @empName varchar(50),
        @deptId int,
        @deptName varchar(50)
    )
    returns varchar(200)
    as
    begin
        declare @list varchar(200)

        select @list = COALESCE(@list + ',', '') + Cast(sal As varchar(10))
        from Duplicates
        where EmpId = @empId and EmpName = @empName and DeptId = @deptId and DeptName = @deptName

        return @list
    end
    GO
    /*
    -- to initially load duplicates tables (should Duplicates table already exist)
    insert into DuplicatesCSV(EmpId, EmpName, DeptId, DeptName, SalCSV)
    select distinct EmpId, EmpName, DeptId, DeptName, dbo.fselSalList(EmpId, EmpName, DeptId, DeptName)
    from Duplicates
    */
    GO
    CREATE TRIGGER [iuDuplicates]
       ON  [dbo].[Duplicates]
       FOR INSERT, UPDATE
    AS
    BEGIN
        set nocount on

        -- remove the stale CSV rows for every affected key
        -- (set-based, so multi-row inserts/updates are handled correctly)
        delete c
        from DuplicatesCSV c
        join inserted i
          on c.EmpId = i.EmpId and c.EmpName = i.EmpName
         and c.DeptId = i.DeptId and c.DeptName = i.DeptName

        -- rebuild them from the current contents of the base table
        insert into DuplicatesCSV(EmpId, EmpName, DeptId, DeptName, SalCSV)
        select distinct EmpId, EmpName, DeptId, DeptName,
               dbo.fselSalList(EmpId, EmpName, DeptId, DeptName)
        from inserted
    END
    GO
    /****** Object:  Trigger [dDuplicates]    Script Date: 02/06/2008 10:27:36 ******/
    CREATE TRIGGER [dDuplicates]
       ON  [dbo].[Duplicates]
       FOR DELETE
    AS
    BEGIN
        set nocount on

        -- drop the CSV rows for every deleted key
        -- (set-based, so multi-row deletes are handled correctly)
        delete c
        from DuplicatesCSV c
        join deleted d
          on c.EmpId = d.EmpId and c.EmpName = d.EmpName
         and c.DeptId = d.DeptId and c.DeptName = d.DeptName

        -- rebuild the CSV rows for any key that still has rows left
        insert into DuplicatesCSV(EmpId, EmpName, DeptId, DeptName, SalCSV)
        select distinct d.EmpId, d.EmpName, d.DeptId, d.DeptName,
               dbo.fselSalList(d.EmpId, d.EmpName, d.DeptId, d.DeptName)
        from deleted d
        where exists
        (
            select 1
            from Duplicates x
            where x.EmpId = d.EmpId and x.EmpName = d.EmpName
            and x.DeptId = d.DeptId and x.DeptName = d.DeptName
        )
    END

    I'm sure that there are other ways... but this works on a small dataset. I'm not sure how 'efficient' it would be for 25,000 records, so try it out on a test 'copy' of your database first.
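
    A quick smoke test of the triggers might look like this (the sample values are hypothetical):

    insert into Duplicates(EmpId, EmpName, DeptId, DeptName, Sal) values (1, N'xx', 1, N'yy', 50)
    insert into Duplicates(EmpId, EmpName, DeptId, DeptName, Sal) values (1, N'xx', 1, N'yy', 500)

    -- expect one row: 1, xx, 1, yy, '50,500' (value order is not guaranteed)
    select * from DuplicatesCSV

    delete from Duplicates where Sal = 500

    -- expect the row to now read: 1, xx, 1, yy, '50'
    select * from DuplicatesCSV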

    Joe 

  • 13 years ago

    Hi Joe,

    Thank you so much for your replies.

    I'll try this.

  • 13 years ago

     No problem .. hope you get it sorted.

    Joe 
