Friday, March 30, 2012

Moving Average using Select Statement or Cursor based?

ID  DATE(dd/mm/yy)  TYPE      QTYIN  COST_IN_AMT  COST_OUT_AMT(MOVING AVERAGE)
1   01/01/2007      PURCHASE     10         1000
2   01/01/2007      PURCHAES      5         1100
3   01/01/2007      SALES        -5                *TobeCalculated
4   02/01/2007      Purchase     20         9000
5   02/01/2007      SALES       -10                *TobeCalculated
5   02/01/2007      purchase     50         8000
6   03/01/2007      Sales       -10                *TobeCalculated

7   01/01/2007      Purchase     20        12000

I have a table, ITEM_TXNS, to which every new sale or purchase a user enters is added. The data above is the part of that table for one ProductID (the ProductID column is omitted here).

In order to calculate the balance amount using a moving average, I must first calculate the cost_out_amt on the fly.

When a user adds a new sale, I also need to determine the cost per unit for that product using the moving average. The problem is that I cannot just use SUM, because I need to determine the cost_out_amt for each preceding sale first, and each of those is itself calculated on the fly.

The reason I don't store the cost_out_amt (and instead calculate it on the fly) is that a user could edit a previous sale/purchase transaction, or insert a new transaction for a previous date. Take the record with ID 7 as an example: adding that back-dated transaction would make every stored cost_out_amt after it incorrect (using the moving average), so if I stored the cost_out_amt on entry, everything downstream would need to be recalculated.

Instead, I just want to calculate on the fly and be able to determine the average cost at a specific point in time.

Should I use a cursor and loop through all the records to calculate the cost, or can I do it with a single SELECT statement?
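To make the arithmetic being asked for concrete, here is a minimal sketch in Python (an illustration only, not either of the T-SQL solutions below): one ordered pass that carries a running total cost and quantity, and costs each sale out at the current average cost per unit.

```python
# Illustrative sketch: moving-average costing over an ordered transaction list.
# Each transaction is (qty, cost_in_amt); cost_in_amt is None for sales,
# whose cost_out_amt must be derived from the running average.
def moving_average_costs(txns):
    moving_cost = 0.0
    moving_qty = 0
    out = []
    for qty, cost in txns:
        if cost is None:
            # Sale: cost it out at the current average cost per unit.
            cost = qty * (moving_cost / moving_qty) if moving_qty else 0.0
        moving_cost += cost
        moving_qty += qty
        out.append((cost, moving_cost, moving_qty))
    return out

# The sample rows from the question, in date/id order (the back-dated
# id 7 purchase sorts into 01/01/2007):
txns = [(10, 1000), (5, 1100), (-5, None), (20, 12000),
        (20, 9000), (50, 8000), (-10, None), (-10, None)]
for cost, mc, mq in moving_average_costs(txns):
    print(cost, mc, mq)
```

Running this reproduces the cost_out_amt, moving_cost, and moving_qty columns shown in the accepted answer's output (for example, the first sale costs out at -700 against an average of 140/unit).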

It is best to learn to avoid cursors; Transact-SQL is a set-based language and works best when used to perform set-based operations rather than cursors, which are record-based. I am still not sure that I fully understand the problem, but here is my first pass:

Code Snippet

create table dbo.item_txns
( id integer,
date datetime,
type varchar(10),
qtyin integer,
cost_out_amt decimal(12,2)
)
go

declare @avg table
( id integer,
date datetime,
type varchar(10),
qtyin integer,
cost_out_amt decimal(12,2),
seq integer primary key,
moving_cost decimal(12,2),
moving_qty integer
)

insert into dbo.item_txns
select 1,'01/01/2007','PURCHASE',10,1000 union all
select 2,'01/01/2007','PURCHAES',5,1100 union all
select 3,'01/01/2007','SALES',-5,null union all
select 4,'02/01/2007','Purchase',20,9000 union all
select 5,'02/01/2007','SALES',-10,null union all
select 5,'02/01/2007','purchase',50,8000 union all
select 6,'03/01/2007','Sales',-10,null union all
select 7,'01/01/2007','Purchase',20,12000

;with tranCTE as
( select id,
date,
type,
qtyin,
cost_out_amt,
row_number() over
( order by date, id, type, qtyin )
as Seq
from item_txns
)
insert into @avg
( id,
date,
type,
qtyin,
cost_out_amt,
seq
)
select id,
date,
type,
qtyin,
cost_out_amt,
seq
from tranCTE

--select * from @avg

declare @movingCost decimal(12,2) set @movingCost = 0
declare @movingQty integer set @movingQty = 0

update @avg
set moving_cost = @movingCost,
moving_qty = @movingQty,
cost_out_amt
= case when cost_out_amt is null
and @movingQty <> 0
then qtyin * (@movingCost/@movingQty)
when cost_out_amt is null
and @movingQty = 0
then 0
else cost_out_amt
end,
@movingCost = @movingCost +
case when cost_out_amt is null
and @movingQty <> 0
then qtyin * (@movingCost/@movingQty)
when cost_out_amt is null
then 0
else cost_out_amt
end,
@movingQty = @movingQty + isnull(qtyin,0)

select * from @avg

/*
id date type qtyin cost_out_amt seq moving_cost moving_qty
-- ----------------------- -------- ----- ------------ --- ----------- ----------
1 2007-01-01 00:00:00.000 PURCHASE 10 1000.00 1 1000.00 10
2 2007-01-01 00:00:00.000 PURCHAES 5 1100.00 2 2100.00 15
3 2007-01-01 00:00:00.000 SALES -5 -700.00 3 1400.00 10
7 2007-01-01 00:00:00.000 Purchase 20 12000.00 4 13400.00 30
4 2007-02-01 00:00:00.000 Purchase 20 9000.00 5 22400.00 50
5 2007-02-01 00:00:00.000 purchase 50 8000.00 6 30400.00 100
5 2007-02-01 00:00:00.000 SALES -10 -3040.00 7 27360.00 90
6 2007-03-01 00:00:00.000 Sales -10 -3040.00 8 24320.00 80
*/

|||

WOW!!! Marvelous. I have been trying to figure this out for a long, long time: how to do it without a cursor, looping row by row. THANK YOU VERY MUCH!!! I never thought it would be so straightforward. The reason I couldn't figure it out is that I never knew you could assign to variables within an UPDATE statement to carry forward the previous calculations.

Can I use the SELECT statement from the CTE directly in the INSERT statement, without the CTE?
Why do you use a CTE anyway?
Why do you use row_number()? I have read the help file, but I don't understand it. I believe the purpose is to build the result set in sequential order.
What is the purpose of seq when you can just order the result set?
What do you think about the performance? Each time a user adds or adjusts sales data, I need to be able to determine the average cost on the fly.

And if I want to adopt the FIFO costing method, do I need to add another table or field? I know this is more of a DB design issue, but please help me.
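For what the FIFO question implies, here is a hypothetical sketch (names and structure are my own, not from this thread): FIFO cannot be derived from one running average; it needs per-lot state, i.e. the remaining quantity of each open purchase, consumed oldest first. In a database that usually does mean either an extra remaining-quantity column on the purchase rows or a separate lot table.

```python
from collections import deque

def fifo_cost_of_sale(lots, sale_qty):
    """Cost a sale out FIFO-style.
    lots: deque of [remaining_qty, unit_cost], oldest purchase first.
    sale_qty: positive number of units sold.
    Returns the total cost relieved; mutates lots in place."""
    cost = 0.0
    remaining = sale_qty
    while remaining > 0 and lots:
        lot = lots[0]
        take = min(lot[0], remaining)   # consume from the oldest open lot
        cost += take * lot[1]
        lot[0] -= take
        remaining -= take
        if lot[0] == 0:
            lots.popleft()              # lot fully consumed
    return cost

# Two purchases: 10 units @ 100, then 5 units @ 220.
lots = deque([[10, 100.0], [5, 220.0]])
print(fifo_cost_of_sale(lots, 12))  # 10*100 + 2*220 = 1440.0
```

The same back-dated-edit problem from the original question applies: inserting an earlier purchase changes which lots later sales consume, so FIFO costs would also have to be recalculated (or derived on the fly) from that point forward.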

|||

Hi Kent,

Is this solution based on an ordered update?

If so, then it is not a reliable one.

For this kind of calculation (ordered calculations), a cursor-based solution tends to perform better than a set-based one. That is the reason Itzik Ben-Gan and Sujata Mehta sent an enhancement request to Microsoft to improve the OVER clause. Here is the link to the document; it is well worth reading. Inside the document you will find links to the suggestion entries on Microsoft Connect, and I encourage everyone who reads the document to please vote.

OVER Clause and Ordered Calculations - Feature Enhancements Request by Itzik Ben-Gan and Sujata Mehta

(first link after Whitepapers)

http://www.sql.co.il/books/insidetsql2005/resources.htm

Here is the cursor-based solution. Notice that the table will be scanned just once, contrary to the set-based one.

Code Snippet

use tempdb
go

create table dbo.item_txns
( id integer,
date datetime,
type varchar(10),
qtyin integer,
cost_out_amt decimal(12,2)
)
go

declare @avg table
( id integer,
date datetime,
type varchar(10),
qtyin integer,
cost_out_amt decimal(12,2),
seq integer primary key,
moving_cost decimal(12,2),
moving_qty integer
)

insert into dbo.item_txns
select 1,'01/01/2007','PURCHASE',10,1000 union all
select 2,'01/01/2007','PURCHAES',5,1100 union all
select 3,'01/01/2007','SALES',-5,null union all
select 4,'02/01/2007','Purchase',20,9000 union all
select 5,'02/01/2007','SALES',-10,null union all
select 5,'02/01/2007','purchase',50,8000 union all
select 6,'03/01/2007','Sales',-10,null union all
select 7,'01/01/2007','Purchase',20,12000

declare @id integer
declare @date datetime
declare @type varchar(10)
declare @qtyin integer
declare @cost_out_amt decimal(12,2)
declare @Seq int
declare @movingCost decimal(12,2)
declare @movingQty integer

declare c cursor fast_forward read_only
for
select id, date, [type], qtyin, cost_out_amt
from item_txns
order by date, id, [type], qtyin

set @Seq = 0
set @movingCost = 0
set @movingQty = 0

open c

while 1 = 1
begin
fetch next from c into @id, @date, @type, @qtyin, @cost_out_amt
if @@error != 0 or @@fetch_status != 0 break -- please, use better error handling
set @Seq = @Seq + 1
set @cost_out_amt = coalesce(@cost_out_amt, @qtyin * isnull((@movingCost/nullif(@movingQty, 0)), 0))
set @movingCost = @movingCost + @cost_out_amt
set @movingQty = @movingQty + isnull(@qtyin, 0)
insert into @avg
( id, date, type, qtyin, cost_out_amt, seq, moving_cost, moving_qty )
values (@id, @date, @type, @qtyin, @cost_out_amt, @Seq, @movingCost, @movingQty)
end

close c
deallocate c

select * from @avg order by seq

drop table dbo.item_txns
go

AMB

|||

Thanks for picking me up!

|||

Hi Kent,

I do not understand what you are trying to tell me. I just wanted to point out that this is one of the cases where a cursor-based solution can outperform a set-based one. I also wanted to let you know about this document and the possibility of helping ourselves by voting on Connect.

I had no other intention.

Regards,

Alejandro Mesa

|||

I mean thank you for helping me, Alejandro. This is what I love about this forum; I get "picked up" when I fall without getting grief over an "honest mistake".

I wondered about the first pass through the table in this case, and I wondered about the use of the update extensions. However, to emphasize for those who are following:

I created a table variable with a primary key based on sequence number

The primary key defaults to being clustered (an assumption; is this correct?)

I inserted the records into the table variable

When I process the table variable, the order processed will NOT be according to the primary key -- despite the assumption of clustering

Because of the ordering problem, the update will not be reliable

|||

Hi Kent,

Now I understand; sometimes I have a problem with the language.

You explained it very well, except for the two-pass process. If we want a set-based solution, we can use a correlated subquery, which will be evaluated for each row, or we can use "update ... from", joining the table with itself. Let us see how we can calculate the moving quantity, for the sake of simplicity.

update a
set moving_qty = (select sum(b.qtyin) from @avg as b where b.seq <= a.seq)
from @avg as a

Can you see how many passes over the table that takes?

You can read more in the document by Itzik and Sujata; they included a comparison between the different approaches.
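To illustrate the pass-count point outside of SQL (a sketch of mine, not from the document): the correlated subquery re-sums every earlier row for each row, roughly n*(n+1)/2 additions in total, while an ordered single pass, which is what the cursor does, touches each row exactly once.

```python
# Correlated-subquery style: one inner scan of the prior rows per row.
def running_qty_subquery_style(qtys):
    return [sum(qtys[:i + 1]) for i in range(len(qtys))]

# Cursor style: a single ordered pass carrying the total forward.
def running_qty_single_pass(qtys):
    total, out = 0, []
    for q in qtys:
        total += q
        out.append(total)
    return out

qtys = [10, 5, -5, 20, 20, 50, -10, -10]
print(running_qty_subquery_style(qtys))  # [10, 15, 10, 30, 50, 100, 90, 80]
print(running_qty_single_pass(qtys))     # same result, one pass
```

Both produce the moving_qty column from the earlier output; only the amount of work differs, and the gap grows quadratically with the row count.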

AMB

|||

Alejandro:

Really, since I can't pin any hopes on the update being ordered, I see nothing redeeming about the process I outlined. Man, I don't know how many posts I have made that turned out to be just plain wrong -- and those previous posts were never corrected!

I'm worried about the damage I've done.

|||

Hi Kent,

Do not feel bad; I have been there too. Learning from our mistakes makes us better. There is no doubt that you have done a very good job helping others in this newsgroup.

Sincerely,

Alejandro Mesa

|||

Thanks, Alejandro. I need some advice as to what action is best to take. I figure I have 7-15 previous posts in which I may have outlined a similar routine where the problem was not pointed out. Do I need to go back to those previous posts and straighten them out as best I can? Most of them are easy enough to find.

I think I remember one man in particular -- Jake. He was working with the orange alert program, trying to do something about protecting kids. I really think I ought to re-establish contact with Jake and let him know of the potential problem.

There was also a guy a year ago named William who had a query that took between 40 and 60 hours to process; I had a process that cut it down to a few minutes. But another guy had a process similar to the one I outlined that was somewhat faster using ordered updates. It wasn't my solution, but should I contact William?

Again, looking for advice.

Kent

(I am going to split this off from the original post because I feel this is a separate issue and doesn't contribute to the original post. )

Also, would I be better off moving this portion to the REPORTING forum?

|||

Hi Kent,

This shows us how much you care for the members of the newsgroup; it is really kind. I do not know what to tell you, because I have no idea where to begin. If you can find those posts and contact the OPs, I am sure they will appreciate it.

Sorry I cannot give you better advice.

Regards,

Alejandro Mesa

|||

Alejandro:

That is an excellent answer. I will give it a go. Thank you for helping me.

Kent

|||

" Is this solution based on an ordered update? "

Could you explain more in detail please?

|||

Please use the solution supplied by Alejandro (hunchback); apparently, the solution I provided is not correct. Sorry for the inconvenience.
