Descriptive statistics and time series in STATISTICA

#1
Hello everyone,

I have the following question regarding descriptive statistics and producing aggregate results. My data source is an excel workbook containing two data tables. The tables are numerical tables denoting the performance of investment funds. In detail, the data is as follows:

  • Return-on-investment (ROI) for different funds in different months (Table 1)
  • Assets-under-Management (AUM) for different funds in different months (Table 2)
  • The tables are congruent (i.e. all funds contained in Table 1 are contained in Table 2)
  • Consequently, the data are time series where the variables are different funds and the category is time
  • The time series are continuous but do not start or end at the same time

I would like to produce aggregate descriptive statistics for all funds grouped in the aforementioned table (average ROI, Stdev, Skew ...). The data in Table 2 (AUM) can be used to assign weights to the ROI data from Table 1. I managed to import the ROI data from the Exel spreadsheet but have no idea how to:

  1. aggregate the data as is
  2. aggregate the data using the weights from Table 2

Can anyone give me some pointers please? Thanx
 

JenB

New Member
#2
You need to aggregate the data first and then analyze it.

It sounds like you want to sum amounts by identifier (e.g., fund) and month/year. I’m assuming a dataset called TABLE1 with three variables: Identifier (ID), Date (DT), and Amount (AMT).

Here is the same example done with two different programming languages.


A. Run SQL in a relational database

Here is a SQL code for use with Microsoft SQL Server:

SELECT ID, STR(DATEPART(YYYY,DT)) + '-' + REPLACE(STR(DATEPART(MM,DT),2),' ','0') DT_NEW, SUM(AMT) AMT_NEW
FROM TABLE1
GROUP BY ID, STR(DATEPART(YYYY,DT)) + '-' + REPLACE(STR(DATEPART(MM,DT),2),' ','0')
ORDER BY ID, DT_NEW



B. Run an SVB script in STATISTICA

Here is STATISTICA Visual Basic (SVB) code that will create a new aggregated dataset:

Sub Main

Dim s1 As Spreadsheet
Dim s2 As Spreadsheet
Dim id1 As String
Dim id2 As String
Dim dt1 As String
Dim dt2 As String
Dim amt1 As Double
Dim amt2 As Double
Dim c1 As Long
Dim c2 As Long

'Reference input spreadsheet
Set s1 = ActiveDataSet
id1 = s1.Cells(1,1).Value
dt1 = Format(s1.Cells(1,2).Value,"yyyy") + "-" + Format(s1.Cells(1,2).Value,"mm")
amt1 = s1.Cells(1,3).Value

'Create new spreadsheet
Set s2 = s1.SubsetEx2("1-3",False,True,"","0","","")

'Populate new spreadsheet
c2 = 1
s2.Cells(c2,1).Value = id1
s2.Cells(c2,2).Value = dt1
s2.Cells(c2,3).Value = amt1

For c1 = 2 To s1.EntireRange.LastRow
id2 = s1.Cells(c1,1).Value
dt2 = Format(s1.Cells(c1,2).Value,"yyyy") + "-" + Format(s1.Cells(c1,2).Value,"mm")
amt2 = s1.Cells(c1,3).Value
If id1 <> id2 Or dt1 <> dt2 Then
s2.AddCases c2, 1
c2 = c2 + 1
s2.Cells(c2,1).Value = id2
s2.Cells(c2,2).Value = dt2
s2.Cells(c2,3).Value = amt2
id1 = id2
dt1 = dt2​
Else
s2.Cells(c2,3).Value = s2.Cells(c2,3).Value + amt2​
End If​
Next c1

s2.Visible = True​
End Sub