Hi guys,
I have the following problem:
I am trying to do a multinomial logistic regression on a very large dataset.
My dataset has 6 columns (5 independent variables and 1 dependent variable) and 176,483 rows (observations).
The function in R which does exactly what I need is multinom().
However, because my dataset is so large, R quickly runs out of memory and stops with the following complaint: "too many (1402960) weights", along with a message that memory was exceeded.
One suggestion I received to resolve this problem was to use a database system and execute SQL statements on it.
Now I am using the RSQLite package in R and I am trying to "translate" my algorithm into SQL statements.
This is where I run into another problem: my SQL statement has a syntax error that I cannot correct. My code looks like this:
library(RSQLite)
m = dbDriver("SQLite")
dbfile = "multinomial.db"
con = dbConnect(m, dbname = dbfile)
f <- read.csv("data.csv", header=TRUE, sep=";")
if(dbExistsTable(con, "mydata")) dbRemoveTable(con, "mydata")
dbWriteTable(con, "mydata", f, row.names=FALSE, column.names=TRUE)
library(PivotalR)
script <- paste("SELECT madlib.mlogregr_train('mydata',
'mydata_output',
'Col6',
'ARRAY[1, Col1, Col2, Col3, Col4, Col5]',
0,
'max_iter=20, optimizer=irls, precision=0.0001')",
"SELECT * FROM mydata_output")
result <- dbSendQuery(con, script)
Everything runs OK until the last command. When I run the last line of code I get the following error:
Error in sqliteExecStatement(conn, statement, ...) :
RS-DBI driver: (error in statement: near "(": syntax error)
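To narrow this down, here is a minimal sketch that tries the same kind of call in isolation. It assumes a throwaway in-memory SQLite database and a made-up two-row table (both are illustrative stand-ins, not my real data), and it uses the newer DBI-style connection call rather than dbDriver(). If the same "syntax error" message appears here, the problem would seem to lie in the madlib.mlogregr_train(...) statement itself rather than in my data or in the paste() call:

```r
library(DBI)
library(RSQLite)

# Throwaway in-memory database; nothing is written to disk.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical two-row stand-in for the real table.
toy <- data.frame(Col1 = c(1, 2), Col6 = c("a", "b"))
dbWriteTable(con, "mydata", toy)

# Try a schema-qualified function call of the same shape as in my script
# and capture the error message instead of stopping.
msg <- tryCatch(
  {
    dbGetQuery(con, "SELECT madlib.mlogregr_train('mydata')")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# An ordinary query on the same connection, for comparison.
n <- dbGetQuery(con, "SELECT COUNT(*) AS n FROM mydata")$n
print(n)

dbDisconnect(con)
```

If the ordinary COUNT(*) query succeeds while the madlib call fails, that would suggest SQLite itself does not accept this call. As far as I understand, MADlib functions are installed into PostgreSQL or Greenplum databases, so this may not be something SQLite can run at all.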
Do you know how to solve this error?
Any help would be very much appreciated!