I have the following problem:

I am trying to do a multinomial logistic regression on a very large dataset.

My dataset has 6 columns (5 independent variables and 1 dependent variable) and 176,483 rows (observations).

The function in R which does exactly what I need is multinom().

However, because my dataset is too large, R quickly runs into memory issues and gives me the following complaint: too many (1402960) weights and memory exceeded.

One suggestion I received to resolve this problem was to use a database system and execute SQL statements on it.

Now I am using the RSQLite package in R and I am trying to "translate" my algorithm into SQL statements.

This is where I encounter another problem - my code has a syntax error that I cannot correct. My code looks like this:

library(RSQLite)

m = dbDriver("SQLite")

dbfile = "multinomial.db"

con = dbConnect(m, dbname = dbfile)

f <- read.csv("data.csv", header=TRUE, sep=";")

if(dbExistsTable(con, "mydata")) dbRemoveTable(con, "mydata")

dbWriteTable(con, "mydata", f, row.names=FALSE, column.names=TRUE)

library(PivotalR)

script <- paste("SELECT madlib.mlogregr_train('mydata',

'mydata_output',

'Col6',

'ARRAY[1, Col1, Col2, Col3, Col4, Col5]',

0,

'max_iter=20, optimizer=irls, precision=0.0001')",

"SELECT * FROM mydata_output")

result <- dbSendQuery(con, script)

Everything runs OK until the last command. When I run the last line of code I get the following error:

Error in sqliteExecStatement(conn, statement, ...) :

RS-DBI driver: (error in statement: near "(": syntax error)

Do you know how to solve this error?

Any help would be very much appreciated!