Mac / *nix: Allowing a disowned R process to do file transfers via curl

Jake

Cookie Scientist
#1
Maybe one of you Linux people knows the answer to this... I am actually using the Mac command line, but my understanding is that this is virtually identical to a Linux/Unix command line.

I am using vacant lab computers in my department building to run some simulations in the background. There are many such vacant computers on the weekends, so this is like a (free) poor man's parallelization for some of my larger simulations. Anyway, for each machine I start the sim script in R, then background and disown the process, and log out. There is a command at the end of the script to write the sim results to a little file in the /tmp/ directory (which persists across logouts) locally on that machine, and finally to transfer that file to an FTP server on my office machine using the curl command.

Everything works fine as long as I stay logged in at this machine, even with the process disowned. The results are sent to my office machine with no problems. But when I do this and then actually log out, it seems that everything works fine except that the final curl command never succeeds. That is, if I come back to the machine at a later time, log back in, and check the /tmp/ directory, I can see that the results file has been created and that it contains the expected contents. But it was never sent to my office machine. At this later time I can manually send the results to my office machine by typing in the same curl command that was included in the script. But it would be way easier if it just worked as planned.

How can I get my disowned R processes to successfully upload files via curl / FTP?
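For concreteness, here is a runnable stand-in for the pattern described above. The two-second `sh -c` job stands in for the actual R simulation, and the curl line (kept as a comment) uses a placeholder hostname and credentials:

```shell
# A long-running job started with nohup, backgrounded, and disowned so it
# survives the shell, writing its results under /tmp (which persists).
nohup sh -c 'sleep 2; echo "sim results" > /tmp/sim_results.txt' >/dev/null 2>&1 &
disown
sleep 3                      # give the stand-in "simulation" time to finish
cat /tmp/sim_results.txt     # the results file is there as expected
# In the real script, the final step is an FTP upload, something like:
#   curl -T /tmp/sim_results.txt ftp://officemac.example.edu/ --user name:pass
```

The failure described above is exactly that last (commented) step: everything up to the write succeeds after logout, but the upload does not.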
 

Jake

Cookie Scientist
#2
Just read about the RCurl package, which could possibly be useful; maybe it will work better than calling curl through system() in R...
 

Jake

Cookie Scientist
#4
Apparently httr doesn't support FTP (at least, the phrase "ftp" is not found anywhere in the package documentation). Anyway, I used the RCurl package rather than calling system(), and it worked like a charm!
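For anyone finding this later: the post doesn't say which RCurl function was used, but a call along these lines, using RCurl's ftpUpload() helper, does the same job as the curl command (hostname, credentials, and file names here are placeholders):

```r
library(RCurl)

# Write the simulation results locally first (as in the original script)
save(results, file = "/tmp/sim_results.Rdata")

# Upload straight from R -- no system() call needed. Extra arguments to
# ftpUpload() are passed through as curl options, so credentials can go
# in 'userpwd' instead of being embedded in the URL.
ftpUpload(what = "/tmp/sim_results.Rdata",
          to   = "ftp://officemac.example.edu/sim_results.Rdata",
          userpwd = "name:password")
```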
 

TheEcologist

Global Moderator
#5
For future reference:

Since macOS is basically Unix, you can simply use the age-old and well-tested "screen" utility from the terminal era on your machines.
Check "man screen" for details.

A typical session would involve:

1) Log in
2) Type screen -S myscreenname to start a named session
3) Do your R stuff
4) Press Ctrl-A then d from inside the session (this works even while R is running), or type screen -d myscreenname from another terminal, to detach
5) Log out (with no fear of losing the session; it will remain)
6) Log in again
7) Type "screen -r myscreenname" to resume/re-attach the session (or screen -ls if you forgot the name)
8) Do what needs to be done
9) Profit

Screen is for the win.
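The steps above, typed out as a terminal session (the session name is arbitrary):

```shell
screen -S mysim      # start a new named session (steps 1-2)
R                    # run R and start the sim inside it (step 3)
#   ... press Ctrl-A then d to detach (step 4), then log out ...
screen -ls           # after logging back in: list sessions, if name forgotten
screen -r mysim      # re-attach to the running session (step 7)
```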

TE
 

Jake

Cookie Scientist
#7
Thank you for both contributions TE ;)

Okay, this last part is probably a long shot, but here goes. Right now I have it all working so that the sim results (distributed across 50 lab computers) are automatically sent to my office for easy aggregation once they are finished. But to initialize each sim I have to physically log in to each machine, enter a few commands at the Terminal, and then log out. Of course it would be easiest of all if I could start the sim processes on each lab machine remotely, that is, without having to physically log in to each machine in the first place. That way I could do everything from my office. Now, I strongly suspect that for very good security-type reasons, there will not be any feasible way for me to make this work. But I thought I would ask if any of you had ideas about it, just in case. Note that this would need to be accomplished without any prior installation of software on the lab machines, since upon rebooting (which happens every 1-2 days) each machine is restored to a previous state, so all changes to the hard drive are lost.
 

TheEcologist

Global Moderator
#8
Long answer made short: does your university allow remote login (using SSH)? If yes, all you would need is the hostname or IP; then you could do something like this from your office:

Code:
ssh jakesusername@labmachostname.jakesuni.edu
Then use screen, set up your stuff, and log out. To copy the results back, you would only need to do something like this (no need to log in and use screen again):

Code:
# secure copy: see man scp
scp user@hostname.web.edu:/folder/on/labmac/myRresults.Rdata /local/office/computer/folder/myRresults_labmacX.Rdata
It works at my university, but whether it works for you will depend on your institution.
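If incoming SSH is allowed, the whole routine could even be scripted from the office. This is only a sketch; the hostnames, username, and paths below are placeholders, and it assumes the sim script has already been copied to each machine:

```shell
# Start the sim in a detached screen session on each lab machine
# (screen -dmS starts a named session already detached)
for host in labmac01 labmac02 labmac03; do
  ssh jakesusername@"$host".jakesuni.edu \
      'screen -dmS sim Rscript /tmp/sim.R'
done

# Later, collect the results, one file per machine
for host in labmac01 labmac02 labmac03; do
  scp jakesusername@"$host".jakesuni.edu:/tmp/myRresults.Rdata \
      "myRresults_$host.Rdata"
done
```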
 

Jake

Cookie Scientist
#9
We can definitely do outgoing SSH connections from the lab computers, but I don't know about accepting incoming connections. I will try it soon though, thanks.
 

Dason

Ambassador to the humans
#10
If it doesn't work straight away, just talk to your IT guys. On our servers you need to connect on a different port.
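For reference, a non-standard port is passed with -p to ssh and (capital) -P to scp; port 2222 below is just an example:

```shell
ssh -p 2222 jakesusername@labmachostname.jakesuni.edu
scp -P 2222 jakesusername@labmachostname.jakesuni.edu:/tmp/myRresults.Rdata .
```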
 

Jake

Cookie Scientist
#11
I feel like it might not be a good idea to draw IT's attention to what I'm doing... I mean, it's all innocuous academic stuff, but I could see them feeling a little uneasy about it. I wonder if there's a way of gleaning this information from the lab machines themselves.