Wednesday, 31 March 2010
Monday, 29 March 2010
I'll update the "About the Author" box on the right whenever I manage to find the "Layout" tab in the blog management interface. It seems to have gone missing for this blog.
Friday, 2 October 2009
I've started to look at it because I've written some Spotfire extensions that use R for computation and some people have expressed interest in a server version.
General criteria are:
- Linux and Windows
- Extensible by modifying text files
- Accessible and understandable by a typical R user
- Self-contained and easy to install
Since rApache doesn't have a Windows version, on Windows it needs to be run within a VM. I really like the idea of using standard Apache server configuration procedures along with either standard R scripts or "brew" templates. But I lost patience with working in a Linux terminal within a VMPlayer window.
Wednesday, 1 April 2009
The motivation for the port is that I'm doing more with the-code-formerly-known-as-ArrayAnalyzer lately, and the Bioconductor Case Studies book likes to use RColorBrewer in examples.
HOW TO SUBMIT A PACKAGE TO CSAN
You can share your S-PLUS package with other users within your
department, company, or university. Just send them the package
archive, and have them install it with the INSTALL script or the
install.packages function (setting repos=NULL).
If you want to share your package with the entire S-PLUS community,
you can submit your package for inclusion in the Comprehensive S
Archive Network (CSAN). To submit a package, upload the source
package archive (the result of running Splus CMD build) to:
Once you have uploaded your file, send a message to
stating the name of the package archive you submitted.
Before submitting a package for inclusion in CSAN, it should pass the
check utility. Make sure these key fields in the DESCRIPTION file
have appropriate values: Package, Title, Version, Author,
Maintainer, and License. If any of these are missing, your package
will not be posted to CSAN.
Insightful will review your submitted package, run the check utility,
and create a Windows binary archive. If everything passes, the
package is posted to the CSAN site. Any problems with the package
are sent to the package submitter.
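Since a missing field means the package won't be posted, it can be worth checking the DESCRIPTION file before uploading. Here's a minimal sketch in C of such a check. The function names are hypothetical and this is not part of S+ or the check utility. It assumes each field sits on its own line as "Field: value", and it doesn't handle values that start on a continuation line.

```c
#include <stdio.h>
#include <string.h>

/* Fields the CSAN instructions above list as required. */
static const char *const required_fields[] = {
    "Package", "Title", "Version", "Author", "Maintainer", "License"
};
#define N_REQUIRED (sizeof required_fields / sizeof required_fields[0])

/* Return 1 if some line of `text` starts with "<field>:" and has a
 * non-blank value on that same line, 0 otherwise. */
static int has_field(const char *text, const char *field)
{
    size_t len = strlen(field);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            const char *p = line + len + 1;
            while (*p == ' ' || *p == '\t')
                p++;                      /* skip blanks after the colon */
            if (*p != '\0' && *p != '\n' && *p != '\r')
                return 1;
        }
        line = strchr(line, '\n');        /* move to the next line, if any */
        if (line != NULL)
            line++;
    }
    return 0;
}

/* Hypothetical helper: print each missing or empty required field
 * and return how many there are. */
int count_missing_fields(const char *description_text)
{
    int missing = 0;
    size_t i;

    for (i = 0; i < N_REQUIRED; i++) {
        if (!has_field(description_text, required_fields[i])) {
            printf("missing or empty: %s\n", required_fields[i]);
            missing++;
        }
    }
    return missing;
}
```

This only tells you that the fields are present and non-empty. Whether the values are appropriate is still up to you and the check utility.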
Wednesday, 18 March 2009
I'm asked regularly how hard it'll be to port a particular package from R to S+. The general answer is that the basic mechanics of it are easy. If the R code mostly uses functions already in S+ and any C/Fortran code just works on arrays, then the package may port with few changes.
Some items that make a port more difficult are:
- Extensive use of functions that aren't available in S+ and aren't easily ported to S+. The main place this has come up in code I've seen lately is use of "grid" graphics.
- Usage of more advanced C macros to manipulate R objects at the C level. That is, code using .Call() rather than .C().
- The only item that's a real showstopper is use of external pointers with a finalizer. S+ doesn't have a way of calling a C function when an S object is released to do finalization, so you can't do things like having an S object with a reference to a Java object. There's no way to know when to free the Java object. I'm still trying to figure out a workaround for this.
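For contrast with the harder cases in the list above, here is a minimal sketch of the easy kind of C code: a routine written for .C() that only works on arrays. The function name is hypothetical. With .C(), every argument arrives as a pointer to a plain array, the function returns nothing, and results are written back into the arguments. The sketch assumes the length is passed as an int pointer, as R does; check the S+ documentation for the C integer type S+ actually passes.

```c
/*
 * Hypothetical .C()-style routine: center a numeric vector in place.
 * All arguments are pointers to plain arrays, nothing is returned,
 * and the results come back through x and mean_out.
 * Assumption: the length arrives as int *, as it does in R.
 */
void center_vector(double *x, int *n, double *mean_out)
{
    double sum = 0.0;
    int i;

    if (*n <= 0) {              /* empty vector: nothing to center */
        *mean_out = 0.0;
        return;
    }
    for (i = 0; i < *n; i++)    /* first pass: accumulate the sum */
        sum += x[i];
    *mean_out = sum / *n;
    for (i = 0; i < *n; i++)    /* second pass: subtract the mean */
        x[i] -= *mean_out;
}
```

From the S side this would be reached with something along the lines of .C("center_vector", as.double(x), as.integer(length(x)), mean = double(1)). Nothing in the C code touches S objects directly, which is why code like this tends to move between R and S+ with few changes.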
To do R to S+ ports, the first step is to get set up to build S+ packages. This is described in the "Guide to Packages" included with S+. On Linux, you'll be good to go if you have S+ installed and standard tools such as perl, gcc, and gfortran. On Windows, it's a bit more involved.
I've had a pro-Windows bias for many years but I've recently switched to doing the ports on Linux. The main reason is I no longer have a copy of "Visual Fortran", which is required to build Fortran code on Windows. Perhaps I'll get a copy of this installed, or perhaps I'll stick with Linux.
The basic steps involved for a port are:
- Put the files in the standard structure for an S+ package. This matches the structure of an R package, so if you are starting with the R package source you can just unzip it.
- Modify the DESCRIPTION file to adjust the package dependencies, e.g. add a "DependsSplus" line that refers to S+ packages rather than R packages.
- Run "Splus CMD build -binary [pkgname]" from an OS command line. You'll repeat this over and over. Ideally things will build right away. If not, you'll need to modify the source code until it does. For this listing, let's assume the S code is syntactically correct and the C code compiles so we can proceed to the next step. If the C code is failing, move it aside until the S code has been fixed up.
- In a separate window start S-PLUS. Use "library(pkgutils)" to load the package utility functions.
- Use "unresolvedGlobalReferences([R code dir name])" to get a list of objects that will not be found under S+ scoping rules. This is an invaluable tool. The objects not found are usually either misspelled object/function names, functions available in R and not in S+, or local variables that need to be explicitly passed to inner functions. The next step is to modify the S code to resolve the missing references.
- The first step I take on resolving the references is to check which are references to R functions not in S+. Then I put in stub functions that just call "stop()".
- The second step is to go through the code fixing misspellings and modifying calls to anonymous functions used in "apply()" to explicitly pass values that are used in the inner functions.
- The third step of changes related to object scoping is to change assign() statements so that instead of assigning to ".GlobalEnv" they assign to "where=1" when the intent is to maintain a global variable. Potentially you can store global objects in "frame=0" instead, but it isn't garbage collected very aggressively so this can lead to memory buildup.
- At this point, in theory, the S code builds, scoping problems are fixed, and we've identified missing functions. Now the missing functions need to either be implemented or replaced with calls to other S+ functions.
- If the C code was failing to compile, move it back into place and fix the problems in the C code. This can be either easy or horribly hard depending on how complicated the code is.
- Now you're ready to test functionality using examples from the help files. At this point you'll identify differences in behavior or arguments between R and S+ functions of the same name.
- Repeat until everything works.
So I'm starting to get a routine in place. The only part I find difficult is the C stuff, but that's because I don't do a lot of C programming and I get rusty between uses.
I'm cooling a little on Python at the moment as I haven't come across an opportunity to use it professionally (yet). However, I think it's interesting that SPSS came to the same conclusion as I did regarding its suitability as a scripting language for the statistical audience.