Friday 6 March 2009

S+ Tips: startsWith() and endsWith()

Whenever I go from using C# or Java back to using S, one of the annoyances is that the standard string methods available in most languages aren't around, at least not if one wants something more convenient than straight calls to grep().

I usually end up defining some wrappers to let me do "starts with" and "ends with" operations on single string values:


"strStartsWith" <- function(str, prefix){
    # Test whether a string starts with a particular prefix.
    # Case insensitive. The prefix is used as a regular expression.
    if (is.null(str) || !is.character(str) || length(str) != 1){
        stop("The 'str' argument must be a single string value.")
    }
    (length(grep(paste("^", prefix, sep=""), str, ignore.case = TRUE)) > 0)
}

"strEndsWith" <- function(str, suffix){
    # Test whether a string ends with a particular suffix.
    # Case insensitive. The suffix is used as a regular expression.
    if (is.null(str) || !is.character(str) || length(str) != 1){
        stop("The 'str' argument must be a single string value.")
    }
    (length(grep(paste(suffix, "$", sep=""), str, ignore.case = TRUE)) > 0)
}




To make these polished S functions, it would be better to vectorize them so that they return one logical value per element of 'str', rather than throwing an error when 'str' has more than one element.
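As a rough sketch of what a vectorized version might look like, the indices returned by grep() can be turned into a logical vector the same length as the input. The names strStartsWithVec() and strEndsWithVec() are made up for this illustration, and I have only reasoned through this against R-style grep() semantics, so treat it as a starting point rather than a finished implementation:

```r
"strStartsWithVec" <- function(str, prefix){
    # Hypothetical vectorized variant of strStartsWith().
    # Returns one logical value per element of 'str'; case insensitive.
    if (!is.character(prefix) || length(prefix) != 1){
        stop("The 'prefix' argument must be a single string value.")
    }
    str <- as.character(str)
    result <- rep(FALSE, length(str))
    # grep() returns the indices of the matching elements.
    result[grep(paste("^", prefix, sep=""), str, ignore.case = TRUE)] <- TRUE
    result
}

"strEndsWithVec" <- function(str, suffix){
    # Hypothetical vectorized variant of strEndsWith().
    if (!is.character(suffix) || length(suffix) != 1){
        stop("The 'suffix' argument must be a single string value.")
    }
    str <- as.character(str)
    result <- rep(FALSE, length(str))
    result[grep(paste(suffix, "$", sep=""), str, ignore.case = TRUE)] <- TRUE
    result
}

strStartsWithVec(c("Hello", "help", "world"), "he")
strEndsWithVec(c("report.TXT", "data.csv"), "txt")
```

The first call should flag the first two elements and the second call only the first. As with the single-value versions, the prefix or suffix is still treated as a regular expression, so characters such as "." keep their special meaning.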

I considered implementing these by calling into Java, but went with the grep() approach so they could be used without loading Java. The downside of using Java is just that a Java application takes significantly longer to start than a straight C application, and this in turn leads to slower startup of S+ when Java is loaded.

Another thing to note is that I used the name strStartsWith() rather than just startsWith(). When actually defining these in a package I'd really use a prefix common to all the functions in the package. For example, in the "fm" package it would be fmStartsWith(). I do this because S+ just has one big namespace, so I want to avoid using up commonly-used names that might have a different purpose in another package.
