Friday, November 22, 2024

R Programming for Statistical Analysis

R, a long-established statistical language, has many advantages. It is easy to learn (especially for those without a deep background in computer science), and it has a lively, passionate community that keeps extending its catalog of packages and libraries. It was also designed specifically for working with data, by developers who know first-hand the problems data analysts face. Yet these are not the main reasons R sits near the top of the language pyramid in data science. At its core, R adopts a programming philosophy that has captured the hearts of many data analysts, myself included. This philosophy is called functional programming, and it is the subject of this post.

What is Functional Programming?

Functional programming is a programming style, alongside others such as object-oriented programming and imperative programming. In functional programming, a function is a first-class citizen: it is treated like any other value we are familiar with. Functions can be stored in variables and passed as arguments to other functions. They can also be created inside other functions and returned from them, which opens up many useful possibilities in data science.
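As a minimal sketch of what "first-class" means in practice, the snippet below stores a function in a variable, passes it to another function, and creates a function inside a function. The names square, apply_twice, make_multiplier and triple are made up for this illustration.

```r
# A function stored in a variable, like any other value
square <- function(x) x^2

# A function that receives another function as an argument
apply_twice <- function(f, x) f(f(x))

# A function that creates and returns a new function (a closure)
make_multiplier <- function(k) {
  function(x) k * x
}

triple <- make_multiplier(3)

apply_twice(square, 3)   # square(square(3)), i.e. 81
triple(5)                # 3 * 5, i.e. 15
```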


Principles of functional programming

This philosophy generally rests on three principles:

1 – Support for higher-order functions

In R, we can define functions that operate on other functions, according to our needs. This idea is not alien to mathematics. For example, differentiation takes a function and returns another function (the derivative). Differentiation is therefore a higher-order function, because its input is a function and its output is also a function.
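The differentiation analogy can be sketched in a few lines of base R. This is only a numerical approximation (a central difference), not symbolic differentiation, and the names derivative, cube and d_cube, as well as the step size h, are assumptions chosen for the illustration.

```r
# Hypothetical helper: takes a function f and returns a new function
# that approximates f'(x) with a central difference of step h
derivative <- function(f, h = 1e-5) {
  function(x) (f(x + h) - f(x - h)) / (2 * h)
}

cube <- function(x) x^3
d_cube <- derivative(cube)   # d_cube is itself a function

d_cube(2)                    # close to the exact value 3 * 2^2 = 12
```

Both the input (cube) and the output (d_cube) are functions, which is what makes derivative a higher-order function.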

2 – Functions must be pure

Pure functions do not change the state of their input variables; they create new values instead. Running them also has no side effects, such as changing variables outside the scope of the function. Later in this post we will look at examples and code that show what these side effects look like.

3 – Replacing loops with functionals

Hand-written loops tend to come with bookkeeping: counters, indices, and pre-allocated result variables that are repeated from one loop to the next. This repetition makes the code more prone to errors, and carelessly written loops can also be slow. Functional programming aims to solve these problems in a concise and elegant way.
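To make the contrast concrete, here is a small sketch that computes the same means twice, once with an explicit loop and once with the base R functional vapply. The list values and the result names are made up for this example.

```r
values <- list(a = 1:5, b = c(2, 4, 6), c = 10)

# Loop version: pre-allocate a result, then fill it by index
means_loop <- numeric(length(values))
for (i in seq_along(values)) {
  means_loop[i] <- mean(values[[i]])
}
names(means_loop) <- names(values)

# Functional version: vapply applies mean to every element and
# checks that each result is a single number
means_fun <- vapply(values, mean, numeric(1))

all.equal(means_loop, means_fun)
```

The functional version has no index, no pre-allocation and no separate naming step, so there are fewer places for a mistake to hide.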

Comparing functional programming philosophy with other philosophies

In imperative programming, statements change the state of the program or its data step by step. This makes debugging harder, because the value of a variable depends on when and where it was changed in the code. In functional programming, by contrast, functions do not change the state of existing variables; they return new values that carry the changes.

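// In C++ (exposed to R through Rcpp)
#include <Rcpp.h>

// Global state shared by every call
int x = 0;

// [[Rcpp::export]]
int accumulate(int num) {
  x += num;   // side effect: modifies the global variable x
  return x;
}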

# Called from R: the result depends on how many times the function has already been called, which makes tracing a problem difficult
accumulate(1)
## [1] 1
accumulate(1)
## [1] 2
accumulate(1)
## [1] 3
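For comparison, a pure counterpart can be sketched in plain R. Here the running total is passed in and returned instead of living in a hidden global variable. The name accumulate_pure is hypothetical; Reduce is a base R functional.

```r
# Pure counterpart: the total comes in as an argument and goes out
# as the return value, so nothing outside the function is modified
accumulate_pure <- function(total, num) total + num

accumulate_pure(0, 1)   # 1
accumulate_pure(0, 1)   # still 1, no matter how often it is called

# Running totals without any mutable state
Reduce(accumulate_pure, c(1, 1, 1), accumulate = TRUE)
## [1] 1 2 3
```

The same sequence 1, 2, 3 appears, but every intermediate value is explicit, so the result of a call never depends on what ran before it.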

Object-oriented programming has a similar issue. The state of an object depends on the operations that have been performed on it. Not only that, but class-level variables (class attributes) can also be changed, resulting in side effects on all existing objects of that class.

# In Python
class human:
    legs = 2

    def __init__(self, name, age):
        self.name = name
        self.age = age

hussain = human("Hussain", 32)
mohammed = human("Mohammed", 29)

# According to the definition of the human class above, every human has 2 legs
hussain.legs
## 2
mohammed.legs
## 2

# In Python
# But if this attribute is changed on the human class, the side effect reaches all existing instances
human.legs = 4
hussain.legs
## 4
mohammed.legs
## 4
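By contrast, ordinary R values such as lists and vectors are copied on modification, so a function cannot silently change the object it was given. The sketch below uses a plain list as a stand-in for the human record; set_legs is a hypothetical helper. (R also offers reference-style objects, for example environments, where this guarantee does not hold.)

```r
# A plain list standing in for a "human" record (illustrative only)
hussain <- list(name = "Hussain", age = 32, legs = 2)

# Returns a modified copy; the argument itself is never changed
set_legs <- function(person, n) {
  person$legs <- n
  person
}

changed <- set_legs(hussain, 4)

changed$legs   # 4
hussain$legs   # 2, the original is untouched
```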

The beauty of functional programming in R

We will now see what functional programming can do through three examples that show off the power of this language.

First example: descriptive statistical calculations of data

A recurring task for data scientists is to calculate descriptive statistics for the columns of a data set. The naive method is to count the columns and then write a loop that repeats the same calculations for each column. It also requires us to define a variable that collects the results. The code will be similar to the following:

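summary_stat <- function(df) {
  n_col <- ncol(df)
  funs <- c("mean", "median", "sd", "IQR")
  # Pre-allocate the result: one row per statistic, one column per variable
  statistics <- matrix(nrow = length(funs), ncol = n_col, dimnames = list(funs, NULL))
  for (i in seq_len(n_col)) {
    for (j in seq_along(funs)) {
      # do.call() looks up the function by name and applies it to column i
      statistics[j, i] <- round(do.call(funs[j], list(x = df[[i]])), 2)
    }
  }
  colnames(statistics) <- colnames(df)
  return(statistics)
}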

summary_stat(cars)
##        speed  dist
## mean   15.40 42.98
## median 15.00 36.00
## sd      5.29 25.77
## IQR     7.00 30.00

But with functional programming we can get the same results without knowing the number of columns or writing a loop, as in the following code:

summary_stat <- function(column) {
  funs <- c(mean, median, sd, IQR)
  lapply(funs, function(f) round(f(column, na.rm = TRUE), 2))
}

# Here the sapply function applies the function we have defined to all columns, without any loops or prior knowledge of the number of columns
# The rows follow the order of funs: mean, median, sd, IQR
sapply(cars, summary_stat)
##      speed dist
## [1,] 15.4  42.98
## [2,] 15    36
## [3,] 5.29  25.77
## [4,] 7     30
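The functional version above returns its results with unlabeled rows. One possible refinement, sketched here under the assumption that every statistic returns a single number, is to give the functions names and use vapply so that the result is a plain numeric matrix. The name summary_stat_named is hypothetical.

```r
# Variant of summary_stat: a named list of functions gives labelled rows,
# and vapply guarantees one numeric value per statistic
summary_stat_named <- function(column) {
  funs <- list(mean = mean, median = median, sd = sd, IQR = IQR)
  vapply(funs, function(f) round(f(column, na.rm = TRUE), 2), numeric(1))
}

sapply(cars, summary_stat_named)
##        speed  dist
## mean   15.40 42.98
## median 15.00 36.00
## sd      5.29 25.77
## IQR     7.00 30.00
```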

Second example: expanding a polynomial

You may remember from high school math classes how to expand a polynomial. Let me refresh your memory with this example. Suppose we have this expression:

(x + y)^2

We expand it by applying the rule we memorized for squaring a binomial, which is "the first squared, plus two times the first times the second, plus the second squared", to produce this expression:

x^2 + 2*x*y + y^2

But what if we could write a function whose input is the degree of the polynomial and whose output is the expanded form, returned as a function? This approach lets us not only expand the expression but also use it like any other function. This is not just theoretical: I developed such a function and used it in bezieR, a package specialized in analyzing Bézier curves. You can read the post about it here.

make_terms <- function(n) {
  n <- n + 1        # a polynomial of degree n has n + 1 terms
  cp <- rep(1, n)   # multiplier for each term (all 1 here)
  # Pascal's triangle
  # Pre-calculated coefficients, an optimization that avoids the factorial operation
  lut <- list(c(1),                        # degree 0
              c(1, 1),                     # degree 1
              c(1, 2, 1),                  # degree 2
              c(1, 3, 3, 1),               # degree 3
              c(1, 4, 6, 4, 1),            # degree 4
              c(1, 5, 10, 10, 5, 1),       # degree 5
              c(1, 6, 15, 20, 15, 6, 1))   # degree 6
  if (length(cp) != n) {
    stop("you must provide number of terms with their control points coordinates")
  }
  # Beyond the pre-calculated table, compute the coefficients directly
  if (n > length(lut)) lut[[n]] <- choose(n - 1, 0:(n - 1))
  trms <- rep(NA, n)
  for (i in 1:n) {
    trms[i] <- paste0(cp[i], "*", lut[[n]][i], "*", "x^", (n - i), "*y^", i - 1)
  }
  eq_str <- paste0(trms, collapse = " +")
  # Return a closure: it evaluates the expansion when x and y are given, and prints it otherwise
  function(x, y, equation = eq_str) {
    if (!missing(x) && !missing(y)) {
      expr <- str2lang(eq_str)
      eval(expr)
    } else {
      print(equation)
    }
  }
}

The above code is a higher-order function that creates the expanded form as a new function. To expand the previous expression, which has degree two, we can simply use the following code:

eq_2 <- make_terms(2)
eq_2()
## [1] "1*1*x^2*y^0 +1*2*x^1*y^1 +1*1*x^0*y^2"

# We can also substitute values for the variables x and y and easily find the result
eq_2(x = 2, y = 3)
## [1] 25

# You can check the result yourself: (2 + 3)^2 = 25
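As an aside, the same idea can be sketched without building and parsing a string. The hypothetical make_binomial below is not the bezieR implementation; it is only an illustrative alternative that keeps the binomial coefficients in a closure and evaluates the sum directly.

```r
# Hypothetical alternative: builds the expansion of (x + y)^n as a closure,
# using choose() for the binomial coefficients
make_binomial <- function(n) {
  k <- 0:n
  coefs <- choose(n, k)
  function(x, y) sum(coefs * x^(n - k) * y^k)
}

binom_2 <- make_binomial(2)
binom_2(x = 2, y = 3)   # 4 + 12 + 9 = 25
(2 + 3)^2               # the unexpanded form, for comparison
```

This version gives up the printable equation in exchange for avoiding str2lang and eval.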

Third example: differentiation

As in the previous example, we can also write a function that takes any formula as input and returns another function, the derivative of the input. This expressive power is one of the things that sets R apart from many other programming languages.

library("mosaicCalc")

# Let's differentiate a simple function
D(x^3 + x^2 + x ~ x)
## function (x)
## 3 * x^2 + 2 * x + 1

dx <- D(x^3 + x^2 + x ~ x)
dx(x = 2)
## [1] 17
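A similar result can be sketched with base R alone, assuming only the stats package that ships with R. stats::D() differentiates an unevaluated expression symbolically; the wrapper make_derivative and the name dx_base are hypothetical helpers for this illustration. The stats:: prefix is spelled out because mosaicCalc provides its own D().

```r
# Hypothetical helper built on stats::D(), which returns the derivative
# of an unevaluated expression as another unevaluated expression
make_derivative <- function(expr, var = "x") {
  d_expr <- stats::D(expr, var)
  # The returned closure evaluates the derivative at a given value
  function(value) eval(d_expr, stats::setNames(list(value), var))
}

dx_base <- make_derivative(quote(x^3 + x^2 + x))
dx_base(2)   # 3 * 2^2 + 2 * 2 + 1 = 17
```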
