# [ACCEPTED]-R Function for returning ALL factors-factorization

Score: 26

To follow up on my comment (thanks to @Ramnath 3 for my typo), the brute force method seems 2 to work reasonably well here on my 64 bit 1 8 gig machine:

``````FUN <- function(x) {
x <- as.integer(x)
div <- seq_len(abs(x))
factors <- div[x %% div == 0L]
factors <- list(neg = -factors, pos = factors)
return(factors)
}
``````

A few examples:

``````> FUN(100)
\$neg
   -1   -2   -4   -5  -10  -20  -25  -50 -100

\$pos
   1   2   4   5  10  20  25  50 100

> FUN(-42)
\$neg
  -1  -2  -3  -6  -7 -14 -21 -42

\$pos
  1  2  3  6  7 14 21 42

#and big number

> system.time(FUN(1e8))
user  system elapsed
1.95    0.18    2.14
``````
Score: 15

You can get all factors from the prime factors. `gmp` calculates 1 these very quickly.

``````library(gmp)
library(plyr)

get_all_factors <- function(n)
{
prime_factor_tables <- lapply(
setNames(n, n),
function(i)
{
if(i == 1) return(data.frame(x = 1L, freq = 1L))
plyr::count(as.integer(gmp::factorize(i)))
}
)
lapply(
prime_factor_tables,
function(pft)
{
powers <- plyr::alply(pft, 1, function(row) row\$x ^ seq.int(0L, row\$freq))
power_grid <- do.call(expand.grid, powers)
sort(unique(apply(power_grid, 1, prod)))
}
)
}

get_all_factors(c(1, 7, 60, 663, 2520, 75600, 15876000, 174636000, 403409160000))
``````
Score: 9

# Update

This is now implemented in the package `RcppBigIntAlgos`. See 61 this answer for more details.

### Original Post

The algorithm has been 60 fully updated and now implements multiple 59 polynomials as well as some clever sieving 58 techniques that eliminates millions of checks. In 57 addition to the original links, this paper along with 56 this post from primo were very helpful for this last stage 55 (many kudos to primo). Primo does a great 54 job of explaining the guts of the QS in 53 a relatively short space and also wrote 52 a pretty amazing algorithm (it will factor 51 the number at the bottom, 38! + 1, in under 50 2 secs!! Insane!!).

As promised, below is 49 my humble R implementation of the Quadratic Sieve. I have 48 been working on this algorithm sporadically 47 since I promised it in late January. I 46 will not try to explain it fully (unless 45 requested... also, the links below do a 44 very good job) as it is very complicated 43 and hopefully, my function names speak for 42 themselves. This has proved to be one of 41 the most challenging algorithms I have ever 40 attempted to execute as it is demanding 39 both from a programmer's point of view as 38 well as mathematically. I have read countless 37 papers and ultimately, I found these five 36 to be the most helpful (QSieve1, QSieve2, QSieve3, QSieve4, QSieve5).

N.B. This 35 algorithm, as it stands, does not serve 34 very well as a general prime factorization 33 algorithm. If it was optimized further, it 32 would need to be accompanied by a section 31 of code that factors out smaller primes 30 (i.e. less than 10^5 as suggested by this post), then 29 call QuadSieveAll, check to see if these 28 are primes, and if not, call QuadSieveAll 27 on both of these factors, etc. until you 26 are left with all primes (all of these steps 25 are not that difficult). However, the main 24 point of this post is to highlight the heart 23 of the Quadratic Sieve, so the examples 22 below are all semiprimes (even though it 21 will factor most odd numbers not containing 20 a square… Also, I haven’t seen an example 19 of the QS that didn’t demonstrate a non-semiprime). I 18 know the OP was looking for a method to 17 return all factors and not the prime factorization, but 16 this algorithm (if optimized further) coupled 15 with one of the algorithms above would be 14 a force to reckon with as a general factoring 13 algorithm (especially given that the OP 12 was needing something for Project Euler, which usually 11 requires much more than brute force methods). By 10 the way, the `MyIntToBit` function is a variation of 9 this answer and the `PrimeSieve` is from a post that @Dontas 8 appeared on a while back (Kudos on that 7 as well).

``````QuadSieveMultiPolysAll <- function(MyN, fudge1=0L, fudge2=0L, LenB=0L) {
### 'MyN' is the number to be factored; 'fudge1' is an arbitrary number
### that is used to determine the size of your prime base for sieving;
### 'fudge2' is used to set a threshold for sieving;
### 'LenB' is a the size of the sieving interval. The last three
### arguments are optional (they are determined based off of the
### size of MyN if left blank)

### The first 8 functions are helper functions

PrimeSieve <- function(n) {
n <- as.integer(n)
if (n > 1e9) stop("n too large")
primes <- rep(TRUE, n)
primes <- FALSE
last.prime <- 2L
fsqr <- floor(sqrt(n))
while (last.prime <= fsqr) {
primes[seq.int(last.prime^2, n, last.prime)] <- FALSE
sel <- which(primes[(last.prime + 1):(fsqr + 1)])
if (any(sel)) {
last.prime <- last.prime + min(sel)
} else {
last.prime <- fsqr + 1
}
}
MyPs <- which(primes)
rm(primes)
gc()
MyPs
}

MyIntToBit <- function(x, dig) {
i <- 0L
string <- numeric(dig)
while (x > 0) {
string[dig - i] <- x %% 2L
x <- x %/% 2L
i <- i + 1L
}
string
}

ExpBySquaringBig <- function(x, n, p) {
if (n == 1) {
MyAns <- mod.bigz(x,p)
} else if (mod.bigz(n,2)==0) {
MyAns <- ExpBySquaringBig(mod.bigz(pow.bigz(x,2),p),div.bigz(n,2),p)
} else {
MyAns <- mod.bigz(mul.bigz(x,ExpBySquaringBig(mod.bigz(
pow.bigz(x,2),p), div.bigz(sub.bigz(n,1),2),p)),p)
}
MyAns
}

TonelliShanks <- function(a,p) {
P1 <- sub.bigz(p,1); j <- 0L; s <- P1
while (mod.bigz(s,2)==0L) {s <- s/2; j <- j+1L}
if (j==1L) {
MyAns1 <- ExpBySquaringBig(a,(p+1L)/4,p)
MyAns2 <- mod.bigz(-1 * ExpBySquaringBig(a,(p+1L)/4,p),p)
} else {
n <- 2L
Legendre2 <- ExpBySquaringBig(n,P1/2,p)
while (Legendre2==1L) {n <- n+1L; Legendre2 <- ExpBySquaringBig(n,P1/2,p)}
x <- ExpBySquaringBig(a,(s+1L)/2,p)
b <- ExpBySquaringBig(a,s,p)
g <- ExpBySquaringBig(n,s,p)
r <- j; m <- 1L
Test <- mod.bigz(b,p)
while (!(Test==1L) && !(m==0L)) {
m <- 0L
Test <- mod.bigz(b,p)
while (!(Test==1L)) {m <- m+1L; Test <- ExpBySquaringBig(b,pow.bigz(2,m),p)}
if (!m==0) {
x <- mod.bigz(x * ExpBySquaringBig(g,pow.bigz(2,r-m-1L),p),p)
g <- ExpBySquaringBig(g,pow.bigz(2,r-m),p)
b <- mod.bigz(b*g,p); r <- m
}; Test <- 0L
}; MyAns1 <- x; MyAns2 <- mod.bigz(p-x,p)
}
c(MyAns1, MyAns2)
}

SieveLists <- function(facLim, FBase, vecLen, sieveD, MInt) {
vLen <- ceiling(vecLen/2); SecondHalf <- (vLen+1L):vecLen
MInt1 <- MInt[1:vLen]; MInt2 <- MInt[SecondHalf]
tl <- vector("list",length=facLim)

for (m in 3:facLim) {
st1 <- mod.bigz(MInt1,FBase[m])
m1 <- 1L+as.integer(mod.bigz(sieveD[[m]] - st1,FBase[m]))
m2 <- 1L+as.integer(mod.bigz(sieveD[[m]] - st1,FBase[m]))
sl1 <- seq.int(m1,vLen,FBase[m])
sl2 <- seq.int(m2,vLen,FBase[m])
tl1 <- list(sl1,sl2)
st2 <- mod.bigz(MInt2,FBase[m])
m3 <- vLen+1L+as.integer(mod.bigz(sieveD[[m]] - st2,FBase[m]))
m4 <- vLen+1L+as.integer(mod.bigz(sieveD[[m]] - st2,FBase[m]))
sl3 <- seq.int(m3,vecLen,FBase[m])
sl4 <- seq.int(m4,vecLen,FBase[m])
tl2 <- list(sl3,sl4)
tl[[m]] <- list(tl1,tl2)
}
tl
}

SieverMod <- function(facLim, FBase, vecLen, SD, MInt, FList, LogFB, Lim, myCol) {

MyLogs <- rep(0,nrow(SD))

for (m in 3:facLim) {
MyBool <- rep(FALSE,vecLen)
MyBool[c(FList[[m]][][],FList[[m]][][])] <- TRUE
MyBool[c(FList[[m]][][],FList[[m]][][])] <- TRUE
temp <- which(MyBool)
MyLogs[temp] <- MyLogs[temp] + LogFB[m]
}

MySieve <- which(MyLogs > Lim)
MInt <- MInt[MySieve]; NewSD <- SD[MySieve,]
newLen <- length(MySieve); GoForIT <- FALSE

MyMat <- matrix(integer(0),nrow=newLen,ncol=myCol)
MyMat[which(NewSD[,1L] < 0),1L] <- 1L; MyMat[which(NewSD[,1L] > 0),1L] <- 0L
if ((myCol-1L) - (facLim+1L) > 0L) {MyMat[,((facLim+2L):(myCol-1L))] <- 0L}
if (newLen==1L) {MyMat <- matrix(MyMat,nrow=1,byrow=TRUE)}

if (newLen > 0L) {
GoForIT <- TRUE
for (m in 1:facLim) {
vec <- rep(0L,newLen)
temp <- which((NewSD[,1L]%%FBase[m])==0L)
NewSD[temp,] <- NewSD[temp,]/FBase[m]; vec[temp] <- 1L
test <- temp[which((NewSD[temp,]%%FBase[m])==0L)]
while (length(test)>0L) {
NewSD[test,] <- NewSD[test,]/FBase[m]
vec[test] <- (vec[test]+1L)
test <- test[which((NewSD[test,]%%FBase[m])==0L)]
}
MyMat[,m+1L] <- vec
}
}

list(MyMat,NewSD,MInt,GoForIT)
}

reduceMatrix <- function(mat) {
tempMin <- 0L; n1 <- ncol(mat); n2 <- nrow(mat)
mymax <- 1L
for (i in 1:n1) {
temp <- which(mat[,i]==1L)
t <- which(temp >= mymax)
if (length(temp)>0L && length(t)>0L) {
MyMin <- min(temp[t])
if (!(MyMin==mymax)) {
vec <- mat[MyMin,]
mat[MyMin,] <- mat[mymax,]
mat[mymax,] <- vec
}
t <- t[-1]; temp <- temp[t]
for (j in temp) {mat[j,] <- (mat[j,]+mat[mymax,])%%2L}
mymax <- mymax+1L
}
}

if (mymax<n2) {simpMat <- mat[-(mymax:n2),]} else {simpMat <- mat}
lenSimp <- nrow(simpMat)
if (is.null(lenSimp)) {lenSimp <- 0L}
mycols <- 1:n1

if (lenSimp>1L) {
## "Diagonalizing" Matrix
for (i in 1:lenSimp) {
if (all(simpMat[i,]==0L)) {simpMat <- simpMat[-i,]; next}
if (!simpMat[i,i]==1L) {
t <- min(which(simpMat[i,]==1L))
vec <- simpMat[,i]; tempCol <- mycols[i]
simpMat[,i] <- simpMat[,t]; mycols[i] <- mycols[t]
simpMat[,t] <- vec; mycols[t] <- tempCol
}
}

lenSimp <- nrow(simpMat); MyList <- vector("list",length=n1)
MyFree <- mycols[which((1:n1)>lenSimp)];  for (i in MyFree) {MyList[[i]] <- i}
if (is.null(lenSimp)) {lenSimp <- 0L}

if (lenSimp>1L) {
for (i in lenSimp:1L) {
t <- which(simpMat[i,]==1L)
if (length(t)==1L) {
simpMat[ ,t] <- 0L
MyList[[mycols[i]]] <- 0L
} else {
t1 <- t[t>i]
if (all(t1 > lenSimp)) {
MyList[[mycols[i]]] <- MyList[[mycols[t1]]]
if (length(t1)>1) {
for (j in 2:length(t1)) {MyList[[mycols[i]]] <- c(MyList[[mycols[i]]], MyList[[mycols[t1[j]]]])}
}
}
else {
for (j in t1) {
if (length(MyList[[mycols[i]]])==0L) {MyList[[mycols[i]]] <- MyList[[mycols[j]]]}
else {
e1 <- which(MyList[[mycols[i]]]%in%MyList[[mycols[j]]])
if (length(e1)==0) {
MyList[[mycols[i]]] <- c(MyList[[mycols[i]]],MyList[[mycols[j]]])
} else {
e2 <- which(!MyList[[mycols[j]]]%in%MyList[[mycols[i]]])
MyList[[mycols[i]]] <- MyList[[mycols[i]]][-e1]
if (length(e2)>0L) {MyList[[mycols[i]]] <- c(MyList[[mycols[i]]], MyList[[mycols[j]]][e2])}
}
}
}
}
}
}
TheList <- lapply(MyList, function(x) {if (length(x)==0L) {0} else {x}})
list(TheList,MyFree)
} else {
list(NULL,NULL)
}
} else {
list(NULL,NULL)
}
}

GetFacs <- function(vec1, vec2, n) {
x <- mod.bigz(prod.bigz(vec1),n)
y <- mod.bigz(prod.bigz(vec2),n)
MyAns <- c(gcd.bigz(x-y,n),gcd.bigz(x+y,n))
MyAns[sort.list(asNumeric(MyAns))]
}

SolutionSearch <- function(mymat, M2, n, FB) {

colTest <- which(apply(mymat, 2, sum) == 0)
if (length(colTest) > 0) {solmat <- mymat[ ,-colTest]} else {solmat <- mymat}

if (length(nrow(solmat)) > 0) {
nullMat <- reduceMatrix(t(solmat %% 2L))
listSol <- nullMat[]; freeVar <- nullMat[]; LF <- length(freeVar)
} else {LF <- 0L}

if (LF > 0L) {
for (i in 2:min(10^8,(2^LF + 1L))) {
PosAns <- MyIntToBit(i, LF)
posVec <- sapply(listSol, function(x) {
t <- which(freeVar %in% x)
if (length(t)==0L) {
0
} else {
sum(PosAns[t])%%2L
}
})
ansVec <- which(posVec==1L)
if (length(ansVec)>0) {

if (length(ansVec) > 1L) {
myY <- apply(mymat[ansVec,],2,sum)
} else {
myY <- mymat[ansVec,]
}

if (sum(myY %% 2) < 1) {
myY <- as.integer(myY/2)
myY <- pow.bigz(FB,myY[-1])
temp <- GetFacs(M2[ansVec], myY, n)
if (!(1==temp) && !(1==temp)) {
return(temp)
}
}
}
}
}
}

### Below is the main portion of the Quadratic Sieve

BegTime <- Sys.time(); MyNum <- as.bigz(MyN); DigCount <- nchar(as.character(MyN))
P <- PrimeSieve(10^5)
SqrtInt <- .mpfr2bigz(trunc(sqrt(mpfr(MyNum,sizeinbase(MyNum,b=2)+5L))))

if (DigCount < 24) {
DigSize <- c(4,10,15,20,23)
f_Pos <- c(0.5,0.25,0.15,0.1,0.05)
MSize <- c(5000,7000,10000,12500,15000)

if (fudge1==0L) {
LM1 <- lm(f_Pos ~ DigSize)
m1 <- summary(LM1)\$coefficients[2,1]
b1 <- summary(LM1)\$coefficients[1,1]
fudge1 <- DigCount*m1 + b1
}

if (LenB==0L) {
LM2 <- lm(MSize ~ DigSize)
m2 <- summary(LM2)\$coefficients[2,1]
b2 <- summary(LM2)\$coefficients[1,1]
LenB <- ceiling(DigCount*m2 + b2)
}

LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
B <- P[P<=LimB]; B <- B[-1]
facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L]
LenFBase <- length(facBase)+1L
} else if (DigCount < 67) {
## These values were obtained from "The Multiple Polynomial
## Quadratic Sieve" by Robert D. Silverman
DigSize <- c(24,30,36,42,48,54,60,66)
FBSize <- c(100,200,400,900,1200,2000,3000,4500)
MSize <- c(5,25,25,50,100,250,350,500)

LM1 <- loess(FBSize ~ DigSize)
LM2 <- loess(MSize ~ DigSize)

if (fudge1==0L) {
fudge1 <- -0.4
LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
myTarget <- ceiling(predict(LM1, DigCount))

while (LimB < myTarget) {
LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
fudge1 <- fudge1+0.001
}
B <- P[P<=LimB]; B <- B[-1]
facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L]
LenFBase <- length(facBase)+1L

while (LenFBase < myTarget) {
fudge1 <- fudge1+0.005
LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
myind <- which(P==max(B))+1L
myset <- tempP <- P[myind]
while (tempP < LimB) {
myind <- myind + 1L
tempP <- P[myind]
myset <- c(myset, tempP)
}

for (p in myset) {
t <- ExpBySquaringBig(MyNum,(p-1)/2,p)==1L
if (t) {facBase <- c(facBase,p)}
}
B <- c(B, myset)
LenFBase <- length(facBase)+1L
}
} else {
LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
B <- P[P<=LimB]; B <- B[-1]
facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L]
LenFBase <- length(facBase)+1L
}
if (LenB==0L) {LenB <- 1000*ceiling(predict(LM2, DigCount))}
} else {
return("The number you've entered is currently too big for this algorithm!!")
}

SieveDist <- lapply(facBase, function(x) TonelliShanks(MyNum,x))
SieveDist <- c(1L,SieveDist); SieveDist[] <- c(SieveDist[],1L); facBase <- c(2L,facBase)
Lower <- -LenB; Upper <- LenB; LenB2 <- 2*LenB+1L; MyInterval <- Lower:Upper
M <- MyInterval + SqrtInt ## Set that will be tested
SqrDiff <- matrix(sub.bigz(pow.bigz(M,2),MyNum),nrow=length(M),ncol=1L)
maxM <- max(MyInterval)
LnFB <- log(facBase)

## N.B. primo uses 0.735, as his siever
## is more efficient than the one employed here
if (fudge2==0L) {
if (DigCount < 8) {
fudge2 <- 0
} else if (DigCount < 12) {
fudge2 <- .7
} else if (DigCount < 20) {
fudge2 <- 1.3
} else {
fudge2 <- 1.6
}
}

TheCut <- log10(maxM*sqrt(2*asNumeric(MyNum)))*fudge2
myPrimes <- as.bigz(facBase)

CoolList <- SieveLists(LenFBase, facBase, LenB2, SieveDist, MyInterval)
GetMatrix <- SieverMod(LenFBase, facBase, LenB2, SqrDiff, M, CoolList, LnFB, TheCut, LenFBase+1L)

if (GetMatrix[]) {
newmat <- GetMatrix[]; NewSD <- GetMatrix[]; M <- GetMatrix[]
NonSplitFacs <- which(abs(NewSD[,1L])>1L)
newmat <- newmat[-NonSplitFacs, ]
M <- M[-NonSplitFacs]
lenM <- length(M)

if (class(newmat) == "matrix") {
if (nrow(newmat) > 0) {
PosAns <- SolutionSearch(newmat,M,MyNum,myPrimes)
} else {
PosAns <- vector()
}
} else {
newmat <- matrix(newmat, nrow = 1)
PosAns <- vector()
}
} else {
newmat <- matrix(integer(0),ncol=(LenFBase+1L))
PosAns <- vector()
}

Atemp <- .mpfr2bigz(trunc(sqrt(sqrt(mpfr(2*MyNum))/maxM)))
if (Atemp < max(facBase)) {Atemp <- max(facBase)}; myPoly <- 0L

while (length(PosAns)==0L) {LegTest <- TRUE
while (LegTest) {
Atemp <- nextprime(Atemp)
Legendre <- asNumeric(ExpBySquaringBig(MyNum,(Atemp-1L)/2,Atemp))
if (Legendre == 1) {LegTest <- FALSE}
}

A <- Atemp^2
Btemp <- max(TonelliShanks(MyNum, Atemp))
B2 <- (Btemp + (MyNum - Btemp^2) * inv.bigz(2*Btemp,Atemp))%%A
C <- as.bigz((B2^2 - MyNum)/A)
myPoly <- myPoly + 1L

polySieveD <- lapply(1:LenFBase, function(x) {
AInv <- inv.bigz(A,facBase[x])
asNumeric(c(((SieveDist[[x]]-B2)*AInv)%%facBase[x],
((SieveDist[[x]]-B2)*AInv)%%facBase[x]))
})

M1 <- A*MyInterval + B2
SqrDiff <- matrix(A*pow.bigz(MyInterval,2) + 2*B2*MyInterval + C,nrow=length(M1),ncol=1L)
CoolList <- SieveLists(LenFBase, facBase, LenB2, polySieveD, MyInterval)
myPrimes <- c(myPrimes,Atemp)
LenP <- length(myPrimes)
GetMatrix <- SieverMod(LenFBase, facBase, LenB2, SqrDiff, M1, CoolList, LnFB, TheCut, LenP+1L)

if (GetMatrix[]) {
n2mat <- GetMatrix[]; N2SD <- GetMatrix[]; M1 <- GetMatrix[]
n2mat[,LenP+1L] <- rep(2L,nrow(N2SD))
if (length(N2SD) > 0) {NonSplitFacs <- which(abs(N2SD[,1L])>1L)} else {NonSplitFacs <- LenB2}
if (length(NonSplitFacs)<2*LenB) {
M1 <- M1[-NonSplitFacs]; lenM1 <- length(M1)
n2mat <- n2mat[-NonSplitFacs,]
if (lenM1==1L) {n2mat <- matrix(n2mat,nrow=1)}
if (ncol(newmat) < (LenP+1L)) {
numCol <- (LenP + 1L) - ncol(newmat)
newmat <-     cbind(newmat,matrix(rep(0L,numCol*nrow(newmat)),ncol=numCol))
}
newmat <- rbind(newmat,n2mat); lenM <- lenM+lenM1; M <- c(M,M1)
if (class(newmat) == "matrix") {
if (nrow(newmat) > 0) {
PosAns <- SolutionSearch(newmat,M,MyNum,myPrimes)
}
}
}
}
}

EndTime <- Sys.time()
TotTime <- EndTime - BegTime
print(format(TotTime))
return(PosAns)
}
``````

With Old QS algorithm

``````> library(gmp)
> library(Rmpfr)

> n3 <- prod(nextprime(urand.bigz(2, 40, 17)))
user  system elapsed
164.72    0.77  165.63
> system.time(t6 <- factorize(n3))
user  system elapsed
0.1     0.0     0.1
> all(t5[sort.list(asNumeric(t5))]==t6[sort.list(asNumeric(t6))])
 TRUE
``````

With New Muli-Polynomial QS algorithm

``````> QuadSieveMultiPolysAll(n3)
 "4.952 secs"
Big Integer ('bigz') object of length 2:
 342086446909 483830424611

> n4 <- prod(nextprime(urand.bigz(2,50,5)))
> QuadSieveMultiPolysAll(n4)   ## With old algo, it took over 4 hours
 "1.131717 mins"
Big Integer ('bigz') object of length 2:
 166543958545561 880194119571287

> n5 <- as.bigz("94968915845307373740134800567566911")   ## 35 digits
 "3.813167 mins"
Big Integer ('bigz') object of length 2:
 216366620575959221 438925910071081891

> system.time(factorize(n5))   ## It appears we are reaching the limits of factorize
user  system elapsed
131.97    0.00  131.98
``````

Side note: The number n5 above is 6 a very interesting number. Check it out 5 here

The Breaking Point!!!!

``````> n6 <- factorialZ(38) + 1L   ## 45 digits
 "22.79092 mins"
Big Integer ('bigz') object of length 2:
 14029308060317546154181 37280713718589679646221

> system.time(factorize(n6))   ## Shut it down after 2 days of running
``````

Latest Triumph (50 digits)

``````> n9 <- prod(nextprime(urand.bigz(2,82,42)))
 "12.9297 hours"
Big Integer ('bigz') object of length 2:
 2128750292720207278230259 4721136619794898059404993

## Based off of some crude test, factorize(n9) would take more than a year.
``````

It should be noted that the QS generally 4 doesn't perform as well as the Pollard's 3 rho algorithm on smaller numbers and the 2 power of the QS starts to become apparent 1 as the numbers get larger.

Score: 7

# Major Update

Below is my latest R factorization algorithm. It 44 is way faster and pays homage to the rle function.

Algorithm 3 (Updated)

``````library(gmp)
MyFactors <- function(MyN) {
myRle <- function (x1) {
n1 <- length(x1)
y1 <- x1[-1L] != x1[-n1]
i <- c(which(y1), n1)
list(lengths = diff(c(0L, i)), values = x1[i], uni = sum(y1)+1L)
}

if (MyN==1L) return(MyN)
else {
pfacs <- myRle(factorize(MyN))
unip <- pfacs\$values
pv <- pfacs\$lengths
n <- pfacs\$uni
myf <- unip[1L]^(0L:pv[1L])
if (n > 1L) {
for (j in 2L:n) {
myf <- c(myf, do.call(c,lapply(unip[j]^(1L:pv[j]), function(x) x*myf)))
}
}
}
myf[order(asNumeric(myf))]  ## 'order' is faster than 'sort.list'
}
``````

Below 43 are the new benchmarks (As Dirk Eddelbuettel 42 says here, "Can't argue with empirics."):

Case 1 (large prime factors)

``````set.seed(100)
myList <- lapply(1:10^3, function(x) sample(10^6, 10^5))
benchmark(SortList=lapply(myList, function(x) sort.list(x)),
OrderFun=lapply(myList, function(x) order(x)),
replications=3,
columns = c("test", "replications", "elapsed", "relative"))
test replications elapsed relative
2 OrderFun            3   59.41    1.000
1 SortList            3   61.52    1.036

## The times are limited by "gmp::factorize" and since it relies on
## pseudo-random numbers, the times can vary (i.e. one pseudo random
## number may lead to a factorization faster than others). With this
## in mind, any differences less than a half of second
## (or so) should be viewed as the same.
x <- pow.bigz(2,256)+1
system.time(z1 <- MyFactors(x))
user  system elapsed
14.94    0.00   14.94
system.time(z2 <- all_divisors(x))      ## system.time(factorize(x))
user  system elapsed                    ##  user  system elapsed
14.94    0.00   14.96                   ## 14.94    0.00   14.94
all(z1==z2)
 TRUE

x <- as.bigz("12345678987654321321")
system.time(x1 <- MyFactors(x^2))
user  system elapsed
20.66    0.02   20.71
system.time(x2 <- all_divisors(x^2))    ## system.time(factorize(x^2))
user  system elapsed                    ##  user  system elapsed
20.69    0.00   20.69                   ## 20.67    0.00   20.67
all(x1==x2)
 TRUE
``````

Case 2 (smaller numbers)

``````set.seed(199)
samp <- sample(10^9, 10^5)
benchmark(JosephDivs=sapply(samp, MyFactors),
DontasDivs=sapply(samp, all_divisors),
OldDontas=sapply(samp, Oldall_divisors),
replications=10,
columns = c("test", "replications", "elapsed", "relative"),
order = "relative")
test replications elapsed relative
1 JosephDivs           10  470.31    1.000
2 DontasDivs           10  567.10    1.206  ## with vapply(..., USE.NAMES = FALSE)
3  OldDontas           10  626.19    1.331  ## with sapply
``````

Case 3 (for complete thoroughness)

``````set.seed(97)
samp <- sample(10^6, 10^4)
benchmark(JosephDivs=sapply(samp, MyFactors),
DontasDivs=sapply(samp, all_divisors),
CottonDivs=sapply(samp, get_all_factors),
ChaseDivs=sapply(samp, FUN),
replications=5,
columns = c("test", "replications", "elapsed", "relative"),
order = "relative")
test replications elapsed relative
1 JosephDivs            5   22.68    1.000
2 DontasDivs            5   27.66    1.220
3 CottonDivs            5  126.66    5.585
4  ChaseDivs            5  554.25   24.438
``````

# Original Post

The algorithm by @RichieCotton 41 is a very nice R implementation. The brute 40 force method will only get you so far and 39 fails with large numbers. I have provided 38 three algorithms that will meet different 37 needs. The first one (is the original algorithm 36 I posted in Jan 15 and has been updated 35 slightly), is a stand-alone factorization 34 algorithm which offers a combinatorial approach 33 that is efficient, accurate, and can be 32 easily translated into other languages. The 31 second algorithm is more of a sieve that 30 is very fast and extremely useful when you 29 need the factorization of thousands of numbers 28 quickly. The third is a short (posted above), yet 27 powerful stand-alone algorithm that is superior 26 for any number less than 2^70 (I scrapped 25 almost everything from my original code). I 24 drew inspiration from Richie Cotton's use 23 of the `plyr::count` function (it inspired me to write 22 my own `rle` function that has a very similar 21 return as `plyr::count`), George Dontas's clean way of 20 handling the trivial case (i.e. `if (n==1) return(1)`), and the 19 solution provided by @Zelazny7 to a question I had 18 regarding bigz vectors.

Algorithm 1 (original)

``````library(gmp)
factor2 <- function(MyN) {
if (MyN == 1) return(1L)
else {
max_p_div <- factorize(MyN)
prime_vec <- max_p_div <- max_p_div[sort.list(asNumeric(max_p_div))]
my_factors <- powers <- as.bigz(vector())
uni_p <- unique(prime_vec); maxp <- max(prime_vec)
for (i in 1:length(uni_p)) {
temp_size <- length(which(prime_vec == uni_p[i]))
powers <- c(powers, pow.bigz(uni_p[i], 1:temp_size))
}
my_factors <- c(as.bigz(1L), my_factors, powers)
temp_facs <- powers; r <- 2L
temp_facs2 <- max_p_div2 <- as.bigz(vector())
while (r <= length(uni_p)) {
for (i in 1:length(temp_facs)) {
a <- which(prime_vec >  max_p_div[i])
temp <- mul.bigz(temp_facs[i], powers[a])
temp_facs2 <- c(temp_facs2, temp)
max_p_div2 <- c(max_p_div2, prime_vec[a])
}
my_sort <- sort.list(asNumeric(max_p_div2))
temp_facs <- temp_facs2[my_sort]
max_p_div <- max_p_div2[my_sort]
my_factors <- c(my_factors, temp_facs)
temp_facs2 <- max_p_div2 <- as.bigz(vector()); r <- r+1L
}
}
my_factors[sort.list(asNumeric(my_factors))]
}
``````

Algorithm 2 (sieve)

``````EfficientFactorList <- function(n) {
MyFactsList <- lapply(1:n, function(x) 1)
for (j in 2:n) {
for (r in seq.int(j, n, j)) {MyFactsList[[r]] <- c(MyFactsList[[r]], j)}
}; MyFactsList}
``````

It gives the factorization 17 of every number between 1 and 100,000 in 16 less than 2 seconds. To give you an idea 15 of the efficiency of this algorithm, the 14 time to factor 1 - 100,000 using the brute 13 force method takes close to 3 minutes.

``````system.time(t1 <- EfficientFactorList(10^5))
user  system elapsed
1.04    0.00    1.05
system.time(t2 <- sapply(1:10^5, MyFactors))
user  system elapsed
39.21    0.00   39.23
system.time(t3 <- sapply(1:10^5, all_divisors))
user  system elapsed
49.03    0.02   49.05

TheTest <- sapply(1:10^5, function(x) all(t2[[x]]==t3[[x]]) && all(asNumeric(t2[[x]])==t1[[x]]) && all(asNumeric(t3[[x]])==t1[[x]]))
all(TheTest)
 TRUE
``````

Final Thoughts

@Dontas’s 12 original comment about factoring large numbers 11 got me thinking, what about really really large 10 numbers… like numbers greater than 2^200. You 9 will see that whichever algorithm you choose 8 on this page, they will all take a very 7 long time because most of them rely on `gmp::factorize` which 6 uses the Pollard-Rho algorithm. From this question, this algorithm is 5 only reasonable for numbers less than 2^70. I 4 am currently working on my own factorize algorithm 3 which will implement the Quadratic Sieve, which should 2 take all of these algorithms to the next 1 level.

Score: 7

The following approach deliver correct results, even 4 in cases of really big numbers (which should 3 be passed as strings). And it's really fast.

``````# TEST
# x <- as.bigz("12345678987654321")
# all_divisors(x)
# all_divisors(x*x)

# x <- pow.bigz(2,89)-1
# all_divisors(x)

library(gmp)
options(scipen =30)

sort_listz <- function(z) {
#==========================
z <- z[order(as.numeric(z))] # sort(z)
} # function  sort_listz

mult_listz <- function(x,y) {
do.call('c', lapply(y, function(i) i*x))
}

all_divisors <- function(x) {
#==========================
if (abs(x)<=1) return(x)
else {

factorsz <- as.bigz(factorize(as.bigz(x))) # factorize returns up to
# e.g. x= 12345678987654321  factors: 3 3 3 3 37 37 333667 333667

factorsz <- sort_listz(factorsz) # vector of primes, sorted

prime_factorsz <- unique(factorsz)
#prime_ekt <- sapply(prime_factorsz, function(i) length( factorsz [factorsz==i]))
prime_ekt <- vapply(prime_factorsz, function(i) sum(factorsz==i), integer(1), USE.NAMES=FALSE)
spz <- vector() # keep all divisors
all <-1
n <- length(prime_factorsz)
for (i in 1:n) {
pr <- prime_factorsz[i]
pe <- prime_ekt[i]
all <- all*(pe+1) #counts all divisors

prz <- as.bigz(pr)
pse <- vector(mode="raw",length=pe+1)
pse <- c( as.bigz(1), prz)

if (pe>1) {
for (k in 2:pe) {
prz <- prz*pr
pse[k+1] <- prz
} # for k
} # if pe>1

if (i>1) {
spz <- mult_listz (spz, pse)
} else {
spz <- pse;
} # if i>1
} #for n
spz <- sort_listz (spz)

return (spz)
}
} # function  factors_all_divisors

#====================================
``````

Refined 2 version, very fast. Code remains simple, readable 1 & clean.

TEST

``````#Test 4 (big prime factor)
x <- pow.bigz(2,256)+1 # = 1238926361552897 * 93461639715357977769163558199606896584051237541638188580280321
system.time(z2 <- all_divisors(x))
#   user  system elapsed
#  19.27    1.27   20.56

#Test 5 (big prime factor)
x <- as.bigz("12345678987654321321") # = 3 * 19 * 216590859432531953

system.time(x2 <- all_divisors(x^2))
#user  system elapsed
#25.65    0.00   25.67
``````
Score: 7

A lot has changed in the R language since 49 this question was originally asked. In version 48 `0.6-3` of the `numbers` package, the function `divisors` was included 47 that is very useful for getting all of the 46 factors of a number. It will meet the needs 45 of most users, however if you are looking 44 for raw speed or you are working with larger 43 numbers, you will need an alternative method. I 42 have authored two new packages (partially 41 inspired by this question, I might add) that 40 contain highly optimized functions aimed 39 at problems just like this. The first one 38 is `RcppAlgos` and the other is `RcppBigIntAlgos` (formerly called `bigIntegerAlgos`).

# `RcppAlgos`

`RcppAlgos` contains 37 two functions for obtaining divisors of 36 numbers less than `2^53 - 1` : `divisorsRcpp` (a vectorized function 35 for quickly obtaining the complete factorization 34 of many numbers) & `divisorsSieve` (quickly generates 33 the complete factorization over a range). First 32 up, we factor many random numbers using 31 `divisorsRcpp`:

``````library(gmp)  ## for all_divisors by @GeorgeDontas
library(RcppAlgos)
library(numbers)
options(scipen = 999)
set.seed(42)
testSamp <- sample(10^10, 10)

## vectorized so you can pass the entire vector as an argument
testRcpp <- divisorsRcpp(testSamp)
testDontas <- lapply(testSamp, all_divisors)

identical(lapply(testDontas, as.numeric), testRcpp)
 TRUE
``````

And now, factor many numbers over a range 30 using `divisorsSieve`:

``````system.time(testSieve <- divisorsSieve(10^13, 10^13 + 10^5))
user  system elapsed
0.242   0.006   0.247

system.time(testDontasSieve <- lapply((10^13):(10^13 + 10^5), all_divisors))
user  system elapsed
47.880   0.132  47.922

identical(lapply(testDontasSieve, asNumeric), testSieve)
 TRUE
``````

Both `divisorsRcpp` and `divisorsSieve` are nice functions that 29 are flexible and efficient, however they 28 are limited to `2^53 - 1`.

# `RcppBigIntAlgos`

The `RcppBigIntAlgos` package (formerly 27 called `bigIntegerAlgos` prior to version 0.2.0) links directly 26 to the C library gmp and features `divisorsBig` which is designed 25 for very large numbers.

``````library(RcppBigIntAlgos)
## testSamp is defined above... N.B. divisorsBig is not quite as
## efficient as divisorsRcpp. This is so because divisorsRcpp
## can take advantage of more efficient data types.
testBig <- divisorsBig(testSamp)

identical(testDontas, testBig)
 TRUE
``````

And here are the 24 benchmark as defined in my original post 23 (N.B. `MyFactors` is replaced by `divisorsRcpp` and `divisorsBig`).

``````## Case 2
library(rbenchmark)
set.seed(199)
samp <- sample(10^9, 10^5)
benchmark(RcppAlgos=divisorsRcpp(samp),
RcppBigIntAlgos=divisorsBig(samp),
DontasDivs=lapply(samp, all_divisors),
replications=10,
columns = c("test", "replications", "elapsed", "relative"),
order = "relative")

test replications elapsed relative
1       RcppAlgos           10   5.236    1.000
2 RcppBigIntAlgos           10  12.846    2.453
3      DontasDivs           10 383.742   73.289

## Case 3
set.seed(97)
samp <- sample(10^6, 10^4)
benchmark(RcppAlgos=divisorsRcpp(samp),
RcppBigIntAlgos=divisorsBig(samp),
numbers=lapply(samp, divisors),      ## From the numbers package
DontasDivs=lapply(samp, all_divisors),
CottonDivs=lapply(samp, get_all_factors),
ChaseDivs=lapply(samp, FUN),
replications=5,
columns = c("test", "replications", "elapsed", "relative"),
order = "relative")

test replications elapsed relative
1       RcppAlgos            5   0.083    1.000
2 RcppBigIntAlgos            5   0.265    3.193
3         numbers            5  12.913  155.578
4      DontasDivs            5  15.813  190.518
5      CottonDivs            5  60.745  731.867
6       ChaseDivs            5 299.520 3608.675
``````

The next benchmarks 22 demonstrate the true power of the underlying 21 algorithm in the `divisorsBig` function. The number being 20 factored is a power of `10`, so the prime factoring 19 step can almost be completely ignored (e.g. `system.time(factorize(pow.bigz(10,30)))` registers 18 `0` on my machine). Thus, the difference in 17 timing is due solely to how quickly the 16 prime factors can be combined to produce 15 all factors.

``````library(microbenchmark)
powTen <- pow.bigz(10, 30)
microbenchmark(divisorsBig(powTen), all_divisors(powTen), unit = "relative")
Unit: relative
expr      min       lq     mean   median       uq      max neval
divisorsBig(powTen)  1.00000  1.00000  1.00000  1.00000  1.00000  1.00000   100
all_divisors(powTen) 21.49849 21.27973 21.13085 20.63345 21.18834 20.38772   100

## Negative numbers show an even greater increase in efficiency
negPowTen <- powTen * -1
microbenchmark(divisorsBig(negPowTen), all_divisors(negPowTen), unit = "relative")
Unit: relative
expr      min      lq    mean   median       uq      max neval
divisorsBig(negPowTen)  1.00000  1.0000  1.0000  1.00000  1.00000  1.00000   100
all_divisors(negPowTen) 28.75275 28.1864 27.9335 27.57434 27.91376 30.16962   100
``````

## Very Large Numbers

With `divisorsBig`, obtaining the complete 14 factorization with very large inputs is 13 no problem. The algorithm dynamically adjusts 12 based off of the input and applies different 11 algorithms in different situations. We can 10 also take advantage of multithreading if 9 Lenstra's Elliptic Curve method or the Quadratic Sieve is utilized.

Here are some examples 8 using `n5` and `n9` defined in this answer.

``````n5 <- as.bigz("94968915845307373740134800567566911")
system.time(print(divisorsBig(n5)))
Big Integer ('bigz') object of length 4:
 1           216366620575959221         438925910071081891
 94968915845307373740134800567566911
user  system elapsed
0.162   0.003   0.164

n9 <- prod(nextprime(urand.bigz(2, 82, 42)))
Big Integer ('bigz') object of length 4:
 1
 2128750292720207278230259
 4721136619794898059404993
 10050120961360479179164300841596861740399588283187
user  system elapsed
1.776   0.011   0.757
``````

Here is an example 7 provided by @Dontas with one large prime 6 and one smaller prime:

``````x <- pow.bigz(2, 256) + 1

Summary Statistics for Factoring:
115792089237316195423570985008687907853269984665640564039457584007913129639937

|  Pollard Rho Time  |
|--------------------|
|        479ms       |

|  Lenstra ECM Time  |  Number of Curves  |
|--------------------|--------------------|
|      1s 870ms      |        2584        |

|     Total Time     |
|--------------------|
|      2s 402ms      |

Big Integer ('bigz') object of length 4:
 1
 1238926361552897
 93461639715357977769163558199606896584051237541638188580280321
 115792089237316195423570985008687907853269984665640564039457584007913129639937
``````

Compare this to finding 5 the prime factorization using `gmp::factorize`:

``````system.time(factorize(x))
user  system elapsed
9.199   0.036   9.248
``````

Lastly, here 4 is an example with a large semiprime (N.B. since 3 we know it's a semiprime, we skip the extended 2 Pollard's rho algorithm as well as Lentra's 1 elliptic curve method).

``````## https://members.loria.fr/PZimmermann/records/rsa.html
rsa79 <- as.bigz("7293469445285646172092483905177589838606665884410340391954917800303813280275279")

Summary Statistics for Factoring:
7293469445285646172092483905177589838606665884410340391954917800303813280275279

|      MPQS Time     | Complete | Polynomials |   Smooths  |  Partials  |
|--------------------|----------|-------------|------------|------------|
|    2m 49s 174ms    |   100    |    91221    |    5651    |    7096    |

|  Mat Algebra Time  |    Mat Dimension   |
|--------------------|--------------------|
|      14s 863ms     |    12625 x 12747   |

|     Total Time     |
|--------------------|
|     3m 4s 754ms    |

Big Integer ('bigz') object of length 4:
 1
 848184382919488993608481009313734808977
 8598919753958678882400042972133646037727
 7293469445285646172092483905177589838606665884410340391954917800303813280275279
``````

More Related questions