[ACCEPTED]-R Function for returning ALL factors-factorization
To follow up on my comment (thanks to @Ramnath 3 for my typo), the brute force method seems 2 to work reasonably well here on my 64 bit 1 8 gig machine:
FUN <- function(x) {
x <- as.integer(x)
div <- seq_len(abs(x))
factors <- div[x %% div == 0L]
factors <- list(neg = -factors, pos = factors)
return(factors)
}
A few examples:
> FUN(100)
$neg
[1] -1 -2 -4 -5 -10 -20 -25 -50 -100
$pos
[1] 1 2 4 5 10 20 25 50 100
> FUN(-42)
$neg
[1] -1 -2 -3 -6 -7 -14 -21 -42
$pos
[1] 1 2 3 6 7 14 21 42
#and big number
> system.time(FUN(1e8))
user system elapsed
1.95 0.18 2.14
You can get all factors from the prime factors. gmp
calculates 1 these very quickly.
library(gmp)
library(plyr)
get_all_factors <- function(n)
{
prime_factor_tables <- lapply(
setNames(n, n),
function(i)
{
if(i == 1) return(data.frame(x = 1L, freq = 1L))
plyr::count(as.integer(gmp::factorize(i)))
}
)
lapply(
prime_factor_tables,
function(pft)
{
powers <- plyr::alply(pft, 1, function(row) row$x ^ seq.int(0L, row$freq))
power_grid <- do.call(expand.grid, powers)
sort(unique(apply(power_grid, 1, prod)))
}
)
}
get_all_factors(c(1, 7, 60, 663, 2520, 75600, 15876000, 174636000, 403409160000))
Update
This is now implemented in the package RcppBigIntAlgos
. See 61 this answer for more details.
Original Post
The algorithm has been 60 fully updated and now implements multiple 59 polynomials as well as some clever sieving 58 techniques that eliminates millions of checks. In 57 addition to the original links, this paper along with 56 this post from primo were very helpful for this last stage 55 (many kudos to primo). Primo does a great 54 job of explaining the guts of the QS in 53 a relatively short space and also wrote 52 a pretty amazing algorithm (it will factor 51 the number at the bottom, 38! + 1, in under 50 2 secs!! Insane!!).
As promised, below is 49 my humble R implementation of the Quadratic Sieve. I have 48 been working on this algorithm sporadically 47 since I promised it in late January. I 46 will not try to explain it fully (unless 45 requested... also, the links below do a 44 very good job) as it is very complicated 43 and hopefully, my function names speak for 42 themselves. This has proved to be one of 41 the most challenging algorithms I have ever 40 attempted to execute as it is demanding 39 both from a programmer's point of view as 38 well as mathematically. I have read countless 37 papers and ultimately, I found these five 36 to be the most helpful (QSieve1, QSieve2, QSieve3, QSieve4, QSieve5).
N.B. This 35 algorithm, as it stands, does not serve 34 very well as a general prime factorization 33 algorithm. If it was optimized further, it 32 would need to be accompanied by a section 31 of code that factors out smaller primes 30 (i.e. less than 10^5 as suggested by this post), then 29 call QuadSieveAll, check to see if these 28 are primes, and if not, call QuadSieveAll 27 on both of these factors, etc. until you 26 are left with all primes (all of these steps 25 are not that difficult). However, the main 24 point of this post is to highlight the heart 23 of the Quadratic Sieve, so the examples 22 below are all semiprimes (even though it 21 will factor most odd numbers not containing 20 a square… Also, I haven’t seen an example 19 of the QS that didn’t demonstrate a non-semiprime). I 18 know the OP was looking for a method to 17 return all factors and not the prime factorization, but 16 this algorithm (if optimized further) coupled 15 with one of the algorithms above would be 14 a force to reckon with as a general factoring 13 algorithm (especially given that the OP 12 was needing something for Project Euler, which usually 11 requires much more than brute force methods). By 10 the way, the MyIntToBit
function is a variation of 9 this answer and the PrimeSieve
is from a post that @Dontas 8 appeared on a while back (Kudos on that 7 as well).
QuadSieveMultiPolysAll <- function(MyN, fudge1=0L, fudge2=0L, LenB=0L) {
### 'MyN' is the number to be factored; 'fudge1' is an arbitrary number
### that is used to determine the size of your prime base for sieving;
### 'fudge2' is used to set a threshold for sieving;
### 'LenB' is a the size of the sieving interval. The last three
### arguments are optional (they are determined based off of the
### size of MyN if left blank)
### The first 8 functions are helper functions
PrimeSieve <- function(n) {
n <- as.integer(n)
if (n > 1e9) stop("n too large")
primes <- rep(TRUE, n)
primes[1] <- FALSE
last.prime <- 2L
fsqr <- floor(sqrt(n))
while (last.prime <= fsqr) {
primes[seq.int(last.prime^2, n, last.prime)] <- FALSE
sel <- which(primes[(last.prime + 1):(fsqr + 1)])
if (any(sel)) {
last.prime <- last.prime + min(sel)
} else {
last.prime <- fsqr + 1
}
}
MyPs <- which(primes)
rm(primes)
gc()
MyPs
}
MyIntToBit <- function(x, dig) {
i <- 0L
string <- numeric(dig)
while (x > 0) {
string[dig - i] <- x %% 2L
x <- x %/% 2L
i <- i + 1L
}
string
}
ExpBySquaringBig <- function(x, n, p) {
if (n == 1) {
MyAns <- mod.bigz(x,p)
} else if (mod.bigz(n,2)==0) {
MyAns <- ExpBySquaringBig(mod.bigz(pow.bigz(x,2),p),div.bigz(n,2),p)
} else {
MyAns <- mod.bigz(mul.bigz(x,ExpBySquaringBig(mod.bigz(
pow.bigz(x,2),p), div.bigz(sub.bigz(n,1),2),p)),p)
}
MyAns
}
TonelliShanks <- function(a,p) {
P1 <- sub.bigz(p,1); j <- 0L; s <- P1
while (mod.bigz(s,2)==0L) {s <- s/2; j <- j+1L}
if (j==1L) {
MyAns1 <- ExpBySquaringBig(a,(p+1L)/4,p)
MyAns2 <- mod.bigz(-1 * ExpBySquaringBig(a,(p+1L)/4,p),p)
} else {
n <- 2L
Legendre2 <- ExpBySquaringBig(n,P1/2,p)
while (Legendre2==1L) {n <- n+1L; Legendre2 <- ExpBySquaringBig(n,P1/2,p)}
x <- ExpBySquaringBig(a,(s+1L)/2,p)
b <- ExpBySquaringBig(a,s,p)
g <- ExpBySquaringBig(n,s,p)
r <- j; m <- 1L
Test <- mod.bigz(b,p)
while (!(Test==1L) && !(m==0L)) {
m <- 0L
Test <- mod.bigz(b,p)
while (!(Test==1L)) {m <- m+1L; Test <- ExpBySquaringBig(b,pow.bigz(2,m),p)}
if (!m==0) {
x <- mod.bigz(x * ExpBySquaringBig(g,pow.bigz(2,r-m-1L),p),p)
g <- ExpBySquaringBig(g,pow.bigz(2,r-m),p)
b <- mod.bigz(b*g,p); r <- m
}; Test <- 0L
}; MyAns1 <- x; MyAns2 <- mod.bigz(p-x,p)
}
c(MyAns1, MyAns2)
}
SieveLists <- function(facLim, FBase, vecLen, sieveD, MInt) {
vLen <- ceiling(vecLen/2); SecondHalf <- (vLen+1L):vecLen
MInt1 <- MInt[1:vLen]; MInt2 <- MInt[SecondHalf]
tl <- vector("list",length=facLim)
for (m in 3:facLim) {
st1 <- mod.bigz(MInt1[1],FBase[m])
m1 <- 1L+as.integer(mod.bigz(sieveD[[m]][1] - st1,FBase[m]))
m2 <- 1L+as.integer(mod.bigz(sieveD[[m]][2] - st1,FBase[m]))
sl1 <- seq.int(m1,vLen,FBase[m])
sl2 <- seq.int(m2,vLen,FBase[m])
tl1 <- list(sl1,sl2)
st2 <- mod.bigz(MInt2[1],FBase[m])
m3 <- vLen+1L+as.integer(mod.bigz(sieveD[[m]][1] - st2,FBase[m]))
m4 <- vLen+1L+as.integer(mod.bigz(sieveD[[m]][2] - st2,FBase[m]))
sl3 <- seq.int(m3,vecLen,FBase[m])
sl4 <- seq.int(m4,vecLen,FBase[m])
tl2 <- list(sl3,sl4)
tl[[m]] <- list(tl1,tl2)
}
tl
}
SieverMod <- function(facLim, FBase, vecLen, SD, MInt, FList, LogFB, Lim, myCol) {
MyLogs <- rep(0,nrow(SD))
for (m in 3:facLim) {
MyBool <- rep(FALSE,vecLen)
MyBool[c(FList[[m]][[1]][[1]],FList[[m]][[2]][[1]])] <- TRUE
MyBool[c(FList[[m]][[1]][[2]],FList[[m]][[2]][[2]])] <- TRUE
temp <- which(MyBool)
MyLogs[temp] <- MyLogs[temp] + LogFB[m]
}
MySieve <- which(MyLogs > Lim)
MInt <- MInt[MySieve]; NewSD <- SD[MySieve,]
newLen <- length(MySieve); GoForIT <- FALSE
MyMat <- matrix(integer(0),nrow=newLen,ncol=myCol)
MyMat[which(NewSD[,1L] < 0),1L] <- 1L; MyMat[which(NewSD[,1L] > 0),1L] <- 0L
if ((myCol-1L) - (facLim+1L) > 0L) {MyMat[,((facLim+2L):(myCol-1L))] <- 0L}
if (newLen==1L) {MyMat <- matrix(MyMat,nrow=1,byrow=TRUE)}
if (newLen > 0L) {
GoForIT <- TRUE
for (m in 1:facLim) {
vec <- rep(0L,newLen)
temp <- which((NewSD[,1L]%%FBase[m])==0L)
NewSD[temp,] <- NewSD[temp,]/FBase[m]; vec[temp] <- 1L
test <- temp[which((NewSD[temp,]%%FBase[m])==0L)]
while (length(test)>0L) {
NewSD[test,] <- NewSD[test,]/FBase[m]
vec[test] <- (vec[test]+1L)
test <- test[which((NewSD[test,]%%FBase[m])==0L)]
}
MyMat[,m+1L] <- vec
}
}
list(MyMat,NewSD,MInt,GoForIT)
}
reduceMatrix <- function(mat) {
tempMin <- 0L; n1 <- ncol(mat); n2 <- nrow(mat)
mymax <- 1L
for (i in 1:n1) {
temp <- which(mat[,i]==1L)
t <- which(temp >= mymax)
if (length(temp)>0L && length(t)>0L) {
MyMin <- min(temp[t])
if (!(MyMin==mymax)) {
vec <- mat[MyMin,]
mat[MyMin,] <- mat[mymax,]
mat[mymax,] <- vec
}
t <- t[-1]; temp <- temp[t]
for (j in temp) {mat[j,] <- (mat[j,]+mat[mymax,])%%2L}
mymax <- mymax+1L
}
}
if (mymax<n2) {simpMat <- mat[-(mymax:n2),]} else {simpMat <- mat}
lenSimp <- nrow(simpMat)
if (is.null(lenSimp)) {lenSimp <- 0L}
mycols <- 1:n1
if (lenSimp>1L) {
## "Diagonalizing" Matrix
for (i in 1:lenSimp) {
if (all(simpMat[i,]==0L)) {simpMat <- simpMat[-i,]; next}
if (!simpMat[i,i]==1L) {
t <- min(which(simpMat[i,]==1L))
vec <- simpMat[,i]; tempCol <- mycols[i]
simpMat[,i] <- simpMat[,t]; mycols[i] <- mycols[t]
simpMat[,t] <- vec; mycols[t] <- tempCol
}
}
lenSimp <- nrow(simpMat); MyList <- vector("list",length=n1)
MyFree <- mycols[which((1:n1)>lenSimp)]; for (i in MyFree) {MyList[[i]] <- i}
if (is.null(lenSimp)) {lenSimp <- 0L}
if (lenSimp>1L) {
for (i in lenSimp:1L) {
t <- which(simpMat[i,]==1L)
if (length(t)==1L) {
simpMat[ ,t] <- 0L
MyList[[mycols[i]]] <- 0L
} else {
t1 <- t[t>i]
if (all(t1 > lenSimp)) {
MyList[[mycols[i]]] <- MyList[[mycols[t1[1]]]]
if (length(t1)>1) {
for (j in 2:length(t1)) {MyList[[mycols[i]]] <- c(MyList[[mycols[i]]], MyList[[mycols[t1[j]]]])}
}
}
else {
for (j in t1) {
if (length(MyList[[mycols[i]]])==0L) {MyList[[mycols[i]]] <- MyList[[mycols[j]]]}
else {
e1 <- which(MyList[[mycols[i]]]%in%MyList[[mycols[j]]])
if (length(e1)==0) {
MyList[[mycols[i]]] <- c(MyList[[mycols[i]]],MyList[[mycols[j]]])
} else {
e2 <- which(!MyList[[mycols[j]]]%in%MyList[[mycols[i]]])
MyList[[mycols[i]]] <- MyList[[mycols[i]]][-e1]
if (length(e2)>0L) {MyList[[mycols[i]]] <- c(MyList[[mycols[i]]], MyList[[mycols[j]]][e2])}
}
}
}
}
}
}
TheList <- lapply(MyList, function(x) {if (length(x)==0L) {0} else {x}})
list(TheList,MyFree)
} else {
list(NULL,NULL)
}
} else {
list(NULL,NULL)
}
}
GetFacs <- function(vec1, vec2, n) {
x <- mod.bigz(prod.bigz(vec1),n)
y <- mod.bigz(prod.bigz(vec2),n)
MyAns <- c(gcd.bigz(x-y,n),gcd.bigz(x+y,n))
MyAns[sort.list(asNumeric(MyAns))]
}
SolutionSearch <- function(mymat, M2, n, FB) {
colTest <- which(apply(mymat, 2, sum) == 0)
if (length(colTest) > 0) {solmat <- mymat[ ,-colTest]} else {solmat <- mymat}
if (length(nrow(solmat)) > 0) {
nullMat <- reduceMatrix(t(solmat %% 2L))
listSol <- nullMat[[1]]; freeVar <- nullMat[[2]]; LF <- length(freeVar)
} else {LF <- 0L}
if (LF > 0L) {
for (i in 2:min(10^8,(2^LF + 1L))) {
PosAns <- MyIntToBit(i, LF)
posVec <- sapply(listSol, function(x) {
t <- which(freeVar %in% x)
if (length(t)==0L) {
0
} else {
sum(PosAns[t])%%2L
}
})
ansVec <- which(posVec==1L)
if (length(ansVec)>0) {
if (length(ansVec) > 1L) {
myY <- apply(mymat[ansVec,],2,sum)
} else {
myY <- mymat[ansVec,]
}
if (sum(myY %% 2) < 1) {
myY <- as.integer(myY/2)
myY <- pow.bigz(FB,myY[-1])
temp <- GetFacs(M2[ansVec], myY, n)
if (!(1==temp[1]) && !(1==temp[2])) {
return(temp)
}
}
}
}
}
}
### Below is the main portion of the Quadratic Sieve
BegTime <- Sys.time(); MyNum <- as.bigz(MyN); DigCount <- nchar(as.character(MyN))
P <- PrimeSieve(10^5)
SqrtInt <- .mpfr2bigz(trunc(sqrt(mpfr(MyNum,sizeinbase(MyNum,b=2)+5L))))
if (DigCount < 24) {
DigSize <- c(4,10,15,20,23)
f_Pos <- c(0.5,0.25,0.15,0.1,0.05)
MSize <- c(5000,7000,10000,12500,15000)
if (fudge1==0L) {
LM1 <- lm(f_Pos ~ DigSize)
m1 <- summary(LM1)$coefficients[2,1]
b1 <- summary(LM1)$coefficients[1,1]
fudge1 <- DigCount*m1 + b1
}
if (LenB==0L) {
LM2 <- lm(MSize ~ DigSize)
m2 <- summary(LM2)$coefficients[2,1]
b2 <- summary(LM2)$coefficients[1,1]
LenB <- ceiling(DigCount*m2 + b2)
}
LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
B <- P[P<=LimB]; B <- B[-1]
facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L]
LenFBase <- length(facBase)+1L
} else if (DigCount < 67) {
## These values were obtained from "The Multiple Polynomial
## Quadratic Sieve" by Robert D. Silverman
DigSize <- c(24,30,36,42,48,54,60,66)
FBSize <- c(100,200,400,900,1200,2000,3000,4500)
MSize <- c(5,25,25,50,100,250,350,500)
LM1 <- loess(FBSize ~ DigSize)
LM2 <- loess(MSize ~ DigSize)
if (fudge1==0L) {
fudge1 <- -0.4
LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
myTarget <- ceiling(predict(LM1, DigCount))
while (LimB < myTarget) {
LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
fudge1 <- fudge1+0.001
}
B <- P[P<=LimB]; B <- B[-1]
facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L]
LenFBase <- length(facBase)+1L
while (LenFBase < myTarget) {
fudge1 <- fudge1+0.005
LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
myind <- which(P==max(B))+1L
myset <- tempP <- P[myind]
while (tempP < LimB) {
myind <- myind + 1L
tempP <- P[myind]
myset <- c(myset, tempP)
}
for (p in myset) {
t <- ExpBySquaringBig(MyNum,(p-1)/2,p)==1L
if (t) {facBase <- c(facBase,p)}
}
B <- c(B, myset)
LenFBase <- length(facBase)+1L
}
} else {
LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
B <- P[P<=LimB]; B <- B[-1]
facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L]
LenFBase <- length(facBase)+1L
}
if (LenB==0L) {LenB <- 1000*ceiling(predict(LM2, DigCount))}
} else {
return("The number you've entered is currently too big for this algorithm!!")
}
SieveDist <- lapply(facBase, function(x) TonelliShanks(MyNum,x))
SieveDist <- c(1L,SieveDist); SieveDist[[1]] <- c(SieveDist[[1]],1L); facBase <- c(2L,facBase)
Lower <- -LenB; Upper <- LenB; LenB2 <- 2*LenB+1L; MyInterval <- Lower:Upper
M <- MyInterval + SqrtInt ## Set that will be tested
SqrDiff <- matrix(sub.bigz(pow.bigz(M,2),MyNum),nrow=length(M),ncol=1L)
maxM <- max(MyInterval)
LnFB <- log(facBase)
## N.B. primo uses 0.735, as his siever
## is more efficient than the one employed here
if (fudge2==0L) {
if (DigCount < 8) {
fudge2 <- 0
} else if (DigCount < 12) {
fudge2 <- .7
} else if (DigCount < 20) {
fudge2 <- 1.3
} else {
fudge2 <- 1.6
}
}
TheCut <- log10(maxM*sqrt(2*asNumeric(MyNum)))*fudge2
myPrimes <- as.bigz(facBase)
CoolList <- SieveLists(LenFBase, facBase, LenB2, SieveDist, MyInterval)
GetMatrix <- SieverMod(LenFBase, facBase, LenB2, SqrDiff, M, CoolList, LnFB, TheCut, LenFBase+1L)
if (GetMatrix[[4]]) {
newmat <- GetMatrix[[1]]; NewSD <- GetMatrix[[2]]; M <- GetMatrix[[3]]
NonSplitFacs <- which(abs(NewSD[,1L])>1L)
newmat <- newmat[-NonSplitFacs, ]
M <- M[-NonSplitFacs]
lenM <- length(M)
if (class(newmat) == "matrix") {
if (nrow(newmat) > 0) {
PosAns <- SolutionSearch(newmat,M,MyNum,myPrimes)
} else {
PosAns <- vector()
}
} else {
newmat <- matrix(newmat, nrow = 1)
PosAns <- vector()
}
} else {
newmat <- matrix(integer(0),ncol=(LenFBase+1L))
PosAns <- vector()
}
Atemp <- .mpfr2bigz(trunc(sqrt(sqrt(mpfr(2*MyNum))/maxM)))
if (Atemp < max(facBase)) {Atemp <- max(facBase)}; myPoly <- 0L
while (length(PosAns)==0L) {LegTest <- TRUE
while (LegTest) {
Atemp <- nextprime(Atemp)
Legendre <- asNumeric(ExpBySquaringBig(MyNum,(Atemp-1L)/2,Atemp))
if (Legendre == 1) {LegTest <- FALSE}
}
A <- Atemp^2
Btemp <- max(TonelliShanks(MyNum, Atemp))
B2 <- (Btemp + (MyNum - Btemp^2) * inv.bigz(2*Btemp,Atemp))%%A
C <- as.bigz((B2^2 - MyNum)/A)
myPoly <- myPoly + 1L
polySieveD <- lapply(1:LenFBase, function(x) {
AInv <- inv.bigz(A,facBase[x])
asNumeric(c(((SieveDist[[x]][1]-B2)*AInv)%%facBase[x],
((SieveDist[[x]][2]-B2)*AInv)%%facBase[x]))
})
M1 <- A*MyInterval + B2
SqrDiff <- matrix(A*pow.bigz(MyInterval,2) + 2*B2*MyInterval + C,nrow=length(M1),ncol=1L)
CoolList <- SieveLists(LenFBase, facBase, LenB2, polySieveD, MyInterval)
myPrimes <- c(myPrimes,Atemp)
LenP <- length(myPrimes)
GetMatrix <- SieverMod(LenFBase, facBase, LenB2, SqrDiff, M1, CoolList, LnFB, TheCut, LenP+1L)
if (GetMatrix[[4]]) {
n2mat <- GetMatrix[[1]]; N2SD <- GetMatrix[[2]]; M1 <- GetMatrix[[3]]
n2mat[,LenP+1L] <- rep(2L,nrow(N2SD))
if (length(N2SD) > 0) {NonSplitFacs <- which(abs(N2SD[,1L])>1L)} else {NonSplitFacs <- LenB2}
if (length(NonSplitFacs)<2*LenB) {
M1 <- M1[-NonSplitFacs]; lenM1 <- length(M1)
n2mat <- n2mat[-NonSplitFacs,]
if (lenM1==1L) {n2mat <- matrix(n2mat,nrow=1)}
if (ncol(newmat) < (LenP+1L)) {
numCol <- (LenP + 1L) - ncol(newmat)
newmat <- cbind(newmat,matrix(rep(0L,numCol*nrow(newmat)),ncol=numCol))
}
newmat <- rbind(newmat,n2mat); lenM <- lenM+lenM1; M <- c(M,M1)
if (class(newmat) == "matrix") {
if (nrow(newmat) > 0) {
PosAns <- SolutionSearch(newmat,M,MyNum,myPrimes)
}
}
}
}
}
EndTime <- Sys.time()
TotTime <- EndTime - BegTime
print(format(TotTime))
return(PosAns)
}
With Old QS algorithm
> library(gmp)
> library(Rmpfr)
> n3 <- prod(nextprime(urand.bigz(2, 40, 17)))
> system.time(t5 <- QuadSieveAll(n3,0.1,myps))
user system elapsed
164.72 0.77 165.63
> system.time(t6 <- factorize(n3))
user system elapsed
0.1 0.0 0.1
> all(t5[sort.list(asNumeric(t5))]==t6[sort.list(asNumeric(t6))])
[1] TRUE
With New Muli-Polynomial QS algorithm
> QuadSieveMultiPolysAll(n3)
[1] "4.952 secs"
Big Integer ('bigz') object of length 2:
[1] 342086446909 483830424611
> n4 <- prod(nextprime(urand.bigz(2,50,5)))
> QuadSieveMultiPolysAll(n4) ## With old algo, it took over 4 hours
[1] "1.131717 mins"
Big Integer ('bigz') object of length 2:
[1] 166543958545561 880194119571287
> n5 <- as.bigz("94968915845307373740134800567566911") ## 35 digits
> QuadSieveMultiPolysAll(n5)
[1] "3.813167 mins"
Big Integer ('bigz') object of length 2:
[1] 216366620575959221 438925910071081891
> system.time(factorize(n5)) ## It appears we are reaching the limits of factorize
user system elapsed
131.97 0.00 131.98
Side note: The number n5 above is 6 a very interesting number. Check it out 5 here
The Breaking Point!!!!
> n6 <- factorialZ(38) + 1L ## 45 digits
> QuadSieveMultiPolysAll(n6)
[1] "22.79092 mins"
Big Integer ('bigz') object of length 2:
[1] 14029308060317546154181 37280713718589679646221
> system.time(factorize(n6)) ## Shut it down after 2 days of running
Latest Triumph (50 digits)
> n9 <- prod(nextprime(urand.bigz(2,82,42)))
> QuadSieveMultiPolysAll(n9)
[1] "12.9297 hours"
Big Integer ('bigz') object of length 2:
[1] 2128750292720207278230259 4721136619794898059404993
## Based off of some crude test, factorize(n9) would take more than a year.
It should be noted that the QS generally 4 doesn't perform as well as the Pollard's 3 rho algorithm on smaller numbers and the 2 power of the QS starts to become apparent 1 as the numbers get larger.
Major Update
Below is my latest R factorization algorithm. It 44 is way faster and pays homage to the rle function.
Algorithm 3 (Updated)
library(gmp)
MyFactors <- function(MyN) {
myRle <- function (x1) {
n1 <- length(x1)
y1 <- x1[-1L] != x1[-n1]
i <- c(which(y1), n1)
list(lengths = diff(c(0L, i)), values = x1[i], uni = sum(y1)+1L)
}
if (MyN==1L) return(MyN)
else {
pfacs <- myRle(factorize(MyN))
unip <- pfacs$values
pv <- pfacs$lengths
n <- pfacs$uni
myf <- unip[1L]^(0L:pv[1L])
if (n > 1L) {
for (j in 2L:n) {
myf <- c(myf, do.call(c,lapply(unip[j]^(1L:pv[j]), function(x) x*myf)))
}
}
}
myf[order(asNumeric(myf))] ## 'order' is faster than 'sort.list'
}
Below 43 are the new benchmarks (As Dirk Eddelbuettel 42 says here, "Can't argue with empirics."):
Case 1 (large prime factors)
set.seed(100)
myList <- lapply(1:10^3, function(x) sample(10^6, 10^5))
benchmark(SortList=lapply(myList, function(x) sort.list(x)),
OrderFun=lapply(myList, function(x) order(x)),
replications=3,
columns = c("test", "replications", "elapsed", "relative"))
test replications elapsed relative
2 OrderFun 3 59.41 1.000
1 SortList 3 61.52 1.036
## The times are limited by "gmp::factorize" and since it relies on
## pseudo-random numbers, the times can vary (i.e. one pseudo random
## number may lead to a factorization faster than others). With this
## in mind, any differences less than a half of second
## (or so) should be viewed as the same.
x <- pow.bigz(2,256)+1
system.time(z1 <- MyFactors(x))
user system elapsed
14.94 0.00 14.94
system.time(z2 <- all_divisors(x)) ## system.time(factorize(x))
user system elapsed ## user system elapsed
14.94 0.00 14.96 ## 14.94 0.00 14.94
all(z1==z2)
[1] TRUE
x <- as.bigz("12345678987654321321")
system.time(x1 <- MyFactors(x^2))
user system elapsed
20.66 0.02 20.71
system.time(x2 <- all_divisors(x^2)) ## system.time(factorize(x^2))
user system elapsed ## user system elapsed
20.69 0.00 20.69 ## 20.67 0.00 20.67
all(x1==x2)
[1] TRUE
Case 2 (smaller numbers)
set.seed(199)
samp <- sample(10^9, 10^5)
benchmark(JosephDivs=sapply(samp, MyFactors),
DontasDivs=sapply(samp, all_divisors),
OldDontas=sapply(samp, Oldall_divisors),
replications=10,
columns = c("test", "replications", "elapsed", "relative"),
order = "relative")
test replications elapsed relative
1 JosephDivs 10 470.31 1.000
2 DontasDivs 10 567.10 1.206 ## with vapply(..., USE.NAMES = FALSE)
3 OldDontas 10 626.19 1.331 ## with sapply
Case 3 (for complete thoroughness)
set.seed(97)
samp <- sample(10^6, 10^4)
benchmark(JosephDivs=sapply(samp, MyFactors),
DontasDivs=sapply(samp, all_divisors),
CottonDivs=sapply(samp, get_all_factors),
ChaseDivs=sapply(samp, FUN),
replications=5,
columns = c("test", "replications", "elapsed", "relative"),
order = "relative")
test replications elapsed relative
1 JosephDivs 5 22.68 1.000
2 DontasDivs 5 27.66 1.220
3 CottonDivs 5 126.66 5.585
4 ChaseDivs 5 554.25 24.438
Original Post
The algorithm by @RichieCotton 41 is a very nice R implementation. The brute 40 force method will only get you so far and 39 fails with large numbers. I have provided 38 three algorithms that will meet different 37 needs. The first one (is the original algorithm 36 I posted in Jan 15 and has been updated 35 slightly), is a stand-alone factorization 34 algorithm which offers a combinatorial approach 33 that is efficient, accurate, and can be 32 easily translated into other languages. The 31 second algorithm is more of a sieve that 30 is very fast and extremely useful when you 29 need the factorization of thousands of numbers 28 quickly. The third is a short (posted above), yet 27 powerful stand-alone algorithm that is superior 26 for any number less than 2^70 (I scrapped 25 almost everything from my original code). I 24 drew inspiration from Richie Cotton's use 23 of the plyr::count
function (it inspired me to write 22 my own rle
function that has a very similar 21 return as plyr::count
), George Dontas's clean way of 20 handling the trivial case (i.e. if (n==1) return(1)
), and the 19 solution provided by @Zelazny7 to a question I had 18 regarding bigz vectors.
Algorithm 1 (original)
library(gmp)
factor2 <- function(MyN) {
if (MyN == 1) return(1L)
else {
max_p_div <- factorize(MyN)
prime_vec <- max_p_div <- max_p_div[sort.list(asNumeric(max_p_div))]
my_factors <- powers <- as.bigz(vector())
uni_p <- unique(prime_vec); maxp <- max(prime_vec)
for (i in 1:length(uni_p)) {
temp_size <- length(which(prime_vec == uni_p[i]))
powers <- c(powers, pow.bigz(uni_p[i], 1:temp_size))
}
my_factors <- c(as.bigz(1L), my_factors, powers)
temp_facs <- powers; r <- 2L
temp_facs2 <- max_p_div2 <- as.bigz(vector())
while (r <= length(uni_p)) {
for (i in 1:length(temp_facs)) {
a <- which(prime_vec > max_p_div[i])
temp <- mul.bigz(temp_facs[i], powers[a])
temp_facs2 <- c(temp_facs2, temp)
max_p_div2 <- c(max_p_div2, prime_vec[a])
}
my_sort <- sort.list(asNumeric(max_p_div2))
temp_facs <- temp_facs2[my_sort]
max_p_div <- max_p_div2[my_sort]
my_factors <- c(my_factors, temp_facs)
temp_facs2 <- max_p_div2 <- as.bigz(vector()); r <- r+1L
}
}
my_factors[sort.list(asNumeric(my_factors))]
}
Algorithm 2 (sieve)
EfficientFactorList <- function(n) {
MyFactsList <- lapply(1:n, function(x) 1)
for (j in 2:n) {
for (r in seq.int(j, n, j)) {MyFactsList[[r]] <- c(MyFactsList[[r]], j)}
}; MyFactsList}
It gives the factorization 17 of every number between 1 and 100,000 in 16 less than 2 seconds. To give you an idea 15 of the efficiency of this algorithm, the 14 time to factor 1 - 100,000 using the brute 13 force method takes close to 3 minutes.
system.time(t1 <- EfficientFactorList(10^5))
user system elapsed
1.04 0.00 1.05
system.time(t2 <- sapply(1:10^5, MyFactors))
user system elapsed
39.21 0.00 39.23
system.time(t3 <- sapply(1:10^5, all_divisors))
user system elapsed
49.03 0.02 49.05
TheTest <- sapply(1:10^5, function(x) all(t2[[x]]==t3[[x]]) && all(asNumeric(t2[[x]])==t1[[x]]) && all(asNumeric(t3[[x]])==t1[[x]]))
all(TheTest)
[1] TRUE
Final Thoughts
@Dontas’s 12 original comment about factoring large numbers 11 got me thinking, what about really really large 10 numbers… like numbers greater than 2^200. You 9 will see that whichever algorithm you choose 8 on this page, they will all take a very 7 long time because most of them rely on gmp::factorize
which 6 uses the Pollard-Rho algorithm. From this question, this algorithm is 5 only reasonable for numbers less than 2^70. I 4 am currently working on my own factorize algorithm 3 which will implement the Quadratic Sieve, which should 2 take all of these algorithms to the next 1 level.
The following approach deliver correct results, even 4 in cases of really big numbers (which should 3 be passed as strings). And it's really fast.
# TEST
# x <- as.bigz("12345678987654321")
# all_divisors(x)
# all_divisors(x*x)
# x <- pow.bigz(2,89)-1
# all_divisors(x)
library(gmp)
options(scipen =30)
sort_listz <- function(z) {
#==========================
z <- z[order(as.numeric(z))] # sort(z)
} # function sort_listz
mult_listz <- function(x,y) {
do.call('c', lapply(y, function(i) i*x))
}
all_divisors <- function(x) {
#==========================
if (abs(x)<=1) return(x)
else {
factorsz <- as.bigz(factorize(as.bigz(x))) # factorize returns up to
# e.g. x= 12345678987654321 factors: 3 3 3 3 37 37 333667 333667
factorsz <- sort_listz(factorsz) # vector of primes, sorted
prime_factorsz <- unique(factorsz)
#prime_ekt <- sapply(prime_factorsz, function(i) length( factorsz [factorsz==i]))
prime_ekt <- vapply(prime_factorsz, function(i) sum(factorsz==i), integer(1), USE.NAMES=FALSE)
spz <- vector() # keep all divisors
all <-1
n <- length(prime_factorsz)
for (i in 1:n) {
pr <- prime_factorsz[i]
pe <- prime_ekt[i]
all <- all*(pe+1) #counts all divisors
prz <- as.bigz(pr)
pse <- vector(mode="raw",length=pe+1)
pse <- c( as.bigz(1), prz)
if (pe>1) {
for (k in 2:pe) {
prz <- prz*pr
pse[k+1] <- prz
} # for k
} # if pe>1
if (i>1) {
spz <- mult_listz (spz, pse)
} else {
spz <- pse;
} # if i>1
} #for n
spz <- sort_listz (spz)
return (spz)
}
} # function factors_all_divisors
#====================================
Refined 2 version, very fast. Code remains simple, readable 1 & clean.
TEST
#Test 4 (big prime factor)
x <- pow.bigz(2,256)+1 # = 1238926361552897 * 93461639715357977769163558199606896584051237541638188580280321
system.time(z2 <- all_divisors(x))
# user system elapsed
# 19.27 1.27 20.56
#Test 5 (big prime factor)
x <- as.bigz("12345678987654321321") # = 3 * 19 * 216590859432531953
system.time(x2 <- all_divisors(x^2))
#user system elapsed
#25.65 0.00 25.67
A lot has changed in the R language since 49 this question was originally asked. In version 48 0.6-3
of the numbers
package, the function divisors
was included 47 that is very useful for getting all of the 46 factors of a number. It will meet the needs 45 of most users, however if you are looking 44 for raw speed or you are working with larger 43 numbers, you will need an alternative method. I 42 have authored two new packages (partially 41 inspired by this question, I might add) that 40 contain highly optimized functions aimed 39 at problems just like this. The first one 38 is RcppAlgos
and the other is RcppBigIntAlgos
(formerly called bigIntegerAlgos
).
RcppAlgos
RcppAlgos
contains 37 two functions for obtaining divisors of 36 numbers less than 2^53 - 1
: divisorsRcpp
(a vectorized function 35 for quickly obtaining the complete factorization 34 of many numbers) & divisorsSieve
(quickly generates 33 the complete factorization over a range). First 32 up, we factor many random numbers using 31 divisorsRcpp
:
library(gmp) ## for all_divisors by @GeorgeDontas
library(RcppAlgos)
library(numbers)
options(scipen = 999)
set.seed(42)
testSamp <- sample(10^10, 10)
## vectorized so you can pass the entire vector as an argument
testRcpp <- divisorsRcpp(testSamp)
testDontas <- lapply(testSamp, all_divisors)
identical(lapply(testDontas, as.numeric), testRcpp)
[1] TRUE
And now, factor many numbers over a range 30 using divisorsSieve
:
system.time(testSieve <- divisorsSieve(10^13, 10^13 + 10^5))
user system elapsed
0.242 0.006 0.247
system.time(testDontasSieve <- lapply((10^13):(10^13 + 10^5), all_divisors))
user system elapsed
47.880 0.132 47.922
identical(lapply(testDontasSieve, asNumeric), testSieve)
[1] TRUE
Both divisorsRcpp
and divisorsSieve
are nice functions that 29 are flexible and efficient, however they 28 are limited to 2^53 - 1
.
RcppBigIntAlgos
The RcppBigIntAlgos
package (formerly 27 called bigIntegerAlgos
prior to version 0.2.0) links directly 26 to the C library gmp and features divisorsBig
which is designed 25 for very large numbers.
library(RcppBigIntAlgos)
## testSamp is defined above... N.B. divisorsBig is not quite as
## efficient as divisorsRcpp. This is so because divisorsRcpp
## can take advantage of more efficient data types.
testBig <- divisorsBig(testSamp)
identical(testDontas, testBig)
[1] TRUE
And here are the 24 benchmark as defined in my original post 23 (N.B. MyFactors
is replaced by divisorsRcpp
and divisorsBig
).
## Case 2
library(rbenchmark)
set.seed(199)
samp <- sample(10^9, 10^5)
benchmark(RcppAlgos=divisorsRcpp(samp),
RcppBigIntAlgos=divisorsBig(samp),
DontasDivs=lapply(samp, all_divisors),
replications=10,
columns = c("test", "replications", "elapsed", "relative"),
order = "relative")
test replications elapsed relative
1 RcppAlgos 10 5.236 1.000
2 RcppBigIntAlgos 10 12.846 2.453
3 DontasDivs 10 383.742 73.289
## Case 3
set.seed(97)
samp <- sample(10^6, 10^4)
benchmark(RcppAlgos=divisorsRcpp(samp),
RcppBigIntAlgos=divisorsBig(samp),
numbers=lapply(samp, divisors), ## From the numbers package
DontasDivs=lapply(samp, all_divisors),
CottonDivs=lapply(samp, get_all_factors),
ChaseDivs=lapply(samp, FUN),
replications=5,
columns = c("test", "replications", "elapsed", "relative"),
order = "relative")
test replications elapsed relative
1 RcppAlgos 5 0.083 1.000
2 RcppBigIntAlgos 5 0.265 3.193
3 numbers 5 12.913 155.578
4 DontasDivs 5 15.813 190.518
5 CottonDivs 5 60.745 731.867
6 ChaseDivs 5 299.520 3608.675
The next benchmarks 22 demonstrate the true power of the underlying 21 algorithm in the divisorsBig
function. The number being 20 factored is a power of 10
, so the prime factoring 19 step can almost be completely ignored (e.g. system.time(factorize(pow.bigz(10,30)))
registers 18 0
on my machine). Thus, the difference in 17 timing is due solely to how quickly the 16 prime factors can be combined to produce 15 all factors.
library(microbenchmark)
powTen <- pow.bigz(10, 30)
microbenchmark(divisorsBig(powTen), all_divisors(powTen), unit = "relative")
Unit: relative
expr min lq mean median uq max neval
divisorsBig(powTen) 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 100
all_divisors(powTen) 21.49849 21.27973 21.13085 20.63345 21.18834 20.38772 100
## Negative numbers show an even greater increase in efficiency
negPowTen <- powTen * -1
microbenchmark(divisorsBig(negPowTen), all_divisors(negPowTen), unit = "relative")
Unit: relative
expr min lq mean median uq max neval
divisorsBig(negPowTen) 1.00000 1.0000 1.0000 1.00000 1.00000 1.00000 100
all_divisors(negPowTen) 28.75275 28.1864 27.9335 27.57434 27.91376 30.16962 100
Very Large Numbers
With divisorsBig
, obtaining the complete 14 factorization with very large inputs is 13 no problem. The algorithm dynamically adjusts 12 based off of the input and applies different 11 algorithms in different situations. We can 10 also take advantage of multithreading if 9 Lenstra's Elliptic Curve method or the Quadratic Sieve is utilized.
Here are some examples 8 using n5
and n9
defined in this answer.
n5 <- as.bigz("94968915845307373740134800567566911")
system.time(print(divisorsBig(n5)))
Big Integer ('bigz') object of length 4:
[1] 1 216366620575959221 438925910071081891
[4] 94968915845307373740134800567566911
user system elapsed
0.162 0.003 0.164
n9 <- prod(nextprime(urand.bigz(2, 82, 42)))
system.time(print(divisorsBig(n9, nThreads=4)))
Big Integer ('bigz') object of length 4:
[1] 1
[2] 2128750292720207278230259
[3] 4721136619794898059404993
[4] 10050120961360479179164300841596861740399588283187
user system elapsed
1.776 0.011 0.757
Here is an example 7 provided by @Dontas with one large prime 6 and one smaller prime:
x <- pow.bigz(2, 256) + 1
divisorsBig(x, showStats=TRUE, nThreads=8)
Summary Statistics for Factoring:
115792089237316195423570985008687907853269984665640564039457584007913129639937
| Pollard Rho Time |
|--------------------|
| 479ms |
| Lenstra ECM Time | Number of Curves |
|--------------------|--------------------|
| 1s 870ms | 2584 |
| Total Time |
|--------------------|
| 2s 402ms |
Big Integer ('bigz') object of length 4:
[1] 1
[2] 1238926361552897
[3] 93461639715357977769163558199606896584051237541638188580280321
[4] 115792089237316195423570985008687907853269984665640564039457584007913129639937
Compare this to finding 5 the prime factorization using gmp::factorize
:
system.time(factorize(x))
user system elapsed
9.199 0.036 9.248
Lastly, here 4 is an example with a large semiprime (N.B. since 3 we know it's a semiprime, we skip the extended 2 Pollard's rho algorithm as well as Lentra's 1 elliptic curve method).
## https://members.loria.fr/PZimmermann/records/rsa.html
rsa79 <- as.bigz("7293469445285646172092483905177589838606665884410340391954917800303813280275279")
divisorsBig(rsa79, nThreads=8, showStats=TRUE, skipPolRho=T, skipECM=T)
Summary Statistics for Factoring:
7293469445285646172092483905177589838606665884410340391954917800303813280275279
| MPQS Time | Complete | Polynomials | Smooths | Partials |
|--------------------|----------|-------------|------------|------------|
| 2m 49s 174ms | 100 | 91221 | 5651 | 7096 |
| Mat Algebra Time | Mat Dimension |
|--------------------|--------------------|
| 14s 863ms | 12625 x 12747 |
| Total Time |
|--------------------|
| 3m 4s 754ms |
Big Integer ('bigz') object of length 4:
[1] 1
[2] 848184382919488993608481009313734808977
[3] 8598919753958678882400042972133646037727
[4] 7293469445285646172092483905177589838606665884410340391954917800303813280275279
More Related questions
We use cookies to improve the performance of the site. By staying on our site, you agree to the terms of use of cookies.