NEWS
textmineR 3.0.6
This version is a patch. In this version I have
- Updated the URL to a working paper cited in the topic modeling vignette.
textmineR 3.0.5 (2021-06-28)
This version is a patch. In this version I have
- Fixed a bug in CalcHellingerDist() and CalcJSDivergence() that sometimes
caused inputs to be overwritten.
- Fixed some typos in the vignette for topic modeling
- Updated the documentation on FitCtmModel() to better explain how to pass
control arguments to CTM's underlying function.
- Enabled return of a tibble or data.frame (instead of only data.frame) in
the following functions: SummarizeTopics, GetTopTerms, TermDocFreq.
(Thanks to Mattias for the PR.)
textmineR 3.0.4 (2019-04-18)
This version is a patch. In this version I have
- Removed unconditional stripping in Makevars as specified by CRAN
- Improved outputs of FitLdaModel
textmineR 3.0.3 (2019-03-22)
This version is a patch. In this version I have
- fixed an error related to the update.lda_topic_model method.
- added a method posterior.lda_topic_model to sample from the posterior of an
LDA topic model.
textmineR 3.0.2 (2019-01-09)
This version is a patch. In this version I have
- changed some elements of NAMESPACE to pass additional CRAN checks.
- added an update method for the lda_topic_model class. This allows users to add
documents to an existing model (and even add new topics) without changing the
indices of previously-trained topics. e.g. topic 5 is still topic 5.
- added a vignette for using tidytext alongside textmineR
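
The update method described above can be sketched roughly as follows. This is a minimal illustration rather than the package's documented workflow: the row splits, the choice of k, and the iteration counts are arbitrary assumptions, and it relies on the nih_sample_dtm data set that ships with textmineR.

```r
library(textmineR)

# Sample document term matrix bundled with textmineR
data(nih_sample_dtm)

set.seed(42)

# Fit an initial model on the first 20 documents (arbitrary split)
m <- FitLdaModel(dtm = nih_sample_dtm[1:20, ],
                 k = 5,
                 iterations = 200,
                 burnin = 175)

# Add the remaining documents and two new topics to the existing model.
# Topics 1 through 5 keep their indices; the new topics are appended.
m2 <- update(object = m,
             dtm = nih_sample_dtm[21:100, ],
             additional_k = 2,
             iterations = 200,
             burnin = 175)

# One row of phi per topic: 5 original topics plus 2 added ones
nrow(m2$phi)
```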
textmineR 3.0.1 (2018-10-31)
This version is a patch in response to issues revealed by automatic checks upon
submission to CRAN plus an additional issue I encountered along the way.
I have
- Used the CRAN template for my MIT LICENSE file
- Modified the example of the LabelTopics function to speed up run time for that example
- Modified vignettes to run in less time
- Added a Makevars file to keep compiled code small on Ubuntu.
Please read below for major updates between v2.x.x and v3.x.x
textmineR 3.0.0
This version significantly changes textmineR.
- Several functions that were slated for deletion in version 2.1.3 are now gone.
  - RecursiveRbind
  - Vec2Dtm
  - JSD
  - HellDist
  - GetPhiPrime
  - FormatRawLdaOutput
  - Files2Vec
  - DepluralizeDtm
  - CorrectS
  - CalcPhiPrime
- FitLdaModel has changed significantly.
  - Now only Gibbs sampling is a supported training method. The Gibbs sampler is
    no longer wrapping lda::lda.collapsed.gibbs.sampler. It is now native to
    textmineR. It's a little slower, but has additional features.
  - Asymmetric priors are supported for both alpha and beta.
  - There is an option, optimize_alpha, which updates alpha every 10 iterations
    based on the value of theta at the current iteration.
  - The log likelihood of the data given estimates of phi and theta is optionally
    calculated every 10 iterations.
  - Probabilistic coherence is optionally calculated at the time of model fit.
  - R-squared is optionally calculated at the time of model fit.
- Supported topic models (LDA, LSA, CTM) are now object-oriented, creating their
own S3 classes. These classes have their own predict methods, meaning you do
not have to do your own math to make predictions for new documents.
- A new function SummarizeTopics has been added.
- tm is no longer a dependency for stopwords. We now use the stopwords package.
The extended result of this is that there is no longer any Java dependency.
- Several packages have been moved from "Imports" to "Suggests". The result is
a faster install and lower likelihood of install failure based on packages with
system dependencies. (Looking at you, topicmodels!)
- Finally, I have changed the textmineR license to the MIT license. Note, however,
that some dependencies may have more restrictive licenses. So if you're looking
to use textmineR in a commercial project, you may want to dig deeper into
what is/isn't permissible.
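
As a rough sketch of the 3.0.0 changes listed above, the following fits an LDA model with the optional calculations switched on and then uses the S3 predict method on held-out documents. The row splits, k, and iteration counts are arbitrary assumptions chosen to keep the run short; nih_sample_dtm is the sample data bundled with textmineR.

```r
library(textmineR)

# Sample document term matrix bundled with textmineR
data(nih_sample_dtm)

set.seed(12345)

# Fit with the native Gibbs sampler and turn on the optional extras
model <- FitLdaModel(dtm = nih_sample_dtm[1:20, ],
                     k = 5,
                     iterations = 200,
                     burnin = 175,
                     alpha = 0.1,
                     beta = 0.05,
                     optimize_alpha = TRUE,   # update alpha during sampling
                     calc_likelihood = TRUE,  # track the log likelihood
                     calc_coherence = TRUE,   # probabilistic coherence per topic
                     calc_r2 = TRUE)          # R-squared of the fitted model

# The fitted model carries its own S3 class
class(model)

# Predict topic distributions for documents the model has not seen
new_theta <- predict(model,
                     nih_sample_dtm[21:30, ],
                     method = "gibbs",
                     iterations = 200,
                     burnin = 175)

# One row per new document, one column per topic
dim(new_theta)
```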
textmineR 2.1.3 (2018-09-11)
- Deprecating functions that will be removed, renamed, or have significant changes to syntax or functionality in the forthcoming textmineR v3.0.
- Functions slated for deletion:
  - RecursiveRbind
  - Vec2Dtm
  - JSD
  - HellDist
  - GetPhiPrime
  - FormatRawLdaOutput
  - Files2Vec
  - DepluralizeDtm
  - CorrectS
  - CalcPhiPrime
- In addition: FitLdaModel is going to change significantly in its functionality and argument calls.
textmineR 2.1.2 (2018-04-29)
- Deprecated RecursiveRbind - it depended on a deprecated function from the Matrix package. And the replacement offered by Matrix operates recursively, making this function truly superfluous.
textmineR 2.1.1 (2018-03-06)
- Corrected some code in the vignettes that caused errors on Linux machines.
textmineR 2.1.0
- Added vignettes for common use cases of textmineR
- Modified averaging for CalcProbCoherence
- Updated documentation for CreateTcm
textmineR 2.0.6 (2017-08-17)
- Back-end changes to CreateTcm in response to new text2vec API. Functionality
is unchanged.
- Changes to how the package interfaces with Rcpp
textmineR 2.0.5 (2017-04-07)
- Add verbose option to CreateDtm and CreateTcm to suppress status messages.
- Add function GetVocabFromDtm to get text2vec vocabulary object from a
dgCMatrix document term matrix.
textmineR 2.0.4 (2016-11-03)
- Patching errors introduced in version 2.0.3
textmineR 2.0.3 (2016-10-06)
- Patches to CreateDtm and CreateTcm in response to updates to text2vec.
- More formal update to take advantage of text2vec's latest optimizations to follow.
textmineR 2.0.2 (2016-06-06)
- Patched CreateDtm and CreateTcm. remove_punctuation now supports non-English
characters.
- Patched TmParallelApply. Added an option to declare the environment to search
for your export list. The default for that argument searches only the local
environment, which should cover ~95% of use cases and avoids a crash on
Windows.
- Patched FitLdaModel. Use of the ... argument now allows you to control
TmParallelApply, lda::lda.collapsed.gibbs.sampler, and topicmodels::LDA
without error.
- Patched FitCtmModel where the ... argument now goes to topicmodels::CTM's
control argument.
- Patched CreateTcm to return objects of class dgCMatrix. This allows you to
run functions like FitLdaModel on a TCM.
- Switched from irlba to RSpectra for LSA models because RSpectra's
implementation is much faster.
textmineR 2.0.1 (2016-04-24)
- Patched CreateDtm and CreateTcm. An error caused stopwords to not be removed
textmineR 2.0.0 (2016-04-19)
- Vec2Dtm is now deprecated in favor of CreateDtm
- A function, CreateTcm, now exists to create term co-occurrence matrices
- CreateDtm and CreateTcm are implemented with a parallel C++ back end through the text2vec library
  - the implementation is much faster! I've clocked 2X - 10X speedups, depending on options
  - adds external dependencies - C++ compiler and GNU make - and takes away an external
    dependency - Java.
  - now all tokens will be included, regardless of length. (tm's framework silently
    dropped all tokens of fewer than 3 characters.)
- Allow generic stemming and stopwords in CreateDtm & CreateTcm
  - Now there is only one argument for stopwords, making it clearer how to use
    custom or non-English stopwords
  - Now the stemming argument allows for passing of stem/lemmatization functions.
- Function for fitting correlated topic models
- Function to turn a document term matrix to term co-occurrence matrix
- Allowed LabelTopics to use unigrams, if you want. (n-grams are still better.)
- More robust error checking for CalcTopicModelR2 and CalcLikelihood
- All function arguments use "_", not ".".
- CalcPhiPrime replaces (the now deprecated) GetPhiPrime
  - Allows you to pass an argument to specify non-uniform probabilities of each
    document
- Similarly, CalcHellingerDist and CalcJSDivergence replace HellDist and JSD.
This is to conform to a naming convention where functions are "verbs".
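
To illustrate the single stopword argument and the function-valued stemming argument, here is a small sketch. It uses the argument names from recent textmineR releases (stopword_vec, stem_lemma_function), which may differ from the names in 2.0.0. The two documents, the stopword list, and the crude suffix-stripping function are all hypothetical stand-ins for real data and a real stemmer.

```r
library(textmineR)

# Two toy documents (hypothetical data for illustration only)
docs <- c(doc_1 = "The cats sat on mats.",
          doc_2 = "Dogs chase cats.")

# A deliberately crude "stemmer" that strips a trailing "s".
# Any function that maps a character vector to a character vector
# could be passed here instead.
strip_s <- function(x) gsub("s$", "", x)

dtm <- CreateDtm(doc_vec = docs,
                 doc_names = names(docs),
                 ngram_window = c(1, 1),
                 stopword_vec = c("the", "on"),  # one argument for all stopwords
                 lower = TRUE,
                 remove_punctuation = TRUE,
                 remove_numbers = TRUE,
                 stem_lemma_function = strip_s,
                 verbose = FALSE)

# The result is a sparse dgCMatrix with one row per document
class(dtm)
colnames(dtm)
```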
textmineR 1.7.0 (2016-03-31)
- Added modeling capability for latent semantic analysis in FitLsaModel()
- Added CalcProbCoherence() function which replaces ProbCoherence() and can calculate
probabilistic coherence for the whole phi matrix.
- Added data from NIH research grants instead of borrowed data from tm
- Removed qcq data
- Added variational em method for FitLdaModel()
- Added function to represent document clustering as a topic model Cluster2TopicModel()
textmineR 1.6.0 (2016-03-06)
- Add deprecation warning to ProbCoherence
- Allow for arguments of number of cores to be passed to every function that
uses implicit parallelization
- Allow for passing of libraries to TmParallelApply (makes this function truly
independent of textmineR)
- For Vec2Dtm ensure that stopwords and custom stopwords are lowercased
when lower = TRUE
- Update README example to use model caches