Changes in version 3.0.6 (2025-10-18)

Maintenance notice

- textmineR is now in maintenance-only mode.
- Future development effort will focus on the successor package tidylda.
- Bug fixes and CRAN compliance updates may continue as needed, but new features will not be added here.
- Users starting new projects are encouraged to adopt tidylda.

Updates

This version is a patch where:

- The C++11 requirement has been removed.
- Package dependencies in vignettes are handled more gracefully, to guard against vignette build failures during R CMD check.
- Class checks now use inherits() instead of if (class(object) == "class").
- Broken URLs in the documentation have been fixed.
- Vignettes no longer rely on parallel processing.

Changes in version 3.0.5 (2021-06-28)

This version is a patch. In this version I have

- Fixed a bug in CalcHellingerDist() and CalcJSDivergence() that sometimes caused inputs to be overwritten.
- Fixed some typos in the vignette for topic modeling.
- Updated the documentation on FitCtmModel() to better explain how to pass control arguments to CTM's underlying function.
- Enabled return of a tibble or data.frame (instead of only data.frame) in the following functions: SummarizeTopics, GetTopTerms, TermDocFreq. (Thanks to Mattias for the PR.)

Changes in version 3.0.4 (2019-04-18)

This version is a patch. In this version I have

- Removed unconditional stripping in Makevars as specified by CRAN.
- Improved outputs of FitLdaModel.

Changes in version 3.0.3 (2019-03-22)

This version is a patch. In this version I have

- Fixed an error related to the update.lda_topic_model method.
- Added a method posterior.lda_topic_model to sample from the posterior of an LDA topic model.

Changes in version 3.0.2 (2019-01-09)

This version is a patch. In this version I have

- Changed some elements of NAMESPACE to pass additional CRAN checks.
- Added an update method for the lda_topic_model class. This allows users to add documents to an existing model (and even add new topics) without changing the indices of previously trained topics. For example, topic 5 is still topic 5.
- Added a vignette for using tidytext alongside textmineR.

Changes in version 3.0.1 (2018-10-31)

This version is a patch in response to issues revealed by automatic checks upon submission to CRAN, plus an additional issue I encountered along the way. I have

- Used the CRAN template for my MIT LICENSE file.
- Modified the example of the LabelTopics function to speed up run time for that example.
- Modified vignettes to run in less time.
- Added a Makevars file to keep compiled code small on Ubuntu.

Please read below for major updates between v2.x.x and v3.x.x.

Changes in version 3.0.0

This version significantly changes textmineR.

- Several functions that were slated for deletion in version 2.1.3 are now gone.
  - RecursiveRbind
  - Vec2Dtm
  - JSD
  - HellDist
  - GetPhiPrime
  - FormatRawLdaOutput
  - Files2Vec
  - DepluralizeDtm
  - CorrectS
  - CalcPhiPrime
- FitLdaModel has changed significantly.
  - Gibbs sampling is now the only supported training method. The Gibbs sampler no longer wraps lda::lda.collapsed.gibbs.sampler. It is now native to textmineR. It's a little slower, but has additional features.
  - Asymmetric priors are supported for both alpha and beta.
  - There is an option, optimize_alpha, which updates alpha every 10 iterations based on the value of theta at the current iteration.
  - The log likelihood of the data given estimates of phi and theta is optionally calculated every 10 iterations.
  - Probabilistic coherence is optionally calculated at the time of model fit.
  - R-squared is optionally calculated at the time of model fit.
- Supported topic models (LDA, LSA, CTM) are now object-oriented, creating their own S3 classes. These classes have their own predict methods, meaning you do not have to do your own math to make predictions for new documents.
- A new function SummarizeTopics has been added.
- tm is no longer a dependency for stopwords. We now use the stopwords package. As a result, there is no longer any Java dependency.
- Several packages have been moved from "Imports" to "Suggests". The result is a faster install and lower likelihood of install failure based on packages with system dependencies. (Looking at you, topicmodels!)
- Finally, I have changed the textmineR license to the MIT license. Note, however, that some dependencies may have more restrictive licenses. So if you're looking to use textmineR in a commercial project, you may want to dig deeper into what is and isn't permissible.

Changes in version 2.1.3 (2018-09-11)

- Deprecated functions that will be removed, renamed, or have significant changes to syntax or functionality in the forthcoming textmineR v3.0.
- Functions slated for deletion:
  - RecursiveRbind
  - Vec2Dtm
  - JSD
  - HellDist
  - GetPhiPrime
  - FormatRawLdaOutput
  - Files2Vec
  - DepluralizeDtm
  - CorrectS
  - CalcPhiPrime
- In addition: FitLdaModel is going to change significantly in its functionality and argument calls.

Changes in version 2.1.2 (2018-04-29)

- Deprecated RecursiveRbind. It depended on a deprecated function from the Matrix package, and the replacement offered by Matrix operates recursively, making this function truly superfluous.

Changes in version 2.1.1 (2018-03-06)

- Corrected some code in the vignettes that caused errors on Linux machines.

Changes in version 2.1.0

- Added vignettes for common use cases of textmineR.
- Modified averaging for CalcProbCoherence.
- Updated documentation for CreateTcm.

Changes in version 2.0.6 (2017-08-17)

- Back-end changes to CreateTcm in response to the new text2vec API. Functionality is unchanged.
- Changes to how the package interfaces with Rcpp.

Changes in version 2.0.5 (2017-04-07)

- Added a verbose option to CreateDtm and CreateTcm to suppress status messages.
- Added the function GetVocabFromDtm to get a text2vec vocabulary object from a dgCMatrix document term matrix.

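The note for version 2.1.2 above says that Matrix's own row-binding made RecursiveRbind superfluous. Here is a minimal sketch of that idea. It assumes the replacement meant is the rbind() method that the Matrix package provides for sparse matrices (the changelog does not name it), and the example matrices are made up for illustration:

```r
library(Matrix)

# Three small sparse matrices with the same number of columns
# (hypothetical stand-ins for chunks of a document term matrix).
chunks <- list(
  sparseMatrix(i = c(1, 2), j = c(1, 3), x = c(1, 2), dims = c(2, 3)),
  sparseMatrix(i = 1, j = 2, x = 5, dims = c(1, 3)),
  sparseMatrix(i = c(1, 2), j = c(3, 1), x = c(4, 7), dims = c(2, 3))
)

# rbind() accepts any number of Matrix objects, so a list of chunks can be
# combined in one call without a hand-written recursive helper.
combined <- do.call(rbind, chunks)

dim(combined)
```

The combined object keeps one row per input row, in the order the chunks were supplied.
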
Changes in version 2.0.4 (2016-11-03)

- Patched errors introduced in version 2.0.3.

Changes in version 2.0.3 (2016-10-06)

- Patches to CreateDtm and CreateTcm in response to updates to text2vec.
- A more formal update to take advantage of text2vec's latest optimizations will follow.

Changes in version 2.0.2 (2016-06-06)

- Patched CreateDtm and CreateTcm. remove_punctuation now supports non-English characters.
- Patched TmParallelApply. Added an option to declare the environment to search for your export list. The default for that argument searches only the local environment, which should cover ~95% of use cases. (It also avoids a crash on Windows.)
- Patched FitLdaModel. Use of the ... argument now allows you to control TmParallelApply, lda::lda.collapsed.gibbs.sampler, and topicmodels::LDA without error.
- Patched FitCtmModel so that the ... argument now goes to topicmodels::CTM's control argument.
- Patched CreateTcm to return objects of class dgCMatrix. This allows you to run functions like FitLdaModel on a TCM.
- Switched from irlba to RSpectra for LSA models because RSpectra's implementation is much faster.

Changes in version 2.0.1 (2016-04-24)

- Patched CreateDtm and CreateTcm. An error had prevented stopwords from being removed.

Changes in version 2.0.0 (2016-04-19)

- Vec2Dtm is now deprecated in favor of CreateDtm.
- A function, CreateTcm, now exists to create term co-occurrence matrices.
- CreateDtm and CreateTcm are implemented with a parallel C++ back end through the text2vec library.
  - The implementation is much faster! I've clocked 2X - 10X speedups, depending on options.
  - It adds external dependencies (a C++ compiler and GNU make)
  - and takes away an external dependency (Java).
  - Now all tokens will be included, regardless of length. (tm's framework silently dropped all tokens of fewer than 3 characters.)
- Allow generic stemming and stopwords in CreateDtm & CreateTcm.
  - Now there is only one argument for stopwords, making it clearer how to use custom or non-English stopwords.
  - Now the stemming argument allows for passing of stem/lemmatization functions.
- Added a function for fitting correlated topic models.
- Added a function to turn a document term matrix into a term co-occurrence matrix.
- Allowed LabelTopics to use unigrams, if you want. (n-grams are still better.)
- More robust error checking for CalcTopicModelR2 and CalcLikelihood.
- All function arguments use "_", not ".".
- CalcPhiPrime replaces (the now deprecated) GetPhiPrime.
  - Allows you to pass an argument to specify non-uniform probabilities of each document.
- Similarly, CalcHellingerDist and CalcJSDivergence replace HellDist and JSD. This is to conform to a naming convention where functions are "verbs".

Changes in version 1.7.0 (2016-03-31)

- Added modeling capability for latent semantic analysis in FitLsaModel().
- Added the CalcProbCoherence() function, which replaces ProbCoherence() and can calculate probabilistic coherence for the whole phi matrix.
- Added data from NIH research grants instead of borrowed data from tm.
- Removed qcq data.
- Added a variational EM method for FitLdaModel().
- Added a function to represent document clustering as a topic model, Cluster2TopicModel().

Changes in version 1.6.0 (2016-03-06)

- Added a deprecation warning to ProbCoherence.
- Allow the number of cores to be passed as an argument to every function that uses implicit parallelization.
- Allow for passing of libraries to TmParallelApply (makes this function truly independent of textmineR).
- For Vec2Dtm, ensure that stopwords and custom stopwords are lowercased when lower = TRUE.
- Updated the README example to use model caches.
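
The stopword fix for Vec2Dtm in version 1.6.0 can be pictured with a small base-R sketch. This is not textmineR's code: the helper remove_stopwords and its arguments are hypothetical, written only to show why stopwords need to be lowercased whenever the tokens are.

```r
# Hypothetical helper, for illustration only (not part of textmineR).
remove_stopwords <- function(tokens, stopwords, lower = TRUE) {
  if (lower) {
    # Lowercase both sides; otherwise a custom stopword such as "The"
    # would never match the lowercased token "the".
    tokens <- tolower(tokens)
    stopwords <- tolower(stopwords)
  }
  tokens[!tokens %in% stopwords]
}

tokens <- c("The", "model", "AND", "the", "data")
custom_stopwords <- c("The", "And")

# With lower = TRUE, every case variant of "the" and "and" is dropped.
remove_stopwords(tokens, custom_stopwords)
```

With lower = FALSE the same call would remove only the exact match "The", leaving "AND" and "the" in place, which is the kind of mismatch the fix guards against.
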