This repository contains functions to create R-style factors from categorical variables.
Each factor is represented by (i) an array of integer codes in the interval [0, N) and (ii) an array of N unique levels, such that each code is an index into the levels.
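To make the codes-and-levels representation concrete, here is a minimal standalone sketch that does not use this library. The `sketch_factor` and `sketch_factor_demo` names are hypothetical, and the sketch assumes that levels are sorted; it illustrates the idea only and is not how factorize is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of factorize: returns the sorted, unique
// levels and writes one integer code per observation into 'codes'.
template<typename Value_>
std::vector<Value_> sketch_factor(const std::vector<Value_>& values, std::vector<int>& codes) {
    // std::map keeps its keys sorted, so iterating over it yields sorted levels.
    std::map<Value_, int> lookup;
    for (const auto& v : values) {
        lookup[v] = 0;
    }

    // Assign codes 0, 1, ..., N - 1 in sorted order of the levels.
    std::vector<Value_> levels;
    levels.reserve(lookup.size());
    for (auto& entry : lookup) {
        entry.second = static_cast<int>(levels.size());
        levels.push_back(entry.first);
    }

    // Replace each observation with the code of its level.
    codes.resize(values.size());
    for (std::size_t i = 0; i < values.size(); ++i) {
        codes[i] = lookup.at(values[i]);
    }
    return levels;
}

// Usage: every observation can be recovered as levels[codes[i]].
inline void sketch_factor_demo() {
    std::vector<std::string> group { "A", "B", "C", "A", "B", "C" };
    std::vector<int> codes;
    auto levels = sketch_factor(group, codes);
    for (std::size_t i = 0; i < group.size(); ++i) {
        assert(group[i] == levels[codes[i]]);
    }
}
```

The integer codes are compact and cheap to compare, while the levels hold each distinct value only once.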
We can create a factor from any categorical variable:
#include "factorize/factorize.hpp"
std::vector<std::string> group { "A", "B", "C", "A", "B", "C" };
std::vector<int> codes(group.size());
auto levels = factorize::create_factor(group.size(), group.data(), codes.data());
group[0] == levels[codes[0]]; // true
We can also easily create a factor from multiple variables, where the "levels" will be sorted and unique combinations of the variables.
std::vector<char> grouping1 { 'c', 'a', 'b', 'a', 'b', 'c' };
std::vector<char> grouping2 { 'A', 'B', 'C', 'C', 'B', 'A' };
std::vector<int> combined_codes(grouping1.size());
auto combined_levels = factorize::combine_to_factor(
grouping1.size(),
std::vector<const char*>{ grouping1.data(), grouping2.data() },
combined_codes.data()
);
grouping1[0] == combined_levels[0][combined_codes[0]]; // true
grouping2[0] == combined_levels[1][combined_codes[0]]; // true
Check out the reference documentation for more details.
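The phrase "sorted and unique combinations" can be illustrated with a small standalone sketch for two `char` variables. The `sketch_combine` function is hypothetical and does not use this library. It assumes both inputs have the same length and that combinations are ordered by the first variable, then the second.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper, not part of factorize. Returns one vector of levels per
// variable, so that (levels[0][k], levels[1][k]) is the k-th unique combination.
// Writes the code of each observation's combination into 'codes'.
inline std::vector<std::vector<char> > sketch_combine(
    const std::vector<char>& first,
    const std::vector<char>& second,
    std::vector<int>& codes)
{
    // Pairs compare lexicographically, so the map sorts the combinations
    // by the first variable and then by the second.
    std::map<std::pair<char, char>, int> lookup;
    for (std::size_t i = 0; i < first.size(); ++i) {
        lookup[std::make_pair(first[i], second[i])] = 0;
    }

    // Number the unique combinations in sorted order and split them by variable.
    std::vector<std::vector<char> > levels(2);
    for (auto& entry : lookup) {
        entry.second = static_cast<int>(levels[0].size());
        levels[0].push_back(entry.first.first);
        levels[1].push_back(entry.first.second);
    }

    // Replace each observation with the code of its combination.
    codes.resize(first.size());
    for (std::size_t i = 0; i < first.size(); ++i) {
        codes[i] = lookup.at(std::make_pair(first[i], second[i]));
    }
    return levels;
}
```

Under these assumptions, the six observations in the example above collapse into five unique combinations, because `('c', 'A')` occurs twice.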
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
include(FetchContent)
FetchContent_Declare(
factorize
GIT_REPOSITORY https://github.com/libscran/factorize
GIT_TAG master # or any version of interest
)
FetchContent_MakeAvailable(factorize)
Then you can link to factorize to make the headers available during compilation:
# For executables:
target_link_libraries(myexe ltla::factorize)
# For libraries
target_link_libraries(mylib INTERFACE ltla::factorize)
If the library is already installed, find_package() can be used instead:
find_package(ltla_factorize CONFIG REQUIRED)
target_link_libraries(mylib INTERFACE ltla::factorize)
To install the library, use:
mkdir build && cd build
cmake .. -DFACTORIZE_TESTS=OFF
cmake --build . --target install
By default, this will use FetchContent to fetch all external dependencies.
If you want to install them manually, use -DFACTORIZE_FETCH_EXTERN=OFF.
See the tags in extern/CMakeLists.txt to find compatible versions of each dependency.
If you're not using CMake, the simple approach is to just copy the files in include/ - either directly or with Git submodules - and include their path during compilation with, e.g., GCC's -I.
This also requires the external dependencies listed in extern/CMakeLists.txt.