MolDscript is a Python workflow that converts Density Functional Theory (DFT) and related quantum chemistry outputs into descriptor tables ready for machine learning or benchmarking. It wraps cclib, RDKit, DBSTEP, and pandas to align Gaussian, ORCA, and xTB calculations and write consistent molecule-, bond-, and atom-level CSV files.
- Parse optimization, single-point, NBO, NMR, charge, FMO, and Fukui calculations without manual file editing.
- Match conformer ensembles, apply SMARTS-based substructure filters, and compute DBSTEP buried volumes on demand.
- Generate ensembles (Boltzmann-weighted, min/max within population windows, lowest-energy snapshots) in a single run.
- Emit descriptor CSVs alongside module logs (MOLDSCRIPT_*.dat) for traceability.
git clone https://github.com/patonlab/molDscript.git
cd molDscript
pip install -e .

Open Babel (optional) can be installed from conda-forge:

conda install -c conda-forge openbabel

python -m moldscript \
  --opt calculations/opt --fmo calculations/fmo \
  --suffix_fmo fmo_suffix --suffix_nbo nbo_suffix \
  --nbo calculations/nbo

Prefer storing options in a key:value text file? Use --varfile inputs.txt; command-line flags override values loaded from the file.
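
The precedence rule above (file values first, command-line flags on top) can be illustrated with a minimal sketch. This is not MolDscript's actual parser: the function names `parse_varfile` and `merge_options` are hypothetical, and the sketch assumes one `key: value` pair per line, split at the first colon, with unset command-line flags represented as `None`.

```python
def parse_varfile(text):
    """Parse 'key: value' lines into a dict (hypothetical helper).

    Only the first colon splits key from value, so values that contain
    colons are kept whole. Blank lines and lines without a colon are skipped.
    """
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        options[key.strip()] = value.strip()
    return options


def merge_options(file_options, cli_options):
    """Overlay command-line values on file values.

    A command-line value of None is treated as 'flag not given' and
    leaves the file value in place (an assumption of this sketch).
    """
    merged = dict(file_options)
    merged.update({k: v for k, v in cli_options.items() if v is not None})
    return merged


if __name__ == "__main__":
    varfile_text = "opt: calculations/opt\ntemp: 298.15\n"
    from_file = parse_varfile(varfile_text)
    # The command line sets temp but not opt, so only temp is overridden.
    print(merge_options(from_file, {"opt": None, "temp": "310"}))
```

Values are kept as strings here; converting them to numbers or lists would be a separate step.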
- --opt PATH (required) - baseline optimization files and conformer metadata.
- --spc PATH - single-point energies that replace optimization SCF energies.
- --nbo, --nmr, --charges, --fmo PATH - add module-specific descriptors; pair with --suffix_* to specify the filename tokens specific to each calculation type (required for proper conformer matching).
- --fukui_neutral, --fukui_reduced, --fukui_oxidized PATH - supply all three charge states for vertical IE/EA and condensed Fukui functions. Again, pair with --suffix_* for proper conformer matching.
- --substructure SMARTS - limit atom/bond descriptors to a SMARTS match; combine with --volume or --vall and an optional --radius list for DBSTEP buried volumes.
- --boltz, --min_max, --lowe - compute Boltzmann-weighted averages, min/max/range tables (using --cut), and lowest-energy snapshots. Adjust --temp (K) as needed.
- --output PREFIX - prepend to every generated filename; append a slash to target a directory. Use --no_mol, --no_atom, --no_bond, or --no_bond_filter to tailor CSV output.
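
For readers unfamiliar with the arithmetic behind --boltz, the sketch below shows standard Boltzmann weighting of a conformer ensemble. It is an illustration under stated assumptions, not MolDscript's implementation: the function names are hypothetical, energies are assumed to be in hartree, the default temperature of 298.15 K is chosen only for the example, and the numbers in the usage section are made up.

```python
import math

# Boltzmann constant in hartree per kelvin.
K_B_HARTREE = 3.166811563e-6


def boltzmann_weights(energies, temp=298.15):
    """Return normalized Boltzmann populations for conformer energies.

    Energies are shifted by the minimum before exponentiation so the
    lowest-energy conformer has an unnormalized factor of exactly 1.
    """
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / (K_B_HARTREE * temp)) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def weighted_average(values, weights):
    """Population-weighted average of one descriptor across conformers."""
    return sum(v * w for v, w in zip(values, weights))


if __name__ == "__main__":
    # Made-up energies (hartree) and descriptor values for three conformers.
    energies = [-154.0021, -154.0009, -153.9990]
    descriptor = [0.42, 0.47, 0.51]
    weights = boltzmann_weights(energies)
    print(weights)
    print(weighted_average(descriptor, weights))
```

Shifting by the minimum energy does not change the normalized weights; it only avoids exponentiating very large numbers.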
- molecule_level.csv, bond_level.csv, atom_level.csv - aligned descriptors per calculation, bond pair, or atom.
- ensemble_*.csv, boltzmann_weights.csv - created when --boltz is enabled.
- min_max_range_*.csv, lowest_energy_*.csv - created when --min_max or --lowe are requested.
- MOLDSCRIPT_*.dat - per-module logs capturing provenance and CPU-time summaries.
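
The --output PREFIX behaviour (a prefix prepended to each filename, with a trailing slash selecting a directory) is consistent with plain string concatenation. The sketch below assumes exactly that; `output_path` is a hypothetical helper, and whether MolDscript creates a missing directory is not covered here.

```python
def output_path(prefix, filename):
    """Build an output path by prepending the prefix to the filename.

    Assumes simple concatenation: a prefix ending in '/' places the file
    in that directory, any other prefix becomes part of the filename.
    """
    return f"{prefix}{filename}"


if __name__ == "__main__":
    for prefix in ("", "run1_", "results/"):
        print(output_path(prefix, "atom_level.csv"))
```

Under this assumption, the three prefixes give atom_level.csv, run1_atom_level.csv, and results/atom_level.csv respectively.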
The Read the Docs site (coming soon) will provide the full user guide: https://moldscript.readthedocs.io
Key Python dependencies include pandas, cclib (the latest GitHub version, for the most up-to-date package compatibility), dbstep, rdkit, networkx, numpy, and periodictable.
- Gaussian
- ORCA
- xTB (optimizations)
Run pytest -v from the project root to execute the test suite.
This work was carried out in the Paton Laboratory at Colorado State University, supported by the NSF Center for Computer-Assisted Synthesis (grant CHE-1925607).
Contributors include Shree Sowndarya, Jake King, and Robert Paton.
