irmsd
- class irmsd.Molecule(symbols: list[str], positions: numpy.ndarray, energy: float | None = None, info: dict[str, typing.Any] = <factory>, cell: numpy.ndarray | None = None, pbc: tuple[bool, bool, bool] | None = None)
Bases: object
Lightweight replacement for ase.Atoms.
- Raises:
ValueError – If the input data has an incorrect shape or contains unknown element symbols.
Notes
This class is designed as a lightweight alternative to ASE’s Atoms class, to minimize dependencies.
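To make the documented constructor contract concrete, here is a minimal sketch of a dataclass with the same core fields and the ValueError behaviour described above. The class name `MiniMolecule` and its tiny element table are hypothetical illustrations, not the irmsd implementation. `cell` and `pbc` are omitted for brevity.

```python
from __future__ import annotations

from copy import deepcopy
from dataclasses import dataclass, field
from typing import Any

import numpy as np

# Hypothetical, deliberately tiny element table (illustration only).
KNOWN_SYMBOLS = {"H", "C", "N", "O", "F", "S", "Cl"}


@dataclass
class MiniMolecule:
    """Illustrative stand-in for irmsd.Molecule, not the library class."""

    symbols: list[str]
    positions: np.ndarray
    energy: float | None = None
    info: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.positions = np.asarray(self.positions, dtype=np.float64)
        # One Cartesian triple per element symbol.
        if self.positions.shape != (len(self.symbols), 3):
            raise ValueError(
                f"positions must have shape ({len(self.symbols)}, 3), "
                f"got {self.positions.shape}"
            )
        unknown = set(self.symbols) - KNOWN_SYMBOLS
        if unknown:
            raise ValueError(f"unknown element symbols: {sorted(unknown)}")

    @property
    def natoms(self) -> int:
        return len(self.symbols)

    def copy(self) -> MiniMolecule:
        # Deep copy: the positions array, the symbol list and the info dict
        # are all duplicated, so edits to the copy never touch the original.
        return deepcopy(self)
```

Using a dataclass with `field(default_factory=dict)` matches the `info = <factory>` default shown in the signature and avoids sharing one mutable dict between instances.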
- cell: ndarray | None = None
- copy() → Molecule
Return a deep copy of the Molecule.
All arrays (positions, numbers, cell) are copied, and both info and symbols are duplicated, so modifying the copy has no effect on the original.
- energy: float | None = None
- get_axis() → tuple[ndarray, ndarray, ndarray]
Optional utility: calls the core get_axis and returns the rotational constants in MHz, the average momentum in a.u., and the rotation matrix.
- Returns:
rot ((3,) ndarray of float64) – Rotational constants in MHz.
avmom ((1,) ndarray of float64) – Average momentum in a.u.
evec ((3,3) ndarray of float64) – Rotation matrix (eigenvectors).
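Rotational constants follow from the principal moments of inertia through the rigid-rotor relation B = h / (8π²I). The sketch below assumes atomic masses in amu and coordinates in Å, and returns the moments, the constants in MHz and the principal axes. The helper name `rotational_constants` is hypothetical. It does not reproduce the backend's average-momentum output, and the backend's mass table and axis conventions may differ.

```python
import numpy as np

H_PLANCK = 6.62607015e-34      # Planck constant, J s
AMU = 1.66053906660e-27        # atomic mass unit, kg
# h / (8 pi^2) expressed in MHz * amu * Angstrom^2 (about 505379)
B_FACTOR_MHZ = H_PLANCK / (8.0 * np.pi**2) / (AMU * 1e-20) / 1e6


def rotational_constants(masses, positions):
    """Return principal moments (amu*A^2), rotational constants (MHz) and axes."""
    m = np.asarray(masses, dtype=np.float64)
    r = np.asarray(positions, dtype=np.float64)
    r = r - (m[:, None] * r).sum(axis=0) / m.sum()   # centre of mass at origin
    # Inertia tensor: I = sum_k m_k (|r_k|^2 * 1 - r_k r_k^T)
    inertia = (np.sum(m * (r**2).sum(axis=1)) * np.eye(3)
               - np.einsum("k,ki,kj->ij", m, r, r))
    moments, axes = np.linalg.eigh(inertia)          # ascending: I_A <= I_B <= I_C
    with np.errstate(divide="ignore"):
        # Linear molecules have a vanishing moment and hence an infinite constant.
        consts = np.where(moments > 1e-10, B_FACTOR_MHZ / moments, np.inf)
    return moments, consts, axes
```

The columns of `axes` are the principal axes, so `axes` plays the role of a rotation matrix into the principal-axis frame.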
- get_canonical(wbo: ndarray | None = None, invtype: str = 'apsp+', heavy: bool = False) → ndarray
Optional utility: calls the core get_canonical_fortran and returns the rank (and/or invariants, depending on the backend).
- Parameters:
wbo ((N,N) ndarray of float64, optional) – Wiberg bond order matrix; required if invtype is ‘cangen’.
invtype (str, optional) – Algorithm used to compute the invariants (default: ‘apsp+’); alternatively ‘cangen’.
heavy (bool, optional) – Whether to consider only heavy atoms (default: False).
- Returns:
rank – Rank array.
- Return type:
(N,) ndarray of int32
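Canonical ranking gives symmetry-equivalent atoms equal ranks. As a swapped-in illustration, not the ‘apsp+’ or ‘cangen’ algorithms used by the backend, the sketch below performs Morgan/Weisfeiler-Lehman-style colour refinement on an explicit bond graph. The adjacency-matrix input and the function names are assumptions. The result is a set of equivalence classes, without the tie-breaking a full canonical order requires.

```python
def dense_ranks(labels):
    """Map hashable, sortable labels to dense ranks 1..k in sorted label order."""
    index = {lab: k + 1 for k, lab in enumerate(sorted(set(labels)))}
    return [index[lab] for lab in labels]


def refine_ranks(atomic_numbers, adjacency):
    """Iteratively refine atom classes until the partition stops splitting."""
    n = len(atomic_numbers)
    neighbours = [[j for j in range(n) if adjacency[i][j]] for i in range(n)]
    # Initial invariant: element and number of bonded neighbours.
    ranks = dense_ranks([(atomic_numbers[i], len(neighbours[i])) for i in range(n)])
    while True:
        # New label = own rank plus the sorted multiset of neighbour ranks.
        labels = [(ranks[i], tuple(sorted(ranks[j] for j in neighbours[i])))
                  for i in range(n)]
        new_ranks = dense_ranks(labels)
        if len(set(new_ranks)) == len(set(ranks)):
            return new_ranks          # no class was split: partition is stable
        ranks = new_ranks
```

Because each new label contains the previous rank, classes can only split, never merge, so the loop terminates after at most N iterations.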
- get_chemical_formula(mode: str = 'hill') → str
Return a chemical formula string.
- Parameters:
mode (str) – “hill”: C and H first, then the remaining elements in alphabetical order (standard Hill formula); any other value: all elements in alphabetical order.
- Returns:
formula
- Return type:
str
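The standard Hill convention fits in a few lines of pure Python. The hypothetical `hill_formula` helper below follows the usual rule that formulas without carbon are fully alphabetical (so water is H2O). The library’s own handling of that case may differ from this sketch.

```python
from collections import Counter


def hill_formula(symbols):
    """Chemical formula in Hill order (illustrative sketch)."""
    counts = Counter(symbols)
    if "C" in counts:
        # Carbon first, then hydrogen, then all other elements alphabetically.
        order = ["C"] + (["H"] if "H" in counts else [])
        order += sorted(el for el in counts if el not in ("C", "H"))
    else:
        # Without carbon, every element (including H) is alphabetical.
        order = sorted(counts)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)
```

For example, `hill_formula(["C", "C", "O"] + ["H"] * 6)` gives `"C2H6O"` for ethanol.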
- get_cn() → ndarray
Optional utility: calls the core get_cn_fortran and returns a NumPy array with the coordination number of each atom.
- Returns:
cn
- Return type:
(N,) ndarray of float64
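The backend’s exact counting function is not documented here. Assuming a Grimme D3-type definition, CN_A = sum over B of 1 / (1 + exp(-k1 * (k2 * (R_A + R_B) / r_AB - 1))) with k1 = 16 and k2 = 4/3, a NumPy sketch looks like this. The radii table covers only H, C, N and O (Pyykkö single-bond covalent radii in Å), and the function name is hypothetical.

```python
import numpy as np

# Pyykko single-bond covalent radii in Angstrom (illustrative subset).
RCOV = {1: 0.32, 6: 0.75, 7: 0.71, 8: 0.63}
K1 = 16.0        # steepness of the counting function
K2 = 4.0 / 3.0   # scaling of the covalent radii


def d3_coordination_numbers(atom_numbers, positions):
    """Coordination numbers from a smooth D3-type counting function."""
    rcov = K2 * np.array([RCOV[int(z)] for z in atom_numbers])
    r = np.asarray(positions, dtype=np.float64)
    dist = np.linalg.norm(r[:, None, :] - r[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)              # placeholder, avoids 0/0 below
    rsum = rcov[:, None] + rcov[None, :]
    terms = 1.0 / (1.0 + np.exp(-K1 * (rsum / dist - 1.0)))
    np.fill_diagonal(terms, 0.0)             # an atom does not count itself
    return terms.sum(axis=1)
```

The smooth counting function makes the CN a continuous, fractional quantity, which is why the return type is float64 rather than an integer bond count.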
- get_positions(copy: bool = True) → ndarray
Return atomic positions.
- Parameters:
copy (bool) – If True, return a copy of the positions array. If False, return the internal array; modifying it modifies the Molecule.
- Returns:
positions
- Return type:
(N,3) ndarray of float64
- info: dict[str, Any]
- property natoms: int
- pbc: tuple[bool, bool, bool] | None = None
- positions: ndarray
- symbols: list[str]
- irmsd.ase_to_molecule(atoms)
Convert an ASE Atoms object (or a sequence of them) into the internal irmsd.core.Molecule type.
This function is intentionally non-invasive: it does not trigger any new ASE calculator evaluations. It merely extracts whatever structural data and metadata are already present in the ASE object.
- Parameters:
atoms (ase.Atoms or Sequence[ase.Atoms]) – A single ASE Atoms instance or a sequence of them.
- Returns:
If atoms is a single Atoms object, a single Molecule is returned.
If atoms is a sequence of Atoms objects, a list of Molecules is returned in the same order.
- Return type:
Molecule or list[Molecule]
Notes
This routine requires ASE to be installed. If ASE is missing, a clear and controlled error message is raised via require_ase().
This routine does not modify either the input Atoms object or its attached calculator.
The returned Molecule is guaranteed to be fully self-contained and ASE-independent.
- Raises:
RuntimeError – If ASE is not installed.
TypeError – If the input is neither an ASE Atoms instance nor a sequence of them.
- irmsd.compute_axis_and_print(molecule_list: Sequence[Molecule], run_multiple: bool = False) → List[Tuple[ndarray, ndarray]]
Compute the rotational constants, average momentum, and rotation matrix for each structure and print them.
- Parameters:
molecule_list (list[irmsd.Molecule]) – Structures to analyze.
- Returns:
Per structure, one float array with the 3 rotational constants, one float with the average momentum, and one (3, 3) float array with the rotation matrix, in the same order as molecule_list.
- Return type:
list[np.ndarray, np.ndarray, np.ndarray]
- irmsd.compute_canonical_and_print(molecule_list: Sequence[Molecule], heavy: bool = False, run_multiple: bool = False) → List[ndarray]
Compute the canonical atom identifiers for each structure and print them.
- Parameters:
molecule_list (list[irmsd.Molecule]) – Structures to analyze.
heavy (bool) – Consider only heavy atoms.
- Returns:
One integer array with the canonical ranks per structure, in the same order as molecule_list.
- Return type:
list[np.ndarray]
- irmsd.compute_cn_and_print(molecule_list: Sequence[Molecule], run_multiple: bool = False) → List[ndarray]
Compute coordination numbers for each structure and print them.
- Parameters:
molecule_list (list[irmsd.Molecule]) – Structures to analyze.
- Returns:
One float array of coordination numbers per structure, in the same order as molecule_list.
- Return type:
list[np.ndarray]
- irmsd.compute_irmsd_and_print(molecule_list: Sequence[Molecule], inversion=None, outfile=None, idx_ref=0, idx_align=1) → None
Compute the iRMSD between a SINGLE PAIR of molecules and print the iRMSD value.
- Parameters:
molecule_list (list[irmsd.Molecule]) – Structures to analyze. Must contain exactly two structures.
inversion – Controls the inversion handling in the iRMSD routine.
outfile (str or None, optional) – If not None, write the aligned structures to this file.
idx_ref (int, optional) – Index of the reference structure in molecule_list (default: 0).
idx_align (int, optional) – Index of the structure to align in molecule_list (default: 1).
- Return type:
None
- irmsd.compute_quaternion_rmsd_and_print(molecule_list: Sequence[Molecule], heavy=False, outfile=None, idx_ref=0, idx_align=1) → None
Compute the quaternion-based RMSD for a SINGLE PAIR of molecules and print the RMSD between them in Ångström.
- Parameters:
molecule_list (list[irmsd.Molecule]) – Structures to analyze. Must contain exactly two structures.
heavy (bool, optional) – If True, only heavy atoms are considered in the RMSD calculation.
outfile (str or None, optional) – If not None, write the aligned structure to this file.
idx_ref (int, optional) – Index of the reference structure in molecule_list (default: 0).
idx_align (int, optional) – Index of the structure to align in molecule_list (default: 1).
- Return type:
None
- irmsd.cregen(molecule_list: Sequence[Molecule], rthr: float = 0.125, ethr: float = 7.96800686e-05, bthr: float = 0.01, printlvl: int = 0, ewin: float | None = None) → List[Molecule]
High-level wrapper around the Fortran-backed cregen_raw that operates directly on Molecule objects. Returns a pruned, energy-sorted list of structures.
- Parameters:
molecule_list (Sequence[Molecule]) – Sequence of Molecule objects. All molecules must have the same number of atoms.
rthr (float) – Distance threshold for the sorter (passed through to the backend).
ethr (float) – Inter-conformer energy threshold in Hartree (the default of 7.96800686e-05 Eh corresponds to 0.05 kcal/mol).
bthr (float) – Inter-conformer rotational constant threshold (fractional).
iinversion (int, optional) – Inversion symmetry flag, passed through to the backend.
printlvl (int, optional) – Verbosity level, passed through to the backend.
ewin (float | None) – Optional energy window (in Hartree) that limits the ensemble to structures within ewin of the lowest-energy structure.
- Returns:
new_molecule_list – New Molecule objects reconstructed from the sorted atomic numbers and positions returned by the backend. The list contains only the n_structures that are unique according to the selected thresholds.
- Return type:
list[Molecule]
- Raises:
TypeError – If molecule_list does not contain Molecule instances.
ValueError – If molecule_list is empty or if the Molecules do not all have the same number of atoms.
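Because ethr and ewin are given in Hartree, a conversion factor to kcal/mol is handy for choosing values. The hypothetical helper below also illustrates what an energy window means: keep only structures whose energy lies within `ewin` of the lowest one, in ascending energy order. Treating the window boundary as inclusive is an assumption of this sketch, not documented backend behaviour.

```python
import numpy as np

HARTREE_TO_KCAL_MOL = 627.509474  # 1 Eh expressed in kcal/mol


def energy_window(energies, ewin):
    """Indices of structures within `ewin` Hartree of the minimum, lowest first."""
    e = np.asarray(energies, dtype=np.float64)
    order = np.argsort(e, kind="stable")   # ascending energy, ties keep input order
    emin = e[order[0]]
    return [int(i) for i in order if e[i] - emin <= ewin]
```

For example, `7.96800686e-05 * HARTREE_TO_KCAL_MOL` evaluates to about 0.05, and `energy_window([-1.0, -1.002, -0.9], 0.005)` keeps indices 1 and 0.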
- irmsd.delta_irmsd_list_molecule(molecule_list: Sequence[Molecule], iinversion: int = 0, allcanon: bool = True, printlvl: int = 0) → Tuple[ndarray, List[Molecule]]
High-level wrapper around the Fortran-backed delta_irmsd_list that operates directly on Molecule objects.
- Parameters:
molecule_list (Sequence[Molecule]) – Sequence of Molecule objects. All molecules must have the same number of atoms.
iinversion (int, optional) – Inversion symmetry flag, passed through to the backend.
allcanon (bool, optional) – Canonicalization flag, passed through to the backend.
printlvl (int, optional) – Verbosity level, passed through to the backend.
- Returns:
delta (np.ndarray) – Float array returned by the backend (see delta_irmsd_list for detailed semantics).
new_molecule_list (list[Molecule]) – New Molecule objects reconstructed from the atomic numbers and positions returned by the backend. The list has the same length and ordering as molecule_list.
- Raises:
TypeError – If molecule_list does not contain Molecule instances.
ValueError – If molecule_list is empty or if the Molecules do not all have the same number of atoms.
- irmsd.get_axis(atom_numbers: ndarray, positions: ndarray) → Tuple[ndarray, ndarray, ndarray]
Core API: call the Fortran routine to calculate the rotation axis, average moment, and eigenvectors.
- Parameters:
atom_numbers ((N,) int32-like) – Atomic numbers (or types).
positions ((N,3) float64-like) – Cartesian coordinates in Å.
- Returns:
rot ((3,) float64 ndarray) – Rotation axis.
avmom ((1,) float64 ndarray) – Average moment.
evec ((3,3) float64 ndarray) – Eigenvectors.
- Raises:
ValueError – If positions does not have shape (N, 3).
- irmsd.get_axis_ase(atoms) → tuple[ndarray, ndarray, ndarray]
High-level utility: accepts an ASE Atoms object, converts it into an internal Molecule instance, and returns the rotational constants, average angular momentum, and eigenvectors via Molecule.get_axis().
This routine never triggers a new ASE calculator evaluation.
- Parameters:
atoms (ase.Atoms) – A single ASE Atoms object.
- Returns:
rot_constants_MHz, avg_momentum_au, rotation_matrix
- Return type:
tuple[np.ndarray, np.ndarray, np.ndarray]
- irmsd.get_axis_rdkit(molecule, conf_id: None | int | Sequence = None) → Tuple[ndarray, ndarray, ndarray] | List[Tuple[ndarray, ndarray, ndarray]]
Optional RDKit utility: compute principal axes for one or more conformers of a molecule.
- Parameters:
molecule (rdkit.Chem.Mol) – RDKit Molecule object containing conformers.
conf_id (int, list of int, or None, optional) – Conformer ID(s) to compute principal axes for. If None, all conformers are used.
- Returns:
(Rotational constants, average moments, eigenvectors) for each specified conformer.
- Return type:
Tuple[np.ndarray, np.ndarray, np.ndarray] or list of such tuples.
- Raises:
TypeError – If the input is not an RDKit Molecule.
- irmsd.get_canonical_ase(atoms, wbo: ndarray | None = None, invtype: str = 'apsp+', heavy: bool = False) → ndarray
High-level utility: accepts an ASE Atoms object, converts it into an internal Molecule instance, and returns the canonicalization rank / invariants as computed by Molecule.get_canonical().
This routine does not trigger any new ASE calculator evaluation.
- Parameters:
atoms (ase.Atoms) – A single ASE Atoms object.
wbo (np.ndarray or None, optional) – Optional Wiberg bond order matrix (or similar), passed through to Molecule.get_canonical() and ultimately to the Fortran backend.
invtype (str, optional) – Invariant type selector, e.g. “apsp+” (default). Forwarded directly to the canonicalization backend.
heavy (bool, optional) – If True, restricts invariants to heavy atoms only, as defined by the underlying backend. Defaults to False.
- Returns:
Canonicalization rank / invariants array as returned by Molecule.get_canonical().
- Return type:
np.ndarray
- Raises:
RuntimeError – If ASE is not installed.
TypeError – If atoms is not an ASE Atoms instance.
- irmsd.get_canonical_fortran(atom_numbers: ndarray, positions: ndarray, wbo: ndarray | None = None, invtype: str = 'apsp+', heavy: bool = False) → ndarray
Core API: call the Fortran routine to calculate the canonical ranking of atoms.
- Parameters:
atom_numbers ((N,) int32-like) – Atomic numbers (or types).
positions ((N,3) float64-like) – Cartesian coordinates in Å.
heavy (bool, optional) – Whether to consider only heavy atoms (default: False).
wbo ((natoms,natoms) float64, C-contiguous, optional) – Optional Wiberg bond order matrix; required if invtype is ‘cangen’, ignored for ‘apsp+’.
invtype (str, optional) – Algorithm used to compute the invariants (default: ‘apsp+’); alternatively ‘cangen’.
- Returns:
rank – Rank array.
- Return type:
(N,) int32
- Raises:
ValueError – If positions does not have shape (N, 3).
- irmsd.get_canonical_rdkit(molecule, conf_id: None | int | Sequence = None, wbo: None | ndarray = None, invtype='apsp+', heavy: bool = False) → ndarray
Optional RDKit utility: compute canonical atom ranks for one or more conformers of a molecule.
- Parameters:
molecule (rdkit.Chem.Mol) – RDKit Molecule object containing conformers.
conf_id (int, list of int, or None, optional) – Conformer ID(s) to compute canonical representations for. If None, all conformers are used.
wbo (np.ndarray, optional) – Optional Wiberg bond order matrix/matrices for canonicalization; required for the ‘cangen’ invtype. Pass either one matrix per conformer, with shape (n_conf, n_atoms, n_atoms), or a single (n_atoms, n_atoms) matrix that is used for all conformers.
invtype (str, optional) – Type of invariant representation to compute. Default is ‘apsp+’.
heavy (bool, optional) – Whether to consider only heavy atoms in the canonicalization. Default is False.
- Returns:
Canonical ranks for each atom in the specified conformers. If multiple conformers are specified, returns an array of shape (n_conf, n_atoms).
- Return type:
np.ndarray
- Raises:
TypeError – If the input is not an RDKit Molecule.
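The two accepted wbo layouts can be normalised to a single per-conformer stack before they are handed to a backend. This sketch assumes that a C-contiguous float64 array is wanted, as documented for get_canonical_fortran. `expand_wbo` is a hypothetical helper, not part of irmsd.

```python
import numpy as np


def expand_wbo(wbo, n_conf, n_atoms):
    """Return a C-contiguous (n_conf, n_atoms, n_atoms) float64 WBO stack."""
    w = np.asarray(wbo, dtype=np.float64)
    if w.shape == (n_atoms, n_atoms):
        # One matrix shared by all conformers (read-only broadcast view).
        w = np.broadcast_to(w, (n_conf, n_atoms, n_atoms))
    elif w.shape != (n_conf, n_atoms, n_atoms):
        raise ValueError(
            f"wbo must have shape ({n_atoms}, {n_atoms}) or "
            f"({n_conf}, {n_atoms}, {n_atoms}), got {w.shape}"
        )
    # ascontiguousarray copies the broadcast view into real memory and is a
    # no-op for arrays that are already C-contiguous.
    return np.ascontiguousarray(w)
```

Broadcasting avoids building the shared matrix n_conf times by hand while still producing a plain array the backend can index per conformer.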
- irmsd.get_cn_ase(atoms) → ndarray
High-level utility: accepts an ASE Atoms object, converts it into an internal Molecule instance, and returns the coordination-number array as computed by Molecule.get_cn().
This routine does not trigger any new ASE calculator evaluation.
- Parameters:
atoms (ase.Atoms) – A single ASE Atoms object.
- Returns:
Array of coordination numbers with shape (N,).
- Return type:
np.ndarray
- irmsd.get_cn_fortran(atom_numbers: ndarray, positions: ndarray) → ndarray
Core API: call the Fortran routine to calculate the coordination numbers (CN).
- Parameters:
atom_numbers ((N,) int32-like) – Atomic numbers (or types).
positions ((N,3) float64-like) – Cartesian coordinates in Å.
- Returns:
cn – Array with the coordination numbers.
- Return type:
(N,) float64 ndarray
- Raises:
ValueError – If positions does not have shape (N, 3).
- irmsd.get_cn_rdkit(molecule, conf_id: None | int | Sequence = None) → ndarray
Optional RDKit utility: compute coordination numbers for one or more conformers of a molecule.
- Parameters:
molecule (rdkit.Chem.Mol) – RDKit Molecule object containing conformers.
conf_id (int, list of int, or None, optional) – Conformer ID(s) to compute coordination numbers for. If None, all conformers are used.
- Returns:
Coordination numbers for each atom in the specified conformers. If multiple conformers are specified, returns an array of shape (n_conf, n_atoms).
- Return type:
np.ndarray
- Raises:
TypeError – If the input is not an RDKit Molecule.
- irmsd.get_energies_from_molecule_list(molecule_list: Sequence[Molecule]) → ndarray
Collect potential energies from a sequence of Molecule objects.
For each Molecule, this function calls get_potential_energy() and stores the result in a 1D NumPy array of dtype float. If the energy is not available (for example, if get_potential_energy() raises AttributeError or returns None), the corresponding entry is set to 0.0.
- Parameters:
molecule_list (Sequence[Molecule]) – Sequence of Molecule instances.
- Returns:
energies – Array of shape (n_structures,) containing one energy per Molecule.
- Return type:
np.ndarray
- irmsd.get_irmsd(atom_numbers1: ndarray, positions1: ndarray, atom_numbers2: ndarray, positions2: ndarray, iinversion: int = 0) → Tuple[float, ndarray, ndarray, ndarray, ndarray]
Core API: call the Fortran routine to calculate the iRMSD between two structures.
- Parameters:
atom_numbers1 ((N1,) int32-like) – Atomic numbers (or types) of structure 1.
positions1 ((N1,3) float64-like) – Cartesian coordinates of structure 1 in Å.
atom_numbers2 ((N2,) int32-like) – Atomic numbers (or types) of structure 2.
positions2 ((N2,3) float64-like) – Cartesian coordinates of structure 2 in Å.
iinversion (int, optional) – Whether to consider inversion symmetry. Default is 0 (auto). Set to 1 to use inversion, or to 2 to disable it.
- Returns:
rmsdval (float) – The calculated iRMSD value.
Z3 ((N1,) int32 ndarray) – Atomic numbers of the aligned structure 1.
P3 ((N1,3) float64 ndarray) – Aligned and centered coordinates of structure 1.
Z4 ((N2,) int32 ndarray) – Atomic numbers of the aligned structure 2.
P4 ((N2,3) float64 ndarray) – Aligned and centered coordinates of structure 2.
Notes
The returned coordinates P3 and P4 are centered at the origin.
- Raises:
ValueError – If positions1 or positions2 do not have shape (Ni, 3).
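The integer iinversion codes (0 = auto, 1 = on, 2 = off) recur throughout this API, while some of the *_and_print helpers take an `inversion` argument instead. A hypothetical translation helper might look like this. Treating None as ‘auto’ is an assumption, and how irmsd performs this translation internally is not documented here.

```python
INVERSION_FLAGS = {"auto": 0, "on": 1, "off": 2}


def inversion_flag(value=None):
    """Translate None, 'auto'/'on'/'off' or 0/1/2 into an integer iinversion flag."""
    if value is None:
        return INVERSION_FLAGS["auto"]
    if isinstance(value, int):
        if value not in INVERSION_FLAGS.values():
            raise ValueError(f"invalid iinversion flag: {value}")
        return value
    key = str(value).strip().lower()
    if key not in INVERSION_FLAGS:
        raise ValueError(f"invalid inversion setting: {value!r}")
    return INVERSION_FLAGS[key]
```

Validating early gives a clear Python error instead of passing an out-of-range flag to the Fortran backend.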
- irmsd.get_irmsd_ase(atoms1, atoms2, iinversion: int = 0) → Tuple[float, 'ase.Atoms', 'ase.Atoms']
ASE wrapper for get_irmsd_molecule.
Converts two ASE Atoms objects to Molecules, calls get_irmsd_molecule, and converts both resulting Molecules back to ASE Atoms objects.
- Parameters:
atoms1 (ase.Atoms) – First structure.
atoms2 (ase.Atoms) – Second structure.
iinversion (int, optional) – Inversion flag passed through to the backend (0 = ‘auto’, 1 = ‘on’, 2 = ‘off’).
- Returns:
irmsd (float) – iRMSD value in Ångström.
new_atoms1 (ase.Atoms) – New ASE Atoms object corresponding to the transformed first Molecule.
new_atoms2 (ase.Atoms) – New ASE Atoms object corresponding to the transformed second Molecule.
- Raises:
RuntimeError – If ASE is not installed.
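- irmsd.get_irmsd_molecule(molecule1: Molecule, molecule2: Molecule, iinversion: int = 0) → Tuple[float, Molecule, Molecule]
Compute the iRMSD between two Molecule objects using the iRMSD backend.
The backend may reorder atoms and/or change atomic numbers according to its canonicalization / matching logic. This wrapper returns copies of both input Molecules with the updated atomic numbers and positions.
- Parameters:
molecule1 (Molecule) – First structure.
molecule2 (Molecule) – Second structure.
iinversion (int, optional) – Inversion flag passed through to the backend (0 = ‘auto’, 1 = ‘on’, 2 = ‘off’).
- Returns:
irmsd (float) – iRMSD value in Ångström.
new_molecule1 (Molecule) – Copy of molecule1 with the updated atomic numbers and positions.
new_molecule2 (Molecule) – Copy of molecule2 with the updated atomic numbers and positions.
- Raises: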
TypeError – If either input is not a Molecule.
- irmsd.get_irmsd_rdkit(molecule_ref, molecule_align, conf_id_ref=-1, conf_id_align=-1, iinversion: int = 0) → Tuple[float, 'Mol', 'Mol']
Optional RDKit utility: operates on TWO RDKit Molecules. Returns the iRMSD in Ångström and both molecule objects with their conformers permuted and aligned.
- Parameters:
molecule_ref (rdkit.Chem.Mol) – Reference RDKit Molecule object.
molecule_align (rdkit.Chem.Mol) – RDKit Molecule object to be aligned.
conf_id_ref (int, optional) – Conformer ID for the reference molecule. Default is -1 (RDKit default).
conf_id_align (int, optional) – Conformer ID for the molecule to be aligned. Default is -1 (RDKit default).
iinversion (int, optional) – Inversion type for the iRMSD calculation. Default is 0 (0: ‘auto’, 1: ‘on’, 2: ‘off’).
- Returns:
iRMSD in Ångström, the aligned RDKit Molecule object for the reference, and the aligned RDKit Molecule object for the aligned structure.
- Return type:
Tuple[float, rdkit.Chem.Mol, rdkit.Chem.Mol]
- Raises:
TypeError – If the inputs are not RDKit Molecule objects.
- irmsd.get_quaternion_rmsd_fortran(atom_numbers1: ndarray, positions1: ndarray, atom_numbers2: ndarray, positions2: ndarray, mask: ndarray | None = None) → Tuple[float, ndarray, ndarray]
Pair API: call the Fortran routine on TWO structures.
- Parameters:
atom_numbers1 ((N1,) int32-like)
positions1 ((N1,3) float64-like)
atom_numbers2 ((N2,) int32-like)
positions2 ((N2,3) float64-like)
mask ((N1,) bool-like or None)
- Returns:
rmsdval (float64)
new_positions2 ((N2,3) float64, positions2 @ Umat.T)
Umat ((3,3) float64, Fortran-ordered)
Notes
The returned new_positions2 is aligned onto positions1.
1. If mask is provided, only the atoms where mask == True in structure 1 are used to compute the RMSD.
2. The rotation matrix Umat is Fortran-ordered; to rotate positions2, compute new_positions2 = positions2 @ Umat.T.
3. The returned new_positions2 is also translated to have the same barycenter as positions1.
- Raises:
ValueError – If the input arrays do not have the correct shapes or types.
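The quaternion approach takes the optimal rotation from the eigenvector belonging to the largest eigenvalue of a symmetric 4×4 matrix built from the cross-covariance of the two centred structures (the formulation of Coutsias, Seok and Dill). The NumPy sketch below mirrors the documented conventions: new_positions2 = positions2 @ Umat.T, followed by a shift to the barycenter of positions1. It assumes both structures have the same atom count and ordering, and uses mask to select the atoms for the centroids and the fit, while all atoms are transformed; both choices are assumptions. `quaternion_rmsd` is a hypothetical name, not the Fortran routine.

```python
import numpy as np


def quaternion_rmsd(positions1, positions2, mask=None):
    """Superimpose positions2 onto positions1; return (rmsd, new_positions2, U)."""
    p1 = np.asarray(positions1, dtype=np.float64)
    p2 = np.asarray(positions2, dtype=np.float64)
    sel = np.ones(len(p1), dtype=bool) if mask is None else np.asarray(mask, dtype=bool)
    c1, c2 = p1[sel].mean(axis=0), p2[sel].mean(axis=0)
    y = p1[sel] - c1                      # reference (target) coordinates
    x = p2[sel] - c2                      # moving coordinates
    r = x.T @ y                           # R_ij = sum_k x_ki * y_kj
    f = np.array([
        [r[0, 0] + r[1, 1] + r[2, 2], r[1, 2] - r[2, 1], r[2, 0] - r[0, 2], r[0, 1] - r[1, 0]],
        [r[1, 2] - r[2, 1], r[0, 0] - r[1, 1] - r[2, 2], r[0, 1] + r[1, 0], r[0, 2] + r[2, 0]],
        [r[2, 0] - r[0, 2], r[0, 1] + r[1, 0], -r[0, 0] + r[1, 1] - r[2, 2], r[1, 2] + r[2, 1]],
        [r[0, 1] - r[1, 0], r[0, 2] + r[2, 0], r[1, 2] + r[2, 1], -r[0, 0] - r[1, 1] + r[2, 2]],
    ])
    evals, evecs = np.linalg.eigh(f)      # eigenvalues in ascending order
    q0, q1, q2, q3 = evecs[:, -1]         # unit quaternion of the optimal rotation
    u = np.array([
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
        [2*(q1*q2 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q1*q3 - q0*q2), 2*(q2*q3 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ])
    e0 = (x**2).sum() + (y**2).sum()
    rmsd = np.sqrt(max((e0 - 2.0 * evals[-1]) / len(x), 0.0))
    new_p2 = (p2 - c2) @ u.T + c1         # rotate all atoms, move to p1's centroid
    return float(rmsd), new_p2, u
```

The quaternion route needs only one symmetric 4×4 eigendecomposition and always yields a proper rotation (det U = +1), so no separate reflection check is required.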
- irmsd.get_rmsd_ase(atoms1, atoms2, mask=None) → Tuple[float, 'ase.Atoms', np.ndarray]
ASE wrapper for get_rmsd_molecule.
Converts two ASE Atoms objects to internal Molecule objects, calls get_rmsd_molecule, and converts the aligned second structure back to an ASE Atoms object.
- Parameters:
atoms1 (ase.Atoms) – Reference structure.
atoms2 (ase.Atoms) – Structure to be rotated/translated onto atoms1.
mask (array-like of bool, optional) – Optional mask selecting which atoms in the first structure participate in the RMSD (forwarded to the backend via get_rmsd_molecule).
- Returns:
rmsd (float) – RMSD value in Ångström.
new_atoms2 (ase.Atoms) – New ASE Atoms object with coordinates aligned to atoms1.
rotation_matrix (np.ndarray) – 3×3 rotation matrix used for the alignment.
- Raises:
RuntimeError – If ASE is not installed.
TypeError – If inputs are not ASE Atoms.
- irmsd.get_rmsd_molecule(molecule1: Molecule, molecule2: Molecule, mask=None) → Tuple[float, Molecule, ndarray]
Compute the RMSD between two Molecule objects using the quaternion-based RMSD backend, and return an aligned copy of the second Molecule.
- Parameters:
molecule1 (Molecule) – Reference structure.
molecule2 (Molecule) – Structure to be rotated/translated onto molecule1.
mask (array-like of bool, optional) – Optional mask selecting which atoms of molecule1 participate in the RMSD. Must be compatible with the mask semantics of the Fortran backend.
- Returns:
rmsd (float) – Root-mean-square deviation in Ångström.
new_molecule2 (Molecule) – Copy of molecule2 with its positions replaced by the aligned coordinates returned by the backend.
rotation_matrix (np.ndarray) – 3×3 rotation matrix applied to align molecule2 onto molecule1.
- Raises:
TypeError – If either input is not a Molecule.
- irmsd.get_rmsd_rdkit(molecule_ref, molecule_align, conf_id_ref=-1, conf_id_align=-1, mask=None) → Tuple[float, 'Mol', np.ndarray]
Optional RDKit utility: operates on two RDKit Molecules. Returns the RMSD in Ångström, the aligned molecule object, and the rotation matrix.
- Parameters:
molecule_ref (rdkit.Chem.Mol) – Reference RDKit Molecule object.
molecule_align (rdkit.Chem.Mol) – RDKit Molecule object to be aligned.
conf_id_ref (int, optional) – Conformer ID for the reference molecule. Default is -1 (RDKit default).
conf_id_align (int, optional) – Conformer ID for the molecule to be aligned. Default is -1 (RDKit default).
mask (array-like of bool, optional) – Optional mask selecting which atoms of the reference structure participate in the RMSD.
- Returns:
RMSD in Ångström, aligned RDKit Molecule object, and rotation matrix.
- Return type:
Tuple[float, rdkit.Chem.Mol, np.ndarray]
- Raises:
TypeError – If the inputs are not RDKit Molecule objects.
- irmsd.molecule_to_ase(molecules: Molecule | Sequence[Molecule])
Convert an internal irmsd.core.Molecule instance (or a sequence of them) into ASE Atoms objects.
This routine performs a purely structural and metadata-level conversion: it does not create or attach any calculator, nor does it trigger any new ASE calculations.
- Parameters:
molecules (Molecule or Sequence[Molecule]) – A single Molecule or a sequence of Molecule objects.
- Returns:
If molecules is a single Molecule, a single ASE Atoms object is returned.
If molecules is a sequence of Molecule objects, a list of ASE Atoms objects is returned in the same order.
- Return type:
ase.Atoms or list[ase.Atoms]
Notes
This routine requires ASE to be installed. If ASE is missing, a clear RuntimeError is raised via require_ase().
The returned Atoms objects are structurally independent copies; further modifications to the original Molecule will not affect them.
- Raises:
RuntimeError – If ASE is not installed.
TypeError – If the input is neither a Molecule instance nor a sequence of Molecule instances.
- irmsd.molecule_to_rdkit(molecule: Molecule | Sequence[Molecule]) → 'Mol' | list['Mol']
Convert one or more irmsd Molecule objects to one or more RDKit Molecule objects.
- Parameters:
molecule (irmsd.core.Molecule or list of irmsd.core.Molecule) – irmsd Molecule object(s) to convert.
- Returns:
Converted RDKit Molecule object(s).
- Return type:
rdkit.Chem.Mol or list of rdkit.Chem.Mol
- Raises:
TypeError – If the input is not an irmsd Molecule or a list of them.
- irmsd.prune(molecule_list: Sequence[Molecule], rthr: float, iinversion: int = 0, allcanon: bool = True, printlvl: int = 0, ethr: float | None = None, ewin: float | None = None) → List[Molecule]
High-level wrapper around the Fortran-backed sorter_irmsd that operates directly on Molecule objects. Returns a pruned list of structures.
- Parameters:
molecule_list (Sequence[Molecule]) – Sequence of Molecule objects. All molecules must have the same number of atoms.
rthr (float) – Distance threshold for the sorter (passed through to the backend).
iinversion (int, optional) – Inversion symmetry flag, passed through to the backend.
allcanon (bool, optional) – Canonicalization flag, passed through to the backend.
printlvl (int, optional) – Verbosity level, passed through to the backend.
ethr (float | None) – Optional energy threshold that accelerates the sorting through energy-based pre-sorting.
ewin (float | None) – Optional energy window (in Hartree) that limits the ensemble to structures within ewin of the lowest-energy structure.
- Returns:
new_molecule_list – New Molecule objects reconstructed from the sorted atomic numbers and positions returned by the backend. The list contains only the n_structures that are unique according to the selected thresholds.
- Return type:
list[Molecule]
- Raises:
TypeError – If molecule_list does not contain Molecule instances.
ValueError – If molecule_list is empty or if the Molecules do not all have the same number of atoms.
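To illustrate how a distance threshold such as rthr prunes an ensemble, here is a deliberately simplified greedy filter. Structures are visited in ascending energy order and kept only if their best-fit (Kabsch) RMSD to every previously kept structure exceeds the threshold. Unlike the iRMSD backend, it performs no atom permutation, canonicalization or inversion handling, and both function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """Best-fit RMSD of two (N,3) arrays with a fixed atom correspondence."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    u, s, vt = np.linalg.svd(q.T @ p)
    if np.linalg.det(u) * np.linalg.det(vt) < 0.0:
        s[-1] = -s[-1]              # forbid reflections: proper rotations only
    e0 = (p**2).sum() + (q**2).sum()
    return float(np.sqrt(max((e0 - 2.0 * s.sum()) / len(p), 0.0)))


def greedy_prune(structures, energies, rthr):
    """Indices of kept structures, visited in ascending energy order."""
    order = np.argsort(np.asarray(energies, dtype=np.float64), kind="stable")
    kept = []
    for i in order:
        if all(kabsch_rmsd(structures[i], structures[j]) > rthr for j in kept):
            kept.append(int(i))
    return kept
```

Visiting low-energy structures first means that, within each cluster of near-duplicates, the lowest-energy representative survives.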
- irmsd.rdkit_to_molecule(molecules, conf_id: int | Sequence[int] | None = None) → Molecule | list[Molecule]
Convert one or more RDKit Molecule objects to one or more irmsd Molecule objects.
If conf_id is None, all conformers are converted. If conf_id is an int, only that conformer is converted. If conf_id is a list of int, only those conformers are converted.
- Parameters:
molecules (rdkit.Chem.Mol or list of rdkit.Chem.Mol) – RDKit Molecule object(s) to convert.
conf_id (int, list of int, or None, optional) – Conformer ID(s) to convert. If None, all conformers are converted.
- Returns:
Converted irmsd Molecule object(s).
- Return type:
irmsd.core.Molecule or list of irmsd.core.Molecule
- Raises:
TypeError – If the input is not an RDKit Molecule or a list of them, or if any conformer is not 3D.
- irmsd.read_structures(paths: str | Sequence[str]) → List[Molecule]
Read an arbitrary number of structures and return them as Molecule objects.
For each path, this routine behaves as follows:
If the file extension is .xyz or .extxyz, it uses the internal read_extxyz helper to obtain one or more Molecule objects.
For all other file types, it imports ASE via require_ase(), uses ase.io.read to read one or more ASE Atoms objects, and converts them into Molecule objects using ase_to_molecule.
- Multi-frame files:
If a file contains multiple frames, all frames are read and appended to the output list. A short informational message reports the number of frames found.
- Parameters:
paths (str or Sequence[str]) – File path(s) to read.
- Returns:
structures – One Molecule per frame found across all input paths.
- Return type:
list[Molecule]
- irmsd.run_cregen_and_print(molecule_list: Sequence[Molecule], rthr: float, ethr: float, bthr: float, ewin: float | None = None, printlvl: int = 0, maxprint: int = 25, outfile: str | None = None) → None
Convenience wrapper around cregen() from mol_interface. If necessary, the input is first split by sum formula.
- Parameters:
molecule_list (sequence of irmsd.Molecule) – Input structures.
rthr (float) – RMSD threshold for conformer identification.
ethr (float) – Energy threshold for conformer identification.
bthr (float) – Rotational constant threshold for conformer identification.
ewin (float | None, optional) – Optional energy window around the lowest-energy structure, passed through to cregen().
printlvl (int, optional) – Verbosity level, passed through.
maxprint (int, optional) – Maximum number of lines to print in each structure result table.
outfile (str or None, optional) – If not None, write all resulting structures to this file (e.g. ‘sorted.xyz’) using a write function. A suffix is appended to the file name automatically if molecule_list contains more than one type of molecule.
- irmsd.sort_get_delta_irmsd_and_print(molecule_list: Sequence[Molecule], inversion: str = None, allcanon: bool = True, printlvl: int = 0, maxprint: int = 25, outfile: str | None = None) → None
Convenience wrapper around presorted_sort_structures_and_print:
1. Analyzes molecule_list and separates the structures by composition.
2. Sorts them by energy, if applicable.
3. Calculates the iRMSD between consecutive structures x_i and x_(i-1).
- Parameters:
molecule_list (sequence of irmsd.Molecule) – Input structures.
inversion (str, optional) – Inversion symmetry flag, passed through.
allcanon (bool, optional) – Canonicalization flag, passed through.
printlvl (int, optional) – Verbosity level, passed through.
maxprint (int, optional) – Maximum number of lines to print in each structure result table.
outfile (str or None, optional) – If not None, write all resulting structures to this file (e.g. ‘sorted.xyz’) using a write function. A suffix is appended to the file name automatically if molecule_list contains more than one type of molecule.
- irmsd.sort_structures_and_print(molecule_list: Sequence[Molecule], rthr: float, inversion: str = None, allcanon: bool = True, printlvl: int = 0, maxprint: int = 25, ethr: float | None = None, ewin: float | None = None, outfile: str | None = None) → None
Convenience wrapper around presorted_sort_structures_and_print:
1. Analyzes molecule_list and separates the structures by composition.
2. Sorts them by energy, if applicable.
3. Calls presorted_sort_structures_and_print for each group.
- Parameters:
molecule_list (sequence of irmsd.Molecule) – Input structures.
rthr (float | None) – Distance threshold for sorter_irmsd_molecule.
inversion (str, optional) – Inversion symmetry flag, passed through.
allcanon (bool, optional) – Canonicalization flag, passed through.
printlvl (int, optional) – Verbosity level, passed through.
maxprint (int, optional) – Maximum number of lines to print in each structure result table.
ethr (float | None) – Optional inter-conformer energy threshold for more efficient pre-sorting.
ewin (float | None) – Optional energy window that limits the ensemble size around the lowest-energy structure.
outfile (str or None, optional) – If not None, write all resulting structures to this file (e.g. ‘sorted.xyz’) using a write function. A suffix is appended to the file name automatically if molecule_list contains more than one type of molecule.
- irmsd.sorter_irmsd(atom_numbers_list: Sequence[ndarray], positions_list: Sequence[ndarray], nat: int, rthr: float, iinversion: int = 0, allcanon: bool = True, printlvl: int = 0, ethr: float | None = None, energies_list: Sequence[ndarray] | None = None) Tuple[ndarray, List[ndarray], List[ndarray]][source]¶
High-level API: call the sorter_exposed_xyz_fortran Fortran routine.
- Parameters:
atom_numbers_list (sequence of (N,) int32 arrays) – Per-structure atom numbers.
positions_list (sequence of (N,3) float64 arrays) – Per-structure coordinates.
nat (int) – Number of atoms for which the groups array is defined. Must satisfy 1 <= nat <= N.
rthr (float) – Distance threshold for the Fortran sorter, in Ångström.
iinversion (int) – Inversion symmetry flag.
allcanon (bool) – Canonicalization flag.
printlvl (int) – Verbosity level.
ethr (float | None) – Inter-conformer energy threshold (optional), in Hartree.
energies_list (sequence of (Nall,) floats | None) – Energies of the passed structures (optional), in Hartree.
- Returns:
groups ((nat,) int32) – Group index for each of the first nat atoms.
xyz_structs (list of (N,3) float64 arrays) – Updated coordinates for each structure.
Z_structs (list of (N,) int32 arrays) – Updated atom numbers for each structure.
- Raises:
ValueError – If input arrays have inconsistent shapes or invalid parameters.
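The documented shape and dtype requirements can be checked before calling the Fortran routine. Below is a minimal sketch, assuming NumPy is available. prepare_sorter_inputs is a hypothetical helper, not part of irmsd. The sketch also assumes energies_list holds one energy per structure. Whether sorter_irmsd performs these exact checks internally is not stated here.

```python
from typing import List, Optional, Sequence, Tuple

import numpy as np


def prepare_sorter_inputs(
    atom_numbers_list: Sequence[Sequence[int]],
    positions_list: Sequence[Sequence[Sequence[float]]],
    nat: int,
    energies: Optional[Sequence[float]] = None,
) -> Tuple[List[np.ndarray], List[np.ndarray], Optional[np.ndarray]]:
    """Cast inputs to the documented dtypes and validate shapes (hypothetical helper)."""
    # Documented dtypes: (N,) int32 atom numbers and (N, 3) float64 coordinates.
    Z = [np.asarray(z, dtype=np.int32) for z in atom_numbers_list]
    X = [np.asarray(x, dtype=np.float64) for x in positions_list]
    if not Z or len(Z) != len(X):
        raise ValueError("need the same non-zero number of atom-number and position arrays")
    n = Z[0].shape[0]
    for z, x in zip(Z, X):
        if z.shape != (n,) or x.shape != (n, 3):
            raise ValueError(f"expected shapes ({n},) and ({n}, 3), got {z.shape} and {x.shape}")
    # Documented constraint on nat.
    if not 1 <= nat <= n:
        raise ValueError(f"nat must satisfy 1 <= nat <= {n}, got {nat}")
    E = None
    if energies is not None:
        # Assumption: one energy (in Hartree) per structure.
        E = np.asarray(energies, dtype=np.float64)
        if E.shape != (len(Z),):
            raise ValueError("expected exactly one energy per structure")
    return Z, X, E
```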
- irmsd.sorter_irmsd_ase(atoms_list: Sequence['ase.Atoms'], rthr: float = 0.125, iinversion: int = 0, allcanon: bool = True, printlvl: int = 0, ethr: float | None = None, ewin: float | None = None) Tuple[np.ndarray, List['ase.Atoms']]
ASE wrapper for sorter_irmsd_molecule.
Converts a sequence of ASE Atoms objects to Molecules, calls sorter_irmsd_molecule, and converts the resulting Molecules back to ASE Atoms objects.
- Parameters:
atoms_list (Sequence[ase.Atoms]) – Sequence of ASE Atoms objects. All must have the same number of atoms.
rthr (float) – Distance threshold for the sorter.
iinversion (int, optional) – Inversion symmetry flag (0 = 'auto', 1 = 'on', 2 = 'off').
allcanon (bool, optional) – Canonicalization flag.
printlvl (int, optional) – Verbosity level.
ethr (float | None) – Optional energy threshold to accelerate sorting via pre-sorting, in Hartree.
ewin (float | None) – Optional energy window that limits the ensemble size around the lowest-energy structure, in Hartree.
- Returns:
groups (np.ndarray) – Integer array of shape (nat,) with group indices as returned by sorter_irmsd_molecule / the backend.
new_atoms_list (list[ase.Atoms]) – New ASE Atoms objects reconstructed from the sorted Molecules.
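The integer iinversion flag uses the documented codes 0 = 'auto', 1 = 'on' and 2 = 'off', while sort_structures_and_print accepts a string inversion argument. The sketch below shows one way to translate between the two. It assumes the string form uses the same names 'auto', 'on' and 'off', which the reference does not confirm. inversion_flag is a hypothetical helper.

```python
from typing import Optional, Union

# Documented codes for the integer iinversion flag.
_INVERSION_CODES = {"auto": 0, "on": 1, "off": 2}


def inversion_flag(value: Optional[Union[str, int]]) -> int:
    """Map an inversion setting to the integer iinversion code (hypothetical helper)."""
    if value is None:
        return 0  # assumption: "not given" behaves like the default 'auto'
    if isinstance(value, bool):
        raise TypeError("inversion must be a str or an int, not a bool")
    if isinstance(value, int):
        if value not in _INVERSION_CODES.values():
            raise ValueError(f"iinversion must be 0, 1 or 2, got {value}")
        return value
    try:
        return _INVERSION_CODES[value.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown inversion setting {value!r}") from None
```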
- irmsd.sorter_irmsd_molecule(molecule_list: Sequence[Molecule], rthr: float = 0.125, iinversion: int = 0, allcanon: bool = True, printlvl: int = 0, ethr: float | None = None, ewin: float | None = None) Tuple[ndarray, List[Molecule]]
High-level wrapper around the Fortran-backed sorter_irmsd that operates directly on Molecule objects.
- Parameters:
molecule_list (Sequence[Molecule]) – Sequence of Molecule objects. All molecules must have the same number of atoms.
rthr (float) – Distance threshold for the sorter (passed through to the backend).
iinversion (int, optional) – Inversion symmetry flag, passed through to the backend.
allcanon (bool, optional) – Canonicalization flag, passed through to the backend.
printlvl (int, optional) – Verbosity level, passed through to the backend.
ethr (float | None) – Optional energy threshold to accelerate sorting via pre-sorting.
ewin (float | None) – Optional energy window that limits the ensemble size around the lowest-energy structure, in Hartree.
- Returns:
groups (np.ndarray) – Integer array of shape (nat,) with group indices for the first nat atoms (as defined by the backend).
new_molecule_list (list[Molecule]) – New Molecule objects reconstructed from the sorted atomic numbers and positions returned by the backend. The list has the same length and ordering as molecule_list.
- Raises:
TypeError – If molecule_list does not contain Molecule instances.
ValueError – If molecule_list is empty or if the Molecules do not all have the same number of atoms.
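The documented error conditions can be mirrored in a short sketch, together with one plausible reading of ewin. The sketch uses a hypothetical MoleculeStub class in place of irmsd.Molecule. It assumes that ewin keeps every structure whose energy lies within ewin Hartree of the lowest energy. check_molecule_list and apply_energy_window are illustrative names, not library functions.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class MoleculeStub:
    """Hypothetical minimal stand-in for irmsd.Molecule (symbols and energy only)."""
    symbols: List[str]
    energy: Optional[float] = None


def check_molecule_list(molecule_list: Sequence[object]) -> None:
    """Raise the documented TypeError / ValueError for invalid input."""
    if len(molecule_list) == 0:
        raise ValueError("molecule_list is empty")
    if not all(isinstance(m, MoleculeStub) for m in molecule_list):
        raise TypeError("molecule_list must contain only Molecule instances")
    natoms = {len(m.symbols) for m in molecule_list}
    if len(natoms) != 1:
        raise ValueError(f"all molecules must have the same number of atoms, got {sorted(natoms)}")


def apply_energy_window(molecule_list: Sequence[MoleculeStub], ewin: Optional[float]) -> List[MoleculeStub]:
    """Keep structures within ewin (Hartree) of the lowest energy (assumed semantics)."""
    # Without a window, or with missing energies, leave the ensemble unchanged.
    if ewin is None or not molecule_list or any(m.energy is None for m in molecule_list):
        return list(molecule_list)
    emin = min(m.energy for m in molecule_list)
    return [m for m in molecule_list if m.energy - emin <= ewin]
```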
- irmsd.sorter_irmsd_rdkit(molecules: 'Mol' | Sequence['Mol'], rthr: float = 0.125, iinversion: int = 0, allcanon: bool = True, printlvl: int = 0, ethr: float | None = None, ewin: float | None = None) Tuple[np.ndarray, List['Mol']]
Optional RDKit utility: operates on a single RDKit molecule or a list of them and sorts the structures based on iRMSD. Returns the group indices together with the sorted RDKit molecules.
- Parameters:
molecules (rdkit.Chem.Mol or list of rdkit.Chem.Mol) – RDKit Molecule object(s) containing multiple conformers.
rthr (float) – iRMSD threshold for grouping.
iinversion (int, optional) – Inversion type for the iRMSD calculation (0: 'auto', 1: 'on', 2: 'off'). Default is 0.
allcanon (bool, optional) – Canonicalization flag, passed through to the backend.
printlvl (int, optional) – Verbosity level, passed through to the backend.
ethr (float | None) – Optional energy threshold to accelerate sorting via pre-sorting.
ewin (float | None) – Optional energy window that limits the ensemble size around the lowest-energy structure, in Hartree.
- Returns:
groups (np.ndarray) – Integer array of shape (nat,) with group indices for the first nat atoms (as defined by the backend).
new_molecules_list (list of rdkit.Chem.Mol) – List of RDKit Molecule objects corresponding to the sorted molecules.
- Raises:
TypeError – If the input is not an RDKit Molecule or a list of them.
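Accepting either a single Mol or a list of them, and raising TypeError otherwise, is a common normalization step. The sketch below shows that pattern generically so that it runs without RDKit installed. With RDKit, the class argument would be rdkit.Chem.Mol. as_molecule_list is a hypothetical helper and does not describe how irmsd implements this check.

```python
from typing import List, Sequence, Type, TypeVar, Union

T = TypeVar("T")


def as_molecule_list(molecules: Union[T, Sequence[T]], cls: Type[T]) -> List[T]:
    """Return a list of cls instances from a single object or a list/tuple of them."""
    if isinstance(molecules, cls):
        return [molecules]
    if isinstance(molecules, (list, tuple)) and all(isinstance(m, cls) for m in molecules):
        return list(molecules)
    raise TypeError(f"expected a {cls.__name__} or a list of them, got {type(molecules).__name__}")
```

With RDKit available, each normalized molecule's conformers can then be iterated with mol.GetConformers(), and conf.GetPositions() returns the coordinates of one conformer.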
- irmsd.write_structures(filename: str | Path, structures: Molecule | Sequence[Molecule], mode: str = 'w') None
High-level structure writer for the irmsd Molecule type.
This routine mirrors the behaviour of read_structures on the output side: it chooses the appropriate backend based on the file extension and accepts either a single Molecule or a sequence of Molecule objects.
Dispatch rules
If the filename has no extension, ‘.xyz’ is appended and the internal extended-XYZ writer is used.
If the filename ends with ‘.xyz’, ‘.extxyz’ or ‘.trj’ (case-insensitive), the structures are written using the internal extended-XYZ writer (write_extxyz). The mode argument is passed through and controls whether the file is overwritten (‘w’, default) or appended to (‘a’).
For all other filename extensions, ASE is used as a backend. The structures are first converted to ASE Atoms objects using molecule_to_ase, and then written via ase.io.write. In this case the mode argument is currently ignored and ASE’s default behaviour for the chosen format is used.
- Parameters:
filename (str or pathlib.Path) – Output filename. Its extension determines the backend.
structures (Molecule or Sequence[Molecule]) – A single Molecule or a sequence of Molecules to be written. For extended XYZ, multiple Molecules are written as consecutive frames in one file.
mode ({"w", "a"}, optional) – File open mode for extended XYZ output. Ignored for non-XYZ formats handled via ASE.
- Raises:
RuntimeError – If ASE is required (non-XYZ formats) but not installed.
TypeError – If structures is not a Molecule or a sequence of Molecules.
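The extension-based dispatch rules of write_structures can be expressed as a small pure function that only decides which backend would be used and does not write anything. choose_backend is a hypothetical helper. The backend labels 'extxyz' and 'ase' are illustrative strings, not library values.

```python
from pathlib import Path
from typing import Tuple, Union

# Extensions handled by the internal extended-XYZ writer (compared case-insensitively).
_XYZ_EXTENSIONS = {".xyz", ".extxyz", ".trj"}


def choose_backend(filename: Union[str, Path]) -> Tuple[Path, str]:
    """Return the (possibly adjusted) output path and the backend label."""
    path = Path(filename)
    if path.suffix == "":
        # No extension: append '.xyz' and use the internal writer.
        return path.with_name(path.name + ".xyz"), "extxyz"
    if path.suffix.lower() in _XYZ_EXTENSIONS:
        return path, "extxyz"
    # Any other extension would go through ASE (ase.io.write); mode is ignored there.
    return path, "ase"
```

Path.suffix only inspects the last extension. A name such as "run.1.xyz" is therefore treated as ".xyz" in this sketch.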