irmsd

class irmsd.Molecule(symbols: list[str], positions: ~numpy.ndarray, energy: float | None = None, info: dict[str, ~typing.Any] = <factory>, cell: ~numpy.ndarray | None = None, pbc: tuple[bool, bool, bool] | None = None)[source]

Bases: object

Lightweight replacement for ase.Atoms.

Raises:

ValueError – If input data has incorrect shape or contains unknown symbols.

Notes

This class is designed to be a lightweight alternative to ASE’s Atoms class to minimize dependencies.

cell: ndarray | None = None
copy() Molecule[source]

Return a deep copy of the Molecule.

All arrays (positions, numbers, cell) are copied, and both info and symbols are duplicated so that modifying the copy has no effect on the original.

energy: float | None = None
get_atomic_numbers() ndarray[source]

Return atomic numbers as int32 array.

get_axis() tuple[ndarray, ndarray, ndarray][source]

Optional utility: calls core get_axis and returns rotation constants in MHz, average momentum in a.u., and the rotation matrix.

Returns:

  • rot ((3,) ndarray of float64) – Rotation constants in MHz.

  • avmom ((1,) ndarray of float64) – Average momentum in a.u.

  • evec ((3, 3) ndarray of float64) – Rotation matrix.

get_canonical(wbo: ndarray | None = None, invtype: str = 'apsp+', heavy: bool = False) ndarray[source]

Optional utility: calls core get_canonical_fortran and returns the rank (and/or invariants, depending on backend).

Parameters:
  • wbo ((N, N) ndarray of float64, optional) – Wiberg bond order matrix, required if invtype is ‘cangen’.

  • invtype (str, optional) – Algorithm type for invariants calculation (default: ‘apsp+’), alternatively ‘cangen’.

  • heavy (bool, optional) – Whether to consider only heavy atoms (default: False).

Returns:

rank – Rank array.

Return type:

(N,) ndarray of int32

get_chemical_formula(mode: str = 'hill') str[source]

Return a chemical formula string.

Parameters:

mode (str) – Ordering of the elements in the formula:

  • “hill” → C and H first, then the remaining elements alphabetically (standard Hill formula).

  • any other value → all elements in alphabetical order.

Returns:

formula

Return type:

str
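
The Hill ordering described for mode can be illustrated with a short standalone sketch. It is not the code used by irmsd: the helper name hill_formula is hypothetical, and both the all-alphabetical ordering for carbon-free formulas and the omission of counts equal to 1 are assumptions taken from the standard Hill convention.

```python
from collections import Counter


def hill_formula(symbols):
    """Build a formula string in Hill order (illustrative, not irmsd's code)."""
    counts = Counter(symbols)
    if "C" in counts:
        # Hill system: carbon first, hydrogen second, the rest alphabetically.
        first = [s for s in ("C", "H") if s in counts]
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = first + rest
    else:
        # Assumption: without carbon, all elements (hydrogen included)
        # are listed alphabetically.
        order = sorted(counts)
    # Assumption: a count of 1 is left implicit.
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


print(hill_formula(["C", "H", "H", "H", "O", "H"]))
print(hill_formula(["H", "O", "H"]))
```

The output of Molecule.get_chemical_formula() may differ in details such as how single atoms are written.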

get_chemical_symbols() list[str][source]
get_cn() ndarray[source]

Optional utility: calls core get_cn_fortran and returns a numpy array with the coordination numbers per atom.

Returns:

cn

Return type:

(N,) ndarray of float64

get_positions(copy: bool = True) ndarray[source]

Return atomic positions.

Parameters:

copy (bool) – If True, return a copy of the positions array. If False, return the internal array (may be modified).

Returns:

positions

Return type:

(N, 3) ndarray of float64

get_potential_energy() float[source]
info: dict[str, Any]
property natoms: int
pbc: tuple[bool, bool, bool] | None = None
positions: ndarray
set_atomic_numbers(numbers: Sequence[int]) None[source]
set_positions(positions: Sequence[Sequence[float]]) None[source]
set_potential_energy(energy: float | None)[source]
symbols: list[str]
irmsd.ase_to_molecule(atoms)[source]

Convert an ASE Atoms object (or a sequence of them) into the internal irmsd.core.Molecule type.

This function is intentionally non-invasive: it does not trigger any new ASE calculator evaluations. It merely extracts whatever structural and metadata information is already present in the ASE object.

Parameters:

atoms (ase.Atoms or Sequence[ase.Atoms]) – A single ASE Atoms instance or a sequence of them.

Returns:

  • If atoms is a single Atoms object, a single Molecule is returned.

  • If atoms is a sequence of Atoms objects, a list of Molecules is returned in the same order.

Return type:

Molecule or list[Molecule]

Notes

  • This routine requires ASE to be installed. If ASE is missing, a clear and controlled error message is raised via require_ase().

  • This routine does not modify either the input Atoms object or its attached calculator.

  • The returned Molecule is guaranteed to be fully self-contained and ASE-independent.

Raises:
  • RuntimeError – If ASE is not installed.

  • TypeError – If the input is neither an ASE Atoms instance nor a sequence of them.

irmsd.compute_axis_and_print(molecule_list: Sequence[Molecule], run_multiple: bool = False) List[Tuple[ndarray, ndarray]][source]

Compute rotational constants, average momentum, and rotation matrix for each structure and print them.

Parameters:

molecule_list (list[irmsd.Molecule]) – Structures to analyze.

Returns:

One float array with the 3 rotational constants, one float with the average momentum and one float array with the rotation matrix (3, 3) per structure, same order as molecule_list.

Return type:

list[np.ndarray, np.ndarray, np.ndarray]

irmsd.compute_canonical_and_print(molecule_list: Sequence[Molecule], heavy: bool = False, run_multiple: bool = False) List[ndarray][source]

Computes the canonical atom identifiers for each structure and prints them.

Parameters:
  • molecule_list (list[irmsd.Molecule]) – Structures to analyze.

  • heavy (bool) – Consider only heavy atoms

Returns:

One integer array with the canonical ranks per structure, same order as molecule_list.

Return type:

list[np.ndarray]

irmsd.compute_cn_and_print(molecule_list: Sequence[Molecule], run_multiple: bool = False) List[ndarray][source]

Compute coordination numbers for each structure and print them.

Parameters:

molecule_list (list[irmsd.Molecule]) – Structures to analyze.

Returns:

One float array of coordination numbers per structure, same order as molecule_list.

Return type:

list[np.ndarray]

irmsd.compute_irmsd_and_print(molecule_list: Sequence[Molecule], inversion=None, outfile=None, idx_ref=0, idx_align=1) None[source]

Computes the iRMSD between a SINGLE PAIR of molecules and prints the iRMSD value.

Parameters:
  • molecule_list (list[irmsd.Molecule]) – Structures to analyze. Must contain exactly two structures.

  • inversion – parameter to instruct inversion in iRMSD routine

  • outfile (str or None, optional) – If not None, write the aligned structures to this file.

  • idx_ref (int, optional) – Index of the reference structure in molecule_list (default: 0).

  • idx_align (int, optional) – Index of the structure to align in molecule_list (default: 1).

Return type:

None

irmsd.compute_quaternion_rmsd_and_print(molecule_list: Sequence[Molecule], heavy=False, outfile=None, idx_ref=0, idx_align=1) None[source]

Computes the quaternion-based RMSD between a SINGLE PAIR of molecules and prints the value in Ångström.

Parameters:
  • molecule_list (list[irmsd.Molecule]) – Structures to analyze. Must contain exactly two structures.

  • heavy (bool, optional) – If True, only heavy atoms are considered in the RMSD calculation.

  • outfile (str or None, optional) – If not None, write the aligned structure to this file.

  • idx_ref (int, optional) – Index of the reference structure in molecule_list (default: 0).

  • idx_align (int, optional) – Index of the structure to align in molecule_list (default: 1).

Return type:

None

irmsd.cregen(molecule_list: Sequence[Molecule], rthr: float = 0.125, ethr: float = 7.96800686e-05, bthr: float = 0.01, printlvl: int = 0, ewin: float | None = None) List[Molecule][source]

High-level wrapper around the Fortran-backed cregen_raw that operates directly on Molecule objects. Returns a pruned & energy-sorted list of structures.

Parameters:
  • molecule_list (Sequence[Molecule]) – Sequence of Molecule objects. All molecules must have the same number of atoms.

  • rthr (float) – Distance threshold for the sorter (passed through to the backend).

  • ethr (float) – Inter-conformer energy threshold (in Hartree)

  • bthr (float) – Inter-conformer rotational constant threshold (fractional)

  • iinversion (int, optional) – Inversion symmetry flag, passed through to the backend.

  • printlvl (int, optional) – Verbosity level, passed through to the backend.

  • ewin (float | None) – Optional energy window to limit the ensemble size around the lowest-energy structure. In Hartree.

Returns:

new_molecule_list – New Molecule objects reconstructed from the sorted atomic numbers and positions returned by the backend. The list contains only the structures identified as unique according to the selected thresholds.

Return type:

list[Molecule]

Raises:
  • TypeError – If molecule_list does not contain Molecule instances.

  • ValueError – If molecule_list is empty or if the Molecules do not all have the same number of atoms.
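
Because ethr and ewin are given in Hartree, thresholds that are known in kcal/mol have to be converted first. The sketch below shows one way to do this. The helper name is hypothetical, and reading the default ethr as 0.05 kcal/mol is an inference from its numerical value, not something the documentation states.

```python
# 1 Hartree is approximately 627.5095 kcal/mol.
KCAL_PER_MOL_PER_HARTREE = 627.5094740631


def kcal_per_mol_to_hartree(value_kcal_per_mol):
    """Convert an energy from kcal/mol to Hartree (hypothetical helper)."""
    return value_kcal_per_mol / KCAL_PER_MOL_PER_HARTREE


# 0.05 kcal/mol lands very close to the documented default ethr.
ethr = kcal_per_mol_to_hartree(0.05)
# A 6 kcal/mol window is an arbitrary illustration value.
ewin = kcal_per_mol_to_hartree(6.0)

print(f"ethr = {ethr:.8e} Hartree")
print(f"ewin = {ewin:.8e} Hartree")
```

The converted values could then be passed as the ethr and ewin arguments of cregen.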

irmsd.delta_irmsd_list_molecule(molecule_list: Sequence[Molecule], iinversion: int = 0, allcanon: bool = True, printlvl: int = 0) Tuple[ndarray, List[Molecule]][source]

High-level wrapper around the Fortran-backed delta_irmsd_list that operates directly on Molecule objects.

Parameters:
  • molecule_list (Sequence[Molecule]) – Sequence of Molecule objects. All molecules must have the same number of atoms.

  • iinversion (int, optional) – Inversion symmetry flag, passed through to the backend.

  • allcanon (bool, optional) – Canonicalization flag, passed through to the backend.

  • printlvl (int, optional) – Verbosity level, passed through to the backend.

Returns:

  • delta (np.ndarray) – Float array returned by the backend (see delta_irmsd_list for detailed semantics).

  • new_molecule_list (list[Molecule]) – New Molecule objects reconstructed from the atomic numbers and positions returned by the backend. The list has the same length and ordering as molecule_list.

Raises:
  • TypeError – If molecule_list does not contain Molecule instances.

  • ValueError – If molecule_list is empty or if the Molecules do not all have the same number of atoms.

irmsd.get_axis(atom_numbers: ndarray, positions: ndarray) Tuple[ndarray, ndarray, ndarray][source]

Core API: call the Fortran routine to calculate the rotation axis, average moment, and eigenvectors

Parameters:
  • atom_numbers ((N,) int32-like) – Atomic numbers (or types).

  • positions ((N, 3) float64-like) – Cartesian coordinates in Å.

Returns:

  • rot ((3,) float64 ndarray) – Rotation axis.

  • avmom ((1,) float64 ndarray) – Average moment.

  • evec ((3, 3) float64 ndarray) – Eigenvectors.

Raises:

ValueError – If positions does not have shape (N, 3).

irmsd.get_axis_ase(atoms) tuple[ndarray, ndarray, ndarray][source]

High-level utility: accepts an ASE Atoms object, converts it into an internal Molecule instance, and returns rotation constants, average angular momentum, and eigenvectors via Molecule.get_axis().

This routine never triggers a new ASE calculator evaluation.

Parameters:

atoms (ase.Atoms) – A single ASE Atoms object.

Returns:

rot_constants_MHz, avg_momentum_au, rotation_matrix

Return type:

tuple[np.ndarray, np.ndarray, np.ndarray]

irmsd.get_axis_rdkit(molecule, conf_id: None | int | Sequence = None) Tuple[ndarray, ndarray, ndarray] | List[Tuple[ndarray, ndarray, ndarray]][source]

Optional RDKit utility: compute principal axes for one or more conformers of a molecule.

Parameters:
  • molecule (rdkit.Chem.Mol) – RDKit Molecule object containing conformers.

  • conf_id (int, list of int, or None, optional) – Conformer ID(s) to compute principal axes for. If None, all conformers are used.

Returns:

(Rotational constants, average moments, eigenvectors) for each specified conformer.

Return type:

Tuple[np.ndarray, np.ndarray, np.ndarray] or list of such tuples.

Raises:

TypeError – If the input is not an RDKit Molecule.

irmsd.get_canonical_ase(atoms, wbo: ndarray | None = None, invtype: str = 'apsp+', heavy: bool = False) ndarray[source]

High-level utility: accepts an ASE Atoms object, converts it into an internal Molecule instance, and returns the canonicalization rank / invariants as computed by Molecule.get_canonical().

This routine does not trigger any new ASE calculator evaluation.

Parameters:
  • atoms (ase.Atoms) – A single ASE Atoms object.

  • wbo (np.ndarray or None, optional) – Optional Wiberg bond order matrix or similar, passed through to Molecule.get_canonical() and ultimately to the Fortran backend.

  • invtype (str, optional) – Invariant type selector, e.g. “apsp+” (default). Forwarded directly to the canonicalization backend.

  • heavy (bool, optional) – If True, restricts invariants to heavy atoms only, as defined by the underlying backend. Defaults to False.

Returns:

Canonicalization rank / invariants array as returned by Molecule.get_canonical().

Return type:

np.ndarray

Raises:
  • RuntimeError – If ASE is not installed.

  • TypeError – If atoms is not an ASE Atoms instance.

irmsd.get_canonical_fortran(atom_numbers: ndarray, positions: ndarray, wbo: ndarray | None = None, invtype: str = 'apsp+', heavy: bool = False) ndarray[source]

Core API: call the Fortran routine to calculate the canonical ranking of atoms

Parameters:
  • atom_numbers ((N,) int32-like) – Atomic numbers (or types).

  • positions ((N, 3) float64-like) – Cartesian coordinates in Å.

  • heavy (bool, optional) – Whether to consider only heavy atoms (default: False).

  • wbo ((natoms, natoms) float64, C-contiguous, optional) – Optional Wiberg bond order matrix, required if invtype is ‘cangen’, ignored in case of ‘apsp+’.

  • invtype (str, optional) – Algorithm type for invariants calculation (default: ‘apsp+’), alternatively ‘cangen’.

Returns:

rank – Rank array.

Return type:

(N,) int32

Raises:

ValueError – If positions does not have shape (N, 3).

irmsd.get_canonical_rdkit(molecule, conf_id: None | int | Sequence = None, wbo: None | ndarray = None, invtype='apsp+', heavy: bool = False) ndarray[source]

Optional RDKit utility: compute canonical ranks for one or more conformers of a molecule.

Parameters:
  • molecule (rdkit.Chem.Mol) – RDKit Molecule object containing conformers.

  • conf_id (int, list of int, or None, optional) – Conformer ID(s) to compute canonical representations for. If None, all conformers are used.

  • wbo (np.ndarray, optional) – Optional Wiberg bond order matrix or matrices for canonicalization, required for the ‘cangen’ invtype. Pass either one matrix per conformer, with shape (n_conf, n_atoms, n_atoms), or a single matrix of shape (n_atoms, n_atoms) that is used for all conformers.

  • invtype (str, optional) – Type of invariant representation to compute. Default is ‘apsp+’.

  • heavy (bool, optional) – Whether to consider only heavy atoms in the canonicalization. Default is False.

Returns:

Canonical ranks for each atom in the specified conformers. If multiple conformers are specified, returns an array of shape (n_conf, n_atoms).

Return type:

np.ndarray

Raises:

TypeError – If the input is not an RDKit Molecule.

irmsd.get_cn_ase(atoms) ndarray[source]

High-level utility: accepts an ASE Atoms object, converts it into an internal Molecule instance, and returns the coordination-number array as computed by Molecule.get_cn().

This routine does not trigger any new ASE calculator evaluation.

Parameters:

atoms (ase.Atoms) – A single ASE Atoms object.

Returns:

Array of coordination numbers with shape (N,).

Return type:

np.ndarray

irmsd.get_cn_fortran(atom_numbers: ndarray, positions: ndarray) ndarray[source]

Core API: call the Fortran routine to calculate CN

Parameters:
  • atom_numbers ((N,) int32-like) – Atomic numbers (or types).

  • positions ((N, 3) float64-like) – Cartesian coordinates in Å.

Returns:

cn – array with coordination numbers

Return type:

(N,) float64 ndarray

Raises:

ValueError – If positions does not have shape (N, 3).

irmsd.get_cn_rdkit(molecule, conf_id: None | int | Sequence = None) ndarray[source]

Optional RDKit utility: compute coordination numbers for one or more conformers of a molecule.

Parameters:
  • molecule (rdkit.Chem.Mol) – RDKit Molecule object containing conformers.

  • conf_id (int, list of int, or None, optional) – Conformer ID(s) to compute coordination numbers for. If None, all conformers are used.

Returns:

Coordination numbers for each atom in the specified conformers. If multiple conformers are specified, returns an array of shape (n_conf, n_atoms).

Return type:

np.ndarray

Raises:

TypeError – If the input is not an RDKit Molecule.

irmsd.get_energies_from_molecule_list(molecule_list: Sequence[Molecule]) ndarray[source]

Collect potential energies from a sequence of Molecule objects.

For each Molecule, this function calls get_potential_energy() and stores the result in a 1D NumPy array of dtype float. If the energy is not available (for example, if get_potential_energy() raises AttributeError or returns None), the corresponding entry is set to 0.0.

Parameters:

molecule_list (Sequence[Molecule]) – Sequence of Molecule instances.

Returns:

energies – Array of shape (n_structures,) containing one energy per Molecule.

Return type:

np.ndarray
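
The fallback rule described above (0.0 when no energy is available) can be sketched without irmsd installed. FakeMolecule and collect_energies are hypothetical names used only for this illustration, and the sketch returns a plain list where the real function returns a NumPy array.

```python
class FakeMolecule:
    """Hypothetical stand-in that only exposes get_potential_energy()."""

    def __init__(self, energy=None, broken=False):
        self._energy = energy
        self._broken = broken

    def get_potential_energy(self):
        if self._broken:
            raise AttributeError("no energy stored")
        return self._energy


def collect_energies(molecules):
    """Return one float per molecule, using 0.0 when no energy is available."""
    energies = []
    for mol in molecules:
        try:
            energy = mol.get_potential_energy()
        except AttributeError:
            energy = None
        # Both a missing energy and an AttributeError map to 0.0.
        energies.append(0.0 if energy is None else float(energy))
    return energies


mols = [FakeMolecule(-1.5), FakeMolecule(None), FakeMolecule(broken=True)]
print(collect_energies(mols))
```

One consequence of this rule is that a structure without an energy cannot be told apart from one whose energy is exactly 0.0.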

irmsd.get_irmsd(atom_numbers1: ndarray, positions1: ndarray, atom_numbers2: ndarray, positions2: ndarray, iinversion: int = 0) Tuple[float, ndarray, ndarray, ndarray, ndarray][source]

Core API: call the Fortran routine to calculate the iRMSD between two structures

Parameters:
  • atom_numbers1 ((N1,) int32-like) – Atomic numbers (or types) of structure 1.

  • positions1 ((N1, 3) float64-like) – Cartesian coordinates in Å of structure 1.

  • atom_numbers2 ((N2,) int32-like) – Atomic numbers (or types) of structure 2.

  • positions2 ((N2, 3) float64-like) – Cartesian coordinates in Å of structure 2.

  • iinversion (int, optional) – Whether to consider inversion symmetry. Default is 0 (auto). Set to 1 to use inversion, set to 2 to disable inversion.

Returns:

  • rmsdval (float) – The calculated iRMSD value.

  • Z3 ((N1,) int32 ndarray) – Atomic numbers of the aligned structure 1.

  • P3 ((N1, 3) float64 ndarray) – Aligned and centered coordinates of structure 1.

  • Z4 ((N2,) int32 ndarray) – Atomic numbers of the aligned structure 2.

  • P4 ((N2, 3) float64 ndarray) – Aligned and centered coordinates of structure 2.

Notes

The returned coordinates P3 and P4 are centered at the origin.

Raises:

ValueError – If positions1 or positions2 do not have shape (Ni, 3).
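
The integer codes for iinversion (0 = auto, 1 = on, 2 = off) can be kept readable with a small lookup. This is a sketch only: inversion_to_code is a hypothetical helper, and treating None as ‘auto’ is an assumption.

```python
# Integer codes as documented for iinversion: 0 = auto, 1 = on, 2 = off.
INVERSION_CODES = {"auto": 0, "on": 1, "off": 2}


def inversion_to_code(inversion=None):
    """Map a setting name to the integer flag (hypothetical helper)."""
    if inversion is None:
        # Assumption: no explicit setting means the default, 'auto'.
        return INVERSION_CODES["auto"]
    key = inversion.strip().lower()
    if key not in INVERSION_CODES:
        raise ValueError(f"unknown inversion setting: {inversion!r}")
    return INVERSION_CODES[key]


print(inversion_to_code("on"))
print(inversion_to_code(None))
```

The resulting integer could be passed as the iinversion argument of get_irmsd.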

irmsd.get_irmsd_ase(atoms1, atoms2, iinversion: int = 0) Tuple[float, 'ase.Atoms', 'ase.Atoms'][source]

ASE wrapper for get_irmsd_molecule.

Converts two ASE Atoms objects to Molecules, calls get_irmsd_molecule, and converts both resulting Molecules back to ASE Atoms objects.

Parameters:
  • atoms1 (ase.Atoms) – First structure.

  • atoms2 (ase.Atoms) – Second structure.

  • iinversion (int, optional) – Inversion flag passed through to the backend. (0 = ‘auto’, 1 = ‘on’, 2 = ‘off’)

Returns:

  • irmsd (float) – iRMSD value in Ångström.

  • new_atoms1 (ase.Atoms) – New ASE Atoms object corresponding to the transformed first Molecule.

  • new_atoms2 (ase.Atoms) – New ASE Atoms object corresponding to the transformed second Molecule.

Raises:

RuntimeError – If ASE is not installed.

irmsd.get_irmsd_molecule(molecule1: Molecule, molecule2: Molecule, iinversion: int = 0) Tuple[float, Molecule, Molecule][source]

Compute the iRMSD between two Molecule objects using the iRMSD backend.

The backend may reorder atoms and/or change atomic numbers according to its canonicalization / matching logic. This wrapper returns copies of both input Molecules with the updated atomic numbers and positions.

Parameters:
  • molecule1 (Molecule) – First input structure.

  • molecule2 (Molecule) – Second input structure.

  • iinversion (int, optional) – Inversion flag passed directly to the Fortran backend. See the backend documentation for allowed values and meanings.

Returns:

  • irmsd (float) – iRMSD value in Ångström.

  • new_molecule1 (Molecule) – Copy of molecule1 with updated atomic numbers and positions.

  • new_molecule2 (Molecule) – Copy of molecule2 with updated atomic numbers and positions.

Raises:

TypeError – If either input is not a Molecule.

irmsd.get_irmsd_rdkit(molecule_ref, molecule_align, conf_id_ref=-1, conf_id_align=-1, iinversion: int = 0) Tuple[float, 'Mol', 'Mol'][source]

Optional RDKit utility: operates on TWO RDKit Molecules. Returns the iRMSD in Ångström and the two molecule objects with their conformers permuted and aligned.

Parameters:
  • molecule_ref (rdkit.Chem.Mol) – Reference RDKit Molecule object.

  • molecule_align (rdkit.Chem.Mol) – RDKit Molecule object to be aligned.

  • conf_id_ref (int, optional) – Conformer ID for the reference molecule. Default is -1 (rdkit default).

  • conf_id_align (int, optional) – Conformer ID for the molecule to be aligned. Default is -1 (rdkit default).

  • iinversion (int, optional) – Inversion type for iRMSD calculation. Default is 0. ( 0: ‘auto’, 1: ‘on’, 2: ‘off’ )

Returns:

iRMSD in Angström, aligned RDKit Molecule object for reference, aligned RDKit Molecule object for alignment.

Return type:

Tuple[float, rdkit.Chem.Mol, rdkit.Chem.Mol]

Raises:

TypeError – If the inputs are not RDKit Molecule objects.

irmsd.get_quaternion_rmsd_fortran(atom_numbers1: ndarray, positions1: ndarray, atom_numbers2: ndarray, positions2: ndarray, mask: ndarray | None = None) Tuple[float, ndarray, ndarray][source]

Pair API: call the Fortran routine on TWO structures.

Parameters:
  • atom_numbers1 ((N1,) int32-like)

  • positions1 ((N1, 3) float64-like)

  • atom_numbers2 ((N2,) int32-like)

  • positions2 ((N2, 3) float64-like)

  • mask ((N1,) bool-like or None)

Returns:

  • rmsdval (float64)

  • new_positions2 ((N2, 3) float64, (positions2 @ Umat.T))

  • Umat ((3, 3) float64 (Fortran-ordered))

Notes

The returned new_positions2 is aligned onto positions1.

  1. If mask is provided, only the atoms where mask==True in structure 1 are used to compute the RMSD.

  2. The rotation matrix Umat is Fortran-ordered, i.e., to rotate positions2, do: new_positions2 = positions2 @ Umat.T

  3. The returned new_positions2 is also translated to have the same barycenter as positions1.

Raises:

ValueError – If the input arrays do not have the correct shapes or types.
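
The row-vector convention positions2 @ Umat.T and the barycenter shift from the notes can be spelled out in plain Python. This is an illustrative sketch, not the backend code: all helper names are hypothetical, the barycenter is assumed to be the unweighted mean of the coordinates, and the order of rotation and translation inside the Fortran routine is not specified here.

```python
def rotate_rows(positions, umat):
    """Equivalent of positions @ umat.T for a list of (x, y, z) rows."""
    return [
        [sum(row[k] * umat[j][k] for k in range(3)) for j in range(3)]
        for row in positions
    ]


def barycenter(positions):
    """Unweighted mean of the coordinates (assumption: no mass weighting)."""
    n = len(positions)
    return [sum(row[j] for row in positions) / n for j in range(3)]


def shift_to(positions, target_center):
    """Translate positions so that their barycenter equals target_center."""
    center = barycenter(positions)
    return [
        [row[j] - center[j] + target_center[j] for j in range(3)]
        for row in positions
    ]


# A 90 degree rotation about the z axis as an example Umat.
umat = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
positions2 = [[1, 0, 0], [0, 1, 0]]

rotated = rotate_rows(positions2, umat)
aligned = shift_to(rotated, [1.0, 1.0, 1.0])
print(rotated)
print(barycenter(aligned))
```

With NumPy arrays the rotation step reduces to the documented expression positions2 @ Umat.T.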

irmsd.get_rmsd_ase(atoms1, atoms2, mask=None) Tuple[float, 'ase.Atoms', np.ndarray][source]

ASE wrapper for get_rmsd_molecule.

Converts two ASE Atoms objects to internal Molecule objects, calls get_rmsd_molecule, and converts the aligned second structure back to an ASE Atoms object.

Parameters:
  • atoms1 (ase.Atoms) – Reference structure.

  • atoms2 (ase.Atoms) – Structure to be rotated/translated onto atoms1.

  • mask (array-like of bool, optional) – Optional mask selecting which atoms in the first structure participate in the RMSD (forwarded to the backend via get_rmsd_molecule).

Returns:

  • rmsd (float) – RMSD value in Ångström.

  • new_atoms2 (ase.Atoms) – New ASE Atoms object with coordinates aligned to atoms1.

  • rotation_matrix (np.ndarray) – 3×3 rotation matrix used for the alignment.

Raises:
  • RuntimeError – If ASE is not installed.

  • TypeError – If inputs are not ASE Atoms.

irmsd.get_rmsd_molecule(molecule1: Molecule, molecule2: Molecule, mask=None) Tuple[float, Molecule, ndarray][source]

Compute the RMSD between two Molecule objects using the quaternion-based RMSD backend, and return an aligned copy of the second Molecule.

Parameters:
  • molecule1 (Molecule) – Reference structure.

  • molecule2 (Molecule) – Structure to be rotated/translated onto molecule1.

  • mask (array-like of bool, optional) – Optional mask selecting which atoms of molecule1 participate in the RMSD. Must be broadcastable / compatible with the Fortran backend’s mask semantics.

Returns:

  • rmsd (float) – Root-mean-square deviation in Ångström.

  • new_molecule2 (Molecule) – Copy of molecule2 with its positions replaced by the aligned coordinates returned by the backend.

  • rotation_matrix (np.ndarray) – 3×3 rotation matrix applied to align molecule2 onto molecule1.

Raises:

TypeError – If either input is not a Molecule.

irmsd.get_rmsd_rdkit(molecule_ref, molecule_align, conf_id_ref=-1, conf_id_align=-1, mask=None) Tuple[float, 'Mol', np.ndarray][source]

Optional RDKit utility: operates on two RDKit Molecules. Returns the RMSD in Ångström, the aligned molecule object, and the rotation matrix.

Parameters:
  • molecule_ref (rdkit.Chem.Mol) – Reference RDKit Molecule object.

  • molecule_align (rdkit.Chem.Mol) – RDKit Molecule object to be aligned.

  • conf_id_ref (int, optional) – Conformer ID for the reference molecule. Default is -1 (rdkit default).

  • conf_id_align (int, optional) – Conformer ID for the molecule to be aligned. Default is -1 (rdkit default).

  • mask (array-like of bool, optional)

Returns:

RMSD in Angström, aligned RDKit Molecule object, and rotation matrix.

Return type:

Tuple[float, rdkit.Chem.Mol, np.ndarray]

Raises:

TypeError – If the inputs are not RDKit Molecule objects.

irmsd.molecule_to_ase(molecules: Molecule | Sequence[Molecule])[source]

Convert an internal irmsd.core.Molecule instance (or a sequence of them) into ASE Atoms objects.

This routine performs a purely structural and metadata-level conversion: it does not create or attach any calculator, nor does it trigger any new ASE calculations.

Parameters:

molecules (Molecule or Sequence[Molecule]) – A single Molecule or a sequence of Molecule objects.

Returns:

  • If molecules is a single Molecule, a single ASE Atoms object is returned.

  • If molecules is a sequence of Molecule objects, a list of ASE Atoms objects is returned in the same order.

Return type:

ase.Atoms or list[ase.Atoms]

Notes

  • This routine requires ASE to be installed. If ASE is missing, a clear RuntimeError is raised via require_ase().

  • The returned Atoms objects are structurally independent copies; further modifications to the original Molecule will not affect them.

Raises:
  • RuntimeError – If ASE is not installed.

  • TypeError – If the input is neither a Molecule instance nor a sequence of Molecule instances.

irmsd.molecule_to_rdkit(molecule: Molecule | Sequence[Molecule]) 'Mol' | list['Mol'][source]

Convert one or more irmsd Molecule objects to one or more RDKit Molecule objects.

Parameters:

molecule (irmsd.core.Molecule or list of irmsd.core.Molecule) – irmsd Molecule object(s) to convert.

Returns:

Converted RDKit Molecule object(s).

Return type:

rdkit.Chem.Mol or list of rdkit.Chem.Mol

Raises:

TypeError – If the input is not an irmsd Molecule or a list of them.

irmsd.prune(molecule_list: Sequence[Molecule], rthr: float, iinversion: int = 0, allcanon: bool = True, printlvl: int = 0, ethr: float | None = None, ewin: float | None = None) List[Molecule][source]

High-level wrapper around the Fortran-backed sorter_irmsd that operates directly on Molecule objects. Returns a pruned list of structures.

Parameters:
  • molecule_list (Sequence[Molecule]) – Sequence of Molecule objects. All molecules must have the same number of atoms.

  • rthr (float) – Distance threshold for the sorter (passed through to the backend).

  • iinversion (int, optional) – Inversion symmetry flag, passed through to the backend.

  • allcanon (bool, optional) – Canonicalization flag, passed through to the backend.

  • printlvl (int, optional) – Verbosity level, passed through to the backend.

  • ethr (float | None) – Optional energy threshold used to accelerate the sorting through energy-based pre-sorting.

  • ewin (float | None) – Optional energy window to limit the ensemble size around the lowest-energy structure. In Hartree.

Returns:

new_molecule_list – New Molecule objects reconstructed from the sorted atomic numbers and positions returned by the backend. The list contains only the structures identified as unique according to the selected thresholds.

Return type:

list[Molecule]

Raises:
  • TypeError – If molecule_list does not contain Molecule instances.

  • ValueError – If molecule_list is empty or if the Molecules do not all have the same number of atoms.
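
The effect of an energy window such as ewin can be pictured as a filter relative to the lowest energy. The sketch below is only an illustration of that idea: apply_energy_window is a hypothetical helper, the inclusive bound is an assumption, and the actual selection happens inside the backend.

```python
def apply_energy_window(energies, ewin=None):
    """Return indices of structures within ewin (Hartree) of the minimum."""
    if ewin is None:
        # No window given: keep every structure.
        return list(range(len(energies)))
    e_min = min(energies)
    # Assumption: the window bound is inclusive.
    return [i for i, e in enumerate(energies) if e - e_min <= ewin]


energies = [-10.000, -9.999, -9.990]  # made-up energies in Hartree
print(apply_energy_window(energies, ewin=0.005))
print(apply_energy_window(energies))
```

In this made-up example the third structure lies 0.010 Hartree above the minimum and falls outside the 0.005 Hartree window.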

irmsd.rdkit_to_molecule(molecules, conf_id: int | Sequence[int] | None = None) Molecule | list[Molecule][source]

Convert one or more RDKit Molecule objects to one or more irmsd Molecule objects.

If conf_id is None, all conformers are converted. If conf_id is an int, only that conformer is converted. If conf_id is a list of int, only those conformers are converted.

Parameters:
  • molecules (rdkit.Chem.Mol or list of rdkit.Chem.Mol) – RDKit Molecule object(s) to convert.

  • conf_id (int, list of int, or None, optional) – Conformer ID(s) to convert. If None, all conformers are converted.

Returns:

Converted irmsd Molecule object(s).

Return type:

irmsd.core.Molecule or list of irmsd.core.Molecule

Raises:

TypeError – If the input is not an RDKit Molecule or a list of them, or if any conformer is not 3D.

irmsd.read_structures(paths: str | Sequence[str]) List[Molecule][source]

Read an arbitrary number of structures and return them as Molecule objects.

For each path, this routine behaves as follows:

  • If the file extension is .xyz or .extxyz, it uses the internal read_extxyz helper to obtain one or more Molecule objects.

  • For all other file types, it attempts to import ASE via require_ase(), uses ase.io.read to read one or more ASE Atoms objects, and converts them into Molecule objects using ase_to_molecule.

Multi-frame files:

If a file contains multiple frames, all frames are read and appended to the output list. A short informational message is printed indicating the number of frames that were found.

Parameters:

paths (str or Sequence[str]) – File path or paths to read.

Returns:

structures – One Molecule per frame found across all input paths.

Return type:

list[Molecule]
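
The per-path dispatch described above can be sketched as a small decision function. This is not the implementation of read_structures: choose_reader and normalize_paths are hypothetical helpers, and the case-insensitive extension check is an assumption.

```python
from pathlib import Path

# Extensions handled by the internal reader according to the description above.
INTERNAL_EXTENSIONS = {".xyz", ".extxyz"}


def choose_reader(path):
    """Return 'extxyz' for the internal reader, 'ase' for everything else."""
    # Assumption: the extension check ignores letter case.
    suffix = Path(path).suffix.lower()
    return "extxyz" if suffix in INTERNAL_EXTENSIONS else "ase"


def normalize_paths(paths):
    """Accept a single path string or a sequence of path strings."""
    return [paths] if isinstance(paths, str) else list(paths)


for p in normalize_paths(["conf.xyz", "traj.extxyz", "mol.pdb"]):
    print(p, "->", choose_reader(p))
```

Only paths routed to the ASE branch require ASE to be installed.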

irmsd.run_cregen_and_print(molecule_list: Sequence[Molecule], rthr: float, ethr: float, bthr: float, ewin: float | None = None, printlvl: int = 0, maxprint: int = 25, outfile: str | None = None) None[source]

Convenience wrapper around cregen() from mol_interface. Splits according to sum formula, if necessary.

Parameters:
  • molecule_list (sequence of irmsd.Molecule) – Input structures.

  • rthr (float) – RMSD threshold for conformer identification

  • ethr (float) – Energy threshold for conformer identification

  • bthr (float) – Rotational constant threshold for conformer identification

  • printlvl (int, optional) – Verbosity level, passed through.

  • maxprint (int, optional) – Max number of lines to print for each structure result table

  • outfile (str or None, optional) – If not None, write all resulting structures to this file (e.g. ‘sorted.xyz’). If molecule_list contains more than one type of molecule, a suffix is appended to the file name automatically.

irmsd.sort_get_delta_irmsd_and_print(molecule_list: Sequence[Molecule], inversion: str = None, allcanon: bool = True, printlvl: int = 0, maxprint: int = 25, outfile: str | None = None) None[source]

Convenience wrapper around presorted_sort_structures_and_print:

  • Analyzes the molecule_list to separate them by composition

  • Sorts by energy if applicable.

  • Calculates iRMSD between structures x_i and x_i-1

Parameters:
  • molecule_list (sequence of irmsd.Molecule) – Input structures.

  • inversion (str, optional) – Inversion symmetry flag, passed through.

  • allcanon (bool, optional) – Canonicalization flag, passed through.

  • printlvl (int, optional) – Verbosity level, passed through.

  • maxprint (int, optional) – Max number of lines to print for each structure result table

  • outfile (str or None, optional) – If not None, write all resulting structures to this file (e.g. ‘sorted.xyz’). If molecule_list contains more than one type of molecule, a suffix is appended to the file name automatically.

irmsd.sort_structures_and_print(molecule_list: Sequence[Molecule], rthr: float, inversion: str = None, allcanon: bool = True, printlvl: int = 0, maxprint: int = 25, ethr: float | None = None, ewin: float | None = None, outfile: str | None = None) None[source]

Convenience wrapper around presorted_sort_structures_and_print:

  • Analyzes the molecule_list to separate them by composition

  • Sorts by energy if applicable.

  • Calls presorted_sort_structures_and_print for each group

Parameters:
  • molecule_list (sequence of irmsd.Molecule) – Input structures.

  • rthr (float | None) – Distance threshold for sorter_irmsd_molecule.

  • inversion (str, optional) – Inversion symmetry flag, passed through.

  • allcanon (bool, optional) – Canonicalization flag, passed through.

  • printlvl (int, optional) – Verbosity level, passed through.

  • maxprint (int, optional) – Max number of lines to print for each structure result table

  • ethr (float | None) – Optional inter-conformer energy threshold for more efficient presorting

  • ewin (float | None) – Optional energy window to limit ensemble size around lowest energy structure

  • outfile (str or None, optional) – If not None, write all resulting structures to this file (e.g. ‘sorted.xyz’). If molecule_list contains more than one type of molecule, a suffix is automatically appended to the filename.
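The ewin parameter is described as an energy window around the lowest-energy structure. One plausible reading, shown here as a sketch rather than the library's implementation, is that only structures whose energy lies within ewin of the minimum are kept. The helper name `apply_energy_window` is hypothetical, and all energies are assumed to share one unit.

```python
def apply_energy_window(energies, ewin=None):
    """Return the indices of structures within ewin of the lowest energy.

    Assumption: ewin=None disables the filter and keeps every structure.
    """
    if ewin is None or not energies:
        return list(range(len(energies)))
    e_min = min(energies)
    return [i for i, e in enumerate(energies) if e - e_min <= ewin]


energies = [-76.430, -76.428, -76.400, -76.429]
kept = apply_energy_window(energies, ewin=0.005)
```

In this example the third structure lies 0.030 above the minimum and is dropped, so `kept` contains the indices 0, 1 and 3.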

irmsd.sorter_irmsd(atom_numbers_list: Sequence[ndarray], positions_list: Sequence[ndarray], nat: int, rthr: float, iinversion: int = 0, allcanon: bool = True, printlvl: int = 0, ethr: float | None = None, energies_list: Sequence[ndarray] | None = None) Tuple[ndarray, List[ndarray], List[ndarray]][source]

High-level API: call the sorter_exposed_xyz_fortran Fortran routine.

Parameters:
  • atom_numbers_list (sequence of (N,) int32 arrays) – Per-structure atom numbers.

  • positions_list (sequence of (N,3) float64 arrays) – Per-structure coordinates.

  • nat (int) – Number of atoms for which the groups array is defined. Must satisfy 1 <= nat <= N.

  • rthr (float) – Distance threshold for the Fortran sorter, in Ångström.

  • iinversion (int) – Inversion symmetry flag.

  • allcanon (bool) – Canonicalization flag.

  • printlvl (int) – Verbosity level.

  • ethr (float | None) – Inter-conformer energy threshold (optional). In Hartree.

  • energies_list (sequence of (Nall,) floats | None) – List of energies for the passed structures (optional). In Hartree.

Returns:

  • groups ((nat,) int32) – Group index for each of the first nat atoms.

  • xyz_structs (list of (N,3) float64 arrays) – Updated coordinates for each structure.

  • Z_structs (list of (N,) int32 arrays) – Updated atom numbers for each structure.

Raises:

ValueError – If input arrays have inconsistent shapes or invalid parameters.
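Because sorter_irmsd takes raw per-structure arrays, it can help to check their shapes before calling it. The helper below is a hypothetical pre-check, not the validation the library itself performs. It encodes the documented constraint 1 <= nat <= N and additionally assumes that all structures share the same atom count, as stated for sorter_irmsd_molecule.

```python
def check_sorter_inputs(atom_numbers_list, positions_list, nat):
    """Validate shapes of per-structure inputs and return the atom count N."""
    if len(atom_numbers_list) != len(positions_list):
        raise ValueError("need one atom-number array per coordinate array")
    if not atom_numbers_list:
        raise ValueError("no structures given")
    n_atoms = len(atom_numbers_list[0])
    for z, xyz in zip(atom_numbers_list, positions_list):
        # Assumption: every structure has the same number of atoms N.
        if len(z) != n_atoms or len(xyz) != n_atoms:
            raise ValueError("all structures must have the same number of atoms")
        if any(len(row) != 3 for row in xyz):
            raise ValueError("positions must have shape (N, 3)")
    if not 1 <= nat <= n_atoms:
        raise ValueError("nat must satisfy 1 <= nat <= N")
    return n_atoms


water_z = [8, 1, 1]
water_xyz = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
n_atoms = check_sorter_inputs([water_z, water_z], [water_xyz, water_xyz], nat=3)
```

The check works on nested lists as well as NumPy arrays, since it relies only on `len`.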

irmsd.sorter_irmsd_ase(atoms_list: Sequence['ase.Atoms'], rthr: float = 0.125, iinversion: int = 0, allcanon: bool = True, printlvl: int = 0, ethr: float | None = None, ewin: float | None = None) Tuple[np.ndarray, List['ase.Atoms']][source]

ASE wrapper for sorter_irmsd_molecule.

Converts a sequence of ASE Atoms objects to Molecules, calls sorter_irmsd_molecule, and converts the resulting Molecules back to ASE Atoms objects.

Parameters:
  • atoms_list (Sequence[ase.Atoms]) – Sequence of ASE Atoms objects. All must have the same number of atoms.

  • rthr (float) – Distance threshold for the sorter.

  • iinversion (int, optional) – Inversion symmetry flag. (0 = ‘auto’, 1 = ‘on’, 2 = ‘off’)

  • allcanon (bool, optional) – Canonicalization flag.

  • printlvl (int, optional) – Verbosity level.

  • ethr (float | None) – Optional energy threshold to accelerate by pre-sorting. In Hartree.

  • ewin (float | None) – Optional energy window to limit ensemble size around lowest energy structure. In Hartree.

Returns:

  • groups (np.ndarray) – Integer array of shape (nat,) with group indices as returned by sorter_irmsd_molecule / backend.

  • new_atoms_list (list[ase.Atoms]) – New ASE Atoms objects reconstructed from the sorted Molecules.
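The printing wrappers take inversion as a string, while the sorter functions take the integer iinversion (0 = ‘auto’, 1 = ‘on’, 2 = ‘off’). A small conversion helper might look like the sketch below. `inversion_to_int` is a hypothetical name, and treating None as ‘auto’ is an assumption based on the default iinversion = 0.

```python
INVERSION_FLAGS = {"auto": 0, "on": 1, "off": 2}


def inversion_to_int(inversion=None):
    """Map an inversion string to the documented integer flag."""
    if inversion is None:
        # Assumption: an unset flag behaves like 'auto' (the default, 0).
        return INVERSION_FLAGS["auto"]
    try:
        return INVERSION_FLAGS[inversion.strip().lower()]
    except KeyError:
        raise ValueError(
            f"unknown inversion flag {inversion!r}; expected one of "
            f"{sorted(INVERSION_FLAGS)}"
        ) from None


flag = inversion_to_int("Off")
```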

irmsd.sorter_irmsd_molecule(molecule_list: Sequence[Molecule], rthr: float = 0.125, iinversion: int = 0, allcanon: bool = True, printlvl: int = 0, ethr: float | None = None, ewin: float | None = None) Tuple[ndarray, List[Molecule]][source]

High-level wrapper around the Fortran-backed sorter_irmsd that operates directly on Molecule objects.

Parameters:
  • molecule_list (Sequence[Molecule]) – Sequence of Molecule objects. All molecules must have the same number of atoms.

  • rthr (float) – Distance threshold for the sorter (passed through to the backend).

  • iinversion (int, optional) – Inversion symmetry flag, passed through to the backend.

  • allcanon (bool, optional) – Canonicalization flag, passed through to the backend.

  • printlvl (int, optional) – Verbosity level, passed through to the backend.

  • ethr (float | None) – Optional energy threshold to accelerate sorting by pre-sorting. In Hartree.

  • ewin (float | None) – Optional energy window to limit ensemble size around lowest energy structure. In Hartree.

Returns:

  • groups (np.ndarray) – Integer array of shape (nat,) with group indices for the first nat atoms (as defined by the backend).

  • new_molecule_list (list[Molecule]) – New Molecule objects reconstructed from the sorted atomic numbers and positions returned by the backend. The list has the same length and ordering as molecule_list.

Raises:
  • TypeError – If molecule_list does not contain Molecule instances.

  • ValueError – If molecule_list is empty or if the Molecules do not all have the same number of atoms.

irmsd.sorter_irmsd_rdkit(molecules: 'Mol' | Sequence['Mol'], rthr: float = 0.125, iinversion: int = 0, allcanon: bool = True, printlvl: int = 0, ethr: float | None = None, ewin: float | None = None) Tuple[np.ndarray, List['Mol']][source]

Optional RDKit utility: operates on an RDKit molecule or a list of RDKit molecules, sorts them by iRMSD, and returns the group indices together with the sorted molecules.

Parameters:
  • molecules (rdkit.Chem.Mol or list of rdkit.Chem.Mol) – RDKit Molecule object(s) containing multiple conformers.

  • rthr (float) – iRMSD threshold for grouping.

  • iinversion (int, optional) – Inversion type for iRMSD calculation. Default is 0. ( 0: ‘auto’, 1: ‘on’, 2: ‘off’ )

  • allcanon (bool, optional) – Canonicalization flag, passed through to the backend.

  • printlvl (int, optional) – Verbosity level, passed through to the backend.

  • ethr (float | None) – Optional energy threshold to accelerate sorting by pre-sorting. In Hartree.

  • ewin (float | None) – Optional energy window to limit ensemble size around lowest energy structure. In Hartree.

Returns:

  • groups (np.ndarray) – Integer array of shape (nat,) with group indices for the first nat atoms (as defined by the backend).

  • new_molecules_list (list of rdkit.Chem.Mol) – List of RDKit Molecule objects corresponding to the sorted molecules.

Raises:

TypeError – If the input is not an RDKit Molecule or a list of them.

irmsd.write_structures(filename: str | Path, structures: Molecule | Sequence[Molecule], mode: str = 'w') None[source]

High-level structure writer for the irmsd Molecule type.

This routine mirrors the behaviour of read_structures on the output side: it chooses the appropriate backend based on the file extension and accepts either a single Molecule or a sequence of Molecule objects.

Dispatch rules

  • If the filename has no extension, ‘.xyz’ is appended and the internal extended-XYZ writer is used.

  • If the filename ends with ‘.xyz’, ‘.extxyz’ or ‘.trj’ (case-insensitive), the structures are written using the internal extended-XYZ writer (write_extxyz). The mode argument is passed through and controls whether the file is overwritten (‘w’, default) or appended to (‘a’).

  • For all other filename extensions, ASE is used as a backend. The structures are first converted to ASE Atoms objects using molecule_to_ase, and then written via ase.io.write. In this case the mode argument is currently ignored and ASE’s default behaviour for the chosen format is used.
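The dispatch rules above amount to a decision on the filename suffix. The sketch below reproduces that decision with pathlib only. `choose_backend` and the returned labels "extxyz" and "ase" are hypothetical and are not taken from the irmsd source.

```python
from pathlib import Path

# Suffixes handled by the internal extended-XYZ writer (compared case-insensitively).
XYZ_SUFFIXES = {".xyz", ".extxyz", ".trj"}


def choose_backend(filename):
    """Return (backend_label, resolved_path) following the documented rules."""
    path = Path(filename)
    if path.suffix == "":
        # No extension: append '.xyz' and use the extended-XYZ writer.
        return "extxyz", path.with_name(path.name + ".xyz")
    if path.suffix.lower() in XYZ_SUFFIXES:
        return "extxyz", path
    # Every other extension is delegated to ASE.
    return "ase", path


no_ext = choose_backend("sorted")
upper = choose_backend("ensemble.XYZ")
other = choose_backend("structure.pdb")
```

Only the "extxyz" branch would honour the mode argument; per the rules above, ASE-backed formats ignore it.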

Parameters:
  • filename (str or pathlib.Path) – Output filename. Its extension determines the backend.

  • structures (Molecule or Sequence[Molecule]) – A single Molecule or a sequence of Molecules to be written. For extended XYZ, multiple Molecules are written as consecutive frames in one file.

  • mode ({"w", "a"}, optional) – File open mode for extended XYZ output. Ignored for non-XYZ formats handled via ASE.

Raises:
  • RuntimeError – If ASE is required (non-XYZ formats) but not installed.

  • TypeError – If structures is not a Molecule or a sequence of Molecules.