User API
BRER runner
The RunConfig class handles the actual workflow logic.
- class brer.run_config.RunConfig(tpr, ensemble_dir, ensemble_num=None, pairs_json='pair_data.json')
Run configuration for single BRER ensemble member.
The run configuration specifies the files and directory structure used for the run. It determines whether the run is in the training, convergence, or production phase, then performs the run.
Note that all instances of RunConfig need the same sized array of TPR input files across all ranks in an MPI ensemble because they must all be capable of constructing a compatible copy of the ensemble simulation work description.
Source data for the pair restraints is provided through a JSON file (pairs_json). See brer.pair_data.PairDataCollection for details.
- Parameters:
tpr (str) – path (or paths) to tpr input. Must be compatible with the GROMACS version providing gmxapi.
ensemble_dir (str) – path to top directory which contains the full ensemble.
ensemble_num (int, default=0) – the ensemble member to run
pairs_json (str, default="pair_data.json") – path to the file containing ALL the pair metadata (a serialized brer.pair_data.PairDataCollection).
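RunConfig reads the current phase from its stored run data (see rc.run_data.get('phase') in the example for run()) and advances it after each run. As a rough mental model, and not brer's code, the phases can be treated as a cycle. The function name next_phase is hypothetical, and the rule that a new iteration begins with training after production is an assumption based on the bootstrapping description for run().

```python
from typing import Tuple

# Hypothetical model of the BRER phase cycle; not part of the brer package.
PHASES = ("training", "convergence", "production")


def next_phase(phase: str, iteration: int) -> Tuple[str, int]:
    """Return the (phase, iteration) that follows a completed phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase!r}")
    index = PHASES.index(phase)
    if index == len(PHASES) - 1:
        # Assumption: a finished production phase starts the next iteration.
        return PHASES[0], iteration + 1
    return PHASES[index + 1], iteration


state = ("training", 0)
for _ in range(4):
    state = next_phase(*state)
    print(state)
```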
- build_plugins(plugin_config)
Builds the plugin configuration.
For each pair-wise restraint, populate the plugin with data: both the “general” data and the data unique to that restraint.
- Parameters:
plugin_config (PluginConfig) – the particular plugin configuration (Training, Convergence, Production) for the run.
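The merge described for build_plugins() (shared "general" data plus the data unique to each restraint) can be pictured with plain dictionaries. This is only a sketch: merge_plugin_data is a hypothetical helper, the pair names and values are invented, and the real plugin configurations (Training, Convergence, Production) carry phase-specific fields not shown here.

```python
from typing import Dict, List, Mapping


def merge_plugin_data(general: Mapping[str, object],
                      pairs: Mapping[str, Mapping[str, object]]) -> List[Dict[str, object]]:
    """Combine shared and restraint-specific data, one dict per restraint (hypothetical)."""
    merged = []
    for name, pair in pairs.items():
        data = dict(general)   # copy the "general" data shared by every restraint
        data.update(pair)      # add the data unique to this restraint
        data["name"] = name
        merged.append(data)
    return merged


# Defaults taken from the GeneralParams signature; pair values are invented.
general = {"A": 50.0, "tau": 50, "tolerance": 0.25}
pairs = {
    "pair_a": {"sites": [10, 20], "alpha": 0.0, "target": 3.0},
    "pair_b": {"sites": [15, 40], "alpha": 0.0, "target": 4.5},
}
plugin_data = merge_plugin_data(general, pairs)
for entry in plugin_data:
    print(entry["name"], entry["sites"], entry["target"])
```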
- run(tpr_file=None, **kwargs)
Perform the MD simulations.
Each Python interpreter process runs a separate ensemble member.
- Parameters:
tpr_file (str, optional) – If provided, use this input file instead of the input from the main configuration.
**kwargs (optional) – Additional keyword arguments are passed on to the simulator.
After the first “iteration”, brer bootstraps the trajectories of the training and convergence phases with the checkpoint file from the previous iteration’s production phase.
At the beginning of a production phase (when there is not yet a checkpoint file), the checkpoint file from the convergence phase is used to start the production trajectory unless tpr_file is given.
When tpr_file is not None, run() does not look for a bootstrapping checkpoint file. This can be helpful if a checkpoint file is corrupted or unavailable. It also means that the tpr_file argument should contain the starting configuration you intend for the phase that you are about to run(). If you provide tpr_file because you are changing parameters that make existing checkpoints incompatible, either generate the new file from the checkpoint from which you want to continue, or remove the checkpoint file from the phase directory and restart that phase.
Example
>>> config_params = {
...     "tpr": "{}/topol.tpr".format(data_dir),
...     "ensemble_num": 1,
...     "ensemble_dir": tmpdir,
...     "pairs_json": "{}/pair_data.json".format(data_dir)
... }
>>> rc = RunConfig(**config_params)
>>> assert rc.run_data.get('phase') == 'training'
>>> rc.run(threads=2)
>>> assert rc.run_data.get('phase') == 'convergence'
>>> rc.run()
>>> assert rc.run_data.get('phase') == 'production'
>>> rc.run(tpr_file=new_tpr, max_hours=23.9)
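The rules that run() uses to pick its starting point can be summarized as a small decision function. This is a simplified, hypothetical model rather than brer's implementation. The function name choose_start, the checkpoint lookup callback, the directory layout in the example paths, and the rule that a phase which already has a checkpoint is continued are all assumptions.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical lookup: (iteration, phase) -> checkpoint path, or None if absent.
CheckpointLookup = Callable[[int, str], Optional[Path]]


def choose_start(phase: str, iteration: int, default_tpr: str,
                 checkpoint: CheckpointLookup,
                 tpr_file: Optional[str] = None) -> Tuple[str, object]:
    """Return ("tpr", file) or ("checkpoint", path) for the next run() call."""
    if tpr_file is not None:
        # An explicit input file disables the bootstrapping checkpoint search.
        return ("tpr", tpr_file)
    own = checkpoint(iteration, phase)
    if own is not None:
        # Assumption: a phase that already has a checkpoint is continued.
        return ("checkpoint", own)
    if phase == "production":
        # A new production phase starts from the convergence checkpoint.
        source = checkpoint(iteration, "convergence")
    elif iteration > 0:
        # Later iterations bootstrap training/convergence from the previous production.
        source = checkpoint(iteration - 1, "production")
    else:
        source = None
    if source is not None:
        return ("checkpoint", source)
    return ("tpr", default_tpr)


# Hypothetical checkpoints left behind by iteration 0.
existing = {
    (0, "convergence"): Path("iter0/convergence/state.cpt"),
    (0, "production"): Path("iter0/production/state.cpt"),
}


def lookup(iteration: int, phase: str) -> Optional[Path]:
    return existing.get((iteration, phase))


print(choose_start("training", 1, "topol.tpr", lookup))
print(choose_start("production", 1, "topol.tpr", lookup, tpr_file="new.tpr"))
```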
BRER parameters
pair_data module
BRER data for site pairs.
Coordinate the experimental reference data and molecular model data for the labeled / restrained pairs.
Support the statistical (re)sampling of target pair distances when beginning a BRER iteration.
- class brer.pair_data.PairData(name, bins, distribution, sites)
Pair distance distribution.
Essential pair data to support BRER MD plugin code. All data must be provided when the object is initialized. (Fields are read-only.)
Fields here correspond to the fields for each named pair (JSON objects) in a pair_data.json file (the pairs_json argument of RunConfig()).
Changed in version 2.0: When reading from pair_data.json or state.json, the name used for PairData comes from the object key, not from the name field of the object. The name field in the serialized (JSON) representation is ignored when reading, but is preserved for backward compatibility when writing.
- distribution: list[float]
Site distance distribution.
Histogram values (weights or relative probabilities) for distances between the sites. (Generally derived from experimental data.)
- name: str
Identifier for the pair of sites on the molecule.
This string is chosen by the researcher. For example, the name may include identifiers for the two residues in a scheme that can be easily cross-referenced with experimental data.
- sites: list[int]
Indices defining the distance vector.
A list of indices for sites in the molecular model. The first and last list elements are the sites associated with the distance data. Additional indices can be inserted in the list to define a chain of distance vectors that will be added without applying periodic boundary conditions.
If, at any point in the simulation, the two molecular sites in the pair might be farther apart than half of the shortest simulation box dimension, the distance might accidentally get calculated between sites on different molecule “images” (periodic boundary conditions). To make sure that site-site distances are calculated on the same molecule, provide a sequence of sites on the molecule (that are never more than half a box-length apart) so that the correct vector between sites is unambiguous.
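The benefit of a site chain can be illustrated with an orthorhombic box. In this sketch, which is an assumption about the geometry rather than brer's plugin code, each segment between consecutive sites is taken as its minimum-image vector, and the segment vectors are then summed with no further periodic wrapping. The function names and coordinates are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vec = Tuple[float, float, float]


def minimum_image(d: Vec, box: Vec) -> Vec:
    """Shift a displacement to its nearest periodic image (orthorhombic box only)."""
    return tuple(di - length * round(di / length) for di, length in zip(d, box))


def chain_distance(positions: Dict[int, Vec], sites: Sequence[int], box: Vec) -> float:
    """Distance from sites[0] to sites[-1], summed along the chain of sites."""
    total = [0.0, 0.0, 0.0]
    for a, b in zip(sites[:-1], sites[1:]):
        step = tuple(pb - pa for pa, pb in zip(positions[a], positions[b]))
        # Each short segment is unambiguous, so its minimum image is the true vector.
        for k, component in enumerate(minimum_image(step, box)):
            total[k] += component
    # The summed vector is used as-is; no periodic wrapping is applied to it.
    return math.sqrt(sum(c * c for c in total))


box = (20.0, 20.0, 20.0)
# Hypothetical coordinates: sites 1 and 4 are 12 units apart, more than half the box.
positions = {1: (0.0, 0.0, 0.0), 2: (4.0, 0.0, 0.0), 3: (8.0, 0.0, 0.0), 4: (12.0, 0.0, 0.0)}

direct = chain_distance(positions, [1, 4], box)         # picks the wrong periodic image
chained = chain_distance(positions, [1, 2, 3, 4], box)  # follows the molecule
print(direct, chained)
```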
- class brer.pair_data.PairDataCollection(*pairs)
Data for all the restrained pairs in a BRER simulation.
Source data for the pair restraints is provided through a JSON file (pairs_json). The JSON file contains one JSON object for each PairData to be read.
For each object, the object key is assumed to be the PairData.name of a pair. The JSON object contents are used to initialize a PairData for each named pair.
The data file is usually constructed manually by the researcher after inspection of a molecular model and available experimental data. An example of such a file is provided in the brer/data directory of the installed package or in the source repository.
Note that JSON is not a Python-specific file format, but the Python json module may be helpful.
A PairDataCollection can be initialized from a sequence of PairData objects, or created from a JSON pair data file by using create_from().
- Parameters:
pairs (PairData) – PairData objects to include in the collection.
- as_dict()
Encode the full collection as a single Python dictionary.
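A minimal sketch of pair data in the JSON layout described above, built and re-read with the standard library json module. The pair name, site indices, bins, and distribution values are invented for illustration, and bins and distribution having equal length is an assumption; see the example file in brer/data for real content. Following the version 2.0 behavior described for PairData, the object key identifies each pair when reading.

```python
import json

# Hypothetical contents; field names follow the PairData signature.
pairs = {
    "res52_res210": {
        "name": "res52_res210",
        "bins": [2.0, 2.5, 3.0, 3.5, 4.0],
        "distribution": [0.05, 0.2, 0.5, 0.2, 0.05],
        "sites": [812, 3673],
    },
}

# Text that a pair_data.json file might contain.
text = json.dumps(pairs, indent=2)

loaded = json.loads(text)
for key, obj in loaded.items():
    # The object key, not obj["name"], identifies the pair when reading.
    print(key, obj["sites"], len(obj["bins"]) == len(obj["distribution"]))
```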
- brer.pair_data.sample(pair_data)
Choose a bin edge according to the probability distribution.
- Parameters:
pair_data (PairData) – the pair whose distance distribution is sampled.
- brer.pair_data.sample_all(pairs)
Get a mapping of pair names to freshly sampled targets.
- Parameters:
pairs (PairDataCollection) – the pairs for which to sample new targets.
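sample() and sample_all() draw restraint targets from each pair's histogram. The stand-in below uses random.choices with the distribution values as weights. It is a plausible sketch, not brer's implementation; SimplePair is a hypothetical substitute for PairData, and the optional rng argument is added here only to make the example reproducible.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class SimplePair:
    """Hypothetical stand-in for brer.pair_data.PairData (read-only fields)."""
    name: str
    bins: List[float]
    distribution: List[float]
    sites: List[int]


def sample(pair: SimplePair, rng: Optional[random.Random] = None) -> float:
    """Pick one bin edge with probability proportional to its histogram weight."""
    if rng is None:
        rng = random.Random()
    return rng.choices(pair.bins, weights=pair.distribution, k=1)[0]


def sample_all(pairs: Iterable[SimplePair],
               rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Map each pair name to a freshly sampled target."""
    if rng is None:
        rng = random.Random()
    return {pair.name: sample(pair, rng) for pair in pairs}


pairs = [
    SimplePair("a", bins=[2.0, 3.0, 4.0], distribution=[0.0, 1.0, 0.0], sites=[1, 2]),
    SimplePair("b", bins=[2.5, 3.5], distribution=[0.5, 0.5], sites=[3, 4]),
]
targets = sample_all(pairs, random.Random(0))
print(targets)
```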
run_data module
Handle simulation data for BRER simulations.
RunData manages general parameters (GeneralParams) and pair-specific parameters (PairParams) for a single simulator for a specific phase of the BRER method.
Parameters are initially provided through the RunConfig, and are then stored to (and restored from) an internally managed state.json file.
Not all parameters are applicable to all BRER phases.
- class brer.run_data.RunData(*, general_params, pair_params)
Store (and manipulate, to a lesser extent) all the metadata for a BRER run.
The full set of metadata for a single BRER run includes both the general parameters and the pair-specific parameters.
Key-value pairs provided to general_params will be used to update the default values of a new GeneralParams instance. pair_params is a mapping of named PairParams instances. Both general and pair-specific parameters may be updated with set().
This is the BRER program state data structure. We avoid the name “state” because of potential confusion with concepts like energetic, conformational, or thermodynamic state, but we use the filename state.json for the serialized object. RunData instances can be serialized to a file with save_config() or deserialized (restored from a file) with create_from().
Examples
├── pair parameters
│   ├── name of pair 1
│   │   ├── alpha
│   │   ├── target
│   │   └── ...
│   └── name of pair 2
└── general parameters
    ├── A
    ├── tau
    └── ...
- Parameters:
general_params (GeneralParams) – the general parameters shared by all restraints.
pair_params (MutableMapping[str, PairParams]) – the pair-specific parameters, keyed by pair name.
- as_dictionary()
Get the run metadata as a hierarchical dictionary.
For historical reasons, the top-level dictionary keys are not exact string matches for the object attributes.
- Returns:
hierarchical dictionary of metadata
- classmethod create_from(source: str | PathLike | Path, ensemble_num: int = None) → RunData
- classmethod create_from(source: Mapping[str, dict], ensemble_num: int = None) → RunData
- classmethod create_from(source: PairDataCollection, ensemble_num: int = None) → RunData
Create a new instance from provided data.
Warns if ensemble_num is specified but contradicts source. If ensemble_num is not specified and is not found in source, the default value is determined by GeneralParams.
source is usually either a state.json file or a
- Parameters:
source – File or Python objects from which to initialize RunData.
ensemble_num – Member index in the ensemble (if any).
- get(key, *, name=None)
Get either a general or a pair-specific parameter.
- save_config(fnm='state.json')
Saves the run parameters to a log file.
- Parameters:
fnm (str, default='state.json') – Log file for state parameters.
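Conceptually, save_config() and create_from() amount to a JSON round trip of the nested metadata. This sketch uses hypothetical dataclasses holding a subset of the documented fields, and hypothetical top-level keys ("general parameters", "pair parameters") that echo the labels in the RunData example tree; as noted for as_dictionary(), brer's actual keys are not guaranteed to match.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict, List, Tuple


@dataclass
class General:  # hypothetical subset of GeneralParams
    A: float = 50.0
    tau: float = 50
    phase: str = "training"
    iteration: int = 0


@dataclass
class Pair:  # hypothetical subset of PairParams
    name: str
    sites: List[int]
    alpha: float = 0.0
    target: float = 3.0


def save_state(general: General, pairs: Dict[str, Pair], path: Path) -> None:
    """Write nested metadata as JSON (top-level keys are assumptions)."""
    data = {
        "general parameters": asdict(general),
        "pair parameters": {name: asdict(p) for name, p in pairs.items()},
    }
    path.write_text(json.dumps(data, indent=2))


def load_state(path: Path) -> Tuple[General, Dict[str, Pair]]:
    """Rebuild the objects from the JSON written by save_state()."""
    data = json.loads(path.read_text())
    general = General(**data["general parameters"])
    pairs = {name: Pair(**values) for name, values in data["pair parameters"].items()}
    return general, pairs


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    save_state(General(phase="convergence"), {"p1": Pair("p1", [1, 2])}, state_file)
    restored_general, restored_pairs = load_state(state_file)

print(restored_general, restored_pairs)
```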
- set(name=None, **kwargs)
Set either general or pair-specific parameters.
When a name argument is present, sets pair-specific parameters for the named restraint.
When name is not provided, sets general parameters.
- Parameters:
name (str, default=None) – Restraint name, as used in the brer.run_config.RunConfig.
- Raises:
ValueError – if you provide a name and try to set a general parameter or don’t provide a name and try to set a pair-specific parameter.
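The dispatch rules for set() and get() (no name selects the general parameters, a name selects that restraint's parameters, and a key from the wrong group raises ValueError) can be modeled in a few lines. MiniRunData, General, and Pair are hypothetical stand-ins with a subset of the documented fields; this is not the RunData implementation.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional, Union


@dataclass
class General:  # hypothetical stand-in for GeneralParams
    A: float = 50.0
    tau: float = 50
    tolerance: float = 0.25
    phase: str = "training"


@dataclass
class Pair:  # hypothetical stand-in for PairParams
    name: str
    sites: List[int]
    alpha: float = 0.0
    target: float = 3.0


class MiniRunData:
    """Hypothetical model of the set()/get() dispatch rules."""

    def __init__(self, general: General, pairs: Dict[str, Pair]):
        self.general = general
        self.pairs = pairs

    def _select(self, name: Optional[str]) -> Union[General, Pair]:
        # No name: general parameters. A name: that restraint's parameters.
        return self.general if name is None else self.pairs[name]

    def set(self, name: Optional[str] = None, **kwargs) -> None:
        target = self._select(name)
        allowed = {f.name for f in fields(target)}
        unknown = set(kwargs) - allowed
        if unknown:
            scope = "general" if name is None else "pair-specific"
            raise ValueError(f"not {scope} parameters: {sorted(unknown)}")
        # All keys were validated above, so a rejected call changes nothing.
        for key, value in kwargs.items():
            setattr(target, key, value)

    def get(self, key: str, *, name: Optional[str] = None):
        return getattr(self._select(name), key)


rd = MiniRunData(General(), {"p1": Pair("p1", [1, 2])})
rd.set(phase="convergence")        # general parameter, so no name
rd.set(name="p1", target=4.2)      # pair-specific parameter, so a name is required
try:
    rd.set(alpha=1.0)              # pair-specific key without a name
except ValueError as err:
    print("rejected:", err)
```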
- class brer.run_data.GeneralParams(A=50.0, end_time=0.0, ensemble_num=0, iteration=0, num_samples=50, phase='training', production_time=10000.0, sample_period=100.0, start_time=0.0, tau=50, tolerance=0.25)
Store the parameters shared by all restraints in a single simulation.
These include some of the “Voth” parameters: tau, A, and tolerance.
New in version 2.0: The end_time parameter.
Changed in version 2.0: ensemble_num now defaults to 0 for consistency with RunConfig.
Update general parameters before a call to brer.run_config.RunConfig.run() by calling brer.run_data.RunData.set() without a name argument.
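To see how key-value pairs update the defaults of a new GeneralParams instance (as described for RunData's general_params argument), here is a dataclass mirroring the documented signature. It is an illustration only, not the brer class, and it uses the standard library dataclasses.replace for the update.

```python
from dataclasses import dataclass, replace


@dataclass
class GeneralParamsSketch:
    """Mirror of the documented GeneralParams defaults (illustration only)."""
    A: float = 50.0
    end_time: float = 0.0
    ensemble_num: int = 0
    iteration: int = 0
    num_samples: int = 50
    phase: str = "training"
    production_time: float = 10000.0
    sample_period: float = 100.0
    start_time: float = 0.0
    tau: float = 50
    tolerance: float = 0.25


# Only the supplied keys change; every other field keeps its default.
overrides = {"A": 100.0, "tau": 25}
params = replace(GeneralParamsSketch(), **overrides)
print(params.A, params.tau, params.tolerance, params.phase)
```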
- class brer.run_data.PairParams(name, sites, logging_filename=None, alpha=0.0, target=3.0)
Stores the parameters that are unique to a specific restraint.
PairParams is a mutable structure for run time data. Fields such as alpha and target may be updated automatically while running brer.
PairParams should not be confused with PairData (an input data structure).
Update pair-specific parameters before a call to brer.run_config.RunConfig.run() by calling brer.run_data.RunData.set(), providing the pair name with the name argument.
logging_filename is derived from the pair name (user-provided; usually derived from the residue IDs defining the pair). Overriding the default produces a warning.
Changed in version 2.0: sites is required to initialize the object.
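The logging_filename behavior (a default derived from the pair name, and a warning when the default is overridden) can be sketched with a dataclass __post_init__ hook. PairParamsSketch is a hypothetical stand-in, and the "<name>.log" naming rule is an assumption; brer's actual derivation may differ.

```python
import warnings
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PairParamsSketch:
    """Illustrative stand-in for brer.run_data.PairParams."""
    name: str
    sites: List[int]                      # required, with no default
    logging_filename: Optional[str] = None
    alpha: float = 0.0
    target: float = 3.0

    def __post_init__(self) -> None:
        default = f"{self.name}.log"      # hypothetical naming rule
        if self.logging_filename is None:
            self.logging_filename = default
        elif self.logging_filename != default:
            warnings.warn(
                f"logging_filename {self.logging_filename!r} overrides default {default!r}"
            )


pair = PairParamsSketch("res52_res210", sites=[812, 3673])
print(pair.logging_filename)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    PairParamsSketch("res52_res210", sites=[812, 3673], logging_filename="custom.log")
print(len(caught))
```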