User API#

BRER runner#

The RunConfig class handles the actual workflow logic.

class brer.run_config.RunConfig(tpr, ensemble_dir, ensemble_num=None, pairs_json='pair_data.json')#

Run configuration for a single BRER ensemble member.

The run configuration specifies the files and directory structure used for the run. It determines whether the run is in the training, convergence, or production phase, then performs the run.

Note that all instances of RunConfig across the ranks of an MPI ensemble need the same-sized array of TPR input files, because each instance must be able to construct a compatible copy of the ensemble simulation work description.

Source data for the pair restraints is provided through a JSON file (pairs_json). See brer.pair_data.PairDataCollection for details.

Parameters:
  • tpr (str) – path (or paths) to tpr input. Must be compatible with the GROMACS version providing gmxapi.

  • ensemble_dir (str) – path to top directory which contains the full ensemble.

  • ensemble_num (int, default=0) – the ensemble member to run

  • pairs_json (str, default="pair_data.json") – path to file containing ALL the pair metadata. (A serialized brer.pair_data.PairDataCollection.)

build_plugins(plugin_config)#

Builds the plugin configuration.

For each pair-wise restraint, populate the plugin with data: both the “general” data and the data unique to that restraint.

Parameters:

plugin_config (PluginConfig) – the particular plugin configuration (Training, Convergence, Production) for the run.

run(tpr_file=None, **kwargs)#

Perform the MD simulations.

Each Python interpreter process runs a separate ensemble member.

Parameters:
  • tpr_file (str, optional) – If provided, use this input file instead of the input from the main configuration.

  • **kwargs (optional) – Additional keyword arguments are passed on to the simulator.

After the first “iteration”, brer bootstraps the trajectories of the training and convergence phases with the checkpoint file from the previous iteration’s production phase.

At the beginning of a production phase (when there is not yet a checkpoint file), the checkpoint file from the convergence phase is used to start the production trajectory unless tpr_file is given.

When tpr_file is not None, run() does not look for a bootstrapping checkpoint file. This can be helpful if a checkpoint file is corrupted or unavailable. In general, this means that the tpr_file argument should include the starting configuration you intend for the phase that you are about to run(). If you are providing the tpr_file because you are changing parameters that render existing checkpoints incompatible, you need to either generate the file with the checkpoint from which you want to continue, or you may remove the checkpoint file from the phase directory and restart that phase.
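
The checkpoint rules above can be condensed into a small decision function. This is only a sketch of the behavior as described here, not brer's implementation: the function name bootstrap_source, its arguments, and the returned labels are hypothetical, and it assumes that the first iteration is numbered 0 (the GeneralParams default).

```python
def bootstrap_source(phase, iteration, tpr_file=None, has_checkpoint=False):
    """Name the checkpoint that would seed a phase, or None if there is none.

    Hypothetical helper: it only restates the rules described in the text.
    """
    if tpr_file is not None:
        # An explicit input file disables the search for a bootstrapping checkpoint.
        return None
    if phase in ("training", "convergence"):
        # Only iterations after the first have a previous production phase.
        return "previous production" if iteration > 0 else None
    if phase == "production" and not has_checkpoint:
        # A fresh production phase starts from the convergence checkpoint.
        return "convergence"
    return None


first_training = bootstrap_source("training", iteration=0)
later_training = bootstrap_source("training", iteration=1)
fresh_production = bootstrap_source("production", iteration=0)
with_new_input = bootstrap_source("production", iteration=0, tpr_file="new.tpr")
```

Cases that the text does not cover (for example, a training phase that already has its own checkpoint) are deliberately left out of the sketch.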

Example

>>> from brer.run_config import RunConfig
>>> # data_dir, tmpdir, and new_tpr are placeholders that must be defined beforehand.
>>> config_params = {
...     "tpr": "{}/topol.tpr".format(data_dir),
...     "ensemble_num": 1,
...     "ensemble_dir": tmpdir,
...     "pairs_json": "{}/pair_data.json".format(data_dir)
... }
>>> rc = RunConfig(**config_params)
>>> assert rc.run_data.get('phase') == 'training'
>>> rc.run(threads=2)
>>> assert rc.run_data.get('phase') == 'convergence'
>>> rc.run()
>>> assert rc.run_data.get('phase') == 'production'
>>> # Start production from a new input file instead of a checkpoint.
>>> rc.run(tpr_file=new_tpr, max_hours=23.9)

BRER parameters#

pair_data module#

BRER data for site pairs.

Coordinate the experimental reference data and molecular model data for the labeled / restrained pairs.

Support the statistical (re)sampling of target pair distances when beginning a BRER iteration.

class brer.pair_data.PairData(name, bins, distribution, sites)#

Pair distance distribution.

Essential pair data to support BRER MD plugin code. All data must be provided when the object is initialized. (Fields are read-only.)

Fields here correspond to the fields for each named pair (JSON objects) in a pair_data.json file (the pairs_json argument of RunConfig()).

Changed in version 2.0: When reading from pair_data.json or state.json, the name used for PairData comes from the object key, not from the name field of the object. The name field in the serialized (JSON) representation is ignored when reading, but is preserved for backward compatibility when writing.

Parameters:
bins: list[float]#

Histogram edges for the distance distribution data.

(Simulation length units.)

distribution: list[float]#

Site distance distribution.

Histogram values (weights or relative probabilities) for distances between the sites. (Generally derived from experimental data.)

name: str#

Identifier for the pair of sites on the molecule.

This string is chosen by the researcher. For example, the name may include identifiers for the two residues in a scheme that can be easily cross-referenced with experimental data.

sites: list[int]#

Indices defining the distance vector.

A list of indices for sites in the molecular model. The first and last list elements are the sites associated with the distance data. Additional indices can be inserted in the list to define a chain of distance vectors that will be added without applying periodic boundary conditions.

If, at any point in the simulation, the two molecular sites in the pair might be farther apart than half of the shortest simulation box dimension, the distance might accidentally get calculated between sites on different molecule “images” (periodic boundary conditions). To make sure that site-site distances are calculated on the same molecule, provide a sequence of sites on the molecule (that are never more than half a box-length apart) so that the correct vector between sites is unambiguous.
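
The effect of intermediate sites can be seen in a toy calculation. The sketch below is not the plugin's code: it assumes a rectangular box, applies the minimum-image convention to each individual step in the chain (an assumption), and then sums the steps without any further wrapping. The helper names, box size, and coordinates are made up for illustration.

```python
def minimum_image(delta, box):
    """Wrap a displacement vector into the nearest periodic image (rectangular box)."""
    return [d - b * round(d / b) for d, b in zip(delta, box)]


def chained_vector(positions, sites, box):
    """Sum the minimum-image steps between consecutive sites in the chain."""
    total = [0.0, 0.0, 0.0]
    for i, j in zip(sites[:-1], sites[1:]):
        raw = [pj - pi for pi, pj in zip(positions[i], positions[j])]
        step = minimum_image(raw, box)
        total = [t + s for t, s in zip(total, step)]
    return total


box = [10.0, 10.0, 10.0]
# Three sites on one molecule; the end sites are 6 units apart (more than half the box).
positions = {0: [1.0, 0.0, 0.0], 1: [4.0, 0.0, 0.0], 2: [7.0, 0.0, 0.0]}

direct = chained_vector(positions, [0, 2], box)      # [-4.0, 0.0, 0.0]
chained = chained_vector(positions, [0, 1, 2], box)  # [6.0, 0.0, 0.0]
```

With only the two end sites, the vector points to the nearer periodic image. Adding the intermediate site keeps every step shorter than half a box length, so the 6-unit separation on the same molecule is recovered.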

class brer.pair_data.PairDataCollection(*pairs)#

Data for all the restrained pairs in a BRER simulation.

Source data for the pair restraints is provided through a JSON file (pairs_json). The JSON file contains one JSON object for each PairData to be read.

For each object, the object key is assumed to be the PairData.name of a pair. The JSON object contents are used to initialize a PairData for each named pair.

The data file is usually constructed manually by the researcher after inspection of a molecular model and available experimental data. An example of what such a file should look like is provided in the brer/data directory of the installed package or in the source repository.

Note that JSON is not a Python-specific file format, but the Python json module may be helpful.

A PairDataCollection can be initialized from a sequence of PairData objects, or created from a JSON pair data file by using create_from().
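
As a rough illustration of the layout described above, the following sketch builds a pair data mapping with the standard-library json module and reads it back. The pair name, site indices, and histogram values are invented placeholders, not taken from the packaged example file, and the sketch assumes one more bin edge than histogram values.

```python
import json

# One JSON object per pair, keyed by the pair name. All values are placeholders.
pair_data = {
    "pair_3_42": {
        "name": "pair_3_42",  # ignored on reading since version 2.0; the key is used
        "sites": [3, 42],
        "bins": [1.0, 2.0, 3.0, 4.0],  # assumed: one more edge than histogram values
        "distribution": [0.2, 0.5, 0.3],
    }
}

# Serialize to JSON text, then parse it again to confirm the round trip.
text = json.dumps(pair_data, indent=4)
restored = json.loads(text)
```

Writing text to a file would produce input in the shape that the pairs_json argument expects, provided the values describe a real system.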

Parameters:

pairs (PairData) –

as_dict()#

Encode the full collection as a single Python dictionary.

static create_from(filename)#

Reads pair data from a JSON file.

Parameters:

filename (str | PathLike | Path) – filename of the pair data

brer.pair_data.sample(pair_data)#

Choose a bin edge according to the probability distribution.

Parameters:

pair_data (PairData) –

brer.pair_data.sample_all(pairs)#

Get a mapping of pair names to freshly sampled targets.

Parameters:

pairs (PairDataCollection) –
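
Sampling a target from a histogram can be sketched with the standard library alone. This is not the brer implementation: the function names below are hypothetical, plain dictionaries stand in for PairData objects, and the sketch assumes that there is one more bin edge than histogram values and that the lower edge of the chosen bin is returned.

```python
import random


def sample_edge(bins, distribution, rng=random):
    """Pick a lower bin edge with probability proportional to its histogram value."""
    lower_edges = bins[: len(distribution)]
    return rng.choices(lower_edges, weights=distribution, k=1)[0]


def sample_targets(pairs, rng=random):
    """Map each pair name to a freshly sampled target distance."""
    return {
        name: sample_edge(pair["bins"], pair["distribution"], rng)
        for name, pair in pairs.items()
    }


# Placeholder data: plain dicts standing in for PairData objects.
pairs = {
    "pair_a": {"bins": [1.0, 2.0, 3.0, 4.0], "distribution": [0.0, 1.0, 0.0]},
    "pair_b": {"bins": [2.0, 2.5, 3.0], "distribution": [1.0, 3.0]},
}
targets = sample_targets(pairs, rng=random.Random(0))
```

The weights do not need to be normalized, because random.choices treats them as relative weights.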

run_data module#

Handle simulation data for BRER simulations.

RunData manages general parameters (GeneralParams) and pair-specific parameters (PairParams) for a single simulator for a specific phase of the BRER method.

Parameters are initially provided through the RunConfig, and are then stored to (and restored from) an internally managed state.json file.

Not all parameters are applicable to all BRER phases.

class brer.run_data.RunData(*, general_params, pair_params)#

Store (and manipulate, to a lesser extent) all the metadata for a BRER run.

The full set of metadata for a single BRER run includes both the general parameters and the pair-specific parameters.

Key-value pairs provided to general_params will be used to update the default values of a new GeneralParams instance.

pair_params is a mapping of named PairParams instances.

Both general and pair-specific parameters may be updated with set().

This is the BRER program state data structure. We avoid the name “state” because of potential confusion with concepts like energetic, conformational, or thermodynamic state, but we use the filename state.json for the serialized object. RunData instances can be serialized to a file with save_config() or deserialized (restored from a file) with create_from().

Examples

├── pair parameters
│   ├── name of pair 1
│   │   ├── alpha
│   │   ├── target
│   │   └── ...
│   ├── name of pair 2
│
└── general parameters
    ├── A
    ├── tau
    └── ...
as_dictionary()#

Get the run metadata as a hierarchical dictionary.

Returns:

hierarchical dictionary of metadata

Return type:

dict

For historical reasons, the top level dictionary keys are not exact string matches for the object attributes.

classmethod create_from(source: str | PathLike | Path, ensemble_num: int = None) RunData#
classmethod create_from(source: Mapping[str, dict], ensemble_num: int = None) RunData
classmethod create_from(source: PairDataCollection, ensemble_num: int = None) RunData

Create a new instance from provided data.

Warns if ensemble_num is specified but contradicts source. If ensemble_num is not specified and is not found in source, the default value is determined by GeneralParams.

source is usually either a state.json file or a

Parameters:
  • source – File or Python objects from which to initialize RunData.

  • ensemble_num – Member index in the ensemble (if any).

get(key, *, name=None)#

Get either a general or a pair-specific parameter.

Parameters:
  • key (str) – The parameter to get.

  • name (str, default=None) – If getting a pair-specific parameter, specify the restraint name.

Returns:

the parameter value.

Return type:

Any

save_config(fnm='state.json')#

Saves the run parameters to a log file.

Parameters:

fnm (str, default='state.json') – Log file for state parameters.

set(name=None, **kwargs)#

Set either general or pair-specific parameters.

When a name argument is present, sets pair-specific parameters for the named restraint.

When name is not provided, sets general parameters.

Parameters:

name (str, default=None) – Restraint name, as used in the brer.run_config.RunConfig.

Raises:

ValueError – if you provide a name and try to set a general parameter or don’t provide a name and try to set a pair-specific parameter.
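
The split between general and pair-specific parameters can be illustrated with a toy stand-in. ToyRunData below is hypothetical and is not brer.run_data.RunData; it only mimics the documented behavior of set() and get(), using a few parameter names and default values from this page.

```python
class ToyRunData:
    """Hypothetical stand-in for illustration; not brer.run_data.RunData."""

    def __init__(self, general, pairs):
        self.general = dict(general)
        self.pairs = {name: dict(params) for name, params in pairs.items()}

    def set(self, name=None, **kwargs):
        # Without a name, update general parameters; with a name, update that pair.
        target = self.general if name is None else self.pairs[name]
        for key in kwargs:
            if key not in target:
                raise ValueError(f"{key!r} is not a valid parameter here")
        target.update(kwargs)

    def get(self, key, *, name=None):
        return (self.general if name is None else self.pairs[name])[key]


data = ToyRunData(
    general={"A": 50.0, "tau": 50},
    pairs={"pair_1": {"alpha": 0.0, "target": 3.0}},
)
data.set(tau=100)                    # general parameter: no name
data.set(name="pair_1", target=2.5)  # pair-specific parameter: name required

try:
    data.set(alpha=1.0)  # pair-specific key without a name
    mixed_up = False
except ValueError:
    mixed_up = True
```

The toy class validates every key before changing anything, so a rejected call leaves the stored parameters untouched.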

class brer.run_data.GeneralParams(A=50.0, end_time=0.0, ensemble_num=0, iteration=0, num_samples=50, phase='training', production_time=10000.0, sample_period=100.0, start_time=0.0, tau=50, tolerance=0.25)#

Store the parameters shared by all restraints in a single simulation.

These include some of the “Voth” parameters: tau, A, and tolerance.

New in version 2.0: The end_time parameter.

Changed in version 2.0: ensemble_num now defaults to 0 for consistency with RunConfig.

Update general parameters before a call to brer.run_config.RunConfig.run() by calling brer.run_data.RunData.set() without a name argument.

class brer.run_data.PairParams(name, sites, logging_filename=None, alpha=0.0, target=3.0)#

Stores the parameters that are unique to a specific restraint.

PairParams is a mutable structure for run time data. Fields such as alpha and target may be updated automatically while running brer.

PairParams should not be confused with PairData (an input data structure).

Update pair-specific parameters before a call to brer.run_config.RunConfig.run() by calling brer.run_data.RunData.set(), providing the pair name with the name argument.

logging_filename is derived from the pair name (user-provided; usually derived from the residue IDs defining the pair). Overriding the default produces a warning.

Changed in version 2.0: sites is required to initialize the object.
