job_stream.baked

Pre-baked templates for common distributed operations.

Members

job_stream.baked.sweep(variables={}, trials=0, output=None, trialsParms={}, showProgress=True)

Generic experiment framework; wraps a user’s job_stream (realized via job_stream.inline.Work). The wrapped job_stream should take a parameter dictionary and return a dictionary of results to track. Summary statistics over several invocations of the user’s code are then reported. For example:

from job_stream.baked import sweep
import numpy as np

with sweep({ 'a': np.arange(10) }) as w:
    @w.job
    def square(id, trial, a):
        return { 'value': a*a + np.random.random() }

The above will print out CSV data (in a legible, tabular format) that reports the mean of value for a = 0, a = 1, ..., a = 9. Additionally, the standard deviation and the expected (95% confidence) error of each reported mean will be printed.
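As a rough illustration of the three statistics mentioned above, the sketch below computes a mean, a sample standard deviation, and a 95% confidence error of the mean for one parameter combination. It assumes a normal approximation (1.96 standard errors); the source does not state which quantile job_stream actually uses, and the helper name summarize is hypothetical.

```python
import math
import statistics

def summarize(samples):
    """Return (mean, sample std dev, 95% error of the mean) for a list of floats."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Sample standard deviation (n - 1 in the denominator); needs n >= 2.
    dev = statistics.stdev(samples) if n > 1 else 0.0
    # Assumption: normal approximation, 1.96 standard errors for 95% confidence.
    err = 1.96 * dev / math.sqrt(n)
    return mean, dev, err

# Five hypothetical trial results for a single value of `a`.
values = [4.1, 3.9, 4.3, 4.0, 4.2]
mean, dev, err = summarize(values)
print(f"mean={mean:.3f} dev={dev:.3f} err={err:.3f}")
```

Read this way, the reported error shrinks with the square root of the number of trials, which is why noisier results need more trials to reach the same confidence.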

Note

When training, e.g., a machine learning model, it may be desirable to report the model’s accuracy at different points throughout the training. To accomplish this, it is recommended that the user code record the accuracy throughout training and multiplex it into separate columns of the final answer (e.g., value_at_0, value_at_1000, etc.).
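A minimal sketch of this multiplexing pattern follows. The training loop is a toy stand-in, and the names train, checkpoints, and steps are hypothetical; only the idea of returning one key per checkpoint comes from the note above.

```python
import random

def train(a, checkpoints=(0, 1000, 2000), steps=2001, seed=0):
    """Toy training loop; `accuracy` stands in for a real metric."""
    rng = random.Random(seed)
    accuracy = 0.0
    results = {}
    for step in range(steps):
        # Hypothetical update: accuracy creeps toward 1.0 at a rate set by `a`.
        accuracy += (1.0 - accuracy) * a * 1e-3 * rng.random()
        if step in checkpoints:
            # One result key per checkpoint, e.g. value_at_1000.
            results[f"value_at_{step}"] = accuracy
    return results

print(train(1.0))
```

A job wrapped by sweep could return such a dictionary directly, so that each checkpoint is tracked as its own result.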

Parameters:
  • variables (any) –

    Any of:

    • dict: { 'parameter name': [ 'values to try' ] }.

      Tries every combination of values for the specified parameters.

      Often, values to try will come from a function such as numpy.linspace, which generates a number of intermediate values on a range.

      Warning

      When more than one parameter is specified, the number of experiments that will run is the product of the number of values to try for each; that is, all combinations are tried. The run time therefore grows very quickly, so be careful.

    • list: [ { 'parameter name': 'value' } ].

      Tries only the specified combinations. The dictionaries are passed as-is.

    Regardless of the type passed, the arguments seen by the user’s jobs will also include id, a unique identifier for the combination of parameters, and trial, the current iteration of that combination.

  • trials (int) –

    The number of times to try each parameter combination.

    If greater than zero, this number is exact.

    If zero, the number will be determined automatically based on the standard deviations of each returned property. More specifically, the algorithm proposed by Driels et al. in 2004, “Determining the Number of Iterations for Monte Carlo Simulations of Weapon Effectiveness,” will be used so that, with the default settings of trialsParms, every mean returned will be within 10% of the true value with 95% confidence.

    If less than zero, the same algorithm is used as for a zero value, but the number of trials run will not exceed abs(trials).

  • output (str) – If None, results are printed to stdout and the program exits. Otherwise, the value is taken to be the path to a CSV file to which the results will be written.
  • trialsParms (dict) –

    A dictionary of keys used to configure the automatic detection of the number of trials needed.

    • E: The relative error allowed between the true mean and the reported mean, expressed as a fraction. For instance, if E is 0.1, then there is a 95% confidence that the true value is in the range of the estimated value * (1 ± 0.1). Setting to 0 will disable this stopping criterion. Defaults to 0.1.

    • eps: The absolute minimum estimated error. Setting to 0 will disable this stopping criterion. Defaults to 1e-2.

    • min: The minimum number of trials to run. Defaults to 3, as this is usually enough to get a stable idea of the variable’s standard deviation.

    • max: The maximum number of trials to run. Defaults to 10000. May also be specified by setting the argument trials to a negative number.
  • showProgress (bool) – If True (default), then print out progress indicators to stderr as work is completed.
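To make the two accepted forms of variables concrete, the sketch below expands a dict into every combination of values and passes a list of dicts through unchanged. It is an illustration only, not job_stream's implementation: the helper name expand and the id numbering scheme are assumptions.

```python
import itertools

def expand(variables):
    """Return a list of parameter dicts, one per combination to try."""
    if isinstance(variables, dict):
        names = list(variables)
        # Cartesian product: every value of each parameter meets every other.
        combos = [dict(zip(names, values))
                  for values in itertools.product(*(variables[n] for n in names))]
    else:
        # A list of dicts is taken as-is, one dict per combination.
        combos = [dict(c) for c in variables]
    # Attach a unique id per combination (numbering scheme is an assumption).
    return [dict(c, id=i) for i, c in enumerate(combos)]

grid = expand({'a': [0, 1, 2], 'b': [0.1, 0.2]})
print(len(grid))      # 3 * 2 = 6 combinations

explicit = expand([{'a': 0, 'b': 0.1}, {'a': 2, 'b': 0.2}])
print(len(explicit))  # 2 combinations, exactly as listed
```

With ten values for each of three parameters, the dict form already yields 1000 combinations, each of which is then run for the configured number of trials.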
Returns:

Nothing is returned. However, both stdout and, if specified, the CSV file indicated by output will have a grid with all trial ids, parameters, and results printed. Each result column will represent the mean value; additionally, the estimated standard deviation and the 95% confidence error of the reported mean will be printed.
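The 95% confidence error mentioned here is also what drives the automatic trial count when trials is zero or negative. The sketch below shows one plausible reading of the stopping rule using the documented trialsParms keys (E, eps, min, max). It is a simplified illustration rather than job_stream's actual code: the function name enough_trials, the normal approximation, and the choice to stop as soon as either enabled criterion is met are all assumptions.

```python
import math
import statistics

def enough_trials(samples, E=0.1, eps=1e-2, min_trials=3, max_trials=10000):
    """Decide whether sampling can stop for one result column."""
    n = len(samples)
    if n >= max_trials:
        return True   # hard cap reached
    if n < max(min_trials, 2):
        return False  # too few samples to estimate a standard deviation
    mean = statistics.fmean(samples)
    # Assumption: 95% confidence half-width under a normal approximation.
    err = 1.96 * statistics.stdev(samples) / math.sqrt(n)
    # Relative criterion: error within a fraction E of the mean (off if E == 0).
    if E > 0 and err <= E * abs(mean):
        return True
    # Absolute criterion: error no larger than eps (off if eps == 0).
    if eps > 0 and err <= eps:
        return True
    return False

print(enough_trials([1.0, 1.0, 1.0]))  # identical samples: error is zero
print(enough_trials([1.0, 5.0, 9.0]))  # widely spread samples: keep sampling
```

Under this reading, a negative trials value would simply lower max_trials to abs(trials) while leaving the other criteria in place.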