job_stream.baked
Pre-baked templates for common distributed operations.
Members
job_stream.baked.sweep(variables={}, trials=0, output=None, trialsParms={}, showProgress=True)

Generic experiment framework; wraps a user’s job_stream (realized via job_stream.inline.Work). The wrapped job_stream should take a parameter dictionary and return a dictionary of results to track. The numerical statistics of several invocations of the user’s code will be reported. For example:

    from job_stream.baked import sweep
    import numpy as np

    with sweep({ 'a': np.arange(10) }) as w:
        @w.job
        def square(id, trial, a):
            return { 'value': a*a + np.random.random() }

The above will print out a CSV file (in a legible, table format) that details the mean of value for a = 0, a = 1, ..., a = 9. Additionally, the standard deviation and expected (95% confidence) error of the reported mean will be printed.

Note
When training, for example, a machine learning model, it may be desirable to print the model’s accuracy at different points throughout the training. To accomplish this, it is recommended that the user code record the accuracy throughout training and multiplex it into separate columns of the final answer (e.g., value_at_0, value_at_1000, etc.).

Parameters:
- variables (any) –
Any of:
dict: { 'parameter name': [ 'values to try' ] }. Tries every combination of values for the specified parameters.
Often, the values to try will come from a function such as numpy.linspace, which generates a number of intermediate values on a range.

Warning
When more than one parameter is specified, the number of experiments run is the product of the number of values to try for each parameter; that is, all combinations are tried. The run time grows quickly with each added parameter, so be careful.
list: [ { 'parameter name': 'value' } ]. Tries only the specified combinations. The dictionaries are passed as-is.
Regardless of the type passed, the arguments seen by the user’s jobs will also include id, a unique identifier for the combination of parameters, and trial, the current iteration of that combination.
- trials (int) –
The number of times to try each parameter combination.
If greater than zero, this number is exact.
If zero, the number is determined automatically from the standard deviations of each returned property. More specifically, the algorithm proposed by Driels et al. in 2004, “Determining the Number of Iterations for Monte Carlo Simulations of Weapon Effectiveness,” will be used to guarantee that every mean returned will be within 10% of the true value with 95% confidence.
If less than zero, the same algorithm is used as for a zero value, but the number of trials run will not exceed abs(trials).
- output (str) – If None, results are printed to stdout and the program exits. Otherwise, the value is presumed to be the path to a CSV file to which the results are written.
- trialsParms (dict) –
A dictionary of settings that configure the automatic detection of the number of trials needed.
- E – The relative error, expressed as a fraction, between the true mean and the reported mean. For instance, if E is 0.1, then there is a 95% confidence that the true value lies in the range of the estimated value * (1 ± 0.1). Setting to 0 disables this stopping criterion. Defaults to 0.1.
- eps – The absolute minimum estimated error. Setting to 0 disables this stopping criterion. Defaults to 1e-2.
- min – The minimum number of trials to run. Defaults to 3, as this is usually enough to get a stable idea of the variable’s standard deviation.
- max – The maximum number of trials to run. Defaults to 10000. May also be specified by setting the argument trials to a negative number.
- showProgress (bool) – If True (default), then print out progress indicators to stderr as work is completed.
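
The difference between the dict and list forms of variables can be illustrated with a small standalone sketch. It does not use job_stream; expand_variables is a hypothetical helper written for this illustration, and numbering the combinations with a simple index is an assumption about how id might look, not a description of the library's behavior.

```python
from itertools import product

def expand_variables(variables):
    """Hypothetical helper: list the parameter combinations a sweep would try."""
    if isinstance(variables, dict):
        # dict form: every combination of the listed values (Cartesian product).
        names = list(variables)
        combos = [dict(zip(names, values))
                  for values in product(*(variables[n] for n in names))]
    else:
        # list form: only the given dictionaries, taken as-is.
        combos = [dict(c) for c in variables]
    # Assumption: identify each combination by its position in the list.
    return [dict(c, id=i) for i, c in enumerate(combos)]

grid = expand_variables({'a': [0, 1, 2], 'b': [10, 20]})
explicit = expand_variables([{'a': 0, 'b': 10}, {'a': 2, 'b': 20}])

print(len(grid))      # 3 values of a times 2 values of b
print(len(explicit))  # only the two combinations listed
```

With the dict form, adding a third parameter with ten values would multiply the number of combinations by ten, which is the growth the warning refers to; the list form stays at exactly the combinations given.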
Returns: Nothing is returned. However, both stdout and, if specified, the CSV file indicated by output will have a grid with all trial ids, parameters, and results printed. Each result column will represent the mean value; additionally, the estimated standard deviation and the 95% confidence error of the reported mean will be printed.
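
To make the reported statistics and the automatic trial count more concrete, here is a rough standalone sketch of a stopping rule of this kind. It is a simplified normal-approximation version written for illustration, not job_stream's implementation and not necessarily the exact procedure from Driels et al. The names summarize and run_trials are hypothetical, the factor 1.96 is the usual normal quantile for 95% confidence, and the reading of eps as "stop once the estimated error falls to eps or below" is an assumption.

```python
import itertools
import math
import random
import statistics

def summarize(values, z=1.96):
    """Return (mean, standard deviation, 95% confidence error of the mean)."""
    n = len(values)
    mean = statistics.mean(values)
    dev = statistics.stdev(values) if n > 1 else 0.0
    return mean, dev, z * dev / math.sqrt(n)

def run_trials(fn, E=0.1, eps=1e-2, min_trials=3, max_trials=10000):
    """Call fn() repeatedly until the error of the mean is small enough.

    Assumed rule: stop when the error is within E of the mean (relative)
    or at most eps (absolute); a value of 0 disables that criterion.
    """
    values = []
    while len(values) < max_trials:
        values.append(fn())
        if len(values) < min_trials:
            continue
        mean, dev, err = summarize(values)
        if (E > 0 and err <= E * abs(mean)) or (eps > 0 and err <= eps):
            break
    mean, dev, err = summarize(values)
    return {'trials': len(values), 'mean': mean, 'dev': dev, 'err': err}

# Low relative noise: the minimum number of trials is already enough.
rng = random.Random(0)
result = run_trials(lambda: 9 + rng.random())
print(result['trials'])

# Mean near zero: the relative criterion is never met, so the cap applies.
# This mirrors passing a negative trials value, e.g. trials=-20.
flip = itertools.cycle([1.0, -1.0])
capped = run_trials(lambda: next(flip), max_trials=20)
print(capped['trials'])
```

The second call shows why a cap is useful: when the true mean is close to zero, a purely relative error target may require a very large number of trials, so the maximum bounds the run time.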