Car stopping distance problem
In this example, we implement a data-generation process that computes the output of every experiment in a data-driven workflow. We use the ‘car stopping distance’ problem as the running example.
Defining the problem
The stopping distance \(y\) of a car as a function of its velocity \(x\) at the moment braking starts:
\(y = z x + \frac{1}{2 \mu g} x^2 = z x + 0.1 x^2\)
- \(z\) is the driver's reaction time (in seconds)
- \(\mu\) is the road/tire coefficient of friction (we assume \(\mu=0.5\))
- \(g\) is the acceleration of gravity (we assume \(g=10\ \mathrm{m/s^2}\)).
\(y = d_r + d_{b}\) - where \(d_r\) is the reaction distance, and \(d_b\) is the braking distance.
Reaction distance \(d_r\):
\(d_r = z x\) - with \(z\) being the driver's reaction time, \(x\) being the velocity of the car at the start of braking.
Kinetic energy of moving car:
\(E = \frac{1}{2}m x^2\) - where \(m\) is the car mass.
Work done by braking:
\(W = \mu m g d_b\) - where \(\mu\) is the coefficient of friction between the road and the tire, \(g\) is the acceleration of gravity, and \(d_b\) is the car braking distance.
The braking distance follows from \(E=W\):
\(d_b = \frac{1}{2\mu g}x^2\)
Therefore, adding the reaction distance \(d_r\) to the braking distance \(d_b\) gives the stopping distance \(y\):
\(y = d_r + d_b = z x + \frac{1}{2\mu g} x^2\)
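As a quick numerical check of the formula above, the following minimal sketch evaluates both terms for a given velocity and reaction time. The function name `stopping_distance` and its keyword defaults are illustrative assumptions; they simply mirror the values \(\mu=0.5\) and \(g=10\ \mathrm{m/s^2}\) assumed in this example.

```python
def stopping_distance(x, z, mu=0.5, g=10.0):
    """Return (reaction distance, braking distance, total) in meters.

    x: velocity at the start of braking [m/s]
    z: driver reaction time [s]
    mu, g: friction coefficient and gravity (illustrative defaults)
    """
    d_r = z * x                    # distance covered during the reaction time
    d_b = x**2 / (2.0 * mu * g)    # from E = W: 0.5*m*x^2 = mu*m*g*d_b
    return d_r, d_b, d_r + d_b


d_r, d_b, total = stopping_distance(x=20.0, z=1.5)
print(d_r, d_b, total)  # 30.0 40.0 70.0
```

At 20 m/s (72 km/h) with a 1.5 s reaction time, the car travels 30 m before braking begins and a further 40 m while braking.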
Every driver has their own reaction time \(z\). We assume \(z\) follows a Gaussian distribution with mean \(\mu_z=1.5\) seconds and variance \(\sigma_z^2=0.5^2\ \mathrm{s}^2\):
\(z \sim \mathcal{N}(\mu_z=1.5,\sigma_z^2=0.5^2)\)
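We create a function that returns the stopping distance \(y\) for a given velocity \(x\). The reaction time \(z\) is not an argument; it is sampled from the distribution above on every call.
from scipy.stats import norm


def y(x):
    # Sample one reaction time z ~ N(1.5, 0.5^2); `scale` is the standard deviation
    z = norm.rvs(loc=1.5, scale=0.5)
    # Stopping distance = reaction distance + braking distance
    return float(z * x + 0.1 * x**2)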
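Because \(y\) is linear in \(z\), for a fixed velocity \(x\) the stopping distance is itself Gaussian, with mean \(1.5x + 0.1x^2\) and standard deviation \(0.5x\). The sketch below checks this empirically. It is a standalone illustration, not part of the f3dasm workflow: `sample_stopping_distance` is a hypothetical stand-in for y(x) that uses Python's `random` module instead of `scipy.stats.norm`, so that the draws can be seeded.

```python
import random
import statistics


def sample_stopping_distance(x, rng):
    """Draw one stopping distance for velocity x (hypothetical stand-in for y(x))."""
    z = rng.gauss(1.5, 0.5)  # reaction time: mean 1.5 s, standard deviation 0.5 s
    return z * x + 0.1 * x**2


rng = random.Random(0)  # fixed seed so the check is reproducible
x = 20.0
samples = [sample_stopping_distance(x, rng) for _ in range(20_000)]

sample_mean = statistics.fmean(samples)
sample_std = statistics.stdev(samples)

# Analytic values for comparison
expected_mean = 1.5 * x + 0.1 * x**2  # 70 m
expected_std = 0.5 * x                # 10 m
print(f"mean: {sample_mean:.2f} (expected {expected_mean:.2f})")
print(f"std:  {sample_std:.2f} (expected {expected_std:.2f})")
```

The spread grows linearly with velocity, which is why the generated distances in the tables below scatter more at high speeds.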
Next, we set up the design of experiments by creating a Domain object with the car velocity \(x\) as a floating-point input parameter:
from f3dasm.design import Domain
domain = Domain()
domain.add_float("x", low=0.0, high=100.0)
For demonstration purposes, we will generate a dataset of stopping distances for velocities between 3 and 83 m/s.
import numpy as np
from f3dasm import ExperimentData
N = 100  # number of points to generate
Data_x = np.linspace(3, 83, N)
experiment_data = ExperimentData(input_data=Data_x, domain=domain)
experiment_data
jobs input
x
0 open 3.000000
1 open 3.808081
2 open 4.616162
3 open 5.424242
4 open 6.232323
.. ... ...
95 open 79.767677
96 open 80.575758
97 open 81.383838
98 open 82.191919
99 open 83.000000
[100 rows x 2 columns]
As you can see, the ExperimentData object has been created successfully and the jobs have the label ‘open’. This means that the output has not been generated yet. Now, we want to compute the stopping distance for each velocity in the design-of-experiments. There are several ways to approach this with f3dasm:
Method 1: Using the Block abstraction directly
We create a new class CarStoppingDistance that inherits from Block and implements the call method accordingly:
from f3dasm import Block


class CarStoppingDistance(Block):
    def call(self, data: ExperimentData) -> ExperimentData:
        for _id, experiment_sample in data:
            # Extract the car velocity x from the experiment sample
            x = experiment_sample.input_data["x"]

            # Evaluate the stopping distance y(x)
            distance = y(x)

            # Store the stopping distance back in the experiment sample
            experiment_sample.store(name="distance", object=distance)

            # Mark the experiment as finished
            experiment_sample.mark("finished")

        # After all experiments are finished, return the data
        return data
We create a new instance of CarStoppingDistance and run it on our experiments:
car_stopping_distance = CarStoppingDistance()
experiment_data = car_stopping_distance.call(experiment_data)
experiment_data
jobs input output
x distance
0 finished 3.000000 6.248363
1 finished 3.808081 4.075474
2 finished 4.616162 8.385220
3 finished 5.424242 12.908499
4 finished 6.232323 14.330119
.. ... ... ...
95 finished 79.767677 746.832001
96 finished 80.575758 768.961481
97 finished 81.383838 795.442547
98 finished 82.191919 846.343735
99 finished 83.000000 785.128937
[100 rows x 3 columns]
Method 2: Using the DataGenerator class
The DataGenerator class is a wrapper around the Block class that simplifies the process of running a function on every experiment. Instead of implementing the call() method of this block and operating on the whole ExperimentData, we implement an execute() method that operates on each ExperimentSample iteratively. The currently processed ExperimentSample is stored in the experiment_sample attribute of the DataGenerator class.
from f3dasm import ExperimentSample
from f3dasm.datageneration import DataGenerator


class CarStoppingDistanceDataGenerator(DataGenerator):
    def execute(self, experiment_sample: ExperimentSample) -> ExperimentSample:
        # Extract the car velocity x from the experiment sample
        x = experiment_sample.input_data["x"]

        # Evaluate the stopping distance y(x)
        distance = y(x)

        # Store the stopping distance back in the experiment sample
        experiment_sample.store(name="distance", object=distance)

        # Return the experiment sample
        return experiment_sample


car_stopping_distance_datagenerator = CarStoppingDistanceDataGenerator()

# Recreate the experiment data
experiment_data = ExperimentData(input_data=Data_x, domain=domain)

# Evaluate the experiment data on the DataGenerator
experiment_data = car_stopping_distance_datagenerator.call(
    experiment_data, mode="sequential"
)
experiment_data
jobs input output
x distance
0 finished 3.000000 5.761669
1 finished 3.808081 5.868567
2 finished 4.616162 8.538199
3 finished 5.424242 12.119334
4 finished 6.232323 9.641409
.. ... ... ...
95 finished 79.767677 830.357611
96 finished 80.575758 780.974813
97 finished 81.383838 786.697678
98 finished 82.191919 826.837751
99 finished 83.000000 803.162397
[100 rows x 3 columns]
Three modes are available for evaluating the experiments:
- 'sequential': a regular for-loop over the experiments, in order
- 'parallel': uses multiprocessing (via the pathos multiprocessing library); each experiment runs on a separate core
- 'cluster': each experiment runs on a separate node. This is especially useful on a high-performance computing cluster with multiple worker nodes and a commonly accessible resource folder. After completing an experiment, a node automatically picks the next available open experiment.
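The "pick the next available open experiment" behaviour of the cluster mode can be pictured with the plain-Python sketch below. It is a conceptual illustration only and not f3dasm's implementation: the job table, the `claim_next_open` helper, the "in_progress" label, and the worker names are all hypothetical, and real worker nodes would coordinate through the shared resource folder rather than through an in-memory list.

```python
def claim_next_open(jobs):
    """Return the index of the first 'open' job and mark it as claimed, or None."""
    for index, job in enumerate(jobs):
        if job["status"] == "open":
            job["status"] = "in_progress"  # hypothetical intermediate label
            return index
    return None


def run_worker_step(jobs, worker_name):
    """Let one worker claim and finish a single job. Returns False when no work is left."""
    index = claim_next_open(jobs)
    if index is None:
        return False
    job = jobs[index]
    # Deterministic stand-in for the data generation step (reaction time fixed at 1.5 s)
    job["distance"] = 1.5 * job["x"] + 0.1 * job["x"] ** 2
    job["worker"] = worker_name
    job["status"] = "finished"
    return True


jobs = [{"x": x, "status": "open"} for x in (10.0, 20.0, 30.0)]

# Two simulated workers take turns until no open jobs remain
workers = ["node-a", "node-b"]
turn = 0
while run_worker_step(jobs, workers[turn % len(workers)]):
    turn += 1

for job in jobs:
    print(job)
```

Marking a job as claimed before working on it is what keeps two workers from evaluating the same experiment; in a real multi-node setting that claim step would also need to be made safe against concurrent access.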
Method 3: Using the function directly
The function y(x) can be transformed to a DataGenerator by decorating it with the f3dasm.datagenerator wrapper.
In order to use this method, we need to specify the output_names of the return values of the function y(x) in the decorator:
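from f3dasm import datagenerator


@datagenerator(output_names="distance")
def y(x):
    # Sample one reaction time z ~ N(1.5, 0.5^2); `scale` is the standard deviation
    z = norm.rvs(loc=1.5, scale=0.5)
    # Stopping distance = reaction distance + braking distance
    return float(z * x + 0.1 * x**2)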
Multiple output_names can be passed as a list. The function y can now be used as a DataGenerator object and called with the ExperimentData object.
# Recreate the experiment data
experiment_data = ExperimentData(input_data=Data_x, domain=domain)
# Evaluate the experiment data on the DataGenerator
experiment_data = y.call(experiment_data, mode="sequential")
experiment_data
jobs input output
x distance
0 finished 3.000000 5.033485
1 finished 3.808081 8.458842
2 finished 4.616162 7.443685
3 finished 5.424242 8.883606
4 finished 6.232323 17.203309
.. ... ... ...
95 finished 79.767677 722.595927
96 finished 80.575758 765.167271
97 finished 81.383838 719.652285
98 finished 82.191919 814.625470
99 finished 83.000000 847.015209
[100 rows x 3 columns]
If you prefer a functional approach, or the function comes from an external library and cannot be decorated where it is defined, the following is equivalent (here y is the plain, undecorated function):
y_datagenerator = datagenerator(output_names="distance")(y)
experiment_data = y_datagenerator.call(experiment_data, mode="sequential")
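To make the decorator and functional forms above less opaque, here is a plain-Python sketch of the general pattern: a decorator factory that attaches names to a function's return values. It is an illustration under stated assumptions, not f3dasm's implementation; `named_outputs`, `stopping_terms`, and `total_distance` are hypothetical names, and the real `datagenerator` additionally works on ExperimentData objects and job status.

```python
import functools


def named_outputs(output_names):
    """Hypothetical decorator factory: label a function's return values by name."""
    # Accept a single name or a list of names
    names = [output_names] if isinstance(output_names, str) else list(output_names)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(**inputs):
            result = func(**inputs)
            # A single name wraps the bare return value; several names expect a tuple
            values = (result,) if len(names) == 1 else tuple(result)
            if len(values) != len(names):
                raise ValueError("number of return values does not match output_names")
            return dict(zip(names, values))

        return wrapper

    return decorator


# Decorator form, with multiple output names passed as a list
@named_outputs(output_names=["reaction", "braking"])
def stopping_terms(x, z=1.5):
    return z * x, x**2 / 10.0


# Functional form: equivalent to decorating at the definition site
def total_distance(x, z=1.5):
    return z * x + x**2 / 10.0


total_distance_named = named_outputs(output_names="distance")(total_distance)

print(stopping_terms(x=20.0))        # {'reaction': 30.0, 'braking': 40.0}
print(total_distance_named(x=20.0))  # {'distance': 70.0}
```

The functional form leaves the original function untouched, which is convenient when the same function is also needed outside the data-generation workflow.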