Testing callbacks
AbstractStopAndGoCallback
Bases: ABC, BaseInterruptedVsContinuousCallback
Abstract base class for stop-and-go callbacks that compare metadata captured before pausing training with metadata captured after resuming it.
This base class provides utility methods that streamline stop-and-go comparisons.
Provided methods
- __init__: initializes the callback with the given mode.
- get_metadata: abstract method that subclasses must override to get metadata from the trainer and pl_module.
Default behaviors
- In stop mode (Mode.STOP), metadata is retrieved and compared in on_validation_epoch_end.
- In go mode (Mode.RESUME), metadata is retrieved and saved in on_train_epoch_start.
Override these behaviors if necessary.
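The mode-dependent default behaviors can be sketched without Lightning or NeMo. This is a simplified illustration, not the bionemo implementation: `Mode`, `StopAndGoSketch`, `GlobalStepMetadata` and the `SimpleNamespace` trainer are hypothetical stand-ins, and only the hook names (`on_validation_epoch_end`, `on_train_epoch_start`, `get_metadata`) and the two modes (`Mode.STOP`, and `Mode.RESUME` for go mode) come from the description. The sketch only captures metadata; the comparison itself is left to the calling test.

```python
from abc import ABC, abstractmethod
from enum import Enum
from types import SimpleNamespace
from typing import Any


class Mode(str, Enum):
    """Hypothetical stand-in for the Mode enum used by the real callback."""

    STOP = "stop"
    RESUME = "resume"


class StopAndGoSketch(ABC):
    """Simplified illustration of the default stop/go behaviors (not the bionemo class)."""

    def __init__(self, mode: Mode = Mode.STOP) -> None:
        self.mode = mode
        self.data: Any = None

    @abstractmethod
    def get_metadata(self, trainer: Any, pl_module: Any) -> Any:
        """Return whatever should match across the interruption."""

    def on_validation_epoch_end(self, trainer: Any, pl_module: Any) -> None:
        # Stop mode: capture metadata at the end of validation, right before the run is interrupted.
        if self.mode == Mode.STOP:
            self.data = self.get_metadata(trainer, pl_module)

    def on_train_epoch_start(self, trainer: Any, pl_module: Any) -> None:
        # Go (resume) mode: capture metadata as soon as training restarts from the checkpoint.
        if self.mode == Mode.RESUME:
            self.data = self.get_metadata(trainer, pl_module)


class GlobalStepMetadata(StopAndGoSketch):
    def get_metadata(self, trainer: Any, pl_module: Any) -> Any:
        return trainer.global_step


# Interrupted run: record the step just before stopping.
stop_cb = GlobalStepMetadata(Mode.STOP)
stop_cb.on_validation_epoch_end(SimpleNamespace(global_step=10), None)

# Resumed run: record the step right after reloading the checkpoint.
resume_cb = GlobalStepMetadata(Mode.RESUME)
resume_cb.on_train_epoch_start(SimpleNamespace(global_step=10), None)

# A correct checkpoint round trip should leave the two values identical.
assert stop_cb.data == resume_cb.data
```

Each hook is a no-op in the other mode, so the same subclass can be attached to both the interrupted and the resumed run.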
Source code in bionemo/testing/testing_callbacks.py
__init__(mode=Mode.STOP)
Initialize AbstractStopAndGoCallback.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
mode | str | Mode to run in. Must be either Mode.STOP or Mode.RESUME. | Mode.STOP |
Notes
Subclasses must override get_metadata to get metadata from the trainer and pl_module.
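Because get_metadata is declared with abstractmethod on a class deriving from ABC, Python itself should refuse to construct a subclass that forgets the override. The sketch below demonstrates that mechanism with hypothetical stand-ins (`Mode`, `CallbackContract`, `ForgotOverride`, `LearningRateMetadata`, `construction_error`); it also assumes, for illustration only, that a mode other than Mode.STOP or Mode.RESUME is rejected with a ValueError.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any


class Mode(str, Enum):
    """Hypothetical stand-in for the real Mode enum."""

    STOP = "stop"
    RESUME = "resume"


class CallbackContract(ABC):
    """Hypothetical minimal base mirroring the documented __init__ and get_metadata contract."""

    def __init__(self, mode: Mode = Mode.STOP) -> None:
        # Assumption: anything other than Mode.STOP or Mode.RESUME is rejected up front.
        if mode not in (Mode.STOP, Mode.RESUME):
            raise ValueError(f"mode must be Mode.STOP or Mode.RESUME, got {mode!r}")
        self.mode = mode

    @abstractmethod
    def get_metadata(self, trainer: Any, pl_module: Any) -> Any:
        """Every concrete subclass must implement this."""


class ForgotOverride(CallbackContract):
    pass  # get_metadata is still abstract here


class LearningRateMetadata(CallbackContract):
    def get_metadata(self, trainer: Any, pl_module: Any) -> Any:
        return trainer.lr  # hypothetical attribute on a fake trainer


def construction_error(cls: type, *args: Any) -> str:
    """Return the name of the exception raised while constructing cls, or "" if none."""
    try:
        cls(*args)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return ""
```

With this setup, `construction_error(ForgotOverride)` reports a TypeError, so a missing override surfaces as soon as a test builds the callback rather than partway through training.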
get_metadata(trainer, pl_module) (abstract method)
Get metadata from the trainer and pl_module.
BaseInterruptedVsContinuousCallback
Bases: Callback, CallbackMethods, IOMixin
Base class for serializable stop-and-go callbacks that compare continuous training with interrupted training.
To use it, extend this callback and collect data into the self.data attribute; that data is then compared between the continuous and interrupted runs.
See nemo.lightning.megatron_parallel.CallbackMethods for the available callback methods.
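The continuous-versus-interrupted comparison can be illustrated with a toy loop. This is a framework-free sketch: `RecordingCallback`, `run`, `lr_schedule` and `matches` are hypothetical, and the real callbacks hook into the CallbackMethods listed above instead of a hand-written loop. The interrupted run uses a fresh callback after the "restart", and the concatenated data must equal the data from the continuous run; a resume bug that restarts the schedule at step 0 is caught by the same check.

```python
import math
from typing import Callable, List


class RecordingCallback:
    """Hypothetical stand-in for a subclass that appends one value per step to self.data."""

    def __init__(self) -> None:
        self.data: List[float] = []

    def on_step(self, value: float) -> None:
        self.data.append(value)


def run(schedule: Callable[[int], float], steps: range, cb: RecordingCallback) -> None:
    """Fake training loop: record schedule(step) for every step in `steps`."""
    for step in steps:
        cb.on_step(schedule(step))


def lr_schedule(step: int) -> float:
    """Toy linear warm-up over the first 4 steps, then constant."""
    return 1e-3 * min(1.0, (step + 1) / 4)


def matches(interrupted: List[float], continuous: List[float]) -> bool:
    """The stop-and-go check: same length and element-wise close."""
    return len(interrupted) == len(continuous) and all(
        math.isclose(a, b) for a, b in zip(interrupted, continuous)
    )


# Continuous run: steps 0..7 in one go.
continuous = RecordingCallback()
run(lr_schedule, range(8), continuous)

# Interrupted run: steps 0..3, then a fresh process (and callback) resumes at step 4.
before, after = RecordingCallback(), RecordingCallback()
run(lr_schedule, range(4), before)
run(lr_schedule, range(4, 8), after)
interrupted = before.data + after.data

# A buggy resume that restarts the schedule at step 0 produces different values.
buggy_after = RecordingCallback()
run(lr_schedule, range(4), buggy_after)
buggy = before.data + buggy_after.data
```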
__deepcopy__(memo)
Avoid copying the collected data when this callback is serialized.
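Python's copy.deepcopy looks for a `__deepcopy__(memo)` method and, when one exists, calls it instead of recursively copying the object's attributes. The sketch below shows one way to keep accumulated data out of such a copy. The strategy (returning a fresh, empty instance) and the class name `DataHoldingCallback` are assumptions for illustration; the bionemo method may behave differently.

```python
import copy
from typing import Any, Dict, List


class DataHoldingCallback:
    """Hypothetical callback that accumulates (possibly large) results in self.data."""

    def __init__(self) -> None:
        self.data: List[Any] = []

    def __deepcopy__(self, memo: Dict[int, Any]) -> "DataHoldingCallback":
        # Assumed strategy for illustration: return a fresh, empty instance so the
        # collected data is never duplicated when a config containing this callback is copied.
        fresh = type(self)()
        memo[id(self)] = fresh
        return fresh


cb = DataHoldingCallback()
cb.data.append(list(range(1_000)))

# copy.deepcopy calls cb.__deepcopy__(memo) instead of recursively copying cb.data.
config_copy = copy.deepcopy({"callbacks": [cb]})
```

Registering the new object in `memo` keeps deepcopy's bookkeeping consistent if the same callback appears more than once in the copied structure.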
__init__()
Initializes the callback.
ConsumedSamplesCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback to check consumed samples before pausing and after resuming training.
on_megatron_step_start(step)
Get consumed samples as metadata.
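Why consumed samples should survive a restart can be shown with simple arithmetic. In Megatron-style training, each optimizer step consumes one global batch of micro_batch_size × num_microbatches × data_parallel_size samples, so, assuming a fixed global batch size with no batch-size ramp-up, the count after a given number of steps is steps × global batch size. The helper names and batch settings below are hypothetical; if a resumed run restarted its counter at 0 instead of the checkpointed value, the per-step values recorded by this callback would diverge from the continuous run.

```python
def global_batch_size(micro_batch_size: int, num_microbatches: int, data_parallel_size: int) -> int:
    """Samples consumed per optimizer step across all data-parallel ranks."""
    return micro_batch_size * num_microbatches * data_parallel_size


def consumed_samples(completed_steps: int, gbs: int) -> int:
    """Assumes a fixed global batch size (no batch-size ramp-up)."""
    return completed_steps * gbs


gbs = global_batch_size(micro_batch_size=2, num_microbatches=4, data_parallel_size=8)  # 64

# Continuous run of 10 steps vs. an interrupted run: 6 steps, checkpoint, then 4 more.
continuous_total = consumed_samples(10, gbs)
at_checkpoint = consumed_samples(6, gbs)
interrupted_total = at_checkpoint + consumed_samples(4, gbs)
```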
GlobalStepStateCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback to check global_step before pausing and after resuming training.
on_megatron_step_start(step)
Get the global step as metadata.
LearningRateCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback to check the learning rate before pausing and after resuming training.
on_megatron_step_start(step)
Get learning rate as metadata.
OptimizerStateCallback
Bases: BaseInterruptedVsContinuousCallback
Stop-and-go callback to check optimizer states before pausing and after resuming training.
on_megatron_step_start(step)
Get optimizer states as metadata.
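Optimizer states are nested structures rather than single numbers, so comparing them before and after a restart needs a recursive, tolerance-aware check. The sketch below uses plain floats in a toy snapshot shaped like a PyTorch optimizer `state_dict()` (per-parameter `state` plus `param_groups`); `states_close` and the snapshot values are hypothetical. With real tensors, `torch.testing.assert_close` should be able to compare nested mappings and sequences of tensors directly.

```python
import math
from typing import Any


def states_close(a: Any, b: Any, rel_tol: float = 1e-6) -> bool:
    """Recursively compare nested dicts/lists/tuples whose leaves are numbers (hypothetical helper)."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(states_close(a[k], b[k], rel_tol) for k in a)
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return len(a) == len(b) and all(states_close(x, y, rel_tol) for x, y in zip(a, b))
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=rel_tol)
    return a == b


# Toy snapshot shaped like a PyTorch optimizer state_dict().
before_stop = {
    "state": {0: {"step": 3, "exp_avg": [0.1, -0.2], "exp_avg_sq": [0.01, 0.04]}},
    "param_groups": [{"lr": 1e-3, "betas": (0.9, 0.999)}],
}
after_resume_ok = {
    "state": {0: {"step": 3, "exp_avg": [0.1, -0.2], "exp_avg_sq": [0.01, 0.04]}},
    "param_groups": [{"lr": 1e-3, "betas": (0.9, 0.999)}],
}
# Simulated bug: the first-moment estimates were re-initialised instead of restored.
after_resume_bad = {
    "state": {0: {"step": 3, "exp_avg": [0.0, 0.0], "exp_avg_sq": [0.01, 0.04]}},
    "param_groups": [{"lr": 1e-3, "betas": (0.9, 0.999)}],
}
```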
RaiseAfterMetadataCallback
Bases: Callback
A callback that raises a StopAndGoException after the validation epoch.
Use this callback for pytest-based stop-and-go tests.
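In a test, the raised exception is what ends the first ("stop") run. The sketch below reproduces that control flow with a toy loop. StopAndGoException is the name used in the description, but here it is redefined locally as a stand-in; `RaiseAfterValidationSketch`, `fake_fit` and `run_until_interrupted` are hypothetical. In an actual pytest test, the fit call would more likely be wrapped in `pytest.raises(StopAndGoException)`.

```python
from typing import List


class StopAndGoException(Exception):
    """Local stand-in for the exception named in the description."""


class RaiseAfterValidationSketch:
    """Simplified stand-in: raise once a validation epoch has finished."""

    def on_validation_epoch_end(self) -> None:
        raise StopAndGoException("simulated interruption after validation")


def fake_fit(callback: RaiseAfterValidationSketch, epochs: int, completed: List[int]) -> None:
    """Toy loop: 'train' an epoch, record it, then validate, which fires the callback."""
    for epoch in range(epochs):
        completed.append(epoch)
        callback.on_validation_epoch_end()


def run_until_interrupted(epochs: int) -> List[int]:
    completed: List[int] = []
    try:
        fake_fit(RaiseAfterValidationSketch(), epochs, completed)
    except StopAndGoException:
        pass  # Expected: a real test would now resume training from the saved checkpoint.
    return completed
```

Only the first epoch completes before the interruption, which is the state a stop-and-go test then resumes from.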
TrainInputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect training input samples for comparison.
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get training input samples as metadata.
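The microbatch hook receives the step, the input batch, the forward callback and the output, so an input-collecting callback mainly has to decide which phase to record. The following plain-Python sketch assumes the phase is read from a training flag on the trainer; `FakeTrainer`, `FakeStep` and `TrainInputRecorder` are hypothetical stand-ins, not the bionemo or NeMo types.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class FakeTrainer:
    training: bool  # hypothetical flag: True while training, False during validation


@dataclass
class FakeStep:
    trainer: FakeTrainer


@dataclass
class TrainInputRecorder:
    """Hypothetical sketch: keep microbatches seen in training, ignore validation ones."""

    data: List[Any] = field(default_factory=list)

    def on_megatron_microbatch_end(self, step: FakeStep, batch: Any, forward_callback: Any, output: Any) -> None:
        # Only the input batch is stored; forward_callback and output are ignored here.
        if step.trainer.training:
            self.data.append(batch)


recorder = TrainInputRecorder()
recorder.on_megatron_microbatch_end(FakeStep(FakeTrainer(training=True)), {"tokens": [1, 2, 3]}, None, None)
recorder.on_megatron_microbatch_end(FakeStep(FakeTrainer(training=False)), {"tokens": [9, 9]}, None, None)
```

Inverting the condition to `not step.trainer.training` would give a validation-side variant of the same recorder.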
TrainLossCallback
Bases: BaseInterruptedVsContinuousCallback
Collect training loss samples for comparison.
on_megatron_step_end(step, microbatch_outputs, reduced=None)
Get the training loss as metadata.
TrainOutputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect training output samples for comparison.
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get training output samples as metadata.
TrainValInitConsumedSamplesStopAndGoCallback
Bases: AbstractStopAndGoCallback
Stop-and-go callback to check consumed samples before pausing and after resuming training.
This is currently the only callback that doesn't fit with the new pattern of directly comparing continuous and interrupted training, since the dataloaders don't track their consumed_samples before and after checkpoint resumption.
get_metadata(trainer, pl_module)
Get consumed samples as metadata.
ValidInputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect validation input samples for comparison.
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get validation input samples as metadata.
ValidLossCallback
Bases: BaseInterruptedVsContinuousCallback
Collect validation loss samples for comparison.
on_megatron_step_end(step, microbatch_outputs, reduced=None)
Get the validation loss as metadata.
ValidOutputCallback
Bases: BaseInterruptedVsContinuousCallback
Collect validation output samples for comparison.
on_megatron_microbatch_end(step, batch, forward_callback, output)
Get validation output samples as metadata.