Detector models

During the detector characterization campaigns [20] and the ground campaigns [21], a very large amount of data was acquired (about 20 Tb per NISP IR detector). The project's detector team is already deriving a physical model of the pixel response. However, we already know that the use of these models for sky simulation may be limited, because the differing conditions of the various campaigns led to a heterogeneous dataset. The goal of this work package is to explore ML and big-data techniques to extend the physical models developed by the detector team, and to exploit all datasets to classify the different pixel populations that may require specific modeling.

Challenges in Detector Modeling

Developing the detector model is challenging because pixel behavior is intricate and the resulting models must be reliable enough for both data analysis and observation simulation. One of the primary hurdles is the difference between how calibration data and flight data are acquired. Calibration data record the entire ramp, but with dedicated electronics; flight data reflect the response of the actual flight electronics, but provide only the ramp slope and a quality factor. Bridging the gap between these datasets and transferring knowledge from calibration data to flight data is exceptionally complex. It calls for innovative approaches and rigorous methodology to ensure that pixel behavior is modeled accurately.

Our Approach

The readout electronics used during the characterization campaigns of the NISP near-infrared detectors are designed to read pixels in a multiple accumulated sampling mode, in which the signal is sampled non-destructively every 1.48 s. At the end of the integration time one obtains, for every pixel, a signal that increases with sampling time. Hereafter, this signal is referred to as a ramp. From that point one can classify pixels into two distinct categories: pixels with normal behavior (ramp increasing linearly with time) and pixels with abnormal behavior (non-linear increase of the ramp). However, the distinction between these two populations is not straightforward, because non-linear effects such as the natural pixel non-linearity, persistence, or interpixel capacitance add to the signal. They produce ramps with varying degrees of non-linearity that depend on the pixel response as well as on the illumination history of the pixels. In addition, the abnormal pixels themselves form a heterogeneous group that may be divided into several populations: some of these pixels are unusable for science, while others might still be usable but would require special processing and modeling.
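The notion of a "linear" versus "non-linear" ramp can be made concrete with a simple figure of merit. The following is only a minimal sketch, not the detector team's method: it assumes ramps are stored as a NumPy array of shape (pixels, samples), fits a straight line to each ramp, and reports the RMS residual relative to the signal range. The function name `ramp_nonlinearity` and the choice of metric are illustrative assumptions.

```python
import numpy as np

FRAME_TIME_S = 1.48  # sampling interval quoted in the text


def ramp_nonlinearity(ramps, frame_time=FRAME_TIME_S):
    """Fit a straight line to each ramp (hypothetical helper).

    ramps: array of shape (n_pixels, n_samples).
    Returns (slope, score), where score is the RMS residual of the
    linear fit divided by the signal range of the ramp.
    """
    ramps = np.asarray(ramps, dtype=float)
    t = np.arange(ramps.shape[1]) * frame_time
    # np.polyfit fits every column of y at once, hence the transpose.
    slope, intercept = np.polyfit(t, ramps.T, deg=1)
    fit = slope[:, None] * t + intercept[:, None]
    rms = np.sqrt(np.mean((ramps - fit) ** 2, axis=1))
    span = np.ptp(ramps, axis=1)
    # Perfectly flat ramps have zero range; give them a score of 0.
    score = np.divide(rms, span, out=np.zeros_like(rms), where=span > 0)
    return slope, score


# Usage on two toy ramps: one linear, one that saturates.
t_demo = np.arange(20) * FRAME_TIME_S
demo_ramps = np.stack([5.0 * t_demo, 200.0 * (1.0 - np.exp(-t_demo / 10.0))])
demo_slopes, demo_scores = ramp_nonlinearity(demo_ramps)
```

A single score of this kind cannot separate intrinsic non-linearity from persistence or interpixel capacitance, which is why the text goes on to multivariate methods.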

During the characterization campaigns, the detectors were either kept in dark conditions or exposed to homogeneous, continuous illumination. The campaigns also showed that the NISP detectors have a relatively homogeneous pixel response across their surface. Hence good pixels, despite differing slightly from one another, behave similarly to their neighbors. One will therefore take advantage of this homogeneity to identify and classify pixels with abnormal behavior. The first step in this task will be to use an ML technique based on principal component analysis (PCA). PCA aims to reduce the number of dimensions of the dataset by identifying a set of parameters that maximize variance and are uncorrelated with each other (Jolliffe et al. 2018). One expects a PCA of the ramps to yield populations of pixels with different values of their principal components, which can then be used to classify the pixels. Such a method has already been used successfully in other works to identify bad pixels from dark frame images (Lopez-Alonso et al. 2002 and 2003). In this WP, the analysis will not be limited to dark frames: it will account for the various illumination levels and histories tested during the characterization campaigns, making it possible to identify different populations of abnormal pixels. Unsupervised PCA will be used to deal with the large amount of data and to identify the eigenvectors of the dataset.
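As an illustration of this step, the sketch below computes principal components of a set of ramps with a plain singular value decomposition and flags pixels whose scores lie far from the bulk of the population. It is a toy example under stated assumptions: the array layout (pixels by samples), the function names, and the robust 5-sigma cut based on the median absolute deviation are all hypothetical choices, not the procedure adopted by the project.

```python
import numpy as np


def pca_scores(ramps, n_components=3):
    """Project mean-centred ramps onto their leading principal components.

    ramps: array of shape (n_pixels, n_samples).
    Returns (scores, components, explained_variance_ratio).
    """
    X = np.asarray(ramps, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the eigenvectors of the sample covariance matrix.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    scores = Xc @ components.T
    explained = S**2 / np.sum(S**2)
    return scores, components, explained[:n_components]


def flag_outliers(scores, n_sigma=5.0):
    """Flag pixels lying more than n_sigma robust deviations from the
    median on at least one principal component (assumed criterion)."""
    med = np.median(scores, axis=0)
    mad = np.median(np.abs(scores - med), axis=0)
    sigma = 1.4826 * mad  # MAD scaled to a Gaussian standard deviation
    sigma = np.where(sigma > 0, sigma, np.inf)  # zero spread flags nothing
    return np.any(np.abs(scores - med) > n_sigma * sigma, axis=1)
```

Because good pixels are expected to behave like their neighbors, they should cluster tightly in the space of principal components, so a robust estimate of the spread is a natural first cut. The fixed threshold is precisely the weakness discussed next.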

One drawback of PCA is that classification relies on the definition of thresholds to separate the different populations. In some cases the separation between populations is tenuous, and the use of a threshold leads to misidentification. Therefore, one would also like to combine PCA with a random forest classifier. A random forest relies on an ensemble of decision trees that work together to label a dataset based on the features present in the data. However, random forests usually underperform on correlated data, and their performance can be improved by removing the correlation present in the data (Zhoa et al. 2014). Because of interpixel capacitance and correlated noise, there is a certain degree of correlation between the pixels of the NISP detectors. Combining PCA with a random forest, as suggested by Gardner et al. 2021, may improve the precision of the random forest labelling, since PCA would reduce the dataset to a set of uncorrelated principal components.
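The combination described above can be sketched with scikit-learn, chaining `PCA` and `RandomForestClassifier` in a pipeline. This is a hedged toy example: it assumes a labelled subset of ramps is available for training, and the three synthetic classes (linear, saturating, linear with a signal jump), the helper names, and all numerical settings are invented for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def make_synthetic_ramps(n_per_class=300, n_samples=20, frame_time=1.48, seed=0):
    """Toy labelled ramps: 0 = linear, 1 = saturating, 2 = linear with a jump."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) * frame_time
    slopes = rng.normal(10.0, 0.5, size=(3, n_per_class, 1))
    linear = slopes[0] * t
    saturating = 30.0 * slopes[1] * (1.0 - np.exp(-t / 15.0))
    jump = slopes[2] * t + 40.0 * (t >= t[n_samples // 2])
    X = np.concatenate([linear, saturating, jump])
    X += rng.normal(0.0, 1.0, size=X.shape)  # read-noise-like scatter
    y = np.repeat([0, 1, 2], n_per_class)
    return X, y


def train_pca_forest(X, y, n_components=5, seed=0):
    """Fit PCA followed by a random forest; return the model and the
    accuracy measured on a held-out 30% of the labelled ramps."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    model = make_pipeline(
        PCA(n_components=n_components),  # decorrelate the ramp samples
        RandomForestClassifier(n_estimators=100, random_state=seed),
    )
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)
```

Calling `train_pca_forest(*make_synthetic_ramps())` returns a fitted pipeline and its held-out accuracy. In this arrangement the forest learns the decision boundaries in principal-component space from the labelled examples, instead of relying on hand-set thresholds.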

Finally, starting from the identified pixel populations, one would like to build a realistic model of the pixel response that will serve as the basis of the sky simulator tools of WP4. One would proceed in the same way as for WP1; that is, ML algorithms would be used to improve the accuracy of the basic model already developed by the NISP detector team. Like the instrument response, the pixel response is a function that converts pixel stimuli into an electrical signal. The transformation is governed by a set of physical parameters that can be evaluated from the data themselves, provided that a sufficiently varied set of stimuli was applied to the pixels. Recent work by Jesús-Valls, Lux & Sànchez 2022 shows that a constrained bottleneck encoder can be used to evaluate those parameters. Additionally, the algorithm, once trained, can also be used to simulate the detector. One would like to extend this method to the modeling of the pixel response. The algorithm will be trained on the detector characterization data and model, and one will test the simulation performance to evaluate whether this method can be used as a fast pixel simulator.