| CaaM [207] | ICCV 2021 | Causal attention module, visual recognition | Attention learning | Back-door adjustment |
| - [210] | ICLR 2019 | Meta-learn causal structures, representation learning | Meta-learn | Causal intervention |
| ICIN [211] | 2019 | Goal-conditioned policy, visual observation, causal induction | Meta-learn | Causal inference |
| IFSL [216] | NeurIPS 2020 | Meta-learn, few-shot learning, causal intervention | Meta-learn | Back-door adjustment |
| MLM [308] | ICCV 2021 | Object detection, autonomous driving, masked language models | Object detection | Causal intervention |
| CIM [238] | ICTAI 2021 | Object detection, visual context, causal intervention | Object detection | Back-door adjustment |
| - [239] | TPAMI 2022 | Domain adaptation model, object detection, causal intervention, representation learning | Object detection | Back-door adjustment |
| MAD [240] | CVPR 2023 | Domain adversarial learning, domain shift, causal factors, non-causal factors | Object detection | Causal learning |
| D&R [217] | AAAI 2023 | Few-shot learning, knowledge distillation, few-shot object detection, structural causal model, causal intervention | Object detection | Back-door adjustment |
| DCFD [241] | AAAI 2022 | Unsupervised salient object detection, debiasing framework, causal intervention | Object detection | Back-door adjustment |
| CMAT [241] | ICCV 2019 | Scene graph generation, counterfactual critic, multi-agent policy | Scene graph generation | Counterfactual |
| TDE [247] | CVPR 2020 | Scene graph generation, unbiased learning, counterfactual causality | Scene graph generation | Counterfactual |
| TsCM [248] | TPAMI 2023 | Scene graph generation, causal inference, counterfactuals, representation learning, long-tailed distribution | Scene graph generation | Causal intervention |
| Causal-SETR [223] | ACCV 2022 | Causal intervention, vision transformer, semantic segmentation | Semantic segmentation | Back-door adjustment |
| CauSSL [225] | ICCV 2023 | Semi-supervised learning, medical image analysis, image segmentation, causal diagram | Semantic segmentation | Causal diagram |
| CAUSE [229] | 2023 | Unsupervised semantic segmentation, causal intervention, self-supervised learning | Semantic segmentation | Front-door adjustment |
| CausalCellSegmenter [224] | 2024 | Causal inference, feature aggregation, cell nucleus segmentation, pathology image | Semantic segmentation | Causal inference |
| CityCAN [208] | 2024 | Spatiotemporal data mining, causal intervention, attention | Semantic segmentation | Causal intervention |
| IVG [251] | CVPR 2021 | Video grounding, contrastive learning, causal intervention | Video analysis | Back-door adjustment |
| Causalainer [252] | CVPR 2023 | Video summarization, explainability, causal semantics extractor | Video analysis | Causal learning |
| - [249] | CVPR 2019 | Multimodal explanations, video understanding, counterfactual explanations | Video analysis | Counterfactual |
| TS-PCA [250] | CVPR 2021 | Weakly supervised temporal action localization, video understanding | Video analysis | Causal intervention |
| MCR [100] | CVPR 2023 | Video question answering, causal intervention, multimodal causal inference | Video question answering | Back-door adjustment |
| CaVIR [259] | ICCV 2023 | Video question answering, multiple contexts, context-aware | Video question answering | Causal inference |
| VCSR [265] | ACM MM 2023 | Video question answering, causal inference, cross-modal | Video question answering | Front-door adjustment |
| LLCP [266] | ICLR 2023 | Video question answering, latent causal processes, self-supervised model | Video question answering | Counterfactual prediction |
| VC R-CNN [222] | CVPR 2020 | Visual common sense, unsupervised learning, feature representation | Visual common sense | Causal intervention |
| CATT [209] | CVPR 2021 | Causal attention module, causal intervention | Visual question answering | Front-door adjustment |
| - [259] | CVPR 2020 | Visual question answering, counterfactual, task analysis, machine learning | Visual question answering | Counterfactual |
| CF-VQA [258] | CVPR 2021 | Computer vision, linguistics, robustness, counterfactual inference | Visual question answering | Counterfactual |
| CSS [261] | CVPR 2020 | Visual-explainable, question-sensitive, counterfactual samples | Visual question answering | Counterfactual |
| - [260] | CVPR 2020 | Semantic editing, robustness, synthetic dataset, data augmentation | Visual question answering | Counterfactual |
| DeVLBert [262] | ACM MM 2020 | Multi-modal pretraining, out-of-domain, debias, back-door adjustment, BERT | Visual question answering | Back-door adjustment |
| VLCI [263] | 2023 | Radiology report generation, visual language pretraining model, cross-modal reasoning | Visual question answering | Front-door adjustment |
| CMCIR [267] | TPAMI 2023 | Visual question answering, cross-modal reasoning, video event understanding | Visual question answering | Front-door adjustment & back-door adjustment |
| CONTA [46] | NeurIPS 2020 | Weakly supervised semantic segmentation, context adjustment, causal inference | Weakly supervised semantic segmentation | Back-door adjustment |
| C-CAM [228] | CVPR 2022 | Weakly supervised semantic segmentation, medical images, class activation mapping | Weakly supervised semantic segmentation | Causal intervention |
| CF [221] | 2021 | Zero-shot semantic segmentation, counterfactual, causal inference | Zero-shot semantic segmentation | Counterfactual |
| - [255] | PRL 2021 | Action recognition, causal graph structures, causal relationship, recognition of falls | Action recognition | Causal graph structures |
| CISNet [256] | AAAI 2022 | Causal diagram, subject-invariant facial action unit, causal intervention | Action recognition | Causal intervention |
| DeCalGAN [220] | TMM 2023 | Zero-shot learning, action recognition, causal inference | Zero-shot learning | Causal intervention |
| - [218] | NeurIPS 2020 | Zero-shot learning, feature compositionality, causal inference | Zero-shot learning | Causal intervention |
| CaML [219] | NeurIPS 2024 | Meta-learning, causal inference, zero-shot learning | Zero-shot learning | Causal intervention |
| - [235] | NeurIPS 2020 | Long-tailed classification, back-door adjustment, re-balanced training | Image classification | Causal intervention |
| TLT [234] | CVF 2023 | Noisy image classification, causal inference, attention mechanism | Image classification | Causal intervention |
| - [232] | ICIP 2021 | Visual causality, contrastive explanations, gradients, causal metrics | Image classification | Causal inference |