Image Quality and AI-Readiness (IQAIR)

A task group of the Audiovisual Core maintenance group

Convenor

Core Members

  • Steven Baskauf
  • Henry Bart
  • Xiaojun Wang
  • Bahadır Altıntaş
  • Dom Jebbia
  • Jane Greenberg
  • David Breen
  • Anuj Karpatne
  • Nicolas Bailly
  • Leanna Housel

Motivation

This task group is being created to review and adopt vocabulary terms that describe image quality and AI-readiness of multimedia resources, as implemented in image-quality management systems and systems with AI-readiness functionality. Such terms help establish that a resource is fit for use in biodiversity science applications before users acquire and use the media. The goal of the task group is to adopt vocabulary terms from FishAIR and similar projects into the TDWG Audiovisual Core.

Goals, Outputs and Outcomes

The aim of this group is to create a list of properties that describe the image quality and AI-readiness of multimedia data. The source list that the group will work from is available on the FishAIR website (see https://fishair.org/vocabulary.html).

Timeline:

Phases (planned across Year 1, Year 2 and Year 3):

  • FishAIR AI-Readiness metadata
  • FishAIR Image Quality metadata
  • FishAIR Feature extraction metadata
  • Drexel MRC Automated metadata
  • Metadata from other sources
  • Usage Statistics & Reporting


The group will meet once a month. It will discuss the published FishAIR terms (~70 terms) in the first year and the unpublished terms (~90 terms) in the second year and the first part of the third year. Any additional terms generated by other studies during this period will be added to the unpublished list and discussed in the third year.

Strategy

Discuss the terms, starting with the FishAIR vocabulary, and put them in a form that can be ingested by the TDWG standards infrastructure.

Becoming Involved

People who want to attend meetings can contact the convenor to be added to the email list. People who want to follow the progress of the group can watch the GitHub repository.

History/Context

  • FishAIR - A fish image dataset with a built-in image quality management system. https://fishair.org

  • On Image Quality Metadata, FAIR in ML, AI-Readiness and Reproducibility: FishAIR example. 2023. Y Bakış, X Wang, B Altıntaş, D Jebbia, H Bart. Biodiversity Information Science and Standards 7, e112178.

  • Computational metadata generation methods for biological specimen image collections. 2022. K Karnani, J Pepper, Y Bakış, X Wang, H Bart Jr, DE Breen, J Greenberg. International Journal on Digital Libraries 25, 1-18. doi:10.1007/s00799-022-00342-1.

  • Extracting Landmark and Trait Information from Segmented Digital Specimen Images Generated by Artificial Neural Networks. 2022. Y Bakış, B Altıntaş, X Wang, M Maruf, A Karpatne, H Bart. Biodiversity Information Science and Standards 6, e94955.

  • Toward a Flexible Metadata Pipeline for Fish Specimen Images. 2022. D Jebbia, X Wang, Y Bakış, HL Bart Jr, J Greenberg. arXiv preprint arXiv:2211.15472.

  • Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images. 2024. Kazi Sajeed Mehrab, M Maruf, Arka Daw, Harish Babu Manogaran, Abhilash Neog, Mridul Khurana, Bahadir Altintas, Yasin Bakis, Elizabeth G Campolongo, Matthew J Thompson, Xiaojun Wang, Hilmar Lapp, Wei-Lun Chao, Paula M Mabee, Henry L Bart Jr, Wasila Dahdul, Anuj Karpatne. arXiv:2407.08027.

  • Application of AI-Helped Image Classification of Fish Images: An iDigBio dataset example. 2023. B Altintas, Y Bakış, X Wang, B Henry. Biodiversity Information Science and Standards 7, e112438.

Summary

The availability of biocollection data attracts scientists and engineers from a variety of disciplines, especially in the field of AI/ML. However, data wrangling takes up almost 80% of the total processing time. Pre-processing the biocollection data once (enriching the metadata; structuring, cleaning and filtering the data; and finally publishing the dataset) would reduce the time that each user spends repeating this work, increasing the efficiency of pipelines and the accuracy of results. The Biodiversity Informatics Team at Tulane University Biodiversity Research Institute has been extracting metadata terms as part of data pre-processing within the NSF-supported Biology Guided Neural Networks project. The extracted terms are new relative to existing standards and are of great value to ML scientists, biologists, data engineers, informaticians, data repositories and anyone else who wants to work with multimedia data (see https://fishair.org/vocabulary.html).

This group will expand the Audiovisual Core terminology, especially through the addition of image quality metadata terms.

Resources

A GitHub repository will be created under the TDWG organization after this draft charter is approved.