Image Quality and AI-Readiness (IQAIR)
A task group of the Audiovisual Core maintenance group
Convenor
- Yasin Bakış - Tulane University
Core Members
- Steven Baskauf
- Henry Bart
- Xiaojun Wang
- Bahadır Altıntaş
- Dom Jebbia
- Jane Greenberg
- David Breen
- Anuj Karpatne
- Nicolas Bailly
- Leanna Housel
Motivation
This task group is being created to review and adopt vocabulary terms that represent multimedia image-quality management systems and systems with AI-readiness functionality. These terms allow users to establish that a resource is fit for use in biodiversity science applications before they acquire and use the media. The goal of the task group is to adopt vocabulary terms from FishAIR and other similar projects into the TDWG Audiovisual Core.
Goals, Outputs, and Outcomes
The aim of this group is to create a list of properties that can be used to describe the image quality and AI-readiness of multimedia data. The source list that will be worked on is available on the FishAIR website (https://fishair.org/vocabulary.html).
Timeline:
| Phase | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| FishAIR AI-Readiness metadata | | | |
| FishAIR Image Quality metadata | | | |
| FishAIR Feature extraction metadata | | | |
| Drexel MRC Automated metadata | | | |
| Metadata from other sources | | | |
| Usage Statistics & Reporting | | | |
The group will meet monthly. It will discuss the published FishAIR terms (~70 terms) in the first year, and the unpublished terms (~90 terms) in the second year and the first part of the third year. Any additional terms generated by other studies during this period will be added to the unpublished list and discussed in the third year.
Strategy
Discuss terms, starting with the FishAIR vocabulary, and put them into a form that can be ingested by the TDWG standards infrastructure.
Becoming Involved
People who want to attend meetings can contact the convenor to be added to the email list. People who want to follow the progress of the group can watch the GitHub repository.
History/Context
- FishAIR: A fish image dataset with a built-in image quality management system. https://fishair.org
- On Image Quality Metadata, FAIR in ML, AI-Readiness and Reproducibility: FishAIR example. 2023. Y Bakış, X Wang, B Altıntaş, D Jebbia, H Bart. Biodiversity Information Science and Standards 7, e112178.
- Computational metadata generation methods for biological specimen image collections. 2022. K Karnani, J Pepper, Y Bakış, X Wang, H Bart Jr, DE Breen, J Greenberg. International Journal on Digital Libraries 25, 1-18. doi:10.1007/s00799-022-00342-1.
- Extracting Landmark and Trait Information from Segmented Digital Specimen Images Generated by Artificial Neural Networks. 2022. Y Bakış, B Altıntaş, X Wang, M Maruf, A Karpatne, H Bart. Biodiversity Information Science and Standards 6, e94955.
- Toward a Flexible Metadata Pipeline for Fish Specimen Images. 2022. D Jebbia, X Wang, Y Bakış, HL Bart Jr, J Greenberg. arXiv preprint arXiv:2211.15472.
- Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images. 2024. KS Mehrab, M Maruf, A Daw, HB Manogaran, A Neog, M Khurana, B Altintas, Y Bakis, EG Campolongo, MJ Thompson, X Wang, H Lapp, W-L Chao, PM Mabee, HL Bart Jr, W Dahdul, A Karpatne. arXiv preprint arXiv:2407.08027.
- Application of AI-Helped Image Classification of Fish Images: An iDigBio dataset example. 2023. B Altintas, Y Bakış, X Wang, B Henry. Biodiversity Information Science and Standards 7, e112438.
Summary
The availability of biocollection data attracts scientists and engineers from a variety of disciplines, especially AI/ML. However, data wrangling is often reported to take up almost 80% of total processing time. Pre-processing biocollection data by enriching the metadata; structuring, cleaning, and filtering the data; and finally publishing the resulting dataset would reduce the time each user spends repeating these steps, increasing pipeline efficiency and the accuracy of results. The Biodiversity Informatics Team at the Tulane University Biodiversity Research Institute has been extracting metadata terms as part of data pre-processing within the NSF-supported Biology Guided Neural Networks project. The extracted terms are not yet covered by existing standards and are valuable to ML scientists, biologists, data engineers, informaticians, data repositories, and anyone else who wants to work with multimedia data (see https://fishair.org/vocabulary.html).
This group will expand the Audiovisual Core vocabulary, especially by adding image quality metadata terms.
Resources
A GitHub repository will be created under the TDWG GitHub organization once this draft charter is approved.