A probabilistic approach to emission-line galaxy classification

R. S. de Souza, M. L. L. Dantas, M. V. Costa-Duarte, E. D. Feigelson, M. Killedar, P. Y. Lablanche, R. Vilalta, A. Krone-Martins, R. Beck, F. Gieseke

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

We invoke a Gaussian mixture model (GMM) to jointly analyse two traditional emission-line classification schemes of galaxy ionization sources: the Baldwin-Phillips-Terlevich (BPT) and WHα versus [N II]/Hα (WHAN) diagrams, using spectroscopic data from the Sloan Digital Sky Survey Data Release 7 and SEAGal/STARLIGHT data sets. We apply a GMM to empirically define classes of galaxies in a three-dimensional space spanned by the log [O III]/Hβ, log [N II]/Hα and log EW(Hα) optical parameters. The best-fitting GMM, based on several statistical criteria, suggests a solution around four Gaussian components (GCs), which are capable of explaining up to 97 per cent of the data variance. Using elements of information theory, we compare each GC to its respective astronomical counterpart. GC1 and GC4 are associated with star-forming galaxies, suggesting the need to define a new starburst subgroup. GC2 is associated with BPT's active galactic nuclei (AGN) class and WHAN's weak AGN class. GC3 is associated with BPT's composite class and WHAN's strong AGN class. Conversely, there is no statistical evidence, based on four GCs, for the existence of a Seyfert/low-ionization nuclear emission-line region (LINER) dichotomy in our sample. Nevertheless, the inclusion of an additional component, GC5, unravels it. GC5 appears associated with the LINER and passive galaxies on the BPT and WHAN diagrams, respectively. This indicates that if the Seyfert/LINER dichotomy is there, it does not contribute significantly to the global data variance and may be overlooked by standard metrics of goodness of fit. Subtleties aside, we demonstrate the potential of our methodology to recover/unravel different objects inside the wilderness of astronomical data sets, while retaining the ability to convey physically interpretable results. The probabilistic classifications from the GMM analysis are publicly available within the COINtoolbox at https://cointoolbox.github.io/GMM_Catalogue/.
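The abstract describes three steps: fitting Gaussian mixtures in a three-dimensional emission-line space, choosing the number of components with statistical criteria, and comparing components with BPT/WHAN classes using information theory. Below is a minimal, standard-library-only Python sketch of those steps under stated assumptions. It uses diagonal covariances for brevity, the Bayesian information criterion (BIC) as one example of a model-selection criterion, and normalized mutual information (NMI) as one example of an information-theoretic comparison; this record does not say which specific criteria, covariance structure, or measures the authors used. All function names are hypothetical, and the toy data are synthetic, not SDSS measurements. This is not the authors' pipeline.

```python
import math
import random
from collections import Counter

# Assumptions (not from the paper): diagonal covariances, BIC for choosing k,
# NMI for comparing hard GC labels with a reference classification.


def _log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance, evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def _e_step(data, weights, means, variances):
    """Return per-point membership probabilities and the total log-likelihood."""
    resp, ll = [], 0.0
    for x in data:
        logp = [math.log(w) + _log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)  # log-sum-exp trick avoids underflow far from a component
        norm = top + math.log(sum(math.exp(lp - top) for lp in logp))
        ll += norm
        resp.append([math.exp(lp - norm) for lp in logp])
    return resp, ll


def fit_diag_gmm(data, k, n_iter=200, tol=1e-8, var_floor=1e-6):
    """EM fit of a k-component GMM; returns (weights, means, variances, resp, ll)."""
    n, d = len(data), len(data[0])
    # Deterministic farthest-point initialisation of the component means.
    means = [list(data[0])]
    while len(means) < k:
        far = max(data, key=lambda x: min(
            sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    centre = [sum(x[j] for x in data) / n for j in range(d)]
    spread = [max(sum((x[j] - centre[j]) ** 2 for x in data) / n, var_floor)
              for j in range(d)]
    variances = [list(spread) for _ in range(k)]
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        resp, ll = _e_step(data, weights, means, variances)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: responsibility-weighted mixing weights, means and variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                        for j in range(d)]
            variances[c] = [max(sum(r[c] * (x[j] - means[c][j]) ** 2
                                    for r, x in zip(resp, data)) / nc, var_floor)
                            for j in range(d)]
    resp, ll = _e_step(data, weights, means, variances)  # match final parameters
    return weights, means, variances, resp, ll


def bic(ll, n, d, k):
    """Bayesian information criterion; lower is better."""
    n_params = 2 * k * d + (k - 1)  # means + diagonal variances + free weights
    return -2.0 * ll + n_params * math.log(n)


def select_k(data, k_max=5):
    """Fit k = 1..k_max and keep the fit with the lowest BIC."""
    fits = {k: fit_diag_gmm(data, k) for k in range(1, k_max + 1)}
    scores = {k: bic(f[4], len(data), len(data[0]), k) for k, f in fits.items()}
    best = min(scores, key=scores.get)
    return best, fits[best], scores


def normalized_mutual_info(a, b):
    """I(a; b) / sqrt(H(a) H(b)) between two hard labelings of the same objects."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    ha = -sum(c / n * math.log(c / n) for c in pa.values())
    hb = -sum(c / n * math.log(c / n) for c in pb.values())
    return mi / math.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0


if __name__ == "__main__":
    # Synthetic stand-in for (log [O III]/Hβ, log [N II]/Hα, log EW(Hα)).
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 0.3) for _ in range(3)] for _ in range(100)]
    data += [[rng.gauss(3.0, 0.3) for _ in range(3)] for _ in range(100)]
    truth = ["A"] * 100 + ["B"] * 100  # hypothetical reference classes
    best_k, fit, scores = select_k(data, k_max=4)
    labels = [max(range(best_k), key=r.__getitem__) for r in fit[3]]
    print(best_k, round(normalized_mutual_info(labels, truth), 3))
```

The membership probabilities in `resp` are what make such a classification probabilistic: each object keeps a probability for every component rather than a single label. A full-covariance mixture, which lets components tilt along correlated axes in the line-ratio space, would need a 3×3 determinant and inverse per component; in practice a library implementation such as scikit-learn's `GaussianMixture` (with `covariance_type="full"` and its `bic` method) would be the more practical choice.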

Original language: English (US)
Pages (from-to): 2808-2822
Number of pages: 15
Journal: Monthly Notices of the Royal Astronomical Society
Volume: 472
Issue number: 3
DOI: 10.1093/mnras/stx2156
State: Published, Dec 2017

Fingerprint

galaxies
active galactic nuclei
dichotomies
wilderness
diagram
diagrams
goodness of fit
information theory
subgroups
catalogs
ionization
inclusions
methodology
stars
composite materials

All Science Journal Classification (ASJC) codes

  • Astronomy and Astrophysics
  • Space and Planetary Science

Cite this

de Souza, R. S., Dantas, M. L. L., Costa-Duarte, M. V., Feigelson, E. D., Killedar, M., Lablanche, P. Y., ... Gieseke, F. (2017). A probabilistic approach to emission-line galaxy classification. Monthly Notices of the Royal Astronomical Society, 472(3), 2808-2822. https://doi.org/10.1093/mnras/stx2156
de Souza, R. S. ; Dantas, M. L. L. ; Costa-Duarte, M. V. ; Feigelson, E. D. ; Killedar, M. ; Lablanche, P. Y. ; Vilalta, R. ; Krone-Martins, A. ; Beck, R. ; Gieseke, F. / A probabilistic approach to emission-line galaxy classification. In: Monthly Notices of the Royal Astronomical Society. 2017 ; Vol. 472, No. 3. pp. 2808-2822.
@article{c192d272391d40fe80ca4e2abbb6e5c7,
title = "A probabilistic approach to emission-line galaxy classification",
abstract = "We invoke a Gaussian mixture model (GMM) to jointly analyse two traditional emission-line classification schemes of galaxy ionization sources: the Baldwin-Phillips-Terlevich (BPT) and WHα versus [N II]/Hα (WHAN) diagrams, using spectroscopic data from the Sloan Digital Sky Survey Data Release 7 and SEAGal/STARLIGHT data sets. We apply a GMM to empirically define classes of galaxies in a three-dimensional space spanned by the log [O III]/Hβ, log [N II]/Hα and log EW(Hα) optical parameters. The best-fitting GMM, based on several statistical criteria, suggests a solution around four Gaussian components (GCs), which are capable of explaining up to 97 per cent of the data variance. Using elements of information theory, we compare each GC to its respective astronomical counterpart. GC1 and GC4 are associated with star-forming galaxies, suggesting the need to define a new starburst subgroup. GC2 is associated with BPT's active galactic nuclei (AGN) class and WHAN's weak AGN class. GC3 is associated with BPT's composite class and WHAN's strong AGN class. Conversely, there is no statistical evidence, based on four GCs, for the existence of a Seyfert/low-ionization nuclear emission-line region (LINER) dichotomy in our sample. Nevertheless, the inclusion of an additional component, GC5, unravels it. GC5 appears associated with the LINER and passive galaxies on the BPT and WHAN diagrams, respectively. This indicates that if the Seyfert/LINER dichotomy is there, it does not contribute significantly to the global data variance and may be overlooked by standard metrics of goodness of fit. Subtleties aside, we demonstrate the potential of our methodology to recover/unravel different objects inside the wilderness of astronomical data sets, while retaining the ability to convey physically interpretable results. The probabilistic classifications from the GMM analysis are publicly available within the COINtoolbox at https://cointoolbox.github.io/GMM_Catalogue/.",
author = "{de Souza}, {R. S.} and Dantas, {M. L. L.} and Costa-Duarte, {M. V.} and Feigelson, {E. D.} and M. Killedar and Lablanche, {P. Y.} and R. Vilalta and A. Krone-Martins and R. Beck and F. Gieseke",
year = "2017",
month = "12",
doi = "10.1093/mnras/stx2156",
language = "English (US)",
volume = "472",
pages = "2808--2822",
journal = "Monthly Notices of the Royal Astronomical Society",
issn = "0035-8711",
publisher = "Oxford University Press",
number = "3",

}

de Souza, RS, Dantas, MLL, Costa-Duarte, MV, Feigelson, ED, Killedar, M, Lablanche, PY, Vilalta, R, Krone-Martins, A, Beck, R & Gieseke, F 2017, 'A probabilistic approach to emission-line galaxy classification', Monthly Notices of the Royal Astronomical Society, vol. 472, no. 3, pp. 2808-2822. https://doi.org/10.1093/mnras/stx2156

A probabilistic approach to emission-line galaxy classification. / de Souza, R. S.; Dantas, M. L. L.; Costa-Duarte, M. V.; Feigelson, E. D.; Killedar, M.; Lablanche, P. Y.; Vilalta, R.; Krone-Martins, A.; Beck, R.; Gieseke, F.

In: Monthly Notices of the Royal Astronomical Society, Vol. 472, No. 3, 12.2017, p. 2808-2822.

TY - JOUR

T1 - A probabilistic approach to emission-line galaxy classification

AU - de Souza, R. S.

AU - Dantas, M. L. L.

AU - Costa-Duarte, M. V.

AU - Feigelson, E. D.

AU - Killedar, M.

AU - Lablanche, P. Y.

AU - Vilalta, R.

AU - Krone-Martins, A.

AU - Beck, R.

AU - Gieseke, F.

PY - 2017/12

Y1 - 2017/12

AB - We invoke a Gaussian mixture model (GMM) to jointly analyse two traditional emission-line classification schemes of galaxy ionization sources: the Baldwin-Phillips-Terlevich (BPT) and WHα versus [N II]/Hα (WHAN) diagrams, using spectroscopic data from the Sloan Digital Sky Survey Data Release 7 and SEAGal/STARLIGHT data sets. We apply a GMM to empirically define classes of galaxies in a three-dimensional space spanned by the log [O III]/Hβ, log [N II]/Hα and log EW(Hα) optical parameters. The best-fitting GMM, based on several statistical criteria, suggests a solution around four Gaussian components (GCs), which are capable of explaining up to 97 per cent of the data variance. Using elements of information theory, we compare each GC to its respective astronomical counterpart. GC1 and GC4 are associated with star-forming galaxies, suggesting the need to define a new starburst subgroup. GC2 is associated with BPT's active galactic nuclei (AGN) class and WHAN's weak AGN class. GC3 is associated with BPT's composite class and WHAN's strong AGN class. Conversely, there is no statistical evidence, based on four GCs, for the existence of a Seyfert/low-ionization nuclear emission-line region (LINER) dichotomy in our sample. Nevertheless, the inclusion of an additional component, GC5, unravels it. GC5 appears associated with the LINER and passive galaxies on the BPT and WHAN diagrams, respectively. This indicates that if the Seyfert/LINER dichotomy is there, it does not contribute significantly to the global data variance and may be overlooked by standard metrics of goodness of fit. Subtleties aside, we demonstrate the potential of our methodology to recover/unravel different objects inside the wilderness of astronomical data sets, while retaining the ability to convey physically interpretable results. The probabilistic classifications from the GMM analysis are publicly available within the COINtoolbox at https://cointoolbox.github.io/GMM_Catalogue/.

UR - http://www.scopus.com/inward/record.url?scp=85030836515&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85030836515&partnerID=8YFLogxK

U2 - 10.1093/mnras/stx2156

DO - 10.1093/mnras/stx2156

M3 - Article

AN - SCOPUS:85030836515

VL - 472

SP - 2808

EP - 2822

JO - Monthly Notices of the Royal Astronomical Society

JF - Monthly Notices of the Royal Astronomical Society

SN - 0035-8711

IS - 3

ER -