The related published article can be found here.
A complete catalogue of more than 3,000,000 AGN is available here.
In this work we apply machine learning methods to the identification of candidate AGN from optical survey data and to the automatic classification of AGNs into broad types. We used four machine learning models, namely the Multi Layer Perceptron (MLP) trained with three different learning rules (Conjugate Gradient, Scaled Conjugate Gradient and Quasi Newton) and the Support Vector Machine (SVM), to classify emission line galaxies into different classes, mainly AGNs vs non-AGNs, using optical photometry in place of the diagnostics based on line intensity ratios that are classically used in the literature.
Using the same photometric features, we also discuss the behavior of the classifiers on finer AGN classification tasks, namely Seyfert I vs Seyfert II and Seyfert vs LINER. Furthermore, we describe the algorithms employed, the samples of spectroscopically classified galaxies used to train them, the procedure followed to select the photometric parameters, and the performance of our methods in terms of multiple statistical indicators. The results of the experiments show that self adaptive data mining algorithms, trained on spectroscopic data sets and applied to carefully chosen photometric parameters, are a viable alternative to the classical methods that rely on time-consuming spectroscopic observations.
Our knowledge base (KB) was obtained by merging two different samples (Sorrentino et al. 2006 and Kauffmann et al. 2003) of objects for which a classification based on spectroscopy was available. Both samples were drawn from the SDSS DR4 PhotoSpecAll table, which contains all objects for which both photometric and spectroscopic observations are available.
Catalogue by Sorrentino et al. (2006)
This catalogue contains objects in the redshift range 0.05 < z < 0.095. It provides a classification as Type 1 (Seyfert I and LINER I), Type 2 (Seyfert II and LINER II) and non-AGN for 24293 objects. The data were extracted from the Sloan Digital Sky Survey Data Release 4 (Adelman-McCarthy et al. 2006), and the selection was performed using the traditional approach based on the equivalent width of specific emission lines. In particular, objects were considered to be bona fide AGN if they lay above one of the so-called Kewley lines (Kewley et al. 2001).
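The Kewley-line criterion mentioned above can be illustrated with a minimal sketch. It assumes the inputs are base-10 logarithms of emission line ratios, and it uses the curve coefficients as commonly quoted from Kewley et al. (2001); these should be checked against that paper. The function names and the dictionary layout are hypothetical, and this is not the selection code used by Sorrentino et al. (2006).

```python
# Each curve has the form  log([OIII]/Hbeta) = a / (x - b) + c,
# where x is the log of a second line ratio. The (a, b, c) values are
# quoted from memory of Kewley et al. (2001) and are an assumption here.
KEWLEY_LINES = {
    "NII": (0.61, 0.47, 1.19),   # x = log10([NII] / Halpha)
    "SII": (0.72, 0.32, 1.30),   # x = log10([SII] / Halpha)
    "OI": (0.73, -0.59, 1.33),   # x = log10([OI] / Halpha)
}


def above_kewley_line(x, y, diagram):
    """Return True if the point (x, y) lies above the curve of one diagram.

    x is the log ratio on the horizontal axis, y is log10([OIII]/Hbeta).
    """
    a, b, c = KEWLEY_LINES[diagram]
    if x >= b:
        # To the right of the vertical asymptote the curve is undefined;
        # such objects are conventionally placed on the AGN side.
        return True
    return y > a / (x - b) + c


def is_candidate_agn(log_oiii_hb, log_ratios):
    """Flag an object lying above at least one of the available curves.

    log_ratios maps a diagram name ("NII", "SII", "OI") to its x value.
    """
    return any(
        above_kewley_line(x, log_oiii_hb, diagram)
        for diagram, x in log_ratios.items()
    )
```

For example, `is_candidate_agn(0.55, {"NII": -0.5, "OI": -1.5})` flags the object because it lies above the [OI] curve even though it falls just below the [NII] one.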
Furthermore, AGNs were classified as Seyfert I and Seyfert II. The final catalogue comprises 22464 objects recognized as non-AGN, 725 Seyfert I, and 1105 Seyfert II.
This final catalogue, summarized in the picture below, was then used to create three different data sets to be used for the three distinct classification experiments.
Namely:
(i) KB data set for the AGN vs non-AGN experiment: the whole catalogue;
(ii) KB data set for the Seyfert I vs Seyfert II experiment: just the pure AGN objects belonging to the data set of Sorrentino et al. (2006), resulting in 1570 objects;
(iii) KB data set for the Seyferts vs LINERs experiment: pure AGN objects belonging to the catalogue of Kauffmann et al. (2003), divided into LINERs and Seyferts, for a total of 30380 objects.
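The three data sets listed above could be derived from a single labelled table along the following lines. This is only a sketch: the row layout (dictionaries with `id`, `source` and `label` keys) and the label strings are hypothetical, and the criteria used in the original work to select "pure" AGN are not reproduced here.

```python
SEYFERT_TYPES = {"Seyfert I", "Seyfert II"}


def build_experiment_sets(catalogue):
    """Split a labelled catalogue into the three experiment data sets.

    catalogue is a list of dicts with hypothetical keys:
      "id"     - object identifier
      "source" - "sorrentino" or "kauffmann"
      "label"  - "non-AGN", "Seyfert I", "Seyfert II", "Seyfert" or "LINER"
    Each returned data set is a list of (id, target) pairs.
    """
    # (i) AGN vs non-AGN: every object, with the label reduced to two classes.
    agn_vs_non_agn = [
        (row["id"], "non-AGN" if row["label"] == "non-AGN" else "AGN")
        for row in catalogue
    ]
    # (ii) Seyfert I vs Seyfert II: AGN from the Sorrentino et al. sample only.
    seyfert1_vs_seyfert2 = [
        (row["id"], row["label"])
        for row in catalogue
        if row["source"] == "sorrentino" and row["label"] in SEYFERT_TYPES
    ]
    # (iii) Seyfert vs LINER: AGN from the Kauffmann et al. sample only.
    seyfert_vs_liner = [
        (row["id"], row["label"])
        for row in catalogue
        if row["source"] == "kauffmann" and row["label"] in {"Seyfert", "LINER"}
    ]
    return agn_vs_non_agn, seyfert1_vs_seyfert2, seyfert_vs_liner
```

Keeping the three sets as views of one table makes it easy to check that the same photometric features are used in every experiment.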
The production of large and accurate AGN catalogues is an important topic that will become crucial with the advent of future photometric-only digital surveys, which will map large fractions of the sky to unprecedented depth at different wavelengths.
We have applied four distinct classification methods, based on self-adaptive classification techniques, to the classification of emission line galaxies using only optical photometric parameters. The methods have been applied to three classification problems: the separation of AGNs from non-AGNs, of Seyfert I from Seyfert II, and of Seyferts from LINERs. The results indicate that our methods perform fairly well, in terms of classification efficiency, on the AGNs vs non-AGNs problem, while performance decreases in the finer classifications of Seyfert vs LINERs and Seyfert I vs Seyfert II.
From a methodological standpoint, the results of our experiments indicate how sensitive the performances of the photometric classification of line-emission galaxies are to the size of the spectroscopic data sets used to train the method, and to the uncertainty affecting the spectroscopic classification of the training set sources.
It is important to stress that, even with a completeness of about 58%, using photometric data alone would lead to a catalogue of candidate AGN about 200 times larger than existing ones, while still retaining a purity of about 70%.
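As a reminder of what the two figures quoted above measure, the sketch below computes completeness and purity from true and predicted labels. It assumes the usual definitions (completeness as the fraction of true AGN that are recovered, purity as the fraction of predicted AGN that are true AGN); the exact definitions adopted in the published article should be checked there. The function name and label strings are hypothetical.

```python
def completeness_and_purity(true_labels, predicted_labels, positive="AGN"):
    """Return (completeness, purity) for the class named by `positive`.

    completeness = TP / (TP + FN)   (also known as recall)
    purity       = TP / (TP + FP)   (also known as precision)
    """
    tp = fp = fn = 0
    for truth, guess in zip(true_labels, predicted_labels):
        if guess == positive and truth == positive:
            tp += 1   # true AGN correctly recovered
        elif guess == positive:
            fp += 1   # contaminant predicted as AGN
        elif truth == positive:
            fn += 1   # true AGN that was missed
    completeness = tp / (tp + fn) if tp + fn else float("nan")
    purity = tp / (tp + fp) if tp + fp else float("nan")
    return completeness, purity
```

A classifier can therefore miss a sizeable fraction of the true AGN (moderate completeness) and still deliver a candidate list in which most objects are genuine (reasonable purity).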
This work, which should be interpreted as a feasibility study, is hence just a first step, and it encourages further work on finer classifications of the different families of line emission galaxies by exploiting their multi-band photometry.