My articles and publications --(full text, click here. You may be asked to sign up --it is free) --Mis publicaciones (texto completo: http://ipn.academia.edu/AdolfoGuzman Quizá le pida suscribirse --es gratis) Mi página Web -- (click here) -- My Web page (http://alum.mit.edu/www/aguzman). ALGUNOS VIDEOS SOBRE LO QUE HAGO. Conferencia 'Ciudad inteligente, con conectividad y tecnología' (oct. 2010), parte 1 (15min), parte 2 (8min), parte 3 (9min), parte 4 (2min). Entrevista por redCudiMéxico, 2012: aquí (11 min). Avances en Inteligencia Artificial, entrevista en la Univ. IBERO, Puebla, 2013. Pulse aquí (53min). Video in the series "Personalities in the history of ESIME" (for the 100 years anniversary of ESIME-IPN, in Spanish) about Adolfo Guzman": 2014, click here. (1h)
Entrevista "La visión de los egresados del IPN, a 80 años de la creación del IPN y 100 años de la creación de la ESIME, 2014: ver en youtube (1h). Seminario sobre "Big Data" (la Ciencia de Datos). 2014. Pulse aquí (56min). Seminar on "Big Data", in English, 2014. Click here (56min). Algunos trabajos sobre Minería de Datos y sus Aplicaciones (CIC-IPN, 2016): pulse aquí (5min). El auge y el ocaso de las máquinas de Lisp (Plática en la Reunión Anual 2016 de la Academia Mexicana de Computación): pulse aquí (56min). Entrevista sobre la funcionalidad y competitividad de Hotware 10: 2016, aquí (6 min). Adolfo Guzmán Arenas, Ingeniero Electrónico e investigador del Centro de Investigación en Computación del IPN, conversó sobre su trayectoria y la importancia de las ciencias aplicadas para el desarrollo del país. 2017, Canal 11, Noticias TV (30min). Cómo se construyó la primera computadora en el mundo de procesamiento paralelo con Lisp. Marzo 2018. https://www.youtube.com/watch?v=dzyZGDhxwrU (12 min). Charla "Historias de éxito en la computación mexicana", ciclo Códice IA. Entrevista a A. Guzmán, "Entre la vida y la academia": https://bit.ly/3sIOQBc (45 min). El CIC cumple 25 años. Pulse aquí (51min. Habla Adolfo: "Pasado y futuro del CIC": minutos 13.57 a 22.70 ).
Perfil en ResearchGate -- Adolfo Guzman-Arenas My URL in Google Scholar: http://scholar.google.com/citations?user=Nw5lSdEAAAAJ My ORCID number 0000-0002-8236-0469. Scopus Author ID 6602302516.

Follow me on Academia.edu

Feature Selection Ordered by Correlation - FSOC

 Often, objects in a large dataset have many features or attributes. Not all of them are  relevant or apport information about the object (for instance, to classify it into one of several known classes). Narrowing the set of relevant features is useful to "understand what is going on". Also, fewer attributes mean faster data processing.

Arturo Heredia, Adolfo Guzmán and Gilberto Martínez have published an article that provides a new technique to select relevant features (those comprising most of the information for correct classification of an object) in a dataset containg objects with many features:

Arturo Heredia Márquez, Adolfo Guzmán-Arenas, Gilberto Lorenzo Martínez Luna (2023). FSOC – Feature selection ordered by correlation. Computación y Sistemas Vol. 27 No. 1, 2023, 33-51. ISSN: 2007-9737. DOI: 13053/CyS-27-1-3982.

The article (full text) can be dowloaded from here. Its abstract follows. 

Abstract. Data sets have increased in volume and features, yielding longer times for classification and training. When an object has many features, it often occurs that not all of them are highly correlated with the target class, and that significant correlation may exist between certain pair of features. An adequate removal of “useless” features saves time and effort at data collection, and assures faster learning and classification times, with little or no reduction in classification accuracy.

This article presents a new filter type method, called FSOC (Feature Selection Ordered by Correlation), to select, with small computational cost, relevant features. FSOC achieves this reduction by selecting a subset of the original features. FSOC does not combine existing features to produce a new set of fewer features, since
the artificially created features mask the relevance of the original features in class assignment, making the new model difficult to interpret. 

To test FSOC, a statistical analysis was performed on a collection of 36 data sets from several repositories some with millions of objects. The classification percentages (efficiency) of FSOC were similar to other feature selection features.
Nevertheless, when obtaining the selected features, FSOC was up to 42 times faster than other algorithms such as Correlation Feature Selection (CFS), Fast Correlation-Based Filter (FCFB) and Efficient feature selection based on correlation measure (ECMBF).

Keywords. Feature selection, data mining, pre-processing, feature reduction, data analysis.


No hay comentarios: