Talks and presentations

Machine Learning System for Fraud Detection. A Methodological Approach for a Development Platform

January 29, 2021

Talk, International Conference on Digital Technologies and Applications (ICDTA'2021), Fez, Morocco

Abstract: The democratization and mass adoption of credit cards lead inexorably to a high number of fraudulent transactions. Fraud detection is generally treated as an anomaly detection problem. In this field, current approaches and techniques constantly seek optimized solutions to detect anomalies. Faced with a massive and growing data volume, these methods are put to the test and leave a large number of anomalies undetected. Real-time fraud detection requires the design and implementation of scalable techniques capable of continuously ingesting and analyzing massive amounts of data. Recent advances in storage, data analytics processing, and open-source solutions open up new perspectives in anomaly detection, and in fraud detection in particular. In this article, we are interested in the design of a fraud detection system (FDS) based on open-source Big Data technologies. We propose a general methodology based on the formalization, the implementation, and the technical design of a platform for fraud detection. The formalization part consists of four layers: distributed storage, data processing, model building, and model evaluation. The implementation part uses the Spark distributed data processing system, in particular its dedicated machine learning framework, MLlib. The technical design part of the platform is based on the latest Big Data technologies such as Hadoop, YARN, and Livy.

Machine learning for anomaly detection. Performance study considering anomaly distribution in an imbalanced dataset

November 26, 2020

Talk, International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech’20), Marrakesh, Morocco

Abstract: The continuous dematerialization of real-world data contributes greatly to the rapid growth of exchanged data. In this context, anomaly detection is becoming an increasingly important data analysis task: it identifies abnormal data, which are of particular interest and may require action. Recent advances in artificial intelligence approaches, such as machine learning, are making an important breakthrough in this area. Typically, these techniques have been designed for balanced datasets or rely on certain assumptions about the distribution of data. However, real applications are usually confronted with an imbalanced data distribution, where normal data are present in large quantities and abnormal cases are generally very few. This makes anomaly detection similar to looking for a needle in a haystack. In this article, we develop an experimental setup for the comparative analysis of two types of machine learning techniques applied to anomaly detection systems. We study their performance taking into account the anomaly distribution in an imbalanced dataset.

A machine learning based approach to reduce behavioral noise problem in an imbalanced data: application to a fraud detection

October 20, 2020

Talk, International Conference on Intelligent Data Science Technologies and Applications (IDSTA'2020), Valencia, Spain

Abstract: The question of class imbalance has become more pronounced with the application of learning algorithms in real applications. It has received significant attention in the machine learning and data mining community. This problem is present in fraud detection, medical diagnostics, and a number of other areas where training data contains significantly more representatives of one class (called the majority class) than of the other class (called the minority class). Machine learning techniques struggle to deal with imbalanced data because they focus on minimizing the error rate for the majority class while ignoring the minority class, which is the most interesting from a learning point of view and also incurs a high cost when it is misclassified. However, the imbalance ratio is not the only cause of poor performance when learning from imbalanced data. Another critical factor that accompanies imbalanced data in the real world is the presence of instances of the two classes that overlap in feature space. This problem is commonly referred to as class overlap, and we call it “behavioral noise”. In this paper, we propose the One Side Behavioral Noise Reduction (OSBNR) approach to deal with the problem of class imbalance in the presence of behavioral noise. OSBNR is based on two stages. First, clustering is applied to group similar instances of the minority class into multiple behavior clusters. Second, we select and eliminate the instances of the majority class, considered as behavioral noise, that overlap with the behavior clusters of the minority class. The results of experiments conducted on a representative public dataset confirm that the proposed approach is effective for the class imbalance problem in the presence of behavioral noise.
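
The two stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm or the overlap criterion, so the sketch assumes plain k-means for the first stage and, for the second, treats a majority instance as behavioral noise when it falls within the radius of a minority cluster (the distance from the centroid to its farthest minority member). The names kmeans, reduce_behavioral_noise, k and seed are hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the per-coordinate mean of the cluster members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Keep the previous centroid if a cluster ends up empty.
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def reduce_behavioral_noise(majority, minority, k=2, seed=0):
    """Stage 1: cluster the minority class into k behavior clusters.
    Stage 2: drop majority instances lying inside any cluster's radius
    (assumed overlap criterion). Returns (kept, removed) majority instances."""
    centroids, labels = kmeans(minority, k, seed=seed)
    # Radius of each non-empty cluster: distance to its farthest minority member.
    radii = {}
    for p, lab in zip(minority, labels):
        radii[lab] = max(radii.get(lab, 0.0), math.dist(p, centroids[lab]))
    kept, removed = [], []
    for p in majority:
        overlaps = any(math.dist(p, centroids[c]) <= r for c, r in radii.items())
        (removed if overlaps else kept).append(p)
    return kept, removed
```

Only the majority class is undersampled, which matches the "one side" in the name; the minority instances are left untouched. Both the number of clusters and the overlap rule would need to be tuned on real data.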

Healthcare Social Data Platform Based on Linked Data and Machine Learning

May 03, 2019

Talk, ESAI'19: International Conference On Embedded Systems And Artificial Intelligence, Fez, Morocco

Abstract: The healthcare system faces major challenges in improving its overall performance. Different communities are interested in this subject from different perspectives, ranging from technical issues to organizational aspects. An important aspect of this research area is the integration of social network data into the system, especially given the rapid and growing development of social networks. These can be general social networks, such as Facebook or Twitter, but also dedicated ones such as PatientsLikeMe. This proliferation of social networks creates complex problems and obstacles when we want to take into account, within the healthcare system, the resulting large amounts of continuously created data. We call these data “social data”. The aim of this work is to demonstrate that it is possible and feasible to build promising alternatives to the traditional healthcare system that improve the quality of services and reduce cost. In our opinion, taking “social data” into account can provide efficient healthcare decision support systems that help healthcare operators make optimal and efficient decisions in dynamic and complex environments. Our approach involves data extraction from multiple social networks, data aggregation, and the development of a semantic model to answer high-level user queries. In addition, we show how an analytical tool can help operators understand the data. Lastly, we present a machine learning model that aims to detect the sentiments users express toward a given medication and the “TOP TRENDING” care and treatments used for a given disease.

Towards an agent-based approach for multidimensional analyses of semantic web data

April 17, 2017

Talk, International Conference on Intelligent Systems and Computer Vision (ISCV'2017), Fez, Morocco

Abstract: OLAP analytical systems are essential technologies in decision-making processes; they give decision-makers an efficient way to carry out complex analyses more simply and quickly. In today’s dynamic and competitive business contexts, the internal data stored within companies no longer provides enough information for decision-making processes. Therefore, decision analysis systems could be improved by including external data available through the semantic web in order to provide multiple perspectives to decision-makers. In this article, we describe a preliminary approach based on the use of multi-agent systems for the multidimensional analysis of external data coming from the semantic web. We also give a short review of recent research combining business intelligence and semantic web technologies. The proposed approach is based on an evolutionary architecture built on “agent” technology. The different stages of the analysis are treated as tasks, which are handled as services managed by agents.