Portfolio
Valentin-Gabriel Soumah

My Projects

Face Attractiveness Ratings

Using the MEBeauty dataset to analyze and predict facial attractiveness ratings.

GitHub Repository

Analysing Ratings | Predicting Attractiveness Score | Identifying Facial Features | Generating Faces
In this notebook, we use several statistical and data visualization tools to examine the distribution of facial ratings.

We aim to identify how people rate faces, which variables influence the ratings, and how variable the scores are.
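As a rough illustration of the kind of summary described above, the sketch below groups scores by image and by one dataset variable, then reports the mean and spread of each group. The row layout, the field names (`image`, `gender`, `score`), and the values are hypothetical and are not taken from the MEBeauty files or from the notebook itself.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by(rows, key):
    """Group scores by `key` and return count, mean and sample std per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["score"])
    return {
        name: {
            "n": len(scores),
            "mean": mean(scores),
            # Sample standard deviation needs at least two scores
            "std": stdev(scores) if len(scores) > 1 else 0.0,
        }
        for name, scores in groups.items()
    }


# Hypothetical ratings: one row per (image, rater) pair
rows = [
    {"image": "face_1", "gender": "female", "score": 6},
    {"image": "face_1", "gender": "female", "score": 8},
    {"image": "face_2", "gender": "male", "score": 3},
    {"image": "face_2", "gender": "male", "score": 5},
    {"image": "face_2", "gender": "male", "score": 7},
]

per_image = summarize_by(rows, "image")    # how variable are scores for one face?
per_gender = summarize_by(rows, "gender")  # does a variable shift the ratings?
```

The per-image spread shows how much raters disagree on a single face, while the per-variable means hint at which variables are associated with higher or lower ratings.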
We train a regression model to predict the attractiveness score of any human face.

We adjust for the biases in human ratings before leveraging a pretrained convolutional neural network.
The resulting model is evaluated on separate data.
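The page does not say how the rating biases are adjusted. One common option, shown here purely as an assumption, is to standardize each rater's scores (z-scores) before averaging them per face, so that harsh and lenient raters contribute on the same scale. The rater ids, image ids, and scores below are invented for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_per_rater(ratings):
    """ratings: list of (rater_id, image_id, score) tuples.

    Returns {image_id: mean z-score across raters}.
    Assumed approach, not necessarily the one used in the notebook.
    """
    scores_by_rater = defaultdict(list)
    for rater, _image, score in ratings:
        scores_by_rater[rater].append(score)

    # Per-rater mean and spread; use 1.0 when a rater gave constant scores
    stats = {
        rater: (mean(scores), pstdev(scores) or 1.0)
        for rater, scores in scores_by_rater.items()
    }

    z_by_image = defaultdict(list)
    for rater, image, score in ratings:
        mu, sigma = stats[rater]
        z_by_image[image].append((score - mu) / sigma)

    return {image: mean(z) for image, z in z_by_image.items()}


# rater_a scores high overall, rater_b scores low, but both rank face_1 first
ratings = [
    ("rater_a", "face_1", 8), ("rater_a", "face_2", 6),
    ("rater_b", "face_1", 4), ("rater_b", "face_2", 2),
]
adjusted = standardize_per_rater(ratings)
```

After standardization, both raters agree on each face's relative score even though their raw scales differ, and the adjusted values could then serve as regression targets.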
We use the structure of the CNN trained in the previous notebook to isolate human-interpretable features.

Ongoing Project.
Training a model to generate artificial images based on the variables in the dataset: ethnicity, gender, and rating.

Project not started yet.

Coreference resolution for French

Adding support for French to coreferee, a coreference resolution library powered by spaCy. Includes a demonstration of the tool, along with the code and resources developed.

Coreferee French: Code | Coreferee French: Demo | Neuralcoref French
Codebase for coreferee in French.
The codebase presents all the files and resources developed to add French to coreferee, a coreference resolution library.

Linguistic feature engineering to support coreference resolution, and training of a neural ensemble to decide coreference chains.
Demonstration of the tool, given on March 30, 2022.

Instructions for using the tool and files used during the demonstration (presentation and notebook).
Codebase for Neuralcoref in French.
Attempt to train the model of neuralcoref, a spaCy-based coreference resolution library, for French.

Inconclusive project, carried out before Coreferee French.

La Defense Business District in Newspapers

Analysis of the perception and representation of the French business district "La Defense" in the French and British press. Work done during a Master's internship.

Repository

Corpus | Detection of designations (NER) | Classification of mentions according to typology
Dataset of French and British newspapers.
Also includes annotated mentions and typology labels for model training and analysis.
Codebase for training a model to identify mentions of La Defense (Named Entity Recognition).
Codebase for training a model to assign each identified mention of La Defense to its class in the typology developed for perception analysis.