Text Classification

This project attempts to find the best-performing text classification method by comparing three approaches:

  • Experiment 1: Text classification using feature vectors extracted with Natural Language Processing (NLP) techniques, combined with Deep Learning
  • Experiment 2: Text classification using embeddings of only the extracted features, combined with Deep Learning
  • Experiment 3: Text classification using embeddings of both the extracted features and the original text, combined with Deep Learning

It finds that classification based on feature embedding performs best on the given dataset.

Experiments 2 and 3 require the pre-trained GloVe embeddings, which can be downloaded from: http://nlp.stanford.edu/data/glove.840B.300d.zip. Dataset: HASOC 2019 (English).
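
As a rough illustration of how the pre-trained GloVe vectors might be prepared for an embedding layer, the sketch below parses GloVe's plain-text format (one token followed by its vector values per line) and builds an embedding matrix. It is not taken from this project's code: the function names `load_glove` and `build_embedding_matrix`, the `word_index` mapping, and the tiny 3-dimensional sample are all assumptions made for the example. The vector is read from the end of each line because some tokens in the 840B file are reported to contain spaces.

```python
import io


def load_glove(handle, dim):
    """Parse GloVe text lines into a dict mapping token -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip("\n").split(" ")
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # Take the last `dim` fields as the vector so that tokens
        # containing spaces are still parsed correctly.
        token = " ".join(parts[:-dim])
        vectors[token] = [float(x) for x in parts[-dim:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector of the word with index i.

    Row 0 (commonly reserved for padding) and words missing from
    `vectors` are left as zeros.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None:
            matrix[idx] = vec
    return matrix


# Tiny in-memory stand-in for the real file (3 dimensions instead of 300).
sample = io.StringIO(
    "hate 0.1 0.2 0.3\n"
    "speech 0.4 0.5 0.6\n"
    ". . . 0.7 0.8 0.9\n"
)
glove = load_glove(sample, dim=3)

# Hypothetical word-to-index mapping, e.g. as produced by a tokenizer.
word_index = {"hate": 1, "speech": 2, "unknownword": 3}
embedding_matrix = build_embedding_matrix(word_index, glove, dim=3)
```

For the real embeddings, the same functions could be applied to the extracted text file opened with `encoding="utf-8"` and `dim=300`. The resulting matrix could then be converted to a NumPy array and used as the initial weights of a Keras embedding layer.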

Technologies

Python, Keras, GloVe, NLTK, regex, spaCy, scikit-learn

Tags