Projects - Việt Hoàng, Nghiêm

Master Thesis
Data Mining Cup
LLM Finetuning

🎓 Master Thesis: Development of chatbots for waste disposal

Description: In my master thesis, I developed a chatbot that assists citizens with sustainable waste disposal and implemented it in three different development environments: Google Dialogflow, Rasa AI and a custom LLM-based implementation (GPT-4o-mini via LangChain and LangGraph). I then evaluated these development environments to determine the use cases for which each is best suited.
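The RAG concept listed for this project rests on a retrieval step: find the knowledge-base passages most relevant to a citizen's question and place them in the LLM prompt. The sketch below illustrates only that step. It assumes a tiny hypothetical knowledge base and plain bag-of-words cosine similarity. It is not the thesis implementation, which used LangChain and LangGraph and whose retriever is not described here. The LLM call itself is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base; the thesis chatbot's actual content is not shown here.
KNOWLEDGE_BASE = [
    "Glass bottles and jars go into the glass container, sorted by colour.",
    "Paper and cardboard belong in the blue paper bin.",
    "Batteries must be returned to collection points in shops.",
]


def tokenize(text: str) -> Counter:
    """Lower-case the text and count its word tokens (bag of words)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k knowledge-base passages most similar to the question."""
    q = tokenize(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda doc: cosine(q, tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Ground the prompt in the retrieved context; sending it to an LLM is omitted."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("Where do old batteries go?"))
```

A production retriever would typically replace word counts with embeddings, but the ranking-then-prompting structure stays the same.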

Skills:

Concepts: Agent | LLM | RAG | NLU | NLP | Software Design
Development: Python | JavaScript | HTML | CSS
Skills: Research | Presentation | Self-Management
Domain: Sustainability | Waste Management | Public Service

Key Learnings: There is a use case for every tool. The choice of development environment for a waste disposal chatbot depends on the required chatbot features and on the capabilities of the implementing organization. LLM-based chatbots enable powerful features but require expertise and effort to implement and maintain.

🏆 Data Mining Cup 2023: Analysis of Frankfurt's municipal waste using ML

🎥 Highlight Video 📊 Presentation 📋 Application

Description: As part of a university project, I used ML to analyze the current state of municipal waste in Frankfurt. The project included classifying households by residual waste generation based on their retail affinity, and forecasting municipal waste generation. Additionally, we used Explainable AI to identify the most important attributes of households that generate a high amount of residual waste and designed targeted campaigns based on them. We entered the project in the Data Mining Cup 2023, where we won first place and presented our results at a workshop in Berlin.
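A global surrogate, one common Explainable AI technique, trains an interpretable model on the predictions of a black-box classifier rather than on the ground truth, then reports its fidelity: how often it agrees with the black box. The sketch below assumes a hypothetical black-box function and hypothetical features (retail affinity on a 0 to 10 scale, household size). For brevity it uses the simplest surrogate tree, a single split (a decision stump). The project's actual surrogates were a linear model and a decision tree whose settings are not described here.

```python
# Hypothetical household features: (retail_affinity on a 0-10 scale, household_size)
Household = tuple[int, int]


def black_box(h: Household) -> int:
    """Stand-in for an opaque trained classifier: 1 = high residual waste."""
    affinity, size = h
    return int(affinity * (1 + size) > 20)


def fit_stump(data: list[Household], labels: list[int]) -> tuple[int, int, float]:
    """Fit a one-split decision tree that predicts 1 when h[feature] > threshold.

    Returns (feature index, threshold, fidelity), where fidelity is the share of
    households on which the stump agrees with the given labels.
    """
    best = (0, 0, -1.0)
    for feature in range(len(data[0])):
        for threshold in sorted({h[feature] for h in data}):
            preds = [int(h[feature] > threshold) for h in data]
            fidelity = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if fidelity > best[2]:  # keep the first split with the highest agreement
                best = (feature, threshold, fidelity)
    return best


# Synthetic grid of households, labelled by the black box (not by ground truth).
households = [(a, s) for a in range(11) for s in range(1, 5)]
surrogate = fit_stump(households, [black_box(h) for h in households])
feature, threshold, fidelity = surrogate
print(f"feature {feature} > {threshold}, fidelity {fidelity:.2f}")
```

On this synthetic grid the stump splits on retail affinity and agrees with the black box on about 84% of households; the remaining disagreements are why fidelity should be reported alongside any surrogate explanation.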

Skills:

Concepts: Classification | Forecasting | Explainable AI | Data Analytics | Data Visualization
Development: Python | KNIME
Skills: Presentation | Communication | Teamwork
Domain: Sustainability | Waste Management | Public Service

Key Learnings: The classification and the use of Explainable AI (a surrogate linear model and a decision tree) revealed that the stronger a household's retail affinity, the more likely it is to generate a high amount of residual waste. In 2022, the environmental department of Frankfurt proposed an action plan to reduce the amount of municipal waste. We used a sample of four years of weighted waste data to assess the effectiveness of the action plan and to forecast future development. The results showed that there is still a large gap between the actual development and the plan's targets, which highlights the need for more urgency and action.

📊 Personal Project: Fine-tuning BERT for Spam Detection

📁 GitHub

Description: I fine-tuned a pre-trained BERT model to classify emails as spam or ham (not spam). The project includes data preprocessing, tokenization with Hugging Face Transformers, model fine-tuning using PyTorch and evaluation based on accuracy and F1 score. To measure the performance difference between the pre-trained and the fine-tuned BERT model, I evaluated both versions on a test set comprising the test data, ChatGPT-generated emails and real spam emails I had received myself.
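The accuracy and F1 evaluation can be spelled out from the confusion counts. The sketch below treats spam (label 1) as the positive class and scores the pre-trained and fine-tuned predictions on each test subset. The labels and predictions are hypothetical toy values; in practice scikit-learn's `accuracy_score` and `f1_score`, listed among the project's tools, compute the same metrics.

```python
def accuracy_and_f1(y_true: list[int], y_pred: list[int], positive: int = 1) -> tuple[float, float]:
    """Accuracy and F1 score for the positive class (1 = spam, 0 = ham)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical toy values per test subset (1 = spam, 0 = ham):
# (true labels, pre-trained predictions, fine-tuned predictions)
subsets = {
    "test data": ([1, 0, 1, 0], [0, 0, 1, 1], [1, 0, 1, 0]),
    "ChatGPT": ([1, 1, 0], [0, 1, 1], [1, 1, 0]),
}
for name, (y_true, base_pred, tuned_pred) in subsets.items():
    for version, y_pred in (("pre-trained", base_pred), ("fine-tuned", tuned_pred)):
        acc, f1 = accuracy_and_f1(y_true, y_pred)
        print(f"{name:<10} {version:<12} accuracy={acc:.2f} F1={f1:.2f}")
```

Reporting F1 next to accuracy matters when spam and ham are imbalanced, because a model that labels everything as ham can still reach high accuracy while its spam F1 is zero.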

Skills:

Concepts: Transfer Learning | NLP | Text Classification | Model Evaluation
Development: Python | PyTorch | Hugging Face Transformers | scikit-learn

Key Learnings: Fine-tuning a pre-trained model like BERT can improve performance on a specific task. It is essential to understand the use case to determine whether fine-tuning is reasonable given the required resources. If it is, it is critical to select and preprocess high-quality data for fine-tuning the model. The fine-tuning process must be monitored with metrics such as training and validation loss to avoid overfitting. Finally, the evaluation must be designed to ensure that the fine-tuned model performs well on unseen data.
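Watching the validation loss for overfitting is often automated with early stopping. The sketch below assumes the per-epoch validation losses have already been recorded during fine-tuning. The `patience` and `min_delta` parameters and the loss values are illustrative assumptions, not the project's settings. Hugging Face Transformers provides a comparable `EarlyStoppingCallback` for its `Trainer`.

```python
def best_epoch(val_losses: list[float], patience: int = 2, min_delta: float = 0.0) -> int:
    """Return the 0-based epoch of the last significant validation-loss improvement.

    Training stops once the validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs, a common sign of overfitting
    when the training loss keeps falling at the same time.
    """
    best, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best, best_loss, waited = epoch, loss, 0  # improvement: reset patience
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs in a row
    return best


# Hypothetical validation losses that start rising after epoch 2.
val_losses = [0.30, 0.20, 0.18, 0.21, 0.25]
print("keep the checkpoint from epoch", best_epoch(val_losses))
```

Restoring the checkpoint from the returned epoch, rather than the last one, keeps the model from the point before the validation loss started to rise.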
