Course ACOS Extraction / ACOS Course Evaluation System

Project Overview

Course ACOS Extraction is an NLP system for extracting structured feedback from university course review text. Instead of only predicting whether a review is positive or negative, the system extracts ACOS quadruples:

Aspect
Category
Opinion
Sentiment

The project provides a Streamlit interface for:

Single review analysis
Batch CSV/Excel analysis
English input
Bahasa Melayu input through language detection and Malay-to-English translation
Downloadable batch results
Model performance reporting

The final model is a fine-tuned google/flan-t5-base model saved under:

models/final_flan_t5_acos_model/

Live Demo

The deployed Streamlit demo is available on Hugging Face Spaces:

Space Demo: https://huggingface.co/spaces/lijinzheyy/acos-extraction-system
Fine-tuned Model: https://huggingface.co/lijinzheyy/acos-flan-t5-course-evaluation

The GitHub repository does not include model weights because the fine-tuned FLAN-T5 model is too large. During deployment, the Streamlit app loads the fine-tuned model from Hugging Face Hub.

Problem Statement

Traditional sentiment analysis usually returns a broad label such as positive, negative, or neutral. This is useful, but it does not explain what part of a course the student is discussing.

For course evaluation, instructors and academic administrators need more structured information. A sentence can mention multiple issues, such as teaching clarity, workload, assessment, or learning resources. ACOS extraction gives a more detailed view by identifying the target, the category, the opinion phrase, and the sentiment together.

What is ACOS?

ACOS stands for Aspect-Category-Opinion-Sentiment.

Field	Meaning
Aspect	The target object mentioned in the review, such as lecturer, course, assignment, or workload.
Category	The feedback category, such as teaching quality, assessment, workload, learning resources, or course general.
Opinion	The opinion expression from the review text.
Sentiment	The polarity label: positive, negative, or neutral.

Example input:

The lecturer explained the concepts clearly, but the workload was too heavy.

Example output:

(lecturer | teaching quality | explained the concepts clearly | positive)
(workload | workload | too heavy | negative)

Model

Base model: google/flan-t5-base
Task type: text-to-text generation
Input format: course review text
Prompt format: Extract ACOS quadruples from this course review: {text}
Output format: (aspect | category | opinion | sentiment)
Optimizer: AdamW
Epochs: 8
fp16: disabled
bf16: disabled

Dataset

The current model is trained and evaluated using the processed OATS-ABSA dataset. The final test set is the all-domain OATS-ABSA test set.

Split	Examples	ACOS Tuples
Train	16,415	23,204
Dev	1,980	2,961
Test	986	1,444

EduRABSA was inspected for future education-domain expansion, but it was not used in the current FLAN-T5 training.

Evaluation Results

Evaluation uses tuple-level Exact Match F1 with post-processing. A tuple is counted as correct only when aspect, category, opinion, and sentiment all match the gold label.

Old deployed model with post-processing:

Precision	Recall	F1
0.2504	0.2147	0.2312

Final retrained model with post-processing:

Precision	Recall	F1
0.3341	0.2957	0.3137

Improvement:

+0.0825 F1

V2 LoRA Experiments

V2 LoRA was used as a parameter-efficient fine-tuning ablation. These runs train LoRA adapter weights only and do not replace the V1 full fine-tuned FLAN-T5 model.

Experiment	Configuration	Post-processed F1	Interpretation
Random baseline	Rule-free baseline	0.0000	Lower-bound baseline
TF-IDF + kNN baseline	Traditional retrieval baseline	0.0579	Simple ML baseline
Old deployed FLAN-T5	Previous deployed model	0.2312	Earlier app model
LoRA Run 1	r=8, alpha=16, target_modules=`["q", "v"]`	0.2221	Learned tuple-style output but below old deployed model
LoRA Run 2	r=16, alpha=32, target_modules=`["q", "v"]`	0.2439	Higher rank improved LoRA performance
LoRA Run 3	r=16, alpha=32, target_modules=`["q", "v", "k", "o"]`	0.2478	Best LoRA run so far
V1 full fine-tuned FLAN-T5	Full model fine-tuning	0.3137	Best overall project model

Run 3 is the best parameter-efficient LoRA result and improves over the random baseline, TF-IDF + kNN baseline, and old deployed FLAN-T5 model. However, LoRA does not outperform the V1 full fine-tuned FLAN-T5 model. The ablation suggests LoRA is useful for efficient experimentation, while full fine-tuning remains the strongest approach for strict ACOS tuple extraction in this project.

Error Analysis

Final model error analysis on the all-domain OATS-ABSA test set:

Error Type	Count
Exact matched tuples	427
Missed tuples	220
Hallucinated tuples	54
Wrong opinion	508
Wrong category	299
Wrong sentiment	145
Malformed output	0

The biggest remaining challenge is that the model sometimes predicts the wrong opinion phrase or category. Many predictions are partially correct, but Exact Match F1 is strict and gives no credit if one field is different.

Streamlit Demo

The Streamlit app contains four main pages:

Single Review: analyze one English or Bahasa Melayu course review.
Batch Analysis: upload CSV/Excel files and download structured ACOS results.
Performance: view final model metrics and old-vs-final comparison.
About: view the language pipeline, model setup, dataset notes, and limitations.

Screenshots

Single Review: English

Single Review: Bahasa Melayu

Batch Analysis

Performance Page

About Page

How to Run Locally

Using the existing project virtual environment:

cd "C:\Users\Administrator\OneDrive\Desktop\Semester 2 2025-2026\FYP1\Code\course-acos-extraction"
..\course-acos-extraction.venv\Scripts\python.exe -m streamlit run app.py

General setup:

python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
streamlit run app.py

Project Structure

course-acos-extraction/
|-- app.py
|-- requirements.txt
|-- README.md
|-- models/
|   `-- final_flan_t5_acos_model/
|-- data/
|   `-- predictions/
|       `-- predictions_final.csv
|-- reports/
|-- screenshots/
|   |-- 01_single_review_english.png
|   |-- 02_single_review_malay.png
|   |-- 03_batch_analysis.png
|   |-- 04_performance_page.png
|   `-- 05_about_page.png
`-- src/

Limitations

Exact Match F1 is strict.
The model was trained on English ACOS data.
Bahasa Melayu support depends on Malay-to-English translation quality.
Some opinion phrases and categories are still confused.
This is a research prototype, not a production system.

Future Work

Optional LoRA fine-tuning as an ablation experiment.
Add more education-domain ACOS data.
Improve category normalization.
Improve multilingual training instead of relying only on translation.
Add human evaluation for course-feedback usefulness.
Deploy the final version to Hugging Face Spaces.

Model Weights

The fine-tuned FLAN-T5-base model weights are not included in this GitHub repository due to file size limits.

To run the app locally, place the final model folder at:

models/final_flan_t5_acos_model/

The model can later be hosted separately on Hugging Face Model Hub for deployment.

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
config		config
data/predictions		data/predictions
experiments		experiments
notebooks		notebooks
reports		reports
screenshots		screenshots
src		src
.gitignore		.gitignore
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Course ACOS Extraction / ACOS Course Evaluation System

Project Overview

Live Demo

Problem Statement

What is ACOS?

Model

Dataset

Evaluation Results

V2 LoRA Experiments

Error Analysis

Streamlit Demo

Screenshots

Single Review: English

Single Review: Bahasa Melayu

Batch Analysis

Performance Page

About Page

How to Run Locally

Project Structure

Limitations

Future Work

Model Weights

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Course ACOS Extraction / ACOS Course Evaluation System

Project Overview

Live Demo

Problem Statement

What is ACOS?

Model

Dataset

Evaluation Results

V2 LoRA Experiments

Error Analysis

Streamlit Demo

Screenshots

Single Review: English

Single Review: Bahasa Melayu

Batch Analysis

Performance Page

About Page

How to Run Locally

Project Structure

Limitations

Future Work

Model Weights

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages