Course ACOS Extraction is an NLP system for extracting structured feedback from university course review text. Instead of only predicting whether a review is positive or negative, the system extracts ACOS quadruples:
- Aspect
- Category
- Opinion
- Sentiment
The project provides a Streamlit interface for:
- Single review analysis
- Batch CSV/Excel analysis
- English input
- Bahasa Melayu input through language detection and Malay-to-English translation
- Downloadable batch results
- Model performance reporting
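For Bahasa Melayu input, the model is English-only, so the app detects the language first and translates Malay to English before extraction. The sketch below shows only that routing, assuming injected callables. The names `analyze_review`, `detect`, `translate` and `extract`, and the `"ms"` language code, are hypothetical stand-ins. This README does not name the detection or translation library the app actually uses.

```python
from typing import Callable, List

# Hypothetical callable types: the app's real detector, translator and
# extractor are not specified in this README.
Detector = Callable[[str], str]         # returns a language code, e.g. "en" or "ms"
Translator = Callable[[str], str]       # Malay text -> English text
Extractor = Callable[[str], List[str]]  # English text -> list of ACOS tuple strings


def analyze_review(text: str, detect: Detector, translate: Translator,
                   extract: Extractor) -> dict:
    """Route one review through optional Malay-to-English translation, then extract."""
    lang = detect(text)
    # Only Malay input is translated; the model was trained on English ACOS data.
    english = translate(text) if lang == "ms" else text
    return {"language": lang, "model_input": english, "tuples": extract(english)}
```

Passing the components in as parameters keeps the sketch independent of any particular detection or translation library. It also lets the routing be exercised with simple stubs.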
The final model is a fine-tuned google/flan-t5-base model saved under:
models/final_flan_t5_acos_model/
The deployed Streamlit demo is available on Hugging Face Spaces:
- Space Demo: https://huggingface.co/spaces/lijinzheyy/acos-extraction-system
- Fine-tuned Model: https://huggingface.co/lijinzheyy/acos-flan-t5-course-evaluation
The GitHub repository does not include model weights because the fine-tuned FLAN-T5 model is too large. During deployment, the Streamlit app loads the fine-tuned model from Hugging Face Hub.
Traditional sentiment analysis usually returns a broad label such as positive, negative, or neutral. This is useful, but it does not explain what part of a course the student is discussing.
For course evaluation, instructors and academic administrators need more structured information. A sentence can mention multiple issues, such as teaching clarity, workload, assessment, or learning resources. ACOS extraction gives a more detailed view by identifying the target, the category, the opinion phrase, and the sentiment together.
ACOS stands for Aspect-Category-Opinion-Sentiment.
| Field | Meaning |
|---|---|
| Aspect | The target object mentioned in the review, such as lecturer, course, assignment, or workload. |
| Category | The feedback category, such as teaching quality, assessment, workload, learning resources, or course general. |
| Opinion | The opinion expression from the review text. |
| Sentiment | The polarity label: positive, negative, or neutral. |
Example input:
The lecturer explained the concepts clearly, but the workload was too heavy.
Example output:
(lecturer | teaching quality | explained the concepts clearly | positive)
(workload | workload | too heavy | negative)
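Generated output in the format above can be turned back into structured records. Below is a minimal parsing sketch. The `ACOSQuad` class and `parse_acos` function are illustrative names, not the project's code. The sketch assumes that no field contains `(`, `)` or `|`. Fragments that do not contain exactly four fields are skipped.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ACOSQuad:
    aspect: str
    category: str
    opinion: str
    sentiment: str


# One tuple: "(" + four "|"-separated fields + ")"; fields may not contain "(", ")" or "|".
_TUPLE_RE = re.compile(r"\(([^()|]*)\|([^()|]*)\|([^()|]*)\|([^()|]*)\)")


def parse_acos(output: str) -> List[ACOSQuad]:
    """Parse text such as '(a | c | o | s) (a | c | o | s)' into quadruples."""
    quads = []
    for match in _TUPLE_RE.finditer(output):
        # Strip the spaces around each "|" separator.
        aspect, category, opinion, sentiment = (f.strip() for f in match.groups())
        quads.append(ACOSQuad(aspect, category, opinion, sentiment))
    return quads
```

Because the class is frozen, the parsed quadruples are hashable. They can therefore be compared or counted directly during evaluation.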
- Base model: google/flan-t5-base
- Task type: text-to-text generation
- Input format: course review text
- Prompt format: Extract ACOS quadruples from this course review: {text}
- Output format: (aspect | category | opinion | sentiment)
- Optimizer: AdamW
- Epochs: 8
- fp16: disabled
- bf16: disabled
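With this prompt and output format, each training example becomes a plain source/target string pair. The sketch below shows one way to build those pairs. The helper names are hypothetical. The README also does not say how multiple tuples are separated in the target string, so joining them with a single space is an assumption.

```python
from typing import Iterable, Tuple

PROMPT_TEMPLATE = "Extract ACOS quadruples from this course review: {text}"


def build_source(text: str) -> str:
    """Wrap a raw review in the text-to-text prompt."""
    return PROMPT_TEMPLATE.format(text=text)


def build_target(quads: Iterable[Tuple[str, str, str, str]]) -> str:
    """Serialize (aspect, category, opinion, sentiment) tuples as the model's target."""
    # Assumption: tuples are joined with a single space.
    return " ".join(f"({a} | {c} | {o} | {s})" for a, c, o, s in quads)
```

The resulting strings would then be tokenized as the encoder input and the decoder labels for FLAN-T5.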
The current model is trained and evaluated using the processed OATS-ABSA dataset. The final test set is the all-domain OATS-ABSA test set.
| Split | Examples | ACOS Tuples |
|---|---|---|
| Train | 16,415 | 23,204 |
| Dev | 1,980 | 2,961 |
| Test | 986 | 1,444 |
EduRABSA was inspected for future education-domain expansion, but it was not used in the current FLAN-T5 training.
Evaluation uses tuple-level Exact Match F1 with post-processing. A tuple is counted as correct only when aspect, category, opinion, and sentiment all match the gold label.
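Tuple-level exact match can be computed by counting, for each review, how many predicted tuples equal a gold tuple on all four fields. The sketch below micro-averages over the whole corpus and uses multiset intersection, so duplicated predictions are not double-counted. The function name is illustrative. The project's exact post-processing is not described here and is not reproduced.

```python
from collections import Counter
from typing import List, Tuple

Quad = Tuple[str, str, str, str]  # (aspect, category, opinion, sentiment)


def exact_match_prf(gold: List[List[Quad]], pred: List[List[Quad]]) -> Tuple[float, float, float]:
    """Micro-averaged tuple-level precision, recall and F1 over a corpus of reviews."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same reviews")
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        # A tuple counts only if all four fields match exactly.
        tp += sum((Counter(g) & Counter(p)).values())
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, if a prediction gets the workload tuple's opinion as "heavy" instead of "too heavy", that tuple scores zero. This holds even though its aspect, category and sentiment are correct.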
Old deployed model with post-processing:
| Precision | Recall | F1 |
|---|---|---|
| 0.2504 | 0.2147 | 0.2312 |
Final retrained model with post-processing:
| Precision | Recall | F1 |
|---|---|---|
| 0.3341 | 0.2957 | 0.3137 |
Improvement: +0.0825 F1 (0.2312 to 0.3137).
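As a consistency check, the reported F1 values follow from the harmonic mean of precision and recall. The final recall also agrees with the error-analysis count of 427 exact matches out of the 1,444 gold test tuples:

$$
F_1 = \frac{2PR}{P + R}
$$

$$
F_1^{\text{old}} = \frac{2(0.2504)(0.2147)}{0.2504 + 0.2147} \approx 0.2312, \qquad
F_1^{\text{final}} = \frac{2(0.3341)(0.2957)}{0.3341 + 0.2957} \approx 0.3137
$$

$$
R^{\text{final}} = \frac{427}{1444} \approx 0.2957, \qquad \Delta F_1 = 0.3137 - 0.2312 = 0.0825
$$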
V2 LoRA was used as a parameter-efficient fine-tuning ablation. These runs train LoRA adapter weights only and do not replace the V1 full fine-tuned FLAN-T5 model.
| Experiment | Configuration | Post-processed F1 | Interpretation |
|---|---|---|---|
| Random baseline | Rule-free baseline | 0.0000 | Lower-bound baseline |
| TF-IDF + kNN baseline | Traditional retrieval baseline | 0.0579 | Simple ML baseline |
| Old deployed FLAN-T5 | Previous deployed model | 0.2312 | Earlier app model |
| LoRA Run 1 | r=8, alpha=16, target_modules=["q", "v"] | 0.2221 | Learned tuple-style output but below old deployed model |
| LoRA Run 2 | r=16, alpha=32, target_modules=["q", "v"] | 0.2439 | Higher rank improved LoRA performance |
| LoRA Run 3 | r=16, alpha=32, target_modules=["q", "v", "k", "o"] | 0.2478 | Best LoRA run so far |
| V1 full fine-tuned FLAN-T5 | Full model fine-tuning | 0.3137 | Best overall project model |
Run 3 is the best parameter-efficient LoRA result and improves over the random baseline, TF-IDF + kNN baseline, and old deployed FLAN-T5 model. However, LoRA does not outperform the V1 full fine-tuned FLAN-T5 model. The ablation suggests LoRA is useful for efficient experimentation, while full fine-tuning remains the strongest approach for strict ACOS tuple extraction in this project.
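To see how small the LoRA runs are, the trainable adapter parameters for each configuration can be estimated. The sketch below assumes the published google/flan-t5-base dimensions: d_model 768, 12 heads of size 64, and 12 encoder plus 12 decoder layers. It also assumes that every attention block (encoder self-attention, decoder self-attention and decoder cross-attention) is adapted, with no bias terms. The function is illustrative. If the runs used Hugging Face PEFT (the README does not say), `print_trainable_parameters()` on the wrapped model could be used to check these numbers.

```python
from typing import List

# Assumed google/flan-t5-base dimensions.
D_MODEL = 768
INNER_DIM = 12 * 64  # num_heads * d_kv
N_ENCODER_LAYERS = 12
N_DECODER_LAYERS = 12

# Encoder layers have one attention block; decoder layers have self- and cross-attention.
N_ATTENTION_BLOCKS = N_ENCODER_LAYERS + 2 * N_DECODER_LAYERS


def lora_trainable_params(r: int, target_modules: List[str]) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) to each adapted linear layer."""
    # q, k, v map d_model -> inner_dim and o maps inner_dim -> d_model,
    # so each adapted module costs r * (d_in + d_out) parameters either way.
    per_module = r * (D_MODEL + INNER_DIM)
    return N_ATTENTION_BLOCKS * len(target_modules) * per_module


for name, r, modules in [("Run 1", 8, ["q", "v"]),
                         ("Run 2", 16, ["q", "v"]),
                         ("Run 3", 16, ["q", "v", "k", "o"])]:
    print(name, lora_trainable_params(r, modules))
```

Under these assumptions, Run 3 trains four times as many adapter parameters as Run 1. This is still a small fraction of the full model. The alpha values do not change the parameter count. They set the scaling factor alpha / r, which is 2 in all three runs.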
Final model error analysis on the all-domain OATS-ABSA test set:
| Error Type | Count |
|---|---|
| Exact matched tuples | 427 |
| Missed tuples | 220 |
| Hallucinated tuples | 54 |
| Wrong opinion | 508 |
| Wrong category | 299 |
| Wrong sentiment | 145 |
| Malformed output | 0 |
The biggest remaining challenges are wrong opinion phrases (508) and wrong categories (299). Many predictions are partially correct, but Exact Match F1 is strict and gives no credit when any one of the four fields differs.
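The README does not define how these error buckets are assigned. One plausible scheme, sketched below, works in two passes. The first pass removes exact matches. The second pairs each remaining gold tuple with an unmatched prediction that has the same aspect and records which fields differ. Gold tuples with no such partner count as missed, and leftover predictions count as hallucinated. The function name and the aspect-keyed pairing are assumptions. In this sketch, one pair can count toward several field errors, which is consistent with the table's counts summing to more than the 1,444 test tuples.

```python
from typing import Dict, List, Tuple

Quad = Tuple[str, str, str, str]  # (aspect, category, opinion, sentiment)


def classify_errors(gold: List[Quad], pred: List[Quad]) -> Dict[str, int]:
    """Hypothetical per-review error bucketing keyed on the aspect term."""
    counts = {"exact": 0, "missed": 0, "hallucinated": 0,
              "wrong_opinion": 0, "wrong_category": 0, "wrong_sentiment": 0}
    remaining_pred = list(pred)
    remaining_gold = []
    # Pass 1: remove exact matches first so partial pairing cannot consume them.
    for g in gold:
        if g in remaining_pred:
            counts["exact"] += 1
            remaining_pred.remove(g)
        else:
            remaining_gold.append(g)
    # Pass 2: pair leftovers that share an aspect and record differing fields.
    for g in remaining_gold:
        partner = next((p for p in remaining_pred if p[0] == g[0]), None)
        if partner is None:
            counts["missed"] += 1
            continue
        remaining_pred.remove(partner)
        counts["wrong_category"] += int(partner[1] != g[1])
        counts["wrong_opinion"] += int(partner[2] != g[2])
        counts["wrong_sentiment"] += int(partner[3] != g[3])
    counts["hallucinated"] = len(remaining_pred)
    return counts
```

Summing these per-review dictionaries over the test set would give corpus-level counts in the shape of the table above.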
The Streamlit app contains four main pages:
- Single Review: analyze one English or Bahasa Melayu course review.
- Batch Analysis: upload CSV/Excel files and download structured ACOS results.
- Performance: view final model metrics and old-vs-final comparison.
- About: view the language pipeline, model setup, dataset notes, and limitations.
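For the Batch Analysis download, one natural layout is one CSV row per extracted tuple, with the source review repeated on each row. Below is a stdlib-only sketch. The function name and column names are hypothetical, not the app's actual schema. Reviews with no extracted tuple are kept as a row with empty fields so they stay visible in the file. In a Streamlit page, the returned string could be passed to `st.download_button` with `mime="text/csv"`.

```python
import csv
import io
from typing import List, Tuple

Quad = Tuple[str, str, str, str]  # (aspect, category, opinion, sentiment)


def results_to_csv(results: List[Tuple[str, List[Quad]]]) -> str:
    """Flatten (review, quads) pairs into CSV text with one row per tuple."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["review", "aspect", "category", "opinion", "sentiment"])
    for review, quads in results:
        if not quads:
            # Keep reviews with no extracted tuple visible in the download.
            writer.writerow([review, "", "", "", ""])
        for aspect, category, opinion, sentiment in quads:
            writer.writerow([review, aspect, category, opinion, sentiment])
    return buffer.getvalue()
```

A list of pairs is used instead of a dict keyed on review text, so duplicate reviews in an uploaded file are not silently merged. The `csv` module quotes reviews that contain commas.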
Using the existing project virtual environment:

```
cd "C:\Users\Administrator\OneDrive\Desktop\Semester 2 2025-2026\FYP1\Code\course-acos-extraction"
..\course-acos-extraction.venv\Scripts\python.exe -m streamlit run app.py
```

General setup:

```
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt
streamlit run app.py
```

Project structure:

```
course-acos-extraction/
|-- app.py
|-- requirements.txt
|-- README.md
|-- models/
|   `-- final_flan_t5_acos_model/
|-- data/
|   `-- predictions/
|       `-- predictions_final.csv
|-- reports/
|-- screenshots/
|   |-- 01_single_review_english.png
|   |-- 02_single_review_malay.png
|   |-- 03_batch_analysis.png
|   |-- 04_performance_page.png
|   `-- 05_about_page.png
`-- src/
```
Limitations:
- Exact Match F1 is strict: a tuple earns no credit unless all four fields match.
- The model was trained on English ACOS data.
- Bahasa Melayu support depends on Malay-to-English translation quality.
- Some opinion phrases and categories are still confused.
- This is a research prototype, not a production system.
Possible future work:
- Extend the LoRA ablation beyond the current three runs.
- Add more education-domain ACOS data.
- Improve category normalization.
- Improve multilingual training instead of relying only on translation.
- Add human evaluation for course-feedback usefulness.
- Deploy the final version to Hugging Face Spaces.
The fine-tuned FLAN-T5-base model weights are not included in this GitHub repository due to file size limits.
To run the app locally, place the final model folder at:
models/final_flan_t5_acos_model/
The model is also hosted separately on Hugging Face Model Hub (lijinzheyy/acos-flan-t5-course-evaluation), and the deployed app loads it from there.
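Since the weights can come from either the local folder or the Hub repository, the app needs to decide which one to load. One way to do that is sketched below. It prefers `models/final_flan_t5_acos_model/` when that folder contains a `config.json`, and otherwise falls back to the Hub ID listed earlier in this README. The function names and the `config.json` check are assumptions, not the app's confirmed logic. `AutoTokenizer.from_pretrained` and `AutoModelForSeq2SeqLM.from_pretrained` are standard `transformers` calls that accept either a local path or a Hub ID.

```python
from pathlib import Path

LOCAL_MODEL_DIR = Path("models/final_flan_t5_acos_model")
HUB_MODEL_ID = "lijinzheyy/acos-flan-t5-course-evaluation"


def resolve_model_source(local_dir: Path = LOCAL_MODEL_DIR) -> str:
    """Prefer a local copy of the weights; otherwise fall back to the Hub repository."""
    # Assumption: a saved model folder always contains config.json.
    if (local_dir / "config.json").is_file():
        return str(local_dir)
    return HUB_MODEL_ID


def load_model(source: str):
    """Load the tokenizer and seq2seq model from a local path or Hub ID."""
    # Imported lazily so the resolver works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(source)
    model = AutoModelForSeq2SeqLM.from_pretrained(source)
    return tokenizer, model
```

In a Streamlit app, wrapping `load_model` with `st.cache_resource` would avoid reloading the model on every rerun.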




