Data Science and Machine Learning projects
2020 research
Projects developed during the UniCredit internship, covering OCR, NLP, and document classification.
OCR Pipeline
- Implementation of a generic and configurable OCR web service for text extraction from scanned documents.
- Configurable image-processing preprocessing modules, interchangeable modules for different OCR engines, and language-sensitive word-correction post-processing modules.
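The modular structure described above (preprocessing, then an OCR engine, then word correction, each selected by configuration) can be illustrated with a minimal Python sketch. It is not the original service: the registry names, the pixel-list image type, the toy vocabularies and the stub engine are all hypothetical, and `difflib` similarity matching stands in for whatever correction method was actually used. A real deployment would plug in OpenCV-based preprocessing and a real OCR engine.

```python
from difflib import get_close_matches
from typing import Callable, Dict, List, Optional

# Hypothetical image type: a grayscale image as rows of 0-255 pixel values.
Image = List[List[int]]

def binarize(image: Image, threshold: int = 128) -> Image:
    """Map dark pixels to 0 (ink) and light pixels to 255 (background)."""
    return [[0 if px < threshold else 255 for px in row] for row in image]

def invert(image: Image) -> Image:
    """Swap dark and light pixels (e.g. for light text on a dark background)."""
    return [[255 - px for px in row] for row in image]

# Registry of preprocessing modules, selected by name from the configuration.
PREPROCESSORS: Dict[str, Callable[[Image], Image]] = {
    "binarize": binarize,
    "invert": invert,
}

# Toy per-language vocabularies for the word-correction post-processing step (hypothetical).
VOCABULARIES: Dict[str, List[str]] = {
    "it": ["atto", "di", "pignoramento", "presso", "terzi"],
    "en": ["notice", "of", "garnishment", "third", "party"],
}

def make_word_corrector(language: str, cutoff: float = 0.75) -> Callable[[str], str]:
    """Replace each word with its closest vocabulary entry, if one is similar enough."""
    vocabulary = VOCABULARIES[language]

    def correct(text: str) -> str:
        corrected = []
        for word in text.split():
            match = get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
            corrected.append(match[0] if match else word)
        return " ".join(corrected)

    return correct

class OCRPipeline:
    """Preprocessing -> OCR engine -> optional word correction, all chosen by configuration."""

    def __init__(self, config: dict, engines: Dict[str, Callable[[Image], str]]):
        self.preprocessors = [PREPROCESSORS[name] for name in config.get("preprocessing", [])]
        self.engine = engines[config["engine"]]
        language = config.get("language")
        # Correction is enabled only when a language is configured.
        self.corrector: Optional[Callable[[str], str]] = (
            make_word_corrector(language) if language else None
        )

    def run(self, image: Image) -> str:
        for step in self.preprocessors:
            image = step(image)
        text = self.engine(image)
        return self.corrector(text) if self.corrector else text

if __name__ == "__main__":
    # Stub engine standing in for a real OCR engine; it returns noisy text on purpose.
    engines = {"stub": lambda image: "pignoramemto presso terzl"}
    config = {"preprocessing": ["binarize"], "engine": "stub", "language": "it"}
    pipeline = OCRPipeline(config, engines)
    print(pipeline.run([[10, 200], [130, 90]]))  # prints: pignoramento presso terzi
```

Keeping each stage behind a name-based registry means a new preprocessing step or OCR engine can be added without touching the pipeline class, only the configuration.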
Garnishment Document Classification and Enrichment
- Testing and comparing different enrichment models.
- Implementation of deep learning techniques for document classification and named-entity recognition.
- Dockerization, integration, and testing of the web services.
Stamp Recognition and Information Extraction
- Recognition of specific stamps in scanned documents and extraction of their date and time.
- Implemented with Keras, scikit-learn, and OpenCV.
- Achieved 83% accuracy.
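For the date and time extraction step, one possible approach, assuming the stamp region has already been located and its text read by OCR, is regular-expression matching followed by validation with Python's `datetime`. The day-first date formats, the `hh:mm` time format and the function name `extract_stamp_datetime` are assumptions for illustration, not details taken from the project.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed formats: day-first dates such as 12/03/2020, 12-03-2020 or 12.03.2020,
# and times written as hh:mm (e.g. 10:45).
DATE_RE = re.compile(r"\b(\d{1,2})[./-](\d{1,2})[./-](\d{4})\b")
TIME_RE = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")

def extract_stamp_datetime(text: str) -> Optional[datetime]:
    """Parse the first date-like match (plus a time, if present) in OCR'd stamp text.

    Returns None if no date is found or the matched date is invalid.
    """
    date_match = DATE_RE.search(text)
    if date_match is None:
        return None
    day, month, year = (int(group) for group in date_match.groups())
    hour = minute = 0
    time_match = TIME_RE.search(text)
    if time_match is not None:
        hour, minute = int(time_match.group(1)), int(time_match.group(2))
    try:
        # datetime() rejects impossible values such as 31/02, which OCR errors can produce.
        return datetime(year, month, day, hour, minute)
    except ValueError:
        return None

if __name__ == "__main__":
    print(extract_stamp_datetime("RECEIVED 12/03/2020 AT 10:45"))  # prints: 2020-03-12 10:45:00
```

Validating through `datetime` rather than trusting the regex alone filters out misread digits that would otherwise produce impossible dates.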