Yi-Hui (Lily) Lee 李怡慧

Ph.D. Candidate

CV Contact Me

About Me

I am a fourth-year Ph.D. candidate in the Department of Computer Science at the University of Texas at Dallas. My research interests are machine learning and artificial intelligence, mainly using deep learning and natural language processing techniques to identify and inspect disinformation and online illicit activity on social media. My advisor is Professor Shuang Hao.

Publications
Google Scholar

[LREC' 2020] Headword-Oriented Entity Linking: A New Entity Linking Task with Dataset and Baseline

Mu Yang, Chi-Yen Chen, Yi-Hui Lee, Qian-Hui Zeng, Wei-Yun Ma, Chen-Yang Shih, Wei-Jhih Chen

12th Language Resources and Evaluation Conference

Marseille, France, May 2020

PDF

[RecSys (Challenge)' 2019] A-HA: A Hybrid Approach for Hotel Recommendation

Kung-Hsiang Huang, Yi-Fu Fu, Yi-Ting Lee, Tzong-Hann Lee, Yao-Chun Chan, Yi-Hui Lee, Shou-De Lin

13th ACM Conference on Recommender Systems

Copenhagen, Denmark, September 2019

PDF

[WI' 2018] Conditional Relationship Extraction for Diseases and Symptoms by a Web Search-Based Approach

Yi-Hui Lee, Jia-Ling Koh

IEEE/WIC/ACM International Conference on Web Intelligence

Santiago, Chile, December 2018

PDF

[WWW (DEMO)' 2017] IExM: Information Extraction System for Movies

Peng-Yu Chen, Yi-Hui Lee, Yueh-Han Wu, Wei-Yun Ma

26th ACM International World Wide Web Conference

Perth, Australia, April 2017

PDF    Slides

Best Demo - Special Mention Award

Projects

Computer Vision from Scratch to Implementation

Built image preprocessing functions such as (adapted) linear scaling and (adapted) histogram equalization from scratch in Python, mirroring the corresponding cv2 functions. Built a sequential CNN-based classifier in TensorFlow and tested it on downsized MNIST. Fine-tuned parameters in the order of filter size, learning rate and epochs, dropout rate, and final epochs. Designed two different filter sizes and dropout layers to handle overfitting, reaching 90.74% training accuracy and 80.84% testing accuracy within 4 minutes 23 seconds. Designed a blink-detection application by combining MTCNN and cascade techniques with rule-based algorithms. Improved the detector to 81.36% accuracy by precisely designing its optimal region of interest, preprocessing images with blurring and histogram equalization, and fine-tuning the MTCNN and cascade parameters.
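The from-scratch histogram equalization mentioned above can be sketched as follows. This is a minimal illustration for a grayscale uint8 image, assuming NumPy only; the function and variable names are mine, not the project's actual code:

```python
import numpy as np

def histogram_equalization(img):
    """Equalize a grayscale uint8 image by remapping intensities
    through the normalized cumulative histogram (CDF)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Build a lookup table so output intensities spread over [0, 255].
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]
```

The lookup-table form matches what `cv2.equalizeHist` computes, which makes a from-scratch version easy to verify against the library.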

Python, TensorFlow, CV, cv2, CNN, MTCNN, cascade, MNIST

CODE

Convolutional Neural Networks from Scratch to Implementation

Built Neural Network and Convolutional Neural Network functions from scratch in Python, mirroring the corresponding PyTorch functions. Reproduced and fine-tuned RegNet in PyTorch and tested it on downsized ImageNet. Designed an efficient RegNet with a small group width and a high bottleneck ratio to lower the parameter count while keeping accuracy. Designed and concatenated different group sizes on the outputs of the four stages, similar to GoogLeNet. Designed an Attention-Based Bilingual Image2caption Emoji Model (ABBIE) that transforms image pixels into a bilingual caption together with the corresponding emoji. Decomposed ABBIE top-down into a transformer, a bilingual language model, and an image caption model. Utilized multiple Hugging Face transformer architectures to pretrain the word and emoji embeddings, then fine-tuned the transformer on the image-to-text task.
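A back-of-the-envelope calculation shows why a high bottleneck ratio and a small group width shrink a RegNet X-block. This sketch (my own naming; biases and batch-norm parameters ignored) illustrates the trade-off and is not the project's code:

```python
def regnet_block_params(width, bottleneck_ratio, group_width, kernel=3):
    """Weight count of a RegNet X-block:
    1x1 reduce -> 3x3 grouped conv -> 1x1 expand."""
    inner = width // bottleneck_ratio   # channels inside the bottleneck
    groups = inner // group_width       # number of convolution groups
    reduce_1x1 = width * inner
    grouped_3x3 = groups * group_width * group_width * kernel * kernel
    expand_1x1 = inner * width
    return reduce_1x1 + grouped_3x3 + expand_1x1

# A plain residual block (ratio 1, single group) vs. a bottlenecked grouped one.
baseline = regnet_block_params(256, bottleneck_ratio=1, group_width=256)
compact = regnet_block_params(256, bottleneck_ratio=4, group_width=8)
```

With these example settings the bottlenecked, grouped variant needs roughly 5% of the baseline's weights, which is the lever used to keep accuracy while cutting parameters.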

Python, PyTorch, CNN, RegNet, ImageNet, GoogLeNet, Transformer, Hugging Face Transformer, Image Caption, Word Embedding, Emoji Embedding

RegNet PDF    ABBIE PDF

[LREC' 2020]
Headword-Oriented Entity Linking: A New Entity Linking Task with Dataset and Baseline

We cooperated with PIXNET to link the cosmetic products mentioned in PIXNET blogs. We designed headword-oriented entity linking (HEL), a specialized entity linking problem in which only the headwords of the entities are to be linked to knowledge bases. We developed a product embedding model to solve the entity linking problem in the cosmetic domain. To increase training data, we proposed a special transfer learning framework in which distant supervision with heuristic patterns is first utilized, followed by supervised learning on a small amount of manually labeled data. The experimental results show that our model provides a strong benchmark on this task, raising accuracy from the 64% baseline to 83.4%. We published this work at LREC 2020.
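The distant supervision step can be sketched as below: mentions that match a knowledge-base headword are auto-labeled as training pairs before any manual annotation. The knowledge base, entity IDs, and function names here are hypothetical placeholders, not the project's actual data or code:

```python
import re

# Hypothetical KB mapping cosmetic product headwords to entity IDs.
KB = {"lipstick": "E1", "foundation": "E2", "mascara": "E3"}

def distant_labels(sentence):
    """Weakly label a sentence: every token matching a KB headword
    becomes a (mention, entity_id) training pair."""
    pairs = []
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in KB:
            pairs.append((token, KB[token]))
    return pairs
```

Pairs produced this way are noisy, which is why the framework follows up with supervised fine-tuning on a small manually labeled set.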

PIXNET, NLP, Corpus, Information Extraction, Distant Supervision, Encoder Modules, Cosmetic Domain, Word Segmentation

PDF    CODE

Define IoT Device Naming Pattern in Domain Name

Researched domain name segmentation algorithms and deep neural networks for analyzing the naming rules of IoT device domain names in IPv6. Collected 11,182,640 domain names in IPv6. Preprocessed the leftmost label of each domain name with an adapted word segmentation that adds N-grams of unknown words. Created association rules between substrings. Formulated naming pattern rules for IoT devices from an embedding-based clustering algorithm.
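The N-gram step above can be sketched as follows: character N-grams let unknown substrings (such as device model numbers) still be counted and segmented. The labels and function names are made-up examples, not data from the study:

```python
from collections import Counter

def char_ngrams(label, n_min=2, n_max=4):
    """Extract character N-grams from a domain-name label so that
    out-of-vocabulary substrings can still be segmented."""
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(label[i:i + n] for i in range(len(label) - n + 1))
    return grams

# Count N-grams across hypothetical IoT leftmost labels.
labels = ["camera-01", "camera-02", "router-home"]
counts = Counter(g for lbl in labels for g in char_ngrams(lbl))
```

Frequent N-grams then serve as candidate substrings for the association-rule and clustering stages.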

Python, spaCy, Gensim, AllenNLP, Stanford CoreNLP, IoT Domain Name, IPv6, NLP, Regular Expressions, Word Segmentation, N-gram, Association Rules, Cluster

PDF    CODE

Work Experience

Research Assistant - The University of Texas at Dallas (Aug. 2018 to Present)

- Analyzed and defended against fake content and identified disinformation campaigns.
- Adopted various deep convolutional neural networks to classify downsized MNIST, fine-tuning to 80.84% testing accuracy in 4 minutes.
- Utilized NLP techniques and a clustering algorithm to define naming patterns of IoT devices from domain names.

Teaching Assistant - The University of Texas at Dallas (May 2019 to Present)

- Mentored a class of 70 students; moderated and evaluated NER research projects, which achieved a 91.3% average accuracy rate.
- Courses:
  Natural Language Processing (Spring 2020)
  Machine Learning (Fall 2020, Spring 2021)
  Semantic Web (Fall 2019)
  Discrete Mathematics (Summer 2019)
  Software Project Planning and Management (Spring 2021)

Research Assistant - Institute of Information Science, Academia Sinica (Jul. 2016 to Jun. 2018)

- Chinese Knowledge and Information Processing (CKIP) LAB
- Mostly focused on Natural Language Processing; my advisor was Professor Wei-Yun Ma.
- Worked in the following areas:
   a. Entity Linking: We cooperated with PIXNET to link the cosmetic products mentioned in PIXNET blogs. We designed a headword-oriented entity linking problem and developed a product embedding model to solve it in the cosmetic domain. Our model raised accuracy from the 64% baseline to 83.4%. We published this work at LREC 2020.
   b. Information Extraction: Designed the "Improved Pattern Ranking Algorithm (IPRA)" and built an information extraction system for the movie domain. Our algorithm improved the F1 score from the 67% baseline to 73.4%. We published this work in the WWW 2017 demo session and received the Best Demo - Special Mention Award.
- Led a team of 15 people in an industry-university cooperation, reducing manual labeling time and cost by 90% for 1,000k+ products.

Research Assistant - National Taiwan Normal University (Sep. 2015 – Aug. 2017)

- Knowledge Discovery and Data Mining (KDD) LAB
- Mostly focused on Text Mining and Relation Extraction; my advisor was Professor Jia-Ling Koh.
- We extracted conditional relationships between diseases and symptoms with a web search-based approach. We published this work at Web Intelligence in December 2018.

Software Engineering Intern - IBM, Taiwan (Jul. 2015 – Dec. 2015)

- Debugged and detected defects in the User Acceptance Testing stage; solved and troubleshot issues in the insurance batch system under an Agile development process.
- Communicated among engineers, architects, and product managers; confirmed specifications and technical details while maintaining the software requirements specification of the insurance system.
- Planned a round table panel talk and participated in the business proposal for the internal summer intern emotional cloud service competition.