Hello, fellow Redditors!
I'm looking to build an entity recognition model for my company's internal use, and I could use some guidance from the community. Essentially, I want to develop a model that can automatically extract specific entities like UID, email ID, and login ID from various types of text data, such as emails, logs, and messages.
uid -> a string of 5-15 characters, a-z in upper or lower case, which may include a ".", e.g. "ramzi"
emailid -> e.g. "ramzees@gmail.com"
loginid -> 4 or 5 characters from A-Z and 0-9, e.g. "23542"
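To make the formats concrete, here is a rough regex sketch of how I read my own definitions. The patterns, the `PATTERNS` table, and the `find_candidates` helper are my assumptions for illustration, not a finished spec; in particular I'm assuming a uid has at most one "." and that an email always takes priority over the other two.

```python
import re

# Assumed patterns, written from the informal descriptions above.
# Order matters: earlier labels claim their spans first.
PATTERNS = {
    "emailid": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"),
    "loginid": re.compile(r"(?<![\w.@])[A-Z0-9]{4,5}(?![\w@])"),
    "uid": re.compile(r"(?<![\w.@])[A-Za-z]+(?:\.[A-Za-z]+)?(?![\w@])"),
}


def find_candidates(text):
    """Return (label, matched_text, start, end) tuples, sorted by position."""
    found = []
    taken = []  # spans already claimed by a higher-priority label
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            # The uid length rule (5-15 characters) is checked here, not in the regex.
            if label == "uid" and not 5 <= len(m.group()) <= 15:
                continue
            # Skip anything overlapping a span that was already claimed.
            if any(m.start() < e and s < m.end() for s, e in taken):
                continue
            taken.append(m.span())
            found.append((label, m.group(), m.start(), m.end()))
    return sorted(found, key=lambda c: c[2])


if __name__ == "__main__":
    sample = "User ramzi (login 23542) wrote from ramzees@gmail.com"
    for candidate in find_candidates(sample):
        print(candidate)
```

On that sample the email and login ID come out cleanly, but the uid rule also fires on ordinary words like "login" and "wrote", since any 5-15 letter word fits the format. That ambiguity is the main reason I think I need a trained model that looks at context, rather than patterns alone.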
Specifically, I need help with:
Data Collection: What kind of data do I need to collect for training the model? How should this data be annotated?
Feature Extraction: What features should I extract from the text data to train the model effectively? Are there any best practices for feature engineering in entity recognition tasks?
Model Training: How do I train the model using the annotated data and extracted features? Which machine learning algorithms or models are suitable for entity recognition tasks?
Evaluation: What metrics should I use to evaluate the performance of my model? How do I know if it's performing well enough?
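To make the evaluation question more concrete: what I have in mind so far is exact-match, entity-level precision, recall and F1, roughly as sketched below. The `entity_prf` helper and the `(label, start, end)` span format are hypothetical, and it's only my assumption that exact-span matching is the right granularity, so please correct me if there's a better way.

```python
def entity_prf(gold, predicted):
    """Exact-match entity-level precision, recall and F1.

    Both arguments are iterables of (label, start, end) tuples; a prediction
    counts as correct only if label and both offsets match a gold entity.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up annotations for one sentence, purely for illustration.
    gold = [("uid", 5, 10), ("loginid", 18, 23), ("emailid", 36, 53)]
    predicted = gold + [("uid", 12, 17), ("uid", 25, 30)]  # two false positives
    print(entity_prf(gold, predicted))
```

In this made-up case every real entity is found but two ordinary words are wrongly tagged as uid, so recall is perfect while precision drops. I'd guess precision on uid is the number I'd need to watch most closely.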
I would greatly appreciate it if someone could provide detailed steps or point me to resources/tutorials that cover each of these aspects. Any advice, tips, or best practices would be invaluable.