I have a txt file with 25,000 words, and each word is followed by a synonym after a "|" symbol. The file is used by software, which is why it is organized this way. The problem is that most of the synonyms are not interchangeable in all situations, so you can't simply swap one for the other in any text without ruining it. For example: worry|interest. These are two different words, and they can only be synonyms in very rare situations.
What I need is to clean this DB and leave only synonyms that can ALWAYS be interchanged in any text without ruining it. Of course, the new word will make a slight difference, but as long as the meaning stays the same and the text remains readable, that is OK for me.
Most words appear in more than one pair, but if only one synonym can truly replace the word in any context, you can leave just that one. For example:
You can leave only continue|proceed, and the rest just need to be deleted.
If a word has no 100% interchangeable synonym, then you can delete all of its pairs.
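For anyone who wants to apply their decisions with a script instead of editing by hand, here is a rough sketch of how that could look. It assumes one word|synonym pair per line, which I have not spelled out above, and the names parse_pairs, filter_pairs and approved are made up for this illustration. The pair continue|last is also an invented sample entry, not taken from the real file. The judgment about which pairs are truly interchangeable still has to come from a fluent human reviewer; the script only applies it.

```python
def parse_pairs(lines):
    """Parse 'word|synonym' lines into (word, synonym) tuples.

    Assumes one pair per line; blank lines and lines without '|' are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or "|" not in line:
            continue
        word, synonym = line.split("|", 1)
        pairs.append((word.strip(), synonym.strip()))
    return pairs


def filter_pairs(pairs, approved):
    """Keep only the pairs a human reviewer marked as always interchangeable."""
    return [pair for pair in pairs if pair in approved]


# Sample input: two pairs mentioned in this posting plus one invented pair.
sample = ["worry|interest", "continue|proceed", "continue|last"]

# Hypothetical reviewer decisions: only this pair survives.
approved = {("continue", "proceed")}

kept = filter_pairs(parse_pairs(sample), approved)
cleaned_lines = [f"{word}|{synonym}" for word, synonym in kept]
print(cleaned_lines)
```

A word with no approved pair simply disappears from the output, which matches the rule of deleting all of its pairs.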
Words and synonyms that are no longer used in modern language (for example, ones left over from Shakespeare's time) also need to be deleted.
The person needs to be fluent in spoken English.
This is one of 4 parts of the job, and there will be 3 more that are exactly the same, continuing the dictionary. I split it up so I can check the quality before I proceed further.
Most of the synonyms will probably be deleted as a result of this job, but that is OK because that is the whole idea of the job, as you read above.