So, I'm not a great programmer. Calling myself a novice would be overstating my abilities. More like, I understand it... a bit. But where my knowledge falls off, it falls off fast. I'm working on a task at work that involves a large text file with multiple entries separated by a carriage return/line feed (CRLF). Each entry contains some text I need to extract. Right now we do it manually with copy and paste, which takes an exorbitantly long time (we're doing this 4,000 to 5,000 times). I would really like to split each entry into two strings: one that gets deleted, and another that gets saved to a new file (or kept in the same file without the first part). Generally, the split point is a dash followed by a space, like this: "- ". That marks where the introductory text ends and the "features", which are what I need to extract, begin. Here is an example of what I'm working with:
Berkley Fusion19 hooks are targeted to everyone, from the novice to the avid angler. The Heavy Cover hook is an extremely strong hook used for flipping into the heaviest cover. The Heavy Cover hook features a stainless steel bait keeper designed to stay rigged cast after cast. Each front of every package provides soft-bait recommendations.Features:- The Heavy Cover flipping hook sizes include 6/0 to 3/0- Needle point with SlickSet Coating for easier penetration- Stainless steel wire bait keeper- Closed eyelet for line securitySpecifications:- Hook Size: 4/0- Color: Smoke Satin- Quantity: Per 4
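A minimal Python sketch of the two-string split described above, assuming the file is UTF-8 and that the introductory text never contains "- " itself. The names `read_entries` and `split_entry` are hypothetical, made up for illustration. It reads the file, splits it into entries on CRLF, and uses `str.partition` to cut each entry at the first "- " into the part to delete and the part to keep:

```python
def read_entries(path: str) -> list[str]:
    """Split the file into entries on CRLF, skipping blank entries."""
    # newline="" stops Python from translating "\r\n" into "\n" while reading
    with open(path, encoding="utf-8", newline="") as f:
        return [e for e in f.read().split("\r\n") if e.strip()]


def split_entry(entry: str) -> tuple[str, str]:
    """Split one entry at the first "- " into (part to delete, part to keep)."""
    before, sep, after = entry.partition("- ")
    if not sep:
        return entry, ""  # no "- " delimiter: nothing worth keeping
    return before, after
```

On the example entry, the first part ends with "...soft-bait recommendations.Features:" and the second starts with "The Heavy Cover flipping hook sizes include 6/0 to 3/0". Note that "soft-bait" does not trigger the split, because its dash is not followed by a space.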
What I need to do is extract everything between "Features" and "Specifications". But the extraction can't rely on "Specifications" to stop, because not every entry has that heading; some have no specifications at all. The same goes for "Features": I can't use that word as the start marker, because some entries don't include it, and sometimes "Specifications" has to serve as the marker instead. The one consistent thing is that features always come before specifications. So I figured it might be possible to skip the text up to the first "- ", then extract from there until the end of the entry (the next CRLF), unless the word "specification" appears first, in which case the extraction stops there.
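The rule above can be sketched in Python roughly like this. It is a sketch under stated assumptions, not a tested tool: the file is assumed to be UTF-8, the intro text is assumed not to contain "- ", and `extract_features`, `write_features`, and `STOP_WORD` are hypothetical names. Because the search for "specification" only looks at the text after the first "- ", an entry whose list starts with "Specifications:- ..." keeps its specifications, which covers entries that have no features section:

```python
import re

# Assumed stop word, matched case-insensitively so "Specifications:",
# "SPECIFICATIONS" and "specification" all end the extracted text.
STOP_WORD = re.compile(r"specification", re.IGNORECASE)


def extract_features(entry: str) -> str:
    """Return the text after the first "- ", cut off at "specification" if present."""
    _, sep, rest = entry.partition("- ")
    if not sep:
        return ""  # no "- " at all: nothing to extract from this entry
    match = STOP_WORD.search(rest)
    if match:
        rest = rest[: match.start()]  # drop "Specifications..." and everything after it
    return rest.strip()


def write_features(src: str, dst: str) -> None:
    """Write one extracted line per non-blank CRLF-separated entry of src to dst."""
    # newline="" keeps "\r\n" intact on read and writes it back unchanged
    with open(src, encoding="utf-8", newline="") as f:
        entries = [e for e in f.read().split("\r\n") if e.strip()]
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write("\r\n".join(extract_features(e) for e in entries))
```

Calling `write_features("products.txt", "features.txt")` (placeholder file names) would produce one line per entry; for the example entry that line is "The Heavy Cover flipping hook sizes include 6/0 to 3/0- Needle point with SlickSet Coating for easier penetration- Stainless steel wire bait keeper- Closed eyelet for line security". One caveat: a feature that itself mentions the word "specification" would cut the extraction short, so it is worth spot-checking the output.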
This is somewhat confusing to explain, so I'll understand if I can't get any help. But if someone could help, it would be awesome and would literally save me a month of monotonous work that could be spent on other tasks.