Parsing an X12 EDI 'by hand'
So you need to handle an X12 EDI message and likely store its data back into your database. If this is your first time trying to do such a thing, you are probably looking at a structure something like this:
ISA*00*          *00*          *ZZ*AV09311993     *01*030240928      *031023*1758*U*00401*557988899*1*T*:~GS*HS*AV01101957*030240928*20031023*17581205*1*X*004010X092A1~ST*270*0001~BHT*0022*13*19637886*20031023*17581205~HL*1**20*1~NM1*PR*2*HUMANA*****PI*HUMANA~HL*2*1*21*1~NM1*1P*2*Hospital Name*****FI*111111111~HL*3*2*22*0~TRN*1*19637886*3030240928~NM1*IL*1*LastName*FirstName****MI*1111111111~DMG*D8*20000101*F~DTP*472*D8*20031023~EQ*30~SE*13*0001~GE*1*1~IEA*1*557988899~
Note: The word wrapping has been done for readability; a properly formatted message would not have random CRLFs littered about, as is discussed below.
And you're wondering what sadist created this standard, and how to handle it. First, let me tell you ... it gets worse before it gets better. If you intend to do a lot of parsing of EDI, seriously look into BizTalk Server (possibly with the EDI expansion offered by Covast) or one of its competitors. But EDI is easy to work with once you understand the standards by which it is created, so I highly recommend going out to http://www.x12.org and finding the specifications for the type of EDI document you are working with.
An EDI document is just a delimited text file, so the first thing to understand is that EDI has three different delimiters at work in it.
- Field Delimiters (* in the example above)
- Subfield Delimiters (: in the example above)
- Record Delimiters (~ in the example above)
The X12 standard does not set down any rules which state that these delimiters must be certain values. * for fields and ~ for records are very common, but you can't assume that everyone will follow this convention. So if the standard does not set down what the delimiters are, how can you parse this delimited file, you ask?
Well, the folks who set down the X12 EDI standards were bright folks, and while EDI as a whole is delimited, the first record in every file (ISA) is positional as well as delimited. Every field in the ISA record has a set length, and as such you can find data in this first record either by using the delimiters or by counting a fixed number of characters into the record. Now, positional files are more of a bear to deal with in most modern programming languages, so we are going to do as little positional parsing as possible. We just need to get our delimiters, and then we can set about parsing the data in a delimited fashion.
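To make the "positional as well as delimited" point concrete, here is a small Python sketch that reads the same ISA field both ways. The width table is my understanding of the ISA layout, so check it against the specification for your document type; the helper names are made up for this example, and the delimiter lookup relies on the character right after the ISA tag being the field delimiter.

```python
# Widths of ISA01..ISA16 as assumed by this sketch; verify against your spec.
ISA_WIDTHS = [2, 10, 2, 10, 2, 15, 2, 15, 6, 4, 1, 5, 9, 1, 1, 1]

# Rebuild the sample ISA record with its fixed-width padding made explicit.
ISA = "*".join([
    "ISA", "00", " " * 10, "00", " " * 10,
    "ZZ", "AV09311993".ljust(15), "01", "030240928".ljust(15),
    "031023", "1758", "U", "00401", "557988899", "1", "T", ":",
]) + "~"


def isa_field_by_position(isa, number):
    """Return ISA field `number` (1-based) purely by counting characters."""
    # Skip the "ISA" tag and its delimiter (4 characters), then every
    # earlier field together with the delimiter that follows it.
    start = 4 + sum(width + 1 for width in ISA_WIDTHS[: number - 1])
    return isa[start : start + ISA_WIDTHS[number - 1]]


def isa_field_by_delimiter(isa, number):
    """Return ISA field `number` (1-based) by splitting on the delimiter."""
    field_delimiter = isa[3]  # the character right after the "ISA" tag
    # Drop the trailing record delimiter before splitting.
    return isa[:-1].split(field_delimiter)[number]


# Both approaches should agree on, for example, the sender ID (ISA06).
print(repr(isa_field_by_position(ISA, 6)))
print(repr(isa_field_by_delimiter(ISA, 6)))
```

Note that the positional values come back with their padding intact, so you will usually want to trim them before storing them.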
According to the X12 specification, the very last field in the ISA record has a value equal to the subfield delimiter for that file. This field is located at position 105 within the file (counting the first character as position 1), and did I mention it was the last field in the record? This means that position 106 will be equal to the record delimiter within your file, and position 104 is equal to the field delimiter (because every field of data has the delimiter between it and the next, even this one). So to recap, we can find our delimiters at:
- Field Delimiter - Position 104
- Subfield Delimiter - Position 105
- Record Delimiter - Position 106
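As a rough Python sketch of the recap above: the positions are 1-based, so in a language with 0-based string indexing they become indexes 103, 104 and 105. The `Delimiters` type and the `read_delimiters` name are invented for this example, and the sample ISA record is rebuilt with its padding so that it is the full 106 characters.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    field: str
    subfield: str
    record: str


def read_delimiters(edi: str) -> Delimiters:
    """Read the three delimiters out of the fixed-width ISA record."""
    if len(edi) < 106 or not edi.startswith("ISA"):
        raise ValueError("input does not start with a complete ISA record")
    # Positions 104, 105 and 106 are 1-based; Python indexes from 0.
    return Delimiters(field=edi[103], subfield=edi[104], record=edi[105])


# Sample ISA record with the fixed-width padding made explicit.
SAMPLE_ISA = "*".join([
    "ISA", "00", " " * 10, "00", " " * 10,
    "ZZ", "AV09311993".ljust(15), "01", "030240928".ljust(15),
    "031023", "1758", "U", "00401", "557988899", "1", "T", ":",
]) + "~"

print(read_delimiters(SAMPLE_ISA))
```

The length and tag check up front is deliberate: if the file has lost its padding somewhere along the way, the positions no longer line up and it is better to fail loudly than to pick up the wrong characters.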
From this point on, it should be easy sailing. If you are working in a language like Visual Basic, you can use the Split() function to take the entire file and split it by the record delimiter. Then you can take those records and split them by the field delimiter, and finally, if a field contains a subfield delimiter, you can split by that.
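The same three-level split looks roughly like this in Python, using `str.split` in place of Visual Basic's Split(). This is only a sketch: `parse_edi` is a made-up name, and leaving the ISA record's fields unsplit is my own choice, since its last field is the subfield delimiter itself and would otherwise split into two empty strings.

```python
def parse_edi(edi):
    """Split an X12 message into records, fields and subfields."""
    # 1-based positions 104, 105 and 106 of the ISA record.
    field_delim, subfield_delim, record_delim = edi[103], edi[104], edi[105]
    records = []
    for raw in edi.split(record_delim):
        if not raw:
            continue  # the final record delimiter leaves an empty trailing piece
        fields = raw.split(field_delim)
        if fields[0] != "ISA":
            # A field that contains the subfield delimiter becomes a list.
            fields = [
                field.split(subfield_delim) if subfield_delim in field else field
                for field in fields
            ]
        records.append(fields)
    return records


# The sample message from the top of the article, with ISA padding restored.
ISA = "*".join([
    "ISA", "00", " " * 10, "00", " " * 10,
    "ZZ", "AV09311993".ljust(15), "01", "030240928".ljust(15),
    "031023", "1758", "U", "00401", "557988899", "1", "T", ":",
]) + "~"
MESSAGE = ISA + (
    "GS*HS*AV01101957*030240928*20031023*17581205*1*X*004010X092A1~"
    "ST*270*0001~BHT*0022*13*19637886*20031023*17581205~"
    "HL*1**20*1~NM1*PR*2*HUMANA*****PI*HUMANA~"
    "HL*2*1*21*1~NM1*1P*2*Hospital Name*****FI*111111111~"
    "HL*3*2*22*0~TRN*1*19637886*3030240928~"
    "NM1*IL*1*LastName*FirstName****MI*1111111111~"
    "DMG*D8*20000101*F~DTP*472*D8*20031023~EQ*30~"
    "SE*13*0001~GE*1*1~IEA*1*557988899~"
)

for record in parse_edi(MESSAGE):
    print(record[0], record[1:])
```

Each record comes back as a list whose first item is the tag, which makes it easy to dispatch on the tag when you start mapping records to your database.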
A couple of final pieces of advice as you begin to walk down this road:
- Build error checking into your code ... trust me, you'll get badly formed messages and it will save you a lot of time.
- The X12 standard sets down that every record has a tag associated with it (ISA, GS, ST, etc.) which starts the record. When you split your data, be sure to check if the first character of the line is an alpha character. If not, you'll want to trim out any white space characters before continuing. The most common offending character that slips in is the line feed (LF) character. This happens when someone decides to take the more readable path of making a 'return' the record delimiter (which puts each record on its own line, which is nice) but doesn't realize that on their system 'return' is CRLF, hence two characters and a technical violation of the standard. You won't get people to ever realize this, so just code for it, trust me.
- Start your own EDI parsing library; trust me, it will come in handy.
- Seriously look at an Enterprise Application Integration (EAI) package like BizTalk or GEIS or the like if your volume gets very high. They are costly, but once you start getting into parsing the particulars of every different type of message (997, 824, 850, 837I, 837P, 837D, etc.) you will begin to understand why such systems have been developed.
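Here is one way the error checking and white space trimming advice might look in Python. It is a minimal sketch, not a complete validator: `clean_records` is a hypothetical helper, the short GS and ST records are placeholders, and only leading white space is trimmed so that the contents of the fields are left alone.

```python
def clean_records(edi):
    """Split on the record delimiter, trim stray white space, check tags."""
    if len(edi) < 106 or not edi.startswith("ISA"):
        raise ValueError("input does not start with a complete ISA record")
    record_delim = edi[105]  # 1-based position 106
    cleaned = []
    for number, raw in enumerate(edi.split(record_delim), start=1):
        # Drop a CR or LF (or other white space) left over from a 'return'
        # that was typed after the real record delimiter.
        record = raw.lstrip()
        if not record:
            continue  # nothing but white space after the last delimiter
        if not record[0].isalpha():
            raise ValueError(
                f"record {number} does not start with a tag: {record!r}"
            )
        cleaned.append(record)
    return cleaned


# A message where every record delimiter is followed by CRLF.
ISA = "*".join([
    "ISA", "00", " " * 10, "00", " " * 10,
    "ZZ", "AV09311993".ljust(15), "01", "030240928".ljust(15),
    "031023", "1758", "U", "00401", "557988899", "1", "T", ":",
]) + "~"
WRAPPED = ISA + "\r\nGS*HS*AV01101957~\r\nST*270*0001~\r\n"

for record in clean_records(WRAPPED):
    print(record)
```

Raising an exception that names the offending record number makes badly formed messages much quicker to track down than a parser that silently produces garbage.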