EBCDIC Layout and Conversion

A Primer on COBOL Copybooks, EBCDIC to ASCII, and SQL Databases

EBCDIC data often needs to be put into a relational database like Access®, SQL Server™ or MySQL. However, that conversion is usually not trivial. This page discusses some of the many issues that arise when preparing EBCDIC data for a relational database, including:

Conversion to ASCII for Database

EBCDIC files can have complex formatting specified by "COBOL Copybook" layout files (like variable-length records) not supported by most databases. Also, different types of records can be intermixed into one data file.

Before putting data like that into a database, you need to convert it to a database-friendly format, like ASCII, creating a file for each database table. EBCDIC and ASCII are also very different and converting between them can be complicated.

Because of those and other differences, converting from EBCDIC to ASCII usually requires special software like the vEdit EBCDIC Converter Packages.

EBCDIC to ASCII to SQL Database Flowchart

Character Maps

The machine codes for displayable characters differ between EBCDIC and ASCII. To convert even the simplest alpha-numeric file, you must at least convert the character codes.

For example, the word "TEXAS" in EBCDIC is represented by the hex values <E3 C5 E7 C1 E2>. The same word in ASCII is <54 45 58 41 53>. If you don't convert the character codes, the data doesn't make sense.

If you simply FTP your EBCDIC file to your PC without converting it to ASCII using a program like vEdit, it will be gibberish.

An EBCDIC file is gibberish if read by an ASCII system without translating the character codes.

Delimiters and Layout Files

The layout of EBCDIC files is positional. There are no special "delimiter" characters to separate data columns from each other and no end-of-line ("EOL") characters to mark the end of each row of data.

When converting an EBCDIC file to ASCII, it helps to know ahead of time where everything is located. Otherwise, it can be difficult to convert the data. The position and length of each field is specified in a separate "layout" file; the most common type is a "COBOL copybook".

By contrast, ASCII data files are usually set up to use delimiters like commas to separate fields and EOL characters (like <CR> or <LF>) to separate records. If you need to know field names, you can put them as comma-separated values in the first line of the file. That takes up more space, but does not require a separate layout file and is much easier to import into a spreadsheet like Excel or a database like SQL Server.

Converting EBCDIC data with a layout file to ASCII requires specialized software, like one of the vEdit EBCDIC Converter Kits or a custom turn-key solution.

EBCDIC to ASCII Layout File Fields
The layout file specifies four fields in this EBCDIC data: a 7-character alpha-numeric, a 2-digit numeric, a 9-digit number (2 after the decimal) and a 5-character alpha-numeric.

REDEFINES Clauses

COBOL copybooks support REDEFINES clauses that change some of the fields in a record, based on the value of another field. The total number of bytes in each record of a REDEFINES block has to be the same, so some records require filler (empty) bytes.

Sometimes records are redefined using multiple layout files instead of a REDEFINES clause, in which case records of different lengths are intermixed in one file.

For example, an EBCDIC file for a bank might have a block of data with transactions that occurred during a certain period of time. The various types of transactions (e.g., deposit, withdrawal, balance inquiry) would have some identical fields (e.g., date, account and teller), but they would also have some different fields.

To put this information into a relational database, you need to group the records into blocks of data that have the same fields. Then you convert the EBCDIC data to ASCII files, one for each type of transaction, and import those files into corresponding database tables.

The process might be easier to understand in the reverse: imagine merging various teller transactions into one data block, thus creating the EBCDIC data. Putting that data into database tables requires separating it by transaction type.

To create the various ASCII files from such EBCDIC data, you need specialized software like a vEdit Level-3 or 4 EBCDIC Converter Kit that supports REDEFINES clauses. We can also build a turn-key solution that you run or we can perform the conversion for you.

EBCDIC to ASCII REDEFINES Clauses
Converting EBCDIC Data with REDEFINES clauses to ASCII files for import to SQL database tables

OCCURS ... DEPENDING ON Clauses

The COBOL copybook layout also supports OCCURS ... DEPENDING ON clauses, which define variable-length arrays in records. The fixed and variable-length portions of records like that must be split into separate ASCII files for import to a relational database.

The EBCDIC files shown here describe the name, suite number and phone extension, as well as the names and wages of the employees in, each department of a company. When converting the data to ASCII, you would put the fixed department info in one file and the employee info in another. You would also need an additional field (like the department name) in the employee file so you know which department each is in. Then you can import the ASCII files to your database tables.

To make this all work, you need specialized software like the vEdit Level-4 EBCDIC Converter Kit or a custom turn-key solution that can handle OCCURS ... DEPENDING ON clauses. If you don't want to run the conversion yourself, you can hire us to do it for you.

EBCDIC to ASCII OCCURS Clauses
Converting EBCDIC Data with OCCURS DEPENDING ON clauses to ASCII files for import to SQL database tables

Packed Decimal Numbers Binary-Coded-Decimal (BCD) and COMP-3

Packed decimal, sometimes called Packed Binary-Coded-Decimal (BCD) or COMP-3, is a scheme used for decades in IBM and other mainframe computers to reduce the number of bytes it takes to represent numbers. Since each digit (0-9) in a number can be represented by 4 bits (a half-byte, or "nibble"), every two digits can be "packed" into one byte (8 bits). The first nibble is one of the digits and the second nibble is the other.

Packed BCD uses one nibble (the "zone") to express the sign of the number (positive or negative). Another packed encoding called "Unisys COMP-3" does not contain a zone, so the numbers are unsigned.

Other packed schemes: Occasionally, an EBCDIC file will have pure binary fields, which you can't read just by looking at the data. Mainframe programs written in assembly language instead of COBOL can pack numbers in yet other ways. We can help you even with those types of files.

Without a layout file that describes which fields (if any) are packed and how, it would be very difficult to read the data.

vEdit EBCDIC Packages Levels 2, 3 and 4 can all convert packed EBCDIC data to ASCII using the layout file. If you're not able to do the setup work on your own, we can build a custom turn-key solution for you or even do the entire conversion for you.

EBCDIC Packed Decimal vs. ASCII
EBCDIC Packed BCD (comp-3) uses half the space to represent the same number, but you need the layout information to know where the decimal is located.