Parsing a line in .CSV file into fields

  • 15 years ago

    VB6 introduced the split() function, which will return an array of strings, one for each delimited field in a string, with a user-selectable delimiter.


    Given...


     John Jones, 123 Main St., Dayton OH 45402


    ...in a string (a line collected by Line Input #1,  sRecord for example), then aFields = Split(sRecord) gives these array elements in aFields...


     John Jones
     123 Main St.
     Dayton OH 45402


    ...but the problem comes with real-world .CSV files, which may use quotes to surround fields containing commas as data. Given...


     "John Jones, Sr.", 123 Main St., "Dayton, OH 45402"


    ...the Split function returns the following...


     "John Jones
      Sr."
     123 Main St.
     "Dayton
      OH 45402"


    ...which is not useful. I have code to do the painful parsing myself at the moment, but I have to believe there's an API call somewhere, or maybe a DLL or ActiveX control one can buy or download which does this? Even VB itself contains code to parse this correctly, because if you loop through Input #1, sField statements for each field, the quoted fields will "stick together"  (and the quotes are conveniently discarded), but reading such a file like that will cause your program to come completely unglued (you get out of sync with the fields) the first time you hit a record with too few or too many commas.


    Has anybody seen a component one can download or buy which will assist in parsing this common file format?


    Thanks.


    - Chuck Somerville

  • 15 years ago

    Why don't you find and remove the quotes before spliting the string. You can do this using the Instr() function.


    Regards, Programmer

  • 15 years ago

    Quote:
    [1]Posted by Programmer on 20 Apr 2005 06:44 PM[/1]
    Why don't you find and remove the quotes before spliting the string. You can do this using the Instr() function.


    Regards, Programmer



    The quotes are not the problem. The commas within fields which are data, not delimiters are the problem.


    If I did what you suggest, in second my example I would then have the aFields array containing...


     John Jones
     Sr.
     123 Main St.
     Dayton
     OH 45402


    ...which is still not useful. (The file's records are supposed to contain three fields and I get  five fields out of this one.) Oh by the way, the commas after "Jones" and "Dayton" are now gone - removed by Split() - and they were part of the data.


    - Chuck

  • 15 years ago

    If have searched Google and found this http://www.freevbcode.com/ShowCode.asp?ID=3110  


    Open the page and click on source/clsCSV.zip to download the file.


    Hope it helps.


    Regards, Programmer

  • 15 years ago

    Hi Programmer,


    Thanks for the pointer. I had reviewed the list of items on that web site but somehow missed that one.


    Unfortunately, that guy's code is an overly complex wrapper on the VB6 Split() function and does nothing to the data other than applying Split() - it does nothing to handle commas as data within quoted fields.


    Oh well...  I'll keep on looking.


    If I get motivated enough, maybe I'll write the DLL in C (or maybe assembler) and put it in the Public Domain myself. I can't be the only one who has ever needed to read .CSV files.


    Cheers,


    - Chuck Somerville

  • 15 years ago


    Hi Chuck,


    Is this line of your data correct ?


     "John Jones, Sr.", 123 Main St., "Dayton, OH 45402"


    I would expect the 123 Main street to have quotes around it ?


    If so one can write a routine to using instr to split the string up looking for "," as the separtor. I have done it my self.


    Denis

  • 15 years ago
    Quote:
    [1]Posted by denis on 21 Apr 2005 04:58 PM[/1]

    Hi Chuck,

    Is this line of your data correct ?

     "John Jones, Sr.", 123 Main St., "Dayton, OH 45402"

    I would expect the 123 Main street to have quotes around it ?

    If so one can write a routine to using instr to split the string up looking for "," as the separtor. I have done it my self.

    Denis



    Hi Denis,

    Yeah, real-world files of this type can often be found where the writing software only quote-surrounds the fields which "need" it.

    Even BASIC's Write statement (which makes CSV file records) quotes strings and not numbers.

    Cut-and-pasted from a Command Prompt window ("DOS box")...

    Code:

     C:\>type test.bas

     DEFINT I
     DEFSTR S

     s1 = "Abc, Def"
     s2 = "Uvwxyz"
     i1 = 34
     i2 = 0

     OPEN "test.txt" FOR OUTPUT AS 1

     FOR i = 1 TO 5
         WRITE #1, i, s1, i1, s2, i2
     NEXT i

     CLOSE

     END


     C:\>qbasic /run test.bas

     C:\>type test.txt
     1,"Abc, Def",34,"Uvwxyz",0
     2,"Abc, Def",34,"Uvwxyz",0
     3,"Abc, Def",34,"Uvwxyz",0
     4,"Abc, Def",34,"Uvwxyz",0
     5,"Abc, Def",34,"Uvwxyz",0

     C:\>


    Sorry about the <code> formatting... the <pre> formatting turns out to triple-space the text!  (Yikes!)

    Cheers,

    - Chuck
  • 15 years ago
    Hi Chuck

    I sympathise with you - been trying to solve this myself.
    I'm not that familiar with VB6 but this function I found on the net works beautifully in vbscript Hope you find it useful.

    http://cwashington.netreach.net/depo/view.asp?Index=320&ScriptType=vbscript

    Cheers
    Nadine

Post a reply

Enter your message below

Sign in or Join us (it's free).

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“God could create the world in six days because he didn't have to make it compatible with the previous version.”