csvmedkit


Figuring out what’s in the NHTSA’s safety-related defect complaint database

  • Landing page: https://www-odi.nhtsa.dot.gov/downloads/
  • README: https://www-odi.nhtsa.dot.gov/downloads/folders/Complaints/CMPL.txt
  • Direct download (250MB+): https://www-odi.nhtsa.dot.gov/downloads/folders/Complaints/FLAT_CMPL.zip
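
A minimal sketch of fetching and unpacking the archive with Python's standard library. The helper names `download` and `extract_first_member` are hypothetical, the destination path mirrors the `examples/real/nhtsa-complaints.txt` file used in the command further down, and the sketch assumes the zip holds a single data file (check the README if that turns out not to be the case).

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# Direct-download URL listed above.
ZIP_URL = "https://www-odi.nhtsa.dot.gov/downloads/folders/Complaints/FLAT_CMPL.zip"


def download(url: str, dest: Path) -> Path:
    """Stream `url` to `dest` so the 250MB+ archive is never held fully in memory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def extract_first_member(zip_path: Path, dest: Path) -> Path:
    """Copy the first file in the archive to `dest`.

    Assumption: the archive contains one data file; directories are skipped.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        members = [m for m in zf.infolist() if not m.is_dir()]
        if not members:
            raise ValueError(f"no files found in {zip_path}")
        with zf.open(members[0]) as src, open(dest, "wb") as out:
            shutil.copyfileobj(src, out)
    return dest


# Usage (left commented out because it downloads 250MB+):
# zip_path = download(ZIP_URL, Path("examples/real/FLAT_CMPL.zip"))
# extract_first_member(zip_path, Path("examples/real/nhtsa-complaints.txt"))
```

Streaming with `shutil.copyfileobj` keeps memory use flat for both the download and the extraction, which matters for a file of this size.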

Skimming the structure

TKTKTK

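The file is tab-delimited and has no header row, so to look at the first record we give it generic column names and flatten it into one field per line: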

    $ head -n 1 examples/real/nhtsa-complaints.txt | csvformat -t | csvheader --HM | csvflatten -P


    | field    | value                                                     |
    | -------- | --------------------------------------------------------- |
    | field_1  | 1                                                         |
    | field_2  | 958173                                                    |
    | field_3  | Ford Motor Company                                        |
    | field_4  | LINCOLN                                                   |
    | field_5  | TOWN CAR                                                  |
    | field_6  | 1994                                                      |
    | field_7  | Y                                                         |
    | field_8  | 19941222                                                  |
    | field_9  | N                                                         |
    | field_10 | 0                                                         |
    | field_11 | 0                                                         |
    | field_12 | SERVICE BRAKES, HYDRAULIC:PEDALS AND LINKAGES             |
    | field_13 | HIGH LAND PA                                              |
    | field_14 | MI                                                        |
    | field_15 | 1LNLM82W8RY                                               |
    | field_16 | 19950103                                                  |
    | field_17 | 19950103                                                  |
    | field_18 |                                                           |
    | field_19 | 1                                                         |
    | field_20 | BRAKE PEDAL PUSH ROD RETAINER WAS NOT PROPERLY INSTALLED, |
    |          | CAUSING BRAKES TO FAIL, RESULTING IN AN ACCIDENT AFTER    |
    |          | RECALL REPAIRS (94V-129). *AK                             |
    | field_21 | EVOQ                                                      |
    | field_22 |                                                           |
    | field_23 |                                                           |
    | field_24 |                                                           |
    | field_25 |                                                           |
    | field_26 |                                                           |
    | field_27 |                                                           |
    | field_28 |                                                           |
    | field_29 |                                                           |
    | field_30 |                                                           |
    | field_31 |                                                           |
    | field_32 |                                                           |
    | field_33 |                                                           |
    | field_34 |                                                           |
    | field_35 |                                                           |
    | field_36 |                                                           |
    | field_37 |                                                           |
    | field_38 |                                                           |
    | field_39 |                                                           |
    | field_40 |                                                           |
    | field_41 |                                                           |
    | field_42 |                                                           |
    | field_43 |                                                           |
    | field_44 |                                                           |
    | field_45 |                                                           |
    | field_46 | V                                                         |
    | field_47 |                                                           |
    | field_48 |                                                           |
    | field_49 |                                                           |
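
The same idea can be sketched in plain Python, which may help when the command-line tools are not installed. This is an illustration only, not how csvflatten is implemented: the function names `first_record_as_pairs` and `format_pairs` are hypothetical, the sample line holds just the first six values of the record shown above, and long values are not wrapped the way csvflatten wraps `field_20`.

```python
import csv
import io

# First six values of the record shown above, tab-delimited like the source file.
SAMPLE = "1\t958173\tFord Motor Company\tLINCOLN\tTOWN CAR\t1994\n"


def first_record_as_pairs(lines, delimiter="\t"):
    """Return [(field_N, value), ...] for the first record of headerless delimited text."""
    reader = csv.reader(lines, delimiter=delimiter)
    row = next(reader, [])  # an empty input yields no pairs
    return [(f"field_{i}", value) for i, value in enumerate(row, start=1)]


def format_pairs(pairs):
    """Render the pairs as a two-column pipe table, one field per row."""
    rows = [("field", "value")] + list(pairs)
    name_width = max(len(name) for name, _ in rows)
    value_width = max(len(value) for _, value in rows)
    lines = []
    for index, (name, value) in enumerate(rows):
        lines.append(f"| {name.ljust(name_width)} | {value.ljust(value_width)} |")
        if index == 0:
            # Divider between the header row and the data rows.
            lines.append(f"| {'-' * name_width} | {'-' * value_width} |")
    return "\n".join(lines)


print(format_pairs(first_record_as_pairs(io.StringIO(SAMPLE))))
```

To run it against the real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("examples/real/nhtsa-complaints.txt", newline="")`; the encoding of that file is not stated here, so it may need an explicit `encoding=` argument.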

© Copyright 2020, Dan Nguyen Revision 0bd55494.
