Data Migration Retool

See milestone:"Data Migration Retool" for rationale.

Major Steps

  • Save the MS Word Directory file as "Web Page, Filtered"
    • Set the "Save for Web" options so we get UTF-8 encoding and CSS styling (which we can easily ignore)
  • Use Beautiful Soup to make the HTML well-formed
  • Extract the Directory portion (HTML tables)
  • Serialize these to XML, adding semantic markup based on what we know about each type of table
  • Apply transforms, regular expressions, etc. in a series of steps to get data that is ready to join with the data that must come from map-feature extraction in ArcMap (also via XML):
    • Coordinates
    • Place type
    • Locative certainty
    • Material (mines and quarries)
    • Orientation (for some types like bridges, centuriation, passes)
    • Identity information necessary to effect matching: Label, GridSquare?, manual disambiguator number
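A minimal sketch of the Beautiful Soup and table-extraction steps, in Python. It assumes the filtered HTML file is UTF-8 (per the "Save for Web" options) and that the Directory tables are either the only tables in the file or are picked out afterwards; the function name `extract_tables` is hypothetical.

```python
from bs4 import BeautifulSoup


def extract_tables(html: str) -> list[list[list[str]]]:
    """Return every <table> in the document as a list of rows of cell text."""
    # "html.parser" is the standard-library-backed parser; Beautiful Soup
    # builds a navigable tree from it even when the markup is sloppy.
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.find_all("table"):
        rows = []
        for tr in table.find_all("tr"):
            # Collapse runs of whitespace, since Word tends to insert
            # line breaks and padding inside cells.
            cells = [" ".join(cell.get_text().split())
                     for cell in tr.find_all(["td", "th"])]
            # Skip rows whose cells are all empty (spacer rows).
            if any(cells):
                rows.append(cells)
        tables.append(rows)
    return tables
```

For example, `extract_tables(open("directory.htm", encoding="utf-8").read())`, where `directory.htm` is a hypothetical filename. The built-in `"html.parser"` avoids extra dependencies; `"lxml"` or `"html5lib"` can be passed instead if installed and the Word output proves too messy.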
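The serialization step could use the standard library's `xml.etree.ElementTree`. This is a sketch under assumptions: the element names (`directory`, `table`, `entry`) and the `table_type` and `columns` arguments are hypothetical stand-ins for whatever semantics we assign to each kind of Directory table.

```python
import xml.etree.ElementTree as ET


def tables_to_xml(tables, table_type, columns):
    """Wrap extracted rows in XML, naming each cell after its column.

    table_type and columns are hypothetical: they stand in for what we
    know about a given kind of Directory table.
    """
    root = ET.Element("directory")
    for rows in tables:
        table_el = ET.SubElement(root, "table", type=table_type)
        for row in rows:
            entry = ET.SubElement(table_el, "entry")
            # zip() stops at the shorter sequence: surplus cells are
            # dropped and missing cells simply produce no element.
            for name, value in zip(columns, row):
                ET.SubElement(entry, name).text = value
    return root
```

`ET.ElementTree(root).write("directory.xml", encoding="utf-8", xml_declaration=True)` would then write the result to a (hypothetical) file for the later transform steps.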
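The identity fields listed above are what the join with the ArcMap-derived XML hinges on. The sketch below assumes both sides have already been parsed into dicts with hypothetical keys `label`, `grid`, and `n` (the manual disambiguator number, compared as-is, so both sides must use the same type), and that the normalization in the hypothetical `match_key` (Unicode NFC, collapsed whitespace, case-folding) is acceptable; the real rules may well differ.

```python
import re
import unicodedata
from collections import Counter


def match_key(label, grid_square=None, disambiguator=None):
    """Build a join key from the identity fields (normalization is assumed)."""
    text = unicodedata.normalize("NFC", label)
    # Collapse whitespace with a regular expression, then compare case-insensitively.
    text = re.sub(r"\s+", " ", text).strip().casefold()
    grid = grid_square.strip().upper() if grid_square else None
    return (text, grid, disambiguator)


def join_records(directory, features):
    """Pair Directory entries with map features that share a unique key.

    Returns (matched pairs, unmatched Directory entries, unmatched features).
    """
    def key_of(rec):
        return match_key(rec["label"], rec.get("grid"), rec.get("n"))

    index = {}
    for feat in features:
        index.setdefault(key_of(feat), []).append(feat)
    dir_counts = Counter(key_of(entry) for entry in directory)

    matched, unmatched, used = [], [], set()
    for entry in directory:
        key = key_of(entry)
        candidates = index.get(key, [])
        # Only accept a match that is unique on both sides; anything else
        # needs a manual disambiguator number.
        if len(candidates) == 1 and dir_counts[key] == 1:
            matched.append((entry, candidates[0]))
            used.add(key)
        else:
            unmatched.append(entry)
    leftovers = [f for key, feats in index.items() if key not in used for f in feats]
    return matched, unmatched, leftovers
```

Requiring uniqueness on both sides keeps the join one-to-one; ambiguous labels show up in the unmatched lists, which is where a manual disambiguator number would be assigned before re-running the join.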