Data Migration Retool
See milestone:"Data Migration Retool" for rationale.
Major Steps
- Save the MS Word Directory file as "Web Page, Filtered"
- Set the "Save for Web" options to ensure we get UTF-8 encoding and CSS styling (which we can easily ignore)
- Use Beautiful Soup to make it well-formed
- Extract the Directory portion (HTML tables)
- Serialize these to XML, adding semantics based on what we know about each type of table
- Apply transforms, regular expressions, etc. in a series of steps to get data that is ready to join with the data that has to come from map-feature extraction in ArcMap, also by way of XML:
  - Coordinates
  - Place type
  - Locative certainty
  - Material (mines and quarries)
  - Orientation (for some types, such as bridges, centuriation, and passes)
  - Identity information necessary to effect matching: Label, GridSquare?, manual disambiguator number
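
The Beautiful Soup, table-extraction and XML-serialization steps above could be sketched in Python roughly as follows. This is a minimal sketch, not the project's code. It assumes the filtered HTML export was saved as UTF-8 and that the caller already knows which type each table is. The table-type name `places`, the column names, and the `<table>`/`<entry>` element layout are hypothetical placeholders, not the real schema.

```python
import xml.etree.ElementTree as ET

from bs4 import BeautifulSoup


def load_directory_html(path):
    """Read the "Web Page, Filtered" export, assuming it was saved as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def extract_tables(html):
    """Parse (possibly malformed) HTML and return every <table> as a list of rows.

    Each row is a list of cell strings with whitespace collapsed; str.split()
    also treats the non-breaking spaces Word emits as whitespace. Rows whose
    cells are all empty are dropped. Nested tables are not handled specially.
    """
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    for table in soup.find_all("table"):
        rows = []
        for tr in table.find_all("tr"):
            cells = [" ".join(td.get_text().split()) for td in tr.find_all(["td", "th"])]
            if any(cells):
                rows.append(cells)
        tables.append(rows)
    return tables


def table_to_xml(rows, table_type, columns):
    """Serialize one table's rows to XML, naming each cell by its (hypothetical) column.

    Cells beyond the listed columns are ignored by zip().
    """
    root = ET.Element("table", {"type": table_type})
    for cells in rows:
        entry = ET.SubElement(root, "entry")
        for name, value in zip(columns, cells):
            ET.SubElement(entry, name).text = value
    return root
```

For example, `ET.ElementTree(table_to_xml(rows, "places", ["label", "grid_square"])).write("places.xml", encoding="utf-8", xml_declaration=True)` would write one table to disk. The sketch uses Beautiful Soup's built-in `"html.parser"` backend so it needs nothing beyond `bs4`. The `lxml` and `html5lib` backends may repair badly nested Word markup differently.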
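
The transform-and-join stage above could be sketched as follows, using only the Python standard library. It runs ordered regular-expression cleanup rules, builds the identity key from label, grid square and manual disambiguator number, and joins Directory entries to map-feature records on that key. Everything schema-specific here is a hypothetical placeholder:

- the rules in `CLEANUP_STEPS`
- the convention that a disambiguator is written as "(n)" after a label
- the Directory-side dictionary keys
- the `<feature>` attribute names (`label`, `grid`, `disambiguator`, `x`, `y`, `type`), which stand in for whatever the ArcMap export actually produces

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical normalization rules, applied in order; the real rules depend on
# what the Directory text actually contains.
CLEANUP_STEPS = [
    (re.compile(r"[\u2013\u2014]"), "-"),  # en/em dashes -> hyphen
    (re.compile(r"\s+"), " "),             # collapse whitespace runs
    (re.compile(r"[.;,]+$"), ""),          # drop trailing punctuation
]

# Hypothetical convention: a manual disambiguator is written as "(n)" after the label.
DISAMBIGUATOR = re.compile(r"^(?P<label>.*?)\s*\((?P<n>\d+)\)$")


def clean(text):
    """Run each cleanup rule over the text in sequence."""
    text = text.strip()
    for pattern, replacement in CLEANUP_STEPS:
        text = pattern.sub(replacement, text)
    return text.strip()


def match_key(label, grid_square, disambiguator=None):
    """Build the (label, grid square, disambiguator) identity key used for joining.

    An explicit disambiguator wins over one embedded in the label; a missing one is 0.
    """
    label = clean(label)
    m = DISAMBIGUATOR.match(label)
    if m:
        label = m.group("label")
        if disambiguator is None:
            disambiguator = m.group("n")
    return (label.casefold(), grid_square.strip().upper(), int(disambiguator or 0))


def join(directory_entries, features_xml):
    """Join Directory entries (dicts) to map-feature records (XML) on the identity key.

    Returns (matched (entry, feature) pairs, unmatched entries, unmatched features).
    A Directory entry whose key was already consumed ends up unmatched.
    """
    features = {}
    for f in ET.fromstring(features_xml).iter("feature"):
        key = match_key(f.get("label", ""), f.get("grid", ""), f.get("disambiguator"))
        if key in features:
            raise ValueError(f"duplicate identity key on the map side: {key}")
        features[key] = dict(f.attrib)

    matched, unmatched = [], []
    for entry in directory_entries:
        key = match_key(entry["label"], entry["grid_square"], entry.get("disambiguator"))
        feature = features.pop(key, None)
        if feature is None:
            unmatched.append(entry)
        else:
            matched.append((entry, feature))
    return matched, unmatched, list(features.values())
```

On a duplicate key, the sketch raises an error instead of silently keeping the last feature. That surfaces exactly the cases the manual disambiguator number exists to resolve. Entries and features that fail to match are returned rather than dropped, so they can be reviewed by hand.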
