agrc / forklift

:tractor::package::sparkles: Slinging data all over the place :tractor::package::sparkles:

License: MIT License
Most of the pallets I have been converting have different settings for dev/test/prod. Currently, this is only possible with forklift if we run the pallet individually. If we add a property to the config that holds the configuration name, or add optional args to the CLI, we can pass that value to the new build method that we discussed in #65. Optional args can still be passed to the __init__ method when a pallet is run individually.

We should be able to run all pallets as dev, stage, or prod without having to modify the pallets.
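A minimal sketch of how that could look, assuming the build method from #65 receives the configuration name (the configuration values and workspace paths here are hypothetical):

from forklift.models import Pallet


class ConfigurablePallet(Pallet):
    def build(self, configuration='Production'):
        #: configuration would come from the config property or a CLI arg (assumption)
        workspaces = {'Dev': 'Database Connections\\dev.sde',
                      'Staging': 'Database Connections\\stage.sde',
                      'Production': 'Database Connections\\prod.sde'}
        source_workspace = workspaces[configuration]

        self.add_crates(['SGID10.GEOSCIENCE.AvalanchePaths'],
                        {'source_workspace': source_workspace,
                         'destination_workspace': 'C:\\MapData\\output.gdb'})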
The logging situation is not ideal right now. The handler rotates on a sliding window based on the last write, and with our current setting it is easy for the log to get cut off. If we switch our handler to use the midnight setting and schedule forklift to run after midnight, our logs will be grouped properly.
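For reference, a sketch of the midnight setting using the standard library (the filename and backupCount are placeholders):

import logging
import logging.handlers

#: rotate the log at midnight instead of on a sliding window from the last write
handler = logging.handlers.TimedRotatingFileHandler(
    'forklift.log', when='midnight', backupCount=7)

log = logging.getLogger('forklift')
log.addHandler(handler)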
from arcpy import env
from forklift.models import Pallet


class SimplePallet(Pallet):
    def __init__(self):
        super(SimplePallet, self).__init__()

        destination_workspace = env.scratchGDB
        source_workspace = 'Database Connections\\[email protected]'

        #: source name, source workspace, destination workspace, destination name
        crate_info = ('SGID10.GEOSCIENCE.AvalanchePaths', source_workspace, destination_workspace, 'AvyPaths')

        self.add_crate(crate_info)
I would expect crate creation to be source name, source workspace, destination name, destination workspace. Instead we have a mirror of params; without looking at this tuple, it's hard to know how to create a crate properly.
It looks as if multiple pallets defined in the same module are executed in alphabetical order. This should be verified and pinned down with unit tests, since some projects will depend on a consistent execution order (e.g. DEQ).
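A test along these lines could pin that down (list_pallets and the test module path are assumptions about forklift's internals):

import unittest

from forklift import lift


class TestPalletOrder(unittest.TestCase):
    def test_pallets_execute_in_alphabetical_order(self):
        #: multiple_pallets.py would define several pallet classes (hypothetical fixture)
        pallets = lift.list_pallets('tests/data/multiple_pallets.py')
        names = [pallet.__class__.__name__ for pallet in pallets]

        self.assertEqual(names, sorted(names))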
I would expect to see the result of each crate.
Only the message from each crate is displayed.
It should be reported as running successfully, perhaps with a message that says "This pallet only runs on Fridays".
It shows up as an error with a message of "None".
Part of the issue is this change, which will always set the success value of the pallet to false if it's not Friday. @steveoh, why did we add the check on is_ready_to_ship to the report? I can't think of a good reason. I think that we should remove this extra check and set the success (boolean and message) in the pallet, something like this:
from time import strftime


def is_ready_to_ship(self):
    ready = strftime('%A') == 'Friday'

    if not ready:
        self.success = (True, 'This pallet only runs on Fridays')

    return ready
Run forklift init. I would expect config.json to be created in the folder from which forklift was invoked. Instead, config.json is created in site-packages. This is nice in that it is in the same place no matter where you run forklift from, but it's a pain to dig into that folder to take a look at the config file. It's also a bad place to be maintaining things in general. I think that it would be better to have it closer to where you are in the file system.
The crate should be ignored and a warning should be sent from forklift. Instead, we get nothing but exceptions from arcpy.
We need to decide exactly how we want the backups generated and where they should be stored.
Try to update a feature class that has a longer text field in the source than in the destination (same name). I would expect the tool to warn me about the difference in field length, but it doesn't; no schema change error is reported. We should look at field type, length, scale, and precision (see the sketch below).
Ported from agrc/agrc.python#3.
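A hedged sketch of that comparison using arcpy.ListFields (the function name and return shape are assumptions):

import arcpy


def _get_field_differences(source, destination):
    '''returns a list of (field name, property, source value, destination value) mismatches'''
    differences = []
    destination_fields = {field.name: field for field in arcpy.ListFields(destination)}

    for source_field in arcpy.ListFields(source):
        destination_field = destination_fields.get(source_field.name)
        if destination_field is None:
            continue

        #: compare the properties called out above
        for prop in ['type', 'length', 'scale', 'precision']:
            source_value = getattr(source_field, prop)
            destination_value = getattr(destination_field, prop)

            if source_value != destination_value:
                differences.append((source_field.name, prop, source_value, destination_value))

    return differences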
When attempting to run forklift for a specific pallet, it failed to copy the data to copyDestinations because a service associated with another pallet was locking the same database. I wonder if _hydrate_copy_structures could be adjusted to populate source_to_services for all pallets even when forklift is run on a single pallet. That way, when it goes to copy the data, it would stop all services pointing to the database, even those not associated with the pallet that was run.
From the Windows scheduler on .56.
All are daily unless otherwise noted
The rest of these are in SGID10.gdb...
Define a new property on Pallet called copy_data (a list) that tells forklift which workspaces you want copied to the production servers. This would default to [], in which case no data would be copied.

Add a new property to the config called copy_destinations (a list) that defines where the data (defined by copy_data in Pallet) should be copied after processing.

Implement a new private method in lift.py that loops through all of the pallets, generates a distinct list of all of the workspaces that need to be copied, and copies them to copy_destinations (see the sketch below).
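A minimal sketch of that private method, assuming file geodatabase workspaces that can be copied as folders (the method name is hypothetical):

from os import path
import shutil


def _copy_to_destinations(pallets, copy_destinations):
    #: build a distinct list of workspaces across all pallets
    workspaces = set()
    for pallet in pallets:
        workspaces.update(pallet.copy_data)

    for workspace in workspaces:
        for destination in copy_destinations:
            destination_workspace = path.join(destination, path.basename(workspace))

            if path.exists(destination_workspace):
                shutil.rmtree(destination_workspace)

            shutil.copytree(workspace, destination_workspace)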
Use a __ prefix for all pallet properties and private methods to avoid unexpected overwrites when creating your own pallet.
Collide all the things.

Create a crate like ('FCName', 'FGDB.gdb', 'SGID Stage.sde', 'SGID10.OWNER.FCName'). I would expect SGID Stage.sde/SGID10.OWNER.FCName to be updated or created. Instead, SGID Stage.sde/SGID10.OWNER.SGID10_OWNER_FCName is updated or created. :(

I propose that the . -> _ replacement only be done if destination_name is not passed into the constructor (see the sketch below).
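A sketch of the proposed guard in the Crate constructor (parameter order matches the tuple above; attribute names are assumptions):

class Crate(object):
    def __init__(self, source_name, source_workspace, destination_workspace, destination_name=None):
        #: only fall back to the . -> _ replacement when no explicit name is passed
        self.destination_name = destination_name or source_name.replace('.', '_')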
Copy a feature class that has a Float field (shows as Single in code) to an SDE database, where the field is stored as Double, and curse ESRI. The schema check for the crate should pass since it's the same feature class and no changes to the schema have been made. Instead, the schema check fails and reports something like: AVE_LENGTH: source type of Single does not match destination type of Double.
The issue is in this line of code. I propose that we run something like this before checking the field types:

if not isTable:
    arcpy.MakeFeatureLayer_management(sdeFC, layer, '1 = 2')
else:
    arcpy.MakeTableView_management(sdeFC, layer, '1 = 2')

try:
    arcpy.Append_management(layer, f, 'TEST')
    passed = True
except arcpy.ExecuteError:
    #: go on to checking the field types and lengths for the report
    pass
I've tested Append_management against the Single/Double issue and it runs successfully.
We should pull counties into the test SDE and see what is happening to make it always think its shape has changed.
I vote for docopt because it's awesome.
brainstorming ideas...
'''
forklift

Usage:
    forklift update [--config=<config>]
    forklift update-only <path> [--plugin=<plugin>]

Options:
    --config=<config>    the path to a cfg or text file where we keep paths to places with update plugins; defaults to some relative or static path
    --plugin=<plugin>    the name of the plugin used to filter execution; maybe a partial match, glob, or exact match?

Arguments:
    <path>    an optional path to pass in so you can run a certain directory
'''
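Wiring that docstring up with docopt would look something like this (run_all and run_single are hypothetical helpers):

from docopt import docopt


def main():
    #: docopt parses sys.argv against the Usage section of the docstring
    args = docopt(__doc__)

    if args['update']:
        run_all(config=args['--config'])
    elif args['update-only']:
        run_single(args['<path>'], plugin=args['--plugin'])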
I may be missing some, and some should be removed because they do not use FGDBs, etc.
I would love a flag that I can use to print debug logs to the console when running forklift from the command line. Something like -v or --debug.
It should be stored locally on each server. This will help with agrc/locate#94 and saves us from having to manage them in the source code.
We need to test ArcGIS Server to see what happens when we remove and copy data while it has a process spooled up. If that works, this issue can be closed. Otherwise, a Pallet needs a property with the path to the service so that it can be used to restart the service. I have a hunch that we can remove and copy the fresh data and then restart the service; I'm hoping we do not need to stop the service before removing and copying. The create schema lock setting might need to be set for all services. Just an idea.
Right now update.py has a lot of public methods. We should tighten those up when we create core.py from it. I think we should pull update.py out of agrc.python and rename it to core.py.
A crate can ask for sgid10.boundaries.counties or counties, and the county feature class should end up in a boundaries.gdb. Instead, everything ends up in SGID10.gdb.
I think the problem here is OBJECTID_1 and ORDER BY OBJECTID:
RuntimeError: An invalid SQL statement was used. [SELECT OBJECTID_1, CustomerInfo_FK, Program, ContactPerson, GUID, DateFromPurchasing, ContactPhone, OriginalContractAmount, PDFDocument, ConEffectiveDate, ContactEmail, Fund, ReasonForRejection, PracticeMapPDF, NRCSNum, CancelDate, ArchPDF, Notes, UDAFContractNum, Dept, NEPAPDF, Cancelled, Project_FK, ScheduleOfOperationsPDF, ContractAmount, ReasonForCancellation, GrantCategory, ContractStatus, GranteeStatus, ConExpirationDate, OrgUnit, ManagerNotes, DateReceived, CostShareRate, ContractType, AppUnit, DateToPurchasing FROM ContractInformation ORDER BY OBJECTID]
The destination table should be created on the first run with an OBJECTID field, since it's being created within a geodatabase. Then the destination table should validate on subsequent runs. On the first run, the destination table is created as expected. However, on subsequent runs, core.py reports OBJECTID as a missing field and throws an exception when trying to check for changes. See below for the console output...
INFO 06-20 07:48:34 lift: 49 crate: interactive_map_monitoring_sites
WARN 06-20 07:48:35 core: 174 Missing fields in \\tsclient\stdavis\Documents\Projects\deq-enviro\scripts\nightly\settings\..\databases\eqmairvisionp.sde\AVData.dbo.interactive_map_monitoring_sites: OBJECTID
ERRO 06-20 07:48:37 core: 63 unhandled exception: Attribute column not found[42S22:[Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'OBJECTID'.] for crate { 'destination': 'C:\\Scheduled\\staging\\DEQEnviro\\TempPoints.gdb\\interactive_map_monitoring_sites',
'destination_coordinate_system': <SpatialReference object at 0xc3a2930[0xc3393f8]>,
'destination_name': 'interactive_map_monitoring_sites',
'destination_workspace': 'C:\\Scheduled\\staging\\DEQEnviro\\TempPoints.gdb',
'geographic_transformation': 'NAD_1983_To_WGS_1984_5',
'result': ( 'This crate was never processed.',
None),
'source': '\\\\tsclient\\stdavis\\Documents\\Projects\\deq-enviro\\scripts\\nightly\\settings\\..\\databases\\eqmairvisionp.sde\\AVData.dbo.interactive_map_monitoring_sites',
'source_name': 'AVData.dbo.interactive_map_monitoring_sites',
'source_workspace': '\\\\tsclient\\stdavis\\Documents\\Projects\\deq-enviro\\scripts\\nightly\\settings\\..\\databases\\eqmairvisionp.sde'}
Traceback (most recent call last):
File "C:\Python27\ArcGIS10.3\lib\site-packages\forklift\core.py", line 56, in update
if _has_changes(crate):
File "C:\Python27\ArcGIS10.3\lib\site-packages\forklift\core.py", line 269, in _has_changes
for destination_row, source_row in izip(f_cursor, sde_cursor):
RuntimeError: Attribute column not found[42S22:[Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'OBJECTID'.]
INFO 06-20 07:48:37 lift: 56 result: ('Unhandled exception during update.', "Attribute column not found[42S22:[Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'OBJECTID'.]")
We should be able to report on exceptions that happen within these methods.
I gave this a shot in the parcels application. Let's put some incantation of this into forklift.
Do we want to have to check out forklift into every project so the update script can be imported and inherited from? Or do we check out one version onto each server where we have update scripts?
williamscraigm: do you run compact on it after doing that? if you're doing lots of updates and don't compact, all the changes are in delta files and it's not optimal.
I wonder if we should only try to import potential pallets that have "pallet" in the file name. This would cut out all of the issues with running standalone scripts unintentionally. It would also make the import errors more relevant, since they are more likely to be real issues. For example, last night the DEQ pallet failed to import (because I forgot to include a file that's not in version control), but the main forklift report didn't report any issues.
Crate:

    'destination_name': 'DWQMercuryInFishTissue',
    'destination_workspace': 'SGID10 as ENVIRONMENT on stage.sde',
    'source_name': 'Mercury_in_Fish_Tissue',
    'source_workspace': '\\\\<...>\\GIS\\DWQGIS\\projects\\Interactive_Map\\DWQ_Data_Interactive_Map.gdb'
Run the pallet two times in a row. I would expect "Created table successfully." on the first run and "No changes found." on the second run. Instead, I get "Created table successfully." on the first run and "Data updated successfully." on the second run.

Here's an example of the rows that were compared (note that they differ only in float precision):
source row: (194.8000030517578, -111.941056, u'No Consumption Advisory', 41.501327, u'4900440 no fish advisory', u'Bluegill', 1, 0.1346, 189.1999969482422, u'BOX ELDER', 2013, u'4900440-Bluegill-2013', 10, 0.206, u'4900440', u'Reservoir/Lake', 0.08, 0.040219, u'MANTUA RES AB DAM 01', 4594839.0, 421458.0, (421458.0, 4594839.0))
destination row: (194.80000305, -111.941056, u'No Consumption Advisory', 41.501327, u'4900440 no fish advisory', u'Bluegill', 1, 0.1346, 189.19999695, u'BOX ELDER', 2013, u'4900440-Bluegill-2013', 10, 0.206, u'4900440', u'Reservoir/Lake', 0.08, 0.040219, u'MANTUA RES AB DAM 01', 4594839.0, 421458.0, (421458.0, 4594839.0))
Set source_workspace to an SDE database connection as a user that is not the owner (e.g. SGID10 with the agrc user), set source_name to either owner.name or name (e.g. GEOSCIENCE.AvalanchePaths or AvalanchePaths), and run forklift lift twice.

I would like to be able to specify the feature class name without the owner; forklift should only fail if there are duplicate names. Instead, with name, forklift fails on the first call to lift because it is not found in the source; with owner.name, the second call to lift fails because it is not found in the destination.

from arcpy import env
from forklift.models import Pallet


class SimplePallet(Pallet):
    def __init__(self):
        super(SimplePallet, self).__init__()

        destination_workspace = env.scratchGDB
        source_workspace = 'Database Connections\\[email protected]'

        self.add_crates(['SGID10.GEOSCIENCE.AvalanchePaths'],
                        {'source_workspace': source_workspace,
                         'destination_workspace': destination_workspace})
I think I'm noticing something odd: a pallet's source code can update, but if there is a .pyc file from a prior run, the old code is run. We may want to git clean -f to remove untracked files after our git update. That could get rid of secret files or things that are git-ignored, right? So maybe we should just delete all *.pyc files instead (see the sketch below)?
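If we go the delete-all-*.pyc route, a simple sketch (the folder path is a placeholder):

import os

pallet_folder = 'c:\\scheduled'  #: placeholder; wherever the pallets are checked out

for root, dirs, files in os.walk(pallet_folder):
    for name in files:
        if name.endswith('.pyc'):
            os.remove(os.path.join(root, name))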
Raise an exception in a pallet's __init__. I would expect forklift to complete successfully and show the error raised for that pallet. Instead, forklift chokes and crashes without processing any subsequent pallets.

I think that we should wrap this code in a try/except; on except we may need to create a fake pallet with the appropriate success tuple (see the sketch below).
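A sketch of that guard (the surrounding discovery loop is an assumption about lift.py):

from forklift.models import Pallet

pallet_classes = []  #: placeholder; whatever lift.py discovers

pallets = []
for PalletClass in pallet_classes:
    try:
        pallets.append(PalletClass())
    except Exception as e:
        #: stand-in pallet so the report still shows the failure
        broken_pallet = Pallet()
        broken_pallet.success = (False, str(e))
        pallets.append(broken_pallet)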
lift is using pallet outside of the loop, so the status is never set. We need a way to look up the pallet from the destination in order to update the status for the pallet.
The pallet should have a method defined that allows for easy email notifications. This will allow for situations where we want to notify someone when data was updated or a process occurred for a specific pallet. Maybe something like:

self.send_email('[email protected]', 'Raster data was updated (subject)', 'This would be the body of the email')
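A minimal sketch of that method using the standard library (the SMTP host, port, and from address are placeholders):

from email.mime.text import MIMEText
import smtplib


def send_email(self, to, subject, body):
    #: would live on Pallet; sends a plain-text message through a local relay
    message = MIMEText(body)
    message['Subject'] = subject
    message['To'] = to
    message['From'] = '[email protected]'

    smtp = smtplib.SMTP('smtp.example.com', 25)
    smtp.sendmail(message['From'], [to], message.as_string())
    smtp.quit()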
Run lift with logger: "file". I would expect a log file to be created while still seeing the output in the console. Instead, the log file is created but the console is blank.

I can't think of a good reason to have to choose one handler over the other. Why not just use both all of the time and get rid of the logger config?
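Using both all of the time would only be a few lines (formatter omitted for brevity):

import logging
import logging.handlers

log = logging.getLogger('forklift')
log.setLevel(logging.INFO)

log.addHandler(logging.StreamHandler())  #: console
log.addHandler(logging.handlers.TimedRotatingFileHandler('forklift.log', when='midnight'))  #: file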
The table structure should be copied as-is. Instead, OBJECTID_1 is created and causes issues on the second run of forklift. If you need an example, I can repro this with the ACTS pallet I am creating, where the data is coming from a 9.3 geodatabase. Should OBJECTID_1 be added to the naughty list?
Pallet should define default destination_coordinate_system and geographic_transformation properties that will be passed into the Crate constructor. These could obviously be overridden:

self.destination_coordinate_system = 3857
self.geographic_transformation = 'NAD_1983_To_WGS_1984_5'
Run list-pallets on a folder that has pallet classes defined in files within subdirectories. I would expect all of the pallets (including those within subfolders) to be listed. Instead, only direct children of the pallet folder are listed.
forklift.py can be the runner. This will be the tool that is run every so often. I think it should have a config of file paths (UNC, and most likely a default of c:\scheduled) so it can scan for update plugins. Then it can figure out which ones to run and which ones to skip.
How should we define/handle reprojecting data between source and destination? Maybe output_spatial_reference and default_transform properties on a pallet? Feel free to add to this; just trying to brainstorm.
I would expect Pallet:process to run. Instead, Pallet:process does not run, and the pallet and crates return successful.
Point a crate at a destination_workspace that does not exist. I would expect the destination workspace (e.g. a file geodatabase) to be created. Instead, the crate returns this error:
ERROR 000210: Cannot create output C:\ForkliftData\Broadband.gdb\BB_Service
Failed to execute (CopyFeatures).
The report from forklift, when printed to the console, is a bunch of HTML and is not helpful. We should do something easier to read and still color-coded; the nose-cover output would be a good example.
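One possible direction, sketched with colorama (an assumption; any ANSI coloring would do):

from colorama import Fore, init

init()  #: enables ANSI color codes on Windows consoles


def print_crate_result(crate_name, success, message):
    #: green for successful crates, red for failures
    color = Fore.GREEN if success else Fore.RED
    print('{}{}: {}{}'.format(color, crate_name, message, Fore.RESET))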