forklift's Introduction

🚜📦✨ forklift


A Python CLI tool for managing and organizing the repetitive tasks involved with keeping remote geodatabases in sync with their sources. In other words, it is a tool to tame your scheduled task nightmare.

basically forklift: https://xkcd.com/2054/

Rules

The first rule of 🚜 is that it does not work on any sabbath.

The second rule of 🚜 is that it's out of your element, Donny.

Usage

The work that forklift does is defined by Pallets. forklift.models.Pallet is a base class that allows the user to define a job for forklift to perform by creating a new class that inherits from Pallet. Each pallet should have Pallet in its file name and be unique among the other pallets run by forklift.

A Pallet can have zero or more Crates. forklift.models.Crate is a class that defines data that will be moved from one location to another (reprojecting to web mercator by default). Crates are created by calling the add_crates (or add_crate) methods within the build method on the pallet. For example:

from os import path

from forklift.models import Pallet


class MyPallet(Pallet):
    def __init__(self):
        #: this is required to initialize the Pallet base class properties
        super(MyPallet, self).__init__()

    def build(self, configuration):
        #: all operations that can throw an exception should be done in build
        destination_workspace = 'C:\\MapData'
        source_workspace = path.join(self.garage, 'connection.sde')

        self.add_crate('Counties', {'source_workspace': source_workspace,
                                    'destination_workspace': destination_workspace})

For details on all of the members of the Pallet and Crate classes see models.py.

For examples of pallets see samples/PalletSamples.py.
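A slightly fuller sketch under the same pattern, assuming the copy_data list and staging_rack property described in the Config File Properties section below (CountiesPallet, counties_gdb, and counties.gdb are hypothetical names):

from os import path

from forklift.models import Pallet


class CountiesPallet(Pallet):
    def __init__(self):
        super(CountiesPallet, self).__init__()

        #: a file geodatabase managed inside the hashLocation (see Config File Properties)
        self.counties_gdb = path.join(self.staging_rack, 'counties.gdb')

        #: workspaces listed in copy_data are placed in the dropoffLocation after a successful lift
        self.copy_data = [self.counties_gdb]

    def build(self, configuration):
        source_workspace = path.join(self.garage, 'connection.sde')

        self.add_crate('Counties', {'source_workspace': source_workspace,
                                    'destination_workspace': self.counties_gdb})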

CLI

Interacting with forklift is done via the command line interface. Run forklift -h for a list of all of the available commands.

Config File Properties

config.json is created in the working directory after running forklift config init. It contains the following properties:

  • changeDetectionTables - An array of strings that are paths to change detection tables relative to the garage folder (e.g. SGID.sde\\SGID.META.ChangeDetection). A match between the source table name of a crate and a name from this table will cause forklift to skip hashing and use the values in the change detection table to determine if a crate's data needs to be updated. Each table should have the following fields:

    • table_name - A string field that contains a lower-cased, fully-qualified table name (e.g. sgid.boundaries.counties).
    • hash - A string that represents a unique hash of the entirety of the data in the table such that any change to data in the table will result in a new value.
  • configuration - A configuration string (Production, Staging, or Dev) that is passed to Pallet:build to allow a pallet to use different settings based on how forklift is being run. Defaults to Production.

  • dropoffLocation - The folder location where production ready files will be placed. This data will be compressed and will not contain any forklift artifacts. Pallets place their data in this location within their copy_data property.

  • email - An object containing fromAddress, smtpPort, and smtpServer, or a sendgrid apiKey, for sending report emails.

  • hashLocation - The folder location where forklift creates and manages data. This data contains hash digests that are used to check for changes. Referencing this location within a pallet is done by: os.path.join(self.staging_rack, 'the.gdb').

  • notify - An array of emails that will be sent the summary report each time forklift lift is run.

  • repositories - A list of GitHub repositories in the <owner>/<name> format that will be cloned/updated into the warehouse folder. A secure git repo can be added manually to the config in the format below:

    "repositories": [{
      "host": "gitlabs.com/",
      "repo": "name/repo",
      "token": "personal access token with `read_repository` access only"
    }]
  • sendEmails - A boolean value that determines whether or not to send forklift summary report emails after each lift.

  • servers - An object describing one or more production servers that data will be shipped to. See below for more information.

  • serverStartWaitSeconds - The number of seconds that forklift will wait after starting ArcGIS Server. Defaults to 300 (5 minutes).

  • shipTo - A folder location that forklift will copy data to for each server. This is the data's final location. Everything in the dropoffLocation will be copied to the shipTo location during a forklift ship. The shipTo path is optionally formatted with the servers.host value if present and necessary. Place a {} in your shipTo path if you would like to use this feature, e.g. \\\\{}\\c$\\data.

  • warehouse - The folder location where all of the repositories will be cloned into and where forklift will scan for pallets to lift.

  • slackWebhookUrl - If you have a Slack channel, you can log in to the admin website and create a webhook url. If you set this property, forklift will send reports to that channel.

Any of these properties can be set via the config set command like so:

forklift config set --key sendEmails --value False

If the property is a list then the value is appended to the existing list.
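For instance, appending another repository to the repositories list (re-using the agrc/parcels repository referenced in the install steps below) might look like:

forklift config set --key repositories --value agrc/parcels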

Metadata

Metadata is only copied from source to destination when the destination is first created, not on subsequent data updates. If you want to push metadata updates, delete the destination in the hashing folder and then it will be updated when it is recreated on the next lift.
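A minimal sketch of forcing a metadata refresh by deleting a hashed destination with arcpy; the paths are hypothetical and assume the hashLocation described above:

import arcpy

#: hypothetical hashLocation geodatabase and feature class managed by forklift
destination = r'C:\forklift\hashing\counties.gdb\Counties'

#: removing the hashed copy makes forklift recreate it (and re-copy metadata) on the next lift
if arcpy.Exists(destination):
    arcpy.Delete_management(destination)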

Install to First Successful Run

From within the ArcGIS Pro conda environment (c:\Program Files\ArcGIS\Pro\bin\Python\scripts\proenv.bat):

  1. Install git.

  2. Install Visual Studio Build Tools with the Desktop development with C++ module.

  3. Install ArcGIS Pro.

  4. Add ArcGIS Pro to your path.

    • If installed for all users: c:\Program Files\ArcGIS\Pro\bin\Python\scripts\.
    • If installed for a single user: C:\Users\{USER}\AppData\Local\Programs\ArcGIS\Pro\bin\Python\Scripts.
  5. Create a conda environment for forklift: conda create --name forklift python=3.9.

  6. Activate the conda environment: activate forklift.

  7. conda install arcpy -c esri

  8. Check out the forklift repository: git clone https://github.com/agrc/forklift.git

  9. pip install .\ from the directory containing setup.py.

  10. Install the python dependencies for your pallets.

  11. forklift config init

  12. forklift config repos --add agrc/parcels - agrc/parcels is the <owner>/<repo> to scan for pallets.

  13. forklift garage open - Opens garage directory. Copy all connection.sde files to the forklift garage.

  14. forklift git-update - Updates pallet repos. Add any secrets or supplementary data your pallets need that is not in source control.

  15. Edit the config.json to add the ArcGIS Server(s) to manage. The options property will be mixed in to all of the other servers.

    • username ArcGIS admin username.
    • password ArcGIS admin password.
    • host ArcGIS host address eg: myserver. Validate this property by looking at the machineName property returned by /arcgis/admin/machines?f=json
    • port ArcGIS server instance port eg: 6080
    "servers": {
       "options": {
           "username": "mapserv",
           "password": "test",
           "port": 6080
       },
       "primary": {
           "host": "this.is.the.qualified.name.as.seen.in.arcgis.server.machines",
       },
       "secondary": {
           "host": "this.is.the.qualified.name.as.seen.in.arcgis.server.machines"
       },
       "backup": {
           "host": "this.is.the.qualified.name.as.seen.in.arcgis.server.machines",
           "username": "test",
           "password": "password",
           "port": 6443
       }
    }
  16. Edit the config.json to add the email notification properties. (This is required for sending email reports)

    • smtpServer The SMTP server that you want to send emails with.
    • smtpPort The SMTP port number.
    • fromAddress The from email address for emails sent by forklift.
    "email": {
        "smtpServer": "smpt.server.address",
        "smtpPort": 25,
        "fromAddress": "[email protected]"
    }
  17. forklift lift

  18. forklift ship

run_forklift.bat is an example of a batch file that could be used to run forklift via the Windows Scheduler.
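A minimal sketch of what such a batch file might contain, assuming the conda environment created above (paths may differ on your machine):

rem activate the ArcGIS Pro conda environment and the forklift environment
call "c:\Program Files\ArcGIS\Pro\bin\Python\scripts\proenv.bat"
call activate forklift

rem lift and ship
forklift lift
forklift ship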

Upgrading Forklift

From the root of the forklift source code folder:

  1. Activate forklift environment: activate forklift
  2. Pull any new updates from GitHub: git pull origin master
  3. Pip install with the upgrade option: pip install .\ -U

Upgrading ArcGIS Pro

  1. Upgrade ArcGIS Pro

There is no second step if you originally created a fresh conda environment (not cloned from arcgispro-py3) and installed arcpy via conda install arcpy -c esri.

If you do need to recreate the forklift environment from scratch, follow these steps:

  1. Copy the forklift-garage folder to a temporary location.
  2. Activate forklift environment: activate forklift
  3. Export conda packages: conda env export > env.yaml
  4. Export pip packages: pip freeze > requirements.txt
  5. Remove and make note of any packages in requirements.txt that are not published to pypi such as forklift.
  6. Deactivate forklift environment: deactivate
  7. Remove forklift environment: conda remove --name forklift --all
  8. Create new forklift environment: conda create --clone arcgispro-py3 --name forklift --pinned
  9. Activate new environment: activate forklift
  10. Reinstall conda packages: conda env update -n forklift -f env.yaml
  11. Reinstall pip packages: pip install -r requirements.txt
  12. Copy the forklift-garage folder to the site-packages folder of the newly created environment.
  13. Reinstall forklift and any other missing pip package (from root of project): pip install .\

Development Usage

  • create new env
    • conda create --name forklift --clone arcgispro-py3
    • activate forklift
  • install deps
    • conda or pip install everything in the setup.py install_requires
  • optionally install forklift
    • cd forklift
    • pip install .\ -U
  • run forklift
    • for the installed version execute forklift -h
    • for the source version, from the src directory, execute python -m forklift -h for usage

Tests

On first run

  • install deps
    • pip install -e ".[tests]"
  • run tests
    • python setup.py develop
    • pytest -p no:faulthandler

-p no:faulthandler is to prevent pytest from printing tons of errors.

Tests that depend on a local SDE database (see tests/data/UPDATE_TESTS.bak) will automatically be skipped if it is not found on your system.

To run a specific test or suite: pytest -k <test/suite name>

If you have pip installed forklift into your current environment, you may need to uninstall it to get tests to see recent updates to the source code.


forklift's Issues

Give a pallet a method to easily send emails

The pallet should have a method defined that allows for easy email notifications. This will allow for situations where we want to notify someone if data was updated or a process occurred for a specific pallet.

Maybe something like

self.send_email('[email protected]', 'Raster data was updated (subject)', 'This would be the body of the email')

Create destination_workspace if it doesn't exist

Steps to reproduce

  1. Create a crate that has a destination_workspace that does not exist
  2. Run forklift lift on the pallet

Expected behavior

The destination workspace (e.g. a file geodatabase) would be created.

Actual behavior

The crate returns this error:

ERROR 000210: Cannot create output C:\ForkliftData\Broadband.gdb\BB_Service
Failed to execute (CopyFeatures).
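A hedged sketch of a pre-flight check that would create a missing file geodatabase destination (the helper name is hypothetical and this is not forklift's current behavior):

from os import path

import arcpy


def ensure_destination_workspace(destination_workspace):
    #: create the destination file geodatabase if it does not already exist
    if destination_workspace.endswith('.gdb') and not arcpy.Exists(destination_workspace):
        folder, name = path.split(destination_workspace)
        arcpy.CreateFileGDB_management(folder, name)


ensure_destination_workspace('C:\\ForkliftData\\Broadband.gdb')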

reprojection

How should we define/handle reprojecting data between source and destination?

output_spatial_reference and default_transform properties on a pallet?

Report console output

The report from forklift when printed to the console is a bunch of html and is not helpful. We should do something easier to read and still color coded. The nose-cover output would be a good example.

Add configuration option

Explanation

Most of the pallets I have been converting have different settings for dev/test/prod. This is only possible with forklift now if we run the pallet individually. If we add a property to the config that has a configuration or we add optional args to the cli, we can pass that value to a new build method that we discussed in #65. Optional args can still be passed to the __init__ method when run individually.

Expected behavior

We should be able to run all pallets as dev, stage or prod without having to modify the pallets.
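A hedged sketch of how a pallet's build could branch on the configuration value (the connection file names are hypothetical):

from os import path

from forklift.models import Pallet


class EnvironmentAwarePallet(Pallet):
    def build(self, configuration):
        #: configuration is 'Production', 'Staging', or 'Dev'
        if configuration == 'Production':
            source_workspace = path.join(self.garage, 'prod.sde')
        elif configuration == 'Staging':
            source_workspace = path.join(self.garage, 'stage.sde')
        else:
            source_workspace = path.join(self.garage, 'dev.sde')

        self.add_crate('Counties', {'source_workspace': source_workspace,
                                    'destination_workspace': 'C:\\MapData'})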

Default reproject to web mercator for Pallets

Pallet should define default destination_coordinate_system and geographic_transformation properties that will be passed into the Crate constructor. These could obviously be overridden.

self.destination_coordinate_system = 3857
self.geographic_transformation = 'NAD_1983_To_WGS_1984_5'

counties always updated

we should pull counties into the test sde and see what is happening to make it always think its shape has changed.

dunderscore Pallet

Steps to reproduce

  1. use dunderscore __ for all pallet properties and private methods

Expected behavior

Avoid unexpected overwrites when creating your own pallet

Actual behavior

💥 Collide all the things.

Git repos and pyc files

I think I'm noticing something odd where a pallet's source code can update but if there was a pyc file from a prior run, the old code is run. We may want to git clean -f to remove untracked files after our git update.

This could get rid of secret files or things that are git ignored right? So maybe we can delete all *.pyc files?
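A minimal sketch of the *.pyc cleanup idea (the warehouse path is hypothetical):

from pathlib import Path

#: remove stale byte-compiled files after the pallet repositories are updated
warehouse = Path(r'C:\forklift\warehouse')
for pyc in warehouse.rglob('*.pyc'):
    pyc.unlink()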

Crate source_name happy path issue

Steps to reproduce

  1. create a pallet with source_workspace set to an SDE database connected as a user other than the owner, e.g. SGID10 with the agrc user.
  2. set source_name to owner.name or name, e.g. GEOSCIENCE.AvalanchePaths or AvalanchePaths
  3. call forklift lift twice.
    • once to create the destination
    • once to update the destination

Expected behavior

I would like to be able to specify the feature class name without the owner. 🚜 should only fail if there are duplicate names.

Actual behavior

  • If using name, 🚜 will fail on the first call to lift because it is not found in the source.
  • If using owner.name, the second call to lift will fail because it is not found in the destination.

from arcpy import env
from forklift.models import Pallet


class SimplePallet(Pallet):

    def __init__(self):
        super(SimplePallet, self).__init__()

        destination_workspace = env.scratchGDB
        source_workspace = 'Database Connections\\[email protected]'

        self.add_crates(['SGID10.GEOSCIENCE.AvalanchePaths'], {'source_workspace': source_workspace,
                                                               'destination_workspace': destination_workspace})

Order by sql clause

I think the problem here is objectid_1 and order by objectid

RuntimeError: An invalid SQL statement was used. [SELECT OBJECTID_1, CustomerInfo_FK, Program, ContactPerson, GUID, DateFromPurchasing, ContactPhone, OriginalContractAmount, PDFDocument, ConEffectiveDate, ContactEmail, Fund, ReasonForRejection, PracticeMapPDF, NRCSNum, CancelDate, ArchPDF, Notes, UDAFContractNum, Dept, NEPAPDF, Cancelled, Project_FK, ScheduleOfOperationsPDF, ContractAmount, ReasonForCancellation, GrantCategory, ContractStatus, GranteeStatus, ConExpirationDate, OrgUnit, ManagerNotes, DateReceived, CostShareRate, ContractType, AppUnit, DateToPurchasing FROM ContractInformation ORDER BY OBJECTID]

Detect field length schema changes

Steps to reproduce

Try to update a feature class that has a longer text field in source than in destination (same name).

Expected behavior

I would expect the tool to warn me about the difference in field length but it doesn't.

Actual behavior

No schema change error is reported.

We should look at field type, length, scale and precision.

Ported from agrc/agrc.python#3.
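A hedged sketch of comparing field type, length, scale, and precision with arcpy (not forklift's actual schema check):

import arcpy


def field_differences(source, destination):
    #: compare type, length, scale, and precision for fields that exist in both datasets
    destination_fields = {field.name.lower(): field for field in arcpy.ListFields(destination)}
    differences = []

    for source_field in arcpy.ListFields(source):
        destination_field = destination_fields.get(source_field.name.lower())
        if destination_field is None:
            continue

        for prop in ('type', 'length', 'scale', 'precision'):
            if getattr(source_field, prop) != getattr(destination_field, prop):
                differences.append((source_field.name, prop,
                                    getattr(source_field, prop), getattr(destination_field, prop)))

    return differences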

Crates that update data in SDE have incorrect destination_name

Steps to reproduce

  1. Run a pallet with a crate defined something like ('FCName', 'FGDB.gdb', 'SGID Stage.sde', 'SGID10.OWNER.FCName')

Expected behavior

SGID Stage.sde/SGID10.OWNER.FCName is updated or created.

Actual behavior

SGID Stage.sde/SGID10.OWNER.SGID10_OWNER_FCName is updated or created. :(

I propose that the . -> _ replacement only be done if destination_name is not passed into the constructor.

Shut down service around remove/copy of data

We need to test arcgis server to see what happens when we remove and copy data while it has a process spooled up. If it's ok then this issue can be closed.

Otherwise, a Pallet needs a property with the path to the service that can be used to restart the service.

I have a hunch, that we can remove and copy the fresh data and then restart the service. I'm hoping we do not need to stop the service to remove and copy.

The create schema lock setting might need to be set for all services.

Recursively search through pallet folder

Steps to reproduce

  1. Call list-pallets on a folder that has pallet classes defined in files that are within subdirectories.

Expected behavior

All of the pallets (including those within subfolders) to be listed.

Actual behavior

Only direct children of the pallet folder are listed.
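A hedged sketch of a recursive scan that only considers files with 'pallet' in the name (see also the 'Use filename when scanning for pallets' issue below; this is not forklift's actual implementation):

import fnmatch
import os


def find_pallet_files(pallet_folder):
    #: walk subdirectories looking for python files with 'pallet' in their name
    matches = []
    for root, _, file_names in os.walk(pallet_folder):
        for file_name in fnmatch.filter(file_names, '*[Pp]allet*.py'):
            matches.append(os.path.join(root, file_name))

    return matches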

core._has_changes giving false positives when comparing numeric fields

Steps to reproduce

Crate:

'destination_name': 'DWQMercuryInFishTissue',
'destination_workspace': 'SGID10 as ENVIRONMENT on stage.sde',
'source_name': 'Mercury_in_Fish_Tissue',
'source_workspace': '\\\\<...>\\GIS\\DWQGIS\\projects\\Interactive_Map\\DWQ_Data_Interactive_Map.gdb'

Run pallet two times in a row.

Expected behavior

Result: "Created table successfully." on first run.
Result: "No changes found." on second run.

Actual behavior

Result: "Created table successfully." on first run.
Result: "Data updated successfully." on second run.

Here's an example of the rows that were compared:

source row: (194.8000030517578, -111.941056, u'No Consumption Advisory', 41.501327, u'4900440 no fish advisory', u'Bluegill', 1, 0.1346, 189.1999969482422, u'BOX ELDER', 2013, u'4900440-Bluegill-2013', 10, 0.206, u'4900440', u'Reservoir/Lake', 0.08, 0.040219, u'MANTUA RES AB DAM 01', 4594839.0, 421458.0, (421458.0, 4594839.0))
destination row: (194.80000305, -111.941056, u'No Consumption Advisory', 41.501327, u'4900440 no fish advisory', u'Bluegill', 1, 0.1346, 189.19999695, u'BOX ELDER', 2013, u'4900440-Bluegill-2013', 10, 0.206, u'4900440', u'Reservoir/Lake', 0.08, 0.040219, u'MANTUA RES AB DAM 01', 4594839.0, 421458.0, (421458.0, 4594839.0))
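A hedged sketch of a tolerance-based comparison that would treat the two rows above as equal (not forklift's actual fix):

from math import isclose


def values_match(source_value, destination_value):
    #: floats are compared with a relative tolerance; everything else must match exactly
    if isinstance(source_value, float) and isinstance(destination_value, float):
        return isclose(source_value, destination_value, rel_tol=1e-7)

    return source_value == destination_value


#: 194.8000030517578 and 194.80000305 from the rows above compare as equal
print(values_match(194.8000030517578, 194.80000305))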

Pallet:process does not fire when crate result is CREATED

Steps to reproduce

  1. Run forklift on a crate that is pointed at a destination workspace that does not exist

Expected behavior

Pallet:process should run.

Actual behavior

Pallet:process does not run and the pallet and crates return successful.

Sources without OBJECTID fields cause errors

Steps to reproduce

  1. Define a crate that is pointed at a source without an OBJECTID field (e.g. a non-SDE table).

Expected behavior

The destination table should be created on the first run with an OBJECTID field since it's being created within a geodatabase. Then the destination table should validate on subsequent runs.

Actual behavior

On the first run the destination table is created as expected. However, on subsequent runs core.py reports OBJECTID as a missing field and throws an exception when trying to check for changes. See below for the console output...

INFO 06-20 07:48:34       lift:   49 crate: interactive_map_monitoring_sites
WARN 06-20 07:48:35       core:  174 Missing fields in \\tsclient\stdavis\Documents\Projects\deq-enviro\scripts\nightly\settings\..\databases\eqmairvisionp.sde\AVData.dbo.interactive_map_monitoring_sites: OBJECTID
ERRO 06-20 07:48:37       core:   63 unhandled exception: Attribute column not found[42S22:[Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'OBJECTID'.] for crate {   'destination': 'C:\\Scheduled\\staging\\DEQEnviro\\TempPoints.gdb\\interactive_map_monitoring_sites',
    'destination_coordinate_system': <SpatialReference object at 0xc3a2930[0xc3393f8]>,
    'destination_name': 'interactive_map_monitoring_sites',
    'destination_workspace': 'C:\\Scheduled\\staging\\DEQEnviro\\TempPoints.gdb',
    'geographic_transformation': 'NAD_1983_To_WGS_1984_5',
    'result': (   'This crate was never processed.',
                  None),
    'source': '\\\\tsclient\\stdavis\\Documents\\Projects\\deq-enviro\\scripts\\nightly\\settings\\..\\databases\\eqmairvisionp.sde\\AVData.dbo.interactive_map_monitoring_sites',
    'source_name': 'AVData.dbo.interactive_map_monitoring_sites',
    'source_workspace': '\\\\tsclient\\stdavis\\Documents\\Projects\\deq-enviro\\scripts\\nightly\\settings\\..\\databases\\eqmairvisionp.sde'}
Traceback (most recent call last):
  File "C:\Python27\ArcGIS10.3\lib\site-packages\forklift\core.py", line 56, in update
    if _has_changes(crate):
  File "C:\Python27\ArcGIS10.3\lib\site-packages\forklift\core.py", line 269, in _has_changes
    for destination_row, source_row in izip(f_cursor, sde_cursor):
RuntimeError: Attribute column not found[42S22:[Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'OBJECTID'.]
INFO 06-20 07:48:37       lift:   56 result: ('Unhandled exception during update.', "Attribute column not found[42S22:[Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid column name 'OBJECTID'.]")

specific pallets and gdb in use

When attempting to run forklift for a specific pallet, it failed to copy the data to copyDestinations because a service associated with another pallet was locking the same database. I wonder if _hydrate_copy_structures could be adjusted to populate source_to_services for all pallets even when it's run on a single pallet. That way when it goes to copy the data it would stop all services pointing to the database even if they are not associated with the pallet that was run.

cli

I vote for docopt because it's awesome.

brainstorming ideas...

'''
forklift
Usage:
  forklift update [--config=<config>]
  forklift update-only <path> [--plugin=<plugin>]
Options:
  --config     the path to some cfg or text file or something where we keep paths to places where there are update plugins. defaults to some relative or static path.
  --plugin     the name of the plugin used to filter execution. maybe a partial match or glob or exact match?
Arguments:
  <path>       an optional path to pass in so you can run a certain directory
'''

Always use both logging handlers

Steps to reproduce

Run lift with logger: "file".

Expected behavior

A log file is created but I still see the output in the console.

Actual behavior

The log file is created but the console is blank.


I can't think of a good reason to have to choose one handler over the other. Why not just use both all of the time and get rid of the logger config?
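A hedged sketch of wiring up both handlers (using the 'midnight' rotation discussed in the Truncated log values issue below; the file name is hypothetical):

import logging
from logging.handlers import TimedRotatingFileHandler

logger = logging.getLogger('forklift')
logger.setLevel(logging.DEBUG)

#: console handler so output is still visible when running interactively
logger.addHandler(logging.StreamHandler())

#: file handler that rolls over at midnight so each nightly run is grouped in one file
logger.addHandler(TimedRotatingFileHandler('forklift.log', when='midnight', backupCount=7))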

copy error reporting

lift is using pallet outside of the loop so the status is never set. A way to look up the pallet from the destination is needed to update the status for the pallet.

Create html email template for reports

Subject line

  • display total successful pallets / total pallets

Body

  • display each pallet name with success color coded and message if has one
  • display every crate for a pallet with result status color coded and messages if has

Attachments

  • attach log

feel free to add to this. trying to brainstorm.

Pallet execute order

It looks as if multiple pallets defined in the same module are executed in alphabetical order. This should be verified and persisted with unit tests. There are some projects that will depend on a consistent execution order (e.g. DEQ).

add_crates tuple order is not intuitive

Steps to reproduce

  1. create a simple pallet
class SimplePallet(Pallet):

    def __init__(self):
        super(SimplePallet, self).__init__()

        destination_workspace = env.scratchGDB
        source_workspace = 'Database Connections\\[email protected]'
        #: source name, source workspace, destination workspace, destination name
        crate_info = ('SGID10.GEOSCIENCE.AvalanchePaths', source_workspace, destination_workspace, 'AvyPaths')
        self.add_crate(crate_info)

Expected behavior

I would expect the crate creation to be source name, workspace, destination name, workspace.

Actual behavior

The parameters are mirrored; without looking at this array it's hard to know how to create a crate properly.

Production Things Using Forklift

core.py

Right now update.py has a lot of public methods. We should tighten those up when we create core.py from it.

Basic validation of crate source and destination

Steps to reproduce

  1. if destination is a gdb, destination_name should not be allowed to contain a .
  2. if source is a gdb, source_name should not be allowed to contain a .

Expected behavior

The crate should be ignored and a warning should be sent from 🚜.

Actual behavior

Nothing but exceptions from arcpy
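A hedged sketch of that validation (the attribute names follow the Crate properties shown elsewhere in this document; this is not forklift's actual code):

def validate_crate_names(crate):
    #: names inside a file geodatabase cannot be fully qualified, so reject periods
    warnings = []

    if crate.destination_workspace.endswith('.gdb') and '.' in crate.destination_name:
        warnings.append('destination_name may not contain a period when the destination is a gdb')

    if crate.source_workspace.endswith('.gdb') and '.' in crate.source_name:
        warnings.append('source_name may not contain a period when the source is a gdb')

    return warnings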

Weekly pallets show up as errors on days that they do not run

Steps to reproduce

  1. Run sgid10pallet.py on a day other than Friday.

Expected behavior

It should be reported as running successfully. Perhaps with a message that says "This pallet only runs on Fridays".

Actual behavior

It shows up as an error with a message of "None".

Part of the issue is this change which will always set the success value of the pallet to be false if it's not Friday. @steveoh Why did we add the check on is_ready_to_ship to the report? I can't think of a good reason. I think that we should remove this extra check and set the success (boolean and message) in the pallet something like this:

def is_ready_to_ship(self):
    ready = strftime('%A') == 'Friday'
    if not ready:
        self.success = (True, 'This pallet only runs on Fridays')
    return ready

Pallets to fix

  • SGID10 in warehouse
  • DDACTS

Migrate upgrade scripts

Scripts to Update

From windows scheduler on .56

  • Parcels
  • ACTSStatisticsViewer
  • ArchSitesUpdate
  • bb-econ
  • DDACTS
  • deq-enviro (nightly & hourly)
  • FiberVerification
  • broadband (residential)
  • UpdateOGMData
  • UpdateRasterData
  • updateReports (roadkill)

Michael's scripts

All are daily unless otherwise noted

  • SGID10 (Saturday mornings)
  • udnr
  • udpr
  • FFSL
  • udwr
  • UEMP
  • uggp
  • ughp
  • ugio
  • UTAX
  • utrans

Other services that use data in a fgdb and may or may not have nightly scripts

  • UtahEM (All data in UDES.gdb)

The rest of these are in SGID10.gdb...

  • BBEcon/MapService.MapServer": ["AirportLocations",
    "Schools",
    "HealthCareFacilities",
    "ZoomLocations",
    "NaturalGasService_Approx",
    "RetailCulinaryWaterServiceAreas",
    "ElectricalService",
    "RuralTelcomBoundaries",
    "LightRailStations_UTA",
    "LightRail_UTA",
    "EnterpriseZones",
    "CommuterRailStations_UTA",
    "CommuterRailRoutes_UTA"
    ]
  • BEMS/Boundaries.MapServer": ["EMSServiceAreas"]
  • BaseMaps/AddressPoints.MapServer": ["AddressPoints"]
  • BaseMaps/Hillshade.MapServer": ["Roads",
    "Municipalities_Carto",
    "Counties",
    "ZoomLocations",
    "PlaceNamesGNIS2000"
    ]
  • BaseMaps/Hybrid.MapServer": ["Roads",
    "Municipalities_Carto",
    "Counties",
    "ZoomLocations",
    "PlaceNamesGNIS2000"
    ]
  • BaseMaps/Lite.MapServer": ["Roads",
    "Municipalities_Carto",
    "Counties",
    "ZoomLocations",
    "PlaceNamesGNIS2000"
    ]
  • BaseMaps/Terrain.MapServer": ["Roads",
    "Municipalities_Carto",
    "Counties",
    "ZoomLocations",
    "PlaceNamesGNIS2000"
    ]
  • BaseMaps/Topo.MapServer": ["Roads",
    "Municipalities_Carto",
    "Counties",
    "ZoomLocations",
    "PlaceNamesGNIS2000"
    ]
  • BaseMaps/Vector.MapServer": ["Roads",
    "Municipalities_Carto",
    "Counties",
    "ZoomLocations",
    "PlaceNamesGNIS2000"
    ]
  • Broadband/ExportWebMap.GPServer": ["Utah"]
  • Broadband/ProviderCoverage.MapServer": ["ZoomLocations",
    "PopBlockAreas2010_Approx"
    ]
  • DEM.ImageServer": ["DEM_10METER"]
  • DEQEnviro/MapService.MapServer": ["LandOwnership",
    "ICBUFFERZONES",
    "HUC",
    "Counties"
    ]
  • DEQEnviro/Secure.MapServer": ["Counties"]
  • DEQSpills.MapServer": ["BFNONTARGETED",
    "BFTARGETED",
    "SITEREM",
    "NPL",
    "EWA",
    "FUD",
    "MMRP",
    "TIER2",
    "TRI",
    "FACILITYUST",
    "VCP",
    "EnvironmentalIncidents"
    ]
  • Hava.MapServer": ["UtahHouseDistricts2012",
    "UtahSenateDistricts2012",
    "USCongressDistricts2012",
    "ZipCodes",
    "Counties",
    "VistaBallotAreas",
    "VistaBallotAreas_Proposed",
    "Parcels_Beaver",
    "Parcels_BoxElder",
    "Parcels_Cache",
    "Parcels_Carbon",
    "Parcels_Daggett",
    "Parcels_Davis",
    "Parcels_Duchesne",
    "Parcels_Emery",
    "Parcels_Grand",
    "Parcels_Iron",
    "Parcels_Juab",
    "Parcels_Kane",
    "Parcels_Millard",
    "Parcels_Morgan",
    "Parcels_Piute",
    "Parcels_Rich",
    "Parcels_SaltLake",
    "Parcels_SanJuan",
    "Parcels_Sanpete",
    "Parcels_Summit",
    "Parcels_Tooele",
    "Parcels_Uintah",
    "Parcels_Utah",
    "Parcels_Wasatch",
    "Parcels_Washington",
    "Parcels_Wayne",
    "Parcels_Weber",
    "AddressPoints"
    ]
  • LandUsePlanning.MapServer": ["QuaternaryFaults",
    "Municipalities",
    "WildernessProp_WDesert1999",
    "LakesNHDHighRes",
    "ConservationEasements",
    "Cemeteries",
    "HealthCareFacilities",
    "Schools",
    "Libraries",
    "PlaceNamesGNIS2000",
    "HistoricDistricts",
    "AvalanchePaths",
    "LandslideAreas",
    "LandSlidePotential",
    "LandOwnership",
    "LiquefactionPotential",
    "ParagonahFloodZones",
    "Floodplains",
    "SummitCoFloodZones",
    "WeberCoFloodZones",
    "Watersheds_Area",
    "SpringsNHDHighRes",
    "Streams",
    "Wetlands",
    "Airports",
    "Railroads",
    "Roads",
    "DominantVegetation",
    "Soils",
    "WaterRelatedLandUse",
    "Parcels_Weber",
    "Parcels_Wayne",
    "Parcels_Washington",
    "Parcels_Wasatch",
    "Parcels_Utah",
    "Parcels_Uintah",
    "Parcels_Tooele",
    "Parcels_Summit",
    "Parcels_Sanpete",
    "Parcels_SanJuan",
    "Parcels_SaltLake",
    "Parcels_Rich",
    "Parcels_Millard",
    "Parcels_Juab",
    "Parcels_Iron",
    "Parcels_Grand",
    "Parcels_Emery",
    "Parcels_Davis",
    "Parcels_Carbon",
    "Parcels_Daggett",
    "Parcels_Cache",
    "Parcels_BoxElder",
    "Parcels_Beaver",
    "PLSSTownships_GCDB",
    "PLSSSections_GCDB",
    "Habitat_Muledeer",
    "Habitat_Pronghorn",
    "Habitat_RingNeckedPheasant",
    "Habitat_RockyMountainElk",
    "SchoolDistricts"
    ]
  • LtGovPoliticalDistricts/Districts.MapServer": ["USCongressDistricts2012",
    "UtahSenateDistricts2012",
    "UtahHouseDistricts2012",
    "UtahSchoolBoardDistricts2012"
    ]
  • LtGovPoliticalDistricts/Labels.MapServer": ["USCongressDistricts2012",
    "UtahSenateDistricts2012",
    "UtahHouseDistricts2012",
    "UtahSchoolBoardDistricts2012"
    ]
  • OilGasMining.MapServer": ["DNROilGasWells",
    "Counties",
    "StreamsNHDHighRes",
    "DNROilGasFields",
    "DNROilGasUnits"
    ]
  • PLSS.MapServer": ["PLSSPoint_AGRC", "Counties"]
  • PoliticalDistricts.MapServer": ["UtahHouseDistricts2012",
    "UtahSenateDistricts2012",
    "DistrictCombinationAreas2012"
    ]
  • Roadkill/Overlays.MapServer": ["UDOTMileposts"]
  • Roadkill/Toolbox.GPServer": ["UDOTRoutes_LRS"]
  • SGID/CountyBoundaries.MapServer": ["Counties"]
  • SGID/DOGM.MapServer": ["DNROilGasWells_HDPath"]
  • SGID/LandOwnership.MapServer": ["LandOwnership"]
  • Socrata.MapServer": ["Municipalities",
    "UtahSenateDistricts2012",
    "UtahHouseDistricts2012",
    "UtahSchoolBoardDistricts2012",
    "Schools",
    "SchoolDistricts",
    "DNROilGasWells",
    "TaxEntities2014"
    ]
  • SurfaceWaterQuality/MapService.MapServer": ["StreamsNHDHighRes"]
  • SurfaceWaterQuality/Toolbox.GPServer": ["StreamsNHDHighRes"]
  • UtahPLSS.MapServer": ["PLSSTownships_GCDB", "PLSSSections_GCDB", "PLSSQuarterSections_GCDB", "PLSSQuarterQuarterSections_GCDB"]

Backups

We need to decide exactly how we want the backups generated and where they should be stored.

Catch pallet initialization errors

Steps to reproduce

  1. Run a pallet that raises an exception in __init__

Expected behavior

I would expect forklift to complete successfully and show the error raised for that pallet.

Actual behavior

Forklift chokes and crashes without processing any subsequent pallets.

I think that we should probably wrap this code in a try/except. On except we may need to create a fake pallet with the appropriate success tuple.
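A hedged sketch of the suggested try/except with a stand-in pallet (names are hypothetical, not forklift's actual code):

class BrokenPallet(object):
    #: stand-in used only to report a pallet whose __init__ raised
    def __init__(self, name, error):
        self.name = name
        self.success = (False, 'Initialization error: {}'.format(error))


def instantiate_pallet(PalletClass):
    #: keep one bad pallet from crashing the whole lift; report the error instead
    try:
        return PalletClass()
    except Exception as error:
        return BrokenPallet(PalletClass.__name__, error)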

Use filename when scanning for pallets

I wonder if we should only try to import potential pallets that have "pallet" in the file name. This would cut out all of the issues with running standalone scripts unintentionally. It would also make the import errors more relevant since they are more likely real issues. For example, last night the deq pallet failed to import (because I forgot to include a file that's not in version control) but the main forklift report didn't report any issues.

forklift.py

forklift.py can be the runner. This will be the tool that is run every so often. I think it should have a config of file paths - unc and most likely a default for c:\scheduled so it can scan for update plugins. Then it can figure out which ones to run and which ones to skip.

Truncated log values

The logging situation now is not ideal. It does a sliding time based on the last write. The setting we have now makes it easy to make the log get cut off.

I think if we switch our handler to use the midnight setting and schedule forklift to be run after midnight our logs will be grouped properly.

CLI flag for printing debug logs

I would love a flag that I can use to print debug logs in the console when running forklift from the command line. Something like -v or -debug.

Tables getting OBJECTID_1 field

Steps to reproduce

  1. create a pallet with a crate pointing to a table that is participating in a join
  2. run forklift pointing to the pallet
  3. look at the fields of the gdb table that was created

Expected behavior

The table structure should be copied as is

Actual behavior

OBJECTID_1 is created and causes issues on the second run of forklift.

If you need an example, I can repro this with the ACTS pallet I am creating so the data is coming from a 9.3 geodatabase.

Should OBJECTID_1 be added to the naughty list?

Split out SGID

Steps to reproduce

  1. Are we going to split out the SGID10.gdb in the pallets or handle it magically in 🚜?

Expected behavior

A crate can ask for sgid10.boundaries.counties or counties and the county feature class will end up in a boundaries.gdb?

Actual behavior

Everything ends up in SGID10.gdb

copy data

Define a new property on Pallet called copy_data(List) that tells forklift what workspaces you want copied to the production servers. This would default to [] in which case no data would be copied.

Add a new property to the config called copy_destinations(List) that defines where you want the copy data (defined by copy_data in Pallet) from forklift to be copied after processing.

Implement a new private method in lift.py that loops through all of the pallets and generates a distinct list of all of the workspaces that need to be copied and copies them to copy_destinations.

Field type schema check issues between FGDB and SDE

Steps to reproduce

  1. Copy a feature class with a field of type Float (shows Single in code) to an SDE database.
  2. Notice that the field type is now Double and curse ESRI.
  3. Run forklift on a pallet that defines a crate that moves data from the FGDB to SDE.

Expected behavior

The schema check for the crate should pass successfully since it's the same feature class and no changes to the schema have been made.

Actual behavior

The schema check fails and reports something like: AVE_LENGTH: source type of Single does not match destination type of Double.

The issue is in this line of code. I propose that we run something like this before checking the field types:

if not isTable:
    arcpy.MakeFeatureLayer_management(sdeFC, layer, '1 = 2')
else:
    arcpy.MakeTableView_management(sdeFC, layer, '1 = 2')

try:
    arcpy.Append_management(layer, f, 'TEST')
    passed = True
except Exception:
    #: fall back to checking the field types and lengths for the report
    passed = False

I've tested Append_management against the Single/Double issue and it runs successfully.

Create config file in the directory where forklift is invoked

Steps to reproduce

  1. Run forklift init

Expected behavior

config.json created in the folder from which forklift was invoked.

Actual behavior

config.json is created in site-packages. This is nice in that it is in the same place no matter where you run forklift from but it's a pain to dig into that folder to take a look at the config file. It's also a bad place to be maintaining things in general. I think that it would be better to have it closer to where you are in the file system.
