Hi,
Everything works fine in Data Factory except for the following (or maybe I just don't understand how it is supposed to work):
Data Factory checks the availability of all slices immediately, without waiting for each slice's start hour.
Here is the input dataset:
```json
{
  "name": "DVDFromAzureBlobInput",
  "properties": {
    "structure": [
      { "name": "DVD_Title", "type": "String" },
      { "name": "Studio", "type": "String" },
      { "name": "Released", "type": "String" },
      { "name": "Status", "type": "String" },
      { "name": "Sound", "type": "String" },
      { "name": "Versions", "type": "String" },
      { "name": "Price", "type": "String" },
      { "name": "Rating", "type": "String" },
      { "name": "Year", "type": "String" },
      { "name": "Genre", "type": "String" },
      { "name": "Aspect", "type": "String" },
      { "name": "UPC", "type": "String" },
      { "name": "DVD_ReleaseDate", "type": "String" },
      { "name": "ID", "type": "String" },
      { "name": "Timestamp", "type": "String" }
    ],
    "published": false,
    "type": "AzureBlob",
    "linkedServiceName": "Dicom-Azure-Storage",
    "typeProperties": {
      "folderPath": "adf/inputdata/{Hour}/",
      "format": {
        "type": "TextFormat",
        "rowDelimiter": "\n",
        "columnDelimiter": ",",
        "nullValue": "N",
        "quoteChar": "\""
      },
      "partitionedBy": [
        {
          "name": "Hour",
          "value": { "type": "DateTime", "date": "SliceStart", "format": "%H" }
        }
      ]
    },
    "availability": { "frequency": "Hour", "interval": 1 },
    "external": true,
    "policy": {}
  }
}
```
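I did notice that the dataset's `policy` block is empty. If I understand the documentation correctly, an external dataset can carry an `externalData` retry policy so that a missing slice is re-polled instead of failing validation immediately; something like the sketch below (the time spans and retry count are placeholder guesses, not tested):

```json
"policy": {
  "externalData": {
    "dataDelay": "00:10:00",
    "retryInterval": "00:01:00",
    "retryTimeout": "00:10:00",
    "maximumRetry": 3
  }
}
```

I am not sure whether this would actually stop Data Factory from validating every hour of the day up front, though.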
Here is the pipeline:
```json
{
  "name": "CsvToSQLAzurePipeline",
  "properties": {
    "description": "CSV to SQL Azure",
    "activities": [
      {
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "BlobSource",
            "treatEmptyAsNull": true,
            "skipHeaderLineCount": 1
          },
          "sink": {
            "type": "SqlSink",
            "writeBatchSize": 10000,
            "writeBatchTimeout": "00:00:00"
          },
          "translator": {
            "type": "TabularTranslator",
            "columnMappings": "DVD_Title: TITLE"
          }
        },
        "inputs": [ { "name": "DVDFromAzureBlobInput" } ],
        "outputs": [ { "name": "DVDToAzureSQLOutput" } ],
        "scheduler": { "frequency": "Hour", "interval": 1 },
        "name": "CopyTEST",
        "description": "description"
      }
    ],
    "start": "2016-04-06T10:30:00Z",
    "end": "2016-04-07T12:00:00Z",
    "isPaused": false,
    "hubName": "dicomfactory_hub",
    "pipelineMode": "Scheduled"
  }
}
```
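I also looked at the activity `scheduler` options. If I read the docs correctly, the scheduler (like dataset `availability`) accepts settings such as `style` and `offset`, e.g. the sketch below, but I don't know whether these affect when slice validation runs:

```json
"scheduler": {
  "frequency": "Hour",
  "interval": 1,
  "style": "EndOfInterval"
}
```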
The first slice works fine because the data is already there; an internal script uploads a file to blob storage every hour. But Data Factory immediately checks whether the data exists for every hour of the day, so the later slices fail validation, and the copy doesn't run unless I manually rerun validation after the script has uploaded the file to blob storage.
How can I tell Data Factory to validate only the slice for the current hour, instead of validating all slices up front?