Coder Social home page Coder Social logo

Comments (7)

daniel-dona avatar daniel-dona commented on July 20, 2024 1

Possible problems on SET statements while importing data, further investigation needed.

Relevant:

LOAD DATA interprets an empty field value differently from a missing field:

  • For string types, the column is set to the empty string.
  • For numeric types, the column is set to 0.
  • For date and time types, the column is set to the appropriate “zero” value for the type. See Section 11.2, “Date and Time Data Types”.

https://dev.mysql.com/doc/refman/8.0/en/load-data.html
https://dev.mysql.com/doc/refman/8.0/en/datetime.html

from gtfs-bench.

daniel-dona avatar daniel-dona commented on July 20, 2024

This is the origin of the difference:

gtfs-bench/composer/app.py

Lines 280 to 291 in 205ce1a

DROP TABLE IF EXISTS FREQUENCIES;
CREATE TABLE FREQUENCIES (`trip_id` VARCHAR(200),
`start_time` VARCHAR(200),
`end_time` VARCHAR(200),
`headway_secs` INT,
`exact_times` INT DEFAULT 0,
PRIMARY KEY (trip_id,start_time));
LOAD DATA LOCAL INFILE '{2}datasets/{0}/{1}/FREQUENCIES.csv'
INTO TABLE FREQUENCIES FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 ROWS
SET exact_times = IF(exact_times='',NULL,exact_times);

But I can't see how the 0 become NULL; it should be 0 and only NULL if exact_times is an empty value... maybe some problem importing the CSV file to MySQL? Needs further investigation.

from gtfs-bench.

dachafra avatar dachafra commented on July 20, 2024

The problem is in the input dataset, the value of exact_time is always NULL, bc it's optional in the GTFS model and our dataset does not provide that information. What I don't understand is why the output contains 0...

from gtfs-bench.

daniel-dona avatar daniel-dona commented on July 20, 2024

The problem is in the input dataset, the value of exact_time is always NULL, bc it's optional in the GTFS model and our dataset does not provide that information. What I don't understand is why the output contains 0...

The data that VIG uses have NULL values:

DROP TABLE IF EXISTS `FREQUENCIES`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `FREQUENCIES` (
  `trip_id` varchar(200) NOT NULL,
  `start_time` varchar(200) NOT NULL,
  `end_time` varchar(200) DEFAULT NULL,
  `headway_secs` int(11) DEFAULT NULL,
  `exact_times` int(11) DEFAULT '0',
  PRIMARY KEY (`trip_id`,`start_time`),
  CONSTRAINT `FREQUENCIES_ibfk_1` FOREIGN KEY (`trip_id`) REFERENCES `TRIPS` (`trip_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
/*!40101 SET character_set_client = @saved_cs_client */;

--
-- Dumping data for table `FREQUENCIES`
--

LOCK TABLES `FREQUENCIES` WRITE;
/*!40000 ALTER TABLE `FREQUENCIES` DISABLE KEYS */;
INSERT INTO `FREQUENCIES` VALUES ('4_I12-001_2016I12_1_1_4__1___','00:00:00','01:00:00',600,NULL),('4_I12-001_2016I12_1_1_4__1___','01:00:00','02:00:00',900,NULL),('4_I12-001_2016I12_1_1_4__1___','06:05:00','07:00:00',540,NULL)

 [...]

And confirmed:

mysql> select * from FREQUENCIES LIMIT 10;
+-------------------------------+------------+----------+--------------+-------------+
| trip_id                       | start_time | end_time | headway_secs | exact_times |
+-------------------------------+------------+----------+--------------+-------------+
| 4_I12-001_2016I12_1_1_4__1___ | 00:00:00   | 01:00:00 |          600 |        NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 01:00:00   | 02:00:00 |          900 |        NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 06:05:00   | 07:00:00 |          540 |        NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 07:00:00   | 09:00:00 |          480 |        NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 09:00:00   | 18:00:00 |          480 |        NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 18:00:00   | 21:30:00 |          360 |        NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 21:30:00   | 23:00:00 |          390 |        NULL |
| 4_I12-001_2016I12_1_1_4__1___ | 23:00:00   | 24:00:00 |          450 |        NULL |
| 4_I12-001_2016I12_2_1_4__1___ | 00:00:00   | 01:00:00 |          600 |        NULL |
| 4_I12-001_2016I12_2_1_4__1___ | 01:00:00   | 02:00:00 |          900 |        NULL |
+-------------------------------+------------+----------+--------------+-------------+
10 rows in set (0.00 sec)

Maybe a VIG bug?

from gtfs-bench.

dachafra avatar dachafra commented on July 20, 2024

I'll make some tests locally

from gtfs-bench.

dachafra avatar dachafra commented on July 20, 2024

seems that the problem is here: SET exact_times = IF(exact_times='',NULL,exact_times);

from gtfs-bench.

dachafra avatar dachafra commented on July 20, 2024

if there are other issues regarding set in the loading but is not in exact_times (as it has been solved) please @daniel-dona open other issues

from gtfs-bench.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.