DeepDive
Home Page: deepdive.stanford.edu
I found that holdout_query must end with ';' for now. If I use the following:
holdout_query: "INSERT INTO dd_graph_variables_holdout(variable_id) select id from candidate where docid in (select docid from eval_docs)"
the SQL execution fails with:
21:59:06 [] ERROR SQL execution failed (Reason: ERROR: syntax error at or near "UPDATE"
Position: 122):
DROP TABLE IF EXISTS candidate_label_cardinality CASCADE;CREATE TABLE candidate_label_cardinality(candidate_label_cardinality) AS VALUES (1) WITH DATA;INSERT INTO dd_graph_variables_map(variable_id) SELECT id FROM dd_graph_variables;INSERT INTO dd_graph_variables_holdout(variable_id) select id from candidate where docid in (select docid from eval_docs)UPDATE dd_graph_variables SET is_evidence=false WHERE dd_graph_variables.id IN (SELECT variable_id FROM dd_graph_variables_holdout)
21:59:07 [inferenceManager] ERROR ERROR: syntax error at or near "UPDATE"
Position: 122
Apparently the system fails to append a ";" after the holdout_query. Should be easy to fix.
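Until that is fixed, the workaround is to terminate the query yourself in application.conf, i.e.:
holdout_query: "INSERT INTO dd_graph_variables_holdout(variable_id) select id from candidate where docid in (select docid from eval_docs);"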
The current tests assume that DeepDive will auto-assign an ID for developers when "id" is not returned in the JSON. We removed that behavior in the latest commit 5a6d651, so the tests need to be fixed to drop this assumption: https://travis-ci.org/HazyResearch/deepdive/jobs/24320060
See def buildCopySql in PostgresExtractionDataStore.scala. We must fix this before the next code push.
@zhangce @feiranwang @msushkov
change
deepdive.extractions: {
wordsExtractor.style: "udf_extractor"
wordsExtractor.output_relation: "words"
wordsExtractor.input: "SELECT * FROM titles"
wordsExtractor.udf: "words.py"
}
to
deepdive.extraction.extractors: {
wordsExtractor {
style: "udf_extractor"
output_relation: "words"
input: "SELECT * FROM titles"
udf: "words.py"
}
}
Rationale: (1) extractions becomes extraction.extractors; (2) the nested form is easier to read. This needs to be updated in all documentation on the web.
There should be error messages when encountering unexpected configuration items.
Just now I mistyped "dependencies" as "depencencies", and there was no error message when parsing the config file. As a result the dependencies were broken, but the programmer had no way to know what caused the problem.
I strongly suggest that unexpected config items be rejected, or at least produce a warning.
While the system is running I can see the log in log/2014-XX..XX.txt, but after I send a SIGINT (Ctrl+C), the log becomes empty. Frustrating...
For such a case, let's change the learning rate from the default of "0.1" to "0.001" by adding the following sampler options to the configuration file:
sampler.sampler_args: "-l 125 -s 1 -i 200 --alpha 0.001"
Error in develop branch but not in master.
14:05:38.050 [default-dispatcher-2][profiler][Profiler] DEBUG starting report_id=inference_grounding
14:05:38.051 [default-dispatcher-3][PostgresInferenceDataStoreComponent$PostgresInferenceDataStore(akka://deepdive)][PostgresInferenceDataStoreComponent$PostgresInferenceDataStore] INFO Writing grounding queries to file="/var/folders/rz/0l6t9_w90hs_k6l6fq7nlsxm0000gn/T/grounding8297874664321351755.sql"
14:05:38.052 [default-dispatcher-6][taskManager][TaskManager] INFO Added task_id=inference
14:05:38.053 [default-dispatcher-6][taskManager][TaskManager] INFO 0/1 tasks eligible.
14:05:38.053 [default-dispatcher-6][taskManager][TaskManager] INFO Tasks not_eligible: Set(inference)
14:05:38.054 [default-dispatcher-6][taskManager][TaskManager] INFO Added task_id=calibration
14:05:38.054 [default-dispatcher-6][taskManager][TaskManager] INFO 0/2 tasks eligible.
14:05:38.055 [default-dispatcher-6][taskManager][TaskManager] INFO Tasks not_eligible: Set(inference, calibration)
14:05:38.056 [default-dispatcher-6][taskManager][TaskManager] INFO Added task_id=report
14:05:38.057 [default-dispatcher-6][taskManager][TaskManager] INFO 0/3 tasks eligible.
14:05:38.058 [default-dispatcher-6][taskManager][TaskManager] INFO Tasks not_eligible: Set(inference, report, calibration)
14:05:38.058 [default-dispatcher-6][taskManager][TaskManager] INFO Added task_id=shutdown
14:05:38.059 [default-dispatcher-6][taskManager][TaskManager] INFO 0/4 tasks eligible.
14:05:38.059 [default-dispatcher-6][taskManager][TaskManager] INFO Tasks not_eligible: Set(shutdown, inference, report, calibration)
14:05:38.076 [default-dispatcher-3][PostgresInferenceDataStoreComponent$PostgresInferenceDataStore(akka://deepdive)][PostgresInferenceDataStoreComponent$PostgresInferenceDataStore] INFO Executing grounding query...
14:05:38.351 [][][StatementExecutor$$anon$1] ERROR SQL execution failed (Reason: ERROR: invalid input syntax for integer: ""
Position: 184):
INSERT INTO dd_graph_weights(initial_value, is_fixed, description) SELECT DISTINCT 0.0 AS wValue, false AS wIsFixed, 'label1-' || (CASE WHEN "features.feature_id" IS NULL THEN '' ELSE "features.feature_id" END) || "label1_val_cardinality" AS wCmd FROM label1_query GROUP BY wValue, wIsFixed, wCmd
14:05:38.370 [default-dispatcher-3][inferenceManager][OneForOneStrategy] ERROR ERROR: invalid input syntax for integer: ""
Position: 184
org.postgresql.util.PSQLException: ERROR: invalid input syntax for integer: ""
Position: 184
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157) ~[postgresql-9.2-1003-jdbc4.jar:na]
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886) ~[postgresql-9.2-1003-jdbc4.jar:na]
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255) ~[postgresql-9.2-1003-jdbc4.jar:na]
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:555) ~[postgresql-9.2-1003-jdbc4.jar:na]
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:417) ~[postgresql-9.2-1003-jdbc4.jar:na]
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:410) ~[postgresql-9.2-1003-jdbc4.jar:na]
at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172) ~[commons-dbcp-1.4.jar:1.4]
at org.apache.commons.dbcp.DelegatingPreparedStatement.execute(DelegatingPreparedStatement.java:172) ~[commons-dbcp-1.4.jar:1.4]
at scalikejdbc.StatementExecutor$$anonfun$execute$1.apply$mcZ$sp(StatementExecutor.scala:295) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.StatementExecutor$$anonfun$execute$1.apply(StatementExecutor.scala:295) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.StatementExecutor$$anonfun$execute$1.apply(StatementExecutor.scala:295) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.StatementExecutor$NakedExecutor.apply(StatementExecutor.scala:33) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.StatementExecutor$$anon$1.scalikejdbc$StatementExecutor$LoggingSQLAndTiming$$super$apply(StatementExecutor.scala:291) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.StatementExecutor$LoggingSQLAndTiming$class.apply(StatementExecutor.scala:238) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.StatementExecutor$$anon$1.scalikejdbc$StatementExecutor$LoggingSQLIfFailed$$super$apply(StatementExecutor.scala:291) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.StatementExecutor$LoggingSQLIfFailed$class.apply(StatementExecutor.scala:269) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.StatementExecutor$$anon$1.apply(StatementExecutor.scala:291) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.StatementExecutor.execute(StatementExecutor.scala:295) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.DBSession$$anonfun$executeWithFilters$1.apply(DBSession.scala:248) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.DBSession$$anonfun$executeWithFilters$1.apply(DBSession.scala:246) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.LoanPattern$.using(LoanPattern.scala:29) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.package$.using(package.scala:76) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.DBSession$class.executeWithFilters(DBSession.scala:245) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.ActiveSession.executeWithFilters(DBSession.scala:420) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.SQLExecution.apply(SQL.scala:441) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at org.deepdive.inference.SQLInferenceDataStore$$anonfun$4$$anonfun$apply$4.apply(SQLInferenceDataStore.scala:39) ~[classes/:na]
at org.deepdive.inference.SQLInferenceDataStore$$anonfun$4$$anonfun$apply$4.apply(SQLInferenceDataStore.scala:38) ~[classes/:na]
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) ~[scala-library.jar:0.13.1]
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library.jar:0.13.1]
at org.deepdive.inference.SQLInferenceDataStore$$anonfun$4.apply(SQLInferenceDataStore.scala:38) ~[classes/:na]
at org.deepdive.inference.SQLInferenceDataStore$$anonfun$4.apply(SQLInferenceDataStore.scala:37) ~[classes/:na]
at scalikejdbc.DBConnection$$anonfun$autoCommit$1.apply(DB.scala:185) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.DBConnection$$anonfun$autoCommit$1.apply(DB.scala:184) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.LoanPattern$.using(LoanPattern.scala:29) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.package$.using(package.scala:76) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.DBConnection$class.autoCommit(DB.scala:184) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.DB.autoCommit(DB.scala:498) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.DB$$anonfun$autoCommit$2.apply(DB.scala:641) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.DB$$anonfun$autoCommit$2.apply(DB.scala:640) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.LoanPattern$.using(LoanPattern.scala:29) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.package$.using(package.scala:76) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at scalikejdbc.DB$.autoCommit(DB.scala:640) ~[scalikejdbc_2.10-1.7.4.jar:1.7.4]
at org.deepdive.inference.SQLInferenceDataStore$class.execute(SQLInferenceDataStore.scala:37) ~[classes/:na]
at org.deepdive.inference.PostgresInferenceDataStoreComponent$PostgresInferenceDataStore.execute(PostgresInferenceDataStore.scala:19) ~[classes/:na]
at org.deepdive.inference.SQLInferenceDataStore$class.groundFactorGraph(SQLInferenceDataStore.scala:536) ~[classes/:na]
at org.deepdive.inference.PostgresInferenceDataStoreComponent$PostgresInferenceDataStore.groundFactorGraph(PostgresInferenceDataStore.scala:19) ~[classes/:na]
at org.deepdive.inference.InferenceManager$$anonfun$receive$1.applyOrElse(InferenceManager.scala:59) ~[classes/:na]
at akka.actor.Actor$class.aroundReceive(Actor.scala:467) ~[akka-actor_2.10-2.3-M2.jar:2.3-M2]
at org.deepdive.inference.InferenceManager$PostgresInferenceManager.aroundReceive(InferenceManager.scala:116) ~[classes/:na]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:491) [akka-actor_2.10-2.3-M2.jar:2.3-M2]
at akka.actor.ActorCell.invoke(ActorCell.scala:462) [akka-actor_2.10-2.3-M2.jar:2.3-M2]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) [akka-actor_2.10-2.3-M2.jar:2.3-M2]
at akka.dispatch.Mailbox.run(Mailbox.scala:219) [akka-actor_2.10-2.3-M2.jar:2.3-M2]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:385) [akka-actor_2.10-2.3-M2.jar:2.3-M2]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library.jar:na]
14:05:38.372 [default-dispatcher-6][inferenceManager][InferenceManager$PostgresInferenceManager] INFO Starting
14:05:38.372 [default-dispatcher-3][factorGraphBuilder][FactorGraphBuilder$PostgresFactorGraphBuilder] INFO Starting
For input SQL statements to extractors, instead of an executable.
All data from an extractor is currently written to the relation specified in the output_relation setting. It would be useful to allow extractors to write to multiple relations. One way to implement this would be to allow a _relation key in the JSON output and use that value for grouping.
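For example, an extractor output tuple could then look like this (a hypothetical format illustrating the proposed _relation key; the word field is made up):
{"_relation": "words", "word": "hello"}
Tuples would be grouped by their _relation value and inserted into the corresponding relation.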
When a variable in a weight rule is null, the weight becomes null. This is not intended.
Some rows may have been dropped due to unknown TSV parsing issues.
deepdive_spouse_tsv=# select count(*) from has_spouse_features;
 count
--------
 151808
(1 row)
deepdive_spouse_tsv=# select count(*) from has_spouse;
count
-------
75446
(1 row)
deepdive_spouse_tsv=# select count(*) from people_mentions ;
count
-------
39269
(1 row)
Correct number should be:
deepdive_spouse_plpy=# select count(*) from has_spouse_features;
count
--------
151824
(1 row)
deepdive_spouse_plpy=# select count(*) from has_spouse;
count
-------
75454
(1 row)
deepdive_spouse_plpy=# select count(*) from people_mentions ;
count
-------
39270
(1 row)
(tested on the other two extractors)
Use JDBC rather than bash to execute SQL commands, so that errors can be caught.
Without an inference rule, the system should not do inference at all (extract only). Currently it fails with an error message like:
22:26:52 [inferenceManager] ERROR /afs/cs.stanford.edu/u/zifei/repos/deepdive/out/2014-04-21T222429/graph.weights (No such file or directory)
This needs to be consistent with the ID conventions.
I tried to update the smoke example for the develop branch. I changed the syntax to the current format, but the grounding SQL script failed here:
INSERT INTO dd_graph_variables(id, data_type, initial_value, is_evidence, cardinality)
SELECT people.id, 'Boolean', people.smokes::int, (people.smokes IS NOT NULL), null
FROM people;
DROP TABLE IF EXISTS people_smokes_cardinality CASCADE;
CREATE TABLE people_smokes_cardinality(people_smokes_cardinality) AS VALUES (1) WITH DATA;
INSERT INTO dd_graph_variables(id, data_type, initial_value, is_evidence, cardinality)
SELECT people.id, 'Boolean', people.has_cancer::int, (people.has_cancer IS NOT NULL), null
FROM people;
DROP TABLE IF EXISTS people_has_cancer_cardinality CASCADE;
CREATE TABLE people_has_cancer_cardinality(people_has_cancer_cardinality) AS VALUES (1) WITH DATA;
INSERT INTO dd_graph_variables_map(variable_id)
SELECT id FROM dd_graph_variables;
INSERT INTO dd_graph_variables_holdout(variable_id)
SELECT id FROM dd_graph_variables
WHERE RANDOM() < 0.0 AND is_evidence = true;
UPDATE dd_graph_variables SET is_evidence=false
WHERE dd_graph_variables.id IN (SELECT variable_id FROM dd_graph_variables_holdout);
The error is:
21:53:48.558 [default-dispatcher-2][PostgresInferenceDataStoreComponent$PostgresInferenceDataStore(akka://deepdive)][PostgresInferenceDataStoreComponent$PostgresInferenceDataStore] INFO Executing grounding query...
21:53:57.533 [][][StatementExecutor$$anon$1] ERROR SQL execution failed (Reason: ERROR: duplicate key violates unique constraint "dd_graph_variables_pkey" (seg20 rulk.stanford.edu:40000 pid=25436)):
DROP TABLE IF EXISTS people_smokes_cardinality CASCADE;CREATE TABLE people_smokes_cardinality(people_smokes_cardinality) AS VALUES (1) WITH DATA;INSERT INTO dd_graph_variables(id, data_type, initial_value, is_evidence, cardinality) SELECT people.id, 'Boolean', people.has_cancer::int, (people.has_cancer IS NOT NULL), null FROM people
21:53:57.558 [default-dispatcher-2][PostgresInferenceDataStoreComponent$PostgresInferenceDataStore(akka://deepdive)][PostgresInferenceDataStoreComponent$PostgresInferenceDataStore] ERROR org.postgresql.util.PSQLException: ERROR: duplicate key violates unique constraint "dd_graph_variables_pkey" (seg20 rulk.stanford.edu:40000 pid=25436)
21:53:57.559 [default-dispatcher-2][PostgresInferenceDataStoreComponent$PostgresInferenceDataStore(akka://deepdive)][PostgresInferenceDataStoreComponent$PostgresInferenceDataStore] INFO [Error] Please check the SQL cmd!
21:53:57.644 [default-dispatcher-5][inferenceManager][OneForOneStrategy] ERROR ERROR: duplicate key violates unique constraint "dd_graph_variables_pkey" (seg20 rulk.stanford.edu:40000 pid=25436)
org.postgresql.util.PSQLException: ERROR: duplicate key violates unique constraint "dd_graph_variables_pkey" (seg20 rulk.stanford.edu:40000 pid=25436)
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2157) ~[postgresql-9.2-1003-jdbc4.jar:na]
What seems to cause the error is that the variables "smokes" and "has_cancer" are in the same table; the system tries to use the row ID as the variable ID, which fails because variables cannot have duplicate IDs...
Any suggestions?
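One possible direction, purely as an illustration (not a confirmed fix): mint a distinct ID per variable column instead of reusing the row ID, e.g. by offsetting the second column's IDs past the first column's range:
-- Hypothetical sketch: shift the has_cancer variable IDs past the
-- largest smokes variable ID so the two columns cannot collide.
INSERT INTO dd_graph_variables(id, data_type, initial_value, is_evidence, cardinality)
SELECT people.id + (SELECT max(id) + 1 FROM people), 'Boolean',
       people.has_cancer::int, (people.has_cancer IS NOT NULL), null
FROM people;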
There are potential errors in the new function executeSql in src/main/scala/org/deepdive/extraction/ExtractorRunner.scala. Is the last commit well-tested? @senwu
I set sql: "select * from articles limit 10;" and style: "sql_extractor" in an extractor, and the error looks like this:
22:04:29 [PostgresExtractionDataStore(akka://deepdive)] ERROR org.postgresql.util.PSQLException: A result was returned when none was expected.
22:04:29 [PostgresExtractionDataStore(akka://deepdive)] INFO [Error] Please check the SQL cmd!
22:04:29 [extractorRunner-ext_test_sql] ERROR A result was returned when none was expected.
org.postgresql.util.PSQLException: A result was returned when none was expected.
When I tried to build my own code on top of this function, I also got errors like:
21:43:47 [PostgresExtractionDataStore(akka://deepdive)] ERROR org.postgresql.util.PSQLException: No value specified for parameter 1.
@dennybritz: What is the right way to execute a SQL query somewhere other than grounding?
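Judging from the first error, sql_extractor appears to execute its statement expecting no result set, so a bare SELECT fails. Under that assumption, a sketch that writes into a table instead (article_sample is a hypothetical, pre-created table):
ext_test_sql {
  style: "sql_extractor"
  # Assumes article_sample already exists with the same schema as articles.
  sql: "INSERT INTO article_sample SELECT * FROM articles LIMIT 10"
}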
I am getting the following error, although the setup is nearly identical to the deepdive_spouse example, and all the dd factor tables in PostgreSQL are populated (shown below).
The application.conf file is available at https://github.com/tomMulholland/isDB
17:46:26.559 [Thread-23][sampler][Sampler] INFO 17:46:26.559 [main] DEBUG org.dennybritz.sampler.Runner$ - Creating factor graph...
17:46:26.640 [Thread-23][sampler][Sampler] INFO 17:46:26.639 [main] DEBUG org.dennybritz.sampler.Runner$ - Starting learning phase...
17:46:27.586 [Thread-23][sampler][Sampler] INFO 17:46:27.585 [main] DEBUG org.dennybritz.sampler.Learner - num_iterations=120
17:46:27.587 [Thread-23][sampler][Sampler] INFO 17:46:27.585 [main] DEBUG org.dennybritz.sampler.Learner - num_samples_per_iteration=1
17:46:27.587 [Thread-23][sampler][Sampler] INFO 17:46:27.586 [main] DEBUG org.dennybritz.sampler.Learner - learning_rate=0.1
17:46:27.588 [Thread-23][sampler][Sampler] INFO 17:46:27.587 [main] DEBUG org.dennybritz.sampler.Learner - diminish_rate=0.95
17:46:27.588 [Thread-23][sampler][Sampler] INFO 17:46:27.587 [main] DEBUG org.dennybritz.sampler.Learner - regularization_constant=0.01
17:46:27.589 [Thread-23][sampler][Sampler] INFO 17:46:27.587 [main] DEBUG org.dennybritz.sampler.Learner - num_factors=267260 num_query_factors=75456
17:46:27.590 [Thread-23][sampler][Sampler] INFO 17:46:27.587 [main] DEBUG org.dennybritz.sampler.Learner - num_weights=143009 num_query_weights=49011
17:46:27.590 [Thread-23][sampler][Sampler] INFO 17:46:27.587 [main] DEBUG org.dennybritz.sampler.Learner - num_query_variables=1791 num_evidence_variables=1227
17:46:27.751 [Thread-23][sampler][Sampler] INFO 17:46:27.750 [main] DEBUG org.dennybritz.sampler.Learner - iteration=0 learning_rate=0.1
Exception in thread "main" scala.collection.parallel.CompositeThrowable: Multiple exceptions thrown during a parallel computation: java.lang.UnsupportedOperationException: empty.reduceLeft
scala.collection.LinearSeqOptimized$class.reduceLeft(LinearSeqOptimized.scala:124)
scala.collection.immutable.List.reduceLeft(List.scala:84)
scala.collection.TraversableOnce$class.reduce(TraversableOnce.scala:195)
scala.collection.AbstractTraversable.reduce(Traversable.scala:105)
org.dennybritz.sampler.SamplingUtils$.sampleVariable(SamplingUtils.scala:34)
org.dennybritz.sampler.SamplingUtils$$anonfun$sampleVariables$1.apply$mcVI$sp(SamplingUtils.scala:42)
org.dennybritz.sampler.SamplingUtils$$anonfun$sampleVariables$1.apply(SamplingUtils.scala:42)
org.dennybritz.sampler.SamplingUtils$$anonfun$sampleVariables$1.apply(SamplingUtils.scala:42)
scala.collection.Iterator$class.foreach(Iterator.scala:727)
scala.collection.parallel.immutable.ParHashSet$ParHashSetIterator.foreach(ParHashSet.scala:76)
.
.
.
at scala.collection.parallel.package$$anon$1.alongWith(package.scala:85)
at scala.collection.parallel.Task$class.mergeThrowables(Tasks.scala:86)
at scala.collection.parallel.ParIterableLike$Foreach.mergeThrowables(ParIterableLike.scala:972)
at scala.collection.parallel.Task$class.tryMerge(Tasks.scala:72)
at scala.collection.parallel.ParIterableLike$Foreach.tryMerge(ParIterableLike.scala:972)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:190)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:514)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:162)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:514)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
17:46:28.161 [default-dispatcher-11][inferenceManager][OneForOneStrategy] ERROR sampling failed (see error log for more details)
java.lang.RuntimeException: sampling failed (see error log for more details)
at org.deepdive.inference.Sampler$$anonfun$receive$1.applyOrElse(Sampler.scala:36) ~[classes/:na]
at akka.actor.Actor$class.aroundReceive(Actor.scala:467) ~[akka-actor_2.10-2.3-M2.jar:2.3-M2]
at org.deepdive.inference.Sampler.aroundReceive(Sampler.scala:17) ~[classes/:na]
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:491) [akka-actor_2.10-2.3-M2.jar:2.3-M2]
at akka.actor.ActorCell.invoke(ActorCell.scala:462) [akka-actor_2.10-2.3-M2.jar:2.3-M2]
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237) [akka-actor_2.10-2.3-M2.jar:2.3-M2]
at akka.dispatch.Mailbox.run(Mailbox.scala:219) [akka-actor_2.10-2.3-M2.jar:2.3-M2]
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:385) [akka-actor_2.10-2.3-M2.jar:2.3-M2]
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library.jar:na]
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library.jar:na]
17:46:28.164 [default-dispatcher-11][sampler][LocalActorRef] INFO Message [akka.actor.PoisonPill$] from Actor[akka://deepdive/user/inferenceManager#-1596663203] to Actor[akka://deepdive/user/inferenceManager/sampler#-1865594242] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
17:46:28.165 [default-dispatcher-4][inferenceManager][InferenceManager$PostgresInferenceManager] INFO Starting
17:46:28.166 [default-dispatcher-11][factorGraphBuilder][FactorGraphBuilder$PostgresFactorGraphBuilder] INFO Starting
17:46:56.074 [default-dispatcher-4][taskManager][TaskManager] INFO Memory usage: 213/962MB (max: 962MB)
The DeepDive tables are populated:
isDB=# SELECT schemaname,relname,n_live_tup
isDB-# FROM pg_stat_user_tables
isDB-# ORDER BY n_live_tup DESC;
schemaname | relname | n_live_tup
------------+-----------------------------------------+------------
public | schol_features | 267260
public | f_is_schol_features_query | 267260
public | selectedgesfordumpsql_raw | 267260
public | dd_graph_edges | 267260
public | selectfactorsfordumpsql_raw | 267260
public | dd_graph_factors | 267260
public | dd_graph_weights | 143009
public | selectweightsfordumpsql_raw | 143009
public | selectvariablesfordumpsql_raw | 3018
public | dd_graph_variables | 3018
public | scholarships | 3018
public | dd_graph_variables_map | 3018
public | websites | 852
public | schol_int_study | 645
public | financial_aid | 489
public | dd_graph_variables_holdout | 269
public | factornum | 2
public | scholarships_is_scholarship_cardinality | 1
(18 rows)
nfactor
---------
0
267260
(2 rows)
Can you show me an example in application.conf of how to use the new custom holdout query? I really need that. Thanks!
NOTE: This example wrongly refers to variable "id"s and needs rewriting.
@feiranwang @msushkov
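For reference, the custom holdout query from the first issue above would look roughly like this in application.conf (a sketch; I'm assuming it sits under calibration, so adjust the nesting to your config layout, and note the trailing ';' required by the bug reported at the top):
deepdive {
  calibration.holdout_query: "INSERT INTO dd_graph_variables_holdout(variable_id) select id from candidate where docid in (select docid from eval_docs);"
}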
Restore the views that the old system gave users, and enable users to reuse commands like relearn_from and weight_table.
Users should be able to perform only the extractions while skipping grounding, learning, inference, and calibration. Alternatively, a more flexible pipeline mechanism should be supported; a sketch follows.
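A possible shape for such a pipeline mechanism (hypothetical config; the pipeline keys below are a proposal, not an existing setting):
deepdive.pipeline.run: "extraction_only"
deepdive.pipeline.pipelines {
  # Run only the named extractors; grounding, learning, inference,
  # and calibration would be skipped.
  extraction_only: ["wordsExtractor"]
}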
udf_extractor (the default one) is really a bad name, since the plpy/tsv extractors also have a "udf". We will soon rename it to json_extractor.
There is no exception handling for database connection errors, so the program runs forever...
We should do a sanity check on the configuration upon loading. Instead of the application crashing in the middle of execution due to a configuration issue, we should immediately exit if we find an obvious mistake. Some things we can check for:
There are probably more things we can check for.
Currently, users need to interpret the calibration plots manually. It would be great if we could automatically give them recommendations based on the calibration data.
A general version not optimized for Greenplum will be fine.
What are common features for IE applications? Dependency paths, etc.
People can currently write pure SQL extractors by using an empty extractor, but that's a hack. We should have a principled way to allow pure SQL extractors. The difficulty here is the assignment of unique variable IDs; the sketch below makes it concrete.
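A pure SQL extractor would need something like a shared sequence to mint globally unique variable IDs (a hypothetical sketch; dd_variable_id_seq is made up, and words(id, word) is assumed from the earlier wordsExtractor example):
-- A shared sequence yields IDs that are unique across all variable
-- relations, which reusing a per-table row ID cannot guarantee.
CREATE SEQUENCE dd_variable_id_seq;
INSERT INTO words(id, word)
SELECT nextval('dd_variable_id_seq'), w.word
FROM (SELECT unnest(string_to_array(title, ' ')) AS word FROM titles) w;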