Comments (6)
Thanks for the ticket. It looks like there are two different issues in this ticket:
- cernopendata-client inconsistent with the results from the web.
- Issues with the updates
In this ticket I will focus on the second issue. For the first one, it might be better to create a dedicated ticket on the cernopendata-client repo.
I've created an empty instance locally, populated it only with the file mentioned in the ticket, then executed the command again in replace mode, and I can't reproduce the issue yet. I'll do the replace several more times to see if I can reproduce it.
Do you have the same issue with other files? This one is the largest, with the most entries to process. Since all the entries are processed in one transaction, I wonder if the transaction is too big.
From the logs posted in this ticket, you no longer have access to the message of the original exception mentioned in "This Session's transaction has been rolled back due to a previous exception during flush.", do you?
from opendata.cern.ch.
cernopendata-client inconsistent with the results from the web
I don't think there is any problem with cernopendata-client as such. I used it simply to discover the problems automatically. You can find the same problems manually by browsing the web, or by calling the REST API directly, such as:
$ curl http://opendata-qa.cern.ch/api/records/7794
{
"message": "The server could not verify that you are authorized to access the URL requested. You either supplied the wrong credentials (e.g. a bad password), or your browser doesn't understand how to supply the credentials required.",
"status": 401
}
I've created an empty instance locally, populated it only with the file mentioned in the ticket, then executed the command again in replace mode, and I can't reproduce the issue yet.
Have you followed exactly the procedure I mentioned? That is, load the old file (without the file information of the records concerned), then update with the new file, and then update once again? I can reproduce the problem this way.
Do you have the same issue with other files?
Haven't tried with other records, since I wanted to reproduce locally exactly the problem we were seeing on DEV and QA. But I can try to reproduce with a small file if it would be useful.
From the logs posted in this ticket, you no longer have access to the message of the original exception mentioned in "This Session's transaction has been rolled back due to a previous exception during flush.", do you?
Nope, but I have just reproduced the problem, so here it is:
...
Record recid 7776 updated.
Record recid 7781 updated.
Record recid 7782 updated.
Record recid 7784 updated.
Record recid 7785 updated.
Record recid 7786 updated.
Recid 7787 file CMS_mc_Summer12_DR53X_GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6_AODSIM_PU_S10_START53_V19-v1_20000_file_index.json could not be loaded due to (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "uq_files_files_uri"
DETAIL: Key (uri)=(root://eospublic.cern.ch//eos/opendata/cms/mc/Summer12_DR53X/GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6/AODSIM/PU_S10_START53_V19-v1/file-indexes/CMS_mc_Summer12_DR53X_GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6_AODSIM_PU_S10_START53_V19-v1_20000_file_index.json) already exists.
[SQL: INSERT INTO files_files (created, updated, id, uri, storage_class, size, checksum, readable, writable, last_check_at, last_check) VALUES (%(created)s, %(updated)s, %(id)s, %(uri)s, %(storage_class)s, %(size)s, %(checksum)s, %(readable)s, %(writable)s, %(last_check_at)s, %(last_check)s)]
[parameters: {'created': datetime.datetime(2024, 5, 14, 6, 52, 9, 585255), 'updated': datetime.datetime(2024, 5, 14, 6, 52, 9, 585262), 'id': UUID('9b73ef1d-d5af-4f78-867a-af2695fb9a9a'), 'uri': 'root://eospublic.cern.ch//eos/opendata/cms/mc/Summer12_DR53X/GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6/AODSIM/PU_S10_START53_V19-v1/file-indexes/CMS_mc_Summer12_DR53X_GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6_AODSIM_PU_S10_START53_V19-v1_20000_file_index.json', 'storage_class': 'S', 'size': 4305, 'checksum': 'adler32:dc00a245', 'readable': True, 'writable': False, 'last_check_at': None, 'last_check': True}]
(Background on this error at: https://sqlalche.me/e/14/gkpj).
Recid 7787 file CMS_mc_Summer12_DR53X_GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6_AODSIM_PU_S10_START53_V19-v1_20000_file_index.txt could not be loaded due to This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "uq_files_files_uri"
DETAIL: Key (uri)=(root://eospublic.cern.ch//eos/opendata/cms/mc/Summer12_DR53X/GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6/AODSIM/PU_S10_START53_V19-v1/file-indexes/CMS_mc_Summer12_DR53X_GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6_AODSIM_PU_S10_START53_V19-v1_20000_file_index.json) already exists.
[SQL: INSERT INTO files_files (created, updated, id, uri, storage_class, size, checksum, readable, writable, last_check_at, last_check) VALUES (%(created)s, %(updated)s, %(id)s, %(uri)s, %(storage_class)s, %(size)s, %(checksum)s, %(readable)s, %(writable)s, %(last_check_at)s, %(last_check)s)]
[parameters: {'created': datetime.datetime(2024, 5, 14, 6, 52, 9, 585255), 'updated': datetime.datetime(2024, 5, 14, 6, 52, 9, 585262), 'id': UUID('9b73ef1d-d5af-4f78-867a-af2695fb9a9a'), 'uri': 'root://eospublic.cern.ch//eos/opendata/cms/mc/Summer12_DR53X/GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6/AODSIM/PU_S10_START53_V19-v1/file-indexes/CMS_mc_Summer12_DR53X_GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6_AODSIM_PU_S10_START53_V19-v1_20000_file_index.json', 'storage_class': 'S', 'size': 4305, 'checksum': 'adler32:dc00a245', 'readable': True, 'writable': False, 'last_check_at': None, 'last_check': True}]
(Background on this error at: https://sqlalche.me/e/14/gkpj) (Background on this error at: https://sqlalche.me/e/14/7s2a).
Traceback (most recent call last):
File "/opt/invenio/var/instance/python/bin/cernopendata", line 33, in <module>
sys.exit(load_entry_point('cernopendata', 'console_scripts', 'cernopendata')())
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/click/core.py", line 1719, in invoke
rv.append(sub_ctx.command.invoke(sub_ctx))
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/flask/cli.py", line 357, in decorator
return __ctx.invoke(f, *args, **kwargs)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/code/cernopendata/modules/fixtures/cli.py", line 249, in records
record.files.flush()
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/invenio_records_files/api.py", line 270, in files
record_id=self.id
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/invenio_records/api.py", line 87, in id
return self.model.id if self.model else None
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 487, in __get__
return self.impl.get(state, dict_)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 959, in get
value = self._fire_loader_callables(state, key, passive)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/attributes.py", line 990, in _fire_loader_callables
return state._load_expired(state, passive)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/state.py", line 712, in _load_expired
self.manager.expired_attribute_loader(self, toload, passive)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/loading.py", line 1451, in load_scalar_attributes
result = load_on_ident(
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/loading.py", line 407, in load_on_ident
return load_on_pk_identity(
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/loading.py", line 530, in load_on_pk_identity
session.execute(
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1665, in execute
) = compile_state_cls.orm_pre_session_exec(
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/context.py", line 312, in orm_pre_session_exec
session._autoflush()
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 2253, in _autoflush
self.flush()
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3449, in flush
self._flush(objects)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3478, in _flush
self.dispatch.before_flush(self, flush_context, objects)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/event/attr.py", line 247, in __call__
fn(*args, **kw)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy_continuum/manager.py", line 343, in before_flush
uow = self.unit_of_work(session)
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy_continuum/manager.py", line 305, in unit_of_work
conn = session.connection()
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1545, in connection
return self._connection_for_bind(
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1555, in _connection_for_bind
return self._transaction._connection_for_bind(
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 724, in _connection_for_bind
self._assert_active()
File "/opt/invenio/var/instance/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 604, in _assert_active
raise sa_exc.PendingRollbackError(
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "uq_files_files_uri"
DETAIL: Key (uri)=(root://eospublic.cern.ch//eos/opendata/cms/mc/Summer12_DR53X/GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6/AODSIM/PU_S10_START53_V19-v1/file-indexes/CMS_mc_Summer12_DR53X_GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6_AODSIM_PU_S10_START53_V19-v1_20000_file_index.json) already exists.
[SQL: INSERT INTO files_files (created, updated, id, uri, storage_class, size, checksum, readable, writable, last_check_at, last_check) VALUES (%(created)s, %(updated)s, %(id)s, %(uri)s, %(storage_class)s, %(size)s, %(checksum)s, %(readable)s, %(writable)s, %(last_check_at)s, %(last_check)s)]
[parameters: {'created': datetime.datetime(2024, 5, 14, 6, 52, 9, 585255), 'updated': datetime.datetime(2024, 5, 14, 6, 52, 9, 585262), 'id': UUID('9b73ef1d-d5af-4f78-867a-af2695fb9a9a'), 'uri': 'root://eospublic.cern.ch//eos/opendata/cms/mc/Summer12_DR53X/GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6/AODSIM/PU_S10_START53_V19-v1/file-indexes/CMS_mc_Summer12_DR53X_GluGlu_NMSSM_BBandA1_A1ToMuMu_mA1-40_8TeV_pythia6_AODSIM_PU_S10_START53_V19-v1_20000_file_index.json', 'storage_class': 'S', 'size': 4305, 'checksum': 'adler32:dc00a245', 'readable': True, 'writable': False, 'last_check_at': None, 'last_check': True}]
(Background on this error at: https://sqlalche.me/e/14/gkpj) (Background on this error at: https://sqlalche.me/e/14/7s2a)
/usr/lib64/python3.9/site-packages/XRootD/client/finalize.py:46: DeprecationWarning: Importing 'itsdangerous.json' is deprecated and will be removed in ItsDangerous 2.1. Use Python's 'json' module instead.
if isinstance(obj, File) and obj.is_open():
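The log above shows two stacked failures: first a unique-constraint violation on the `uri` column when the same file row is INSERTed a second time, and then the follow-up PendingRollbackError because the session was not rolled back. The first part can be illustrated in isolation with a minimal sketch. Note that this is not the cernopendata or Invenio code: the table is a simplified stand-in for `files_files`, it uses SQLite from the Python standard library instead of PostgreSQL, and the function names `make_db`, `naive_insert` and `insert_or_reuse` are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory table with a UNIQUE constraint on uri.

    Simplified, hypothetical stand-in for the real files_files table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files_files ("
        "id INTEGER PRIMARY KEY, uri TEXT NOT NULL UNIQUE, size INTEGER)"
    )
    return conn


def naive_insert(conn, uri, size):
    """Always INSERT; raises sqlite3.IntegrityError if the uri already exists."""
    # "with conn" commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT INTO files_files (uri, size) VALUES (?, ?)", (uri, size)
        )


def insert_or_reuse(conn, uri, size):
    """Look the uri up first; update the existing row or insert a new one.

    Returns the id of the row that now holds this uri.
    """
    with conn:
        row = conn.execute(
            "SELECT id FROM files_files WHERE uri = ?", (uri,)
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE files_files SET size = ? WHERE id = ?", (size, row[0])
            )
            return row[0]
        cur = conn.execute(
            "INSERT INTO files_files (uri, size) VALUES (?, ?)", (uri, size)
        )
        return cur.lastrowid
```

Calling `naive_insert` twice with the same `uri` fails on the second call, which mirrors the UniqueViolation in the log. `insert_or_reuse` can be repeated any number of times and leaves a single row. Whether the real fix belongs in the fixture loader or elsewhere is not something this sketch can tell.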
The permission error is likely related to the permission of the file on eos:
[psaiz@aiadm08 ~]$ ls -al /eos/opendata/cms/mc/Summer12_DR53X/GluGlu_NMSSM_H2ToH1H1_H1To2Mu2B_mH2-125_mH1-60_8TeV_pythia6/AODSIM/PU_S10_START53_V19-v1/file-indexes/
total 16
drwxr-xr-x. 2 simko us 4096 Apr 30 15:07 .
drwxr-xr-x. 2 cmsrucio def-cg 4096 Apr 30 15:07 ..
-rw-r-----. 1 simko us 4777 Apr 30 15:07 CMS_mc_Summer12_DR53X_GluGlu_NMSSM_H2ToH1H1_H1To2Mu2B_mH2-125_mH1-60_8TeV_pythia6_AODSIM_PU_S10_START53_V19-v1_00000_file_index.json
-rw-r-----. 1 simko us 2772 Apr 30 15:07 CMS_mc_Summer12_DR53X_GluGlu_NMSSM_H2ToH1H1_H1To2Mu2B_mH2-125_mH1-60_8TeV_pythia6_AODSIM_PU_S10_START53_V19-v1_00000_file_index.txt
Changing the permission there might solve the issue.
Thanks for the info about the duplicate. I'll see if I can reproduce it.
The permission error is likely related to the permission of the file on eos:
It shouldn't be related to the index file permissions, because from another open data deployment (that points to the same index file) the record is accessible without problems, e.g. compare:
$ curl http://opendata-qa.cern.ch/api/records/7794
$ curl http://opendata-dev.cern.ch/api/records/7794
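The same comparison can be scripted when many records need checking. The following is only a rough sketch using the Python standard library. The helper names `fetch_status`, `compare_record` and `mismatched` are made up for illustration, and the sketch only looks at HTTP status codes, not at the record content.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, including error codes such as 401."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still what we want.
        return err.code


def compare_record(recid, hosts, fetch=fetch_status):
    """Map each host to the status code of its /api/records/<recid> endpoint."""
    return {host: fetch(f"http://{host}/api/records/{recid}") for host in hosts}


def mismatched(statuses):
    """True if the deployments did not all answer with the same status code."""
    return len(set(statuses.values())) > 1
```

For example, `compare_record(7794, ["opendata-qa.cern.ch", "opendata-dev.cern.ch"])` would query both deployments, and `mismatched(...)` on the result flags records where they disagree. The `fetch` parameter is there so the logic can be exercised without network access.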
Haven't tried with other records, since I wanted to reproduce locally exactly the problem we were seeing on DEV and QA. But I can try to reproduce with a small file if it would be useful.
I have managed to reproduce the problem with a file containing a single record. Here's the recipe:
$ docker exec -i -t opendatacernch-web-1 /code/scripts/populate-instance.sh --skip-records --skip-glossary --skip-docs
$ cat cernopendata/modules/fixtures/data/records/cms-tools-vm-image-2012.json | jq 'del( .[] ["files"])' > cernopendata/modules/fixtures/data/records/cms-tools-vm-image-2012-nofiles.json
$ docker exec -i -t opendatacernch-web-1 cernopendata fixtures records --mode insert -f cernopendata/modules/fixtures/data/records/cms-tools-vm-image-2012-nofiles.json
$ docker exec -i -t opendatacernch-web-1 cernopendata fixtures records --mode insert-or-replace -f cernopendata/modules/fixtures/data/records/cms-tools-vm-image-2012.json
$ docker exec -i -t opendatacernch-web-1 cernopendata fixtures records --mode insert-or-replace -f cernopendata/modules/fixtures/data/records/cms-tools-vm-image-2012.json
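For anyone without `jq` at hand, the second step of the recipe (producing the "nofiles" fixture) can be approximated in Python. This is a sketch that assumes the fixture is a JSON list of record objects, as the `jq` filter `del(.[]["files"])` implies. The function names `strip_files` and `write_nofiles_fixture` are hypothetical.

```python
import json


def strip_files(records):
    """Return copies of the records without their "files" key."""
    return [
        {key: value for key, value in record.items() if key != "files"}
        for record in records
    ]


def write_nofiles_fixture(src_path, dst_path):
    """Read a records fixture and write a copy with all "files" entries removed."""
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(strip_files(records), dst, indent=2)
```

Records that have no "files" key are passed through unchanged, which matches how the `jq` filter behaves on them.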