apache / incubator-datalab
Apache DataLab (incubating)
Home Page: https://datalab.apache.org/
License: Apache License 2.0
Acceptance criteria:
As of 14/05/2020 we support Spark 2.4.4; we need to add support for Spark 3.0.0-preview2.
FQDN should be assigned during instance creation.
Acceptance criteria:
Add 'Bucket Browser Actions' to the roles on the administrative page, so that the administrator can distinguish bucket access and other bucket-browser permissions among users.
'Bucket Browser Actions' should include the following items:
Acceptance criteria:
User has access to the endpoint_shared bucket and to the project bucket (only if he is assigned to that project), or to a custom bucket
Another user does not have access to the project bucket (if he is not assigned to that project)
User can upload files to and download files from the bucket
User can create/delete folders
User can delete files
User can copy a folder/file path
User can see the bucket structure (tree)
User can open the bucket manager from the Notebook name popup and via the 'Bucket browser' button on the 'List of resources' page.
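The access rules above can be sketched as a small permission check. This is a hypothetical helper: the bucket naming scheme and the assignment lookup are assumptions for illustration, not DataLab's actual API.

```python
def can_access_bucket(user_projects, bucket_name, endpoint):
    """Return True if a user may open the bucket in the bucket browser.

    user_projects: set of project names the user is assigned to.
    bucket_name:   e.g. 'project1-bucket' or '<endpoint>-shared-bucket'
                   (naming convention assumed for this sketch).
    """
    # The endpoint-shared bucket is readable by every user of that endpoint.
    if bucket_name == f"{endpoint}-shared-bucket":
        return True
    # A project bucket is accessible only to users assigned to that project.
    for project in user_projects:
        if bucket_name == f"{project}-bucket":
            return True
    return False
```

A user assigned to project1 can open both the shared bucket and project1's bucket, while a user from another project is rejected for project1's bucket.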
Hi,
I have been running a Datalab instance but am not seeing any data under the billing section in the front end. AWS is definitely incurring costs as a result of Datalab ($40 for the month of June).
Is there some specific condition that needs to be fulfilled before billing is populated?
This is the command I used to create the Datalab:
/usr/bin/python3 ~/incubator-datalab/infrastructure-provisioning/scripts/deploy_datalab.py \
    --conf_service_base_name datalab-base-name \
    --conf_tag_resource_id datalab-resource-id \
    --conf_os_family debian \
    --key_path /home/ubuntu/.ssh/ \
    --conf_key_name datalab \
    --action create \
    --keycloak_realm_name master \
    --keycloak_user XXXXXXXX \
    --keycloak_user_password XXXXXXXX \
    --keycloak_auth_server_url http://XX.XX.XXX.XX:8080 \
    'aws' \
    --aws_region eu-west-1 \
    --aws_zone eu-west-1a \
    --aws_ssn_instance_size t2.medium \
    --aws_billing_bucket datalabbilling \
    --aws_account_id XXXXXXXXXXX \
    --aws_access_key XXXXXXXXXXXXXXX \
    --aws_secret_access_key XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
An excerpt from the logs in /var/opt/datalab/log/ssn/billing.log:
2023-07-03 11:25:30.107 INFO 94517 --- [cluster-ClusterId{value='64a2b029b4d0cc7135f64a6e', description='null'}-localhost:27017] org.mongodb.driver.cluster : Discovered cluster type of STANDALONE
2023-07-03 11:25:30.919 INFO 94517 --- [main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8088 (https) with context path '/api/billing'
2023-07-03 11:25:30.922 INFO 94517 --- [main] com.epam.datalab.BillingAwsApplication : Started BillingAwsApplication in 21.602 seconds (JVM running for 23.418)
2023-07-03 11:25:30.926 DEBUG 94517 --- [main] com.epam.datalab.BillingServiceImpl : Billing report configuration file: /opt/datalab/conf/billing.yml
INFO [2023-07-03 11:30:00,751] org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/api/billing]: Initializing Spring DispatcherServlet 'dispatcherServlet'
INFO [2023-07-03 11:30:00,751] org.springframework.web.servlet.DispatcherServlet: Initializing Servlet 'dispatcherServlet'
INFO [2023-07-03 11:30:00,760] org.springframework.web.servlet.DispatcherServlet: Completed initialization in 9 ms
DEBUG [2023-07-03 11:30:01,109] com.epam.datalab.module.aws.AdapterS3File: Adapter S3 will be opened for READ
DEBUG [2023-07-03 11:30:02,317] com.epam.datalab.module.aws.AdapterS3File: New report files in bucket folder datalabbilling not found
DEBUG [2023-07-03 11:30:02,318] com.epam.datalab.module.aws.AdapterS3File: Adapter S3 has been opened
DEBUG [2023-07-03 11:30:02,318] com.epam.datalab.core.parser.ParserByLine: Source data has multy entry true
DEBUG [2023-07-03 11:45:00,243] com.epam.datalab.module.aws.AdapterS3File: Adapter S3 will be opened for READ
DEBUG [2023-07-03 11:45:00,332] com.epam.datalab.module.aws.AdapterS3File: New report files in bucket folder datalabbilling not found
DEBUG [2023-07-03 11:45:00,332] com.epam.datalab.module.aws.AdapterS3File: Adapter S3 has been opened
DEBUG [2023-07-03 11:45:00,332] com.epam.datalab.core.parser.ParserByLine: Source data has multy entry true
An excerpt from the logs in /var/opt/datalab/log/ssn/selfservice.log:
INFO [2023-07-03 11:25:30,534] org.eclipse.jetty.server.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@5a654e05{/,null,AVAILABLE}
INFO [2023-07-03 11:25:30,559] org.eclipse.jetty.server.AbstractConnector: Started application@359e27d2{SSL,[ssl, http/1.1]}{0.0.0.0:8443}
INFO [2023-07-03 11:25:30,565] org.eclipse.jetty.server.AbstractConnector: Started admin@277bc3a5{SSL,[ssl, http/1.1]}{0.0.0.0:8444}
INFO [2023-07-03 11:25:30,565] org.eclipse.jetty.server.Server: Started @23057ms
INFO [2023-07-03 11:25:30,616] com.epam.datalab.backendapi.dropwizard.listeners.MongoStartupListener: Populating DataLab default roles into database
INFO [2023-07-03 11:25:30,648] com.epam.datalab.backendapi.dropwizard.listeners.MongoStartupListener: Check for connected endpoints:
connected endpoints: 1
connected clouds: [AWS]
INFO [2023-07-03 11:30:00,075] com.epam.datalab.backendapi.schedulers.CheckInfrastructureStatusScheduler: Trying to update infrastructure statuses
INFO [2023-07-03 11:30:00,194] com.epam.datalab.backendapi.service.impl.InfrastructureInfoServiceImpl: EnvResources is empty: EnvResourceList{host=[], cluster=[]} , didn't send request to provisioning service
INFO [2023-07-03 11:30:00,208] com.epam.datalab.backendapi.schedulers.billing.BillingScheduler: Trying to update billing
INFO [2023-07-03 11:30:02,580] com.epam.datalab.backendapi.service.impl.BillingServiceImpl: Updating billing information for endpoint local. Billing data []
INFO [2023-07-03 11:45:00,064] com.epam.datalab.backendapi.schedulers.CheckInfrastructureStatusScheduler: Trying to update infrastructure statuses
INFO [2023-07-03 11:45:00,094] com.epam.datalab.backendapi.service.impl.InfrastructureInfoServiceImpl: EnvResources is empty: EnvResourceList{host=[], cluster=[]} , didn't send request to provisioning service
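The billing log above repeatedly reports "New report files in bucket folder datalabbilling not found", which suggests no report objects are being delivered to the S3 bucket at all. DataLab's AWS billing module parses detailed billing report files, so a first sanity check is whether the bucket contains any keys matching the report naming scheme. The pattern below follows AWS's legacy Detailed Billing Report ("with resources and tags") file naming and is an assumption for this sketch, not taken from DataLab's code:

```python
import re

# Legacy AWS Detailed Billing Report files are named like:
#   <account-id>-aws-billing-detailed-line-items-with-resources-and-tags-YYYY-MM.csv.zip
# (pattern assumed from AWS's DBR format; verify against your bucket contents)
DBR_PATTERN = re.compile(
    r"^\d+-aws-billing-detailed-line-items-with-resources-and-tags-"
    r"\d{4}-\d{2}\.csv\.zip$"
)

def report_keys(bucket_keys):
    """Filter a list of S3 object keys down to billing report files."""
    return [k for k in bucket_keys if DBR_PATTERN.match(k)]
```

If listing the bucket (e.g. with `aws s3 ls s3://datalabbilling`) yields no keys matching this shape, the likely cause is that detailed billing reports were never enabled in the AWS Billing preferences, so there is nothing for the adapter to parse.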
Acceptance criteria:
For example, if the user has reached the Amazon shapes limit, the user should see the error message: 'Shapes limit is exceeded'.
The content of the error message should depend on the error type.
If we terminate the edge (or the edge has failed), we cannot create a new edge in the same project and the same endpoint.
The edge can end up failed during stopping/starting/creating/terminating.
Statuses for recreate:
edge node is terminated from the Cloud Web Console - recreate should be available
edge node is terminated from the DataLab Web UI - recreate should be available
edge node failed during stopping/starting - return the cloud status - recreate should NOT be available
edge node failed during creating - recreate should be available
edge node failed during terminating - recreate should be available
If at least one instance exists - SMART recreate.
If no instances exist - create all resources.
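The recreate rules above can be encoded as a small decision function. This is a sketch; the status and phase names are assumptions for illustration, not DataLab's actual state machine.

```python
def recreate_allowed(terminated_via=None, failed_during=None):
    """Decide whether the edge node may be recreated, per the rules above.

    Exactly one argument should be set:
      terminated_via: 'cloud_console' or 'datalab_ui'
      failed_during:  'stopping', 'starting', 'creating', 'terminating'
    (names are hypothetical for this sketch)
    """
    if terminated_via in ("cloud_console", "datalab_ui"):
        return True          # terminated edge: recreate should be available
    if failed_during in ("stopping", "starting"):
        return False         # return the cloud status instead of recreating
    if failed_during in ("creating", "terminating"):
        return True
    raise ValueError("unknown edge state")
```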
For example, in the 'Billing report' the edge cost consists of three items:
Acceptance criteria:
Acceptance criteria:
Hi,
Is there a way to deploy Datalab on-prem (preferably using Kubernetes), rather than on AWS/GCP/Azure?
Thanks,
Acceptance criteria:
Should the additional disk be only for the notebook, or for computational resources as well?
View the following link about sparkmagic:
https://github.com/jupyter-incubator/sparkmagic/blob/release/examples/Spark%20Kernel.ipynb
Line 72 in 423fa3a
Recommended upgrade version: 1.3.21
Use proper locale formats for dates and currency across all of DLab.
If several users simultaneously upload many objects via the bucket browser, the SSN becomes overloaded.
So implement a queue for the upload process.
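A minimal sketch of such a bounded upload queue, assuming a fixed worker pool on the SSN side; the worker count and queue size are illustrative, and `upload_fn` stands in for the actual upload call:

```python
import queue
import threading

MAX_WORKERS = 4                          # illustrative: cap concurrent uploads
upload_queue = queue.Queue(maxsize=100)  # bounded: gives back-pressure on bursts

def _worker(upload_fn, results):
    """Consume items until a None sentinel arrives."""
    while True:
        item = upload_queue.get()
        if item is None:
            upload_queue.task_done()
            return
        results.append(upload_fn(item))  # list.append is thread-safe in CPython
        upload_queue.task_done()

def run_uploads(items, upload_fn):
    """Push all items through a fixed-size worker pool and collect results."""
    results = []
    threads = [threading.Thread(target=_worker, args=(upload_fn, results))
               for _ in range(MAX_WORKERS)]
    for t in threads:
        t.start()
    for item in items:
        upload_queue.put(item)           # blocks when the queue is full
    for _ in threads:
        upload_queue.put(None)           # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

The bounded `Queue` is the key design choice: when too many uploads arrive at once, producers block instead of piling unbounded work onto the SSN.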
Acceptance criteria:
incubator-datalab/services/billing-aws/pom.xml
Lines 132 to 134 in 423fa3a
Recommended upgrade version: 30.0-jre
If a user (a Project_admin of another project, or a non-admin) has the notebook link of another user, he can open that notebook with his own credentials and view the other user's files on it.
So we should limit access to this link on the DevOps side (at the Keycloak level).
An example is https://github.com/apache/poi/blob/trunk/SECURITY.md
It's important that users know how to securely disclose vulnerabilities.
1. It should be the single source from which changes are performed.
2. Route and distribute all traffic through this sandbox.
How it works now:
How should it work:
Add the possibility to turn billing on/off, even for the user's own resources.
On the administration page, add a role option: 'View full billing report for currently logged in user'.
So billing consists of:
If the user does not select any option, billing is disabled -> the 'Billing report' page is not available
If the user selects 'View billing report for all users', 'View full billing report for currently logged in user' is automatically selected as well.
If the user selects only 'View full billing report for currently logged in user', only this item is checked and billing is available only for his own resources.
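The checkbox semantics above can be sketched as follows (a hypothetical helper, not DataLab's actual role model):

```python
def billing_visibility(view_all=False, view_own=False):
    """Resolve the two billing role checkboxes into effective access.

    Returns (billing_enabled, scope), where scope is 'all', 'own', or None.
    """
    if view_all:
        # Selecting the all-users report implies the own-resources report too.
        return True, "all"
    if view_own:
        return True, "own"
    # Neither option selected: the 'Billing report' page is hidden entirely.
    return False, None
```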
Acceptance criteria:
Acceptance criteria:
I am running the deploy_datalab.py script with the below command:
/usr/bin/python3 /home/vboxuser/incubator-datalab/infrastructure-provisioning/scripts/deploy_datalab.py \
    --conf_service_base_name datalab_poc \
    --conf_os_family debian \
    --key_path /home/vboxuser/key \
    --conf_key_name datalabs_key \
    --conf_tag_resource_id datalab \
    --keycloak_auth_server_url XXXXXXXXXXXXXXX \
    --keycloak_realm_name master \
    --keycloak_user XXXXX \
    --keycloak_user_password XXXXX \
    --action create \
    'aws' \
    --aws_access_key XXXXXXXXXXXXXXX \
    --aws_secret_access_key "XXXXXXXXXXXXXXX " \
    --aws_account_id XXXXXXXXXXXXXXX \
    --aws_region XX-XXXX-X \
    --aws_zone XXXX-XXX
When the script gets to the part where it attempts an SSH connection to the EC2 instance that was created, it makes 15 attempts and each seems to succeed, but overall it fails. An excerpt from the logs is attached.
Thank you.
Is there a guide on installing Datalab in an existing Google Kubernetes cluster?
...
[INFO] Reactor Summary for dlab 1.0:
[INFO]
[INFO] dlab ............................................... FAILURE [ 9.322 s]
[INFO] common ............................................. SKIPPED
[INFO] dlab-utils ......................................... SKIPPED
[INFO] dlab-model ......................................... SKIPPED
[INFO] dlab-webapp-common ................................. SKIPPED
[INFO] provisioning-service ............................... SKIPPED
[INFO] dlab-mongo-migration ............................... SKIPPED
[INFO] self-service ....................................... SKIPPED
[INFO] billing-azure ...................................... SKIPPED
[INFO] billing-gcp ........................................ SKIPPED
[INFO] billing-aws ........................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.231 s
[INFO] Finished at: 2020-11-17T13:24:59+01:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.rat:apache-rat-plugin:0.7:check (default) on project dlab: Too many unapproved licenses: 1 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Acceptance criteria:
Link for open-source project https://grafana.com
Link for GitHub: https://github.com/grafana/grafana
Discuss where the version should be stored, what the list should contain, and so on.
Audit at the DLab level
Possibility to find out:
The change history should have the following structure:
user → time → action
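The user → time → action ordering could be rendered like this (a sketch; the formatting helper and field choices are hypothetical):

```python
from datetime import datetime, timezone

def audit_line(user, action, when=None):
    """Render one audit history entry in the user → time → action order."""
    # Default to the current UTC time when no timestamp is supplied.
    when = when or datetime.now(timezone.utc)
    return f"{user} → {when.isoformat()} → {action}"
```

Keeping the fields in a fixed order makes the history trivially sortable and greppable by user, time, or action.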
Fix the broken link [1] for opening a new issue in CONTRIBUTING.md [2]
[1] https://github.com/epam/DLab/issues
[2] https://github.com/apache/incubator-dlab/blob/master/CONTRIBUTING.md
Acceptance criteria:
incubator-datalab/services/billing-aws/pom.xml
Lines 90 to 94 in 423fa3a
Recommended upgrade version: 5.1.3.FINAL
Notebook links are displayed only for the user's own resources; the administrator does not know another user's notebook link.
So expose notebook links to the administrator on the 'Environment management' page: