classtranscribe / WebAPI
Repository for the .NET Core backend for ClassTranscribe
License: Other
AsNoTracking is described here:
https://docs.microsoft.com/en-us/ef/core/querying/tracking
Hi developers,
I have the following issue when I sign in through UIUC Shibboleth:
POST https://classtranscribe.illinois.edu/api/Account/SignIn 405
Failed to get user data and auth token from backend Error: Request failed with status code 405
at e.exports (createError.js:17)
at e.exports (settle.js:19)
at XMLHttpRequest.f.onreadystatechange (xhr.js:63)
I wonder if there is anything I can do on my end to solve it.
Assuming all files use a random UUID / random hex name and are at least 6 characters long, use a three-level aa/bb/cc (hexadecimal) directory layout, so the number of entries per directory is 100-256 rather than 10000.
e.g. Filename 'f5d32a6c6452.txt' would be in directory f5/d3/2a
e.g. Filename '13254278678.txt' would be in directory 13/25/42
For filenames (excluding the extension) less than 6 characters, just append 6 underscores before calculating the directories.
Map uppercase to lowercase and only allow the digits 0-9 and letters a-z. Map all other characters to underscore.
ABC.txt will be in ab/c_/__
my music.mp3 will be in my/_m/us
Be sure to check that the subdirectories exist before saving the file; e.g. to create the file 'f5d32a6c6452.txt' you don't know whether f5/d3/2a, f5/d3, or even f5 already exists.
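The mapping can be sketched as follows. This is a minimal Python sketch of the stated rule (drop the extension, lowercase, map anything outside 0-9a-z to underscore, pad short names with underscores, split the first six characters into pairs); the function name is illustrative, not part of the codebase.

```python
import os

def shard_dirs(filename):
    """Return the 3-level shard directory (e.g. 'f5/d3/2a') for a stored file.

    Follows the stated rule: drop the extension, lowercase, map any
    character outside 0-9a-z to '_', pad names shorter than 6 characters
    with underscores, then split the first six characters into three
    2-character directory levels.
    """
    stem = os.path.splitext(filename)[0].lower()
    mapped = ''.join(
        c if c.isascii() and (c.isdigit() or 'a' <= c <= 'z') else '_'
        for c in stem
    )
    mapped = (mapped + '______')[:6]  # pad short names, keep first 6 chars
    return '/'.join(mapped[i:i + 2] for i in range(0, 6, 2))
```

Saving a file would then first call `os.makedirs(shard_dirs(name), exist_ok=True)`, which handles the "does f5/d3 already exist?" question in one call.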
Stage 1: Do this for all new arriving files
Stage 2: Remap existing resources i.e. update database of file-type resources to include subdirectory info
Related Comments: Eventually we may need to support remote files on other systems
Allow instructors fine-grained control over the visibility of their content.
The API schemas on the Swagger page are different from the actual data. For example, the schemas for the GET endpoints for Terms and Media are much more complicated than the actual data in the responses.
Most Tasks were already creating a DB context on the fly with
using (var context = CTDbContext.CreateDbContext())
The Transcription task was not; this leads to race conditions and update errors, which are apparent in the taskengine log.
Relevant commit -
d098fd5
protected async override Task OnConsume(string videoId, TaskParameters taskParameters)
{
    using (var _context = CTDbContext.CreateDbContext())
    {
example error log -
taskengine | Error occured in RabbitMQConnection Transcribe for message TaskObject(Data=052b9ac6-0d81-4e63-a203-d3f888a771cc; TaskParameters=TaskParameters(Force = False; Metadata = );
taskengine | System.InvalidOperationException: A second operation started on this context before a previous operation completed.
This is usually caused by different threads using the same instance of DbContext. For more information on how to avoid threading issues with DbContext,
see https://go.microsoft.com/fwlink/?linkid=2097913.
taskengine | at Microsoft.EntityFrameworkCore.Internal.ConcurrencyDetector.EnterCriticalSection()
...
taskengine | at TaskEngine.Tasks.TranscriptionTask.OnConsume(String videoId, TaskParameters taskParameters)
in /src/TaskEngine/Tasks/TranscriptionTask.cs:line 52
taskengine | at CTCommons.RabbitMQConnection.<>c__DisplayClass8_0`1.<b__0>d.MoveNext() in /src/CTCommons/RabbitMQ/RabbitMQConnection.cs:line 104
You can't create a course if the course number for a department does not exist.
Allow instructors to just enter a 1-5 digit number if it does not already exist.
Microsoft Cognitive Services is used by ClassTranscribe for automatic caption generation. This process takes about 0.7x the duration of the video. Often it is interrupted by a "ServiceTimeout" or "ConnectionFailure" exception, which results in restarting caption generation from scratch. A workaround is to resume from the last failure point.
Such as EPubs, images, and other user-generated content.
Currently the frontend uses the latest commit's SHA on the FrontEnd master branch to handle the browser cache.
But there's a race condition, because there's a window of time between a commit to master and when master is deployed. Some clients will see the new commit version while the old API is still deployed, and then won't know to clear the cache when the new version is deployed.
Every 5 hours the periodic task checks for transcription tasks that have not started (and includes logic to avoid multiple attempts). If there are many transcriptions, it is possible that a task from the previous check 5 hours earlier has still not started, so the queue can end up with two tasks for the same video.
Note that two transcriptions of the same video cannot run simultaneously, because the Key logic tracks which transcription is currently in progress; however, they can both run if they do not overlap in time.
The task itself contains no logic to check whether it should exit; there should be some checking.
There is no point adding a task that is already queued. However, since that check is not easy to implement until we have a better task manager, purging the existing queue and rebuilding it would be acceptable.
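The missing duplicate check can be sketched as a filter applied before enqueueing. The field names below are illustrative, not the actual model:

```python
def pending_transcriptions(videos, queued_ids):
    """Return only videos that still need a transcription AND are not
    already in the queue, so the periodic task never enqueues a video
    twice even if a previous batch has not started yet."""
    queued = set(queued_ids)  # O(1) membership checks
    return [v for v in videos
            if v['needs_transcription'] and v['id'] not in queued]
```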
The newly created playlist has its index defaulted to 0. It should instead be the number of playlists in the offering, so that the newly added playlist is placed at the end of the list.
MediaSpace generates its own captions, and it is possible to fetch them using the MediaSpace APIs. The task is to implement a feature that fetches MediaSpace's existing captions and converts them into the formats ClassTranscribe stores.
Add more "smarts" to caption generation
e.g. New Sentences should usually start a new caption line.
Beware of end-of-caption edge cases (there are many...)
See
https://github.com/classtranscribe/WebAPI/compare/MSToVtt
which was based on converting Angrave's word-to-captions Python code. See the heuristics here:
https://github.com/classtranscribe/PythonTools/blob/master/transcribe-cli/ms_json_to_caption.py
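The new-sentence heuristic can be sketched as follows, assuming word-level (text, start, end) tuples from the ASR output. This is a simplification of the linked Python heuristics, and the 42-character limit is illustrative, not the project's actual value:

```python
MAX_CAPTION_CHARS = 42  # illustrative limit, not the project's actual value

def words_to_captions(words, max_chars=MAX_CAPTION_CHARS):
    """Group (text, start, end) word tuples into caption lines.

    Heuristic sketch: start a new caption after sentence-ending
    punctuation, or when the line has reached max_chars.
    """
    captions, current = [], []
    for text, start, end in words:
        current.append((text, start, end))
        line = ' '.join(w[0] for w in current)
        ends_sentence = text.endswith(('.', '?', '!'))
        if ends_sentence or len(line) >= max_chars:
            captions.append({'text': line,
                             'start': current[0][1],
                             'end': current[-1][2]})
            current = []
    if current:  # flush trailing words (one of the end-of-caption edge cases)
        line = ' '.join(w[0] for w in current)
        captions.append({'text': line,
                         'start': current[0][1],
                         'end': current[-1][2]})
    return captions
```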
"You might also add a task to revise the database to allow for a department having several degrees. For example,
ISE has IE and SED
MATH has Math and ASRM
MechSE has ME and EM (Engineering Mechanics)"
Only downloading from YouTube playlists is currently supported. Add support for downloading from YouTube channels.
Auth0 caches credentials, so it is not possible to sign in as someone else without using a new private window.
We used to test APIs through the Swagger UI (/swag/index.html). It is easy to use, but it requires many repetitive manual operations for backend developers to conduct rigorous API testing. The test results are also not trustworthy, because responses are compared with expectations manually. So it is quite necessary to generate a client SDK and package the API-calling code.
Since the main purpose of the client SDK is API testing, we want to be able to generate a C# client SDK from the OpenAPI spec.
All EF queries are "tracking" by default, but this incurs some overhead.
We can use no-tracking queries for read-only queries: "They're quicker to execute because there's no need to set up the change tracking information." (https://docs.microsoft.com/en-us/ef/core/querying/tracking)
Because of the soft-delete feature used by ClassTranscribe, no row is ever deleted from the database; instead it is marked as inactive. This creates an issue when an already "deleted" row is added again, in which case a key conflict occurs.
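One way to avoid the conflict is an "upsert" that reactivates the soft-deleted row instead of inserting a duplicate key. This sketch uses a plain dict and an assumed isDeleted flag in place of the real EF Core entities and status field:

```python
def soft_delete_upsert(table, key, row):
    """Re-add a possibly soft-deleted row without a key conflict.

    'table' is a dict keyed by id, standing in for the real entity set;
    'isDeleted' is an assumed flag name. If the key already exists
    (possibly marked inactive), update and reactivate it instead of
    inserting a second row with the same key.
    """
    existing = table.get(key)
    if existing is not None:
        existing.update(row)
        existing['isDeleted'] = False  # reactivate instead of re-inserting
    else:
        table[key] = {**row, 'isDeleted': False}
```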
Various video watch bugs -
(paraphrased) "When a monitor is set to 16:9 (but not 16:10), CT crops the top and bottom of the videos."
(Suggest reproducing this bug in full screen mode before trying to fix it.)
"Have an easy way to navigate from one playlist to the next. Currently you have to go back to the home page, refind the course, and select the next playlist.
Speed selector issues. If I view part 1 of a lecture, when I go to part 2 the speed is not correct. It still displays the faster speed, but the video only plays at 1X.
Can't rewind from 9 or fewer seconds back to 0 with the arrow keys. To prevent going negative, it doesn't seek at all; it should seek back to 0.
"
The current API the frontend uses to get all the offerings of an instructor is /api/CourseOfferings/ByInstructor/{userId}. This API groups the offerings by course, but for the new interface this grouping is no longer needed and would take extra work to parse.
So we want to request a new API, /api/Offerings/ByInstructor, just like /api/Offerings/ByStudent, that returns all the offerings for an instructor in an array ordered by creation time. Also, like /api/Offerings/ByStudent, this API is expected to include the term data and department data inside each offering object.
Below is a single offering object obtained by calling /api/Offerings/ByStudent. It would be great if /api/Offerings/ByInstructor returned similar ones.
To get the data used for visualization, the frontend currently has to send multiple requests through the log APIs, and for popular courses like CS241 this usually takes about 30 seconds or more.
So maybe it's time to have dedicated APIs for each data visualization: api/.../{offeringId}. See issues #150 and #131 in FrontEnd (especially #150, where we need a large table of users joined with medias, and each cell represents how much time a user spent on that media).
Hence, we need an API that can return the following data: (1) the user's id and email; (2) an array of ALL the medias in this offering (not just the medias the user watched), ordered by their indices; (3) the watch time from the .timeupdate event type in last1Hr, last3Days, lastWeek, lastMonth, and total for this user (if a media is not watched by the user, all the values should be 0):
[
{
"user": {
"id" : "",
"email": ""
},
"medias": [
{
"mediaName": "",
"id": "",
"last1Hr": 12,
"last3Days": 12,
"lastWeek": 12,
"lastMonth": 12,
"total": 12
}
]
}
]
Migrate to sentence-level ASR output, but use a best-effort mapping of the word-based timings for the caption timing.
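A best-effort mapping could look like this sketch, which locates the sentence's first and last tokens in the word-level timing stream. The function and its matching strategy are illustrative assumptions, not the project's implementation:

```python
def sentence_timings(sentence, words):
    """Best-effort (start, end) timing for a sentence from sentence-level
    ASR output, given (word, start, end) tuples from the word-level output.

    Matches the sentence's first and last tokens against the word stream
    (ignoring case and trailing punctuation); falls back to the full span
    of the word stream if no match is found.
    """
    tokens = sentence.lower().split()
    norm = [w[0].lower().strip('.,?!') for w in words]
    first = tokens[0].strip('.,?!')
    last = tokens[-1].strip('.,?!')
    try:
        i = norm.index(first)                       # first occurrence
        j = len(norm) - 1 - norm[::-1].index(last)  # last occurrence
        return words[i][1], words[j][2]
    except ValueError:
        return words[0][1], words[-1][2]
```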
The API /api/WatchHistories/GetAllWatchedMediaForUser should not return deleted medias.
Work with UIUC Box IT admin to create and use an application Box key
Find a better way to log user event data rather than dumping to a SQL database.
Course Template and course number will no longer need to be linked to a Course; their only value will be to suggest drop-downs when creating a new course, but new courses are not limited by these suggestions. They can be updated automatically when courses are created.
Offerings should be promoted to contain all information (i.e. not refer to a Course Template). "Offerings" is not standard vernacular.
Rename existing Course to CourseTemplate
Rename Offering to Course
A playlist mixes two concepts: An upstream source and a list of videos to play to the user. Let's separate these out, so that a playlist for students can contain multiple videos from multiple sources.
Rename existing Playlist to VideoSource
Create a new playlist object that includes a list of video sources.
New Playlist object is a collection of VideoSources, and can include one off videos manually uploaded.
C# and Python code create random files; e.g. the C# code calls GetTmpFile() in CommonUtils.cs.
Later these files may be renamed (e.g. a "vtt" extension is added), so the files are not actually temporary at all.
Other files are temporary and (as far as I can tell!) may never be cleaned up (e.g. WAV files, database dumps).
See also #8
Related comments:
Should we create random filenames using Guid.NewGuid() instead of the convoluted C#, and something similar for the Python code?
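On the Python side, the equivalent of the Guid.NewGuid() suggestion might look like this sketch, where the final extension is chosen up front so no later rename is needed (the helper name is hypothetical):

```python
import uuid

def new_file_name(ext):
    """Random, collision-resistant filename with its final extension
    already attached, so the file never needs renaming later.
    uuid4().hex gives 32 lowercase hex characters."""
    return f"{uuid.uuid4().hex}.{ext.lstrip('.')}"
```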
Microsoft Cognitive Services allows supplying some domain words to improve the accuracy of transcriptions. The task is to figure out how this works and implement the feature within ClassTranscribe.
When working on issue 34, I found that Video.JsonMetadata is initialized to null. It would be better to initialize it, and all the other JObject fields defined in Model.cs, to an empty JObject from the start. This would save the effort of explicitly checking whether JsonMetadata is null.
See for example,
https://hintsandmemories.wordpress.com/2014/04/10/ffmpeg-libx264-tune/
https://trac.ffmpeg.org/wiki/Encode/H.264
https://videoblerg.wordpress.com/2017/11/10/ffmpeg-and-how-to-use-it-wrong/
https://forum.videohelp.com/threads/194088-Need-help-with-ffmpeg-2-pass-VBR-encoding
Ability to regenerate translations (all/some) from latest captions.
Also: Use domain words and tag them as do-not-translate
On -dev, a manually uploaded playlist had multiple entries of the same mp4 video. In the logs -
taskengine | 2020-09-23T21:17:20.694417274Z Stop recognition.
taskengine | 2020-09-23T21:17:21.767143311Z fail: TaskEngine.Tasks.TranscriptionTask[0]
taskengine | 2020-09-23T21:17:21.767229753Z Transcription Exception: at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.IdentityMap`1.ThrowIdentityConflict(InternalEntityEntry entry)
taskengine | 2020-09-23T21:17:21.767261793Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.IdentityMap`1.Add(TKey key, InternalEntityEntry entry, Boolean updateDuplicate)
taskengine | 2020-09-23T21:17:21.767287826Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.StartTracking(InternalEntityEntry entry)
taskengine | 2020-09-23T21:17:21.767314533Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.InternalEntityEntry.SetEntityState(EntityState oldState, EntityState newState, Boolean acceptChanges, Boolean modifyProperties)
taskengine | 2020-09-23T21:17:21.767340763Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.InternalEntityEntry.SetEntityStateAsync(EntityState entityState, Boolean acceptChanges, Boolean modifyProperties, Nullable`1 forceStateWhenUnknownKey, CancellationToken cancellationToken)
taskengine | 2020-09-23T21:17:21.767368708Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.EntityGraphAttacher.PaintActionAsync(EntityEntryGraphNode`1 node, CancellationToken cancellationToken)
taskengine | 2020-09-23T21:17:21.767394094Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.EntityEntryGraphIterator.TraverseGraphAsync[TState](EntityEntryGraphNode`1 node, Func`3 handleNode, CancellationToken cancellationToken)
taskengine | 2020-09-23T21:17:21.767420988Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.EntityEntryGraphIterator.TraverseGraphAsync[TState](EntityEntryGraphNode`1 node, Func`3 handleNode, CancellationToken cancellationToken)
taskengine | 2020-09-23T21:17:21.767447355Z at Microsoft.EntityFrameworkCore.DbContext.AddRangeAsync(IEnumerable`1 entities, CancellationToken cancellationToken)
taskengine | 2020-09-23T21:17:21.767470505Z at TaskEngine.Tasks.TranscriptionTask.OnConsume(String videoId, TaskParameters taskParameters) in /src/TaskEngine/Tasks/TranscriptionTask.cs:line 113
taskengine | 2020-09-23T21:17:21.767497730Z System.InvalidOperationException: The instance of entity type 'Caption' cannot be tracked because another instance with the same key value for {'Id'} is already being tracked. When attaching existing entities, ensure that only one entity instance with a given key value is attached. Consider using 'DbContextOptionsBuilder.EnableSensitiveDataLogging' to see the conflicting key values.
taskengine | 2020-09-23T21:17:21.767515693Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.IdentityMap`1.ThrowIdentityConflict(InternalEntityEntry entry)
taskengine | 2020-09-23T21:17:21.767529456Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.IdentityMap`1.Add(TKey key, InternalEntityEntry entry, Boolean updateDuplicate)
taskengine | 2020-09-23T21:17:21.767563141Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.StartTracking(InternalEntityEntry entry)
taskengine | 2020-09-23T21:17:21.767578217Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.InternalEntityEntry.SetEntityState(EntityState oldState, EntityState newState, Boolean acceptChanges, Boolean modifyProperties)
taskengine | 2020-09-23T21:17:21.767592264Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.InternalEntityEntry.SetEntityStateAsync(EntityState entityState, Boolean acceptChanges, Boolean modifyProperties, Nullable`1 forceStateWhenUnknownKey, CancellationToken cancellationToken)
taskengine | 2020-09-23T21:17:21.767606737Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.EntityGraphAttacher.PaintActionAsync(EntityEntryGraphNode`1 node, CancellationToken cancellationToken)
taskengine | 2020-09-23T21:17:21.767620650Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.EntityEntryGraphIterator.TraverseGraphAsync[TState](EntityEntryGraphNode`1 node, Func`3 handleNode, CancellationToken cancellationToken)
taskengine | 2020-09-23T21:17:21.767635010Z at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.EntityEntryGraphIterator.TraverseGraphAsync[TState](EntityEntryGraphNode`1 node, Func`3 handleNode, CancellationToken cancellationToken)
taskengine | 2020-09-23T21:17:21.767649082Z at Microsoft.EntityFrameworkCore.DbContext.AddRangeAsync(IEnumerable`1 entities, CancellationToken cancellationToken)
taskengine | 2020-09-23T21:17:21.767663209Z at TaskEngine.Tasks.TranscriptionTask.OnConsume(String videoId, TaskParameters taskParameters) in /src/TaskEngine/Tasks/TranscriptionTask.cs:line 113
The university list should not report UNK as a listed, editable university.
This could be filtered on the frontend, but it's probably best to filter on the backend.
https://github.com/classtranscribe/WebAPI/blob/master/CTCommons/MSTranscription/MSTranscriptionService.cs
Lines 51-55: the language setting for transcription is hard-coded. It could be moved into a settings file such as the environment file.
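A sketch of reading the languages from the environment instead. LANGUAGES and its default value are assumed names for illustration, not the project's actual configuration keys:

```python
import os

def transcription_languages(default='en-US'):
    """Read the comma-separated transcription language list from the
    environment instead of hard-coding it in MSTranscriptionService.
    'LANGUAGES' is an assumed variable name for this sketch."""
    return os.environ.get('LANGUAGES', default).split(',')
```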
Were there actual compile errors, or did I read that wrong? If there were, we should really start looking at how to use GitHub Actions to compile.
This might be a nice way to do it: https://github.com/actions/setup-dotnet
Originally posted by @robkooper in #49 (comment)
The server should create trace and exception messages using Application Insights.
However, it should print to normal output when it attempts to start Application Insights (in case it fails).
I just tried uploading 5 videos to a new playlist on -dev. They are all very short (3 seconds), recorded using Zoom as local mp4s: "This is video 1", "This is video 2", etc. Only the 5th video made it to the playlist, even after I refreshed the page. However, later attempts worked fine. Parts of the log that appear relevant -
taskengine | 2020-09-23T20:42:25.517947163Z fail: TaskEngine.Tasks.DownloadMediaTask[0]
taskengine | 2020-09-23T20:42:25.517987186Z DownloadLocalPlaylist failed. mediaId 891edecd-5312-4e4b-8277-d0d9d5fac371
taskengine | 2020-09-23T20:42:25.518001689Z System.IO.FileNotFoundException: Could not find file '/data/OF4AP98FR23H'.
taskengine | 2020-09-23T20:42:25.518014610Z File name: '/data/OF4AP98FR23H'
taskengine | 2020-09-23T20:42:25.518028095Z at System.IO.File.Move(String sourceFileName, String destFileName, Boolean overwrite)
taskengine | 2020-09-23T20:42:25.518041374Z at System.IO.File.Move(String sourceFileName, String destFileName)
taskengine | 2020-09-23T20:42:25.518054185Z at ClassTranscribeDatabase.Models.FileRecord.GetNewFileRecord(String filepath, String ext) in /src/ClassTranscribeDatabase/Models/FileRecord.cs:line 33
taskengine | 2020-09-23T20:42:25.518067570Z at TaskEngine.Tasks.DownloadMediaTask.DownloadLocalPlaylist(Media media) in /src/TaskEngine/Tasks/DownloadMediaTask.cs:line 237
taskengine | 2020-09-23T20:42:25.646355898Z fail: CTCommons.RabbitMQConnection[0]
taskengine | 2020-09-23T20:42:25.646396864Z Error occured in RabbitMQConnection DownloadMedia for message TaskObject(Data=891edecd-5312-4e4b-8277-d0d9d5fac371; TaskParameters=TaskParameters(Force = False; Metadata = );
taskengine | 2020-09-23T20:42:25.646412192Z System.Exception: DownloadMediaTask failed for mediaId 891edecd-5312-4e4b-8277-d0d9d5fac371
taskengine | 2020-09-23T20:42:25.974389435Z [x] DownloadMedia Received TaskObject(Data=9df32c4d-8b9b-4ad4-be79-da631045438e; TaskParameters=TaskParameters(Force = False; Metadata = );
taskengine | 2020-09-23T20:42:25.980614133Z info: TaskEngine.Tasks.DownloadMediaTask[0]
taskengine | 2020-09-23T20:42:25.980686236Z ConsumingCastle.Proxies.MediaProxy
taskengine | 2020-09-23T20:42:25.981420773Z fail: TaskEngine.Tasks.DownloadMediaTask[0]
taskengine | 2020-09-23T20:42:25.981452815Z DownloadLocalPlaylist failed. mediaId 9df32c4d-8b9b-4ad4-be79-da631045438e
taskengine | 2020-09-23T20:42:25.981466851Z System.IO.FileNotFoundException: Could not find file '/data/MCWLSGBKZHS9'.
taskengine | 2020-09-23T20:42:25.981480331Z File name: '/data/MCWLSGBKZHS9'
taskengine | 2020-09-23T20:42:25.981493182Z at System.IO.File.Move(String sourceFileName, String destFileName, Boolean overwrite)
taskengine | 2020-09-23T20:42:25.981505960Z at System.IO.File.Move(String sourceFileName, String destFileName)
taskengine | 2020-09-23T20:42:25.981519291Z at ClassTranscribeDatabase.Models.FileRecord.GetNewFileRecord(String filepath, String ext) in /src/ClassTranscribeDatabase/Models/FileRecord.cs:line 33
taskengine | 2020-09-23T20:42:25.981533484Z at TaskEngine.Tasks.DownloadMediaTask.DownloadLocalPlaylist(Media media) in /src/TaskEngine/Tasks/DownloadMediaTask.cs:line 237
taskengine | 2020-09-23T20:42:25.982630181Z info: TaskEngine.Tasks.ProcessVideoTask[0]
taskengine | 2020-09-23T20:42:25.982666442Z ConsumingCastle.Proxies.VideoProxy
taskengine | 2020-09-23T20:42:25.994436797Z fail: CTCommons.RabbitMQConnection[0]
taskengine | 2020-09-23T20:42:25.994477664Z Error occured in RabbitMQConnection DownloadMedia for message TaskObject(Data=9df32c4d-8b9b-4ad4-be79-da631045438e; TaskParameters=TaskParameters(Force = False; Metadata = );
taskengine | 2020-09-23T20:42:25.994817636Z System.Exception: DownloadMediaTask failed for mediaId 9df32c4d-8b9b-4ad4-be79-da631045438e
taskengine | 2020-09-23T20:42:25.994865290Z at TaskEngine.Tasks.DownloadMediaTask.OnConsume(String mediaId, TaskParameters taskParameters) in /src/TaskEngine/Tasks/DownloadMediaTask.cs:line 65
taskengine | 2020-09-23T20:42:25.996048953Z at CTCommons.RabbitMQConnection.<>c__DisplayClass8_0`1.<<ConsumeTask>b__0>d.MoveNext() in /src/CTCommons/RabbitMQ/RabbitMQConnection.cs:line 104
taskengine | 2020-09-23T20:42:25.996340493Z info: CTCommons.RabbitMQConnection[0]
taskengine | 2020-09-23T20:42:25.996372410Z [x] DownloadMedia Done TaskObject(Data=9df32c4d-8b9b-4ad4-be79-da631045438e; TaskParameters=TaskParameters(Force = False; Metadata = );
Reduce the need to hand-edit the .env file. Copy-pasting keys within files or across files should be unnecessary.
From an instructor - "The ClassTranscribe web site consistently appears to time out after about 100 seconds, when the progress bar reaches about 25%, whenever I try to upload a large video file (1.6GB) from home. I say "appears to time out" because there is no error message of any kind; the web site just returns to the previous page (with the "+ UPLOAD VIDEOS" button). If this timeout behavior is intentional, please either remove the time limit, increase it to 15 minutes, or at least display an error message. ("Took too long to upload; try again when you're on campus.")"
Notice there are several items to address here:
TODO: TaskEngine should use one RabbitMQ connection for the whole process, not one per task.
TODO: Take a deep dive into how this is actually working and document it.
It would also be useful to confirm that we are actually using async co-routines and not multiple threads.
TaskEngine does not explicitly create any threads; however, it certainly has a main loop that sleeps for a couple of hours while multiple message queues are concurrently serviced!
We may need to upgrade to the latest RabbitMQ C# client.
And empirically confirm that the RabbitMQ C# implementation with a prefetch count > 1 does not implement concurrency using multiple threads.
e.g. always print out a thread ID and print out how many threads are running
e.g. take a deep dive into the C# RabbitMQ source code.
_channel.QueueDeclare(.... );
_channel.BasicQos(prefetchSize: 0, prefetchCount: concurrency, global: false);
}
var consumer = new EventingBasicConsumer(_channel);
consumer.Received += async (model, ea) =>
{ ...
Captions should be repositioned if they overlap burned-in content.
After adding the new CourseId property to CourseDTO, the GetOfferingsByStudent API should also be updated.
Steps to reproduce -
docker exec -it pythonrpcserver sh
ipython
import kaltura
k = kaltura.KalturaProvider()
c2=k.getKalturaChannelEntries(167312872)
len(c2) # Returns 30 but 37 are listed on MediaSpace
https://mediaspace.illinois.edu/channel/CS+173+Summer+2020+AL1/167312872
The MediaSpace docs suggest that paging returns 25 results; however, the autogenerated client suggests a pageSize of 30.
https://www.kaltura.com/api_v3/testmeDoc/objects/KalturaFilterPager.html
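The likely fix is for getKalturaChannelEntries to page through all results instead of reading only the first page. A hedged sketch of the missing loop, where fetch_page stands in for the real Kaltura call that fills a KalturaFilterPager (pageIndex is 1-based in the Kaltura API) and returns one page of entries:

```python
def fetch_all_entries(fetch_page, page_size=500):
    """Collect every entry from a paged Kaltura-style list endpoint.

    fetch_page(page_index, page_size) is a stand-in for the real client
    call; we keep requesting pages until a short (or empty) page signals
    the end of the result set.
    """
    entries, page_index = [], 1
    while True:
        page = fetch_page(page_index, page_size)
        entries.extend(page)
        if len(page) < page_size:  # short page == last page
            return entries
        page_index += 1
```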
We will need an ffmpeg task to create the mp4
https://video.stackexchange.com/questions/22197/ffmpeg-how-to-add-several-subtitle-streams
We will want to add the audio description/enhanced text description track too.
(We could even add chapter markers from the epub data)
For discussion: how do we avoid littering the storage with old mp4s?
There are some tricky timing corner cases to work out (e.g. captions being updated while the mp4 is being regenerated).
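One plausible shape for the ffmpeg invocation, sketched as a Python helper that builds the argv. Using mov_text is the standard way to embed selectable subtitle tracks in an mp4 container; the surrounding task wiring, file names, and language codes are assumptions:

```python
def mp4_with_subtitles(video, subs, output):
    """Build an ffmpeg argv that muxes caption files into an mp4 as
    selectable subtitle streams.

    subs is a list of (path, iso639_2_lang) pairs, e.g.
    [('en.vtt', 'eng'), ('es.vtt', 'spa')]. Video and audio streams are
    copied unchanged; text tracks are converted to mov_text, the subtitle
    codec mp4 containers support.
    """
    cmd = ['ffmpeg', '-y', '-i', video]
    for path, _ in subs:
        cmd += ['-i', path]
    for i in range(len(subs) + 1):  # map the video input and every subtitle input
        cmd += ['-map', str(i)]
    cmd += ['-c', 'copy', '-c:s', 'mov_text']
    for i, (_, lang) in enumerate(subs):
        cmd += [f'-metadata:s:s:{i}', f'language={lang}']
    cmd.append(output)
    return cmd
```

The task would pass this list to subprocess.run; returning the argv rather than running it keeps the sketch testable.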
The new version of the ePub generator on the frontend can create multiple ePubs for a media, based on the ePub data for different languages.
For each media, we need API's to:
POST: api/EPubData/../{mediaId}
GET: api/EPubData/../{mediaId}
GET: api/EPubData/../{epubId}
PUT: api/EPubData/../{epubId}
DELETE: api/EPubData/../{epubId}
My idea is to make the ePub's id, title, filename, author, createdAt, isPublished, cover, and language columns in a database table row, and the chapters a JSON-formatted string column.
It could be an issue that each ePub's chapters object can be really huge, since some may contain hundreds of chapters/sub-chapters/images/texts.
Also, we need an API that can store an image uploaded by a user (some users will need images for their ePub other than the generated screenshots), e.g. POST: api/Image/ that returns the URL of the created image.
This is the structure used by frontend to build an ePub.
{
"id": "uuid str",
"title": "string",
"filename": "string",
"language": "string",
"author": "string",
"publisher": "string",
"cover": "image URL",
"isPublished": true,
"chapters": [
{
"id": "uuid str generated by frontend",
"title": "string",
"start": "string",
"end": "string",
"contents": [
"a piece of text",
{
"src": "image src",
"alt": "image alt",
"description": "image description/AD"
},
....
],
"items": [
{
"id": "uuid str generated by frontend",
"start": "string",
"end": "string",
"image": "image url",
"text": "string"
}
],
"subChapters": [
{
"id": "uuid str generated by frontend",
"title": "string",
"start": "string",
"end": "string",
"contents": [],
"items": [
{
"id": "uuid str generated by frontend",
"start": "string",
"end": "string",
"image": "image url",
"text": "string"
}
]
}
]
}
]
}
The API UserOfferings/AddUsers/{offeringId}/{roleName} cannot add a previously deleted email.