You are developing a web-based solution that students and teachers can use to collaborate on written assignments. Teachers can also use the solution to detect potential plagiarism, and they can manage assignments and data by using locally accessible network shares.
The solution consists of three parts: a website where students work on assignments and where teachers view and grade assignments, the plagiarism detection service, and a connector service to manage data by using a network share.
The system availability agreement states that operating hours are weekdays between midnight on Sunday and midnight on Friday.
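The availability window can be read as Monday 00:00 up to, but not including, Saturday 00:00. This reading takes "midnight on Sunday" and "midnight on Friday" as the ends of those days, which is the only reading that matches "weekdays". The following Python sketch encodes that assumption. The function name `within_operating_hours` is hypothetical, and the sketch assumes the timestamp is already in the time zone that the agreement uses.

```python
from datetime import datetime


def within_operating_hours(moment: datetime) -> bool:
    """Return True if `moment` falls inside the assumed operating window.

    Assumption: the window runs from 00:00 Monday (the end of Sunday) up to,
    but not including, 00:00 Saturday (the end of Friday), in the agreement's
    time zone.
    """
    # datetime.weekday() returns Monday == 0 ... Friday == 4, Saturday == 5, Sunday == 6.
    return moment.weekday() < 5
```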
The plagiarism detection portion of the solution compares a newly submitted work against a repository of existing works. The initial dataset is a large database of existing works. Teachers upload additional works, and the service itself also searches for other works and adds them to the repository.
The website for the solution must run on an Azure web role.
The plagiarism detection service runs on an Azure worker role. The computation uses a random number generator, and certain generated values can result in an infinite loop. Therefore, if a particular work item takes longer than one hour to process, other instances of the service must be able to process that work item. The Azure worker role must fully utilize all available CPU cores. Computation results are cached in local storage resources to reduce computation time.
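The one-hour rule describes a lease (visibility timeout) pattern: an instance that takes a work item hides it from other instances only for a limited time, and if that instance does not mark the item complete before the lease expires, the item becomes visible again for another instance. Azure Queue storage provides this behavior through its message visibility timeout. The Python sketch below is an in-memory simulation of the idea, not the Azure SDK; `LeaseQueue`, `WorkItem`, and their methods are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class WorkItem:
    item_id: str
    payload: str
    # Clock time before which the item stays hidden from other instances.
    invisible_until: float = 0.0
    # How many times the item has been leased (a count above 1 means a retry).
    dequeue_count: int = 0


class LeaseQueue:
    """Hypothetical in-memory stand-in for a queue with a visibility timeout."""

    def __init__(self, visibility_timeout: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Assumption: one hour (3600 s) matches the case study's retry rule.
        self.visibility_timeout = visibility_timeout
        self.clock = clock
        self.items: Dict[str, WorkItem] = {}

    def put(self, item_id: str, payload: str) -> None:
        self.items[item_id] = WorkItem(item_id, payload)

    def get(self) -> Optional[WorkItem]:
        """Lease the first visible item; it reappears if not completed in time."""
        now = self.clock()
        for item in self.items.values():
            if item.invisible_until <= now:
                item.invisible_until = now + self.visibility_timeout
                item.dequeue_count += 1
                return item
        return None

    def complete(self, item_id: str) -> None:
        """Remove a finished item so that no other instance processes it again."""
        self.items.pop(item_id, None)
```

An instance could start `os.cpu_count()` worker processes that each call `get()` and `complete()` in a loop, which is one way to keep all cores busy. In a real deployment the queue would have to live in shared storage rather than in a per-process object like this one.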
Repository of Existing Works
The plagiarism detection service works by comparing student submissions against a repository of existing works by using a custom matching algorithm. The master copies of the works are stored in Azure blob storage. A daily process synchronizes files between blob storage and a file share on a virtual machine (VM). As part of this synchronization, the ExistingWorkRepository object adds the files to Azure Cache to improve the display performance of the website. If a student's submission is overdue, the Late property is set to the number of days that the work is overdue. Work files can be downloaded by using the Work action of the TeacherController object.
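One way to model the Late property is to derive it from the due date and the submission date. This Python sketch assumes that lateness is counted in whole calendar days and that work submitted on or before the due date has a Late value of 0. The `Submission` class and its `due` and `submitted` fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Submission:
    # Hypothetical model: only the `late` property corresponds to the case study.
    due: date
    submitted: date

    @property
    def late(self) -> int:
        """Whole calendar days the work is overdue (0 when submitted on time).

        Assumption: a submission on the due date itself is not late.
        """
        return max(0, (self.submitted - self.due).days)
```

Computing the value on read keeps it consistent if a due date changes; storing it when the submission arrives, as the case study describes, would work equally well.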
Clients can interact with files that are stored on the VM by using a network share. The network permissions are configured in a startup task in the plagiarism detection service.
The performance of the plagiarism detection service is usually limited by the CPU of the system on which it runs. However, certain combinations of input can cause memory issues, which result in decreased performance. The average time for a given computation is 45 seconds. Unexpected results during computations might cause a memory dump. Memory dump files are stored in the Windows temporary folder on the VM that hosts the worker role.
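When diagnosing the memory issues, the dump files can be collected from the temporary folder. This Python sketch assumes that the dumps use the `.dmp` extension and that `tempfile.gettempdir()` (which honors the TMPDIR, TEMP, and TMP environment variables) resolves to the same Windows temporary folder that the worker role writes to. `find_memory_dumps` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def find_memory_dumps(folder: Optional[str] = None, pattern: str = "*.dmp") -> List[Path]:
    """List dump files in `folder`, newest first.

    Assumptions: dumps use the .dmp extension, and when no folder is given the
    process temp folder is the same one the worker role writes dumps to.
    """
    root = Path(folder) if folder is not None else Path(tempfile.gettempdir())
    # Only regular files that match the pattern; subdirectories are ignored.
    dumps = [p for p in root.glob(pattern) if p.is_file()]
    # Sort by modification time so the most recent dump comes first.
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)
```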
Content that users submit must be viewable only by valid users of the solution. Privacy regulations require that all content that users submit be retained only in Azure Storage. All documents that students upload must be signed by using a certificate named DocCert that is installed in both the worker role and the web role.