State-of-the-Art Source Code Plagiarism & Collusion Detection
<p align="center">
<img alt="JPlag logo" src="core/src/main/resources/de/jplag/logo-dark.png" width="350">
</p>
# JPlag - Detecting Software Plagiarism
[![CI Build](https://github.com/jplag/jplag/actions/workflows/maven.yml/badge.svg)](https://github.com/jplag/jplag/actions/workflows/maven.yml)
[![Latest Release](https://img.shields.io/github/release/jplag/jplag.svg)](https://github.com/jplag/jplag/releases/latest)
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/de.jplag/jplag/badge.svg)](https://maven-badges.herokuapp.com/maven-central/de.jplag/jplag)
[![License](https://img.shields.io/github/license/jplag/jplag.svg)](https://github.com/jplag/jplag/blob/main/LICENSE)
[![GitHub commit activity](https://img.shields.io/github/commit-activity/y/jplag/JPlag)](https://github.com/jplag/JPlag/pulse)
[![SonarCloud Coverage](https://sonarcloud.io/api/project_badges/measure?project=jplag_JPlag&metric=coverage)](https://sonarcloud.io/component_measures?metric=Coverage&view=list&id=jplag_JPlag)
[![Report Viewer](https://img.shields.io/badge/report%20viewer-online-b80025)](https://jplag.github.io/JPlag/)
[![Java Version](https://img.shields.io/badge/java-SE%2021-yellowgreen)](#download-and-installation)
JPlag finds pairwise similarities among a set of multiple programs. It can reliably detect software plagiarism and collusion in software development, even when obfuscated. All similarities are calculated locally, and no source code or plagiarism results are ever uploaded to the internet. JPlag supports a large number of programming and modeling languages.
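In the simplest case (one set of submissions, no prior submissions), "pairwise" means that every submission is compared with every other one, so n submissions yield n(n - 1)/2 comparisons. The stand-alone Java sketch below enumerates those pairs. It is an illustration only, not JPlag's internal code, and the submission names and the `pairs` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PairwiseSketch {
    // Hypothetical helper: returns every unordered pair of submission names exactly once.
    static List<String[]> pairs(List<String> submissions) {
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < submissions.size(); i++) {
            for (int j = i + 1; j < submissions.size(); j++) {
                result.add(new String[] {submissions.get(i), submissions.get(j)});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> submissions = List.of("alice", "bob", "carol", "dave");
        List<String[]> pairs = pairs(submissions);
        // n submissions yield n * (n - 1) / 2 pairs: 4 * 3 / 2 = 6
        System.out.println(pairs.size()); // prints 6
        for (String[] pair : pairs) {
            System.out.println(pair[0] + " <-> " + pair[1]);
        }
    }
}
```

Because the number of comparisons grows quadratically, large courses quickly produce thousands of pairs, which is one reason the report only shows a limited number of comparisons by default.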
* 📈 [JPlag Demo](https://jplag.github.io/Demo/)
* 🏛️ [JPlag on Helmholtz RSD](https://helmholtz.software/software/jplag)
* 🤩 [Give us Feedback in a **short (<5 min) survey**](https://docs.google.com/forms/d/e/1FAIpQLSckqUlXhIlJ-H2jtu2VmGf_mJt4hcnHXaDlwhpUL3XG1I8UYw/viewform?usp=sf_link)
## Supported Languages
All supported languages and their supported versions are listed below.
| Language | Version | CLI Argument Name | [State](https://github.com/jplag/JPlag/wiki/2.-Supported-Languages) | Parser |
|--------------------------------------------------------|---------------------------------------------------------------------------------------:|-------------------|:-------------------------------------------------------------------:|:---------:|
| [Java](https://www.java.com) | 21 | java | mature | JavaC |
| [C](https://isocpp.org) | 11 | c | legacy | JavaCC |
| [C++](https://isocpp.org) | 14 | cpp | beta | ANTLR 4 |
| [C#](https://docs.microsoft.com/en-us/dotnet/csharp/) | 6 | csharp | mature | ANTLR 4 |
| [Python](https://www.python.org) | 3.6 | python3 | beta | ANTLR 4 |
| [JavaScript](https://www.javascript.com/) | ES6 | javascript | beta | ANTLR 4 |
| [TypeScript](https://www.typescriptlang.org/) | [~5](https://github.com/antlr/grammars-v4/tree/master/javascript/typescript/README.md) | typescript | beta | ANTLR 4 |
| [Go](https://go.dev) | 1.17 | golang | beta | ANTLR 4 |
| [Kotlin](https://kotlinlang.org) | 1.3 | kotlin | beta | ANTLR 4 |
| [R](https://www.r-project.org/) | 3.5.0 | rlang | beta | ANTLR 4 |
| [Rust](https://www.rust-lang.org/) | 1.60.0 | rust | beta | ANTLR 4 |
| [Swift](https://www.swift.org) | 5.4 | swift | beta | ANTLR 4 |
| [Scala](https://www.scala-lang.org) | 2.13.8 | scala | beta | Scalameta |
| [LLVM IR](https://llvm.org) | 15 | llvmir | beta | ANTLR 4 |
| [Scheme](http://www.scheme-reports.org) | ? | scheme | legacy | JavaCC |
| [EMF Metamodel](https://www.eclipse.org/modeling/emf/) | 2.25.0 | emf | beta | EMF |
| [EMF Model](https://www.eclipse.org/modeling/emf/) | 2.25.0 | emf-model | alpha | EMF |
| [SCXML](https://www.w3.org/TR/scxml/) | 1.0 | scxml | alpha | XML |
| Text (naive) | - | text | legacy | CoreNLP |
## Download and Installation
You need Java SE 21 to run or build JPlag.
### Downloading a release
* Download a [released version](https://github.com/jplag/jplag/releases).
* If you depend on the legacy version of JPlag, refer to the [legacy release v2.12.1](https://github.com/jplag/jplag/releases/tag/v2.12.1-SNAPSHOT) and the [legacy branch](https://github.com/jplag/jplag/tree/legacy).
### Via Maven
JPlag is released on [Maven Central](https://search.maven.org/search?q=de.jplag) and can be included as follows:
```xml
<dependency>
    <groupId>de.jplag</groupId>
    <artifactId>jplag</artifactId>
    <version><!--desired version--></version>
</dependency>
```
### Building from sources
1. Download or clone the code from this repository.
2. Run `mvn clean package` from the root of the repository to compile and build all submodules.
   Run `mvn clean package assembly:single` instead if you need the full JAR, which includes all dependencies.
   Run `mvn -P with-report-viewer clean package assembly:single` to build the full JAR with the report viewer. In this case, you'll need [Node.js](https://nodejs.org/en/download) installed.
3. You will find the generated JARs in the subdirectory `cli/target`.
## Usage
JPlag can be used either via the CLI or directly via its Java API. For more information, see the [usage information in the wiki](https://github.com/jplag/JPlag/wiki/1.-How-to-Use-JPlag). If you are using the CLI, you can display your results via [jplag.github.io](https://jplag.github.io/JPlag/). No data will leave your computer!
### CLI
*Note that the [legacy CLI](https://github.com/jplag/jplag/blob/legacy/README.md) differs slightly.*
The language can be set either with the `-l` parameter or as a subcommand (`jplag [jplag options] <language name> [language options]`). A subcommand takes priority over the `-l` option.
When using a subcommand, language-specific arguments can be set. A list of language-specific options can be obtained by requesting the help page of a subcommand (e.g. `jplag java -h`).
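The precedence rule above (subcommand first, then `-l`, then the default `java`) can be sketched in a few lines of Java. This is a hypothetical illustration of the documented behavior, not JPlag's actual argument parsing; the `resolveLanguage` helper and its `Optional` parameters are assumptions made for the example.

```java
import java.util.Optional;

public class LanguageSelectionSketch {
    // Hypothetical helper mirroring the documented precedence:
    // a language subcommand wins over -l, and -l wins over the default (java).
    static String resolveLanguage(Optional<String> subcommand, Optional<String> languageOption) {
        return subcommand.or(() -> languageOption).orElse("java");
    }

    public static void main(String[] args) {
        // Subcommand "cpp" given together with "-l python3": the subcommand wins.
        System.out.println(resolveLanguage(Optional.of("cpp"), Optional.of("python3"))); // prints cpp
        // Only "-l python3" given.
        System.out.println(resolveLanguage(Optional.empty(), Optional.of("python3"))); // prints python3
        // Neither given: the default applies.
        System.out.println(resolveLanguage(Optional.empty(), Optional.empty())); // prints java
    }
}
```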
```
Parameter descriptions:
      [root-dirs[,root-dirs...]...]
                             Root-directory with submissions to check for plagiarism.
  -bc, --bc, --base-code=<baseCode>
                             Path to the base code directory (common framework used in all submissions).
  -l, --language=<language>  Select the language of the submissions (default: java). See subcommands below.
  -M, --mode=<{RUN, VIEW, RUN_AND_VIEW}>
                             The mode of JPlag: either only run analysis, only open the viewer, or do both (default: null)
  -n, --shown-comparisons=<shownComparisons>
                             The maximum number of comparisons that will be shown in the generated report, if set to -1 all comparisons will be shown (default: 500)
  -new, --new=<newDirectories>[,<newDirectories>...]
                             Root-directories with submissions to check for plagiarism (same as root).
      --normalize            Activate the normalization of tokens. Supported for languages: Java, C++.
  -old, --old=<oldDirectories>[,<oldDirectories>...]
                             Root-directories with prior submissions to compare against.
  -r, --result-file=<resultFile>
                             Name of the file in which the comparison results will be stored (default: results). Missing .zip endings will be automatically added.
  -t, --min-tokens=<minTokenMatch>
                             Tunes the comparison sensitivity by adjusting the minimum number of tokens required to be counted as a matching section. A smaller value increases the sensitivity but might lead to more false positives.

Advanced
      --csv-export           Export pairwise similarity values as a CSV file.
  -d, --debug                Store non-parsable files in error folder.
  -m, --similarity-threshold=<similarityThreshold>
                             Comparison similarity threshold [0.0-1.0]: All comparisons above this threshold will be saved (default: 0.0).
  -p, --suffixes=<suffixes>[,<suffixes>...]
                             Comma-separated list of all filename suffixes that are included.
  -P, --port=<port>          The port used for the internal report viewer (default: 1996).
  -s, --subdirectory=<subdirectory>
                             Look in directories <root-dir>/*/<dir> for programs.
  -x, --exclusion-file=<exclusionFileName>
                             All files named in this file will be ignored in the comparison (line-separated list).

Clustering
      --cluster-alg, --cluster-algorithm=<{AGGLOMERATIVE, SPECTRAL}>
                             Specifies the clustering algorithm (default: spectral).
      --cluster-metric=<{AVG, MIN, MAX, INTERSECTION}>
                             The similarity metric used for clustering (default: average similarity).
      --cluster-skip         Skips the cluster calculation.

Subsequence Match Merging
      --gap-size=<maximumGapSize>
                             Maximal gap between neighboring matches to be merged (between 1 and minTokenMatch, default: 6).
      --match-merging        Enables merging of neighboring matches to counteract obfuscation attempts.
      --neighbor-length=<minimumNeighborLength>
                             Minimal length of neighboring matches to be merged (between 1 and minTokenMatch, default: 2).

Subcommands (supported languages):
  c
  cpp
  csharp
  emf
  emf-model
  go
  java
  javascript
  kotlin
  llvmir
  python3
  rlang
  rust
  scala
  scheme
  scxml
  swift
  text
  typescript
```
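To see why a smaller `-t`/`--min-tokens` value increases sensitivity, consider the naive, self-contained Java sketch below. It greedily takes the longest run of identical, not-yet-matched tokens in two token sequences and stops once the longest remaining run is shorter than the threshold. It is loosely modeled on the greedy string tiling idea and is not JPlag's implementation; the token names, the `Match` record, and the `findMatches` helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MinTokenMatchSketch {
    // A match of `length` tokens starting at index startA in A and startB in B.
    record Match(int startA, int startB, int length) {}

    // Naive greedy tiling: repeatedly take the longest run of identical tokens that are
    // not yet covered in either sequence; stop once that run is shorter than minTokens.
    static List<Match> findMatches(List<String> a, List<String> b, int minTokens) {
        boolean[] usedA = new boolean[a.size()];
        boolean[] usedB = new boolean[b.size()];
        List<Match> matches = new ArrayList<>();
        while (true) {
            Match best = null;
            for (int i = 0; i < a.size(); i++) {
                for (int j = 0; j < b.size(); j++) {
                    int k = 0;
                    while (i + k < a.size() && j + k < b.size()
                            && !usedA[i + k] && !usedB[j + k]
                            && a.get(i + k).equals(b.get(j + k))) {
                        k++;
                    }
                    if (k > 0 && (best == null || k > best.length())) {
                        best = new Match(i, j, k);
                    }
                }
            }
            if (best == null || best.length() < minTokens) {
                return matches;
            }
            // Mark the matched tokens so they cannot be part of another match.
            for (int k = 0; k < best.length(); k++) {
                usedA[best.startA() + k] = true;
                usedB[best.startB() + k] = true;
            }
            matches.add(best);
        }
    }

    // Total number of tokens covered by the matches.
    static int coveredTokens(List<Match> matches) {
        return matches.stream().mapToInt(Match::length).sum();
    }

    public static void main(String[] args) {
        // Hypothetical token streams; JPlag derives real tokens from the language parser.
        List<String> a = List.of("VARDEF", "ASSIGN", "LOOP", "CALL", "ASSIGN", "RETURN", "IF", "CALL");
        List<String> b = List.of("VARDEF", "ASSIGN", "LOOP", "CALL", "IF", "CALL", "ASSIGN", "RETURN");
        for (int minTokens : new int[] {2, 3, 4, 5}) {
            System.out.println("minTokens=" + minTokens + " -> "
                    + coveredTokens(findMatches(a, b, minTokens)) + " matched tokens");
        }
    }
}
```

With these token streams, the number of matched tokens drops from 8 (minimum 2) to 4 (minimum 3 or 4) and to 0 (minimum 5): short coincidental runs are only counted when the threshold is low, which is exactly the sensitivity/false-positive trade-off described for `-t`.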
### Java API
The new API makes it easy to integrate JPlag's plagiarism detection into external Java projects:
<!-- To assure that the code example is always correct, it must be kept in sync
with [`ReadmeCodeExampleTest#testReadmeCodeExample`](core/src/test/java/de/jplag/special/ReadmeCodeExampleTest.java). -->
```java
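import java.io.File;
import java.io.FileNotFoundException;
import java.util.Set;

import de.jplag.JPlag;
import de.jplag.JPlagResult;
import de.jplag.Language;
import de.jplag.exceptions.ExitException;
import de.jplag.java.JavaLanguage;
import de.jplag.options.JPlagOptions;
import de.jplag.reporting.reportobject.ReportObjectFactory;

Language language = new JavaLanguage();
Set<File> submissionDirectories = Set.of(new File("/path/to/rootDir"));
File baseCode = new File("/path/to/baseCode");
JPlagOptions options = new JPlagOptions(language, submissionDirectories, Set.of()).withBaseCodeSubmissionDirectory(baseCode);

try {
    JPlagResult result = JPlag.run(options);

    // Optional: save the result as a report that can be opened in the report viewer
    ReportObjectFactory reportObjectFactory = new ReportObjectFactory(new File("/path/to/output"));
    reportObjectFactory.createAndSaveReport(result);
} catch (ExitException e) {
    // error handling here
} catch (FileNotFoundException e) {
    // handle IO exception here
}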
```
## Contributing
We're happy to incorporate all improvements to JPlag into this codebase. Feel free to fork the project and send pull requests.
Please consider our [guidelines for contributions](https://github.com/jplag/JPlag/wiki/3.-Contributing-to-JPlag).
## Contact
If you encounter bugs or other issues, please report them in our [issue tracker](https://github.com/jplag/jplag/issues).
For other purposes, you can contact us at [email protected].
If you are doing research related to JPlag, we would love to know what you are doing. Feel free to contact us!
### More information can be found in our [Wiki](https://github.com/jplag/JPlag/wiki)!