Python Plagiarism Checker with Full Source Code For Beginners

A Python script that measures how similar two or more text files are. It converts each file into a TF-IDF vector and reports the cosine similarity of every pair of files.
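
To make the idea of a similarity score concrete, here is a minimal sketch that uses only the standard library. It computes cosine similarity on raw word counts, which is a simplification: plag.py relies on scikit-learn's TF-IDF weighting, so its scores will differ. The helper names `word_counts` and `cosine` are hypothetical and do not appear in the script.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count each word (hypothetical helper)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(counts_a, counts_b):
    """Cosine similarity between two sparse word-count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document has nothing in common with any other
    return dot / (norm_a * norm_b)


a = word_counts("nature provides food and shelter")
b = word_counts("nature provides food and shelter")
c = word_counts("industries cause pollution")
print(cosine(a, b))  # identical texts score (approximately) 1
print(cosine(a, c))  # texts with no shared words score 0
```

A score near 1 means the two texts use the same words in similar proportions; a score near 0 means they share almost no vocabulary.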

Prerequisites:

Install the scikit-learn module:

pip install -U scikit-learn

Run the Script:

python plag.py

Source Code:

plag.py

# pip install -U scikit-learn
# Run the script from the directory that holds the .txt files to be checked:
# os.listdir() reads the current working directory.
import os
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Sort the file names so each reported pair is in alphabetical order.
user_files = sorted(doc for doc in os.listdir() if doc.endswith('.txt'))


def read_text(path):
    # The context manager closes each file handle after reading.
    with open(path, encoding='utf-8') as handle:
        return handle.read()


user_notes = [read_text(_file) for _file in user_files]


def vectorize(texts):
    # One TF-IDF row per document, over a vocabulary shared by all documents.
    return TfidfVectorizer().fit_transform(texts).toarray()


def similarity(doc1, doc2):
    # Returns a 2x2 matrix; entry [0][1] is the similarity of doc1 to doc2.
    return cosine_similarity([doc1, doc2])


def check_plagiarism(named_vectors):
    # Compare every unordered pair of files exactly once.
    results = set()
    for (student_a, vector_a), (student_b, vector_b) in combinations(named_vectors, 2):
        sim_score = similarity(vector_a, vector_b)[0][1]
        results.add((student_a, student_b, sim_score))
    return results


if len(user_files) < 2:
    print('Need at least two .txt files to compare.')
else:
    s_vectors = list(zip(user_files, vectorize(user_notes)))
    for data in check_plagiarism(s_vectors):
        print(data)
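
The loop above prints raw tuples in arbitrary set order. One possible refinement, sketched here with made-up scores rather than real output from plag.py, sorts the pairs from most to least similar and flags those above a cutoff. The function name `format_report` and the 0.5 threshold are assumptions chosen for illustration only.

```python
def format_report(results, threshold=0.5):
    """Return report lines sorted from most to least similar pair."""
    lines = []
    for file_a, file_b, score in sorted(results, key=lambda r: r[2], reverse=True):
        # Mark pairs at or above the (assumed) threshold for manual review.
        flag = "  <-- review" if score >= threshold else ""
        lines.append(f"{file_a} vs {file_b}: {score:.2%}{flag}")
    return lines


# Made-up scores for illustration only; not actual output of plag.py.
sample_results = {
    ("text1.txt", "text2.txt", 0.25),
    ("text1.txt", "text3.txt", 0.75),
    ("text2.txt", "text3.txt", 0.125),
}
for line in format_report(sample_results):
    print(line)
```

A suitable threshold depends on the documents being compared, so treat any fixed value as a starting point rather than a verdict.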

text1.txt

Nature is the endless expanse of life forms, beauty, resources, peace and nourishment. Every bud that grows to a flower, every caterpillar that flies with the wings of a butterfly and every infant who faces the world as a human, owes its survival and sustenance to nature. In addition to providing resources for our daily needs of food, clothing and shelter, nature also contributes to different industries and manufacturing units. Paper, furniture, oil, gemstones, petrol, diesel, the fishing industry, electrical units, etc. all derive their basic components from nature.

It can be said that nature drives the process of converting everything that is natural on earth into most of the things that are artificial. Nature also maintains the continuity between the different spheres on Earth. Owing to the multiple elements obtained from nature, with a growing population, the need to meet demands is increasing every day. At an equal pace is rising the level of air, water, soil and noise pollution as a result of the universal dependence on technology.

While it is necessary to keep up with industrialization, it is an urgent need now to restore stability in nature. People are trying to curb the level of pollution and stop the exhaustion of natural resources. However, more awareness and implementation is a must at the individual and community levels. We must always remember it is us who depend on nature for survival and not the other way round.

text2.txt

Great lengths of mountains, thriving ecosystems, the ever-spreading sky together with the lithosphere, hydrosphere and atmosphere create a saga called “Nature”. Rich both in terms of its scenic beauty and replenishing resources, nature accounts for supporting life in different shapes and forms on our planet.

Every member of the living world obtains its life support from nature. Nature guides the cycling of air, water and life between the different constituents or spheres on Earth. The treasures in nature not only provide for our basic requirements of survival but also fuel the raw materials to support factories and industries on which the modern world primarily runs.

Since the population is increasing at an exponential rate largely in India and many parts of the world, the “use” of resources has now turned to depletion. Adding to this, are the excessive levels of atmospheric and environmental pollution. Industrial wastes, unchecked use of vehicles, illegal cutting of trees, poaching of animals, nuclear power plants and many more are contributing to the disruption of the natural systems and global warming.

text3.txt

Nature includes living and non-living components that together make life on Earth possible. Some forms of nature can be seen through the lush green forests, the vast sky above us, the oceans without an end, the mountains standing tall and so on. Nature nourishes the survival needs of plants, animals and humans alike. It provides the essential components of oxygen, sunlight, soil and water.

Several other products are obtained indirectly from nature which includes timber, paper, medicinal herbs, fibers, cotton, silk and various kinds of food. To fulfill the demand for these products, human beings have now engaged in the slaughter of trees and the destruction of nature. Different industries also poison nature with harmful gases and chemicals in addition to using excessive natural resources.

It is the need of the hour now to reduce natural damage, reuse goods and recycle used elements to form newer ones. People from all parts of the world should come together to lessen the pressure on nature and restore its balance.
