petre-bit/BloodshotNet-Dataset

⚠️ NSFW & GRAPHIC CONTENT WARNING: This dataset contains highly graphic, violent, and sensitive imagery, including simulated and real blood, serious injury, surgical scenes, and gore. Discretion is strongly advised before downloading, viewing, or utilizing this data.

BloodshotNet Dataset

Dataset Summary

The BloodshotNet-Dataset is the official, large-scale, aggregated computer vision dataset designed to train BloodshotNet (a YOLO-based blood detection model). It contains 23,514 images curated from 16 public datasets, refined to include realistic positive samples (e.g., forensics, movie scenes) and challenging "hard negative" samples (e.g., red clothing, flowers, red vehicles) to prevent model overfitting and reduce false positives.

A key feature of this dataset is its dual-compatibility: it is structured to work completely out-of-the-box for YOLO (v11/v26) training, while also being fully native to the Hugging Face datasets library for general PyTorch/TensorFlow pipelines.

Dual-Compatibility & How to Use

1. For YOLO Users (Plug & Play)

The dataset preserves standard YOLO formatting. You can clone this repository and start training immediately using the provided data.yaml.

Images: Located in images/train/, images/val/, images/test/

Labels: Located in labels/train/, labels/val/, labels/test/ (Standard normalized YOLO .txt files)

Negative Images: Background/negative images simply have empty .txt files in the labels/ directory.

yolo task=detect mode=train data=path/to/BloodshotNet-Dataset/data.yaml model=yolo11n.pt epochs=100
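To make the label convention concrete, here is a minimal sketch of reading one YOLO label file. It assumes the standard layout of one object per line as `class_id x_center y_center width height`, with coordinates normalized to [0, 1]. The helper names `parse_yolo_label` and `is_negative` are hypothetical and not part of this repository.

```python
from typing import List, Tuple

# (class_id, x_center, y_center, width, height), coordinates normalized to [0, 1]
YoloBox = Tuple[int, float, float, float, float]


def parse_yolo_label(text: str) -> List[YoloBox]:
    """Parse the contents of one YOLO .txt label file (hypothetical helper)."""
    boxes: List[YoloBox] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields per line, got {len(parts)}: {line!r}")
        class_id = int(parts[0])
        cx, cy, w, h = (float(v) for v in parts[1:])
        boxes.append((class_id, cx, cy, w, h))
    return boxes


def is_negative(text: str) -> bool:
    """An image is a background/negative sample when its label file has no boxes."""
    return len(parse_yolo_label(text)) == 0
```

For example, `is_negative("")` is true for an empty label file, while a file containing `0 0.5 0.5 0.2 0.4` yields one box of class 0 (blood).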

2. For Hugging Face Users

You can load this dataset into your Python environment with a single line of code. The image directories contain metadata.jsonl files that map the YOLO annotations to standard absolute pixel bounding boxes in [x_min, y_min, width, height] format.

from datasets import load_dataset

dataset = load_dataset("petre-bit/BloodshotNet-Dataset")
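The relationship between the two annotation formats can be sketched as follows. This is an illustration of the usual conversion from a normalized YOLO box (center x, center y, width, height) to absolute pixel `[x_min, y_min, width, height]`. The function name `yolo_to_xywh_abs` is hypothetical, and any rounding or clipping applied when the metadata.jsonl files were generated is not documented here, so none is assumed.

```python
from typing import List


def yolo_to_xywh_abs(
    cx: float, cy: float, w: float, h: float, img_w: int, img_h: int
) -> List[float]:
    """Convert a normalized YOLO box to absolute [x_min, y_min, width, height].

    Hypothetical helper; no rounding or clipping is applied.
    """
    box_w = w * img_w            # box width in pixels
    box_h = h * img_h            # box height in pixels
    x_min = cx * img_w - box_w / 2  # shift from center to left edge
    y_min = cy * img_h - box_h / 2  # shift from center to top edge
    return [x_min, y_min, box_w, box_h]


# A centered box covering half of a 640x480 image in each dimension
print(yolo_to_xywh_abs(0.5, 0.5, 0.5, 0.5, 640, 480))  # [160.0, 120.0, 320.0, 240.0]
```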

Dataset Structure

Data Splits

Train: 80% (18,809 images)

Validation: 15% (3,528 images)

Test: 5% (1,177 images)
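As a quick sanity check, the split sizes above can be summed and compared against the 23,514-image total stated in the summary. The snippet only uses the counts given in this card; the dictionary keys are illustrative labels.

```python
# Image counts per split, as listed in this card
splits = {"train": 18809, "validation": 3528, "test": 1177}

total = sum(splits.values())

# Share of each split as a percentage, rounded to one decimal place
shares = {name: round(100 * count / total, 1) for name, count in splits.items()}

print(total)   # 23514
print(shares)  # {'train': 80.0, 'validation': 15.0, 'test': 5.0}
```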

Composition & Classes

60% Positive Images (Blood): Forensic blood spatter, UFC fights, gore/horror movie scenes, bloody prints, surgery scenes, etc.

40% Negative Images (Non-Blood): Visually confusing red objects and contexts (red dresses, red flowers, brake lights, crowded scenes, kitchen objects).

Classes: 1 Class (0: blood)

Preprocessing & Annotation Adjustments

Label Filtering: All non-blood labels present in the original source datasets were strictly removed to isolate the blood class.

Bounding Box Conversion: The source datasets contained a mix of bounding boxes and segmentation masks. All segmentation polygons were converted into tight bounding boxes by extracting the extreme (x, y) coordinate limits to ensure a standardized object detection format.

No Resizing Applied: Images retain their original, diverse resolutions. Any resizing (e.g., to 640x640) should be handled dynamically by the model during the training/inference pipeline.
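The polygon-to-box conversion described above can be sketched as follows. This is an illustration of the general technique (taking the minimum and maximum x and y over the polygon vertices), not the exact script used to build the dataset. The function name `polygon_to_yolo_box` is hypothetical, and the input is assumed to be a list of normalized (x, y) vertices.

```python
from typing import List, Tuple


def polygon_to_yolo_box(
    points: List[Tuple[float, float]]
) -> Tuple[float, float, float, float]:
    """Return the tight box around a polygon as (x_center, y_center, width, height).

    Hypothetical helper; coordinates are assumed to be normalized to [0, 1].
    """
    if not points:
        raise ValueError("polygon must contain at least one vertex")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)  # extreme x limits
    y_min, y_max = min(ys), max(ys)  # extreme y limits
    return (
        (x_min + x_max) / 2,  # x_center
        (y_min + y_max) / 2,  # y_center
        x_max - x_min,        # width
        y_max - y_min,        # height
    )


# A triangle whose extremes span 0.25..0.75 on both axes
print(polygon_to_yolo_box([(0.25, 0.5), (0.75, 0.25), (0.5, 0.75)]))  # (0.5, 0.5, 0.5, 0.5)
```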

Intended Use & Limitations

Intended Use: Research and development in forensics, automated content moderation (flagging violent media), and safety monitoring systems.

Limitations: The dataset relies heavily on cinematic representations of blood and specific forensic datasets, which may not encompass all real-world lighting conditions, surface textures, or scenarios.

About the Creators & Attribution

This dataset was assembled, preprocessed, and formatted by the team at Bit to support the development of robust content moderation and detection systems.

Data Sources

This dataset is an aggregation of 16 datasets originally hosted on Roboflow Universe. All original datasets are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Huge thanks to the creators:

blood_segmentation by blood-3pyjx

blooddetection-j2wid by orkun-lpkdc

cars-cars-cars-gh8ga by yasins-workspace-qbvyv

crime-data-st by harsha-cujyl

danger-place-detection by md-hasibul-islam

dress-jhov8 by animal-detection-q3wq1

flowers-pciqg by my-workspace-ebwf4

forensicvision-du5uz by forensicvision

horror-content-detector-0rv9z by myproject-zxfbp

kitchen-gt6wi by abinavn

movie-ywprp by quanle-shsvi

passive-and-transfer-stains by thesis-epiei

rbrelabel by tasfagvasd

sexual_content-za0gn by helmiworkshop-6o1xm

two-guo-2 by ownfallprincess

video_modera by videomoderation

License: Released under CC BY 4.0.