⚠️ NSFW & GRAPHIC CONTENT WARNING: This dataset contains highly graphic, violent, and sensitive imagery, including simulated and real blood, serious injury, surgical scenes, and gore. Discretion is strongly advised before downloading, viewing, or utilizing this data.
BloodshotNet Dataset
Dataset Summary
The BloodshotNet-Dataset is the official, large-scale, aggregated computer vision dataset designed to train BloodshotNet (a YOLO-based blood detection model). It contains 23,514 images curated from 16 public datasets, refined to include realistic positive samples (e.g., forensics, movie scenes) and challenging "hard negative" samples (e.g., red clothing, flowers, red vehicles) to prevent model overfitting and reduce false positives.
A key feature of this dataset is its dual-compatibility: it is structured to work completely out-of-the-box for YOLO (v11/v26) training, while also being fully native to the Hugging Face datasets library for general PyTorch/TensorFlow pipelines.
Dual-Compatibility & How to Use
1. For YOLO Users (Plug & Play)
The dataset preserves standard YOLO formatting. You can clone this repository directly and start training immediately using the provided data.yaml.
Images: Located in images/train/, images/val/, images/test/
Labels: Located in labels/train/, labels/val/, labels/test/ (standard normalized YOLO .txt files)
Negative Images: Background/negative images simply have empty .txt files in the labels/ directory.
yolo task=detect mode=train data=path/to/BloodshotNet-Dataset/data.yaml model=yolo11n.pt epochs=100
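To make the label layout above concrete, here is a minimal Python sketch that parses the contents of a YOLO .txt label file and treats an empty file as a negative (background) image. The helper names parse_yolo_labels and is_negative are hypothetical and not part of this repository; the sketch assumes the standard five-field YOLO line (class id, x_center, y_center, width, height).

```python
from pathlib import Path
from typing import List, Tuple

# One YOLO box: (class_id, x_center, y_center, width, height).
# All values except class_id are normalized to the [0, 1] range.
YoloBox = Tuple[int, float, float, float, float]


def parse_yolo_labels(text: str) -> List[YoloBox]:
    """Parse the contents of a YOLO .txt label file.

    Empty text (a negative/background image) yields an empty list.
    """
    boxes: List[YoloBox] = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
        class_id = int(parts[0])
        x_c, y_c, w, h = (float(p) for p in parts[1:])
        boxes.append((class_id, x_c, y_c, w, h))
    return boxes


def is_negative(label_path: Path) -> bool:
    """Return True when a label file contains no boxes (hypothetical helper)."""
    return len(parse_yolo_labels(label_path.read_text())) == 0
```

For example, is_negative(Path("labels/train/some_image.txt")) would report whether that (hypothetical) file describes a background image.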
2. For Hugging Face Users
You can load this dataset into your Python environment with a single load_dataset call. The dataset contains metadata.jsonl files in the image directories that automatically map the YOLO annotations to absolute pixel bounding boxes in [x_min, y_min, width, height] format.
from datasets import load_dataset
dataset = load_dataset("petre-bit/BloodshotNet-Dataset")
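The mapping from normalized YOLO boxes to absolute pixel [x_min, y_min, width, height] boxes can be sketched as follows. This is an illustration of the standard conversion, not the script used to generate metadata.jsonl; the function name yolo_to_pixel_xywh is hypothetical, and any rounding or clamping applied in the published metadata is not specified here.

```python
from typing import List


def yolo_to_pixel_xywh(
    x_center: float,
    y_center: float,
    width: float,
    height: float,
    img_w: int,
    img_h: int,
) -> List[float]:
    """Convert a normalized YOLO box to absolute [x_min, y_min, width, height].

    Inputs are fractions of the image size; outputs are in pixels.
    No rounding or clamping is applied (an assumption of this sketch).
    """
    box_w = width * img_w
    box_h = height * img_h
    x_min = x_center * img_w - box_w / 2
    y_min = y_center * img_h - box_h / 2
    return [x_min, y_min, box_w, box_h]


# A centered box covering half the width and a quarter of the height
# of a 640x480 image.
print(yolo_to_pixel_xywh(0.5, 0.5, 0.5, 0.25, 640, 480))
# → [160.0, 180.0, 320.0, 120.0]
```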
Dataset Structure
Data Splits
Train: 80% (18,809 images)
Validation: 15% (3,528 images)
Test: 5% (1,177 images)
Composition & Classes
60% Positive Images (Blood): Forensic blood spatter, UFC fights, gore/horror movie scenes, bloody prints, surgery scenes, etc.
40% Negative Images (Non-Blood): Visually confusing red objects and contexts (red dresses, red flowers, brake lights, crowded scenes, kitchen objects).
Classes: 1 Class (0: blood)
Preprocessing & Annotation Adjustments
Label Filtering: All non-blood labels present in the original source datasets were strictly removed to isolate the blood class.
Bounding Box Conversion: The source datasets contained a mix of bounding boxes and segmentation masks. All segmentation polygons were converted into tight bounding boxes by extracting the extreme (x, y) coordinate limits to ensure a standardized object detection format.
No Resizing Applied: Images retain their original, diverse resolutions. Any resizing (e.g., to 640x640) should be handled dynamically by the model during the training/inference pipeline.
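The polygon-to-box step described above can be illustrated with a short Python sketch. It assumes polygon vertices are given in pixel coordinates and emits a normalized YOLO box; the function name polygon_to_yolo_bbox is hypothetical, and this is not the actual preprocessing script used to build the dataset.

```python
from typing import Sequence, Tuple


def polygon_to_yolo_bbox(
    points: Sequence[Tuple[float, float]],
    img_w: int,
    img_h: int,
    class_id: int = 0,
) -> Tuple[int, float, float, float, float]:
    """Convert a segmentation polygon to a tight, normalized YOLO box.

    The box is defined by the extreme x and y coordinates of the polygon.
    Vertices are assumed to be in pixels (an assumption of this sketch).
    """
    if not points:
        raise ValueError("polygon has no vertices")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # YOLO stores the box center and size as fractions of the image size.
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return (class_id, x_center, y_center, width, height)
```

For a triangle with vertices (100, 50), (300, 80), and (200, 250) in a 400x500 image, the extremes are x in [100, 300] and y in [50, 250], giving a box centered at roughly (0.5, 0.3) with normalized size 0.5 by 0.4.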
Intended Use & Limitations
Intended Use: Research and development in forensics, automated content moderation (flagging violent media), and safety monitoring systems.
Limitations: The dataset relies heavily on cinematic representations of blood and specific forensic datasets, which may not encompass all real-world lighting conditions, surface textures, or scenarios.
About the Creators & Attribution
This dataset was assembled, preprocessed, and formatted by the team at Bit to support the development of robust content moderation and detection systems.
Data Sources
This dataset is an aggregation of 16 datasets originally hosted on Roboflow Universe. All original datasets are licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Huge thanks to the creators:
blood_segmentation by blood-3pyjx
blooddetection-j2wid by orkun-lpkdc
cars-cars-cars-gh8ga by yasins-workspace-qbvyv
crime-data-st by harsha-cujyl
danger-place-detection by md-hasibul-islam
dress-jhov8 by animal-detection-q3wq1
flowers-pciqg by my-workspace-ebwf4
forensicvision-du5uz by forensicvision
horror-content-detector-0rv9z by myproject-zxfbp
kitchen-gt6wi by abinavn
movie-ywprp by quanle-shsvi
passive-and-transfer-stains by thesis-epiei
rbrelabel by tasfagvasd
sexual_content-za0gn by helmiworkshop-6o1xm
two-guo-2 by ownfallprincess
video_modera by videomoderation
License: Released under CC BY 4.0.