HackRep: A Large-Scale Dataset of GitHub Hackathon Projects
arXiv:2603.29672v1 Announce Type: new
Abstract: Hackathons are time-bound collaborative events that often target software creation. Although hackathons have been studied in the past, existing work has focused on in-depth case studies, limiting our understanding of hackathons as a software engineering activity. To complement the existing body of knowledge, we introduce HackRep, a dataset of 100,356 hackathon GitHub repositories. We illustrate the ways HackRep can benefit software engineering researchers by presenting a preliminary investigation of hackathon project continuation, hackathon team composition, and an estimation of hackathon geography. We further demonstrate the opportunities of using this dataset, for instance by showing the possibility of estimating hackathon durations based on commit timestamps.
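The duration-estimation idea mentioned in the abstract can be sketched as follows. This is a minimal illustration, not code from the paper: it assumes commit timestamps have already been extracted from a repository's history, and the helper name `estimate_duration` is hypothetical. It approximates a hackathon's active span as the interval between the earliest and latest commit.

```python
from datetime import datetime, timedelta

def estimate_duration(commit_timestamps):
    """Estimate a hackathon's active duration as the span between
    the first and last commit in a repository's history."""
    if len(commit_timestamps) < 2:
        return timedelta(0)
    ordered = sorted(commit_timestamps)
    return ordered[-1] - ordered[0]

# Example: three commits spread over a weekend-style event.
commits = [
    datetime(2026, 3, 28, 9, 15),
    datetime(2026, 3, 28, 14, 2),
    datetime(2026, 3, 29, 17, 40),
]
print(estimate_duration(commits))  # 1 day, 8:25:00
```

In practice such an estimate would need cleaning (e.g., discarding commits made long after the event, or repositories initialized before it), which is presumably part of what the dataset enables studying.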
Subjects: Software Engineering (cs.SE)
Cite as: arXiv:2603.29672 [cs.SE]
(or arXiv:2603.29672v1 [cs.SE] for this version)
https://doi.org/10.48550/arXiv.2603.29672
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Lavinia Paganini [view email] [v1] Tue, 31 Mar 2026 12:30:13 UTC (139 KB)