Mining Software Repositories

Twitter: @msrconf

MSR Foundational Contribution Award

2023 Audris Mockus
“For his sustained contributions to mining software repositories, starting with his early case studies of Apache and Mozilla to his introduction of the World of Code infrastructure.”
2022 Dongmei Zhang and Tao Xie
“For their fundamental contributions in opening new areas of software analytics research with pioneering results and techniques that have had high industrial impact.”
2021 Bram Adams, Queen's University, Canada
“For pioneering and evangelising the field of release engineering.”
2020 Jonathan Maletic and Michael Collard
“For the srcML infrastructure, which addresses many hard problems in source code parsing and has fostered a wide range of research innovations throughout software engineering.”
2019 Katsuro Inoue
“For fostering a vibrant international community around software clone analysis and the development of the CCFinder clone detector, which has enabled countless others to do research involving code clones.”
2018 Georgios Gousios
2017 Tim Menzies, North Carolina State University, USA
“For his pioneering and meticulous efforts in the creation and maintenance of the PROMISE data repository. The PROMISE repository has had a tremendous and widely-recognized impact on raising the bar for rigorous and repeatable software engineering research worldwide.”

MSR Ric Holt Early Career Achievement Award

In 2020, the MSR Early Career Achievement Award was named after Ric Holt.

2023 Li Li
“For his contributions to the state-of-the-art in research and practice of software quality and evolution, especially concerning the evolution of Android apps and the Android OS.”
2022 Gustavo Pinto
“For his seminal contributions to the study of software sustainability, along the dimensions of energy consumption and social aspects of open source.”
2021 Bogdan Vasilescu, Carnegie Mellon University, USA
“For his seminal contributions in the area of socio-technical behaviour of software engineers.”
2020 Alberto Bacchelli, University of Zurich, Switzerland
“For his seminal contributions to modern code review.”
2019 Emad Shihab
“For contributions to the state of the art in research and practice in software quality assurance as well as outreach and education efforts throughout the international MSR community.”
2018 Meiyappan Nagappan
2017 Abram Hindle, University of Alberta, Canada
“To recognize the rigor, fearlessness, and breadth of his MSR-related research, and for establishing a new area of research related to green-mining.”

MSR Doctoral Research Award

2023 Eman Abdullah AlOmar
“For her contribution to understanding developer perception of refactoring through literature reviews, mining studies, tools, empirical surveys, and industrial trials that are being adopted by the community.”

Most Influential Papers

2023 Mining source code repositories at massive scale using language modeling
Miltiadis Allamanis and Charles Sutton.
“For pioneering the use of large-scale language models on source code, something that has become extremely relevant and timely after 10 years.”
2023 The impact of tangled code changes
Kim Herzig and Andreas Zeller.
“For emphasizing how the scope of a commit as a cohesive and self-contained piece of work has been an assumption for several pieces of research done by the MSR community, and for providing an approach to split a tangled commit into cohesive ones.”
2022 GHTorrent: Github’s data from a firehose
Georgios Gousios and Diomidis Spinellis
“For conceiving and maintaining the GHTorrent archive, extensively leveraged by the MSR community.”
Runner Up:
App store mining and analysis: MSR for app stores
Mark Harman, Yue Jia, and Yuanyuan Zhang
“For the pioneering work in the area of app store analytics.”
2021 How do developers blog? An exploratory study
Dennis Pagano and Walid Maalej
“For widening the scope of our community with the study of social media.”
2020 An extensive comparison of bug prediction approaches
Marco D’Ambros, Michele Lanza, and Romain Robbes
“For early adoption of open science principles by sharing a dataset that enabled significant further research in bug prediction.”
Honorable Mention: Replicating MSR: A study of the potential replicability of papers published in the Mining Software Repositories proceedings
Gregorio Robles
“For highlighting the importance of replication in mining software repositories.”
2019 The Promises and Perils of Mining Git
Christian Bird, Peter Rigby, Earl Barr, David Hamilton, Daniel German, and Prem Devanbu
2018 What do large commits tell us? A taxonomical study of large commits
Abram Hindle, Daniel German and Ric Holt
2017 How Long Will It Take to Fix This Bug?
Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, and Andreas Zeller
“For paving the way of actionable software analytics.”
2016 Mining email social networks (MSR 2006)
Christian Bird, Alex Gourley, Prem Devanbu, Michael Gertz and Anand Swaminathan
“For their foundational influence on studies of socio-technical activities in software projects.”
2015 When do changes induce fixes? (MSR 2005)
Jacek Sliwerski, Thomas Zimmermann, and Andreas Zeller
“Prior software quality research focused on flagging files with bugs, but the SZZ algorithm by Sliwerski et al. was the first work to focus on flagging faulty changes. By flagging bugs before they get into the code, follow-up research has taken a preventive role instead of a catch-up role.”
Honorable Mention: Developer Identification Methods for Integrated Data from Various Sources
Gregorio Robles and Jesus M. Gonzalez-Barahona
“For their pioneering efforts in mining social information. Such information has been the catalyst for many research efforts throughout software engineering.”
2014 Preprocessing CVS Data for Fine-Grained Analysis (MSR 2004)
Thomas Zimmermann and Peter Weißgerber
“For clearly and engagingly presenting practices that stood at the core of early MSR approaches, thus lowering the entry barrier for researchers worldwide to join this emerging field.”

ACM SIGSOFT Distinguished Papers (since 2015)

Starting in 2015, the MSR conference has recognized outstanding papers with ACM SIGSOFT Distinguished Paper Awards. Since 2016, there has been no separate Best Paper Award.

2023 AutoML from Software Engineering Perspective: Landscapes and Challenges
Chao Wang, Zhenpeng Chen, Minghui Zhou
2023 Investigating the Resolution of Vulnerable Dependencies with Dependabot Security Updates
Hamid Mohayeji Nasrabadi, Andrei Agaronian, Eleni Constantinou, Nicola Zannone, Alexander Serebrenik
2023 The ABLoTS Approach for Bug Localization: is it replicable and generalizable?
Feifei Niu, Christoph Mayr-Dorn, Wesley Assunção, Liguo Huang, Jidong Ge, Bin Luo, Alexander Egyed
2022 A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts
Konstantin Grotov, Sergey Titov, Vladimir Sotnikov, Yaroslav Golubev, Timofey Bryksin
2022 On the Violation of Honesty in Mobile Apps: Automated Detection and Categories
Humphrey Obie, Idowu Oselumhe Ilekura, Hung Du, Mojtaba Shahin, John Grundy, Li Li, Jon Whittle, Burak Turhan
2022 Operationalizing Threats to MSR Studies by Simulation-Based Testing
Johannes Härtel, Ralf Laemmel
2021 What code is deliberately excluded from test coverage and why?
Andre Hora
2021 Fast memory-efficient neural code completion
Alexey Svyatkovskiy, Sebastian Lee, Anna Hadjitofi, Maik Riechert, Juliana Franco, Miltiadis Allamanis
2021 Escaping the Time Pit: Pitfalls and Guidelines for using Time-Based Git Data
Samuel W. Flint, Jigyasa Chauhan, Robert Dyer
2020 Traceability Support for Multi-Lingual Software Projects
Yalin Liu, Jinfeng Lin and Jane Cleland-Huang
2020 Ethical Mining – A Case Study on MSR Mining Challenges
Nicolas Gold and Jens Krinke
2020 A Machine Learning Approach for Vulnerability Curation
Yang Chen, Andrew Santosa, Ming Yi Ang, Abhishek Sharma, Asankhaya Sharma and David Lo
2019 Data-Driven Solutions to Detect API Compatibility Issues in Android: An Empirical Study
Simone Scalabrino, Gabriele Bavota, Mario Linares-Vasquez, Michele Lanza and Rocco Oliveto
2019 Standing on Shoulders or Feet? The Usage of the MSR Data Papers
Zoe Kotti and Diomidis Spinellis
2019 A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks
João Felipe Pimentel, Leonardo Murta, Vanessa Braganholo and Juliana Freire
2018 Prevalence of Confusing Code in Software Projects - Atoms of Confusion in the Wild
Dan Gopstein, Hongwei Zhou, Phyllis Frankl and Justin Cappos
2018 Towards Extracting Web API Specifications from Documentation
Jinqiu Yang, Erik Wittern, Annie T.T. Ying, Julian Dolby and Lin Tan
2017 Classifying code comments in Java open-source software systems
Luca Pascarella and Alberto Bacchelli
2017 Some From Here, Some From There: Cross-Project Code Reuse in GitHub
Mohammad Gharehyazie, Baishakhi Ray and Vladimir Filkov
2016 Addressing problems with external validity of repository mining studies through a smart data platform
Fabian Trautsch, Steffen Herbold, Philip Makedonski and Jens Grabowski
2016 Studying the Impact of Switching to a Rapid Release Cycle on Integration Delay of Addressed Issues - An Empirical Study of the Mozilla Firefox Project
Daniel Alencar da Costa, Shane McIntosh, Uirá Kulesza and Ahmed E. Hassan
2015 Do Bugs Foreshadow Vulnerabilities? A Study of the Chromium Project
Felivel Camilo, Andrew Meneely and Meiyappan Nagappan
2015 Characterization and prediction of issue-related risks in software projects
Morakot Choetkiertikul, Hoa Khanh Dam, Truyen Tran and Aditya Ghose

MSR Distinguished/Best Papers (until 2015)

Until 2015, the MSR conference recognized outstanding papers with MSR Best Paper Awards or, in the case of multiple winners, with MSR Distinguished Paper Awards. In 2015, the MSR conference awarded both MSR Best Paper and ACM SIGSOFT Distinguished Paper Awards.

2015 Do Bugs Foreshadow Vulnerabilities? A Study of the Chromium Project
Felivel Camilo, Andrew Meneely and Meiyappan Nagappan
2014 Towards Building a Universal Defect Prediction Model
Feng Zhang, Audris Mockus, Iman Keivanloo, and Ying Zou

The Impact of Code Review Coverage and Code Review Participation on Software Quality: A Case Study of the Qt, VTK, and ITK Projects
Shane McIntosh, Yasutaka Kamei, Bram Adams, and Ahmed E. Hassan
2013 Automatically Mining Software-based, Semantically-similar Words from Comment-Code Mappings
Matthew J. Howard, Lori Pollock, K. Vijay-Shanker, and Samir Gupta
2012 Think Locally, Act Globally: Improving Defect and Effort Prediction Models
Nicolas Bettenburg, Meiyappan Nagappan, and Ahmed E. Hassan

Green mining: A methodology of relating software change to power consumption
Abram Hindle
2011 How Developers Use the Dynamic Features of Programming Languages: the Case of Smalltalk
Oscar Callaú, Romain Robbes, Éric Tanter, and David Röthlisberger
2010 Clones: What is that Smell?
Foyzur Rahman, Christian Bird, and Premkumar Devanbu
2009 Mining search topics from a code search engine usage log
Sushil Krishna Bajracharya and Cristina Videira Lopes
2008 AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools
Emily Hill, Zachary P. Fry, Haley Boyd, Giriprasad Sridhara, Yana Novikova, Lori L. Pollock, and K. Vijay-Shanker
2007 Identifying Changed Source Code Lines from Version Repositories
Gerardo Canfora, Luigi Cerulo, and Massimiliano Di Penta
2006 Mining Large Software Compilations over Time: Another Perspective of Software Evolution
Gregorio Robles, Jesus M. Gonzalez-Barahona, Martin Michlmayr, and Juan Jose Amor

FOSS Impact Paper Award

To encourage research on understanding and improving FOSS (Free/Open Source Software), MSR has established the “FOSS Impact Paper” award. The award is granted to papers that show outstanding contributions to the FOSS community.

2023 UNGOML: Automated Classification of Unsafe Usages in Go
Anna-Katharina Wickert, Clemens Damke, Lars Baumgärtner, Eyke Hüllermeier, Mira Mezini
2022 SECOM: Towards a Convention for Security Commit Messages
Sofia Reis, Rui Abreu, Hakan Erdogmus, Corina S. Păsăreanu
2021 Which contributions count? Analysis of attribution in open source
Jean-Gabriel Young, Amanda Casari, Katie McLaughlin, Milo Z. Trujillo, Laurent Hébert-Dufresne, James P. Bagrow
2020 The Impact of a Major Security Event on an Open Source Project: The Case of OpenSSL
James Walden
“For its analysis of the impact of the Heartbleed vulnerability on the OpenSSL project, providing recommendations for how open source projects can adapt and improve. The author provides guidance regarding metrics that help to assess the health of a project. This paper is a piece of literature that is interesting and relevant to many FOSS maintainers.”
2019 A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks
João Felipe Pimentel, Leonardo Murta, Vanessa Braganholo and Juliana Freire
Special Mentions:
git2net: Mining Time-Stamped Co-Editing Networks from Large git Repositories
Christoph Gote, Ingo Scholtes and Frank Schweitzer
Investigating Next Steps in Static API-Misuse Detection
Sven Amann, Hoan Nguyen, Sarah Nadi, Tien Nguyen and Mira Mezini
2018 Characterising Deprecated Android APIs
Li Li, Jun Gao, Tegawendé F. Bissyandé, Lei Ma, Xin Xia and Jacques Klein

Best Data Showcase Award

2022 A Large-scale Dataset of (Open Source) License Text Variants
Stefano Zacchiroli
2022 Vul4J: A Dataset of Reproducible Java Vulnerabilities Geared Towards the Study of Program Repair Techniques
Quang-Cuong Bui, Riccardo Scandariato, Nicolás E. Díaz Ferreyra
2021 DUETS: a Dataset of Reproducible Pairs of Java Library-Clients
Thomas Durieux, César Soto-Valero, Benoit Baudry
2020 GitterCom: A Dataset of Open Source Developer Communications in Gitter
Esteban Parra, Ashley Ellis and Sonia Haiduc
2019 The Maven Dependency Graph: a Temporal Graph-based Representation of Maven Central
Amine Benelallam, Nicolas Harrand, César Soto-Valero, Benoit Baudry and Olivier Barais
Special Mention: GreenHub Farmer: Real-world data for Android Energy Mining
Hugo Matalonga, Bruno Cabral, Fernando Castor, Marco Couto, Rui Pereira, Simão Melo de Sousa and João Paulo Fernandes
2018 VulinOSS: A dataset of security vulnerabilities in open-source systems.
Antonios Gkortzis, Dimitris Mitropoulos, and Diomidis Spinellis.
2017 A Data Set of OCL Expressions on GitHub
Jeroen F.H. Noten, Josh G.M. Mengerink, Alexander Serebrenik
2016 Data Sets: The Circle of Life in Ruby Hosting, 2003-2015
Megan Squire
2015 A Repository with 44 Years of Unix Evolution
Diomidis Spinellis
2014 A dataset for pull-based development research
Georgios Gousios and Andy Zaidman
2013 The GHTorrent Dataset and Tool Suite
Georgios Gousios

Mining Challenge Winners

2022 Best Mining Challenge Paper:
Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship Between Technical Debt and Refactoring
Anthony Peruma, Eman Abdullah AlOmar, Christian D. Newman, Mohamed Wiem Mkaouer, Ali Ouni
Best Mining Challenge Student Presentations:
Nicholas Nagy for presenting On the Co-Occurrence of Refactoring of Test and Source Code
Anthony Peruma for presenting Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship Between Technical Debt and Refactoring
2021 PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects
Arthur Veloso Kamienski, Luisa Palechor, Abram Hindle, Cor-Paul Bezemer
2020 ?
2019 Best Mining Challenge Paper:
Python Coding Style Compliance on Stack Overflow
Nikolaos Bafatakis, Niels Boecker, Wenjie Boon, Martin Cabello Salazar, Jens Krinke, Gazi Oznacar and Robert White
Best Mining Challenge Student Presentation:
Durham Abric for presenting the paper “Can Duplicate Posts on Stack Overflow Benefit the Software Development Community?” by Durham Abric, Oliver Clark, Matthew Caminiti, Keheliya Gallaba and Shane McIntosh
2018 ?
2017 How Does Contributors’ Involvement Influence the Build Status of an Open-Source Software Project?
Marcel Rebouças, Renato Oliveira Dos Santos, Gustavo Pinto and Fernando Castor
2016 Judging a commit by its cover: Correlating commit message entropy with build status on Travis-CI
Eddie Antonio Santos and Abram Hindle
2015 Mining StackOverflow to Filter out Off-topic IRC Discussion
Shaiful Chowdhury and Abram Hindle
2014 Sentiment Analysis of Commit Messages in GitHub: An Empirical Study
Emitza Guzman, David Azócar, and Yang Li

Honorable Mention: Do developers discuss design?
João Brunet, Gail C. Murphy, Ricardo Terra, Jorge Figueiredo, and Dalton Serey

Honorable Mention: A Study of External Community Contribution to Open-Source Projects on GitHub
Rohan Padhye, Senthil Mani, and Vibha Singhal Sinha
2013 Encouraging User Behaviour with Achievements: An Empirical Study
Scott Grant and Buddy Betts
2012 Do the stars align? Multidimensional analysis of Android’s layered architecture
Victor Guana, Fabio Rocha, Abram Hindle and Eleni Stroulia
2011 Apples Vs. Oranges? An exploration of the challenges of comparing the source code of two software systems
Daniel M. German and Julius Davies
2010 Cloning and Copying between GNOME Projects
Jens Krinke, Nicolas Gold, Yue Jia, and David Binkley
2009 On the use of Internet Relay Chat (IRC) meeting by developers of the GNOME GTK+ project
Emad Shihab, Zhen Ming Jiang, and Ahmed E. Hassan
2008 A newbie’s guide to Eclipse APIs
Reid Holmes and Robert J. Walker
2007 Mining Eclipse Developer Contributions via Author-Topic Models
Erik Linstead, Paul Rigor, Sushil Bajracharya, Cristina Lopes, and Pierre Baldi
2006 A study of the contributors of PostgreSQL
Daniel M. German

Best Hackathon Paper

2022 Bot Detection in GitHub Repositories
Natarajan Chidambaram, Pooya Rostami Mazrae

MSR Distinguished Reviewers

Regular PC
Bianca Trinkenreich
Csaba Nagy
Dong Wang
Fabio Palomba
Maxime Lamothe
Melina Vidoni
Pooja Rani
Romain Robbes
Serena Elisa Ponta
Thomas Durieux
Vincent Hellendoorn
Yuan Tian
Zsuzsanna Onet-Marian
Junior PC
Carolin Brandt
Daniel Feitosa
Kevin Jesse
Lina Ochoa
Luís Cruz
Max Hort
Victoria Jackson
2022 Melina Vidoni
Earl Barr
Breno Miranda
Shaowei Wang
Ewan Tempero
Alexander Serebrenik
Kevin Moran
Sebastiano Panichella
Triet Le
Leandro Minku
Tapajit Dey
Daniel Alencar da Costa
2020 Serge Demeyer
Andre Hora
Nikolaos Tsantalis
Maurício Aniche
Maleknaz Nayebi
Jesus M. Gonzalez-Barahona
Francisco Servant
Philipp Leitner
2019 Kelly Blincoe
Eleni Constantinou
Jesus González-Barahona
Sebastian Proksch
Fernando Castor
Alessandro Garcia
Eirini Kalliamvakou
Erik Wittern