Personal

About me!

Niharika means nebula in Hindi/Sanskrit. The pronunciation is actually quite easy! Just read it out and put the stress on HA: ni-HA-rika. People also informally call me Ari. I use she pronouns. I am of Tamil descent.

I am a passionate advocate of LGBTQ+ rights and economic justice. I love animals and share my home with my very naughty and needy New Zealand white. Speaking of white rabbits, I love Jefferson Airplane; actually classic rock/heavy metal generally.

I like to spend my free time learning things! My current personal goals include:

improving my proficiency in modern Greek
strength training

Here are my current strength personal records:

Is Data-Science Science?

This is a question I often hear. Depending on how it is dressed, it can be skeptical or outright dismissive, characterizing the craft as glorified number crunching. Personally, I think it is an important question to ask, and it helps clarify what place the tool has in science. Here are my thoughts on the topic.

Science is a process by which we understand our world. It involves many steps: collecting data, building a hypothesis, setting up an experiment, building a model, interpreting data/results, collecting more data, and so on. A given scientific project may not involve all steps, but the conclusions need to be reproducible. Data-science is a collection of methods that are remarkable at fitting data. They can be used to come up with decision boundaries, fill gaps in data, synthesize artificial data, and so on. The reason these methods surpass classical model fitting methods (e.g. linear or polynomial regression) is their ability to characterize large, high-dimensional datasets at high speeds.

Data-science methods can aid in many of the steps necessary to conduct science. For example, clustering methods can help build a hypothesis for the presence of a novel phenomenon or how one phenomenon may be related to another. In fact, these may be the only methods that will succeed in certain use cases. However, by themselves they only serve as a part of the scientific process. The scientist still needs to interpret whether trends found are meaningful and understand them or guide the process to get desired outcomes (e.g. designing feature/response vectors, selection of hypothesis function class). Data-science is not science by itself, just like math, coding, or instrumentation, but can serve as an important part of modern scientific practice. The techniques are powerful and require training and skill to make deft use of them while understanding limitations. As such, the application of a data-science method, regardless of sophistication, cannot and should not be the main purpose of a scientific endeavor.

Data Science and Society

I feel strongly that as scientists we must consider the broader implications of our research and act to mitigate negative impacts. Data science in research introduces special challenges that I feel are important to bring up. These are complicated issues and I do not have answers to the difficult questions they raise, but I think the first step is talking about them.

Climate change:

Specialized hardware for accelerating deep learning computations (e.g. GPUs) consumes a lot of energy and thus makes a surprisingly sizable contribution to climate change, especially if powered by traditional power grids. Training my most recent machine learning model consumed the same amount of energy as a return cross-continental commercial flight (such as LAX-NYC).

I support using https://mlco2.github.io/impact/ or a similar tool to estimate and state the carbon footprint of developing machine learning models in talks and publications. The awareness of my carbon impact has made me think twice before running unwarranted calculations. I also strive to use more energy efficient hardware for accelerating AI (e.g. FPGAs) where possible.
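To give a sense of what such an estimate involves, here is a minimal sketch. It assumes emissions scale as hardware power draw times run time times the carbon intensity of the local grid, which is my simplified reading of how calculators of this kind work, not the tool's actual code. The function name, the optional overhead factor, and every number below are hypothetical placeholders, not figures from my own training runs.

```python
def estimate_co2_kg(power_watts, hours, carbon_intensity_kg_per_kwh, overhead=1.0):
    """Rough CO2-equivalent estimate for a compute job, in kilograms.

    power_watts: assumed average power draw of the hardware (hypothetical input)
    hours: wall-clock run time
    carbon_intensity_kg_per_kwh: kg CO2eq emitted per kWh on the local grid
    overhead: assumed multiplier for cooling and other facility overhead
    """
    # Convert watts to kilowatts, then to energy used over the whole run.
    energy_kwh = power_watts / 1000.0 * hours * overhead
    return energy_kwh * carbon_intensity_kg_per_kwh


# Hypothetical example: one 300 W accelerator running for 100 hours
# on a grid emitting 0.4 kg CO2eq per kWh.
emissions = estimate_co2_kg(power_watts=300, hours=100,
                            carbon_intensity_kg_per_kwh=0.4)
print(f"Estimated emissions: {emissions:.1f} kg CO2eq")
```

The same run on a cleaner grid would give a proportionally smaller number, which is why the location of the hardware matters as much as the run time.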

Inequity and Bias:

The need for training in data science methods can be a major barrier to entry, especially for people from disadvantaged backgrounds. Not only does it require that the individual already have skills in mathematics and coding, but it can take years to become proficient. Training programs like the La Serena School for Data Science and the ZTF Summer School play a critical role in ensuring wide access to introductory training material. I encourage students to apply to such free programs when possible (please feel free to reach out to me for more information about these two specific programs).

Another barrier can be the data products. Generation/curation of training data can take upwards of 80% of the data science lifecycle. It cannot be overstated that the quality of data science tools depends critically on the quality of this dataset. I strive to release my code and data products to the community as soon as possible. Please see my GitHub for all my code and data: https://github.com/niharika-sravan.

Misuse:

This has recently come onto my radar as something to consider. Powerful new technologies have the potential to be misused for anti-social purposes, including tracking, harassment, cyber attacks, sabotage, and violence. I do not believe my current work can either directly or indirectly lead to products that could be misused for such goals, but I remain cognizant of the possibility and open-minded.