My Research
I am currently working on
 Multiplayer bandits (2018 – ????) (paper 1, paper 2).
Consider a multiplayer stochastic multiarmed bandit game in which the players cannot communicate, and if two or more players pull the same arm, a collision occurs and the involved players receive zero reward.

Two feedback models can be considered: a model in which the players can observe whether a collision has occurred, and a more difficult setup when no collision information is available. With Gabor Lugosi we gave the first theoretical guarantees for the second model: an algorithm with a logarithmic regret, and an algorithm with a squareroot regret type that does not depend on the gaps between the means. For the first model, we gave the first squareroot regret bounds that do not depend on the gaps. Building on these ideas, we also gave an algorithm for reaching approximate Nash equilibria quickly in stochastic anticoordination games (the paper).

Next consider the setting where each arm has a different mean for each player.
In the model in which the players can observe whether a collision has occurred, an algorithm with regret $O((\log T)^{2+\kappa})$ for any constant $\kappa$ was presented in this paper, who left the existence of an algorithm with $O(\log T)$ regret as an open question ($T$ is the number of rounds). With Emilie Kaufmann we provided an affirmative answer to this question in the case when there is a unique optimal assignment of players to arms. For the general case we presented an algorithm with expected regret $O((\log T)^{1+\kappa})$, for any constant $\kappa$ (the paper).

Density estimation, or sample complexity of distribution learning (2017 – ????) (paper 1,
paper 2,
a tutorial covering the results of papers 1 and 2,
paper 3).
Given samples from an unknown probability distribution with density $f$,
we want to output a density function $g$ such that with high probability $\gf\_1\leq\varepsilon$.
Given that $f$ belongs to (or can be approximated by) a class of distributions $\mathcal F$, what is the minimum number of samples required?

Suppose we know $f$ is a mixture of $k$ Gaussians in $d$ dimensions,
and we would like $g$ to have a similar structure.
With Hassan Ashtiani, Shai BenDavid, Nick Harvey, Chris Liaw and Yaniv Plan,
we proved that $\widetilde \Theta (kd^2/\varepsilon^2)$ samples are necessary and sufficient for this task,
improving both the known upper and lower bounds.
Moreover, for mixtures of axisaligned Gaussians,
$\widetilde \Theta (kd/\varepsilon^2)$ samples are necessary and sufficient.
For proving these bounds we have developed a novel distribution learning technique based on compression schemes
(the paper).
With Luc Devroye and Tommy Reddad, we
removed the logarithmic factors in the lower bound, and showed that $\Omega(kd^2/\varepsilon^2)$ samples are necessary for this problem
(the paper).

Suppose the sample complexity for learning class $\mathcal F$ is $m(\varepsilon)$.
Under mild technical conditions,
with Hassan Ashtiani and Shai BenDavid we showed the sample complexity for learning $k$mixtures of distributions in $\mathcal F$ is bounded by $O( m(\varepsilon) k \log (k) / \varepsilon^2)$
(the published paper had a bug which was fixed in the arXiv version).

Let $G$ be an undirected graph with $m$ edges and $d$ vertices, and let $\mathcal F$ be the class of Ising models defined on $G$.
With Luc Devroye and Tommy Reddad, we showed the optimal sample complexity for learning $\mathcal F$ is $\Theta((m+d)/\varepsilon^2)$
(the paper).
I have worked on
 Total variation distance between highdimensional Gaussians (2018) (the paper).
What is the total variation distance between two highdimensional Gaussians
$N(\mu_1,\Sigma_1)$ and $N(\mu_2,\Sigma_2)$?
If $\mu_1=\mu_2$, we show the answer is
$\Theta(\min\{1,\\Sigma_1^{1}\Sigma_2I_d\_F\})$,
where $\A\_F$ denotes the Frobenius norm of $A$.
If $\mu_1\neq\mu_2$, then let $v= \mu_1\mu_2$, and let $\Pi$ be a matrix whose columns form a basis for the subspace orthogonal to $v$.
Then the total variation distance is
$
\Theta \left(\min\left\{1, \max\left\{ \frac{v^{\mathsf{T}} (\Sigma_1\Sigma_2)v}{v^{\mathsf{T}} \Sigma_1 v},\sqrt\frac{{v^{\mathsf{T}} v}}{{v^{\mathsf{T}} \Sigma_1 v}},
\ (\Pi^{\mathsf{T}} \Sigma_1 \Pi)^{1}
\Pi^{\mathsf{T}} \Sigma_2 \Pi  I \_F\right\} \right\} \right)$.
Joint work with Luc Devroye and Tommy Reddad.

VCdimension and pseudodimension of neural networks (2016  2017) (the extended abstract and the full version of the paper)
Deep neural networks underlie many of the recent breakthroughs of applied
machine learning, particularly in image and speech recognition.
These successes motivate a renewed study of these networks' theoretical properties.
Classification is one of the learning tasks in which deep neural networks have been particularly successful, e.g., for image recognition.
A natural foundational question that arises is:
what are the theoretical limits on the classification power of these networks?
The established way to formalize this question is by considering the VCdimension,
as it is well known that this asymptotically determines the sample complexity of PAC learning with such classifiers.
With Peter L. Bartlett, Nick Harvey and Chris Liaw,
we proved new upper and lower bounds on the VCdimension of deep neural networks with the ReLU activation function.
These bounds are tight for almost the entire range of parameters.
Letting $W$ be the number of parameters and $L$ be the number of layers,
we proved that the VCdimension is $O(W L \log(W))$ and $\Omega(WL\log(W/L)).$
This improves both the previously known upper bounds and lower bounds.
These bounds generalize to arbitrary piecewise linear activation functions, and also hold for the pseudodimension of these networks.
In terms of the number $U$ of nonlinear units, we proved a tight bound $\Theta(W U)$ on the VCdimension and pseudodimension, which holds for arbtirary piecewise polynomial activation functions.

Growing a tree in a graph (2016  2017) (the paper)
Given a graph and a starting vertex, we iteratively build a random spanning tree as follows:
initially the tree has just one vertex.
in each round, we select a random edge with exactly one endpoint in the tree,
and add that edge to the tree.
We give bounds for the height of the resulting tree,
in terms of parameters of the graph (diameter, maximum degree, genus, degeneracy, edge expansion, etc).

Randomized rumour spreading (2013  2017) (paper 1, paper 2,
paper 3,
a survey in Persian)
The synchronous push&pull protocol is a roundrobin rumour spreading protocol, defined as follows.
Suppose that one node in a network is aware of a piece of information, the "rumour."
The protocol proceeds in rounds.
In each round, every informed node contacts a random neighbour and sends the rumour to it ("pushes" the rumour),
and every uninformed nodes contacts a random neighbour and gets the rumour if the neighbour possibly knows it ("pulls" the rumour).
This model was defined in this 1987 paper and popularized by this 2000 paper.

Random ktrees are a class of lowdiameter scalefree random graphs with large clustering coefficient, built as follows:
initially we have a kclique.
In every step a new vertex is born, a random kclique of the current graph is chosen,
and the new vertex is joined to all vertices of the kclique.
With Ali Pourmiri I studied the behaviour of the synchronous push&pull protocol on random ktrees
and proved that for each fixed k>1, if initially a random vertex is aware of the rumour,
then with high probability after polylogarithmic (in the number of vertices) many rounds, the rumour propagates to almost all vertices.
We proved a similar result for a closely related class of graphs, random kApollonian networks.
On the other hand, we proved that if we want to inform each and every vertex, with high probability we need at least polynomially many rounds (the paper, see also its talk video).

The asynchronous variant of the push&pull protocol is defined as follows.
Independent Poisson clocks of rate 1 are associated with the vertices of graph G, one to each vertex.
Initially, one vertex of G knows the rumour.
Whenever the clock of a vertex x rings, it calls a random neighbour y:
if x knows the rumour and y does not, then x tells y the rumour,
and if x does not know the rumour and y knows it, y tells x the rumour.
The guaranteed spread time is the smallest time that with high probability all vertices know the rumour.
With Huseyin Acan, Andrea Collevecchio and Nick Wormald, we found the worst and best possible spread times, up to constant factors, for both variants.
We also proved the first theoretical relationships between the guaranteed
spread times in the two versions. Firstly, in all graphs the guaranteed spread time in the asynchronous
version is within an O(log n) factor of that in the synchronous version,
and this is tight. Next, we found examples of graphs whose asynchronous spread times are
logarithmic, but the synchronous versions are polynomially large.
Finally, we show for any graph that the ratio of the synchronous spread time to the asynchronous spread time is O(n^{2/3})
(the paper).

With Omer Angel and Yuval Peres,
we showed for any graph that the ratio of the guaranteed synchronous spread time to the guaranteed asynchronous spread time is $O(n\log n)^{1/3}$, which is tight up to logarithmic factors (the paper).
 Load balancing via randomized local search (2015  2016) (the paper)
Consider $n$ identical resources and $m$ identical users.
Each user must be assigned to exactly one resource.
We analyze a distributed randomized local search protocol introduced in 2004 by Goldberg,
whose goals is to assign users to resources such that the load is perfectly balanced.
In this protocol, users are activated by independent exponential clocks of rate 1.
On activation, a user samples a random resource and compares that resource's load with its current load.
If the new load is lower, the user switches.
It is shown in this paper that the expected balancing time is $O(\ln^2 n+\ln(n)\cdot n^2/m)$.
With Petra Berebrink, Peter Kling, and Chris Liaw, we improved this to $O(\ln(n)+n^2/m)$, which is tight.
 Properties of the spatial preferential attachment model (2014  2016) (the paper)
Rumour spreading is a protocol that models the spread of information through a network via usertouser interaction. The spread time of a graph is the number of rounds needed to spread the rumour to the entire graph. The Spatial Preferred Attachment (SPA) model (introduced here) is a random graph model for complex networks: vertices are placed in a metric space, and the link probability depends on the metric distance between vertices, and on their degree.
With Jeannette Janssen we showed that the SPA model typically produces graphs that have small effective diameter, i.e. $O(\log^2 n)$, while rumour spreading is relatively slow, namely polynomial in $n$.
 Geometric bounds for the deviation of random matrices (2015  2016) (the paper)
Let $A$ be an isotropic, random subgaussian $m\times n$ matrix.
The Gaussian complexity of a set $T\subseteq \mathbb{R}^n$
is defined as
$\mathbb E \sup_{x \in T} \langle g , x \rangle $,
where $g$ has i.i.d. standard normal entries.
With Chris Liaw, Yaniv Plan, and Roman Vershynin, we showed that for any bounded set $T$,
the deviation of
$\Ax\_2$ around its mean is uniformly bounded by the Gaussian complexity of $T$.
We also proved a local version of this theorem, which allows for unbounded sets.
These theorems have various applications, in particular, we gave a new result regarding model selection in the constrained linear model.

Diameters of realworld network models (2013  2014) (paper 1, paper 2)

I have developed a new technique for proving logarithmic upper bounds for the diameters of evolving random graph models,
which is based on defining a coupling between random graphs and variants of random recursive trees.
The advantage of the technique is twofold: it is quite simple and provides short proofs,
and it is applicable to a broad variety of models, including those incorporating preferential attachment.
Using this I proved, for the first time, logarithmic upper bounds for the diameters of the following well known models:
the forest fire model, the copying model, the PageRankbased selection model, the AielloChungLu models, the generalized linear preference model, directed scalefree graphs, and the CooperFrieze model (the paper).

The randomsurfer Webgraph model is defined as follows.
Let d be a positive integer and let p ∈ (0,1].
Generate a random directed rooted nvertex multigraph, with all vertices having outdegree d, as follows.
Start with a single vertex v_{0}, the root, with d selfloops.
At each subsequent step s, where
1 ≤ s ≤ n1, a new vertex v_{s} appears and d edges are created from it to vertices in {v_{0},v_{1},...,v_{s1}}, by doing the following probabilistic procedure d times, independently:
choose a vertex u uniformly at random from {v_{0},v_{1},...,v_{s1}},
and a fresh geometric random variable X with parameter p;
perform a simple random walk of length X starting from u, and join v_{s} to the last vertex of the walk.
This model is equivalent to the PageRankbased selection model.
With Nick Wormald, I proved the diameter of both models is at most 8 e^{p} log n / p.
This is the first logarithmic upper bound for these models, and verifies that they are "smallworld."
In the special case d=1, we obtain a directed tree with some selfloops at the root.
We proved sharp results for the underlying undirected tree.
When p ≥ 0.21 we determined the asymptotic value of its height and diameter, and when p<0.21, we provided logarithmic lower and upper bounds (the paper).
 Properties of Random Apollonian Networks (2012  2014) (paper 1, paper 2)

Start with a triangle. In each step, choose a random face (not the unbounded face), add a point in that face and join it to the vertices on the face. After n steps, you will have a (random) triangulated plane graph with n+3 vertices.
With E. Ebrahimzadeh, L. Farczadi, P. Gao, C. Sato, N. Wormald, and J. Zung
I showed that asymptotically almost surely (a.a.s.) as n → ∞ every path in a RAN has length o(n),
refuting a conjecture of Frieze and Tsourakakis.
We also showed that a RAN always has a path of length (2n5)^{log(3)/log(2)},
and that the expected length of its longest path is Ω(n^{0.88}).
Finally, we proved that a.a.s. the diameter of a RAN is asymptotic to c log n,
where c ≈ 1.668 is the solution of an explicit equation (the paper).

With Andrea Collevecchio and Nick Wormald, I proved there exists a constant a<1 such that a.a.s. every path in a RAN has length smaller than n^{a}, verifying a conjecture of Cooper and Frieze.
Using a similar technique, we showed that if r < d are fixed constants, then a.a.s. every rary subtree of a random dary recursive tree on n vertices has less than n^{b} vertices, for some b=b(d,r)<1
(the paper).
 UniquePath Labellings or Good EdgeLabellings (2011  2013) (paper 1, paper 2)

A uniquepath labelling (a.k.a. good edgelabelling) of a simple graph
is a labelling of its edges with real numbers such that,
for any ordered pair of vertices (u,v),
there is at most one nondecreasing path from u to v.
Say a graph is good if it admits a good edgelabelling,
and is bad otherwise.
Bode, Farzad, and Theis conjectured that all graphs with large enough girth are good (their paper).
I proved that if the maximum degree of a graph is within a constant factor of its average degree, and the graph is good, then it has at most c n^{1+o(1)} edges, for some c. This disproves the conjecture of Bode et al. (the paper).

Araújo, Cohen, Giroire, and Havet
observed that the maximum number of edges of a
good graph on n vertices
is between a nlog(n) and b n^{1.5} for some a, b,
and
asked
about more precise bounds (their paper).
With Dieter Mitsche and Pawel Pralat, I solved this problem by proving that
any good nvertex graph has at most n log_{2}(n) edges,
and that for every n there exists a good nvertex graph with n log_{2}(n)  O(n) edges.
(the paper)
 Stretch Factor of a Randomly Embedded Random Graph (2011  2012) (the paper)
Consider a random graph G(n,p) whose vertex set V has been randomly embedded in the unit square and whose edges are given weight equal to the geometric distance between their end vertices. Then each pair {u,v} of vertices have a distance in the weighted graph, and a Euclidean distance. The stretch factor of the embedded graph is defined as the maximum ratio of these two distances, over all u,v ∈ V. With Nick Wormald I gave upper and lower bounds on the stretch factor (holding asymptotically almost surely), and showed that for p not too close to 0 or 1, these bounds are best possible in a certain sense. Our results imply that the stretch factor is bounded with probability tending to 1 if and only if n(1p) tends to 0, answering a question of Joseph O'Rourke.
 DAG Partitioning Problem (2010  2012) (the paper)
Consider the following DAG Partitioning problem: given a directed acyclic graph with arc weights,
delete a set of arcs of minimum total weight so that each of the resulting connected components has exactly one sink. With Soroush Alamdari I proved that the problem is hard to approximate, namely
that there is no n^(1epsilon)approximation algorithm, for any fixed epsilon.
We also gave a polynomial time algorithm
for solving the DAG Partitioning problem in graphs with bounded pathwidth.
 Network Creation Games (2010  2011) (the conference paper, the full version) Consider a
network creation game, in which each player (vertex) has a fixed budget to establish links to other players. In this model, each link has unit price and each agent tries to minimize its cost, which is either its local diameter or its total distance to other players in the (undirected) underlying graph of the created network. This is similar to a directed version of this basic network creation game, and with Shayan Ehsani, MohammadAmin Fazli, Sina Sadeghian Sadeghabad, MohammadAli Safari, Morteza Saghafian and Saber Shokatfadaee, we proved some bounds for price of anarchy of this game, and in particular, we showed a nonmonotone property of this game, (similar to the Braess' paradox in network routing games).
 The Cops and Robber Game (2009  2012) (paper 1, paper 2.1, paper 2.2,
paper 3.1,
paper 3.2, paper 3.3)

My BSc thesis was on Cops and Robber game, a roundrobin vertextovertex pursuit game played on a graph. In my thesis (written in 2009 under the supervision of Mohammad Ali Safari, available in Farsi) I surveyed the known results on this game. I also proved a few results about the capture time of grids (the capture time of a graph, introduced here, is the minimum number of steps needed to capture the robber in that graph) (the paper).

In my Master’s I continued working on this game. I specially focused on the variant where the robber is faster than the cops (introduced here). With Noga Alon I proved that if the robber has speed s, then the cop number of a connected nvertex graph can be as large as Ω(n^{s/s+1}) (the paper). I had previously proved this result for s=2, 4, and had conjectured that this bound is asymptotically tight, i.e. O(n^{s/s+1}) cops can always capture the robber (the paper). This generalizes Meyniel's well known conjecture (see this survey), which states the same thing for s=1.

Next, I worked on the variant in which the robber has infinite speed.
With Noga Alon, I showed that the cop number and treewidth of planar graphs are of the same order,
and determined the cop number of random graphs (the paper).
Then I characterized graphs with cop number one and gave an algorithm for their detection, and proved some bounds for the cop number of expander graphs
(the paper).
Finally, I showed that any interval graph has cop number
O(√n), but there are chordal graphs with cop number Ω(n / log n) (the paper).
 Flows in Graphs (2006  2008)
An unoriented flow in a graph (introduced here), is an assignment of real numbers to the edges, such that the sum of the values of all edges incident with each vertex is zero. This is equivalent to a flow in a bidirected graph all of whose edges are extraverted. A nowherezero unoriented kflow is an unoriented flow with values from the set {±1,... ,±(k1)}. Together with Saieed Akbari, Aliakbar Daemi, Omid Hatami and Adel Javanmard, we worked on the problem of finding the minimum k such that a graph has an unoriented kflow. We proved several results for regular graphs (the paper) and hamiltonian graphs (the paper).
 Artificial Intelligence (2007  2008) I worked along with Arian Khosravi and Ali Dehghan (under supervision of Ramin Halavati) to implement a software controller for Ms. PacMan game, which is a simple realtime game. We played, discussed and implemented a variety of approaches, and finally came up with a heuristicbased program, which became first in the IEEE Congress on Evolutionary Computation (CEC 2007) competition (see the results), and was the best known agent for the game for two years (a short description of how our agent works).
 Complexity of Graph Homomorphisms (2007) I studied Cylindrical Construction, a concept introduced by Amir Daneshgar in order to prove hardness results for Graph Homomorphism Problem. I was able to move a few small steps towards the BangJensen and Hell's conjecture about the dichotomy of digraph homomorphism problem, but meanwhile the whole conjecture was proved in this paper.
back to my homepage
Last updated: 6 February 2019