Chapter 54 Linkedin 2017 Interview Questions Part 1

54.3 Top K URLs in last 24hr/1hr/5min

时间的限制我意思是给每一条记录在内存里面加上有效时间,好像叫TTL time to live,这样来统计last 5min/1hr的频率 ?

This class represents a timeseries which keeps several levels of data granularity (similar in principle to the loads reported by the UNIX ‘uptime’ command). It uses several instances (one per level) of BucketedTimeSeries as the underlying storage.

This can easily be used to track sums (and thus rates or averages) over several predetermined time periods, as well as all-time sums. For example, you would use to it to track query rate or response speed over the last 5, 15, 30, and 60 minutes.

The MultiLevelTimeSeries takes a list of level durations as an input; the durations must be strictly increasing. Furthermore a special level can be provided with a duration of ‘0’ – this will be an “all-time” level. If an all-time level is provided, it MUST be the last level present.

The class assumes that time advances forward – you can’t retroactively add values for events in the past – the ‘now’ argument is provided for better efficiency and ease of unittesting.

The class is not thread-safe – use your own synchronization!

follow up是怎么保证对全球各地用户的响应速度

##Design Spellchecker

54.5 380. Insert Delete GetRandom O(1)

底层数据结构是: vector + index map

一个类似的题,pop_from_top(), pop_from_tail(), get_random(),get_index(i)全部O(1),可以用circular buffer来做.

54.5.1 381. Insert Delete GetRandom O(1) - Duplicates allowed





[cling]$ #include <unordered_set>
[cling]$ #include <iostream>
[cling]$ using namespace std;
[cling]$ unordered_set<int> ui;
[cling]$ ui.insert(1);
[cling]$ ui.insert(2);
[cling]$ ui.insert(3);
[cling]$ ui.insert(4);
[cling]$ ui.insert(5);
[cling]$ ui.insert(6);
[cling]$ for(auto i: ui) cout << i<< endl;

54.6 Tournament Tree

Tournament tree 找secMin;
Tournament tree 的定义是parent 是孩子node的最小值, 如下例 return 5
              / \
             2   7
            / \  | \
           5   2 8  7

下面是一个find second smallest的implementation:

count是比较的次数,可以验证CLRS p215的结果, comparison times = \(N + ceil(log_2N) -2\)!!!

这个是用的top down的方法,所以用了_cache用类似DP的思想避免重复计算. 如果bottom up要建树winner tree, 太麻烦了. 参考:

54.7 Float to English Words

查看这个: facebook-2017-01

  1. how to get top 10 Exceptions for the past 24 hours in 400 machines and update every 5 minutes. General idea: Kafka + Storm. Uses sliding window, hashTable, heap. (这道题pinterest也问了)

54.9 Evely Split N numbers into K buckets + Desgin Youtube + Zappos

比如 [1,2,3,4,5,6,7,8] 要放到3个buckets里面使每个bucket的数字的和相等.


注意分法不是唯一的,比如上面的例子,可以分成(5,7)+(4,8)+(2,3,6), 也可以分成(5,7)+(1,3,8)+(2,4,6)

但是这个题目只要求返回true or false.所以DFS就解决了.


不好意思我没说清楚,题目不要求找到最高rank的,只要求实现get跟set,然后getRank是给定了可以直接用.在没有达到capacity的情况下只需要insert到minheap就好,如果达到了capacity就拿出lowest rank cache跟新来的cache比较,保留较大的,所以会是O(logn).还需要一个hash map,每次insert到minheap的时候也要存入hash map,这样可以做到get的时候是O(1).

indexheap ?

54.11 Hangman

樓主可不可以說說hangman的要求 跟怎麼design的?

先问一些基本的你整个system有些什么功能,如果开始是一个单个用户怎么设计,像client side server side怎么设计,用户log in的时候你怎么处理. 然后扩展到需要scale,然后就是基本数据库啊什么的



Loading word list from file...
   1 words loaded.
Welcome to the game, Hangman!
I am thinking of a word that is 5 letters long.
You have 8 guesses left.
Available letters: abcdefghijklmnopqrstuvwxyz
Please guess a letter: h
Good guess: h____
You have 8 guesses left.
Available letters: abcdefgijklmnopqrstuvwxyz
Please guess a letter: l
Good guess: h_ll_
You have 8 guesses left.
Available letters: abcdefgijkmnopqrstuvwxyz
Please guess a letter:

游戏其实分为很多类型single user game, MMORPG(Massively multiplayer online role-playing game). 首先从简单情况入手

  • Offline Single User Game



不需要client-server设计, 用户名甚至都不需要, 唯一需要的是单词库和逻辑. 单词库可以存在一个简单的文件里面. 单机版的逻辑有两个例子请看这里:
- hangmang1
- hangman2


  • Multiple Users Game


The new feature we can add is a ranking system which rank users according to their scores.

When user plays game, he can earn score for each game.

  1. word存在本地,用javascript或者jar运行在本地.如果游戏正常结束,把分数送回服务器;如果游戏被abort,就把历史记录送回服务器.这个有很大的安全问题,所以一般不会采用,就开始讨论下开拓思路.

  2. word存在服务器端,这个设计比较靠谱.


(1). leaderboard

TopK players by scores

(2). all game history for one user

User: henrymoo TotalScore: 123.56

| word | mistakes | score | time  | status |
| hello|   2      |   12  |  2min | won    |
| world|   3      |   15  |  1min | lost   |
| wheel|   4      |   2   |  1min | aborted|


API的身份认证应该使用OAuth 2.0框架. OAuth说明:

  • 如果用户正在玩的时候断网了怎么办?


Keep alive VS Heart beat




  1. 是flash特有的amf消息格式.这个格式是可逆的,一般外挂就会逆出这个格式,然后自动化.
  2. http协议,这个导致的服务端安全问题和普通的网站基本没什么区别,因为此时服务端都是nginx, lighttp, apache, tomcat之类,服务端语言是:php, java, python之类.我遇到的一些flash游戏,相关逻辑在客户端进行,缺乏服务端验证,直接导致我可以无限金币,无限装备…开发者以为flash那么厚,看不到运算逻辑,实际上要反编译出源码很容易,比如用swfscan,很多时候的攻击直接在浏览器中间人抓包拦截,就更容易了:)

html游戏这个和服务端通信就是http了,如果用了html5,也许会用websocket.不过一般都是http协议,所以服务端安全问题和普通网站没什么区别.由于是html,在客户端上还可能出现XSS, CSRF等问题,这类客户端安全问题可能导致用户账号被劫持等问题.整体来说,搞网页游戏的攻击成本更低.

5M DAU, 20 games per user on average, so totally 100M new rows in game table per day.

如果这样话,我们不应该把单词string放在game table里面.英文单词总共才十几万个吧(Oxford English Dictionary).
这样我们得加一个word table.


  • Score怎么算?

我们可以根据字典统计出每个字母出现的频率,频率越低的难度越高.但是这并不代表就一定难猜. 比如zoo,猜出了oo之后,z就非常容易猜出来. 我们可以把这个作为初始的难度值.

difficulty = (1-percentage of frequency) * len(word)



  • DB Sharding

每天一亿行数据,太大了必须分区sharding.运行10年,就是3650忆 = 365B行. 这种情况一定是选NoSQL.

Which database could handle storage of billions/trillions of records?


Check: NoSQL选型及HBase案例详解

NoSQL一定程度上是基于一个很重要的原理—— CAP原理提出来的.传统的SQL数据库(关系型数据库)都具有ACID属性,对一致性要求很高,因此降低了A(availability)和P(partition tolerance).为了提高系统性能和可扩展性,必须牺牲C(consistency).

  • Consistency(一致性), 数据一致更新,所有数据变动都是同步的
  • Availability(可用性), 好的响应性能
  • Partition tolerance(分区容错性) 可靠性


- 考虑CA,这就是传统上的关系型数据库(RDBMS).
- 考虑CP,主要是一些Key-Value数据库,典型代表为Google的Big Table,将各列数据进行排序存储.数据值按范围分布在多台机器,数据更新操作有严格的一致性保证.
- 考虑AP,主要是一些面向文档的适用于分布式系统的数据库,如Amazon的Dynamo,Dynamo将数据按key进行Hash存储.其数据分片模型有比较强的容灾性,因此它实现的是相对松散的弱一致性——最终一致性.

  • Status
  1. aborted
  2. insession
  3. won
  4. lost

如何探测客户端掉线? 掉线之后GameHistory如何存?

如果浏览器支持HTML5,用websocket; 不支持就用ajax.

For websocket, cpp websocket server with seasocks:

Client side javascript:

    var ws = new WebSocket('ws://');
    var $message = $('#message');
    ws.onopen = function(){
      $message.attr("class", 'label label-success');
    ws.onmessage = function(ev){
      var json = JSON.parse(;
      if (json.msg == "hello"){
        if (json.hasOwnProperty('msg')){
    ws.onclose = function(ev){
    ws.onerror = function(ev){
      $message.text('error occurred');

Peak Concurrent Users (online gaming)

Redis + HDFS:

Redis入门指南 第2版.pdf


100M rows, 1k per 10 rows, then 10M*1k = 10G new data per day.

5M DAU, 86K seconds per day, 500 concurrent users per seconds, and PCU 500x20=10K.



在一台系统上,连接到一个远程服务时的本地端口是有限的.根据TCP/IP协议,由于端口是16位整数,也就只能是0到 65535,而0到1023是预留端口,所以能分配的端口只是1024到65534,也就是64511个.也就是说,一台机器一个IP只能创建60K多个长连接. 要想达到更多的客户端连接,可以用更多的机器或者网卡,也可以使用虚拟IP来实现,比如下面的命令增加了19个IP地址,其中一个给服务器用,其它18个给client,这样可以产生18 * 60000 = 1080000个连接.

Redis Cluster
Redis Cluster is the preferred way to get automatic sharding and high availability.

Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call an hash slot. There are 16384 hash slots in Redis Cluster, and to compute what is the hash slot of a given key, we simply take the CRC16 of the key modulo 16384.

Redis Cluster is not able to guarantee strong consistency.