Chapter 24 Augmented Data Structures

24.1 146. LRU Cache

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations: get and put.

- get(key) - Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
- put(key, value) - Set or insert the value if the key is not already present. When the cache reaches its capacity, it should invalidate the least recently used item before inserting a new item.

Follow up: Could you do both operations in O(1) time complexity?

Example:

LRUCache cache = new LRUCache( 2 /* capacity */ );

cache.put(1, 1);
cache.put(2, 2);
cache.get(1);       // returns 1
cache.put(3, 3);    // evicts key 2
cache.get(2);       // returns -1 (not found)
cache.put(4, 4);    // evicts key 1
cache.get(1);       // returns -1 (not found)
cache.get(3);       // returns 3
cache.get(4);       // returns 4

https://leetcode.com/problems/lru-cache

Q: Why does the list need to store <Key, Value> pairs instead of just <Value>, since the map already contains the key?
A: Because when the cache exceeds its capacity we evict the node at the tail of the list, and without the key stored in that node we would not know which entry to erase from the map.

  1. The list links the nodes in order of access time.
  2. Both get and put may need to update the underlying data structures (map + list).

A better solution with list’s splice function:
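Below is a minimal C++ sketch of this map + list design (my own illustration of the idea, assuming the standard LeetCode interface). splice moves a node to the front of the list in O(1) without invalidating the iterators stored in the map.

// O(1) LRU: list of (key, value) ordered from most to least recently used,
// plus a hash map from key to list iterator.
#include <list>
#include <unordered_map>

class LRUCache {
    int cap;
    std::list<std::pair<int, int>> items;  // front = most recently used
    std::unordered_map<int, std::list<std::pair<int, int>>::iterator> pos;

public:
    LRUCache(int capacity) : cap(capacity) {}

    int get(int key) {
        auto it = pos.find(key);
        if (it == pos.end()) return -1;
        // Move the accessed node to the front; iterators stay valid.
        items.splice(items.begin(), items, it->second);
        return it->second->second;
    }

    void put(int key, int value) {
        auto it = pos.find(key);
        if (it != pos.end()) {
            it->second->second = value;
            items.splice(items.begin(), items, it->second);
            return;
        }
        if ((int)items.size() == cap) {
            // Evict the least recently used item; the stored key tells us
            // which map entry to erase (see the Q&A above).
            pos.erase(items.back().first);
            items.pop_back();
        }
        items.push_front({key, value});
        pos[key] = items.begin();
    }
};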

  • Java

This is the laziest implementation, using Java’s LinkedHashMap. In a real interview, however, it is definitely not what the interviewer expects.

https://stackoverflow.com/questions/20772869/what-is-the-use-of-linkedhashmap-removeeldestentry

Or

24.2 460. LFU Cache

Design and implement a data structure for Least Frequently Used (LFU) cache. It should support the following operations: get and put.

- get(key) - Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
- put(key, value) - Set or insert the value if the key is not already present. When the cache reaches its capacity, it should invalidate the least frequently used item before inserting a new item. For the purpose of this problem, when there is a tie (i.e., two or more keys that have the same frequency), the least recently used key would be evicted.

Follow up: Could you do both operations in \(O(1)\) time complexity?

Example:

LFUCache cache = new LFUCache( 2 /* capacity */ );

cache.put(1, 1);
cache.put(2, 2);
cache.get(1);       // returns 1
cache.put(3, 3);    // evicts key 2
cache.get(2);       // returns -1 (not found)
cache.get(3);       // returns 3.
cache.put(4, 4);    // evicts key 1.
cache.get(1);       // returns -1 (not found)
cache.get(3);       // returns 3
cache.get(4);       // returns 4

https://leetcode.com/problems/lfu-cache

Using an approach similar to LRU, the code is as follows:
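As a rough illustration of this LRU-like idea (a reconstruction of mine, not necessarily the book's original code): one list is kept sorted by (frequency, recency) together with a hash map from key to list iterator; repositioning a node requires a linear scan, which is exactly the O(N) cost mentioned below.

// LFU with a single sorted list: front = evict first (lowest freq, least recent).
#include <list>
#include <unordered_map>

class LFUCache {
    struct Node { int key, value, freq; };
    int cap;
    std::list<Node> cache;                                   // ascending by (freq, recency)
    std::unordered_map<int, std::list<Node>::iterator> pos;  // key -> node

    // Remove the node, bump its frequency, and re-insert it after every node
    // whose frequency is <= its new frequency (so it is also the most recently
    // used among equal frequencies). O(N) because of the scan.
    void touch(std::list<Node>::iterator it) {
        Node n = *it;
        ++n.freq;
        cache.erase(it);
        auto insertPos = cache.begin();
        while (insertPos != cache.end() && insertPos->freq <= n.freq) ++insertPos;
        pos[n.key] = cache.insert(insertPos, n);
    }

public:
    LFUCache(int capacity) : cap(capacity) {}

    int get(int key) {
        auto it = pos.find(key);
        if (it == pos.end()) return -1;
        int v = it->second->value;
        touch(it->second);
        return v;
    }

    void put(int key, int value) {
        if (cap <= 0) return;
        auto it = pos.find(key);
        if (it != pos.end()) {            // update existing key and bump its frequency
            it->second->value = value;
            touch(it->second);
            return;
        }
        if ((int)cache.size() == cap) {   // evict least frequent / least recent
            pos.erase(cache.front().key);
            cache.pop_front();
        }
        // A brand-new key has frequency 1: place it after all other freq-1 nodes.
        auto insertPos = cache.begin();
        while (insertPos != cache.end() && insertPos->freq <= 1) ++insertPos;
        pos[key] = cache.insert(insertPos, {key, value, 1});
    }
};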

Similarities and differences with insert: http://en.cppreference.com/w/cpp/container/list/insert — note that both return an iterator.

The approach above is \(O(N)\), which is clearly not optimal yet. The optimal approach follows.


Note that, as with LRU, get must also increase freq by 1.

https://discuss.leetcode.com/topic/69402/c-list-with-hashmap-with-explanation

This data structure comes from the LFU paper: lfu.pdf

        Increasing frequencies
    ----------------------------->
  +------+    +---+    +---+    +---+
  | Head |----| 1 |----| 5 |----| 9 |  Frequencies
  +------+    +-+-+    +-+-+    +-+-+
                |        |        |
              +-+-+    +-+-+    +-+-+     |
              |2,3|    |4,3|    |6,2|     |
              +-+-+    +-+-+    +-+-+     | Most recent 
                         |        |       |
                       +-+-+    +-+-+     |
        k,v pairs      |1,2|    |7,9|     |
                       +---+    +---+     v

Similar to bucket sort, we place key/value pairs with the same frequency into the same bucket. Within each bucket, the pairs are ordered by recency: the pair that was most recently used (by put or get) sits at the bottom of its bucket.
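One common way to realize this bucket idea in C++ (a sketch of mine under the LeetCode 460 interface, not the paper's exact head-node layout) keeps one list per frequency plus a running minimum frequency:

// freq buckets + minFreq; all operations O(1).
#include <list>
#include <unordered_map>
#include <utility>

class LFUCache {
    int cap, minFreq = 0;
    // freq -> keys with that freq, most recently used at the front
    std::unordered_map<int, std::list<std::pair<int, int>>> buckets;
    // key -> (freq, iterator into buckets[freq])
    std::unordered_map<int, std::pair<int, std::list<std::pair<int, int>>::iterator>> pos;

    // Move key from its current bucket to the freq+1 bucket (MRU position).
    void touch(int key) {
        auto& [freq, it] = pos[key];
        int value = it->second;
        buckets[freq].erase(it);
        if (buckets[freq].empty() && freq == minFreq) ++minFreq;  // min moves by at most 1
        ++freq;
        buckets[freq].push_front({key, value});
        it = buckets[freq].begin();
    }

public:
    LFUCache(int capacity) : cap(capacity) {}

    int get(int key) {
        if (!pos.count(key)) return -1;
        touch(key);                       // get also increments the frequency
        return pos[key].second->second;
    }

    void put(int key, int value) {
        if (cap <= 0) return;
        if (pos.count(key)) {             // update value and bump frequency
            pos[key].second->second = value;
            touch(key);
            return;
        }
        if ((int)pos.size() == cap) {     // evict LFU; ties broken by LRU (back of bucket)
            pos.erase(buckets[minFreq].back().first);
            buckets[minFreq].pop_back();
        }
        minFreq = 1;                      // a brand-new key always has frequency 1
        buckets[1].push_front({key, value});
        pos[key] = {1, buckets[1].begin()};
    }
};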

This code uses the insert method in many places; for example, the insert call can actually be written like this:

http://en.cppreference.com/w/cpp/utility/tuple/tie

  • JAVA

At first I didn’t understand why we don’t need to track the next minimum and can simply add 1 to min. It makes sense now: min always resets to 1 when a new item is added, and min can never jump forward by more than 1, because updating an item only increments its count by 1.

LinkedHashSet sits between HashSet and TreeSet: it is implemented as a hash table with a linked list running through it, so it preserves insertion order while its basic operations still run in O(1) — a better time complexity than C++’s set.

HashSet,TreeSet,LinkedHashSet的区别: http://blog.csdn.net/cynthia9023/article/details/17503023
http://wiki.jikexueyuan.com/project/java-collection/linkedhashset.html

24.3 432. All O(1) Data Structure

Implement a data structure supporting the following operations:

  1. Inc(Key) - Inserts a new key with value 1. Or increments an existing key by 1. Key is guaranteed to be a non-empty string.
  2. Dec(Key) - If Key’s value is 1, remove it from the data structure. Otherwise decrements an existing key by 1. If the key does not exist, this function does nothing. Key is guaranteed to be a non-empty string.
  3. GetMaxKey() - Returns one of the keys with maximal value. If no element exists, return an empty string "".
  4. GetMinKey() - Returns one of the keys with minimal value. If no element exists, return an empty string "".

Your AllOne object will be instantiated and called as such:

 AllOne obj = new AllOne();
 obj.inc(key);
 obj.dec(key);
 string param_3 = obj.getMaxKey();
 string param_4 = obj.getMinKey();

Challenge: Perform all these in O(1) time complexity.

This problem is really a simplified version of LFU. The data structure is simpler, but the implementation is actually more involved, because here the frequency can not only be increased but also decreased.

When increasing, we can work with end(); the troublesome case is decreasing: if the current node is begin(), we must not step past begin().

Inc:

https://leetcode.com/problems/all-oone-data-structure

If the set above is replaced by an unordered_set, getMaxKey breaks, because an unordered_set cannot be iterated in reverse. Moreover, set keeps its elements ordered, which is lost after the replacement.
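For reference, a sketch of one common O(1) design (my own illustration; it differs in detail from the set-based code discussed above): a doubly linked list of buckets ordered by ascending count, each bucket holding all keys with that count, plus a hash map from key to its bucket.

#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <unordered_set>

class AllOne {
    struct Bucket { int count; std::unordered_set<std::string> keys; };
    std::list<Bucket> buckets;                                        // ascending count
    std::unordered_map<std::string, std::list<Bucket>::iterator> pos; // key -> bucket

public:
    void inc(const std::string& key) {
        auto it = pos.count(key) ? pos[key] : buckets.end();
        int count = (it == buckets.end()) ? 0 : it->count;
        auto next = (it == buckets.end()) ? buckets.begin() : std::next(it);
        // Make sure a bucket with count+1 exists right after the current one.
        if (next == buckets.end() || next->count != count + 1)
            next = buckets.insert(next, {count + 1, {}});
        next->keys.insert(key);
        pos[key] = next;
        if (it != buckets.end()) {                 // remove key from its old bucket
            it->keys.erase(key);
            if (it->keys.empty()) buckets.erase(it);
        }
    }

    void dec(const std::string& key) {
        if (!pos.count(key)) return;
        auto it = pos[key];
        if (it->count > 1) {
            // Make sure a bucket with count-1 exists right before this one;
            // the begin() guard avoids stepping before the first bucket.
            auto prev = (it == buckets.begin()) ? buckets.end() : std::prev(it);
            if (prev == buckets.end() || prev->count != it->count - 1)
                prev = buckets.insert(it, {it->count - 1, {}});
            prev->keys.insert(key);
            pos[key] = prev;
        } else {
            pos.erase(key);                        // count would drop to 0: remove key
        }
        it->keys.erase(key);
        if (it->keys.empty()) buckets.erase(it);
    }

    std::string getMaxKey() {
        return buckets.empty() ? "" : *buckets.back().keys.begin();
    }

    std::string getMinKey() {
        return buckets.empty() ? "" : *buckets.front().keys.begin();
    }
};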

24.4 362. Design Hit Counter

Design a hit counter which counts the number of hits received in the past 5 minutes.

Each function accepts a timestamp parameter (in seconds granularity) and you may assume that calls are being made to the system in chronological order (i.e., the timestamp is monotonically increasing). You may assume that the earliest timestamp starts at 1.

It is possible that several hits arrive roughly at the same time.

Example:

Follow up: What if the number of hits per second could be very large? Does your design scale?

https://leetcode.com/problems/design-hit-counter

This problem asks us to design a hit counter that can return the number of hits within the past five minutes, and it hints that several hits may arrive at the same time. Since the operations come in chronological order — each timestamp is greater than or equal to the previous one — the most direct approach is a queue: on every hit, push the current timestamp onto the queue; when the hit count is requested, look at the front of the queue and keep removing timestamps that are more than 5 minutes old, stopping once the front is within 5 minutes; the number of elements left in the queue is the answer. See the code below:
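A C++ sketch of this queue approach (assuming the usual LeetCode 362 interface with hit(timestamp) and getHits(timestamp)):

#include <queue>

class HitCounter {
    std::queue<int> q;   // timestamps of all hits still inside the 5-minute window

public:
    void hit(int timestamp) { q.push(timestamp); }

    int getHits(int timestamp) {
        // Drop hits that fell out of the 300-second window.
        while (!q.empty() && q.front() <= timestamp - 300) q.pop();
        return (int)q.size();
    }
};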

  • JAVA

Since the follow-up says there can be many hits per second, the following approach is quite clever: keep two arrays of size 300, one recording timestamps and one recording hit counts. In the hit function, take the timestamp modulo 300; if the timestamp previously stored at that position equals the current timestamp, we are still within the same second, so increment the corresponding hit count; if it differs, more than five minutes have passed since that slot was written, so overwrite the timestamp and reset the count to 1. When returning the number of hits, traverse the times array, find all positions whose timestamps are within the last 5 minutes, and add up the corresponding entries of hits. See the code below:
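A C++ sketch of this 300-slot circular-buffer idea (same interface assumption as above):

class HitCounter {
    int times[300] = {0};  // timestamp stored in each slot
    int hits[300]  = {0};  // hit count for that timestamp

public:
    void hit(int timestamp) {
        int i = timestamp % 300;
        if (times[i] == timestamp) {
            ++hits[i];          // same second: accumulate
        } else {
            times[i] = timestamp;
            hits[i] = 1;        // slot belongs to an expired second: overwrite
        }
    }

    int getHits(int timestamp) {
        int total = 0;
        for (int i = 0; i < 300; ++i)
            if (timestamp - times[i] < 300) total += hits[i];
        return total;
    }
};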

24.5 359. Logger Rate Limiter

Design a logger system that receives a stream of messages along with their timestamps; each message should be printed if and only if it has not been printed in the last 10 seconds.

Given a message and a timestamp (in seconds granularity), return true if the message should be printed at the given timestamp, otherwise return false.

It is possible that several messages arrive roughly at the same time.

Example:

https://leetcode.com/problems/logger-rate-limiter/

  • Java Circular Buffer Solution, similar to Hit Counter
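For comparison, the simplest version just remembers the last timestamp at which each message was printed; a C++ sketch of that hash-map approach (my own illustration, not the circular-buffer solution referenced above):

#include <string>
#include <unordered_map>

class Logger {
    std::unordered_map<std::string, int> lastPrinted;  // message -> last printed time

public:
    bool shouldPrintMessage(int timestamp, const std::string& message) {
        auto it = lastPrinted.find(message);
        if (it != lastPrinted.end() && timestamp < it->second + 10)
            return false;                 // printed less than 10 seconds ago
        lastPrinted[message] = timestamp;
        return true;
    }
};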

24.6 555. Counting Bloom Filter

Implement a counting bloom filter. Support the following method:

  1. add(string). Add a string into the bloom filter.
  2. contains(string). Check whether a string exists in the bloom filter.
  3. remove(string). Remove a string from the bloom filter.

Example

CountingBloomFilter(3) 
add("lint")
add("code")
contains("lint") // return true
remove("lint")
contains("lint") // return false

http://www.lintcode.com/en/problem/counting-bloom-filter/

Besides the false-positive issue, the traditional Bloom filter has another shortcoming: it cannot support deletion (think about why that is). The Counting Bloom Filter (CBF) is designed to solve exactly this problem.

http://www.yebangyu.org/blog/2016/01/23/insidethebloomfilter/
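A C++ sketch of a counting Bloom filter (the table size and the seeded std::hash scheme are illustrative choices of mine, not the LintCode reference solution): each bit of a standard Bloom filter is replaced by a small counter, so remove() can decrement instead of clearing a bit that may be shared with other strings.

#include <functional>
#include <string>
#include <vector>

class CountingBloomFilter {
    int k;                       // number of hash functions
    std::vector<int> counters;   // one counter per slot instead of one bit

    // Simple seeded hash: not cryptographic, just k different functions.
    size_t slot(const std::string& s, int seed) const {
        size_t h = std::hash<std::string>{}(s + char('a' + seed));
        return h % counters.size();
    }

public:
    explicit CountingBloomFilter(int k, size_t size = 100003)
        : k(k), counters(size, 0) {}

    void add(const std::string& word) {
        for (int i = 0; i < k; ++i) ++counters[slot(word, i)];
    }

    void remove(const std::string& word) {
        for (int i = 0; i < k; ++i) --counters[slot(word, i)];
    }

    bool contains(const std::string& word) const {
        for (int i = 0; i < k; ++i)
            if (counters[slot(word, i)] == 0) return false;
        return true;   // may still be a false positive
    }
};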

24.7 556. Standard Bloom Filter

Implement a standard bloom filter. Support the following method:

  1. StandardBloomFilter(k) - the constructor; it needs to create k hash functions.
  2. add(string). Add a string into the bloom filter.
  3. contains(string). Check whether a string exists in the bloom filter.

Example

StandardBloomFilter(3)
add("lint")
add("code")
contains("lint") // return true
contains("world") // return false

http://www.lintcode.com/en/problem/standard-bloom-filter/


http://www.cnblogs.com/heaad/archive/2011/01/02/1924195.html
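A C++ sketch of a standard Bloom filter, mirroring the counting version in the previous section but with plain bits, so deletion is not supported (again, the hash scheme and table size are illustrative choices of mine):

#include <functional>
#include <string>
#include <vector>

class StandardBloomFilter {
    int k;                     // number of hash functions
    std::vector<bool> bits;    // the bit array

    size_t slot(const std::string& s, int seed) const {
        size_t h = std::hash<std::string>{}(s + char('a' + seed));
        return h % bits.size();
    }

public:
    explicit StandardBloomFilter(int k, size_t size = 100003)
        : k(k), bits(size, false) {}

    void add(const std::string& word) {
        for (int i = 0; i < k; ++i) bits[slot(word, i)] = true;
    }

    bool contains(const std::string& word) const {
        for (int i = 0; i < k; ++i)
            if (!bits[slot(word, i)]) return false;
        return true;   // "probably present": false positives are possible
    }
};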

24.8 635. Design Log Storage System

You are given several logs, each containing a unique id and a timestamp. The timestamp is a string in the format Year:Month:Day:Hour:Minute:Second, for example, 2017:01:01:23:59:59. All fields are zero-padded decimal numbers.

Design a log storage system to implement the following functions:

void Put(int id, string timestamp): Given a log’s unique id and timestamp, store the log in your storage system.

int Retrieve(String start, String end, String granularity): Return the ids of logs whose timestamps are within the range from start to end. Both start and end have the same format as timestamp, and granularity means the time level to consider. For example, start = “2017:01:01:23:59:59”, end = “2017:01:02:23:59:59”, granularity = “Day” means that we need to find the logs within the range from Jan. 1st 2017 to Jan. 2nd 2017.

Example 1:

put(1, "2017:01:01:23:59:59");
put(2, "2017:01:01:22:59:59");
put(3, "2016:01:01:00:00:00");
retrieve("2016:01:01:01:01:01","2017:01:01:23:00:00","Year");
// return [1,2,3], because you need to return all logs within 2016 and 2017.
retrieve("2016:01:01:01:01:01","2017:01:01:23:00:00","Hour");
// return [1,2], because you need to return all logs start from 2016:01:01:01 to
// 2017:01:01:23, where log 3 is left outside the range.

Note:

There will be at most 300 operations of Put or Retrieve.
Year ranges from [2000,2017]. Hour ranges from [00,23].
Output for Retrieve has no order required.

https://leetcode.com/problems/design-log-storage-system/

This problem asks us to design a log storage system. Each log has an id and a generation time accurate to the second, and the main function we need to implement is retrieve, which takes a start time, an end time, and a granularity that can be any of year, month, day, hour, minute, or second; working through the example in the statement makes the requirement clear. We first need a structure to store each log's id and timestamp, and an array of pairs is enough for that. Since the granularity matters, we also keep a units array listing all possible granularities. The hard part of the problem is correctly picking out the logs that lie inside the time range. Because of the granularity — say it is Day — we no longer care about the hour, minute, and second, and only compare up to the day. Testing whether a timestamp lies in the range is then simple: it must be no earlier than the start and no later than the end, and because the format is fixed-width and zero-padded we can compare the strings directly instead of converting to seconds. So we determine from the granularity how long a prefix of the timestamp to compare; an indices array, parallel to units, records the number of characters covered by each granularity. In retrieve we iterate over all logs, cut each timestamp down to the prefix for the given granularity, compare it against the truncated start and end times, and add the id to the result res whenever it falls in between. A sketch is given below.
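A C++ sketch of this approach (my own reconstruction of the idea described above):

#include <string>
#include <utility>
#include <vector>

class LogSystem {
    std::vector<std::pair<int, std::string>> logs;
    const std::vector<std::string> units{"Year", "Month", "Day",
                                         "Hour", "Minute", "Second"};
    // Prefix length of "YYYY:MM:DD:hh:mm:ss" covered by each granularity.
    const std::vector<int> indices{4, 7, 10, 13, 16, 19};

public:
    void put(int id, std::string timestamp) {
        logs.push_back({id, std::move(timestamp)});
    }

    std::vector<int> retrieve(std::string s, std::string e, std::string gra) {
        int len = 0;
        for (size_t i = 0; i < units.size(); ++i)
            if (units[i] == gra) len = indices[i];
        std::string lo = s.substr(0, len), hi = e.substr(0, len);
        std::vector<int> res;
        for (const auto& [id, ts] : logs) {
            std::string t = ts.substr(0, len);
            // Zero-padded fields make lexicographic comparison correct.
            if (t >= lo && t <= hi) res.push_back(id);
        }
        return res;
    }
};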

24.9 588. Design In-Memory File System

Design an in-memory file system to simulate the following functions:

ls: Given a path in string format. If it is a file path, return a list that only contains this file’s name. If it is a directory path, return the list of file and directory names in this directory. Your output (file and directory names together) should be in lexicographic order.

mkdir: Given a directory path that does not exist, you should make a new directory according to the path. If the middle directories in the path don’t exist either, you should create them as well. This function has void return type.

addContentToFile: Given a file path and file content in string format. If the file doesn’t exist, you need to create that file containing given content. If the file already exists, you need to append given content to original content. This function has void return type.

readContentFromFile: Given a file path, return its content in string format.

Example:

Input:

["FileSystem","ls","mkdir","addContentToFile","ls","readContentFromFile"]  
[[],["/"],["/a/b/c"],["/a/b/c/d","hello"],["/"],["/a/b/c/d"]]

Output:

[null,[],null,null,["a"],"hello"]

Explanation:

(figure: filesystem)

Note:

You can assume all file or directory paths are absolute paths which begin with / and do not end with /, except when the path is just “/”.
You can assume that all operations will be passed valid parameters and users will not attempt to retrieve file content or list a directory or file that does not exist.
You can assume that all directory names and file names only contain lower-case letters, and same names won’t exist in the same directory.

This problem asks us to design an in-memory file system that can list the current files, create directories, append content to a file, and read a file's content, which feels like simulating a few terminal commands. The tricky part is the ls command. The example in the statement does not show ls's requirement very well: files and directories are treated differently. Since files here carry no extension, the last component of a path may be either a file or a directory. For a path like a/b/c, the trailing c could be a directory or a file: if c is a directory, ls must output all the files and directories inside c, whereas if c is a file, ls only outputs c itself. Also note that when creating a directory, every directory missing along the path must be created, and the same applies when adding content to a file. The highest-voted solutions on the discussion board all define a custom class, but the author prefers not to solve problems with custom classes and finds those solutions no more concise or readable, so no custom class is created here; instead two hash maps are used: dirs maps a path to the set of all file and directory names it contains, and files maps a file path to its content.

At the start, the root directory “/” is put into dirs. For ls, if the path exists in files, the last component is a file, so we simply extract the file name and return it; if not, the last component is a directory, so we look it up in dirs and return everything inside. For mkdir, the approach is to split the path on “/”. In Java one can just use String's built-in split; C++ does not have as many built-in helpers, so a string stream is used instead. We peel the path apart level by level and add the file or directory name of each level to its parent's set; the one thing to watch is that for the root level the “/” must be prepended first, while for every other level it is appended afterwards. For addContentToFile, first separate the directory path from the file name; if the directory part is empty, it is the root directory and needs to be “/”. Then check whether this path already exists in dirs; if not, call mkdir to create it. After that, add the file name to that path's set and append the content to the file path's entry in files. Reading a file's content is then trivial: just return its entry from files. See the code below.
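A C++ sketch of this two-hash-map design (a reconstruction of the idea, not the original code):

#include <set>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

class FileSystem {
    std::unordered_map<std::string, std::set<std::string>> dirs;  // path -> children
    std::unordered_map<std::string, std::string> files;           // file path -> content

public:
    FileSystem() { dirs["/"]; }

    std::vector<std::string> ls(std::string path) {
        if (files.count(path)) {                       // path is a file
            size_t idx = path.find_last_of('/');
            return {path.substr(idx + 1)};
        }
        return {dirs[path].begin(), dirs[path].end()}; // set keeps names sorted
    }

    void mkdir(std::string path) {
        std::istringstream iss(path);
        std::string cur, name;
        while (std::getline(iss, name, '/')) {
            if (name.empty()) continue;                // skip the leading "/"
            dirs[cur.empty() ? "/" : cur].insert(name);
            cur += "/" + name;
            dirs[cur];                                 // ensure the directory exists
        }
    }

    void addContentToFile(std::string filePath, std::string content) {
        size_t idx = filePath.find_last_of('/');
        std::string dir = (idx == 0) ? "/" : filePath.substr(0, idx);
        std::string name = filePath.substr(idx + 1);
        if (!dirs.count(dir)) mkdir(dir);              // create missing directories
        dirs[dir].insert(name);
        files[filePath] += content;                    // create or append
    }

    std::string readContentFromFile(std::string filePath) {
        return files[filePath];
    }
};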

24.10 Unrolled Linked List

indeed.html#unrolled-linked-list-1

https://www.geeksforgeeks.org/unrolled-linked-list-set-1-introduction/
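As a quick illustration (a minimal sketch of mine, not taken from the linked articles): each node of an unrolled linked list stores a small fixed-capacity array of elements instead of a single element, which improves cache locality and reduces pointer overhead; appending splits a full node into two half-full nodes.

#include <cstddef>

struct UnrolledNode {
    static const std::size_t CAPACITY = 8;  // elements per node (illustrative choice)
    int values[CAPACITY];
    std::size_t count = 0;                  // how many slots are in use
    UnrolledNode* next = nullptr;
};

// Append a value at the tail, splitting the node in half when it is full.
void append(UnrolledNode*& tail, int value) {
    if (tail->count == UnrolledNode::CAPACITY) {
        // Move the second half of the full node into a fresh node.
        UnrolledNode* node = new UnrolledNode();
        std::size_t half = UnrolledNode::CAPACITY / 2;
        for (std::size_t i = half; i < UnrolledNode::CAPACITY; ++i)
            node->values[node->count++] = tail->values[i];
        tail->count = half;
        tail->next = node;
        tail = node;
    }
    tail->values[tail->count++] = value;
}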

24.10.1 Deque

A deque is somewhat recursively defined: internally it maintains a double-ended queue of chunks (“blocks” in the graphic below) of fixed size. Each chunk is a vector, and the queue (“map” in the graphic below) of chunks itself is also a vector.